New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Slowdown in m//g on COW strings of certain lengths #15266
Comments
From sschoeling@linet-services.deCreated by sschoeling@linet-services.deThe following oneliner creates a large string, creates a COW view with offset 1 perl -e 'my $f = "0" x (4e6+1); $g = substr($f, 1); 1 while $g =~ m/0/g;' On my machines this takes 0.1s-2s depending on CPU and perl version. Changing the string length to 4e6 will hang. 4e6-1 is fine again. This seems to be the case for all lengths that are multiples of 256. It's not Some times from this machine: 256000: 3.819 512000: 15.614 perl versions: I don't currently have a 5.23 compiled. Sorry. Perl Info
|
From @jkeenanOn Thu Apr 07 12:20:45 2016, sschoeling@linet-services.de wrote:
Confirmed on blead. ##### [perl] 6 $ time ./perl -e 'my $f = "0" x (4e6+1); $g = substr($f, 1); 1 while $g =~ m/0/g;' real 0m0.302s real 0m14.689s real 0m0.294s real 0m0.293s real 0m0.301s real 0m7.211s real 0m0.287s real 0m0.294s real 0m6.437s ##### -- |
The RT System itself - Status changed from 'new' to 'open' |
From @demerphqOn 8 April 2016 at 12:41, James E Keenan via RT
I bet this is related to making $& work properly, and possibly how My bet is in the slow cases we are copying the string every character. We used to not copy the subject of a //g in scalar context to make This fixed bugs like this: $foo="bar"; My initial thought is somewhere this is going wrong. That it happens Yves -- |
From @demerphqOn 8 April 2016 at 12:53, demerphq <demerphq@gmail.com> wrote:
We are copying the subject string every 0 in the string. I dont know norole:yorton@Styx:blead:/git_tree/perl$ time ./perl -e 'my $f = ("0" real 0m0.075s real 0m0.579s real 0m5.647s -- |
From @demerphqOn 8 April 2016 at 13:06, demerphq <demerphq@gmail.com> wrote:
This was the reason. Perl_sv_setpvn() was only overallocating 1 byte I have pushed three patches for this, the last changes SvGROW() to commit bcc9f60 More generalized fix for #127855, overallocate in SvGROW and not If we overallocate in SvGROW() and not just sv_grow() we can ensure commit e19cb11 fix #127855, in Perl_sv_setpvn() we have to overallocate to enable COW We need to overallocate by 2 to do COW strings. One for the null, commit d14d585 test for #127855 - Slowdown in m//g on COW strings of certain lengths Yves |
From @jkeenanOn Fri Apr 08 12:43:02 2016, demerphq wrote:
These patches worked for me on both Linux/x86_64 and older Darwin/PPC. |
From @iabynOn Fri, Apr 08, 2016 at 09:42:22PM +0200, demerphq wrote:
I took the libery of pushing a couple of updates to that: commit 31f8924 new perf test in pat.t: avoid timing failure M t/re/pat.t commit 68b940a move perf test from re/pat.t to re/speed.t M t/re/pat.t -- |
From @demerphqOn 9 April 2016 at 17:50, Dave Mitchell <davem@iabyn.com> wrote:
Sounds good. Sorry I was not aware of/failed to notice re/speed.t
-- |
From @iabynOn Fri, Apr 08, 2016 at 09:42:22PM +0200, demerphq wrote:
This is causing the breakage described in Basically by making SvGROW(sv,len) include a calculation involving I suggest for 5.24 we just revert this commit, and readdress the issue -- |
From @demerphqOn 28 April 2016 at 11:20, Dave Mitchell <davem@iabyn.com> wrote:
That will leave the original bug fixed right? Will we have this Yves |
From @iabynOn Thu, Apr 28, 2016 at 11:30:17AM +0200, demerphq wrote:
Turns out that reverting just the one commit introduced a subtle bug with my $s = some_string; Where pp_stringify() and sv_copypv() (which it calls) don't check whether However, I now think both commits are wrong. There is already a mechanism * reverts your two commits Rik, I think our two options at this stage are: a) Just revert the two commits for 5.24.0. This means that a slowdown -- |
From @demerphqOn 2 May 2016 at 16:26, Dave Mitchell <davem@iabyn.com> wrote:
While I am fine with whatever you think is best for right now I have Reverting would return us to the following definition: # define SvGROW(sv,len) (SvLEN(sv) < (len) ? sv_grow(sv,len) : SvPVX(sv)) We compare with the bare 'len' that the caller supplies, which means It seems to me that it is a mistake to expect the caller to keep track It also seems odd that logic that causes us to overallocate more often
I am fine with either. BTW, I think its worth noting the general history of this bug. It is It really seems to me that when COW was added we modified sv_grow() cheers, |
From @iabynOn Mon, May 02, 2016 at 07:43:39PM +0200, demerphq wrote:
Something creating a new string, "abc"\0 say, will call SvGROW(sv, 3+1) Something extending an existing string will cause sv_grow() to be called So I think this is case is (relatively) rare.
We're not (in general). sv_grow does it no behalf of the caller.
It's clearly documented since 5.004 that SvGROW's caller has to do the +1,
It's specifically the 'x' operator, which is where it easy to grow very
I vote for (b), if that's ok with Rik.
For post-5.24, I want to consider more radical options, for example -- |
From @cpansproutOn Mon May 02 11:44:13 2016, davem wrote:
I think (b) is a good idea. The patch looks safe to me.
That was Nicholas Clark’s original proposal, but others pointed out that on some systems word-aligned string comparison was much faster than non-aligned, and that on such systems mallocked strings would be aligned. That was why I chose the end of the buffer. -- Father Chrysostomos |
From @demerphqOn 2 May 2016 at 20:43, Dave Mitchell <davem@iabyn.com> wrote:
Something creating a new string "abc" wont call SvGROW at all. It will
I don't see how you say that. Any time they use either of the last two SvLEN=10, Requested = 9, Uncowable.
I want to see data that agrees. I think its a lot less rare than you think. I think that lots of code does concatenate ops on SV's and will thus
No. SvGROW() doesnt know about it, so it forces the user to know about it.
Just because we documented it doesnt mean its smart or right. I dont
Ok, thanks. So then we can fix SvGROW to not do that.
I think we should do the latter. It occurs to me that the wrapping problem is a good example of why the
FWIW, I am not a fan of this idea. I was looking into making the I would be less against the idea if we were to raise the space I really do not consider the debate about SvGROW settled yet. I really Yves -- |
From @demerphqOn 3 May 2016 at 08:11, demerphq <demerphq@gmail.com> wrote:
Er, I apologise, that could be interpreted it differently than I intended. I think its fine to ignore this for now, and settle it in a future release. I just feel like we should understand things better before we say that Anyway, for now lets just revert my patches and go with Daves solution cheers, -- |
From @iabynOn Mon, May 02, 2016 at 02:01:21PM -0700, Father Chrysostomos via RT wrote:
Ah, that's an interesting point. -- |
From @iabyn[ to be read in the context of your followup-email, that we're now just On Tue, May 03, 2016 at 08:11:04AM +0200, demerphq wrote:
And newSV() calls sv_grow() to allocate an initial buffer. The point I was
Consider the following PVX buffers, where X represents an spare byte in "abc\0" "abc\0X" "abc\0XX" So in general only concatenating a number of bytes exactly equal to the For example, with this code: use Devel::Peek; $ perl5238 ~/tmp/p 7 '' 2>&1 | egrep 'CUR|LEN' It's only the 'xx' concat that leaves an uncowable SV.
Its a worthy goal, we just need to decide whether the means to achieve it
Well we're kind of stuck with it. We could introduce a new SvGROW0 macro
I think the cases where a string buffer is COW-shared more than 255 times One thing I definitely want to do post 5.24 is introduce a startup-time -- |
Migrated from rt.perl.org#127855 (status was 'open')
Searchable as RT127855$
The text was updated successfully, but these errors were encountered: