regex end of line match very slow #14539
From @prsalazar
Created by @prsalazar
Reply-To: paul.salazar@testspectrum.com

This is a bug report for perl from paul.salazar@testspectrum.com,
-----------------------------------------------------------------
use Time::HiRes qw(gettimeofday);
my $iters = 100000;
my $strng = '';
my $dat = '123';
print "Running match $reg on $iters iterations on concatenated string...\n";
$strt = gettimeofday();
$strng = '';
$dat = '123';
print "\nRunning match $reg on $iters iterations on concatenated string...\n";
$strt = gettimeofday();

The output is as follows:
Running match \s$ on 100000 iterations on concatenated string...
Running match \S$ on 100000 iterations on concatenated string...

This problem does not exist in perl 5.12. I'm not sure when it was
Perl Info
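The script in the report survives only in fragments, so the following is a minimal sketch of the kind of benchmark it describes, not the reporter's original code. The helper name `time_matches` and the reduced iteration count are assumptions made for illustration; the variable names `$strng`, `$dat`, and `$strt` are borrowed from the fragments above.

```perl
use strict;
use warnings;
use Time::HiRes qw(gettimeofday);

# Hypothetical helper (not from the report): append $dat to a growing
# string $iters times and match $re against the whole string each time.
# Returns the elapsed seconds and the number of successful matches.
sub time_matches {
    my ($re, $iters, $dat) = @_;
    my $strng = '';
    my $hits  = 0;
    my $strt  = gettimeofday();        # scalar context: seconds as a float
    for (1 .. $iters) {
        $strng .= $dat;                # modify the string after each match
        $hits++ if $strng =~ $re;
    }
    return (gettimeofday() - $strt, $hits);
}

# The report used 100000 iterations; a smaller count keeps this sketch quick.
my $iters = 10_000;
for my $re (qr/\s$/, qr/\S$/) {
    my ($secs, $hits) = time_matches($re, $iters, '123');
    printf "%s: %d matches in %.3f s\n", $re, $hits, $secs;
}
```

Only `\S$` ever matches here, since the string never contains whitespace, so comparing the two timings separates the cost of a successful match from the cost of a failed one.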
From @iabyn
On Tue, Feb 24, 2015 at 11:26:31AM -0800, Paul Salazar wrote:
It's a side-effect of copy-on-write (COW) strings, which first appeared in
$s =~ /abc/;
leave $x containing garbage.
With COW, a successfully matched string is marked as COW, with the regex
Conversely, in the days before COW, something like
$s = 'x' x 1_000_000;
would go quadratic on length.
So pre-COW: the regex engine *always* copied after a successful match in the presence
Post-COW: on a successful match the string is marked as COW, but no physical
I can't yet think of any elegant solution for this. Some theoretical workarounds that don't actually work are:
* 5.21.8 and later include the /n modifier, which doesn't set $1 etc,
* Perform another successful match on a different string after the first
$long_string =~ /.../;
In theory that changes the "current regex" (PL_curpm) which is used
The RT System itself - Status changed from 'new' to 'open'
From @wolfsage
On Wed, Feb 25, 2015 at 9:36 AM, Dave Mitchell <davem@iabyn.com> wrote:
Is this a bug of /n? By design /n only prevents grouping parens from filling in $1, $2, etc... Other things like $&, named captures, etc still work.
--
Matthew Horsfall (alh)
From @khwilliamson
On 02/25/2015 09:07 AM, Matthew Horsfall (alh) wrote:
And, FWIW, I think this is the correct design.
From @iabyn
On Wed, Feb 25, 2015 at 11:07:32AM -0500, Matthew Horsfall (alh) wrote:
Ah, I misunderstood its purpose.
From @prsalazar
All makes sense except I'm not seeing the pre-COW going quadratic. Same
~$ perl -v
This is perl, v5.8.8 built for msys-64int
Running match \s$ on 100000 iterations on concatenated string...
Running match \S$ on 100000 iterations on concatenated string...
From @wolfsage
On Wed, Feb 25, 2015 at 11:32 AM, Paul Salazar
So, this was fine in 5.17.7 up until:
1a904fc is the first bad commit
Disable PL_sawampersand
PL_sawampersand actually causes bugs (e.g., perl #4289), because the
Using copy-on-write for the pre-match copy (preceding patches do that)
PL_sawampersand is now #defined to be equal to what it would be if
I left the PL_sawampersand code in place, in case this commit proves
Perhaps something else is going on here?
--
Matthew Horsfall (alh)
From @khwilliamson
On 02/25/2015 12:55 PM, Matthew Horsfall (alh) wrote:
As I vaguely recall, there is code in regexec that used to be able to |
From @iabyn
On Wed, Feb 25, 2015 at 10:32:21AM -0600, Paul Salazar wrote:
Try adding $& to the code or changing the regex to "(\\s)\$"; you'll then
@wolfsage you said something in this ticket I don't understand.
I don't quite get that. Named captures are $1, $2, etc but with a label. Was that a typo/thinko?
@demerphq What I meant was this, does it help?
/n turns () into (?:...), but (?<...>...) still works
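A short sketch of that distinction, assuming Perl 5.22 or later (the first stable release with /n); the sample string and the group name `val` are made up for illustration:

```perl
use v5.22;      # /n needs Perl 5.22 or later; also enables strict and say
use warnings;

my $s = 'key=value';

# Without /n, plain parentheses are capture groups: $1 and $2 are set.
if ($s =~ /(\w+)=(\w+)/) {
    say "without /n: \$1=$1, \$2=$2";
}

# With /n, plain parentheses only group, as if written (?:...).
# The named group (?<val>...) still captures, and $& still holds
# the text of the whole match.
if ($s =~ /(\w+)=(?<val>\w+)/n) {
    say "with /n: val=$+{val}, whole match=$&";
}
```

Because $& and named groups are still available under /n, the modifier on its own does not remove the need for the engine to keep the matched text around, which is consistent with it not helping as a workaround in this ticket.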
Yeah, as I lay in bed trying to fall asleep I realized that you had meant that simple capturing parens were treated as non-capturing. The language you used threw me off a touch; not that it was wrong, just sufficiently different from the words I would have used that it didn't click at 2am. My bad. Thanks for the confirmation and sorry for the confusion. Rereading now with a more rested brain I see my mistake.
Migrated from rt.perl.org#123918 (status was 'open')
Searchable as RT123918$