Bug: Regex matches when it shouldn't #15753
From @ikegami

Created by @ikegami

$ perl -E'say $& if "aaagaggaaaaggggaaaaggggaaaaggggaaa" =~ …

This shouldn't match.

http://stackoverflow.com/questions/41071498/perl-regular-expression-mismatches-repetitive-strings

Perl Info
From @demerphq

On 10 December 2016 at 05:38, Eric Brine <perlbug-followup@perl.org> wrote:

Agreed, it should not match. A bit of superficial poking suggests two areas to follow up on. First, if I recode it to eliminate the [atgc]{1,7}, the match fails as … Second, I see the super-linear cache kicking in right before we … Hugo and Dave know that code best, I would say. On the other hand, it shows off Dave's constant-string optimisation.

cheers.
The RT System itself - Status changed from 'new' to 'open'
From @iabyn

On Fri, Dec 09, 2016 at 08:38:10PM -0800, Eric Brine wrote:

I'm not seeing a bug. Changing the g to an X to make it more visually distinct, and adding some …:

print "[$1][$2][$3][$4] = [$&]\n" if "aaaXaXXaaaaXXXXaaaaXXXXaaaaXXXXaaa" =~ …

I get:

[XXXaaaaXXXXaaaaX][XXX][XaaaaX][XXX] = [XXXaaaaXXXXaaaaXXXX]

Breaking $& down, we get: first … second … third … (the split between the first and second iterations is speculative; the …)

Note that there's no requirement that the second and subsequent iterations …
From @demerphq

On 10 December 2016 at 10:49, Dave Mitchell <davem@iabyn.com> wrote:

Which does not match the pattern stated. We have /(?:X{3,}Y{1,7}){3,}X{3,}/. $1 should contain at least 9 X's, and contain at least 3 sequences of … In total we need: 3 X's, followed by 1-7 chars, 3 X's, followed by 1-7 … The minimum string this should accept is XXXaXXXaXXXaXXX.

Note the difference from the accepted match: it does not have enough …

Also, if you reformulate the pattern to not use {3,}, it does not match. If we accept that X{3,} is the same as XXX+ and that X{1,7} is the …

perl -le' print …

But it prints "failed".

Also, if I run the match twice, the second time on $& from the first, …

perl -E'say $& if "aaagaggaaaaggggaaaaggggaaaaggggaaa" =~ …

cheers,
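As a hedged cross-check of the reasoning above, the sketch below runs the g-based pattern that appears in this thread, /((G{3,}[ATGC]{1,7}){3,}G{3,})/i, through Python's `re` module, used here only as an independent reference engine (it is not part of the report). With X mapped back to g, the minimal string XXXaXXXaXXXaXXX should match, while the string from the report should not.

```python
import re

# Pattern quoted in this thread, compiled case-insensitively to mirror
# Perl's /i. Python's re serves only as an independent reference engine.
PATTERN = re.compile(r"((G{3,}[ATGC]{1,7}){3,}G{3,})", re.IGNORECASE)

target = "aaagaggaaaaggggaaaaggggaaaaggggaaa"  # string from the report
minimal = "gggagggagggaggg"  # XXXaXXXaXXXaXXX with X mapped back to g


def first_match(s):
    """Return the text of the leftmost match, or None if the pattern fails."""
    m = PATTERN.search(s)
    return m.group(0) if m else None


print(first_match(target))   # None
print(first_match(minimal))  # gggagggagggaggg
```

The minimal string matches in full because every one of its four ggg runs is needed to satisfy the three loop iterations plus the trailing G{3,}.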
From @hvds

On Sat, 10 Dec 2016 01:50:00 -0800, davem wrote:

For this to match, I'd expect the target string to need 4 distinct groups of X{3,}, so I'm surprised that you're not surprised. Reducing it slightly, and then expanding some of the quantifiers, I have this: …

Like Yves, I suspect the super-linear cache; I'll try to dig into it over the weekend.

Hugo
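The "4 distinct groups" argument can be checked by simple counting. A minimal sketch, assuming the g-based string from the report: a run of L consecutive g's can supply at most (L + 1) // 4 disjoint G{3,} blocks, because k blocks separated by at least one character need 4k - 1 characters. Summing over runs gives an upper bound; it ignores the 1-7 gap limit, so it is a necessary condition rather than a matcher. The helper name `max_g3_blocks` is hypothetical.

```python
import re


def max_g3_blocks(s):
    """Upper bound on the number of disjoint G{3,} blocks (case-insensitive)
    that s can supply when consecutive blocks are separated by at least one
    character. Hypothetical helper for illustration only."""
    total = 0
    for run in re.finditer(r"g+", s, re.IGNORECASE):
        length = len(run.group(0))
        # k blocks of 3 plus k - 1 one-char separators need 4k - 1 characters.
        total += (length + 1) // 4
    return total


target = "aaagaggaaaaggggaaaaggggaaaaggggaaa"
print(max_g3_blocks(target))             # 3, fewer than the 4 the pattern needs
print(max_g3_blocks("gggagggagggaggg"))  # 4
```

Since (G{3,}[ATGC]{1,7}){3,}G{3,} needs at least four such blocks, a bound of 3 rules out any correct match on the reported string.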
From @iabyn

On Sat, Dec 10, 2016 at 11:34:21AM +0100, demerphq wrote:

Ah yeah, total misread on my part. Need to think some more.
From explorer@joaquinferrero.com

Confirmed the workaround found by the Stack Overflow author with v5.24.0:

False positive:

$ perl -E 'say $& while "aaagaggaaaaggggaaaaggggaaaaggggaaa" =~ /((G{3,}[ATGC]{1,7}){3,}G{3,})/gi'

Workaround: remove /i and spell out both cases in each character class:

$ perl -E'say $& while "aaagaggaaaaggggaaaaggggaaaaggggaaa" =~ /(([gG]{3,}[aAtTgGcC]{1,7}){3,}[gG]{3,})/g'
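The two one-liners above differ only in how case-insensitivity is expressed, so a correct engine should treat them identically. A small sketch, again using Python's `re` purely as an independent reference engine (an assumption, not part of the report), collects every match the way Perl's `say $& while ... =~ /.../g` loop does. Both forms should report nothing for the string from the report, and the same match for `probe`, a hypothetical string made up here with four G{3,} blocks.

```python
import re

# The /i form from the false-positive one-liner, and the explicit-class
# form from the workaround; re.IGNORECASE stands in for Perl's /i.
WITH_I = re.compile(r"((G{3,}[ATGC]{1,7}){3,}G{3,})", re.IGNORECASE)
EXPLICIT = re.compile(r"(([gG]{3,}[aAtTgGcC]{1,7}){3,}[gG]{3,})")


def all_matches(regex, s):
    """Collect every match, like Perl's `say $& while $s =~ /.../g`."""
    return [m.group(0) for m in regex.finditer(s)]


target = "aaagaggaaaaggggaaaaggggaaaaggggaaa"  # string from the report
probe = "aaagggaGGGtgggcGGGaaa"  # hypothetical: four G{3,} blocks

for s in (target, probe):
    print(all_matches(WITH_I, s), all_matches(EXPLICIT, s))
```

Agreement between the two forms is exactly what the workaround relies on; in the affected Perl the /i form alone produced the false positive.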
From @hvds

I believe the root cause here is that after enabling the super-linear cache, when we short-circuit on a cache hit we fail to do proper cleanup on backtracking. The particular issue is that this chunk: …

This helps to highlight what's going wrong:

% ./miniperl -Dr -we '"ABC.EFG.IJK.." =~ m{(((([A-Z]{2,})([.A-Z]{1,4})(?{ local @t = (@t, "[$4 $5]"); warn sprintf "whilem captures @t\n"})){3,})[A-Z]{2,})}' 2>&1 | grep whilem | grep -C4 'matched 4'

Hugo
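The reduced test case can also be cross-checked outside Perl. Python's `re` has no equivalent of the (?{ ... }) code block, so this sketch assumes it is safe to drop it for the purpose of asking whether the pattern should match at all. What remains needs four disjoint [A-Z]{2,} runs (three loop iterations plus the trailing one); ABC.EFG.IJK.. has only three letter runs, so a correct engine should fail. The variant ABC.EFG.IJK.MN is hypothetical, added here to show the pattern succeeding once a fourth run exists.

```python
import re

# Reduced pattern from the thread with the (?{ ... }) tracing block removed,
# since Python's re cannot run embedded code. Capture groups are kept.
REDUCED = re.compile(r"(((([A-Z]{2,})([.A-Z]{1,4})){3,})[A-Z]{2,})")

print(REDUCED.search("ABC.EFG.IJK.."))  # None: only three letter runs

m = REDUCED.search("ABC.EFG.IJK.MN")  # hypothetical variant with a fourth run
print(m.group(0))  # ABC.EFG.IJK.MN
```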
From @iabyn

On Sun, Dec 11, 2016 at 08:04:28AM -0800, Hugo van der Sanden via RT wrote:

I think that should be sufficient. We only need to undo what's done between case WHILEM: and the if (cache hit), since we're trying to pretend that the WHILEM match never happened.

I'm amazed this was never encountered before.
From @hvds

On Sun, 11 Dec 2016 08:04:28 -0800, hv wrote:

I've now pushed d3c48e8: …

I don't see other side effects that need reversing, but it'd be useful to have a second opinion from DaveM if he has time for a look.

When tracing through what was happening here, it also seemed to me that the way this works is not very effective: even once the cache is turned on, it seems to do a lot of repetitive work before getting as far as recording any failures. So I plan to look at this further to see if there are additional opportunities for improvement.

I don't think this particularly merits a perldelta entry. I also assume it does not qualify for maint (the bug was introduced in 2006 by c476f42), but it should be a safe and easy backport if it does.

Hugo
@hvds - Status changed from 'open' to 'pending release'
From @hvds

On Mon, 12 Dec 2016 07:26:21 -0800, hv wrote:

Ah, I see he already has; we crossed in the post. Thanks, Dave.

Hugo
From @khwilliamson

Thank you for filing this report. You have helped make Perl better.

With the release today of Perl 5.26.0, this and 210 other issues have been resolved.

Perl 5.26.0 may be downloaded via: …

If you find that the problem persists, feel free to reopen this ticket.
@khwilliamson - Status changed from 'pending release' to 'resolved'
Migrated from rt.perl.org#130307 (status was 'resolved')
Searchable as RT130307