Report information
Id: 126327
Status: open
Priority: 0/
Queue: perl5

Owner: Nobody
Requestors: 0perlbugs [at] rexegg.com
Cc:
AdminCc:

Operating System: (no value)
PatchStatus: (no value)
Severity: low
Type: unknown
Perl Version: (no value)
Fixed In: (no value)



Subject: (*SKIP) not triggering correctly
(*SKIP) should get triggered if the engine attempts to backtrack across it.

Perhaps due to internal optimizations, (*SKIP) is not getting triggered in cases where backtracking across (*SKIP) is expected.

=== Case 1 ===
if ('aaaardvark aaardwolf' =~ /a{1,2}(*SKIP)ard\w+/ ) { print "\$&=\"$&\"\n"; }
# After failing to match the "r", an attempt to backtrack into the {1,2} should trigger (*SKIP)
# expected: aaardwolf
# matched: aaardvark
# note: PCRE matches the expected aaardwolf, as does Python's alternate "regex" package

=== Case 2 ===
if ('aaaardvark aaardwolf' =~ /aa(*SKIP)ard\w+/ ) { print "\$&=\"$&\"\n"; }
# matched: aaardvark
# This is more open to interpretation. Even though there is nothing to backtrack to the left of (*SKIP), a naive path exploration would cause the engine to backtrack to the beginning of the string, triggering (*SKIP). This is the interpretation chosen by PCRE (even though internally it does not backtrack) as well as Python's alternate "regex" package.
RT-Send-CC: perl5-porters [...] perl.org
The Case 2 behavior is also inconsistent with the ever so popular (*SKIP)(*FAIL) construct, where the engine fires the (*SKIP) even when there is nothing to backtrack to the left of it.

For instance,
if ('tatatiti' =~ /tata(*SKIP)(*FAIL)|.{4}/ ) { print "\$&=\"$&\"\n"; }
# $&="titi"
# this shows that (*SKIP) fired
RT-Send-CC: perl5-porters [...] perl.org
Case 2 is also inconsistent with

if ('123ABC' =~ /123(*SKIP)B|.{3}/ ) { print "\$&='$&'\n"; }

where (*SKIP) fires (correctly IMO) even though there is nothing to backtrack to the left of it, eventually matching ABC
Date: Mon, 12 Oct 2015 17:43:20 +0200
Subject: Re: [perl #126327] (*SKIP) not triggering correctly
CC: "bugs-bitbucket [...] rt.perl.org" <bugs-bitbucket [...] rt.perl.org>
From: demerphq <demerphq [...] gmail.com>
To: Perl5 Porteros <perl5-porters [...] perl.org>
On 12 October 2015 at 02:40, Rex <perlbug-followup@perl.org> wrote:
> # New Ticket Created by Rex
> # Please include the string: [perl #126327]
> # in the subject line of all future correspondence about this issue.
> # <URL: https://rt.perl.org/Ticket/Display.html?id=126327 >
>
>
> (*SKIP) should get triggered if the engine attempts to backtrack across it.
>
> Perhaps due to internal optimizations, (*SKIP) is not getting triggered in cases where backtracking across (*SKIP) is expected.

Yes, other optimizations kick in which mean that in some cases it does not even try the pattern.

>
> === Case 1 ===
> if ('aaaardvark aaardwolf' =~ /a{1,2}(*SKIP)ard\w+/ ) { print "\$&=\"$&\"\n"; }
> # After failing to match the "r", an attempt to backtrack into the {1,2} should trigger (*SKIP)
> # expected: aaardwolf
> # matched: aaardvark
> # note: PCRE matches the expected aaardwolf, as does Python's alternate "regex" package

The mandatory minimal string in the pattern is aard. If we do not see an aard in the string then we do not even try the regex engine.

> === Case 2 ===
> if ('aaaardvark aaardwolf' =~ /aa(*SKIP)ard\w+/ ) { print "\$&=\"$&\"\n"; }
> # matched: aaardvark
> # This is more open to interpretation. Even though there is nothing to backtrack to the left of (*SKIP), a naive path exploration would cause the engine to backtrack to the beginning of the string, triggering (*SKIP). This is the interpretation chosen by PCRE (even though internally it does not backtrack) as well as Python's alternate "regex" package.

Again, this is an interaction with the minimal substring optimization. We jump directly to the 2nd char, which is the first place the mandatory substring "aaard" is found.

When I originally implemented these directives I decided that they would NOT disable general optimizations. In a few cases I had been frustrated by (??{}) and (?{}) doing so, and decided not to repeat the same for the backtracking verbs.

I think probably the best way to fix this is to have a modifier flag which disables start position optimisations, so people can opt in if they wish. Alternatively, maybe I just didn't make the right decision about which verbs should disable optimisations.

I was going to say that you can stick (??{ "" }) in your pattern to disable the required string optimisation, but either I misremember that that used to work, or something has changed with how that works.

I will try to follow up on this stuff when I get time.

Yves

--
perl -Mre=debug -e "/just|another|perl|hacker/"
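
As a rough illustration of the start-position effect described above for Case 2, here is a hand-rolled model of /aa(*SKIP)ard\w+/. It is only a sketch and not how the regex engine is implemented: the helper name match_with_skip and its $start parameter are made up for this example, and the "optimized" start is simply assumed to be the offset where index() first finds the substring "aaard".

```perl
use strict;
use warnings;

# Hypothetical helper: models /aa(*SKIP)ard\w+/ by hand.
# $start is the offset of the first match attempt (an assumption of this sketch).
sub match_with_skip {
    my ($text, $start) = @_;
    my $pos = $start;
    while ($pos < length $text) {
        if (substr($text, $pos, 2) eq 'aa') {
            my $skip = $pos + 2;          # position recorded by (*SKIP)
            pos($text) = $skip;
            if ($text =~ /\Gard\w+/gc) {  # rest of the pattern, anchored at $skip
                return substr($text, $pos, pos($text) - $pos);
            }
            $pos = $skip;                 # backtracked across (*SKIP): restart there
        }
        else {
            $pos++;                       # ordinary bump-along
        }
    }
    return;
}

my $text = 'aaaardvark aaardwolf';

# Naive exploration from offset 0: (*SKIP) fires twice, then "aaardwolf" matches.
print "naive:     ", match_with_skip($text, 0), "\n";

# Start-position optimization: begin where the mandatory substring is first found.
print "optimized: ", match_with_skip($text, index($text, 'aaard')), "\n";
```

In this model the naive scan returns aaardwolf, while starting at the offset of "aaard" (the 2nd char) returns aaardvark, which mirrors the difference between the expected and the observed result in Case 2.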
RT-Send-CC: demerphq [...] gmail.com, perl5-porters [...] perl.org
Hi Yves,

Thank you very much for looking into this!

Let me preface this by saying that I find it wonderful that these verbs are even in the language. Thank you for this facility and all your hard work. A few weeks ago (*SKIP) and (*PRUNE) were picked up by Python's alternate `regex` package, so their influence is slowly spreading.

I'm a little sad about the crippling of (*SKIP) by optimizations. Could this be a case where it's less important to save time by studying the pattern than to preserve the intent expressed by the pattern writer?

You mentioned two possible directions: disabling optimizations for (*SKIP), or introducing a verb to do that. If you choose the second direction, may I suggest (*NO_START_OPT)? This would make it compatible with PCRE. This modifier is explained in this section about start-of-pattern modifiers:
http://www.rexegg.com/regex-modifiers.html#pcre

Usually PCRE regex borrows from Perl, but there have been occasions when the reverse has taken place:
http://www.rexegg.com/pcre-documentation.html#perl_pcre

I'm preparing a long page to explain backtracking control verbs in the three engines that support them (Perl, PCRE and to a lesser extent Python via the alternate regex package), and that's how I happened to notice these behaviors.

With many thanks and kindest regards,

Rex

