Report information
Id: 130117
Status: resolved
Priority: 0/
Queue: perl6

Owner: smls75 [at] gmail.com
Requestors: alex.jakimenko [at] gmail.com
Cc:
AdminCc:

Severity: (no value)
Tag: regex
Platform: (no value)
Patch Status: (no value)
VM: (no value)



From: Aleks-Daniel Jakimenko-Aleksejev <alex.jakimenko [...] gmail.com>
Date: Thu, 17 Nov 2016 00:32:46 +0200
To: rakudobug [...] perl.org
Subject: [REGEX] :r does not prevent backtracking (say ‘abcz’ ~~ /:r [‘a’ || ‘abc’ ] ‘z’ /)
Code:
say ‘abcz’ ~~ /:r [‘a’ || ‘abc’ ] ‘z’ /

Result:
「abcz」

:r (:ratchet) should prevent backtracking (trying different ways to match a string). However, it seems like it does not work as intended (or even at all?).

To make the example above as clear as possible: || does not do LTM (longest token matching), so it tries its alternatives from left to right. ‘a’ matches just fine, so matching proceeds. Then ‘z’ does not match, at which point the regex should give up because of :r. However, it goes back and tries ‘abc’, which is exactly what one would expect from backtracking, but backtracking is what we are trying to turn off.
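
The difference between the expected and the observed behaviour can be sketched with a toy matcher. This is only a Python model of the semantics described above, not Rakudo's or NQP's implementation; the function name `match_seq` and its `ratchet` flag are made up for illustration, and the model only tries a match anchored at a given position.

```python
def match_seq(text, pos, atoms, ratchet):
    """Toy model: return the end position if `atoms` match `text` at `pos`, else None.

    Each atom is a list of literal alternatives tried left to right,
    mimicking a sequential alternation such as [ 'a' || 'abc' ].
    `ratchet` is a hypothetical flag standing in for :r.
    """
    if not atoms:
        return pos
    first, rest = atoms[0], atoms[1:]
    for alt in first:
        if text.startswith(alt, pos):
            end = match_seq(text, pos + len(alt), rest, ratchet)
            if end is not None:
                return end
            if ratchet:
                # Committed to the first alternative that matched:
                # a later failure must not reopen this choice.
                return None
            # Without ratchet, fall through and try the next alternative.
    return None


pattern = [["a", "abc"], ["z"]]

# Backtracking allowed: 'a' is tried, 'z' fails, then 'abc' is tried.
print(match_seq("abcz", 0, pattern, ratchet=False))  # 4

# Ratchet: 'a' matched, 'z' fails, and the match gives up.
print(match_seq("abcz", 0, pattern, ratchet=True))   # None
```

In this model the first call corresponds to what Rakudo currently does for the example, and the second to what :r is expected to do.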

There are several ways to show that backtracking actually happens when using || with :r (if the example above is not enough):

Code:
grammar G { token TOP { [ ‘a’ {say $/.CURSOR.pos} || {say $/.CURSOR.pos} ‘abc’ ] {say ‘here’} ‘z’ } }; say G.parse(‘abcz’)

Result:
1
here
0
here
「abcz」

Code:
say ‘abcdefghij’ ~~ /:r [.||.||.||.||.||.||.||.] [.||.||.||.||.||.||.||.] [.||.||.||.||.||.||.||.] [.||.||.||.||.||.||.||.] [.||.||.||.||.||.||.||.] [.||.||.||.||.||.||.||.] [.||.||.||.||.||.||.||.] [.||.||.||.||.||.||.||.] [.||.||.||.||.||.||.||.] [.||.||.||.||.||.||.||.] [.||.||.||.||.||.||.||.] <!> /

Result:
(Takes a really long time to finish because it tries every path, even though :r is in effect, just as it would be implicitly in a “token”)


According to committable, this behavior has been there since the beginning ( https://gist.github.com/Whateverable/814019538d89ec53fca6c09269136ebd ). Therefore, I am not sure how much code will break because of us fixing this bug. However, I am hoping to see some performance improvement after we fix this.
Relevant: https://rt.perl.org/Ticket/Display.html?id=123934#txn-1401917

In short, `||` alternations don't respect `:` in Rakudo, whereas `|` alternations (and other atoms such as quantifiers) do respect it.

Simpler test-case:

➜ say "ab" ~~ / [ "ab" | "a" ]: "b" /;
Nil
➜ say "ab" ~~ / [ "ab" || "a" ]: "b" /;
「ab」

(Remember that `:ratchet` simply adds `:` to every atom.)

I sent a pull request which fixes this bug:

https://github.com/perl6/nqp/pull/368

Please review.
On Mon, 28 Aug 2017 11:50:51 -0700, smls75@gmail.com wrote:
> I sent a pull request which fixes this bug:
>
> https://github.com/perl6/nqp/pull/368
>
> Please review.

The PR was merged (and Rakudo's nqp version bumped). Marking the ticket TESTNEEDED. (Note that some possible tests are listed in the PR description!)

