Mixed longest (ltm) and lexical alternation makes wrong match - "42" ~~ / [ "0" || "42" ] | "4" / #6003
Comments
From @ronaldxs
So `"42" ~~ / [ "0" || "42" ] | "4" /` matches 4, but if you stick to just one kind of alternation, as in `"42" ~~ / [ "0" | "42" ] | "4" /` or `"42" ~~ / [ "0" || "42" ] || "4" /`, the match correctly comes back with 42. It looks like in the mixed case
A test might look like: is ~("42" ~~ / [ "0" || "42" ] | "4" /), "42", "mixed longest with |
From @jnthn
On Sun, 15 Jan 2017 05:06:21 -0800, ronaldxs wrote:
This is not a bug. `||` is by design not declarative, and so only the first branch of it is significant for LTM purposes. This behavior can be confirmed by re-ordering the imperative alternation:
Therefore, the longest declarative match is 4, and so that is the branch that LTM selects. Since there's no anchoring, there's furthermore no reason for it to backtrack into the second branch of the `|` to try the other `||` branch.
I'm pretty sure this behavior is relied upon in the Perl 6 grammar itself; off the top of my head, it shows up in a bunch of places for the sake of error handling.
So, working as designed.
Thanks,
/jnthn
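The four cases discussed so far can be put side by side in a short Raku sketch. The variable names are illustrative only and do not come from the ticket. The first three results are the ones reported above. The fourth is what the declarative-prefix explanation predicts for the re-ordered alternation, and is an assumption rather than something taken from the original message.

```raku
# Mixed: only "0" (the first branch of ||) is visible to LTM inside the group,
# so the longest declarative candidate at position 0 is "4".
my $mixed     = ~("42" ~~ / [ "0" || "42" ] | "4" /);

# All LTM: "42" is a declarative candidate and is the longest one.
my $all-ltm   = ~("42" ~~ / [ "0" | "42" ] | "4" /);

# All sequential: branches are tried left to right; "0" fails, "42" matches.
my $all-seq   = ~("42" ~~ / [ "0" || "42" ] || "4" /);

# Re-ordered ||: now "42" is the first branch of ||, so LTM can see it.
my $reordered = ~("42" ~~ / [ "42" || "0" ] | "4" /);

say "mixed:      $mixed";      # reported in this ticket as 4
say "all |:      $all-ltm";    # reported in this ticket as 42
say "all ||:     $all-seq";    # reported in this ticket as 42
say "re-ordered: $reordered";  # predicted by the explanation above to be 42
```

Only the mixed form with `"0"` listed first returns 4, which is consistent with the first branch of `||` being the only one that takes part in longest-token matching.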
The RT System itself - Status changed from 'new' to 'open'
@jnthn - Status changed from 'open' to 'rejected'
From 1parrota@gmail.com
Any time there's a bug report based on a serious misunderstanding of
On 1/15/17, jnthn@jnthn.net via RT <perl6-bugs-followup@perl.org> wrote:
From @AlexDaniel
Correct. See Raku/doc#1141
On 2017-01-16 08:38:26, 1parrota@gmail.com wrote:
From @ronaldxs
I contacted jnthn privately today and he was very helpful. The behavior seems to be specced in S05, in the last paragraph of section https://design.perl6.org/S05.html#Longest-token_matching , starting "The || form has the old short-circuit semantics,". The paragraph goes on to say "The first || in a regex makes the token patterns on its left available to the outer longest-token matcher, but hides any subsequent tests from longest-token matching."

I also came across another relevant ticket, RT #125608 https://rt.perl.org/Ticket/Display.html?id=125608 . There are tests for that RT in roast: https://github.com/perl6/roast/blob/master/S05-metasyntax/longest-alternative.t#L449 . I don't see a test that exactly matches this ticket and am considering adding one together with the longest-alternative.t RT125608 tests.

jnthn and I discussed documenting the concern in traps. I mentioned that the split into `|` (LTM) and `||` (sequential) regex alternatives was new to Perl 6 and was not in Perl 5 regexes. jnthn replied: 'So maybe the trap is "using | in regexes without understanding its Perl 6 semantics"'. jnthn also said he thought the concept of "declarative prefixes" might be important and, when asked a bit more, agreed that the "term may have originated with Perl 6".

The docs do have some help for alternatives in the context of 5 to 6 migration here: https://docs.perl6.org/language/5to6-nutshell#Longest_token_matching_(LTM)_displaces_alternation . I read through the section and think it could use more work, including (at the least) a pointer to https://docs.perl6.org/language/regexes#Alternation:_|| .

I first hit the problem when working on a grammar that used '|' inheriting another grammar that used exclusively '||'. With people
From @coke
If you'd like to turn this into a doc-u-bug, that's fine; please open https://github.com/perl6/doc/issues
On Mon, Jan 16, 2017 at 11:37 AM, Parrot Raiser <1parrota@gmail.com> wrote:
From @ronaldxs
On 2017-01-17 21:35, Will Coleda wrote:
Already done and described on earlier timestamp on ticket "Mon, 16 Jan
Thanks
Migrated from rt.perl.org#130562 (status was 'rejected')
Searchable as RT130562$