Grammar.parse DWIMs only if your TOP rule has $ on the end #5796
Comments
From @AlexDaniel:

To demonstrate how it works, let's say we have this grammar: grammar G { regex TOP { ‘a’ || ‘abc’ } };

So you might think that it will match either “a” or “abc”, but no, “abc” …

*Code:* … *Result:* …

Why? Well, see this: …

Basically, TOP will first match ‘a’, and given that it is a good enough …

I am not sure how this behavior could be useful or even how anyone could …

But yes, we can close our eyes on this issue and add yet another trap to …

Another closely related issue is the .subparse method. As it seems, subparse is …
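A minimal Raku sketch of the grammar from the report, assuming a Rakudo release that already includes commit 4ccb2f3 (mentioned later in this thread). It shows that `||` tries ‘a’ before ‘abc’, and that `.subparse`, which does not have to consume the whole input, stops at the first successful alternative. Only the `.parse('abc')` result depends on the Rakudo version.

```raku
# The grammar from the report: TOP is a backtracking `regex` using
# sequential alternation (`||`), so 'a' is always tried before 'abc'.
grammar G {
    regex TOP { 'a' || 'abc' }
}

# The first alternative matches and consumes the whole input.
say G.parse('a');       # ｢a｣

# The first alternative matches only 'a', which does not reach the end
# of 'abc'. Before Rakudo commit 4ccb2f3 this returned Nil; with that
# commit, .parse asks TOP to backtrack, and the 'abc' branch matches.
say G.parse('abc');

# .subparse does not require the match to reach the end of the input,
# so the first successful alternative is accepted as-is.
say G.subparse('abc');  # ｢a｣
```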
From @zoffixznet:

Additional comments from TimToady: https://irclog.perlgeek.de/perl6/2016-11-13#i_13560371

On Sat, 12 Nov 2016 18:36:15 -0800, alex.jakimenko@gmail.com wrote:
From @zoffixznet:

My personal two cents, from someone who knows very little about the workings of grammars and regexes: this is quite a bit of a WAT. `regex` is meant to backtrack, and it does, for example here:

<ZoffixW> m: say grammar { regex TOP { <foo> 'foo' }; regex foo { [ ‘a’ || ‘abc’ ] } }.parse: 'abcfoo'

However, if the `regex` is the top-level rule, then we add an extra requirement: to enable backtracking, the user also needs an anchor to the end of the string. It's a special exception to the rule and would need to be documented in Traps, but why do we have it?

On Sat, 12 Nov 2016 18:36:15 -0800, alex.jakimenko@gmail.com wrote:
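To make the nested example concrete, here is a sketch that runs the same grammar twice: once with `foo` declared as a `regex` (as in the IRC one-liner) and once as a `token`. The grammar names `WithRegex` and `WithToken` are illustrative, and the `token` variant is an added contrast rather than part of the original discussion. It relies on `token` being a `regex` with `:ratchet`, which never backtracks into what it matched.

```raku
# `foo` is a regex, so it keeps its backtracking state: when TOP's
# literal 'foo' fails after <foo> matched 'a', the engine asks <foo>
# to try again, and it falls through to 'abc'.
grammar WithRegex {
    regex TOP { <foo> 'foo' }
    regex foo { [ 'a' || 'abc' ] }
}
say WithRegex.parse('abcfoo');  # matches 'abcfoo'; <foo> captured 'abc'

# `foo` is a token (ratcheting), so it commits to 'a'. TOP's 'foo'
# then fails against 'bcfoo', and there is nothing left to retry.
grammar WithToken {
    regex TOP { <foo> 'foo' }
    token foo { [ 'a' || 'abc' ] }
}
say WithToken.parse('abcfoo');  # Nil
```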
The RT System itself - Status changed from 'new' to 'open'
From @AlexDaniel:

After a really long discussion today, here is one finding: the current behavior appeared in 2014.03 after this commit: rakudo/rakudo@4d8734d (from https://irclog.perlgeek.de/perl6/2016-11-15#i_13573639)

On 2016-11-12 18:36:15, alex.jakimenko@gmail.com wrote:
From @jnthn:

On Sat, 12 Nov 2016 18:36:15 -0800, alex.jakimenko@gmail.com wrote:
This is how regexes work. When they produce a successful match they return the Cursor indicating the match that they produced. Backtracking into a regex works by asking that Cursor to try again (if it's got anything else to try).
It's more that the parse method should ask the Cursor it gets back to try again. I've implemented that in Rakudo 4ccb2f3, and added a test in S05-grammar/parse_and_parsefile.t, so now it does what you'd expect.

I'm sympathetic to the arguments that building a grammar out of regexes is, in general, not wise. However, that doesn't really justify the parse method being uncooperative towards folks who choose to do so, especially given it's an easy fix and adds no cost to grammars that don't need it. And in reality, if we don't change parse the way I have, folks won't go and rethink their grammar; they'll just add the $. The memory and performance hit when they get to parsing something big with a grammar full of regexes will be a much better incentive for them to learn how to write grammars better. :-)

/jnthn
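The two ways around the problem discussed in this thread can be sketched as follows. `Anchored` is the `$` workaround from the issue title, which works whether or not `.parse` backtracks on its own. `Longest` is an added illustration, not taken from the thread, of a ratcheting grammar that avoids backtracking altogether by using longest-token alternation (`|`) instead of sequential alternation (`||`). Both grammar names are hypothetical.

```raku
# The `$` workaround: TOP itself fails unless it reaches the end of the
# string, so the regex backtracks from 'a' to 'abc' internally.
grammar Anchored {
    regex TOP { [ 'a' || 'abc' ] $ }
}
say Anchored.parse('abc');  # ｢abc｣

# A ratcheting alternative: `|` prefers the longest matching literal,
# so no backtracking into TOP is needed.
grammar Longest {
    token TOP { 'a' | 'abc' }
}
say Longest.parse('abc');   # ｢abc｣
say Longest.parse('a');     # ｢a｣
```

The brackets in `Anchored` matter: `||` binds more loosely than a sequence of atoms, so `'a' || 'abc' $` would apply the anchor only to the `'abc'` branch and leave the `'a'` branch unconstrained.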
@jnthn - Status changed from 'open' to 'resolved'
Migrated from rt.perl.org#130081 (status was 'resolved')
Searchable as RT130081$