LTM favors some character class atoms over others, instead of using text order as a tie breaker #6024
From @ronaldxs

https://design.perl6.org/S05.html#Longest-token_matching says: […]

    use Test;
    grammar text-order {
    […]
    is text-order.parse('1', :rule<alt-na_1>).keys[0], 'n',
    […]

Result:

    ok 1 - Match first textual rule OK

Posted on IRC: https://irclog.perlgeek.de/perl6/2017-01-20#i_13960886

The IRC discussion notes that the tests should be expected to pass and […]

See also roast issue: Raku/roast#224
From @smls

This bug is still present in Rakudo version 2017.08-104-g76f1d8970.

However, there's also a design question to be answered here:

On Sat, 21 Jan 2017 10:32:39 -0800, ronaldxs wrote:
[…]

This S05 paragraph was clearly written with LTM of `proto` regexes in mind. What does it mean for LTM of `|` alternations? I.e., should the test case above match 'n'...

a) because `token n { ... }` is declared above `token na_2 { ... }`?
b) […]

The test case at hand does not distinguish between those two interpretations (Rakudo is currently wrong either way), but it's an important distinction that needs to be clarified if the bug is to be fixed.

http://design.perl6.org/S05.html#Overview has a sentence on this tie-breaker which actually mentions alternations:
[…]

I'd interpret that as option (b), but am not 100% sure.
The RT System itself - Status changed from 'new' to 'open'
From @smls

Actually... Rakudo *does* generally follow interpretation (b):

    ➜ 'x' ~~ / .* { say '*' } | .? { say '?' } /;  # *

The observed bug is specifically with character classes:
[…]

Following some more experimentation, here are various atoms for matching the digit '1', sorted into three tiers based on how strongly LTM favors them in current Rakudo:

    tier 1: '1'
    […]

That LTM prefers the literal `1` over everything else is to be expected (the "longest literal prefix" tie-breaker). However, it seems strange that the character classes are split into two tiers, with the syntactic ones preferred over the named and uniprop ones.

http://design.perl6.org/S05.html#Overview says that LTM is transitive through subrules, so even if the named character classes are treated as subrule calls, they shouldn't be disfavored, right?
From @smls

Confirmed on IRC that this is a bug. Relevant log¹ (edited for clarity):

    <smls> Is it intended that LTM prefers \d and <[0..9]>
    <TimToady> no, that wasn't intended
    <jnthn> m: say '1' ~~ / \d | <digit> { say 'digit' }/
    <jnthn> Interesting
    <jnthn> m: say '1' ~~ / \d | <:N> { say 'digit' }/
    <jnthn> That one I don't understand
Migrated from rt.perl.org#130612 (status was 'open')
Searchable as RT130612