Wrong order with %% $<delim>=.* #4317

Closed
p6rt opened this issue Jun 11, 2015 · 6 comments

Comments

@p6rt

p6rt commented Jun 11, 2015

Migrated from rt.perl.org#125391 (status was 'resolved')

Searchable as RT125391$

@p6rt
Author

p6rt commented Jun 11, 2015

From @AlexDaniel

Code:
say grammar Gram { regex TOP { ('XX')+ %% $<delim>=<[a..z]>* };
}.parse('XXXXXX');

Output:
「XXXXXX」
0 => 「XX」
0 => 「XX」
delim => 「」
0 => 「XX」
delim => 「」
delim => 「」

The order is wrong. I expected this:
「XXXXXX」
0 => 「XX」
delim => 「」
0 => 「XX」
delim => 「」
0 => 「XX」
delim => 「」

This problem is more obvious if you add actual delimiters (in which case it
works correctly):
say grammar Gram { regex TOP { ('XX')+ %% $<delim>=<[a..z]>* };
}.parse('XXaXXbXXc');

Output:
「XXaXXbXXc」
0 => 「XX」
delim => 「a」
0 => 「XX」
delim => 「b」
0 => 「XX」
delim => 「c」

This output is correct.

@p6rt
Author

p6rt commented Jun 12, 2015

From @pmichaud

On Thu, Jun 11, 2015 at 02:53:15PM -0700, Alex Jakimenko wrote:

> say grammar Gram { regex TOP { ('XX')+ %% $<delim>=<[a..z]>* };
> }.parse('XXXXXX');
>
> Output:
> 「XXXXXX」
> 0 => 「XX」
> 0 => 「XX」
> delim => 「」
> 0 => 「XX」
> delim => 「」
> delim => 「」
>
> The order is wrong.

I'm not entirely certain this is a bug, although it might be less than optimal. By using 'say' we're seeing the .gist of the resulting Match object, which in turn uses .caps. The .caps method sorts the captured parts based on the .from of each submatch.

In this case, $<delim>[0] and $0[1] both have a .from of 2. Since their .from's result in a tie, the one that happens to be earlier in the capture hash ('0' in this case) ends up appearing first (since sorting is stable).

I guess we can update .caps to sort based on both the .from and .to values of each submatch, although doing so will likely make .caps a fair bit slower.

Pm
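
To make the tie concrete, here is a minimal sketch that models the six submatches of 'XXXXXX' as plain hashes and sorts them the two ways discussed above. The `@subs` list, its field names, and its ordering are assumptions for illustration only; this is not Rakudo's actual .caps implementation.

```raku
# Hypothetical model of the six submatches of 'XXXXXX' (not Rakudo internals).
# Input order mimics the capture hash: every '0' capture, then every 'delim'.
my @subs =
    %( name => '0',     from => 0, to => 2 ),
    %( name => '0',     from => 2, to => 4 ),
    %( name => '0',     from => 4, to => 6 ),
    %( name => 'delim', from => 2, to => 2 ),
    %( name => 'delim', from => 4, to => 4 ),
    %( name => 'delim', from => 6, to => 6 );

# Stable sort keyed on from alone: tied entries keep their input order,
# so the '0' starting at 2 stays ahead of the empty 'delim' at 2.
my @by-from = @subs.sort(*<from>).map(*<name>);
say @by-from.join(' ');   # 0 0 delim 0 delim delim

# Sort on from first, then to: the empty 'delim' (2..2) now precedes '0' (2..4).
my @by-both = @subs.sort({
    $^a<from> <=> $^b<from> || $^a<to> <=> $^b<to>
}).map(*<name>);
say @by-both.join(' ');   # 0 delim 0 delim 0 delim
```

In this model the first line reproduces the order from the report, and the two-key comparison yields the order the reporter expected.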

@p6rt
Author

p6rt commented Jun 12, 2015

The RT System itself - Status changed from 'new' to 'open'

@p6rt
Author

p6rt commented Jun 13, 2015

From @TimToady

Since the sort is stable, I believe it suffices simply to sort on the .to instead of the .from.

Larry
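
A small sketch of why sorting on .to alone may be enough here, using a hand-built model of the submatches rather than real Match objects. The `@subs` data and field names are assumptions made up for illustration, listed in the same capture-hash order described earlier in the thread.

```raku
# Hypothetical model of the submatches of 'XXXXXX' (illustrative only).
# Input order: all '0' captures first, then all 'delim' captures.
my @subs =
    %( name => '0',     from => 0, to => 2 ),
    %( name => '0',     from => 2, to => 4 ),
    %( name => '0',     from => 4, to => 6 ),
    %( name => 'delim', from => 2, to => 2 ),
    %( name => 'delim', from => 4, to => 4 ),
    %( name => 'delim', from => 6, to => 6 );

# Stable sort keyed on to: each 'XX' ends where the following empty
# delimiter ends, and the tie keeps '0' ahead of 'delim'.
my @by-to = @subs.sort(*<to>).map(*<name>);
say @by-to.join(' ');   # 0 delim 0 delim 0 delim
```

Each empty delimiter shares its .to with the 'XX' before it, so the stable tie-break falls the right way in this case.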

@p6rt
Author

p6rt commented Jul 15, 2015

From @jnthn

On Fri Jun 12 17:19:02 2015, larry wrote:

> Since the sort is stable, I believe it suffices simply to sort on the
> .to instead of the .from.

Seems so. Added a test in S05-capture/caps.t.
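
For anyone who wants to check the ordering without reading the .gist output, something along these lines should work. This is only a sketch and not the test that was added to S05-capture/caps.t; it uses a bare regex instead of the grammar from the report, and the variable names are made up.

```raku
# Match the string from the report and list the capture names in .caps order.
my $m = 'XXXXXX' ~~ / ('XX')+ %% $<delim>=<[a..z]>* /;

# .caps returns a list of pairs (capture name => submatch).
my @keys = $m.caps.map(*.key);
say @keys.join(' ');
```

Per the discussion above, the expected result after the fix is that each 0 is followed by its delim.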

@p6rt
Author

p6rt commented Jul 15, 2015

@jnthn - Status changed from 'open' to 'resolved'

@p6rt p6rt closed this as completed Jul 15, 2015