backreferences to quantified captures are inconsistent in rakudo #2404

Closed
p6rt opened this issue Apr 12, 2011 · 6 comments

Comments

@p6rt

p6rt commented Apr 12, 2011

Migrated from rt.perl.org#88340 (status was 'resolved')

Searchable as RT88340$

@p6rt

p6rt commented Apr 12, 2011

From @moritz

10:49 < moritz> rakudo: say 'aaaa' ~~ /(\w)+$0/; say $0
10:49 <+p6eval> rakudo 4bf132: OUTPUT«aaaa␤a a a␤»
10:50 < moritz> huh. Is that correct?
10:50 < masak> don't think so.
10:51 * moritz facepalms
10:51 < moritz> $0 is an array
10:51 < moritz> what happens when you interpolate an array into a regex?
10:51 < moritz> you get an alternation, no?
10:52 * moritz presents the case to the Regex High Court of TimToady, pmichaud and sorear
10:52 < moritz> is that another "doctor, it hurts when I do this"? :-)
10:52 < masak> no.
10:52 < masak> this is bad.
10:53 < masak> you're clearly not intending $0 as an array, but as a submatch string.
10:54 < moritz> rakudo: say 'abcd' ~~ /(.)+$0/; say $0
10:54 <+p6eval> rakudo 4bf132: OUTPUT«␤␤»
10:54 < moritz> rakudo: say 'abca' ~~ /(.)+$0/; say $0
10:54 <+p6eval> rakudo 4bf132: OUTPUT«␤␤»
10:54 < moritz> rakudo: say 'abcc' ~~ /(.)+$0/; say $0
10:54 <+p6eval> rakudo 4bf132: OUTPUT«abcc␤a b c␤»
10:55 < moritz> ok that's not what it does
10:55 < moritz> it just takes the last item

I think it is inconsistent that $0 in the regex only matches $0[*-1],
but is reported as the full array outside the regex.

I'm not sure what the desired behavior is.

@p6rt

p6rt commented Nov 12, 2015

From @TimToady

Matching only the final match is not desirable, in my opinion. It should match the string traversed by all the matches in $0, including intervening separators, if any. The problem is that ~$0 has spaces interpolated, which will match only if the separators happen to be a single space, so there is now a $0.backref that returns the string actually traversed by the entire list of matches. Matching $0 as a backref should use that method.

  > p6 'my $abc = " a,b,c, "; $abc ~~ /(\w+ % ",")/; say $0.backref'
  a,b,c

BTW, matching just the final match can be accomplished with something like $($0[*-1]), I suspect.
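The difference between the space-joined stringification and the "string actually traversed" can be modelled outside the regex engine. The following is only a rough Python sketch, not Rakudo code: the function names are hypothetical, and the capture positions are written by hand on the assumption that a quantified capture with comma separators matched "a", "b" and "c" in " a,b,c, ".

```python
def stringified(text, spans):
    """Model of ~$0 on a list of captures: the captured pieces joined by single spaces."""
    return " ".join(text[start:end] for start, end in spans)


def traversed(text, spans):
    """Model of the proposed backref: the original text from the start of the
    first capture to the end of the last one, separators included."""
    if not spans:
        return ""
    return text[spans[0][0]:spans[-1][1]]


text = " a,b,c, "
# Hand-written (from, to) positions for the captures "a", "b", "c" (an assumption).
spans = [(1, 2), (3, 4), (5, 6)]

print(stringified(text, spans))  # a b c
print(traversed(text, spans))    # a,b,c
```

The space-joined form only equals the traversed text when the separators happen to be single spaces, which is the mismatch described above.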

@p6rt

p6rt commented Nov 12, 2015

From @jnthn

The new behavior is that we take the latest contiguous sequence of captures, and the backref is the string ranging from the .from of the first capture in that sequence to the .to of the last capture in that sequence. So now:

$ perl6-m -e "say 'aaaa' ~~ /(\w)+$0/; say $0"
「aaaa」
0 => 「a」
0 => 「a」
[「a」 「a」]

Note the contiguous constraint is needed to fix this case while also not breaking cases like:

"bookkeeper" ~~ m/(((\w)$0)+)/

Which should match "ookkee".
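As an illustration of the rule, here is a small Python model. It is a sketch under assumptions, not the Rakudo implementation: the helper names are hypothetical, captures are represented as hand-written (from, to) pairs, and "contiguous" is taken to mean that each capture starts exactly where the previous one ended.

```python
def latest_contiguous_run(spans):
    """Return the trailing run of captures in which each one starts exactly
    where the previous one ended (assumed meaning of 'contiguous')."""
    run = []
    for start, end in reversed(spans):
        if run and end != run[0][0]:
            break  # gap before the current run: stop extending it
        run.insert(0, (start, end))
    return run


def backref_text(text, spans):
    """Text from the .from of the first capture in the run to the .to of the last."""
    run = latest_contiguous_run(spans)
    if not run:
        return ""
    return text[run[0][0]:run[-1][1]]


# 'aaaa' with two adjacent captures of "a": the backref is the whole run.
print(backref_text("aaaa", [(0, 1), (1, 2)]))        # aa

# "bookkeeper": the captures "o" and "k" are separated by the text the earlier
# backref consumed, so only the latest capture counts.
print(backref_text("bookkeeper", [(1, 2), (3, 4)]))  # k
```

In the first case the two captures followed by the backref "aa" cover all of 'aaaa'. In the second, each iteration's backref sees only its own letter, which is what lets the doubled letters in "ookkee" match.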

Tests in S05-capture/dot.t.

/jnthn

@p6rt

p6rt commented Nov 12, 2015

The RT System itself - Status changed from 'new' to 'open'

@p6rt p6rt closed this as completed Nov 12, 2015
@p6rt

p6rt commented Nov 12, 2015

@jnthn - Status changed from 'open' to 'resolved'
