Report information
Id: 88340
Status: resolved
Priority: 0/
Queue: perl6

Owner: Nobody
Requestors: moritz <moritz.lenz+perl [at] gmail.com>
Cc:
AdminCc:

Severity: (no value)
Tag: (no value)
Platform: (no value)
Patch Status: (no value)
VM: (no value)



Subject: backreferences to quantified captures are inconsistent in rakudo
Date: Tue, 12 Apr 2011 11:01:18 +0200
To: rakudobug [...] perl.org
From: Moritz Lenz <moritz [...] faui2k3.org>
10:49 < moritz> rakudo: say 'aaaa' ~~ /(\w)+$0/; say $0
10:49 <+p6eval> rakudo 4bf132: OUTPUT«aaaa␤a a a␤»
10:50 < moritz> huh. Is that correct?
10:50 < masak> don't think so.
10:51 * moritz facepalms
10:51 < moritz> $0 is an array
10:51 < moritz> what happens when you interpolate an array into a regex?
10:51 < moritz> you get an alternation, no?
10:52 * moritz presents the case to the Regex High Court of TimToady, pmichaud and sorear
10:52 < moritz> is that another "doctor, it hurts when I do this"? :-)
10:52 < masak> no.
10:52 < masak> this is bad.
10:53 < masak> you're clearly not intending $0 as an array, but as a submatch string.
10:54 < moritz> rakudo: say 'abcd' ~~ /(.)+$0/; say $0
10:54 <+p6eval> rakudo 4bf132: OUTPUT«␤␤»
10:54 < moritz> rakudo: say 'abca' ~~ /(.)+$0/; say $0
10:54 <+p6eval> rakudo 4bf132: OUTPUT«␤␤»
10:54 < moritz> rakudo: say 'abcc' ~~ /(.)+$0/; say $0
10:54 <+p6eval> rakudo 4bf132: OUTPUT«abcc␤a b c␤»
10:55 < moritz> ok that's not what it does
10:55 < moritz> it just takes the last item

I think it is inconsistent that $0 in the regex only matches $0[*-1],
but is reported as the full array outside the regex.

I'm not sure what the desired behavior is.
RT-Send-CC: perl6-compiler [...] perl.org
Matching only the final match is not desirable, in my opinion. It should match the string traversed by all the matches in $0, including intervening separators, if any. The problem is that ~$0 has spaces interpolated, which will match only if the separators happen to be a single space, so there is now a $0.backref that returns the string actually traversed by the entire list of matches. Matching $0 as a backref should use that method.

> p6 'my $abc = " a,b,c, "; $abc ~~ /(\w+ % ",")/; say $0.backref'
a,b,c

BTW, matching just the final match can be accomplished with something like $($0[*-1]), I suspect.
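
To illustrate the difference described above between the space-joined stringification and the string actually traversed, here is a small Python model. It is not Rakudo code: the helper names capture_spans, joined and traversed are hypothetical, and the (start, end) offset pairs merely stand in for the .from and .to of each match.

```python
import re

def capture_spans(text, pattern=r"\w+"):
    """Return (start, end) offsets of each item; a stand-in for a list of captures."""
    return [m.span() for m in re.finditer(pattern, text)]

def joined(text, spans):
    """Model of stringifying the capture list: items joined by single spaces."""
    return " ".join(text[a:b] for a, b in spans)

def traversed(text, spans):
    """Model of the proposed backref string: everything from the first
    capture's start to the last capture's end, separators included."""
    if not spans:
        return ""
    return text[spans[0][0]:spans[-1][1]]

text = " a,b,c, "
spans = capture_spans(text)
print(joined(text, spans))     # a b c
print(traversed(text, spans))  # a,b,c
```

The joined form would only match the original input if the separators happened to be single spaces, which is the problem the reply points out.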
RT-Send-CC: perl6-compiler [...] perl.org
On Tue Apr 12 02:01:28 2011, moritz wrote:
> 10:49 < moritz> rakudo: say 'aaaa' ~~ /(\w)+$0/; say $0
> 10:49 <+p6eval> rakudo 4bf132: OUTPUT«aaaa␤a a a␤»
> 10:50 < moritz> huh. Is that correct?
> 10:50 < masak> don't think so.
> 10:51 * moritz facepalms
> 10:51 < moritz> $0 is an array
> 10:51 < moritz> what happens when you interpolate an array into a regex?
> 10:51 < moritz> you get an alternation, no?
> 10:52 * moritz presents the case to the Regex High Court of TimToady,
> pmichaud and sorear
> 10:52 < moritz> is that another "doctor, it hurts when I do this"? :-)
> 10:52 < masak> no.
> 10:52 < masak> this is bad.
> 10:53 < masak> you're clearly not intending $0 as an array, but as a
> submatch string.
> 10:54 < moritz> rakudo: say 'abcd' ~~ /(.)+$0/; say $0
> 10:54 <+p6eval> rakudo 4bf132: OUTPUT«␤␤»
> 10:54 < moritz> rakudo: say 'abca' ~~ /(.)+$0/; say $0
> 10:54 <+p6eval> rakudo 4bf132: OUTPUT«␤␤»
> 10:54 < moritz> rakudo: say 'abcc' ~~ /(.)+$0/; say $0
> 10:54 <+p6eval> rakudo 4bf132: OUTPUT«abcc␤a b c␤»
> 10:55 < moritz> ok that's not what it does
> 10:55 < moritz> it just takes the last item
>
> I think it is inconsistent that $0 in the regex only matches $0[*-1],
> but is reported as the full array outside the regex.
>
> I'm not sure what the desired behavior is.

The new behavior is that we take the latest contiguous sequence of captures and the backref is the string ranging from the .from of the first capture in that sequence to the .to of the last capture in that sequence. So now:

$ perl6-m -e "say 'aaaa' ~~ /(\w)+$0/; say $0"
「aaaa」
 0 => 「a」
 0 => 「a」
[「a」 「a」]

Note the contiguous constraint is needed to fix this case while also not breaking cases like:

"bookkeeper" ~~ m/(((\w)$0)+)/

Which should match "ookkee". Tests in S05-capture/dot.t.

/jnthn
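
The rule described above can be sketched in Python as a rough model. This is not the Rakudo implementation. It assumes that "contiguous" means each capture begins exactly where the previous one ends, which is one reading of the reply and is not spelled out in the ticket. The function name backref_string and the (start, end) pairs standing in for .from and .to are hypothetical.

```python
def backref_string(text, captures):
    """Model of the backref rule under the stated assumption.

    captures: list of (start, end) offsets in the order they were made.
    Returns the text from the start of the first capture in the latest
    contiguous run to the end of the last capture.
    """
    if not captures:
        return ""
    # Walk backwards from the last capture while each capture starts
    # exactly where the one before it ends (assumed meaning of "contiguous").
    first = len(captures) - 1
    while first > 0 and captures[first - 1][1] == captures[first][0]:
        first -= 1
    return text[captures[first][0]:captures[-1][1]]

# 'aaaa' ~~ /(\w)+$0/ : two adjacent captures, so the backref is "aa".
print(backref_string("aaaa", [(0, 1), (1, 2)]))        # aa

# "bookkeeper": captures "o" at 1..2 and "k" at 3..4 are not adjacent
# (a backref match sits between them), so only the latest one counts.
print(backref_string("bookkeeper", [(1, 2), (3, 4)]))  # k
```

Under this reading, the first case leaves "aa" for the backref to match, and the second keeps each backref tied to the most recent single letter, consistent with the "ookkee" result mentioned above.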

