Inconsistent behavior with backreferences nested inside subpattern references #13619
From nbtrap@nbtrap.com
This bug report corresponds to the discussion at:
Perl (5.16.3 and 5.18.1 at least) doesn't always handle backreferences
    say 'fofof' =~ /(.)((o)\1)(?2)/ ? 'true' : 'false'
    say 'fffffff' =~ /(.)(?2)((\1)(?4)(\1))/ ? 'true' : 'false'
    say 'foffoff' =~ /(.)(?2)((.)(?4)(\1))/ ? 'true' : 'false'
"use re 'debug'" shows that the second and third examples fail at the
A similar (same?) problem occurs with forward or self-referential
    print 'abcb' =~ /^(.\2?)(.)(?1)$/ ? "true\n" : "false\n"
    print 'abcb' =~ /^(.(?{ printf "%d,
    print 'aba' =~ /^(.\1?)(?1)$/ ? "true\n" : "false\n"
    print 'aba' =~ /^(.(?{printf "%d,
In the second example immediately above, the diagnostics say that we're
From nbtrap@nbtrap.com
Nathan Trapuzzano <nbtrap@nbtrap.com> writes:
I tried to come up with a test that used backreferences of capture
    say 'abbcdcabbda' =~ /^ (a|\3(?1)\2|(?2)) ((b|c)(?4)?) (?1) (d(?1)) $/x ? 'true' : 'false'
Pretty sure this should match. I had to use paper and pencil to walk
Here's the short and sweet of it. After matching the 'b' at pos 2, we
Test patch attached.
From nbtrap@nbtrap.com
perl.patch

From 1a55ddecc2867b27fcb0ce64b06263a6f4a4fb7b Mon Sep 17 00:00:00 2001
From: Nathan Trapuzzano <nbtrap@nbtrap.com>
Date: Sat, 22 Feb 2014 11:13:19 -0500
Subject: [PATCH] Add pathological test case for RT #121299.
---
t/re/re_tests | 2 ++
1 file changed, 2 insertions(+)
diff --git a/t/re/re_tests b/t/re/re_tests
index c55e3d3..d066330 100644
--- a/t/re/re_tests
+++ b/t/re/re_tests
@@ -1848,10 +1848,12 @@ A+(*PRUNE)BC(?{}) AAABC y $& AAABC
/^\S+=/d \x{3a3}=\x{3a0} y $& \x{3a3}=
/^\S+=/u \x{3a3}=\x{3a0} y $& \x{3a3}=
+# This group is from RT #121299
/(.)((o)\1)(?2)/ fofof y $2 of
/(.)(?2)((\1)(?4)(\1))/ fffffff y $1 f
/(.)(?2)((.)(?4)(\1))/ foffoff y $2 off
/^(.\2?)(.)(?1)$/ abcb y $2 b
/^(.\1?)(?1)$/ aba y $1 a
+/^(a|\3(?1)\2|(?2))((b|c)(?4)?)(?1)(d(?1))$/ abbcdcabbda y $1 a
# Keep these lines at the end of the file
# vim: softtabstop=0 noexpandtab
--
1.9.0
From nbtrap@nbtrap.com
Nathan Trapuzzano <nbtrap@nbtrap.com> writes:
Let me give a simpler example of what I mean:
    say 'aaba' =~ /^ (?3)? ((.)) (\2(?1)\2) $/x ? 'true' : 'false'
See how the second occurrence of '\2' matches 'a' and not 'b'? Now this bug becomes evident when we switch the ordering:
    say 'aaba' =~ /^ (\3(?2)\3)? ((.)) (?1) $/x ? 'true' : 'false'
In stable Perl, this fails because '\3' isn't saved when we enter the
From @demerphq
On 22 February 2014 19:15, Nathan Trapuzzano <nbtrap@nbtrap.com> wrote:
I think this should match, and that the second occurrence of \2
Hmm. I think this should also match. First we should match a single

Yves
The RT System itself - Status changed from 'new' to 'open'
From nbtrap@nbtrap.com
demerphq <demerphq@gmail.com> writes:
Agreed. I was not clear on that point.
Yes.
($1==a and $2==a) should become true at pos 1, after the initial 'a' is
Yes, since the capture group bindings are dynamic.
Agreed, if it wasn't clear. Here we see the inconsistency of the
Not quite. I believe \3 would fail outright, because undefined capture
    say '' =~ /\1()/ ? 'true' : 'false'
See above.
From @demerphq
On 23 February 2014 16:47, Nathan Trapuzzano <nbtrap@nbtrap.com> wrote:
Me too. :-)
Yep, you are correct. I was mistaken there.
Well, I think we are on the same page. :-) I think I have a patch for this; let me see how it goes.

Yves
From @demerphq
On 23 February 2014 17:05, demerphq <demerphq@gmail.com> wrote:
I pushed the following:
    d1b2014 fix RT #121299 - Inconsistent
Which I think fixes the issues you have raised. Please verify with the
It is a rework of my earlier patch to take aboard Dave's comments in
Also, in order to make it easier to debug what REF was doing during a
    2395827 Improve how regprop dumps
Assuming this fixes the issues you noted, I think this ticket can be closed.
cheers,
From ambrus@math.bme.hu
See also the related ticket https://rt-archive.perl.org/perl5/Ticket/Display.html?id=76622 .
From nbtrap@nbtrap.com
"Zsban Ambrus via RT" <perlbug-followup@perl.org> writes:
Interesting. So this issue was (kind of) raised before but dismissed
I guess it just raises the issue again: is this really a bug? Perhaps
From nbtrap@nbtrap.com
demerphq <demerphq@gmail.com> writes:
That's helpful, thanks.
I found a couple of problems so far. Most immediately:
    say 'match' if "abcdefghijk\12S" =~ /(a)(b)(c)(d)(e)(f)(g)(h)(i)(j)(k)\12\123/
causes Perl to lock up. It doesn't dump core, but it doesn't do
But relating to this issue specifically, the engine still behaves
    say 'match' if 'aaba' =~ /^ ((?:(a)|(b))\2) (?1) $/x
This matches in stable perl, and it still matches with your patch. I
It comes down to the issue of "when do the capture variables receive new
This is problematic. The semantics of backreferences referring to local
It may help illustrate what I have in mind if I explain how I
So far as I know, this is a concise and full definition of subpattern
From nbtrap@nbtrap.com
Nathan Trapuzzano <nbtrap@nbtrap.com> writes:
Yves, please take note that I provided you with an incorrect example
    print 'aba' =~ /^(.\1?)(?1)$/ ? "true\n" : "false\n"
Disregard what I said earlier about this one. I believe it should fail.
From nbtrap@nbtrap.com
Nathan Trapuzzano <nbtrap@nbtrap.com> writes:
Just realized I've been describing the "referential transparency" of
http://en.wikipedia.org/wiki/Referential_transparency_%28computer_science%29
From @demerphq
On 25 February 2014 01:54, Nathan Trapuzzano <nbtrap@nbtrap.com> wrote:
Well, that isn't related to my patch sequence, and it is a Perl 5.20 showstopper. The rest of your comments I will investigate later.
cheers,
From @demerphq
On 25 February 2014 01:54, Nathan Trapuzzano <nbtrap@nbtrap.com> wrote:
Thanks, this is fixed as part of RT #121321 by commit:
    commit 845ab12
    Fix RT #121321 - Fencepost error causes infinite loop in regex compilation
    Due to a fencepost error if a pattern had more than 9 capture buffers
    This fixes the fencepost error, adds tests, and adds some comments to
From @demerphq
On 25 February 2014 01:54, Nathan Trapuzzano <nbtrap@nbtrap.com> wrote:
I need to think about this a bit. My first take is that I expect your
I need to think about that. One could make the argument that after
However this gets onto poorly defined ground as it conflicts with the
    "aaaa" =~ /(a)*/
we expect $1 to contain the last successful 'a'. However this would
It is very possible that Perl has done a poor job at *defining* the

Yves
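The quantifier behaviour Yves appeals to can be checked directly. This is a minimal sketch of our own (the variable names are illustrative, not from the thread), relying only on standard Perl regex semantics: once a quantified group finishes, its capture variable holds the text from the last successful iteration.

```perl
use strict;
use warnings;

# (a)* iterates four times; $1 keeps only the last iteration's capture.
my $last_a = "aaaa" =~ /(a)*/ ? $1 : undef;

# With distinct characters the "last iteration wins" rule is visible.
my $last_char = "abcd" =~ /(.)*/ ? $1 : undef;

print "last_a=$last_a last_char=$last_char\n";   # last_a=a last_char=d
```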
From @demerphq
On 25 February 2014 02:09, Nathan Trapuzzano <nbtrap@nbtrap.com> wrote:
I am not sure about this. You could argue it doesn't get a new binding

Yves
From @druud62
On 2014-02-25 12:37, demerphq wrote:
Illustration:
    perl -wE'
From nbtrap@nbtrap.com
demerphq <demerphq@gmail.com> writes:
What I'm arguing for is something slightly stronger: the 'a' capture
I don't think that follows. If it did follow, I agree it would be
Look at it like this: if subpattern calls are like function calls, then
At this point, the precise aim of this ticket has become obscured, since
From nbtrap@nbtrap.com
demerphq <demerphq@gmail.com> writes:
You could argue it doesn't get its new _value_ until it hits close paren
There may be a language barrier here, since Lisp and Perl have different
    use strict;
    print $x; # compile-time error: $x is _unbound_
    my $x; # $x gets a binding, but is still undefined
    {
        print $x; # prints nothing since $x is undef
    }
    # newer binding destroyed, old binding unshadowed; $x is 5 again
If $1 didn't get its bindings until close paren of its group, then
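In Perl terms, the dynamic binding Nathan describes is closest to `local` on a package variable: a new binding shadows the old one for the dynamic extent of a block, is visible to any sub called from inside it, and is discarded on exit. A minimal sketch of our own (the names `$x` and `show` are hypothetical), offered only as an analogy for how a capture group entered through a subpattern call might get a fresh binding:

```perl
use strict;
use warnings;

our $x = 5;              # package variable, so it can be dynamically rebound

# Reports whatever binding of $x is current at call time.
sub show { return defined $x ? $x : 'undef' }

my @seen;
push @seen, show();      # the original binding: 5
{
    local $x;            # new binding, initially undef, shadows the old one
    push @seen, show();  # show() sees the newer binding: undef
    $x = 6;
    push @seen, show();  # 6
}                        # leaving the block restores the old binding
push @seen, show();      # 5 again

print join(',', @seen), "\n";   # 5,undef,6,5
```

A lexical `my $x` inside the block would not be visible to `show`, which is the distinction between lexical and dynamic scope that the Lisp terminology draws.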
From @iabyn
On Tue, Feb 25, 2014 at 08:13:49PM -0500, Nathan Trapuzzano wrote:
Just as a data point, sometimes while executing a regex in perl, the value of
    "abcd" =~ /( . (?{ print "[$1]\n" }) )+/x or die;
which outputs:
    []
So this:
    "aababcabcdabcde" =~ /((?:\b|\1)\w)+/ or die;
matches and prints "[abcde]":
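Dave's second pattern can be verified on its own. The wrapper below is our own sketch around his regex: on each iteration of the outer `+`, `\1` refers to what group 1 captured on the previous iteration, so the group grows as "a", "ab", "abc", "abcd", "abcde".

```perl
use strict;
use warnings;

# First iteration: \b succeeds at the start of the string and \w takes "a".
# Later iterations: \b fails between word characters, so \1 must re-match
# the previous capture before \w adds one more character.
my $matched = "aababcabcdabcde" =~ /((?:\b|\1)\w)+/;
my $final   = $matched ? $1 : undef;

print "matched=", ($matched ? 1 : 0), " final=[$final]\n";   # matched=1 final=[abcde]
```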
From nbtrap@nbtrap.com
Dave Mitchell <davem@iabyn.com> writes:
Interesting point. For the quantifier/iteration analogy to hold, this
From @demerphq
On 26 February 2014 02:13, Nathan Trapuzzano <nbtrap@nbtrap.com> wrote:
I have asked Damian Conway to share his opinions on this. I would like

Yves
We have cases like this
where we expect that on the second iteration of the quantifier that \1 refers to the value of the capture buffer from the previous iteration. This also applies to GOSUB. So when you jump into a paren group it does not reset the capture buffer it is associated with. We also have cases like this:
That is, we have forward references. They don't make sense until you think of loops. We did have inconsistencies in our backreference logic, but I think we have fixed everything from this ticket.
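A forward reference inside a loop can be illustrated with a small pattern of our own; it is not necessarily the case the comment above had in mind. It also exercises the rule Nathan states earlier in the thread, that a backreference to a group that has not yet captured fails outright rather than matching the empty string.

```perl
use strict;
use warnings;

# \2 appears before group 2. On the first iteration group 2 is unset, so \2
# fails and the second alternative captures "one". On the second iteration
# \2 is "one", so the first alternative matches "onetwo".
my $ok = "oneonetwo" =~ /^(\2two|(one))+$/;
my ($g1, $g2) = ($1, $2);

# A backreference to a group that has not captured yet never matches.
my $unset = '' =~ /\1()/ ? 'true' : 'false';

print "ok=", ($ok ? 1 : 0), " g1=$g1 g2=$g2 unset=$unset\n";
```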
Migrated from rt.perl.org#121299 (status was 'open')
Searchable as RT121299$