Skip Menu |
Report information
Id: 126182
Status: resolved
Priority: 0/
Queue: perl5

Owner: khw <khw [at] cpan.org>
Requestors: victor [at] drawall.cc
Cc:
AdminCc:

Operating System: Linux
PatchStatus: (no value)
Severity: Wishlist
Type: core
Perl Version: 5.23.4
Fixed In: (no value)



From: Victor ADAM <victor [...] drawall.cc>
To: perlbug [...] perl.org
Date: Fri, 25 Sep 2015 14:50:16 +0200
Subject: /(.(?2))((?<=(?=(?1)).))/ hangs and eats all available RAM
Download (untitled) / with headers
text/plain 3.3k
This is a bug report for perl from victor.adam@derpymail.org, generated with the help of perlbug 1.40 running under perl 5.23.4. ----------------------------------------------------------------- [Please describe your issue here] How to reproduce ---------------- Show quoted text
> perl5.23.4 -e '/(.(?2))((?<=(?=(?1)).))/'
Expected behavior ----------------- Perl should immediately die with the following diagnostic: Show quoted text
> Infinite recursion in regex at -e line 1.
This is the current behavior of almost everything that would cause a regex to never complete, creating the expectation that the regex engine is safe from this kind of problem. Actual behavior --------------- Perl becomes unresponsive, allocating more and more RAM [Please do not change anything below this line] ----------------------------------------------------------------- --- Flags: category=core severity=wishlist --- Site configuration information for perl 5.23.4: Configured by grimy at Tue Sep 22 21:18:14 CEST 2015. Summary of my perl5 (revision 5 version 23 subversion 4) configuration: Commit id: 2d9b5f101563ac9fee41e6ca496f79db6222d2e3 Platform: osname=linux, osvers=4.0.7-2-arch, archname=x86_64-linux uname='linux localhost 4.0.7-2-arch #1 smp preempt tue jun 30 07:50:21 utc 2015 x86_64 gnulinux ' config_args='-ds -e -Dusedevel' hint=recommended, useposix=true, d_sigaction=define useithreads=undef, usemultiplicity=undef use64bitint=define, use64bitall=define, uselongdouble=undef usemymalloc=n, bincompat5005=undef Compiler: cc='cc', ccflags ='-fwrapv -fno-strict-aliasing -pipe -fstack-protector-strong -I/usr/local/include -D_LARGEFILE_SOURCE -D_FILE_OFFSET_BITS=64', optimize='-O2', cppflags='-fwrapv -fno-strict-aliasing -pipe -fstack-protector-strong -I/usr/local/include' ccversion='', gccversion='5.1.0', gccosandvers='' intsize=4, longsize=8, ptrsize=8, doublesize=8, byteorder=12345678, doublekind=3 d_longlong=define, longlongsize=8, d_longdbl=define, longdblsize=16, longdblkind=3 ivtype='long', ivsize=8, nvtype='double', nvsize=8, Off_t='off_t', lseeksize=8 alignbytes=8, prototype=define Linker and Libraries: ld='cc', ldflags =' -fstack-protector-strong -L/usr/local/lib' libpth=/usr/local/lib /usr/lib/gcc/x86_64-unknown-linux-gnu/5.1.0/include-fixed /usr/lib /lib/../lib /usr/lib/../lib /lib /lib64 /usr/lib64 libs=-lpthread -lnsl -lnm -lgdbm -ldb -ldl -lm -lcrypt -lutil -lc -lgdbm_compat perllibs=-lpthread -lnsl -lnm -ldl -lm -lcrypt -lutil -lc libc=libc-2.21.so, so=so, useshrplib=false, libperl=libperl.a gnulibc_version='2.21' Dynamic Linking: dlsrc=dl_dlopen.xs, dlext=so, d_dlsymun=undef, ccdlflags='-Wl,-E' cccdlflags='-fPIC', lddlflags='-shared -O2 -L/usr/local/lib -fstack-protector-strong' --- @INC for perl 5.23.4: /usr/local/lib/perl5/site_perl/5.23.4/x86_64-linux /usr/local/lib/perl5/site_perl/5.23.4 /usr/local/lib/perl5/5.23.4/x86_64-linux /usr/local/lib/perl5/5.23.4 /usr/local/lib/perl5/site_perl . --- Environment for perl 5.23.4: HOME=/home/grimy LANG=en_US.UTF-8 LANGUAGE (unset) LD_LIBRARY_PATH (unset) LOGDIR (unset) PATH=/home/grimy/bin:/home/grimy/.nvim/scripts:/usr/local/sbin:/usr/local/bin:/usr/bin:/usr/lib/jvm/default/bin:/usr/bin/site_perl:/usr/bin/vendor_perl:/usr/bin/core_perl:/opt/plan9/bin PERL_BADLANG (unset) SHELL (unset)
RT-Send-CC: perl5-porters [...] perl.org
Download (untitled) / with headers
text/plain 554b
I am unable to reproduce this. I've tried on both -DEBUGGING builds, and not, and in 5.22.2 The regex compiles in all builds I tried it as Final program: 1: OPEN1 (3) 3: REG_ANY (4) 4: GOSUB2[+5] (7) 7: CLOSE1 (9) 9: OPEN2 (11) 11: IFMATCH[-1] (23) 13: IFMATCH[0] (20) 15: GOSUB1[-14] (18) 18: SUCCEED (0) 19: TAIL (20) 20: REG_ANY (21) 21: SUCCEED (0) 22: TAIL (23) 23: CLOSE2 (25) 25: END (0) minlen 1 r->extflags: NO_INPLACE_SUBST r->intflags: [none-set] -- Karl Williamson
From: Victor ADAM <victor [...] drawall.cc>
Date: Wed, 14 Oct 2015 08:05:09 +0200
To: perlbug-followup [...] perl.org
Subject: Re: [perl #126182] /(.(?2))((?<=(?=(?1)).))/ hangs and eats all available RAM
Download (untitled) / with headers
text/plain 474b
My bad, my “How to reproduce section” is wrong. When matched against an empty string, the regex will fail immediately, as expected. The problems only start when it is applied to a non-empty string, as in : perl -e '"a"=~/(.(?2))((?<=(?=(?1)).))/' I’ve just reproduced it on the latest blead (f83e001e339db3eb180f5f1918c268681665839d). I’d recommend doing a `ulimit -Sv 1048576` or some such before running the command, to prevent it from getting out of control.
CC: Perl RT Bug Tracker <perlbug-followup [...] perl.org>
From: demerphq <demerphq [...] gmail.com>
To: Victor ADAM <victor [...] drawall.cc>
Date: Wed, 14 Oct 2015 09:48:29 +0200
Subject: Re: [perl #126182] /(.(?2))((?<=(?=(?1)).))/ hangs and eats all available RAM
Download (untitled) / with headers
text/plain 671b
On 14 October 2015 at 08:05, Victor ADAM <victor@drawall.cc> wrote: Show quoted text
> My bad, my “How to reproduce section” is wrong. When matched against > an empty string, the regex will fail immediately, as expected. The > problems only start when it is applied to a non-empty string, as in : > > perl -e '"a"=~/(.(?2))((?<=(?=(?1)).))/' > > I’ve just reproduced it on the latest blead > (f83e001e339db3eb180f5f1918c268681665839d). > > I’d recommend doing a `ulimit -Sv 1048576` or some such before running > the command, to prevent it from getting out of control.
This should trigger an infinite recursion error. yves -- perl -Mre=debug -e "/just|another|perl|hacker/"
Subject: Re: [perl #126182] /(.(?2))((?<=(?=(?1)).))/ hangs and eats all available RAM
To: perlbug-followup [...] perl.org
Date: Wed, 14 Oct 2015 11:11:39 +0200
From: Victor ADAM <victor [...] drawall.cc>
Yes, it *should* trigger an infinite recursion error, but it actually doesn’t.
Subject: Re: [perl #126182] /(.(?2))((?<=(?=(?1)).))/ hangs and eats all available RAM
Date: Wed, 14 Oct 2015 10:01:55 -0600
From: Karl Williamson <public [...] khwilliamson.com>
To: Victor ADAM <victor [...] drawall.cc>, perlbug-followup [...] perl.org, demerphq <demerphq [...] gmail.com>
Download (untitled) / with headers
text/plain 2.1k
On 10/14/2015 03:11 AM, Victor ADAM wrote: Show quoted text
> Yes, it *should* trigger an infinite recursion error, but it actually doesn’t. >
Thanks for the ulimit hint. This is an area of the engine that I have never looked at, so it would take me a while to get up to speed to figure out how to fix it. Here's the first bit of the output of -Dr execution. I see why the current check doesn't work in this situation, but am unsure of how one might fix it without breaking other things: Matching REx "(.(?2))((?<=(?=(?1)).))" against "a" 0 <> <a> | 1:OPEN1(3) 0 <> <a> | 3:REG_ANY(4) 1 <a> <> | 4:GOSUB2[+5](7) 1 <a> <> | 9: OPEN2(11) 1 <a> <> | 11: IFMATCH[-1](23) 0 <> <a> | 13: IFMATCH[0](20) 0 <> <a> | 15: GOSUB1[-14](18) 0 <> <a> | 1: OPEN1(3) 0 <> <a> | 3: REG_ANY(4) 1 <a> <> | 4: GOSUB2[+5](7) 1 <a> <> | 9: OPEN2(11) 1 <a> <> | 11: IFMATCH[-1](23) 0 <> <a> | 13: IFMATCH[0](20) 0 <> <a> | 15: GOSUB1[-14](18) 0 <> <a> | 1: OPEN1(3) 0 <> <a> | 3: REG_ANY(4) 1 <a> <> | 4: GOSUB2[+5](7) 1 <a> <> | 9: OPEN2(11) 1 <a> <> | 11: IFMATCH[-1](23) 0 <> <a> | 13: IFMATCH[0](20) 0 <> <a> | 15: GOSUB1[-14](18) 0 <> <a> | 1: OPEN1(3) 0 <> <a> | 3: REG_ANY(4) 1 <a> <> | 4: GOSUB2[+5](7) 1 <a> <> | 9: OPEN2(11) 1 <a> <> | 11: IFMATCH[-1](23) 0 <> <a> | 13: IFMATCH[0](20) 0 <> <a> | 15: GOSUB1[-14](18)
To: Karl Williamson <public [...] khwilliamson.com>
From: demerphq <demerphq [...] gmail.com>
CC: Victor ADAM <victor [...] drawall.cc>, Perl RT Bug Tracker <perlbug-followup [...] perl.org>
Subject: Re: [perl #126182] /(.(?2))((?<=(?=(?1)).))/ hangs and eats all available RAM
Date: Wed, 14 Oct 2015 19:09:22 +0200
Download (untitled) / with headers
text/plain 714b
On 14 October 2015 at 18:01, Karl Williamson <public@khwilliamson.com> wrote: Show quoted text
> On 10/14/2015 03:11 AM, Victor ADAM wrote:
>> >> Yes, it *should* trigger an infinite recursion error, but it actually >> doesn’t. >>
> > Thanks for the ulimit hint. > > This is an area of the engine that I have never looked at, so it would take > me a while to get up to speed to figure out how to fix it.
I have a fix. (Bit vector of visited nodes). Show quoted text
> Here's the first > bit of the output of -Dr execution. I see why the current check doesn't > work in this situation, but am unsure of how one might fix it without > breaking other things
Exactly. I am still investigating. Might take me a few days, but ill get there. Yves
CC: Victor ADAM <victor [...] drawall.cc>, Perl RT Bug Tracker <perlbug-followup [...] perl.org>
To: demerphq <demerphq [...] gmail.com>
From: Karl Williamson <public [...] khwilliamson.com>
Subject: Re: [perl #126182] /(.(?2))((?<=(?=(?1)).))/ hangs and eats all available RAM
Date: Wed, 14 Oct 2015 11:28:52 -0600
Download (untitled) / with headers
text/plain 790b
On 10/14/2015 11:09 AM, demerphq wrote: Show quoted text
> On 14 October 2015 at 18:01, Karl Williamson <public@khwilliamson.com> wrote:
>> On 10/14/2015 03:11 AM, Victor ADAM wrote:
>>> >>> Yes, it *should* trigger an infinite recursion error, but it actually >>> doesn’t. >>>
>> >> Thanks for the ulimit hint. >> >> This is an area of the engine that I have never looked at, so it would take >> me a while to get up to speed to figure out how to fix it.
> > I have a fix. (Bit vector of visited nodes). >
>> Here's the first >> bit of the output of -Dr execution. I see why the current check doesn't >> work in this situation, but am unsure of how one might fix it without >> breaking other things
> > Exactly. I am still investigating. Might take me a few days, but ill get there. > > Yves >
Yves++
Date: Fri, 25 Dec 2015 22:24:13 -0700
Subject: Re: [perl #126182] /(.(?2))((?<=(?=(?1)).))/ hangs and eats all available RAM
CC: Victor ADAM <victor [...] drawall.cc>, Perl RT Bug Tracker <perlbug-followup [...] perl.org>
From: Karl Williamson <public [...] khwilliamson.com>
To: demerphq <demerphq [...] gmail.com>
Download (untitled) / with headers
text/plain 803b
On 10/14/2015 11:09 AM, demerphq wrote: Show quoted text
> On 14 October 2015 at 18:01, Karl Williamson <public@khwilliamson.com> wrote:
>> On 10/14/2015 03:11 AM, Victor ADAM wrote:
>>> >>> Yes, it *should* trigger an infinite recursion error, but it actually >>> doesn’t. >>>
>> >> Thanks for the ulimit hint. >> >> This is an area of the engine that I have never looked at, so it would take >> me a while to get up to speed to figure out how to fix it.
> > I have a fix. (Bit vector of visited nodes). >
>> Here's the first >> bit of the output of -Dr execution. I see why the current check doesn't >> work in this situation, but am unsure of how one might fix it without >> breaking other things
> > Exactly. I am still investigating. Might take me a few days, but ill get there. > > Yves >
How is this coming?
Subject: Re: [perl #126182] /(.(?2))((?<=(?=(?1)).))/ hangs and eats all available RAM
Date: Wed, 30 Dec 2015 03:44:47 +0100
To: Karl Williamson <public [...] khwilliamson.com>
From: demerphq <demerphq [...] gmail.com>
CC: Victor ADAM <victor [...] drawall.cc>, Perl RT Bug Tracker <perlbug-followup [...] perl.org>
Download (untitled) / with headers
text/plain 1.2k
I'm really sorry, a series of personal issues, family in hospital type things, have kept me from finishing this. I hope to find time at some point, but I cant say when. If it would help I will try to find time to upload what work-in-progress I have. Otherwise I will do my best to get to it when time permits. Yves On 26 December 2015 at 06:24, Karl Williamson <public@khwilliamson.com> wrote: Show quoted text
> On 10/14/2015 11:09 AM, demerphq wrote:
>> >> On 14 October 2015 at 18:01, Karl Williamson <public@khwilliamson.com> >> wrote:
>>> >>> On 10/14/2015 03:11 AM, Victor ADAM wrote:
>>>> >>>> >>>> Yes, it *should* trigger an infinite recursion error, but it actually >>>> doesn’t. >>>>
>>> >>> Thanks for the ulimit hint. >>> >>> This is an area of the engine that I have never looked at, so it would >>> take >>> me a while to get up to speed to figure out how to fix it.
>> >> >> I have a fix. (Bit vector of visited nodes). >>
>>> Here's the first >>> bit of the output of -Dr execution. I see why the current check doesn't >>> work in this situation, but am unsure of how one might fix it without >>> breaking other things
>> >> >> Exactly. I am still investigating. Might take me a few days, but ill get >> there. >> >> Yves >>
> > How is this coming?
-- perl -Mre=debug -e "/just|another|perl|hacker/"
RT-Send-CC: demerphq [...] gmail.com, perl5-porters [...] perl.org
Download (untitled) / with headers
text/plain 145b
I've added this to the 5.24 blockers list. Yves, if you don't have the tuits for this, could you post what you got so far? -- Karl Williamson
RT-Send-CC: perl5-porters [...] perl.org
Download (untitled) / with headers
text/plain 812b
On Tue Oct 13 23:05:46 2015, victor@drawall.cc wrote: Show quoted text
> My bad, my “How to reproduce section” is wrong. When matched against > an empty string, the regex will fail immediately, as expected. The > problems only start when it is applied to a non-empty string, as in : > > perl -e '"a"=~/(.(?2))((?<=(?=(?1)).))/' > > I’ve just reproduced it on the latest blead > (f83e001e339db3eb180f5f1918c268681665839d). > > I’d recommend doing a `ulimit -Sv 1048576` or some such before running > the command, to prevent it from getting out of control. >
And here it is at today's blead (on Linux x86_64): ##### $ git show | head -1 commit f24480bcc6bbe9061c0ff1595df8d3eadb2ab8db $ ulimit -Sv 1048576 && ./perl -Ilib -e '"a"=~/(.(?2))((?<=(?=(?1)).))/' Out of memory! ##### -- James E Keenan (jkeenan@cpan.org)
Subject: Re: [perl #126182] /(.(?2))((?<=(?=(?1)).))/ hangs and eats all available RAM
CC: Perl5 Porteros <perl5-porters [...] perl.org>
To: Perl RT Bug Tracker <perlbug-followup [...] perl.org>
From: demerphq <demerphq [...] gmail.com>
Date: Wed, 2 Mar 2016 11:01:30 +0100
Download (untitled) / with headers
text/plain 1.1k
On 2 March 2016 at 00:58, James E Keenan via RT <perlbug-followup@perl.org> wrote: Show quoted text
> On Tue Oct 13 23:05:46 2015, victor@drawall.cc wrote:
>> My bad, my “How to reproduce section” is wrong. When matched against >> an empty string, the regex will fail immediately, as expected. The >> problems only start when it is applied to a non-empty string, as in : >> >> perl -e '"a"=~/(.(?2))((?<=(?=(?1)).))/' >> >> I’ve just reproduced it on the latest blead >> (f83e001e339db3eb180f5f1918c268681665839d). >> >> I’d recommend doing a `ulimit -Sv 1048576` or some such before running >> the command, to prevent it from getting out of control. >>
> > And here it is at today's blead (on Linux x86_64): > ##### > $ git show | head -1 > commit f24480bcc6bbe9061c0ff1595df8d3eadb2ab8db > > $ ulimit -Sv 1048576 && ./perl -Ilib -e '"a"=~/(.(?2))((?<=(?=(?1)).))/' > Out of memory!
Yeah, I know. This is still on my todo list. All I can say is that I am now "back" hacking after being away for a while dealing with the passing of my father and hope to get to it sooner than later. Yves -- perl -Mre=debug -e "/just|another|perl|hacker/"
CC: Perl5 Porteros <perl5-porters [...] perl.org>
Subject: Re: [perl #126182] /(.(?2))((?<=(?=(?1)).))/ hangs and eats all available RAM
Date: Sun, 6 Mar 2016 18:09:30 +0100
From: demerphq <demerphq [...] gmail.com>
To: Perl RT Bug Tracker <perlbug-followup [...] perl.org>
On 2 March 2016 at 11:01, demerphq <demerphq@gmail.com> wrote: Show quoted text
> On 2 March 2016 at 00:58, James E Keenan via RT > <perlbug-followup@perl.org> wrote:
>> And here it is at today's blead (on Linux x86_64): >> ##### >> $ git show | head -1 >> commit f24480bcc6bbe9061c0ff1595df8d3eadb2ab8db >> >> $ ulimit -Sv 1048576 && ./perl -Ilib -e '"a"=~/(.(?2))((?<=(?=(?1)).))/' >> Out of memory!
> > hope to get to it sooner than later.
And this should be fixed now with: ba6840fbf2fdde3e7f1bda1a26f46c901f36d5ec This might need a doc fix. It is no longer illegal to do "infinite recursion", instead the match "fails" to match at the given position. so for instance this: /a(?R)?b/ now matches "ab", "aabb", etc. I am not sure this is the best wording, but now a pattern sub-routine will fail to match if entered recursively from the same position. Note this does not apply to EVAL (??{ ... }) style recursion, only GOSUB style recursion (?&foo). Eg: "foobar"=~/(?&y)(?(DEFINE)(?<x>(?&y)?foo)(?<y>(?&x)?bar))/ matches. However: "foobar"=~/(?&y)(?(DEFINE)(?<x>(?&y)foo)(?<y>(?&x)bar))/ does not, and produces output like this: floating "foobar" at 0..9223372036854775807 (checking floating) minlen 6 Matching REx "(?&y)(?(DEFINE)(?<x>(?&y)foo)(?<y>(?&x)bar))" against "foobar" Intuit: trying to determine minimum start position... doing 'check' fbm scan, [0..6] gave 0 Found floating substr "foobar" at offset 0 (rx_origin now 0)... (multiline anchor test skipped) Intuit: Successfully guessed: match at offset 0 0 <> <foobar> | 1:GOSUB2[+16:17] 'y'(4) 0 <> <foobar> | 17: OPEN2 'y'(19) 0 <> <foobar> | 19: GOSUB1[-11:8] 'x'(22) 0 <> <foobar> | 8: OPEN1 'x'(10) 0 <> <foobar> | 10: GOSUB2[+7:17] 'y'(13) pattern left-recursion without consuming input always fails... failed... Match failed Freeing REx: "(?&y)(?(DEFINE)(?<x>(?&y)foo)(?<y>(?&x)bar))" So I think this ticket can be closed. cheers, Yves -- perl -Mre=debug -e "/just|another|perl|hacker/"
Date: Tue, 8 Mar 2016 00:36:23 +0000
From: Zefram <zefram [...] fysh.org>
To: perl5-porters [...] perl.org
Subject: Re: [perl #126182] /(.(?2))((?<=(?=(?1)).))/ hangs and eats all available RAM
Download (untitled) / with headers
text/plain 4.3k
demerphq wrote: Show quoted text
> It is no longer illegal to do "infinite >recursion", instead the match "fails" to match at the given position.
This is broken behaviour, in multiple respects. Though we already have some similar to it. If a recursion occurs without moving through the string, this does not necessarily mean that the regexp execution would fail to terminate if left to its own devices. The arbitrary code that we can embed in a regexp means that it's easy to have part of a regexp behave differently on different iterations. But even excluding that possibility, non-embedding regexp features are quite sufficient to make a regexp that must iterate a couple of times without moving in order to match. For example, "ab" =~ /a(?(1)z|(?(2)()|()))*\1b/ should match, with exactly two iterations of the * loop, each consuming no characters. The first iteration captures $2 and can't match any other way, the second captures $1 and can't match any other way, after that it can't iterate more, and \1 must be matched to complete the match. In fact, trying this out on perl 5.22.0, perl says that it doesn't match. This is not due to any error in the structure of the regexp: perl will happily perform an equivalent match if some character is introduced such that each iteration of the loop advances through the string: "axxb" =~ /a(?:x(?(1)z|(?(2)()|())))*\1b/ So perl has a bug here in failing to perform the earlier match. Now let's consider cases that actually could recurse infinitely. The simplest case is something like "ab" =~ /a(?:)*b/ Note that in actual *regular* expressions, in the sense of matching a regular language, this kind of infinite iteration is no problem at all. If this kind of pattern is compiled down to a DFA, then the DFA just runs along the string, in constant time per character, and tells you whether or not it matches. It doesn't iterate for the iteration that appears in the expression, and so executes the infinite loop in finite time. Srsly. Of course, you can't then ask it how many iterations of the empty loop were included in its match, because it doesn't generate that sort of information. perl does have a problem with infinite iteration, because it uses an NFA with backtracking. On the above simple case it does correctly find the match, but emits a warning while doing so. And here we *can* ask how many times it went round the empty loop. In principle, in this simple case, any number of times round the loop, zero or more, could produce a valid match. The rub is that larger numbers of iterations are preferred, so with no largest valid number of iterations there is no most-preferred match. perl actually took exactly one iteration, so although it was correct to say that it matched, it was actually incorrect in which way it matched. That doesn't make any difference in the simplest case, but it's detectable with a bit of extra work. Incidentally, if we change the * to a *?, preferring lower numbers of iterations, then perl correctly produces a zero-iterations match. Though it still emits the warning. We can muck about with the loop structure to get worse misbehaviour from perl. For example, "ab" =~ /a(?:()|())*\1\2b/ should match, and in fact has infinitely many ways to match. It requires at least two iterations of the loop, at least one taking each branch. But perl declares no match, this time emitting the warning as it does so. The way in which this fails is a lot like the finite example that I started with, but that first one doesn't generate the warning. This infinite case shares with an earlier case the problem of having no most-preferred match, but changing to the closest variant that has a most-preferred match "ab" =~ /a(?(2)()|())*?\1\2b/ doesn't make perl recognise the match. So there are many cases here where the extra logic put in to avoid hanging is resulting in failing to find a match, or in finding the wrong match. We do define regexps to try the options in a specific order, and to yield the most-preferred match, so the latter is not an insignificant matter. Detecting something that would otherwise hang is nice, but not essential. We should never compromise correctness to get it. If the programmer has instructed us to infinitely recurse, then hanging is a reasonable response, and so is exhausting memory. We should not be afraid of doing what the programmer said. -zefram
CC: Perl5 Porteros <perl5-porters [...] perl.org>
Date: Tue, 8 Mar 2016 13:17:54 +0100
From: demerphq <demerphq [...] gmail.com>
To: Zefram <zefram [...] fysh.org>
Subject: Re: [perl #126182] /(.(?2))((?<=(?=(?1)).))/ hangs and eats all available RAM
Download (untitled) / with headers
text/plain 6.3k
On 8 March 2016 at 01:36, Zefram <zefram@fysh.org> wrote: Show quoted text
> demerphq wrote:
>> It is no longer illegal to do "infinite >>recursion", instead the match "fails" to match at the given position.
> > This is broken behaviour, in multiple respects. Though we already have > some similar to it.
I understand your argument, but my reading of the docs says that it is broken by design. Specifically: "To break the loop, the following match after a zero-length match is prohibited to have a length of zero. This prohibition interacts with backtracking (see L<"Backtracking">), and so the I<second best> match is chosen if the I<best> match is of zero length. " Show quoted text
> > If a recursion occurs without moving through the string, this does > not necessarily mean that the regexp execution would fail to terminate > if left to its own devices. The arbitrary code that we can embed in a > regexp means that it's easy to have part of a regexp behave differently on > different iterations. But even excluding that possibility, non-embedding > regexp features are quite sufficient to make a regexp that must iterate > a couple of times without moving in order to match. For example, > > "ab" =~ /a(?(1)z|(?(2)()|()))*\1b/ > > should match, with exactly two iterations of the * loop, each consuming > no characters. The first iteration captures $2 and can't match any > other way, the second captures $1 and can't match any other way, after > that it can't iterate more, and \1 must be matched to complete the match. > In fact, trying this out on perl 5.22.0, perl says that it doesn't match. > This is not due to any error in the structure of the regexp: perl will > happily perform an equivalent match if some character is introduced such > that each iteration of the loop advances through the string: > > "axxb" =~ /a(?:x(?(1)z|(?(2)()|())))*\1b/ > > So perl has a bug here in failing to perform the earlier match. > > Now let's consider cases that actually could recurse infinitely. > The simplest case is something like > > "ab" =~ /a(?:)*b/ > > Note that in actual *regular* expressions, in the sense of matching a > regular language, this kind of infinite iteration is no problem at all. > If this kind of pattern is compiled down to a DFA, then the DFA just runs > along the string, in constant time per character, and tells you whether > or not it matches. It doesn't iterate for the iteration that appears in > the expression, and so executes the infinite loop in finite time. Srsly. > Of course, you can't then ask it how many iterations of the empty loop > were included in its match, because it doesn't generate that sort of > information. > > perl does have a problem with infinite iteration, because it uses an NFA > with backtracking. On the above simple case it does correctly find the > match, but emits a warning while doing so. And here we *can* ask how > many times it went round the empty loop. In principle, in this simple > case, any number of times round the loop, zero or more, could produce a > valid match. The rub is that larger numbers of iterations are preferred, > so with no largest valid number of iterations there is no most-preferred > match. perl actually took exactly one iteration, so although it was > correct to say that it matched, it was actually incorrect in which way > it matched. That doesn't make any difference in the simplest case, > but it's detectable with a bit of extra work. > > Incidentally, if we change the * to a *?, preferring lower numbers > of iterations, then perl correctly produces a zero-iterations match. > Though it still emits the warning. > > We can muck about with the loop structure to get worse misbehaviour > from perl. For example, > > "ab" =~ /a(?:()|())*\1\2b/ > > should match, and in fact has infinitely many ways to match. It requires > at least two iterations of the loop, at least one taking each branch. > But perl declares no match, this time emitting the warning as it > does so. The way in which this fails is a lot like the finite example > that I started with, but that first one doesn't generate the warning. > This infinite case shares with an earlier case the problem of having > no most-preferred match, but changing to the closest variant that has > a most-preferred match > > "ab" =~ /a(?(2)()|())*?\1\2b/ > > doesn't make perl recognise the match. > > So there are many cases here where the extra logic put in to avoid hanging > is resulting in failing to find a match, or in finding the wrong match. > We do define regexps to try the options in a specific order, and to yield > the most-preferred match, so the latter is not an insignificant matter.
Except see the documentation I quote above. Show quoted text
> > Detecting something that would otherwise hang is nice, but not essential. > We should never compromise correctness to get it. If the programmer > has instructed us to infinitely recurse, then hanging is a reasonable > response, and so is exhausting memory. We should not be afraid of doing > what the programmer said.
I am sympathetic to what you say here. But there is a subtle difference, Patterns are often supplied by the user, and until regex recursion was introduced, it was safe(ish) to allow them to do so. So, to me, making every Perl program that allows user supplied regexes be vulnerable to user supplied infinite loops is not a good thing. Anyway, much of the dialog that you posted here IMO is unrelated to regex infinite recursion, it relates to how some parts of the regex handle matching the empty string. Can you please reframe your dialog to be purely about regex recursion? Can you show a pattern where regex recursion does not do the right thing? FWIW, I am open to restoring the old behavior, and making it a fatal exception to enter an infinite loop recursively from the same position due to left recursion in a pattern. But I do think that /(?<foo>(?&foo)foo)/ infinite looping and then eventually exiting due to memory exhaustion is NOT the right thing for perl to do in this case. Note that /(?<foo>a(?&foo)?b)/ works just fine. The only problem are things like /(?<foo>(?&foo))/. Thanks for your feedback. I will say that independently of this, we *CAN* implement some form of "maximum work" mechanism for matching, which would allow people to address the "insecure pattern" problem in a saner way. Which would to some degree ameliorate the concern I have here. cheers, Yves -- perl -Mre=debug -e "/just|another|perl|hacker/"
Subject: Re: [perl #126182] /(.(?2))((?<=(?=(?1)).))/ hangs and eats all available RAM
Date: Tue, 8 Mar 2016 17:20:58 +0000
From: Zefram <zefram [...] fysh.org>
To: perl5-porters [...] perl.org
Download (untitled) / with headers
text/plain 2.4k
demerphq wrote: Show quoted text
>Anyway, much of the dialog that you posted here IMO is unrelated to >regex infinite recursion, it relates to how some parts of the regex >handle matching the empty string.
It's almost precisely equivalent. The problems you're addressing in (?R) type recursion are the same ones that occur with * iteration, because the iteration really operates in a recursive fashion. In general, /(x*)/ is equivalent to /((?:x(?1))?)/, and for the purposes of which ways it can match those are equivalent to /((?:(?1)x)?)/. The (documented) behaviour that captures by the recursively-called subpattern are not available to the calling pattern breaks some of my constructions, but generally the same problems apply. Show quoted text
>Can you show a pattern where regex recursion does not do the right >thing?
Just using left recursion, not emulating a zero-length iteration, consider "abbc" =~ /a((?:(?1)b)?)c/ Mathematically this does match, in exactly one way. But the naive execution of the pattern would infinitely recurse, attempting to try out infinitely many paths that would not match before it'll consider the way that actually does. So acceptable results are to yield a match, hang, exhaust memory, or signal an exception to forestall hanging. 5.22 signals "Infinite recursion in regex", which is fine. blead says it does not match, which is the wrong answer. The left recursion can be rescued by changing the ? to ??, so preferring less recursion. Of course this changes the equivalent iterative construct from * to *?. With that change, blead correctly shows a match, but 5.22 still signals "Infinite recursion". Looking at some zero-length-iteration cases, I found some curious behaviour on "abc" =~ /a((?1)??)c/ 5.22 signals "Infinite recursion" for this, not surprisingly, but blead hangs, slowly eating memory. Hanging is acceptable here, as I've said, but this seems to be a regression from the point of view of your objective of avoiding hanging. I'd expect your fail-on-recursion logic to (correctly) say that this does not match, in finite time. There are closely related cases, such as the same thing with ? in place of ??, for which blead produces the right answer and 5.22 signals "Infinite recursion". Show quoted text
>The only problem are things like /(?<foo>(?&foo))/.
With unconditional recursion, that has no way to match using a finite amount of structure, and mathematically it's tautological, so I'm fine about it being shortcutted. I'm concerned about things that in principle can actually match. -zefram
Subject: Re: [perl #126182] /(.(?2))((?<=(?=(?1)).))/ hangs and eats all available RAM
CC: Perl5 Porteros <perl5-porters [...] perl.org>
From: demerphq <demerphq [...] gmail.com>
To: Zefram <zefram [...] fysh.org>
Date: Tue, 8 Mar 2016 18:40:42 +0100
On 8 March 2016 at 18:20, Zefram <zefram@fysh.org> wrote: Show quoted text
> demerphq wrote:
>>Anyway, much of the dialog that you posted here IMO is unrelated to >>regex infinite recursion, it relates to how some parts of the regex >>handle matching the empty string.
> > It's almost precisely equivalent. The problems you're addressing in (?R) > type recursion are the same ones that occur with * iteration, because the > iteration really operates in a recursive fashion. In general, /(x*)/ > is equivalent to /((?:x(?1))?)/, and for the purposes of which ways it > can match those are equivalent to /((?:(?1)x)?)/. The (documented) > behaviour that captures by the recursively-called subpattern are not > available to the calling pattern breaks some of my constructions, but > generally the same problems apply. >
>>Can you show a pattern where regex recursion does not do the right >>thing?
> > Just using left recursion, not emulating a zero-length iteration, consider > > "abbc" =~ /a((?:(?1)b)?)c/ > > Mathematically this does match, in exactly one way. But the naive > execution of the pattern would infinitely recurse, attempting to try out > infinitely many paths that would not match before it'll consider the > way that actually does. So acceptable results are to yield a match, > hang, exhaust memory, or signal an exception to forestall hanging. > 5.22 signals "Infinite recursion in regex", which is fine. blead says > it does not match, which is the wrong answer.
Ok, then I will restore the "Infinite recursion in regex" so we do not do the wrong thing. Show quoted text
> > The left recursion can be rescued by changing the ? to ??, so preferring > less recursion. Of course this changes the equivalent iterative construct > from * to *?. With that change, blead correctly shows a match, but 5.22 > still signals "Infinite recursion".
Hmm. Show quoted text
> > Looking at some zero-length-iteration cases, I found some curious > behaviour on > > "abc" =~ /a((?1)??)c/ > > 5.22 signals "Infinite recursion" for this, not surprisingly, but > blead hangs, slowly eating memory. Hanging is acceptable here, as I've > said, but this seems to be a regression from the point of view of your > objective of avoiding hanging. I'd expect your fail-on-recursion logic > to (correctly) say that this does not match, in finite time.
Yes, me too. I am looking into why it doesn't. Show quoted text
> There are > closely related cases, such as the same thing with ? in place of ??, > for which blead produces the right answer and 5.22 signals "Infinite > recursion".
Could you help me with this and post some test you think should pass? I am digging into the example you gave to figure out why it doesnt get caught by the new algorithm. Show quoted text
>
>>The only problem are things like /(?<foo>(?&foo))/.
> > With unconditional recursion, that has no way to match using a finite > amount of structure, and mathematically it's tautological, so I'm fine > about it being shortcutted. I'm concerned about things that in principle > can actually match.
I appreciate that concern and I share it too. cheers, Yves -- perl -Mre=debug -e "/just|another|perl|hacker/"
RT-Send-CC: perl5-porters [...] perl.org
Download (untitled) / with headers
text/plain 278b
In http://nntp.perl.org/group/perl.perl5.porters/235003, which got attached to the wrong ticket, it was proposed to possibly revert the patch. I do not believe that it is acceptable to eat up all the machine's available memory. which would be one consequence of that reversion.
RT-Send-CC: perl5-porters [...] perl.org
Download (untitled) / with headers
text/plain 770b
On Thu Mar 10 10:33:39 2016, khw wrote: Show quoted text
> In http://nntp.perl.org/group/perl.perl5.porters/235003, which got > attached to the wrong ticket, it was proposed to possibly revert the > patch. I do not believe that it is acceptable to eat up all the > machine's available memory. which would be one consequence of that > reversion.
I believe that this ticket can be closed now. Outside of an issue with Solaris as far as I know I fixed the issues raised in this thread, including the ones from zefram. ce12e2548182b5bf6788188c520311eef0eca0ca 595de7616f5ea9813e28dc502de1c480ef2a5a97 d1c49ad5e2e1b02dbc27f05c9ecf435fae326293 and especially 401a80220e9b38609f41611345e25c9956fd7157 ba6840fbf2fdde3e7f1bda1a26f46c901f36d5ec If the OP concurs then I think this is done.
RT-Send-CC: perl5-porters [...] perl.org
On Mon Apr 04 03:35:12 2016, demerphq wrote: Show quoted text
> On Thu Mar 10 10:33:39 2016, khw wrote:
> > In http://nntp.perl.org/group/perl.perl5.porters/235003, which got > > attached to the wrong ticket, it was proposed to possibly revert the > > patch. I do not believe that it is acceptable to eat up all the > > machine's available memory. which would be one consequence of that > > reversion.
> > > I believe that this ticket can be closed now. Outside of an issue with > Solaris as far as I know I fixed the issues raised in this thread, > including the ones from zefram. > > ce12e2548182b5bf6788188c520311eef0eca0ca > 595de7616f5ea9813e28dc502de1c480ef2a5a97 > d1c49ad5e2e1b02dbc27f05c9ecf435fae326293 > > and especially > > 401a80220e9b38609f41611345e25c9956fd7157 > ba6840fbf2fdde3e7f1bda1a26f46c901f36d5ec > > If the OP concurs then I think this is done.
The OP has not responded one way or the other. I am taking this ticket, and if the OP continues to not respond, after at least 7 days, I will close the ticket -- Karl Williamson
RT-Send-CC: perl5-porters [...] perl.org
Resolved, as I threatened earlier -- Karl Williamson


This service is sponsored and maintained by Best Practical Solutions and runs on Perl.org infrastructure.

For issues related to this RT instance (aka "perlbug"), please contact perlbug-admin at perl.org