Skip Menu |
Report information
Id: 75586
Status: open
Priority: 0/
Queue: perl6

Owner: Nobody
Requestors: cognominal <cognominal [at] gmail.com>
drforr [at] pobox.com
matt-w <matthew [at] matthew-walton.co.uk>
pawel.pabian [at] implix.com
tomentiruran [at] gmail.com
Cc:
AdminCc:

Severity: (no value)
Tag: (no value)
Platform: (no value)
Patch Status: (no value)
VM: (no value)



Subject: [BUG] Infinite loop in Rakudo during subst
Date: Sun, 22 Mar 2009 18:25:16 +0000
To: rakudobug [...] perl.org
From: Matthew Walton <matthew [...] matthew-walton.co.uk>
Download (untitled) / with headers
text/plain 237b
Run this: perl6 -e "my $x = 'wibble'; $x.subst(/<ws>+ $$/, '');" Watch your processor usage go through the roof. It doesn't seem to terminate (as far as I can tell without solving the halting problem), but memory usage remains stable.
Download signature.asc
application/pgp-signature 197b

Message body not shown because it is not plain text.

Subject: Re: [perl #64094] AutoReply: [BUG] Infinite loop in Rakudo during subst
Date: Sun, 22 Mar 2009 19:04:56 +0000
To: perl6-bugs-followup [...] perl.org
From: Matthew Walton <matthew [...] matthew-walton.co.uk>
Download (untitled) / with headers
text/plain 215b
Turns out that the loop is actually caused by the <ws>+, not the subst, as I had the same inside an ordinary match. jnthn tells me that the + is a bit redundant, but it shouldn't cause a loop. <ws>* also succumbs.
Download signature.asc
application/pgp-signature 197b

Message body not shown because it is not plain text.

Subject: no protection against potentially infinite quantification on zero-width assertion
Date: Mon, 7 Jun 2010 18:09:46 +0200
To: bug <rakudobug [...] perl.org>
From: Stéphane Payrard <cognominal [...] gmail.com>
To reproduce : say 'a' ~~ m/''*/ and wait until the end of times. -- cognominal stef
Subject: Rakudo goes into infinite loop on multiplied zero-width matches like: ""+ or <ws>+
Date: Wed, 9 Jun 2010 16:32:13 +0200
To: rakudobug [...] perl.org
From: Paweł Pabian <pawel.pabian [...] implix.com>
Download (untitled) / with headers
text/plain 1.7k
[16:25] <bbkr> rakudo: grammar X { token TOP { <ws>+ } }; X.parse(" "); # why it goes into infinite loop (despite the fact that + is not necessary) ? [16:25] <p6eval> rakudo a54677: ( no output ) [16:25] <bbkr> known bug? [16:26] <masak> haven't seen it before, no. [16:26] <jnthn> Me either. [16:26] * bbkr reports [16:26] <jnthn> I guess since <ws> can match nothing, it can also match nothing a lot of times. ;-) [16:27] <masak> that might explain it. [16:27] <masak> so maybe it's expected behavior. [16:27] <masak> maybe even something that merits a warning from the compiler. [16:27] <masak> if someone does <ws>+ with the standard <ws> rule, they get a "you probably don't mean that" warning. [16:27] <bbkr> rakudo: grammar X { token TOP { ""+ } }; X.parse(" "); # looks like jnthn i right [16:28] <p6eval> rakudo a54677: ( no output ) [16:28] <-- jaldhar_ has left this server (Remote host closed the connection). [16:28] <masak> rakudo: " " ~~ / [ '' ]+ /; say 'alive' [16:28] <p6eval> rakudo a54677: ( no output ) [16:28] <jnthn> Ah, *any* zero-width match seems to exhibit it. [16:29] <jnthn> That probably is a bug. [16:29] <masak> no, it's not. [16:29] <masak> think about it. it's expected behavior. [16:29] <jnthn> rakudo: " " ~~ / [ x? ]+ /; say 'alive' [16:29] <masak> regexes are *expected* to do this. [16:29] <p6eval> rakudo a54677: ( no output ) [16:29] <masak> I have a plan for letting GGE detect these things, though. [16:29] <masak> it tends to bite people. [16:30] <masak> I also seem to recall that with a Thompson engine, you can't even think that kind of wrong thought. ;) [16:30] <jnthn> masak: It's inconsistent with Perl 5. [16:30] <jnthn> C:\>perl -e "' ' =~ /(x?)+/; print 42;" [16:30] <jnthn> 42 [16:30] <masak> oh? [16:30] <masak> then it probably merits pmichaud's and TimToady's attention.
Download (untitled) / with headers
text/plain 1.2k
another part of discussion [16:36] <jnthn> mathw: I'm trying to work out why it might not be so simple as "check if we advanced by any characters" [16:37] <mathw> jnthn: it'd be nice if it is that simple [16:39] <jnthn> rakudo: for 1..5 { my $x = 0; " " ~~ /(<?{ $x++; rand() Show quoted text
> 0.5 }> x?)+/; say $x; }
[16:39] <p6eval> rakudo a54677: OUTPUT«===SORRY!===␤Unsupported use of rand(); in Perl 6 please use rand at line 11, near "() > 0.5 }"␤» [16:40] <jnthn> rakudo: for 1..5 { my $x = 0; " " ~~ /(<?{ $x++; rand > 0.5 }> x?)+/; say $x; } [16:40] <p6eval> rakudo a54677: OUTPUT«2␤2␤2␤5␤4␤» [16:40] <jnthn> Well, we'd break that. ;-) [16:40] <mathw> oh boo [16:40] <mathw> I'm conflicted [16:40] <mathw> on the one hand is the side of me which says "people who rely on that kind of trick can figure out another way" [16:41] <mathw> it also says "we shouldn't make it trivial to introduce infinite loops into regexps via a common mistake" [16:41] <mathw> however the other side of me says "Perl gives you enough rope to hang yourself" [16:41] <mathw> but I can imagine being in here for years answering about why <ws>+ causes infinite loops [16:44] <bbkr> <ws>+ causing infinite loop is not DWIM-style, i think most people understan it as "\s+". beside: i cannot imagine where this "bug" can be used for purpose
Download (untitled) / with headers
text/plain 376b
On Wed Jun 09 07:49:39 2010, bbkr wrote: Show quoted text
> another part of discussion > > [16:36] <jnthn> mathw: I'm trying to work out why it might not be so > simple as "check if we advanced by any characters" > [16:37] <mathw> jnthn: it'd be nice if it is that simple
14:21 < diakopter> (that's what I did in the javascript peg for MGrammar - check if it advanced; seemed to work great)
Subject: Grammar OOM error
To: Rakudobug <rakudobug [...] perl.org>
Date: Fri, 27 Mar 2015 23:01:22 +0100
From: drforr [...] pobox.com
Download (untitled) / with headers
text/plain 730b
OS: Ubuntu 14.04 LTS on VirtualBox Host: Windows 8, dual Core i5 Rakudo version: current as of 3/25/2015 This edge case invokes the OOM killer on my test machine. It requires at least one level of nesting, and is probably analogous to the exponential backtracking in the '(a(a(a(...)*)*)*)' regular expression, although that expression will terminate. I don't think this one does :) It is a fairly extreme edge case, but if I did this by accident I'm sure someone else will. It also feels like something not resetting pos() after backtracking, but I don't claim to know the new regex's internals. The code is here: --cut here-- grammar Bug { token blank { \s* } token TOP { <blank>* } } Bug.parse(''); --cut here--
To: rakudobug [...] perl.org
Date: Tue, 20 Sep 2016 16:37:54 +0800
From: Andrew Buchanan <tomentiruran [...] gmail.com>
Subject: Simple Grammar Goes into infinite loop
Download (untitled) / with headers
text/plain 221b
This is a simple two-regex grammar. It is as if the ‘?’ modifier is set on the wildcards, and matching nothing doesn’t stop the parser.
--
Andrew Buchanan





Download Grammar-Hangs.t
text/plain 12.7k

Message body is not shown because sender requested not to inline it.

RT-Send-CC: perl6-compiler [...] perl.org
Download (untitled) / with headers
text/plain 243b
I'm not seeing the bug here, to be honest. The `Body` is asking for one or more tokens `Text`, *nothing* is a valid match for those tokens, so after matching the provided text, your grammar continues to match nothing infinite number of times.
RT-Send-CC: perl6-compiler [...] perl.org
Download (untitled) / with headers
text/plain 270b
Here's a much shorter way to reproduce it: perl6 -e '"foo" ~~ /(.*)+/' # hangs While my previous explanation for why this occurs makes sense, it's worth noting this behaviour is not observed in Perl 5, for example: perl -e '"foo" =~ /(.*)+/' # does not hang
Download (untitled) / with headers
text/plain 389b
On Tue Sep 20 06:06:59 2016, cpan@zoffix.com wrote: Show quoted text
> Here's a much shorter way to reproduce it: > > perl6 -e '"foo" ~~ /(.*)+/' # hangs > > While my previous explanation for why this occurs makes sense, it's > worth noting this behaviour is not observed in Perl 5, for example: > > perl -e '"foo" =~ /(.*)+/' # does not hang
This sounds like a dupe of #75586 -- Will "Coke" Coleda
To: perl6-bugs-followup [...] perl.org
Date: Tue, 20 Sep 2016 21:29:25 +0800
Subject: Re: [perl #129311] Simple Grammar Goes into infinite loop
From: Andrew Buchanan <tomentiruran [...] gmail.com>
Download (untitled) / with headers
text/plain 684b
Is this also technically correct, even though it clearly shouldn't match?

perl6 -e '"foo" ~~ /(.*)+\:/'  # hangs

In either case, going into an infinite loop is not exactly DWIM. 


Show quoted text
On 20 Sep 2016, at 9:12 PM, Will Coleda via RT <perl6-bugs-followup@perl.org> wrote:

On Tue Sep 20 06:06:59 2016, cpan@zoffix.com wrote:
Here's a much shorter way to reproduce it:

perl6 -e '"foo" ~~ /(.*)+/'  # hangs

While my previous explanation for why this occurs makes sense, it's
worth noting this behaviour is not observed in Perl 5, for example:

perl -e '"foo" =~ /(.*)+/'  # does not hang

This sounds like a dupe of #75586

--
Will "Coke" Coleda

Download (untitled) / with headers
text/plain 1.3k
Show quoted text
> Is this also technically correct, even though it clearly shouldn't > match? > > perl6 -e '"foo" ~~ /(.*)+\:/' # hangs
Looks right to me. You're asking for "Capture zero or more characters, as often as possible". So you get: 1) three characters 2) zero characters 3) zero characters 4) zero characters ...and so on. Show quoted text
> In either case, going into an infinite loop is not exactly DWIM.
Regexes are code, and when you write an infinite loop, it hangs - just like when you write an infinite loop in normal Perl 6 code. Perl 5 has a special fallback mechanism to force-advance the regex cursor by one character when it sees that it has not moved for two iterations of the same quantifier, and that makes some things "DWIM", but also causes its fair share of confusion. In Perl 6, where regex code and normal code can be intermixed easily and safely, Perl 5's special rule could mess up people's expectations for the execution order of embedded code blocks, and it's not even that crazy to imagine situations where you *want* an alternation to keep looping on the spot (e.g. until an embedded code block sets some condition to move one). Not to mention that Perl 6 doubled down on the whole "regexes are code, not magic strings" principle. So I'm not surprised that infinite loops behave exactly as they would in normal code, and the auto-advance mechanism from Perl 5 was not ported over.
Download (untitled) / with headers
text/plain 1.1k
For the record the problems the auto-advance feature causes even in Perl 5 (where embedded code blocks are experimental and rarely used), are twofold: 1) Writing an infinite loop can be indicative of a programmer mistake, and the auto-advance feature hides it by making the regex "do something" (which may or may not be what the programmer intended) instead of hanging (which would have caused the programmer to re-write the regex). 2) It can cause mysterious-seeming double captures. When there is a capture group in a quantified zero-width assertion, it will capture the same thing twice, because auto-advance kicks in when the cursor has not advanced after *two* iterations. Now, Perl 5 lets repeated captures of the same capture group overwrite each other, so this does not normally show up other than as a performance degradation (whereas in Perl 6 it would cause the same sub-matches to appear twice in $/). But it can show up in even Perl 5 with `while(m//g) { }` global matching. I remember a discussion on perlmonks a few years ago, where even experienced Perl hackers were tearing their hair out trying to figure out why such a while loop iterated over the same match twice.
To: perl6-bugs-followup [...] perl.org
Date: Wed, 21 Sep 2016 21:20:31 +0800
Subject: Re: [perl #129311] Simple Grammar Goes into infinite loop
From: Andrew Buchanan <tomentiruran [...] gmail.com>
Download (untitled) / with headers
text/plain 1.5k
Ok, that clarifies things. Now that I understand what is happening, it is straightforward to recognise and fix the problem. A sentence in the documentation might help other perl 5 transitioners from getting bitten, perhaps at the explanation of the * quantifier. 




Show quoted text
On 21 Sep 2016, at 6:35 PM, Sam S. via RT <perl6-bugs-followup@perl.org> wrote:

For the record the problems the auto-advance feature causes even in Perl 5 (where embedded code blocks are experimental and rarely used), are twofold:

1) Writing an infinite loop can be indicative of a programmer mistake, and the auto-advance feature hides it by making the regex "do something" (which may or may not be what the programmer intended) instead of hanging (which would have caused the programmer to re-write the regex).

2) It can cause mysterious-seeming double captures. When there is a capture group in a quantified zero-width assertion, it will capture the same thing twice, because auto-advance kicks in when the cursor has not advanced after *two* iterations. Now, Perl 5 lets repeated captures of the same capture group overwrite each other, so this does not normally show up other than as a performance degradation (whereas in Perl 6 it would cause the same sub-matches to appear twice  in $/). But it can show up in even Perl 5 with `while(m//g) { }` global matching. I remember a discussion on perlmonks a few years ago, where even experienced Perl hackers were tearing their hair out trying to figure out why such a while loop iterated over the same match twice.

This reminds me of https://rt.perl.org/Ticket/Display.html?id=132004

On 2015-03-27 15:01:38, drforr@pobox.com wrote:
Show quoted text
> OS: Ubuntu 14.04 LTS on VirtualBox
> Host: Windows 8, dual Core i5
>
> Rakudo version: current as of 3/25/2015
>
> This edge case invokes the OOM killer on my test machine. It requires at
> least one level of nesting, and is probably analogous to the exponential
> backtracking in the '(a(a(a(...)*)*)*)' regular expression, although
> that expression will terminate. I don't think this one does :) It is a
> fairly extreme edge case, but if I did this by accident I'm sure someone
> else will. It also feels like something not resetting pos() after
> backtracking, but I don't claim to know the new regex's internals.
>
> The code is here:
> --cut here--
> grammar Bug {
> token blank { \s* }
> token TOP { <blank>* }
> }
> Bug.parse('');
> --cut here--




This service is sponsored and maintained by Best Practical Solutions and runs on Perl.org infrastructure.

For issues related to this RT instance (aka "perlbug"), please contact perlbug-admin at perl.org