no protection against potentially infinite quantification on zero-width assertion #1806
Comments
From @mathw
Run this: perl6 -e "my $x = 'wibble'; Watch your processor usage go through the roof. It doesn't seem to
From @mathw
Turns out that the loop is actually caused by the <ws>+, not the subst; <ws>* also succumbs.
From @cognominal
To reproduce: say 'a' ~~ m/''*/ and wait until the end of time.
From @bbkr
[16:25] <bbkr> rakudo: grammar X { token TOP { <ws>+ } }; X.parse(" "); # why
From @bbkr
Another part of the discussion:
[16:36] <jnthn> mathw: I'm trying to work out why it might not be so
From @diakopter
On Wed Jun 09 07:49:39 2010, bbkr wrote:
14:21 < diakopter> (that's what I did in the javascript peg for MGrammar -
The RT System itself - Status changed from 'new' to 'open'
From @drforr
OS: Ubuntu 14.04 LTS on VirtualBox
Rakudo version: current as of 3/25/2015
This edge case invokes the OOM killer on my test machine. It requires at
The code is here:
From tomentiruran@gmail.com
This is a simple two-regex grammar. It is as if the '?' modifier is set on the wildcards, and matching nothing doesn't stop the parser.
From tomentiruran@gmail.com
use v6; # This Grammar goes into an infinite loop my $g = G.new; =finish
From @zoffixznet
I'm not seeing the bug here, to be honest. The `Body` is asking for one or more `Text` tokens, and *nothing* is a valid match for those tokens. So after matching the provided text, your grammar continues to match nothing an infinite number of times.
From @zoffixznet
Here's a much shorter way to reproduce it:
perl6 -e '"foo" ~~ /(.*)+/' # hangs
While my previous explanation for why this occurs makes sense, it's worth noting this behaviour is not observed in Perl 5, for example:
perl -e '"foo" =~ /(.*)+/' # does not hang
From @coke
On Tue Sep 20 06:06:59 2016, cpan@zoffix.com wrote:
This sounds like a dupe of #75586
From tomentiruran@gmail.com
Is this also technically correct, even though it clearly shouldn't match?
perl6 -e '"foo" ~~ /(.*)+\:/' # hangs
In either case, going into an infinite loop is not exactly DWIM.
From @smls
Looks right to me. You're asking for "Capture zero or more characters, as often as possible". So you get: 1) three characters
Regexes are code, and when you write an infinite loop, it hangs, just like when you write an infinite loop in normal Perl 6 code.
Perl 5 has a special fallback mechanism to force-advance the regex cursor by one character when it sees that it has not moved for two iterations of the same quantifier. That makes some things "DWIM", but it also causes its fair share of confusion.
In Perl 6, where regex code and normal code can be intermixed easily and safely, Perl 5's special rule could mess up people's expectations for the execution order of embedded code blocks. It's not even that crazy to imagine situations where you *want* an alternation to keep looping on the spot (e.g. until an embedded code block sets some condition to move on).
Not to mention that Perl 6 doubled down on the whole "regexes are code, not magic strings" principle. So I'm not surprised that infinite loops behave exactly as they would in normal code, and that the auto-advance mechanism from Perl 5 was not ported over.
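The two behaviours contrasted in the comment above can be sketched with a toy model. This is Python rather than Perl, so it runs without either interpreter, and it is not how Rakudo or Perl 5 actually implements quantifiers. The names `dot_star`, `one_or_more`, the `guard` flag, and the `max_iters` cap are all hypothetical. The guard simply stops once an iteration fails to move the cursor, which is a simplification of the Perl 5 rule described above.

```python
from typing import Callable, List, Optional, Tuple

# A sub-matcher takes (text, pos) and returns the new position, or None on failure.
Matcher = Callable[[str, int], Optional[int]]


def dot_star(text: str, pos: int) -> Optional[int]:
    """Toy stand-in for `.*`: consume the rest of the text; succeeds even on nothing."""
    return len(text)


def one_or_more(
    sub: Matcher, text: str, pos: int, guard: bool, max_iters: int = 1000
) -> Tuple[int, List[Tuple[int, int]], bool]:
    """Toy `+` quantifier around `sub`.

    Returns (end position, list of (start, end) spans, hit_cap).
    `max_iters` is an assumption of this sketch: it stands in for "hangs forever".
    """
    spans: List[Tuple[int, int]] = []
    for _ in range(max_iters):
        end = sub(text, pos)
        if end is None:
            return pos, spans, False  # sub-matcher failed, quantifier stops
        spans.append((pos, end))
        if guard and end == pos:
            return pos, spans, False  # zero-width iteration: stop looping
        pos = end
    return pos, spans, True  # never stopped on its own


if __name__ == "__main__":
    # Without the guard the loop only ends because of the artificial cap.
    print(one_or_more(dot_star, "foo", 0, guard=False, max_iters=5))
    # With the guard it stops after the first zero-width iteration.
    print(one_or_more(dot_star, "foo", 0, guard=True))
```

Without the guard, every iteration after the first matches nothing at the end of the string and the loop never ends by itself, which mirrors the hang reported in this ticket.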
From @smls
For the record, the problems the auto-advance feature causes even in Perl 5 (where embedded code blocks are experimental and rarely used) are twofold:
1) Writing an infinite loop can be indicative of a programmer mistake, and the auto-advance feature hides it by making the regex "do something" (which may or may not be what the programmer intended) instead of hanging (which would have caused the programmer to re-write the regex).
2) It can cause mysterious-seeming double captures. When there is a capture group in a quantified zero-width assertion, it will capture the same thing twice, because auto-advance kicks in when the cursor has not advanced after *two* iterations. Now, Perl 5 lets repeated captures of the same capture group overwrite each other, so this does not normally show up other than as a performance degradation (whereas in Perl 6 it would cause the same sub-matches to appear twice in $/). But it can show up even in Perl 5 with `while(m//g) { }` global matching. I remember a discussion on perlmonks a few years ago, where even experienced Perl hackers were tearing their hair out trying to figure out why such a while loop iterated over the same match twice.
From tomentiruran@gmail.com
OK, that clarifies things. Now that I understand what is happening, it is straightforward to recognise and fix the problem. A sentence in the documentation might keep other Perl 5 transitioners from getting bitten, perhaps at the explanation of the * quantifier.
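One plausible reading of "fix the problem" is to make the repeated token require at least one character, so that it fails instead of matching nothing. The sketch below models that in Python as a toy, not as Perl 6 grammar code. `text_star`, `text_plus`, `repeat`, the colon delimiter, and the `max_iters` cap are hypothetical names and choices made only for this illustration.

```python
from typing import Callable, Optional, Tuple

Matcher = Callable[[str, int], Optional[int]]


def text_star(text: str, pos: int) -> Optional[int]:
    """Toy token: zero or more non-colon characters. Always succeeds."""
    end = pos
    while end < len(text) and text[end] != ":":
        end += 1
    return end


def text_plus(text: str, pos: int) -> Optional[int]:
    """Toy token: one or more non-colon characters. Fails on an empty match."""
    end = text_star(text, pos)
    return end if end is not None and end > pos else None


def repeat(sub: Matcher, text: str, pos: int, max_iters: int = 100) -> Tuple[int, int]:
    """Unguarded repetition of `sub`; returns (end position, iteration count).

    `max_iters` is an assumption of this sketch standing in for an endless loop.
    """
    count = 0
    while count < max_iters:
        end = sub(text, pos)
        if end is None:
            break  # the token failed, so the repetition ends naturally
        pos = end
        count += 1
    return pos, count


if __name__ == "__main__":
    print(repeat(text_star, "foo", 0))  # runs until the artificial cap
    print(repeat(text_plus, "foo", 0))  # stops after one real match
```

With the token that can match nothing, the repetition only ends at the cap; with the token that needs at least one character, it ends as soon as the input is exhausted.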
From @AlexDaniel
This reminds me of https://rt-archive.perl.org/perl6/Ticket/Display.html?id=132004
On 2015-03-27 15:01:38, drforr@pobox.com wrote:
Migrated from rt.perl.org#75586 (status was 'open')
Searchable as RT75586$