Report information
Id: 122283
Status: resolved
Priority: 0/
Queue: perl5

Owner: Nobody
Requestors: hv <hv [at] crypt.org>
Cc:
AdminCc:

Operating System: Linux
PatchStatus: (no value)
Severity: medium
Type: core
Perl Version: 5.20.0
Fixed In: (no value)



Subject: Possible regexp memory explosion in 5.20.0
CC: hv [...] crypt.org
To: perlbug [...] perl.org
Date: Sun, 13 Jul 2014 14:20:58 +0100
From: hv [...] crypt.org

Message body is not shown because it is too large.

RT-Send-CC: perl5-porters [...] perl.org
Further to Hugo's report...

I have now had the opportunity to investigate the problem, and have concluded that this has nothing to do with Regexp::Grammars per se, except that R::G is generating the enormous regex that 5.20 is failing to compile.

The attached example (constructed by removing all the R::G syntactic sugar from Hugo's original) does not make use of Regexp::Grammars at all, and still leaks endlessly under 5.20...whereas it compiles rapidly and without complaint under all of:

    5.10.1
    5.12.5
    5.14.4
    5.16.3
    5.18.2

I hope this additional information may be of help in tracking down the regression.

Damian
Subject: Perl5.20_regex_compilation_problem.pl

Message body is not shown because it is too large.

Subject: Re: [perl #122283] Possible regexp memory explosion in 5.20.0
CC: Perl5 Porters <perl5-porters [...] perl.org>
From: Aaron Crane <arc [...] cpan.org>
Date: Mon, 14 Jul 2014 00:34:06 +0100
To: RT <perlbug-followup [...] perl.org>
Damian Conway via RT <perlbug-followup@perl.org> wrote:
> I have now had the opportunity to investigate the problem, and have concluded that this has nothing to do with Regexp::Grammars per se, except that R::G is generating the enormous regex that 5.20 is failing to compile.

Yes, it looks that way to me too. Thanks for supplying that reduction.

The file attached cuts down this regex further still, by removing all the embedded code blocks, and the various DEFINEs whose names begin "______0_88".

The symptoms I observed seem to be the same, though I also get a "panic: memory wrap" error (apparently when passing 4 GiB of allocated memory).

In blead, it looks like the immediate culprit is study_chunk(): it starts 185 ms after the start of the sizing pass, and I haven't yet had the patience to let it run long enough to finish.

--
Aaron Crane ** http://aaroncrane.co.uk/
Download large_rx.pl
text/x-perl 33.7k

Message body is not shown because sender requested not to inline it.

CC: RT <perlbug-followup [...] perl.org>, Perl5 Porters <perl5-porters [...] perl.org>, demerphq [...] gmail.com
Subject: Re: [perl #122283] Possible regexp memory explosion in 5.20.0
To: Aaron Crane <arc [...] cpan.org>
From: Dave Mitchell <davem [...] iabyn.com>
Date: Mon, 14 Jul 2014 11:13:58 +0100
On Mon, Jul 14, 2014 at 12:34:06AM +0100, Aaron Crane wrote:
> Damian Conway via RT <perlbug-followup@perl.org> wrote:
> > I have now had the opportunity to investigate the problem, and have concluded that this has nothing to do with Regexp::Grammars per se, except that R::G is generating the enormous regex that 5.20 is failing to compile.
>
> Yes, it looks that way to me too. Thanks for supplying that reduction.
>
> The file attached cuts down this regex further still, by removing all
> the embedded code blocks, and the various DEFINEs whose names begin
> "______0_88".
>
> The symptoms I observed seem to be the same, though I also get a
> "panic: memory wrap" error (apparently when passing 4 GiB of allocated
> memory).
>
> In blead, it looks like the immediate culprit is study_chunk(): it
> starts 185 ms after the start of the sizing pass, and I haven't yet
> had the patience to let it run long enough to finish.

It bisects to the following. Yves...?

commit 099ec7dcf9e085a650e6d9010c12ad9649209bf4
Author: Yves Orton <demerphq@gmail.com>
Date:   Fri Nov 22 01:08:39 2013 +0100

    Fix RT #120600: Variable length lookbehind is not variable

    Inside of study_chunk() we have to guard against infinite
    recursion with recursive subpatterns. The existing logic
    sort of worked, but didn't address all cases properly.

      qr/
        (?<W>a)
        (?<BB>
          (?=(?&W))(?<=(?&W))
        )
        (?&BB)
      /x;

    The pattern in the test would fail when the optimizer
    was expanding (&BB). When it recursed, it creates a bitmap
    for the recursion it performs, it then jumps back to
    the BB node and then eventually does the first (&W) call.
    At this point the bit for (&W) would be set in the bitmask.
    When the recursion for the (&W) exited (fake exit through
    the study frame logic) the bit was not /unset/. When the parser
    then entered the (&W) again it was treated as a nested and
    potentially infinite length pattern.

    The fake-recursion in study-chunk made it little less obvious
    what was going on in the debug output.

    By reorganizing the code and adding logic to unset the bitmap
    when exiting this bug was fixed. Unfortunately this also revealed
    another little issue with patterns like this:

      qr/x|(?0)/
      qr/(x|(?1))/

    which forced the creation of a new bitmask for each branch.
    Effectively study_chunk treats each branch as an independent
    pattern, so when we are expanding (?1) via the 'x' branch
    we dont want that to prevent us from detecting the infinite recursion
    in the (?1) branch. If you were to think of trips through study_chunk
    as paths, and [] as recursive processing you would get something like:

      BRANCH 'x' END
      BRANCH (?0) [ 'x' END ]
      BRANCH (?0) [ (?0) [ 'x' END ] ]
      ...

    When we want something like:

      BRANCH 'x' END
      BRANCH (?0) [ 'x' END ]
      BRANCH (?0) [ (?0) INFINITE_RECURSION ]

    So when we deal with a branch we need to make a new recursion bitmask.

--
"Foul and greedy Dwarf - you have eaten the last candle."
    -- "Hordes of the Things", BBC Radio.
From: demerphq <demerphq [...] gmail.com>
Date: Mon, 14 Jul 2014 15:07:09 +0200
Subject: Re: [perl #122283] Possible regexp memory explosion in 5.20.0
CC: Aaron Crane <arc [...] cpan.org>, RT <perlbug-followup [...] perl.org>, Perl5 Porters <perl5-porters [...] perl.org>
To: Dave Mitchell <davem [...] iabyn.com>
The only useful thing I have to add /right now/ is that I am glad I wrote a decent commit message. :-)


--
perl -Mre=debug -e "/just|another|perl|hacker/"
CC: RT <perlbug-followup [...] perl.org>, Perl5 Porters <perl5-porters [...] perl.org>, demerphq [...] gmail.com
To: Dave Mitchell <davem [...] iabyn.com>, Aaron Crane <arc [...] cpan.org>
From: Karl Williamson <public [...] khwilliamson.com>
Subject: Re: [perl #122283] Possible regexp memory explosion in 5.20.0
Date: Mon, 14 Jul 2014 12:15:46 -0600
On 07/14/2014 04:13 AM, Dave Mitchell wrote:
> It bisects to the following
I'm curious as to how you bisected this. When I tried running Aaron's script on my machine, it quickly ate up all the memory available. What I was planning to do to bisect it was to add a call to setrlimit() to perlmain.c to cause it to die when it used up a much smaller amount of memory, long before my machine starts thrashing. But perhaps you have a better way that would be educational for me and others to hear about.
From: Dave Mitchell <davem [...] iabyn.com>
Date: Mon, 14 Jul 2014 21:12:36 +0100
To: Karl Williamson <public [...] khwilliamson.com>
Subject: Re: [perl #122283] Possible regexp memory explosion in 5.20.0
CC: Aaron Crane <arc [...] cpan.org>, RT <perlbug-followup [...] perl.org>, Perl5 Porters <perl5-porters [...] perl.org>, demerphq [...] gmail.com
On Mon, Jul 14, 2014 at 12:15:46PM -0600, Karl Williamson wrote:
> On 07/14/2014 04:13 AM, Dave Mitchell wrote:
> >It bisects to the following
>
> I'm curious as to how you bisected this. When I tried running Aaron's
> script on my machine, it quickly ate up all the memory available. What I
> was planning to do to bisect it was to add a call to setrlimit() to
> perlmain.c to cause it to die when it used up a much smaller amount of
> memory, long before my machine starts thrashing. But perhaps you have a
> better way that would be educational for me and others to hear about.

I just started a new shell and did

    $ ulimit -v 500000

then ran the bisect. (I had to experiment for a minute or so to find a suitable limit that ran ok on 5.18.0 and died quickly on 5.20.0.)

--
My get-up-and-go just got up and went.
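The same memory cap can also be applied per command instead of per shell, which may be handy when a bisect test script has to run several builds. The following is only a sketch under assumptions: `run_limited` is a made-up helper name, it relies on a POSIX `sh` whose `ulimit` supports `-v` (as on Linux), and the `./perl -Ilib large_rx.pl` invocation in the usage comment is a guess at how a freshly built perl would be run against Aaron's attachment.

```perl
use strict;
use warnings;

# Run @cmd under a virtual-memory cap of $limit_kb kilobytes.
# Returns true only if the command exits with status 0.
# (run_limited is a hypothetical helper, not part of any perl tooling.)
sub run_limited {
    my ($limit_kb, @cmd) = @_;
    # sh lowers its own limit, then exec replaces sh with the command,
    # which inherits that limit. Inside the sh script, $0 is the limit
    # and "$@" is the command with its arguments.
    my $status = system('sh', '-c', 'ulimit -v "$0" && exec "$@"',
                        $limit_kb, @cmd);
    return $status == 0;
}

# Hypothetical use as a bisect test: exit 0 marks a build good, 1 bad.
# my $ok = run_limited(500_000, './perl', '-Ilib', 'large_rx.pl');
# exit($ok ? 0 : 1);
```

As in the message above, the limit has to be tuned by hand so that a known-good perl still runs and a known-bad one dies quickly.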
RT-Send-CC: perl5-porters [...] perl.org
A tool I found useful for this is massif from the valgrind suite, e.g.:

    valgrind --tool=massif --stacks=yes --alloc-fn=Perl_safesysmalloc --alloc-fn=Perl_safesyscalloc --alloc-fn=Perl_safesysrealloc --alloc-fn=Perl_Slab_Alloc --time-unit=B --max-snapshots=1000 perl -MMath::Prime::XS=:all -E 'say 1'

(in this case, load up a module and do basically nothing else), then:

    massif-visualizer massif.out.####   [#### depending on the file]

to see the graphical results.

This shows, for example, a memory spike from Perl__invlist_union_maybe_complement_2nd that shows up in 5.20.0 and 5.21.2 that is not in 5.19.7 when processing constant.pm's:

    my $normal_constant_name = qr/^_?[^\W_0-9]\w*\z/;

which means it hits lots of modules. Tracking down the Perl source that causes given memory behavior is not very straightforward, but I think the tool is pretty valuable for seeing how memory is being used, and what is causing the use, over time.
Subject: Re: [perl #122283] Possible regexp memory explosion in 5.20.0
CC: perl5-porters [...] perl.org
Date: Wed, 30 Jul 2014 11:35:02 -0600
To: perlbug-followup [...] perl.org, "OtherRecipients of perl Ticket #122283:;" [...] smtp.indra.com
From: Karl Williamson <public [...] khwilliamson.com>
On 07/22/2014 12:29 PM, Dana Jacobsen via RT wrote:
> This shows, for example, a memory spike from Perl__invlist_union_maybe_complement_2nd that shows up in 5.20.0 and 5.21.2 that is not in 5.19.7 when processing constant.pm's:
>
>     my $normal_constant_name = qr/^_?[^\W_0-9]\w*\z/;
The memory spike occurs when taking the union of two lists. At the beginning, it allocates enough memory for the worst case scenario, in which the lists are completely disjoint, so the memory required is the sum of the memory required by each list. At the end, the resultant list is trimmed to the actual amount used. This is done to avoid having to ask for extra memory in the middle of the operation and potentially have to do extra copies.
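The allocate-for-the-worst-case, trim-at-the-end strategy described above can be illustrated in Perl itself. This is a simplified sketch, not the core's inversion-list code: it merges two plain sorted lists of distinct integers, and `sorted_union` is a made-up name.

```perl
use strict;
use warnings;

# Union of two sorted lists of distinct integers (array refs).
# A simplification of the idea described above, not the real
# Perl__invlist_union_maybe_complement_2nd.
sub sorted_union {
    my ($x, $y) = @_;
    my @out;
    $#out = @$x + @$y - 1;    # worst case: the lists are disjoint
    my ($i, $j, $n) = (0, 0, 0);
    while ($i < @$x && $j < @$y) {
        if    ($x->[$i] < $y->[$j]) { $out[$n++] = $x->[$i++] }
        elsif ($x->[$i] > $y->[$j]) { $out[$n++] = $y->[$j++] }
        else                        { $out[$n++] = $x->[$i++]; $j++ }
    }
    # Copy whatever remains of either input.
    $out[$n++] = $x->[$i++] while $i < @$x;
    $out[$n++] = $y->[$j++] while $j < @$y;
    $#out = $n - 1;           # trim to the length actually used
    return \@out;
}

print join(',', @{ sorted_union([1, 3, 5], [3, 4]) }), "\n";
```

The output array is sized once up front, so the merge loop never has to grow it; the cost is a temporary peak of up to the combined size of both inputs, which is the spike a memory profiler would show.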
To: Dave Mitchell <davem [...] iabyn.com>
Date: Thu, 31 Jul 2014 10:10:21 +0200
From: demerphq <demerphq [...] gmail.com>
Subject: Re: [perl #122283] Possible regexp memory explosion in 5.20.0
CC: Aaron Crane <arc [...] cpan.org>, RT <perlbug-followup [...] perl.org>, Perl5 Porters <perl5-porters [...] perl.org>
I will try to find some time for this.

Yves 
Date: Tue, 26 Aug 2014 17:33:39 +0200
To: Dave Mitchell <davem [...] iabyn.com>
From: demerphq <demerphq [...] gmail.com>
Subject: Re: [perl #122283] Possible regexp memory explosion in 5.20.0
CC: Aaron Crane <arc [...] cpan.org>, RT <perlbug-followup [...] perl.org>, Perl5 Porters <perl5-porters [...] perl.org>
Some basic details on this issue:

In order to detect infinite recursion, and to expand certain constructs to make matching more efficient, we make study_chunk walk every possible pathway through the graph formed by the grammar.

So for instance if we have node C which uses D which uses E, and we can reach C from both A and B, then we will walk the full C-D-E twice.

One could probably argue this is wrong and that we should somehow store that C-D-E is "safe" when we first hit it, and then avoid walking it a second time.

This is coupled with the naive bitmask strategy for marking which nodes we have seen.

More to come later.

Yves
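The A/B/C-D-E example above can be reproduced with a toy graph. This is only an illustration of the path-walking cost, not of study_chunk() itself: the `%calls` table, `TOP`, and both sub names are invented, and the graph is assumed to be acyclic.

```perl
use strict;
use warnings;

# Hypothetical rule graph: each rule lists the rules it calls.
my %calls = (
    TOP => [qw(A B)],
    A   => ['C'],
    B   => ['C'],
    C   => ['D'],
    D   => ['E'],
    E   => [],
);

# Naive walk: follow every path, counting how often each rule is visited.
sub walk_all {
    my ($rule, $visits) = @_;
    $visits->{$rule}++;
    walk_all($_, $visits) for @{ $calls{$rule} };
    return $visits;
}

# Memoized walk: once a rule has been fully walked, mark it safe
# and skip it when it is reached again.
sub walk_once {
    my ($rule, $visits, $safe) = @_;
    return $visits if $safe->{$rule};
    $visits->{$rule}++;
    walk_once($_, $visits, $safe) for @{ $calls{$rule} };
    $safe->{$rule} = 1;
    return $visits;
}

my $naive = walk_all('TOP', {});
my $memo  = walk_once('TOP', {}, {});
printf "%s: naive=%d memoized=%d\n", $_, $naive->{$_}, $memo->{$_}
    for qw(C D E);
```

With only two callers the naive walk merely doubles the work on C, D and E, but each extra layer of shared rules multiplies the path count again, which is how a large generated grammar can blow up.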


CC: "bugs-bitbucket [...] rt.perl.org" <bugs-bitbucket [...] rt.perl.org>
To: Perl5 Porteros <perl5-porters [...] perl.org>
From: demerphq <demerphq [...] gmail.com>
Date: Thu, 25 Sep 2014 09:42:06 +0200
Subject: Re: [perl #122283] Possible regexp memory explosion in 5.20.0
On 13 July 2014 16:27, Hugo van der Sanden <perlbug-followup@perl.org> wrote:
# New Ticket Created by  Hugo van der Sanden
# Please include the string:  [perl #122283]
# in the subject line of all future correspondence about this issue.
# <URL: https://rt.perl.org/Ticket/Display.html?id=122283 >



This is a bug report for perl from hv@crypt.org,
generated with the help of perlbug 1.40 running under perl 5.20.0.


-----------------------------------------------------------------
[Please describe your issue here]

I've been experimenting with an attempt to take a SQL grammar expressed
in BNF and convert it (programmatically) into something that can parse
SQL with it as a Regexp::Grammars (v1.035) grammar.

The code below is (60%) cut down from an interim stage in that process;
this reaches about 10MB process size under perl-5.16.3; under perl-5.20.0
it grows to over 1GB. Cutting down the grammar rule by rule does gradually
reduce the memory use, but it remains a high multiple of the memory use
under perl-5.16.3, and I've not yet found any smoking gun; I've included
the full 200-odd lines here rather than risk eliding something important.

Damian and I are looking into it, but he suggested I perlbug it as a
heads-up of a possible problem in 5.20, likely of interest to davem
as potentially relating to regexp engine changes.

zen% ulimit -v # I've set a 1GB process-size limit
1000000
zen% /usr/bin/time /opt/perl-5.16.3/bin/perl ./t0 # top(1) shows peak 10MB VIRT
ok
8.52user 0.01system 0:08.54elapsed 99%CPU (0avgtext+0avgdata 34816maxresident)k
0inputs+0outputs (0major+2331minor)pagefaults 0swaps
zen% /usr/bin/time /opt/perl-5.20.0/bin/perl ./t0
Out of memory!
Command exited with non-zero status 1
41.59user 2.10system 0:43.83elapsed 99%CPU (0avgtext+0avgdata 3641344maxresident)k
0inputs+0outputs (0major+228082minor)pagefaults 0swaps
zen% cat t0
#!/opt/perl-5.20.0/bin/perl
use strict;
use warnings;
use Regexp::Grammars;

my $g = qr{
^ <query_specification> $

<rule: simple_Latin_letter> <simple_Latin_upper_case_letter> | <simple_Latin_lower_case_letter>
<token: simple_Latin_upper_case_letter> A | B | C | D | E | F | G | H | I | J | K | L | M | N | O | P | Q | R | S | T | U | V | W | X | Y | Z
<token: simple_Latin_lower_case_letter> a | b | c | d | e | f | g | h | i | j | k | l | m | n | o | p | q | r | s | t | u | v | w | x | y | z
<token: digit> 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9

You really should use character classes here, and not use regex subs for insertable literals. IOW, (?&digit) should be replaced with $digit which would be defined as:

$digit= "[0-9]"

Similar for (?&ws) and similar patterns.
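In a hand-written pattern, the difference between the two styles might look like the sketch below. The `number` rule and the variable names `$with_sub` and `$with_class` are invented for illustration and do not come from the grammar in the report.

```perl
use strict;
use warnings;

# Style 1: the digit is a named subpattern, reached via (?&digit).
my $with_sub = qr{
    \A (?&number) \z
    (?(DEFINE)
        (?<digit>  0|1|2|3|4|5|6|7|8|9 )
        (?<number> (?&digit)+ )
    )
}x;

# Style 2: the digit is an ordinary string holding a character class,
# interpolated into the pattern when it is compiled.
my $digit = "[0-9]";
my $with_class = qr{
    \A (?&number) \z
    (?(DEFINE)
        (?<number> $digit+ )
    )
}x;

for my $input ("2014", "20x4") {
    printf "%-5s sub=%s class=%s\n", $input,
        ($input =~ $with_sub   ? "match" : "no match"),
        ($input =~ $with_class ? "match" : "no match");
}
```

Both patterns accept the same strings, but the second contains one fewer subpattern call for the optimizer to follow.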

Anyway, I have pushed the following commit which should fix this. Please test.

commit a51d618a82a7057c3aabb600a7a8691d27f44a34
Author: Yves Orton <demerphq@gmail.com>
Date:   Fri Sep 19 19:57:34 2014 +0200

    rt 122283 - do not recurse into GOSUB/GOSTART when not SCF_DO_SUBSTR
    
    See also comments in patch. A complex regex "grammar" like that in
    RT 122283 causes perl to take literally forever, and exhaust all
    memory during the pattern optimization phase.
    
    Unfortunately I could not track down exacty why this occured, but
    it was very clear that the excessive recursion was unnecessary and
    excessive. By simply eliminating the unncessary recursion performance
    goes back to being acceptable.
    
    I have not thought of a good way to test this change, so this patch
    does not include any tests. Perhaps we can test it using alarm, but
    I will follow up on that later.

Ticket closers: please don't close the ticket until I have reported that I have applied tests for this.

cheers,
Yves
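One way an alarm-based test of the kind mentioned in the commit message could be structured is sketched here. It is an assumption-laden illustration, not the test that was later added to the core: `compiles_within` is a made-up name. The child process keeps the default SIGALRM action, which terminates it, because a Perl-level handler may not get to run until the long-running compilation has finished.

```perl
use strict;
use warnings;

# Returns true if $pattern compiles successfully within $seconds.
# (compiles_within is a hypothetical helper name.)
sub compiles_within {
    my ($pattern, $seconds) = @_;
    my $pid = fork();
    die "fork failed: $!" unless defined $pid;
    if ($pid == 0) {
        # Child: no $SIG{ALRM} handler is installed, so when the alarm
        # fires the default action kills this process outright.
        alarm $seconds;
        my $ok = eval { my $re = qr/$pattern/; 1 };
        exit($ok ? 0 : 2);
    }
    # Parent: $? is 0 only if the child exited normally with status 0.
    waitpid($pid, 0);
    return $? == 0;
}

print compiles_within('just|another|perl|hacker', 5) ? "ok\n" : "not ok\n";
```

Running the compilation in a child also means that an out-of-memory death does not take the test script down with it.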



--
perl -Mre=debug -e "/just|another|perl|hacker/"
Subject: Re: [perl #122283] Possible regexp memory explosion in 5.20.0
To: demerphq <demerphq [...] gmail.com>
Date: Thu, 25 Sep 2014 08:29:15 +0100
From: hv [...] crypt.org
CC: Perl5 Porteros <perl5-porters [...] perl.org>, "bugs-bitbucket [...] rt.perl.org" <bugs-bitbucket [...] rt.perl.org>
demerphq <demerphq@gmail.com> wrote:
:Anyway, I have pushed the following commit which should fix this. Please
:test.
:
:commit a51d618a82a7057c3aabb600a7a8691d27f44a34
:Author: Yves Orton <demerphq@gmail.com>
:Date:   Fri Sep 19 19:57:34 2014 +0200
:
:    rt 122283 - do not recurse into GOSUB/GOSTART when not SCF_DO_SUBSTR

Thanks Yves, I'll try this out over the weekend.

Hugo
RT-Send-CC: perl5-porters [...] perl.org
On Thu Sep 25 00:42:31 2014, demerphq wrote:
> Anyway, I have pushed the following commit which should fix this. Please
> test.
>
> commit a51d618a82a7057c3aabb600a7a8691d27f44a34
> Author: Yves Orton <demerphq@gmail.com>
> Date:   Fri Sep 19 19:57:34 2014 +0200

> Ticket closers: please don't close the ticket until I have reported that I
> have applied tests for this.

You applied tests as d9a72fccda. Does that mean this can be closed?

(a51d618 is the cause of bug #122890, but I don’t think this needs to stay open because of that.)

--
Father Chrysostomos
To: Perl RT Bug Tracker <perlbug-followup [...] perl.org>
Date: Sun, 5 Oct 2014 10:36:30 +0200
From: demerphq <demerphq [...] gmail.com>
Subject: Re: [perl #122283] Possible regexp memory explosion in 5.20.0
CC: Perl5 Porteros <perl5-porters [...] perl.org>
It is up to you. I would leave it open, but if you think it's better to close it then I am happy with that.

Yves 

--
perl -Mre=debug -e "/just|another|perl|hacker/"

