Report information
Id: 123198
Status: resolved
Priority: 0/
Queue: perl5

Owner: Nobody
Requestors: arocker [at] vex.net
Cc:
AdminCc:

Operating System: (no value)
PatchStatus: (no value)
Severity: low
Type: unknown
Perl Version: (no value)
Fixed In: 5.22.0



Subject: Memory leak in regex appears in 5.20.1
To: perlbug [...] perl.org
From: arocker [...] Vex.Net
Date: Thu, 13 Nov 2014 09:30:24 -0500
A memory leak from a previously-working regex has been posted in LinkedIn's Perl group. The reporter tried but failed to submit it with perlbug. This is obviously not a satisfactory report, but if a maintainer contacts me, I will try to get the necessary information supplied.
Date: Fri, 14 Nov 2014 12:02:51 +0000
To: perl5-porters [...] perl.org
From: Dave Mitchell <davem [...] iabyn.com>
Subject: Re: [perl #123198] Memory leak in regex appears in 5.20.1
On Thu, Nov 13, 2014 at 06:30:47AM -0800, arocker@vex.net wrote:
> A memory leak from a previously-working regex has been posted in
> LinkedIn's Perl group. The reporter tried but failed to submit it with
> perlbug.
>
> This is obviously not a satisfactory report, but if a maintainer contacts
> me, I will try to get the necessary information supplied.

Yes, please supply some more info

--
All wight. I will give you one more chance. This time, I want to hear no
Wubens. No Weginalds. No Wudolf the wed-nosed weindeers.
    -- Life of Brian
Date: Fri, 14 Nov 2014 09:38:38 -0500
Subject: [Fwd: Re: [perl #123198] Memory leak in regex appears in 5.20.1]
To: perlbug-followup [...] perl.org
From: arocker [...] Vex.Net
---------------------------- Original Message ----------------------------
From: "Dave Mitchell via RT" <perlbug-followup@perl.org>
Date: Fri, November 14, 2014 7:03 am
To: arocker@vex.net
--------------------------------------------------------------------------

The poster in LinkedIn's "Perl" discussion group is Nimrod Shlomo Chotzen

Initial message:

I switched this week to using perl-5.20.1, and I try to run an old (but
working) script on it. in this loop in loop in one function which iterates
around 50000 time

    foreach my $key_word (grep {/\w+/} @keywords) {
        if ($string_to_check =~
            m/^\Q$key_word\E$|^\Q$key_word\E[\W]|[\W]\Q$key_word\E[s\W]|[\W]\Q$key_word\E$|^\Q$key_word\Eies|[\W]\Q$key_word\Eies|^\Q$key_word\Ees|[\W]\Q$key_word\Ees/i){
            $found_key_words{lc($key_word)}=1;
        }
    }

the memory usage increases very fast. at the next call of that function it
continues to increase, until it just freezes my machine.

-----------------------------------------------------------------------

Second:

I simplified the regexp to this:

    $string_to_check =~ m/(?<=\W)\Q$key_word\E(?=\W|(s|es|ies\W))/i)

and now it's working great. Though I still wonder when will the next
eruption of memory occur :)

------------------------------------------------------------------------

Response to a request for sample data:

there are no two different datasets. just different perls 5.18.2 and
5-20.1(where the memory leak occurs)

the dataset is very simple: about 50000 terms: one words,two words or three
words. something like: "lantern,green lantern,green lantern
movie,movie,reviews,bad reviews,bad movie,bad movie reviews"...I can't give
the real set...

-------------------------------------------------------------------------

It looks like a failure to free memory in the handling of regex patterns.
If that area was changed between 5.18.2 & 5.20, that's probably where it is.

If this isn't sufficient, I'll try to elicit more clues.
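For readers comparing the two patterns quoted above, here is a minimal sketch that runs both against a tiny sample built from the example terms in the post. The helper names `matches_original` and `matches_simplified` and the sample string are illustrative assumptions, not part of the reporter's script. It suggests that the simplified pattern is not a drop-in equivalent: it requires a non-word character on both sides, so a keyword at the very start or end of the string is no longer found.

```perl
use strict;
use warnings;

# Hypothetical helper wrapping the original alternation from the report.
sub matches_original {
    my ($string_to_check, $key_word) = @_;
    return $string_to_check =~ m/^\Q$key_word\E$|^\Q$key_word\E[\W]|[\W]\Q$key_word\E[s\W]|[\W]\Q$key_word\E$|^\Q$key_word\Eies|[\W]\Q$key_word\Eies|^\Q$key_word\Ees|[\W]\Q$key_word\Ees/i ? 1 : 0;
}

# Hypothetical helper wrapping the simplified lookaround pattern.
sub matches_simplified {
    my ($string_to_check, $key_word) = @_;
    return $string_to_check =~ m/(?<=\W)\Q$key_word\E(?=\W|(s|es|ies\W))/i ? 1 : 0;
}

# Sample data assumed for illustration, taken from the terms quoted in the post.
my @keywords = ('lantern', 'green lantern', 'movie', 'reviews', 'bad reviews', 'bad movie');
my $string_to_check = 'bad movie reviews';

my (%found_original, %found_simplified);
for my $key_word (grep { /\w+/ } @keywords) {
    $found_original{lc $key_word}   = 1 if matches_original($string_to_check, $key_word);
    $found_simplified{lc $key_word} = 1 if matches_simplified($string_to_check, $key_word);
}

print "original:   ", join(', ', sort keys %found_original),   "\n";
print "simplified: ", join(', ', sort keys %found_simplified), "\n";
# prints:
# original:   bad movie, movie, reviews
# simplified: movie
```

Whether that difference matters depends on the real data set, which was not posted.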
Date: Mon, 17 Nov 2014 12:14:54 +0000
To: arocker [...] Vex.Net
From: Dave Mitchell <davem [...] iabyn.com>
CC: perlbug-followup [...] perl.org, public [...] khwilliamson.com
Subject: Re: [Fwd: Re: [perl #123198] Memory leak in regex appears in 5.20.1]
On Fri, Nov 14, 2014 at 09:38:38AM -0500, arocker@Vex.Net wrote:
> The poster in LinkedIn's "Perl" discussion group is Nimrod Shlomo Chotzen
>
> Initial message:
>
> I switched this week to using perl-5.20.1, and I try to run an old (but
> working) script on it. in this loop in loop in one function which iterates
> around 50000 time
>
> foreach my $key_word (grep {/\w+/} @keywords) {
> if ($string_to_check =~
> m/^\Q$key_word\E$|^\Q$key_word\E[\W]|[\W]\Q$key_word\E[s\W]|[\W]\Q$key_word\E$|^\Q$key_word\Eies|[\W]\Q$key_word\Eies|^\Q$key_word\Ees|[\W]\Q$key_word\Ees/i){
> $found_key_words{lc($key_word)}=1;
> }
>
> }
I've managed to reproduce it with the following self-contained script:

    my @keywords;
    for (1..20_000) {
        push @keywords, "lantern$_";
        push @keywords, "green lantern$_";
        push @keywords, "green lantern movie$_";
    }
    my $string_to_check = "xntsstnstnstnstnstnstnstbsgthsthsthsthsthshsrharhar";

    foreach my $key_word (grep {/\w+/} @keywords) {
        if ($string_to_check =~
            m/^\Q$key_word\E$|^\Q$key_word\E[\W]|[\W]\Q$key_word\E[s\W]|[\W]\Q$key_word\E$|^\Q$key_word\Eies|[\W]\Q$key_word\Eies|^\Q$key_word\Ees|[\W]\Q$key_word\Ees/i
        ) {
            $found_key_words{lc($key_word)}=1;
        }

    }

and bisected it using this (the perl binary is passed as a command-line
arg):

    $x = `/usr/bin/time -v $ARGV[0] /home/davem/tmp/p 2>&1`;

    $x =~ /Maximum resident set size \(kbytes\): (\d+)/
        or exit 2;
    $n = $1;
    print "n=[$n]\n";
    exit 1 if $n > 1_000_000;
    exit 0;

It bisects to the following. I haven't looked further yet. Karl, is this
something you want to look at?

commit 3c075feabc1b777553a63a5a7d87ef482f2e3d49
Author:     Karl Williamson <public@khwilliamson.com>
AuthorDate: Sun Feb 2 22:30:10 2014 -0700
Commit:     Karl Williamson <public@khwilliamson.com>
CommitDate: Sun Feb 2 22:59:58 2014 -0700

    PATCH [perl #121144]: \S, \W, etc fail for above ASCII

    There were three things wrong with these couple of lines of code that
    help generate the synthetic start class (SSC).
    1) It used PL_regkind, instead of straight OP. This meant that /u
       matches were treated the same as /d matches since they both have
       the same regkind.
    2) For what it thought was just for /d, it used the complement of
       ASCII, which matches 128-INFINITY, whereas it wanted 128-255 only.
    3) It did a union of this complement, instead of a subtract of the
       non-complement, forgetting that we are about to complement the
       result, so that if we want the end result to have something, we
       better have the input not have that something, or the complementing
       will screw it up.
-- The warp engines start playing up a bit, but seem to sort themselves out after a while without any intervention from boy genius Wesley Crusher. -- Things That Never Happen in "Star Trek" #17
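The measuring step of the bisect script can be sketched in isolation as follows. This is an illustration only: the helper names `parse_max_rss` and `bisect_exit_code` and the sample output line are assumptions made here, while the regex, the 1_000_000 kbyte threshold and the exit codes 0, 1 and 2 are taken from the script in the message above.

```perl
use strict;
use warnings;

# Hypothetical helper: pull the peak resident set size (in kbytes) out of
# the text printed by `/usr/bin/time -v`. Returns undef if the line is absent.
sub parse_max_rss {
    my ($time_output) = @_;
    return $time_output =~ /Maximum resident set size \(kbytes\): (\d+)/ ? $1 : undef;
}

# Hypothetical helper: map a measurement onto the exit codes used in the
# bisect script: 2 if nothing could be measured, 1 if the peak RSS is
# above the limit, 0 otherwise.
sub bisect_exit_code {
    my ($max_rss_kb, $limit_kb) = @_;
    return 2 unless defined $max_rss_kb;
    return $max_rss_kb > $limit_kb ? 1 : 0;
}

# Made-up sample line in the format the regex expects.
my $sample = "\tMaximum resident set size (kbytes): 1234567\n";
my $kb = parse_max_rss($sample);
printf "n=[%s] exit=%d\n", $kb, bisect_exit_code($kb, 1_000_000);
# prints: n=[1234567] exit=1
```

Keeping the parsing separate from the threshold check makes it easy to try a different limit when the leak is smaller than in this report.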
CC: perlbug-followup [...] perl.org
Date: Mon, 17 Nov 2014 11:48:08 -0700
Subject: Re: [Fwd: Re: [perl #123198] Memory leak in regex appears in 5.20.1]
To: Dave Mitchell <davem [...] iabyn.com>, arocker [...] Vex.Net
From: Karl Williamson <public [...] khwilliamson.com>
On 11/17/2014 05:14 AM, Dave Mitchell wrote:
> It bisects to the following. I haven't looked further yet. Karl, is this
> something you want to look at?

Not particularly, but I'm doing it anyway
CC: perlbug-followup [...] perl.org
Date: Mon, 17 Nov 2014 22:26:23 -0700
From: Karl Williamson <public [...] khwilliamson.com>
To: Dave Mitchell <davem [...] iabyn.com>, arocker [...] Vex.Net
Subject: Re: [Fwd: Re: [perl #123198] Memory leak in regex appears in 5.20.1]
On 11/17/2014 05:14 AM, Dave Mitchell wrote:
> On Fri, Nov 14, 2014 at 09:38:38AM -0500, arocker@Vex.Net wrote:
>> The poster in LinkedIn's "Perl" discussion group is Nimrod Shlomo Chotzen
>>
>> Initial message:
>>
>> I switched this week to using perl-5.20.1, and I try to run an old (but
>> working) script on it. in this loop in loop in one function which iterates
>> around 50000 time
>>
>> foreach my $key_word (grep {/\w+/} @keywords) {
>> if ($string_to_check =~
>> m/^\Q$key_word\E$|^\Q$key_word\E[\W]|[\W]\Q$key_word\E[s\W]|[\W]\Q$key_word\E$|^\Q$key_word\Eies|[\W]\Q$key_word\Eies|^\Q$key_word\Ees|[\W]\Q$key_word\Ees/i){
>> $found_key_words{lc($key_word)}=1;
>> }
>>
>> }
>
> I've managed to reproduce it with the following self-contained script:
>
> my @keywords;
> for (1..20_000) {
> push @keywords, "lantern$_";
> push @keywords, "green lantern$_";
> push @keywords, "green lantern movie$_";
> }
> my $string_to_check = "xntsstnstnstnstnstnstnstbsgthsthsthsthsthshsrharhar";
>
> foreach my $key_word (grep {/\w+/} @keywords) {
> if ($string_to_check =~
> m/^\Q$key_word\E$|^\Q$key_word\E[\W]|[\W]\Q$key_word\E[s\W]|[\W]\Q$key_word\E$|^\Q$key_word\Eies|[\W]\Q$key_word\Eies|^\Q$key_word\Ees|[\W]\Q$key_word\Ees/i
> ) {
> $found_key_words{lc($key_word)}=1;
> }
>
> }
>
> and bisected it using this (the perl binary is passed as a command-line
> arg):
>
> $x = `/usr/bin/time -v $ARGV[0] /home/davem/tmp/p 2>&1`;
>
> $x =~ /Maximum resident set size \(kbytes\): (\d+)/
> or exit 2;
> $n = $1;
> print "n=[$n]\n";
> exit 1 if $n > 1_000_000;
> exit 0;
>
> It bisects to the following. I haven't looked further yet. Karl, is this
> something you want to look at?
>
> commit 3c075feabc1b777553a63a5a7d87ef482f2e3d49
> Author: Karl Williamson <public@khwilliamson.com>
A fix is now smoking in

http://perl5.git.perl.org/perl.git/shortlog/refs/heads/smoke-me/khw-leak

This passes your test, but I tried getting blead to fail with the
infrastructure in t/op/svleak.t, and couldn't, even knowing the exact cause
of the problem.

(For future readers trying to reproduce this, the file "/home/davem/tmp/p"
named in the bisect script above should contain the perl program designated
as the "self-contained script".)
RT-Send-CC: perl5-porters [...] perl.org
On Mon Nov 17 21:26:52 2014, public@khwilliamson.com wrote:
> A fix is now smoking in
>
> http://perl5.git.perl.org/perl.git/shortlog/refs/heads/smoke-me/khw-leak
>
> This passes your test, but I tried getting blead to fail with the
> infrastructure in t/op/svleak.t, and couldn't, even knowing the exact
> cause of the problem.

I have just added a to-do test in commit 43275f00a.

--
Father Chrysostomos
RT-Send-CC: arocker [...] Vex.Net, perl5-porters [...] perl.org
Now fixed by commit 512e01ab009bc6309309e05891effe8ae3c0e9da in blead.

Thanks for the help in tracking this down and making a test for it.

This should be a candidate for the next 5.20 maint release.

--
Karl Williamson
RT-Send-CC: perl5-porters [...] perl.org
Reopening so that this can be moved to pending release instead of resolved.

--
Karl Williamson
RT-Send-CC: perl5-porters [...] perl.org
On Tue Nov 18 09:13:53 2014, khw wrote:
> Now fixed by commit 512e01ab009bc6309309e05891effe8ae3c0e9da
> in blead
>
> Thanks for the help in tracking this down and making a test for it.
>
> This should be a candidate for the next 5.20 maint release

It gets my wote.

--
Father Chrysostomos
Subject: Your ticket against Perl 5 has been resolved
Thanks for submitting this ticket.

The issue should be resolved with the release today of Perl v5.22,
available at http://www.perl.org/get.html

If you find that the problem persists, feel free to reopen this ticket.

--
Karl Williamson for the Perl 5 porters team

