Report information
Id: 59516
Status: resolved
Priority: 0/
Queue: perl5

Owner: Nobody
Requestors: robin.hill [at] biowisdom.com
Cc:
AdminCc:

Operating System: Linux
PatchStatus: (no value)
Severity: medium
Type: core
Perl Version: 5.10.0
Fixed In: (no value)



Subject: Memory leak with regex in 5.10.0
Date: Wed, 1 Oct 2008 13:53:44 +0100 (BST)
To: perlbug [...] perl.org
From: hillrobi [...] biowisdom.com
This is a bug report for perl from robin.hill@biowisdom.com,
generated with the help of perlbug 1.36 running under perl 5.10.0.

-----------------------------------------------------------------
I've been having problems with a script consuming all memory on the
system and have tracked this down to the regex. The problem only
seems to occur with a combination of quoted variables and singular
character classes.

The following example script steadily increases in memory usage
while running:

#########################################################
#!/usr/bin/perl -w

use strict;
use warnings;
use Time::HiRes qw(usleep);

my $text = 'Test string';

for my $str (1..10000) {
    my ($res) = $text =~ /\Q$str\E[a][b][c][d][e][f]/;
    usleep(5);
}
#########################################################

Changing the character classes to include more than one character
appears to eliminate the leak. (In the actual script I'm trying to
check for brackets and following the recommendation of using singular
character classes instead of escaping the metacharacter).

[Please do not change anything below this line]
-----------------------------------------------------------------
---
Flags:
    category=core
    severity=medium
---
This perlbug was built using Perl 5.10.0 - Tue Jul 15 14:37:49 UTC 2008
It is being executed now by Perl 5.10.0 - Tue Jul 15 14:31:57 UTC 2008.

Site configuration information for perl 5.10.0:

Configured by abuild at Tue Jul 15 14:31:57 UTC 2008.

Summary of my perl5 (revision 5 version 10 subversion 0) configuration:
  Platform:
    osname=linux, osvers=2.6.25, archname=x86_64-linux-thread-multi
    uname='linux stravinsky 2.6.25 #1 smp 20080210 20:01:04 utc x86_64 x86_64 x86_64 gnulinux '
    config_args='-ds -e -Dprefix=/usr -Dvendorprefix=/usr -Dinstallusrbinperl -Dusethreads -Di_db -Di_dbm -Di_ndbm -Di_gdbm -Duseshrplib=true -Doptimize=-O2 -fmessage-length=0 -Wall -D_FORTIFY_SOURCE=2 -fstack-protector -g -Wall -pipe -Accflags=-DPERL_USE_SAFE_PUTENV'
    hint=recommended, useposix=true, d_sigaction=define
    useithreads=define, usemultiplicity=define
    useperlio=define, d_sfio=undef, uselargefiles=define, usesocks=undef
    use64bitint=define, use64bitall=define, uselongdouble=undef
    usemymalloc=n, bincompat5005=undef
  Compiler:
    cc='cc', ccflags ='-D_REENTRANT -D_GNU_SOURCE -DPERL_USE_SAFE_PUTENV -DDEBUGGING -fno-strict-aliasing -pipe -D_LARGEFILE_SOURCE -D_FILE_OFFSET_BITS=64',
    optimize='-O2 -fmessage-length=0 -Wall -D_FORTIFY_SOURCE=2 -fstack-protector -g -Wall -pipe',
    cppflags='-D_REENTRANT -D_GNU_SOURCE -DPERL_USE_SAFE_PUTENV -DDEBUGGING -fno-strict-aliasing -pipe'
    ccversion='', gccversion='4.3.1 20080507 (prerelease) [gcc-4_3-branch revision 135036]', gccosandvers=''
    intsize=4, longsize=8, ptrsize=8, doublesize=8, byteorder=12345678
    d_longlong=define, longlongsize=8, d_longdbl=define, longdblsize=16
    ivtype='long', ivsize=8, nvtype='double', nvsize=8, Off_t='off_t', lseeksize=8
    alignbytes=8, prototype=define
  Linker and Libraries:
    ld='cc', ldflags =' -L/usr/local/lib64'
    libpth=/lib64 /usr/lib64 /usr/local/lib64
    libs=-lm -ldl -lcrypt -lpthread
    perllibs=-lm -ldl -lcrypt -lpthread
    libc=/lib64/libc-2.8.so, so=so, useshrplib=true, libperl=libperl.so
    gnulibc_version='2.8'
  Dynamic Linking:
    dlsrc=dl_dlopen.xs, dlext=so, d_dlsymun=undef, ccdlflags='-Wl,-E -Wl,-rpath,/usr/lib/perl5/5.10.0/x86_64-linux-thread-multi/CORE'
    cccdlflags='-fPIC', lddlflags='-shared -L/usr/local/lib64'

Locally applied patches:

---
@INC for perl 5.10.0:
    /home/hillrobi/svn/perl_scripts
    /home/hillrobi/svn/perl_scripts
    /home/hillrobi/svn/perl_scripts
    /usr/lib/perl5/5.10.0/x86_64-linux-thread-multi
    /usr/lib/perl5/5.10.0
    /usr/lib/perl5/site_perl/5.10.0/x86_64-linux-thread-multi
    /usr/lib/perl5/site_perl/5.10.0
    /usr/lib/perl5/vendor_perl/5.10.0/x86_64-linux-thread-multi
    /usr/lib/perl5/vendor_perl/5.10.0
    /usr/lib/perl5/vendor_perl
    .

---
Environment for perl 5.10.0:
    HOME=/home/hillrobi
    LANG=en_GB.UTF-8
    LANGUAGE (unset)
    LD_LIBRARY_PATH=/opt/oracle/OraHome1/lib:/opt/oracle/OraHome1/ctx/lib:/opt/oracle/OraHome1/lib32
    LOGDIR (unset)
    PATH=/home/hillrobi/bin:/usr/local/bin:/usr/bin:/bin:/usr/bin/X11:/usr/X11R6/bin:/usr/games:/usr/lib/mit/bin:/usr/lib/mit/sbin:/opt/oracle/OraHome1/bin:/usr/local/bin:/home/hillrobi/bin
    PERL5LIB=/home/hillrobi/svn/perl_scripts:/home/hillrobi/svn/perl_scripts:/home/hillrobi/svn/perl_scripts
    PERL_BADLANG (unset)
    SHELL=/bin/bash
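
The report mentions matching a bracket with a single-character class such as [(] rather than an escaped metacharacter. As a minimal sketch, the three spellings below should accept and reject the same strings. The value of $prefix and the sample inputs are made up for illustration, and whether the alternative spellings actually avoid the leak on an affected 5.10.0 build is an assumption that would need checking on such a build.

```perl
use strict;
use warnings;

my $prefix = 'f(x)';   # hypothetical value that must be matched literally

# Three spellings of "literal prefix followed by an opening bracket".
my %spelling = (
    class     => qr/\Q$prefix\E[(]/,    # single-character class, as in the report
    backslash => qr/\Q$prefix\E\(/,     # escaped metacharacter
    quotemeta => do { my $lit = quotemeta($prefix . '('); qr/$lit/ },
);

# Show which spellings accept each sample input.
for my $text ('f(x)(', 'f(x)[', 'fx(') {
    my @hits = grep { $text =~ $spelling{$_} } sort keys %spelling;
    printf "%-6s matched by: %s\n", $text, @hits ? "@hits" : 'none';
}
```

If the three spellings agree on real inputs as well, swapping one for another only changes how the pattern is compiled, not what it matches.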
Subject: Re: [perl #59516] Memory leak with regex in 5.10.0
Date: Sat, 4 Oct 2008 12:55:04 +0100
To: perl5-porters [...] perl.org
From: Dave Mitchell <davem [...] iabyn.com>
On Wed, Oct 01, 2008 at 05:51:41AM -0700, robin.hill@biowisdom.com (via RT) wrote:
> The following example script steadily increases in memory usage
> while running:
>
> #########################################################
> #!/usr/bin/perl -w
>
> use strict;
> use warnings;
> use Time::HiRes qw(usleep);
>
> my $text = 'Test string';
>
> for my $str (1..10000) {
>     my ($res) = $text =~ /\Q$str\E[a][b][c][d][e][f]/;
>     usleep(5);
> }
> #########################################################
>
> Changing the character classes to include more than one character
> appears to eliminate the leak.

The leak appears to be somewhere in the compilation of character classes.
The following code leaks like a sieve on 5.10.0 and bleed, but not 5.8.8:

    while (1) {
        qr/[a]/;
    }

-- 
The Enterprise successfully ferries an alien VIP from one place to another
without serious incident.
    -- Things That Never Happen in "Star Trek" #7
CC: perl5-porters [...] perl.org
Subject: Re: [perl #59516] Memory leak with regex in 5.10.0
Date: Sat, 4 Oct 2008 14:51:09 +0100
To: Dave Mitchell <davem [...] iabyn.com>
From: Nicholas Clark <nick [...] ccl4.org>
On Sat, Oct 04, 2008 at 12:55:04PM +0100, Dave Mitchell wrote:
> The leak appears to be somewhere in the compilation of character classes.
> The following code leaks like a sieve on 5.10.0 and bleed, but not 5.8.8:
>
>     while (1) {
>         qr/[a]/;
>     }

Are you sure that it's character classes?

./perl -le 'while (1) { qr// }'

merrily chews through memory like it's going out of fashion. However, all
memory is freed at the end of the program:

$ PERL_DESTRUCT_LEVEL=2 valgrind ./perl -le 'for (1..100000) { qr/[a]/ }'
==31194== Memcheck, a memory error detector.
==31194== Copyright (C) 2002-2006, and GNU GPL'd, by Julian Seward et al.
==31194== Using LibVEX rev 1658, a library for dynamic binary translation.
==31194== Copyright (C) 2004-2006, and GNU GPL'd, by OpenWorks LLP.
==31194== Using valgrind-3.2.1-Debian, a dynamic binary instrumentation framework.
==31194== Copyright (C) 2000-2006, and GNU GPL'd, by Julian Seward et al.
==31194== For more details, rerun with: -v
==31194==
==31194==
==31194== ERROR SUMMARY: 0 errors from 0 contexts (suppressed: 8 from 1)
==31194== malloc/free: in use at exit: 0 bytes in 0 blocks.
==31194== malloc/free: 101,653 allocs, 101,653 frees, 4,948,035 bytes allocated.
==31194== For counts of detected errors, rerun with: -v
==31194== All heap blocks were freed -- no leaks are possible.

Which makes me think that the problem is somewhere in how regexps are
allocated (which differs between 5.10.x and 5.11, but both seem to exhibit
the same problem. This seems to be consistent with a bug report that the
reference count of regexps is 1 too high in 5.11 (where they are now first
class SVs))

Nicholas Clark
Subject: Re: [perl #59516] Memory leak with regex in 5.10.0
Date: Sat, 4 Oct 2008 15:02:32 +0100
To: perl5-porters [...] perl.org
From: Dave Mitchell <davem [...] iabyn.com>
On Sat, Oct 04, 2008 at 02:51:09PM +0100, Nicholas Clark wrote:
> On Sat, Oct 04, 2008 at 12:55:04PM +0100, Dave Mitchell wrote:
> > The leak appears to be somewhere in the compilation of character classes.
> > The following code leaks like a sieve on 5.10.0 and bleed, but not 5.8.8:
> >
> >     while (1) {
> >         qr/[a]/;
> >     }
>
> Are you sure that it's character classes?
>
> ./perl -le 'while (1) { qr// }'
>
> merrily chews through memory like it's going out of fashion. However, all
> memory is freed at the end of the program:

Hmm, maybe I reduced the original code too much, and threw out the
original bug but gained a new one.

This doesn't leak:

    my $n = 1;
    while (1) {
        $n = 1 - $n;
        "abc" =~ /$n/;
    }

This does:

    my $n = 1;
    while (1) {
        $n = 1 - $n;
        "abc" =~ /[a]${n}/;
    }

(The non-constant $n is to defeat regex compilation caching).

-- 
Fire extinguisher (n) a device for holding open fire doors.
CC: perl5-porters [...] perl.org
Subject: Re: [perl #59516] Memory leak with regex in 5.10.0
Date: Sat, 4 Oct 2008 15:09:01 +0100
To: Dave Mitchell <davem [...] iabyn.com>
From: Nicholas Clark <nick [...] ccl4.org>
On Sat, Oct 04, 2008 at 03:02:32PM +0100, Dave Mitchell wrote:
> On Sat, Oct 04, 2008 at 02:51:09PM +0100, Nicholas Clark wrote:
> > Are you sure that it's character classes?
> >
> > ./perl -le 'while (1) { qr// }'
> >
> > merrily chews through memory like it's going out of fashion. However, all
> > memory is freed at the end of the program:
>
> Hmm, maybe I reduced the original code too much, and threw out the
> original bug but gained a new one.
>
> This doesn't leak:
>
>     my $n = 1;
>     while (1) {
>         $n = 1 - $n;
>         "abc" =~ /$n/;
>     }
>
> This does:
>
>     my $n = 1;
>     while (1) {
>         $n = 1 - $n;
>         "abc" =~ /[a]${n}/;
>     }
>
> (The non-constant $n is to defeat regex compilation caching).

Hmm, but that one still cleans up after itself:

$ cat 59516
my $n = 1;
for (1..10000) {
    $n = 1 - $n;
    "abc" =~ /[a]${n}/;
}
$ PERL_DESTRUCT_LEVEL=1 valgrind ./perl 59516
==31497== Memcheck, a memory error detector.
==31497== Copyright (C) 2002-2006, and GNU GPL'd, by Julian Seward et al.
==31497== Using LibVEX rev 1658, a library for dynamic binary translation.
==31497== Copyright (C) 2004-2006, and GNU GPL'd, by OpenWorks LLP.
==31497== Using valgrind-3.2.1-Debian, a dynamic binary instrumentation framework.
==31497== Copyright (C) 2000-2006, and GNU GPL'd, by Julian Seward et al.
==31497== For more details, rerun with: -v
==31497==
==31497==
==31497== ERROR SUMMARY: 0 errors from 0 contexts (suppressed: 8 from 1)
==31497== malloc/free: in use at exit: 0 bytes in 0 blocks.
==31497== malloc/free: 100,770 allocs, 100,770 frees, 4,650,887 bytes allocated.
==31497== For counts of detected errors, rerun with: -v
==31497== All heap blocks were freed -- no leaks are possible.

(the while loop version still munches away though)

Nicholas Clark
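
Because valgrind's summary only counts blocks still allocated at exit, a leak that is released during interpreter teardown has to be observed while the loop is running. The following is a rough sketch of one way to do that on Linux, by sampling VmRSS from /proc/self/status around batches of the leaking match. The helper names rss_kb and rss_growth are made up for this example, the batch size is arbitrary, and resident set size is a noisy measure, so only a steady upward trend across batches is meaningful.

```perl
use strict;
use warnings;

# Resident set size of this process in kB, read from /proc (Linux only).
# Returns undef where /proc/self/status is unavailable or has no VmRSS line.
sub rss_kb {
    open my $fh, '<', '/proc/self/status' or return;
    while (my $line = <$fh>) {
        return $1 if $line =~ /^VmRSS:\s+(\d+)\s+kB/;
    }
    return;
}

# Run $code $iterations times and return the RSS change in kB (undef if unknown).
sub rss_growth {
    my ($code, $iterations) = @_;
    my $before = rss_kb();
    $code->() for 1 .. $iterations;
    my $after = rss_kb();
    return unless defined $before && defined $after;
    return $after - $before;
}

# The reduced test case from the thread: $n alternates so the pattern
# is recompiled on every iteration.
my $n = 1;
for my $batch (1 .. 3) {
    my $growth = rss_growth(sub { $n = 1 - $n; "abc" =~ /[a]${n}/ }, 10_000);
    printf "batch %d: RSS grew by %s kB\n", $batch,
        defined $growth ? $growth : 'unknown';
}
```

Comparing the per-batch figures between an affected and a fixed perl is the intended use; a single run on one build says little.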
CC: Dave Mitchell <davem [...] iabyn.com>, perl5-porters [...] perl.org
Subject: Re: [perl #59516] Memory leak with regex in 5.10.0
Date: Sat, 18 Oct 2008 20:14:23 +0200
To: Nicholas Clark <nick [...] ccl4.org>
From: Marcus Holland-Moritz <mhx-perl [...] gmx.net>
On 2008-10-04, at 14:51:09 +0100, Nicholas Clark wrote:
> On Sat, Oct 04, 2008 at 12:55:04PM +0100, Dave Mitchell wrote:
> > The leak appears to be somewhere in the compilation of character classes.
> > The following code leaks like a sieve on 5.10.0 and bleed, but not 5.8.8:
> >
> >     while (1) {
> >         qr/[a]/;
> >     }
>
> Are you sure that it's character classes?
>
> ./perl -le 'while (1) { qr// }'

Fixed with the following change:

Change 34506 by mhx@mhx-r2d2 on 2008/10/18 18:04:40

    Fix memory leak in qr// operator. This was most probably
    introduced with #30849.

Affected files ...

... //depot/perl/pp_hot.c#578 edit

Differences ...

==== //depot/perl/pp_hot.c#578 (text) ====

@@ -1212,6 +1212,7 @@
     if (pkg) {
         HV* const stash = gv_stashpv(SvPV_nolen(pkg), GV_ADD);
+        SvREFCNT_dec(pkg);
         (void)sv_bless(rv, stash);
     }

> merrily chews through memory like it's going out of fashion. However, all
> memory is freed at the end of the program:

-- 
The world is moving so fast these days that the man who says it can't be
done is generally interrupted by someone doing it.
    -- E. Hubbard
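
The one-line fix above releases a reference that the code held on pkg and never gave back. As a Perl-level analogy only (it does not exercise the C code path in pp_hot.c), the sketch below uses the core B module to watch a reference count rise when a second owner appears and fall back once that owner lets go. The helper name refcount and the variable names are made up for illustration.

```perl
use strict;
use warnings;
use B ();

# Reference count of the scalar that $_[0] points to, as the interpreter
# sees it. $_[0] is used directly so that no extra copy of the reference
# is made while counting.
sub refcount {
    return B::svref_2object($_[0])->REFCNT;
}

my $name  = 'Regexp';
my $pkg   = \$name;            # first reference to $name
my $base  = refcount($pkg);

my $extra = $pkg;              # a second owner, like the unreleased pkg
my $held  = refcount($pkg);    # one higher than $base

undef $extra;                  # the Perl-level counterpart of SvREFCNT_dec
my $after = refcount($pkg);    # back to $base

print "base=$base held=$held after=$after\n";
```

In the C code nothing drops the extra count automatically, so each pass through the leaking path leaves one more scalar behind; that matches the steady growth reported in this ticket.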
CC: perl5-porters [...] perl.org
Subject: Re: [perl #59516] Memory leak with regex in 5.10.0
Date: Sat, 18 Oct 2008 20:15:27 +0200
To: Dave Mitchell <davem [...] iabyn.com>
From: Marcus Holland-Moritz <mhx-perl [...] gmx.net>
On 2008-10-04, at 15:02:32 +0100, Dave Mitchell wrote:
> On Sat, Oct 04, 2008 at 02:51:09PM +0100, Nicholas Clark wrote:
> > On Sat, Oct 04, 2008 at 12:55:04PM +0100, Dave Mitchell wrote:
> > > The leak appears to be somewhere in the compilation of character classes.
> > > The following code leaks like a sieve on 5.10.0 and bleed, but not 5.8.8:
> > >
> > >     while (1) {
> > >         qr/[a]/;
> > >     }
> >
> > Are you sure that it's character classes?
> >
> > ./perl -le 'while (1) { qr// }'
> >
> > merrily chews through memory like it's going out of fashion. However, all
> > memory is freed at the end of the program:
>
> Hmm, maybe I reduced the original code too much, and threw out the
> original bug but gained a new one.
>
> This doesn't leak:
>
>     my $n = 1;
>     while (1) {
>         $n = 1 - $n;
>         "abc" =~ /$n/;
>     }
>
> This does:
>
>     my $n = 1;
>     while (1) {
>         $n = 1 - $n;
>         "abc" =~ /[a]${n}/;
>     }
>
> (The non-constant $n is to defeat regex compilation caching).

Fixed with the following change:

Change 34507 by mhx@mhx-r2d2 on 2008/10/18 18:11:57

    Fix memory leak in // caused by single-char character class
    optimization. This was most probably introduced with #28262.
    This change fixes perl #59516.

Affected files ...

... //depot/perl/regcomp.c#660 edit

Differences ...

==== //depot/perl/regcomp.c#660 (text) ====

@@ -8350,6 +8350,9 @@
             *STRING(ret)= (char)value;
             STR_LEN(ret)= 1;
             RExC_emit += STR_SZ(1);
+            if (listsv) {
+                SvREFCNT_dec(listsv);
+            }
             return ret;
         }
         /* optimize case-insensitive simple patterns (e.g. /[a-z]/i) */

-- 
Langsam's Laws:
    (1) Everything depends.
    (2) Nothing is always.
    (3) Everything is sometimes.
RT-Send-CC: perl5-porters [...] perl.org
Fixed in bleadperl by change #34507.

