Memory leak with regex in 5.10.0 #9504

Closed
p5pRT opened this issue Oct 1, 2008 · 10 comments

Comments

@p5pRT

p5pRT commented Oct 1, 2008

Migrated from rt.perl.org#59516 (status was 'resolved')

Searchable as RT59516$

@p5pRT

p5pRT commented Oct 1, 2008

From robin.hill@biowisdom.com

Created by robin.hill@biowisdom.com

This is a bug report for perl from robin.hill@biowisdom.com,
generated with the help of perlbug 1.36 running under perl 5.10.0.

-----------------------------------------------------------------
I've been having problems with a script consuming all memory on
the system and have tracked this down to the regex. The problem
only seems to occur when a quoted variable (\Q...\E) is combined
with single-character character classes.

The following example script steadily increases in memory usage
while running:

#########################################################
#!/usr/bin/perl -w

use strict;
use warnings;
use Time::HiRes qw(usleep);

my $text = 'Test string';

for my $str (1..10000) {
  my ($res) = $text =~ /\Q$str\E[a][b][c][d][e][f]/;
  usleep(5);
}
#########################################################

Changing the character classes to include more than one character
appears to eliminate the leak. (In the actual script I'm checking
for brackets, and following the recommendation to use single-character
classes instead of escaping the metacharacter.)

Perl Info

Flags:
    category=core
    severity=medium

This perlbug was built using Perl 5.10.0 - Tue Jul 15 14:37:49 UTC 2008
It is being executed now by  Perl 5.10.0 - Tue Jul 15 14:31:57 UTC 2008.

Site configuration information for perl 5.10.0:

Configured by abuild at Tue Jul 15 14:31:57 UTC 2008.

Summary of my perl5 (revision 5 version 10 subversion 0) configuration:
  Platform:
    osname=linux, osvers=2.6.25, archname=x86_64-linux-thread-multi
    uname='linux stravinsky 2.6.25 #1 smp 20080210 20:01:04 utc x86_64 x86_64 x86_64 gnulinux '
    config_args='-ds -e -Dprefix=/usr -Dvendorprefix=/usr -Dinstallusrbinperl -Dusethreads -Di_db -Di_dbm -Di_ndbm -Di_gdbm -Duseshrplib=true -Doptimize=-O2 -fmessage-length=0 -Wall -D_FORTIFY_SOURCE=2 -fstack-protector -g -Wall -pipe -Accflags=-DPERL_USE_SAFE_PUTENV'
    hint=recommended, useposix=true, d_sigaction=define
    useithreads=define, usemultiplicity=define
    useperlio=define, d_sfio=undef, uselargefiles=define, usesocks=undef
    use64bitint=define, use64bitall=define, uselongdouble=undef
    usemymalloc=n, bincompat5005=undef
  Compiler:
    cc='cc', ccflags ='-D_REENTRANT -D_GNU_SOURCE -DPERL_USE_SAFE_PUTENV -DDEBUGGING -fno-strict-aliasing -pipe -D_LARGEFILE_SOURCE -D_FILE_OFFSET_BITS=64',
    optimize='-O2 -fmessage-length=0 -Wall -D_FORTIFY_SOURCE=2 -fstack-protector -g -Wall -pipe',
    cppflags='-D_REENTRANT -D_GNU_SOURCE -DPERL_USE_SAFE_PUTENV -DDEBUGGING -fno-strict-aliasing -pipe'
    ccversion='', gccversion='4.3.1 20080507 (prerelease) [gcc-4_3-branch revision 135036]', gccosandvers=''
    intsize=4, longsize=8, ptrsize=8, doublesize=8, byteorder=12345678
    d_longlong=define, longlongsize=8, d_longdbl=define, longdblsize=16
    ivtype='long', ivsize=8, nvtype='double', nvsize=8, Off_t='off_t', lseeksize=8
    alignbytes=8, prototype=define
  Linker and Libraries:
    ld='cc', ldflags =' -L/usr/local/lib64'
    libpth=/lib64 /usr/lib64 /usr/local/lib64
    libs=-lm -ldl -lcrypt -lpthread
    perllibs=-lm -ldl -lcrypt -lpthread
    libc=/lib64/libc-2.8.so, so=so, useshrplib=true, libperl=libperl.so
    gnulibc_version='2.8'
  Dynamic Linking:
    dlsrc=dl_dlopen.xs, dlext=so, d_dlsymun=undef, ccdlflags='-Wl,-E -Wl,-rpath,/usr/lib/perl5/5.10.0/x86_64-linux-thread-multi/CORE'
    cccdlflags='-fPIC', lddlflags='-shared -L/usr/local/lib64'

Locally applied patches:
    


@INC for perl 5.10.0:
    /home/hillrobi/svn/perl_scripts
    /home/hillrobi/svn/perl_scripts
    /home/hillrobi/svn/perl_scripts
    /usr/lib/perl5/5.10.0/x86_64-linux-thread-multi
    /usr/lib/perl5/5.10.0
    /usr/lib/perl5/site_perl/5.10.0/x86_64-linux-thread-multi
    /usr/lib/perl5/site_perl/5.10.0
    /usr/lib/perl5/vendor_perl/5.10.0/x86_64-linux-thread-multi
    /usr/lib/perl5/vendor_perl/5.10.0
    /usr/lib/perl5/vendor_perl
    .


Environment for perl 5.10.0:
    HOME=/home/hillrobi
    LANG=en_GB.UTF-8
    LANGUAGE (unset)
    LD_LIBRARY_PATH=/opt/oracle/OraHome1/lib:/opt/oracle/OraHome1/ctx/lib:/opt/oracle/OraHome1/lib32
    LOGDIR (unset)
    PATH=/home/hillrobi/bin:/usr/local/bin:/usr/bin:/bin:/usr/bin/X11:/usr/X11R6/bin:/usr/games:/usr/lib/mit/bin:/usr/lib/mit/sbin:/opt/oracle/OraHome1/bin:/usr/local/bin:/home/hillrobi/bin
    PERL5LIB=/home/hillrobi/svn/perl_scripts:/home/hillrobi/svn/perl_scripts:/home/hillrobi/svn/perl_scripts
    PERL_BADLANG (unset)
    SHELL=/bin/bash

@p5pRT

p5pRT commented Oct 4, 2008

From @iabyn

On Wed, Oct 01, 2008 at 05:51:41AM -0700, robin.hill@biowisdom.com (via RT) wrote:

> The following example script steadily increases in memory usage
> while running:
>
> #########################################################
> #!/usr/bin/perl -w
>
> use strict;
> use warnings;
> use Time::HiRes qw(usleep);
>
> my $text = 'Test string';
>
> for my $str (1..10000) {
>   my ($res) = $text =~ /\Q$str\E[a][b][c][d][e][f]/;
>   usleep(5);
> }
> #########################################################
>
> Changing the character classes to include more than one character
> appears to eliminate the leak.

The leak appears to be somewhere in the compilation of character classes.
The following code leaks like a sieve on 5.10.0 and blead, but not 5.8.8:

  while (1) {
      qr/[a]/;
  }

--
The Enterprise successfully ferries an alien VIP from one place to another
without serious incident.
  -- Things That Never Happen in "Star Trek" #7

@p5pRT

p5pRT commented Oct 4, 2008

The RT System itself - Status changed from 'new' to 'open'

@p5pRT

p5pRT commented Oct 4, 2008

From @nwc10

On Sat, Oct 04, 2008 at 12:55:04PM +0100, Dave Mitchell wrote:

> The leak appears to be somewhere in the compilation of character classes.
> The following code leaks like a sieve on 5.10.0 and blead, but not 5.8.8:
>
>   while (1) {
>       qr/[a]/;
>   }

Are you sure that it's character classes?

./perl -le 'while (1) { qr// }'

merrily chews through memory like it's going out of fashion. However, all
memory is freed at the end of the program:

$ PERL_DESTRUCT_LEVEL=2 valgrind ./perl -le 'for (1..100000) { qr/[a]/ }'
==31194== Memcheck, a memory error detector.
==31194== Copyright (C) 2002-2006, and GNU GPL'd, by Julian Seward et al.
==31194== Using LibVEX rev 1658, a library for dynamic binary translation.
==31194== Copyright (C) 2004-2006, and GNU GPL'd, by OpenWorks LLP.
==31194== Using valgrind-3.2.1-Debian, a dynamic binary instrumentation framework.
==31194== Copyright (C) 2000-2006, and GNU GPL'd, by Julian Seward et al.
==31194== For more details, rerun with: -v
==31194==
==31194==
==31194== ERROR SUMMARY: 0 errors from 0 contexts (suppressed: 8 from 1)
==31194== malloc/free: in use at exit: 0 bytes in 0 blocks.
==31194== malloc/free: 101,653 allocs, 101,653 frees, 4,948,035 bytes allocated.
==31194== For counts of detected errors, rerun with: -v
==31194== All heap blocks were freed -- no leaks are possible.

Which makes me think that the problem is somewhere in how regexps are allocated
(which differs between 5.10.x and 5.11, though both seem to exhibit the same
problem). This seems consistent with a bug report that the reference count of
regexps is 1 too high in 5.11 (where they are now first-class SVs).

Nicholas Clark

@p5pRT

p5pRT commented Oct 4, 2008

From @iabyn

On Sat, Oct 04, 2008 at 02:51:09PM +0100, Nicholas Clark wrote:

> On Sat, Oct 04, 2008 at 12:55:04PM +0100, Dave Mitchell wrote:
>
> > The leak appears to be somewhere in the compilation of character classes.
> > The following code leaks like a sieve on 5.10.0 and blead, but not 5.8.8:
> >
> >   while (1) {
> >       qr/[a]/;
> >   }
>
> Are you sure that it's character classes?
>
> ./perl -le 'while (1) { qr// }'
>
> merrily chews through memory like it's going out of fashion. However, all
> memory is freed at the end of the program:

Hmm, maybe I reduced the original code too much, and threw out the
original bug but gained a new one.

This doesn't leak:

  my $n = 1;
  while (1) {
      $n = 1 - $n;
      "abc" =~ /$n/;
  }

This does:

  my $n = 1;
  while (1) {
      $n = 1 - $n;
      "abc" =~ /[a]${n}/;
  }

(The non-constant $n is to defeat regex compilation caching).

--
Fire extinguisher (n) a device for holding open fire doors.

@p5pRT

p5pRT commented Oct 4, 2008

From @nwc10

On Sat, Oct 04, 2008 at 03:02:32PM +0100, Dave Mitchell wrote:

> On Sat, Oct 04, 2008 at 02:51:09PM +0100, Nicholas Clark wrote:
>
> > Are you sure that it's character classes?
> >
> > ./perl -le 'while (1) { qr// }'
> >
> > merrily chews through memory like it's going out of fashion. However, all
> > memory is freed at the end of the program:
>
> Hmm, maybe I reduced the original code too much, and threw out the
> original bug but gained a new one.
>
> This doesn't leak:
>
>   my $n = 1;
>   while (1) {
>       $n = 1 - $n;
>       "abc" =~ /$n/;
>   }
>
> This does:
>
>   my $n = 1;
>   while (1) {
>       $n = 1 - $n;
>       "abc" =~ /[a]${n}/;
>   }
>
> (The non-constant $n is to defeat regex compilation caching).

Hmm, but that one still cleans up after itself​:

$ cat 59516
my $n = 1;
for (1..10000) {
  $n = 1 - $n;
  "abc" =~ /[a]${n}/;
}
$ PERL_DESTRUCT_LEVEL=1 valgrind ./perl 59516
==31497== Memcheck, a memory error detector.
==31497== Copyright (C) 2002-2006, and GNU GPL'd, by Julian Seward et al.
==31497== Using LibVEX rev 1658, a library for dynamic binary translation.
==31497== Copyright (C) 2004-2006, and GNU GPL'd, by OpenWorks LLP.
==31497== Using valgrind-3.2.1-Debian, a dynamic binary instrumentation framework.
==31497== Copyright (C) 2000-2006, and GNU GPL'd, by Julian Seward et al.
==31497== For more details, rerun with: -v
==31497==
==31497==
==31497== ERROR SUMMARY: 0 errors from 0 contexts (suppressed: 8 from 1)
==31497== malloc/free: in use at exit: 0 bytes in 0 blocks.
==31497== malloc/free: 100,770 allocs, 100,770 frees, 4,650,887 bytes allocated.
==31497== For counts of detected errors, rerun with: -v
==31497== All heap blocks were freed -- no leaks are possible.

(the while loop version still munches away though)

Nicholas Clark

@p5pRT

p5pRT commented Oct 18, 2008

From @mhx

On 2008-10-04, at 14:51:09 +0100, Nicholas Clark wrote:

> On Sat, Oct 04, 2008 at 12:55:04PM +0100, Dave Mitchell wrote:
>
> > The leak appears to be somewhere in the compilation of character classes.
> > The following code leaks like a sieve on 5.10.0 and blead, but not 5.8.8:
> >
> >   while (1) {
> >       qr/[a]/;
> >   }
>
> Are you sure that it's character classes?
>
> ./perl -le 'while (1) { qr// }'

Fixed with the following change:

Change 34506 by mhx@mhx-r2d2 on 2008/10/18 18:04:40

  Fix memory leak in qr// operator. This was most probably
  introduced with #30849.

Affected files ...

... //depot/perl/pp_hot.c#578 edit

Differences ...

==== //depot/perl/pp_hot.c#578 (text) ====

@@ -1212,6 +1212,7 @@

     if (pkg) {
         HV* const stash = gv_stashpv(SvPV_nolen(pkg), GV_ADD);
+        SvREFCNT_dec(pkg);
         (void)sv_bless(rv, stash);
     }

> merrily chews through memory like it's going out of fashion. However, all
> memory is freed at the end of the program:

--
The world is moving so fast these days that the man who says it can't be
done is generally interrupted by someone doing it.
  -- E. Hubbard

@p5pRT

p5pRT commented Oct 18, 2008

From @mhx

On 2008-10-04, at 15:02:32 +0100, Dave Mitchell wrote:

> On Sat, Oct 04, 2008 at 02:51:09PM +0100, Nicholas Clark wrote:
>
> > On Sat, Oct 04, 2008 at 12:55:04PM +0100, Dave Mitchell wrote:
> >
> > > The leak appears to be somewhere in the compilation of character classes.
> > > The following code leaks like a sieve on 5.10.0 and blead, but not 5.8.8:
> > >
> > >   while (1) {
> > >       qr/[a]/;
> > >   }
> >
> > Are you sure that it's character classes?
> >
> > ./perl -le 'while (1) { qr// }'
> >
> > merrily chews through memory like it's going out of fashion. However, all
> > memory is freed at the end of the program:
>
> Hmm, maybe I reduced the original code too much, and threw out the
> original bug but gained a new one.
>
> This doesn't leak:
>
>   my $n = 1;
>   while (1) {
>       $n = 1 - $n;
>       "abc" =~ /$n/;
>   }
>
> This does:
>
>   my $n = 1;
>   while (1) {
>       $n = 1 - $n;
>       "abc" =~ /[a]${n}/;
>   }
>
> (The non-constant $n is to defeat regex compilation caching).

Fixed with the following change:

Change 34507 by mhx@mhx-r2d2 on 2008/10/18 18:11:57

  Fix memory leak in // caused by single-char character class
  optimization. This was most probably introduced with #28262.
  This change fixes perl #59516.

Affected files ...

... //depot/perl/regcomp.c#660 edit

Differences ...

==== //depot/perl/regcomp.c#660 (text) ====

@@ -8350,6 +8350,9 @@
             *STRING(ret)= (char)value;
             STR_LEN(ret)= 1;
             RExC_emit += STR_SZ(1);
+            if (listsv) {
+                SvREFCNT_dec(listsv);
+            }
             return ret;
         }
         /* optimize case-insensitive simple patterns (e.g. /[a-z]/i) */

--
Langsam's Laws:
  (1) Everything depends.
  (2) Nothing is always.
  (3) Everything is sometimes.

@p5pRT

p5pRT commented Oct 18, 2008

From @mhx

Fixed in bleadperl by change #34507.

@p5pRT

p5pRT commented Oct 18, 2008

@mhx - Status changed from 'open' to 'resolved'
