SEGV with complicated regexp and long string #7546

Closed
p5pRT opened this issue Oct 19, 2004 · 7 comments

Comments

@p5pRT

p5pRT commented Oct 19, 2004

Migrated from rt.perl.org#32041 (status was 'resolved')

Searchable as RT32041$

@p5pRT
Author

p5pRT commented Oct 19, 2004

From zefram@fysh.org

Created by zefram@fysh.org

test program t0<<<
#!/usr/bin/perl

use warnings;
use strict;

my $domain_label = qr/z(?:z*)?/;
my $domain = qr/$domain_label(?:\.$domain_label)+/;
my $quoted_string = qr/\"(?:z|\\z)*\"/;
my $unquoted_string = qr/(?:z|\\z)+/;
my $local_part = qr/$unquoted_string|$quoted_string/;
my $mailbox = qr/$local_part\@$domain/;
my $route = qr/\@$domain(?:,\@$domain)*/;
my $path = qr/<(?:$route:)?$mailbox>/;

my $str = "<z\@z.z>\n" . ("x" x 170000000);
$str =~ /\A$path/;
print "still here\n";

end of test program<<<

zsh% ./t0
zsh: segmentation fault ./t0
zsh%

Note the lengthy string required, 170 MB. I get this SEGV on a
machine with 4 GiB RAM; on a machine with a mere 384 MiB RAM I get
"Out of memory!". A string of 160 MB does not produce this fault;
the program executes successfully.

This problem comes from a mail handling program. The complicated regexp
was originally to match an email address. I've simplified it a great deal
from the original. Curiously, all the remaining complexity seems to be
required in order to trigger the fault: almost any simplification (such as
changing qr/z(?:z*)?/ to qr/zz*/) makes the program execute successfully.
This even applies to parts of the regexp that are never executed.

Resource limits:
zsh% limit
cputime unlimited
filesize unlimited
datasize unlimited
stacksize 8MB
coredumpsize 0kB
memoryuse unlimited
maxproc 7168
descriptors 1024
memorylocked unlimited
addressspace unlimited
maxfilelocks unlimited
zsh%

I tried changing stacksize to 16MB, in case it was overflowing, and
there was no change in behaviour.

Perl Info

Flags:
    category=core
    severity=medium

Site configuration information for perl v5.8.4:

Configured by Debian Project at Sun Sep 26 12:11:30 CEST 2004.

Summary of my perl5 (revision 5 version 8 subversion 4) configuration:
  Platform:
    osname=linux, osvers=2.6.8-1-686, archname=i386-linux-thread-multi
    uname='linux cachaca 2.6.8-1-686 #1 sat aug 28 14:11:39 edt 2004 i686 gnulinux '
    config_args='-Dusethreads -Duselargefiles -Dccflags=-DDEBIAN -Dcccdlflags=-fPIC -Darchname=i386-linux -Dprefix=/usr -Dprivlib=/usr/share/perl/5.8 -Darchlib=/usr/lib/perl/5.8 -Dvendorprefix=/usr -Dvendorlib=/usr/share/perl5 -Dvendorarch=/usr/lib/perl5 -Dsiteprefix=/usr/local -Dsitelib=/usr/local/share/perl/5.8.4 -Dsitearch=/usr/local/lib/perl/5.8.4 -Dman1dir=/usr/share/man/man1 -Dman3dir=/usr/share/man/man3 -Dsiteman1dir=/usr/local/man/man1 -Dsiteman3dir=/usr/local/man/man3 -Dman1ext=1 -Dman3ext=3perl -Dpager=/usr/bin/sensible-pager -Uafs -Ud_csh -Uusesfio -Uusenm -Duseshrplib -Dlibperl=libperl.so.5.8.4 -Dd_dosuid -des'
    hint=recommended, useposix=true, d_sigaction=define
    usethreads=define use5005threads=undef useithreads=define usemultiplicity=define
    useperlio=define d_sfio=undef uselargefiles=define usesocks=undef
    use64bitint=undef use64bitall=undef uselongdouble=undef
    usemymalloc=n, bincompat5005=undef
  Compiler:
    cc='cc', ccflags ='-D_REENTRANT -D_GNU_SOURCE -DTHREADS_HAVE_PIDS -DDEBIAN -fno-strict-aliasing -I/usr/local/include -D_LARGEFILE_SOURCE -D_FILE_OFFSET_BITS=64',
    optimize='-O2',
    cppflags='-D_REENTRANT -D_GNU_SOURCE -DTHREADS_HAVE_PIDS -DDEBIAN -fno-strict-aliasing -I/usr/local/include'
    ccversion='', gccversion='3.3.4 (Debian 1:3.3.4-12)', gccosandvers=''
    intsize=4, longsize=4, ptrsize=4, doublesize=8, byteorder=1234
    d_longlong=define, longlongsize=8, d_longdbl=define, longdblsize=12
    ivtype='long', ivsize=4, nvtype='double', nvsize=8, Off_t='off_t', lseeksize=8
    alignbytes=4, prototype=define
  Linker and Libraries:
    ld='cc', ldflags =' -L/usr/local/lib'
    libpth=/usr/local/lib /lib /usr/lib
    libs=-lgdbm -lgdbm_compat -ldb -ldl -lm -lpthread -lc -lcrypt
    perllibs=-ldl -lm -lpthread -lc -lcrypt
    libc=/lib/libc-2.3.2.so, so=so, useshrplib=true, libperl=libperl.so.5.8.4
    gnulibc_version='2.3.2'
  Dynamic Linking:
    dlsrc=dl_dlopen.xs, dlext=so, d_dlsymun=undef, ccdlflags='-Wl,-E'
    cccdlflags='-fPIC', lddlflags='-shared -L/usr/local/lib'

Locally applied patches:
    


@INC for perl v5.8.4:
    /etc/perl
    /usr/local/lib/perl/5.8.4
    /usr/local/share/perl/5.8.4
    /usr/lib/perl5
    /usr/share/perl5
    /usr/lib/perl/5.8
    /usr/share/perl/5.8
    /usr/local/lib/site_perl
    /usr/local/lib/perl/5.8.3
    /usr/local/share/perl/5.8.3
    /usr/local/lib/perl/5.8.2
    /usr/local/share/perl/5.8.2
    .


Environment for perl v5.8.4:
    HOME=/root
    LANG (unset)
    LANGUAGE (unset)
    LD_LIBRARY_PATH (unset)
    LOGDIR (unset)
    PATH=/usr/sbin:/sbin:/usr/local/sbin:/usr/local/armourplate/bin:/usr/bin:/bin:/usr/local/bin
    PERL_BADLANG (unset)
    SHELL=/bin/bash

@p5pRT
Author

p5pRT commented Oct 20, 2004

From @rspier

> Note the lengthy string required, 170 MB. I get this SEGV on a
> machine with 4 GiB RAM; on a machine with a mere 384 MiB RAM I get
> "Out of memory!". A string of 160 MB does not produce this fault;
> the program executes successfully.

> zsh% limit
> stacksize 8MB

This sounds like the well-known regex recursion & stack issue.

-R (mental note, make this a child of the umbrella ticket for the issue)

@p5pRT
Author

p5pRT commented Oct 20, 2004

The RT System itself - Status changed from 'new' to 'open'

@p5pRT
Author

p5pRT commented Oct 20, 2004

From zefram@fysh.org

Robert via RT wrote:

> This sounds like the well-known regex recursion & stack issue.

I've seen the stack overflow due to backtracking records, in several
circumstances. This problem with the long string doesn't look like that.
Note that the extremely long string is required even though the regexp
execution can't possibly get past the first few characters, and that the
code in unexecuted parts of the regexp affects the result. Neither of
these features resembles the stack overflow problems that I've seen.
Also, as I pointed out, it's not affected by stack size.

-zefram

@p5pRT
Author

p5pRT commented Mar 29, 2006

From @smpeters

[zefram@fysh.org - Wed Oct 20 13:36:58 2004]:

> Robert via RT wrote:
>
> > This sounds like the well-known regex recursion & stack issue.
>
> I've seen the stack overflow due to backtracking records, in several
> circumstances. This problem with the long string doesn't look like that.
> Note that the extremely long string is required even though the regexp
> execution can't possibly get past the first few characters, and that the
> code in unexecuted parts of the regexp affects the result. Neither of
> these features resembles the stack overflow problems that I've seen.
> Also, as I pointed out, it's not affected by stack size.

Based on the fact that this regexp is still core dumping, another issue
is involved here. I'd include the backtrace, but

"/home/steve/smoke/perl-current/core" is not a core dump: File format
not recognized

Hmmm...Hate!

@p5pRT
Author

p5pRT commented May 20, 2006

From @iabyn

On Wed, Mar 29, 2006 at 09:48:41AM -0800, Steve Peters via RT wrote:

> [zefram@fysh.org - Wed Oct 20 13:36:58 2004]:
>
> > Robert via RT wrote:
> >
> > > This sounds like the well-known regex recursion & stack issue.
> >
> > I've seen the stack overflow due to backtracking records, in several
> > circumstances. This problem with the long string doesn't look like that.
> > Note that the extremely long string is required even though the regexp
> > execution can't possibly get past the first few characters, and that the
> > code in unexecuted parts of the regexp affects the result. Neither of
> > these features resembles the stack overflow problems that I've seen.
> > Also, as I pointed out, it's not affected by stack size.
>
> Based on the fact that this regexp is still core dumping, another issue
> is involved here. I'd include the backtrace, but
>
> "/home/steve/smoke/perl-current/core" is not a core dump: File format
> not recognized

The super-linear cache thinggy seeds PL_reg_maxiter with the value
(length of string) * (index number of this CURLYX).

For length=170E6 and index=13, this causes wrap-round to a negative
value, which means the code assumes the cache has already been allocated,
and SEGV hilarity ensues. Fixed by the change below.
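
To make the arithmetic concrete, here is a minimal sketch in C. It is not taken from the Perl source. It assumes the counter is a 32-bit signed integer (consistent with the I32_MAX clamp in the change and with ivsize=4 in the report), and it uses the rounded length of 170E6 rather than the exact string length. The helper names `maxiter_exact` and `wrap_to_i32` are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: the seed value computed exactly, in 64 bits.
 * Mirrors the shape (length + 1) * (CURLYX index) described above. */
int64_t maxiter_exact(int64_t len, int64_t curlyx_index)
{
    assert(len >= 0 && curlyx_index >= 0);
    return (len + 1) * curlyx_index;
}

/* Hypothetical helper: the value a 32-bit two's-complement integer
 * would hold if only the low 32 bits of v were kept. */
int32_t wrap_to_i32(int64_t v)
{
    /* Conversion to an unsigned type reduces modulo 2^32. */
    uint32_t low = (uint32_t)v;

    if (low <= (uint32_t)INT32_MAX)
        return (int32_t)low;

    /* Upper half of the unsigned range maps to negative values. */
    return (int32_t)((int64_t)low - 4294967296LL);
}
```

With length 170E6 and index 13, `maxiter_exact` returns a value above `INT32_MAX`, and `wrap_to_i32` of that value is negative. With length 160E6 the product still fits in 32 bits, which would be consistent with the 160 MB string not triggering the fault.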

--
The warp engines start playing up a bit, but seem to sort themselves out
after a while without any intervention from boy genius Wesley Crusher.
  -- Things That Never Happen in "Star Trek" #17

Change 28248 by davem@davem-splatty on 2006/05/20 00:43:42

  [perl #32041] SEGV with complicated regexp and long string
  PL_reg_maxiter was wrapping to a negative value

Affected files ...

... //depot/perl/op.c#823 edit
... //depot/perl/regexec.c#423 edit

Differences ...

==== //depot/perl/op.c#823 (text) ====

==== //depot/perl/regexec.c#423 (text) ====

@@ -3652,6 +3652,9 @@
                *that* much linear. */
         if (!PL_reg_maxiter) {
             PL_reg_maxiter = (PL_regeol - PL_bostr + 1) * (scan->flags>>4);
+            /* possible overflow for long strings and many CURLYX's */
+            if (PL_reg_maxiter < 0)
+                PL_reg_maxiter = I32_MAX;
             PL_reg_leftiter = PL_reg_maxiter;
         }
         if (PL_reg_leftiter-- == 0) {
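
The idea of clamping the seed value can also be sketched in isolation. The function below, `seed_maxiter`, is a hypothetical stand-alone illustration and not the code in regexec.c. It uses a different technique from the change above: rather than testing for a negative result after multiplying, it checks the bound before multiplying, so the product never leaves the 32-bit range. It assumes a non-negative length and index.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for the seeding step: returns
 * (len + 1) * curlyx_index, saturated at INT32_MAX. */
int32_t seed_maxiter(int64_t len, int32_t curlyx_index)
{
    assert(len >= 0 && len < INT64_MAX && curlyx_index >= 0);

    /* Compare by division so the multiplication below cannot overflow. */
    if (curlyx_index != 0 && len + 1 > INT32_MAX / curlyx_index)
        return INT32_MAX;

    return (int32_t)((len + 1) * curlyx_index);
}
```

For the values discussed in this ticket, `seed_maxiter(170000000, 13)` saturates at `INT32_MAX`, while `seed_maxiter(160000000, 13)` returns the exact product.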

@p5pRT
Author

p5pRT commented Jul 24, 2007

@smpeters - Status changed from 'open' to 'resolved'
