Simple regexp causes segfault #7931

Closed
p5pRT opened this issue May 28, 2005 · 8 comments

Comments

p5pRT commented May 28, 2005

Migrated from rt.perl.org#36020 (status was 'resolved')

Searchable as RT36020$

p5pRT commented May 28, 2005

From Martin.Ward@durham.ac.uk

Created by Martin.Ward@durham.ac.uk

The following two lines cause a segfault:

$str = "{" . ("0x00, " x 25600) . "0x00}";
$str =~ /^(0|0x00+|\{(0x00,\s*)*0x00\})$/;

Perl Info

Flags:
    category=core
    severity=critical

Site configuration information for perl v5.8.5:

Configured by Mandrakesoft at Tue Apr 26 15:06:04 MDT 2005.

Summary of my perl5 (revision 5 version 8 subversion 5) configuration:
  Platform:
    osname=linux, osvers=2.6.3-25mdk-i686-up-4gb, archname=i386-linux-thread-multi
    uname='linux mercury.mandriva.com 2.6.3-25mdk-i686-up-4gb #1 fri jan 14 03:39:39 mst 2005 i686 intel(r) pentium(r) 4 cpu 3.00ghz unknown gnulinux '
    config_args='-des -Dinc_version_list=5.8.4/i386-linux-thread-multi 5.8.4 5.8.3/i386-linux-thread-multi 5.8.3 5.8.2/i386-linux-thread-multi 5.8.2 5.8.1/i386-linux-thread-multi 5.8.1 5.8.0/i386-linux-thread-multi 5.8.0 5.6.1 5.6.0 -Darchname=i386-linux -Dcc=gcc -Doptimize=-O2 -fomit-frame-pointer -pipe -march=i586 -mtune=pentiumpro  -Dprefix=/usr -Dvendorprefix=/usr -Dsiteprefix=/usr -Dman3ext=3pm -Dcf_by=Mandrakesoft -Dmyhostname=localhost -Dperladmin=root@localhost -Dd_dosuid -Ud_csh -Duseshrplib -Accflags=-DPERL_DISABLE_PMC -Dusethreads'
    hint=recommended, useposix=true, d_sigaction=define
    usethreads=define use5005threads=undef useithreads=define usemultiplicity=define
    useperlio=define d_sfio=undef uselargefiles=define usesocks=undef
    use64bitint=undef use64bitall=undef uselongdouble=undef
    usemymalloc=n, bincompat5005=undef
  Compiler:
    cc='gcc', ccflags ='-D_REENTRANT -D_GNU_SOURCE -DTHREADS_HAVE_PIDS -DPERL_DISABLE_PMC -fno-strict-aliasing -pipe -I/usr/local/include -D_LARGEFILE_SOURCE -D_FILE_OFFSET_BITS=64 -I/usr/include/gdbm',
    optimize='-O2 -fomit-frame-pointer -pipe -march=i586 -mtune=pentiumpro ',
    cppflags='-D_REENTRANT -D_GNU_SOURCE -DTHREADS_HAVE_PIDS -DPERL_DISABLE_PMC -fno-strict-aliasing -pipe -I/usr/local/include -I/usr/include/gdbm'
    ccversion='', gccversion='3.4.1 (Mandrakelinux 10.1 3.4.1-4mdk)', gccosandvers=''
    intsize=4, longsize=4, ptrsize=4, doublesize=8, byteorder=1234
    d_longlong=define, longlongsize=8, d_longdbl=define, longdblsize=12
    ivtype='long', ivsize=4, nvtype='double', nvsize=8, Off_t='off_t', lseeksize=8
    alignbytes=4, prototype=define
  Linker and Libraries:
    ld='gcc', ldflags =' -L/usr/local/lib'
    libpth=/usr/local/lib /lib /usr/lib
    libs=-lnsl -lndbm -lgdbm -ldl -lm -lcrypt -lutil -lpthread -lc
    perllibs=-lnsl -ldl -lm -lcrypt -lutil -lpthread -lc
    libc=/lib/libc-2.3.3.so, so=so, useshrplib=true, libperl=libperl.so
    gnulibc_version='2.3.3'
  Dynamic Linking:
    dlsrc=dl_dlopen.xs, dlext=so, d_dlsymun=undef, ccdlflags='-Wl,-E -Wl,-rpath,/usr/lib/perl5/5.8.5/i386-linux-thread-multi/CORE'
    cccdlflags='-fPIC', lddlflags='-shared -L/usr/local/lib'

Locally applied patches:
    MandrakeSoft patches (cf the source RPM)


@INC for perl v5.8.5:
    /usr/lib/perl5/5.8.5/i386-linux-thread-multi
    /usr/lib/perl5/5.8.5
    /usr/lib/perl5/site_perl/5.8.5/i386-linux-thread-multi
    /usr/lib/perl5/site_perl/5.8.5
    /usr/lib/perl5/site_perl
    /usr/lib/perl5/vendor_perl/5.8.5/i386-linux-thread-multi
    /usr/lib/perl5/vendor_perl/5.8.5
    /usr/lib/perl5/vendor_perl/5.8.4/i386-linux-thread-multi
    /usr/lib/perl5/vendor_perl/5.8.4
    /usr/lib/perl5/vendor_perl/5.8.3/i386-linux-thread-multi
    /usr/lib/perl5/vendor_perl/5.8.3
    /usr/lib/perl5/vendor_perl/5.8.1/i386-linux-thread-multi
    /usr/lib/perl5/vendor_perl/5.8.1
    /usr/lib/perl5/vendor_perl
    .


Environment for perl v5.8.5:
    HOME=/home/martin
    LANG=en_US
    LANGUAGE=en_US:en
    LC_ALL=POSIX
    LC_COLLATE=POSIX
    LC_CTYPE=en_US
    LC_MESSAGES=en_US
    LC_MONETARY=en_US
    LC_NUMERIC=en_US
    LC_SOURCED=1
    LC_TIME=en_US
    LD_LIBRARY_PATH (unset)
    LOGDIR (unset)
    PATH=/usr/local/gcc-3.4.0/bin:/home/TeX/bin/i386-linux:/home/martin/fermat2/bin:/home/martin/bin:/usr/local/netpbm/bin:/usr/local/sbin:/usr/local/bin:/bin:/usr/bin:/usr/sbin:/sbin:/usr/X11R6/bin
    PERL_BADLANG (unset)
    SHELL=/bin/tcsh


-- 
			Martin

Martin.Ward@durham.ac.uk http://www.cse.dmu.ac.uk/~mward/ Erdos number: 4
G.K.Chesterton web site: http://www.cse.dmu.ac.uk/~mward/gkc/

p5pRT commented May 29, 2005

From @demerphq

The following two lines cause a segfault:

$str = "{" . ("0x00, " x 25600) . "0x00}";
$str =~ /^(0|0x00+|\{(0x00,\s*)*0x00\})$/;

Here is the debug output of a reduced version of this (x 2 in the code
above). If you remove the \s* and use a space instead, the problem goes
away. From the debug output it looks like the eval stack isn't
getting popped. Every time the \s* pattern matches, it pushes a new
eval scope that never gets freed, which apparently leads to the
segfault.

D:\dev\util>perl -Mre=debug C:\TMP\segfault.pl
Freeing REx: `","'
Compiling REx `^(0|0x00+|\{(0x00,\s*)*0x00\})$'
size 35 Got 284 bytes for offset annotations.
first at 2
  1: BOL(2)
  2: OPEN1(4)
  4: BRANCH(7)
  5: EXACT <0>(32)
  7: BRANCH(13)
  8: EXACT <0x0>(10)
  10: PLUS(32)
  11: EXACT <0>(0)
  13: BRANCH(32)
  14: EXACT <{>(16)
  16: CURLYX[0] {0,32767}(28)
  18: OPEN2(20)
  20: EXACT <0x00,>(23)
  23: STAR(25)
  24: SPACE(0)
  25: CLOSE2(27)
  27: WHILEM[1/1](0)
  28: NOTHING(29)
  29: EXACT <0x00}>(32)
  32: CLOSE1(34)
  34: EOL(35)
  35: END(0)
floating `'$ at 1..2147483647 (checking floating) anchored(BOL) minlen 1
Offsets​: [35]
  1[1] 2[1] 0[0] 2[1] 3[1] 0[0] 4[1] 5[3] 0[0] 9[1] 8[1] 0[0] 10[1] 11[2] 0[0] 23[1] 0[0] 13[1] 0[0] 14[5] 0[0] 0[0] 21[1] 19[2] 22[1] 0[0] 23[0] 23[0] 24[6] 0[0] 0[0] 30[1] 0[0] 31[1] 32[0]
Guessing start of match, REx `^(0|0x00+|\{(0x00,\s*)*0x00\})$' against
`{0x00, 0x00, 0x00}'...
Found floating substr `'$ at offset 18...
Guessed​: match at offset 0
Matching REx `^(0|0x00+|\{(0x00,\s*)*0x00\})$' against `{0x00, 0x00, 0x00}'
  Setting an EVAL scope, savestack=3
  0 <> <{0x00, 0x00,> | 1: BOL
  0 <> <{0x00, 0x00,> | 2: OPEN1
  0 <> <{0x00, 0x00,> | 4: BRANCH
  Setting an EVAL scope, savestack=13
  0 <> <{0x00, 0x00,> | 5: EXACT <0>
  failed...
  0 <> <{0x00, 0x00,> | 8: EXACT <0x0>
  failed...
  0 <> <{0x00, 0x00,> | 14: EXACT <{>
  1 <{> <0x00, 0x00,> | 16: CURLYX[0] {0,32767}
  1 <{> <0x00, 0x00,> | 27: WHILEM[1/1]
  0 out of 0..32767 cc=140fc1c
  Setting an EVAL scope, savestack=23
  1 <{> <0x00, 0x00,> | 18: OPEN2
  1 <{> <0x00, 0x00,> | 20: EXACT <0x00,>
  6 <0x00,> < 0x00, > | 23: STAR
  SPACE can match 1 times out of 2147483647...
  Setting an EVAL scope, savestack=23
  7 <x00, > <0x00, 0> | 25: CLOSE2
  7 <x00, > <0x00, 0> | 27: WHILEM[1/1]
  1 out of 0..32767 cc=140fc1c
  Setting an EVAL scope, savestack=37
  7 <x00, > <0x00, 0> | 18: OPEN2
  7 <x00, > <0x00, 0> | 20: EXACT <0x00,>
  12 < 0x00,> < 0x00}> | 23: STAR
  SPACE can match 1 times out of 2147483647...
  Setting an EVAL scope, savestack=37
  13 < 0x00, > <0x00}> | 25: CLOSE2
  13 < 0x00, > <0x00}> | 27: WHILEM[1/1]
  2 out of 0..32767 cc=140fc1c
  Setting an EVAL scope, savestack=51
  13 < 0x00, > <0x00}> | 18: OPEN2
  13 < 0x00, > <0x00}> | 20: EXACT <0x00,>
  failed...
  restoring \1 to -1(0)..-1
  restoring \2 to 7(7)..13
  failed, try continuation...
  13 < 0x00, > <0x00}> | 28: NOTHING
  13 < 0x00, > <0x00}> | 29: EXACT <0x00}>
  18 < 0x00, 0x00}> <> | 32: CLOSE1
  18 < 0x00, 0x00}> <> | 34: EOL
  18 < 0x00, 0x00}> <> | 35: END
Match successful!
Freeing REx: `"^(0|0x00+|\\{(0x00,\\s*)*0x00\\})$"'

--
perl -Mre=debug -e "/just|another|perl|hacker/"
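A standalone sketch of the comparison demerphq describes (the reported pattern with \s* versus the same pattern with a literal space) could look like the script below. The helper `make_input` and the variable names are illustrative, not from the thread. Per the report, the \s* variant segfaulted on perl 5.8.5 at n = 25600; on a perl that includes the later fix, both variants would be expected to report a match. To get a trace like the one above, restrict the loop to n = 2 and run the script under `perl -Mre=debug`.

```perl
use strict;
use warnings;

# Build the string from the report: "{" followed by $n copies of "0x00, "
# and a closing "0x00}".
sub make_input {
    my ($n) = @_;
    return "{" . ("0x00, " x $n) . "0x00}";
}

# Pattern as reported: \s* allows any amount of whitespace after each comma.
my $with_ws_star = qr/^(0|0x00+|\{(0x00,\s*)*0x00\})$/;

# demerphq's variant: exactly one literal space after each comma.
my $with_space = qr/^(0|0x00+|\{(0x00, )*0x00\})$/;

# n = 2 is the reduced case traced above; n = 25600 is the size from the report.
for my $n (2, 25600) {
    my $str = make_input($n);
    printf "n=%-6d  \\s*: %-8s  literal space: %s\n",
        $n,
        ($str =~ $with_ws_star ? "match" : "no match"),
        ($str =~ $with_space   ? "match" : "no match");
}
```

Note that the two patterns are not interchangeable in general: the literal-space version rejects input such as "{0x00,  0x00}" (two spaces), which the \s* version accepts.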

p5pRT commented May 29, 2005

The RT System itself - Status changed from 'new' to 'open'

p5pRT commented May 29, 2005

From @iabyn

On Sun, May 29, 2005 at 01:28:42PM +0200, demerphq wrote:

> The following two lines cause a segfault:
>
> $str = "{" . ("0x00, " x 25600) . "0x00}";
> $str =~ /^(0|0x00+|\{(0x00,\s*)*0x00\})$/;
>
> Here is the debug output of a reduced version of this (x 2 in the code
> above). If you remove the \s* and use a space instead, the problem goes
> away. From the debug output it looks like the eval stack isn't
> getting popped. Every time the \s* pattern matches, it pushes a new
> eval scope that never gets freed, which apparently leads to the
> segfault.

I think the underlying problem is the usual one: the regex engine is
recursive, and certain patterns eventually blow the processor stack.

A stack traceback shows:

...
#13095 0x081664d5 in S_regmatch (my_perl=0x81b1008, prog=0x81ba0e8) at regexec.c:4140
#13096 0x0816392d in S_regmatch (my_perl=0x81b1008, prog=0x81ba104) at regexec.c:3644
#13097 0x081664d5 in S_regmatch (my_perl=0x81b1008, prog=0x81ba0e8) at regexec.c:4140
#13098 0x0816392d in S_regmatch (my_perl=0x81b1008, prog=0x81ba104) at regexec.c:3644
#13099 0x081664d5 in S_regmatch (my_perl=0x81b1008, prog=0x81ba0e8) at regexec.c:4140
#13100 0x0816392d in S_regmatch (my_perl=0x81b1008, prog=0x81ba104) at regexec.c:3644
#13101 0x081664d5 in S_regmatch (my_perl=0x81b1008, prog=0x81ba0e8) at regexec.c:4140
#13102 0x0816392d in S_regmatch (my_perl=0x81b1008, prog=0x81ba10c) at regexec.c:3644
#13103 0x081629ed in S_regmatch (my_perl=0x81b1008, prog=0x81ba0a4) at regexec.c:3472

and so on ad nauseam.

--
"There's something wrong with our bloody ships today, Chatfield."
  -- Admiral Beatty at the Battle of Jutland, 31st May 1916.

p5pRT commented May 29, 2005

From @demerphq

On 5/29/05, Dave Mitchell <davem@iabyn.com> wrote:

> On Sun, May 29, 2005 at 01:28:42PM +0200, demerphq wrote:
>
>> The following two lines cause a segfault:
>>
>> $str = "{" . ("0x00, " x 25600) . "0x00}";
>> $str =~ /^(0|0x00+|\{(0x00,\s*)*0x00\})$/;
>>
>> Here is the debug output of a reduced version of this (x 2 in the code
>> above). If you remove the \s* and use a space instead, the problem goes
>> away. From the debug output it looks like the eval stack isn't
>> getting popped. Every time the \s* pattern matches, it pushes a new
>> eval scope that never gets freed, which apparently leads to the
>> segfault.
>
> I think the underlying problem is the usual one: the regex engine is
> recursive, and certain patterns eventually blow the processor stack.
>
> A stack traceback shows:
>
> ...
> #13095 0x081664d5 in S_regmatch (my_perl=0x81b1008, prog=0x81ba0e8) at regexec.c:4140
> #13096 0x0816392d in S_regmatch (my_perl=0x81b1008, prog=0x81ba104) at regexec.c:3644

This pattern of 4140 to 3644 and back seems to be a good place to
start. I'll see what I can learn.

> and so on ad nauseam.

Actually I think we are saying the same thing here, just in different
ways. Every time you enter regmatch a new eval scope is pushed on the
savestack. So the above is just a different symptom of the same
problem. Note that none of this occurs if the \s* is changed to a
literal space.

--
perl -Mre=debug -e "/just|another|perl|hacker/"

p5pRT commented May 29, 2005

From @iabyn

On Sun, May 29, 2005 at 02:55:17PM +0200, demerphq wrote:

> Actually I think we are saying the same thing here, just in different
> ways. Every time you enter regmatch a new eval scope is pushed on the
> savestack. So the above is just a different symptom of the same
> problem. Note that none of this occurs if the \s* is changed to a
> literal space.

Not quite the same. Nothing is actually pushed onto the savestack;
REGCP_SET() (which produces the "Setting an EVAL scope" message) just
remembers the current savestack index.

You may be able to fix this particular problem by optimising the compiled
code to avoid the recursion (I have no idea whether that is viable).

Failing that, this bug will just have to be added to the big list of
"bugs that won't be fixed until someone rewrites the regex engine to be
iterative rather than recursive".

Dave.

--
In the 70's we wore flares because we didn't know any better.
What possible excuse does the current generation have?
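For anyone stuck on an affected perl, one way to avoid asking the regex engine to iterate over every list element is to check the list structure outside a quantified group. The sketch below is a hypothetical workaround, not something proposed in this thread, and `looks_like_byte_list` is an illustrative name: one simple match strips the braces, then `split` handles the elements. It is intended to accept the same strings as the reported pattern, but unlike the pattern it only returns true or false and does not set $1 or $2.

```perl
use strict;
use warnings;

# Hypothetical workaround: accept the same strings as
#   /^(0|0x00+|\{(0x00,\s*)*0x00\})$/
# without a quantified group that the engine iterates once per element.
sub looks_like_byte_list {
    my ($s) = @_;

    # First two alternatives of the original pattern: "0", or "0x0"
    # followed by one or more "0" characters.
    return 1 if $s =~ /^(?:0|0x00+)$/;

    # Third alternative: strip the braces with one simple match.
    return 0 unless $s =~ /^\{(.*)\}$/s;
    my $inner = $1;

    # Split on a comma plus optional whitespace. The limit of -1 keeps a
    # trailing empty field, so "{0x00, }" is rejected, as in the original.
    my @items = split /,\s*/, $inner, -1;
    return 0 unless @items;    # "{}" has no elements

    # Every element must be exactly "0x00" (no stray leading/trailing text).
    for my $item (@items) {
        return 0 unless $item eq '0x00';
    }
    return 1;
}

my $str = "{" . ("0x00, " x 25600) . "0x00}";
print looks_like_byte_list($str) ? "valid\n" : "invalid\n";
```

The design choice here is to let `split` drive the per-element loop in ordinary Perl code, so the work done inside any single regex match stays small regardless of how many elements the list has.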

p5pRT commented Mar 29, 2006

From @smpeters

[Martin.Ward@durham.ac.uk - Sat May 28 09:44:24 2005]:

> This is a bug report for perl from Martin.Ward@durham.ac.uk,
> generated with the help of perlbug 1.35 running under perl v5.8.5.
>
> -----------------------------------------------------------------
> [Please enter your report here]
>
> The following two lines cause a segfault:
>
> $str = "{" . ("0x00, " x 25600) . "0x00}";
> $str =~ /^(0|0x00+|\{(0x00,\s*)*0x00\})$/;

This has been fixed with change #27598.

steve@kirk:~/smoke/perl-current$ perl -wle'$str = "{" . ("0x00, " x 25600) . "0x00}";$str =~ /^(0|0x00+|\{(0x00,\s*)*0x00\})$/;'
Segmentation fault
steve@kirk:~/smoke/perl-current$ ./perl -wle'$str = "{" . ("0x00, " x 25600) . "0x00}";$str =~ /^(0|0x00+|\{(0x00,\s*)*0x00\})$/;'

p5pRT commented Mar 29, 2006

@smpeters - Status changed from 'open' to 'resolved'
