Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

bug in regcomp code leading to panic #9201

Closed
p5pRT opened this issue Jan 22, 2008 · 16 comments
Closed

bug in regcomp code leading to panic #9201

p5pRT opened this issue Jan 22, 2008 · 16 comments

Comments

@p5pRT
Copy link

p5pRT commented Jan 22, 2008

Migrated from rt.perl.org#50114 (status was 'resolved')

Searchable as RT50114$

@p5pRT
Copy link
Author

p5pRT commented Jan 22, 2008

From mls@suse.de

Created by mls@suse.de

On ppc/s390x the following perl code panics​:

$ perl -e '("0"x51)=~/0{50}/'
panic​: unknown regstclass 0 at -e line 1.

The compiled re looks like this​:
  1​: CURLY {50,50} (5)
  3​: EXACT <0> (0)
  5​: END (0)

This is caused by a bug in the "kip introductions and multiplicators >= 1"
code (lines 4428-4450 in regcomp.c). The body of the loop looks like​:

  if (OP(first) == PLUS)
  sawplus = 1;
  else
  first += regarglen[OP(first)];
  if (OP(first) == IFMATCH) {
  first = NEXTOPER(first);
  first += EXTRA_STEP_2ARGS;
  } else /* XXX possible optimisation for /(?=)/ */
  first = NEXTOPER(first);

As OP(first) == CURLY in our case, first is incremented by
regarglen[OP(first)], which happens to be 1.
The result of the next "if (OP(first) == IFMATCH)" is undefined, as
first points into the middle of "struct regnode_2".
On ppc, OP(first) is the first argument of the CURLY operator, thus
50. 50 is IFMATCH...

I don't know the correct fix for this, because I do not understand
what the IFMATCH test is for. Maybe, the code should read
something like​:

  if (OP(first) == PLUS) {
  sawplus = 1;
  first = NEXTOPER(first);
  } else if (OP(first) == IFMATCH) {
  first += regarglen[OP(first)];
  first = NEXTOPER(first);
  first += EXTRA_STEP_2ARGS;
  } else {
  first += regarglen[OP(first)];
  first = NEXTOPER(first);
  }

Probably not, as the EXTRA_STEP_2ARGS is strange. (IFMATCH is of type
regnode_1).

Perl Info

Flags:
    category=core
    severity=high

This perlbug was built using Perl 5.10.0 - Sat Jan 19 16:41:23 UTC 2008
It is being executed now by  Perl 5.10.0 - Sat Jan 19 16:36:48 UTC 2008.

Site configuration information for perl 5.10.0:

Configured by abuild at Sat Jan 19 16:36:48 UTC 2008.

Summary of my perl5 (revision 5 version 10 subversion 0) configuration:
  Platform:
    osname=linux, osvers=2.6.23, archname=x86_64-linux-thread-multi
    uname='linux e75 2.6.23 #1 smp 20071119 15:02:58 utc x86_64 x86_64 x86_64 gnulinux '
    config_args='-ds -e -Dprefix=/usr -Dvendorprefix=/usr -Dinstallusrbinperl -Dusethreads -Di_db -Di_dbm -Di_ndbm -Di_gdbm -Duseshrplib=true -Doptimize=-O2 -fmessage-length=0 -Wall -D_FORTIFY_SOURCE=2 -fstack-protector -g -Wall -pipe'
    hint=recommended, useposix=true, d_sigaction=define
    useithreads=define, usemultiplicity=define
    useperlio=define, d_sfio=undef, uselargefiles=define, usesocks=undef
    use64bitint=define, use64bitall=define, uselongdouble=undef
    usemymalloc=n, bincompat5005=undef
  Compiler:
    cc='cc', ccflags ='-D_REENTRANT -D_GNU_SOURCE -DDEBUGGING -fno-strict-aliasing -pipe -D_LARGEFILE_SOURCE -D_FILE_OFFSET_BITS=64',
    optimize='-O2 -fmessage-length=0 -Wall -D_FORTIFY_SOURCE=2 -fstack-protector -g -Wall -pipe',
    cppflags='-D_REENTRANT -D_GNU_SOURCE -DDEBUGGING -fno-strict-aliasing -pipe'
    ccversion='', gccversion='4.3.0 20080117 (experimental) [trunk revision 131592]', gccosandvers=''
    intsize=4, longsize=8, ptrsize=8, doublesize=8, byteorder=12345678
    d_longlong=define, longlongsize=8, d_longdbl=define, longdblsize=16
    ivtype='long', ivsize=8, nvtype='double', nvsize=8, Off_t='off_t', lseeksize=8
    alignbytes=8, prototype=define
  Linker and Libraries:
    ld='cc', ldflags =' -L/usr/local/lib64'
    libpth=/lib64 /usr/lib64 /usr/local/lib64
    libs=-lm -ldl -lcrypt -lpthread
    perllibs=-lm -ldl -lcrypt -lpthread
    libc=/lib64/libc-2.7.so, so=so, useshrplib=true, libperl=libperl.so
    gnulibc_version='2.7'
  Dynamic Linking:
    dlsrc=dl_dlopen.xs, dlext=so, d_dlsymun=undef, ccdlflags='-Wl,-E -Wl,-rpath,/usr/lib/perl5/5.10.0/x86_64-linux-thread-multi/CORE'
    cccdlflags='-fPIC', lddlflags='-shared -L/usr/local/lib64'

Locally applied patches:
    


@INC for perl 5.10.0:
    /usr/lib/perl5/5.10.0/x86_64-linux-thread-multi
    /usr/lib/perl5/5.10.0
    /usr/lib/perl5/site_perl/5.10.0/x86_64-linux-thread-multi
    /usr/lib/perl5/site_perl/5.10.0
    /usr/lib/perl5/vendor_perl/5.10.0/x86_64-linux-thread-multi
    /usr/lib/perl5/vendor_perl/5.10.0
    /usr/lib/perl5/vendor_perl
    .


Environment for perl 5.10.0:
    HOME=/suse/mls
    LANG (unset)
    LANGUAGE (unset)
    LC_COLLATE=POSIX
    LC_CTYPE=de_DE@euro
    LD_LIBRARY_PATH (unset)
    LOGDIR (unset)
    PATH=/suse/mls/bin:/opt/kde3/bin:/usr/local/bin:/usr/bin:/usr/sbin:/sbin:/usr/X11R6/bin:/bin:/usr/lib/java/bin:/usr/games/bin:/usr/games:/opt/gnome/bin:/opt/kde/bin:/usr/openwin/bin:/opt/pilotsdk/bin:/suse/mls/korn
    PERL_BADLANG (unset)
    SHELL=/bin/tcsh

@p5pRT
Copy link
Author

p5pRT commented Feb 16, 2008

From Michael.Schroeder@informatik.uni-erlangen.de

On Tue, Jan 22, 2008 at 10​:27​:50AM -0800, mls @​ suse. de wrote​:

On ppc/s390x the following perl code panics​:

$ perl -e '("0"x51)=~/0{50}/'
panic​: unknown regstclass 0 at -e line 1.

No patch for this one? Yves? Tels?

Cheers,
  Michael.

--
Michael Schroeder mlschroe@​informatik.uni-erlangen.de
main(_){while(_=getchar())putchar(_-1/(~(_|32)/13*2-11)*13);}

@p5pRT
Copy link
Author

p5pRT commented Feb 16, 2008

The RT System itself - Status changed from 'new' to 'open'

@p5pRT
Copy link
Author

p5pRT commented Feb 16, 2008

From nospam-abuse@bloodgate.com

Moin,

On Friday 15 February 2008 16​:39​:04 Michael Schroeder wrote​:

On Tue, Jan 22, 2008 at 10​:27​:50AM -0800, mls @​ suse. de wrote​:

On ppc/s390x the following perl code panics​:

$ perl -e '("0"x51)=~/0{50}/'
panic​: unknown regstclass 0 at -e line 1.

No patch for this one? Yves? Tels?

Not sure why you ask me, regexp is not my area :)

On my 64bit Ubuntu with 5.8.8, it "works"​:

  # perl -e '("0"x51)=~/0{50}/'

All the best,

Tels

--
Signed on Sat Feb 16 16​:17​:57 2008 with key 0x93B84C15.
Get one of my photo posters​: http​://bloodgate.com/posters
PGP key on http​://bloodgate.com/tels.asc or per email.

"Obviously, Perl 5 has been aging, but there is no reason to bury it
just yet. And we haven't. But Perl 5 will eventually succumb to the
bitrot that overtakes all large projects, and we have to be ready for
the transition to Perl 6 when it is time."

  -- Larry Wall in 'The State of the Onion' 2002

@p5pRT
Copy link
Author

p5pRT commented Feb 16, 2008

From @demerphq

On 15/02/2008, Michael Schroeder
<Michael.Schroeder@​informatik.uni-erlangen.de> wrote​:

On Tue, Jan 22, 2008 at 10​:27​:50AM -0800, mls @​ suse. de wrote​:

On ppc/s390x the following perl code panics​:

$ perl -e '("0"x51)=~/0{50}/'
panic​: unknown regstclass 0 at -e line 1.

No patch for this one? Yves? Tels?

Sorry ive been very busy with other things.

What happens if you just remove

  if (OP(first) == IFMATCH) {
  first = NEXTOPER(first);
  first += EXTRA_STEP_2ARGS;
  } else /* XXX possible optimisation for /(?=)/ */

so it looks like​:

  if (OP(first) == PLUS)
  sawplus = 1;
  else
  first += regarglen[OP(first)];
  first = NEXTOPER(first);

(TBH looking at the code Im wondering what i was thinking when i added
this, it doesnt seem to be needed at all.)

With this extra block removed does it fix the problem (i guess so),
and what does the output of re=debug for

/(?=foo)bar/

look like?

To explain a bit, regnode_1 and regnode_2 are actually the same size,
its just that one has a single 32 bit argument, and the other has two
16 bit arguments.

I think i was probably intending something like this​:

  if (OP(first) == PLUS) {
  first = NEXTOPER(first);
  sawplus = 1;
  }else if (OP(first) == IFMATCH) {
  /* XXX possible optimisation for /(?=)/ */
  first = NEXTOPER(first);
  first += EXTRA_STEP_2ARGS;
  } else {
  first += regarglen[OP(first)];
  first = NEXTOPER(first);
  }

but digging a bit it turns out that regarglen[OP(first)] would be the
same as EXTRA_STEP_2ARGS so this can be reduced to​:

  if (OP(first) == PLUS) {
  first = NEXTOPER(first);
  sawplus = 1;
  } else {
  first += regarglen[OP(first)];
  first = NEXTOPER(first);
  }

which is the same as

  if (OP(first) == PLUS) {
  sawplus = 1;
  } else {
  first += regarglen[OP(first)];
  }
  first = NEXTOPER(first);

The point of this code is step over/into the quantifier opecodes so
that we can find the first thing that must appear in the pattern. So
in the case of

/0{50}/

we have to step over the CURLY to find the EXACT. In the case of
/(?=foo)/ we have to step over the IFMATCH to find the EXACT, but
IFMATCH and CURLY are the same size.

Please let me know how this goes. I dont have time to dig further right now.

cheers,
Yves

--
perl -Mre=debug -e "/just|another|perl|hacker/"

@p5pRT
Copy link
Author

p5pRT commented Feb 17, 2008

From @demerphq

On 22/01/2008, via RT mls @​ suse. de <perlbug-followup@​perl.org> wrote​:

# New Ticket Created by mls@​suse.de
# Please include the string​: [perl #50114]
# in the subject line of all future correspondence about this issue.
# <URL​: http​://rt.perl.org/rt3/Ticket/Display.html?id=50114 >

This is a bug report for perl from mls@​suse.de,
generated with the help of perlbug 1.36 running under perl 5.10.0.

-----------------------------------------------------------------
[Please enter your report here]

On ppc/s390x the following perl code panics​:

$ perl -e '("0"x51)=~/0{50}/'
panic​: unknown regstclass 0 at -e line 1.

The compiled re looks like this​:
1​: CURLY {50,50} (5)
3​: EXACT <0> (0)
5​: END (0)

This is caused by a bug in the "kip introductions and multiplicators >= 1"
code (lines 4428-4450 in regcomp.c). The body of the loop looks like​:

if \(OP\(first\) == PLUS\)
    sawplus = 1;
else
    first \+= regarglen\[OP\(first\)\];
if \(OP\(first\) == IFMATCH\) \{
    first = NEXTOPER\(first\);
    first \+= EXTRA\_STEP\_2ARGS;
\} else  /\* XXX possible optimisation for /\(?=\)/  \*/
    first = NEXTOPER\(first\);

As OP(first) == CURLY in our case, first is incremented by
regarglen[OP(first)], which happens to be 1.
The result of the next "if (OP(first) == IFMATCH)" is undefined, as
first points into the middle of "struct regnode_2".
On ppc, OP(first) is the first argument of the CURLY operator, thus
50. 50 is IFMATCH...

I don't know the correct fix for this, because I do not understand
what the IFMATCH test is for. Maybe, the code should read
something like​:

if \(OP\(first\) == PLUS\) \{
    sawplus = 1;
    first = NEXTOPER\(first\);
\} else if \(OP\(first\) == IFMATCH\) \{
    first \+= regarglen\[OP\(first\)\];
    first = NEXTOPER\(first\);
    first \+= EXTRA\_STEP\_2ARGS;
\} else \{
    first \+= regarglen\[OP\(first\)\];
    first = NEXTOPER\(first\);
\}

Probably not, as the EXTRA_STEP_2ARGS is strange. (IFMATCH is of type
regnode_1).

Hopefully change #33324 resolves this problem. Please confirm and we
can close this ticket.

http​://public.activestate.com/cgi-bin/perlbrowse/p/33324

Thanks for the report.

yves

--
perl -Mre=debug -e "/just|another|perl|hacker/"

@p5pRT
Copy link
Author

p5pRT commented Feb 19, 2008

From Michael.Schroeder@informatik.uni-erlangen.de

On Sat, Feb 16, 2008 at 05​:52​:52PM +0100, demerphq wrote​:

With this extra block removed does it fix the problem (i guess so),
and what does the output of re=debug for

/(?=foo)bar/

look like?

With your patch applied, the output is​:

Compiling REx "(?=foo)bar"
Final program​:
  1​: IFMATCH[0] (7)
  3​: EXACT <foo> (5)
  5​: SUCCEED (0)
  6​: TAIL (7)
  7​: EXACT <bar> (9)
  9​: END (0)
anchored "bar" at 0 (checking anchored) minlen 3
Freeing REx​: "(?=foo)bar"

Looks good, I guess. And the panic is also gone.

Thanks for your patch,
  Michael.

--
Michael Schroeder mlschroe@​informatik.uni-erlangen.de
main(_){while(_=getchar())putchar(_-1/(~(_|32)/13*2-11)*13);}

@p5pRT
Copy link
Author

p5pRT commented Feb 19, 2008

From Michael.Schroeder@informatik.uni-erlangen.de

On Tue, Feb 19, 2008 at 02​:23​:15PM +0100, Michael Schroeder wrote​:

Looks good, I guess. And the panic is also gone.

While I've got your attention, the following code works in 5.8
and fails in 5.10​:

  my $r = qr{^bar};
  die unless "foo\nbar" =~ /$r/m;

This is because the qr{} compiles the regexp to​:
  1​: BOL (2)
  2​: EXACT <bar> (4)
  4​: END (0)

As it's already compiled, the //m modifier can't turn it into
MBOL. Perl-5.8's BOL code looked like​:

  if (locinput == PL_bostr || (PL_multiline &&
  (nextchr || locinput < PL_regeol) && locinput[-1] == '\n') )

whereas perl-5.10 just has

  if (locinput == PL_bostr)

Is this intentional? Or a bug inserted when $* got removed?

Thanks,
  Michael.

--
Michael Schroeder mlschroe@​informatik.uni-erlangen.de
main(_){while(_=getchar())putchar(_-1/(~(_|32)/13*2-11)*13);}

@p5pRT
Copy link
Author

p5pRT commented Feb 19, 2008

From rick@bort.ca

On Feb 19 2008, Michael Schroeder wrote​:

While I've got your attention, the following code works in 5.8
and fails in 5.10​:

my $r = qr\{^bar\};
die unless "foo\\nbar" =~ /$r/m;

This is a bug fix. qr{^bar} gives the pattern /(?-xism​:^bar)/ so it
explicitly turns off any external /m modifiers.

--
Rick Delaney
rick@​bort.ca

@p5pRT
Copy link
Author

p5pRT commented Feb 20, 2008

From Michael.Schroeder@informatik.uni-erlangen.de

On Tue, Feb 19, 2008 at 10​:18​:40AM -0500, Rick Delaney wrote​:

On Feb 19 2008, Michael Schroeder wrote​:

While I've got your attention, the following code works in 5.8
and fails in 5.10​:

my $r = qr\{^bar\};
die unless "foo\\nbar" =~ /$r/m;

This is a bug fix. qr{^bar} gives the pattern /(?-xism​:^bar)/ so it
explicitly turns off any external /m modifiers.

Hmm, this fix is not mentioned in perldelta and breaks a couple of modules
like Mail​::Mbox​::MessageParser...

Cheers,
  Michael.

--
Michael Schroeder mlschroe@​informatik.uni-erlangen.de
main(_){while(_=getchar())putchar(_-1/(~(_|32)/13*2-11)*13);}

@p5pRT
Copy link
Author

p5pRT commented Feb 20, 2008

From @demerphq

On 20/02/2008, Michael Schroeder
<Michael.Schroeder@​informatik.uni-erlangen.de> wrote​:

On Tue, Feb 19, 2008 at 10​:18​:40AM -0500, Rick Delaney wrote​:

On Feb 19 2008, Michael Schroeder wrote​:

While I've got your attention, the following code works in 5.8
and fails in 5.10​:

my $r = qr\{^bar\};
die unless "foo\\nbar" =~ /$r/m;

This is a bug fix. qr{^bar} gives the pattern /(?-xism​:^bar)/ so it
explicitly turns off any external /m modifiers.

Hmm, this fix is not mentioned in perldelta and breaks a couple of modules
like Mail​::Mbox​::MessageParser...

I dont think it was a deliberate fix, I think it was something that
was fixed "by accident" in the process of fixing or changing something
else. It only appeared on the p5p radar post 5.10 release, and since
it wasnt correct behaviour in the first place we had no regression
tests for it. (obviously)

Yves

--
perl -Mre=debug -e "/just|another|perl|hacker/"

@p5pRT
Copy link
Author

p5pRT commented Feb 20, 2008

From rick@bort.ca

On Feb 20 2008, demerphq wrote​:

On 20/02/2008, Michael Schroeder
<Michael.Schroeder@​informatik.uni-erlangen.de> wrote​:

On Tue, Feb 19, 2008 at 10​:18​:40AM -0500, Rick Delaney wrote​:

On Feb 19 2008, Michael Schroeder wrote​:

While I've got your attention, the following code works in 5.8
and fails in 5.10​:

my $r = qr\{^bar\};
die unless "foo\\nbar" =~ /$r/m;

This is a bug fix. qr{^bar} gives the pattern /(?-xism​:^bar)/ so it
explicitly turns off any external /m modifiers.

Hmm, this fix is not mentioned in perldelta and breaks a couple of modules
like Mail​::Mbox​::MessageParser...

I dont think it was a deliberate fix, I think it was something that
was fixed "by accident" in the process of fixing or changing something
else. It only appeared on the p5p radar post 5.10 release, and since
it wasnt correct behaviour in the first place we had no regression
tests for it. (obviously)

The fix was deliberate enough in that it was meant to decouple /m from
PL_multiline so that the /m-ness of $re=qr/pat/ and /$re/ would be
independant.

The particular case I was fixing was for

  $re = qr/^bar/m;
  "foo\nbar" =~ /$re/;

not matching when it should, but the opposite case​:

  $re = qr/^bar/;
  "foo\nbar" =~ /$re/m;

matching when it should not would clearly be fixed by the same patch. I
must admit to not adding a regression test for this case. Good help is
so hard to find.

--
Rick Delaney
rick@​bort.ca

@p5pRT
Copy link
Author

p5pRT commented Feb 20, 2008

From @demerphq

On 20/02/2008, Rick Delaney <rick@​bort.ca> wrote​:

On Feb 20 2008, demerphq wrote​:

On 20/02/2008, Michael Schroeder
<Michael.Schroeder@​informatik.uni-erlangen.de> wrote​:

On Tue, Feb 19, 2008 at 10​:18​:40AM -0500, Rick Delaney wrote​:

On Feb 19 2008, Michael Schroeder wrote​:

While I've got your attention, the following code works in 5.8
and fails in 5.10​:

my $r = qr\{^bar\};
die unless "foo\\nbar" =~ /$r/m;

This is a bug fix. qr{^bar} gives the pattern /(?-xism​:^bar)/ so it
explicitly turns off any external /m modifiers.

Hmm, this fix is not mentioned in perldelta and breaks a couple of modules
like Mail​::Mbox​::MessageParser...

I dont think it was a deliberate fix, I think it was something that
was fixed "by accident" in the process of fixing or changing something
else. It only appeared on the p5p radar post 5.10 release, and since
it wasnt correct behaviour in the first place we had no regression
tests for it. (obviously)

The fix was deliberate enough in that it was meant to decouple /m from
PL_multiline so that the /m-ness of $re=qr/pat/ and /$re/ would be
independant.

The particular case I was fixing was for

$re = qr/^bar/m;
"foo\\nbar" =~ /$re/;

not matching when it should, but the opposite case​:

$re = qr/^bar/;
"foo\\nbar" =~ /$re/m;

matching when it should not would clearly be fixed by the same patch. I
must admit to not adding a regression test for this case. Good help is
so hard to find.

Personally i dont think its a big deal. Worst aspect of it was it was
unannounced. Something that can be fixed in 5.10.1

yves

--
perl -Mre=debug -e "/just|another|perl|hacker/"

@p5pRT
Copy link
Author

p5pRT commented Feb 21, 2008

From Michael.Schroeder@informatik.uni-erlangen.de

On Wed, Feb 20, 2008 at 11​:17​:58AM -0500, Rick Delaney wrote​:

The particular case I was fixing was for

$re = qr/^bar/m;
"foo\\nbar" =~ /$re/;

not matching when it should

Hmm, but that did match in perl-5.8, as the regexp compiler used MBOL
in that case.

, but the opposite case​:

$re = qr/^bar/;
"foo\\nbar" =~ /$re/m;

matching when it should not would clearly be fixed by the same patch. I
must admit to not adding a regression test for this case. Good help is
so hard to find.

Yes, regression tests are always hard to write.

Cheers,
  Michael

--
Michael Schroeder mlschroe@​informatik.uni-erlangen.de
main(_){while(_=getchar())putchar(_-1/(~(_|32)/13*2-11)*13);}

@p5pRT
Copy link
Author

p5pRT commented Mar 3, 2008

@smpeters - Status changed from 'open' to 'resolved'

@p5pRT p5pRT closed this as completed Mar 3, 2008
@p5pRT
Copy link
Author

p5pRT commented May 28, 2009

From @nwc10

Dave notes​:

add to perldelta the fact (missing from 5.10.0 delta) that /m
no longer propagates into a /...${qr}.../

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

1 participant