Bug in Regex-Engine (?) #12949

Closed
p5pRT opened this issue May 8, 2013 · 10 comments

p5pRT commented May 8, 2013

Migrated from rt.perl.org#117917 (status was 'resolved')

Searchable as RT117917$

p5pRT commented May 8, 2013

From kochnorman@rocketmail.com

I am not sure whether perlbug sent this correctly. Sorry if
this is a duplicate.

The attached file is the bug report that I hope I just sent. If so,
please ignore this mail.

p5pRT commented May 8, 2013

From kochnorman@rocketmail.com

Created by kochnorman@rocketmail.com

I was just playing around, trying to build a regex for the rather complex problem of matching a q-like operator string in Perl (like qq/hallo/, but also "hallo" and so on). So I ended up trying this:

use strict;
use warnings;

use re "debugcolor";

sub getDel {
  my $z = shift;
  die $z;
  return ')' if $z eq '(';
  return '>' if $z eq '<';
  return ']' if $z eq '[';
  return '}' if $z eq '{';
  return $z;
}

'qq"hello"' =~ /(?<stringstart>(?:q(?:[xwrq])?\s*)?)(?<delim>(??{ (exists ($+{stringstart}) && $+{stringstart} =~ m#q#) ? q#"|'# : '\\w' })).*?(?<!\\)(??{ die $+{delim}; defined $1 ? quotemeta(getDel($1)) : '(?!)' })/

When executing this code, I get this output (including the re "debugcolor" output):

Compiling REx "(?<stringstart>(?:q(?:[xwrq])?\s*)?)(?<delim>(??{ (exists ($"...
Compiling REx "q"
Final program:
  1: EXACT <q> (3)
  3: END (0)
anchored "q" at 0 (checking anchored isall) minlen 1
Final program:
  1: OPEN1 'stringstart' (3)
  3: CURLYX[0] {0,1} (23)
  5: EXACT <q> (7)
  7: CURLY {0,1} (20)
  9: ANYOF[qrwx][] (0)
  20: STAR (22)
  21: SPACE (0)
  22: WHILEM (0)
  23: NOTHING (24)
  24: CLOSE1 'stringstart' (26)
  26: OPEN2 'delim' (28)
  28: LOGICAL[2] (29)
  29: EVAL (31)
  31: CLOSE2 'delim' (33)
  33: MINMOD (34)
  34: STAR (36)
  35: REG_ANY (0)
  36: UNLESSM[-1] (42)
  38: EXACT <\\> (40)
  40: SUCCEED (0)
  41: TAIL (42)
  42: LOGICAL[2] (43)
  43: EVAL (45)
  45: END (0)
minlen 0 with eval
Matching REx "(?<stringstart>(?:q(?:[xwrq])?\s*)?)(?<delim>(??{ (exists ($"... against "qq%"hallo%""
  0 <qq"hallo">| 1:OPEN1 'stringstart'(3)
  0 <qq"hallo">| 3:CURLYX[0] {0,1}(23)
  0 <qq"hallo">| 22: WHILEM(0)
  whilem: matched 0 out of 0..1
  0 <qq"hallo">| 5: EXACT <q>(7)
  1 <qq"hallo">| 7: CURLY {0,1}(20)
  ANYOF[qrwx][] can match 1 times out of 1...
  2 <qq"hallo">| 20: STAR(22)
  SPACE can match 0 times out of 2147483647...
  2 <qq"hallo">| 22: WHILEM(0)
  whilem: matched 1 out of 0..1
  2 <qq"hallo">| 23: NOTHING(24)
  2 <qq"hallo">| 24: CLOSE1 'stringstart'(26)
  2 <qq"hallo">| 26: OPEN2 'delim'(28)
  2 <qq"hallo">| 28: LOGICAL[2](29)
  2 <qq"hallo">| 29: EVAL(31)
Guessing start of match in sv for REx "q" against "qq"
Found anchored substr "q" at offset 0...
Guessed: match at offset 0
Matching embedded REx "%"|'" against "%"hallo%""
  2 <qq"hallo">| 1: TRIE-EXACT["'](7)
  2 <qq"hallo">| State: 1 Accepted: N Charid: 1 CP: 22 After State: 2
  3 <qq"hallo">| State: 2 Accepted: Y Charid: 0 CP: 0 After State: 0
  got 1 possible matches
  TRIE matched word #1, continuing
[1] 8691 segmentation fault perl qq.pl

... and I cannot really see why perl crashes here. I didn't expect that regex to work perfectly, or even to work at all (in terms of matching the correct things), because this task may be too complex to achieve with a regex, but I certainly did not expect perl to crash with a segmentation fault.

With re "debugcolor" disabled, the output is "Use of uninitialized value $+{"delim"} in die at (re_eval 2) line 1.
Died at (re_eval 2) line 1.". Perhaps re cannot handle that yet?

Perl Info

Flags:
    category=core
    severity=medium

Site configuration information for perl 5.14.2:

Configured by Debian Project at Fri Apr 12 09:56:36 UTC 2013.

Summary of my perl5 (revision 5 version 14 subversion 2) configuration:
   
  Platform:
    osname=linux, osvers=2.6.32-5-686-bigmem, archname=i486-linux-gnu-thread-multi-64int
    uname='linux murphy 2.6.32-5-686-bigmem #1 smp mon feb 25 01:53:47 utc 2013 i686 gnulinux '
    config_args='-Dusethreads -Duselargefiles -Dccflags=-DDEBIAN -D_FORTIFY_SOURCE=2 -g -O2 -fstack-protector --param=ssp-buffer-size=4 -Wformat -Werror=format-security -Dldflags= -Wl,-z,relro -Dlddlflags=-shared -Wl,-z,relro -Dcccdlflags=-fPIC -Darchname=i486-linux-gnu -Dprefix=/usr -Dprivlib=/usr/share/perl/5.14 -Darchlib=/usr/lib/perl/5.14 -Dvendorprefix=/usr -Dvendorlib=/usr/share/perl5 -Dvendorarch=/usr/lib/perl5 -Dsiteprefix=/usr/local -Dsitelib=/usr/local/share/perl/5.14.2 -Dsitearch=/usr/local/lib/perl/5.14.2 -Dman1dir=/usr/share/man/man1 -Dman3dir=/usr/share/man/man3 -Dsiteman1dir=/usr/local/man/man1 -Dsiteman3dir=/usr/local/man/man3 -Duse64bitint -Dman1ext=1 -Dman3ext=3perl -Dpager=/usr/bin/sensible-pager -Uafs -Ud_csh -Ud_ualarm -Uusesfio -Uusenm -Ui_libutil -DDEBUGGING=-g -Doptimize=-O2 -Duseshrplib -Dlibperl=libperl.so.5.14.2 -des'
    hint=recommended, useposix=true, d_sigaction=define
    useithreads=define, usemultiplicity=define
    useperlio=define, d_sfio=undef, uselargefiles=define, usesocks=undef
    use64bitint=define, use64bitall=undef, uselongdouble=undef
    usemymalloc=n, bincompat5005=undef
  Compiler:
    cc='cc', ccflags ='-D_REENTRANT -D_GNU_SOURCE -DDEBIAN -fstack-protector -fno-strict-aliasing -pipe -I/usr/local/include -D_LARGEFILE_SOURCE -D_FILE_OFFSET_BITS=64',
    optimize='-O2 -g',
    cppflags='-D_REENTRANT -D_GNU_SOURCE -DDEBIAN -fstack-protector -fno-strict-aliasing -pipe -I/usr/local/include'
    ccversion='', gccversion='4.7.2', gccosandvers=''
    intsize=4, longsize=4, ptrsize=4, doublesize=8, byteorder=12345678
    d_longlong=define, longlongsize=8, d_longdbl=define, longdblsize=12
    ivtype='long long', ivsize=8, nvtype='double', nvsize=8, Off_t='off_t', lseeksize=8
    alignbytes=4, prototype=define
  Linker and Libraries:
    ld='cc', ldflags =' -fstack-protector -L/usr/local/lib'
    libpth=/usr/local/lib /lib/i386-linux-gnu /lib/../lib /usr/lib/i386-linux-gnu /usr/lib/../lib /lib /usr/lib
    libs=-lgdbm -lgdbm_compat -ldb -ldl -lm -lpthread -lc -lcrypt
    perllibs=-ldl -lm -lpthread -lc -lcrypt
    libc=, so=so, useshrplib=true, libperl=libperl.so.5.14.2
    gnulibc_version='2.13'
  Dynamic Linking:
    dlsrc=dl_dlopen.xs, dlext=so, d_dlsymun=undef, ccdlflags='-Wl,-E'
    cccdlflags='-fPIC', lddlflags='-shared -L/usr/local/lib -fstack-protector'

Locally applied patches:
    


@INC for perl 5.14.2:
    /usr/share/perl/5.14.2
    /usr/share/perl
    /etc/perl
    /usr/local/lib/perl/5.14.2
    /usr/local/share/perl/5.14.2
    /usr/lib/perl5
    /usr/share/perl5
    /usr/lib/perl/5.14
    /usr/share/perl/5.14
    /usr/local/lib/site_perl
    .


Environment for perl 5.14.2:
    HOME=/home/nok
    LANG=de_DE.UTF-8
    LANGUAGE=
    LD_LIBRARY_PATH (unset)
    LOGDIR (unset)
    PATH=/home/nok/.vim/bin:/usr/local/bin:/usr/bin:/bin:/usr/bin/X11:/usr/games
    PERL5LIB=/usr/share/perl
    PERL_BADLANG (unset)
    PERL_LOCAL_LIB_ROOT=/home/nok/perl5
    PERL_MB_OPT=--install_base /home/nok/perl5
    PERL_MM_OPT=INSTALL_BASE=/home/nok/perl5
    SHELL=/usr/bin/zsh

p5pRT commented May 9, 2013

From @jkeenan

On Wed May 08 13:36:33 2013, kochnorman@rocketmail.com wrote:

I am not sure whether perlbug sent this correctly. Sorry if
this is a duplicate.

The attached file is the bug report that I hope I just sent. If so,
please ignore this mail.

The program as I have extracted it from your attachment:

#####
$ cat 117917_regex.pl
use strict;
use warnings;

use re "debugcolor";

sub getDel {
  my $z = shift;
  die $z;
  return ')' if $z eq '(';
  return '>' if $z eq '<';
  return ']' if $z eq '[';
  return '}' if $z eq '{';
  return $z;
}

'qq"hello"' =~ /(?<stringstart>(?:q(?:[xwrq])?\s*)?)(?<delim>(??{
(exists ($+{stringstart}) && $+{stringstart} =~ m#q#) ? q#"|'# : '\\w'
})).*?(?<!\\)(??{ die $+{delim}; defined $1 ? quotemeta(getDel($1)) :
'(?!)' })/
#####

When I run this with 'use re "debugcolor";' commented out, I get this:

#####
$ perl 117917_regex.pl
Use of uninitialized value $+{"delim"} in die at (re_eval 2) line 1.
Died at (re_eval 2) line 1.
#####

When I activate 'use re "debugcolor";', the program ends with a bus
error (see attachment).

Now, I have never actually used named captures in regular expressions,
nor have I ever used 'use re' anything. So I can't claim to say
anything about the possible interaction of these features.

But if this were my program I'd deal *first* with the warning and error
message thrown when you run without 'debugcolor'. I would then try to
build up the pattern match in small pieces, with and without
'debugcolor'. The objective would be to determine what tips the program
into a segfault or bus error.
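
One way to carry out that piecewise approach without taking down the driving script is to run each candidate pattern in a child perl and inspect its wait status. The following is only a sketch under stated assumptions: the helper names `describe_status` and `try_snippet` are hypothetical, and the two candidate snippets are illustrative placeholders, not pieces taken from a known-crashing reduction.

```perl
use strict;
use warnings;

# Hypothetical helper: turn a wait status (the value of $? after system())
# into a short description.
sub describe_status {
    my ($status) = @_;
    return 'could not run' if $status == -1;
    my $signal = $status & 127;    # low 7 bits: terminating signal, if any
    return "killed by signal $signal" if $signal;
    my $exit = $status >> 8;       # high byte: exit code
    return $exit == 0 ? 'ok' : "exit code $exit";
}

# Hypothetical helper: run one snippet in a fresh perl with
# 'use re "debugcolor"' enabled (this is noisy on STDERR) and
# describe how the child ended.
sub try_snippet {
    my ($code) = @_;
    system($^X, '-Mre=debugcolor', '-e', $code);
    return describe_status($?);
}

# Placeholder candidates; substitute successively larger pieces of the
# full pattern until one of them is reported as killed by a signal.
my @candidates = (
    q{'qq"hello"' =~ /(?<stringstart>(?:q(?:[xwrq])?\s*)?)/},
    q{'qq"hello"' =~ /(?<stringstart>q)(?<delim>(??{ q#"|'# }))/},
);
printf "%-20s %s\n", try_snippet($_), $_ for @candidates;
```

A segfault or bus error shows up as "killed by signal N", whereas an ordinary die in the snippet shows up as a non-zero exit code, which keeps the two failure modes apart.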

I don't see any evidence yet that there is a problem with the Perl 5
core distribution, which is the subject of this mailing list. You might
be better off taking this problem first to a place such as
perlmonks.org. If discussion there points to the Perl 5 regex engine as
the problem -- as distinct from use of that engine -- then we could
resume discussion here.

My two cents.

Thank you very much.
Jim Keenan

p5pRT commented May 9, 2013

From @jkeenan

$ perl 117917_regex.pl
Compiling REx "(?<stringstart>(?:q(?:[xwrq])?\s*)?)(?<delim>(??{ (exists ($"...
Compiling REx "q"
Final program:
  1: EXACT <q> (3)
  3: END (0)
anchored "q" at 0 (checking anchored isall) minlen 1
Final program:
  1: OPEN1 'stringstart' (3)
  3: CURLYX[0] {0,1} (23)
  5: EXACT <q> (7)
  7: CURLY {0,1} (20)
  9: ANYOF[qrwx][] (0)
  20: STAR (22)
  21: SPACE (0)
  22: WHILEM (0)
  23: NOTHING (24)
  24: CLOSE1 'stringstart' (26)
  26: OPEN2 'delim' (28)
  28: LOGICAL[2] (29)
  29: EVAL (31)
  31: CLOSE2 'delim' (33)
  33: MINMOD (34)
  34: STAR (36)
  35: REG_ANY (0)
  36: UNLESSM[-1] (42)
  38: EXACT <\\> (40)
  40: SUCCEED (0)
  41: TAIL (42)
  42: LOGICAL[2] (43)
  43: EVAL (45)
  45: END (0)
minlen 0 with eval
Matching REx "(?<stringstart>(?:q(?:[xwrq])?\s*)?)(?<delim>(??{ (exists ($"... against "qq%"hello%""
  0 <qq"hello">| 1:OPEN1 'stringstart'(3)
  0 <qq"hello">| 3:CURLYX[0] {0,1}(23)
  0 <qq"hello">| 22: WHILEM(0)
  whilem: matched 0 out of 0..1
  0 <qq"hello">| 5: EXACT <q>(7)
  1 <qq"hello">| 7: CURLY {0,1}(20)
  ANYOF[qrwx][] can match 1 times out of 1...
  2 <qq"hello">| 20: STAR(22)
  SPACE can match 0 times out of 2147483647...
  2 <qq"hello">| 22: WHILEM(0)
  whilem: matched 1 out of 0..1
  2 <qq"hello">| 23: NOTHING(24)
  2 <qq"hello">| 24: CLOSE1 'stringstart'(26)
  2 <qq"hello">| 26: OPEN2 'delim'(28)
  2 <qq"hello">| 28: LOGICAL[2](29)
  2 <qq"hello">| 29: EVAL(31)
Guessing start of match in sv for REx "q" against "qq"
Found anchored substr "q" at offset 0...
Guessed: match at offset 0
Matching embedded REx "%"|'" against "%"hello%""
  2 <qq"hello">| 1: TRIE-EXACT["'](7)
  2 <qq"hello">| State: 1 Accepted: N Charid: 1 CP: 22 After State: 2
  3 <qq"hello">| State: 2 Accepted: Y Charid: 0 CP: 0 After State: 0
  got 1 possible matches
  TRIE matched word #1, continuing
Bus error

p5pRT commented May 9, 2013

The RT System itself - Status changed from 'new' to 'open'

p5pRT commented May 9, 2013

From @tonycoz

On Wed, May 08, 2013 at 06:14:45PM -0700, James E Keenan via RT wrote:

The program as I have extracted it from your attachment:

#####
$ cat 117917_regex.pl
use strict;
use warnings;

use re "debugcolor";

sub getDel {
my $z = shift;
die $z;
return ')' if $z eq '(';
return '>' if $z eq '<';
return ']' if $z eq '[';
return '}' if $z eq '{';
return $z;
}

'qq"hello"' =~ /(?<stringstart>(?:q(?:[xwrq])?\s*)?)(?<delim>(??{
(exists ($+{stringstart}) && $+{stringstart} =~ m#q#) ? q#"|'# : '\\w'
})).*?(?<!\\)(??{ die $+{delim}; defined $1 ? quotemeta(getDel($1)) :
'(?!)' })/
#####

I don't see any evidence yet that there is a problem with the Perl 5
core distribution, which is the subject of this mailing list. You might
be better off taking this problem first to a place such as
perlmonks.org. If discussion there points to the Perl 5 regex engine as
the problem -- as distinct from use of that engine -- then we could
resume discussion here.

Given it's perl code using only core modules, the match is against a
short string, and it crashes with a Bus error, I'd call this a bug
in perl.

That said, blead doesn't crash (but still has an undefined $+{delim}).

Tony

p5pRT commented May 9, 2013

From @nwc10

On Thu, May 09, 2013 at 11:32:19AM +1000, Tony Cook wrote:

Given it's perl code using only core modules, the match is against a
short string, and it crashes with a Bus error, I'd call this a bug
in perl.

That said, blead doesn't crash (but still has an undefined $+{delim}).

Tweaking the test case to this (warn, instead of die):

$ cat ../117917.pl
#!./perl
use strict;
use warnings;

use re "debugcolor";

sub getDel {
  my $z = shift;
  warn $z;
  return ')' if $z eq '(';
  return '>' if $z eq '<';
  return ']' if $z eq '[';
  return '}' if $z eq '{';
  return $z;
}

'qq"hello"' =~ /(?<stringstart>(?:q(?:[xwrq])?\s*)?)(?<delim>(??{
(exists ($+{stringstart}) && $+{stringstart} =~ m#q#) ? q#"|'# : '\\w'
})).*?(?<!\\)(??{ warn $+{delim}; defined $1 ? quotemeta(getDel($1)) :
'(?!)' })/

bisecting it with

$ bisect.pl --expect-fail --start v5.16.0 ../117917.pl

says

8a45afe is the first bad commit
commit 8a45afe
Author: David Mitchell <davem@iabyn.com>
Date: Fri Oct 21 15:00:47 2011 +0100

  unlink re_eval code blocks from op list

  In the list of ops generated by something like /abc(?{...})def/,

  const(abc)
  null/special
  ...
  const(...)
  const(def)

  link the list, but skip the DO blocks. This means that for the runtime
  case, we no longer need the temporary measure of deleting the DO blocks,
  and it will facilitate the next step of handling literal code at runtime,
  i.e. /$runtime(?{...})/.

:100644 100644 1d7a7fde3ab4a110f7a6c1d540cddc6c02c3dc81 75667dff430ffb0ad56549aa1a0e4ffffac67e0f M op.c
bisect run success
That took 852 seconds

i.e. "bad" here means that this commit fixed the bus error, because the bisect was run with --expect-fail.

Nicholas Clark

p5pRT commented May 9, 2013

From @iabyn

On Thu, May 09, 2013 at 11:32:19AM +1000, Tony Cook wrote:

That said, blead doesn't crash (but still has an undefined $+{delim}).

This can be reduced to:

'ab' =~ /^(a(?{ 'x' =~ m{x}})b)(?{ warn "inner undef!\n" unless defined $1 })/;
warn "outer undef!\n" unless defined $1;

which produces:

  inner undef!

removing the match from inside the (?{}) makes the issue go away, so it looks
like the regex state isn't being saved/restored properly on re-entrancy.

I'm looking into why.

--
A walk of a thousand miles begins with a single step...
then continues for another 1,999,999 or so.
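
The reduced case above can be wrapped in a small script that runs the pattern with and without the nested match and reports whether $1 was defined inside the trailing code block. This is a sketch, not part of the original report: the helper name `inner_capture_defined` and the package variable `$seen` are hypothetical, and the second result depends on the perl being run (an affected perl is expected to report the capture as undefined in the re-entrant variant).

```perl
use strict;
use warnings;

# Package variable so the embedded code blocks can record what they saw.
our $seen;

# Hypothetical helper: runs the reduced pattern and returns 1 if $1 was
# defined inside the trailing (?{ ... }) block, 0 if it was undefined,
# and -1 if that block never ran. $reenter selects whether the first
# code block performs a nested pattern match.
sub inner_capture_defined {
    my ($reenter) = @_;
    $seen = -1;
    if ($reenter) {
        # Same shape as the reduction: a match inside the first code block.
        'ab' =~ /^(a(?{ 'x' =~ m{x} })b)(?{ $seen = defined $1 ? 1 : 0 })/;
    }
    else {
        # Control: identical pattern, but no nested match.
        'ab' =~ /^(a(?{ 1 })b)(?{ $seen = defined $1 ? 1 : 0 })/;
    }
    return $seen;
}

printf "without nested match: %d\n", inner_capture_defined(0);
printf "with nested match:    %d\n", inner_capture_defined(1);
```

If the two lines print different values, the nested match is disturbing the outer match's capture state, which is the re-entrancy problem described above.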

p5pRT commented Jul 6, 2013

From @cpansprout

On Thu May 09 08:54:30 2013, davem wrote:

On Thu, May 09, 2013 at 11:32:19AM +1000, Tony Cook wrote:

That said, blead doesn't crash (but still has an undefined
$+{delim}).

This can be reduced to:

'ab' =~ /^(a(?{ 'x' =~ m{x}})b)(?{ warn "inner undef!\n" unless
defined $1 })/;
warn "outer undef!\n" unless defined $1;

which produces:

inner undef!

removing the match from inside the (?{}) makes the issue go away, so
it looks like the regex state isn't being saved/restored properly on
re-entrancy.

I'm looking into why.

I have beaten you to it. :-)

Fixed in f5df269.

--

Father Chrysostomos

p5pRT commented Jul 6, 2013

@cpansprout - Status changed from 'open' to 'resolved'
