Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Perl regexp state gets clobbered (includes a fix) #8098

Closed
p5pRT opened this issue Sep 8, 2005 · 12 comments
Closed

Perl regexp state gets clobbered (includes a fix) #8098

p5pRT opened this issue Sep 8, 2005 · 12 comments

Comments

@p5pRT
Copy link

p5pRT commented Sep 8, 2005

Migrated from rt.perl.org#37102 (status was 'resolved')

Searchable as RT37102$

@p5pRT
Copy link
Author

p5pRT commented Sep 8, 2005

From geoff@gate.net

The following problem has been observed under versions of Perl after 5.8.4
on Linux and Windows including the most recent stable version (5.8.7). The
script that experienced the problem is too complex to include and requires
a connection to a database server. A patch with a fix is provided at the
end of this email.

The following regex was causing either an Out of Memory condition or a
segmentation fault after two iterations​:

  if ($a =~ m/^([A-Za-z]+)(.*)/)

It was caused by the PL_regeol variable of regexec.c getting overwritten
when in the "av_store()" call on line 4370 of regexec.c (see gdb backtrace
below)​:

  if (a)
  sw = *a;
  else if (si && doinit) {
  sw = swash_init("utf8", "", si, 1, 0);
  (void)av_store(av, 1, sw);
  }

When av_store is called, the previous positional parameter is
de-referenced and its DESTROY method is called. I could not figure out
what object was being destroyed but the gdb backtrace will probably ring a
few bells for you. The script was working with UTF-8 content and wide
characters, which I suspect was related to the problem. In any case, the
DESTROY method that was called included a regular expression that would
trash the state of the regular expression engine a few methods up the stack.

My solution is given in a patch at the very end of this email. Please
excuse the comment with my name but it will allow me to find my fix in my
copy of Perl should I need to in the future.

I added the following call to sv_clear() in sv.c​:

  save_re_context();

just prior to the call to the object's DESTROY method. This allows regular
expressions to be used in the DESTROY method of whatever object is being
used as a positional variable in a regular expression (a Unicode string?).

I agree with the comments above the save_re_context() function that this
is a really kludgey way for the regular expression engine to be written.
The engine should store its state in a stack or allocated structure that
is passed to its subroutines and not use global variables. I am afraid
that there may be other sections in Perl that will also require calls to
save_re_context() that have not been discovered yet.

Thanks for getting my fix in to the Perl distribution.

Sincerely,

Geoff Mottram
geoff@​gate.net

--------------Output of perl -V ------------

Summary of my perl5 (revision 5 version 8 subversion 7) configuration​:
  Platform​:
  osname=linux, osvers=2.4.21-27.0.2.elsmp, archname=i686-linux
  uname='linux classificationweb.net 2.4.21-27.0.2.elsmp #1 smp wed jan
12 23​:35​:44 est 2005 i686 i686 i386 gnulinux '
  config_args='-de'
  hint=previous, useposix=true, d_sigaction=define
  usethreads=undef use5005threads=undef useithreads=undef
usemultiplicity=undef
  useperlio=define d_sfio=undef uselargefiles=define usesocks=undef
  use64bitint=undef use64bitall=undef uselongdouble=undef
  usemymalloc=n, bincompat5005=undef
  Compiler​:
  cc='cc', ccflags ='-fno-strict-aliasing -pipe -I/usr/local/include
-D_LARGEFILE_SOURCE -D_FILE_OFFSET_BITS=64 -I/usr/include/gdbm -DDEBUGGING',
  optimize='-g',
  cppflags='-fno-strict-aliasing -pipe -I/usr/local/include
-I/usr/include/gdbm -fno-strict-aliasing -pipe -I/usr/local/include
-D_LARGEFILE_SOURCE -D_FILE_OFFSET_BITS=64 -I/usr/include/gdbm
-fno-strict-aliasing -pipe -I/usr/local/include -D_LARGEFILE_SOURCE
-D_FILE_OFFSET_BITS=64 -I/usr/include/gdbm -DDEBUGGING
-fno-strict-aliasing -pipe -I/usr/local/include -D_LARGEFILE_SOURCE
-D_FILE_OFFSET_BITS=64 -I/usr/include/gdbm -DDEBUGGING'
  ccversion='', gccversion='3.2.3 20030502 (Red Hat Linux 3.2.3-49)',
gccosandvers=''
  intsize=4, longsize=4, ptrsize=4, doublesize=8, byteorder=1234
  d_longlong=define, longlongsize=8, d_longdbl=define, longdblsize=12
  ivtype='long', ivsize=4, nvtype='double', nvsize=8, Off_t='off_t',
lseeksize=8
  alignbytes=4, prototype=define
  Linker and Libraries​:
  ld='cc', ldflags =' -L/usr/local/lib'
  libpth=/usr/local/lib /lib /usr/lib
  libs=-lnsl -lgdbm -ldb -ldl -lm -lcrypt -lutil -lc
  perllibs=-lnsl -ldl -lm -lcrypt -lutil -lc
  libc=/lib/libc-2.3.2.so, so=so, useshrplib=false, libperl=libperl.a
  gnulibc_version='2.3.2'
  Dynamic Linking​:
  dlsrc=dl_dlopen.xs, dlext=so, d_dlsymun=undef, ccdlflags='-Wl,-E'
  cccdlflags='-fpic', lddlflags='-shared -L/usr/local/lib'

Characteristics of this binary (from libperl)​:
  Compile-time options​: DEBUGGING USE_LARGE_FILES
  Built under linux
  Compiled at Sep 7 2005 06​:10​:19
  @​INC​:
  /usr/local/lib/perl5/5.8.7/i686-linux
  /usr/local/lib/perl5/5.8.7
  /usr/local/lib/perl5/site_perl/5.8.7/i686-linux
  /usr/local/lib/perl5/site_perl/5.8.7
  /usr/local/lib/perl5/site_perl
  .


------------Output of perbug -d -------------


Flags​:
  category=
  severity=


Site configuration information for perl v5.8.7​:

Configured by minaret at Wed Sep 7 06​:08​:36 EDT 2005.

Summary of my perl5 (revision 5 version 8 subversion 7) configuration​:
  Platform​:
  osname=linux, osvers=2.4.21-27.0.2.elsmp, archname=i686-linux
  uname='linux classificationweb.net 2.4.21-27.0.2.elsmp #1 smp wed jan
12 23​:35​:44 est 2005 i686 i686 i386 gnulinux '
  config_args='-de'
  hint=previous, useposix=true, d_sigaction=define
  usethreads=undef use5005threads=undef useithreads=undef
usemultiplicity=undef
  useperlio=define d_sfio=undef uselargefiles=define usesocks=undef
  use64bitint=undef use64bitall=undef uselongdouble=undef
  usemymalloc=n, bincompat5005=undef
  Compiler​:
  cc='cc', ccflags ='-fno-strict-aliasing -pipe -I/usr/local/include
-D_LARGEFILE_SOURCE -D_FILE_OFFSET_BITS=64 -I/usr/include/gdbm -DDEBUGGING',
  optimize='-g',
  cppflags='-fno-strict-aliasing -pipe -I/usr/local/include
-I/usr/include/gdbm -fno-strict-aliasing -pipe -I/usr/local/include
-D_LARGEFILE_SOURCE -D_FILE_OFFSET_BITS=64 -I/usr/include/gdbm
-fno-strict-aliasing -pipe -I/usr/local/include -D_LARGEFILE_SOURCE
-D_FILE_OFFSET_BITS=64 -I/usr/include/gdbm -DDEBUGGING
-fno-strict-aliasing -pipe -I/usr/local/include -D_LARGEFILE_SOURCE
-D_FILE_OFFSET_BITS=64 -I/usr/include/gdbm -DDEBUGGING'
  ccversion='', gccversion='3.2.3 20030502 (Red Hat Linux 3.2.3-49)',
gccosandvers=''
  intsize=4, longsize=4, ptrsize=4, doublesize=8, byteorder=1234
  d_longlong=define, longlongsize=8, d_longdbl=define, longdblsize=12
  ivtype='long', ivsize=4, nvtype='double', nvsize=8, Off_t='off_t',
lseeksize=8
  alignbytes=4, prototype=define
  Linker and Libraries​:
  ld='cc', ldflags =' -L/usr/local/lib'
  libpth=/usr/local/lib /lib /usr/lib
  libs=-lnsl -lgdbm -ldb -ldl -lm -lcrypt -lutil -lc
  perllibs=-lnsl -ldl -lm -lcrypt -lutil -lc
  libc=/lib/libc-2.3.2.so, so=so, useshrplib=false, libperl=libperl.a
  gnulibc_version='2.3.2'
  Dynamic Linking​:
  dlsrc=dl_dlopen.xs, dlext=so, d_dlsymun=undef, ccdlflags='-Wl,-E'
  cccdlflags='-fpic', lddlflags='-shared -L/usr/local/lib'

Locally applied patches​:


@​INC for perl v5.8.7​:
  /usr/local/lib/perl5/5.8.7/i686-linux
  /usr/local/lib/perl5/5.8.7
  /usr/local/lib/perl5/site_perl/5.8.7/i686-linux
  /usr/local/lib/perl5/site_perl/5.8.7
  /usr/local/lib/perl5/site_perl
  .


Environment for perl v5.8.7​:
  HOME=/minaret
  LANG=en_US.UTF-8
  LANGUAGE (unset)
  LD_LIBRARY_PATH (unset)
  LOGDIR (unset)

PATH=/usr/kerberos/bin​:/usr/local/bin​:/bin​:/usr/bin​:/usr/X11R6/bin​:/usr/kerberos/bin​:/usr/local/bin​:/bin​:/usr/bin​:/usr/X11R6/bin/bin​:/usr/local/java/jdk1.5.0_04/bin​:/usr/local/java/jdk1.5.0_04/jre/bin​:/minaret/m/bin​:/minaret/m/util​:/minaret/bin
  PERL_BADLANG (unset)
  SHELL=/bin/bash


----- gdb backtrace showing how PL_regeol gets clobbered by
Perl_re_intuit_start-------------------------

#0 Perl_re_intuit_start (prog=0x90e5070, sv=0x8fda2a0, strpos=0x9727d28
"utf8​::DESTROY", strend=0x9727d35 "", flags=2, data=0x0)
  at regexec.c​:454
#1 0x080ca09f in Perl_pp_match () at pp_hot.c​:1328
#2 0x080b288d in Perl_runops_debug () at dump.c​:1452
#3 0x08063509 in S_call_body (myop=0xbfffa050, is_eval=0) at perl.c​:2364
#4 0x0806318f in Perl_call_sv (sv=0x95d17b0, flags=150) at perl.c​:2282
#5 0x080d9de2 in Perl_sv_clear (sv=0x972d270) at sv.c​:5110
#6 0x080da66c in Perl_sv_free (sv=0x972d270) at sv.c​:5356
#7 0x080da24c in Perl_sv_clear (sv=0x978cf10) at sv.c​:5208
#8 0x080da66c in Perl_sv_free (sv=0x978cf10) at sv.c​:5356
#9 0x080c3800 in Perl_av_store (av=0x95509a0, key=1, val=0x9702280) at
av.c​:335
#10 0x0812e348 in Perl_regclass_swash (node=0x9552c84, doinit=1 '\001',
listsvp=0x0, altsvp=0xbfffa27c) at regexec.c​:4370
#11 0x0812e551 in S_reginclass (n=0x9552c84, p=0x9760c01 "3279.K86",
lenp=0x0, do_utf8=1 '\001') at regexec.c​:4420
#12 0x0812d438 in S_regrepeat (p=0x9552c84, max=2147483647) at regexec.c​:4112
#13 0x0812c4ae in S_regmatch (prog=0x9552c74) at regexec.c​:3774
#14 0x08126f7c in S_regtry (prog=0x9552c30, startpos=0x9760c00
"B3279.K86") at regexec.c​:2204
#15 0x08125a61 in Perl_regexec_flags (prog=0x9552c30, stringarg=0x9760c00
"B3279.K86", strend=0x9760c09 "", strbeg=0x9760c00 "B3279.K86",
  minend=0, sv=0x95507fc, data=0x0, flags=3) at regexec.c​:1753
#16 0x080ca120 in Perl_pp_match () at pp_hot.c​:1340
#17 0x080b288d in Perl_runops_debug () at dump.c​:1452
#18 0x08062878 in S_run_body (oldscope=1) at perl.c​:1995
#19 0x08062431 in perl_run (my_perl=0x8fda008) at perl.c​:1919
#20 0x0805e81b in main (argc=20, argv=0xbfffa914, env=0xbfffa968) at
perlmain.c​:98
(gdb) p *prog
$111 = {startp = 0x90dfb48, endp = 0x90e4e88, regstclass = 0x0, substrs =
0x90e1c88, precomp = 0x90e5018 "^threads​::new$", data = 0x0,
  subbeg = 0x0, offsets = 0x90e5030, sublen = 0, refcnt = 1, minlen = 12,
prelen = 14, nparens = 0, lastparen = 0, lastcloseparen = 0,
  reganch = 137363457, program = {{flags = 156 '\234', type = 0 '\0',
next_off = 0}}}

---------------- Patch for sv.c to fix problem ---------------

Inline Patch
--- sv.c.orig	2005-05-27 06:38:11.000000000 -0400
+++ sv.c	2005-09-07 19:31:19.000000000 -0400
@@ -5102,6 +5102,8 @@
  		    SV* tmpref = newRV(sv);
  	            SvREADONLY_on(tmpref);   /* DESTROY() could be naughty */
  		    ENTER;
+		    /* Next line added by Geoff Mottram 9/7/05. */
+		    save_re_context();
  		    PUSHSTACKi(PERLSI_DESTROY);
  		    EXTEND(SP, 2);
  		    PUSHMARK(SP);

---------------------------------------------------------------

@p5pRT
Copy link
Author

p5pRT commented Sep 18, 2005

From geoff@gate.net

The problem with regular expressions getting clobbered only occurs while
running Perl with the "-d" (debug) option. I believe it is the "sub()"
method of the DB.pm module that contains the regular expression that
clobbers the running script's regular expression as described in my bug
report.

Best,

Geoff Mottram
geoff@​gate.net

@p5pRT
Copy link
Author

p5pRT commented Sep 19, 2005

From @rgarcia

On 9/18/05, Geoff Mottram <geoff@​gate.net> wrote​:

The problem with regular expressions getting clobbered only occurs while
running Perl with the "-d" (debug) option. I believe it is the "sub()"
method of the DB.pm module that contains the regular expression that
clobbers the running script's regular expression as described in my bug
report.

That's probable. In this case, other Devel​:: modules might be affected
if they use regular expressions this way. The proper fix would be
somehow (hand waving follows) to save and restore the regexp context
when entering a DB​:: routine.

@p5pRT
Copy link
Author

p5pRT commented Sep 19, 2005

The RT System itself - Status changed from 'new' to 'open'

@p5pRT
Copy link
Author

p5pRT commented Oct 9, 2005

From @rgarcia

On 9/8/05, via RT Geoff Mottram <perlbug-followup@​perl.org> wrote​:

The following regex was causing either an Out of Memory condition or a
segmentation fault after two iterations​:

         if \($a =~ m/^\(\[A\-Za\-z\]\+\)\(\.\*\)/\)

It was caused by the PL_regeol variable of regexec.c getting overwritten
when in the "av_store()" call on line 4370 of regexec.c (see gdb backtrace
below)​:

         if \(a\)
             sw = \*a;
         else if \(si && doinit\) \{
             sw = swash\_init\("utf8"\, ""\, si\, 1\, 0\);
             \(void\)av\_store\(av\, 1\, sw\);
         \}

When av_store is called, the previous positional parameter is
de-referenced and its DESTROY method is called. I could not figure out
what object was being destroyed but the gdb backtrace will probably ring a
few bells for you. The script was working with UTF-8 content and wide
characters, which I suspect was related to the problem. In any case, the
DESTROY method that was called included a regular expression that would
trash the state of the regular expression engine a few methods up the stack.

Given that your bug happens inside the debugger, I wonder whether the
following patch isn't more appropriate :

==== //depot/perl/pp_hot.c#415 - /opt/bleadperl/p4/perl/pp_hot.c ====

Inline Patch
--- /home/rafael/tmp/tmp.14639.0        2005-10-09 18:59:52.901594208 +0200
+++ /opt/bleadperl/p4/perl/pp_hot.c     2005-10-09 18:57:32.849885304 +0200
@@ -2657,6 +2657,7 @@ PP(pp_entersub)
         if (CvASSERTION(cv) && PL_DBassertion)
            sv_setiv(PL_DBassertion, 1);

+       save_re_context();
        cv = get_db_sub(&sv, cv);
        if (!cv || (!CvXSUB(cv) && !CvSTART(cv)))
            DIE(aTHX_ "No DB::sub routine defined");
End of Patch.

However, I can't reproduce your problem, so I'm not sure if it fixes it.

@p5pRT
Copy link
Author

p5pRT commented Oct 9, 2005

From @rgarcia

On 10/9/05, Rafael Garcia-Suarez <rgarciasuarez@​gmail.com> wrote​:

Given that your bug happens inside the debugger, I wonder whether the
following patch isn't more appropriate :
[...snipped...]

Meanwhile I applied this workaround :

Change 25721 by rgs@​marais on 2005/10/09 16​:42​:26

  Don't use a regexp in DB​::sub().
  This should work around perlbug #37102.

Affected files ...

... //depot/perl/lib/DB.pm#6 edit

Differences ...

==== //depot/perl/lib/DB.pm#6 (text) ====

@​@​ -63,8 +63,7 @​@​
  push(@​stack, $DB​::single);
  $DB​::single &= 1;
  $DB​::single |= 4 if $#stack == $deep;
-# print $DB​::sub, "\n";
- if ($DB​::sub =~ /(?​:^|​::)DESTROY$/ or not defined wantarray) {
+ if ($DB​::sub eq 'DESTROY' or substr($DB​::sub, -9) eq '​::DESTROY' or
not defined wantarray) {
  &$DB​::sub;
  $DB​::single |= pop(@​stack);
  $DB​::ret = undef;
End of patch.

@p5pRT
Copy link
Author

p5pRT commented Oct 9, 2005

@rgs - Status changed from 'open' to 'stalled'

@p5pRT
Copy link
Author

p5pRT commented Jul 5, 2007

From guest@guest.guest.xxxxxxxx

I noticed this bug is marked as 'stalled', however I've recently run
into this problem on 5.8.7 under cygwin. I tried to create a simple
test case that exposed the problem I've attached that here. I tested
the same code under perl 5.8.8 and the problem seems to have been
resolved there. Probably as a result of SWASHGET being replaced by a C
function in 5.8.8. The utf8_heavy.pl has this related comment​:

# Now SWASHGET is recasted into a C function S_swash_get (see utf8.c).

code that I believe recreates this problem is attached.

It looks like this bug can probably be closed for 5.8.8 but would be
good to have more people verify this.

@p5pRT
Copy link
Author

p5pRT commented Jul 5, 2007

@p5pRT
Copy link
Author

p5pRT commented Jul 5, 2007

The RT System itself - Status changed from 'stalled' to 'open'

@p5pRT
Copy link
Author

p5pRT commented Nov 14, 2008

From @smpeters

On Thu Jul 05 09​:24​:52 2007, guest wrote​:

I noticed this bug is marked as 'stalled', however I've recently run
into this problem on 5.8.7 under cygwin. I tried to create a simple
test case that exposed the problem I've attached that here. I tested
the same code under perl 5.8.8 and the problem seems to have been
resolved there. Probably as a result of SWASHGET being replaced by a C
function in 5.8.8. The utf8_heavy.pl has this related comment​:

# Now SWASHGET is recasted into a C function S_swash_get (see utf8.c).

code that I believe recreates this problem is attached.

It looks like this bug can probably be closed for 5.8.8 but would be
good to have more people verify this.

This does appear to be fixed.

worf​:perl-current steve$ ./perl -Ilib rt_37102.pl
..1 ..2 ..3 ..4 ..5 ..6 ..7 ..8 ..9 ..10
..11 ..12 ..13 ..14 ..15 ..16 ..17 ..18 ..19 ..20
..21 ..22 ..23 ..24 ..25 ..26 ..27 ..28 ..29 ..30
..31 ..32 ..33 ..34 ..35 ..36 ..37 ..38 ..39 ..40
..41 ..42 ..43 ..44 ..45 ..46 ..47 ..48 ..49 ..50
..51 ..52 ..53 ..54 ..55 ..56 ..57 ..58 ..59 ..60
..61 ..62 ..63 ..64 ..65 ..66 ..67 ..68 ..69 ..70
..71 ..72 ..73 ..74 ..75 ..76 ..77 ..78 ..79 ..80
..81 ..82 ..83 ..84 ..85 ..86 ..87 ..88 ..89 ..90
..91 ..92 ..93 ..94 ..95 ..96 ..97 ..98 ..99 ..100

@p5pRT
Copy link
Author

p5pRT commented Nov 14, 2008

@smpeters - Status changed from 'open' to 'resolved'

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

1 participant