Navigation Menu

Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Assertion rx->sublen >= (s - rx->subbeg) + i failed: file "regcomp.c" #9566

Closed
p5pRT opened this issue Nov 12, 2008 · 46 comments
Closed

Assertion rx->sublen >= (s - rx->subbeg) + i failed: file "regcomp.c" #9566

p5pRT opened this issue Nov 12, 2008 · 46 comments

Comments

@p5pRT
Copy link

p5pRT commented Nov 12, 2008

Migrated from rt.perl.org#60508 (status was 'resolved')

Searchable as RT60508$

@p5pRT
Copy link
Author

p5pRT commented Nov 12, 2008

From @clintongormley

Created by @clintongormley

This is a bug report for perl from clint@​traveljury.com,
generated with the help of perlbug 1.36 running under perl 5.10.0.

-----------------------------------------------------------------
I've just had this error message return​:

Assertion rx->sublen >= (s - rx->subbeg) + i failed​: file "regcomp.c",
line 5109 at (eval 147)

I can't tell you what regex or value was being checked at the time,
but it might have been this call​:

  return HTML​::Entities​::encode_entities( $_[0], q(<>&"') );

All of my text is decoded utf8.

I don't know what other information I can give you to help
debug this?

thanks

Clint

Perl Info

Flags:
    category=core
    severity=medium

This perlbug was built using Perl 5.10.0 - Tue Jul 15 14:37:49 UTC 2008
It is being executed now by  Perl 5.10.0 - Tue Jul 15 14:31:57 UTC 2008.

Site configuration information for perl 5.10.0:

Configured by abuild at Tue Jul 15 14:31:57 UTC 2008.

Summary of my perl5 (revision 5 version 10 subversion 0) configuration:
  Platform:
    osname=linux, osvers=2.6.25, archname=x86_64-linux-thread-multi
    uname='linux stravinsky 2.6.25 #1 smp 20080210 20:01:04 utc x86_64
x86_64 x86_64 gnulinux '
    config_args='-ds -e -Dprefix=/usr -Dvendorprefix=/usr
-Dinstallusrbinperl -Dusethreads -Di_db -Di_dbm -Di_ndbm -Di_gdbm
-Duseshrplib=true -Doptimize=-O2 -fmessage-length=0 -Wall
-D_FORTIFY_SOURCE=2 -fstack-protector -g -Wall -pipe
-Accflags=-DPERL_USE_SAFE_PUTENV'
    hint=recommended, useposix=true, d_sigaction=define
    useithreads=define, usemultiplicity=define
    useperlio=define, d_sfio=undef, uselargefiles=define, usesocks=undef
    use64bitint=define, use64bitall=define, uselongdouble=undef
    usemymalloc=n, bincompat5005=undef
  Compiler:
    cc='cc', ccflags ='-D_REENTRANT -D_GNU_SOURCE -DPERL_USE_SAFE_PUTENV
-DDEBUGGING -fno-strict-aliasing -pipe -D_LARGEFILE_SOURCE
-D_FILE_OFFSET_BITS=64',
    optimize='-O2 -fmessage-length=0 -Wall -D_FORTIFY_SOURCE=2
-fstack-protector -g -Wall -pipe',
    cppflags='-D_REENTRANT -D_GNU_SOURCE -DPERL_USE_SAFE_PUTENV
-DDEBUGGING -fno-strict-aliasing -pipe'
    ccversion='', gccversion='4.3.1 20080507 (prerelease)
[gcc-4_3-branch revision 135036]', gccosandvers=''
    intsize=4, longsize=8, ptrsize=8, doublesize=8, byteorder=12345678
    d_longlong=define, longlongsize=8, d_longdbl=define, longdblsize=16
    ivtype='long', ivsize=8, nvtype='double', nvsize=8, Off_t='off_t',
lseeksize=8
    alignbytes=8, prototype=define
  Linker and Libraries:
    ld='cc', ldflags =' -L/usr/local/lib64'
    libpth=/lib64 /usr/lib64 /usr/local/lib64
    libs=-lm -ldl -lcrypt -lpthread
    perllibs=-lm -ldl -lcrypt -lpthread
    libc=/lib64/libc-2.8.so, so=so, useshrplib=true, libperl=libperl.so
    gnulibc_version='2.8'
  Dynamic Linking:
    dlsrc=dl_dlopen.xs, dlext=so, d_dlsymun=undef, ccdlflags='-Wl,-E
-Wl,-rpath,/usr/lib/perl5/5.10.0/x86_64-linux-thread-multi/CORE'
    cccdlflags='-fPIC', lddlflags='-shared -L/usr/local/lib64'

Locally applied patches:
    


@INC for perl 5.10.0:
    /usr/lib/perl5/5.10.0/x86_64-linux-thread-multi
    /usr/lib/perl5/5.10.0
    /usr/lib/perl5/site_perl/5.10.0/x86_64-linux-thread-multi
    /usr/lib/perl5/site_perl/5.10.0
    /usr/lib/perl5/vendor_perl/5.10.0/x86_64-linux-thread-multi
    /usr/lib/perl5/vendor_perl/5.10.0
    /usr/lib/perl5/vendor_perl
    .


Environment for perl 5.10.0:
    HOME=/root
    LANG=en_GB.UTF-8
    LANGUAGE (unset)
    LD_LIBRARY_PATH (unset)
    LOGDIR (unset)

PATH=/opt/apache/sites/iAnnounce/bin:/usr/local/bin:/usr/sbin:/bin:/usr/bin:/sbin
    PERL_BADLANG (unset)
    SHELL=/bin/bash



@p5pRT
Copy link
Author

p5pRT commented Nov 13, 2008

From @rgs

2008/11/12 via RT Clinton Gormley <perlbug-followup@​perl.org>​:

I've just had this error message return​:

Assertion rx->sublen >= (s - rx->subbeg) + i failed​: file "regcomp.c",
line 5109 at (eval 147)

I can't tell you what regex or value was being checked at the time,
but it might have been this call​:

return HTML​::Entities​::encode_entities( $_[0], q(<>&"') );

All of my text is decoded utf8.

I don't know what other information I can give you to help
debug this?

If this bug is reproducible, we'd need a small program to replicate
the bug, if possible.

@p5pRT
Copy link
Author

p5pRT commented Nov 13, 2008

The RT System itself - Status changed from 'new' to 'open'

@p5pRT
Copy link
Author

p5pRT commented Nov 14, 2008

From @clintongormley

If this bug is reproducible, we'd need a small program to replicate
the bug, if possible.

Yeah, sorry - I realise it's a rubbish bug report. Unfortunately, I've
only seen this once, and I know neither the data nor the regex involved.

There are only two places I could see this coming from, neither of them
directly involving regexes​:

The first is Template Toolkit (v2.20) using the XS stash​:

  From the compiled template​:
  $output .= $stash->get(['form', 0, 'field', [ 'keywords' ], 'render', 0]);

  This is pretty standard stuff, and shouldn't have anything to do with user data.

And the second (which is the only regexy thing called by the above), is​:
#===================================
sub _encode {
#===================================
  return HTML​::Entities​::encode_entities( $_[0], q(<>&"') );
}

where $_[0] would be the UTF8-decoded value passed in by the user (the
value of which I don't know).

The error was thrown on a polish site, so it is likely that non Latin-1
characters were present.

hope this helps (a little)

clint

@p5pRT
Copy link
Author

p5pRT commented Nov 20, 2008

From @clintongormley

If this bug is reproducible, we'd need a small program to replicate
the bug, if possible.

OK - I added some debugging and have seen this error a few times, but am
unable to reproduce it.

It looks like a string which will trigger this error is​:

 "np. imi\x{119}, nazwisko, s\x{142}owa kluczowe"

and the only regex-related code which touches it is​:

#===================================
sub _encode {
#===================================
  return HTML​::Entities​::encode_entities( $_[0], q(<>&"') );
}

(HTML​::Entities v 1.35)

I've tried running the same code on the same object, and it works just
fine. I've visited the same page in my browser and it renders just
fine...

I can only assume this is intermittent. I've only seen this error 5
times in the last week, and this is a frequently visited page, so I'm
not sure what triggers it.

Other strings that have triggered the bug​:
- "Twoje imi\x{119} i nazwisko"
- "Nice day, <a \x{f2}\x{f0}\x{e8}\x{ea}\x{ee}\x{f2}\x{e0}\x{e6}, :-)),"
- and possibly "S\x{142}owa kluczowe"

Hope this helps

Clint

@p5pRT
Copy link
Author

p5pRT commented Mar 27, 2009

From dwtrusty@hotmail.com

On Thu Nov 13 08​:57​:24 2008, rafael wrote​:

2008/11/12 via RT Clinton Gormley <perlbug-followup@​perl.org>​:

I've just had this error message return​:

Assertion rx->sublen >= (s - rx->subbeg) + i failed​: file "regcomp.c",
line 5109 at (eval 147)

I can't tell you what regex or value was being checked at the time,
but it might have been this call​:

return HTML​::Entities​::encode_entities( $_[0], q(<>&"') );

All of my text is decoded utf8.

I don't know what other information I can give you to help
debug this?

If this bug is reproducible, we'd need a small program to replicate
the bug, if possible.

I have a test case which reproduces this problem. I have attached
the program and test data. Just run the program "parse.pl" in the
attached tarfile.

@p5pRT
Copy link
Author

p5pRT commented Mar 27, 2009

From dwtrusty@hotmail.com

bugtest.tgz

@p5pRT
Copy link
Author

p5pRT commented Apr 3, 2009

From @gannett-ggreer

I too have this problem, although with XML​::Simple rather than
XML​::Twig. I simplified dtrusty's test case down to​:

- - - 8< - - - 8< - - -
#!/usr/bin/perl
use strict;
use XML​::Twig;

my $twig = XML​::Twig->new(twig_handlers => {
  x => sub {
  my ($twig, $x) = @​_;
  $x->xml_text;
  return; # Just to point out that the return value doesn't matter.
  },
  });
$twig->parse("<xml><x>ap < < <</x><x>< ö</x></xml>");
print "XML​::Twig ok\n";
- - - 8< - - - 8< - - -

$ ./parse.pl
Assertion rx->sublen >= (s - rx->subbeg) + i failed​: file "regcomp.c",
line 5098 at /usr/lib/perl5/vendor_perl/5.10.0/XML/Twig.pm line 7806.
at ./parse.pl line 12

Twig.pm​:7806 is​:

$string=~ s/([&<])/$XML​::Twig​::base_ent{$1}/g unless( $keep_encoding ||
$elt->{asis});

Versions​:
perl-5.10.0-56.fc10.i386
perl-XML-Twig-3.32-1.fc9.noarch

(My own crash has too much code and data to easily cut down.)

@p5pRT
Copy link
Author

p5pRT commented Apr 5, 2009

From @gannett-ggreer

Further testing indicates Perl built from git repository works and Perl
built from source RPM package fails, suggesting it is a patch applied by
Fedora and apparently SuSE. I'll try to narrow down which patch,
hopefully soon.

@p5pRT
Copy link
Author

p5pRT commented Apr 6, 2009

From @gannett-ggreer

Correction to the above​: not a patch, but the parameters given to
Configure cause it. Using Fedora's Configure string causes blead to
also fail​:

perl5.11.0​: regcomp.c​:5176​: Perl_reg_numbered_buff_fetch​: Assertion
`rx->sublen >= (s - rx->subbeg) + i' failed.

@p5pRT
Copy link
Author

p5pRT commented Apr 7, 2009

From @gannett-ggreer

Removing "-g" from -Doptimize passes the test case.

@p5pRT
Copy link
Author

p5pRT commented Apr 8, 2009

From @gannett-ggreer

I submitted to Fedora​:
https://bugzilla.redhat.com/show_bug.cgi?id=494773

@p5pRT
Copy link
Author

p5pRT commented Apr 8, 2009

From mmaslano@redhat.com

On Tue Apr 07 20​:24​:15 2009, greerga wrote​:

I submitted to Fedora​:
https://bugzilla.redhat.com/show_bug.cgi?id=494773

As was said in Fedora's bugzilla this is not problem of rpm or distro.
"-g" stands for debugging which allows show up assertion.

@p5pRT
Copy link
Author

p5pRT commented Apr 19, 2009

From @gannett-ggreer

I have distilled this assertion failure down to the following program​:

- - - 8< - - - 8< - - -
sub f { $_[0] =~ s/([&<>])/xxxxxx/g; }

$k1 = "<a> xxxxx x x <>";
f($k1);

$k2 = "pu\x{f1}al \x{2022}";
$k2 =~ s/([\200-\377])/"&#".ord($1).";"/ge;
f($k2);
- - - 8< - - - 8< - - -

Notes​:
1. The function call is important, but it doesn't matter if it is built
from a string eval'd sub, an anonymous sub, or a regular sub.
2. The \200-\377 replacement is also important, although I did not test
with the \x{f1} changed to outside that range.
3. The seemingly innocuous $k1 work is required, as well as the contents
of $k1 itself are important.
4. The length of the replacement in f() matters, as well as the /g.

For Perl 5.10.0 from Fedora (perl-5.10.0-56.fc10.i386)​:
Assertion rx->sublen >= (s - rx->subbeg) + i failed​: file "regcomp.c",
line 5098 at crash.pl line 5.

For bleadperl in git as of this writing
(6fea91d)​:
perl5.11.0​: regcomp.c​:5176​: Perl_reg_numbered_buff_fetch​: Assertion
`rx->sublen >= (s - rx->subbeg) + i' failed.

@p5pRT
Copy link
Author

p5pRT commented Apr 19, 2009

From lubo.rintel@gooddata.com

Bisected to​:

commit c74340f
Author​: Yves Orton <demerphq@​gmail.com>
Date​: Wed Nov 15 13​:29​:39 2006 +0100

  Re​: [PATCH] Fix RT#19049 and add relative backreferences
  Message-ID​:
<9b18b3110611150329l206e4552w887ae5f0a3f7ca80@​mail.gmail.com>
 
  p4raw-id​: //depot/perl@​29279

@p5pRT
Copy link
Author

p5pRT commented Apr 19, 2009

From @avar

On Sun, Apr 19, 2009 at 7​:27 AM, George Greer via RT
<perlbug-followup@​perl.org> wrote​:

I have distilled this assertion failure down to the following program​:

- - - 8< - - - 8< - - -
sub f { $_[0] =~ s/([&<>])/xxxxxx/g; }

$k1 = "<a> xxxxx x x <>";
f($k1);

$k2 = "pu\x{f1}al \x{2022}";
$k2 =~ s/([\200-\377])/"&#".ord($1).";"/ge;
f($k2);
- - - 8< - - - 8< - - -

Notes​:
1. The function call is important, but it doesn't matter if it is built
from a string eval'd sub, an anonymous sub, or a regular sub.
2. The \200-\377 replacement is also important, although I did not test
with the \x{f1} changed to outside that range.
3. The seemingly innocuous $k1 work is required, as well as the contents
of $k1 itself are important.
4. The length of the replacement in f() matters, as well as the /g.

For Perl 5.10.0 from Fedora (perl-5.10.0-56.fc10.i386)​:
Assertion rx->sublen >= (s - rx->subbeg) + i failed​: file "regcomp.c",
line 5098 at crash.pl line 5.

For bleadperl in git as of this writing
(6fea91d)​:
perl5.11.0​: regcomp.c​:5176​: Perl_reg_numbered_buff_fetch​: Assertion
`rx->sublen >= (s - rx->subbeg) + i' failed.

According to git bisect (bisect scripts attached)​:

c74340f is first bad commit
commit c74340f
Author​: Yves Orton <demerphq@​gmail.com>
Date​: Wed Nov 15 13​:29​:39 2006 +0100

  Re​: [PATCH] Fix RT#19049 and add relative backreferences
  Message-ID​: <9b18b3110611150329l206e4552w887ae5f0a3f7ca80@​mail.gmail.com>

  p4raw-id​: //depot/perl@​29279

:040000 040000 83188149c434a6cc022cceedd0d4f022869d37e9
c927650cf235b482e33028a4b0d60eed91e8c11b M ext
:040000 040000 c7f57d9a189a95bc7a6b7fa8b1b68e24f704d8af
702f39622af0427c6426ba58e6762e36f00288bc M pod
:100644 100644 9099194b070773063fc815d9c9b9de68422c1b87
6d916f1ff1fa552ccfa7cba5536fbcbada51ef0c M regcomp.c
:100644 100644 a0637a822f4ff0d41a81dbf743f9eab61f7c2efc
d547ff71a29fb1441d81faa7daab29d38805cf06 M regexec.c
:100644 100644 8d0868229068588c11c9d131d1a4c5462aedce62
d59fa832b26b48ab7b13cc309fd65a2d63ea3d27 M regexp.h
:040000 040000 ad6b74de72e11914a7f96973fba04a4f11781326
38db3cc9445020366949d615834b07dac267787d M t
bisect run success

@p5pRT
Copy link
Author

p5pRT commented Apr 19, 2009

From @avar

bisect.sh

@p5pRT
Copy link
Author

p5pRT commented Apr 19, 2009

From @avar

bisect.pl

@p5pRT
Copy link
Author

p5pRT commented Apr 19, 2009

From @avar

2009/4/19 Ævar Arnfjörð Bjarmason <avarab@​gmail.com>​:

On Sun, Apr 19, 2009 at 7​:27 AM, George Greer via RT
<perlbug-followup@​perl.org> wrote​:

I have distilled this assertion failure down to the following program​:

- - - 8< - - - 8< - - -
sub f { $_[0] =~ s/([&<>])/xxxxxx/g; }

$k1 = "<a> xxxxx x x <>";
f($k1);

$k2 = "pu\x{f1}al \x{2022}";
$k2 =~ s/([\200-\377])/"&#".ord($1).";"/ge;
f($k2);
- - - 8< - - - 8< - - -

Notes​:
1. The function call is important, but it doesn't matter if it is built
from a string eval'd sub, an anonymous sub, or a regular sub.
2. The \200-\377 replacement is also important, although I did not test
with the \x{f1} changed to outside that range.
3. The seemingly innocuous $k1 work is required, as well as the contents
of $k1 itself are important.
4. The length of the replacement in f() matters, as well as the /g.

For Perl 5.10.0 from Fedora (perl-5.10.0-56.fc10.i386)​:
Assertion rx->sublen >= (s - rx->subbeg) + i failed​: file "regcomp.c",
line 5098 at crash.pl line 5.

For bleadperl in git as of this writing
(6fea91d)​:
perl5.11.0​: regcomp.c​:5176​: Perl_reg_numbered_buff_fetch​: Assertion
`rx->sublen >= (s - rx->subbeg) + i' failed.

According to git bisect (bisect scripts attached)​:

c74340f is first bad commit
commit c74340f
Author​: Yves Orton <demerphq@​gmail.com>
Date​:   Wed Nov 15 13​:29​:39 2006 +0100

   Re​: [PATCH] Fix RT#19049 and add relative backreferences
   Message-ID​: <9b18b3110611150329l206e4552w887ae5f0a3f7ca80@​mail.gmail.com>

   p4raw-id​: //depot/perl@​29279

:040000 040000 83188149c434a6cc022cceedd0d4f022869d37e9
c927650cf235b482e33028a4b0d60eed91e8c11b M  ext
:040000 040000 c7f57d9a189a95bc7a6b7fa8b1b68e24f704d8af
702f39622af0427c6426ba58e6762e36f00288bc M  pod
:100644 100644 9099194b070773063fc815d9c9b9de68422c1b87
6d916f1ff1fa552ccfa7cba5536fbcbada51ef0c M  regcomp.c
:100644 100644 a0637a822f4ff0d41a81dbf743f9eab61f7c2efc
d547ff71a29fb1441d81faa7daab29d38805cf06 M  regexec.c
:100644 100644 8d0868229068588c11c9d131d1a4c5462aedce62
d59fa832b26b48ab7b13cc309fd65a2d63ea3d27 M  regexp.h
:040000 040000 ad6b74de72e11914a7f96973fba04a4f11781326
38db3cc9445020366949d615834b07dac267787d M  t
bisect run success

Hrm, actually that's a different failure, from git bisect output​:

Assertion rx->sublen >= (s - rx->subbeg) + i failed​: file "regcomp.c",
line 4775 at /home/avar/src/bisect.pl line 1.
Bisecting​: 455 revisions left to test after this
[5fe1b2e] Swap SVpbm_TAIL and SVpbm_VALID.

[...]

Assertion rx->sublen >= s1 failed​: file "mg.c", line 866 at
/home/avar/src/bisect.pl line 1.
Bisecting​: 227 revisions left to test after this
[cdfeb70] was RE​: Perl_die() /
Perl_croak() From​: "Robin Barker" <Robin.Barker@​npl.co.uk> Message-ID​:
<2C2E01334A940D4792B3\
E115F95B7226149377@​exchsvr1.npl.ad.local>

I.e. the revision I found has an assertion error too, but somewhere
between cdfeb70 and
[5fe1b2e we get a different assertion
error, which is the subject of this thread.

But I'm done bisecting for today :)

@p5pRT
Copy link
Author

p5pRT commented Apr 27, 2009

From @gannett-ggreer

More fun with recursive regex engine? The backtrace​:

- - - 8< - - - 8< - - -
perl5.11.0​: regcomp.c​:5176​: Perl_reg_numbered_buff_fetch​: Assertion
`rx->sublen >= (s - rx->subbeg) + i' failed.

Program received signal SIGABRT, Aborted.
0x009ab416 in __kernel_vsyscall ()
(gdb) bt
#0 0x009ab416 in __kernel_vsyscall ()
#1 0x004c0460 in raise (sig=6) at
../nptl/sysdeps/unix/sysv/linux/raise.c​:64
#2 0x004c1e28 in abort () at abort.c​:88
#3 0x004b940e in __assert_fail (
  assertion=0x83252c8 "rx->sublen >= (s - rx->subbeg) + i",
  file=0x8321f9e "regcomp.c", line=5176,
  function=0x83252a9 "Perl_reg_numbered_buff_fetch") at assert.c​:78
#4 0x08105b1f in Perl_reg_numbered_buff_fetch (r=0x83b9da0, paren=1,
  sv=0x83b9eb0) at regcomp.c​:5176
#5 0x08141747 in Perl_magic_get (sv=0x83b9eb0, mg=0x83bf638) at mg.c​:934
#6 0x0813e9c9 in Perl_mg_get (sv=0x83b9eb0) at mg.c​:221
#7 0x0820fc97 in Perl_save_scalar (gv=0x83b9ea0) at scope.c​:206
#8 0x0811dc98 in Perl_save_re_context () at regcomp.c​:9886
#9 0x082b780e in Perl_swash_init (pkg=0x83917bd "utf8", name=0x838d7bc "",
  listsv=0x83b9db0, minbits=1, none=0) at utf8.c​:1687
#10 0x082b2467 in Perl_regclass_swash (prog=0x83bdf48, node=0x83ad53c,
  doinit=1 '\001', listsvp=0x0, altsvp=0xbfffecf4) at regexec.c​:5733
#11 0x082b271b in S_reginclass (prog=0x83bdf48, n=0x83ad53c,
  p=0x83cd429 "▒\200▒", lenp=0x0, do_utf8=1 '\001') at regexec.c​:5792
#12 0x0829a9dc in S_find_byclass (prog=0x83bdf48, c=0x83ad53c,
  s=0x83cd429 "▒\200▒", strend=0x83cd42c "", reginfo=0xbffff0f0)
  at regexec.c​:1204
#13 0x082a22a8 in Perl_regexec_flags (rx=0x83b9da0,
  stringarg=0x83cd423 "241al ▒\200▒", strend=0x83cd42c "",
  strbeg=0x83cd420 "pu&241al ▒\200▒", minend=0, sv=0x83b9e40, data=0x0,
  flags=25) at regexec.c​:2089
#14 0x0817afe6 in Perl_pp_subst () at pp_hot.c​:2330
#15 0x0812da53 in Perl_runops_debug () at dump.c​:1981
#16 0x08086738 in S_run_body (oldscope=1) at perl.c​:2313
#17 0x08085c7b in perl_run (my_perl=0x83a6008) at perl.c​:2233
#18 0x0805e175 in main (argc=2, argv=0xbffff634, env=0xbffff640)
  at perlmain.c​:117
- - - 8< - - - 8< - - -

Some details at the time of the assertion​:

- - - 8< - - - 8< - - -
#4 0x08105b1f in Perl_reg_numbered_buff_fetch (r=0x83b9da0, paren=1,
  sv=0x83b9eb0) at regcomp.c​:5176
5176 assert(rx->sublen >= (s - rx->subbeg) + i );
(gdb) info locals
rx = (struct regexp * const) 0x83bdf48
s = 0x83cd42e "​:\b"
i = 1
s1 = 14
t1 = 15
__PRETTY_FUNCTION__ = "Perl_reg_numbered_buff_fetch"
(gdb) p *rx
$1 = {xnv_u = {xnv_nv = 8.6089369239461098e-311, xgv_stash = 0x0,
  xpad_cop_seq = {xlow = 0, xhigh = 4057}, xbm_s = {xbm_previous = 0,
  xbm_flags = 217 '▒', xbm_rare = 15 '\017'}}, xpv_cur = 16, xpv_len
= 20,
  xiv_u = {xivu_iv = 0, xivu_uv = 0, xivu_p1 = 0x0, xivu_i32 = 0,
  xivu_namehek = 0x0, xivu_hv = 0x0}, xmg_u = {xmg_magic = 0x0,
  xmg_ourstash = 0x0}, xmg_stash = 0x0, engine = 0x8321f00, mother_re
= 0x0,
  extflags = 34603008, minlen = 1, minlenret = 1, gofs = 0,
  substrs = 0x83ad600, nparens = 1, intflags = 0, pprivate = 0x83ad520,
  lastparen = 1, lastcloseparen = 1, swap = 0x83b4c40, offs = 0x83ad670,
  subbeg = 0x83cd420 "pu&241al ▒\200▒", sublen = 12, pre_prefix = 8,
  seen_evals = 0}
(gdb) p *rx->offs
$4 = {start = 14, end = 15}
- - - 8< - - - 8< - - -

Assert​: rx->sublen >= (s - rx->subbeg) + i
rx->sublen = 12
s - rx->subbeg = 14
i = 1
12 >= 14 + 1

Interestingly​:

- - - 8< - - - 8< - - -
(gdb) p *PL_reg_state.re_state_regoffs
$25 = {start = 2, end = 3}
- - - 8< - - - 8< - - -

which is what I would expect since "([&<>])" matches the & in
"pu&241..." there. Last bit of RE debug lines are​:

- - - 8< - - - 8< - - -
Matching REx "([&<>])" against "pu&241al %x{2022}"
UTF-8 string...
Matching stclass ANYOF[&<>][] against "pu&241al %x{2022}" (12 chars)
  2 <pu> <&241al > | 1​:OPEN1(3)
  2 <pu> <&241al > | 3​:ANYOF[&<>][](14)
  3 <pu&> <241al > | 14​:CLOSE1(16)
  3 <pu&> <241al > | 16​:END(0)
Match successful!
Matching REx "([&<>])" against "241al %x{2022}"
UTF-8 string...
Matching stclass ANYOF[&<>][] against "241al %x{2022}" (9 chars)
perl5.11.0​: regcomp.c​:5176​: Perl_reg_numbered_buff_fetch​: Assertion
`rx->sublen >= (s - rx->subbeg) + i' failed.
- - - 8< - - - 8< - - -

Something not restoring ->offs (or clobbering the saved one) or I'm
confused or both, which is likely since I'm new to digging into Perl
internals. "s" at least points off into la-la land.

@p5pRT
Copy link
Author

p5pRT commented May 20, 2009

From @gannett-ggreer

Attached a guess patch to fix versus blead. I don't guarantee anything
about memory leaks or thoroughness though.

Original bug report test case with blead+patch​:


17​:28​:55 ggreer@​ggreer-l​:~/projects/git/perl$ ./perl -Ilib
-I/home/ggreer/.cpan/build/XML-Twig-3.32-14eqF3/blib/lib
-I/home/ggreer/.cpan/build/XML-Parser-2.36-qYznQt/blib/arch
-I/home/ggreer/.cpan/build/XML-Parser-2.36-qYznQt/blib/lib
/tmp/ffff/parse.pl
parse_pmid​: PMID=17395669
parse_pmid​: PMID=17395670
parse_pmid​: PMID=17483125
Parsing complete.

Original bug report test case with Perl 5.10 (Fedora)​:


17​:29​:06 ggreer@​ggreer-l​:~/projects/git/perl$ perl /tmp/ffff/parse.pl
parse_pmid​: PMID=17395669
parse_pmid​: PMID=17395670
parse_pmid​: PMID=17483125
Assertion rx->sublen >= (s - rx->subbeg) + i failed​: file "regcomp.c",
line 5098 at /usr/lib/perl5/vendor_perl/5.10.0/XML/Twig.pm line 7806.
at /tmp/ffff/parse.pl line 76
at /tmp/ffff/parse.pl line 76

My bug report test case with blead+patch​:


17​:29​:14 ggreer@​ggreer-l​:~/projects/git/perl$ ./perl -CS -Ilib
~/tmp/perltestcase.pl
k1 = ....
k2.1 = >•
k2.2 = •

My bug report test case with Perl 5.10 (Fedora)​:


17​:35​:04 ggreer@​ggreer-l​:~/projects/git/perl$ perl -CS ~/tmp/perltestcase.pl
k1 = ....
k2.1 = >•
Assertion rx->sublen >= (s - rx->subbeg) + i failed​: file "regcomp.c",
line 5098 at /home/ggreer/tmp/perltestcase.pl line 2.

The above uses a slightly tweaked version of my test case​:
- - - 8< - - - 8< - - -
#!/usr/bin/perl
sub f { $_[0] =~ s/([>X])//g; }

$k1 = "." x 4 . ">>";
f($k1);
print "k1 = $k1\n";

$k2 = "\x{f1}\x{2022}";
$k2 =~ s/([\360-\362])/>/g;
print "k2.1 = $k2\n";
f($k2);
print "k2.2 = $k2\n";
- - - 8< - - - 8< - - -

make test says​:
All tests successful.
u=5.98 s=1.49 cu=578.88 cs=41.05 scripts=1566 tests=221188
(I think it is being generous.)

@p5pRT
Copy link
Author

p5pRT commented May 20, 2009

From @gannett-ggreer

60508.patch
diff --git a/proto.h b/proto.h
index 3f95eb5..35c104e 100644
--- a/proto.h
+++ b/proto.h
@@ -5431,11 +5431,6 @@ STATIC char*	S_find_byclass(pTHX_ regexp * prog, const regnode *c, char *s, cons
 #define PERL_ARGS_ASSERT_FIND_BYCLASS	\
 	assert(prog); assert(c); assert(s); assert(strend)
 
-STATIC void	S_swap_match_buff(pTHX_ regexp * prog)
-			__attribute__nonnull__(pTHX_1);
-#define PERL_ARGS_ASSERT_SWAP_MATCH_BUFF	\
-	assert(prog)
-
 STATIC void	S_to_utf8_substr(pTHX_ regexp * prog)
 			__attribute__nonnull__(pTHX_1);
 #define PERL_ARGS_ASSERT_TO_UTF8_SUBSTR	\
diff --git a/regcomp.c b/regcomp.c
index e061528..1b7eaee 100644
--- a/regcomp.c
+++ b/regcomp.c
@@ -9382,7 +9382,6 @@ Perl_pregfree2(pTHX_ REGEXP *rx)
     if (r->saved_copy)
         SvREFCNT_dec(r->saved_copy);
 #endif
-    Safefree(r->swap);
     Safefree(r->offs);
 }
 
@@ -9441,7 +9440,6 @@ Perl_reg_temp_copy (pTHX_ REGEXP *rx)
     ret->saved_copy = NULL;
 #endif
     ret->mother_re = rx;
-    ret->swap = NULL;
     
     return ret_x;
 }
@@ -9609,10 +9607,6 @@ Perl_re_dup_guts(pTHX_ const REGEXP *sstr, REGEXP *dstr, CLONE_PARAMS *param)
     npar = r->nparens+1;
     Newx(ret->offs, npar, regexp_paren_pair);
     Copy(r->offs, ret->offs, npar, regexp_paren_pair);
-    if(ret->swap) {
-        /* no need to copy these */
-        Newx(ret->swap, npar, regexp_paren_pair);
-    }
 
     if (ret->substrs) {
 	/* Do it this way to avoid reading from *r after the StructCopy().
diff --git a/regexec.c b/regexec.c
index 93fadab..c9358b4 100644
--- a/regexec.c
+++ b/regexec.c
@@ -1734,28 +1734,6 @@ S_find_byclass(pTHX_ regexp * prog, const regnode *c, char *s,
 	return s;
 }
 
-static void 
-S_swap_match_buff (pTHX_ regexp *prog)
-{
-    regexp_paren_pair *t;
-
-    PERL_ARGS_ASSERT_SWAP_MATCH_BUFF;
-
-    if (!prog->swap) {
-    /* We have to be careful. If the previous successful match
-       was from this regex we don't want a subsequent paritally
-       successful match to clobber the old results. 
-       So when we detect this possibility we add a swap buffer
-       to the re, and switch the buffer each match. If we fail
-       we switch it back, otherwise we leave it swapped.
-    */
-        Newxz(prog->swap, (prog->nparens + 1), regexp_paren_pair);
-    }
-    t = prog->swap;
-    prog->swap = prog->offs;
-    prog->offs = t;
-}    
-
 
 /*
  - regexec_flags - match a regexp against a string
@@ -1785,7 +1763,7 @@ Perl_regexec_flags(pTHX_ REGEXP * const rx, char *stringarg, register char *stre
     I32 multiline;
     RXi_GET_DECL(prog,progi);
     regmatch_info reginfo;  /* create some info to pass to regtry etc */
-    bool swap_on_fail = 0;
+    regexp_paren_pair *swap = NULL;
     GET_RE_DEBUG_FLAGS_DECL;
 
     PERL_ARGS_ASSERT_REGEXEC_FLAGS;
@@ -1863,9 +1841,19 @@ Perl_regexec_flags(pTHX_ REGEXP * const rx, char *stringarg, register char *stre
 	    reginfo.ganch = strbeg;
     }
     if (PL_curpm && (PM_GETRE(PL_curpm) == rx)) {
-        swap_on_fail = 1;
-        swap_match_buff(prog); /* do we need a save destructor here for
-                                  eval dies? */
+        regexp_paren_pair *t;
+        /* We have to be careful. If the previous successful match
+           was from this regex we don't want a subsequent partially
+           successful match to clobber the old results. 
+           So when we detect this possibility we add a swap buffer
+           to the re, and switch the buffer each match. If we fail
+           we switch it back, otherwise we leave it swapped.
+        */
+        /* do we need a save destructor here for eval dies? */
+        Newxz(swap, (prog->nparens + 1), regexp_paren_pair);
+        t = prog->offs;
+        prog->offs = swap;
+        swap = t;
     }
     if (!(flags & REXEC_CHECKED) && (prog->check_substr != NULL || prog->check_utf8 != NULL)) {
 	re_scream_pos_data d;
@@ -2166,6 +2154,9 @@ Perl_regexec_flags(pTHX_ REGEXP * const rx, char *stringarg, register char *stre
 got_it:
     RX_MATCH_TAINTED_set(rx, PL_reg_flags & RF_tainted);
 
+    /* Keep successful match buffer rather than the original one. */
+    Safefree(swap);
+
     if (PL_reg_eval_set)
 	restore_pos(aTHX_ prog);
     if (RXp_PAREN_NAMES(prog)) 
@@ -2209,9 +2200,13 @@ phooey:
 			  PL_colors[4], PL_colors[5]));
     if (PL_reg_eval_set)
 	restore_pos(aTHX_ prog);
-    if (swap_on_fail) 
+    if (swap)  {
+        regexp_paren_pair *t;
         /* we failed :-( roll it back */
-        swap_match_buff(prog);
+        t = prog->offs;
+        prog->offs = swap;
+        Safefree(t);
+    }
     
     return 0;
 }
diff --git a/regexp.h b/regexp.h
index e8a8cc0..7a2e917 100644
--- a/regexp.h
+++ b/regexp.h
@@ -88,7 +88,6 @@ typedef struct regexp_paren_pair {
 	/* during matching */						\
 	U32 lastparen;			/* last open paren matched */	\
 	U32 lastcloseparen;		/* last close paren matched */	\
-	regexp_paren_pair *swap;	/* Swap copy of *offs */	\
 	/* Array of offsets for (@-) and (@+) */			\
 	regexp_paren_pair *offs;					\
 	/* saved or original string so \digit works forever. */		\

@p5pRT
Copy link
Author

p5pRT commented May 28, 2009

From @nwc10

Dave notes​:

looks like a regression in 5.10.0, maint, blead.

Nick notes​:

There is a patch for review.

@p5pRT
Copy link
Author

p5pRT commented May 28, 2009

From @gannett-ggreer

On Thu May 28 07​:57​:21 2009, nicholas wrote​:

Nick notes​:

There is a patch for review.

I'm revising the patch to include more areas that need to be removed to
completely get rid of ->swap (such as pod entries) and also add the
test, but I suspect I'll need to review the pre-existing comment about
leaking memory on an eval die to be complete.

@p5pRT
Copy link
Author

p5pRT commented May 29, 2009

From @gannett-ggreer

Sent a revised patch to p5p although since I recreated it on a different
machine rather than edit the old patch, I forgot to remove it from the
regexp.h struct and so a couple vestigial parts are left. I'll re-edit
and attach to this ticket when finished.

@p5pRT
Copy link
Author

p5pRT commented Jun 21, 2009

From @rgs

Link to my last comment on P5P :
http​://www.xray.mpe.mpg.de/mailing-lists/perl5-porters/2009-06/msg00643.html

@p5pRT
Copy link
Author

p5pRT commented Jul 6, 2009

From @gannett-ggreer

http​://www.xray.mpe.mpg.de/mailing-lists/perl5-porters/2009-07/msg00161.html

@p5pRT
Copy link
Author

p5pRT commented Jul 12, 2009

From @schwern

Looks like this is hung waiting for review of George's last patch.
http​://www.xray.mpe.mpg.de/mailing-lists/perl5-porters/2009-07/msg00161.html

I've written up a test for it. I couldn't fathom what file to put it in
or even enough about what its about to come up with a good name for it,
so its just named reg_60508.t. Feel free to change.

There's also a small patch to fresh_perl_is(). It already strips the
trailing newlines off of the results to normalize them (VMS is
unpredictable about that) but it doesn't off the expected results. This
is surprising for the test author who's unaware of the stripping, so
fresh_perl_is() now strips the expected results to match.

@p5pRT
Copy link
Author

p5pRT commented Jul 12, 2009

From @schwern

0001-Make-fresh_perl_is-strip-newlines-off-the-expected.patch
From 326ac6e9382ab3a8a33ccb56c0f8b7e20d6043c3 Mon Sep 17 00:00:00 2001
From: Michael G. Schwern <schwern@pobox.com>
Date: Sun, 12 Jul 2009 02:30:26 -0700
Subject: [PATCH 1/2] Make fresh_perl_is() strip newlines off the expected result just like it does the result so tests don't weirdly fail just because the author didn't realize it was normalizing newlines.

---
 t/test.pl |    7 ++++++-
 1 files changed, 6 insertions(+), 1 deletions(-)

diff --git a/t/test.pl b/t/test.pl
index 32c4a37..4b2161f 100644
--- a/t/test.pl
+++ b/t/test.pl
@@ -681,7 +681,7 @@ sub _fresh_perl {
     my $status = $?;
 
     # Clean up the results into something a bit more predictable.
-    $results =~ s/\n+$//;
+    $results  =~ s/\n+$//;
     $results =~ s/at\s+$::tempfile_regexp\s+line/at - line/g;
     $results =~ s/of\s+$::tempfile_regexp\s+aborted/of - aborted/g;
 
@@ -722,6 +722,11 @@ sub _fresh_perl {
 
 sub fresh_perl_is {
     my($prog, $expected, $runperl_args, $name) = @_;
+
+    # _fresh_perl() is going to clip the trailing newlines off the result.
+    # This will make it so the test author doesn't have to know that.
+    $expected =~ s/\n+$//;
+
     local $Level = 2;
     _fresh_perl($prog,
 		sub { @_ ? $_[0] eq $expected : $expected },
-- 
1.6.2.4

@p5pRT
Copy link
Author

p5pRT commented Jul 12, 2009

From @schwern

0002-This-is-a-test-for-rt.cpan.org-60508-which-I-can-t-f.patch
From cec9ff832f6337e3ec1492742431f7813cfacaac Mon Sep 17 00:00:00 2001
From: Michael G. Schwern <schwern@pobox.com>
Date: Sun, 12 Jul 2009 02:33:47 -0700
Subject: [PATCH 2/2] This is a test for rt.cpan.org 60508 which I can't figure out where else
 to put it or what the underlying problem is, but it has to go somewhere.

---
 t/op/reg_60508.t |   40 ++++++++++++++++++++++++++++++++++++++++
 1 files changed, 40 insertions(+), 0 deletions(-)
 create mode 100644 t/op/reg_60508.t

diff --git a/t/op/reg_60508.t b/t/op/reg_60508.t
new file mode 100644
index 0000000..96a4fef
--- /dev/null
+++ b/t/op/reg_60508.t
@@ -0,0 +1,40 @@
+#!./perl
+
+# This is a test for rt.cpan.org 60508 which I can't figure out where else
+# to put it or what the underlying problem is, but it has to go somewhere.
+# --Schwern
+
+BEGIN {
+    chdir 't' if -d 't';
+    @INC = '../lib';
+    require './test.pl';
+}
+
+use utf8;
+plan tests => 1;
+
+{
+    my $expect = <<"EXPECT";
+k1 = ....
+k2.1 = >\x{2022}
+k2.2 = \x{2022}
+EXPECT
+    utf8::encode($expect);
+
+    local $TODO = "rt.cpan.org 60508";
+
+    fresh_perl_is(<<'CODE', $expect, {});
+binmode STDOUT, ":utf8";
+sub f { $_[0] =~ s/([>X])//g; }
+
+$k1 = "." x 4 . ">>";
+f($k1);
+print "k1 = $k1\n";
+
+$k2 = "\x{f1}\x{2022}";
+$k2 =~ s/([\360-\362])/>/g;
+print "k2.1 = $k2\n";
+f($k2);
+print "k2.2 = $k2\n";
+CODE
+}
-- 
1.6.2.4

@p5pRT
Copy link
Author

p5pRT commented Jul 12, 2009

From @gannett-ggreer

On Sun Jul 12 02​:59​:44 2009, schwern wrote​:

Looks like this is hung waiting for review of George's last patch.

http​://www.xray.mpe.mpg.de/mailing-lists/perl5-porters/2009-07/msg00161.html

Which reminded me to go make the 3rd revision of the patch which updates
the documentation and the two places that free/NULL that I missed in the
2nd revision because I hadn't (temporarily) removed the "swap" struct
definition to find lingering uses.

I've written up a test for it. I couldn't fathom what file to put it in
or even enough about what its about to come up with a good name for it,
so its just named reg_60508.t. Feel free to change.

It is probably better to use fresh_perl_is() like yours, but take a look
at the test I have in my patch. Figuring out how to get it to fail
without -DEBUGGING would be nice, since it relies on the assert() in the
code.

-George

@p5pRT
Copy link
Author

p5pRT commented Jul 12, 2009

From @gannett-ggreer

0001-Commit-c74340f9-added-backreferences-as-well-as-the.patch
From 57b3ac303670d51e25c82940e6cfa0ff0311bce3 Mon Sep 17 00:00:00 2001
From: George Greer <greerga@m-l.org>
Date: Sun, 12 Jul 2009 14:53:29 -0400
Subject: [PATCH] Commit c74340f9 added backreferences as well as the idea of a ->swap regex
 pointer to keep track of the match offsets in case of backtracking. The
 problem is that when Perl re-enters the regex engine to handle
 utf8::SWASHNEW, the ->swap is not saved/restored/cleared so any capture
 from the utf8 (Perl) code could inadvertently modify the regex match data
 that caused the utf8 swash to get built.

---
 embed.fnc                        |    1 -
 embed.h                          |    2 -
 ext/Devel-PPPort/parts/embed.fnc |    1 -
 pod/perlreapi.pod                |    2 +-
 pod/perlreguts.pod               |   13 +++++-----
 proto.h                          |    5 ----
 regcomp.c                        |    2 -
 regexec.c                        |   46 ++++++++++++++------------------------
 regexp.h                         |    2 +-
 t/op/pat.t                       |   20 +++++++++++++++-
 10 files changed, 44 insertions(+), 50 deletions(-)

diff --git a/embed.fnc b/embed.fnc
index 3ff1b89..6d89a40 100644
--- a/embed.fnc
+++ b/embed.fnc
@@ -1678,7 +1678,6 @@ ERsn	|U8*	|reghop4	|NN U8 *s|I32 off|NN const U8 *llim \
 #endif
 ERsn	|U8*	|reghopmaybe3	|NN U8 *s|I32 off|NN const U8 *lim
 ERs	|char*	|find_byclass	|NN regexp * prog|NN const regnode *c|NN char *s|NN const char *strend|NULLOK regmatch_info *reginfo
-Es	|void	|swap_match_buff|NN regexp * prog
 Es	|void	|to_utf8_substr	|NN regexp * prog
 Es	|void	|to_byte_substr	|NN regexp * prog
 ERs	|I32	|reg_check_named_buff_matched	|NN const regexp *rex \
diff --git a/embed.h b/embed.h
index 6f6877f..67c6fd2 100644
--- a/embed.h
+++ b/embed.h
@@ -1468,7 +1468,6 @@
 #if defined(PERL_CORE) || defined(PERL_EXT)
 #define reghopmaybe3		S_reghopmaybe3
 #define find_byclass		S_find_byclass
-#define swap_match_buff		S_swap_match_buff
 #define to_utf8_substr		S_to_utf8_substr
 #define to_byte_substr		S_to_byte_substr
 #define reg_check_named_buff_matched	S_reg_check_named_buff_matched
@@ -3814,7 +3813,6 @@
 #if defined(PERL_CORE) || defined(PERL_EXT)
 #define reghopmaybe3		S_reghopmaybe3
 #define find_byclass(a,b,c,d,e)	S_find_byclass(aTHX_ a,b,c,d,e)
-#define swap_match_buff(a)	S_swap_match_buff(aTHX_ a)
 #define to_utf8_substr(a)	S_to_utf8_substr(aTHX_ a)
 #define to_byte_substr(a)	S_to_byte_substr(aTHX_ a)
 #define reg_check_named_buff_matched(a,b)	S_reg_check_named_buff_matched(aTHX_ a,b)
diff --git a/ext/Devel-PPPort/parts/embed.fnc b/ext/Devel-PPPort/parts/embed.fnc
index 68f3817..48cb9f3 100644
--- a/ext/Devel-PPPort/parts/embed.fnc
+++ b/ext/Devel-PPPort/parts/embed.fnc
@@ -1677,7 +1677,6 @@ ERsn	|U8*	|reghop4	|NN U8 *s|I32 off|NN const U8 *llim \
 #endif
 ERsn	|U8*	|reghopmaybe3	|NN U8 *s|I32 off|NN const U8 *lim
 ERs	|char*	|find_byclass	|NN regexp * prog|NN const regnode *c|NN char *s|NN const char *strend|NULLOK regmatch_info *reginfo
-Es	|void	|swap_match_buff|NN regexp * prog
 Es	|void	|to_utf8_substr	|NN regexp * prog
 Es	|void	|to_byte_substr	|NN regexp * prog
 ERs	|I32	|reg_check_named_buff_matched	|NN const regexp *rex \
diff --git a/pod/perlreapi.pod b/pod/perlreapi.pod
index b0d6275..03996fd 100644
--- a/pod/perlreapi.pod
+++ b/pod/perlreapi.pod
@@ -598,7 +598,7 @@ engine should use something else.
 
 =head2 C<swap>
 
-TODO: document
+Unused. Left in for compatibility with perl 5.10.0.
 
 =head2 C<offs>
 
diff --git a/pod/perlreguts.pod b/pod/perlreguts.pod
index 2049931..b9f306a 100644
--- a/pod/perlreguts.pod
+++ b/pod/perlreguts.pod
@@ -810,13 +810,12 @@ value to other engine implementations.
 
 =item C<swap>
 
-C<swap> is an extra set of startp/endp stored in a C<regexp_paren_ofs>
-struct. This is used when the last successful match was from the same pattern
-as the current pattern, so that a partial match doesn't overwrite the
-previous match's results. When this field is data filled the matching
-engine will swap buffers before every match attempt. If the match fails,
-then it swaps them back. If it's successful it leaves them. This field
-is populated on demand and is by default null.
+C<swap> formerly was an extra set of startp/endp stored in a 
+C<regexp_paren_ofs> struct. This was used when the last successful match 
+was from the same pattern as the current pattern, so that a partial 
+match didn't overwrite the previous match's results, but it caused a 
+problem with re-entrant code such as trying to build the UTF-8 swashes.  
+Currently unused and left for backward compatibility with 5.10.0.
 
 =item C<offsets>
 
diff --git a/proto.h b/proto.h
index 427600e..37d1371 100644
--- a/proto.h
+++ b/proto.h
@@ -5434,11 +5434,6 @@ STATIC char*	S_find_byclass(pTHX_ regexp * prog, const regnode *c, char *s, cons
 #define PERL_ARGS_ASSERT_FIND_BYCLASS	\
 	assert(prog); assert(c); assert(s); assert(strend)
 
-STATIC void	S_swap_match_buff(pTHX_ regexp * prog)
-			__attribute__nonnull__(pTHX_1);
-#define PERL_ARGS_ASSERT_SWAP_MATCH_BUFF	\
-	assert(prog)
-
 STATIC void	S_to_utf8_substr(pTHX_ regexp * prog)
 			__attribute__nonnull__(pTHX_1);
 #define PERL_ARGS_ASSERT_TO_UTF8_SUBSTR	\
diff --git a/regcomp.c b/regcomp.c
index 50b0632..41211cf 100644
--- a/regcomp.c
+++ b/regcomp.c
@@ -9414,7 +9414,6 @@ Perl_pregfree2(pTHX_ REGEXP *rx)
     if (r->saved_copy)
         SvREFCNT_dec(r->saved_copy);
 #endif
-    Safefree(r->swap);
     Safefree(r->offs);
 }
 
@@ -9473,7 +9472,6 @@ Perl_reg_temp_copy (pTHX_ REGEXP *rx)
     ret->saved_copy = NULL;
 #endif
     ret->mother_re = rx;
-    ret->swap = NULL;
     
     return ret_x;
 }
diff --git a/regexec.c b/regexec.c
index 93fadab..dc2a01b 100644
--- a/regexec.c
+++ b/regexec.c
@@ -1734,28 +1734,6 @@ S_find_byclass(pTHX_ regexp * prog, const regnode *c, char *s,
 	return s;
 }
 
-static void 
-S_swap_match_buff (pTHX_ regexp *prog)
-{
-    regexp_paren_pair *t;
-
-    PERL_ARGS_ASSERT_SWAP_MATCH_BUFF;
-
-    if (!prog->swap) {
-    /* We have to be careful. If the previous successful match
-       was from this regex we don't want a subsequent paritally
-       successful match to clobber the old results. 
-       So when we detect this possibility we add a swap buffer
-       to the re, and switch the buffer each match. If we fail
-       we switch it back, otherwise we leave it swapped.
-    */
-        Newxz(prog->swap, (prog->nparens + 1), regexp_paren_pair);
-    }
-    t = prog->swap;
-    prog->swap = prog->offs;
-    prog->offs = t;
-}    
-
 
 /*
  - regexec_flags - match a regexp against a string
@@ -1785,7 +1763,7 @@ Perl_regexec_flags(pTHX_ REGEXP * const rx, char *stringarg, register char *stre
     I32 multiline;
     RXi_GET_DECL(prog,progi);
     regmatch_info reginfo;  /* create some info to pass to regtry etc */
-    bool swap_on_fail = 0;
+    regexp_paren_pair *swap = NULL;
     GET_RE_DEBUG_FLAGS_DECL;
 
     PERL_ARGS_ASSERT_REGEXEC_FLAGS;
@@ -1863,9 +1841,16 @@ Perl_regexec_flags(pTHX_ REGEXP * const rx, char *stringarg, register char *stre
 	    reginfo.ganch = strbeg;
     }
     if (PL_curpm && (PM_GETRE(PL_curpm) == rx)) {
-        swap_on_fail = 1;
-        swap_match_buff(prog); /* do we need a save destructor here for
-                                  eval dies? */
+        /* We have to be careful. If the previous successful match
+           was from this regex we don't want a subsequent partially
+           successful match to clobber the old results.
+           So when we detect this possibility we add a swap buffer
+           to the re, and switch the buffer each match. If we fail
+           we switch it back, otherwise we leave it swapped.
+        */
+        swap = prog->offs;
+        /* do we need a save destructor here for eval dies? */
+        Newxz(prog->offs, (prog->nparens + 1), regexp_paren_pair);
     }
     if (!(flags & REXEC_CHECKED) && (prog->check_substr != NULL || prog->check_utf8 != NULL)) {
 	re_scream_pos_data d;
@@ -2164,6 +2149,7 @@ Perl_regexec_flags(pTHX_ REGEXP * const rx, char *stringarg, register char *stre
     goto phooey;
 
 got_it:
+    Safefree(swap);
     RX_MATCH_TAINTED_set(rx, PL_reg_flags & RF_tainted);
 
     if (PL_reg_eval_set)
@@ -2209,10 +2195,12 @@ phooey:
 			  PL_colors[4], PL_colors[5]));
     if (PL_reg_eval_set)
 	restore_pos(aTHX_ prog);
-    if (swap_on_fail) 
+    if (swap) {
         /* we failed :-( roll it back */
-        swap_match_buff(prog);
-    
+        Safefree(prog->offs);
+        prog->offs = swap;
+    }
+
     return 0;
 }
 
diff --git a/regexp.h b/regexp.h
index e8a8cc0..6bac7e6 100644
--- a/regexp.h
+++ b/regexp.h
@@ -88,7 +88,7 @@ typedef struct regexp_paren_pair {
 	/* during matching */						\
 	U32 lastparen;			/* last open paren matched */	\
 	U32 lastcloseparen;		/* last close paren matched */	\
-	regexp_paren_pair *swap;	/* Swap copy of *offs */	\
+	regexp_paren_pair *swap;	/* Unused: 5.10.1 and later */	\
 	/* Array of offsets for (@-) and (@+) */			\
 	regexp_paren_pair *offs;					\
 	/* saved or original string so \digit works forever. */		\
diff --git a/t/op/pat.t b/t/op/pat.t
index 62ca4b2..53b4477 100644
--- a/t/op/pat.t
+++ b/t/op/pat.t
@@ -13,7 +13,7 @@ sub run_tests;
 
 $| = 1;
 
-my $EXPECTED_TESTS = 4061;  # Update this when adding/deleting tests.
+my $EXPECTED_TESTS = 4062;  # Update this when adding/deleting tests.
 
 BEGIN {
     chdir 't' if -d 't';
@@ -4346,6 +4346,24 @@ sub run_tests {
             iseq($str, "\$1 = undef, \$2 = undef, \$3 = undef, \$4 = undef, \$5 = undef, \$^R = undef");
        }
     }
+
+    # This only works under -DEBUGGING because it relies on an assert().
+    {
+        local $BugId = '60508';
+	local $Message = "Check capture offset re-entrancy of utf8 code.";
+
+        sub fswash { $_[0] =~ s/([>X])//g; }
+
+        my $k1 = "." x 4 . ">>";
+        fswash($k1);
+
+        my $k2 = "\x{f1}\x{2022}";
+        $k2 =~ s/([\360-\362])/>/g;
+        fswash($k2);
+
+        iseq($k2, "\x{2022}", "utf8::SWASHNEW doesn't cause capture leaks");
+    }
+
     #
     # This should be the last test.
     #
-- 
1.6.0.4

@p5pRT
Copy link
Author

p5pRT commented Jul 12, 2009

From @schwern

George Greer via RT wrote​:

On Sun Jul 12 02​:59​:44 2009, schwern wrote​:

Looks like this is hung waiting for review of George's last patch.

http​://www.xray.mpe.mpg.de/mailing-lists/perl5-porters/2009-07/msg00161.html

Which reminded me to go make the 3rd revision of the patch which updates
the documentation and the two places that free/NULL that I missed in the
2nd revision because I hadn't (temporarily) removed the "swap" struct
definition to find lingering uses.

Thanks.

I've written up a test for it. I couldn't fathom what file to put it in
or even enough about what its about to come up with a good name for it,
so its just named reg_60508.t. Feel free to change.

It is probably better to use fresh_perl_is() like yours, but take a look
at the test I have in my patch. Figuring out how to get it to fail
without -DEBUGGING would be nice, since it relies on the assert() in the
code.

What, in English or Perl, does that assert failing indicate has gone wrong?

--
'All anyone gets in a mirror is themselves,' she said. 'But what you
gets in a good gumbo is everything.'
  -- "Witches Abroad" by Terry Prachett

@p5pRT
Copy link
Author

p5pRT commented Jul 13, 2009

From @greerga

On Sun, 12 Jul 2009, Michael G Schwern wrote​:

George Greer via RT wrote​:

It is probably better to use fresh_perl_is() like yours, but take a look
at the test I have in my patch. Figuring out how to get it to fail
without -DEBUGGING would be nice, since it relies on the assert() in the
code.

What, in English or Perl, does that assert failing indicate has gone wrong?

It means the paren-capture offsets point outside the match string. For
example, one of the assertion failures in the ticket​:

(gdb) info locals
  s = 0x83cd42e "​:\b"
  i = 1
  s1 = 14
  t1 = 15
gdb) p *rx
  ...
  subbeg = 0x83cd420 "pu&241al ▒\200▒", sublen = 12
  ...

which results in​:

Assert​: rx->sublen >= (s - rx->subbeg) + i
  rx->sublen = 12
  s - rx->subbeg = 14
  i = 1
  12 >= 14 + 1

In the case above, the utf8 SWASHNEW Perl code is where the lingering
match came from and has nothing to do with the user match that the code
was trying to do.

Unfortunately the assert only notices if the capture is later in the
string than the current match target, so there are cases that
(theoretically) garbage could result.

--
George Greer

@p5pRT
Copy link
Author

p5pRT commented Jul 22, 2009

From @demerphq

On Sun Jul 12 02​:59​:44 2009, schwern wrote​:

Looks like this is hung waiting for review of George's last patch.

http​://www.xray.mpe.mpg.de/mailing-lists/perl5-porters/2009-07/msg00161.html

I've written up a test for it. I couldn't fathom what file to put it in
or even enough about what its about to come up with a good name for it,
so its just named reg_60508.t. Feel free to change.

There's also a small patch to fresh_perl_is(). It already strips the
trailing newlines off of the results to normalize them (VMS is
unpredictable about that) but it doesn't off the expected results. This
is surprising for the test author who's unaware of the stripping, so
fresh_perl_is() now strips the expected results to match.

Thanks. Applied.

@p5pRT
Copy link
Author

p5pRT commented Jul 26, 2009

From @demerphq

2009/7/12 George Greer via RT <perlbug-followup@​perl.org>​:

On Sun Jul 12 02​:59​:44 2009, schwern wrote​:

Looks like this is hung waiting for review of George's last patch.

http​://www.xray.mpe.mpg.de/mailing-lists/perl5-porters/2009-07/msg00161.html

Which reminded me to go make the 3rd revision of the patch which updates
the documentation and the two places that free/NULL that I missed in the
2nd revision because I hadn't (temporarily) removed the "swap" struct
definition to find lingering uses.

I've written up a test for it.  I couldn't fathom what file to put it in
or even enough about what its about to come up with a good name for it,
so its just named reg_60508.t.  Feel free to change.

It is probably better to use fresh_perl_is() like yours, but take a look
at the test I have in my patch.  Figuring out how to get it to fail
without -DEBUGGING would be nice, since it relies on the assert() in the
code.

Hi, sorry it took so long to get to this, but I have applied it as

7a68ade9729c0afc78c5f9cbadc5c77928cfedb8

with some minor commit message munging.

Thanks a lot! I think your solution makes more sense than mine did.

Cheers
yves

--
perl -Mre=debug -e "/just|another|perl|hacker/"

@p5pRT
Copy link
Author

p5pRT commented Jul 27, 2009

From @iabyn

On Sun, Jul 26, 2009 at 11​:32​:27PM +0200, demerphq wrote​:

Hi, sorry it took so long to get to this, but I have applied it as

7a68ade9729c0afc78c5f9cbadc5c77928cfedb8

With 03fbf60b09ed44cc7d2e021c979a259636ae4488, I've backed out the change
to ext/Devel-PPPort/parts/embed.fnc, since this is a dual-life distro
whose changes should be made upstream first (CCing marcus).

--
A power surge on the Bridge is rapidly and correctly diagnosed as a faulty
capacitor by the highly-trained and competent engineering staff.
  -- Things That Never Happen in "Star Trek" #9

@p5pRT
Copy link
Author

p5pRT commented Jul 27, 2009

From @iabyn

On Sun, Jul 26, 2009 at 11​:32​:27PM +0200, demerphq wrote​:

2009/7/12 George Greer via RT <perlbug-followup@​perl.org>​:

On Sun Jul 12 02​:59​:44 2009, schwern wrote​:

Looks like this is hung waiting for review of George's last patch.

http​://www.xray.mpe.mpg.de/mailing-lists/perl5-porters/2009-07/msg00161.html

Which reminded me to go make the 3rd revision of the patch which updates
the documentation and the two places that free/NULL that I missed in the
2nd revision because I hadn't (temporarily) removed the "swap" struct
definition to find lingering uses.

I've written up a test for it.  I couldn't fathom what file to put it in
or even enough about what its about to come up with a good name for it,
so its just named reg_60508.t.  Feel free to change.

It is probably better to use fresh_perl_is() like yours, but take a look
at the test I have in my patch.  Figuring out how to get it to fail
without -DEBUGGING would be nice, since it relies on the assert() in the
code.

Hi, sorry it took so long to get to this, but I have applied it as

7a68ade9729c0afc78c5f9cbadc5c77928cfedb8

I'm having great difficulty pulling this into maint; specifically the
regexec.c part, due the changes in blead (but not maint) that made REGEXPs
into SVs; for example, the maint​:

  Perl_regexec_flags(pTHX_ REGEXP * const prog, ...

becomes the blead​:

  Perl_regexec_flags(pTHX_ REGEXP * const rx, ...
  ...
  struct regexp *const prog = (struct regexp *)SvANY(rx);

then theres lots of places in the function which sometimes refer to rx and
sometimes to prog, and I think it would require someone with a clearer
head than what mine is at the moment to work it all out :-(

--
"Do not dabble in paradox, Edward, it puts you in danger of fortuitous wit."
  -- Lady Croom, "Arcadia"

@p5pRT
Copy link
Author

p5pRT commented Jul 27, 2009

From @demerphq

2009/7/27 Dave Mitchell <davem@​iabyn.com>​:

On Sun, Jul 26, 2009 at 11​:32​:27PM +0200, demerphq wrote​:

2009/7/12 George Greer via RT <perlbug-followup@​perl.org>​:

On Sun Jul 12 02​:59​:44 2009, schwern wrote​:

Looks like this is hung waiting for review of George's last patch.

http​://www.xray.mpe.mpg.de/mailing-lists/perl5-porters/2009-07/msg00161.html

Which reminded me to go make the 3rd revision of the patch which updates
the documentation and the two places that free/NULL that I missed in the
2nd revision because I hadn't (temporarily) removed the "swap" struct
definition to find lingering uses.

I've written up a test for it.  I couldn't fathom what file to put it in
or even enough about what its about to come up with a good name for it,
so its just named reg_60508.t.  Feel free to change.

It is probably better to use fresh_perl_is() like yours, but take a look
at the test I have in my patch.  Figuring out how to get it to fail
without -DEBUGGING would be nice, since it relies on the assert() in the
code.

Hi, sorry it took so long to get to this, but I have applied it as

7a68ade9729c0afc78c5f9cbadc5c77928cfedb8

I'm having great difficulty pulling this into maint; specifically the
regexec.c part, due the changes in blead (but not maint) that made REGEXPs
into SVs; for example, the maint​:

   Perl_regexec_flags(pTHX_ REGEXP * const prog, ...

becomes the blead​:

   Perl_regexec_flags(pTHX_ REGEXP * const rx, ...
   ...
   struct regexp *const prog = (struct regexp *)SvANY(rx);

then theres lots of places in the function which sometimes refer to rx and
sometimes to prog, and I think it would require someone with a clearer
head than what mine is at the moment to work it all out :-(

Ill give it a try tomorrow if George doesnt beat me to it.

cheers,
Yves

--
perl -Mre=debug -e "/just|another|perl|hacker/"

@p5pRT
Copy link
Author

p5pRT commented Aug 3, 2009

From @greerga

On Mon, 27 Jul 2009, Dave Mitchell wrote​:

I'm having great difficulty pulling this into maint; specifically the
regexec.c part, due the changes in blead (but not maint) that made REGEXPs
into SVs; for example, the maint​:

Perl_regexec_flags(pTHX_ REGEXP * const prog, ...

becomes the blead​:

Perl_regexec_flags(pTHX_ REGEXP * const rx, ...
...
struct regexp *const prog = (struct regexp *)SvANY(rx);

then theres lots of places in the function which sometimes refer to rx and
sometimes to prog, and I think it would require someone with a clearer
head than what mine is at the moment to work it all out :-(

Doing a straight port and fixing up the conflicts passes the tests
(perhaps by luck?). I've attached the patch, which has the commit message
edited with a proper 1-line summary like the blead version did.

It has the ext/Devel-PPPort/ part again though.

--
George Greer

@p5pRT
Copy link
Author

p5pRT commented Aug 3, 2009

From @greerga

0001-much-better-swap-logic-to-support-reentrancy-and-fix.patch
From 8529230764704158520dab3daf6d8ac92fb46e06 Mon Sep 17 00:00:00 2001
From: George Greer <perl@greerga.m-l.org>
Date: Sun, 2 Aug 2009 20:05:09 -0400
Subject: [PATCH] much better swap logic to support reentrancy and fix assert failure

Commit c74340f9 added backreferences as well as the idea of a ->swap regex
pointer to keep track of the match offsets in case of backtracking. The
problem is that when Perl re-enters the regex engine to handle
utf8::SWASHNEW, the ->swap is not saved/restored/cleared so any capture
from the utf8 (Perl) code could inadvertently modify the regex match data
that caused the utf8 swash to get built.
---
 embed.fnc                        |    1 -
 embed.h                          |    2 -
 ext/Devel-PPPort/parts/embed.fnc |    1 -
 pod/perlreapi.pod                |    2 +-
 pod/perlreguts.pod               |   13 +++++-----
 proto.h                          |    5 ----
 regcomp.c                        |    6 -----
 regexec.c                        |   46 ++++++++++++++------------------------
 regexp.h                         |    2 +-
 t/op/pat.t                       |   20 +++++++++++++++-
 10 files changed, 44 insertions(+), 54 deletions(-)

diff --git a/embed.fnc b/embed.fnc
index 6cddb88..168aa93 100644
--- a/embed.fnc
+++ b/embed.fnc
@@ -1658,7 +1658,6 @@ ERsn	|U8*	|reghop4	|NN U8 *s|I32 off|NN const U8 *llim \
 #endif
 ERsn	|U8*	|reghopmaybe3	|NN U8 *s|I32 off|NN const U8 *lim
 ERs	|char*	|find_byclass	|NN regexp * prog|NN const regnode *c|NN char *s|NN const char *strend|NULLOK regmatch_info *reginfo
-Es	|void	|swap_match_buff|NN regexp * prog
 Es	|void	|to_utf8_substr	|NN regexp * prog
 Es	|void	|to_byte_substr	|NN regexp * prog
 ERs	|I32	|reg_check_named_buff_matched	|NN const regexp *rex \
diff --git a/embed.h b/embed.h
index e968707..5ec3a05 100644
--- a/embed.h
+++ b/embed.h
@@ -1452,7 +1452,6 @@
 #if defined(PERL_CORE) || defined(PERL_EXT)
 #define reghopmaybe3		S_reghopmaybe3
 #define find_byclass		S_find_byclass
-#define swap_match_buff		S_swap_match_buff
 #define to_utf8_substr		S_to_utf8_substr
 #define to_byte_substr		S_to_byte_substr
 #define reg_check_named_buff_matched	S_reg_check_named_buff_matched
@@ -3783,7 +3782,6 @@
 #if defined(PERL_CORE) || defined(PERL_EXT)
 #define reghopmaybe3		S_reghopmaybe3
 #define find_byclass(a,b,c,d,e)	S_find_byclass(aTHX_ a,b,c,d,e)
-#define swap_match_buff(a)	S_swap_match_buff(aTHX_ a)
 #define to_utf8_substr(a)	S_to_utf8_substr(aTHX_ a)
 #define to_byte_substr(a)	S_to_byte_substr(aTHX_ a)
 #define reg_check_named_buff_matched(a,b)	S_reg_check_named_buff_matched(aTHX_ a,b)
diff --git a/ext/Devel-PPPort/parts/embed.fnc b/ext/Devel-PPPort/parts/embed.fnc
index 68f3817..48cb9f3 100644
--- a/ext/Devel-PPPort/parts/embed.fnc
+++ b/ext/Devel-PPPort/parts/embed.fnc
@@ -1677,7 +1677,6 @@ ERsn	|U8*	|reghop4	|NN U8 *s|I32 off|NN const U8 *llim \
 #endif
 ERsn	|U8*	|reghopmaybe3	|NN U8 *s|I32 off|NN const U8 *lim
 ERs	|char*	|find_byclass	|NN regexp * prog|NN const regnode *c|NN char *s|NN const char *strend|NULLOK regmatch_info *reginfo
-Es	|void	|swap_match_buff|NN regexp * prog
 Es	|void	|to_utf8_substr	|NN regexp * prog
 Es	|void	|to_byte_substr	|NN regexp * prog
 ERs	|I32	|reg_check_named_buff_matched	|NN const regexp *rex \
diff --git a/pod/perlreapi.pod b/pod/perlreapi.pod
index b0d6275..03996fd 100644
--- a/pod/perlreapi.pod
+++ b/pod/perlreapi.pod
@@ -598,7 +598,7 @@ engine should use something else.
 
 =head2 C<swap>
 
-TODO: document
+Unused. Left in for compatibility with perl 5.10.0.
 
 =head2 C<offs>
 
diff --git a/pod/perlreguts.pod b/pod/perlreguts.pod
index 2049931..b9f306a 100644
--- a/pod/perlreguts.pod
+++ b/pod/perlreguts.pod
@@ -810,13 +810,12 @@ value to other engine implementations.
 
 =item C<swap>
 
-C<swap> is an extra set of startp/endp stored in a C<regexp_paren_ofs>
-struct. This is used when the last successful match was from the same pattern
-as the current pattern, so that a partial match doesn't overwrite the
-previous match's results. When this field is data filled the matching
-engine will swap buffers before every match attempt. If the match fails,
-then it swaps them back. If it's successful it leaves them. This field
-is populated on demand and is by default null.
+C<swap> formerly was an extra set of startp/endp stored in a 
+C<regexp_paren_ofs> struct. This was used when the last successful match 
+was from the same pattern as the current pattern, so that a partial 
+match didn't overwrite the previous match's results, but it caused a 
+problem with re-entrant code such as trying to build the UTF-8 swashes.  
+Currently unused and left for backward compatibility with 5.10.0.
 
 =item C<offsets>
 
diff --git a/proto.h b/proto.h
index ccb9eb3..835aa3b 100644
--- a/proto.h
+++ b/proto.h
@@ -5448,11 +5448,6 @@ STATIC char*	S_find_byclass(pTHX_ regexp * prog, const regnode *c, char *s, cons
 #define PERL_ARGS_ASSERT_FIND_BYCLASS	\
 	assert(prog); assert(c); assert(s); assert(strend)
 
-STATIC void	S_swap_match_buff(pTHX_ regexp * prog)
-			__attribute__nonnull__(pTHX_1);
-#define PERL_ARGS_ASSERT_SWAP_MATCH_BUFF	\
-	assert(prog)
-
 STATIC void	S_to_utf8_substr(pTHX_ regexp * prog)
 			__attribute__nonnull__(pTHX_1);
 #define PERL_ARGS_ASSERT_TO_UTF8_SUBSTR	\
diff --git a/regcomp.c b/regcomp.c
index 49e69b2..372c680 100644
--- a/regcomp.c
+++ b/regcomp.c
@@ -9361,7 +9361,6 @@ Perl_pregfree(pTHX_ REGEXP *r)
     if (r->saved_copy)
         SvREFCNT_dec(r->saved_copy);
 #endif
-    Safefree(r->swap);
     Safefree(r->offs);
     Safefree(r);
 }
@@ -9413,7 +9412,6 @@ Perl_reg_temp_copy (pTHX_ REGEXP *r) {
     ret->saved_copy = NULL;
 #endif
     ret->mother_re = r; 
-    ret->swap = NULL;
     
     return ret;
 }
@@ -9588,10 +9586,6 @@ Perl_re_dup(pTHX_ const regexp *r, CLONE_PARAMS *param)
     StructCopy(r, ret, regexp);
     Newx(ret->offs, npar, regexp_paren_pair);
     Copy(r->offs, ret->offs, npar, regexp_paren_pair);
-    if(ret->swap) {
-        /* no need to copy these */
-        Newx(ret->swap, npar, regexp_paren_pair);
-    }
 
     if (ret->substrs) {
 	/* Do it this way to avoid reading from *r after the StructCopy().
diff --git a/regexec.c b/regexec.c
index 7a42c4f..3dda67b 100644
--- a/regexec.c
+++ b/regexec.c
@@ -1733,28 +1733,6 @@ S_find_byclass(pTHX_ regexp * prog, const regnode *c, char *s,
 	return s;
 }
 
-static void 
-S_swap_match_buff (pTHX_ regexp *prog)
-{
-    regexp_paren_pair *t;
-
-    PERL_ARGS_ASSERT_SWAP_MATCH_BUFF;
-
-    if (!prog->swap) {
-    /* We have to be careful. If the previous successful match
-       was from this regex we don't want a subsequent paritally
-       successful match to clobber the old results. 
-       So when we detect this possibility we add a swap buffer
-       to the re, and switch the buffer each match. If we fail
-       we switch it back, otherwise we leave it swapped.
-    */
-        Newxz(prog->swap, (prog->nparens + 1), regexp_paren_pair);
-    }
-    t = prog->swap;
-    prog->swap = prog->offs;
-    prog->offs = t;
-}    
-
 
 /*
  - regexec_flags - match a regexp against a string
@@ -1783,7 +1761,7 @@ Perl_regexec_flags(pTHX_ REGEXP * const prog, char *stringarg, register char *st
     I32 multiline;
     RXi_GET_DECL(prog,progi);
     regmatch_info reginfo;  /* create some info to pass to regtry etc */
-    bool swap_on_fail = 0;
+    regexp_paren_pair *swap = NULL;
     GET_RE_DEBUG_FLAGS_DECL;
 
     PERL_ARGS_ASSERT_REGEXEC_FLAGS;
@@ -1861,9 +1839,16 @@ Perl_regexec_flags(pTHX_ REGEXP * const prog, char *stringarg, register char *st
 	    reginfo.ganch = strbeg;
     }
     if (PL_curpm && (PM_GETRE(PL_curpm) == prog)) {
-        swap_on_fail = 1;
-        swap_match_buff(prog); /* do we need a save destructor here for
-                                  eval dies? */
+	/* We have to be careful. If the previous successful match
+	   was from this regex we don't want a subsequent partially
+	   successful match to clobber the old results.
+	   So when we detect this possibility we add a swap buffer
+	   to the re, and switch the buffer each match. If we fail
+	   we switch it back, otherwise we leave it swapped.
+	*/
+	swap = prog->offs;
+	/* do we need a save destructor here for eval dies? */
+	Newxz(prog->offs, (prog->nparens + 1), regexp_paren_pair);
     }
     if (!(flags & REXEC_CHECKED) && (prog->check_substr != NULL || prog->check_utf8 != NULL)) {
 	re_scream_pos_data d;
@@ -2162,6 +2147,7 @@ Perl_regexec_flags(pTHX_ REGEXP * const prog, char *stringarg, register char *st
     goto phooey;
 
 got_it:
+    Safefree(swap);
     RX_MATCH_TAINTED_set(prog, PL_reg_flags & RF_tainted);
 
     if (PL_reg_eval_set)
@@ -2207,10 +2193,12 @@ phooey:
 			  PL_colors[4], PL_colors[5]));
     if (PL_reg_eval_set)
 	restore_pos(aTHX_ prog);
-    if (swap_on_fail) 
+    if (swap) {
         /* we failed :-( roll it back */
-        swap_match_buff(prog);
-    
+        Safefree(prog->offs);
+        prog->offs = swap;
+    }
+
     return 0;
 }
 
diff --git a/regexp.h b/regexp.h
index a1417af..6cd2e7b 100644
--- a/regexp.h
+++ b/regexp.h
@@ -88,7 +88,7 @@ typedef struct regexp {
         /* Data about the last/current match. These are modified during matching*/
         U32 lastparen;		/* last open paren matched */
 	U32 lastcloseparen;	/* last close paren matched */
-        regexp_paren_pair *swap;  /* Swap copy of *offs */ 
+        regexp_paren_pair *swap;  /* Unused: 5.10.1 and later */
         regexp_paren_pair *offs;  /* Array of offsets for (@-) and (@+) */
 
 	char *subbeg;		/* saved or original string 
diff --git a/t/op/pat.t b/t/op/pat.t
index 0b2c729..5902c96 100644
--- a/t/op/pat.t
+++ b/t/op/pat.t
@@ -13,7 +13,7 @@ sub run_tests;
 
 $| = 1;
 
-my $EXPECTED_TESTS = 4065;  # Update this when adding/deleting tests.
+my $EXPECTED_TESTS = 4066;  # Update this when adding/deleting tests.
 
 BEGIN {
     chdir 't' if -d 't';
@@ -4363,6 +4363,24 @@ sub run_tests {
 	    ok($s =~ $pat, $pat);
 	}
     }
+
+    # This only works under -DEBUGGING because it relies on an assert().
+    {
+        local $BugId = '60508';
+	local $Message = "Check capture offset re-entrancy of utf8 code.";
+
+        sub fswash { $_[0] =~ s/([>X])//g; }
+
+        my $k1 = "." x 4 . ">>";
+        fswash($k1);
+
+        my $k2 = "\x{f1}\x{2022}";
+        $k2 =~ s/([\360-\362])/>/g;
+        fswash($k2);
+
+        iseq($k2, "\x{2022}", "utf8::SWASHNEW doesn't cause capture leaks");
+    }
+
     #
     # This should be the last test.
     #
-- 
1.6.0.4

@p5pRT
Copy link
Author

p5pRT commented Aug 3, 2009

From @iabyn

On Sun, Aug 02, 2009 at 08​:12​:37PM -0400, George Greer wrote​:

On Mon, 27 Jul 2009, Dave Mitchell wrote​:

I'm having great difficulty pulling this into maint; specifically the
regexec.c part, due the changes in blead (but not maint) that made REGEXPs
into SVs; for example, the maint​:

Perl_regexec_flags(pTHX_ REGEXP * const prog, ...

becomes the blead​:

Perl_regexec_flags(pTHX_ REGEXP * const rx, ...
...
struct regexp *const prog = (struct regexp *)SvANY(rx);

then theres lots of places in the function which sometimes refer to rx and
sometimes to prog, and I think it would require someone with a clearer
head than what mine is at the moment to work it all out :-(

Doing a straight port and fixing up the conflicts passes the tests
(perhaps by luck?). I've attached the patch, which has the commit
message edited with a proper 1-line summary like the blead version did.

Thanks, but given the closeness to 5.10.1 release, and the risk of this
patch, I think I'll defer it till after 5.10.1.

--
A walk of a thousand miles begins with a single step...
then continues for another 1,999,999 or so.

@p5pRT
Copy link
Author

p5pRT commented Sep 2, 2009

From @demerphq

2009/8/3 Dave Mitchell <davem@​iabyn.com>​:

On Sun, Aug 02, 2009 at 08​:12​:37PM -0400, George Greer wrote​:

On Mon, 27 Jul 2009, Dave Mitchell wrote​:

I'm having great difficulty pulling this into maint; specifically the
regexec.c part, due the changes in blead (but not maint) that made REGEXPs
into SVs; for example, the maint​:

   Perl_regexec_flags(pTHX_ REGEXP * const prog, ...

becomes the blead​:

   Perl_regexec_flags(pTHX_ REGEXP * const rx, ...
   ...
   struct regexp *const prog = (struct regexp *)SvANY(rx);

then theres lots of places in the function which sometimes refer to rx and
sometimes to prog, and I think it would require someone with a clearer
head than what mine is at the moment to work it all out :-(

Doing a straight port and fixing up the conflicts passes the tests
(perhaps by luck?).  I've attached the patch, which has the commit
message edited with a proper 1-line summary like the blead version did.

Thanks, but given the closeness to 5.10.1 release, and the risk of this
patch, I think I'll defer it till after 5.10.1.

So is it ok to go with it now?

Yves

--
perl -Mre=debug -e "/just|another|perl|hacker/"

@p5pRT
Copy link
Author

p5pRT commented Sep 2, 2009

From @iabyn

On Wed, Sep 02, 2009 at 06​:19​:14PM +0200, demerphq wrote​:

2009/8/3 Dave Mitchell <davem@​iabyn.com>​:

On Sun, Aug 02, 2009 at 08​:12​:37PM -0400, George Greer wrote​:

On Mon, 27 Jul 2009, Dave Mitchell wrote​:

I'm having great difficulty pulling this into maint; specifically the
regexec.c part, due the changes in blead (but not maint) that made REGEXPs
into SVs; for example, the maint​:

   Perl_regexec_flags(pTHX_ REGEXP * const prog, ...

becomes the blead​:

   Perl_regexec_flags(pTHX_ REGEXP * const rx, ...
   ...
   struct regexp *const prog = (struct regexp *)SvANY(rx);

then theres lots of places in the function which sometimes refer to rx and
sometimes to prog, and I think it would require someone with a clearer
head than what mine is at the moment to work it all out :-(

Doing a straight port and fixing up the conflicts passes the tests
(perhaps by luck?).  I've attached the patch, which has the commit
message edited with a proper 1-line summary like the blead version did.

Thanks, but given the closeness to 5.10.1 release, and the risk of this
patch, I think I'll defer it till after 5.10.1.

So is it ok to go with it now?

I haven't really caught up with my p5p mailbox since 5.10.1 was released,
and I haven't yet initiated a discussion about the nature of future 5.10.x
releases (a nd who will release it), so for the moment anything 5.10.x
related is kind of on hold.

--
Lady Nancy Astor​: If you were my husband, I would flavour your coffee
with poison.
Churchill​: Madam - if I were your husband, I would drink it.

@p5pRT
Copy link
Author

p5pRT commented Sep 2, 2009

From @demerphq

fixed in perl 5.11

@p5pRT
Copy link
Author

p5pRT commented Sep 2, 2009

@demerphq - Status changed from 'open' to 'resolved'

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

1 participant