Navigation Menu

Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Assertion failure in Perl_reg_numbered_buff_fetch (regcomp.c:8646) #16952

Open
p5pRT opened this issue Apr 14, 2019 · 18 comments
Open

Assertion failure in Perl_reg_numbered_buff_fetch (regcomp.c:8646) #16952

p5pRT opened this issue Apr 14, 2019 · 18 comments

Comments

@p5pRT
Copy link

p5pRT commented Apr 14, 2019

Migrated from rt.perl.org#134026 (status was 'new')

Searchable as RT134026$

@p5pRT
Copy link
Author

p5pRT commented Apr 14, 2019

From @dur-randir

Created by @dur-randir

While fuzzing perl v5.29.9-63-g2496d8f3f7 built with afl and run
under libdislocator, I found the following program

s/d|(?{})!//.$&>0for$0,l..a0,0..0

to cause an assertion failure

perl​: regcomp.c​:8646​: void Perl_reg_numbered_buff_fetch(REGEXP *const,
const I32, SV *const)​: Assertion `(STRLEN)rx->sublen >= (STRLEN)((s -
rx->subbeg) + i)' failed.

GDB stack trace is following

#0 __GI_raise (sig=sig@​entry=6) at ../sysdeps/unix/sysv/linux/raise.c​:50
#1 0x00007ffff7c25535 in __GI_abort () at abort.c​:79
#2 0x00007ffff7c2540f in __assert_fail_base (fmt=0x7ffff7d87ee0
"%s%s%s​:%u​: %s%sAssertion `%s' failed.\n%n",
  assertion=0x555555a82de0 "(STRLEN)rx->sublen >= (STRLEN)((s -
rx->subbeg) + i)", file=0x555555a7fcd0 "regcomp.c", line=8646,
function=<optimized out>)
  at assert.c​:92
#3 0x00007ffff7c330f2 in __GI___assert_fail (assertion=0x555555a82de0
"(STRLEN)rx->sublen >= (STRLEN)((s - rx->subbeg) + i)",
  file=0x555555a7fcd0 "regcomp.c", line=8646,
function=0x555555a9ee70 <__PRETTY_FUNCTION__.22774>
"Perl_reg_numbered_buff_fetch") at assert.c​:101
#4 0x00005555556a1ed6 in Perl_reg_numbered_buff_fetch
(r=0x555555b74ea8, paren=0, sv=0x555555b74fb0) at regcomp.c​:8646
#5 0x0000555555722094 in Perl_magic_get (sv=0x555555b74fb0,
mg=0x555555b7bfe0) at mg.c​:913
#6 0x000055555571fd35 in Perl_mg_get (sv=0x555555b74fb0) at mg.c​:201
#7 0x000055555561513b in Perl_try_amagic_bin (method=70, flags=4) at gv.c​:3001
#8 0x000055555575d22e in Perl_pp_concat () at pp_hot.c​:329
#9 0x000055555570ba97 in Perl_runops_debug () at dump.c​:2537
#10 0x00005555555ed560 in S_run_body (oldscope=1) at perl.c​:2716
#11 0x00005555555ecade in perl_run (my_perl=0x555555b4e260) at perl.c​:2639
#12 0x00005555555a114e in main (argc=3, argv=0x7fffffffe1a8,
env=0x7fffffffe1c8) at perlmain.c​:127

I suspect this being stack-not-refcounted (but not 100% sure), bisect points to

commit 13b0f67
Author​: David Mitchell <davem@​iabyn.com>
Date​: Wed May 22 16​:38​:29 2013 +0100

  re-enable Copy-on-Write by default.

Perl Info

Flags:
    category=core
    severity=low

Site configuration information for perl 5.29.9:

Configured by dur-randir at Wed Feb 27 14:51:01 MSK 2019.

Summary of my perl5 (revision 5 version 29 subversion 9) configuration:
  Commit id: c1e47bad34ce1d9c84ed57c9b8978bcbd5a02e98
  Platform:
    osname=darwin
    osvers=13.4.0
    archname=darwin-thread-multi-2level
    uname='darwin isengard.local 13.4.0 darwin kernel version 13.4.0:
mon jan 11 18:17:34 pst 2016; root:xnu-2422.115.15~1release_x86_64
x86_64 '
    config_args='-de -Dusedevel -DDEBUGGING -Dusethreads'
    hint=recommended
    useposix=true
    d_sigaction=define
    useithreads=define
    usemultiplicity=define
    use64bitint=define
    use64bitall=define
    uselongdouble=undef
    usemymalloc=n
    default_inc_excludes_dot=define
    bincompat5005=undef
  Compiler:
    cc='cc'
    ccflags ='-fno-common -DPERL_DARWIN -mmacosx-version-min=10.9
-DDEBUGGING -fno-strict-aliasing -pipe -fstack-protector
-I/usr/local/include -DPERL_USE_SAFE_PUTENV'
    optimize='-O3 -g'
    cppflags='-fno-common -DPERL_DARWIN -mmacosx-version-min=10.9
-DDEBUGGING -fno-strict-aliasing -pipe -fstack-protector
-I/usr/local/include'
    ccversion=''
    gccversion='4.2.1 Compatible Apple LLVM 6.0 (clang-600.0.56)'
    gccosandvers=''
    intsize=4
    longsize=8
    ptrsize=8
    doublesize=8
    byteorder=12345678
    doublekind=3
    d_longlong=define
    longlongsize=8
    d_longdbl=define
    longdblsize=16
    longdblkind=3
    ivtype='long'
    ivsize=8
    nvtype='double'
    nvsize=8
    Off_t='off_t'
    lseeksize=8
    alignbytes=8
    prototype=define
  Linker and Libraries:
    ld='cc'
    ldflags =' -mmacosx-version-min=10.9 -fstack-protector -L/usr/local/lib'
    libpth=/usr/local/lib
/Applications/Xcode.app/Contents/Developer/Toolchains/XcodeDefault.xctoolchain/usr/bin/../lib/clang/6.0/lib
/Applications/Xcode.app/Contents/Developer/Toolchains/XcodeDefault.xctoolchain/usr/lib
/usr/lib
    libs=-lpthread -lgdbm -ldbm -ldl -lm -lutil -lc
    perllibs=-lpthread -ldl -lm -lutil -lc
    libc=
    so=dylib
    useshrplib=false
    libperl=libperl.a
    gnulibc_version=''
  Dynamic Linking:
    dlsrc=dl_dlopen.xs
    dlext=bundle
    d_dlsymun=undef
    ccdlflags=' '
    cccdlflags=' '
    lddlflags=' -mmacosx-version-min=10.9 -bundle -undefined
dynamic_lookup -L/usr/local/lib -fstack-protector'



@INC for perl 5.29.9:
    lib
    /usr/local/lib/perl5/site_perl/5.29.9/darwin-thread-multi-2level
    /usr/local/lib/perl5/site_perl/5.29.9
    /usr/local/lib/perl5/5.29.9/darwin-thread-multi-2level
    /usr/local/lib/perl5/5.29.9


Environment for perl 5.29.9:
    DYLD_LIBRARY_PATH (unset)
    HOME=/Users/dur-randir
    LANG=en_US.UTF-8
    LANGUAGE (unset)
    LD_LIBRARY_PATH (unset)
    LOGDIR (unset)
    PATH=/Users/dur-randir/perlbrew/bin:/Users/dur-randir/perlbrew/perls/perl-5.22.1/bin:/usr/local/bin:/usr/local/sbin:/usr/bin:/bin:/usr/sbin:/sbin:/usr/texbin
    PERLBREW_HOME=/Users/dur-randir/.perlbrew
    PERLBREW_MANPATH=/Users/dur-randir/perlbrew/perls/perl-5.22.1/man
    PERLBREW_PATH=/Users/dur-randir/perlbrew/bin:/Users/dur-randir/perlbrew/perls/perl-5.22.1/bin
    PERLBREW_PERL=perl-5.22.1
    PERLBREW_ROOT=/Users/dur-randir/perlbrew
    PERLBREW_SHELLRC_VERSION=0.84
    PERLBREW_VERSION=0.84
    PERL_BADLANG (unset)
    SHELL=/usr/local/bin/zsh

@toddr toddr removed the khw label Oct 25, 2019
@khwilliamson khwilliamson added this to the 5.32.0 milestone Mar 20, 2020
@xsawyerx
Copy link
Member

xsawyerx commented Apr 1, 2020

Should this be a blocker, though?

@hvds
Copy link
Contributor

hvds commented Apr 1, 2020

Sorry, I don't remember seeing this before. It doesn't look like stack-refcounting to me - this still fails the same way:

% ./miniperl -e '
  my($good, $bad) = qw{ab cd};
  for ($good, $bad) {
    s/ b | (?{ 1; }) e //x;
    print "$_: $&\n";
  }'
a: b
miniperl: regcomp.c:8791: void Perl_reg_numbered_buff_fetch(REGEXP *const, const I32, SV *const): Assertion `(STRLEN)rx->sublen >= (STRLEN)((s - rx->subbeg) + i)' failed.
Aborted (core dumped)
% 

That looks quite concerning, I think it should be a blocker for now.

@xsawyerx
Copy link
Member

xsawyerx commented Apr 5, 2020

@khwilliamson thoughts on a fix?

@khwilliamson
Copy link
Contributor

@hvds is looking into it

@hvds
Copy link
Contributor

hvds commented Apr 5, 2020

This involves interaction between regexp code blocks and (presumably) copy-on-write, I'm hoping @iabyn will look at it. I can give it a go, but it'll be very much slower for me.

@hvds
Copy link
Contributor

hvds commented Apr 6, 2020

% ./miniperl -e '
  my($good, $bad) = qw{ab cd};
  for ($good, $bad) {
    s/ b | (?{ 1; }) e //x;
    print "$_: $&\n";
  }'
a: b
miniperl: regcomp.c:8791: void Perl_reg_numbered_buff_fetch(REGEXP *const, const I32, SV *const): Assertion `(STRLEN)rx->sublen >= (STRLEN)((s - rx->subbeg) + i)' failed.
Aborted (core dumped)
% 

Strangely I now cannot reproduce the failure with the above, but it does fail if I set $bad = 'c' instead.

At the point of the assertion, the second time round the loop, we are trying to do "get" magic on $& using PL_curpm's subbeg (the string it acted on, or a copy of it) and offs[0], the start and end offsets within that string for $0 and hence $&. At this point offs[0] = { start = 1, end = 2, ... } indicating there's a valid match to look at, but subbeg = "c" so we're pointing at the wrong string.

Effectively reversing the fingered "re-enable Copy-on-Write" commit by building with -Accflags="-DPERL_NO_COW" there isn't a problem: without COW at the same point, we have subbeg = "ab" as we should.

Here's another way to see the effect:

% perl -wle 'for ("foo", "bar") { /f(o)o|baz/; print "$&-$1" }'
foo-o
foo-o
% perl -wle 'for ("foo", "bar") { /f(o)o|(?{})baz/; print "$&-$1" }'
foo-o
bar-a
% 

.. so somehow the presence of the code block is allowing a failed match to update the string in PL_curpm, while retaining the capture offsets of the last successful match, but only when COW is enabled.

@hvds
Copy link
Contributor

hvds commented Apr 6, 2020

Here's another way to see the effect:

% perl -wle 'for ("foo", "bar") { /f(o)o|baz/; print "$&-$1" }'
foo-o
foo-o
% perl -wle 'for ("foo", "bar") { /f(o)o|(?{})baz/; print "$&-$1" }'
foo-o
bar-a
% 

.. so somehow the presence of the code block is allowing a failed match to update the string in PL_curpm, while retaining the capture offsets of the last successful match, but only when COW is enabled.

Note that the code block doesn't need to be entered: /f(o)o|(?!)(?{})/ gives the same wrong output.

subbeg is set in these cases in regexec.c:S_setup_eval_state, introduced in a75351a. The effect was originally reversed by S_restore_eval_state but that was removed in favour of S_cleanup_regmatch_info_aux by bf2039a. The cleanup function will restore a saved subbeg from eval_state if it exists; but that is set up only if RXp_MATCH_COPIED(rex) during setup, and it appears that can get set only during S_reg_set_capture_string (or restored in the cleanup), and that happens only if we don't go through the if (SvCANCOW(sv)) branch.

Right now I haven't a clue when/how subbeg is supposed to be restored in this case, and I'm dubious whether the current setting of it is valid at all when offs[] still refers to the last successful match rather than the current incomplete match.

That's all I have time for for now.

@hvds
Copy link
Contributor

hvds commented Apr 7, 2020

On second thoughts, since this has been broken since 5.20, it is not a recent regression and should not be a blocker.

@jkeenan
Copy link
Contributor

jkeenan commented Apr 7, 2020

On second thoughts, since this has been broken since 5.20, it is not a recent regression and should not be a blocker.

Are you saying it was introduced in 5.20 or that you've detected it at least as far back as 5.20?

If it was introduced in 5.20, how did you determine that? (I'm always interested in new bisection techniques; I don't know of any for fuzzer-detected stuff.)

Thank you very much.
Jim Keenan

@hvds
Copy link
Contributor

hvds commented Apr 7, 2020

I manually ran:

perl -wle 'for ("foo", "bar") { /f(o)o|(?{})baz/; print "$&-$1" }'

with various installed perls. It passed (ie printed "foo-o" twice) with 5.18, failed with later versions.

I also bisected on that code with:

Porting/bisect.pl --target=miniperl -e 'for ("foo", "bar") { /f(o)o|(?{})baz/; $s = $1 } die if $s ne "o"'

and it fingered 7fa949d, which makes sense since we're working on readonly consts here.

@hvds
Copy link
Contributor

hvds commented Apr 7, 2020

For what it's worth, that test does give the right answer with the hack below. However a test run gives failures and both malloc and double-free errors: I (clearly) don't understand what RXp_MATCH_COPIED implies or how and when it is intended to be used - its public variant was added back in cf93c79.

I think I'm about stuck now.

@@ -10497,7 +10497,7 @@ S_setup_eval_state(pTHX_ regmatch_info *const reginfo)
     eval_state->curpm = PL_curpm;
     PL_curpm_under = PL_curpm;
     PL_curpm = PL_reg_curpm;
-    if (RXp_MATCH_COPIED(rex)) {
+
     /*  Here is a serious problem: we cannot rewrite subbeg,
         since it may be needed if this match fails.  Thus
         $` inside (?{}) could fail... */
@@ -10509,9 +10509,7 @@ S_setup_eval_state(pTHX_ regmatch_info *const reginfo)
     eval_state->saved_copy = rex->saved_copy;
 #endif
     RXp_MATCH_COPIED_off(rex);
-    }
-    else
-        eval_state->subbeg = NULL;
+
     rex->subbeg = (char *)reginfo->strbeg;
     rex->suboffset = 0;
     rex->subcoffset = 0;

@dur-randir
Copy link
Member

If it was introduced in 5.20, how did you determine that? (I'm always interested in new bisection techniques; I don't know of any for fuzzer-detected stuff.)

I could've been introduced anytime between creating CoW and enabling builds with it by default. I didn't bisect manually enabled CoW builds of pre-5.20 perls.

@hvds
Copy link
Contributor

hvds commented Apr 7, 2020

If it was introduced in 5.20, how did you determine that? (I'm always interested in new bisection techniques; I don't know of any for fuzzer-detected stuff.)

I could've been introduced anytime between creating CoW and enabling builds with it by default. I didn't bisect manually enabled CoW builds of pre-5.20 perls.

Good point, I was concentrating on default builds, but it would be instructive also to try a bisect with PERL_NEW_COPY_ON_WRITE defined. I'll give that a go later if I can identify suitable start/end points.

@khwilliamson khwilliamson removed this from the 5.32.0 milestone Apr 7, 2020
@hvds
Copy link
Contributor

hvds commented Apr 7, 2020

it would be instructive also to try a bisect with PERL_NEW_COPY_ON_WRITE defined

Or maybe not:

Porting/bisect.pl -DDEBUGGING -Accflags="-DPERL_NEW_COPY_ON_WRITE" \
  --start db2c6cb33ec067c880a2cb3c4efdb33f7e3e3d0f \
  --end 07d01d6ec25527bf0236de2205ea412d40353058 \
  --target=miniperl \
  -e 'my($good, $bad) = qw{ab c}; for ($good, $bad) { /b|(?{})d/; print $& } exit 0'

fingers f7a8268, so again it's just the point COW becomes enabled for a relevant string.

@dur-randir
Copy link
Member

It's from the initial implementation:

Porting/bisect.pl -j20 -DDEBUGGING -Accflags="-DPERL_NEW_COPY_ON_WRITE"  --start db2c6cb33ec067c880a2cb3c4efdb33f7e3e3d0f~50  --end 07d01d6ec25527bf0236de2205ea412d40353058  --target=miniperl  -e 'sub Z::ab {}; sub Y::c {}; my($good) = (keys %Z::); my($bad)=(keys %Y::); for ($good, $bad) { /b|(?{})d/; print $& } exit 0'

db2c6cb is the first bad commit

commit db2c6cb33ec067c880a2cb3c4efdb33f7e3e3d0f
Author: Father Chrysostomos <sprout@cpan.org>
Date:   Mon Oct 8 00:20:21 2012 -0700

    New COW mechanism

    This was discussed in ticket #114820.

@khwilliamson
Copy link
Contributor

This is still a problem in 5.37.12

@demerphq
Copy link
Collaborator

Interesting. I will try to take a look when I get some time.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

9 participants