Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Coredump in Perl_re_intuit_start #16871

Closed
p5pRT opened this issue Mar 4, 2019 · 10 comments
Closed

Coredump in Perl_re_intuit_start #16871

p5pRT opened this issue Mar 4, 2019 · 10 comments

Comments

@p5pRT
Copy link

p5pRT commented Mar 4, 2019

Migrated from rt.perl.org#133892 (status was 'resolved')

Searchable as RT133892$

@p5pRT
Copy link
Author

p5pRT commented Mar 4, 2019

From @dur-randir

Created by @dur-randir

While fuzzing perl v5.29.8-21-gde59f38ed9 built with afl and run
under libdislocator, I found the following program to segfault

eval q!//,caller grep/b\x{1c0}ss0/i,caller 0!

GDB stack trace is following​:

#0 0x000055555588917d in Perl_re_intuit_start (rx=0x555555b411b0,
sv=0x555555b4b750, strbeg=0x555555b51200 "/tmp/0057",
strpos=0x555555b51200 "/tmp/0057",
  strend=0x555555b51209 "", flags=97, data=0x0) at regexec.c​:1283
#1 0x0000555555895ca2 in Perl_regexec_flags (rx=0x555555b411b0,
stringarg=0x555555b51200 "/tmp/0057", strend=0x555555b51209 "",
  strbeg=0x555555b51200 "/tmp/0057", minend=0, sv=0x555555b4b750,
data=0x0, flags=97) at regexec.c​:3289
#2 0x0000555555754e89 in Perl_pp_match () at pp_hot.c​:3018
#3 0x00005555556f692d in Perl_runops_debug () at dump.c​:2537
#4 0x00005555555da124 in S_run_body (oldscope=1) at perl.c​:2692
#5 0x00005555555d96a2 in perl_run (my_perl=0x555555b24260) at perl.c​:2615
#6 0x000055555558e12e in main (argc=2, argv=0x7fffffffe208,
env=0x7fffffffe220) at perlmain.c​:127

This is a regression between 5.26 and 5.28, bisect points to

commit 141513f
Author​: Karl Williamson <khw@​cpan.org>
Date​: Sun Feb 4 19​:43​:00 2018 -0700

  regcomp.c​: Under/i segregate folding vs non-folding characters

  For matching sequences of characters, the regex compiler generates
  various flavors of EXACT-type nodes. The optimizer uses those nodes to
  look for sequences that must be in the matched string. In this way, the
  pattern matching engine may be able to quickly rule out any possible
  match altogether, or to narrow down the places in the target string that
  might match.

Perl Info

Flags:
    category=core
    severity=medium

Site configuration information for perl 5.29.9:

Configured by dur-randir at Wed Feb 27 14:51:01 MSK 2019.

Summary of my perl5 (revision 5 version 29 subversion 9) configuration:
  Commit id: c1e47bad34ce1d9c84ed57c9b8978bcbd5a02e98
  Platform:
    osname=darwin
    osvers=13.4.0
    archname=darwin-thread-multi-2level
    uname='darwin isengard.local 13.4.0 darwin kernel version 13.4.0:
mon jan 11 18:17:34 pst 2016; root:xnu-2422.115.15~1release_x86_64
x86_64 '
    config_args='-de -Dusedevel -DDEBUGGING -Dusethreads'
    hint=recommended
    useposix=true
    d_sigaction=define
    useithreads=define
    usemultiplicity=define
    use64bitint=define
    use64bitall=define
    uselongdouble=undef
    usemymalloc=n
    default_inc_excludes_dot=define
    bincompat5005=undef
  Compiler:
    cc='cc'
    ccflags ='-fno-common -DPERL_DARWIN -mmacosx-version-min=10.9
-DDEBUGGING -fno-strict-aliasing -pipe -fstack-protector
-I/usr/local/include -DPERL_USE_SAFE_PUTENV'
    optimize='-O3 -g'
    cppflags='-fno-common -DPERL_DARWIN -mmacosx-version-min=10.9
-DDEBUGGING -fno-strict-aliasing -pipe -fstack-protector
-I/usr/local/include'
    ccversion=''
    gccversion='4.2.1 Compatible Apple LLVM 6.0 (clang-600.0.56)'
    gccosandvers=''
    intsize=4
    longsize=8
    ptrsize=8
    doublesize=8
    byteorder=12345678
    doublekind=3
    d_longlong=define
    longlongsize=8
    d_longdbl=define
    longdblsize=16
    longdblkind=3
    ivtype='long'
    ivsize=8
    nvtype='double'
    nvsize=8
    Off_t='off_t'
    lseeksize=8
    alignbytes=8
    prototype=define
  Linker and Libraries:
    ld='cc'
    ldflags =' -mmacosx-version-min=10.9 -fstack-protector -L/usr/local/lib'
    libpth=/usr/local/lib
/Applications/Xcode.app/Contents/Developer/Toolchains/XcodeDefault.xctoolchain/usr/bin/../lib/clang/6.0/lib
/Applications/Xcode.app/Contents/Developer/Toolchains/XcodeDefault.xctoolchain/usr/lib
/usr/lib
    libs=-lpthread -lgdbm -ldbm -ldl -lm -lutil -lc
    perllibs=-lpthread -ldl -lm -lutil -lc
    libc=
    so=dylib
    useshrplib=false
    libperl=libperl.a
    gnulibc_version=''
  Dynamic Linking:
    dlsrc=dl_dlopen.xs
    dlext=bundle
    d_dlsymun=undef
    ccdlflags=' '
    cccdlflags=' '
    lddlflags=' -mmacosx-version-min=10.9 -bundle -undefined
dynamic_lookup -L/usr/local/lib -fstack-protector'



@INC for perl 5.29.9:
    lib
    /usr/local/lib/perl5/site_perl/5.29.9/darwin-thread-multi-2level
    /usr/local/lib/perl5/site_perl/5.29.9
    /usr/local/lib/perl5/5.29.9/darwin-thread-multi-2level
    /usr/local/lib/perl5/5.29.9


Environment for perl 5.29.9:
    DYLD_LIBRARY_PATH (unset)
    HOME=/Users/dur-randir
    LANG=en_US.UTF-8
    LANGUAGE (unset)
    LD_LIBRARY_PATH (unset)
    LOGDIR (unset)
    PATH=/Users/dur-randir/perlbrew/bin:/Users/dur-randir/perlbrew/perls/perl-5.22.1/bin:/usr/local/bin:/usr/local/sbin:/usr/bin:/bin:/usr/sbin:/sbin:/usr/texbin
    PERLBREW_HOME=/Users/dur-randir/.perlbrew
    PERLBREW_MANPATH=/Users/dur-randir/perlbrew/perls/perl-5.22.1/man
    PERLBREW_PATH=/Users/dur-randir/perlbrew/bin:/Users/dur-randir/perlbrew/perls/perl-5.22.1/bin
    PERLBREW_PERL=perl-5.22.1
    PERLBREW_ROOT=/Users/dur-randir/perlbrew
    PERLBREW_SHELLRC_VERSION=0.84
    PERLBREW_VERSION=0.84
    PERL_BADLANG (unset)
    SHELL=/usr/local/bin/zsh

@p5pRT
Copy link
Author

p5pRT commented Mar 23, 2019

From @khwilliamson

This is failing in a part of the engine I don't know anything about. I think the blamed commit is a red herring, probably it just perturbs things so that this showed up, but there is an underlying bug. The attached patch intercepts this and avoids the failure, but I don't believe it's the correct thing to do. I'm hoping someone more familiar with the engine could look at it, if only to give me hints as to how to proceed.

The commit message gives further details
--
Karl Williamson

@p5pRT
Copy link
Author

p5pRT commented Mar 23, 2019

From @khwilliamson

0004-work-around-for-perl-133892.patch
From 142137083d6c79a4415cfe0b411a1a4a12099123 Mon Sep 17 00:00:00 2001
From: Karl Williamson <khw@cpan.org>
Date: Fri, 22 Mar 2019 22:25:09 -0600
Subject: [PATCH 4/4] work around for [perl #133892]

The problem here is that the data structure 'other' contains a
'utf8_substr' but not a plain 'substr'.  But utf8_target is false, so
'must' is pointing to NULL, and the assertion fails.

This patch adds a test for 'must' being NUL, and abandons the match.
But I don't know what the real solution should be.  How did we get here
with a UTF-8 substr, but the target isn't?
---
 regexec.c | 6 ++++++
 1 file changed, 6 insertions(+)

diff --git a/regexec.c b/regexec.c
index 00e822d729..da1613bfa6 100644
--- a/regexec.c
+++ b/regexec.c
@@ -1270,6 +1270,12 @@ Perl_re_intuit_start(pTHX_
             s = other_last;
 
         must = utf8_target ? other->utf8_substr : other->substr;
+        if (! must) {
+            DEBUG_EXECUTE_r(Perl_re_printf( aTHX_
+                                    "  no other substr in valid UTF8ness; giving up...\n"));
+            goto fail_finish;
+        }
+
         assert(SvPOK(must));
         {
             char *from = s;
-- 
2.17.1

@p5pRT
Copy link
Author

p5pRT commented Mar 23, 2019

The RT System itself - Status changed from 'new' to 'open'

@p5pRT
Copy link
Author

p5pRT commented Mar 25, 2019

From @hvds

On Fri, 22 Mar 2019 21​:38​:03 -0700, khw wrote​:

This is failing in a part of the engine I don't know anything about.
I think the blamed commit is a red herring, probably it just perturbs
things so that this showed up, but there is an underlying bug. The
attached patch intercepts this and avoids the failure, but I don't
believe it's the correct thing to do. I'm hoping someone more
familiar with the engine could look at it, if only to give me hints as
to how to proceed.

The commit message gives further details

Minimal reproducer is something like​:
  ./miniperl -e 'grep /b\x{1c0}ss0/i, ("xxxx", "xxxx0")'

We're coming from regexec.c​:1533 /* Contradict one of substrings */ via
its 'goto do_other_substr', which bypasses the utf8_target check; the
comment implies we've matched the anchored substr at one position, and
now need to check for another match at a new position, which implies
that this should be safe. But it's wrong - the point we would have
checked (1176 /* now look for the 'other' substring if defined */)
we saw only that the non-utf8 was empty, so didn't bother to check.

I think the check at 1176 could handle this, and bail if there's
a utf8 only check string and a non-utf8 target; a patch for that is
included below (also pushed as branch hv/rt133892). Absent objections,
I'll aim to add a test and push this today or tomorrow.

It might be better to simplify a lot of this - test up front if either
of the check substrings demand utf8 for a non-utf8 target, and remove
all the other "Can't match, because target string needs to be in UTF-8"
tests. I'm not sure if there's a reason that might be risky, but it
feels like it should need only expanding the check at 920​:
  if (!prog->check_substr && prog->check_utf8) {
.. to check both.

It's also slightly odd that this does not happen with a single match;
I haven't looked into that yet, but I suspect we start off with the
check being the anchored substr, and flip to prefer the floating substr
after the first fail - but for some reason do so even though both
anchored and floating are enough to reject the match.

Hugo

commit f67366c
Author​: Hugo van der Sanden <hv@​crypt.org>
Date​: Mon Mar 25 11​:27​:12 2019 +0000

  [perl #133892] coredump in Perl_re_intuit_start

  Make sure we have a valid non-utf8 'other' check substring before we
  try to use it.

Inline Patch
diff --git a/regexec.c b/regexec.c
index 9d67da6..ad589e2 100644
--- a/regexec.c
+++ b/regexec.c
@@ -1173,8 +1173,8 @@ Perl_re_intuit_start(pTHX_

     /* now look for the 'other' substring if defined */

-    if (utf8_target ? prog->substrs->data[other_ix].utf8_substr
-                    : prog->substrs->data[other_ix].substr)
+    if (prog->substrs->data[other_ix].utf8_substr
+        || prog->substrs->data[other_ix].substr)
     {
        /* Take into account the "other" substring. */
         char *last, *last1;
@@ -1184,6 +1184,11 @@ Perl_re_intuit_start(pTHX_

       do_other_substr:
         other = &prog->substrs->data[other_ix];
+        if (!utf8_target && !other->substr) {
+            if (!to_byte_substr(prog)) {
+                NON_UTF8_TARGET_BUT_UTF8_REQUIRED(fail);
+            }
+        }

         /* if "other" is anchored:
          * we've previously found a floating substr starting at check_at.

@p5pRT
Copy link
Author

p5pRT commented Mar 27, 2019

From @hvds

On Mon, 25 Mar 2019 04​:50​:48 -0700, hv wrote​:
[...]

I think the check at 1176 could handle this, and bail if there's
a utf8 only check string and a non-utf8 target; a patch for that is
included below (also pushed as branch hv/rt133892). Absent objections,
I'll aim to add a test and push this today or tomorrow.
[...]
--- a/regexec.c
+++ b/regexec.c
@​@​ -1173,8 +1173,8 @​@​ Perl_re_intuit_start(pTHX_

 /\* now look for the 'other' substring if defined \*/

- if (utf8_target ? prog->substrs->data[other_ix].utf8_substr
- : prog->substrs->data[other_ix].substr)
+ if (prog->substrs->data[other_ix].utf8_substr
+ || prog->substrs->data[other_ix].substr)
{
/* Take into account the "other" substring. */
char *last, *last1;
@​@​ -1184,6 +1184,11 @​@​ Perl_re_intuit_start(pTHX_

   do\_other\_substr&#8203;:
     other = &prog\->substrs\->data\[other\_ix\];

+ if (!utf8_target && !other->substr) {
+ if (!to_byte_substr(prog)) {
+ NON_UTF8_TARGET_BUT_UTF8_REQUIRED(fail);
+ }
+ }

     /\* if "other" is anchored&#8203;:
      \* we've previously found a floating substr starting at check\_at\.

Now pushed with test as fd8def1.

Hugo

@p5pRT
Copy link
Author

p5pRT commented Mar 28, 2019

@hvds - Status changed from 'open' to 'pending release'

@p5pRT
Copy link
Author

p5pRT commented May 22, 2019

From @khwilliamson

Thank you for filing this report. You have helped make Perl better.

With the release today of Perl 5.30.0, this and 160 other issues have been
resolved.

Perl 5.30.0 may be downloaded via​:
https://metacpan.org/release/XSAWYERX/perl-5.30.0

If you find that the problem persists, feel free to reopen this ticket.

@p5pRT
Copy link
Author

p5pRT commented May 22, 2019

@khwilliamson - Status changed from 'pending release' to 'resolved'

@p5pRT p5pRT closed this as completed May 22, 2019
@p5pRT
Copy link
Author

p5pRT commented Jun 19, 2019

From @nwc10

On Wed, 27 Mar 2019 06​:18​:49 -0700, hv wrote​:

Now pushed with test as fd8def1.

Thanks Sergey for finding this, and to Karl and Hugu for figuring out the fix.

I've just hit this SEGV when testing stuff at work with v5.28.1. It's a SEGV at the same location (for the same reason) but despite gdb showing a short regex and a short target string, I wasn't able to get a small testcase out of it (or I think even have it reliably crash), so it's good to see that (1) I don't have to and (2) it's already fixed.

My only "problem" now is whether our internal build system is up to installing custom patches, or whether I can sell my boss on v5.30.0

Nicholas Clark

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

1 participant