Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Coredump in Perl_re_op_compile #15809

Closed
p5pRT opened this issue Jan 15, 2017 · 25 comments
Closed

Coredump in Perl_re_op_compile #15809

p5pRT opened this issue Jan 15, 2017 · 25 comments

Comments

@p5pRT
Copy link

p5pRT commented Jan 15, 2017

Migrated from rt.perl.org#130561 (status was 'open')

Searchable as RT130561$

@p5pRT
Copy link
Author

p5pRT commented Jan 15, 2017

From @dur-randir

Created by @dur-randir

While fuzzing perl v5.25.8-207-gcbe2fc5001 built with afl and run
under libdislocator, I found the following program

/((?1)){8,0}/

to crash. GDB info about the crash location​:

#0 0x00007f565fa9442e in Perl_re_op_compile (patternp=0x0,
pat_count=1, expr=0x7f56600aa9b8, eng=0x7f565ffd3540
<PL_core_reg_engine>, old_re=0x0,
  is_bare_re=0x0, orig_rx_flags=0, pm_flags=0) at regcomp.c​:7772
7772 ARG2L_SET( scan, RExC_open_parens[ARG(scan)] - scan );
(gdb) bt
#0 0x00007f565fa9442e in Perl_re_op_compile (patternp=0x0,
pat_count=1, expr=0x7f56600aa9b8, eng=0x7f565ffd3540
<PL_core_reg_engine>, old_re=0x0,
  is_bare_re=0x0, orig_rx_flags=0, pm_flags=0) at regcomp.c​:7772
#1 0x00007f565f9b1d58 in Perl_pmruntime (o=0x7f56600aa9f8,
expr=0x7f56600aa9b8, repl=0x0, flags=1, floor=0) at op.c​:5784
#2 0x00007f565fa629eb in Perl_yyparse (gramtype=258) at perly.y​:1204
#3 0x00007f565f9e22e1 in S_parse_body (env=0x0, xsinit=0x7f565f99dde8
<xs_init>) at perl.c​:2376
#4 0x00007f565f9e0646 in perl_parse (my_perl=0x7f5660088010,
xsinit=0x7f565f99dde8 <xs_init>, argc=2, argv=0x7ffe9dfb51b8, env=0x0)
at perl.c​:1691
#5 0x00007f565f99dd26 in main (argc=2, argv=0x7ffe9dfb51b8,
env=0x7ffe9dfb51d0) at perlmain.c​:121

Perl Info

Flags:
    category=core
    severity=high

Site configuration information for perl 5.25.9:

Configured by root at Sat Jan 14 02:25:05 MSK 2017.

Summary of my perl5 (revision 5 version 25 subversion 9) configuration:
  Commit id: cbe2fc5001aa59cdc73e04cc35e097a2ecfbeec0
  Platform:
    osname=linux
    osvers=3.16.0-4-amd64
    archname=x86_64-linux
    uname='linux dorothy 3.16.0-4-amd64 #1 smp debian 3.16.36-1+deb8u2
(2016-10-19) x86_64 gnulinux '
    config_args='-des -Dusedevel -DDEBUGGING -Dcc=afl-clang-fast
-Doptimize=-O0 -g -ggdb3'
    hint=recommended
    useposix=true
    d_sigaction=define
    useithreads=undef
    usemultiplicity=undef
    use64bitint=define
    use64bitall=define
    uselongdouble=undef
    usemymalloc=n
    bincompat5005=undef
  Compiler:
    cc='afl-clang-fast'
    ccflags ='-DDEBUGGING -fno-strict-aliasing -pipe
-fstack-protector-strong -I/usr/local/include -D_LARGEFILE_SOURCE
-D_FILE_OFFSET_BITS=64 -D_FORTIFY_SOURCE=2'
    optimize='-O0 -g -ggdb3'
    cppflags='-DDEBUGGING -fno-strict-aliasing -pipe
-fstack-protector-strong -I/usr/local/include'
    ccversion=''
    gccversion='4.2.1 Compatible Clang 3.9.1 (tags/RELEASE_391/rc2)'
    gccosandvers=''
    intsize=4
    longsize=8
    ptrsize=8
    doublesize=8
    byteorder=12345678
    doublekind=3
    d_longlong=define
    longlongsize=8
    d_longdbl=define
    longdblsize=16
    longdblkind=3
    ivtype='long'
    ivsize=8
    nvtype='double'
    nvsize=8
    Off_t='off_t'
    lseeksize=8
    alignbytes=8
    prototype=define
  Linker and Libraries:
    ld='afl-clang-fast'
    ldflags =' -fstack-protector-strong -L/usr/local/lib'
    libpth=/usr/local/lib /usr/lib/llvm-3.9/bin/../lib/clang/3.9.1/lib
/usr/include/x86_64-linux-gnu /usr/lib /lib/x86_64-linux-gnu
/lib/../lib /usr/lib/x86_64-linux-gnu /usr/lib/../lib /lib
    libs=-lpthread -lnsl -ldl -lm -lcrypt -lutil -lc
    perllibs=-lpthread -lnsl -ldl -lm -lcrypt -lutil -lc
    libc=libc-2.24.so
    so=so
    useshrplib=false
    libperl=libperl.a
    gnulibc_version='2.24'
  Dynamic Linking:
    dlsrc=dl_dlopen.xs
    dlext=so
    d_dlsymun=undef
    ccdlflags='-Wl,-E'
    cccdlflags='-fPIC'
    lddlflags='-shared -O0 -g -ggdb3 -L/usr/local/lib -fstack-protector-strong'



@INC for perl 5.25.9:
    lib
    /usr/local/lib/perl5/site_perl/5.25.9/x86_64-linux
    /usr/local/lib/perl5/site_perl/5.25.9
    /usr/local/lib/perl5/5.25.9/x86_64-linux
    /usr/local/lib/perl5/5.25.9


Environment for perl 5.25.9:
    HOME=/home/afl
    LANG=en_US.UTF-8
    LANGUAGE=en_US:en
    LD_LIBRARY_PATH (unset)
    LOGDIR (unset)
    PATH=/home/afl/perlbrew/bin:/home/afl/perlbrew/perls/perl-5.22.1/bin:/usr/local/bin:/usr/bin:/bin:/usr/local/games:/usr/games
    PERLBREW_BASHRC_VERSION=0.78
    PERLBREW_HOME=/home/afl/.perlbrew
    PERLBREW_MANPATH=/home/afl/perlbrew/perls/perl-5.22.1/man
    PERLBREW_PATH=/home/afl/perlbrew/bin:/home/afl/perlbrew/perls/perl-5.22.1/bin
    PERLBREW_PERL=perl-5.22.1
    PERLBREW_ROOT=/home/afl/perlbrew
    PERLBREW_VERSION=0.78
    PERL_BADLANG (unset)
    SHELL=/usr/bin/zsh

@p5pRT
Copy link
Author

p5pRT commented Jan 22, 2017

From @dur-randir

Before v5.18.0, this'd just die() during a regex compilation. It was either exposed or caused by the following commit​:

31c15ce is the first bad commit
commit 31c15ce
Author​: Karl Williamson <public@​khwilliamson.com>
Date​: Sat Sep 15 12​:27​:22 2012 -0600

  PATCH​: [perl #82954] Make "Can't do {n,m} with n > m into warning

  This commit now causes this situation to warn instead of dying. The
  portion of the regular expression that can't match is optimized into an
  OPFAIL.

@p5pRT
Copy link
Author

p5pRT commented Jan 22, 2017

From [Unknown Contact. See original ticket]

Before v5.18.0, this'd just die() during a regex compilation. It was either exposed or caused by the following commit​:

31c15ce is the first bad commit
commit 31c15ce
Author​: Karl Williamson <public@​khwilliamson.com>
Date​: Sat Sep 15 12​:27​:22 2012 -0600

  PATCH​: [perl #82954] Make "Can't do {n,m} with n > m into warning

  This commit now causes this situation to warn instead of dying. The
  portion of the regular expression that can't match is optimized into an
  OPFAIL.

@p5pRT
Copy link
Author

p5pRT commented Jan 25, 2017

From @hvds

On Sun, 22 Jan 2017 04​:48​:49 -0800, randir wrote​:

Before v5.18.0, this'd just die() during a regex compilation. It was
either exposed or caused by the following commit​:

31c15ce is the first bad commit
commit 31c15ce
Author​: Karl Williamson <public@​khwilliamson.com>
Date​: Sat Sep 15 12​:27​:22 2012 -0600

PATCH​: [perl #82954] Make "Can't do {n,m} with n > m into warning

This commit now causes this situation to warn instead of dying. The
portion of the regular expression that can't match is optimized into
an
OPFAIL.

Yes, this change falls foul of​:
  /* Do setup, note this code has side effects beyond
  * the rest of this block. Specifically setting
  * RExC_recurse[] must happen at least once during
  * study_chunk(). */
.. since it optimizes away the node we must still study.

Karl, do you see any problem with just removing the optimizing aspect, per the attached patch? If not, I'll wrap in a test and commit.

Hugo

@p5pRT
Copy link
Author

p5pRT commented Jan 25, 2017

From @hvds

0001-perl-130561-Don-t-attempt-to-optimize-x-3-1.patch
From 5c0fa970ab98f53899b89c61536102da7b80b477 Mon Sep 17 00:00:00 2001
From: Hugo van der Sanden <hv@crypt.org>
Date: Wed, 25 Jan 2017 15:45:22 +0000
Subject: [PATCH] [perl #130561] Don't attempt to optimize /x{3,1}/

Nulling out the target of the quantifier risks causing other parts of
the regexp engine to get out of whack.
---
 regcomp.c | 19 ++-----------------
 1 file changed, 2 insertions(+), 17 deletions(-)

diff --git a/regcomp.c b/regcomp.c
index 97888ca..96339ac 100644
--- a/regcomp.c
+++ b/regcomp.c
@@ -11629,9 +11629,6 @@ S_regpiece(pTHX_ RExC_state_t *pRExC_state, I32 *flagp, U32 depth)
     const char *maxpos = NULL;
     UV uv;
 
-    /* Save the original in case we change the emitted regop to a FAIL. */
-    regnode * const orig_emit = RExC_emit;
-
     GET_RE_DEBUG_FLAGS_DECL;
 
     PERL_ARGS_ASSERT_REGPIECE;
@@ -11693,22 +11690,10 @@ S_regpiece(pTHX_ RExC_state_t *pRExC_state, I32 *flagp, U32 depth)
             }
 	    RExC_parse = next;
 	    nextchar(pRExC_state);
-            if (max < min) {    /* If can't match, warn and optimize to fail
-                                   unconditionally */
-                if (SIZE_ONLY) {
-
-                    /* We can't back off the size because we have to reserve
-                     * enough space for all the things we are about to throw
-                     * away, but we can shrink it by the amount we are about
-                     * to re-use here */
-                    RExC_size += PREVOPER(RExC_size) - regarglen[(U8)OPFAIL];
-                }
-                else {
+            if (max < min) {    /* If can't match, warn */
+                if (PASS2) {
                     ckWARNreg(RExC_parse, "Quantifier {n,m} with n > m can't match");
-                    RExC_emit = orig_emit;
                 }
-                ret = reganode(pRExC_state, OPFAIL, 0);
-                return ret;
             }
             else if (min == max && *RExC_parse == '?')
             {
-- 
2.10.2

@p5pRT
Copy link
Author

p5pRT commented Jan 25, 2017

The RT System itself - Status changed from 'new' to 'open'

@p5pRT
Copy link
Author

p5pRT commented Jan 26, 2017

From @khwilliamson

On 01/25/2017 08​:47 AM, Hugo van der Sanden via RT wrote​:

On Sun, 22 Jan 2017 04​:48​:49 -0800, randir wrote​:

Before v5.18.0, this'd just die() during a regex compilation. It was
either exposed or caused by the following commit​:

31c15ce is the first bad commit
commit 31c15ce
Author​: Karl Williamson <public@​khwilliamson.com>
Date​: Sat Sep 15 12​:27​:22 2012 -0600

PATCH​: [perl #82954] Make "Can't do {n,m} with n > m into warning

This commit now causes this situation to warn instead of dying. The
portion of the regular expression that can't match is optimized into
an
OPFAIL.

Yes, this change falls foul of​:
/* Do setup, note this code has side effects beyond
* the rest of this block. Specifically setting
* RExC_recurse[] must happen at least once during
* study_chunk(). */
.. since it optimizes away the node we must still study.

Karl, do you see any problem with just removing the optimizing aspect, per the attached patch? If not, I'll wrap in a test and commit.

I see no problems. Too bad that comment in the file is buried

Hugo

---
via perlbug​: queue​: perl5 status​: new
https://rt-archive.perl.org/perl5/Ticket/Display.html?id=130561

@p5pRT
Copy link
Author

p5pRT commented Jan 26, 2017

From @hvds

Ugh, I experimented with this some more, and found that after my patch to deoptimize this loops until stack exhaustion​:

./miniperl -wle '"abaad" =~ m{(a) b ( ((?1)){2,1} | aa )d}x'

I think it'd be worth trying to chase down the bugs here, but I suspect the fuzzers will find more, and we're quite late in the release cycle to be hoping for confidence in this before 5.26 freeze.

So for now I'm going to look instead whether I can make it less dangerous to miss RExC_recurse entries, that feels like the surer (and more useful) target.

Hugo

@p5pRT
Copy link
Author

p5pRT commented Jan 27, 2017

From @demerphq

On 26 January 2017 at 11​:20, Hugo van der Sanden via RT
<perlbug-followup@​perl.org> wrote​:

Ugh, I experimented with this some more, and found that after my patch to deoptimize this loops until stack exhaustion​:

./miniperl -wle '"abaad" =~ m{(a) b ( ((?1)){2,1} | aa )d}x'

That is because your patch did not change regexec.c to fail when the
min/max are incorrect.

I think it'd be worth trying to chase down the bugs here, but I suspect the fuzzers will find more, and we're quite late in the release cycle to be hoping for confidence in this before 5.26 freeze.

So for now I'm going to look instead whether I can make it less dangerous to miss RExC_recurse entries, that feels like the surer (and more useful) target.

That wont work either.

This​:

"foo"=~/(foo){8,0}|(?1)/

should match if we allow this pattern to execute. If we optimise away
the (foo){8,0} then the (?1) cant match 'foo'.

Because of this I think we should either revert the original change
until we can rethink it, or otherwise pursue your original solution
which IMO could work.

FWIW, before I realized the flaw in the original logic I nearly pushed
the following​:

commit c934d69b23fd180481261c72c04cda25924cb10e
Author​: Yves Orton <demerphq@​gmail.com>
Date​: Fri Jan 27 06​:50​:58 2017 +0100

  fix [#130561], optimizing away a RECURSE op should not result in a SEGV

  It is possible that an OPEN which is the target of a recurse op,
  or that a recursion call itself, such as (?1) might be optimised
  away, which broke assumptions made about optimising recursion.

  The simplest fix is to verify that scan and OP(scan) are correct
  before we try to update parameters in the possible optimised away
  opcode.

Inline Patch
diff --git a/regcomp.c b/regcomp.c
index 4f54b01..e515204 100644
--- a/regcomp.c
+++ b/regcomp.c
@@ -7789,7 +7789,9 @@ Perl_re_op_compile(pTHX_ SV ** const patternp,
int pat_count,

  while ( RExC_recurse_count > 0 ) {
  const regnode *scan = RExC_recurse[ --RExC_recurse_count ];
- ARG2L_SET( scan, RExC_open_parens[ARG(scan)] - scan );
+ /* this GOSUB may have been optimized away */
+ if (scan && OP(scan) == GOSUB)
+ ARG2L_SET( scan, RExC_open_parens[ARG(scan)] - scan );
  }

  Newxz(r->offs, RExC_npar, regexp_paren_pair);

Inline Patch
diff --git a/t/re/pat_rt_report.t b/t/re/pat_rt_report.t
index 2b6063c..150c854 100644
--- a/t/re/pat_rt_report.t
+++ b/t/re/pat_rt_report.t
@@ -20,7 +20,7 @@ use warnings;
 use 5.010;
 use Config;

-plan tests => 2502;  # Update this when adding/deleting tests.
+plan tests => 2503;  # Update this when adding/deleting tests.

 run_tests() unless caller;

@@ -1131,6 +1131,12 @@ EOP
         my $s = "\x{f2}\x{140}\x{fe}\x{ff}\x{ff}\x{ff}";
         ok($s !~ /^0000.\34500\376\377\377\377/, "RT #129085");
     }
+    {
+        fresh_perl_is(
+            '"foo"=~/((?1)){8,0}/; print "ok"',
+            "ok", {}, 'RT #130561 - Optimised away recursion should
not cause SEGVs'); \+ \} \+

} # End of sub run_tests

--
perl -Mre=debug -e "/just|another|perl|hacker/"

@p5pRT
Copy link
Author

p5pRT commented Jan 27, 2017

From @demerphq

On 27 January 2017 at 07​:21, demerphq <demerphq@​gmail.com> wrote​:

On 26 January 2017 at 11​:20, Hugo van der Sanden via RT
<perlbug-followup@​perl.org> wrote​:

Ugh, I experimented with this some more, and found that after my patch to deoptimize this loops until stack exhaustion​:

./miniperl -wle '"abaad" =~ m{(a) b ( ((?1)){2,1} | aa )d}x'

That is because your patch did not change regexec.c to fail when the
min/max are incorrect.

And I am wrong. I tried going down that road, and it is painful. So
painful I prepared a revert. But just before I was going to send it I
had an epiphany and hopefully 3rd time lucky.

Instead of converting the impossible construct into an OPFAIL, we can
inject an OPFAIL in *front* of the impossible construct. This means
that the construct is still reachable via recursion, but will never be
executed otherwise. And since we have not removed the OPEN/CLOSE the
segfaults that lead to this bugreport also go away.

So fixed in​:

commit 31fc939
Author​: Yves Orton <demerphq@​gmail.com>
Date​: Fri Jan 27 10​:18​:51 2017 +0100

  fix RT #130561 - recursion and optimising away impossible
quantifiers are not friends

  Instead of optimising away impossible quantifiers like (foo){1,0} treat them
  as unquantified, and guard them with an OPFAIL. Thus /(foo){1,0}/ is treated
  the same as /(*FAIL)(foo)/ this is important in patterns like
/(foo){1,0}|(?1)/
  where the (?1) needs to be able to recurse into the (foo) even though the
  (foo){1,0} can never match. It also resolves various issues
(SEGVs) with patterns
  like /((?1)){1,0}/.

  This patch would have been easier if S_reginsert() documented that it is
  the callers responsibility to properly set up the NEXT_OFF() of the inserted
  node (if the node has a NEXT_OFF())

Along with a couple of cleanup patches which would have made fixing this easier.

Yves

--
perl -Mre=debug -e "/just|another|perl|hacker/"

@p5pRT
Copy link
Author

p5pRT commented Jan 27, 2017

From @hvds

On Fri, 27 Jan 2017 01​:26​:03 -0800, demerphq wrote​:

Instead of converting the impossible construct into an OPFAIL, we can
inject an OPFAIL in *front* of the impossible construct.

Cool, I like.

This patch would have been easier if S_reginsert() documented that it is
the callers responsibility to properly set up the NEXT_OFF() of the inserted
node (if the node has a NEXT_OFF())

Should it be setting that only if PASS2? I didn't think we were supposed to write into the node during sizing.

(and earlier)
hv> So for now I'm going to look instead whether I can make it less dangerous to
hv> miss RExC_recurse entries, that feels like the surer (and more useful) target.
dq> That wont work either.

Just to clarify​: I assume you mean "that won't be enough to let us remove the ops".

I'm not entirely sure that I have a use case, but what I had in mind was to modify the fixup loop as in your "this GOSUB may have been optimized away" change, but combine it with a regexec-time check & panic should we see a GOSUB that hasn't had its target set (which requires inventing a way to flag that). Failing that, we should at least have an assert at the point of fixup, for a clea[nr]er failure mode.

Hugo

@p5pRT
Copy link
Author

p5pRT commented Jan 27, 2017

From @demerphq

On 27 January 2017 at 16​:15, Hugo van der Sanden via RT
<perlbug-followup@​perl.org> wrote​:

On Fri, 27 Jan 2017 01​:26​:03 -0800, demerphq wrote​:

Instead of converting the impossible construct into an OPFAIL, we can
inject an OPFAIL in *front* of the impossible construct.

Cool, I like.

This patch would have been easier if S_reginsert() documented that it is
the callers responsibility to properly set up the NEXT_OFF() of the inserted
node (if the node has a NEXT_OFF())

Should it be setting that only if PASS2? I didn't think we were supposed to write into the node during sizing.

Good catch. Fix is testing and will be pushed shortly.

(and earlier)
hv> So for now I'm going to look instead whether I can make it less dangerous to
hv> miss RExC_recurse entries, that feels like the surer (and more useful) target.
dq> That wont work either.

Just to clarify​: I assume you mean "that won't be enough to let us remove the ops".

Yes, correct. "foo"=~/(foo){1,0}|(?1)/ should match (assuming we allow
the pattern at all).

I'm not entirely sure that I have a use case, but what I had in mind was to modify the fixup loop as in your "this GOSUB may have been optimized away" change, but combine it with a regexec-time check & panic should we see a GOSUB that hasn't had its target set (which requires inventing a way to flag that). Failing that, we should at least have an assert at the point of fixup, for a clea[nr]er failure mode.

If we do this in debug mode only then I see no harm. Right now I see
no immediate need, but your point about cleaner failure modes for
future developers is compelling. Can I leave that part to you?

Yves

@p5pRT
Copy link
Author

p5pRT commented Jan 27, 2017

From @demerphq

On 27 January 2017 at 17​:00, demerphq <demerphq@​gmail.com> wrote​:

On 27 January 2017 at 16​:15, Hugo van der Sanden via RT
<perlbug-followup@​perl.org> wrote​:

On Fri, 27 Jan 2017 01​:26​:03 -0800, demerphq wrote​:

Instead of converting the impossible construct into an OPFAIL, we can
inject an OPFAIL in *front* of the impossible construct.

Cool, I like.

This patch would have been easier if S_reginsert() documented that it is
the callers responsibility to properly set up the NEXT_OFF() of the inserted
node (if the node has a NEXT_OFF())

Should it be setting that only if PASS2? I didn't think we were supposed to write into the node during sizing.

Good catch. Fix is testing and will be pushed shortly.

Pushed as

commit 2253e8fbc23277e000d9e0dc692359f58ba6b0f6
Author​: Yves Orton <demerphq@​gmail.com>
Date​: Fri Jan 27 16​:57​:40 2017 +0100

  only mess with NEXT_OFF() when we are in PASS2

  In 31fc939 I added code that would modify
  NEXT_OFF() when we were not in PASS2, when we should not do so.
Strangly this
  did not segfault when I tested, but this fix is required.

Thanks Hugo!

cheers,
Yves

@p5pRT
Copy link
Author

p5pRT commented Jan 27, 2017

From @hvds

On Fri, 27 Jan 2017 08​:02​:14 -0800, demerphq wrote​:

On 27 January 2017 at 16​:15, Hugo van der Sanden wrote​:

I'm not entirely sure that I have a use case, but what I had in mind
was to modify the fixup loop as in your "this GOSUB may have been
optimized away" change, but combine it with a regexec-time check &
panic should we see a GOSUB that hasn't had its target set (which
requires inventing a way to flag that). Failing that, we should at
least have an assert at the point of fixup, for a clea[nr]er failure
mode.

If we do this in debug mode only then I see no harm.

I had planned to make it a mandatory panic (not only under DEBUGGING), instead of the SEGV or worse you would otherwise get.

Can I leave that part to you?

For now yes - I had already had a go at it (assuming I could set arg2=0 and test that for "not set") but got a bunch of test failures, which I haven't tried to dig into yet. If I can't use arg2, I might want to use flags - but as far as I can see we don't ever actually use those as flags, so some caution will be needed.

Hugo

@p5pRT
Copy link
Author

p5pRT commented Jan 27, 2017

From @demerphq

On 27 January 2017 at 19​:04, Hugo van der Sanden via RT
<perlbug-followup@​perl.org> wrote​:

On Fri, 27 Jan 2017 08​:02​:14 -0800, demerphq wrote​:

On 27 January 2017 at 16​:15, Hugo van der Sanden wrote​:

I'm not entirely sure that I have a use case, but what I had in mind
was to modify the fixup loop as in your "this GOSUB may have been
optimized away" change, but combine it with a regexec-time check &
panic should we see a GOSUB that hasn't had its target set (which
requires inventing a way to flag that). Failing that, we should at
least have an assert at the point of fixup, for a clea[nr]er failure
mode.

If we do this in debug mode only then I see no harm.

I had planned to make it a mandatory panic (not only under DEBUGGING), instead of the SEGV or worse you would otherwise get.

Can I leave that part to you?

For now yes - I had already had a go at it (assuming I could set arg2=0 and test that for "not set") but got a bunch of test failures, which I haven't tried to dig into yet. If I can't use arg2, I might want to use flags - but as far as I can see we don't ever actually use those as flags, so some caution will be needed.

Wait wait. Are we talking about the same thing? I thought you were
talking about guarding the code that produced the SEGV with logic like
i posted in my GOSUB patch which I did not apply.

I think the bulk of that patch is fine to apply. I just don't see the
need to do those check except under debugging as we no longer optimise
away parens.

And really, the core idea of optimising a capturing paren is broken in
the face of recursion. You simply cannot do it. So basically we are
talking about adding guard code for a problem that should never occur.
IMO that means a DEBUGGING only feature.

Maybe I should just post the patch I am thinking of then you can do
whatever you like on top, and then we get out of the game of broken
telephone. (Will that term ever be modernized I wonder?)

:-)

cheers,
Yves

--
perl -Mre=debug -e "/just|another|perl|hacker/"

@p5pRT
Copy link
Author

p5pRT commented Jan 27, 2017

From @hvds

On Fri, 27 Jan 2017 10​:32​:31 -0800, demerphq wrote​:

On 27 January 2017 at 19​:04, Hugo van der Sanden via RT
<perlbug-followup@​perl.org> wrote​:

On Fri, 27 Jan 2017 08​:02​:14 -0800, demerphq wrote​:

On 27 January 2017 at 16​:15, Hugo van der Sanden wrote​:

I'm not entirely sure that I have a use case, but what I had in
mind
was to modify the fixup loop as in your "this GOSUB may have been
optimized away" change, but combine it with a regexec-time check &
panic should we see a GOSUB that hasn't had its target set (which
requires inventing a way to flag that). Failing that, we should at
least have an assert at the point of fixup, for a clea[nr]er
failure
mode.

If we do this in debug mode only then I see no harm.

I had planned to make it a mandatory panic (not only under
DEBUGGING), instead of the SEGV or worse you would otherwise get.

Can I leave that part to you?

For now yes - I had already had a go at it (assuming I could set
arg2=0 and test that for "not set") but got a bunch of test failures,
which I haven't tried to dig into yet. If I can't use arg2, I might
want to use flags - but as far as I can see we don't ever actually
use those as flags, so some caution will be needed.

Wait wait. Are we talking about the same thing? I thought you were
talking about guarding the code that produced the SEGV with logic like
i posted in my GOSUB patch which I did not apply.

I'm talking about​: carefully hiding mandatory fixups in an apparently optimization-only function such as study_chunk makes me nervous. Currently downstream code just assumes it has happened.

So the first thing I want to do is remove that assumption with a runtime check and panic.

The second thing I want to do is allow for the possibility we do optimizing things in the apparently optimization-only function, and take advantage of the runtime check to make it ok to skip unset RExC_recurse nodes (or ones that no longer point to GOSUBs).

I attach the failed attempt, that's the best explanation (though the commit message now needs changing).

Hugo

@p5pRT
Copy link
Author

p5pRT commented Jan 27, 2017

From @hvds

0001-perl-130561-make-unstudied-GOSUB-less-dangerous.patch
From 462279b68e35efd7ccc8763c988355a2a9b49bc7 Mon Sep 17 00:00:00 2001
From: Hugo van der Sanden <hv@crypt.org>
Date: Thu, 26 Jan 2017 13:22:48 +0000
Subject: [PATCH] [perl #130561] make unstudied GOSUB less dangerous

GOSUB nodes cannot usefully point to themselves, so use an offset of 0
to mark its offset as unknown, and panic at runtime if we see it unknown.
This should make it safe to optimize out parts of the regexp that we know
are unreachable, since we can safely skip RExC_recurse[] entries that have
never been set.
---
 regcomp.c | 17 +++++++++--------
 regexec.c |  6 ++++++
 2 files changed, 15 insertions(+), 8 deletions(-)

diff --git a/regcomp.c b/regcomp.c
index 4f54b01..10ed80a 100644
--- a/regcomp.c
+++ b/regcomp.c
@@ -4735,8 +4735,8 @@ S_study_chunk(pTHX_ RExC_state_t *pRExC_state, regnode **scanp,
                 start = RExC_open_parens[paren];
                 end   = RExC_close_parens[paren];
 
-                /* NOTE we MUST always execute the above code, even
-                 * if we do nothing with a GOSUB */
+                /* NOTE: it is a fatal error to execute the GOSUB if the above
+                 * has not been reached */
                 if (
                     ( flags & SCF_IN_DEFINE )
                     ||
@@ -7789,7 +7789,9 @@ Perl_re_op_compile(pTHX_ SV ** const patternp, int pat_count,
 
     while ( RExC_recurse_count > 0 ) {
         const regnode *scan = RExC_recurse[ --RExC_recurse_count ];
-        ARG2L_SET( scan, RExC_open_parens[ARG(scan)] - scan );
+        /* the node may have been optimized out before study */
+        if (scan)
+            ARG2L_SET( scan, RExC_open_parens[ARG(scan)] - scan );
     }
 
     Newxz(r->offs, RExC_npar, regexp_paren_pair);
@@ -10964,15 +10966,14 @@ S_reg(pTHX_ RExC_state_t *pRExC_state, I32 paren, I32 *flagp,U32 depth)
                     num = RExC_npar + num - 1;
                 }
                 /* We keep track how many GOSUB items we have produced.
-                   To start off the ARG2L() of the GOSUB holds its "id",
-                   which is used later in conjunction with RExC_recurse
-                   to calculate the offset we need to jump for the GOSUB,
-                   which it will store in the final representation.
+                   We set ARG2L() of the GOSUB initially to zero, then
+                   later in conjunction with RExC_recurse set it to the
+                   offset we need to jump for the GOSUB.
                    We have to defer the actual calculation until much later
                    as the regop may move.
                  */
 
-                ret = reg2Lanode(pRExC_state, GOSUB, num, RExC_recurse_count);
+                ret = reg2Lanode(pRExC_state, GOSUB, num, 0);
                 if (!SIZE_ONLY) {
 		    if (num > (I32)RExC_rx->nparens) {
 			RExC_parse++;
diff --git a/regexec.c b/regexec.c
index 4e4b4fd..c3c2648 100644
--- a/regexec.c
+++ b/regexec.c
@@ -6749,6 +6749,12 @@ S_regmatch(pTHX_ regmatch_info *reginfo, char *startpos, regnode *prog)
 	    re_sv = rex_sv;
             re = rex;
             rei = rexi;
+            if (ARG2L(scan) == 0) {
+                /* This should get set via study_chunk and RExC_recurse;
+                 * it should only be missed if the node is unreachable.
+                 */
+                croak("panic: GOSUB node does not know its target");
+            }
             startpoint = scan + ARG2L(scan);
             EVAL_CLOSE_PAREN_SET( st, arg );
             /* Detect infinite recursion
-- 
2.10.2

@p5pRT
Copy link
Author

p5pRT commented Jan 28, 2017

From @iabyn

On Fri, Jan 27, 2017 at 10​:25​:22AM +0100, demerphq wrote​:

So fixed in​:

commit 31fc939
Author​: Yves Orton <demerphq@​gmail.com>
Date​: Fri Jan 27 10​:18​:51 2017 +0100

fix RT \#130561 \- recursion and optimising away impossible

quantifiers are not friends

Instead of optimising away impossible quantifiers like \(foo\)\{1\,0\} treat them
as unquantified\, and guard them with an OPFAIL\. Thus /\(foo\)\{1\,0\}/ is treated

That commit makes pat_rt_report.t produce noise on stderr​:

  $ ./perl -Ilib t/harness t/re/pat_rt_report.t
  re/pat_rt_report.t .. Quantifier {n,m} with n > m can't match in regex; marked by <-- HERE in m/(foo){1,0} <-- HERE |(?1)/ at re/pat_rt_report.t line 1140.
  re/pat_rt_report.t .. ok
  All tests successful.
  Files=1, Tests=2504, 2 wallclock secs ( 0.33 usr 0.00 sys + 1.18 cusr 0.03 csys = 1.54 CPU)
  Result​: PASS

Presumably 'no warnings "regex"' or similar is called for here?

--
Art is anything that has a label (especially if the label is "untitled 1")

@p5pRT
Copy link
Author

p5pRT commented Jan 28, 2017

From @demerphq

Oh. Sorry, I didn't notice that. I'm not close to a computer so I can't
fix directly. If you haven't done so already I will silence it when I get
access later.

Yves

On 28 Jan 2017 18​:30, "Dave Mitchell" <davem@​iabyn.com> wrote​:

On Fri, Jan 27, 2017 at 10​:25​:22AM +0100, demerphq wrote​:

So fixed in​:

commit 31fc939
Author​: Yves Orton <demerphq@​gmail.com>
Date​: Fri Jan 27 10​:18​:51 2017 +0100

fix RT \#130561 \- recursion and optimising away impossible

quantifiers are not friends

Instead of optimising away impossible quantifiers like \(foo\)\{1\,0\}

treat them

as unquantified\, and guard them with an OPFAIL\. Thus /\(foo\)\{1\,0\}/ is

treated

That commit makes pat_rt_report.t produce noise on stderr​:

$ \./perl \-Ilib t/harness t/re/pat\_rt\_report\.t
re/pat\_rt\_report\.t \.\. Quantifier \{n\,m\} with n > m can't match in

regex; marked by <-- HERE in m/(foo){1,0} <-- HERE |(?1)/ at
re/pat_rt_report.t line 1140.
re/pat_rt_report.t .. ok
All tests successful.
Files=1, Tests=2504, 2 wallclock secs ( 0.33 usr 0.00 sys + 1.18
cusr 0.03 csys = 1.54 CPU)
Result​: PASS

Presumably 'no warnings "regex"' or similar is called for here?

--
Art is anything that has a label (especially if the label is "untitled 1")

@p5pRT
Copy link
Author

p5pRT commented Jan 28, 2017

From @demerphq

On 28 January 2017 at 12​:33, demerphq <demerphq@​gmail.com> wrote​:

Oh. Sorry, I didn't notice that. I'm not close to a computer so I can't
fix directly. If you haven't done so already I will silence it when I get
access later.

Fixed in

923e23b

sorry for not noticing earlier.

yves

@p5pRT
Copy link
Author

p5pRT commented Jan 28, 2017

From @demerphq

On 27 January 2017 at 20​:08, Hugo van der Sanden via RT
<perlbug-followup@​perl.org> wrote​:

On Fri, 27 Jan 2017 10​:32​:31 -0800, demerphq wrote​:

On 27 January 2017 at 19​:04, Hugo van der Sanden via RT
<perlbug-followup@​perl.org> wrote​:

On Fri, 27 Jan 2017 08​:02​:14 -0800, demerphq wrote​:

On 27 January 2017 at 16​:15, Hugo van der Sanden wrote​:

I'm not entirely sure that I have a use case, but what I had in
mind
was to modify the fixup loop as in your "this GOSUB may have been
optimized away" change, but combine it with a regexec-time check &
panic should we see a GOSUB that hasn't had its target set (which
requires inventing a way to flag that). Failing that, we should at
least have an assert at the point of fixup, for a clea[nr]er
failure
mode.

If we do this in debug mode only then I see no harm.

I had planned to make it a mandatory panic (not only under
DEBUGGING), instead of the SEGV or worse you would otherwise get.

Can I leave that part to you?

For now yes - I had already had a go at it (assuming I could set
arg2=0 and test that for "not set") but got a bunch of test failures,
which I haven't tried to dig into yet. If I can't use arg2, I might
want to use flags - but as far as I can see we don't ever actually
use those as flags, so some caution will be needed.

Wait wait. Are we talking about the same thing? I thought you were
talking about guarding the code that produced the SEGV with logic like
i posted in my GOSUB patch which I did not apply.

I'm talking about​: carefully hiding mandatory fixups in an apparently optimization-only function such as study_chunk makes me nervous. Currently downstream code just assumes it has happened.

I don't really get your concern here.

Lots of stuff like this happens in study_chunk().

And if this code doesnt happen in study chunk then i would have to
write a "walk the optree" sub that is basically a slimmed down version
of study chunk.

The problem is that we cannot reliably know the position and distance
between two regops until compilation is finished. reginsert() may be
called at any time to move the ops.

So once compilation is finished we need something to walk the data
structure, find the regops we care about and update them so they know
the offset they need to jump to etc.

So if this doesnt happen in study chunk we would have to write a copy
of study chunk that knows how to walk the regop tree, and we would
have to walk it twice for any pattern with recursion.

So the first thing I want to do is remove that assumption with a runtime check and panic.

I see no reason to burden every production regexp compilation with
validating that someone has hacked regexp.c to do something that they
should never do.

I am sympathetic to the idea that an assert would have prevented some
of this, so I do agree that in debugging mode we should do checks, but
I strongly object to the idea it should happen in non debugging
builds.

The second thing I want to do is allow for the possibility we do optimizing things in the apparently optimization-only function, and take advantage of the runtime check to make it ok to skip unset RExC_recurse nodes (or ones that no longer point to GOSUBs).

It is not ever ever ok to optimise away an OPEN without updating the
code so that the same logic that populates RExC_recursion. (See for
instance CURLYM).

If someone does, then you will break recursion. And that is not acceptable.

Anyway, I will push a patch with some improvements, and hopefully we
can then have something to discuss further.

Yves

@p5pRT
Copy link
Author

p5pRT commented Jan 28, 2017

From @demerphq

On 28 January 2017 at 15​:51, demerphq <demerphq@​gmail.com> wrote​:

On 27 January 2017 at 20​:08, Hugo van der Sanden via RT
<perlbug-followup@​perl.org> wrote​:

On Fri, 27 Jan 2017 10​:32​:31 -0800, demerphq wrote​:
The problem is that we cannot reliably know the position and distance
between two regops until compilation is finished. reginsert() may be
called at any time to move the ops.

Actually we could do the same thing that we do for OPEN/CLOSE regops
in reginsert(). But since the optimiser still needs to follow all
recursion calls to determine if they are cyclic, it would mean a
higher cost for no advantage.

So the first thing I want to do is remove that assumption with a runtime check and panic.

I see no reason to burden every production regexp compilation with
validating that someone has hacked regexp.c to do something that they
should never do.

On further reflection I retract. It is not unreasonable to optimise
away a GOSUB op.

However I still see no reason to burden production regexp compilation
with these checks, after all they can only be triggered by changes to
the regex engine.

It is not ever ever ok to optimise away an OPEN without updating the
code so that the same logic that populates RExC_recursion. (See for
instance CURLYM).

If someone does, then you will break recursion. And that is not acceptable.

This is correct, but actually irrelevant to the exception we threw,
which came from having optimising away a GOSUB, which is a reasonable
thing to do (although AFAIK right now we don't do it.) But if someone
changes the code to do so then they will trigger the assert during
dev, and will see the comment explaining what is going on.

Anyway, I will push a patch with some improvements, and hopefully we
can then have something to discuss further.

I pushed the following patch​:

commit bc18b9d
Author​: Yves Orton <demerphq@​gmail.com>
Date​: Sat Jan 28 16​:20​:35 2017 +0100

  assert that the RExC_recurse data structure points at a valid GOSUB

  This assert will fail if someone adds code that optimises away a GOSUB
  call. At which point they will see the comment and know what to do.

Inline Patch
diff --git a/regcomp.c b/regcomp.c
index d5ce63f..19ed866 100644
--- a/regcomp.c
+++ b/regcomp.c
@@ -7789,6 +7789,18 @@ Perl_re_op_compile(pTHX_ SV ** const patternp,
int pat_count,

  while ( RExC_recurse_count > 0 ) {
  const regnode *scan = RExC_recurse[ --RExC_recurse_count ];
+ /*
+ * This data structure is set up in study_chunk() and is used
+ * to calculate the distance between a GOSUB regopcode and
+ * the OPEN/CURLYM (CURLYM's are special and can act like OPEN's)
+ * it refers to.
+ *
+ * If for some reason someone writes code that optimises
+ * away a GOSUB opcode then the assert should be changed to
+ * an if(scan) to guard the ARG2L_SET() - Yves
+ *
+ */
+ assert(scan && OP(scan) == GOSUB);
  ARG2L_SET( scan, RExC_open_parens[ARG(scan)] - scan );
  }

Sorry my original overhasty and misleading reply.

cheers,
Yves

--
perl -Mre=debug -e "/just|another|perl|hacker/"

@p5pRT
Copy link
Author

p5pRT commented Jan 29, 2017

From @hvds

What I'd really like to do is reduce the amount of spooky action at a distance. Any time we find something that breaks unexpectedly, I'd rather think about how we can make things behave in a less unexpected manner than just put comments in - it's real hard to find a sane place to put such information that actually guarantees it'll be seen by someone trying to change the code in ways they might otherwise reasonably expect to be safe.

I was hoping we'd have made more progress towards compiling to an AST by now, which was what prompted my initial work on the study_chunk branch, but I guess you haven't had time to progress on that.

Whether a runtime check is suitable in this case or not, as a general principle I expect that the process of untangling the mess we have right now might incur small short-term performance costs; but that that's worth it in the expectation that once we have a clue what's going on we'll be able to implement better optimizations more easily.

Hugo

@p5pRT
Copy link
Author

p5pRT commented Jan 29, 2017

From @demerphq

On 29 January 2017 at 15​:57, Hugo van der Sanden via RT
<perlbug-followup@​perl.org> wrote​:

What I'd really like to do is reduce the amount of spooky action at a distance. Any time we find something that breaks unexpectedly, I'd rather think about how we can make things behave in a less unexpected manner than just put comments in - it's real hard to find a sane place to put such information that actually guarantees it'll be seen by someone trying to change the code in ways they might otherwise reasonably expect to be safe.

I was hoping we'd have made more progress towards compiling to an AST by now, which was what prompted my initial work on the study_chunk branch, but I guess you haven't had time to progress on that.

Whether a runtime check is suitable in this case or not, as a general principle I expect that the process of untangling the mess we have right now might incur small short-term performance costs; but that that's worth it in the expectation that once we have a clue what's going on we'll be able to implement better optimizations more easily.

Ok, so lets put that kind of restructuring onto the backburner for
when we do untangle the unholy mess that is study_chunk and regcomp.c

For what it is worth I am very sympathetic to your intentions here,
just not the immediate manifestation for this bug right now. Anybody
adding features to the regex engine should be doing it under
debugging, so if they break that code the assert will fire, they will
see the comment and fix it.

In the long run however you are quite correct that a proper AST and
peephole optimiser and etc, would make a lot of the hackery involved
here go away and be replaced by sensible and understandable code. So
lets park this for now, and see where we get with that.

Yves

--
perl -Mre=debug -e "/just|another|perl|hacker/"

@hvds
Copy link
Contributor

hvds commented Jan 23, 2020

Closing this ticket: various patches went in to fix the original issue; the further improvements discussed here are already planned for some future in which the regexp engine code has been made more tractable, keeping this (somewhat confusing) ticket open will not significantly help with that.

@hvds hvds closed this as completed Jan 23, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

2 participants