New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Coredump in Perl_re_op_compile #15809
Comments
From @dur-randirCreated by @dur-randirWhile fuzzing perl v5.25.8-207-gcbe2fc5001 built with afl and run /((?1)){8,0}/ to crash. GDB info about the crash location: #0 0x00007f565fa9442e in Perl_re_op_compile (patternp=0x0, Perl Info
|
From @dur-randirBefore v5.18.0, this'd just die() during a regex compilation. It was either exposed or caused by the following commit: 31c15ce is the first bad commit PATCH: [perl #82954] Make "Can't do {n,m} with n > m into warning This commit now causes this situation to warn instead of dying. The |
From [Unknown Contact. See original ticket]Before v5.18.0, this'd just die() during a regex compilation. It was either exposed or caused by the following commit: 31c15ce is the first bad commit PATCH: [perl #82954] Make "Can't do {n,m} with n > m into warning This commit now causes this situation to warn instead of dying. The |
From @hvdsOn Sun, 22 Jan 2017 04:48:49 -0800, randir wrote:
Yes, this change falls foul of: Karl, do you see any problem with just removing the optimizing aspect, per the attached patch? If not, I'll wrap in a test and commit. Hugo |
From @hvds0001-perl-130561-Don-t-attempt-to-optimize-x-3-1.patchFrom 5c0fa970ab98f53899b89c61536102da7b80b477 Mon Sep 17 00:00:00 2001
From: Hugo van der Sanden <hv@crypt.org>
Date: Wed, 25 Jan 2017 15:45:22 +0000
Subject: [PATCH] [perl #130561] Don't attempt to optimize /x{3,1}/
Nulling out the target of the quantifier risks causing other parts of
the regexp engine to get out of whack.
---
regcomp.c | 19 ++-----------------
1 file changed, 2 insertions(+), 17 deletions(-)
diff --git a/regcomp.c b/regcomp.c
index 97888ca..96339ac 100644
--- a/regcomp.c
+++ b/regcomp.c
@@ -11629,9 +11629,6 @@ S_regpiece(pTHX_ RExC_state_t *pRExC_state, I32 *flagp, U32 depth)
const char *maxpos = NULL;
UV uv;
- /* Save the original in case we change the emitted regop to a FAIL. */
- regnode * const orig_emit = RExC_emit;
-
GET_RE_DEBUG_FLAGS_DECL;
PERL_ARGS_ASSERT_REGPIECE;
@@ -11693,22 +11690,10 @@ S_regpiece(pTHX_ RExC_state_t *pRExC_state, I32 *flagp, U32 depth)
}
RExC_parse = next;
nextchar(pRExC_state);
- if (max < min) { /* If can't match, warn and optimize to fail
- unconditionally */
- if (SIZE_ONLY) {
-
- /* We can't back off the size because we have to reserve
- * enough space for all the things we are about to throw
- * away, but we can shrink it by the amount we are about
- * to re-use here */
- RExC_size += PREVOPER(RExC_size) - regarglen[(U8)OPFAIL];
- }
- else {
+ if (max < min) { /* If can't match, warn */
+ if (PASS2) {
ckWARNreg(RExC_parse, "Quantifier {n,m} with n > m can't match");
- RExC_emit = orig_emit;
}
- ret = reganode(pRExC_state, OPFAIL, 0);
- return ret;
}
else if (min == max && *RExC_parse == '?')
{
--
2.10.2
|
The RT System itself - Status changed from 'new' to 'open' |
From @khwilliamsonOn 01/25/2017 08:47 AM, Hugo van der Sanden via RT wrote:
I see no problems. Too bad that comment in the file is buried
|
From @hvdsUgh, I experimented with this some more, and found that after my patch to deoptimize this loops until stack exhaustion: ./miniperl -wle '"abaad" =~ m{(a) b ( ((?1)){2,1} | aa )d}x' I think it'd be worth trying to chase down the bugs here, but I suspect the fuzzers will find more, and we're quite late in the release cycle to be hoping for confidence in this before 5.26 freeze. So for now I'm going to look instead whether I can make it less dangerous to miss RExC_recurse entries, that feels like the surer (and more useful) target. Hugo |
From @demerphqOn 26 January 2017 at 11:20, Hugo van der Sanden via RT
That is because your patch did not change regexec.c to fail when the
That wont work either. This: "foo"=~/(foo){8,0}|(?1)/ should match if we allow this pattern to execute. If we optimise away Because of this I think we should either revert the original change FWIW, before I realized the flaw in the original logic I nearly pushed commit c934d69b23fd180481261c72c04cda25924cb10e fix [#130561], optimizing away a RECURSE op should not result in a SEGV It is possible that an OPEN which is the target of a recurse op, The simplest fix is to verify that scan and OP(scan) are correct Inline Patchdiff --git a/regcomp.c b/regcomp.c
index 4f54b01..e515204 100644
--- a/regcomp.c
+++ b/regcomp.c
@@ -7789,7 +7789,9 @@ Perl_re_op_compile(pTHX_ SV ** const patternp,
while ( RExC_recurse_count > 0 ) { Newxz(r->offs, RExC_npar, regexp_paren_pair); Inline Patchdiff --git a/t/re/pat_rt_report.t b/t/re/pat_rt_report.t
index 2b6063c..150c854 100644
--- a/t/re/pat_rt_report.t
+++ b/t/re/pat_rt_report.t
@@ -20,7 +20,7 @@ use warnings;
use 5.010;
use Config;
-plan tests => 2502; # Update this when adding/deleting tests.
+plan tests => 2503; # Update this when adding/deleting tests.
run_tests() unless caller;
@@ -1131,6 +1131,12 @@ EOP
my $s = "\x{f2}\x{140}\x{fe}\x{ff}\x{ff}\x{ff}";
ok($s !~ /^0000.\34500\376\377\377\377/, "RT #129085");
}
+ {
+ fresh_perl_is(
+ '"foo"=~/((?1)){8,0}/; print "ok"',
+ "ok", {}, 'RT #130561 - Optimised away recursion should
} # End of sub run_tests -- |
From @demerphqOn 27 January 2017 at 07:21, demerphq <demerphq@gmail.com> wrote:
And I am wrong. I tried going down that road, and it is painful. So Instead of converting the impossible construct into an OPFAIL, we can So fixed in: commit 31fc939 fix RT #130561 - recursion and optimising away impossible Instead of optimising away impossible quantifiers like (foo){1,0} treat them This patch would have been easier if S_reginsert() documented that it is Along with a couple of cleanup patches which would have made fixing this easier. Yves -- |
From @hvdsOn Fri, 27 Jan 2017 01:26:03 -0800, demerphq wrote:
Cool, I like.
Should it be setting that only if PASS2? I didn't think we were supposed to write into the node during sizing. (and earlier) Just to clarify: I assume you mean "that won't be enough to let us remove the ops". I'm not entirely sure that I have a use case, but what I had in mind was to modify the fixup loop as in your "this GOSUB may have been optimized away" change, but combine it with a regexec-time check & panic should we see a GOSUB that hasn't had its target set (which requires inventing a way to flag that). Failing that, we should at least have an assert at the point of fixup, for a clea[nr]er failure mode. Hugo |
From @demerphqOn 27 January 2017 at 16:15, Hugo van der Sanden via RT
Good catch. Fix is testing and will be pushed shortly.
Yes, correct. "foo"=~/(foo){1,0}|(?1)/ should match (assuming we allow
If we do this in debug mode only then I see no harm. Right now I see Yves |
From @demerphqOn 27 January 2017 at 17:00, demerphq <demerphq@gmail.com> wrote:
Pushed as commit 2253e8fbc23277e000d9e0dc692359f58ba6b0f6 only mess with NEXT_OFF() when we are in PASS2 In 31fc939 I added code that would modify Thanks Hugo! cheers, |
From @hvdsOn Fri, 27 Jan 2017 08:02:14 -0800, demerphq wrote:
I had planned to make it a mandatory panic (not only under DEBUGGING), instead of the SEGV or worse you would otherwise get.
For now yes - I had already had a go at it (assuming I could set arg2=0 and test that for "not set") but got a bunch of test failures, which I haven't tried to dig into yet. If I can't use arg2, I might want to use flags - but as far as I can see we don't ever actually use those as flags, so some caution will be needed. Hugo |
From @demerphqOn 27 January 2017 at 19:04, Hugo van der Sanden via RT
Wait wait. Are we talking about the same thing? I thought you were I think the bulk of that patch is fine to apply. I just don't see the And really, the core idea of optimising a capturing paren is broken in Maybe I should just post the patch I am thinking of then you can do :-) cheers, -- |
From @hvdsOn Fri, 27 Jan 2017 10:32:31 -0800, demerphq wrote:
I'm talking about: carefully hiding mandatory fixups in an apparently optimization-only function such as study_chunk makes me nervous. Currently downstream code just assumes it has happened. So the first thing I want to do is remove that assumption with a runtime check and panic. The second thing I want to do is allow for the possibility we do optimizing things in the apparently optimization-only function, and take advantage of the runtime check to make it ok to skip unset RExC_recurse nodes (or ones that no longer point to GOSUBs). I attach the failed attempt, that's the best explanation (though the commit message now needs changing). Hugo |
From @hvds0001-perl-130561-make-unstudied-GOSUB-less-dangerous.patchFrom 462279b68e35efd7ccc8763c988355a2a9b49bc7 Mon Sep 17 00:00:00 2001
From: Hugo van der Sanden <hv@crypt.org>
Date: Thu, 26 Jan 2017 13:22:48 +0000
Subject: [PATCH] [perl #130561] make unstudied GOSUB less dangerous
GOSUB nodes cannot usefully point to themselves, so use an offset of 0
to mark its offset as unknown, and panic at runtime if we see it unknown.
This should make it safe to optimize out parts of the regexp that we know
are unreachable, since we can safely skip RExC_recurse[] entries that have
never been set.
---
regcomp.c | 17 +++++++++--------
regexec.c | 6 ++++++
2 files changed, 15 insertions(+), 8 deletions(-)
diff --git a/regcomp.c b/regcomp.c
index 4f54b01..10ed80a 100644
--- a/regcomp.c
+++ b/regcomp.c
@@ -4735,8 +4735,8 @@ S_study_chunk(pTHX_ RExC_state_t *pRExC_state, regnode **scanp,
start = RExC_open_parens[paren];
end = RExC_close_parens[paren];
- /* NOTE we MUST always execute the above code, even
- * if we do nothing with a GOSUB */
+ /* NOTE: it is a fatal error to execute the GOSUB if the above
+ * has not been reached */
if (
( flags & SCF_IN_DEFINE )
||
@@ -7789,7 +7789,9 @@ Perl_re_op_compile(pTHX_ SV ** const patternp, int pat_count,
while ( RExC_recurse_count > 0 ) {
const regnode *scan = RExC_recurse[ --RExC_recurse_count ];
- ARG2L_SET( scan, RExC_open_parens[ARG(scan)] - scan );
+ /* the node may have been optimized out before study */
+ if (scan)
+ ARG2L_SET( scan, RExC_open_parens[ARG(scan)] - scan );
}
Newxz(r->offs, RExC_npar, regexp_paren_pair);
@@ -10964,15 +10966,14 @@ S_reg(pTHX_ RExC_state_t *pRExC_state, I32 paren, I32 *flagp,U32 depth)
num = RExC_npar + num - 1;
}
/* We keep track how many GOSUB items we have produced.
- To start off the ARG2L() of the GOSUB holds its "id",
- which is used later in conjunction with RExC_recurse
- to calculate the offset we need to jump for the GOSUB,
- which it will store in the final representation.
+ We set ARG2L() of the GOSUB initially to zero, then
+ later in conjunction with RExC_recurse set it to the
+ offset we need to jump for the GOSUB.
We have to defer the actual calculation until much later
as the regop may move.
*/
- ret = reg2Lanode(pRExC_state, GOSUB, num, RExC_recurse_count);
+ ret = reg2Lanode(pRExC_state, GOSUB, num, 0);
if (!SIZE_ONLY) {
if (num > (I32)RExC_rx->nparens) {
RExC_parse++;
diff --git a/regexec.c b/regexec.c
index 4e4b4fd..c3c2648 100644
--- a/regexec.c
+++ b/regexec.c
@@ -6749,6 +6749,12 @@ S_regmatch(pTHX_ regmatch_info *reginfo, char *startpos, regnode *prog)
re_sv = rex_sv;
re = rex;
rei = rexi;
+ if (ARG2L(scan) == 0) {
+ /* This should get set via study_chunk and RExC_recurse;
+ * it should only be missed if the node is unreachable.
+ */
+ croak("panic: GOSUB node does not know its target");
+ }
startpoint = scan + ARG2L(scan);
EVAL_CLOSE_PAREN_SET( st, arg );
/* Detect infinite recursion
--
2.10.2
|
From @iabynOn Fri, Jan 27, 2017 at 10:25:22AM +0100, demerphq wrote:
That commit makes pat_rt_report.t produce noise on stderr: $ ./perl -Ilib t/harness t/re/pat_rt_report.t Presumably 'no warnings "regex"' or similar is called for here? -- |
From @demerphqOh. Sorry, I didn't notice that. I'm not close to a computer so I can't Yves On 28 Jan 2017 18:30, "Dave Mitchell" <davem@iabyn.com> wrote:
|
From @demerphqOn 27 January 2017 at 20:08, Hugo van der Sanden via RT
I don't really get your concern here. Lots of stuff like this happens in study_chunk(). And if this code doesnt happen in study chunk then i would have to The problem is that we cannot reliably know the position and distance So once compilation is finished we need something to walk the data So if this doesnt happen in study chunk we would have to write a copy
I see no reason to burden every production regexp compilation with I am sympathetic to the idea that an assert would have prevented some
It is not ever ever ok to optimise away an OPEN without updating the If someone does, then you will break recursion. And that is not acceptable. Anyway, I will push a patch with some improvements, and hopefully we Yves |
From @demerphqOn 28 January 2017 at 15:51, demerphq <demerphq@gmail.com> wrote:
Actually we could do the same thing that we do for OPEN/CLOSE regops
On further reflection I retract. It is not unreasonable to optimise However I still see no reason to burden production regexp compilation
This is correct, but actually irrelevant to the exception we threw,
I pushed the following patch: commit bc18b9d assert that the RExC_recurse data structure points at a valid GOSUB This assert will fail if someone adds code that optimises away a GOSUB Inline Patchdiff --git a/regcomp.c b/regcomp.c
index d5ce63f..19ed866 100644
--- a/regcomp.c
+++ b/regcomp.c
@@ -7789,6 +7789,18 @@ Perl_re_op_compile(pTHX_ SV ** const patternp,
while ( RExC_recurse_count > 0 ) { Sorry my original overhasty and misleading reply. cheers, -- |
From @hvdsWhat I'd really like to do is reduce the amount of spooky action at a distance. Any time we find something that breaks unexpectedly, I'd rather think about how we can make things behave in a less unexpected manner than just put comments in - it's real hard to find a sane place to put such information that actually guarantees it'll be seen by someone trying to change the code in ways they might otherwise reasonably expect to be safe. I was hoping we'd have made more progress towards compiling to an AST by now, which was what prompted my initial work on the study_chunk branch, but I guess you haven't had time to progress on that. Whether a runtime check is suitable in this case or not, as a general principle I expect that the process of untangling the mess we have right now might incur small short-term performance costs; but that that's worth it in the expectation that once we have a clue what's going on we'll be able to implement better optimizations more easily. Hugo |
From @demerphqOn 29 January 2017 at 15:57, Hugo van der Sanden via RT
Ok, so lets put that kind of restructuring onto the backburner for For what it is worth I am very sympathetic to your intentions here, In the long run however you are quite correct that a proper AST and Yves -- |
Closing this ticket: various patches went in to fix the original issue; the further improvements discussed here are already planned for some future in which the regexp engine code has been made more tractable, keeping this (somewhat confusing) ticket open will not significantly help with that. |
Migrated from rt.perl.org#130561 (status was 'open')
Searchable as RT130561$
The text was updated successfully, but these errors were encountered: