New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Assertion rx->sublen >= (s - rx->subbeg) + i failed: file "regcomp.c" #9566
Comments
From @clintongormleyCreated by @clintongormleyThis is a bug report for perl from clint@traveljury.com, ----------------------------------------------------------------- Assertion rx->sublen >= (s - rx->subbeg) + i failed: file "regcomp.c", I can't tell you what regex or value was being checked at the time, return HTML::Entities::encode_entities( $_[0], q(<>&"') ); All of my text is decoded utf8. I don't know what other information I can give you to help thanks Clint Perl Info
|
From @rgs2008/11/12 via RT Clinton Gormley <perlbug-followup@perl.org>:
If this bug is reproducible, we'd need a small program to replicate |
The RT System itself - Status changed from 'new' to 'open' |
From @clintongormley
Yeah, sorry - I realise it's a rubbish bug report. Unfortunately, I've There are only two places I could see this coming from, neither of them The first is Template Toolkit (v2.20) using the XS stash: From the compiled template: This is pretty standard stuff, and shouldn't have anything to do with user data. And the second (which is the only regexy thing called by the above), is: where $_[0] would be the UTF8-decoded value passed in by the user (the The error was thrown on a polish site, so it is likely that non Latin-1 hope this helps (a little) clint |
From @clintongormley
OK - I added some debugging and have seen this error a few times, but am It looks like a string which will trigger this error is: "np. imi\x{119}, nazwisko, s\x{142}owa kluczowe" and the only regex-related code which touches it is: #=================================== (HTML::Entities v 1.35) I've tried running the same code on the same object, and it works just I can only assume this is intermittent. I've only seen this error 5 Other strings that have triggered the bug: Hope this helps Clint |
From dwtrusty@hotmail.comOn Thu Nov 13 08:57:24 2008, rafael wrote:
I have a test case which reproduces this problem. I have attached |
From @gannett-ggreerI too have this problem, although with XML::Simple rather than - - - 8< - - - 8< - - - my $twig = XML::Twig->new(twig_handlers => { $ ./parse.pl Twig.pm:7806 is: $string=~ s/([&<])/$XML::Twig::base_ent{$1}/g unless( $keep_encoding || Versions: (My own crash has too much code and data to easily cut down.) |
From @gannett-ggreerFurther testing indicates Perl built from git repository works and Perl |
From @gannett-ggreerCorrection to the above: not a patch, but the parameters given to perl5.11.0: regcomp.c:5176: Perl_reg_numbered_buff_fetch: Assertion |
From @gannett-ggreerRemoving "-g" from -Doptimize passes the test case. |
From @gannett-ggreerI submitted to Fedora: |
From mmaslano@redhat.comOn Tue Apr 07 20:24:15 2009, greerga wrote:
As was said in Fedora's bugzilla this is not problem of rpm or distro. |
From @gannett-ggreerI have distilled this assertion failure down to the following program: - - - 8< - - - 8< - - - $k1 = "<a> xxxxx x x <>"; $k2 = "pu\x{f1}al \x{2022}"; Notes: For Perl 5.10.0 from Fedora (perl-5.10.0-56.fc10.i386): For bleadperl in git as of this writing |
From lubo.rintel@gooddata.comBisected to: commit c74340f Re: [PATCH] Fix RT#19049 and add relative backreferences |
From @avarOn Sun, Apr 19, 2009 at 7:27 AM, George Greer via RT
According to git bisect (bisect scripts attached): c74340f is first bad commit Re: [PATCH] Fix RT#19049 and add relative backreferences p4raw-id: //depot/perl@29279 :040000 040000 83188149c434a6cc022cceedd0d4f022869d37e9 |
From @avar2009/4/19 Ævar Arnfjörð Bjarmason <avarab@gmail.com>:
Hrm, actually that's a different failure, from git bisect output: Assertion rx->sublen >= (s - rx->subbeg) + i failed: file "regcomp.c", [...] Assertion rx->sublen >= s1 failed: file "mg.c", line 866 at I.e. the revision I found has an assertion error too, but somewhere But I'm done bisecting for today :) |
From @gannett-ggreerMore fun with recursive regex engine? The backtrace: - - - 8< - - - 8< - - - Program received signal SIGABRT, Aborted. Some details at the time of the assertion: - - - 8< - - - 8< - - - Assert: rx->sublen >= (s - rx->subbeg) + i Interestingly: - - - 8< - - - 8< - - - which is what I would expect since "([&<>])" matches the & in - - - 8< - - - 8< - - - Something not restoring ->offs (or clobbering the saved one) or I'm |
From @gannett-ggreerAttached a guess patch to fix versus blead. I don't guarantee anything Original bug report test case with blead+patch: 17:28:55 ggreer@ggreer-l:~/projects/git/perl$ ./perl -Ilib Original bug report test case with Perl 5.10 (Fedora): 17:29:06 ggreer@ggreer-l:~/projects/git/perl$ perl /tmp/ffff/parse.pl My bug report test case with blead+patch: 17:29:14 ggreer@ggreer-l:~/projects/git/perl$ ./perl -CS -Ilib My bug report test case with Perl 5.10 (Fedora): 17:35:04 ggreer@ggreer-l:~/projects/git/perl$ perl -CS ~/tmp/perltestcase.pl The above uses a slightly tweaked version of my test case: $k1 = "." x 4 . ">>"; $k2 = "\x{f1}\x{2022}"; make test says: |
From @gannett-ggreer60508.patchdiff --git a/proto.h b/proto.h
index 3f95eb5..35c104e 100644
--- a/proto.h
+++ b/proto.h
@@ -5431,11 +5431,6 @@ STATIC char* S_find_byclass(pTHX_ regexp * prog, const regnode *c, char *s, cons
#define PERL_ARGS_ASSERT_FIND_BYCLASS \
assert(prog); assert(c); assert(s); assert(strend)
-STATIC void S_swap_match_buff(pTHX_ regexp * prog)
- __attribute__nonnull__(pTHX_1);
-#define PERL_ARGS_ASSERT_SWAP_MATCH_BUFF \
- assert(prog)
-
STATIC void S_to_utf8_substr(pTHX_ regexp * prog)
__attribute__nonnull__(pTHX_1);
#define PERL_ARGS_ASSERT_TO_UTF8_SUBSTR \
diff --git a/regcomp.c b/regcomp.c
index e061528..1b7eaee 100644
--- a/regcomp.c
+++ b/regcomp.c
@@ -9382,7 +9382,6 @@ Perl_pregfree2(pTHX_ REGEXP *rx)
if (r->saved_copy)
SvREFCNT_dec(r->saved_copy);
#endif
- Safefree(r->swap);
Safefree(r->offs);
}
@@ -9441,7 +9440,6 @@ Perl_reg_temp_copy (pTHX_ REGEXP *rx)
ret->saved_copy = NULL;
#endif
ret->mother_re = rx;
- ret->swap = NULL;
return ret_x;
}
@@ -9609,10 +9607,6 @@ Perl_re_dup_guts(pTHX_ const REGEXP *sstr, REGEXP *dstr, CLONE_PARAMS *param)
npar = r->nparens+1;
Newx(ret->offs, npar, regexp_paren_pair);
Copy(r->offs, ret->offs, npar, regexp_paren_pair);
- if(ret->swap) {
- /* no need to copy these */
- Newx(ret->swap, npar, regexp_paren_pair);
- }
if (ret->substrs) {
/* Do it this way to avoid reading from *r after the StructCopy().
diff --git a/regexec.c b/regexec.c
index 93fadab..c9358b4 100644
--- a/regexec.c
+++ b/regexec.c
@@ -1734,28 +1734,6 @@ S_find_byclass(pTHX_ regexp * prog, const regnode *c, char *s,
return s;
}
-static void
-S_swap_match_buff (pTHX_ regexp *prog)
-{
- regexp_paren_pair *t;
-
- PERL_ARGS_ASSERT_SWAP_MATCH_BUFF;
-
- if (!prog->swap) {
- /* We have to be careful. If the previous successful match
- was from this regex we don't want a subsequent paritally
- successful match to clobber the old results.
- So when we detect this possibility we add a swap buffer
- to the re, and switch the buffer each match. If we fail
- we switch it back, otherwise we leave it swapped.
- */
- Newxz(prog->swap, (prog->nparens + 1), regexp_paren_pair);
- }
- t = prog->swap;
- prog->swap = prog->offs;
- prog->offs = t;
-}
-
/*
- regexec_flags - match a regexp against a string
@@ -1785,7 +1763,7 @@ Perl_regexec_flags(pTHX_ REGEXP * const rx, char *stringarg, register char *stre
I32 multiline;
RXi_GET_DECL(prog,progi);
regmatch_info reginfo; /* create some info to pass to regtry etc */
- bool swap_on_fail = 0;
+ regexp_paren_pair *swap = NULL;
GET_RE_DEBUG_FLAGS_DECL;
PERL_ARGS_ASSERT_REGEXEC_FLAGS;
@@ -1863,9 +1841,19 @@ Perl_regexec_flags(pTHX_ REGEXP * const rx, char *stringarg, register char *stre
reginfo.ganch = strbeg;
}
if (PL_curpm && (PM_GETRE(PL_curpm) == rx)) {
- swap_on_fail = 1;
- swap_match_buff(prog); /* do we need a save destructor here for
- eval dies? */
+ regexp_paren_pair *t;
+ /* We have to be careful. If the previous successful match
+ was from this regex we don't want a subsequent partially
+ successful match to clobber the old results.
+ So when we detect this possibility we add a swap buffer
+ to the re, and switch the buffer each match. If we fail
+ we switch it back, otherwise we leave it swapped.
+ */
+ /* do we need a save destructor here for eval dies? */
+ Newxz(swap, (prog->nparens + 1), regexp_paren_pair);
+ t = prog->offs;
+ prog->offs = swap;
+ swap = t;
}
if (!(flags & REXEC_CHECKED) && (prog->check_substr != NULL || prog->check_utf8 != NULL)) {
re_scream_pos_data d;
@@ -2166,6 +2154,9 @@ Perl_regexec_flags(pTHX_ REGEXP * const rx, char *stringarg, register char *stre
got_it:
RX_MATCH_TAINTED_set(rx, PL_reg_flags & RF_tainted);
+ /* Keep successful match buffer rather than the original one. */
+ Safefree(swap);
+
if (PL_reg_eval_set)
restore_pos(aTHX_ prog);
if (RXp_PAREN_NAMES(prog))
@@ -2209,9 +2200,13 @@ phooey:
PL_colors[4], PL_colors[5]));
if (PL_reg_eval_set)
restore_pos(aTHX_ prog);
- if (swap_on_fail)
+ if (swap) {
+ regexp_paren_pair *t;
/* we failed :-( roll it back */
- swap_match_buff(prog);
+ t = prog->offs;
+ prog->offs = swap;
+ Safefree(t);
+ }
return 0;
}
diff --git a/regexp.h b/regexp.h
index e8a8cc0..7a2e917 100644
--- a/regexp.h
+++ b/regexp.h
@@ -88,7 +88,6 @@ typedef struct regexp_paren_pair {
/* during matching */ \
U32 lastparen; /* last open paren matched */ \
U32 lastcloseparen; /* last close paren matched */ \
- regexp_paren_pair *swap; /* Swap copy of *offs */ \
/* Array of offsets for (@-) and (@+) */ \
regexp_paren_pair *offs; \
/* saved or original string so \digit works forever. */ \
|
From @nwc10Dave notes: looks like a regression in 5.10.0, maint, blead. Nick notes: There is a patch for review. |
From @gannett-ggreerOn Thu May 28 07:57:21 2009, nicholas wrote:
I'm revising the patch to include more areas that need to be removed to |
From @gannett-ggreerSent a revised patch to p5p although since I recreated it on a different |
From @rgsLink to my last comment on P5P : |
From @gannett-ggreerhttp://www.xray.mpe.mpg.de/mailing-lists/perl5-porters/2009-07/msg00161.html |
From @schwernLooks like this is hung waiting for review of George's last patch. I've written up a test for it. I couldn't fathom what file to put it in There's also a small patch to fresh_perl_is(). It already strips the |
From @schwern0001-Make-fresh_perl_is-strip-newlines-off-the-expected.patchFrom 326ac6e9382ab3a8a33ccb56c0f8b7e20d6043c3 Mon Sep 17 00:00:00 2001
From: Michael G. Schwern <schwern@pobox.com>
Date: Sun, 12 Jul 2009 02:30:26 -0700
Subject: [PATCH 1/2] Make fresh_perl_is() strip newlines off the expected result just like it does the result so tests don't weirdly fail just because the author didn't realize it was normalizing newlines.
---
t/test.pl | 7 ++++++-
1 files changed, 6 insertions(+), 1 deletions(-)
diff --git a/t/test.pl b/t/test.pl
index 32c4a37..4b2161f 100644
--- a/t/test.pl
+++ b/t/test.pl
@@ -681,7 +681,7 @@ sub _fresh_perl {
my $status = $?;
# Clean up the results into something a bit more predictable.
- $results =~ s/\n+$//;
+ $results =~ s/\n+$//;
$results =~ s/at\s+$::tempfile_regexp\s+line/at - line/g;
$results =~ s/of\s+$::tempfile_regexp\s+aborted/of - aborted/g;
@@ -722,6 +722,11 @@ sub _fresh_perl {
sub fresh_perl_is {
my($prog, $expected, $runperl_args, $name) = @_;
+
+ # _fresh_perl() is going to clip the trailing newlines off the result.
+ # This will make it so the test author doesn't have to know that.
+ $expected =~ s/\n+$//;
+
local $Level = 2;
_fresh_perl($prog,
sub { @_ ? $_[0] eq $expected : $expected },
--
1.6.2.4
|
From @schwern0002-This-is-a-test-for-rt.cpan.org-60508-which-I-can-t-f.patchFrom cec9ff832f6337e3ec1492742431f7813cfacaac Mon Sep 17 00:00:00 2001
From: Michael G. Schwern <schwern@pobox.com>
Date: Sun, 12 Jul 2009 02:33:47 -0700
Subject: [PATCH 2/2] This is a test for rt.cpan.org 60508 which I can't figure out where else
to put it or what the underlying problem is, but it has to go somewhere.
---
t/op/reg_60508.t | 40 ++++++++++++++++++++++++++++++++++++++++
1 files changed, 40 insertions(+), 0 deletions(-)
create mode 100644 t/op/reg_60508.t
diff --git a/t/op/reg_60508.t b/t/op/reg_60508.t
new file mode 100644
index 0000000..96a4fef
--- /dev/null
+++ b/t/op/reg_60508.t
@@ -0,0 +1,40 @@
+#!./perl
+
+# This is a test for rt.cpan.org 60508 which I can't figure out where else
+# to put it or what the underlying problem is, but it has to go somewhere.
+# --Schwern
+
+BEGIN {
+ chdir 't' if -d 't';
+ @INC = '../lib';
+ require './test.pl';
+}
+
+use utf8;
+plan tests => 1;
+
+{
+ my $expect = <<"EXPECT";
+k1 = ....
+k2.1 = >\x{2022}
+k2.2 = \x{2022}
+EXPECT
+ utf8::encode($expect);
+
+ local $TODO = "rt.cpan.org 60508";
+
+ fresh_perl_is(<<'CODE', $expect, {});
+binmode STDOUT, ":utf8";
+sub f { $_[0] =~ s/([>X])//g; }
+
+$k1 = "." x 4 . ">>";
+f($k1);
+print "k1 = $k1\n";
+
+$k2 = "\x{f1}\x{2022}";
+$k2 =~ s/([\360-\362])/>/g;
+print "k2.1 = $k2\n";
+f($k2);
+print "k2.2 = $k2\n";
+CODE
+}
--
1.6.2.4
|
From @gannett-ggreerOn Sun Jul 12 02:59:44 2009, schwern wrote:
http://www.xray.mpe.mpg.de/mailing-lists/perl5-porters/2009-07/msg00161.html Which reminded me to go make the 3rd revision of the patch which updates
It is probably better to use fresh_perl_is() like yours, but take a look -George |
From @gannett-ggreer0001-Commit-c74340f9-added-backreferences-as-well-as-the.patchFrom 57b3ac303670d51e25c82940e6cfa0ff0311bce3 Mon Sep 17 00:00:00 2001
From: George Greer <greerga@m-l.org>
Date: Sun, 12 Jul 2009 14:53:29 -0400
Subject: [PATCH] Commit c74340f9 added backreferences as well as the idea of a ->swap regex
pointer to keep track of the match offsets in case of backtracking. The
problem is that when Perl re-enters the regex engine to handle
utf8::SWASHNEW, the ->swap is not saved/restored/cleared so any capture
from the utf8 (Perl) code could inadvertently modify the regex match data
that caused the utf8 swash to get built.
---
embed.fnc | 1 -
embed.h | 2 -
ext/Devel-PPPort/parts/embed.fnc | 1 -
pod/perlreapi.pod | 2 +-
pod/perlreguts.pod | 13 +++++-----
proto.h | 5 ----
regcomp.c | 2 -
regexec.c | 46 ++++++++++++++------------------------
regexp.h | 2 +-
t/op/pat.t | 20 +++++++++++++++-
10 files changed, 44 insertions(+), 50 deletions(-)
diff --git a/embed.fnc b/embed.fnc
index 3ff1b89..6d89a40 100644
--- a/embed.fnc
+++ b/embed.fnc
@@ -1678,7 +1678,6 @@ ERsn |U8* |reghop4 |NN U8 *s|I32 off|NN const U8 *llim \
#endif
ERsn |U8* |reghopmaybe3 |NN U8 *s|I32 off|NN const U8 *lim
ERs |char* |find_byclass |NN regexp * prog|NN const regnode *c|NN char *s|NN const char *strend|NULLOK regmatch_info *reginfo
-Es |void |swap_match_buff|NN regexp * prog
Es |void |to_utf8_substr |NN regexp * prog
Es |void |to_byte_substr |NN regexp * prog
ERs |I32 |reg_check_named_buff_matched |NN const regexp *rex \
diff --git a/embed.h b/embed.h
index 6f6877f..67c6fd2 100644
--- a/embed.h
+++ b/embed.h
@@ -1468,7 +1468,6 @@
#if defined(PERL_CORE) || defined(PERL_EXT)
#define reghopmaybe3 S_reghopmaybe3
#define find_byclass S_find_byclass
-#define swap_match_buff S_swap_match_buff
#define to_utf8_substr S_to_utf8_substr
#define to_byte_substr S_to_byte_substr
#define reg_check_named_buff_matched S_reg_check_named_buff_matched
@@ -3814,7 +3813,6 @@
#if defined(PERL_CORE) || defined(PERL_EXT)
#define reghopmaybe3 S_reghopmaybe3
#define find_byclass(a,b,c,d,e) S_find_byclass(aTHX_ a,b,c,d,e)
-#define swap_match_buff(a) S_swap_match_buff(aTHX_ a)
#define to_utf8_substr(a) S_to_utf8_substr(aTHX_ a)
#define to_byte_substr(a) S_to_byte_substr(aTHX_ a)
#define reg_check_named_buff_matched(a,b) S_reg_check_named_buff_matched(aTHX_ a,b)
diff --git a/ext/Devel-PPPort/parts/embed.fnc b/ext/Devel-PPPort/parts/embed.fnc
index 68f3817..48cb9f3 100644
--- a/ext/Devel-PPPort/parts/embed.fnc
+++ b/ext/Devel-PPPort/parts/embed.fnc
@@ -1677,7 +1677,6 @@ ERsn |U8* |reghop4 |NN U8 *s|I32 off|NN const U8 *llim \
#endif
ERsn |U8* |reghopmaybe3 |NN U8 *s|I32 off|NN const U8 *lim
ERs |char* |find_byclass |NN regexp * prog|NN const regnode *c|NN char *s|NN const char *strend|NULLOK regmatch_info *reginfo
-Es |void |swap_match_buff|NN regexp * prog
Es |void |to_utf8_substr |NN regexp * prog
Es |void |to_byte_substr |NN regexp * prog
ERs |I32 |reg_check_named_buff_matched |NN const regexp *rex \
diff --git a/pod/perlreapi.pod b/pod/perlreapi.pod
index b0d6275..03996fd 100644
--- a/pod/perlreapi.pod
+++ b/pod/perlreapi.pod
@@ -598,7 +598,7 @@ engine should use something else.
=head2 C<swap>
-TODO: document
+Unused. Left in for compatibility with perl 5.10.0.
=head2 C<offs>
diff --git a/pod/perlreguts.pod b/pod/perlreguts.pod
index 2049931..b9f306a 100644
--- a/pod/perlreguts.pod
+++ b/pod/perlreguts.pod
@@ -810,13 +810,12 @@ value to other engine implementations.
=item C<swap>
-C<swap> is an extra set of startp/endp stored in a C<regexp_paren_ofs>
-struct. This is used when the last successful match was from the same pattern
-as the current pattern, so that a partial match doesn't overwrite the
-previous match's results. When this field is data filled the matching
-engine will swap buffers before every match attempt. If the match fails,
-then it swaps them back. If it's successful it leaves them. This field
-is populated on demand and is by default null.
+C<swap> formerly was an extra set of startp/endp stored in a
+C<regexp_paren_ofs> struct. This was used when the last successful match
+was from the same pattern as the current pattern, so that a partial
+match didn't overwrite the previous match's results, but it caused a
+problem with re-entrant code such as trying to build the UTF-8 swashes.
+Currently unused and left for backward compatibility with 5.10.0.
=item C<offsets>
diff --git a/proto.h b/proto.h
index 427600e..37d1371 100644
--- a/proto.h
+++ b/proto.h
@@ -5434,11 +5434,6 @@ STATIC char* S_find_byclass(pTHX_ regexp * prog, const regnode *c, char *s, cons
#define PERL_ARGS_ASSERT_FIND_BYCLASS \
assert(prog); assert(c); assert(s); assert(strend)
-STATIC void S_swap_match_buff(pTHX_ regexp * prog)
- __attribute__nonnull__(pTHX_1);
-#define PERL_ARGS_ASSERT_SWAP_MATCH_BUFF \
- assert(prog)
-
STATIC void S_to_utf8_substr(pTHX_ regexp * prog)
__attribute__nonnull__(pTHX_1);
#define PERL_ARGS_ASSERT_TO_UTF8_SUBSTR \
diff --git a/regcomp.c b/regcomp.c
index 50b0632..41211cf 100644
--- a/regcomp.c
+++ b/regcomp.c
@@ -9414,7 +9414,6 @@ Perl_pregfree2(pTHX_ REGEXP *rx)
if (r->saved_copy)
SvREFCNT_dec(r->saved_copy);
#endif
- Safefree(r->swap);
Safefree(r->offs);
}
@@ -9473,7 +9472,6 @@ Perl_reg_temp_copy (pTHX_ REGEXP *rx)
ret->saved_copy = NULL;
#endif
ret->mother_re = rx;
- ret->swap = NULL;
return ret_x;
}
diff --git a/regexec.c b/regexec.c
index 93fadab..dc2a01b 100644
--- a/regexec.c
+++ b/regexec.c
@@ -1734,28 +1734,6 @@ S_find_byclass(pTHX_ regexp * prog, const regnode *c, char *s,
return s;
}
-static void
-S_swap_match_buff (pTHX_ regexp *prog)
-{
- regexp_paren_pair *t;
-
- PERL_ARGS_ASSERT_SWAP_MATCH_BUFF;
-
- if (!prog->swap) {
- /* We have to be careful. If the previous successful match
- was from this regex we don't want a subsequent paritally
- successful match to clobber the old results.
- So when we detect this possibility we add a swap buffer
- to the re, and switch the buffer each match. If we fail
- we switch it back, otherwise we leave it swapped.
- */
- Newxz(prog->swap, (prog->nparens + 1), regexp_paren_pair);
- }
- t = prog->swap;
- prog->swap = prog->offs;
- prog->offs = t;
-}
-
/*
- regexec_flags - match a regexp against a string
@@ -1785,7 +1763,7 @@ Perl_regexec_flags(pTHX_ REGEXP * const rx, char *stringarg, register char *stre
I32 multiline;
RXi_GET_DECL(prog,progi);
regmatch_info reginfo; /* create some info to pass to regtry etc */
- bool swap_on_fail = 0;
+ regexp_paren_pair *swap = NULL;
GET_RE_DEBUG_FLAGS_DECL;
PERL_ARGS_ASSERT_REGEXEC_FLAGS;
@@ -1863,9 +1841,16 @@ Perl_regexec_flags(pTHX_ REGEXP * const rx, char *stringarg, register char *stre
reginfo.ganch = strbeg;
}
if (PL_curpm && (PM_GETRE(PL_curpm) == rx)) {
- swap_on_fail = 1;
- swap_match_buff(prog); /* do we need a save destructor here for
- eval dies? */
+ /* We have to be careful. If the previous successful match
+ was from this regex we don't want a subsequent partially
+ successful match to clobber the old results.
+ So when we detect this possibility we add a swap buffer
+ to the re, and switch the buffer each match. If we fail
+ we switch it back, otherwise we leave it swapped.
+ */
+ swap = prog->offs;
+ /* do we need a save destructor here for eval dies? */
+ Newxz(prog->offs, (prog->nparens + 1), regexp_paren_pair);
}
if (!(flags & REXEC_CHECKED) && (prog->check_substr != NULL || prog->check_utf8 != NULL)) {
re_scream_pos_data d;
@@ -2164,6 +2149,7 @@ Perl_regexec_flags(pTHX_ REGEXP * const rx, char *stringarg, register char *stre
goto phooey;
got_it:
+ Safefree(swap);
RX_MATCH_TAINTED_set(rx, PL_reg_flags & RF_tainted);
if (PL_reg_eval_set)
@@ -2209,10 +2195,12 @@ phooey:
PL_colors[4], PL_colors[5]));
if (PL_reg_eval_set)
restore_pos(aTHX_ prog);
- if (swap_on_fail)
+ if (swap) {
/* we failed :-( roll it back */
- swap_match_buff(prog);
-
+ Safefree(prog->offs);
+ prog->offs = swap;
+ }
+
return 0;
}
diff --git a/regexp.h b/regexp.h
index e8a8cc0..6bac7e6 100644
--- a/regexp.h
+++ b/regexp.h
@@ -88,7 +88,7 @@ typedef struct regexp_paren_pair {
/* during matching */ \
U32 lastparen; /* last open paren matched */ \
U32 lastcloseparen; /* last close paren matched */ \
- regexp_paren_pair *swap; /* Swap copy of *offs */ \
+ regexp_paren_pair *swap; /* Unused: 5.10.1 and later */ \
/* Array of offsets for (@-) and (@+) */ \
regexp_paren_pair *offs; \
/* saved or original string so \digit works forever. */ \
diff --git a/t/op/pat.t b/t/op/pat.t
index 62ca4b2..53b4477 100644
--- a/t/op/pat.t
+++ b/t/op/pat.t
@@ -13,7 +13,7 @@ sub run_tests;
$| = 1;
-my $EXPECTED_TESTS = 4061; # Update this when adding/deleting tests.
+my $EXPECTED_TESTS = 4062; # Update this when adding/deleting tests.
BEGIN {
chdir 't' if -d 't';
@@ -4346,6 +4346,24 @@ sub run_tests {
iseq($str, "\$1 = undef, \$2 = undef, \$3 = undef, \$4 = undef, \$5 = undef, \$^R = undef");
}
}
+
+ # This only works under -DEBUGGING because it relies on an assert().
+ {
+ local $BugId = '60508';
+ local $Message = "Check capture offset re-entrancy of utf8 code.";
+
+ sub fswash { $_[0] =~ s/([>X])//g; }
+
+ my $k1 = "." x 4 . ">>";
+ fswash($k1);
+
+ my $k2 = "\x{f1}\x{2022}";
+ $k2 =~ s/([\360-\362])/>/g;
+ fswash($k2);
+
+ iseq($k2, "\x{2022}", "utf8::SWASHNEW doesn't cause capture leaks");
+ }
+
#
# This should be the last test.
#
--
1.6.0.4
|
From @schwernGeorge Greer via RT wrote:
Thanks.
What, in English or Perl, does that assert failing indicate has gone wrong? -- |
From @greergaOn Sun, 12 Jul 2009, Michael G Schwern wrote:
It means the paren-capture offsets point outside the match string. For (gdb) info locals which results in: Assert: rx->sublen >= (s - rx->subbeg) + i In the case above, the utf8 SWASHNEW Perl code is where the lingering Unfortunately the assert only notices if the capture is later in the -- |
From @demerphqOn Sun Jul 12 02:59:44 2009, schwern wrote:
http://www.xray.mpe.mpg.de/mailing-lists/perl5-porters/2009-07/msg00161.html
Thanks. Applied. |
From @demerphq2009/7/12 George Greer via RT <perlbug-followup@perl.org>:
Hi, sorry it took so long to get to this, but I have applied it as 7a68ade9729c0afc78c5f9cbadc5c77928cfedb8 with some minor commit message munging. Thanks a lot! I think your solution makes more sense than mine did. Cheers -- |
From @iabynOn Sun, Jul 26, 2009 at 11:32:27PM +0200, demerphq wrote:
With 03fbf60b09ed44cc7d2e021c979a259636ae4488, I've backed out the change -- |
From @iabynOn Sun, Jul 26, 2009 at 11:32:27PM +0200, demerphq wrote:
I'm having great difficulty pulling this into maint; specifically the Perl_regexec_flags(pTHX_ REGEXP * const prog, ... becomes the blead: Perl_regexec_flags(pTHX_ REGEXP * const rx, ... then theres lots of places in the function which sometimes refer to rx and -- |
From @demerphq2009/7/27 Dave Mitchell <davem@iabyn.com>:
Ill give it a try tomorrow if George doesnt beat me to it. cheers, -- |
From @greergaOn Mon, 27 Jul 2009, Dave Mitchell wrote:
Doing a straight port and fixing up the conflicts passes the tests It has the ext/Devel-PPPort/ part again though. -- |
From @greerga0001-much-better-swap-logic-to-support-reentrancy-and-fix.patchFrom 8529230764704158520dab3daf6d8ac92fb46e06 Mon Sep 17 00:00:00 2001
From: George Greer <perl@greerga.m-l.org>
Date: Sun, 2 Aug 2009 20:05:09 -0400
Subject: [PATCH] much better swap logic to support reentrancy and fix assert failure
Commit c74340f9 added backreferences as well as the idea of a ->swap regex
pointer to keep track of the match offsets in case of backtracking. The
problem is that when Perl re-enters the regex engine to handle
utf8::SWASHNEW, the ->swap is not saved/restored/cleared so any capture
from the utf8 (Perl) code could inadvertently modify the regex match data
that caused the utf8 swash to get built.
---
embed.fnc | 1 -
embed.h | 2 -
ext/Devel-PPPort/parts/embed.fnc | 1 -
pod/perlreapi.pod | 2 +-
pod/perlreguts.pod | 13 +++++-----
proto.h | 5 ----
regcomp.c | 6 -----
regexec.c | 46 ++++++++++++++------------------------
regexp.h | 2 +-
t/op/pat.t | 20 +++++++++++++++-
10 files changed, 44 insertions(+), 54 deletions(-)
diff --git a/embed.fnc b/embed.fnc
index 6cddb88..168aa93 100644
--- a/embed.fnc
+++ b/embed.fnc
@@ -1658,7 +1658,6 @@ ERsn |U8* |reghop4 |NN U8 *s|I32 off|NN const U8 *llim \
#endif
ERsn |U8* |reghopmaybe3 |NN U8 *s|I32 off|NN const U8 *lim
ERs |char* |find_byclass |NN regexp * prog|NN const regnode *c|NN char *s|NN const char *strend|NULLOK regmatch_info *reginfo
-Es |void |swap_match_buff|NN regexp * prog
Es |void |to_utf8_substr |NN regexp * prog
Es |void |to_byte_substr |NN regexp * prog
ERs |I32 |reg_check_named_buff_matched |NN const regexp *rex \
diff --git a/embed.h b/embed.h
index e968707..5ec3a05 100644
--- a/embed.h
+++ b/embed.h
@@ -1452,7 +1452,6 @@
#if defined(PERL_CORE) || defined(PERL_EXT)
#define reghopmaybe3 S_reghopmaybe3
#define find_byclass S_find_byclass
-#define swap_match_buff S_swap_match_buff
#define to_utf8_substr S_to_utf8_substr
#define to_byte_substr S_to_byte_substr
#define reg_check_named_buff_matched S_reg_check_named_buff_matched
@@ -3783,7 +3782,6 @@
#if defined(PERL_CORE) || defined(PERL_EXT)
#define reghopmaybe3 S_reghopmaybe3
#define find_byclass(a,b,c,d,e) S_find_byclass(aTHX_ a,b,c,d,e)
-#define swap_match_buff(a) S_swap_match_buff(aTHX_ a)
#define to_utf8_substr(a) S_to_utf8_substr(aTHX_ a)
#define to_byte_substr(a) S_to_byte_substr(aTHX_ a)
#define reg_check_named_buff_matched(a,b) S_reg_check_named_buff_matched(aTHX_ a,b)
diff --git a/ext/Devel-PPPort/parts/embed.fnc b/ext/Devel-PPPort/parts/embed.fnc
index 68f3817..48cb9f3 100644
--- a/ext/Devel-PPPort/parts/embed.fnc
+++ b/ext/Devel-PPPort/parts/embed.fnc
@@ -1677,7 +1677,6 @@ ERsn |U8* |reghop4 |NN U8 *s|I32 off|NN const U8 *llim \
#endif
ERsn |U8* |reghopmaybe3 |NN U8 *s|I32 off|NN const U8 *lim
ERs |char* |find_byclass |NN regexp * prog|NN const regnode *c|NN char *s|NN const char *strend|NULLOK regmatch_info *reginfo
-Es |void |swap_match_buff|NN regexp * prog
Es |void |to_utf8_substr |NN regexp * prog
Es |void |to_byte_substr |NN regexp * prog
ERs |I32 |reg_check_named_buff_matched |NN const regexp *rex \
diff --git a/pod/perlreapi.pod b/pod/perlreapi.pod
index b0d6275..03996fd 100644
--- a/pod/perlreapi.pod
+++ b/pod/perlreapi.pod
@@ -598,7 +598,7 @@ engine should use something else.
=head2 C<swap>
-TODO: document
+Unused. Left in for compatibility with perl 5.10.0.
=head2 C<offs>
diff --git a/pod/perlreguts.pod b/pod/perlreguts.pod
index 2049931..b9f306a 100644
--- a/pod/perlreguts.pod
+++ b/pod/perlreguts.pod
@@ -810,13 +810,12 @@ value to other engine implementations.
=item C<swap>
-C<swap> is an extra set of startp/endp stored in a C<regexp_paren_ofs>
-struct. This is used when the last successful match was from the same pattern
-as the current pattern, so that a partial match doesn't overwrite the
-previous match's results. When this field is data filled the matching
-engine will swap buffers before every match attempt. If the match fails,
-then it swaps them back. If it's successful it leaves them. This field
-is populated on demand and is by default null.
+C<swap> formerly was an extra set of startp/endp stored in a
+C<regexp_paren_ofs> struct. This was used when the last successful match
+was from the same pattern as the current pattern, so that a partial
+match didn't overwrite the previous match's results, but it caused a
+problem with re-entrant code such as trying to build the UTF-8 swashes.
+Currently unused and left for backward compatibility with 5.10.0.
=item C<offsets>
diff --git a/proto.h b/proto.h
index ccb9eb3..835aa3b 100644
--- a/proto.h
+++ b/proto.h
@@ -5448,11 +5448,6 @@ STATIC char* S_find_byclass(pTHX_ regexp * prog, const regnode *c, char *s, cons
#define PERL_ARGS_ASSERT_FIND_BYCLASS \
assert(prog); assert(c); assert(s); assert(strend)
-STATIC void S_swap_match_buff(pTHX_ regexp * prog)
- __attribute__nonnull__(pTHX_1);
-#define PERL_ARGS_ASSERT_SWAP_MATCH_BUFF \
- assert(prog)
-
STATIC void S_to_utf8_substr(pTHX_ regexp * prog)
__attribute__nonnull__(pTHX_1);
#define PERL_ARGS_ASSERT_TO_UTF8_SUBSTR \
diff --git a/regcomp.c b/regcomp.c
index 49e69b2..372c680 100644
--- a/regcomp.c
+++ b/regcomp.c
@@ -9361,7 +9361,6 @@ Perl_pregfree(pTHX_ REGEXP *r)
if (r->saved_copy)
SvREFCNT_dec(r->saved_copy);
#endif
- Safefree(r->swap);
Safefree(r->offs);
Safefree(r);
}
@@ -9413,7 +9412,6 @@ Perl_reg_temp_copy (pTHX_ REGEXP *r) {
ret->saved_copy = NULL;
#endif
ret->mother_re = r;
- ret->swap = NULL;
return ret;
}
@@ -9588,10 +9586,6 @@ Perl_re_dup(pTHX_ const regexp *r, CLONE_PARAMS *param)
StructCopy(r, ret, regexp);
Newx(ret->offs, npar, regexp_paren_pair);
Copy(r->offs, ret->offs, npar, regexp_paren_pair);
- if(ret->swap) {
- /* no need to copy these */
- Newx(ret->swap, npar, regexp_paren_pair);
- }
if (ret->substrs) {
/* Do it this way to avoid reading from *r after the StructCopy().
diff --git a/regexec.c b/regexec.c
index 7a42c4f..3dda67b 100644
--- a/regexec.c
+++ b/regexec.c
@@ -1733,28 +1733,6 @@ S_find_byclass(pTHX_ regexp * prog, const regnode *c, char *s,
return s;
}
-static void
-S_swap_match_buff (pTHX_ regexp *prog)
-{
- regexp_paren_pair *t;
-
- PERL_ARGS_ASSERT_SWAP_MATCH_BUFF;
-
- if (!prog->swap) {
- /* We have to be careful. If the previous successful match
- was from this regex we don't want a subsequent paritally
- successful match to clobber the old results.
- So when we detect this possibility we add a swap buffer
- to the re, and switch the buffer each match. If we fail
- we switch it back, otherwise we leave it swapped.
- */
- Newxz(prog->swap, (prog->nparens + 1), regexp_paren_pair);
- }
- t = prog->swap;
- prog->swap = prog->offs;
- prog->offs = t;
-}
-
/*
- regexec_flags - match a regexp against a string
@@ -1783,7 +1761,7 @@ Perl_regexec_flags(pTHX_ REGEXP * const prog, char *stringarg, register char *st
I32 multiline;
RXi_GET_DECL(prog,progi);
regmatch_info reginfo; /* create some info to pass to regtry etc */
- bool swap_on_fail = 0;
+ regexp_paren_pair *swap = NULL;
GET_RE_DEBUG_FLAGS_DECL;
PERL_ARGS_ASSERT_REGEXEC_FLAGS;
@@ -1861,9 +1839,16 @@ Perl_regexec_flags(pTHX_ REGEXP * const prog, char *stringarg, register char *st
reginfo.ganch = strbeg;
}
if (PL_curpm && (PM_GETRE(PL_curpm) == prog)) {
- swap_on_fail = 1;
- swap_match_buff(prog); /* do we need a save destructor here for
- eval dies? */
+ /* We have to be careful. If the previous successful match
+ was from this regex we don't want a subsequent partially
+ successful match to clobber the old results.
+ So when we detect this possibility we add a swap buffer
+ to the re, and switch the buffer each match. If we fail
+ we switch it back, otherwise we leave it swapped.
+ */
+ swap = prog->offs;
+ /* do we need a save destructor here for eval dies? */
+ Newxz(prog->offs, (prog->nparens + 1), regexp_paren_pair);
}
if (!(flags & REXEC_CHECKED) && (prog->check_substr != NULL || prog->check_utf8 != NULL)) {
re_scream_pos_data d;
@@ -2162,6 +2147,7 @@ Perl_regexec_flags(pTHX_ REGEXP * const prog, char *stringarg, register char *st
goto phooey;
got_it:
+ Safefree(swap);
RX_MATCH_TAINTED_set(prog, PL_reg_flags & RF_tainted);
if (PL_reg_eval_set)
@@ -2207,10 +2193,12 @@ phooey:
PL_colors[4], PL_colors[5]));
if (PL_reg_eval_set)
restore_pos(aTHX_ prog);
- if (swap_on_fail)
+ if (swap) {
/* we failed :-( roll it back */
- swap_match_buff(prog);
-
+ Safefree(prog->offs);
+ prog->offs = swap;
+ }
+
return 0;
}
diff --git a/regexp.h b/regexp.h
index a1417af..6cd2e7b 100644
--- a/regexp.h
+++ b/regexp.h
@@ -88,7 +88,7 @@ typedef struct regexp {
/* Data about the last/current match. These are modified during matching*/
U32 lastparen; /* last open paren matched */
U32 lastcloseparen; /* last close paren matched */
- regexp_paren_pair *swap; /* Swap copy of *offs */
+ regexp_paren_pair *swap; /* Unused: 5.10.1 and later */
regexp_paren_pair *offs; /* Array of offsets for (@-) and (@+) */
char *subbeg; /* saved or original string
diff --git a/t/op/pat.t b/t/op/pat.t
index 0b2c729..5902c96 100644
--- a/t/op/pat.t
+++ b/t/op/pat.t
@@ -13,7 +13,7 @@ sub run_tests;
$| = 1;
-my $EXPECTED_TESTS = 4065; # Update this when adding/deleting tests.
+my $EXPECTED_TESTS = 4066; # Update this when adding/deleting tests.
BEGIN {
chdir 't' if -d 't';
@@ -4363,6 +4363,24 @@ sub run_tests {
ok($s =~ $pat, $pat);
}
}
+
+ # This only works under -DEBUGGING because it relies on an assert().
+ {
+ local $BugId = '60508';
+ local $Message = "Check capture offset re-entrancy of utf8 code.";
+
+ sub fswash { $_[0] =~ s/([>X])//g; }
+
+ my $k1 = "." x 4 . ">>";
+ fswash($k1);
+
+ my $k2 = "\x{f1}\x{2022}";
+ $k2 =~ s/([\360-\362])/>/g;
+ fswash($k2);
+
+ iseq($k2, "\x{2022}", "utf8::SWASHNEW doesn't cause capture leaks");
+ }
+
#
# This should be the last test.
#
--
1.6.0.4
|
From @iabynOn Sun, Aug 02, 2009 at 08:12:37PM -0400, George Greer wrote:
Thanks, but given the closeness to 5.10.1 release, and the risk of this -- |
From @demerphq2009/8/3 Dave Mitchell <davem@iabyn.com>:
So is it ok to go with it now? Yves -- |
From @iabynOn Wed, Sep 02, 2009 at 06:19:14PM +0200, demerphq wrote:
I haven't really caught up with my p5p mailbox since 5.10.1 was released, -- |
From @demerphqfixed in perl 5.11 |
@demerphq - Status changed from 'open' to 'resolved' |
Migrated from rt.perl.org#60508 (status was 'resolved')
Searchable as RT60508$
The text was updated successfully, but these errors were encountered: