Regex fails when string is too long #9729
From @perlpunk

Created by @perlpunk

This is a bug report for perl from tina@cure.localdomain,

I found that in 5.10 and newer a regex begins to fail if a matching string

The shortest example I could produce:

    use strict;
    parse("x" x 32767);
    sub parse {
        my $tag = "html";
        if ( $xml =~ m{<$tag>(.|\n)*?</$tag>}i ) {

It prints:

If I change the regex to:

it works.

regards,
From @schwern
Tina (via RT) wrote:
Thanks for the report. Confirmed on 5.10.0 on OS X. No bug in 5.8.8, 5.8.9, 5.6.2 or 5.5.5.
The RT System itself - Status changed from 'new' to 'open'
From @andk
> Tina (via RT) wrote:
> Thanks for the report. Confirmed on 5.10.0 on OS X.
> No bug in 5.8.8, 5.8.9, 5.6.2 or 5.5.5.
I think it's a known issue but Dave will correct me. Bisect points at commit 40a8244, "start turning regmatch() main loop into a FSM".
From @iabyn
On Fri, May 08, 2009 at 08:00:17AM +0200, Andreas J. Koenig wrote:
Ah, mea culpa :-( I'm aware of at least one other >32767 issue which this may be the same as.
From @nwc10
Dave notes: looks to be another 5.10.0 regression involving regexes longer than 32767
From @hvds
This looks to be a simple oversight. All tests pass here.
Hugo
Inline Patch
--- regexec.c.old 2009-03-22 16:02:09.000000000 +0000
+++ regexec.c 2009-07-06 15:18:27.000000000 +0100
@@ -4411,7 +4411,7 @@
case CURLYM: /* /A{m,n}B/ where A is fixed-length */
/* This is an optimisation of CURLYX that enables us to push
- * only a single backtracking state, no matter now many matches
+ * only a single backtracking state, no matter how many matches
* there are in {m,n}. It relies on the pattern being constant
* length, with no parens to influence future backrefs
*/
@@ -4574,7 +4574,8 @@
case CURLYM_B_fail: /* just failed to match a B */
REGCP_UNWIND(ST.cp);
if (ST.minmod) {
- if (ST.count == ARG2(ST.me) /* max */)
+ I32 max = ARG2(ST.me);
+ if (max != REG_INFTY && ST.count == max)
sayNO;
goto curlym_do_A; /* try to match a further A */
}
--- t/op/pat.t.old 2009-06-06 13:51:10.000000000 +0100
+++ t/op/pat.t 2009-07-06 15:31:32.000000000 +0100
@@ -13,7 +13,7 @@
$| = 1;
-my $EXPECTED_TESTS = 4061; # Update this when adding/deleting tests.
+my $EXPECTED_TESTS = 4065; # Update this when adding/deleting tests.
BEGIN {
chdir 't' if -d 't';
@@ -4346,6 +4346,21 @@
iseq($str, "\$1 = undef, \$2 = undef, \$3 = undef, \$4 = undef, \$5 = undef, \$^R = undef");
}
}
+
+ {
+ local $BugId = 65372; # minimal CURLYM limited to 32767 matches
+ my @pat = (
+ qr{a(x|y)*b}, # CURLYM
+ qr{a(x|y)*?b}, # .. with minmod
+ qr{a([wx]|[yz])*b}, # .. and without tries
+ qr{a([wx]|[yz])*?b},
+ );
+ my $len = 32768;
+ my $s = join '', 'a', 'x' x $len, 'b';
+ for my $pat (@pat) {
+ ok($s =~ $pat, $pat);
+ }
+ }
#
# This should be the last test.
#
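
As a reading aid for the CURLYM_B_fail hunk in the patch above, here is a toy C model of the comparison it changes. This is a sketch, not the code in regexec.c: `TOY_INFTY`, `toy_lazy_match` and `make_subject` are hypothetical names, and the value 32767 is assumed from the limit named in the test comment. The idea is that an unbounded quantifier such as `*?` stores a sentinel as its maximum, so comparing the match count against that sentinel directly turns "no limit" into a hard cap.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Assumption: sentinel meaning "no upper bound". The value 32767 mirrors
 * the limit named in the test comment; it is not read from the perl source. */
#define TOY_INFTY 32767

/* Toy lazy match of the shape a x*? b against s (hypothetical helper).
 * check_sentinel == 0 models the old comparison, 1 models the patched one.
 * Returns 1 on match, 0 on failure. */
static int toy_lazy_match(const char *s, int max, int check_sentinel)
{
    size_t pos = 0;
    int count = 0;               /* how many A's ('x') matched so far */

    assert(s != NULL);
    if (s[pos++] != 'a')
        return 0;

    for (;;) {
        if (s[pos] == 'b')       /* minimal matching: try B first */
            return 1;

        /* B failed: decide whether one more A may be tried */
        if (check_sentinel) {
            if (max != TOY_INFTY && count == max)
                return 0;        /* a real, finite maximum was reached */
        } else {
            if (count == max)
                return 0;        /* sentinel mistaken for a real maximum */
        }

        if (s[pos] != 'x')       /* A itself fails (also stops at '\0') */
            return 0;
        pos++;
        count++;
    }
}

/* Build "a" + n copies of "x" + "b"; the caller frees the result. */
static char *make_subject(size_t n)
{
    char *s = malloc(n + 3);
    if (s == NULL)
        return NULL;
    s[0] = 'a';
    memset(s + 1, 'x', n);
    s[n + 1] = 'b';
    s[n + 2] = '\0';
    return s;
}
```

With a subject built by `make_subject(32768)`, the same length the new test uses, `toy_lazy_match(s, TOY_INFTY, 0)` returns 0 while `toy_lazy_match(s, TOY_INFTY, 1)` returns 1. With 32767 repetitions both variants return 1, which is consistent with the failure only appearing once the count passes the sentinel.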
From @Tux
On Mon, 06 Jul 2009 15:45:12 +0100, hv@crypt.org wrote:
I trust you on this :)
bitcard@profvince.com - Status changed from 'open' to 'resolved'
Migrated from rt.perl.org#65372 (status was 'resolved')
Searchable as RT65372