Regexp engine cannot match >2GB strings #12089

Closed
p5pRT opened this issue May 6, 2012 · 4 comments

@p5pRT

p5pRT commented May 6, 2012

Migrated from rt.perl.org#112790 (status was 'resolved')

Searchable as RT112790$

@p5pRT

p5pRT commented May 6, 2012

From @dgl

Matching unexpectedly fails when the string is longer than I32. The
following fixes it, but I see a lot of I32 in the regexp engine itself so
this might be masking other issues (see also RT #72784).

Inline Patch
diff --git a/pp_hot.c b/pp_hot.c
index 89165d9..662b908 100644
--- a/pp_hot.c
+++ b/pp_hot.c
@@ -1303,7 +1303,7 @@ PP(pp_match)
        rx = PM_GETRE(pm);
     }

-    if (RX_MINLEN(rx) > (I32)len)
+    if ((STRLEN)RX_MINLEN(rx) > len)
        goto failure;

     truebase = t = s;
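
To illustrate why the original comparison misfires, here is a minimal standalone sketch rather than code from perl itself. The names `i32_t`, `strlen_t`, `too_short_before` and `too_short_after` are hypothetical stand-ins, and it assumes RX_MINLEN yields an I32, as the cast in the patch suggests.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-ins for Perl's I32 and STRLEN typedefs. */
typedef int32_t i32_t;
typedef size_t  strlen_t;

/* Pre-patch check: len is narrowed to 32 bits before the comparison.
 * Out-of-range narrowing is implementation-defined in C; on common
 * two's-complement compilers (GCC, Clang) it wraps to a negative value,
 * so even minlen 1 compares as greater than the string length. */
static int too_short_before(i32_t minlen, strlen_t len)
{
    return minlen > (i32_t)len;
}

/* Patched check: minlen is widened instead, so len keeps its full value.
 * Assumes minlen is never negative, as a minimum match length should be. */
static int too_short_after(i32_t minlen, strlen_t len)
{
    assert(minlen >= 0);
    return (strlen_t)minlen > len;
}
```

With a length of 2147483649 and a minlen of 1, the first function reports "too short" on such compilers, while the second does not.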

Reproduce with:

$ perl -Mre=debug -le'$a="x"x 1048576; $b.=$a for 1 .. 2047; $b.="y"; print
length $b; print $b =~ /y/ ? "Matched" : "No match"'
Compiling REx "y"
Final program:
  1: EXACT <y> (3)
  3: END (0)
anchored "y" at 0 (checking anchored isall) minlen 1
2146435073
Guessing start of match in sv for REx "y" against
"xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx"...
Found anchored substr "y" at offset 2146435072...
Starting position does not contradict /^/m...
Guessed: match at offset 2146435072
Matched
Freeing REx: "y"

$ perl -Mre=debugcolor -le'$a="x"x 1048576; $b.=$a for 1 .. 2048; $b.="y";
print length $b; print $b =~ /y/ ? "Matched" : "No match"'
Compiling REx "y"
Final program:
  1: EXACT <y> (3)
  3: END (0)
anchored "y" at 0 (checking anchored isall) minlen 1
2147483649
No match
Freeing REx: "y"
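
The two runs sit on either side of the signed 32-bit limit. The following sketch reproduces the length arithmetic only; `repro_length` and `fits_in_i32` are hypothetical helper names, not part of perl.

```c
#include <stdint.h>

/* Length of the reproducer string: `chunks` copies of a 1 MiB block of "x"
 * followed by a single "y". */
static uint64_t repro_length(uint64_t chunks)
{
    return chunks * 1048576u + 1u;
}

/* Nonzero when a length still fits in a signed 32-bit integer. */
static int fits_in_i32(uint64_t len)
{
    return len <= (uint64_t)INT32_MAX;
}
```

For 2047 chunks the length is 2146435073, which still fits; for 2048 chunks it is 2147483649, which is two past INT32_MAX (2147483647).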

@p5pRT

p5pRT commented Aug 25, 2013

From @cpansprout

On Sun May 06 11:19:53 2012, dgl wrote:

> Matching unexpectedly fails when the string is longer than I32. The
> following fixes it, but I see a lot of I32 in the regexp engine itself so
> this might be masking other issues (see also RT #72784).
>
> diff --git a/pp_hot.c b/pp_hot.c
> index 89165d9..662b908 100644
> --- a/pp_hot.c
> +++ b/pp_hot.c
> @@ -1303,7 +1303,7 @@ PP(pp_match)
>         rx = PM_GETRE(pm);
>      }
>
> -    if (RX_MINLEN(rx) > (I32)len)
> +    if ((STRLEN)RX_MINLEN(rx) > len)
>         goto failure;
>
>      truebase = t = s;

I have gone through the regexp engine and changed any uses of I32 that
hold lengths of the string being matched against.

See the commits leading up to merge commit ed56dbc, in particular
389ecb5 which is similar to your proposed patch.

--

Father Chrysostomos

@p5pRT

p5pRT commented Aug 25, 2013

The RT System itself - Status changed from 'new' to 'open'

p5pRT closed this as completed Aug 25, 2013

@p5pRT

p5pRT commented Aug 25, 2013

@cpansprout - Status changed from 'open' to 'resolved'
