(a|b)* cant specify a string match larger than 2**16-1 chars long. #6063
Comments
From edi@agharta.de
Created by edi@agharta.de
edi@bird:~ > perl -le '$_="x" . ("a" x (2 ** 15 - 1)) . "y"; print (/x(bc|a)*y/ ? "yes" : "no");'
I've seen this with Perl 5.005, 5.6.0, 5.6.1, and 5.8.0 on various
Perl Info
|
From @eserte
"edi@agharta.de (via RT)" <perlbug@perl.org> writes:
Well, you can trigger the core dump on all systems by running with
limits -s 1M perl ...
Nevertheless, running the samples with -Dr suggests that (bc|a)* is
Regards,
-- tktimex - project time manager |
From edi@agharta.de
Slaven Rezic (via RT) <perlbug@perl.org> writes:
Thanks! Edi. |
From @demerphq
Hi,
Your bug report is actually two different bugs. The stack overflow/segv has been resolved already. The other part, that of the star quantifier not matching strings longer
D:\dev\perl\ver\myblead\win32>..\perl -le "$_='x' . ('a' x (2 ** 15)) .
D:\dev\perl\ver\myblead\win32>..\perl -le "$_='x' . ('a' x (2 ** 15-1))
Thanks for the report.
Cheers |
@demerphq - Status changed from 'open' to 'stalled' |
From terada@ice.uec.ac.jp
Created by terada@ice.uec.ac.jp
Here is a test case:
#! /usr/bin/perl
$n = 32768;
$str = "{";
if($str =~ /^{((E.|[^}E])*)}/){
1;
I tried perl 5.8.8 and 5.13.1. If $n <= 32767, the result is correct for both versions. For larger $n (>= 32768), the bug appears:
Perl Info
|
From @Abigail
On Sat, May 22, 2010 at 01:24:10AM -0700, Terada Minoru wrote:
Thank you for your report. This is a known bug. Patterns of the form /(A|B)*/ are really
$ perl -Mre=debug -wce '/(A|B)*/'
Abigail |
The RT System itself - Status changed from 'new' to 'open' |
From @jkeenan
On Fri Nov 17 10:31:22 2006, demerphq wrote:
Confirmed as still present in Perl 5.16.
#####
Hence, ticket remains stalled. |
The RT System itself - Status changed from 'stalled' to 'open' |
From @jkeenan
On Fri Oct 12 17:10:17 2012, jkeenan wrote:
(Well, RT changed its status back to open.)
Is this something that is potentially fixable? If not, should we just
Thank you very much. |
From @hvds
"James E Keenan via RT" <perlbug-followup@perl.org> wrote:
It is fixable, and I believe eventually it'll have to be fixed - as
I just don't know (quite) how, or when.
I believe that in principle it requires little more than changing the
I have long felt that the compilation process for regexps should construct
(Others include: much more readable and maintainable code for compiling
Hugo |
From @demerphq
On 15 October 2012 00:16, <hv@crypt.org> wrote:
I agree. I've even toyed with an implementation. But it's a bear of a
Yves
-- |
From ina.cpan@gmail.com
Created by ina@cpan.org
The following regular expression (No.1) doesn't match.
# a32768c_gabbc.pl
$ ./perl a32768c_gabbc.pl
Thanks,
Perl Info
|
From @Tux
On Mon, 14 Jan 2013 05:27:20 -0800, ina cpan (via RT)
If you had used -w or 'use warnings;', you might have seen
Complex regular subexpression recursion limit (32766) exceeded at a32768c_gabbc.pl line 2.
which IMHO indicates that this is a known problem
-- |
The RT System itself - Status changed from 'new' to 'open' |
From @khwilliamson
On 01/14/2013 07:02 AM, H.Merijn Brand wrote:
I believe this covers it |
From @iabyn
On Mon, Jan 14, 2013 at 03:02:01PM +0100, H.Merijn Brand wrote:
Yeah. Since making the regex engine non-recursive, it's silly that we still
-- |
From @demerphq
On 14 January 2013 17:51, Dave Mitchell <davem@iabyn.com> wrote:
Agreed on the silliness, but is fixing this practical?
Yves
-- |
From @iabyn
On Mon, Jan 14, 2013 at 06:35:33PM +0100, demerphq wrote:
I haven't looked closely at it, but I'm assuming it's "just" a case of
However, I've never done much work on the compilation side of regexes, and
-- |
From @iabyn
On Wed, Jan 16, 2013 at 09:54:33PM +0900, ina cpan wrote:
Note that the bug is not actually related to \G, but rather to X* matching
This is ok even though the \G matches more than 32K into the string:
$_ = ('A'x 40000).'C';
while this fails, even with no \G:
$_ = ('A'x 32768).'C';
print /^(A|BB)*C/ ? "ok - 1\n" : "not ok - 1\n";
-- |
From ina.cpan@gmail.com
2013/1/24, Dave Mitchell <davem@iabyn.com>:
Isn't the bug (limitation) related to anchoring, not only /\G/?
# a32768c_gabbc-2.pl
$_ = ('A'x (33000+32767)).'C';
$_ = ('A'x (33000+32768)).'C';
$_ = ('A'x 32767).'C';
print /^(A|BB)*C/ ? "ok - 3\n" : "not ok - 3\n";
__END__ |
From @iabyn
On Thu, Jan 24, 2013 at 09:43:40AM +0900, ina cpan wrote:
No, the bug is exactly that '*' is being treated as equivalent to
So the bug is completely unrelated to anchoring (with ^ or \G etc), except
If you run your code with warnings enabled, you'll see this output, showing
1..6
-- |
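As a rough illustration of the behaviour described in this comment, the sketch below uses Python's `re` module (not Perl's engine) to contrast an unbounded `*` with a quantifier capped at a fixed number of iterations. The cap of 32766 is an assumption taken from the warning text quoted earlier in the thread; the names `CAP`, `unbounded`, and `capped` are made up for this sketch.

```python
import re

# Assumed iteration cap, taken from the "recursion limit (32766)" warning
# quoted in this thread. Hypothetical constant, not a Perl API.
CAP = 32766

unbounded = re.compile(r"^(?:A|BB)*C")            # what '*' should mean
capped = re.compile(r"^(?:A|BB){0,%d}C" % CAP)    # '*' silently capped

for n in (CAP, 32768):
    s = "A" * n + "C"
    # Columns: length of the 'A' run, unbounded match?, capped match?
    print(n, bool(unbounded.search(s)), bool(capped.search(s)))
```

With the capped form, the anchor plays no part in the failure: the group simply cannot repeat often enough to reach the `C`.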
From @cpansprout
On Sun Oct 14 16:06:33 2012, hv wrote:
Hopefully soon. I’m trying to write tests for perl’s anchored string optimisation with
(?:(?:.{32766}){32766}){2}(?:.{32766}){8}.{8}
This is a bit annoying. :-)
-- Father Chrysostomos |
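The nested quantifiers in that pattern look like a way to reach a very large repeat count without any single quantifier exceeding 32766. Multiplying them out (a small sketch; the variable names are illustrative only) shows the pattern matches exactly 2**31 characters:

```python
# Length matched by (?:(?:.{32766}){32766}){2}(?:.{32766}){8}.{8},
# obtained by multiplying out each nested quantifier.
CAP = 32766  # largest count used in any single quantifier of the pattern

nested = (CAP * CAP) * 2   # (?:(?:.{32766}){32766}){2}
middle = CAP * 8           # (?:.{32766}){8}
tail = 8                   # .{8}

total = nested + middle + tail
print(total, total == 2**31)   # 2147483648 True
```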
From @jkeenan
On Sat Jul 27 00:03:28 2013, sprout wrote:
Has anyone been able to make any progress on this issue? In 5.20.1:
#####
$ perl -wle '$_="x" . ("a" x (2 ** 15 )) . "y"; print (/x(bc|a)*y/ ? "yes" : "no");'
From @shlomif
On Fri Feb 27 12:08:26 2015, jkeenan wrote:
I should note that it also happens with a non-capturing group:
<SHELL>
shlomif@telaviv1:~$ perl -wle '$_="x" . ("a" x (2 ** 15 -1 )) . "y"; print (/x(?:bc|a)*y/ ? "yes" : "no");'
</SHELL>
Regards,
-- Shlomi Fish |
From @khwilliamson
The current state of perl is that the limit in 5.30 has been doubled, so you don't hit the recursion limit until 2**16. Also, some parts of this ticket (and those merged to it) have been fixed, so that if you give an unbounded quantifier, it actually uses the machine platform limit. If you do specify an upper bound, it must be at most 2**16-1.
-- |
I looked at this, and it's not that hard to fix. There are several options. One is to press into service the currently unused FLAGS field in the CURLY regnodes if the high bit in the 16-bit argument is set. (That field may be used when compiling the pattern; I would need to look further.) That would allow us 23 bits. Or we could create new regnodes which parallel the existing ones and are used only if the max quantifier is above 16 bits. That would give us 24 bits. Or we could instead create parallel regnodes that have 32-bit arguments if the input called for that. With the FLAGS field that would give us 39 bits. But how important is it to fix this 20-year-old bug? We should decide: close it or fix it. (I'm willing to do the work, if we deem it worthy of doing.) |
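The bit counts in the comment above can be reproduced by simple addition. This is only a sketch under two assumptions that the comment does not state: that the FLAGS field is 8 bits wide, and that the 39-bit figure for the 32-bit option comes from one bit of the argument being unavailable (for example a sign or marker bit). The helper `usable_bits` is hypothetical.

```python
FLAGS_BITS = 8  # assumed width of the FLAGS field; not stated in the comment

def usable_bits(arg_bits, reserved_bits, flags_bits=FLAGS_BITS):
    """Bits left for the quantifier count: argument minus reserved, plus FLAGS."""
    return arg_bits - reserved_bits + flags_bits

options = {
    "16-bit arg, high bit used as marker, plus FLAGS": usable_bits(16, 1),
    "new parallel regnodes, 16-bit arg plus FLAGS": usable_bits(16, 0),
    # One reserved bit here is a guess made to match the quoted 39 bits.
    "parallel regnodes, 32-bit arg plus FLAGS": usable_bits(32, 1),
}

for name, bits in options.items():
    print(f"{name}: {bits} bits, max count {2**bits - 1}")
```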
For that matter, the simplest fix is just to make each of the regnodes that hold these use 32 bits. That would increase the size of any pattern that used {m,n} quantifiers. Something else that could be done is, instead of storing {m,n} in the regnode, to store {m, n-m}, so that the delta has to be <65K. |
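The {m, n-m} idea can be sketched as follows. This is not the regnode layout; it assumes, for illustration, that both stored values stay in 16-bit fields, which the comment does not spell out. The function names are hypothetical.

```python
U16_MAX = 0xFFFF  # largest value an unsigned 16-bit field can hold

def encode_min_delta(m, n):
    """Store a {m,n} quantifier as (m, n - m), each assumed to be 16 bits."""
    if not 0 <= m <= n:
        raise ValueError("need 0 <= m <= n")
    delta = n - m
    if m > U16_MAX or delta > U16_MAX:
        raise OverflowError("value does not fit in a 16-bit field")
    return m, delta

def decode_min_delta(m, delta):
    """Recover the original (m, n) pair from the stored (m, delta)."""
    return m, m + delta

# {60000,100000}: n alone exceeds 16 bits, but the delta (40000) fits.
stored = encode_min_delta(60000, 100000)
print(stored, decode_min_delta(*stored))
```

Under these assumptions the upper bound can exceed 65535 as long as it stays within 65535 of the lower bound; a quantifier such as {0,70000} would still not fit.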
On Mon, 25 Apr 2022, 22:38 Karl Williamson, ***@***.***> wrote:
For that matter, the simplest fix is just to make each of the regnodes
that hold these to use 32 bits. That would increase the size of any pattern
that used {m,n} quantifiers.
Something else that could be done is to instead of storing {m,n} in the
regnode, store {m, n-m} so that the delta has to be <65K
As I said, with the regnode_types branch, stuff like that would be nearly
trivial.
Yves
|
On Fri, 22 Apr 2022 at 05:58, Karl Williamson ***@***.***> wrote:
I looked at this, and it's not that hard to fix. There are several options.
One is to press into service the currently unused FLAGS field in the CURLY
regnodes if the high bit in the 16 bit argument is set (That field may be
used when compiling the pattern; I would need to look further.) . That
would allow us 23 bits. Or we could create new regnodes which parallel the
existing ones and are used only if the max quantifier is above 16 bits.
That would give us 24 bits.
Or we could instead create parallel regnodes that have 32 bit arguments if
the input called for that. With the FLAGS field that would give us 39 bits.
But how important is it to fix this 20-year-old bug? We should decide:
close it or fix it. (I'm willing to do the work, if we deem it worthy of
doing.)
I think we should fix this in 5.37. With my new regex branch it is trivial
to resize a regop and we can make it as big as we choose without any tricks.
Yves
--
perl -Mre=debug -e "/just|another|perl|hacker/"
|
@DMQ I believe this is now fixed. |
Whoops, @demerphq Is this closable? |
Fixed in 0678333 |
Migrated from rt.perl.org#18268 (status was 'open')
Searchable as RT18268$