substitution loop issue with long strings #14190

Closed
p5pRT opened this issue Oct 28, 2014 · 8 comments

@p5pRT

p5pRT commented Oct 28, 2014

Migrated from rt.perl.org#123071 (status was 'resolved')

Searchable as RT123071$

@p5pRT

p5pRT commented Oct 28, 2014

From @cpansprout

I’m creating a ticket for this, so it is easier to track.

In <CAH+_n-4R1tO7nxEu6ehRkNZfO+0reMdcU7wGK-YMF3SqHchoOw@mail.gmail.com> Edward Peschko wrote:

I'm getting the following problem when doing a substitution on a large string:

Substitution loop at ...

Is there a way to override this error? As it is, it's annoying because
it acts as a de facto built-in limitation on the size of strings that you
can substitute.

Is this fixed in the latest versions of perl?

And in <CAH+_n-4-aiGqxHDOdwd9NB-xbGkGfaEMOjsGyeSSZKUEi6R-mw@mail.gmail.com> he wrote:

It's very easy to reproduce:

local $/ = undef;   # slurp mode: read the whole file in one go
open(my $fh, '<', "very_large_file.txt") or die "open: $!"; # say with the alphabet printed over and over, one per line, 2 GB in total size
my $line = <$fh>;
close($fh);

Do a substitution where the replacement is longer than the
thing it's replacing, i.e.:

$line =~ s#a#bbb#sg;

and you'll get 'Substitution loop at ... line ...'

And no, the 'substitution loop' explanation given in perldiag
doesn't apply. No replacement string works when it is longer
than the original. There are only a finite number of 'a's in the
source string, so my guess is that perl is keeping
some counter of substitutions, and that counter is overflowing.

That’s exactly what’s happening. The sbu_iters and sbu_maxiters members defined in cop.h are of type I32.

(And this bug is *old*. Perl 1 had a fixed limit of 10000. Perl 4 started calculating the maximum number of iterations based on the string length, fixing the bug, but in such a way that when 64-bit systems came along it resurfaced. So since Perl 4 the bug is as old as 64-bit systems.)

We could fix this by changing those two struct members to SSize_t. But if that would enlarge the struct subst/struct blk union defined in cop.h, it might be worthwhile considering skipping the check altogether for long strings. After all, if substitution loops, it is because of a bug in perl; and if that bug does occur then it is likely to happen regardless of the length of the string. (Right?) So it will be caught even if the check is skipped for long strings.
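
To make the overflow concrete, here is a rough sketch of what happens when an iteration limit derived from the string length is stored in a signed 32-bit field. The formula in `illustrative_limit` is an assumption chosen only for illustration (this ticket says the limit is based on the string length but does not give the formula), and `as_i32` is a hypothetical helper that mimics the truncation an I32 would apply on common platforms. It assumes a perl built with 64-bit integers.

```perl
use strict;
use warnings;

# Hypothetical helper: reinterpret a non-negative integer as a signed
# 32-bit value, which is roughly what storing it in an I32 does.
sub as_i32 {
    my ($n) = @_;
    my $low = $n % 2**32;                        # keep the low 32 bits
    return $low >= 2**31 ? $low - 2**32 : $low;  # apply the sign bit
}

# ASSUMPTION: an illustrative limit formula, not taken from the ticket.
sub illustrative_limit {
    my ($len) = @_;
    return 2 * $len + 10;
}

my $len   = 2 * 1024**3;                # length of a 2 GiB string
my $limit = illustrative_limit($len);

printf "limit as computed:    %s\n", $limit;
printf "limit stored as I32:  %s\n", as_i32($limit);
```

Under that assumed formula the stored limit ends up far smaller than the number of matches in the string, so a perfectly finite substitution would still trip the check.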

Now, to work around the bug, you would have to do a while() loop instead of substituting all at once. But that will still fail in 5.18 and earlier, because it was not until 5.20 that the regular expression engine gained support for strings longer than 2GB. Another thing you could do is split your string into smaller strings and concatenate them afterwards. But only you can tell whether that will work for your code.
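
The split-and-concatenate approach might look something like the sketch below. `subst_in_chunks` is a hypothetical helper, not part of Perl, and the chunk size is arbitrary. It is only safe when the pattern can never match across a chunk boundary, which holds for a single-character pattern like the one in the report but not in general. It also builds a second copy of the data, which matters at these sizes.

```perl
use strict;
use warnings;

# Hypothetical helper: run a substitution over a string in fixed-size
# pieces and join the results. $subst receives a reference to each piece
# and modifies it in place.
sub subst_in_chunks {
    my ($str_ref, $chunk_len, $subst) = @_;
    my $out   = '';
    my $total = length $$str_ref;
    for (my $pos = 0; $pos < $total; $pos += $chunk_len) {
        my $piece = substr($$str_ref, $pos, $chunk_len);
        $subst->(\$piece);      # substitute within this piece only
        $out .= $piece;
    }
    return $out;
}

# Small stand-in for the very large string from the report.
my $line   = "abc\n" x 5;
my $result = subst_in_chunks(\$line, 7, sub { ${ $_[0] } =~ s/a/bbb/g });
print $result;
```

For real data the chunk length would be something much larger (well under 2GB), and for line-oriented input it may be simpler to cut at newlines so that no match can straddle two pieces.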

--

Father Chrysostomos

@p5pRT

p5pRT commented Nov 15, 2014

From @jkeenan

On Mon Oct 27 21:40:18 2014, sprout wrote:

I’m creating a ticket for this, so it is easier to track.

[snip]

That’s exactly what’s happening. The sbu_iters and sbu_maxiters
members defined in cop.h are of type I32.

[snip]

We could fix this by changing those two struct members to SSize_t.
But if that would enlarge the struct subst/struct blk union defined in
cop.h, it might be worthwhile considering skipping the check
altogether for long strings.

Father C, which of these two alternatives do you think we should pursue? (Or, are there others?)

After all, if substitution loops, it is
because of a bug in perl; and if that bug does occur then it is likely
to happen regardless of the length of the string. (Right?) So it
will be caught even if the check is skipped for long strings.

Thank you very much.

--
James E Keenan (jkeenan@cpan.org)

@p5pRT

p5pRT commented Nov 15, 2014

The RT System itself - Status changed from 'new' to 'open'

@p5pRT

p5pRT commented Nov 15, 2014

From @cpansprout

On Fri Nov 14 19:01:36 2014, jkeenan wrote:

On Mon Oct 27 21:40:18 2014, sprout wrote:

I’m creating a ticket for this, so it is easier to track.

[snip]

That’s exactly what’s happening. The sbu_iters and sbu_maxiters
members defined in cop.h are of type I32.

[snip]

We could fix this by changing those two struct members to SSize_t.
But if that would enlarge the struct subst/struct blk union defined in
cop.h, it might be worthwhile considering skipping the check
altogether for long strings.

Father C, which of these two alternatives do you think we should
pursue? (Or, are there others?)

That would depend on whether using 64-bit values to record the iterations enlarges the struct. I haven’t checked yet.

--

Father Chrysostomos

@p5pRT

p5pRT commented Dec 24, 2014

From @cpansprout

Fixed in 3c6ef0a. This turned out to be the same bug as #103260.

--

Father Chrysostomos

@p5pRT

p5pRT commented Dec 24, 2014

@cpansprout - Status changed from 'open' to 'pending release'

@p5pRT

p5pRT commented Jun 2, 2015

From @khwilliamson

Thanks for submitting this ticket.

The issue should be resolved with the release today of Perl v5.22. If you find that the problem persists, feel free to reopen this ticket.

--
Karl Williamson for the Perl 5 porters team

@p5pRT p5pRT closed this as completed Jun 2, 2015
@p5pRT

p5pRT commented Jun 2, 2015

@khwilliamson - Status changed from 'pending release' to 'resolved'
