Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

The pre-match copy leaks #14364

Open
p5pRT opened this issue Dec 24, 2014 · 5 comments
Open

The pre-match copy leaks #14364

p5pRT opened this issue Dec 24, 2014 · 5 comments

Comments

@p5pRT
Copy link

p5pRT commented Dec 24, 2014

Migrated from rt.perl.org#123492 (status was 'open')

Searchable as RT123492$

@p5pRT
Copy link
Author

p5pRT commented Dec 24, 2014

From @cpansprout

Run this little script and watch the memory usage go up​:

print "$$\n";
$x .=" "x4096 for 1..2**29/4096;
print "doing subst 1\n";
$x =~ s/ /_/g;
print "doing subst 2\n";
$x =~ s/_/ /g;
print "doing subst 3\n";
$x =~ s/ /_/g;
print "doing subst 4\n";
$x =~ s/_/ /g;

Every time we do a substitution, we get a new pre-match copy stored in the last subst op. And that copy is not freed until the same subst op is executed again.

This whole PL_curpm mechanism needs to be reworked. (I know this has been said many times before, but I have only just noticed this particular symptom.) Currently PL_curpm, a global variable, is set to point to the last match op that was successfully executed. It gets saved and restored upon scope entry.

Instead of having the information necessary for $1, $&, etc. to work stored in a structure merely pointed to by a global variable, we should probably store it somehow on the context stack and have it actually freed on scope exit and freed when overwritten.

--

Father Chrysostomos

@p5pRT
Copy link
Author

p5pRT commented Dec 24, 2014

From @demerphq

On 24 December 2014 at 02​:46, Father Chrysostomos
<perlbug-followup@​perl.org> wrote​:

# New Ticket Created by Father Chrysostomos
# Please include the string​: [perl #123492]
# in the subject line of all future correspondence about this issue.
# <URL​: https://rt-archive.perl.org/perl5/Ticket/Display.html?id=123492 >

Run this little script and watch the memory usage go up​:

print "$$\n";
$x .=" "x4096 for 1..2**29/4096;
print "doing subst 1\n";
$x =~ s/ /_/g;
print "doing subst 2\n";
$x =~ s/_/ /g;
print "doing subst 3\n";
$x =~ s/ /_/g;
print "doing subst 4\n";
$x =~ s/_/ /g;

Every time we do a substitution, we get a new pre-match copy stored in the last subst op. And that copy is not freed until the same subst op is executed again.

This whole PL_curpm mechanism needs to be reworked. (I know this has been said many times before, but I have only just noticed this particular symptom.) Currently PL_curpm, a global variable, is set to point to the last match op that was successfully executed. It gets saved and restored upon scope entry.

Instead of having the information necessary for $1, $&amp;, etc. to work stored in a structure merely pointed to by a global variable, we should probably store it somehow on the context stack and have it actually freed on scope exit and freed when overwritten.

A big problem is that we store regex match results in the same struct
as the pattern. This leads to all kinds of awkwardness.

A much better approach would be to separate match results from the
regexp program structures.

And then we could easily do what you suggest here.

Anyway, the whole PL_curpm is horrible, anything to improve it is a
big step forward.

Yves

--
perl -Mre=debug -e "/just|another|perl|hacker/"

@p5pRT
Copy link
Author

p5pRT commented Dec 24, 2014

The RT System itself - Status changed from 'new' to 'open'

@p5pRT
Copy link
Author

p5pRT commented Dec 24, 2014

From @cpansprout

On Tue Dec 23 18​:09​:54 2014, demerphq wrote​:

A big problem is that we store regex match results in the same struct
as the pattern. This leads to all kinds of awkwardness.

A much better approach would be to separate match results from the
regexp program structures.

I know, and that’s what makes it hard to fix. That’s why I’m creating a ticket instead of just fixing it. :-)

--

Father Chrysostomos

@p5pRT
Copy link
Author

p5pRT commented Dec 24, 2014

From @cpansprout

On Tue Dec 23 18​:09​:54 2014, demerphq wrote​:

On 24 December 2014 at 02​:46, Father Chrysostomos
<perlbug-followup@​perl.org> wrote​:

# New Ticket Created by Father Chrysostomos
# Please include the string​: [perl #123492]
# in the subject line of all future correspondence about this issue.
# <URL​: https://rt-archive.perl.org/perl5/Ticket/Display.html?id=123492 >

Run this little script and watch the memory usage go up​:

print "$$\n";
$x .=" "x4096 for 1..2**29/4096;
print "doing subst 1\n";
$x =~ s/ /_/g;
print "doing subst 2\n";
$x =~ s/_/ /g;
print "doing subst 3\n";
$x =~ s/ /_/g;
print "doing subst 4\n";
$x =~ s/_/ /g;

Every time we do a substitution, we get a new pre-match copy stored
in the last subst op. And that copy is not freed until the same
subst op is executed again.

This whole PL_curpm mechanism needs to be reworked. (I know this has
been said many times before, but I have only just noticed this
particular symptom.) Currently PL_curpm, a global variable, is set
to point to the last match op that was successfully executed. It
gets saved and restored upon scope entry.

Instead of having the information necessary for $1, $&amp;, etc. to work
stored in a structure merely pointed to by a global variable, we
should probably store it somehow on the context stack and have it
actually freed on scope exit and freed when overwritten.

A big problem is that we store regex match results in the same struct
as the pattern. This leads to all kinds of awkwardness.

A much better approach would be to separate match results from the
regexp program structures.

And then we could easily do what you suggest here.

What I suggest would actually change the behaviour of something like this​:

my $proto = ";+";
while($proto) {
  $proto =~ s/(;)//;
  print "[$1]\n";
  last if $count++ == 3;
}

Currently the value of $1 is remembered in the second entry into the block. I hope nobody is depending on that behaviour. (I just encountered this in B​::Deparse, where it was causing an infinite loop.)

--

Father Chrysostomos

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants