Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

unacceptable memory consumption #9136

Closed
p5pRT opened this issue Dec 1, 2007 · 10 comments
Closed

unacceptable memory consumption #9136

p5pRT opened this issue Dec 1, 2007 · 10 comments

Comments

@p5pRT
Copy link

p5pRT commented Dec 1, 2007

Migrated from rt.perl.org#48004 (status was 'resolved')

Searchable as RT48004$

@p5pRT
Copy link
Author

p5pRT commented Dec 1, 2007

From @salva

Created by @salva

The script below has two versions of the same algorithm, one using "for" and the other "map". The "map" version uses an unacceptable amount of memory.

--------------------------

#!/usr/bin/perl

$| = 1;

my $max = 50000;
my @​l = ('a'..'z');
my @​keys;

if (1) {
  # this version uses ~ 300MB
  @​keys = map { join '', map $l[rand @​l], 0..71 } 0..$max;

} else {
  # this version uses ~ 9MB
  for (0..$max) {
  push @​keys, join '', map $l[rand @​l], 0..71;
  }

}

print "sleeping...\n";
sleep 100;

---------------------------

Perl Info

Flags:
    category=core
    severity=medium

Site configuration information for perl v5.8.8:

Configured by Debian Project at Mon Nov 12 06:19:22 UTC 2007.

Summary of my perl5 (revision 5 version 8 subversion 8) configuration:
  Platform:
    osname=linux, osvers=2.6.22.10, archname=i486-linux-gnu-thread-multi
    uname='linux ninsei 2.6.22.10 #1 smp preempt thu oct 25 08:49:01 pdt 2007 i686 gnulinux '
    config_args='-Dusethreads -Duselargefiles -Dccflags=-DDEBIAN -Dcccdlflags=-fPIC -Darchname=i486-linux-gnu -Dprefix=/usr -Dprivlib=/usr/share/perl/5.8 -Darchlib=/usr/lib/perl/5.8 -Dvendorprefix=/usr -Dvendorlib=/usr/share/perl5 -Dvendorarch=/usr/lib/perl5 -Dsiteprefix=/usr/local -Dsitelib=/usr/local/share/perl/5.8.8 -Dsitearch=/usr/local/lib/perl/5.8.8 -Dman1dir=/usr/share/man/man1 -Dman3dir=/usr/share/man/man3 -Dsiteman1dir=/usr/local/man/man1 -Dsiteman3dir=/usr/local/man/man3 -Dman1ext=1 -Dman3ext=3perl -Dpager=/usr/bin/sensible-pager -Uafs -Ud_csh -Ud_ualarm -Uusesfio -Uusenm -Duseshrplib -Dlibperl=libperl.so.5.8.8 -Dd_dosuid -des'
    hint=recommended, useposix=true, d_sigaction=define
    usethreads=define use5005threads=undef useithreads=define usemultiplicity=define
    useperlio=define d_sfio=undef uselargefiles=define usesocks=undef
    use64bitint=undef use64bitall=undef uselongdouble=undef
    usemymalloc=n, bincompat5005=undef
  Compiler:
    cc='cc', ccflags ='-D_REENTRANT -D_GNU_SOURCE -DTHREADS_HAVE_PIDS -DDEBIAN -fno-strict-aliasing -pipe -I/usr/local/include -D_LARGEFILE_SOURCE -D_FILE_OFFSET_BITS=64',
    optimize='-O2',
    cppflags='-D_REENTRANT -D_GNU_SOURCE -DTHREADS_HAVE_PIDS -DDEBIAN -fno-strict-aliasing -pipe -I/usr/local/include'
    ccversion='', gccversion='4.2.3 20071014 (prerelease) (Debian 4.2.2-3)', gccosandvers=''
    intsize=4, longsize=4, ptrsize=4, doublesize=8, byteorder=1234
    d_longlong=define, longlongsize=8, d_longdbl=define, longdblsize=12
    ivtype='long', ivsize=4, nvtype='double', nvsize=8, Off_t='off_t', lseeksize=8
    alignbytes=4, prototype=define
  Linker and Libraries:
    ld='cc', ldflags =' -L/usr/local/lib'
    libpth=/usr/local/lib /lib /usr/lib
    libs=-lgdbm -lgdbm_compat -ldb -ldl -lm -lpthread -lc -lcrypt
    perllibs=-ldl -lm -lpthread -lc -lcrypt
    libc=/lib/libc-2.6.1.so, so=so, useshrplib=true, libperl=libperl.so.5.8.8
    gnulibc_version='2.6.1'
  Dynamic Linking:
    dlsrc=dl_dlopen.xs, dlext=so, d_dlsymun=undef, ccdlflags='-Wl,-E'
    cccdlflags='-fPIC', lddlflags='-shared -L/usr/local/lib'

Locally applied patches:
    


@INC for perl v5.8.8:
    /etc/perl
    /usr/local/lib/perl/5.8.8
    /usr/local/share/perl/5.8.8
    /usr/lib/perl5
    /usr/share/perl5
    /usr/lib/perl/5.8
    /usr/share/perl/5.8
    /usr/local/lib/site_perl
    .


Environment for perl v5.8.8:
    HOME=/home/salva
    LANG=C
    LANGUAGE (unset)
    LD_LIBRARY_PATH (unset)
    LOGDIR (unset)
    PATH=/usr/local/bin:/usr/bin:/bin:/usr/bin/X11:/usr/games
    PERL_BADLANG (unset)
    SHELL=/bin/bash




      ____________________________________________________________________________________
Be a better sports nut!  Let your teams follow you 
with Yahoo Mobile. Try it now.  http://mobile.yahoo.com/sports;_ylt=At9_qDKvtAbMuh1G1SQtBI7ntAcJ

@p5pRT
Copy link
Author

p5pRT commented Dec 1, 2007

From @druud62

Salvador Fandiño schreef​:

The script below has two versions of the same algorithm, one using
"for" and the other "map". The "map" version uses an unacceptable
amount of memory.

--------------------------

#!/usr/bin/perl

$| = 1;

my $max = 50000;
my @​l = ('a'..'z');
my @​keys;

if (1) {
# this version uses ~ 300MB
@​keys = map { join '', map $l[rand @​l], 0..71 } 0..$max;

} else {
# this version uses ~ 9MB
for (0..$max) {
push @​keys, join '', map $l[rand @​l], 0..71;
}

}

Why is the memory consumption of the map-map version more "unacceptable"
than that of the map-for version?

There is a scale difference between the two of 50000, so shouldn't the 9
MB actually use no more than 300 MB / 50000 is less than 1 kB?
;)

The map-for version can also be written like this​:

  push @​keys, join('', map $l[rand @​l], 0..71) for 0..$max;

I brought it back to 4 MB (RSS), but I am not sure it is equivalent​:

sub lazymap (&@​) {
  my $_code = shift;
  my @​_return;
  push @​_return, sub {$_code->($_)}->($_) for @​_;
  return @​_return;
}

  # this version uses ~ 4 MB (RSS)
  @​keys = lazymap { join '', lazymap {$l[rand @​l]} 0..71 } 0..$max;

--
Affijn, Ruud

"Gewoon is een tijger."

@p5pRT
Copy link
Author

p5pRT commented Dec 1, 2007

The RT System itself - Status changed from 'new' to 'open'

@p5pRT
Copy link
Author

p5pRT commented Dec 1, 2007

From @schwern

Salvador Fandiño (via RT) wrote​:

# this version uses ~ 300MB
@​keys = map { join '', map $l[rand @​l], 0..71 } 0..$max;

My first thought was that for (1..$x) is optimized to iterate rather than
create the intermediate array, but that doesn't account for 300 megs.

There are other inefficiencies in the map approach, like that it has to build
the entire array in the map and then copy it to @​keys. But that should
account for, at most, another 9 megs.

If you half 71 down to 35 the memory usage halves. That should not be. It
suggests that something isn't getting deallocated.

It's also telling that replacing the inner map with a static string reduces
memory consumption to what you'd expect, 18 megs. Same if you replace the
inner map with a for loop.

Here is the most damning evidence that the result of the inner map is leaking.

  @​keys = map {
  my @​inner = map $l[rand @​l], 0..71;
  join '', @​inner;
  } 0..$max;

That takes 18 megs, as expected.

--
'All anyone gets in a mirror is themselves,' she said. 'But what you
gets in a good gumbo is everything.'
  -- "Witches Abroad" by Terry Prachett

@p5pRT
Copy link
Author

p5pRT commented Dec 3, 2007

From @lizmat

At 5​:50 AM -0800 12/1/07, Salvador Fandi�±o (via RT) wrote​:

--------------------------

#!/usr/bin/perl

$| = 1;

my $max = 50000;
my @​l = ('a'..'z');
my @​keys;

if (1) {
# this version uses ~ 300MB
@​keys = map { join '', map $l[rand @​l], 0..71 } 0..$max;

} else {
# this version uses ~ 9MB
for (0..$max) {
push @​keys, join '', map $l[rand @​l], 0..71;
}

}

print "sleeping...\n";
sleep 100;

---------------------------

What I find most disturbing about this, is that
nesting is related to execution time, not compile
time.

=================================================================
my $max = 50000;
my @​l = ('a'..'z');
my @​keys;

sub foo { join '', map { $l[rand @​l] } 0..71 }

# this version uses ~ 300MB
@​keys = map { foo() } 0..$max;

exhibits the same memory "leaking" behaviour.
Which means any subroutines called inside the
context of a map {} can potentially leak *badly*.
So *if* you call subroutines inside a map {},
you'd better make sure no other map {}'s are
being executed inside it.

Oddly enough, a slight change in the foo() sub​:

============================================================
sub foo { my $foo = join '', map { $l[rand @​l] } 0..71; $foo }

and the problem is not as nearly as bad, just
memory usage for the copying of the large string
(as Schwern already pointed out). So it
apparently only happens with "open" nested map
{}'s.

Oddly enough, if you repeat the same map​:

============================================================
# this version uses ~ 300MB
@​keys = map { foo() } 0..$max;
@​keys = map { foo() } 0..$max;

the process size does *not* grow to 600 MB, but
instead stays at about 300 MB. So I guess Perl
*is* able to re-use the memory used in the map {}.

So I'm wondering whether this is really a case of
leaking, or just an excessive amount of
"temporary" space being used, which isn't
returned to the OS because of fragmentation.

Liz

@p5pRT
Copy link
Author

p5pRT commented Dec 3, 2007

From @nwc10

On Sat, Dec 01, 2007 at 03​:52​:27PM -0800, Michael G Schwern wrote​:

Salvador Fandiño (via RT) wrote​:

# this version uses ~ 300MB
@​keys = map { join '', map $l[rand @​l], 0..71 } 0..$max;

$ perl -MO=Concise -e '@​keys = map { join "", map $l[rand @​l], 0..71 } 0..$max;'
s <@​> leave[1 ref] vKP/REFC ->(end)
1 <0> enter ->2
2 <;> nextstate(main 2 -e​:1) v ->3
r <2> aassign[t11] vKS/COMMON ->s
- <1> ex-list lK ->o
3 <0> pushmark s ->4
9 <|> mapwhile(other->a)[t10] lK/1 ->o
8 <@​> mapstart lK*/2 ->9
4 <0> pushmark s ->5
- <1> null lK/1 ->5
- <1> null lK/1 ->9
- <@​> scope lK ->9
- <0> ex-nextstate v ->a
n <@​> join[t7] sK/2 ->-
a <0> pushmark s ->b
b <$> const(PV "") s ->c
g <|> mapwhile(other->h)[t6] lK/1 ->n
f <@​> mapstart lK/2 ->g
c <0> pushmark s ->d
- <1> null lK/1 ->d
m <2> aelem sK/2 ->g
i <1> rv2av sKR/1 ->j
h <$> gv(*l) s ->i
l <1> rand[t3] sK/1 ->m
k <1> rv2av[t2] sK/1 ->l
j <$> gv(*l) s ->k
e <1> rv2av lKPM/1 ->f
d <$> const(AV ) s ->e
- <1> null lKM/1 ->8
7 <1> flop lKM ->8
u <1> flip[t9] lK/LINENUM ->8
5 <|> range(other->6)[t8] lK/1 ->t
t <$> const(IV 0) s ->u
- <1> ex-rv2sv sK/1 ->7
6 <$> gvsv(*max) s ->7
- <1> ex-list lK ->r
o <0> pushmark s ->p
q <1> rv2av[t1] lKRM*/1 ->r
p <$> gv(*keys) s ->q
-e syntax OK

I think that the only place where the temporaries get freed is in the
nextstate op, and the only nextstate op is outside all of the maps.

Here is the most damning evidence that the result of the inner map is leaking.

@​keys = map {
my @​inner = map $l[rand @​l], 0..71;
join '', @​inner;
} 0..$max;

That takes 18 megs, as expected.

$ cat schwern48004.pl
@​keys = map {
  my @​inner = map $l[rand @​l], 0..71;
  join '', @​inner;
} 0..$max;
$ perl -MO=Concise schwern48004.pl
11 <@​> leave[1 ref] vKP/REFC ->(end)
1 <0> enter ->2
2 <;> nextstate(main 3 schwern48004.pl​:2) v ->3
10 <2> aassign[t13] vKS/COMMON ->11
- <1> ex-list lK ->x
3 <0> pushmark s ->4
9 <|> mapwhile(other->a)[t12] lK/1 ->x
8 <@​> mapstart lK*/2 ->9
4 <0> pushmark s ->5
- <1> null lK/1 ->5
- <1> null lK/1 ->9
w <@​> leave lKP ->9
a <0> enter l ->b
b <;> nextstate(main 1 schwern48004.pl​:1) v ->c
q <2> aassign[t8] vKS ->r
- <1> ex-list lK ->o
c <0> pushmark s ->d
h <|> mapwhile(other->i)[t7] lK/1 ->o
g <@​> mapstart lK/2 ->h
d <0> pushmark s ->e
- <1> null lK/1 ->e
n <2> aelem sK/2 ->h
j <1> rv2av sKR/1 ->k
i <$> gv(*l) s ->j
m <1> rand[t4] sK/1 ->n
l <1> rv2av[t3] sK/1 ->m
k <$> gv(*l) s ->l
f <1> rv2av lKPM/1 ->g
e <$> const(AV ) s ->f
- <1> ex-list lK ->q
o <0> pushmark s ->p
p <0> padav[@​inner​:1,2] lRM*/LVINTRO ->q
r <;> nextstate(main 2 schwern48004.pl​:3) v ->s
v <@​> join[t9] sK/2 ->w
s <0> pushmark s ->t
t <$> const(PV "") s ->u
u <0> padav[@​inner​:1,2] l ->v
- <1> null lKM/1 ->8
7 <1> flop lKM ->8
13 <1> flip[t11] lK/LINENUM ->8
5 <|> range(other->6)[t10] lK/1 ->12
12 <$> const(IV 0) s ->13
- <1> ex-rv2sv sK/1 ->7
6 <$> gvsv(*max) s ->7
- <1> ex-list lK ->10
x <0> pushmark s ->y
z <1> rv2av[t1] lKRM*/1 ->10
y <$> gv(*keys) s ->z
schwern48004.pl syntax OK

I see 3 nextstate ops there, which would mean that temporaries are getting
freed within the inner maps.

So the inner map isn't leaking in absolute terms in the original code, but it
is "leaking" temporaries for the duration of its execution, rather than
reusing them.

I'm not sure how to fix this. I'm also not sure of the best way to measure
the RSS, but I'd have sneaking suspicion that this hack would work round it​:

@​keys = map { 1; join "", map $l[rand @​l], 0..71 } 0..$max;

Adding the seemingly useless 1 constant adds 2 nextstate ops.

Maybe pp_mapwhile should FREETMPS?

Nicholas Clark

@p5pRT
Copy link
Author

p5pRT commented Dec 3, 2007

From @rgs

On 03/12/2007, Nicholas Clark <nick@​ccl4.org> wrote​:

Maybe pp_mapwhile should FREETMPS?

Maybe at least for the block form of map ?
Also, grep might be in the same case ?

@p5pRT
Copy link
Author

p5pRT commented Dec 4, 2007

From @nwc10

On Mon, Dec 03, 2007 at 11​:07​:51PM +0100, Rafael Garcia-Suarez wrote​:

On 03/12/2007, Nicholas Clark <nick@​ccl4.org> wrote​:

Maybe pp_mapwhile should FREETMPS?

Maybe at least for the block form of map ?

I can't think of a way to break it for the expression form of map, but yes,
I also don't feel totally sure that it wouldn't be break-able.

Also, grep might be in the same case ?

It seems to suffer from the same problem.

Nicholas Clark

@p5pRT
Copy link
Author

p5pRT commented Oct 4, 2010

From @iabyn

On Tue, Dec 04, 2007 at 11​:16​:35AM +0000, Nicholas Clark wrote​:

On Mon, Dec 03, 2007 at 11​:07​:51PM +0100, Rafael Garcia-Suarez wrote​:

On 03/12/2007, Nicholas Clark <nick@​ccl4.org> wrote​:

Maybe pp_mapwhile should FREETMPS?

Maybe at least for the block form of map ?

I can't think of a way to break it for the expression form of map, but yes,
I also don't feel totally sure that it wouldn't be break-able.

Also, grep might be in the same case ?

It seems to suffer from the same problem.

Now fixed with this commit​:

commit b2a2a90
Author​: David Mitchell <davem@​iabyn.com>
AuthorDate​: Mon Oct 4 15​:18​:44 2010 +0100
Commit​: David Mitchell <davem@​iabyn.com>
CommitDate​: Mon Oct 4 15​:29​:24 2010 +0100

  stop map,grep leaking temps [perl #48004]
 
  The former behaviour of map and grep was to never free any temps.
  Thus for large lists (and even worse, nested maps), the tmps stack could
  grow very large. For all cases expect list-context map, the fix is easy​:
  just do a FREETMPS at the end of each iteration.
 
  The list-context map however, needs to accumulate a list of temporaries
  over the course of the iterations, and finally return that list to the
  caller (which is responsible for freeing them). We get round this by, at
  the end of each iteration, directly manipulating the tmps stack to free
  everything *except* the values to be returned. To make this efficient,
  we splice in the returned tmp items at the base of the stack frame, move
  PL_tmps_floor above them, then do a FREETMPS (so they may appear twice on
  the temps stack, but initially only get freed once).

M pp_ctl.c
M pp_hot.c
M t/op/svleak.t

--
Hofstadter's Law​: It always takes longer than you expect, even when you
take into account Hofstadter's Law.

@p5pRT
Copy link
Author

p5pRT commented Oct 4, 2010

@iabyn - Status changed from 'open' to 'resolved'

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

1 participant