COWification seems expensive in PADMY variables #13880

Open
p5pRT opened this issue May 28, 2014 · 9 comments

p5pRT commented May 28, 2014

Migrated from rt.perl.org#121977 (status was 'open')

Searchable as RT121977$

p5pRT commented May 28, 2014

From @Leont

This is a bug report for perl from fawaka@​gmail.com,
generated with the help of perlbug 1.40 running under perl 5.20.0.


It appears that in 5.20, returning a string from a lexical variable is
significantly slower than returning a dereferenced temporary value (return
${$foo}), while on previous versions without COW they were equally slow.
This slowdown does not happen when the same function is called in void
context or if the subroutine is :lvalue.

I would guess this is because the return creates a new temporary value out
of the padmy variable, and this copy is not done as a COW and is thus
expensive. IMO this is rather unfortunate.

See attached script for a benchmark
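
The attached script is not reproduced in this thread. As a rough idea of what such a benchmark could look like, here is a minimal sketch. It is an assumption throughout and not the actual bug.pl: the sub names, the buffer size and the default iteration count are made up for illustration. It compares the three return styles named above, first in void context and then in scalar context.

```perl
#!/usr/bin/perl
# Hypothetical sketch only; NOT the script attached to the ticket.
use strict;
use warnings;
use Benchmark qw(cmpthese);

# Iteration count from the command line; the default is an arbitrary assumption.
my $count = shift(@ARGV) // 2_000;

# Buffer size is an assumption; the variable is a lexical, i.e. a pad entry.
my $buf = 'x' x 100_000;
my $ref = \$buf;

sub simple          { return $buf }       # plain return of the lexical
sub deref           { return ${$ref} }    # return through a dereference
sub lvalue :lvalue  { $buf }              # lvalue sub returning the lexical

print "void context\n";
cmpthese($count, {
    simple => sub { simple(); return },
    ref    => sub { deref();  return },
    lvalue => sub { lvalue(); return },
});

print "scalar context\n";
cmpthese($count, {
    simple => sub { my $x = simple(); return },
    ref    => sub { my $x = deref();  return },
    lvalue => sub { my $x = lvalue(); return },
});
```

Run it as `perl sketch.pl 300000` to pass a larger iteration count. With small counts Benchmark may warn that there are too few iterations for a reliable result.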



Flags​:
  category=core
  severity=low


Site configuration information for perl 5.20.0​:

Configured by leon at Tue May 27 11​:32​:58 CEST 2014.

Summary of my perl5 (revision 5 version 20 subversion 0) configuration​:

  Platform​:
  osname=linux, osvers=3.11.0-20-generic,
archname=x86_64-linux-thread-multi
  uname='linux leon-laptop 3.11.0-20-generic #35-ubuntu smp fri may 2
21​:32​:49 utc 2014 x86_64 x86_64 x86_64 gnulinux '
  config_args='-de -Dprefix=/home/leon/perl5/perlbrew/perls/perl-5.20.0
-Dusethreads -Duseshrplib'
  hint=recommended, useposix=true, d_sigaction=define
  useithreads=define, usemultiplicity=define
  use64bitint=define, use64bitall=define, uselongdouble=undef
  usemymalloc=n, bincompat5005=undef
  Compiler​:
  cc='cc', ccflags ='-D_REENTRANT -D_GNU_SOURCE -fwrapv
-fno-strict-aliasing -pipe -fstack-protector -I/usr/local/include
-D_LARGEFILE_SOURCE -D_FILE_OFFSET_BITS=64',
  optimize='-O2',
  cppflags='-D_REENTRANT -D_GNU_SOURCE -fwrapv -fno-strict-aliasing -pipe
-fstack-protector -I/usr/local/include'
  ccversion='', gccversion='4.8.1', gccosandvers=''
  intsize=4, longsize=8, ptrsize=8, doublesize=8, byteorder=12345678
  d_longlong=define, longlongsize=8, d_longdbl=define, longdblsize=16
  ivtype='long', ivsize=8, nvtype='double', nvsize=8, Off_t='off_t',
lseeksize=8
  alignbytes=8, prototype=define
  Linker and Libraries​:
  ld='cc', ldflags =' -fstack-protector -L/usr/local/lib'
  libpth=/usr/local/lib /usr/lib/gcc/x86_64-linux-gnu/4.8/include-fixed
/usr/include/x86_64-linux-gnu /usr/lib /lib/x86_64-linux-gnu /lib/../lib
/usr/lib/x86_64-linux-gnu /usr/lib/../lib /lib
  libs=-lnsl -lgdbm -ldb -ldl -lm -lcrypt -lutil -lpthread -lc
-lgdbm_compat
  perllibs=-lnsl -ldl -lm -lcrypt -lutil -lpthread -lc
  libc=, so=so, useshrplib=true, libperl=libperl.so
  gnulibc_version='2.17'
  Dynamic Linking​:
  dlsrc=dl_dlopen.xs, dlext=so, d_dlsymun=undef, ccdlflags='-Wl,-E
-Wl,-rpath,/home/leon/perl5/perlbrew/perls/perl-5.20.0/lib/5.20.0/x86_64-linux-thread-multi/CORE'
  cccdlflags='-fPIC', lddlflags='-shared -O2 -L/usr/local/lib
-fstack-protector'


@​INC for perl 5.20.0​:

/home/leon/perl5/perlbrew/perls/perl-5.20.0/lib/site_perl/5.20.0/x86_64-linux-thread-multi
  /home/leon/perl5/perlbrew/perls/perl-5.20.0/lib/site_perl/5.20.0

/home/leon/perl5/perlbrew/perls/perl-5.20.0/lib/5.20.0/x86_64-linux-thread-multi
  /home/leon/perl5/perlbrew/perls/perl-5.20.0/lib/5.20.0
  .


Environment for perl 5.20.0​:
  HOME=/home/leon
  LANG=en_US.UTF-8
  LANGUAGE=en_US​:en
  LC_ADDRESS=en_US.UTF-8
  LC_IDENTIFICATION=en_US.UTF-8
  LC_MEASUREMENT=en_US.UTF-8
  LC_MONETARY=en_US.UTF-8
  LC_NAME=en_US.UTF-8
  LC_NUMERIC=en_US.UTF-8
  LC_PAPER=en_US.UTF-8
  LC_TELEPHONE=en_US.UTF-8
  LC_TIME=en_US.UTF-8
  LD_LIBRARY_PATH (unset)
  LOGDIR (unset)

PATH=/home/leon/perl5/perlbrew/bin​:/home/leon/perl5/perlbrew/perls/perl-5.20.0/bin​:/home/leon/bin​:/usr/local/sbin​:/usr/local/bin​:/usr/sbin​:/usr/bin​:/sbin​:/bin​:/usr/games​:/usr/local/games
  PERLBREW_HOME=/home/leon/.perlbrew

PERLBREW_PATH=/home/leon/perl5/perlbrew/bin​:/home/leon/perl5/perlbrew/perls/perl-5.20.0/bin
  PERLBREW_PERL=perl-5.20.0
  PERLBREW_ROOT=/home/leon/perl5/perlbrew
  PERLBREW_VERSION=0.25
  PERL_BADLANG (unset)
  SHELL=/bin/bash

p5pRT commented May 28, 2014

From @Leont

bug.pl

p5pRT commented May 28, 2014

From @jkeenan

Leon, I didn't get quite the same results as you did. Please see file attached which reports results on two Linux x86_64 machines and one Linux i686 machine.

Thank you very much.
Jim Keenan

p5pRT commented May 28, 2014

From @jkeenan

# current laptop
$ which perl
/home/jkeenan/perl5/perlbrew/perls/perl-5.20.0/bin/perl

$ uname -a
Linux zareason 3.13.0-27-generic #50-Ubuntu SMP Thu May 15 18​:06​:16 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux

$ perl 121977-leont-cow.pl 300000
          Rate lvalue    ref simple
lvalue 12815/s     --    -2%    -2%
ref    13049/s     2%     --    -0%
simple 13106/s     2%     0%     --
          Rate simple lvalue    ref
simple  4175/s     --   -65%   -65%
lvalue 12029/s   188%     --    -0%
ref    12068/s   189%     0%     --

# dromedary blead
This is perl 5, version 21, subversion 1 (v5.21.1 (v5.21.0-69-g0fadf2d)) built for x86_64-linux
$ ./perl -Ilib ../p5p/121977-leont-cow.pl 300000
         Rate lvalue    ref simple
lvalue 8460/s     --    -1%    -1%
ref    8535/s     1%     --    -0%
simple 8574/s     1%     0%     --
         Rate simple lvalue    ref
simple 4228/s     --   -49%   -49%
lvalue 8299/s    96%     --    -1%
ref    8366/s    98%     1%     --

# older Linode
$ perl -v | head -3

This is perl 5, version 20, subversion 0 (v5.20.0) built for i686-linux

$ uname -a
Linux li11-226 2.6.18.8-linode22 #1 SMP Tue Nov 10 16​:12​:12 UTC 2009 i686 GNU/Linux

$ perl 121977-leont-cow.pl 100000
# took nearly 6 minutes!
         Rate    ref lvalue simple
ref    1253/s     --   -87%   -87%
lvalue 9524/s   660%     --    -2%
simple 9747/s   678%     2%     --
         Rate simple lvalue    ref
simple 1085/s     --   -12%   -13%
lvalue 1227/s    13%     --    -2%
ref    1251/s    15%     2%     --

p5pRT commented May 28, 2014

The RT System itself - Status changed from 'new' to 'open'

p5pRT commented May 29, 2014

From @Leont

On Thu, May 29, 2014 at 12​:46 AM, James E Keenan via RT <
perlbug-followup@​perl.org> wrote​:


# current laptop
$ which perl
/home/jkeenan/perl5/perlbrew/perls/perl-5.20.0/bin/perl

$ uname -a
Linux zareason 3.13.0-27-generic #50-Ubuntu SMP Thu May 15 18​:06​:16 UTC
2014 x86_64 x86_64 x86_64 GNU/Linux

$ perl 121977-leont-cow.pl 300000
          Rate lvalue    ref simple
lvalue 12815/s     --    -2%    -2%
ref    13049/s     2%     --    -0%
simple 13106/s     2%     0%     --
          Rate simple lvalue    ref
simple  4175/s     --   -65%   -65%
lvalue 12029/s   188%     --    -0%
ref    12068/s   189%     0%     --

# dromedary blead
This is perl 5, version 21, subversion 1 (v5.21.1 (v5.21.0-69-g0fadf2d))
built for x86_64-linux
$ ./perl -Ilib ../p5p/121977-leont-cow.pl 300000
         Rate lvalue    ref simple
lvalue 8460/s     --    -1%    -1%
ref    8535/s     1%     --    -0%
simple 8574/s     1%     0%     --
         Rate simple lvalue    ref
simple 4228/s     --   -49%   -49%
lvalue 8299/s    96%     --    -1%
ref    8366/s    98%     1%     --

Those are the results I was expecting.

# older Linode
$ perl -v | head -3

This is perl 5, version 20, subversion 0 (v5.20.0) built for i686-linux

$ uname -a
Linux li11-226 2.6.18.8-linode22 #1 SMP Tue Nov 10 16​:12​:12 UTC 2009 i686
GNU/Linux

$ perl 121977-leont-cow.pl 100000
# took nearly 6 minutes!
         Rate    ref lvalue simple
ref    1253/s     --   -87%   -87%
lvalue 9524/s   660%     --    -2%
simple 9747/s   678%     2%     --
         Rate simple lvalue    ref
simple 1085/s     --   -12%   -13%
lvalue 1227/s    13%     --    -2%
ref    1251/s    15%     2%     --

I have no idea what's going on there.

Leon

p5pRT commented Jun 5, 2014

From @iabyn

On Thu, May 29, 2014 at 12​:22​:42PM +0200, Leon Timmermans wrote​:

This is perl 5, version 21, subversion 1 (v5.21.1 (v5.21.0-69-g0fadf2d))
built for x86_64-linux
$ ./perl -Ilib ../p5p/121977-leont-cow.pl 300000
         Rate lvalue    ref simple
lvalue 8460/s     --    -1%    -1%
ref    8535/s     1%     --    -0%
simple 8574/s     1%     0%     --
         Rate simple lvalue    ref
simple 4228/s     --   -49%   -49%
lvalue 8299/s    96%     --    -1%
ref    8366/s    98%     1%     --

Those are the results I was expecting.

i.e. that although COW has made some things faster in 5.20.0 compared with
5.18, the 'simple' case hasn't seen the speedup seen by the other cases.

The commit below fixes the proximate cause. However, there were three
things interacting with each other that together caused the issue.
My commit fixes one of the 3 issues, and is enough to boost performance
for this use case. Two other issues that I have not yet addressed are​:

The first issue is that ('x' x 1_000_000) is constant folded at compile
time, and the COW code in sv_setsv_flags() is failing to do COW on
something like

  $buf = 'x' x 1_000_000;

and is copying instead. I think FC did some work on making COW work with
RO values, so perhaps this is something that should work. Perhaps
string constants need to be marked as COW (with RC==0) before making
them read-only at compile time???
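
One way to look at this from Perl space is to dump a scalar assigned from a folded constant next to one assigned from an ordinary variable. The following is only a sketch: the variable names and the string length are assumptions, and what Dump prints depends on the perl version and build. On builds with the new COW, a shared buffer usually shows up as an IsCOW entry in FLAGS plus a COW_REFCNT line.

```perl
use strict;
use warnings;
use Devel::Peek qw(Dump);

# Both operands are constants, so the repetition is folded at compile time
# and the assignment copies from a read-only constant SV.
my $from_const = 'x' x 1_000;

# Same contents, but built at run time and then assigned from a variable.
my $n        = 1_000;
my $source   = 'x' x $n;
my $from_var = $source;

# Dump writes to STDERR; compare the FLAGS, CUR and LEN lines of the two.
Dump($from_const);
Dump($from_var);
```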

The second issue is that, to work around the problem with readline
allocating a large buffer, which then got COWed and 'donated' in something
like

  push @a, $_ while <>;

we added a heuristic along the lines of 'copy rather than COW' if
SvCUR * A < SvLEN for some constant factor A. The trouble is, this is
clashing with sv_grow()'s

  SvLEN = SvCUR * B;

for some fudge factor B (i.e. over-allocate when growing the buffer).

If B > A, we end up creating strings that can't be COWed. So we probably
need to harmonise the constants involved in A and B.
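
To make the A versus B clash concrete, here is a small arithmetic sketch. It does not call into perl's internals, and the two factors are hypothetical values picked for illustration (given as percentages to keep the arithmetic in integers). The real constants are not stated in this thread. It only models the two rules as described: growth sets LEN = CUR * B, and COW is skipped when CUR * A < LEN.

```perl
use strict;
use warnings;

# Hypothetical factors, as percentages. These are NOT perl's real constants.
my $A_PCT = 125;    # assumed: COW is skipped once LEN exceeds CUR * 1.25
my $B_PCT = 135;    # assumed: growing a buffer over-allocates to CUR * 1.35

# Models "SvLEN = SvCUR * B".
sub grown_len {
    my ($cur, $b_pct) = @_;
    return int($cur * $b_pct / 100);
}

# Models the heuristic: copy rather than COW if SvCUR * A < SvLEN.
sub would_cow {
    my ($cur, $len, $a_pct) = @_;
    return $cur * $a_pct >= $len * 100 ? 1 : 0;
}

for my $cur (100, 1_000, 1_000_000) {
    my $len = grown_len($cur, $B_PCT);
    printf "CUR=%d LEN=%d -> %s\n",
        $cur, $len, would_cow($cur, $len, $A_PCT) ? 'COW' : 'copy';
}
```

With B larger than A, every grown string in this model falls on the "copy" side. Swapping the two values puts them all on the "COW" side, which is the harmonisation being suggested.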

Anyway, here's my commit​:

commit a7ab896
Author​: David Mitchell <davem@​iabyn.com>
AuthorDate​: Thu Jun 5 15​:03​:32 2014 +0100
Commit​: David Mitchell <davem@​iabyn.com>
CommitDate​: Thu Jun 5 15​:03​:32 2014 +0100

  when unCOWing a string, set SvCUR to 0
 
  When a COW string is unCOWed, as well as setting SvPVX to NULL and SvLEN
  to 0, set SvCUR to 0 too.
 
  This is to avoid a later SvGROW on the same SV using the old SvCUR() value
  to calculate a roundup to the buffer size.
 
  Consider the following code​:
 
      use Devel::Peek;
      for (1..3) {
          my $t;
          my $s = 'x' x 100;
          $t = $s;
          Dump $s;
      }
 
  Looking at the LEN line of the Dump output, we got on 5.20.0​:
 
  LEN = 102
  LEN = 135
  LEN = 135
 
  and after this commit,
 
  LEN = 102
  LEN = 102
  LEN = 102
 
  As well as wasting space, this extra LEN was then triggering the 'skip COW
  if LEN >> CUR' mechanism, causing extra copies. See​:
 
  [perl #121977] COWification seems expensive in PADMY variables
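
If reading full Dump output is too noisy, the same check can be sketched with the B module, which exposes CUR and LEN on string scalars. This is an illustration rather than part of the commit: the helper name cur_len is made up, and the numbers printed depend on the perl version and build.

```perl
use strict;
use warnings;
use B ();

# Hypothetical helper: return (CUR, LEN) for the scalar behind a reference.
sub cur_len {
    my ($ref) = @_;
    my $sv = B::svref_2object($ref);
    return ($sv->CUR, $sv->LEN);
}

# Same loop as in the commit message, printing only the two fields of interest.
for my $i (1 .. 3) {
    my $t;
    my $s = 'x' x 100;
    $t = $s;
    my ($cur, $len) = cur_len(\$s);
    printf "iteration %d: CUR = %d, LEN = %d\n", $i, $cur, $len;
}
```

A LEN that jumps after the first iteration would correspond to the 5.20.0 behaviour described above, while a LEN that stays the same on every iteration would correspond to the behaviour after the commit.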

--
A power surge on the Bridge is rapidly and correctly diagnosed as a faulty
capacitor by the highly-trained and competent engineering staff.
  -- Things That Never Happen in "Star Trek" #9

p5pRT commented Feb 26, 2016

From @mauke

This ticket is listed in perl5201delta. Is there still work happening here?

p5pRT commented Feb 29, 2016

From @iabyn

On Fri, Feb 26, 2016 at 10​:34​:50AM -0800, l.mai@​web.de via RT wrote​:

This ticket is listed in perl5201delta. Is there still work happening here?

The other two issues I mentioned still appear to be issues,
so this ticket needs to remain open.

--
Any [programming] language that doesn't occasionally surprise the
novice will pay for it by continually surprising the expert.
  -- Larry Wall
