Optimize assigning to scalars from @_ #12342

p5pRT · 2012-08-22T15:03:07Z

Migrated from rt.perl.org#114536 (status was 'open')

Searchable as RT114536$

p5pRT · 2012-08-22T15:03:08Z

From @xdg

Created by @xdg

This ticket suggests a possible peephole optimization, should someone
be interested in pursuing it. It is based on a discussion I had with
Nicholas Clark, and I agreed to write it up so the idea wouldn't be
lost.

In short, a huge number of pure Perl subroutines start off like this:

    sub foo {
        my ($one, $two) = @_;
        ...
    }

This assignment requires multiple OPs to execute:

$ perl -MO=Concise,foo,-exec -e 'sub foo {my ($one, $two) = @_; dump }'
main::foo:
1 <;> nextstate(main 1 -e:1) v
2 <0> pushmark s
3 <#> gv[*_] s
4 <1> rv2av[t4] lK/1
5 <0> pushmark sRM*/128
6 <0> padsv[$one:1,2] lRM*/LVINTRO
7 <0> padsv[$two:1,2] lRM*/LVINTRO
8 <2> aassign[t5] vKS
9 <;> nextstate(main 2 -e:1) v:{
a <0> dump s*
b <1> leavesub[1 ref] K/REFC,1

If this is predictable and detectable at the start of a subroutine --
albeit with a variable number of padsv's being assigned -- then it
should be possible for the peephole optimizer to replace them with a
new, single OP that copies an arbitrary number of SVs directly from the
stack to the pad. Depending on the overhead of such a peephole check on
every sub, this could potentially speed up code with many small
subroutines, such as occur in highly-factored code.
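
A rough way to see how much the argument-unpacking idiom costs today is
to time it against other spellings. The script below is only a sketch:
the sub names and the iteration count are made up for illustration, and
the numbers will vary by perl version and build.

```perl
use strict;
use warnings;
use Benchmark qw(cmpthese);

# The idiom from the ticket: one list assignment from @_.
sub list_assign { my ($one, $two) = @_; return $one + $two }

# The shift-based spelling mentioned in the closing aside.
sub shift_each { my $one = shift; my $two = shift; return $one + $two }

# No lexicals at all: read the arguments in place.
sub direct_index { return $_[0] + $_[1] }

# Compare call rates; the iteration count is arbitrary.
cmpthese(500_000, {
    list_assign  => sub { list_assign(1, 2) },
    shift_each   => sub { shift_each(1, 2) },
    direct_index => sub { direct_index(1, 2) },
});
```

All three subs return the same value, so any difference in the reported
rates comes from how the arguments reach the body.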

(I don't know if similar optimization could be made to work -- or would
be worthwhile -- for shift or array/hash assignment.)

-- David

Perl Info


Flags:
    category=core
    severity=wishlist

Site configuration information for perl 5.16.0:

Configured by david at Fri Jul 20 11:04:40 PDT 2012.

Summary of my perl5 (revision 5 version 16 subversion 0) configuration:

  Platform:
    osname=linux, osvers=3.0.0-22-generic, archname=x86_64-linux
    uname='linux icarus 3.0.0-22-generic #36-ubuntu smp tue jun 12 17:37:42
utc 2012 x86_64 x86_64 x86_64 gnulinux '
    config_args='-de -Dprefix=/home/david/perl5/perlbrew/perls/perl-5.16.0'
    hint=recommended, useposix=true, d_sigaction=define
    useithreads=undef, usemultiplicity=undef
    useperlio=define, d_sfio=undef, uselargefiles=define, usesocks=undef
    use64bitint=define, use64bitall=define, uselongdouble=undef
    usemymalloc=n, bincompat5005=undef
  Compiler:
    cc='cc', ccflags ='-fno-strict-aliasing -pipe -fstack-protector
-I/usr/local/include -D_LARGEFILE_SOURCE -D_FILE_OFFSET_BITS=64',
    optimize='-O2',
    cppflags='-fno-strict-aliasing -pipe -fstack-protector
-I/usr/local/include'
    ccversion='', gccversion='4.6.1', gccosandvers=''
    intsize=4, longsize=8, ptrsize=8, doublesize=8, byteorder=12345678
    d_longlong=define, longlongsize=8, d_longdbl=define, longdblsize=16
    ivtype='long', ivsize=8, nvtype='double', nvsize=8, Off_t='off_t',
lseeksize=8
    alignbytes=8, prototype=define
  Linker and Libraries:
    ld='cc', ldflags =' -fstack-protector -L/usr/local/lib'
    libpth=/usr/local/lib /lib/x86_64-linux-gnu /lib/../lib
/usr/lib/x86_64-linux-gnu /usr/lib/../lib /lib /usr/lib
    libs=-lnsl -ldl -lm -lcrypt -lutil -lc
    perllibs=-lnsl -ldl -lm -lcrypt -lutil -lc
    libc=, so=so, useshrplib=false, libperl=libperl.a
    gnulibc_version='2.13'
  Dynamic Linking:
    dlsrc=dl_dlopen.xs, dlext=so, d_dlsymun=undef, ccdlflags='-Wl,-E'
    cccdlflags='-fPIC', lddlflags='-shared -O2 -L/usr/local/lib
-fstack-protector'

Locally applied patches:



@INC for perl 5.16.0:

/home/david/perl5/perlbrew/perls/perl-5.16.0/lib/site_perl/5.16.0/x86_64-linux
    /home/david/perl5/perlbrew/perls/perl-5.16.0/lib/site_perl/5.16.0
    /home/david/perl5/perlbrew/perls/perl-5.16.0/lib/5.16.0/x86_64-linux
    /home/david/perl5/perlbrew/perls/perl-5.16.0/lib/5.16.0
    .


Environment for perl 5.16.0:
    HOME=/home/david
    LANG=en_US.UTF-8
    LANGUAGE=en_US:en
    LC_COLLATE=C
    LC_CTYPE=en_US.UTF-8
    LC_MESSAGES=en_US.UTF-8
    LD_LIBRARY_PATH (unset)
    LOGDIR (unset)

PATH=/home/david/perl5/perlbrew/bin:/home/david/perl5/perlbrew/perls/perl-5.16.0/bin:~/bin:~/git/utility-scripts:/usr/lib/lightdm/lightdm:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/usr/games:/opt/vagrant/bin:.
    PERLBREW_BASHRC_VERSION=0.45
    PERLBREW_HOME=/home/david/.perlbrew
    PERLBREW_MANPATH=/home/david/perl5/perlbrew/perls/perl-5.16.0/man

PERLBREW_PATH=/home/david/perl5/perlbrew/bin:/home/david/perl5/perlbrew/perls/perl-5.16.0/bin
    PERLBREW_PERL=perl-5.16.0
    PERLBREW_ROOT=/home/david/perl5/perlbrew
    PERLBREW_VERSION=0.45
    PERL_BADLANG (unset)
    PERL_EXTUTILS_AUTOINSTALL=--defaultdeps
    SHELL=/bin/bash

p5pRT · 2012-08-31T21:12:42Z

From @cpansprout

On Wed Aug 22 08:03:08 2012, dagolden@cpan.org wrote:

> This ticket suggests a possible peephole optimization, should someone
> be interested in pursuing it. It is based on a discussion I had with
> Nicholas Clark, and I agreed to write it up so the idea wouldn't be
> lost.
>
> In short, a huge number of pure Perl subroutines start off like this:
>
>     sub foo {
>         my ($one, $two) = @_;
>         ...
>     }
>
> This assignment requires multiple OPs to execute:
>
> $ perl -MO=Concise,foo,-exec -e 'sub foo {my ($one, $two) = @_; dump }'
> main::foo:
> 1 <;> nextstate(main 1 -e:1) v
> 2 <0> pushmark s
> 3 <#> gv[*_] s
> 4 <1> rv2av[t4] lK/1
> 5 <0> pushmark sRM*/128
> 6 <0> padsv[$one:1,2] lRM*/LVINTRO
> 7 <0> padsv[$two:1,2] lRM*/LVINTRO
> 8 <2> aassign[t5] vKS
> 9 <;> nextstate(main 2 -e:1) v:{
> a <0> dump s*
> b <1> leavesub[1 ref] K/REFC,1
>
> If this is predictable and detectable at the start of a subroutine --
> albeit with a variable number of padsv's being assigned -- then it
> should be possible for the peephole optimizer to replace them with a
> new, single OP that copies an arbitrary number of SVs directly from the
> stack to the pad. Depending on the overhead of such a peephole check on
> every sub, this could potentially speed up code with many small
> subroutines, such as occur in highly-factored code.
>
> (I don't know if similar optimization could be made to work -- or would
> be worthwhile -- for shift or array/hash assignment.)

One thing you would have to be careful of is &foo-style calls, that
share @_.
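
To illustrate the sharing: a sub called as &foo; with no argument list
runs with its caller's @_ itself, so changes to the array are visible
afterwards. The sub names in this small sketch are invented for the
example.

```perl
use strict;
use warnings;

# Removes the first argument from whatever @_ it was handed.
sub take_first {
    my ($first) = @_;   # copies the value out of @_
    shift @_;           # modifies the array itself
    return $first;
}

# Calls take_first with the &name; form, which reuses the caller's @_.
sub shared_call {
    &take_first;
    return scalar @_;   # sees take_first's shift
}

# Calls take_first with an explicit argument list, which gets a fresh @_.
sub fresh_call {
    take_first(@_);
    return scalar @_;   # unaffected by take_first's shift
}

print shared_call(1, 2, 3), "\n";   # prints 2
print fresh_call(1, 2, 3), "\n";    # prints 3
```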

XS modules can get the caller’s arguments via get_av("_"), and any sub
call could potentially be an XS call.

I don’t think this optimisation is possible.

--

Father Chrysostomos

p5pRT · 2012-08-31T21:12:42Z

The RT System itself - Status changed from 'new' to 'open'

p5pRT · 2012-09-02T16:01:53Z

From @ap

* Father Chrysostomos via RT <perlbug-followup@perl.org> [2012-08-31 23:15]:

> One thing you would have to be careful of is &foo-style calls, that
> share @_.
>
> XS modules can get the caller’s arguments via get_av("_"), and any sub
> call could potentially be an XS call.
>
> I don’t think this optimisation is possible.

Pardon my ignorance in perlguts, but as far as I can tell within the
reach of my understanding, David’s proposal changes absolutely nothing
about the semantics of @_ in any way whatsoever. I understood his call
to be for a way to skip the usual “push push push list-assign” grind
required to implement `my ($foo, $bar, $baz) = @_` in terms of the
general-case implementation for assigning to a list – which would have
no effect I can see on the presence or contents etc of @_.

Were all of that correct, then the answer to your objections should seem
to be a simple – so what?

Is it? Am I missing something?

Regards,
--
Aristotle Pagaltzis // <http://plasmasturm.org/>

p5pRT · 2012-09-02T18:05:02Z

From @cpansprout

On Sun Sep 02 09:01:53 2012, aristotle wrote:

> * Father Chrysostomos via RT <perlbug-followup@perl.org> [2012-08-31 23:15]:
>
>> One thing you would have to be careful of is &foo-style calls, that
>> share @_.
>>
>> XS modules can get the caller’s arguments via get_av("_"), and any sub
>> call could potentially be an XS call.
>>
>> I don’t think this optimisation is possible.
>
> Pardon my ignorance in perlguts, but as far as I can tell within the
> reach of my understanding, David’s proposal changes absolutely nothing
> about the semantics of @_ in any way whatsoever. I understood his call
> to be for a way to skip the usual “push push push list-assign” grind
> required to implement `my ($foo, $bar, $baz) = @_` in terms of the
> general-case implementation for assigning to a list – which would have
> no effect I can see on the presence or contents etc of @_.
>
> Were all of that correct, then the answer to your objections should seem
> to be a simple – so what?
>
> Is it? Am I missing something?

I understood this to be a follow-up to Nicholas Clark’s suggestion of
optimising the setup of @_ by avoiding having to copy the stack to @_ on
subroutine entry (<20120816160304.GV9834@plum.flirble.org>).

David mentioned copying straight from the stack to the pad, so I thought
he meant bypassing @_ altogether and not even having to set it up.

If my ($foo,$bar) = @_ occurs at the beginning of a subroutine, then the
items are still on the stack by chance, but copying from the stack in
that case would be no faster than copying from @_.

Optimising my($foo,$bar)=... in general might be possible. The speed
gain might be negligible, though, as the padsv op is already super-fast.

--

Father Chrysostomos

p5pRT · 2012-09-02T19:04:17Z

From @iabyn

On Sun, Sep 02, 2012 at 11:05:02AM -0700, Father Chrysostomos via RT wrote:

> Optimising my($foo,$bar)=... in general might be possible. The speed
> gain might be negligible, though, as the padsv op is already super-fast.

I already have a plan for this. The speedup might actually be
considerable, both in terms of replacing several ops with a single one,
and in terms of turning all the individual SAVEt_CLEARSV's into a
single "SAVE pad index range" operation that is pushed on the savestack
once and popped once.

In more detail: the basic idea is to replace the expression

    my (X,Y,Z)

with a single op that contains a range of pad indexes. It will also have
a flag bit to indicate whether it should get @_ at the same time. So,

    my ($a,$b,$c) = @_

goes from being

    pushmark s
    gv[*_] s
    rv2av[t5] lK/1
    pushmark sRM*/128
    padsv[$a:1,2] lRM*/LVINTRO
    padsv[$b:1,2] lRM*/LVINTRO
    padsv[$c:1,2] lRM*/LVINTRO
    aassign[t6] vKS

to

    padlist[$a..$c]
    aassign[t6] vKS

The new op's action is to:

1. If the SPECIAL flag is set, push GvAV(PL_defgv);
2. If not in void context, then
       for (index_min..index_max) { push PL_curpad[$_] };
3. Push a single item onto the savestack, which is a
   SAVEt_CLEARSV_RANGE (say) that holds a pointer to the pad index list
   (rather than pushing a whole separate bunch of SAVEt_CLEARSV's).
   At scope exit time there will thus be only one entry to pop from
   the scope stack.
4. Do appropriate PUSHMARKs as needed.

I think this could be a big performance win. It speeds up sub entry:

    my ($a,$b,$c) = @_;

general assignments:

    my ($a,$b,$c) = /.../g;

and even just void declarations:

    my ($a,$b,$c);

(this latter avoiding a bunch of pointless retrieving lexicals from
the pad and pushing them on the stack.)
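
The savestack part of the idea can be pictured with a toy model in plain
Perl. This is only a sketch, not perl's internals: the "savestack" here
is an ordinary array, the sub names are invented, and
SAVEt_CLEARSV_RANGE is the hypothetical entry type named above.

```perl
use strict;
use warnings;

# Toy model: one savestack entry per lexical, as happens today.
sub declare_individually {
    my ($savestack, @pad_indexes) = @_;
    push @$savestack, { type => 'SAVEt_CLEARSV', index => $_ } for @pad_indexes;
    return scalar @$savestack;
}

# Toy model: a single entry covering a contiguous range of pad indexes.
sub declare_as_range {
    my ($savestack, $min, $max) = @_;
    push @$savestack, { type => 'SAVEt_CLEARSV_RANGE', min => $min, max => $max };
    return scalar @$savestack;
}

# Scope exit: pop every entry and clear the pad slots it covers.
# Returns how many entries had to be popped.
sub leave_scope {
    my ($savestack, $pad) = @_;
    my $pops = 0;
    while (my $entry = pop @$savestack) {
        $pops++;
        my @slots = $entry->{type} eq 'SAVEt_CLEARSV'
            ? ($entry->{index})
            : ($entry->{min} .. $entry->{max});
        $pad->[$_] = undef for @slots;
    }
    return $pops;
}

my @pad = (undef, 'a', 'b', 'c');   # slots 1..3 stand in for $a, $b, $c
my @stack;

declare_individually(\@stack, 1 .. 3);
print leave_scope(\@stack, \@pad), "\n";   # prints 3

@pad = (undef, 'a', 'b', 'c');
declare_as_range(\@stack, 1, 3);
print leave_scope(\@stack, \@pad), "\n";   # prints 1
```

Both variants clear the same three slots; the range form just does it
with one push and one pop instead of three of each.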

I plan to look into this further once I've finished off the regex copying
stuff.

--
All wight. I will give you one more chance. This time, I want to hear
no Wubens. No Weginalds. No Wudolf the wed-nosed weindeers.
-- Life of Brian

p5pRT added the Severity Low label Oct 19, 2019

xenu removed the Severity Low label Dec 29, 2021
