Optimize assigning to scalars from @_ #12342

Open
p5pRT opened this issue Aug 22, 2012 · 6 comments

p5pRT commented Aug 22, 2012

Migrated from rt.perl.org#114536 (status was 'open')

Searchable as RT114536$

p5pRT commented Aug 22, 2012

From @xdg

Created by @xdg

This ticket suggests a possible peephole optimization, should someone
be interested in pursuing it. It is based on a discussion I had with
Nicholas Clark, and I agreed to write it up so the idea wouldn't be
lost.

In short, a huge number of pure Perl subroutines start off like this:

  sub foo {
      my ($one, $two) = @_;
      ...
  }

This assignment requires multiple OPs to execute:

  $ perl -MO=Concise,foo,-exec -e 'sub foo {my ($one, $two) = @_; dump }'
  main::foo:
  1 <;> nextstate(main 1 -e:1) v
  2 <0> pushmark s
  3 <#> gv[*_] s
  4 <1> rv2av[t4] lK/1
  5 <0> pushmark sRM*/128
  6 <0> padsv[$one:1,2] lRM*/LVINTRO
  7 <0> padsv[$two:1,2] lRM*/LVINTRO
  8 <2> aassign[t5] vKS
  9 <;> nextstate(main 2 -e:1) v:{
  a <0> dump s*
  b <1> leavesub[1 ref] K/REFC,1

If this is predictable and detectable at the start of a subroutine --
albeit with a variable number of padsv ops being assigned -- then it
should be possible for the peephole optimizer to replace them with a
new, single OP that copies an arbitrary number of SVs directly from the
stack to the pad. Depending on the overhead of such a peephole check on
every sub, this could potentially speed up code with many small
subroutines, such as occurs in highly-factored code.

(I don't know if similar optimization could be made to work -- or would
be worthwhile -- for shift or array/hash assignment.)
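
For anyone who wants a rough feel for the per-call cost being discussed, here is a minimal sketch using the core Benchmark module. The sub names (with_list_assign, with_shift, with_direct_index) are made up for illustration, and the numbers will vary by perl version and machine, so treat it as a starting point rather than a measurement.

```perl
use strict;
use warnings;
use Benchmark qw(cmpthese);

# The common idiom: copy the arguments out of @_ with one list assignment.
sub with_list_assign {
    my ($one, $two) = @_;
    return $one + $two;
}

# The alternative idiom: one shift per argument.
sub with_shift {
    my $one = shift;
    my $two = shift;
    return $one + $two;
}

# Baseline with no copying at all: read the aliased elements of @_ directly.
sub with_direct_index {
    return $_[0] + $_[1];
}

# Run each variant for about one CPU second and print a comparison table.
cmpthese(-1, {
    list_assign  => sub { with_list_assign(1, 2) },
    shift_each   => sub { with_shift(1, 2) },
    direct_index => sub { with_direct_index(1, 2) },
});
```

The direct_index case is only there as a floor: it shows how much of the call cost is sub entry itself rather than argument copying.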

-- David

Perl Info

Flags:
    category=core
    severity=wishlist

Site configuration information for perl 5.16.0:

Configured by david at Fri Jul 20 11:04:40 PDT 2012.

Summary of my perl5 (revision 5 version 16 subversion 0) configuration:

  Platform:
    osname=linux, osvers=3.0.0-22-generic, archname=x86_64-linux
    uname='linux icarus 3.0.0-22-generic #36-ubuntu smp tue jun 12 17:37:42 utc 2012 x86_64 x86_64 x86_64 gnulinux '
    config_args='-de -Dprefix=/home/david/perl5/perlbrew/perls/perl-5.16.0'
    hint=recommended, useposix=true, d_sigaction=define
    useithreads=undef, usemultiplicity=undef
    useperlio=define, d_sfio=undef, uselargefiles=define, usesocks=undef
    use64bitint=define, use64bitall=define, uselongdouble=undef
    usemymalloc=n, bincompat5005=undef
  Compiler:
    cc='cc', ccflags ='-fno-strict-aliasing -pipe -fstack-protector -I/usr/local/include -D_LARGEFILE_SOURCE -D_FILE_OFFSET_BITS=64',
    optimize='-O2',
    cppflags='-fno-strict-aliasing -pipe -fstack-protector -I/usr/local/include'
    ccversion='', gccversion='4.6.1', gccosandvers=''
    intsize=4, longsize=8, ptrsize=8, doublesize=8, byteorder=12345678
    d_longlong=define, longlongsize=8, d_longdbl=define, longdblsize=16
    ivtype='long', ivsize=8, nvtype='double', nvsize=8, Off_t='off_t', lseeksize=8
    alignbytes=8, prototype=define
  Linker and Libraries:
    ld='cc', ldflags =' -fstack-protector -L/usr/local/lib'
    libpth=/usr/local/lib /lib/x86_64-linux-gnu /lib/../lib /usr/lib/x86_64-linux-gnu /usr/lib/../lib /lib /usr/lib
    libs=-lnsl -ldl -lm -lcrypt -lutil -lc
    perllibs=-lnsl -ldl -lm -lcrypt -lutil -lc
    libc=, so=so, useshrplib=false, libperl=libperl.a
    gnulibc_version='2.13'
  Dynamic Linking:
    dlsrc=dl_dlopen.xs, dlext=so, d_dlsymun=undef, ccdlflags='-Wl,-E'
    cccdlflags='-fPIC', lddlflags='-shared -O2 -L/usr/local/lib -fstack-protector'

Locally applied patches:



@INC for perl 5.16.0:
    /home/david/perl5/perlbrew/perls/perl-5.16.0/lib/site_perl/5.16.0/x86_64-linux
    /home/david/perl5/perlbrew/perls/perl-5.16.0/lib/site_perl/5.16.0
    /home/david/perl5/perlbrew/perls/perl-5.16.0/lib/5.16.0/x86_64-linux
    /home/david/perl5/perlbrew/perls/perl-5.16.0/lib/5.16.0
    .


Environment for perl 5.16.0:
    HOME=/home/david
    LANG=en_US.UTF-8
    LANGUAGE=en_US:en
    LC_COLLATE=C
    LC_CTYPE=en_US.UTF-8
    LC_MESSAGES=en_US.UTF-8
    LD_LIBRARY_PATH (unset)
    LOGDIR (unset)
    PATH=/home/david/perl5/perlbrew/bin:/home/david/perl5/perlbrew/perls/perl-5.16.0/bin:~/bin:~/git/utility-scripts:/usr/lib/lightdm/lightdm:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/usr/games:/opt/vagrant/bin:.
    PERLBREW_BASHRC_VERSION=0.45
    PERLBREW_HOME=/home/david/.perlbrew
    PERLBREW_MANPATH=/home/david/perl5/perlbrew/perls/perl-5.16.0/man
    PERLBREW_PATH=/home/david/perl5/perlbrew/bin:/home/david/perl5/perlbrew/perls/perl-5.16.0/bin
    PERLBREW_PERL=perl-5.16.0
    PERLBREW_ROOT=/home/david/perl5/perlbrew
    PERLBREW_VERSION=0.45
    PERL_BADLANG (unset)
    PERL_EXTUTILS_AUTOINSTALL=--defaultdeps
    SHELL=/bin/bash

p5pRT commented Aug 31, 2012

From @cpansprout

On Wed Aug 22 08:03:08 2012, dagolden@cpan.org wrote:

> This ticket suggests a possible peephole optimization, should someone
> be interested in pursuing it. It is based on a discussion I had with
> Nicholas Clark, and I agreed to write it up so the idea wouldn't be
> lost.
>
> In short, a huge number of pure Perl subroutines start off like this:
>
>   sub foo {
>       my ($one, $two) = @_;
>       ...
>   }
>
> This assignment requires multiple OPs to execute:
>
>   $ perl -MO=Concise,foo,-exec -e 'sub foo {my ($one, $two) = @_; dump }'
>   main::foo:
>   1 <;> nextstate(main 1 -e:1) v
>   2 <0> pushmark s
>   3 <#> gv[*_] s
>   4 <1> rv2av[t4] lK/1
>   5 <0> pushmark sRM*/128
>   6 <0> padsv[$one:1,2] lRM*/LVINTRO
>   7 <0> padsv[$two:1,2] lRM*/LVINTRO
>   8 <2> aassign[t5] vKS
>   9 <;> nextstate(main 2 -e:1) v:{
>   a <0> dump s*
>   b <1> leavesub[1 ref] K/REFC,1
>
> If this is predictable and detectable at the start of a subroutine --
> albeit with a variable number of padsv ops being assigned -- then it
> should be possible for the peephole optimizer to replace them with a
> new, single OP that copies an arbitrary number of SVs directly from
> the stack to the pad. Depending on the overhead of such a peephole
> check on every sub, this could potentially speed up code with many
> small subroutines, such as occurs in highly-factored code.
>
> (I don't know if similar optimization could be made to work -- or
> would be worthwhile -- for shift or array/hash assignment.)

One thing you would have to be careful of is &foo-style calls, which
share @_.
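
To make the sharing concrete, here is a small sketch in plain Perl (the sub names are invented for the example). A call written as `&name;` with no argument list does not set up a new @_; the callee sees, and can modify, the caller's array.

```perl
use strict;
use warnings;

# Reads its arguments with the usual list assignment.
sub inner {
    my ($one, $two) = @_;
    return "$one,$two";
}

# "&inner;" has no argument list, so inner() sees outer()'s own @_.
sub outer {
    return &inner;
}

# shift with no operand works on @_, which here is the caller's array.
sub take_first {
    return shift;
}

sub outer_shift {
    my $first = &take_first;       # removes element 0 from outer_shift's @_
    return ($first, scalar @_);    # the element count reflects the removal
}

print outer('a', 'b'), "\n";                 # prints "a,b"
my ($first, $left) = outer_shift(1, 2, 3);
print "$first $left\n";                      # prints "1 2"
```

Any scheme that skips building @_ would have to keep this behaviour intact for subs that are called, or that call others, in this style.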

XS modules can get the caller’s arguments via get_av("_"), and any sub
call could potentially be an XS call.

I don’t think this optimisation is possible.

--

Father Chrysostomos

p5pRT commented Aug 31, 2012

The RT System itself - Status changed from 'new' to 'open'

p5pRT commented Sep 2, 2012

From @ap

* Father Chrysostomos via RT <perlbug-followup@perl.org> [2012-08-31 23:15]:

> One thing you would have to be careful of is &foo-style calls, that
> share @_.
>
> XS modules can get the caller’s arguments via get_av("_"), and any sub
> call could potentially be an XS call.
>
> I don’t think this optimisation is possible.

Pardon my ignorance of perlguts, but as far as I can tell within the
reach of my understanding, David’s proposal changes absolutely nothing
about the semantics of @_ in any way whatsoever. I understood his call
to be for a way to skip the usual “push push push list-assign” grind
required to implement `my ($foo, $bar, $baz) = @_` in terms of the
general-case implementation for assigning to a list – which would have
no effect I can see on the presence or contents etc. of @_.

If all of that is correct, then the answer to your objections would seem
to be a simple – so what?

Is it? Am I missing something?

Regards,
--
Aristotle Pagaltzis // <http://plasmasturm.org/>

p5pRT commented Sep 2, 2012

From @cpansprout

On Sun Sep 02 09:01:53 2012, aristotle wrote:

> * Father Chrysostomos via RT <perlbug-followup@perl.org> [2012-08-31 23:15]:
>
> > One thing you would have to be careful of is &foo-style calls, that
> > share @_.
> >
> > XS modules can get the caller’s arguments via get_av("_"), and any sub
> > call could potentially be an XS call.
> >
> > I don’t think this optimisation is possible.
>
> Pardon my ignorance of perlguts, but as far as I can tell within the
> reach of my understanding, David’s proposal changes absolutely nothing
> about the semantics of @_ in any way whatsoever. I understood his call
> to be for a way to skip the usual “push push push list-assign” grind
> required to implement `my ($foo, $bar, $baz) = @_` in terms of the
> general-case implementation for assigning to a list – which would have
> no effect I can see on the presence or contents etc. of @_.
>
> If all of that is correct, then the answer to your objections would seem
> to be a simple – so what?
>
> Is it? Am I missing something?

I understood this to be a follow-up to Nicholas Clark’s suggestion of
optimising the setup of @_ by avoiding having to copy the stack to @_ on
subroutine entry (<20120816160304.GV9834@plum.flirble.org>).

David mentioned copying straight from the stack to the pad, so I thought
he meant bypassing @_ altogether and not even having to set it up.

If my ($foo,$bar) = @_ occurs at the beginning of a subroutine, then the
items are still on the stack by chance, but copying from the stack in
that case would be no faster than copying from @_.

Optimising my($foo,$bar)=... in general might be possible. The speed
gain might be negligible, though, as the padsv op is already super-fast.

--

Father Chrysostomos

p5pRT commented Sep 2, 2012

From @iabyn

On Sun, Sep 02, 2012 at 11:05:02AM -0700, Father Chrysostomos via RT wrote:

> Optimising my($foo,$bar)=... in general might be possible. The speed
> gain might be negligible, though, as the padsv op is already super-fast.

I already have a plan for this. The speedup might actually be
considerable, both in terms of replacing several ops with a single one,
and in terms of turning all the individual SAVEt_CLEARSV's into a
single "SAVE pad index range" operation that is pushed on the savestack
once and popped once.

In more detail: the basic idea is to replace the expression

  my (X,Y,Z)

with a single op that contains a range of pad indexes. It will also have
a flag bit to indicate whether it should get @_ at the same time. So,

  my ($a,$b,$c) = @_

goes from being

  pushmark s
  gv[*_] s
  rv2av[t5] lK/1
  pushmark sRM*/128
  padsv[$a:1,2] lRM*/LVINTRO
  padsv[$b:1,2] lRM*/LVINTRO
  padsv[$c:1,2] lRM*/LVINTRO
  aassign[t6] vKS

to

  padlist[$a..$c]
  aassign[t6] vKS

The new op's action is to
  1. If the SPECIAL flag is set, push GvAV(PL_defgv);
  2. If not in void context, then
       for (index_min..index_max) { push PL_curpad[$_] };
  3. Push a single item onto the savestack, which is a
     SAVEt_CLEARSV_RANGE (say) that holds a pointer to the pad index list
     (rather than pushing a whole separate bunch of SAVEt_CLEARSV's).
     At scope exit time there will thus be only one entry to pop from
     the scope stack.
  4. Do appropriate PUSHMARKs as needed.
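
The savestack part of the steps above can be sketched as a toy model in plain Perl. This is not perl's C implementation: @pad and @savestack are ordinary arrays standing in for PL_curpad and the savestack, and the entry type names and helper subs are hypothetical. It only illustrates why one range entry means one push and one pop per scope instead of one per lexical.

```perl
use strict;
use warnings;

# Toy stand-ins (assumptions for illustration, not the real interpreter state).
my @pad = (undef) x 8;
my @savestack;

# Current behaviour, modelled: one save entry per introduced lexical.
sub intro_individually {
    my ($min, $max) = @_;
    push @savestack, { type => 'CLEARSV', index => $_ } for $min .. $max;
    return;
}

# Proposed behaviour, modelled: one entry covering the whole pad index range.
sub intro_as_range {
    my ($min, $max) = @_;
    push @savestack, { type => 'CLEARSV_RANGE', min => $min, max => $max };
    return;
}

# Scope exit: pop entries down to $floor, clearing the pad slots each covers.
# Returns the number of entries that had to be popped.
sub leave_scope {
    my ($floor) = @_;
    my $popped = 0;
    while (@savestack > $floor) {
        my $entry = pop @savestack;
        my ($lo, $hi) = $entry->{type} eq 'CLEARSV_RANGE'
            ? @{$entry}{qw(min max)}
            : ($entry->{index}) x 2;
        $pad[$_] = undef for $lo .. $hi;
        $popped++;
    }
    return $popped;
}

@pad[1 .. 3] = qw(a b c);
intro_individually(1, 3);
my $pops_individual = leave_scope(0);

@pad[1 .. 3] = qw(a b c);
intro_as_range(1, 3);
my $pops_range = leave_scope(0);

print "individual: $pops_individual pops, range: $pops_range pop\n";
# prints "individual: 3 pops, range: 1 pop"
```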

I think this could be a big performance win. It speeds up sub entry:
  my ($a,$b,$c) = @_;
general assignments:
  my ($a,$b,$c) = /.../g;
and even just void declarations:
  my ($a,$b,$c);
(the latter avoids a bunch of pointless retrieving of lexicals from
the pad and pushing them on the stack.)
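
To see what ops a given perl actually generates for these three forms, something like the following sketch should work. It drives the core B::Concise module from inside a script instead of via -MO=Concise. The hash keys and variable names are arbitrary, and the listings differ between perl versions, so no particular output is assumed here.

```perl
use strict;
use warnings;
use B::Concise ();

# The three forms mentioned above, wrapped in anonymous subs.
my %forms = (
    sub_entry   => sub { my ($x, $y, $z) = @_; return },
    general     => sub { my ($x, $y, $z) = 'a b c' =~ /(\w)/g; return },
    declaration => sub { my ($x, $y, $z); return },
);

my %listing;
for my $name (sort keys %forms) {
    my $buf = '';
    B::Concise::walk_output(\$buf);    # send the listing to $buf, not STDOUT
    my $walker = B::Concise::compile('-exec', $forms{$name});
    $walker->();                       # walk the optree in execution order
    $listing{$name} = $buf;
    print "== $name ==\n$buf\n";
}
```

Comparing the listings side by side shows how many ops each form costs before any new op is introduced.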

I plan to look into this further once I've finished off the regex copying
stuff.

--
All wight. I will give you one more chance. This time, I want to hear
no Wubens. No Weginalds. No Wudolf the wed-nosed weindeers.
  -- Life of Brian
