Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

check if %hash 500x times slower than if keys %hash #12350

Closed
p5pRT opened this issue Aug 24, 2012 · 73 comments
Closed

check if %hash 500x times slower than if keys %hash #12350

p5pRT opened this issue Aug 24, 2012 · 73 comments

Comments

@p5pRT
Copy link

p5pRT commented Aug 24, 2012

Migrated from rt.perl.org#114576 (status was 'resolved')

Searchable as RT114576$

@p5pRT
Copy link
Author

p5pRT commented Aug 24, 2012

From mgrigoriev@nvidia.com

Created by mgrigoriev@smtp.nvidia.com

if you run this benchmark​:

use Benchmark qw( cmpthese :hireswallclock );

my $hash = { 1 .. 10000 };
my %hash = ( 1 .. 10000 );

cmpthese( -10, {
  'if (keys %$hash)' => sub { if( keys %$hash ) { return 1 }},
  'if (keys $hash)' => sub { if( keys $hash ) {return 1 } },
  'if (keys %hash)' => sub { if( keys %hash ) { return 1 }},
  'if ( %$hash )' => sub { if( %$hash ) {return 1 }},
  'if ( %hash )' => sub { if( %hash ) { return 1 }},
  'if ( scalar %$hash )' => sub { if( scalar %$hash ) {return 1 }},
  'if ( scalar %hash )' => sub { if( scalar %hash ) { return 1 }},
  });
then you will get this result​:
Rate if ( %hash ) if ( scalar %$hash ) if ( %$hash ) if ( scalar %hash ) if (keys $hash) if (keys %$hash) if (keys %hash)

if ( %hash ) 136924/s -- -0% -0% -0% -99% -99% -99%
if ( scalar %$hash ) 136940/s 0% -- -0% -0% -99% -99% -99%
if ( %$hash ) 137054/s 0% 0% -- -0% -99% -99% -99%
if ( scalar %hash ) 137187/s 0% 0% 0% -- -99% -99% -99%
if (keys $hash) 14306992/s 10349% 10348% 10339% 10329% -- -14% -28%
if (keys %$hash) 16698271/s 12095% 12094% 12084% 12072% 17% -- -16%
if (keys %hash) 19933977/s 14458% 14457% 14445% 14431% 39% 19% -
-------------
as you can see if %hash performs more than hundred times worse than if keys %hash. It will perform even worse on large hashes, it was
observed up to 500x worse performance. Seems like this operation is O(n) insteadof O(1).
Should be fixed ASAP.

Perl Info

Flags:
    category=core
    severity=critical

Site configuration information for perl 5.14.1:

Configured by root at Fri Jun 17 09:46:42 PDT 2011.

Summary of my perl5 (revision 5 version 14 subversion 1) configuration:
   
  Platform:
    osname=linux, osvers=2.6.29.4, archname=x86_64-linux
    uname='linux l-c4u5build64 2.6.29.4 #8 smp tue may 25 14:00:33 pdt 2010 x86_64 x86_64 x86_64 gnulinux '
    config_args='-des -Dprefix=/home/utils/perl-5.14/5.14.1-nothreads-64'
    hint=recommended, useposix=true, d_sigaction=define
    useithreads=undef, usemultiplicity=undef
    useperlio=define, d_sfio=undef, uselargefiles=define, usesocks=undef
    use64bitint=define, use64bitall=define, uselongdouble=undef
    usemymalloc=n, bincompat5005=undef
  Compiler:
    cc='cc', ccflags ='-fno-strict-aliasing -pipe -I/usr/local/include -D_LARGEFILE_SOURCE -D_FILE_OFFSET_BITS=64',
    optimize='-O2',
    cppflags='-fno-strict-aliasing -pipe -I/usr/local/include'
    ccversion='', gccversion='3.4.6 20060404 (Red Hat 3.4.6-8)', gccosandvers=''
    intsize=4, longsize=8, ptrsize=8, doublesize=8, byteorder=12345678
    d_longlong=define, longlongsize=8, d_longdbl=define, longdblsize=16
    ivtype='long', ivsize=8, nvtype='double', nvsize=8, Off_t='off_t', lseeksize=8
    alignbytes=8, prototype=define
  Linker and Libraries:
    ld='cc', ldflags =' -L/usr/local/lib'
    libpth=/usr/local/lib /lib /usr/lib /lib64 /usr/lib64 /usr/local/lib64
    libs=-lnsl -lgdbm -ldb -ldl -lm -lcrypt -lutil -lc
    perllibs=-lnsl -ldl -lm -lcrypt -lutil -lc
    libc=/lib/libc-2.3.4.so, so=so, useshrplib=false, libperl=libperl.a
    gnulibc_version='2.3.4'
  Dynamic Linking:
    dlsrc=dl_dlopen.xs, dlext=so, d_dlsymun=undef, ccdlflags='-Wl,-E'
    cccdlflags='-fPIC', lddlflags='-shared -O2 -L/usr/local/lib'

Locally applied patches:
    


@INC for perl 5.14.1:
    /home/utils/perl-5.14/5.14.1-nothreads-64/lib/site_perl/5.14.1/x86_64-linux
    /home/utils/perl-5.14/5.14.1-nothreads-64/lib/site_perl/5.14.1
    /home/utils/perl-5.14/5.14.1-nothreads-64/lib/5.14.1/x86_64-linux
    /home/utils/perl-5.14/5.14.1-nothreads-64/lib/5.14.1
    .


Environment for perl 5.14.1:
    HOME=/home/mgrigoriev
    LANG (unset)
    LANGUAGE (unset)
    LD_LIBRARY_PATH=.:/usr/lib
    LOGDIR (unset)
    PATH=/home/mgrigoriev/bin:/home/utils/bin:/bin:/home/gnu/bin:/usr/bin:.:/sbin:/usr/sbin:/usr/ucb:/usr/ccs/bin:/usr/lib:/etc:/home/nv/bin:/usr/bin/X11:/usr/local/lsf/bin:/home/tools/td/td5303/linux/bin:/home/tools/synopsys/syn_2010.12-SP5/bin:/home/tools/synopsys/pt_2009.06-SP3/bin:/home/tools/synopsys/syn_2010.12-SP5/linux/mc/bin:/home/tools/synopsys/fm_2010.12-SP5/bin:/home/tools/verilint/2001.4.10-linux2.2:/home/tools/vcs/vcs_latest/virsimdir/bin:/home/tools/vcs/vcs_latest/bin:/home/tools/debussy/latest/bin:/home/tools/debussy/verdi_latest/bin
    PERL_BADLANG (unset)
    SHELL=/bin/tcsh

-----------------------------------------------------------------------------------
This email message is for the sole use of the intended recipient(s) and may contain
confidential information.  Any unauthorized review, use, disclosure or distribution
is prohibited.  If you are not the intended recipient, please contact the sender by
reply email and destroy all copies of the original message.
-----------------------------------------------------------------------------------

@p5pRT
Copy link
Author

p5pRT commented Aug 25, 2012

From @jkeenan

On Fri Aug 24 16​:32​:12 2012, mgrigoriev@​nvidia.com wrote​:

This is a bug report for perl from mgrigoriev@​smtp.nvidia.com,
generated with the help of perlbug 1.39 running under perl 5.14.1.

-----------------------------------------------------------------
[Please describe your issue here]

if you run this benchmark​:

use Benchmark qw( cmpthese :hireswallclock );

my $hash = { 1 .. 10000 };
my %hash = ( 1 .. 10000 );

cmpthese( -10, {
'if (keys %$hash)' => sub { if( keys %$hash ) { return 1 }},
'if (keys $hash)' => sub { if( keys $hash ) {return 1 } },
'if (keys %hash)' => sub { if( keys %hash ) { return 1 }},
'if ( %$hash )' => sub { if( %$hash ) {return 1 }},
'if ( %hash )' => sub { if( %hash ) { return 1 }},
'if ( scalar %$hash )' => sub { if( scalar %$hash )
{return 1 }},
'if ( scalar %hash )' => sub { if( scalar %hash ) { return
1 }},
});
then you will get this result​:
Rate if ( %hash ) if ( scalar %$hash ) if ( %$hash ) if ( scalar
%hash ) if (keys $hash) if (keys %$hash) if (keys %hash)

if ( %hash ) 136924/s -- -0%
-0% -0% -99% -99%
-99%
if ( scalar %$hash ) 136940/s 0% --
-0% -0% -99% -99%
-99%
if ( %$hash ) 137054/s 0% 0%
-- -0% -99% -99%
-99%
if ( scalar %hash ) 137187/s 0% 0%
0% -- -99% -99%
-99%
if (keys $hash) 14306992/s 10349% 10348%
10339% 10329% -- -14%
-28%
if (keys %$hash) 16698271/s 12095% 12094%
12084% 12072% 17% --
-16%
if (keys %hash) 19933977/s 14458% 14457%
14445% 14431% 39% 19%
-
-------------
as you can see if %hash performs more than hundred times worse than
if keys %hash. It will perform even worse on large hashes, it was
observed up to 500x worse performance. Seems like this operation is
O(n) insteadof O(1).
Should be fixed ASAP.

AFAICT, all this shows is that correct Perl runs faster than incorrect Perl.

By placing the various versions of $hash and %hash inside the expression
of an 'if' block, you are asking for what's between the parentheses to
be evaluated in scalar context. But only the variants with 'keys' are
doing what one would expect.

Take this simple program​:

#########
$ cat hash.pl
my $hashref = { 1 .. 10000 };
my %hash = ( 1 .. 10000 );

print "Finished\n";
#########

Now run it through the Perl debugger, using the 'x' debug command to
evaluate expressions and the 'scalar' function to impose a scalar
context similar to that of 'if (EXPR)' (some output trimmed for
readability)​:

#########
perl -d hash.pl

main​::(hash.pl​:1)​: my $hashref = { 1 .. 10000 };
 
DB<1> c 4
main​::(hash.pl​:4)​: print "Finished\n";
 
DB<2> x (scalar keys %hash)
0 5000
 
DB<3> x (scalar keys %$hashref)
0 5000
 
DB<4> x (scalar %hash)
0 '3700/8192'
 
DB<5> x (scalar %$hashref)
0 '3700/8192'
##########
Only the first two variants DWIM.

What's happening here? Camel book, 4th ed., page 85 states​:

##########
"To find the number of keys in a hash, use the keys function in scalar
context."
##########

That's what we're doing in the first two variants. However, just before
that quotation, Camel says​:

##########
When you evaluate a hash variable in scalar context, it returns a true
value only if the hash contains any key/value pairs whatsoever. If
there are any key/value pairs at all, the value returned is a string
consisting of the number of used buckets and the number of allocated
buckets, separated by a slash. This is pretty much only useful to find
out whether Perl's (compiled in) hashing algorithm is performing poorly
on your data set.
##########

'perldoc perldata' contains much of the same argument.

Now, I myself can't say why 'if (%hash)' is so much slower than 'if
(keys %hash)'. But I can say that the former is doing something very
different from the latter. Hence there is no basis for comparing the
execution speeds of the two expressions. And there is nothing to be fixed.

Unless others want to provide more detail, I recommend that this RT be
rejected as a "wontfix".

Thank you very much.
Jim Keenan

@p5pRT
Copy link
Author

p5pRT commented Aug 25, 2012

The RT System itself - Status changed from 'new' to 'open'

@p5pRT
Copy link
Author

p5pRT commented Aug 25, 2012

From @cpansprout

On Fri Aug 24 18​:52​:13 2012, jkeenan wrote​:

On Fri Aug 24 16​:32​:12 2012, mgrigoriev@​nvidia.com wrote​:

This is a bug report for perl from mgrigoriev@​smtp.nvidia.com,
generated with the help of perlbug 1.39 running under perl 5.14.1.

-----------------------------------------------------------------
[Please describe your issue here]

if you run this benchmark​:

use Benchmark qw( cmpthese :hireswallclock );

my $hash = { 1 .. 10000 };
my %hash = ( 1 .. 10000 );

cmpthese( -10, {
'if (keys %$hash)' => sub { if( keys %$hash ) { return 1 }},
'if (keys $hash)' => sub { if( keys $hash ) {return 1 } },
'if (keys %hash)' => sub { if( keys %hash ) { return 1 }},
'if ( %$hash )' => sub { if( %$hash ) {return 1 }},
'if ( %hash )' => sub { if( %hash ) { return 1 }},
'if ( scalar %$hash )' => sub { if( scalar %$hash )
{return 1 }},
'if ( scalar %hash )' => sub { if( scalar %hash ) { return
1 }},
});
then you will get this result​:
Rate if ( %hash ) if ( scalar %$hash ) if ( %$hash ) if ( scalar
%hash ) if (keys $hash) if (keys %$hash) if (keys %hash)

if ( %hash ) 136924/s -- -0%
-0% -0% -99% -99%
-99%
if ( scalar %$hash ) 136940/s 0% --
-0% -0% -99% -99%
-99%
if ( %$hash ) 137054/s 0% 0%
-- -0% -99% -99%
-99%
if ( scalar %hash ) 137187/s 0% 0%
0% -- -99% -99%
-99%
if (keys $hash) 14306992/s 10349% 10348%
10339% 10329% -- -14%
-28%
if (keys %$hash) 16698271/s 12095% 12094%
12084% 12072% 17% --
-16%
if (keys %hash) 19933977/s 14458% 14457%
14445% 14431% 39% 19%
-
-------------
as you can see if %hash performs more than hundred times worse than
if keys %hash. It will perform even worse on large hashes, it was
observed up to 500x worse performance. Seems like this operation is
O(n) insteadof O(1).
Should be fixed ASAP.

AFAICT, all this shows is that correct Perl runs faster than incorrect
Perl.

By placing the various versions of $hash and %hash inside the expression
of an 'if' block, you are asking for what's between the parentheses to
be evaluated in scalar context. But only the variants with 'keys' are
doing what one would expect.

Take this simple program​:

#########
$ cat hash.pl
my $hashref = { 1 .. 10000 };
my %hash = ( 1 .. 10000 );

print "Finished\n";
#########

Now run it through the Perl debugger, using the 'x' debug command to
evaluate expressions and the 'scalar' function to impose a scalar
context similar to that of 'if (EXPR)' (some output trimmed for
readability)​:

#########
perl -d hash.pl

main​::(hash.pl​:1)​: my $hashref = { 1 .. 10000 };

DB<1> c 4
main​::(hash.pl​:4)​: print "Finished\n";

DB<2> x (scalar keys %hash)
0 5000

DB<3> x (scalar keys %$hashref)
0 5000

DB<4> x (scalar %hash)
0 '3700/8192'

DB<5> x (scalar %$hashref)
0 '3700/8192'
##########
Only the first two variants DWIM.

What's happening here? Camel book, 4th ed., page 85 states​:

##########
"To find the number of keys in a hash, use the keys function in scalar
context."
##########

That's what we're doing in the first two variants. However, just before
that quotation, Camel says​:

##########
When you evaluate a hash variable in scalar context, it returns a true
value only if the hash contains any key/value pairs whatsoever. If
there are any key/value pairs at all, the value returned is a string
consisting of the number of used buckets and the number of allocated
buckets, separated by a slash. This is pretty much only useful to find
out whether Perl's (compiled in) hashing algorithm is performing poorly
on your data set.
##########

'perldoc perldata' contains much of the same argument.

Now, I myself can't say why 'if (%hash)' is so much slower than 'if
(keys %hash)'. But I can say that the former is doing something very
different from the latter. Hence there is no basis for comparing the
execution speeds of the two expressions. And there is nothing to be
fixed.

Unless others want to provide more detail, I recommend that this RT be
rejected as a "wontfix".

%hash takes a long time because it has to iterate through the buckets
(8192 of them), though not all the elements, fortunately.

There is an optimisation known as boolkeys, added in 5.11.1
(867fa1e), that optimises %hash&&... and %hash||... down to
something slightly faster than keys %hash, if the expression occurs in
void context.

‘if’ and ‘unless’ are implemented in terms of && and ||, so the
optimisation applies to those, too, unless they are called in non-void
context (or context that cannot be discerned at compile time, such as
the last statement of a subroutine).

The examples above will show that if(%hash) is faster than keys, if an
extra ‘return’ is added after the if block.

In the case of ||, we can’t optimise it in non-void context, because
someone might be writing $bucket_info = %hash || '0/0';

In the case of &&, we can optimise it, even in non-void context, because
a true value will always be discarded in %hash && foo. The false value
it returns for an empty hash is always the integer 0. That would change
if we simply applied boolkeys to my $ret = %hash && foo; because
boolkeys returns &PL_sv_no (the dualvar you get from !1). But since
boolkeys’ return value is never directly visible to perl code, we can
safely change it.

I made that change in commit 20e53f5.

scalar(%hash) was not optimised at all, so I did that in commit
59d0f63.

%hash?...​:... (aka if/else) was not optimised either, so I did that in
commit a8b106e.

--

Father Chrysostomos

@p5pRT
Copy link
Author

p5pRT commented Aug 25, 2012

@cpansprout - Status changed from 'open' to 'resolved'

@p5pRT
Copy link
Author

p5pRT commented Aug 25, 2012

From @demerphq

On 25 August 2012 09​:31, Father Chrysostomos via RT
<perlbug-followup@​perl.org> wrote​:

On Fri Aug 24 18​:52​:13 2012, jkeenan wrote​:

On Fri Aug 24 16​:32​:12 2012, mgrigoriev@​nvidia.com wrote​:

This is a bug report for perl from mgrigoriev@​smtp.nvidia.com,
generated with the help of perlbug 1.39 running under perl 5.14.1.

-----------------------------------------------------------------
[Please describe your issue here]

if you run this benchmark​:

use Benchmark qw( cmpthese :hireswallclock );

my $hash = { 1 .. 10000 };
my %hash = ( 1 .. 10000 );

cmpthese( -10, {
'if (keys %$hash)' => sub { if( keys %$hash ) { return 1 }},
'if (keys $hash)' => sub { if( keys $hash ) {return 1 } },
'if (keys %hash)' => sub { if( keys %hash ) { return 1 }},
'if ( %$hash )' => sub { if( %$hash ) {return 1 }},
'if ( %hash )' => sub { if( %hash ) { return 1 }},
'if ( scalar %$hash )' => sub { if( scalar %$hash )
{return 1 }},
'if ( scalar %hash )' => sub { if( scalar %hash ) { return
1 }},
});
then you will get this result​:
Rate if ( %hash ) if ( scalar %$hash ) if ( %$hash ) if ( scalar
%hash ) if (keys $hash) if (keys %$hash) if (keys %hash)

if ( %hash ) 136924/s -- -0%
-0% -0% -99% -99%
-99%
if ( scalar %$hash ) 136940/s 0% --
-0% -0% -99% -99%
-99%
if ( %$hash ) 137054/s 0% 0%
-- -0% -99% -99%
-99%
if ( scalar %hash ) 137187/s 0% 0%
0% -- -99% -99%
-99%
if (keys $hash) 14306992/s 10349% 10348%
10339% 10329% -- -14%
-28%
if (keys %$hash) 16698271/s 12095% 12094%
12084% 12072% 17% --
-16%
if (keys %hash) 19933977/s 14458% 14457%
14445% 14431% 39% 19%
-
-------------
as you can see if %hash performs more than hundred times worse than
if keys %hash. It will perform even worse on large hashes, it was
observed up to 500x worse performance. Seems like this operation is
O(n) insteadof O(1).
Should be fixed ASAP.

AFAICT, all this shows is that correct Perl runs faster than incorrect
Perl.

By placing the various versions of $hash and %hash inside the expression
of an 'if' block, you are asking for what's between the parentheses to
be evaluated in scalar context. But only the variants with 'keys' are
doing what one would expect.

Take this simple program​:

#########
$ cat hash.pl
my $hashref = { 1 .. 10000 };
my %hash = ( 1 .. 10000 );

print "Finished\n";
#########

Now run it through the Perl debugger, using the 'x' debug command to
evaluate expressions and the 'scalar' function to impose a scalar
context similar to that of 'if (EXPR)' (some output trimmed for
readability)​:

#########
perl -d hash.pl

main​::(hash.pl​:1)​: my $hashref = { 1 .. 10000 };

DB<1> c 4
main​::(hash.pl​:4)​: print "Finished\n";

DB<2> x (scalar keys %hash)
0 5000

DB<3> x (scalar keys %$hashref)
0 5000

DB<4> x (scalar %hash)
0 '3700/8192'

DB<5> x (scalar %$hashref)
0 '3700/8192'
##########
Only the first two variants DWIM.

What's happening here? Camel book, 4th ed., page 85 states​:

##########
"To find the number of keys in a hash, use the keys function in scalar
context."
##########

That's what we're doing in the first two variants. However, just before
that quotation, Camel says​:

##########
When you evaluate a hash variable in scalar context, it returns a true
value only if the hash contains any key/value pairs whatsoever. If
there are any key/value pairs at all, the value returned is a string
consisting of the number of used buckets and the number of allocated
buckets, separated by a slash. This is pretty much only useful to find
out whether Perl's (compiled in) hashing algorithm is performing poorly
on your data set.
##########

'perldoc perldata' contains much of the same argument.

Now, I myself can't say why 'if (%hash)' is so much slower than 'if
(keys %hash)'. But I can say that the former is doing something very
different from the latter. Hence there is no basis for comparing the
execution speeds of the two expressions. And there is nothing to be
fixed.

Unless others want to provide more detail, I recommend that this RT be
rejected as a "wontfix".

%hash takes a long time because it has to iterate through the buckets
(8192 of them), though not all the elements, fortunately.

There is an optimisation known as boolkeys, added in 5.11.1
(867fa1e), that optimises %hash&&... and %hash||... down to
something slightly faster than keys %hash, if the expression occurs in
void context.

‘if’ and ‘unless’ are implemented in terms of && and ||, so the
optimisation applies to those, too, unless they are called in non-void
context (or context that cannot be discerned at compile time, such as
the last statement of a subroutine).

The examples above will show that if(%hash) is faster than keys, if an
extra ‘return’ is added after the if block.

In the case of ||, we can’t optimise it in non-void context, because
someone might be writing $bucket_info = %hash || '0/0';

In the case of &&, we can optimise it, even in non-void context, because
a true value will always be discarded in %hash && foo. The false value
it returns for an empty hash is always the integer 0. That would change
if we simply applied boolkeys to my $ret = %hash && foo; because
boolkeys returns &PL_sv_no (the dualvar you get from !1). But since
boolkeys’ return value is never directly visible to perl code, we can
safely change it.

I made that change in commit 20e53f5.

scalar(%hash) was not optimised at all, so I did that in commit
59d0f63.

%hash?...​:... (aka if/else) was not optimised either, so I did that in
commit a8b106e.

Wow. Thanks a lot for all the hard work FC. I remember my original
take on this and when i look at all the work you did to take it
forward I am speechless. Thanks a ton.

Yves

--
perl -Mre=debug -e "/just|another|perl|hacker/"

@p5pRT
Copy link
Author

p5pRT commented Aug 25, 2012

From @demerphq

On 25 August 2012 09​:31, Father Chrysostomos via RT
<perlbug-followup@​perl.org> wrote​:

On Fri Aug 24 18​:52​:13 2012, jkeenan wrote​:

On Fri Aug 24 16​:32​:12 2012, mgrigoriev@​nvidia.com wrote​:

This is a bug report for perl from mgrigoriev@​smtp.nvidia.com,
generated with the help of perlbug 1.39 running under perl 5.14.1.

-----------------------------------------------------------------
[Please describe your issue here]

if you run this benchmark​:

use Benchmark qw( cmpthese :hireswallclock );

my $hash = { 1 .. 10000 };
my %hash = ( 1 .. 10000 );

cmpthese( -10, {
'if (keys %$hash)' => sub { if( keys %$hash ) { return 1 }},
'if (keys $hash)' => sub { if( keys $hash ) {return 1 } },
'if (keys %hash)' => sub { if( keys %hash ) { return 1 }},
'if ( %$hash )' => sub { if( %$hash ) {return 1 }},
'if ( %hash )' => sub { if( %hash ) { return 1 }},
'if ( scalar %$hash )' => sub { if( scalar %$hash )
{return 1 }},
'if ( scalar %hash )' => sub { if( scalar %hash ) { return
1 }},
});
then you will get this result​:
Rate if ( %hash ) if ( scalar %$hash ) if ( %$hash ) if ( scalar
%hash ) if (keys $hash) if (keys %$hash) if (keys %hash)

if ( %hash ) 136924/s -- -0%
-0% -0% -99% -99%
-99%
if ( scalar %$hash ) 136940/s 0% --
-0% -0% -99% -99%
-99%
if ( %$hash ) 137054/s 0% 0%
-- -0% -99% -99%
-99%
if ( scalar %hash ) 137187/s 0% 0%
0% -- -99% -99%
-99%
if (keys $hash) 14306992/s 10349% 10348%
10339% 10329% -- -14%
-28%
if (keys %$hash) 16698271/s 12095% 12094%
12084% 12072% 17% --
-16%
if (keys %hash) 19933977/s 14458% 14457%
14445% 14431% 39% 19%
-
-------------
as you can see if %hash performs more than hundred times worse than
if keys %hash. It will perform even worse on large hashes, it was
observed up to 500x worse performance. Seems like this operation is
O(n) insteadof O(1).
Should be fixed ASAP.

AFAICT, all this shows is that correct Perl runs faster than incorrect
Perl.

By placing the various versions of $hash and %hash inside the expression
of an 'if' block, you are asking for what's between the parentheses to
be evaluated in scalar context. But only the variants with 'keys' are
doing what one would expect.

Take this simple program​:

#########
$ cat hash.pl
my $hashref = { 1 .. 10000 };
my %hash = ( 1 .. 10000 );

print "Finished\n";
#########

Now run it through the Perl debugger, using the 'x' debug command to
evaluate expressions and the 'scalar' function to impose a scalar
context similar to that of 'if (EXPR)' (some output trimmed for
readability)​:

#########
perl -d hash.pl

main​::(hash.pl​:1)​: my $hashref = { 1 .. 10000 };

DB<1> c 4
main​::(hash.pl​:4)​: print "Finished\n";

DB<2> x (scalar keys %hash)
0 5000

DB<3> x (scalar keys %$hashref)
0 5000

DB<4> x (scalar %hash)
0 '3700/8192'

DB<5> x (scalar %$hashref)
0 '3700/8192'
##########
Only the first two variants DWIM.

What's happening here? Camel book, 4th ed., page 85 states​:

##########
"To find the number of keys in a hash, use the keys function in scalar
context."
##########

That's what we're doing in the first two variants. However, just before
that quotation, Camel says​:

##########
When you evaluate a hash variable in scalar context, it returns a true
value only if the hash contains any key/value pairs whatsoever. If
there are any key/value pairs at all, the value returned is a string
consisting of the number of used buckets and the number of allocated
buckets, separated by a slash. This is pretty much only useful to find
out whether Perl's (compiled in) hashing algorithm is performing poorly
on your data set.
##########

'perldoc perldata' contains much of the same argument.

Now, I myself can't say why 'if (%hash)' is so much slower than 'if
(keys %hash)'. But I can say that the former is doing something very
different from the latter. Hence there is no basis for comparing the
execution speeds of the two expressions. And there is nothing to be
fixed.

Unless others want to provide more detail, I recommend that this RT be
rejected as a "wontfix".

%hash takes a long time because it has to iterate through the buckets
(8192 of them), though not all the elements, fortunately.

There is an optimisation known as boolkeys, added in 5.11.1
(867fa1e), that optimises %hash&&... and %hash||... down to
something slightly faster than keys %hash, if the expression occurs in
void context.

‘if’ and ‘unless’ are implemented in terms of && and ||, so the
optimisation applies to those, too, unless they are called in non-void
context (or context that cannot be discerned at compile time, such as
the last statement of a subroutine).

The examples above will show that if(%hash) is faster than keys, if an
extra ‘return’ is added after the if block.

Rereading this I wonder about something, doesn't this indicate a
problem propagating context?

The sub is called in in void context. Shouldn't that propagate to the if()?

Why does the user have to stick a raw return in?

Are there technical reasons why in this case we cant check the context
the sub is called in?

Yves

--
perl -Mre=debug -e "/just|another|perl|hacker/"

@p5pRT
Copy link
Author

p5pRT commented Aug 25, 2012

From @rurban

On Sat, Aug 25, 2012 at 11​:07 AM, demerphq <demerphq@​gmail.com> wrote​:

On 25 August 2012 09​:31, Father Chrysostomos via RT
<perlbug-followup@​perl.org> wrote​:
...

There is an optimisation known as boolkeys, added in 5.11.1
(867fa1e), that optimises %hash&&... and %hash||... down to
something slightly faster than keys %hash, if the expression occurs in
void context.

‘if’ and ‘unless’ are implemented in terms of && and ||, so the
optimisation applies to those, too, unless they are called in non-void
context (or context that cannot be discerned at compile time, such as
the last statement of a subroutine).

The examples above will show that if(%hash) is faster than keys, if an
extra ‘return’ is added after the if block.

Rereading this I wonder about something, doesn't this indicate a
problem propagating context?

The sub is called in in void context. Shouldn't that propagate to the if()?

Why does the user have to stick a raw return in?

Are there technical reasons why in this case we cant check the context
the sub is called in?

Unfortunately we only store the LVALUE context in PUSHSTUB, not the
other 3 contexts (yet).

There's a technical reason, there's not much room for the 3 missing bits.
We use cx->blk_u16 for the LVALUE and DEREF bits plus the old op_type
and in_eval info. Something for Nicholas to investigate probably.
Knowing the sub context, e.g. for the right-hand-side on assignment, is
definitely worth knowing at compile-time.
--
Reini Urban
http​://cpanel.net/ http​://www.perl-compiler.org/

@p5pRT
Copy link
Author

p5pRT commented Aug 25, 2012

From @cpansprout

On Sat Aug 25 02​:08​:34 2012, demerphq wrote​:

On 25 August 2012 09​:31, Father Chrysostomos via RT
<perlbug-followup@​perl.org> wrote​:

‘if’ and ‘unless’ are implemented in terms of && and ||, so the
optimisation applies to those, too, unless they are called in non-
void
context (or context that cannot be discerned at compile time, such
as
the last statement of a subroutine).

The examples above will show that if(%hash) is faster than keys, if
an
extra ‘return’ is added after the if block.

Rereading this I wonder about something, doesn't this indicate a
problem propagating context?

The sub is called in in void context. Shouldn't that propagate to the
if()?

Why does the user have to stick a raw return in?

Are there technical reasons why in this case we cant check the context
the sub is called in?

The boolkeys optimisation happens at compile time. Am I missing
something obvious?

--

Father Chrysostomos

@p5pRT
Copy link
Author

p5pRT commented Aug 25, 2012

From @cpansprout

On Sat Aug 25 02​:34​:45 2012, rurban wrote​:

On Sat, Aug 25, 2012 at 11​:07 AM, demerphq <demerphq@​gmail.com> wrote​:

On 25 August 2012 09​:31, Father Chrysostomos via RT
<perlbug-followup@​perl.org> wrote​:
...

There is an optimisation known as boolkeys, added in 5.11.1
(867fa1e), that optimises %hash&&... and %hash||... down to
something slightly faster than keys %hash, if the expression occurs
in
void context.

‘if’ and ‘unless’ are implemented in terms of && and ||, so the
optimisation applies to those, too, unless they are called in non-
void
context (or context that cannot be discerned at compile time, such
as
the last statement of a subroutine).

The examples above will show that if(%hash) is faster than keys, if
an
extra ‘return’ is added after the if block.

Rereading this I wonder about something, doesn't this indicate a
problem propagating context?

The sub is called in in void context. Shouldn't that propagate to
the if()?

Why does the user have to stick a raw return in?

Are there technical reasons why in this case we cant check the
context
the sub is called in?

Unfortunately we only store the LVALUE context in PUSHSTUB, not the
other 3 contexts (yet).

But every sub also does a PUSHBLOCK which sets cx->blk_gimme.

There's a technical reason, there's not much room for the 3 missing
bits.
We use cx->blk_u16 for the LVALUE and DEREF bits plus the old op_type
and in_eval info. Something for Nicholas to investigate probably.
Knowing the sub context, e.g. for the right-hand-side on assignment,
is
definitely worth knowing at compile-time.

But the sub isn’t called till run time.

--

Father Chrysostomos

@p5pRT
Copy link
Author

p5pRT commented Aug 25, 2012

From @demerphq

On 25 August 2012 16​:03, Father Chrysostomos via RT
<perlbug-followup@​perl.org> wrote​:

On Sat Aug 25 02​:08​:34 2012, demerphq wrote​:

On 25 August 2012 09​:31, Father Chrysostomos via RT
<perlbug-followup@​perl.org> wrote​:

‘if’ and ‘unless’ are implemented in terms of && and ||, so the
optimisation applies to those, too, unless they are called in non-
void
context (or context that cannot be discerned at compile time, such
as
the last statement of a subroutine).

The examples above will show that if(%hash) is faster than keys, if
an
extra ‘return’ is added after the if block.

Rereading this I wonder about something, doesn't this indicate a
problem propagating context?

The sub is called in in void context. Shouldn't that propagate to the
if()?

Why does the user have to stick a raw return in?

Are there technical reasons why in this case we cant check the context
the sub is called in?

The boolkeys optimisation happens at compile time. Am I missing
something obvious?

Well there were cases where you said we couldn't do boolkeys because
it wasn't in void context because of the missing return.

My point was if we could determine if we were called in void context
then we could still do the boolkeys optimization.

Or is it me that is missing something obvious?

Yves

--
perl -Mre=debug -e "/just|another|perl|hacker/"

@p5pRT
Copy link
Author

p5pRT commented Aug 25, 2012

From @rurban

On Sat, Aug 25, 2012 at 4​:07 PM, Father Chrysostomos via RT
<perlbug-followup@​perl.org> wrote​:

On Sat Aug 25 02​:34​:45 2012, rurban wrote​:

On Sat, Aug 25, 2012 at 11​:07 AM, demerphq <demerphq@​gmail.com> wrote​:

On 25 August 2012 09​:31, Father Chrysostomos via RT
<perlbug-followup@​perl.org> wrote​:
...

There is an optimisation known as boolkeys, added in 5.11.1
(867fa1e), that optimises %hash&&... and %hash||... down to
something slightly faster than keys %hash, if the expression occurs
in
void context.

‘if’ and ‘unless’ are implemented in terms of && and ||, so the
optimisation applies to those, too, unless they are called in non-
void
context (or context that cannot be discerned at compile time, such
as
the last statement of a subroutine).

The examples above will show that if(%hash) is faster than keys, if
an
extra ‘return’ is added after the if block.

Rereading this I wonder about something, doesn't this indicate a
problem propagating context?

The sub is called in in void context. Shouldn't that propagate to
the if()?

Why does the user have to stick a raw return in?

Are there technical reasons why in this case we cant check the
context
the sub is called in?

Unfortunately we only store the LVALUE context in PUSHSTUB, not the
other 3 contexts (yet).

But every sub also does a PUSHBLOCK which sets cx->blk_gimme.

There's a technical reason, there's not much room for the 3 missing
bits.s>> We use cx->blk_u16 for the LVALUE and DEREF bits plus the old op_type
and in_eval info. Something for Nicholas to investigate probably.
Knowing the sub context e.g. for the right-hand-side on assignment,
is
definitely worth knowing at compile-time.

But the sub isn’t called till run time.

Oh, right.

So we'd need to put the 3 context bits into the entersub op flag,
but there is no room for it.
--
Reini Urban
http​://cpanel.net/ http​://www.perl-compiler.org/

@p5pRT
Copy link
Author

p5pRT commented Aug 25, 2012

From @cpansprout

On Sat Aug 25 08​:39​:29 2012, demerphq wrote​:

On 25 August 2012 16​:03, Father Chrysostomos via RT
<perlbug-followup@​perl.org> wrote​:

On Sat Aug 25 02​:08​:34 2012, demerphq wrote​:

On 25 August 2012 09​:31, Father Chrysostomos via RT
<perlbug-followup@​perl.org> wrote​:

‘if’ and ‘unless’ are implemented in terms of && and ||, so the
optimisation applies to those, too, unless they are called in non-
void
context (or context that cannot be discerned at compile time, such
as
the last statement of a subroutine).

The examples above will show that if(%hash) is faster than keys, if
an
extra ‘return’ is added after the if block.

Rereading this I wonder about something, doesn't this indicate a
problem propagating context?

The sub is called in in void context. Shouldn't that propagate to the
if()?

Why does the user have to stick a raw return in?

Are there technical reasons why in this case we cant check the context
the sub is called in?

The boolkeys optimisation happens at compile time. Am I missing
something obvious?

Well there were cases where you said we couldn't do boolkeys because
it wasn't in void context because of the missing return.

My point was if we could determine if we were called in void context
then we could still do the boolkeys optimization.

Or is it me that is missing something obvious?

Thank you for persisting, because you have made me realising I was
concentrating too much on boolkeys and missing the forest for the trees.
This is what I was going to write​:

In

  sub {
  if (%hash) { }
  }
  ->();

the void context cannot be detected until run time, at which point it is
too late to add an boolkeys op to the op tree.

But then I realised that %hash itself should be detecting void context,
in which case in %hash || $foo, where it is flagged as scalar context at
compile time, it would also have to be flagged if the || is in void
context, so it will know to call block_gimme() to find out whether it
can act as a ‘true boolean’.

See commit 6ea72b3.

--

Father Chrysostomos

@p5pRT
Copy link
Author

p5pRT commented Aug 25, 2012

From @cpansprout

On Sat Aug 25 14​:47​:01 2012, sprout wrote​:

On Sat Aug 25 08​:39​:29 2012, demerphq wrote​:

On 25 August 2012 16​:03, Father Chrysostomos via RT
<perlbug-followup@​perl.org> wrote​:

On Sat Aug 25 02​:08​:34 2012, demerphq wrote​:

On 25 August 2012 09​:31, Father Chrysostomos via RT
<perlbug-followup@​perl.org> wrote​:

‘if’ and ‘unless’ are implemented in terms of && and ||, so the
optimisation applies to those, too, unless they are called in non-
void
context (or context that cannot be discerned at compile time, such
as
the last statement of a subroutine).

The examples above will show that if(%hash) is faster than keys, if
an
extra ‘return’ is added after the if block.

Rereading this I wonder about something, doesn't this indicate a
problem propagating context?

The sub is called in in void context. Shouldn't that propagate to the
if()?

Why does the user have to stick a raw return in?

Are there technical reasons why in this case we cant check the
context
the sub is called in?

The boolkeys optimisation happens at compile time. Am I missing
something obvious?

Well there were cases where you said we couldn't do boolkeys because
it wasn't in void context because of the missing return.

My point was if we could determine if we were called in void context
then we could still do the boolkeys optimization.

Or is it me that is missing something obvious?

Thank you for persisting, because you have made me realising I was
concentrating too much on boolkeys and missing the forest for the trees.
This is what I was going to write​:

In

sub \{
    if \(%hash\) \{ \}
\}
\->\(\);

the void context cannot be detected until run time, at which point it is
too late to add an boolkeys op to the op tree.

But then I realised that %hash itself should be detecting void context,
in which case in %hash || $foo, where it is flagged as scalar context at
compile time, it would also have to be flagged if the || is in void
context, so it will know to call block_gimme() to find out whether it
can act as a ‘true boolean’.

See commit 6ea72b3.

Now I’m wondering whether we even need the boolkeys op.

--

Father Chrysostomos

@p5pRT
Copy link
Author

p5pRT commented Aug 26, 2012

From mgrigoriev@nvidia.com

And all just asked was the expected behavior from if %hash.
As alternative to obsolete and obviously undesired if defined %hash.
And my bug was a direct reference to the peldoc where it said to
Use if %hash if someone wants to get false/true Boolean answer about the
emtiness of the hash.
__maxim

-----Original Message-----
From​: Father Chrysostomos via RT [mailto​:perlbug-followup@​perl.org]
Sent​: Saturday, August 25, 2012 2​:47 PM
To​: Maxim Grigoriev
Subject​: [perl #114576] check if %hash 500x times slower than if keys %hash

On Sat Aug 25 14​:47​:01 2012, sprout wrote​:

On Sat Aug 25 08​:39​:29 2012, demerphq wrote​:

On 25 August 2012 16​:03, Father Chrysostomos via RT
<perlbug-followup@​perl.org> wrote​:

On Sat Aug 25 02​:08​:34 2012, demerphq wrote​:

On 25 August 2012 09​:31, Father Chrysostomos via RT
<perlbug-followup@​perl.org> wrote​:

‘if’ and ‘unless’ are implemented in terms of && and ||, so the
optimisation applies to those, too, unless they are called in
non-
void
context (or context that cannot be discerned at compile time,
such
as
the last statement of a subroutine).

The examples above will show that if(%hash) is faster than
keys, if
an
extra ‘return’ is added after the if block.

Rereading this I wonder about something, doesn't this indicate a
problem propagating context?

The sub is called in in void context. Shouldn't that propagate to
the if()?

Why does the user have to stick a raw return in?

Are there technical reasons why in this case we cant check the
context
the sub is called in?

The boolkeys optimisation happens at compile time. Am I missing
something obvious?

Well there were cases where you said we couldn't do boolkeys because
it wasn't in void context because of the missing return.

My point was if we could determine if we were called in void context
then we could still do the boolkeys optimization.

Or is it me that is missing something obvious?

Thank you for persisting, because you have made me realising I was
concentrating too much on boolkeys and missing the forest for the trees.
This is what I was going to write​:

In

sub \{
    if \(%hash\) \{ \}
\}
\->\(\);

the void context cannot be detected until run time, at which point it
is too late to add an boolkeys op to the op tree.

But then I realised that %hash itself should be detecting void
context, in which case in %hash || $foo, where it is flagged as scalar
context at compile time, it would also have to be flagged if the || is
in void context, so it will know to call block_gimme() to find out
whether it can act as a ‘true boolean’.

See commit 6ea72b3.

Now I’m wondering whether we even need the boolkeys op.

--

Father Chrysostomos


This email message is for the sole use of the intended recipient(s) and may contain
confidential information. Any unauthorized review, use, disclosure or distribution
is prohibited. If you are not the intended recipient, please contact the sender by
reply email and destroy all copies of the original message.


@p5pRT
Copy link
Author

p5pRT commented Aug 26, 2012

From @cpansprout

On Sat Aug 25 14​:47​:26 2012, sprout wrote​:

On Sat Aug 25 14​:47​:01 2012, sprout wrote​:

On Sat Aug 25 08​:39​:29 2012, demerphq wrote​:

On 25 August 2012 16​:03, Father Chrysostomos via RT
<perlbug-followup@​perl.org> wrote​:

On Sat Aug 25 02​:08​:34 2012, demerphq wrote​:

On 25 August 2012 09​:31, Father Chrysostomos via RT
<perlbug-followup@​perl.org> wrote​:

‘if’ and ‘unless’ are implemented in terms of && and ||, so the
optimisation applies to those, too, unless they are called in
non-
void
context (or context that cannot be discerned at compile time,
such
as
the last statement of a subroutine).

The examples above will show that if(%hash) is faster than
keys, if
an
extra ‘return’ is added after the if block.

Rereading this I wonder about something, doesn't this indicate a
problem propagating context?

The sub is called in in void context. Shouldn't that propagate
to the
if()?

Why does the user have to stick a raw return in?

Are there technical reasons why in this case we cant check the
context
the sub is called in?

The boolkeys optimisation happens at compile time. Am I missing
something obvious?

Well there were cases where you said we couldn't do boolkeys because
it wasn't in void context because of the missing return.

My point was if we could determine if we were called in void context
then we could still do the boolkeys optimization.

Or is it me that is missing something obvious?

Thank you for persisting, because you have made me realising I was
concentrating too much on boolkeys and missing the forest for the trees.
This is what I was going to write​:

In

sub \{
    if \(%hash\) \{ \}
\}
\->\(\);

the void context cannot be detected until run time, at which point it is
too late to add an boolkeys op to the op tree.

But then I realised that %hash itself should be detecting void context,
in which case in %hash || $foo, where it is flagged as scalar context at
compile time, it would also have to be flagged if the || is in void
context, so it will know to call block_gimme() to find out whether it
can act as a ‘true boolean’.

See commit 6ea72b3.

Now I’m wondering whether we even need the boolkeys op.

I’m disabled it with commit c8fe3bd and moved the optimisation
into pp_rv2av/pp_padhv.

I did not remove the boolkeys op type. Can op types be removed, or does
that cause problems elsewhere?

--

Father Chrysostomos

@p5pRT
Copy link
Author

p5pRT commented Aug 26, 2012

From [Unknown Contact. See original ticket]

On Sat Aug 25 14​:47​:26 2012, sprout wrote​:

On Sat Aug 25 14​:47​:01 2012, sprout wrote​:

On Sat Aug 25 08​:39​:29 2012, demerphq wrote​:

On 25 August 2012 16​:03, Father Chrysostomos via RT
<perlbug-followup@​perl.org> wrote​:

On Sat Aug 25 02​:08​:34 2012, demerphq wrote​:

On 25 August 2012 09​:31, Father Chrysostomos via RT
<perlbug-followup@​perl.org> wrote​:

‘if’ and ‘unless’ are implemented in terms of && and ||, so the
optimisation applies to those, too, unless they are called in
non-
void
context (or context that cannot be discerned at compile time,
such
as
the last statement of a subroutine).

The examples above will show that if(%hash) is faster than
keys, if
an
extra ‘return’ is added after the if block.

Rereading this I wonder about something, doesn't this indicate a
problem propagating context?

The sub is called in in void context. Shouldn't that propagate
to the
if()?

Why does the user have to stick a raw return in?

Are there technical reasons why in this case we cant check the
context
the sub is called in?

The boolkeys optimisation happens at compile time. Am I missing
something obvious?

Well there were cases where you said we couldn't do boolkeys because
it wasn't in void context because of the missing return.

My point was if we could determine if we were called in void context
then we could still do the boolkeys optimization.

Or is it me that is missing something obvious?

Thank you for persisting, because you have made me realising I was
concentrating too much on boolkeys and missing the forest for the trees.
This is what I was going to write​:

In

sub \{
    if \(%hash\) \{ \}
\}
\->\(\);

the void context cannot be detected until run time, at which point it is
too late to add an boolkeys op to the op tree.

But then I realised that %hash itself should be detecting void context,
in which case in %hash || $foo, where it is flagged as scalar context at
compile time, it would also have to be flagged if the || is in void
context, so it will know to call block_gimme() to find out whether it
can act as a ‘true boolean’.

See commit 6ea72b3.

Now I’m wondering whether we even need the boolkeys op.

I’m disabled it with commit c8fe3bd and moved the optimisation
into pp_rv2av/pp_padhv.

I did not remove the boolkeys op type. Can op types be removed, or does
that cause problems elsewhere?

--

Father Chrysostomos

@p5pRT
Copy link
Author

p5pRT commented Aug 26, 2012

From @cpansprout

On Sat Aug 25 19​:52​:35 2012, mgrigoriev@​nvidia.com wrote​:

And all just asked was the expected behavior from if %hash.
As
alternative to obsolete and obviously undesired if defined %hash.
And
my bug was a direct reference to the peldoc where it said to
Use if
%hash if someone wants to get false/true Boolean answer about the
emtiness of the hash.
__maxim

Yes, it is a pity that the recommended alternative can be much slower.
I think I’ve now optimised pretty much every case in which %hash is used
as a boolean. In the mean time, you could use !!%hash, which will be
fast as far back as 5.12.0.

--

Father Chrysostomos

@p5pRT
Copy link
Author

p5pRT commented Aug 26, 2012

From @rurban

On Aug 26, 2012 8​:04 AM, "Father Chrysostomos via RT" <
perlbug-comment@​perl.org> wrote​:

I’m disabled it with commit c8fe3bd and moved the optimisation
into pp_rv2av/pp_padhv.

I did not remove the boolkeys op type. Can op types be removed, or does
that cause problems elsewhere?

Only little problems, mostly for me. Several have been removed before. Just
go on.

@p5pRT
Copy link
Author

p5pRT commented Aug 26, 2012

From perl@profvince.com

I’m disabled it with commit c8fe3bd and moved the optimisation
into pp_rv2av/pp_padhv.

Cool.

Is it really necessary to set OPpTRUEBOOL and OpMAYBE_TRUEBOOL at
compile time, knowing that gimme is already available in pp_rv2av and
that it is set to block_gimme() when no static context is available?

Vincent

@p5pRT
Copy link
Author

p5pRT commented Aug 26, 2012

From @cpansprout

On Sun Aug 26 10​:16​:29 2012, perl@​profvince.com wrote​:

I’m disabled it with commit c8fe3bd and moved the optimisation
into pp_rv2av/pp_padhv.

Cool.

Is it really necessary to set OPpTRUEBOOL and OpMAYBE_TRUEBOOL at
compile time, knowing that gimme is already available in pp_rv2av and
that it is set to block_gimme() when no static context is available?

The context information returned by GIMME_V is for the rv2hv op itself.
In the case of %hash || ..., it will simply tell us scalar context.
What we want to know is the context of the parent op. If the or-op is
in void context, the rv2hv op can be considered to be in boolean-scalar
context, not just regular scalar context.

BTW, do you know the answer to the question in the subject?

--

Father Chrysostomos

@p5pRT
Copy link
Author

p5pRT commented Aug 26, 2012

From perl@profvince.com

The context information returned by GIMME_V is for the rv2hv op itself.
In the case of %hash || ..., it will simply tell us scalar context.
What we want to know is the context of the parent op. If the or-op is
in void context, the rv2hv op can be considered to be in boolean-scalar
context, not just regular scalar context.

I hadn't think of that. Thanks for explaining.

BTW, do you know the answer to the question in the subject?

Since op types aren't explicitely documented anywere, I consider that
they can be removed or that their meaning can change (I did that with
OP_BREAK in 5.16) without prior notice. How the optree is built is
really an implementation detail, and lots of optimizations made in the
past resulted in silent structural changes.

Of course there would more to discuss if a really basic op (like
nextstate) were to be removed or changed, but that's not the case for
such a recent and specific op whose meaning is obviously to optimize the
canonical construct (which also has to be handled if you want to do
something with boolkeys anyway).

Vincent

@p5pRT
Copy link
Author

p5pRT commented Aug 26, 2012

From @cpansprout

On Aug 26, 2012, at 12​:05 PM, Vincent Pit wrote​:

The context information returned by GIMME_V is for the rv2hv op itself.
In the case of %hash || ..., it will simply tell us scalar context.
What we want to know is the context of the parent op. If the or-op is
in void context, the rv2hv op can be considered to be in boolean-scalar
context, not just regular scalar context.

I hadn't think of that. Thanks for explaining.

BTW, do you know the answer to the question in the subject?

Since op types aren't explicitely documented anywere,

They are listed in Opcode.pm, but I’ve gone ahead and removed it in commit 605fa6b, following the precedent set by 5edb5b2.

@p5pRT
Copy link
Author

p5pRT commented Aug 26, 2012

From @Leont

On Sun, Aug 26, 2012 at 8​:04 AM, Father Chrysostomos via RT
<perlbug-comment@​perl.org> wrote​:

I did not remove the boolkeys op type. Can op types be removed, or does
that cause problems elsewhere?

Given that we modules that manipulate the op-tree, in particular
keyword plugins, we should start seriously considering which parts of
the OP-tree are API and which ones are not. The current nothing-is-API
is going to be an issue for anyone doing serious work using them.

Leon

@p5pRT
Copy link
Author

p5pRT commented Aug 27, 2012

From @tsee

On 08/26/2012 11​:27 PM, Leon Timmermans wrote​:

On Sun, Aug 26, 2012 at 8​:04 AM, Father Chrysostomos via RT
<perlbug-comment@​perl.org> wrote​:

I did not remove the boolkeys op type. Can op types be removed, or does
that cause problems elsewhere?

Given that we modules that manipulate the op-tree, in particular
keyword plugins, we should start seriously considering which parts of
the OP-tree are API and which ones are not. The current nothing-is-API
is going to be an issue for anyone doing serious work using them.

As long as the op tree is shared between threads and a thread might be
executing part of the tree while you modify it from another thread,
serious work using op trees is not thread safe and therefore generally
not a very good idea. Some time ago, I proposed a plan (here, on p5p) to
make op tree manipulation safe, but it was considered somewhat painful
and some commented that it might not be worth the effort. I neither was
nor am in a position to follow through myself on it (sorry).

--Steffen

@p5pRT
Copy link
Author

p5pRT commented Aug 27, 2012

From @doy

On Mon, Aug 27, 2012 at 07​:53​:18AM +0200, Steffen Mueller wrote​:

On 08/26/2012 11​:27 PM, Leon Timmermans wrote​:

On Sun, Aug 26, 2012 at 8​:04 AM, Father Chrysostomos via RT
<perlbug-comment@​perl.org> wrote​:

I did not remove the boolkeys op type. Can op types be removed, or does
that cause problems elsewhere?

Given that we modules that manipulate the op-tree, in particular
keyword plugins, we should start seriously considering which parts of
the OP-tree are API and which ones are not. The current nothing-is-API
is going to be an issue for anyone doing serious work using them.

As long as the op tree is shared between threads and a thread might
be executing part of the tree while you modify it from another
thread, serious work using op trees is not thread safe and therefore
generally not a very good idea. Some time ago, I proposed a plan
(here, on p5p) to make op tree manipulation safe, but it was
considered somewhat painful and some commented that it might not be
worth the effort. I neither was nor am in a position to follow
through myself on it (sorry).

This is only a problem for runtime optree manipulation, and not for new
optree creation, isn't it?

-doy

@p5pRT
Copy link
Author

p5pRT commented Aug 27, 2012

From @rurban

On Mon, Aug 27, 2012 at 9​:51 AM, Jesse Luehrs <doy@​tozt.net> wrote​:

On Mon, Aug 27, 2012 at 07​:53​:18AM +0200, Steffen Mueller wrote​:

On 08/26/2012 11​:27 PM, Leon Timmermans wrote​:

On Sun, Aug 26, 2012 at 8​:04 AM, Father Chrysostomos via RT
<perlbug-comment@​perl.org> wrote​:

I did not remove the boolkeys op type. Can op types be removed, or does
that cause problems elsewhere?

Given that we modules that manipulate the op-tree, in particular
keyword plugins, we should start seriously considering which parts of
the OP-tree are API and which ones are not. The current nothing-is-API
is going to be an issue for anyone doing serious work using them.

As long as the op tree is shared between threads and a thread might
be executing part of the tree while you modify it from another
thread, serious work using op trees is not thread safe and therefore
generally not a very good idea. Some time ago, I proposed a plan
(here, on p5p) to make op tree manipulation safe, but it was
considered somewhat painful and some commented that it might not be
worth the effort. I neither was nor am in a position to follow
through myself on it (sorry).

This is only a problem for runtime optree manipulation, and not for new
optree creation, isn't it?

Independent from new or change.
The issue is that curcop must be PL_compiling to be able to change or add ops.

There is even now a hard check. Before it was only an ASSERT, which
was active with DEBUGGING only.
--
Reini Urban
http​://cpanel.net/ http​://www.perl-compiler.org/

@p5pRT
Copy link
Author

p5pRT commented Aug 27, 2012

From @Leont

On Mon, Aug 27, 2012 at 9​:51 AM, Jesse Luehrs <doy@​tozt.net> wrote​:

This is only a problem for runtime optree manipulation, and not for new
optree creation, isn't it?

Yeah. During compile-time an OP-tree isn't shared yet, that shouldn't
be an issue.

Leon

@p5pRT
Copy link
Author

p5pRT commented Aug 27, 2012

From @doy

On Mon, Aug 27, 2012 at 11​:35​:10AM +0200, Leon Timmermans wrote​:

On Mon, Aug 27, 2012 at 9​:51 AM, Jesse Luehrs <doy@​tozt.net> wrote​:

This is only a problem for runtime optree manipulation, and not for new
optree creation, isn't it?

Yeah. During compile-time an OP-tree isn't shared yet, that shouldn't
be an issue.

Right, so for things like parser hooks, having a stable optree interface
to code to is pretty important.

-doy

@p5pRT
Copy link
Author

p5pRT commented Aug 27, 2012

From @iabyn

On Mon, Aug 27, 2012 at 09​:57​:01AM +0200, Reini Urban wrote​:

Independent from new or change.
The issue is that curcop must be PL_compiling to be able to change or add ops.

But PL_curcop == PL_compiling is just perl's way of determining whether
we are in compile- or run-time. So if an API is designed only to be called
at compile-time, then what's the issue?

--
Atheism is a religion like not collecting stamps is a hobby

@p5pRT
Copy link
Author

p5pRT commented Aug 29, 2012

From @nwc10

On Tue, Aug 28, 2012 at 07​:22​:51AM -0700, Father Chrysostomos via RT wrote​:

On Tue Aug 28 06​:32​:26 2012, nicholas wrote​:

But you've now nailed more? most? of them?

I hope.

How easy is it to generate them by mistake?

my $generate_report = defined %sales || defined %purchases;

if ($generate_report) {
print_header();
}
...

Oops, that gives me a deprecation warning! OK​:

If one uses lexical hashes, that's been giving a deprecation warning for
over a decade​:

$ cat hash.pl
use warnings;
 
my %sales;
my %purchases;

my $generate_report = defined %sales || defined %purchases;

if ($generate_report) {
  print_header();
}
__END__
$ ~/Sandpit/580/bin/perl5.8.0 hash.pl
defined(%hash) is deprecated at hash.pl line 6.
  (Maybe you should just omit the defined()?)
defined(%hash) is deprecated at hash.pl line 6.
  (Maybe you should just omit the defined()?)

my $generate_report = %sales || %purchases;

Yes, I'd not thought of this. Yes, it is obvious.

Also​:

sub { ...; return defined %hash }

could easily become​:

sub { ...; return scalar %hash }

which is not optimised, even if called in boolean context, because we
have no mechanism for propagating boolean context.

Of those generated by mistake, how many are subsequently actually used
in any
context other than true/false?

Probably .00001 percent. How is that question pertinent? Do you have
some brilliant way to upgrade a boolean into bucket stats after the
fact? (Hmm....​:)

No, but I can see several options, and more variants may exist

1) just reverse the change to the structure, and make every object bigger

2) The first time it's requested, count the hash buckets, and from then on
  track the hash buckets in the HV aux structure

  This will keep pretty much every object smaller.
  But there's a cost in code complexity.
  I don't think that it will be much slower than the previous case
  (always tracking) because the amount of work done ends up being the same

3) If it's requested, count the hash buckets. Cache the answer.
  Discard the answer if the hash changes

  Similar trade offs to above.
  This assumes that large hashes which are used in boolean tests are mostly
  unchanging. This may be wrong too often.

3a) Cache the string form of the scalar hash value

4) Make the scalar return value from keys be linked via magic to the hash,
  somewhat like $#a

  If it's used in boolean context (hard to tell? hard to implement?) return
  the fast boolean answer. If used in a different context, do the hard work.
  If the hash is accessed for a write before the value is discarded,
  calculate the value there and then.

  This feels like too much work.

It's interesting that this changed behaviour has been in 5.12.0 etc, 5.14.0
etc and 5.16.0 etc, but it's only been spotted as a problem just recently.
There is a very long feedback cycle.

Nicholas Clark

@p5pRT
Copy link
Author

p5pRT commented Aug 30, 2012

@cpansprout - Status changed from 'resolved' to 'open'

@p5pRT
Copy link
Author

p5pRT commented Aug 30, 2012

From @cpansprout

I’m reopening this, as there are still unresolved issues.

--

Father Chrysostomos

@p5pRT
Copy link
Author

p5pRT commented Apr 10, 2013

From @nwc10

On Wed, Aug 29, 2012 at 08​:44​:13PM -0700, Father Chrysostomos via RT wrote​:

I'm reopening this, as there are still unresolved issues.

I have a change that I believe fixes the slowdown without also increasing
memory size for objects, in smoke-me/nicholas/lazy-hv-fill

However, it missed the code freeze.

Nicholas Clark

@p5pRT
Copy link
Author

p5pRT commented Apr 10, 2013

From @demerphq

On 10 April 2013 17​:42, Nicholas Clark <nick@​ccl4.org> wrote​:

On Wed, Aug 29, 2012 at 08​:44​:13PM -0700, Father Chrysostomos via RT wrote​:

I'm reopening this, as there are still unresolved issues.

I have a change that I believe fixes the slowdown without also increasing
memory size for objects, in smoke-me/nicholas/lazy-hv-fill

However, it missed the code freeze.

And I have a patch somewhere (i forget the branch name right now) which
rips out hv-fill entirely.

It also is waiting on the code thaw.

Yves

--
perl -Mre=debug -e "/just|another|perl|hacker/"

@p5pRT
Copy link
Author

p5pRT commented Jun 14, 2013

From @jkeenan

On Wed Apr 10 11​:22​:02 2013, demerphq wrote​:

On 10 April 2013 17​:42, Nicholas Clark <nick@​ccl4.org> wrote​:

On Wed, Aug 29, 2012 at 08​:44​:13PM -0700, Father Chrysostomos via RT
wrote​:

I'm reopening this, as there are still unresolved issues.

I have a change that I believe fixes the slowdown without also
increasing
memory size for objects, in smoke-me/nicholas/lazy-hv-fill

However, it missed the code freeze.

And I have a patch somewhere (i forget the branch name right now)
which
rips out hv-fill entirely.

It also is waiting on the code thaw.

Yves

Nicholas, Yves​: Can you take a look at the patches you referred to above?

Thank you very much.
Jim Keenan

@p5pRT
Copy link
Author

p5pRT commented Jun 14, 2013

From @nwc10

On Thu, Jun 13, 2013 at 07​:19​:51PM -0700, James E Keenan via RT wrote​:

On Wed Apr 10 11​:22​:02 2013, demerphq wrote​:

On 10 April 2013 17​:42, Nicholas Clark <nick@​ccl4.org> wrote​:

On Wed, Aug 29, 2012 at 08​:44​:13PM -0700, Father Chrysostomos via RT
wrote​:

I'm reopening this, as there are still unresolved issues.

I have a change that I believe fixes the slowdown without also
increasing
memory size for objects, in smoke-me/nicholas/lazy-hv-fill

However, it missed the code freeze.

And I have a patch somewhere (i forget the branch name right now)
which
rips out hv-fill entirely.

It also is waiting on the code thaw.

Yves

Nicholas, Yves​: Can you take a look at the patches you referred to above?

I've merged mine to blead, but after the previous monthly release. Before
thinking further I'd like to see what happens once it's in a monthly release
available to the CPANTesters.

I believe that Yves was waiting for a CPAN smoke run before proceeding further.
This hasn't happened yet. I thought that it's waiting on hardware replacement,
for the machine it's due to run on, but I might be wrong.

Nicholas Clark

@p5pRT
Copy link
Author

p5pRT commented Feb 22, 2016

From @mauke

On Fri Jun 14 02​:42​:11 2013, nicholas wrote​:

On Thu, Jun 13, 2013 at 07​:19​:51PM -0700, James E Keenan via RT wrote​:

On Wed Apr 10 11​:22​:02 2013, demerphq wrote​:

On 10 April 2013 17​:42, Nicholas Clark <nick@​ccl4.org> wrote​:

On Wed, Aug 29, 2012 at 08​:44​:13PM -0700, Father Chrysostomos via
RT
wrote​:

I'm reopening this, as there are still unresolved issues.

I have a change that I believe fixes the slowdown without also
increasing
memory size for objects, in smoke-me/nicholas/lazy-hv-fill

However, it missed the code freeze.

And I have a patch somewhere (i forget the branch name right now)
which
rips out hv-fill entirely.

It also is waiting on the code thaw.

Yves

Nicholas, Yves​: Can you take a look at the patches you referred to
above?

I've merged mine to blead, but after the previous monthly release.
Before
thinking further I'd like to see what happens once it's in a monthly
release
available to the CPANTesters.

I believe that Yves was waiting for a CPAN smoke run before proceeding
further.
This hasn't happened yet. I thought that it's waiting on hardware
replacement,
for the machine it's due to run on, but I might be wrong.

What's the status of this ticket? It's listed in perl5200delta.

Can it be closed?

@p5pRT
Copy link
Author

p5pRT commented Mar 25, 2016

From @iabyn

On Mon, Feb 22, 2016 at 02​:13​:50PM -0800, l.mai@​web.de via RT wrote​:

On Fri Jun 14 02​:42​:11 2013, nicholas wrote​:

On Thu, Jun 13, 2013 at 07​:19​:51PM -0700, James E Keenan via RT wrote​:

On Wed Apr 10 11​:22​:02 2013, demerphq wrote​:

On 10 April 2013 17​:42, Nicholas Clark <nick@​ccl4.org> wrote​:

On Wed, Aug 29, 2012 at 08​:44​:13PM -0700, Father Chrysostomos via
RT
wrote​:

I'm reopening this, as there are still unresolved issues.

I have a change that I believe fixes the slowdown without also
increasing
memory size for objects, in smoke-me/nicholas/lazy-hv-fill

However, it missed the code freeze.

And I have a patch somewhere (i forget the branch name right now)
which
rips out hv-fill entirely.

It also is waiting on the code thaw.

Yves

Nicholas, Yves​: Can you take a look at the patches you referred to
above?

I've merged mine to blead, but after the previous monthly release.
Before
thinking further I'd like to see what happens once it's in a monthly
release
available to the CPANTesters.

I believe that Yves was waiting for a CPAN smoke run before proceeding
further.
This hasn't happened yet. I thought that it's waiting on hardware
replacement,
for the machine it's due to run on, but I might be wrong.

What's the status of this ticket? It's listed in perl5200delta.

Nicholas's work, which caches the value of hv_fill, is merged.
Yves refers to​:

And I have a patch somewhere (i forget the branch name right now) which
rips out hv-fill entirely.

I'm not sure what this refers to, but I suspect it hasn't been merged.

--
A power surge on the Bridge is rapidly and correctly diagnosed as a faulty
capacitor by the highly-trained and competent engineering staff.
  -- Things That Never Happen in "Star Trek" #9

@p5pRT
Copy link
Author

p5pRT commented Mar 25, 2016

From @demerphq

On 25 Mar 2016 14​:16, "Dave Mitchell" <davem@​iabyn.com> wrote​:

On Mon, Feb 22, 2016 at 02​:13​:50PM -0800, l.mai@​web.de via RT wrote​:

On Fri Jun 14 02​:42​:11 2013, nicholas wrote​:

On Thu, Jun 13, 2013 at 07​:19​:51PM -0700, James E Keenan via RT wrote​:

On Wed Apr 10 11​:22​:02 2013, demerphq wrote
....
Nicholas's work, which caches the value of hv_fill, is merged.
Yves refers to​:

And I have a patch somewhere (i forget the branch name right now) which
rips out hv-fill entirely.

I'm not sure what this refers to, but I suspect it hasn't been merged.

No, it hasn't.

Fwiw, The idea was to stop exposing hv_fill via scalar (%hash), and let
people use one of the utility subs in Hash​::Util to calculate it on demand,
thus eliminating any need to cache it, or maintain it in process.

It is only used to construct the left hand value in "3/8" which might be
returned for %hash=1..10; scalar %hash, which imo is useful *only* in that
its truthfulness is equivalent to 0+keys %hash, or to someone attacking the
hash algorithm, and perhaps to someone learning about how our hash
implementation works.

If scalar %hash became the same as 0+keys, then the first use case is
satisfied, and the other two are better handled these days by the powerful
introspection functions in Hash​::Util.

Its worth noting that the current behaviour is an abstraction violation, it
exposes the internal implementation of our associative arrays, which under
hash randomization is non deterministic anyway; outside of the boolean case
the lhs could be any value < rhs.

If we change it to be the count of keys we get rid of these problems. The
value becomes deterministic, we simplify some core code, and we remove a
possible barrier to switching to an alternate associative array
implementation.

It's probably worth revisiting this subject. Ill see what I can do to find
my patch.

Yves

@p5pRT
Copy link
Author

p5pRT commented Mar 28, 2016

From @iabyn

On Fri, Mar 25, 2016 at 07​:15​:58PM +0100, demerphq wrote​:

On 25 Mar 2016 14​:16, "Dave Mitchell" <davem@​iabyn.com> wrote​:

On Mon, Feb 22, 2016 at 02​:13​:50PM -0800, l.mai@​web.de via RT wrote​:

On Fri Jun 14 02​:42​:11 2013, nicholas wrote​:

On Thu, Jun 13, 2013 at 07​:19​:51PM -0700, James E Keenan via RT wrote​:

On Wed Apr 10 11​:22​:02 2013, demerphq wrote
....
Nicholas's work, which caches the value of hv_fill, is merged.
Yves refers to​:

And I have a patch somewhere (i forget the branch name right now) which
rips out hv-fill entirely.

I'm not sure what this refers to, but I suspect it hasn't been merged.

No, it hasn't.

Fwiw, The idea was to stop exposing hv_fill via scalar (%hash), and let
people use one of the utility subs in Hash​::Util to calculate it on demand,
thus eliminating any need to cache it, or maintain it in process.

It is only used to construct the left hand value in "3/8" which might be
returned for %hash=1..10; scalar %hash, which imo is useful *only* in that
its truthfulness is equivalent to 0+keys %hash, or to someone attacking the
hash algorithm, and perhaps to someone learning about how our hash
implementation works.

If scalar %hash became the same as 0+keys, then the first use case is
satisfied, and the other two are better handled these days by the powerful
introspection functions in Hash​::Util.

Its worth noting that the current behaviour is an abstraction violation, it
exposes the internal implementation of our associative arrays, which under
hash randomization is non deterministic anyway; outside of the boolean case
the lhs could be any value < rhs.

If we change it to be the count of keys we get rid of these problems. The
value becomes deterministic, we simplify some core code, and we remove a
possible barrier to switching to an alternate associative array
implementation.

It's probably worth revisiting this subject. Ill see what I can do to find
my patch.

I like in principle the idea of simplifying %hash in scalar context.
I wonder if instead of making it return 0+scalar %hash, we return
"${number_of_keys}/${number_of_alloced_buckets}" - i.e. we return
something that still looks like a fraction, but has a numerator that's
now easy to calculate. This maintains a certain level of backwards
compatibility, for code that expects to be able to do

  %hash =~ m{^(\d+)/(\d+)/$}

All we're then doing is lying about how efficiently HEs are being
distributed across buckets.

Conversely, just returning an int rather than a "1/2" fraction means that
a temporary string doesn't have to be created and then numified just to
satisfy code that's probably just using it in boolean context anyway.

--
In England there is a special word which means the last sunshine
of the summer. That word is "spring".

@p5pRT
Copy link
Author

p5pRT commented Mar 28, 2016

From @demerphq

On 28 March 2016 at 11​:28, Dave Mitchell <davem@​iabyn.com> wrote​:

On Fri, Mar 25, 2016 at 07​:15​:58PM +0100, demerphq wrote​:

On 25 Mar 2016 14​:16, "Dave Mitchell" <davem@​iabyn.com> wrote​:

On Mon, Feb 22, 2016 at 02​:13​:50PM -0800, l.mai@​web.de via RT wrote​:

On Fri Jun 14 02​:42​:11 2013, nicholas wrote​:

On Thu, Jun 13, 2013 at 07​:19​:51PM -0700, James E Keenan via RT wrote​:

On Wed Apr 10 11​:22​:02 2013, demerphq wrote
....
Nicholas's work, which caches the value of hv_fill, is merged.
Yves refers to​:

And I have a patch somewhere (i forget the branch name right now) which
rips out hv-fill entirely.

I'm not sure what this refers to, but I suspect it hasn't been merged.

No, it hasn't.

Fwiw, The idea was to stop exposing hv_fill via scalar (%hash), and let
people use one of the utility subs in Hash​::Util to calculate it on demand,
thus eliminating any need to cache it, or maintain it in process.

It is only used to construct the left hand value in "3/8" which might be
returned for %hash=1..10; scalar %hash, which imo is useful *only* in that
its truthfulness is equivalent to 0+keys %hash, or to someone attacking the
hash algorithm, and perhaps to someone learning about how our hash
implementation works.

If scalar %hash became the same as 0+keys, then the first use case is
satisfied, and the other two are better handled these days by the powerful
introspection functions in Hash​::Util.

Its worth noting that the current behaviour is an abstraction violation, it
exposes the internal implementation of our associative arrays, which under
hash randomization is non deterministic anyway; outside of the boolean case
the lhs could be any value < rhs.

If we change it to be the count of keys we get rid of these problems. The
value becomes deterministic, we simplify some core code, and we remove a
possible barrier to switching to an alternate associative array
implementation.

It's probably worth revisiting this subject. Ill see what I can do to find
my patch.

I like in principle the idea of simplifying %hash in scalar context.
I wonder if instead of making it return 0+scalar %hash, we return
"${number_of_keys}/${number_of_alloced_buckets}" - i.e. we return
something that still looks like a fraction, but has a numerator that's
now easy to calculate.

I certainly would welcome such a change as a positive step, and
actually I have this as a patch in my patch sequence but personally I
think it is too timid.

My view is that the denominator of the fraction is a leakage of the
underlying implementation that just causes problems. It only makes
sense for an associative array implemented as a hash data structure.
Any other associative array implementation would have nothing to put
there.

This is not a pie-in-the-sky objection either. Tied hashes already
have the problem of deciding what to return for scalar(%hash), and of
not returning what Perl returns.

So the patch sequence I have been working on aims to do the following​:

1. make scalar(%hash) return the count of the keys.
2. remove all references to HvFILL from core with the exception of
Hash​::Util and the code used Devel​::Peek/sv_dump().
3. make calculation of HvFILL always "slow" and uncached.
4. add a function to Hash​::Util which returns the result of
scalar(%hash) for backwards compatibility.
5. add more explicit functions to Hash​::Util which allow one to
introspect and set the bucket size of a hash without the strange
malarky of setting keys(), or parsing the result of scalar(%hash).

The patch sequence is structured so we can fallback to the proposal
you are making here if my plan proves to controversial. I will push it
as soon as I have patches to all the tests that currently fail because
of my more aggressive change.

This maintains a certain level of backwards
compatibility, for code that expects to be able to do

%hash =~ m\{^\(\\d\+\)/\(\\d\+\)/$\}

All we're then doing is lying about how efficiently HEs are being
distributed across buckets.

Yes, I agree this is a reasonable "middle way" to address this issue,
but I am going to start bold, and then fallback to this strategy if we
cannot build consensus.

Conversely, just returning an int rather than a "1/2" fraction means that
a temporary string doesn't have to be created and then numified just to
satisfy code that's probably just using it in boolean context anyway.

Yes, which is much faster.

Anyway, to repeat, I definitely agree your proposal is a reasonable
option, just not IMO the best one. I think the practical use cases of
the current return are really very unusual, and that in practice the
very small handful of people that do use them can switch to
introspection subs in Hash​::Util.

FWIW I do not think we should go through a deprecation cycle on this.
I think warning that the signature will change will cause far far more
heartache than the possible breakage of the tiny number of scripts
that might actually use this feature productively.

This is predicated on my conclusion that there really are only the
following use cases for the current return​:

1. simple demo scripts showing how perls hash function works - better
done using exisitng introspection routines in Hash​::Util.
2. attacking the hash function - scripts designed to attack the hash
seed and the hashing algorithm are made easier by seeing how many
buckets are used. - to the extent we care about this use case they can
use the existing introspection routines in Hash​::Util and have far far
better information to craft their attack. [FWIW, I think this is the
vast majority of the non-trivial uses of the return of scalar(%hash).]
3. testing if keys(%hash)=$size works - this is really only used in
the core perl tests, and can be easily replaced with a function in
Hash​::Util.
4. I suppose *someone* might be testing for for hash allocation
density and resizing their hashes accordingly. I think it is really
quite unlikely, but in the extreme event such users exist I imagine
they would be aware of the issue and would be proactive during
upgrading, and IMO would suffer negligble consequences for its failure
as a) they would quickly see undefined values in their code, and b)
perls hash function would continue to work just fine, albeit possibly
slower for their very very special use case. I really would be quite
surprised if such a system existed with good reason.

So I think this is something that is so rarely used, and so unlikely
to be useful, that providing a back-compat routine in H​::U, and doing
the change without deprecation is justified in this unusual case.

Yves

@p5pRT
Copy link
Author

p5pRT commented Mar 28, 2016

From @demerphq

On 28 March 2016 at 12​:37, demerphq <demerphq@​gmail.com> wrote​:

FWIW I do not think we should go through a deprecation cycle on this.
I think warning that the signature will change will cause far far more
heartache than the possible breakage of the tiny number of scripts
that might actually use this feature productively.

Although this implies the change is only suitable for a major release.

Yves

@p5pRT
Copy link
Author

p5pRT commented Mar 28, 2016

From @cpansprout

On Mon Mar 28 03​:37​:52 2016, demerphq wrote​:

On 28 March 2016 at 11​:28, Dave Mitchell <davem@​iabyn.com> wrote​:

Conversely, just returning an int rather than a "1/2" fraction means
that
a temporary string doesn't have to be created and then numified just
to
satisfy code that's probably just using it in boolean context anyway.

Yes, which is much faster.

Absolutely. +1

--

Father Chrysostomos

@p5pRT
Copy link
Author

p5pRT commented Mar 28, 2016

From @demerphq

On 28 March 2016 at 18​:15, Father Chrysostomos via RT
<perlbug-followup@​perl.org> wrote​:

On Mon Mar 28 03​:37​:52 2016, demerphq wrote​:

On 28 March 2016 at 11​:28, Dave Mitchell <davem@​iabyn.com> wrote​:

Conversely, just returning an int rather than a "1/2" fraction means
that
a temporary string doesn't have to be created and then numified just
to
satisfy code that's probably just using it in boolean context anyway.

Yes, which is much faster.

Absolutely. +1

Note that boolean context (thanks iirc to your efforts) actually does
the right thing, and uses HvUSEDKEYS(), it does not use HvFILL.

It is only code that uses scalar(%hash) outside of boolean context
that might be affected by a change to this behaviour.

Yves

--
perl -Mre=debug -e "/just|another|perl|hacker/"

@p5pRT
Copy link
Author

p5pRT commented Jun 17, 2016

From @iabyn

On Mon, Mar 28, 2016 at 06​:27​:05PM +0200, demerphq wrote​:

On 28 March 2016 at 18​:15, Father Chrysostomos via RT
<perlbug-followup@​perl.org> wrote​:

On Mon Mar 28 03​:37​:52 2016, demerphq wrote​:

On 28 March 2016 at 11​:28, Dave Mitchell <davem@​iabyn.com> wrote​:

Conversely, just returning an int rather than a "1/2" fraction means
that
a temporary string doesn't have to be created and then numified just
to
satisfy code that's probably just using it in boolean context anyway.

Yes, which is much faster.

Absolutely. +1

Note that boolean context (thanks iirc to your efforts) actually does
the right thing, and uses HvUSEDKEYS(), it does not use HvFILL.

It is only code that uses scalar(%hash) outside of boolean context
that might be affected by a change to this behaviour.

Now that 5.24.0 is out, are you likely to implement this for 5.25.x?

--
Nothing ventured, nothing lost.

@p5pRT
Copy link
Author

p5pRT commented Jun 17, 2016

From @demerphq

On 17 June 2016 at 15​:27, Dave Mitchell <davem@​iabyn.com> wrote​:

On Mon, Mar 28, 2016 at 06​:27​:05PM +0200, demerphq wrote​:

On 28 March 2016 at 18​:15, Father Chrysostomos via RT
<perlbug-followup@​perl.org> wrote​:

On Mon Mar 28 03​:37​:52 2016, demerphq wrote​:

On 28 March 2016 at 11​:28, Dave Mitchell <davem@​iabyn.com> wrote​:

Conversely, just returning an int rather than a "1/2" fraction means
that
a temporary string doesn't have to be created and then numified just
to
satisfy code that's probably just using it in boolean context anyway.

Yes, which is much faster.

Absolutely. +1

Note that boolean context (thanks iirc to your efforts) actually does
the right thing, and uses HvUSEDKEYS(), it does not use HvFILL.

It is only code that uses scalar(%hash) outside of boolean context
that might be affected by a change to this behaviour.

Now that 5.24.0 is out, are you likely to implement this for 5.25.x?

If you mean changing the signature of scalar(%hash). IMO yes, we
should, and if there is support for doing so the

--
perl -Mre=debug -e "/just|another|perl|hacker/"

@p5pRT
Copy link
Author

p5pRT commented Jun 17, 2016

From @demerphq

On 17 June 2016 at 22​:15, demerphq <demerphq@​gmail.com> wrote​:

On 17 June 2016 at 15​:27, Dave Mitchell <davem@​iabyn.com> wrote​:

On Mon, Mar 28, 2016 at 06​:27​:05PM +0200, demerphq wrote​:

On 28 March 2016 at 18​:15, Father Chrysostomos via RT
<perlbug-followup@​perl.org> wrote​:

On Mon Mar 28 03​:37​:52 2016, demerphq wrote​:

On 28 March 2016 at 11​:28, Dave Mitchell <davem@​iabyn.com> wrote​:

Conversely, just returning an int rather than a "1/2" fraction means
that
a temporary string doesn't have to be created and then numified just
to
satisfy code that's probably just using it in boolean context anyway.

Yes, which is much faster.

Absolutely. +1

Note that boolean context (thanks iirc to your efforts) actually does
the right thing, and uses HvUSEDKEYS(), it does not use HvFILL.

It is only code that uses scalar(%hash) outside of boolean context
that might be affected by a change to this behaviour.

Now that 5.24.0 is out, are you likely to implement this for 5.25.x?

If you mean changing the signature of scalar(%hash). IMO yes, we
should, and if there is support for doing so the

then yes I am interested in implementing this for 5.25.x.

I actually planned to address this in a post about my vision of hash
related stuff. I just havent found the tuits to write that yet...

Yves

--
perl -Mre=debug -e "/just|another|perl|hacker/"

@p5pRT
Copy link
Author

p5pRT commented Jun 20, 2016

From @iabyn

On Fri, Jun 17, 2016 at 10​:17​:09PM +0200, demerphq wrote​:

If you mean changing the signature of scalar(%hash).

yes, that's what I meant.

IMO yes, we
should, and if there is support for doing so the

then yes I am interested in implementing this for 5.25.x.

+1

--
A walk of a thousand miles begins with a single step...
then continues for another 1,999,999 or so.

@p5pRT
Copy link
Author

p5pRT commented Jun 20, 2016

From @mauke

Am 17.06.2016 um 22​:17 schrieb demerphq​:

On 17 June 2016 at 22​:15, demerphq <demerphq@​gmail.com> wrote​:

If you mean changing the signature of scalar(%hash). IMO yes, we
should, and if there is support for doing so the

then yes I am interested in implementing this for 5.25.x.

I'd like that.

--
Lukas Mai <plokinom@​gmail.com>

@p5pRT
Copy link
Author

p5pRT commented Jun 20, 2016

From @demerphq

On 20 June 2016 at 20​:14, Lukas Mai <plokinom@​gmail.com> wrote​:

Am 17.06.2016 um 22​:17 schrieb demerphq​:

On 17 June 2016 at 22​:15, demerphq <demerphq@​gmail.com> wrote​:

If you mean changing the signature of scalar(%hash). IMO yes, we
should, and if there is support for doing so the

then yes I am interested in implementing this for 5.25.x.

I'd like that.

I mostly have a patch ready for this.

My plan is to make it return the count of keys in the hash.

Yves

--
perl -Mre=debug -e "/just|another|perl|hacker/"

@p5pRT
Copy link
Author

p5pRT commented Jun 21, 2016

From @xsawyerx

On 06/20/2016 08​:14 PM, Lukas Mai wrote​:

Am 17.06.2016 um 22​:17 schrieb demerphq​:

On 17 June 2016 at 22​:15, demerphq <demerphq@​gmail.com> wrote​:

If you mean changing the signature of scalar(%hash). IMO yes, we
should, and if there is support for doing so the

then yes I am interested in implementing this for 5.25.x.

I'd like that.

If we're voting, I'll throw a +1 on that too. :)

@p5pRT
Copy link
Author

p5pRT commented Jun 21, 2016

From @demerphq

On 21 June 2016 at 16​:44, Sawyer X <xsawyerx@​gmail.com> wrote​:

On 06/20/2016 08​:14 PM, Lukas Mai wrote​:

Am 17.06.2016 um 22​:17 schrieb demerphq​:

On 17 June 2016 at 22​:15, demerphq <demerphq@​gmail.com> wrote​:

If you mean changing the signature of scalar(%hash). IMO yes, we
should, and if there is support for doing so the

then yes I am interested in implementing this for 5.25.x.

I'd like that.

If we're voting, I'll throw a +1 on that too. :)

I just pushed a branch that contains code to do this as​: smoke-me/no_xhv_fill

The patch relevant to this thread is

commit 28ecb97
Author​: Yves Orton <demerphq@​gmail.com>
Date​: Mon Jun 20 22​:51​:38 2016 +0200

  Change scalar(%hash) to be the same as 0+keys(%hash)

I am going to start a new thread to discuss this.

Yves

--
perl -Mre=debug -e "/just|another|perl|hacker/"

@p5pRT
Copy link
Author

p5pRT commented Dec 27, 2016

From @jkeenan

On Tue, 21 Jun 2016 18​:51​:42 GMT, demerphq wrote​:

On 21 June 2016 at 16​:44, Sawyer X <xsawyerx@​gmail.com> wrote​:

On 06/20/2016 08​:14 PM, Lukas Mai wrote​:

Am 17.06.2016 um 22​:17 schrieb demerphq​:

On 17 June 2016 at 22​:15, demerphq <demerphq@​gmail.com> wrote​:

If you mean changing the signature of scalar(%hash). IMO yes, we
should, and if there is support for doing so the

then yes I am interested in implementing this for 5.25.x.

I'd like that.

If we're voting, I'll throw a +1 on that too. :)

I just pushed a branch that contains code to do this as​: smoke-
me/no_xhv_fill

The patch relevant to this thread is

commit 28ecb97
Author​: Yves Orton <demerphq@​gmail.com>
Date​: Mon Jun 20 22​:51​:38 2016 +0200

Change scalar(%hash) to be the same as 0+keys(%hash)

I am going to start a new thread to discuss this.

Yves

This ticket was originally filed back in 2012 and was a complaint about performance. Have we addressed the original poster's concerns? Is the ticket closeable?

Thank you very much.

--
James E Keenan (jkeenan@​cpan.org)

@p5pRT
Copy link
Author

p5pRT commented Mar 27, 2017

From @iabyn

On Mon, Dec 26, 2016 at 05​:24​:14PM -0800, James E Keenan via RT wrote​:

On Tue, 21 Jun 2016 18​:51​:42 GMT, demerphq wrote​:

On 21 June 2016 at 16​:44, Sawyer X <xsawyerx@​gmail.com> wrote​:

On 06/20/2016 08​:14 PM, Lukas Mai wrote​:

Am 17.06.2016 um 22​:17 schrieb demerphq​:

On 17 June 2016 at 22​:15, demerphq <demerphq@​gmail.com> wrote​:

If you mean changing the signature of scalar(%hash). IMO yes, we
should, and if there is support for doing so the

then yes I am interested in implementing this for 5.25.x.

I'd like that.

If we're voting, I'll throw a +1 on that too. :)

I just pushed a branch that contains code to do this as​: smoke-
me/no_xhv_fill

The patch relevant to this thread is

commit 28ecb97
Author​: Yves Orton <demerphq@​gmail.com>
Date​: Mon Jun 20 22​:51​:38 2016 +0200

Change scalar(%hash) to be the same as 0+keys(%hash)

I am going to start a new thread to discuss this.

Yves

This ticket was originally filed back in 2012 and was a complaint about
performance. Have we addressed the original poster's concerns? Is the
ticket closeable?

As far as I'm aware, yes. FC fixed up some of the boolean context stuff,

then Yves added​:

  8bf4c40 Change scalar(%hash) to be the same as 0+keys(%hash)

then I reworked the boolean context stuff to be more general.

So in 5.25.10, the OP's benchmark code is now fast for all cases (I had to
disable the 'keys $hash' benchmark case, as that's no longer supported)

I'll close this ticket now.

--
Dave's first rule of Opera​:
If something needs saying, say it​: don't warble it.

@p5pRT p5pRT closed this as completed Mar 27, 2017
@p5pRT
Copy link
Author

p5pRT commented Mar 27, 2017

@iabyn - Status changed from 'open' to 'resolved'

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

1 participant