
[PATCH] Increase hash collision ratio and reduce hsplit calls #15711

Open
p5pRT opened this issue Nov 13, 2016 · 8 comments

Comments


p5pRT commented Nov 13, 2016

Migrated from rt.perl.org#130084 (status was 'open')

Searchable as RT130084$


p5pRT commented Nov 13, 2016

From @atoomic

Created by @atoomic

Increase hash collision ratio and reduce hsplit calls.
This is a (micro) memory optimization.

Previously the hash array was resized each time the hash contained
the same number of elements as its size (a load factor of 1).

With this commit, the split threshold is raised to 2*SIZE, which
has two advantages: the hash is resized less often, and the bucket
array stays smaller.

As elements now collide a little more often, hash lookups become
slightly slower.

Note that when resizing the hash we still grow it to 2*Number_of_elements
(which is 4*Size_of_Hash).
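The policy change can be sketched as a pair of predicates (a simplified model of the DO_HSPLIT/SHOULD_HSPLIT/HSPLIT macros in hv.c; the helper names are made up here, and plain bucket counts are used where the real code uses xhv_max, the array size minus one):

```c
#include <assert.h>
#include <stddef.h>

/* blead: split as soon as there are more keys than buckets (load factor 1) */
static int do_hsplit_old(size_t keys, size_t buckets) {
    return keys > buckets;
}

/* patched SHOULD_HSPLIT: tolerate up to two keys per bucket before splitting */
static int should_hsplit_new(size_t keys, size_t buckets) {
    return keys > 2 * buckets;
}

/* patched HSPLIT: when we do split, grow to 4x the old size,
   i.e. twice the number of keys present at the split threshold */
static size_t hsplit_newsize(size_t oldsize) {
    return oldsize * 4;
}
```

So an 8-bucket hash that blead would split at the 9th key now holds 16 keys before splitting, and then jumps straight to 32 buckets.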

Here are some examples showing the memory savings resulting from this
change. The timings are not really the point of these samples.

# -------------------------------
# set of different hash sizes
# -------------------------------

(blead)> time PERL_HASH_SEED=12345 ./perl -e 'my %h; foreach my $l1
(1..10000) { $h{$l1} = {1..$l1} } print qx{grep RSS /proc/$$/status} '
VmRSS: 1480072 kB

real 0m19.072s
user 0m18.012s
sys 0m1.060s

(blead+patch)> time PERL_HASH_SEED=12345 ./perl -e 'my %h; foreach my $l1
(1..10000) { $h{$l1} = {1..$l1} } print qx{grep RSS /proc/$$/status} '
VmRSS: 1409380 kB (-5%)

real 0m16.707s
user 0m15.640s
sys 0m1.068s

# -------------------------------
# large hash grow & read/write access
# -------------------------------

(blead)> time PERL_HASH_SEED=0 ./perl -e 'my %h; $h{$_} = 1 for
1..13107200; $h{$_} = $h{$_ + 1000} for 1..10000; print qx{grep RSS
/proc/$$/status}'
VmRSS: 1687840 kB

real 0m20.777s
user 0m19.548s
sys 0m1.227s

(blead+patch)> time PERL_HASH_SEED=0 ./perl -e 'my %h; $h{$_} = 1 for
1..13107200; $h{$_} = $h{$_ + 1000} for 1..10000; print qx{grep RSS
/proc/$$/status}'
VmRSS: 1556784 kB

real 0m20.468s
user 0m19.412s
sys 0m1.053s

# -------------------------------
# 1_000_000 hashes with 2 keys
# -------------------------------

(blead)> time PERL_HASH_SEED=12345 ./perl -e 'my %h; foreach my $l1
(1..1_000_000) { $h{$l1} = {1..4} } print qx{grep RSS /proc/$$/status} '
VmRSS: 340300 kB

real 0m2.641s
user 0m2.370s
sys 0m0.271s

(blead+patch)> time PERL_HASH_SEED=12345 ./perl -e 'my %h; foreach my $l1
(1..1_000_000) { $h{$l1} = {1..4} } print qx{grep RSS /proc/$$/status} '
VmRSS: 332072 kB

real 0m2.647s
user 0m2.405s
sys 0m0.242s

# -------------------------------
# 1_000_000 hashes with 32 keys
# -------------------------------

(blead)> time PERL_HASH_SEED=12345 ./perl -e 'my %h; foreach my $l1
(1..1_000_000) { $h{$l1} = {1..64} } print qx{grep RSS /proc/$$/status} '
VmRSS: 2202668 kB

real 0m8.591s
user 0m7.062s
sys 0m1.528s

(blead+patch)> time PERL_HASH_SEED=12345 ./perl -e 'my %h; foreach my $l1
(1..1_000_000) { $h{$l1} = {1..64} } print qx{grep RSS /proc/$$/status} '
VmRSS: 1942756 kB

real 0m7.755s
user 0m6.379s
sys 0m1.377s

# -------------------------------
# 10_000 hashes with 512 keys
# -------------------------------

(blead)> time PERL_HASH_SEED=12345 ./perl -e 'my %h; foreach my $l1
(1..10000) { $h{$l1} = {1..1024} } print qx{grep RSS /proc/$$/status} '
VmRSS: 326848 kB

real 0m1.267s
user 0m1.006s
sys 0m0.260s

(blead+patch)> time PERL_HASH_SEED=12345 ./perl -e 'my %h; foreach my $l1
(1..10000) { $h{$l1} = {1..1024} } print qx{grep RSS /proc/$$/status} '
VmRSS: 286512 kB

real 0m1.149s
user 0m0.919s
sys 0m0.229s

Perl Info

Flags:
    category=core
    severity=low
    Type=Patch
    PatchStatus=HasPatch

Site configuration information for perl 5.25.7:

Configured by root at Sun Nov 13 03:43:35 MST 2016.

Summary of my perl5 (revision 5 version 25 subversion 7) configuration:
  Commit id: b37d3ac68c2a38a42fd9d7fabe9cf5b8c74d4a83
  Platform:
    osname=linux
    osvers=3.10.0-327.36.3.el7.x86_64
    archname=x86_64-linux
    uname='linux nico-c7.dev.cpanel.net 3.10.0-327.36.3.el7.x86_64 #1 smp
mon oct 24 16:09:20 utc 2016 x86_64 x86_64 x86_64 gnulinux '
    config_args='-des -Dusedevel'
    hint=recommended
    useposix=true
    d_sigaction=define
    useithreads=undef
    usemultiplicity=undef
    use64bitint=define
    use64bitall=define
    uselongdouble=undef
    usemymalloc=n
    bincompat5005=undef
  Compiler:
    cc='cc'
    ccflags ='-fwrapv -fno-strict-aliasing -pipe -fstack-protector-strong
-I/usr/local/include -D_LARGEFILE_SOURCE -D_FILE_OFFSET_BITS=64
-D_FORTIFY_SOURCE=2'
    optimize='-O2'
    cppflags='-fwrapv -fno-strict-aliasing -pipe -fstack-protector-strong
-I/usr/local/include'
    ccversion=''
    gccversion='4.8.5 20150623 (Red Hat 4.8.5-4)'
    gccosandvers=''
    intsize=4
    longsize=8
    ptrsize=8
    doublesize=8
    byteorder=12345678
    doublekind=3
    d_longlong=define
    longlongsize=8
    d_longdbl=define
    longdblsize=16
    longdblkind=3
    ivtype='long'
    ivsize=8
    nvtype='double'
    nvsize=8
    Off_t='off_t'
    lseeksize=8
    alignbytes=8
    prototype=define
  Linker and Libraries:
    ld='cc'
    ldflags =' -fstack-protector-strong -L/usr/local/lib'
    libpth=/usr/local/lib /usr/lib /lib/../lib64 /usr/lib/../lib64 /lib
/lib64 /usr/lib64 /usr/local/lib64
    libs=-lpthread -lnsl -lgdbm -ldb -ldl -lm -lcrypt -lutil -lc
-lgdbm_compat
    perllibs=-lpthread -lnsl -ldl -lm -lcrypt -lutil -lc
    libc=libc-2.17.so
    so=so
    useshrplib=false
    libperl=libperl.a
    gnulibc_version='2.17'
  Dynamic Linking:
    dlsrc=dl_dlopen.xs
    dlext=so
    d_dlsymun=undef
    ccdlflags='-Wl,-E'
    cccdlflags='-fPIC'
    lddlflags='-shared -O2 -L/usr/local/lib -fstack-protector-strong'



@INC for perl 5.25.7:
    lib
    /root/.dotfiles/perl-must-have/lib
    /root/perl5/lib/perl5/
    /usr/local/lib/perl5/site_perl/5.25.7/x86_64-linux
    /usr/local/lib/perl5/site_perl/5.25.7
    /usr/local/lib/perl5/5.25.7/x86_64-linux
    /usr/local/lib/perl5/5.25.7


Environment for perl 5.25.7:
    HOME=/root
    LANG=en_US.UTF-8
    LANGUAGE (unset)
    LD_LIBRARY_PATH (unset)
    LOGDIR (unset)

PATH=/usr/local/cpanel/3rdparty/perl/524/bin:/usr/local/cpanel/3rdparty/perl/522/bin:/usr/local/cpanel/3rdparty/perl/514/bin:/usr/local/cpanel/3rdparty/bin:/root/bin/:/opt/local/bin:/opt/local/sbin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/opt/cpanel/composer/bin:/root/.dotfiles/bin:/root/perl5/bin:/root/.rvm/bin:/root/bin
    PERL5DB=use Devel::NYTProf
    PERL5LIB=/root/.dotfiles/perl-must-have/lib::/root/perl5/lib/perl5/
    PERL_BADLANG (unset)
    PERL_CPANM_OPT=--quiet
    SHELL=/bin/bash


p5pRT commented Nov 13, 2016

From @atoomic

Better with the patch :-) attached to this message



p5pRT commented Nov 13, 2016

From @atoomic

0001-Increase-hash-colision-ratio-and-reduce-hsplit-calls.patch
From ad816612cb622fe7d8a07dff73171f38ce8494e6 Mon Sep 17 00:00:00 2001
From: Nicolas R <atoomic@cpan.org>
Date: Sat, 12 Nov 2016 17:02:09 +0100
Subject: [PATCH] Increase hash colision ratio and reduce hsplit calls

Previously the hash array was resized each time the hash contains
the same number of elements of its size.

With this commit, hash collision is increased to 2*SIZE, which
has two advantages: resize the hash less often, reduce the size
of the array for hashes.

As elements are now going to collide a little more, this is
going to make hash lookup a bit slower.

Note that when resizing the hash we are still using 2*Number_of_elements
(which is 4*Size_of_Hash).

diff --git a/ext/Hash-Util/t/builtin.t b/ext/Hash-Util/t/builtin.t
index 3654c9b..5ef26da 100644
--- a/ext/Hash-Util/t/builtin.t
+++ b/ext/Hash-Util/t/builtin.t
@@ -6,7 +6,7 @@ use Test::More;
 my @Exported_Funcs;
 BEGIN {
     @Exported_Funcs = qw( bucket_ratio num_buckets used_buckets );
-    plan tests => 13 + @Exported_Funcs;
+    plan tests => 19 + @Exported_Funcs;
     use_ok 'Hash::Util', @Exported_Funcs;
 }
 foreach my $func (@Exported_Funcs) {
@@ -31,8 +31,17 @@ is(num_buckets(%hash), 8, "hash should have eight buckets");
 cmp_ok(used_buckets(%hash), "<", 8, "hash should have one used buckets");
 
 $hash{8}= 8;
-like(bucket_ratio(%hash), qr!/16!, "hash has expected number of buckets in bucket_ratio");
-is(num_buckets(%hash), 16, "hash should have sixteen buckets");
+like(bucket_ratio(%hash), qr!/8!, "hash has expected number of buckets in bucket_ratio");
+is(num_buckets(%hash), 8, "hash should have eight buckets");
+cmp_ok(used_buckets(%hash), "<=", 8, "hash should have at most 8 used buckets");
+
+$hash{$_}= $_ for 9..14;
+like(bucket_ratio(%hash), qr!/8!, "hash has expected number of buckets in bucket_ratio");
+is(num_buckets(%hash), 8, "hash should have eight buckets");
 cmp_ok(used_buckets(%hash), "<=", 8, "hash should have at most 8 used buckets");
 
+$hash{15}= 15;
+like(bucket_ratio(%hash), qr!/32!, "hash has expected number of buckets in bucket_ratio");
+is(num_buckets(%hash), 32, "hash should have 32 buckets");
+cmp_ok(used_buckets(%hash), "<=",16, "hash should have at most 8 used buckets");
 
diff --git a/hv.c b/hv.c
index 7659a6d..8ba70a2 100644
--- a/hv.c
+++ b/hv.c
@@ -34,7 +34,8 @@ holds the key and hash value.
 #define PERL_HASH_INTERNAL_ACCESS
 #include "perl.h"
 
-#define DO_HSPLIT(xhv) ((xhv)->xhv_keys > (xhv)->xhv_max) /* HvTOTALKEYS(hv) > HvMAX(hv) */
+#define SHOULD_HSPLIT(xhv) ((xhv)->xhv_keys > (2*(xhv)->xhv_max)) /* HvTOTALKEYS(hv) > 2 * HvMAX(hv) previously known as HSPLIT */
+#define HSPLIT(xhv, oldsize) hsplit(xhv, oldsize, oldsize * 4) /* 2 * the split limit above (=2*HvMAX) */
 #define HV_FILL_THRESHOLD 31
 
 static const char S_strtab_error[]
@@ -877,7 +878,7 @@ Perl_hv_common(pTHX_ HV *hv, SV *keysv, const char *key, STRLEN klen,
 	HvHASKFLAGS_on(hv);
 
     xhv->xhv_keys++; /* HvTOTALKEYS(hv)++ */
-    if ( DO_HSPLIT(xhv) ) {
+    if ( SHOULD_HSPLIT(xhv) ) {
         const STRLEN oldsize = xhv->xhv_max + 1;
         const U32 items = (U32)HvPLACEHOLDERS_get(hv);
 
@@ -894,10 +895,10 @@ Perl_hv_common(pTHX_ HV *hv, SV *keysv, const char *key, STRLEN klen,
                If we're lucky, then we may clear sufficient placeholders to
                avoid needing to split the hash at all.  */
             clear_placeholders(hv, items);
-            if (DO_HSPLIT(xhv))
-                hsplit(hv, oldsize, oldsize * 2);
+            if (SHOULD_HSPLIT(xhv))
+                HSPLIT(hv, oldsize);
         } else
-            hsplit(hv, oldsize, oldsize * 2);
+            HSPLIT(hv, oldsize);
     }
 
     if (return_svp) {
@@ -3047,9 +3048,9 @@ S_share_hek_flags(pTHX_ const char *str, I32 len, U32 hash, int flags)
 
 	xhv->xhv_keys++; /* HvTOTALKEYS(hv)++ */
 	if (!next) {			/* initial entry? */
-	} else if ( DO_HSPLIT(xhv) ) {
+	} else if ( SHOULD_HSPLIT(xhv) ) {
             const STRLEN oldsize = xhv->xhv_max + 1;
-            hsplit(PL_strtab, oldsize, oldsize * 2);
+            HSPLIT(PL_strtab, oldsize);
 	}
     }
 
-- 
2.10.2


p5pRT commented Nov 13, 2016

The RT System itself - Status changed from 'new' to 'open'


p5pRT commented Nov 13, 2016

From @atoomic

Note that increasing the collision ratio reduces the amount of memory
used for the hash array, but also slightly decreases performance
(access, reallocation, ...).

As we now grow the arrays (via hsplit) half as often as before, we can
safely double the growth factor used when reallocating the hash array
during growth, without introducing memory bloat.

This is why the 'x4' is used here: to preserve the same
dynamics/performance as before.

Main idea: reduce memory at a very low performance cost.
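The "same dynamics" claim can be checked with a toy growth simulation (an illustrative sketch, not the real hsplit: it ignores placeholders, the shared string table, and the detail that xhv_max is the array size minus one). For the same number of inserts, the patched policy splits less often and can finish with half the buckets:

```c
#include <assert.h>
#include <stddef.h>

typedef struct { size_t buckets; size_t splits; } growth;

/* Insert n keys one by one; split when keys > threshold_mult * buckets,
   multiplying the bucket count by grow_mult on each split. */
static growth grow(size_t n, size_t threshold_mult, size_t grow_mult) {
    growth g = { 8, 0 };  /* perl hashes start with 8 buckets */
    for (size_t keys = 1; keys <= n; keys++) {
        if (keys > threshold_mult * g.buckets) {
            g.buckets *= grow_mult;
            g.splits++;
        }
    }
    return g;
}
```

For 1024 inserts, the blead-like policy grow(1024, 1, 2) performs 7 splits and ends at 1024 buckets, while the patched-like policy grow(1024, 2, 4) performs 3 splits and ends at 512 buckets.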

Here are some quick & dirty tests with multiple combinations of SHOULD_HSPLIT / HSPLIT:

# -----------------------------------
# set of different hashes
# -----------------------------------
time PERL_HASH_SEED=12345 ./perl -e 'my %h; foreach my $l1 (1..10000) { $h{$l1} = {1..$l1} } print qx{grep RSS /proc/$$/status} '

(blead)# SHOULD_HSPLIT: HvTOTALKEYS(hv) > HvMAX(hv) ; HSPLIT: 2 x oldsize
VmRSS: 1480072 kB

real 0m19.072s
user 0m18.012s
sys 0m1.060s

(blead+patch)# SHOULD_HSPLIT: HvTOTALKEYS(hv) > 2 x HvMAX(hv) ; HSPLIT: 2 x 2 x oldsize
VmRSS: 1409380 kB (-5%)

real 0m16.707s
user 0m15.640s
sys 0m1.068s

# SHOULD_HSPLIT: HvTOTALKEYS(hv) > 64 x HvMAX(hv) ; HSPLIT: 2 x oldsize
# collisions pushed to the extreme, while preserving the hsplit/buckets logic
# note the memory reduction, but also the noticeable run time increase
VmRSS: 1192512 kB

real 0m45.874s
user 0m44.932s
sys 0m0.937s

# SHOULD_HSPLIT: HvTOTALKEYS(hv) > 64 x HvMAX(hv) ; HSPLIT: 16 x 2 x oldsize
# same memory improvement, run time better than the previous run, but still far from the reference (blead)
VmRSS: 1224808 kB

real 0m27.996s
user 0m27.028s
sys 0m0.964s

# SHOULD_HSPLIT: HvTOTALKEYS(hv) > 8 x HvMAX(hv) ; HSPLIT: 2 x oldsize
# seems ok
VmRSS: 1224560 kB

real 0m18.474s
user 0m17.554s
sys 0m0.918s

# SHOULD_HSPLIT: HvTOTALKEYS(hv) > 4 x HvMAX(hv) ; HSPLIT: 2 x oldsize
# seems ok
VmRSS: 1261132 kB

real 0m17.377s
user 0m16.469s
sys 0m0.906s

# SHOULD_HSPLIT: HvTOTALKEYS(hv) > 4 x HvMAX(hv) ; HSPLIT: 2 x 2 x oldsize
# seems ok
VmRSS: 1296116 kB

real 0m16.645s
user 0m15.733s
sys 0m0.911s

# SHOULD_HSPLIT: HvTOTALKEYS(hv) > 8 x HvMAX(hv) ; HSPLIT: 4 x 2 x oldsize
# seems ok
VmRSS: 1275800 kB

real 0m17.514s
user 0m16.621s
sys 0m0.892s

# SHOULD_HSPLIT: HvTOTALKEYS(hv) > 8 x HvMAX(hv) ; HSPLIT: 2 x 2 x oldsize
# seems ok
VmRSS: 1243628 kB

real 0m18.536s
user 0m17.547s
sys 0m0.990s

# SHOULD_HSPLIT: HvTOTALKEYS(hv) > 16 x HvMAX(hv) ; HSPLIT: 4 x 2 x oldsize
# seems ok
VmRSS: 1221084 kB

real 0m21.573s
user 0m20.601s
sys 0m0.972s

# -----------------------------------
# large hash grow & read/write access
# -----------------------------------

time PERL_HASH_SEED=0 ./perl -e 'my %h; $h{$_} = 1 for 1..13107200; $h{$_} = $h{$_ + 1000} for 1..10000; print qx{grep RSS /proc/$$/status}'

(blead)# SHOULD_HSPLIT: HvTOTALKEYS(hv) > HvMAX(hv) ; HSPLIT: 2 x oldsize
VmRSS: 1687840 kB

real 0m20.777s
user 0m19.548s
sys 0m1.227s

(blead+patch)# SHOULD_HSPLIT: HvTOTALKEYS(hv) > 2 x HvMAX(hv) ; HSPLIT: 2 x 2 x oldsize
VmRSS: 1556784 kB

real 0m20.468s
user 0m19.412s
sys 0m1.053s

# SHOULD_HSPLIT: HvTOTALKEYS(hv) > 8 x HvMAX(hv) ; HSPLIT: 2 x oldsize
# not ok: too slow
VmRSS: 1458572 kB

real 0m29.747s
user 0m28.704s
sys 0m1.041s

# SHOULD_HSPLIT: HvTOTALKEYS(hv) > 4 x HvMAX(hv) ; HSPLIT: 2 x oldsize
# not ok: too slow
VmRSS: 1687976 kB

real 0m33.095s
user 0m31.984s
sys 0m1.107s

# SHOULD_HSPLIT: HvTOTALKEYS(hv) > 4 x HvMAX(hv) ; HSPLIT: 2 x 2 x oldsize
VmRSS: 1556784 kB

real 0m24.261s
user 0m23.227s
sys 0m1.033s

# SHOULD_HSPLIT: HvTOTALKEYS(hv) > 8 x HvMAX(hv) ; HSPLIT: 4 x 2 x oldsize
VmRSS: 1458608 kB

real 0m24.841s
user 0m23.855s
sys 0m0.979s

# SHOULD_HSPLIT: HvTOTALKEYS(hv) > 8 x HvMAX(hv) ; HSPLIT: 2 x 2 x oldsize
VmRSS: 1458552 kB

real 0m25.109s
user 0m24.133s

# SHOULD_HSPLIT: HvTOTALKEYS(hv) > 16 x HvMAX(hv) ; HSPLIT: 4 x 2 x oldsize
# not ok: too slow
VmRSS: 1458544 kB

real 0m29.900s
user 0m28.867s
sys 0m1.032s



p5pRT commented Nov 13, 2016

From @demerphq

On 13 November 2016 at 17:29, Atoomic via RT <perlbug-followup@perl.org> wrote:

Note that increasing the collision ratio reduces the amount of memory used for the hash array
but also slightly decreases performance (access, reallocation, ...)

Yeah, there are a number of trade-offs here.

I think it's helpful to be able to visualize what is going on.

We currently use a maximum load factor of 1; put otherwise, we expect
there to be on average 1 key per bucket or less, and we double the
size of the array whenever we exceed this ratio.

This is what this looks like for 1023 keys going into 1024 buckets:

$ ./perl -Ilib -MHash::Util=bucket_stats_formatted -e'my %hash;
$hash{$_}++ for 1 .. ((1<<10)-1); print
bucket_stats_formatted(\%hash);'
Keys: 1023 Buckets: 640/1024 Quality-Score: 1.02 (Good)
Utilized Buckets: 62.50% Optimal: 99.90% Keys In Collision: 37.44%
Chain Length - mean: 1.60 stddev: 0.85
Buckets 1024
[0000000000000000000000001111111111111111111111122222222222233334*]
Len 0 37.50% 384 [########################]
Len 1 36.04% 369 [#######################]
Len 2 18.55% 190 [############]
Len 3 5.66% 58 [####]
Len 4 1.66% 17 [#]
Len 5 0.39% 4 []
Len 6 0.20% 2 []
Keys 1023
[111111111111111111111111111111111111111122222222222222222333334*]
Pos 1 62.56% 640 [########################################]
Pos 2 26.49% 271 [#################]
Pos 3 7.92% 81 [#####]
Pos 4 2.25% 23 [#]
Pos 5 0.59% 6 []
Pos 6 0.20% 2 []

As we can see, about 37% of our buckets are unused, 36% of the buckets
hold exactly 1 item, the longest chain length is 6, and for a
read-hit/update, 62% of our data will be found without traversing the
HE chain at all, and 88% with at most 1 extra HE traverse.

Then we add a key, triggering a split:

$ ./perl -Ilib -MHash::Util=bucket_stats_formatted -e'my %hash;
$hash{$_}++ for 1 .. ((1<<10)-0); print
bucket_stats_formatted(\%hash);'
Keys: 1024 Buckets: 803/2048 Quality-Score: 1.01 (Good)
Utilized Buckets: 39.21% Optimal: 50.00% Keys In Collision: 21.58%
Chain Length - mean: 1.28 stddev: 0.56
Buckets 2048
[0000000000000000000000000000000000000001111111111111111111222223*]
Len 0 60.79% 1245 [#######################################]
Len 1 30.27% 620 [###################]
Len 2 7.37% 151 [#####]
Len 3 1.32% 27 [#]
Len 4 0.20% 4 []
Len 5 0.05% 1 []
Keys 1024
[111111111111111111111111111111111111111111111111112222222222233*]
Pos 1 78.42% 803 [##################################################]
Pos 2 17.87% 183 [###########]
Pos 3 3.12% 32 [##]
Pos 4 0.49% 5 []
Pos 5 0.10% 1 []

Now we end up with 60% of the hash buckets unused, and only modestly
improve the distribution of keys. For instance, where previously 88% of
read-hits would occur with at most 1 HE traverse, we now have 95%: a
7% improvement for a large cost in unused space.

Compare with a maximum load factor of 4:

At 1023 keys, right before a split:

$ ./perl -Ilib -MHash::Util=bucket_stats_formatted -e'my %hash;
$hash{$_}++ for 1 .. ((1<<10)-1); print
bucket_stats_formatted(\%hash);'
Keys: 1023 Buckets: 252/256 Quality-Score: 1.03 (Good)
Utilized Buckets: 98.44% Optimal: 399.61% Keys In Collision: 75.37%
Chain Length - mean: 4.06 stddev: 2.10
Buckets 256
[011111122222222233333333333334444444444444555555566666666777789*]
Len 0 1.56% 4 [#]
Len 1 8.98% 23 [######]
Len 2 13.67% 35 [#########]
Len 3 20.70% 53 [#############]
Len 4 20.70% 53 [#############]
Len 5 11.33% 29 [#######]
Len 6 11.72% 30 [########]
Len 7 6.64% 17 [####]
Len 8 1.95% 5 [#]
Len 9 1.17% 3 [#]
Len 10 0.00% 0 []
Len 11 0.39% 1 []
Len 12 0.78% 2 []
Len 13 0.39% 1 []
Keys 1023
[1111111111111111222222222222223333333333334444444445555556666778*]
Pos 1 24.63% 252 [################]
Pos 2 22.39% 229 [##############]
Pos 3 18.96% 194 [############]
Pos 4 13.78% 141 [#########]
Pos 5 8.60% 88 [######]
Pos 6 5.77% 59 [####]
Pos 7 2.83% 29 [##]
Pos 8 1.17% 12 [#]
Pos 9 0.68% 7 []
Pos 10 0.39% 4 []
Pos 11 0.39% 4 []
Pos 12 0.29% 3 []
Pos 13 0.10% 1 []

Almost no wasted space, and a pretty good distribution of keys. Just
under 50% of the keys are found with at most 1 HE transition, etc.
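
The memory side of the tradeoff is easy to quantify for the bucket array
itself (assuming an 8-byte pointer per bucket on a 64-bit build; the HE
entries cost the same regardless of load factor):

```python
# Bucket-array size for a given key count at different maximum load
# factors, assuming an 8-byte pointer per bucket (64-bit build) and
# doubling growth.  Only the array size differs between the settings.

PTR_SIZE = 8

def bucket_array_bytes(keys, max_load):
    buckets = 1
    while keys / buckets > max_load:  # double until load is within bound
        buckets *= 2
    return buckets, buckets * PTR_SIZE

for max_load in (1, 2, 4, 8):
    buckets, size = bucket_array_bytes(1023, max_load)
    print(f"max load {max_load}: {buckets:4d} buckets, {size} bytes")
```

The bucket counts come out to 1024, 512, 256 and 128, matching the
examples above for load factors 1, 4 and 8.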

And after a key insert, hsplit​:

$ ./perl -Ilib -MHash​::Util=bucket_stats_formatted -e'my %hash;
$hash{$_}++ for 1 .. ((1<<10)-0); print
bucket_stats_formatted(\%hash);'
Keys​: 1024 Buckets​: 449/512 Quality-Score​: 1.01 (Good)
Utilized Buckets​: 87.70% Optimal​: 200.00% Keys In Collision​: 56.15%
Chain Length - mean​: 2.28 stddev​: 1.30
Buckets 512
[000000001111111111111111111222222222222222233333333333444444555*]
Len 0 12.30% 63 [########]
Len 1 30.27% 155 [###################]
Len 2 25.59% 131 [################]
Len 3 16.99% 87 [###########]
Len 4 8.98% 46 [######]
Len 5 4.49% 23 [###]
Len 6 0.59% 3 []
Len 7 0.59% 3 []
Len 8 0.20% 1 []
Keys 1024
[111111111111111111111111111122222222222222222233333333334444455*]
Pos 1 43.85% 449 [############################]
Pos 2 28.71% 294 [##################]
Pos 3 15.92% 163 [##########]
Pos 4 7.42% 76 [#####]
Pos 5 2.93% 30 [##]
Pos 6 0.68% 7 []
Pos 7 0.39% 4 []
Pos 8 0.10% 1 []

Quite a big improvement in distribution and chain length, and still
only modest wastage of the hash array. 71% of data within 1 hop, and a
max chain length of 8.

Now compare to a load factor of 8​:

$ ./perl -Ilib -MHash​::Util=bucket_stats_formatted -e'my %hash;
$hash{$_}++ for 1 .. ((1<<10)-1); print
bucket_stats_formatted(\%hash);'
Keys​: 1023 Buckets​: 128/128 Quality-Score​: 0.99 (Good)
Utilized Buckets​: 100.00% Optimal​: 799.22% Keys In Collision​: 87.49%
Chain Length - mean​: 7.99 stddev​: 2.72
Buckets 128 [*]
Len 0 0.00% 0 []
Len 1 0.78% 1 []
Len 2 0.78% 1 []
Len 3 2.34% 3 [##]
Len 4 6.25% 8 [####]
Len 5 8.59% 11 [######]
Len 6 13.28% 17 [########]
Len 7 10.94% 14 [#######]
Len 8 13.28% 17 [########]
Len 9 16.41% 21 [##########]
Len 10 7.81% 10 [#####]
Len 11 8.59% 11 [######]
Len 12 5.47% 7 [####]
Len 13 3.12% 4 [##]
Len 14 2.34% 3 [##]
Keys 1023
[1111111122222222333333334444444455555556666666777778888899991010111112*]
Pos 1 12.51% 128 [########]
Pos 2 12.41% 127 [########]
Pos 3 12.32% 126 [########]
Pos 4 12.02% 123 [########]
Pos 5 11.24% 115 [#######]
Pos 6 10.17% 104 [#######]
Pos 7 8.50% 87 [#####]
Pos 8 7.14% 73 [#####]
Pos 9 5.47% 56 [####]
Pos 10 3.42% 35 [##]
Pos 11 2.44% 25 [##]
Pos 12 1.37% 14 [#]
Pos 13 0.68% 7 []
Pos 14 0.29% 3 []

As we can see, the hash is now completely full, and relatively little of
the data, 25% or so, is reachable within 1 HE transition.

And after an hsplit​:

$ ./perl -Ilib -MHash​::Util=bucket_stats_formatted -e'my %hash;
$hash{$_}++ for 1 .. ((1<<10)-0); print
bucket_stats_formatted(\%hash);'
Keys​: 1024 Buckets​: 251/256 Quality-Score​: 1.01 (Good)
Utilized Buckets​: 98.05% Optimal​: 400.00% Keys In Collision​: 75.49%
Chain Length - mean​: 4.08 stddev​: 2.02
Buckets 256
[011111222222222233333333333334444444444455555555566666667777889*]
Len 0 1.95% 5 [#]
Len 1 7.42% 19 [#####]
Len 2 15.62% 40 [##########]
Len 3 20.31% 52 [#############]
Len 4 17.19% 44 [###########]
Len 5 14.45% 37 [#########]
Len 6 10.94% 28 [#######]
Len 7 7.03% 18 [####]
Len 8 3.12% 8 [##]
Len 9 1.56% 4 [#]
Len 10 0.00% 0 []
Len 11 0.00% 0 []
Len 12 0.00% 0 []
Len 13 0.00% 0 []
Len 14 0.39% 1 []
Keys 1024
[1111111111111111222222222222223333333333334444444445555556666778*]
Pos 1 24.51% 251 [################]
Pos 2 22.66% 232 [##############]
Pos 3 18.75% 192 [############]
Pos 4 13.67% 140 [#########]
Pos 5 9.38% 96 [######]
Pos 6 5.76% 59 [####]
Pos 7 3.03% 31 [##]
Pos 8 1.27% 13 [#]
Pos 9 0.49% 5 []
Pos 10 0.10% 1 []
Pos 11 0.10% 1 []
Pos 12 0.10% 1 []
Pos 13 0.10% 1 []
Pos 14 0.10% 1 []

We double the number of items we can reach with 1 HE transition,
accounting for 50% or so, but the hash is still pretty full.

IMO what all this shows is that we probably want a max load factor of about 4.

I would not change the rate at which we grow the hash, at least not
with the current code, and maybe not at all. The reason is that I think
we don't want to go below a certain load factor at all. When we double
the hash array, we essentially halve our load factor, and if we agree
that we get the right tradeoffs at a load between 2 and 4, then we
should not grow faster than doubling, or we would drop below a load
factor of 2 when we did so.
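
The arithmetic behind this can be checked with a toy simulation (not
perl's code): with a split threshold of 4 and growth factor g, the load
factor right after each split settles just above 4/g.

```python
# Simulate "split when load exceeds max_load, grow the array by `grow`",
# recording the load factor observed immediately after each split.

def split_loads(n_keys, max_load, grow, buckets=8):
    loads = []
    for k in range(1, n_keys + 1):
        if k / buckets > max_load:
            buckets *= grow
            loads.append(k / buckets)
    return loads

print(split_loads(10000, 4, 2)[:3])  # each split lands just above 2.0
print(split_loads(10000, 4, 4)[:3])  # each split lands just above 1.0
```

So doubling keeps the hash inside the 2-4 band, while quadrupling drops it
to a load factor of about 1 after every split.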

Yves

@p5pRT
Author

p5pRT commented Nov 14, 2016

From @atoomic

Thanks for your time & feedback; the graphical visualization is definitely a plus.
Indeed, it's all a matter of tradeoffs and tuning to find the best compromise.
I would also not consider a load factor higher than 4.

My main concern is that if we slow down hashes in order to reduce memory usage, that would be a regression from my point of view.
And from the naive tests I ran earlier, it seems that the load factor has a significant impact on run time.

I was also considering an alternate solution: change the grow factor only for small hashes, at the beginning.
That would avoid wasting memory once the hash starts to get big, and would probably address your concern about wasted memory.

Maybe this should be an option that can be set at compile time? (That would not settle whether we need/want to change the default value.)

Any opinion on this? Of course the definition of "small/big" needs to be defined; I was thinking of something like 1024.
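
A policy along those lines might look like the following sketch; the
1024-bucket threshold and the 4x small-hash growth factor are placeholders
from this discussion, not anything perl actually implements.

```python
# Hypothetical two-tier growth policy: small hashes grow aggressively
# (fewer hsplit calls while a hash is being populated), large hashes
# grow by doubling to bound wasted space.  SMALL_LIMIT = 1024 is the
# placeholder threshold suggested in the discussion.

SMALL_LIMIT = 1024

def grown_size(buckets):
    return buckets * 4 if buckets < SMALL_LIMIT else buckets * 2

sizes = [8]
while sizes[-1] < 1 << 16:
    sizes.append(grown_size(sizes[-1]))
print(sizes)  # [8, 32, 128, 512, 2048, 4096, 8192, 16384, 32768, 65536]
```

A hash growing this way takes four splits to reach 2048 buckets instead of
eight, then reverts to doubling.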

I'm also fine leaving it as it is, but I have the feeling that this default setup can be optimized.

Thanks again
nicolas


@p5pRT
Author

p5pRT commented Nov 14, 2016

From @demerphq

On 14 November 2016 at 01​:58, Atoomic via RT <perlbug-followup@​perl.org> wrote​:

Thanks for your time & feedback; the graphical visualization is definitely a plus.
Indeed, it's all a matter of tradeoffs and tuning to find the best compromise.
I would also not consider a load factor higher than 4.

My main concern is that if we slow down hashes in order to reduce memory usage, that would be a regression from my point of view.
And from the naive tests I ran earlier, it seems that the load factor has a significant impact on run time.

For sure, but some of your test cases were at very very high load
factors, like 16 or 64.

I think a max load factor of 4 is probably fine, but we can and should
benchmark, using Dave's porting/bench.pl for proper analysis.

And we need a test set that is comprehensive. We basically have two
broad cases, with two flavours each (read/write).

read-hit/update
read-miss/insert

So we need a benchmark that builds a largish hash (IMO something like
1024-4096 range), and which then tests updating keys that are known to
be in the hash, and also tests building the hash, and also tests
reading keys known not to be in the hash. We probably also want to use
a constant key size.
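
As a rough analogue of that benchmark matrix (sketched here in Python with
a dict standing in for the hash; a real measurement would go through
perl's benchmark harness), the three cases look like:

```python
import timeit

# Benchmark matrix from the discussion: build (insert), read-hit/update,
# and read-miss, on a largish hash with constant-width keys.

N = 4096
keys = [f"k{i:08d}" for i in range(N)]    # keys known to be present
misses = [f"m{i:08d}" for i in range(N)]  # keys known to be absent

def build():
    d = {}
    for k in keys:
        d[k] = 1

h = {k: 1 for k in keys}

def read_hit_update():
    for k in keys:
        h[k] += 1

def read_miss():
    for k in misses:
        k in h

for fn in (build, read_hit_update, read_miss):
    print(fn.__name__, timeit.timeit(fn, number=100))
```

Each case stresses a different path: build triggers the splits, read-hit
walks existing chains, and read-miss always walks a full chain before
failing.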

I was also considering an alternate solution: change the grow factor only for small hashes, at the beginning.
That would avoid wasting memory once the hash starts to get big, and would probably address your concern about wasted memory.

We need to find the right balance: a sizing that doesn't over-waste
with lots of small hashes (think objects), while also not wasting with a
few large hashes.

Let's discuss this in person.

Maybe this should be an option that could be set at compilation time ? (This would not solve the question if we need/want to change the default value.)

Yes for sure. My version of this patch is already structured along those lines.

Any opinion on this? Of course the definition of "small/big" needs to be defined; I was thinking of something like 1024.

As others have said a big part of the problem with this discussion is
that hashes are used for something like three purposes​:

1. objects, typically small, many repeated keys
2. reference tables, often large, few repeated keys
3. symbol tables, probably in the middle in terms of both key frequency and size

We have to be careful not to structure this such that we win in one
case, such as reference tables, but lose in the other two cases.

I'm also fine leaving it as it is, but I have the feeling that this default setup can be optimized.

I definitely think we can and benchmarks willing, should tweak some of this.

But I am still reluctant to change the grow factor. If I am correct
that the ideal for our purposes is a load factor between 2 and 4, then
growing by anything faster than doubling will leave us at a load factor
smaller than 2 after a split, which then wastes space for little speed
benefit.

cheers,
Yves
