[PATCH] Increase hash collision ratio and reduce hsplit calls #15711
Comments
From @atoomic (issue author):

Increase hash collision ratio and reduce hsplit calls.

Previously the hash array was resized each time the hash contained the same number of elements as its size. With this commit, the split threshold is raised to 2*SIZE, which has two advantages: the hash is resized less often, and the bucket array stays smaller. As elements are now going to collide a little more, this is going to make hash lookup a bit slower. Note that when resizing the hash we are still using 2*Number_of_elements (which is 4*Size_of_Hash).

Here are some examples showing the memory optimization resulting from this change:

# -------------------------------
(blead)>       time PERL_HASH_SEED=12345 ./perl -e 'my %h; foreach my $l1
real 0m19.072s
(blead+patch)> time PERL_HASH_SEED=12345 ./perl -e 'my %h; foreach my $l1
real 0m16.707s
# -------------------------------
(blead)>       time PERL_HASH_SEED=0 ./perl -e 'my %h; $h{$_} = 1 for
real 0m20.777s
(blead+patch)> time PERL_HASH_SEED=0 ./perl -e 'my %h; $h{$_} = 1 for
real 0m20.468s
# -------------------------------
(blead)>       time PERL_HASH_SEED=12345 ./perl -e 'my %h; foreach my $l1
real 0m2.641s
(blead+patch)> time PERL_HASH_SEED=12345 ./perl -e 'my %h; foreach my $l1
real 0m2.647s
# -------------------------------
(blead)>       time PERL_HASH_SEED=12345 ./perl -e 'my %h; foreach my $l1
real 0m8.591s
(blead+patch)> time PERL_HASH_SEED=12345 ./perl -e 'my %h; foreach my $l1
real 0m7.755s
# -------------------------------
(blead)>       time PERL_HASH_SEED=12345 ./perl -e 'my %h; foreach my $l1
real 0m1.267s
(blead+patch)> time PERL_HASH_SEED=12345 ./perl -e 'my %h; foreach my $l1
real 0m1.149s

Perl Info
From @atoomic:

Better with the patch :-), attached to this message.

On Sun, 13 Nov 2016 06:03:38 -0800, atoomic@cpan.org wrote:
From @atoomic:

Attachment: 0001-Increase-hash-colision-ratio-and-reduce-hsplit-calls.patch

From ad816612cb622fe7d8a07dff73171f38ce8494e6 Mon Sep 17 00:00:00 2001
From: Nicolas R <atoomic@cpan.org>
Date: Sat, 12 Nov 2016 17:02:09 +0100
Subject: [PATCH] Increase hash collision ratio and reduce hsplit calls
Previously the hash array was resized each time the hash contained
the same number of elements as its size.
With this commit, the split threshold is increased to 2*SIZE, which
has two advantages: the hash is resized less often, and the bucket
array stays smaller.
As elements are now going to collide a little more, this is
going to make hash lookup a bit slower.
Note that when resizing the hash we are still using 2*Number_of_elements
(which is 4*Size_of_Hash).
diff --git a/ext/Hash-Util/t/builtin.t b/ext/Hash-Util/t/builtin.t
index 3654c9b..5ef26da 100644
--- a/ext/Hash-Util/t/builtin.t
+++ b/ext/Hash-Util/t/builtin.t
@@ -6,7 +6,7 @@ use Test::More;
my @Exported_Funcs;
BEGIN {
@Exported_Funcs = qw( bucket_ratio num_buckets used_buckets );
- plan tests => 13 + @Exported_Funcs;
+ plan tests => 19 + @Exported_Funcs;
use_ok 'Hash::Util', @Exported_Funcs;
}
foreach my $func (@Exported_Funcs) {
@@ -31,8 +31,17 @@ is(num_buckets(%hash), 8, "hash should have eight buckets");
cmp_ok(used_buckets(%hash), "<", 8, "hash should have one used buckets");
$hash{8}= 8;
-like(bucket_ratio(%hash), qr!/16!, "hash has expected number of buckets in bucket_ratio");
-is(num_buckets(%hash), 16, "hash should have sixteen buckets");
+like(bucket_ratio(%hash), qr!/8!, "hash has expected number of buckets in bucket_ratio");
+is(num_buckets(%hash), 8, "hash should have eight buckets");
+cmp_ok(used_buckets(%hash), "<=", 8, "hash should have at most 8 used buckets");
+
+$hash{$_}= $_ for 9..14;
+like(bucket_ratio(%hash), qr!/8!, "hash has expected number of buckets in bucket_ratio");
+is(num_buckets(%hash), 8, "hash should have eight buckets");
cmp_ok(used_buckets(%hash), "<=", 8, "hash should have at most 8 used buckets");
+$hash{15}= 15;
+like(bucket_ratio(%hash), qr!/32!, "hash has expected number of buckets in bucket_ratio");
+is(num_buckets(%hash), 32, "hash should have 32 buckets");
+cmp_ok(used_buckets(%hash), "<=", 16, "hash should have at most 16 used buckets");
diff --git a/hv.c b/hv.c
index 7659a6d..8ba70a2 100644
--- a/hv.c
+++ b/hv.c
@@ -34,7 +34,8 @@ holds the key and hash value.
#define PERL_HASH_INTERNAL_ACCESS
#include "perl.h"
-#define DO_HSPLIT(xhv) ((xhv)->xhv_keys > (xhv)->xhv_max) /* HvTOTALKEYS(hv) > HvMAX(hv) */
+#define SHOULD_HSPLIT(xhv) ((xhv)->xhv_keys > (2*(xhv)->xhv_max)) /* HvTOTALKEYS(hv) > 2 * HvMAX(hv) previously known as HSPLIT */
+#define HSPLIT(xhv, oldsize) hsplit(xhv, oldsize, oldsize * 4) /* 2 * the split limit above (=2*HvMAX) */
#define HV_FILL_THRESHOLD 31
static const char S_strtab_error[]
@@ -877,7 +878,7 @@ Perl_hv_common(pTHX_ HV *hv, SV *keysv, const char *key, STRLEN klen,
HvHASKFLAGS_on(hv);
xhv->xhv_keys++; /* HvTOTALKEYS(hv)++ */
- if ( DO_HSPLIT(xhv) ) {
+ if ( SHOULD_HSPLIT(xhv) ) {
const STRLEN oldsize = xhv->xhv_max + 1;
const U32 items = (U32)HvPLACEHOLDERS_get(hv);
@@ -894,10 +895,10 @@ Perl_hv_common(pTHX_ HV *hv, SV *keysv, const char *key, STRLEN klen,
If we're lucky, then we may clear sufficient placeholders to
avoid needing to split the hash at all. */
clear_placeholders(hv, items);
- if (DO_HSPLIT(xhv))
- hsplit(hv, oldsize, oldsize * 2);
+ if (SHOULD_HSPLIT(xhv))
+ HSPLIT(hv, oldsize);
} else
- hsplit(hv, oldsize, oldsize * 2);
+ HSPLIT(hv, oldsize);
}
if (return_svp) {
@@ -3047,9 +3048,9 @@ S_share_hek_flags(pTHX_ const char *str, I32 len, U32 hash, int flags)
xhv->xhv_keys++; /* HvTOTALKEYS(hv)++ */
if (!next) { /* initial entry? */
- } else if ( DO_HSPLIT(xhv) ) {
+ } else if ( SHOULD_HSPLIT(xhv) ) {
const STRLEN oldsize = xhv->xhv_max + 1;
- hsplit(PL_strtab, oldsize, oldsize * 2);
+ HSPLIT(PL_strtab, oldsize);
}
}
--
2.10.2
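To observe the effect of the threshold change directly, here is a minimal sketch using the same Hash::Util introspection functions the patched test file exercises (bucket_ratio, num_buckets). It assumes a perl recent enough to export them; the exact split points will differ between a stock and a patched perl:

use strict;
use warnings;
use Hash::Util qw(bucket_ratio num_buckets);

my %h;
my $buckets = num_buckets(%h);
for my $i (1 .. 64) {
    $h{$i} = $i;
    my $now = num_buckets(%h);
    if ($now != $buckets) {
        # Report each hsplit: how many keys forced it, old/new array size.
        printf "split at %d keys: %d -> %d buckets (ratio %s)\n",
               $i, $buckets, $now, bucket_ratio(%h);
        $buckets = $now;
    }
}

On an unpatched perl this should report a split as soon as the key count exceeds the bucket count; with the patch, splits should arrive roughly half as often.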
The RT System itself - Status changed from 'new' to 'open'
From @atoomic:

Note that increasing the collision ratio reduces the amount of memory used for the hash array. As we now grow the array (via calling hsplit) only half as often as before, the 'x4' is used here to preserve the same dynamic/performance as before. Main idea: reduce memory at a very low performance cost. (A small model of the two growth policies is sketched after the timings below.)

Here are some quick&dirty tests with multiple combinations of SHOULD_HSPLIT / HSPLIT:

# -----------------------------------
(blead)       # SHOULD_HSPLIT: HvTOTALKEYS(hv) > HvMAX(hv)      ; HSPLIT: 2 x oldsize
real 0m19.072s
(blead+patch) # SHOULD_HSPLIT: HvTOTALKEYS(hv) > 2 x HvMAX(hv)  ; HSPLIT: 2 x 2 x oldsize
real 0m16.707s
              # SHOULD_HSPLIT: HvTOTALKEYS(hv) > 64 x HvMAX(hv) ; HSPLIT: 2 x oldsize
real 0m45.874s
              # SHOULD_HSPLIT: HvTOTALKEYS(hv) > 64 x HvMAX(hv) ; HSPLIT: 16 x 2 x oldsize
real 0m27.996s
              # SHOULD_HSPLIT: HvTOTALKEYS(hv) > 8 x HvMAX(hv)  ; HSPLIT: 2 x oldsize
real 0m18.474s
              # SHOULD_HSPLIT: HvTOTALKEYS(hv) > 4 x HvMAX(hv)  ; HSPLIT: 2 x oldsize
real 0m17.377s
              # SHOULD_HSPLIT: HvTOTALKEYS(hv) > 4 x HvMAX(hv)  ; HSPLIT: 2 x 2 x oldsize
real 0m16.645s
              # SHOULD_HSPLIT: HvTOTALKEYS(hv) > 8 x HvMAX(hv)  ; HSPLIT: 4 x 2 x oldsize
real 0m17.514s
              # SHOULD_HSPLIT: HvTOTALKEYS(hv) > 8 x HvMAX(hv)  ; HSPLIT: 2 x 2 x oldsize
real 0m18.536s
              # SHOULD_HSPLIT: HvTOTALKEYS(hv) > 16 x HvMAX(hv) ; HSPLIT: 4 x 2 x oldsize
real 0m21.573s
# -----------------------------------
(blead)       # SHOULD_HSPLIT: HvTOTALKEYS(hv) > HvMAX(hv)      ; HSPLIT: 2 x oldsize
real 0m20.777s
(blead+patch) # SHOULD_HSPLIT: HvTOTALKEYS(hv) > 2 x HvMAX(hv)  ; HSPLIT: 2 x 2 x oldsize
real 0m20.468s
              # SHOULD_HSPLIT: HvTOTALKEYS(hv) > 8 x HvMAX(hv)  ; HSPLIT: 2 x oldsize
real 0m29.747s
              # SHOULD_HSPLIT: HvTOTALKEYS(hv) > 4 x HvMAX(hv)  ; HSPLIT: 2 x oldsize
real 0m33.095s
              # SHOULD_HSPLIT: HvTOTALKEYS(hv) > 4 x HvMAX(hv)  ; HSPLIT: 2 x 2 x oldsize
real 0m24.261s
              # SHOULD_HSPLIT: HvTOTALKEYS(hv) > 8 x HvMAX(hv)  ; HSPLIT: 4 x 2 x oldsize
real 0m24.841s
              # SHOULD_HSPLIT: HvTOTALKEYS(hv) > 8 x HvMAX(hv)  ; HSPLIT: 2 x 2 x oldsize
real 0m25.109s
              # SHOULD_HSPLIT: HvTOTALKEYS(hv) > 16 x HvMAX(hv) ; HSPLIT: 4 x 2 x oldsize
real 0m29.900s

On Sun, 13 Nov 2016 06:03:38 -0800, atoomic@cpan.org wrote:
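A pure-Perl model of the two growth policies (an illustration, not core code) shows why the 'x4' preserves the same post-split sizing of 2 x number_of_elements:

use strict;
use warnings;

# old: split when keys > buckets,     grow to 2 x oldsize
# new: split when keys > 2 x buckets, grow to 4 x oldsize
for my $policy ([old => 1, 2], [new => 2, 4]) {
    my ($name, $load, $grow) = @$policy;
    my $buckets = 8;
    my @splits;
    for my $keys (1 .. 1024) {
        if ($keys > $load * $buckets) {
            $buckets *= $grow;
            push @splits, "$keys keys -> $buckets buckets";
        }
    }
    print "$name: ", join(", ", @splits), "\n";
}

Both policies leave the hash with roughly 2 x keys buckets immediately after a split; the new one simply reaches each size with half the hsplit calls and stays at the smaller size twice as long.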
|
From @demerphq:

On 13 November 2016 at 17:29, Atoomic via RT <perlbug-followup@perl.org> wrote:

Yeah, there are a number of trade-offs here. I think it's helpful to be able to visualize what is going on. We currently use a maximum load factor of 1; put otherwise, we split as soon as the number of keys exceeds the number of buckets.

This is what this looks like for 1023 keys going into 1024 buckets:

$ ./perl -Ilib -MHash::Util=bucket_stats_formatted -e'my %hash;

As we can see, we have about 40% of our buckets unused, 36% of the

Then we add a key, triggering a split:

$ ./perl -Ilib -MHash::Util=bucket_stats_formatted -e'my %hash;

Now we end up with 60% of the hash buckets unused, and only modestly

Compare with a maximum load factor of 4. At 1023 keys, right before a key split:

$ ./perl -Ilib -MHash::Util=bucket_stats_formatted -e'my %hash;

Almost no wasted space, and a pretty good distribution of keys. Just

And after a key insert, hsplit:

$ ./perl -Ilib -MHash::Util=bucket_stats_formatted -e'my %hash;

Quite a big improvement in distribution and chain length, and still

Now compare to a load factor of 8:

$ ./perl -Ilib -MHash::Util=bucket_stats_formatted -e'my %hash;

As we can see, the hash is now filling up, relatively little, 25% or

And after an hsplit:

$ ./perl -Ilib -MHash::Util=bucket_stats_formatted -e'my %hash;

We double the number of items we can reach with 1 HE transition,

IMO what all this shows is that we probably want a max load factor of about 4. I would not change the rate at which we grow the hash, at least not

Yves
From @atoomic:

Thanks for your time & feedback; the graphical visualization is definitely a plus. My main concern is that if we slow down hashes by reducing memory usage, that would be a regression from my point of view.

I was also considering an alternate solution: change the grow factor only at the beginning, for small hashes only. Maybe this should be an option that could be set at compilation time? (This would not settle the question of whether we need/want to change the default value.) Any opinion on this? Of course the definition of "small/big" needs to be pinned down; I was thinking of something like 1024.

I'm also fine leaving it as it is, but I have the feeling that this default setup can be optimized. Thanks again.

On Sun, 13 Nov 2016 14:08:00 -0800, demerphq wrote:
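One way to read that suggestion, sketched as a pure-Perl model rather than as core code (the 1024 cutoff is the value floated above; the growth factors, and the choice to grow small hashes faster, are assumptions):

use strict;
use warnings;

my $SMALL_HASH_BUCKETS = 1024;   # hypothetical compile-time knob

# Hypothetical policy: small hashes grow 4x per split so they settle
# quickly; big hashes keep the conservative 2x growth.
sub next_size {
    my ($buckets) = @_;
    return $buckets < $SMALL_HASH_BUCKETS ? $buckets * 4 : $buckets * 2;
}

my $buckets = 8;
while ($buckets < 16_384) {
    my $next = next_size($buckets);
    print "$buckets -> $next\n";
    $buckets = $next;
}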
From @demerphq:

On 14 November 2016 at 01:58, Atoomic via RT <perlbug-followup@perl.org> wrote:
For sure, but some of your test cases were at very very high load

I think a max load factor of 4 is probably fine, but we can and should

And we need a test set that is comprehensive. We basically have two read-hit/update

So we need a benchmark that builds a largish hash (IMO something like
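A rough sketch of such a benchmark, assuming the core Benchmark module, with the hash size and workload mix chosen here for illustration (the thread only says "largish" and names read-hit/update):

use strict;
use warnings;
use Benchmark qw(cmpthese);

# Build the "largish" hash up front; 100_000 keys is an assumption.
my %hash = map { $_ => $_ } 1 .. 100_000;
my @hit  = map { 1 + int rand 100_000 } 1 .. 10_000;
my @miss = map { "miss_$_" } 1 .. 10_000;

cmpthese(-3, {
    read_hit  => sub { my $x = 0; $x += $hash{$_}      for @hit  },
    read_miss => sub { my $x = 0; $x += $hash{$_} // 0 for @miss },
    update    => sub { $hash{$_}++                     for @hit  },
});

Running this against a stock and a patched perl (ideally with a fixed PERL_HASH_SEED, as in the timings earlier in the thread) would show whether the higher load factor costs anything on the read-hit path.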
We need to find the right balance between a size that over-wastes with

Let's discuss this in person.
Yes for sure. My version of this patch is already structured along those lines.
As others have said, a big part of the problem with this discussion is

1. objects, typically small, many repeated keys

We have to be careful not to structure this such that we win in one
I definitely think we can, and benchmarks willing, should tweak some of this. But I am still reluctant to change the grow factor. If I am correct

cheers,
Migrated from rt.perl.org#130084 (status was 'open')
Searchable as RT130084$