Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

The storage strategy for arrays is weird in Rakudo #2677

Closed
p6rt opened this issue Mar 19, 2012 · 6 comments
Closed

The storage strategy for arrays is weird in Rakudo #2677

p6rt opened this issue Mar 19, 2012 · 6 comments
Labels

Comments

@p6rt
Copy link

p6rt commented Mar 19, 2012

Migrated from rt.perl.org#111848 (status was 'resolved')

Searchable as RT111848$

@p6rt
Copy link
Author

p6rt commented Mar 19, 2012

From @masak

<masak> r​: my $t = now; my @​a; @​a[ $_ ] = $_ for 1 .. 1e4; print now - $t
<p6eval> rakudo 1968b8​: OUTPUT«2.55866416794898»
<masak> r​: my $t = now; my %h; %h{ $_ } = $_ for 1 .. 1e4; print now - $t
<p6eval> rakudo 1968b8​: OUTPUT«1.51738952909994»
<masak> in Rakudo, hashes are faster than arrays :)
<masak> in re http://perlmonks.org/?node_id=960334
<moritz> masak​: yes, it pays the price of laziness
<moritz> (it = Arrays)
<masak> moritz​: even when, as here, I'm not using that laziness at all.
<moritz> masak​: correct
<masak> what is the additional cost here, more exactly?
<moritz> well, for each array access we need to check if the number of
reified elements is big enough
<moritz> though that should be rather cheap-ish
<moritz> I kind think we're doing something very wrong when it comes
to arrays and lists
<moritz> nom​: my @​a = 1, 2, 3; say @​a.DUMP
<p6eval> rakudo 1968b8​: OUTPUT«Array<-4100004978996074906>(​:items(Mu),
:nextiter(ListIter<-4100004978996073111>(​:reified(▶Mu),
:rest(RPA<-4100004978996040334>(Array<-4100004978996076959>(​:items(RPA<-4100004978996035975>(▶1,
▶2, ▶3)), :nextiter(▶Mu)))), :list(Array<-4100004978996074906>))))␤»…
<moritz> there you see what's wrong
<moritz> it's not the outer-most array that holds all the reified elements
<masak> right.
<moritz> they are stored in two layers of nextiter
<masak> I almost feel tempted to submit a rakudobug about that.
<moritz> please do
* masak submits rakudobug

Submitting bug reports about suboptimality is risky business. So
here's something concrete and actionable​: remove the double nextiter
cruft from the above .DUMP output. Bonus points -- but not required to
close this ticket -- if you can get the benchmark above to run faster
too. Ideally, arrays should be faster than hashes.

@p6rt
Copy link
Author

p6rt commented Mar 19, 2012

From @pmichaud

The speed of Array vs. Hash element access is partially addressed by
commit c10792f8. Before this commit, each Array element access via
.at_pos($pos) resulted in an expensive call to self.exists($pos) for
each access, whereas Hash element accesses via .at_key($key) were
able to use nqp​::existskey() directly. This patch uses nqp​::existspos()
to hotpath the detection of existing elements and avoids calls to
self.exists($pos) when the Array is already fully reified.

For the benchmark given in RT #​111848, this resulted in a ~25%
speedup for array element accesses, and brings it to within 5% of Hash
element access times. (At present Array element accesses still have
more overhead at the PIR level than Hash element accesses due to
laziness considerations and boundary checks.

I'm still looking into the "two layers of nextiter" part; I agree
it's somewhat surprising but I'm not convinced it's suboptimal
within the current context of how we handle/store constant lists.
For the array assignment Rakudo has to do iteration at some point in
order to bring the elements into the "top-level" items part; it's
simply choosing to do it at the time of Array element access instead
of at the point of the assignment.

Given that the ticket refers to "suboptimality", it's hard to know
when it should be closed. If masak++ or someone else feels that
this reply adequately addresses the issues for now, I'm fine with
it being resolved now, or if we want to leave it open pending further
discussion/resolution of the "two nextiter layer" issue, I'm fine
with that also.

Pm

@p6rt
Copy link
Author

p6rt commented Mar 19, 2012

The RT System itself - Status changed from 'new' to 'open'

@p6rt
Copy link
Author

p6rt commented Feb 10, 2013

From @nwc10

On Mon Mar 19 11​:34​:12 2012, pmichaud wrote​:

The speed of Array vs. Hash element access is partially addressed by
commit c10792f8. Before this commit, each Array element access via
.at_pos($pos) resulted in an expensive call to self.exists($pos) for
each access, whereas Hash element accesses via .at_key($key) were
able to use nqp​::existskey() directly. This patch uses nqp​::existspos()
to hotpath the detection of existing elements and avoids calls to
self.exists($pos) when the Array is already fully reified.

For the benchmark given in RT #​111848, this resulted in a ~25%
speedup for array element accesses, and brings it to within 5% of Hash
element access times. (At present Array element accesses still have
more overhead at the PIR level than Hash element accesses due to
laziness considerations and boundary checks.

Timings remain roughly consistent today (at a3869a037bb10552)​:

$ ./perl6 -e 'my $t = now; my @​a; @​a[ $_ ] = $_ for 1 .. 1e4; say now - $t'
1.3921895
$ ./perl6 -e 'my $t = now; my %h; %h{ $_ } = $_ for 1 .. 1e4; say now - $t'
1.30200286

with hashes still faster.

Nicholas Clark

@p6rt
Copy link
Author

p6rt commented Feb 17, 2015

From @coke

On Sun Feb 10 02​:05​:54 2013, nicholas wrote​:

On Mon Mar 19 11​:34​:12 2012, pmichaud wrote​:

The speed of Array vs. Hash element access is partially addressed by
commit c10792f8. Before this commit, each Array element access via
.at_pos($pos) resulted in an expensive call to self.exists($pos) for
each access, whereas Hash element accesses via .at_key($key) were
able to use nqp​::existskey() directly. This patch uses nqp​::existspos()
to hotpath the detection of existing elements and avoids calls to
self.exists($pos) when the Array is already fully reified.

For the benchmark given in RT #​111848, this resulted in a ~25%
speedup for array element accesses, and brings it to within 5% of Hash
element access times. (At present Array element accesses still have
more overhead at the PIR level than Hash element accesses due to
laziness considerations and boundary checks.

Timings remain roughly consistent today (at a3869a037bb10552)​:

$ ./perl6 -e 'my $t = now; my @​a; @​a[ $_ ] = $_ for 1 .. 1e4; say now - $t'
1.3921895
$ ./perl6 -e 'my $t = now; my %h; %h{ $_ } = $_ for 1 .. 1e4; say now - $t'
1.30200286

with hashes still faster.

Nicholas Clark

Ran this today with rakudo-m​:

$ ./perl6 -e 'my $t = now; my @​a; @​a[ $_ ] = $_ for 1 .. 1e4; say now - $t'
0.0565928
$ ./perl6 -e 'my $t = now; my %h; %h{ $_ } = $_ for 1 .. 1e4; say now - $t'
0.06960988

Not only are they both much faster, arrays are now faster than hashes.

Closing ticket.

--
Will "Coke" Coleda

@p6rt p6rt closed this as completed Feb 17, 2015
@p6rt
Copy link
Author

p6rt commented Feb 17, 2015

@coke - Status changed from 'open' to 'resolved'

@p6rt p6rt added the Bug label Jan 5, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

No branches or pull requests

1 participant