"for $a..$b -> $i { ... }" loops are sometimes much slower than c-style loops #6150

p6rt · 2017-03-11T15:07:57Z

Migrated from rt.perl.org#130982 (status was 'open')

Searchable as RT130982$

p6rt · 2017-03-11T15:07:57Z

From @mscha

Perl6-style simple a-to-b loops are often much slower than the
corresponding C-style loops, especially when dealing with embedded loops.

My "real life" example (as far as Project Euler is real life) is:
http://pastebin.com/SVYAyA5z
It takes about 55 seconds on my machine with C-style loops, and about 88
seconds with Perl6-style loops. (Comment the "loop" statements and
uncomment the corresponding "for" statements.)

A reduced example is at the bottom of this report. It outputs

111.04240975
95.56877319

on my machine.

I'm using the latest Rakudo Star, 2017.01.

See: https://irclog.perlgeek.de/perl6/2017-03-11#i_14245536 and below.

#!/usr/bin/env perl6

sub perl6-loop($n)
{
my $start = now;
my $sum = 0;
for 1..$n -> $i {
for 1..$i -> $j {
$sum += $i+$j;
}
}
say now - $start;
}

sub c-loop($n)
{
my $start = now;
my $sum = 0;
loop (my $i = 1; $i <= $n; $i++) {
loop (my $j = 1; $j <= $i; $j++) {
$sum += $i+$j;
}
}
say now - $start;
}

sub MAIN($n = 10⁴)
{
perl6-loop($n);
c-loop($n);
}

p6rt · 2017-03-12T05:49:57Z

From @LLFourn

If you think that discrepancy is impressive you're going to love this. I
added a version to your example using native ints:
https://gist.github.com/LLFourn/8c3e895e789fab957355ce23c9420133

bash-3.2$ perl6 native-int-perf.p6
perl6-loop: 84.8739988
c-loop: 67.65849241 (1.25 times faster)
native-loop: 0.4981954 (135.81 times faster)

(!)

On Sun, Mar 12, 2017 at 2:08 AM Michael Schaap <perl6-bugs-followup@perl.org>
wrote:

# New Ticket Created by Michael Schaap
# Please include the string: [perl #130982]
# in the subject line of all future correspondence about this issue.
# <URL: https://rt-archive.perl.org/perl6/Ticket/Display.html?id=130982 >

Perl6-style simple a-to-b loops are often much slower than the
corresponding C-style loops, especially when dealing with embedded loops.

My "real life" example (as far as Project Euler is real life) is:
http://pastebin.com/SVYAyA5z
It takes about 55 seconds on my machine with C-style loops, and about 88
seconds with Perl6-style loops. (Comment the "loop" statements and
uncomment the corresponding "for" statements.)

A reduced example is at the bottom of this report. It outputs

111.04240975
95.56877319

on my machine.

I'm using the latest Rakudo Star, 2017.01.

See: https://irclog.perlgeek.de/perl6/2017-03-11#i_14245536 and below.

#!/usr/bin/env perl6

sub perl6-loop($n)
{
my $start = now;
my $sum = 0;
for 1..$n -> $i {
for 1..$i -> $j {
$sum += $i+$j;
}
}
say now - $start;
}

sub c-loop($n)
{
my $start = now;
my $sum = 0;
loop (my $i = 1; $i <= $n; $i++) {
loop (my $j = 1; $j <= $i; $j++) {
$sum += $i+$j;
}
}
say now - $start;
}

sub MAIN($n = 10⁴)
{
perl6-loop($n);
c-loop($n);
}

p6rt · 2017-03-12T05:49:58Z

The RT System itself - Status changed from 'new' to 'open'

p6rt · 2017-03-12T14:27:37Z

From @geekosaur

On Sun, Mar 12, 2017 at 12:48 AM, Lloyd Fournier <lloyd.fourn@gmail.com>
wrote:

perl6-loop: 84.8739988
c-loop: 67.65849241 (1.25 times faster)
native-loop: 0.4981954 (135.81 times faster)

Still quite a lot of optimization to be done on that front. WRT native int,
one of the issues is needing to track when boxing is/isn't needed,
otherwise nativizing can be a pessimization because of constant reboxing.
The optimizer isn't up to tracking boxing like that yet.

--
brandon s allbery kf8nh sine nomine associates
allbery.b@gmail.com ballbery@sinenomine.net
unix, openafs, kerberos, infrastructure, xmonad http://sinenomine.net

p6rt · 2017-11-19T02:04:42Z

From @MasterDuke17

On Sun, 12 Mar 2017 07:27:37 -0700, allbery.b@gmail.com wrote:

On Sun, Mar 12, 2017 at 12:48 AM, Lloyd Fournier <lloyd.fourn@gmail.com>
wrote:

perl6-loop: 84.8739988
c-loop: 67.65849241 (1.25 times faster)
native-loop: 0.4981954 (135.81 times faster)

Still quite a lot of optimization to be done on that front. WRT native int,
one of the issues is needing to track when boxing is/isn't needed,
otherwise nativizing can be a pessimization because of constant reboxing.
The optimizer isn't up to tracking boxing like that yet.

FWIW.

$ perl6 native-int-perf.p6
perl6-loop: 117.55178620
c-loop: 156.7308145 (0.75 times faster)
native-loop: 1.2542871 (124.96 times faster)

$ perl6 --version
This is Rakudo version 2017.10-211-g2f0da94c3 built on MoarVM version 2017.10-82-ge876f1484

p6rt · 2017-11-20T03:26:04Z

From @LLFourn

For comparison to march on the same comp:
bash-3.2$ perl6 perf.p6
perl6-loop: 63.0037058
c-loop: 76.86853305 (0.82 times faster)
native-loop: 0.2170930 (354.08 times faster)

perl6 loops are faster. c style loops are slower. Native loops are even
faster relative to the others (for me).

We can probably close this RT. Seeing as the issue in the title has been
addressed kinda :P

LL

On Sun, Nov 19, 2017 at 1:05 PM Daniel Green via RT <
perl6-bugs-followup@perl.org> wrote:

On Sun, 12 Mar 2017 07:27:37 -0700, allbery.b@gmail.com wrote:

On Sun, Mar 12, 2017 at 12:48 AM, Lloyd Fournier <lloyd.fourn@gmail.com>
wrote:

perl6-loop: 84.8739988
c-loop: 67.65849241 (1.25 times faster)
native-loop: 0.4981954 (135.81 times faster)

Still quite a lot of optimization to be done on that front. WRT native
int,
one of the issues is needing to track when boxing is/isn't needed,
otherwise nativizing can be a pessimization because of constant reboxing.
The optimizer isn't up to tracking boxing like that yet.

FWIW.

$ perl6 native-int-perf.p6
perl6-loop: 117.55178620
c-loop: 156.7308145 (0.75 times faster)
native-loop: 1.2542871 (124.96 times faster)

$ perl6 --version
This is Rakudo version 2017.10-211-g2f0da94c3 built on MoarVM version
2017.10-82-ge876f1484

p6rt · 2017-11-20T20:13:47Z

From @ronaldxs

What about a native perl6 range loop? Couldn't there be some way for Perl 6 / Rakudo to generate code competitive on a small range with the "native-loop" example?

perl6 -e '
{
my int ($a, $one, $three) = (42, 1, 3);
for ^10_000_000 { $a += $one + $a%$three };
say now - ENTER now;
say $a
}
{
# 50% slower with loop invariant $limit
my int ($a, $one, $three) = (42, 1, 3);
my int $limit = 10_000_000; # my removing "int" does not effect perf
for ^$limit { $a += $one + $a%$three };
say now - ENTER now;
say $a
}
{
my int ($a, $one, $three, $limit) = (42, 1, 3, 10_000_000);
loop (my int $i = 0; $i < $limit; $i++) { $a += $one + $a%$three };
say now - ENTER now;
say $a
}'

...program prints...

2.10023098 # perl6 loop
15000042
3.2119384 # attempt at native perl 6 loop
15000042
0.7915038 # native loop
15000042

p6rt · 2017-11-22T19:31:10Z

From @timo

On Mon, 20 Nov 2017 12:13:47 -0800, ronaldxs wrote:

What about a native perl6 range loop? Couldn't there be some way for
Perl 6 / Rakudo to generate code competitive on a small range with the
"native-loop" example?

perl6 -e '
{
my int ($a, $one, $three) = (42, 1, 3);
for ^10_000_000 { $a += $one + $a%$three };
say now - ENTER now;
say $a
}'

This code will actually be turned into a `loop` loop by the optimizer at compile time. If you type the block to have a signature of `-> int $_ --> Nil` it'll get faster.

The optimizer doesn't do anything like that for the `^$limit` case, however.

p6rt · 2017-11-23T21:50:29Z

From @lizmat

On 22 Nov 2017, at 19:31, Timo Paulssen via RT <perl6-bugs-followup@perl.org> wrote:
On Mon, 20 Nov 2017 12:13:47 -0800, ronaldxs wrote:

What about a native perl6 range loop? Couldn't there be some way for
Perl 6 / Rakudo to generate code competitive on a small range with the
"native-loop" example?

perl6 -e '
{
my int ($a, $one, $three) = (42, 1, 3);
for ^10_000_000 { $a += $one + $a%$three };
say now - ENTER now;
say $a
}'

This code will actually be turned into a `loop` loop by the optimizer at compile time. If you type the block to have a signature of `-> int $_ --> Nil` it'll get faster.

So, is there a reason we don’t do this on all ^10000 type loops that don’t expect the return value to be used? Because in my benchmarks, this is 3x as fast! With only 25% of the allocations, so a lot less GC churn!

The optimizer doesn't do anything like that for the `^$limit` case, however.

That feels more that it needs a smarter spesh to me?

Liz

p6rt added the perf label Jan 5, 2020

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

"for $a..$b -> $i { ... }" loops are sometimes much slower than c-style loops #6150

"for $a..$b -> $i { ... }" loops are sometimes much slower than c-style loops #6150

p6rt commented Mar 11, 2017

p6rt commented Mar 11, 2017

p6rt commented Mar 12, 2017

p6rt commented Mar 12, 2017

p6rt commented Mar 12, 2017

p6rt commented Nov 19, 2017

p6rt commented Nov 20, 2017

p6rt commented Nov 20, 2017

p6rt commented Nov 22, 2017

p6rt commented Nov 23, 2017

"for $a..$b -> $i { ... }" loops are sometimes much slower than c-style loops #6150

"for $a..$b -> $i { ... }" loops are sometimes much slower than c-style loops #6150

Comments

p6rt commented Mar 11, 2017

p6rt commented Mar 11, 2017

From @mscha

p6rt commented Mar 12, 2017

From @LLFourn

p6rt commented Mar 12, 2017

p6rt commented Mar 12, 2017

From @geekosaur

p6rt commented Nov 19, 2017

From @MasterDuke17

p6rt commented Nov 20, 2017

From @LLFourn

p6rt commented Nov 20, 2017

From @ronaldxs

p6rt commented Nov 22, 2017

From @timo

p6rt commented Nov 23, 2017

From @lizmat