Parallelizing code unexpectedly makes it slower #5712
Comments
From steve.piner@gmail.com

This could be a stupid user problem, in which case I apologise for wasting

Simple concurrency demonstrations seem to work; the following completes in

    perl6 -e 'await Promise.allof(start {sleep 1}, start {sleep 1}, start

However if the started tasks are doing any CPU-intensive work, they seem

To reproduce the issue, please run this script.

    use v6;
    my $r = 2_000_000;
    say "Counting to $r";
    my ($start, $setup, $stop);
    say "Non-promise iteration: {$stop - $start}";
    my @promises;
    say "One iteration: {$stop - $start} (setup: {$setup - $start})";
    @promises = ();
    $start = now;
    say "16 iterations: {$stop - $start} (setup: {$setup - $start})";

What I expected: One iteration would take roughly the same amount of time as the 16 iterations would take, at worst, about 16 times as long as the line

What happens:

On a Windows 10 PC, with 6 cores (12 logical processors)

    C:\Users\Steve>perl6 concurrency-wtf.pl

On an Ubuntu 14.04.5 LTS PC, with 2 cores

    steve@prole:~/$ perl6 concurrency-wtf.pl

Also the CPU usage leaps 100% (all CPUs, all at 100%) while running, on

Perl 6 version

    C:\Users\Steve>perl6 -v
    steve@prole:~/$ perl6 -v
From steve.piner+bitcard@gmail.com

Replicated on Linux with 2016.09

    steve@prole:~/$ perl6 --version

On Sat Oct 01 06:15:18 2016, steve.piner@gmail.com wrote:
From @zoffixznet

Seems the issue has more to do with running an empty loop, rather than

This is a run on a 4-core box. Attempting to parallelize an empty loop makes

    my &code = { for ^2_000_000 { } };
    my $start = now; (^4).map: &code; my $stop = now;
    $start = now; await (^4).map: {start code}; $stop = now;
    # Serial: 1.5161628

But running actual real-life code makes it almost 4 times faster, as

    use Crypt::Bcrypt;
    my $start = now; (^4).map: &code; my $stop = now;
    $start = now; await (^4).map: {start code}; $stop = now;
    # Serial: 7.69045687

On Sat Oct 01 06:15:18 2016, steve.piner@gmail.com wrote:
The RT System itself - Status changed from 'new' to 'open'
From steve.piner@gmail.com

But Crypt::Bcrypt seems to be mostly native call stuff. While it is

The original case I noticed the problem in was a recursive function; all

Try this: it's the fibonacci function.

The first section I call it 4 times, without using any concurrency.
The second section I call it 4 times in a single worker.
The third section I start 4 workers, each calling it once.

I would expect the first and second sections to have about the same run

The times below are for a 2-core Ubuntu 14.04.5 PC, running Rakudo version

    use v6;
    sub fib($n) {
    constant FIB = 30;
    my ($start, $stop);
    $start = now;
    $start = now;
    # No thread: 13.55591338

On Mon, 03 Oct 2016 17:33:59 +1300, Zoffix Znet via RT
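The three sections described in the comment above can be sketched roughly as follows. This is a minimal reconstruction for illustration, not the script that was posted (its body did not survive in this copy of the thread). The `fib` body, the `WORKERS` constant, and the reduced `FIB` value of 25 are assumptions chosen so the sketch finishes quickly.

```raku
use v6;

# Naive recursive Fibonacci: deliberately call-heavy (assumed body).
sub fib(Int $n) { $n < 2 ?? $n !! fib($n - 1) + fib($n - 2) }

constant FIB     = 25;   # assumption: smaller than the 30 used in the thread
constant WORKERS = 4;    # hypothetical name for the number of calls/workers

# Section 1: call it 4 times, no concurrency at all.
my $start = now;
fib(FIB) for ^WORKERS;
say "No thread: {now - $start}";

# Section 2: call it 4 times inside a single worker.
$start = now;
await start { fib(FIB) for ^WORKERS };
say "Single worker: {now - $start}";

# Section 3: start 4 workers, each calling it once.
$start = now;
await (^WORKERS).map: { start fib(FIB) };
say "{WORKERS} workers: {now - $start}";
```

On a build without the contention discussed in this thread, the third timing would be expected to come in below the first two on a multi-core machine; the absolute numbers depend on the Rakudo version and hardware.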
From @toolforger

On 03.10.2016 at 06:34, Zoffix Znet via RT wrote:

(Disclaimer: I have no idea of the internals, but I know a bit about

This might be four cores competing to get update access to the loop counter.
From steve.piner@gmail.com

On Tue, 04 Oct 2016 04:23:54 +1300, Joachim Durchholz via RT

For what it's worth, I've tried this with 4 separate functions, one per

It doesn't seem to make a noticeable difference to performance.

The comment below is the output of the following program, compared to the

    # No thread: 12.6327946 (vs 13.0872902)

    use v6;
    sub fib1($n) {
    sub fib2($n) {
    sub fib3($n) {
    sub fib4($n) {
    my @fibs = &fib1, &fib2, &fib3, &fib4;
    constant FIB = 30;
    my ($start, $stop);
    $start = now;
    $start = now;
    $start = now;
    $start = now;
From @smls

I ran the Fibonacci benchmark from Steve Piner's preceding comment on a CPU with 4 cores (AMD Phenom II X4 965).

    ➜ MVM_SPESH_DISABLE=1 perl6 fibonacci.p6

But with spesh, the no-thread case gets a lot faster whereas the parallel case actually gets a bit slower:

    ➜ perl6 fibonacci.p6

timotimo ran it on a different CPU (also 4 cores), and did *not* get such a significant slow-down for parallel vs no-thread, so it may be CPU/platform-dependent:

    No thread: 10.4069735

He also had some thoughts on the matter (starting at https://irclog.perlgeek.de/perl6/2016-10-12#i_13383443):

    <timotimo> allocating frames (i.e. calling functions) currently contends

...and did some profiling with perf:

    <timotimo> the whole program spends 34% of its time inside
From @toolforger

On 11.10.2016 at 13:33, Steve Piner wrote:

I thought that the 4 processes would be trying to update the for

Four different functions should not make a difference, unless Perl is

I don't know enough about Perl to decide what the single-threaded code

Sorry I can't help more, I'm just learning :-)

Regards,
From @timo

with jnthn's recent patch to give every thread its own free-list for the FSA, the behavior is now much nicer.

Here's a paste of a 40-core machine: https://gist.github.com/anonymous/650ff6b00a0f8dc34b9e358992e572b4

here's a 24-core box (some google compute cloud machine zoffix rents)
before the patch: https://gist.github.com/zoffixznet/4e0fd3e352fbf9115d63186d5dc47340
after the patch: https://gist.github.com/zoffixznet/783f0293d65300419549321f24e5ea6a

actual code used: https://gist.github.com/timo/a3a405f977840e8336f50234715e9cd4

i'm closing this bug.
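The effect of giving each thread its own free-list can be illustrated by a loose user-level analogy. The sketch below is not MoarVM's allocator code and does not touch the FSA; it only contrasts workers that all go through one lock-protected counter with workers that each keep private state and merge once at the end. The names `shared-count`, `per-worker-count`, `WORKERS`, and `STEPS` are hypothetical.

```raku
use v6;

constant WORKERS = 4;        # hypothetical worker count
constant STEPS   = 50_000;   # hypothetical amount of work per worker

# Shared state: every increment takes the same lock, so the workers
# spend much of their time waiting on each other.
sub shared-count() {
    my $lock  = Lock.new;
    my $total = 0;
    await (^WORKERS).map: {
        start { for ^STEPS { $lock.protect({ $total++ }) } }
    };
    $total;
}

# Per-worker state: each worker counts privately and the partial
# results are merged once, after all workers have finished.
sub per-worker-count() {
    my @partials = await (^WORKERS).map: {
        start { my $local = 0; $local++ for ^STEPS; $local }
    };
    [+] @partials;
}

my $start  = now;
my $shared = shared-count();
say "Shared, locked counter: $shared in {now - $start}s";

$start = now;
my $private = per-worker-count();
say "Per-worker counters:    $private in {now - $start}s";
```

Both variants compute the same total; the difference is only in how often the workers have to coordinate, which is the kind of contention the patch is described as removing.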
@timo - Status changed from 'open' to 'resolved'
From steve.piner+bitcard@gmail.com

Yes, I retested with the initial scripts that led me to raise this issue in the first place. The multi-threaded versions are 40% faster than the single-threaded versions on a dual-core machine. There is also no apparent penalty for simply having threads as there was before.

So yeah, I agree this is fixed.
Migrated from rt.perl.org#129779 (status was 'resolved')
Searchable as RT129779$