Report information
Id: 1421
Status: resolved
Priority: 0
Queue: perl5

Owner: Nobody
Requestors:
Cc:
AdminCc:

Operating System: (no value)
PatchStatus: (no value)
Severity: low
Type: docs
Perl Version: (no value)
Fixed In: (no value)



Or, perhaps, just perhaps, it should DWIM. Here's a patch that

 - compares times from sub{...} tests to sub{} instead of to '',
 - fixes a bug in the linear interpolation responsible for pathological
   looping (usually with very fast code),
 - improves the accuracy of the time 0 sample, and
 - reduces the number of separate timings that are added together for the
   final result when running a time limited test, reducing the additive
   errors that might accumulate.

The problem is that subroutine call overhead is large compared to the cost
of a fast snippet of code. This means that a run can now take a lot longer
if you are testing, say, sub { $a = 1 ; } and you want 10 CPU seconds, and
it will make existing benchmark scripts crawl if they use time limited
tests on fast subs.
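The size of that overhead is easy to see with the module's own timeit().
Here is a minimal sketch (the iteration count is arbitrary; the [1] + [2]
slot arithmetic is the same usr + sys CPU idiom the patch below uses):

    use Benchmark qw( timeit );

    my $n = 1_000_000;

    # Under the unpatched module, both results below have the eval''d ''
    # null loop subtracted. The sub {} figure is therefore roughly the
    # pure call overhead that stays inside every sub{...} timing, and it
    # dwarfs the cost of a snippet as small as $a = 10.
    my $sub_null = timeit( $n, sub {} );
    my $str_null = timeit( $n, '' );

    printf "sub {} null loop: %.2f CPU secs\n", $sub_null->[1] + $sub_null->[2];
    printf "''     null loop: %.2f CPU secs\n", $str_null->[1] + $str_null->[2];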
I don't know whether 'fixing' Benchmark.pm will hurt too many people. If we
want to do this, maybe a deprecation cycle is in order?

Here's the 'test suite' I used in the example runs:

    my @tests = (
        ['sub1', sub{ $a = 10 }           ],
        ['sub2', sub{ $a = 10 ; $a = 10 } ],
        ['str1', '$a = 10'                ],
        ['str2', '$a = 10 ; $a = 10'      ],
    );

    for ( @tests ) {
        push( @$_, @{ timethis( $count, $_->[1], $_->[0] ) } ) ;
    }

Here's the output of 5.005_61's Benchmark as run on the above tests:

    [barries@jester Benchmark]$ time perl -I /var/perls/perl5.005_61/lib ft -1
    perl v5.00503, min of 1 seconds per test
    sub1:  2 wallclock secs ( 1.01 usr +  0.00 sys =  1.01 CPU) @ 492004.95/s (n=496925)
    sub2:  2 wallclock secs ( 1.00 usr +  0.00 sys =  1.00 CPU) @ 422276.00/s (n=422276)
    str1: -2 wallclock secs ( 1.01 usr +  0.00 sys =  1.01 CPU) @ 2197790.10/s (n=2219768)
    str2:  0 wallclock secs ( 1.01 usr +  0.00 sys =  1.01 CPU) @ 1139008.91/s (n=1150399)
               Rate    sub1    sub2    str1    str2
    sub1   492004/s    0.00
    sub2   422275/s  -14.17    0.00
    str1  2197790/s  346.70  420.46    0.00
    str2  1139008/s  131.50  169.73  -48.17    0.00
    13.95user 0.00system 0:16.57elapsed 84%CPU (0avgtext+0avgdata 0maxresident)k
    0inputs+0outputs (252major+80minor)pagefaults 0swaps

Here's the output of the patched version, which takes longer to run because
it has to do more iterations, since the time accumulated per iteration is
smaller. I don't know why it looks like the sub{} versions run faster than
the str versions: they of course really do not.

    [barries@jester Benchmark]$ time perl ft -1
    perl v5.00503, min of 1 seconds per test
    sub1:  2 wallclock secs ( 1.09 usr +  0.00 sys =  1.09 CPU) @ 3006237.61/s (n=3276799)
    sub2:  1 wallclock secs ( 1.06 usr +  0.00 sys =  1.06 CPU) @ 1236527.36/s (n=1310719)
    str1:  2 wallclock secs ( 1.23 usr +  0.00 sys =  1.23 CPU) @ 2279513.01/s (n=2803801)
    str2:  1 wallclock secs ( 1.03 usr +  0.00 sys =  1.03 CPU) @ 1156857.28/s (n=1191563)
               Rate    sub1    sub2    str1    str2
    sub1  3006237/s    0.00
    sub2  1236527/s  -58.87    0.00
    str1  2279513/s  -24.17   84.35    0.00
    str2  1156857/s  -61.52   -6.44  -49.25    0.00
    48.92user 0.05system 0:50.58elapsed 96%CPU (0avgtext+0avgdata 0maxresident)k
    0inputs+0outputs (256major+82minor)pagefaults 0swaps

- Barrie

###############################################################

--- /var/perls/perl5.005_61/lib/Benchmark.pm   Tue Jul 20 13:17:59 1999
+++ Benchmark.pm                                Thu Sep 16 23:23:51 1999
@@ -273,7 +273,9 @@
 
 sub debug { $debug = ($_[1] != 0); }
 
-sub clearcache { delete $cache{$_[0]}; }
+# The cache needs two branches: 's' for strings and 'c' for code. The
+# empty loop is different in these two cases.
+sub clearcache { delete $cache{"$_[0]c"}; delete $cache{"$_[0]s"}; }
 sub clearallcache { %cache = (); }
 sub enablecache { $cache = 1; }
 sub disablecache { $cache = 0; }
@@ -362,11 +364,18 @@
     croak "runloop unable to compile '$c': $@\ncode: $subcode\n" if $@;
     print STDERR "runloop $n '$subcode'\n" if $debug;
 
-    $t0 = Benchmark->new(0);
+    # Wait for the user timer to tick. This makes the error range more
+    # like -0.01, +0. If we don't wait, then it's more like -0.01, +0.01.
+    # This may not seem important, but it significantly reduces the
+    # chances of getting too low an initial $n in the initial, 'find the
+    # minimum' loop in &runfor. This, in turn, can reduce the number of
+    # calls to &runloop a lot, and thus reduce additive errors.
+    my $tbase = Benchmark->new(0)->[1];
+    do {
+        $t0 = Benchmark->new(0);
+    } while ( $t0->[1] == $tbase ) ;
 
     &$subref;
 
     $t1 = Benchmark->new($n);
     $td = &timediff($t1, $t0);
     timedebug("runloop:",$td);
     $td;
 }
@@ -377,12 +386,12 @@
     my($wn, $wc, $wd);
 
     printf STDERR "timeit $n $code\n" if $debug;
 
-    if ($cache && exists $cache{$n}) {
-        $wn = $cache{$n};
+    my $cache_key = $n . ( ref( $code ) ? 'c' : 's' ) ;
+    if ($cache && exists $cache{$cache_key} ) {
+        $wn = $cache{$cache_key};
     } else {
-        $wn = &runloop($n, '');
-        $cache{$n} = $wn;
+        $wn = &runloop($n, ref( $code ) ? sub {} : '' );
+        $cache{$cache_key} = $wn;
     }
 
     $wc = &runloop($n, $code);
@@ -414,24 +423,23 @@
     my ($n, $td, $tc, $ntot, $rtot, $utot, $stot, $cutot, $cstot );
 
-    # First find the minimum $n that gives a non-zero timing.
+    # First find the minimum $n that gives a significant timing.
     my $nmin;
 
-    for ($n = 1, $tc = 0; $tc <= 0; $n *= 2 ) {
+    for ($n = 1, $tc = 0; ; $n *= 2 ) {
         $td = timeit($n, $code);
         $tc = $td->[1] + $td->[2];
+        last if $tc > 0.1 ;
     }
     $nmin = $n;
 
     my $ttot = 0;
     my $tpra = 0.05 * $tmax; # Target/time practice.
 
     # Double $n until we have think we have practiced enough.
-    for ( $n = 1; $ttot < $tpra; $n *= 2 ) {
+    for ( ; $ttot < $tpra; $n *= 2 ) {
         $td = timeit($n, $code);
-        $tc = $td->cpu_p;
         $ntot += $n;
         $rtot += $td->[0];
         $utot += $td->[1];
@@ -446,7 +454,7 @@
     # Then iterate towards the $tmax.
     while ( $ttot < $tmax ) {
         $r = $tmax / $ttot - 1; # Linear approximation.
-        $n = int( $r * $n );
+        $n = int( $r * $ntot );
         $n = $nmin if $n < $nmin;
         $td = timeit($n, $code);
         $ntot += $n;
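To make the last hunk concrete: $r is computed from the total time
accumulated so far, so the linear approximation has to scale the total
iteration count $ntot, not the size of the last chunk $n. A sketch of the
arithmetic, with invented numbers:

    # Suppose $tmax is 10 CPU secs and the practice loops have accumulated
    # $ttot = 2 secs over $ntot = 1000 iterations, the last chunk being
    # $n = 128.
    my ( $tmax, $ttot, $ntot, $n ) = ( 10, 2, 1000, 128 );

    my $r = $tmax / $ttot - 1;      # 4: we need 4x the time spent so far

    my $n_old = int( $r * $n );     # 512 iterations, far short of the
                                    # ~4000 needed; on very fast code this
                                    # caused the pathological looping
    my $n_new = int( $r * $ntot );  # 4000, as the approximation intends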

