[PATCH 5.005_61] Fix, Re: mixed benchmarks [Was: Speed of f(), $o->m(), etc. in 03,57,58,61] #569

p5pRT · 1999-09-20T02:30:03Z

Migrated from rt.perl.org#1421 (status was 'resolved')

Searchable as RT1421$

p5pRT · 1999-09-20T02:30:03Z

From The RT System itself

Or, perhaps, just perhaps, it should DWIM.

Here's a patch that

- compares times from sub{...} tests to sub{} instead of to '',

- fixes a bug in the linear interpolation responsible for
pathological looping (usually with very fast code),

- improves the accuracy of the time 0 sample, and

- reduces the number of separate timings that are added together
for the final result when running a time limited test,
reducing the additive errors that might accumulate.

The problem is that subroutine overhead is big compared to fast
snippets of code. This means that it can take a lot longer now
if you are testing, say, sub { $a = 1 ; } and you want 10 CPU
seconds.

It will make existing benchmark scripts crawl if they do use
time limited tests on fast subs. I don't know whether 'fixing'
Benchmark.pm will hurt too many people. If we want to do this,
maybe a deprecation cycle is in order?

Here's the 'test suite' I used in the example runs:

my @tests = (
['sub1', sub{ $a = 10 } ],
['sub2', sub{ $a = 10 ; $a = 10 } ],
['str1', '$a = 10' ],
['str2', '$a = 10 ; $a = 10' ],
);

for ( @tests ) {
push( @$_, @{timethis( $count, $_->[1], $_->[0] )} ) ;
}

Here's the output of _61's Benchmark as run on the above tests:

[barries@jester Benchmark]$ time perl -I /var/perls/perl5.005_61/lib ft -1
perl v5.00503, min of 1 seconds per test
sub1: 2 wallclock secs ( 1.01 usr + 0.00 sys = 1.01 CPU) @ 492004.95/s (n=496925)
sub2: 2 wallclock secs ( 1.00 usr + 0.00 sys = 1.00 CPU) @ 422276.00/s (n=422276)
str1: -2 wallclock secs ( 1.01 usr + 0.00 sys = 1.01 CPU) @ 2197790.10/s (n=2219768)
str2: 0 wallclock secs ( 1.01 usr + 0.00 sys = 1.01 CPU) @ 1139008.91/s (n=1150399)
Rate sub1 sub2 str1 str2
sub1 492004/s 0.00
sub2 422275/s -14.17 0.00
str1 2197790/s 346.70 420.46 0.00
str2 1139008/s 131.50 169.73 -48.17 0.00
13.95user 0.00system 0:16.57elapsed 84%CPU (0avgtext+0avgdata 0maxresident)k
0inputs+0outputs (252major+80minor)pagefaults 0swaps

Here's the output of the patched version. The patched version takes longer
to run because it has to do more iterations since the time accumulated per
iteration is less. **I don't know why it looks like the sub{} versions
run faster than the str versions: they of course really do not.**

[barries@jester Benchmark]$ time perl ft -1
perl v5.00503, min of 1 seconds per test
sub1: 2 wallclock secs ( 1.09 usr + 0.00 sys = 1.09 CPU) @ 3006237.61/s (n=3276799)
sub2: 1 wallclock secs ( 1.06 usr + 0.00 sys = 1.06 CPU) @ 1236527.36/s (n=1310719)
str1: 2 wallclock secs ( 1.23 usr + 0.00 sys = 1.23 CPU) @ 2279513.01/s (n=2803801)
str2: 1 wallclock secs ( 1.03 usr + 0.00 sys = 1.03 CPU) @ 1156857.28/s (n=1191563)
Rate sub1 sub2 str1 str2
sub1 3006237/s 0.00
sub2 1236527/s -58.87 0.00
str1 2279513/s -24.17 84.35 0.00
str2 1156857/s -61.52 -6.44 -49.25 0.00
48.92user 0.05system 0:50.58elapsed 96%CPU (0avgtext+0avgdata 0maxresident)k
0inputs+0outputs (256major+82minor)pagefaults 0swaps

- Barrie

###############################################################

Inline Patch

--- /var/perls/perl5.005_61/lib/Benchmark.pm	Tue Jul 20 13:17:59 1999
+++ Benchmark.pm	Thu Sep 16 23:23:51 1999
@@ -273,7 +273,9 @@

 sub debug { $debug = ($_[1] != 0); }

-sub clearcache    { delete $cache{$_[0]}; }
+# The cache needs two branches: 's' for strings and 'c' for code.  The
+# emtpy loop is different in these two cases.
+sub clearcache    { delete $cache{"$_[0]c"}; delete $cache{"$_[0]s"}; }
 sub clearallcache { %cache = (); }
 sub enablecache   { $cache = 1; }
 sub disablecache  { $cache = 0; }
@@ -362,11 +364,18 @@
     croak "runloop unable to compile '$c': $@\ncode: $subcode\n" if $@;
     print STDERR "runloop $n '$subcode'\n" if $debug;

-    $t0 = Benchmark->new(0);
+    # Wait for the user timer to tick.  This makes the error range more like -0.01, +0.  If
+    # we don't wait, then it's more like -0.01, +0.01.  This may not seem important, but it
+    # significantly reduces the chances of getting too low initial $n in the initial, 'find
+    # the minimum' loop in &runfor.  This, in turn, can reduce the number of calls to
+    # &runloop a lot, and thus reduce additive errors.
+    my $tbase = Benchmark->new(0)->[1];
+    do {
+       $t0 = Benchmark->new(0);
+    } while ( $t0->[1] == $tbase ) ;
     &$subref;
     $t1 = Benchmark->new($n);
     $td = &timediff($t1, $t0);

     timedebug("runloop:",$td);
     $td;
 }
@@ -377,12 +386,12 @@
     my($wn, $wc, $wd);

     printf STDERR "timeit $n $code\n" if $debug;

-    if ($cache && exists $cache{$n}) {
-	$wn = $cache{$n};
+    my $cache_key = $n . ( ref( $code ) ? 'c' : 's' ) ;
+    if ($cache && exists $cache{$cache_key} ) {
+	$wn = $cache{$cache_key};
     } else {
-	$wn = &runloop($n, '');
-	$cache{$n} = $wn;
+	$wn = &runloop($n, ref( $code ) ? sub {} : '' );
+	$cache{$cache_key} = $wn;
     }

     $wc = &runloop($n, $code);
@@ -414,24 +423,23 @@

     my ($n, $td, $tc, $ntot, $rtot, $utot, $stot, $cutot, $cstot );

-    # First find the minimum $n that gives a non-zero timing.
+    # First find the minimum $n that gives a significant timing.
     
     my $nmin;

-    for ($n = 1, $tc = 0; $tc <= 0; $n *= 2 ) {
+    for ($n = 1, $tc = 0; ; $n *= 2 ) {
 	$td = timeit($n, $code);
 	$tc = $td->[1] + $td->[2];
+	last if $tc > 0.1 ;
     }

     $nmin = $n;

     my $ttot = 0;
     my $tpra = 0.05 * $tmax; # Target/time practice.

     # Double $n until we have think we have practiced enough.
-    for ( $n = 1; $ttot < $tpra; $n *= 2 ) {
+    for ( ; $ttot < $tpra; $n *= 2 ) {
 	$td = timeit($n, $code);
-	$tc = $td->cpu_p;
 	$ntot += $n;
 	$rtot += $td->[0];
 	$utot += $td->[1];
@@ -446,7 +454,7 @@
     # Then iterate towards the $tmax.
     while ( $ttot < $tmax ) {
 	$r = $tmax / $ttot - 1; # Linear approximation.
-	$n = int( $r * $n );
+	$n = int( $r * $ntot );
 	$n = $nmin if $n < $nmin;
 	$td = timeit($n, $code);
 	$ntot += $n;

p5pRT · 2003-04-22T15:48:58Z

@iabyn - Status changed from 'stalled' to 'resolved'

p5pRT closed this as completed Apr 22, 2003

p5pRT added Severity Low documentation labels Oct 18, 2019

p5pRT mentioned this issue Oct 19, 2019

Program terminated with signal 11, Segmentation fault. #12469

Closed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

[PATCH 5.005_61] Fix, Re: mixed benchmarks [Was: Speed of f(), $o->m(), etc. in 03,57,58,61] #569

[PATCH 5.005_61] Fix, Re: mixed benchmarks [Was: Speed of f(), $o->m(), etc. in 03,57,58,61] #569

p5pRT commented Sep 20, 1999

p5pRT commented Sep 20, 1999

p5pRT commented Apr 22, 2003

[PATCH 5.005_61] Fix, Re: mixed benchmarks [Was: Speed of f(), $o->m(), etc. in 03,57,58,61] #569

[PATCH 5.005_61] Fix, Re: mixed benchmarks [Was: Speed of f(), $o->m(), etc. in 03,57,58,61] #569

Comments

p5pRT commented Sep 20, 1999

p5pRT commented Sep 20, 1999

From The RT System itself

p5pRT commented Apr 22, 2003