Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[PATCH 5.005_61] Fix, Re: mixed benchmarks [Was: Speed of f(), $o->m(), etc. in 03,57,58,61] #569

Closed
p5pRT opened this issue Sep 20, 1999 · 2 comments

Comments

@p5pRT
Copy link

p5pRT commented Sep 20, 1999

Migrated from rt.perl.org#1421 (status was 'resolved')

Searchable as RT1421$

@p5pRT
Copy link
Author

p5pRT commented Sep 20, 1999

From The RT System itself

Or, perhaps, just perhaps, it should DWIM.

Here's a patch that

- compares times from sub{...} tests to sub{} instead of to '',

- fixes a bug in the linear interpolation responsible for
  pathological looping (usually with very fast code),

- improves the accuracy of the time 0 sample, and

- reduces the number of separate timings that are added together
  for the final result when running a time limited test,
  reducing the additive errors that might accumulate.

The problem is that subroutine overhead is big compared to fast
snippets of code. This means that it can take a lot longer now
if you are testing, say, sub { $a = 1 ; } and you want 10 CPU
seconds.

It will make existing benchmark scripts crawl if they do use
time limited tests on fast subs. I don't know whether 'fixing'
Benchmark.pm will hurt too many people. If we want to do this,
maybe a deprecation cycle is in order?

Here's the 'test suite' I used in the example runs​:

my @​tests = (
  ['sub1', sub{ $a = 10 } ],
  ['sub2', sub{ $a = 10 ; $a = 10 } ],
  ['str1', '$a = 10' ],
  ['str2', '$a = 10 ; $a = 10' ],
);

for ( @​tests ) {
  push( @​$_, @​{timethis( $count, $_->[1], $_->[0] )} ) ;
}

Here's the output of _61's Benchmark as run on the above tests​:

[barries@​jester Benchmark]$ time perl -I /var/perls/perl5.005_61/lib ft -1
perl v5.00503, min of 1 seconds per test
  sub1​: 2 wallclock secs ( 1.01 usr + 0.00 sys = 1.01 CPU) @​ 492004.95/s (n=496925)
  sub2​: 2 wallclock secs ( 1.00 usr + 0.00 sys = 1.00 CPU) @​ 422276.00/s (n=422276)
  str1​: -2 wallclock secs ( 1.01 usr + 0.00 sys = 1.01 CPU) @​ 2197790.10/s (n=2219768)
  str2​: 0 wallclock secs ( 1.01 usr + 0.00 sys = 1.01 CPU) @​ 1139008.91/s (n=1150399)
  Rate sub1 sub2 str1 str2
  sub1 492004/s 0.00
  sub2 422275/s -14.17 0.00
  str1 2197790/s 346.70 420.46 0.00
  str2 1139008/s 131.50 169.73 -48.17 0.00
13.95user 0.00system 0​:16.57elapsed 84%CPU (0avgtext+0avgdata 0maxresident)k
0inputs+0outputs (252major+80minor)pagefaults 0swaps

Here's the output of the patched version. The patched version takes longer
to run because it has to do more iterations since the time accumulated per
iteration is less. **I don't know why it looks like the sub{} versions
run faster than the str versions​: they of course really do not.**

[barries@​jester Benchmark]$ time perl ft -1
perl v5.00503, min of 1 seconds per test
  sub1​: 2 wallclock secs ( 1.09 usr + 0.00 sys = 1.09 CPU) @​ 3006237.61/s (n=3276799)
  sub2​: 1 wallclock secs ( 1.06 usr + 0.00 sys = 1.06 CPU) @​ 1236527.36/s (n=1310719)
  str1​: 2 wallclock secs ( 1.23 usr + 0.00 sys = 1.23 CPU) @​ 2279513.01/s (n=2803801)
  str2​: 1 wallclock secs ( 1.03 usr + 0.00 sys = 1.03 CPU) @​ 1156857.28/s (n=1191563)
  Rate sub1 sub2 str1 str2
  sub1 3006237/s 0.00
  sub2 1236527/s -58.87 0.00
  str1 2279513/s -24.17 84.35 0.00
  str2 1156857/s -61.52 -6.44 -49.25 0.00
48.92user 0.05system 0​:50.58elapsed 96%CPU (0avgtext+0avgdata 0maxresident)k
0inputs+0outputs (256major+82minor)pagefaults 0swaps

- Barrie

###############################################################

Inline Patch
--- /var/perls/perl5.005_61/lib/Benchmark.pm	Tue Jul 20 13:17:59 1999
+++ Benchmark.pm	Thu Sep 16 23:23:51 1999
@@ -273,7 +273,9 @@

 sub debug { $debug = ($_[1] != 0); }

-sub clearcache    { delete $cache{$_[0]}; }
+# The cache needs two branches: 's' for strings and 'c' for code.  The
+# emtpy loop is different in these two cases.
+sub clearcache    { delete $cache{"$_[0]c"}; delete $cache{"$_[0]s"}; }
 sub clearallcache { %cache = (); }
 sub enablecache   { $cache = 1; }
 sub disablecache  { $cache = 0; }
@@ -362,11 +364,18 @@
     croak "runloop unable to compile '$c': $@\ncode: $subcode\n" if $@;
     print STDERR "runloop $n '$subcode'\n" if $debug;

-    $t0 = Benchmark->new(0);
+    # Wait for the user timer to tick.  This makes the error range more like -0.01, +0.  If
+    # we don't wait, then it's more like -0.01, +0.01.  This may not seem important, but it
+    # significantly reduces the chances of getting too low initial $n in the initial, 'find
+    # the minimum' loop in &runfor.  This, in turn, can reduce the number of calls to
+    # &runloop a lot, and thus reduce additive errors.
+    my $tbase = Benchmark->new(0)->[1];
+    do {
+       $t0 = Benchmark->new(0);
+    } while ( $t0->[1] == $tbase ) ;
     &$subref;
     $t1 = Benchmark->new($n);
     $td = &timediff($t1, $t0);

     timedebug("runloop:",$td);
     $td;
 }
@@ -377,12 +386,12 @@
     my($wn, $wc, $wd);

     printf STDERR "timeit $n $code\n" if $debug;

-    if ($cache && exists $cache{$n}) {
-	$wn = $cache{$n};
+    my $cache_key = $n . ( ref( $code ) ? 'c' : 's' ) ;
+    if ($cache && exists $cache{$cache_key} ) {
+	$wn = $cache{$cache_key};
     } else {
-	$wn = &runloop($n, '');
-	$cache{$n} = $wn;
+	$wn = &runloop($n, ref( $code ) ? sub {} : '' );
+	$cache{$cache_key} = $wn;
     }

     $wc = &runloop($n, $code);
@@ -414,24 +423,23 @@

     my ($n, $td, $tc, $ntot, $rtot, $utot, $stot, $cutot, $cstot );

-    # First find the minimum $n that gives a non-zero timing.
+    # First find the minimum $n that gives a significant timing.
     
     my $nmin;

-    for ($n = 1, $tc = 0; $tc <= 0; $n *= 2 ) {
+    for ($n = 1, $tc = 0; ; $n *= 2 ) {
 	$td = timeit($n, $code);
 	$tc = $td->[1] + $td->[2];
+	last if $tc > 0.1 ;
     }

     $nmin = $n;

     my $ttot = 0;
     my $tpra = 0.05 * $tmax; # Target/time practice.

     # Double $n until we have think we have practiced enough.
-    for ( $n = 1; $ttot < $tpra; $n *= 2 ) {
+    for ( ; $ttot < $tpra; $n *= 2 ) {
 	$td = timeit($n, $code);
-	$tc = $td->cpu_p;
 	$ntot += $n;
 	$rtot += $td->[0];
 	$utot += $td->[1];
@@ -446,7 +454,7 @@
     # Then iterate towards the $tmax.
     while ( $ttot < $tmax ) {
 	$r = $tmax / $ttot - 1; # Linear approximation.
-	$n = int( $r * $n );
+	$n = int( $r * $ntot );
 	$n = $nmin if $n < $nmin;
 	$td = timeit($n, $code);
 	$ntot += $n;

@p5pRT
Copy link
Author

p5pRT commented Apr 22, 2003

@iabyn - Status changed from 'stalled' to 'resolved'

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

1 participant