Stabilise benchmarks wrt. GC
Abandoned, Public

Authored by sgraf on Dec 11 2018, 9:53 AM.

Details

Summary

This is currently a work in progress on Trac #15999, a follow-up to Trac #5793 and Trac #15357.

As this patch will change some benchmarks (e.g. wheel-sieve1, awards)
rather drastically, I wanted to get some early feedback on it rather than
quietly invest hours of work into a patch that might never have had a chance
of being accepted in the first place.

The general plan is outlined in Trac #15999: identify GC-sensitive benchmarks by looking at how their productivity rates change across different nursery sizes, then iterate the main logic of these benchmarks often enough that the wibbles go away.
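A minimal sketch of the identification step, assuming productivity is taken as mutator time divided by mutator plus GC time (roughly what `+RTS -s` reports as Productivity) and that each benchmark has been run with several nursery sizes via `+RTS -A`. The `Sample` type, the 5-percentage-point threshold and the sample values are hypothetical illustrations, not part of nofib or of this patch.

```haskell
-- Sketch: flag a benchmark as GC-sensitive when its productivity varies
-- noticeably across nursery sizes. The threshold is a hypothetical choice.

-- | One measurement: nursery size in bytes (as passed via +RTS -A) and the
-- mutator and GC CPU times in seconds (as reported by +RTS -s).
data Sample = Sample
  { nurseryBytes :: Int
  , mutatorSecs  :: Double
  , gcSecs       :: Double
  }

-- | Productivity: fraction of (mutator + GC) time spent in the mutator.
productivity :: Sample -> Double
productivity s = mutatorSecs s / (mutatorSecs s + gcSecs s)

-- | Spread of productivity over all samples (max minus min).
productivitySpread :: [Sample] -> Double
productivitySpread [] = 0
productivitySpread ss = maximum ps - minimum ps
  where
    ps = map productivity ss

-- | Hypothetical criterion: more than 5 percentage points of spread.
isGcSensitive :: [Sample] -> Bool
isGcSensitive ss = productivitySpread ss > 0.05

main :: IO ()
main = do
  -- Illustrative inputs only, not real measurements.
  let samples =
        [ Sample (1 * 1024 * 1024) 0.80 0.20
        , Sample (4 * 1024 * 1024) 0.80 0.05
        , Sample (16 * 1024 * 1024) 0.80 0.02
        ]
  print (map productivity samples)
  print (isGcSensitive samples)
```

A benchmark whose productivity barely moves with the nursery size is presumably not worth iterating for stability; one with a large spread is a candidate.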

I made sure that the benchmarked logic actually runs $n times as often.
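A minimal sketch of iterating a benchmark's main logic n times. The names `kernel` and `runIterations`, and the convention of passing n as the first command-line argument, are hypothetical; real nofib benchmarks each have their own input handling. Combining the per-iteration results into one printed value keeps the expected stdout to a single line while still demanding every iteration.

```haskell
import Data.List (foldl')
import System.Environment (getArgs)

-- | Hypothetical stand-in for a benchmark's real work. It takes the
-- iteration index, so every iteration computes something distinct.
kernel :: Int -> Int
kernel i = foldl' (+) 0 [ (x * i) `mod` 7 | x <- [1 .. 100000] ]

-- | Run the kernel for iterations 1..n and sum the results, so all n runs
-- are demanded while the program still prints a single line.
runIterations :: Int -> Int
runIterations n = foldl' (+) 0 (map kernel [1 .. n])

main :: IO ()
main = do
  args <- getArgs
  -- Hypothetical convention: the iteration count is the first argument.
  let n = case args of
        (s : _) -> read s
        []      -> 1
  print (runIterations n)
```

For example, running the compiled program as `./Main 50` would execute the kernel 50 times and print one number.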

Where I found benchmarks with insignificant runtime (Trac #15357), I adjusted their parameters or input files so that the runtime in each mode falls within the ranges described in https://ghc.haskell.org/trac/ghc/ticket/15357#comment:4.
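One way to pick a new iteration count, sketched under the assumption that runtime grows roughly linearly with the iteration count. `scaleIterations`, `withinBudget` and the example budget are hypothetical; the actual target ranges are the ones given in the linked Trac comment.

```haskell
-- | Predict the iteration count that brings the runtime close to a target,
-- assuming runtime grows linearly with the iteration count.
scaleIterations :: Int -> Double -> Double -> Int
scaleIterations baseIters measuredSecs targetSecs =
  max 1 (round (fromIntegral baseIters * targetSecs / measuredSecs))

-- | Check whether a measured runtime lies inside a (lo, hi) budget.
withinBudget :: (Double, Double) -> Double -> Bool
withinBudget (lo, hi) t = lo <= t && t <= hi

main :: IO ()
main = do
  -- Hypothetical budget and measurement, for illustration only.
  let budget   = (0.5, 2.0)
      measured = 0.05 -- seconds for 10 iterations
      newIters = scaleIterations 10 measured 1.0
  print newIters
  print (withinBudget budget measured)
```

After rescaling, the benchmark should be re-measured, since GC behaviour and caching can make the scaling noticeably non-linear.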

This is what I did so far:

  • Stabilise gen_regexp
  • Stabilise primes
  • Stabilise wheel-sieve1
  • Stabilise wheel-sieve2
  • Adjust running time of x2n1
  • Adjust running time of ansi
  • Adjust running time of atom
  • Make awards benchmark something other than IO
  • Adjust running time of banner
  • Stabilise boyer
  • Adjust running time of boyer2
  • Adjust running time of queens
  • Adjust running time of calendar
  • Adjust runtime of cichelli
  • Stabilise circsim
  • Stabilise clausify
  • Stabilise constraints with moderate success
  • Adjust running time of cryptarithm1
  • Adjust running time of cryptarithm2
  • Adjust running time of cse
  • Adjust running time of eliza
  • Adjust running time of exact-reals
  • Adjust running time of expert
  • Stabilise fft2
  • Stabilise fibheaps
  • Stabilise fish
  • Adjust running time for gcd
  • Stabilise comp_lab_zift
  • Stabilise event
  • Stabilise fft
  • Stabilise genfft
  • Stabilise ida
  • Adjust running time for listcompr
  • Adjust running time for listcopy
  • Adjust running time of nucleic2
  • Attempt to stabilise parstof
  • Stabilise sched
  • Stabilise solid
  • Adjust running time of transform
  • Adjust running time of typecheck
  • Stabilise wang
  • Stabilise wave4main
  • Adjust running time of integer
  • Adjust running time of knights
  • Stabilise lambda
  • Stabilise lcss
  • Stabilise life
  • Stabilise mandel
  • Stabilise mandel2
  • Adjust running time of mate
  • Stabilise minimax
  • Adjust running time of multiplier
  • Adjust running time of para
  • Stabilise power
  • Adjust running time of primetest
  • Stabilise puzzle with mild success
  • Adjust running time for rewrite
  • Stabilise simple with mild success
  • Stabilise sorting
  • Stabilise sphere
  • Stabilise treejoin

Problematic benchmarks:

  • last-piece: Unclear how to stabilise. It runs for 300ms, and I can't come up with smaller inputs because I don't understand what it does.
  • pretty: It's just much too small to be relevant at all. Maybe we want to get rid of this one?
  • scc: Same as pretty. The input graph for which SCC analysis is done is much too small and I can't find good directed example graphs on the internet.
  • secretary: Apparently this needs -package random and consequently hasn't been run for a long time.
  • simple: Same as last-piece. Decent runtime (70ms), but it's unstable and I see no way to iterate it ~100 times in fast mode.
  • eff: Every benchmark is problematic here. Not from the point of view of allocations, but because the actual logic is vacuous. IMO, these should be performance tests, not actual benchmarks. Alternatively, write an actual application that makes use of algebraic effects.
  • maillist: Too trivial. It's just String/list manipulation and not representative of any Haskell code we would write today (it uses no base library functions that could be fused, and uses String instead of Text). It's only 75 LOC according to cloc; that's not a real application.

Diff Detail

Repository: rNOFIB nofib
Branch: stabilise
Lint: No Linters Available
Unit: No Unit Test Coverage
Build Status: Buildable 25437 (Build 64551: arc lint + arc unit)
sgraf created this revision. Dec 11 2018, 9:53 AM
Owners added a reviewer: Restricted Owners Package. Dec 11 2018, 9:53 AM
sgraf updated this revision to Diff 19121. Dec 12 2018, 10:44 AM
  • Adjust running time of calendar
  • Adjust runtime of cichelli
  • Stabilise circsim
  • Stabilise clausify
  • Stabilise constraints with moderate success
  • Adjust running time of cryptarithm1
  • Adjust running time of cryptarithm2
  • Adjust running time of cse
  • Adjust running time of eliza
  • Adjust running time of exact-reals
  • Adjust running time of expert
  • Stabilise fft2
sgraf edited the summary of this revision. Dec 12 2018, 10:44 AM
sgraf edited the summary of this revision.

One thing to watch out for is full laziness floating out the actual work and then sharing it between iterations. (I've seen that in one of the nofib benchmarks when I adjusted the runtimes.)

Assuming you only stabilize very GC-heavy benchmarks, simply disabling full laziness for these should be fine if you run into that.
But it looks like you deal with that by always making use of the iteration variable, which seems like a better solution and should also work.
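A sketch of the pitfall and of both remedies discussed in this comment. `expensive`, `risky` and `safe` are hypothetical names; whether GHC actually floats the loop-invariant call depends on the optimisation level and the surrounding code, so treat this as an illustration rather than a guaranteed reproduction.

```haskell
{-# OPTIONS_GHC -O2 #-}
-- Blunt alternative: add {-# OPTIONS_GHC -fno-full-laziness #-} to the
-- benchmark module so GHC does not float such expressions out at all.
import Data.List (foldl')

-- | Hypothetical stand-in for the benchmark's real work.
expensive :: Int -> Int
expensive seed = foldl' (+) 0 [ (x * seed) `mod` 7 | x <- [1 .. 100000] ]

-- | Risky: 'expensive 42' does not mention the loop variable, so with
-- -ffull-laziness GHC may float it out and compute it only once.
risky :: Int -> Int
risky n = foldl' (\acc _ -> acc + expensive 42) 0 [1 .. n]

-- | Safer: each iteration passes its index to 'expensive', so there is no
-- loop-invariant call to float out and share between iterations.
safe :: Int -> Int
safe n = foldl' (\acc i -> acc + expensive i) 0 [1 .. n]

main :: IO ()
main = do
  print (risky 10)
  print (safe 10)
```

Both functions return correct results either way; the difference only shows up in how much work is actually performed, which is exactly what a benchmark is supposed to measure.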

sgraf added a comment. Dec 13 2018, 4:04 AM

Yes, exactly. This makes the whole process somewhat tedious. Along the way, I also fixed benchmarks like https://github.com/ghc/nofib/blob/f87d446b4e361cc82f219cf78917db9681af69b3/spectral/awards/Main.hs#L68, which measure nothing but IO.

sgraf updated this revision to Diff 19151. Dec 14 2018, 7:29 AM
sgraf edited the summary of this revision.
  • Stabilise fibheaps
  • Stabilise fish
  • Adjust running time for gcd
  • Stabilise comp_lab_zift
  • Stabilise event
  • Stabilise fft
  • Stabilise genfft
  • Stabilise ida
  • Adjust running time for listcompr
  • Adjust running time for listcopy
  • Adjust running time of nucleic2
  • Attempt to stabilise parstof
  • Stabilise sched
  • Stabilise solid
  • Adjust running time of transform
  • Adjust running time of typecheck
  • Stabilise wang
  • Stabilise wave4main
  • Adjust running time of integer
  • Adjust running time of knights
  • Stabilise lambda
  • Stabilise lcss
  • Stabilise life
  • Stabilise mandel
  • Stabilise mandel2
  • Adjust running time of mate
  • Stabilise minimax
  • Adjust running time of multiplier
  • Adjust running time of para
  • Stabilise power
  • Adjust running time of primetest
  • Stabilise puzzle with mild success
  • Adjust running time for rewrite
  • Stabilise simple with mild success
  • Stabilise sorting
  • Stabilise sphere
  • Stabilise treejoin
sgraf edited the summary of this revision. Dec 14 2018, 7:30 AM
sgraf edited the summary of this revision. Dec 14 2018, 7:36 AM
sgraf retitled this revision from "Stabilise GC wibbly benchmarks" to "Stabilise benchmarks wrt. GC". Dec 21 2018, 3:59 AM
sgraf edited the summary of this revision.
sgraf added a comment. Dec 21 2018, 4:02 AM

I'm finally done with this. Hooray! But when uploading my final set of changes, Phabricator seems unable to create the new diff, probably because it's too large.

>>> [80] (+73,914) <http> https://phabricator.haskell.org/api/differential.creatediff
<<< [80] (+100,505) <http> 26,590,234 us

[2018-12-21 09:54:12] EXCEPTION: (HTTPFutureHTTPResponseStatus) [HTTP/502]
<html>
<head><title>502 Bad Gateway</title></head>
<body bgcolor="white">
<center><h1>502 Bad Gateway</h1></center>
<hr><center>nginx/1.9.15</center>
</body>
</html> at [<phutil>/src/future/http/BaseHTTPFuture.php:351]
arcanist(), phutil()
  #0 BaseHTTPFuture::parseRawHTTPResponse(string) called at [<phutil>/src/future/http/HTTPSFuture.php:418]
  #1 HTTPSFuture::isReady() called at [<phutil>/src/future/Future.php:37]
  #2 Future::resolve(NULL) called at [<phutil>/src/future/FutureProxy.php:34]
  #3 FutureProxy::resolve() called at [<phutil>/src/conduit/ConduitClient.php:64]
  #4 ConduitClient::callMethodSynchronous(string, array) called at [<arcanist>/src/workflow/ArcanistDiffWorkflow.php:519]
  #5 ArcanistDiffWorkflow::run() called at [<arcanist>/scripts/arcanist.php:394]
sgraf edited the summary of this revision. Dec 21 2018, 4:15 AM
sgraf edited the summary of this revision.
sgraf added a comment. Dec 21 2018, 4:24 AM
Following up on D5438#150861 (my comment above):

Added a separate diff for part 2 in D5468.

sgraf updated this revision to Diff 19213. Dec 21 2018, 5:43 AM
sgraf edited the summary of this revision.
  • Stabilise paraffins even more
sgraf updated this revision to Diff 19222. Dec 21 2018, 10:03 AM
  • Fix stdout files and RTS flags of paraffins