Bitmap: Fix thunk explosion
ClosedPublic

Authored by bgamari on Jul 6 2015, 6:56 AM.

Details

Summary

Previously we would build up another map (-N) thunk
for every word in the bitmap. Now we strictly accumulate the position
and carry out a single `map (subtract` accum)``.

Bitmap.intsToBitmap showed up in the profile while compiling a testcase of Trac #7450 (namely a program containing a record type with large number of fields which derived Read). The culprit was CmmBuildInfoTables.procpointSRT.bitmap. On the testcase (with 4096 fields), the profile previously looked like,

	total time  =      307.94 secs   (307943 ticks @ 1000 us, 1 processor)
	total alloc = 336,797,868,056 bytes  (excludes profiling overheads)

COST CENTRE              MODULE              %time %alloc

lintAnnots               CoreLint             17.2   25.8
procpointSRT.bitmap      CmmBuildInfoTables   11.3   25.2
FloatOutwards            SimplCore             7.5    1.6
flatten.lookup           CmmBuildInfoTables    4.0    3.9
...

After this fix it looks like,

	total time  =      256.88 secs   (256876 ticks @ 1000 us, 1 processor)
	total alloc = 255,033,667,448 bytes  (excludes profiling overheads)

COST CENTRE              MODULE              %time %alloc

lintAnnots               CoreLint             20.3   34.1
FloatOutwards            SimplCore             9.1    2.1
flatten.lookup           CmmBuildInfoTables    4.8    5.2
pprNativeCode            AsmCodeGen            3.7    4.3
simplLetUnfolding        Simplify              3.6    2.2
StgCmm                   HscMain               3.6    2.1

Signed-off-by: Ben Gamari <ben@smart-cactus.org>

Test Plan

Validate

Diff Detail

Repository
rGHC Glasgow Haskell Compiler
Lint
Automatic diff as part of commit; lint not applicable.
Unit
Automatic diff as part of commit; unit tests not applicable.
bgamari updated this revision to Diff 3422.Jul 6 2015, 6:56 AM
bgamari retitled this revision from to Bitmap: Fix thunk explosion.
bgamari updated this object.
bgamari edited the test plan for this revision. (Show Details)
bgamari added a reviewer: simonpj.
bgamari updated the Trac tickets for this revision.
simonpj edited edge metadata.Jul 6 2015, 7:11 AM

Does this help? Hurrah if so.

Before you are done a serious Note to explain that this is a performance bottleneck would be a Good Thing

bgamari planned changes to this revision.Jul 6 2015, 7:40 AM

@simonpj, sadly no; I had a profile that seemed to suggest an improvement but I can no longer reproduce this result.

bgamari requested a review of this revision.EditedJul 6 2015, 8:05 AM
bgamari updated this object.
bgamari edited the test plan for this revision. (Show Details)
bgamari edited edge metadata.

Alright, never mind. The speed-up stands and I've documented it with a fresh measurement in the summary.

I suspect the cause of the reproducibility failure earlier was a poorly placed SCC that I had inserted.

bgamari edited the test plan for this revision. (Show Details)Jul 6 2015, 8:08 AM

Good! (250 sec vs 307 sec). Do make sure that those perf numbers end up in the Note in the code, though, with an indication of what code triggers the otherwise-pathalogical behaviour

bgamari updated this revision to Diff 3425.Jul 6 2015, 8:26 AM
  • Rework intsToReverseBitmap as well
bgamari updated this object.Jul 6 2015, 8:29 AM

It seems that things add up reasonably well. The 252 MB of allocations observed after the fix is almost exactly 75% of the initial 336 MB, implying that the majority of the allocations from procpointSRT.bitmap are gone. The time improved more than one would expect (from 308 to 257 seconds, whereas one would expect 274 seconds), although I believe this can be attributed to GC time (assuming that the "total time" shown in the profile output isn't just mutator time).

bgamari updated this revision to Diff 3427.Jul 6 2015, 10:50 AM
  • Add Note describing needed strictness
bgamari updated this object.Jul 6 2015, 10:52 AM
bgamari updated this object.
bgamari updated this object.
simonpj accepted this revision.Jul 8 2015, 5:59 PM
simonpj edited edge metadata.

Looks good to me!

This revision is now accepted and ready to land.Jul 8 2015, 5:59 PM

I've done a bit more benchmarking on the impact of this. The result can be found on #9669 comment #13.

This revision was automatically updated to reflect the committed changes.