rts: enable parallel GC scan of large (32M+) allocation area
Closed, Public

Authored by trofi on Aug 27 2016, 5:47 PM.

Details

Summary

The parallel GC does not scan a large allocation area (-A)
effectively, because by default it does not do work stealing
from the nursery.

That leads to a large imbalance when only one of the threads
overflows its allocation area: most GC threads finish
quickly (as there is not much to collect) and sit idle,
waiting while a single GC thread finishes scanning the
allocation area that belongs to the overflowing thread.

The patch enables work stealing (the equivalent of -qb0)
for an allocation area of -A32M or higher.

Tested on the highlighting-kate package from Trac #9221.

On an 8-core machine wall-clock time improves by around 5%.
On a 24-core VM the speedup is 20%.

Signed-off-by: Sergei Trofimovich <siarheit@google.com>

Test Plan

Measured wall-clock time and GC parallelism on a highlighting-kate build.

Diff Detail

Repository: rGHC Glasgow Haskell Compiler
Branch: T9221-par-gen0
Lint: Lint OK
Unit: No Unit Test Coverage
Build Status: Buildable 10824
Build 12881: [GHC] Linux/amd64: Patch building
Build 12880: arc lint + arc unit

trofi updated this revision to Diff 8497. Aug 27 2016, 5:47 PM
trofi retitled this revision to "rts: enable parallel GC scan of large (32M+) allocation area".
trofi updated this object.
trofi edited the test plan for this revision.
trofi added a reviewer: simonmar.
trofi updated the Trac tickets for this revision.
trofi updated this revision to Diff 8500. Aug 28 2016, 4:12 AM

Updated the commit description with the 24-core VM result.

trofi updated this object. Aug 28 2016, 4:14 AM
simonmar requested changes to this revision. Aug 29 2016, 2:51 AM

I wanted to do this slightly differently. With this patch, +RTS -qb1 -A32m will do something unexpected, and different from +RTS -A32m -qb1.

Instead, make parGcLoadBalancingGen == -1 mean "automatic, based on -A", and determine the correct setting in normaliseRtsOpts().

Obviously we need some changes to the docs too.

This revision now requires changes to proceed. Aug 29 2016, 2:51 AM
trofi updated this revision to Diff 8508. Aug 29 2016, 4:13 AM

Moved the defaulting logic to `normaliseRtsOpts()`
as suggested by Simon. Tweaked the help output and
user guide to mention that the `-qb` default now
depends on `-A`.
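
For readers following the thread, the idea of treating -1 as "automatic" and resolving it after option parsing can be sketched roughly as below. This is an illustration only, not the code from the patch: the `GcOpts` struct, the `PAR_GC_LB_AUTO` macro, and the `normalise_gc_opts()` function are hypothetical stand-ins for the real RTS flag structures and `normaliseRtsOpts()`, and the fallback of generation 1 for small allocation areas is an assumption about the earlier default.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical sentinel: -qb was not given, so derive the value from -A. */
#define PAR_GC_LB_AUTO (-1)

/* Threshold taken from the commit message: -A32M. */
#define LARGE_NURSERY_BYTES (32u * 1024u * 1024u)

/* Simplified stand-in for the relevant RTS flags (not the real layout). */
typedef struct {
    uint64_t alloc_area_bytes; /* value of -A, in bytes */
    int      par_gc_lb_gen;    /* value of -qb<gen>, or PAR_GC_LB_AUTO */
} GcOpts;

/* Resolve the automatic setting once, after all flags have been parsed.
 * Because the parser only records values and the decision is made here,
 * the result cannot depend on the order of -A and -qb. */
static void normalise_gc_opts(GcOpts *o)
{
    assert(o != NULL);
    if (o->par_gc_lb_gen == PAR_GC_LB_AUTO) {
        /* Large nursery: balance from generation 0 (like -qb0).
         * Otherwise fall back to generation 1 (assumed earlier default). */
        o->par_gc_lb_gen =
            (o->alloc_area_bytes >= LARGE_NURSERY_BYTES) ? 0 : 1;
    }
}
```

In this sketch an explicit `-qb1` survives normalisation whatever the size of `-A`, which is the property Simon asked for: `+RTS -qb1 -A32m` and `+RTS -A32m -qb1` end up with the same setting.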

bgamari accepted this revision. Aug 29 2016, 2:18 PM

Looks good to me.

docs/users_guide/runtime_control.rst, line 452

The second `default:` here is redundant.

trofi updated this revision to Diff 8516. Aug 29 2016, 4:09 PM

Dropped the redundant `default:` from the docs.

trofi marked an inline comment as done. Aug 29 2016, 4:11 PM
simonmar accepted this revision. Aug 30 2016, 5:40 AM

Yep. Thanks!

This revision is now accepted and ready to land. Aug 30 2016, 5:40 AM
This revision was automatically updated to reflect the committed changes.