WIP: Register Allocator Loop Annotations
Changes PlannedPublic

Authored by tjakway on Dec 28 2016, 4:28 PM.

Details

Reviewers
austin
bgamari
goldfire
Trac Issues
#13763
Summary

Make the register allocator aware of loops

Test Plan

Unit tests in regalloc_unit_tests.hs (WIP)

Diff Detail

Repository
rGHC Glasgow Haskell Compiler
Branch
tjakway/graph-loop-hoisting-counting
Lint
Lint WarningsExcuse: WIP
SeverityLocationCodeMessage
Warningcompiler/nativeGen/AsmCodeGen.hs-boot:17TXT3Line Too Long
Warningcompiler/nativeGen/AsmCodeGen.hs-boot:22TXT3Line Too Long
Warningcompiler/nativeGen/AsmCodeGen.hs-boot:23TXT3Line Too Long
Warningcompiler/nativeGen/AsmCodeGen.hs-boot:24TXT3Line Too Long
Warningcompiler/nativeGen/AsmCodeGen.hs-boot:25TXT3Line Too Long
Warningcompiler/nativeGen/RegAlloc/Graph/Main.hs:100TXT3Line Too Long
Warningcompiler/nativeGen/RegAlloc/Graph/Main.hs:137TXT3Line Too Long
Warningcompiler/nativeGen/RegAlloc/Graph/Main.hs:285TXT3Line Too Long
Warningcompiler/nativeGen/RegAlloc/Graph/SpillCost.hs:71TXT3Line Too Long
Warningcompiler/nativeGen/RegAlloc/Graph/SpillCost.hs:72TXT3Line Too Long
Warningcompiler/nativeGen/RegAlloc/Graph/SpillCost.hs:129TXT3Line Too Long
Warningcompiler/nativeGen/RegAlloc/Graph/SpillCost.hs:130TXT3Line Too Long
Warningcompiler/nativeGen/RegAlloc/Graph/SpillCost.hs:146TXT3Line Too Long
Warningcompiler/nativeGen/RegAlloc/Graph/SpillCost.hs:147TXT3Line Too Long
Warningcompiler/nativeGen/RegAlloc/Graph/SpillCost.hs:148TXT3Line Too Long
Warningcompiler/nativeGen/RegAlloc/Graph/SpillCost.hs:149TXT3Line Too Long
Warningcompiler/nativeGen/RegAlloc/Graph/SpillCost.hs:150TXT3Line Too Long
Warningcompiler/nativeGen/RegAlloc/Graph/SpillCost.hs:151TXT3Line Too Long
Warningcompiler/nativeGen/RegAlloc/Graph/SpillCost.hs:253TXT3Line Too Long
Warningcompiler/nativeGen/RegAlloc/LoopAnnotations.hs:155TXT3Line Too Long
Warningcompiler/nativeGen/RegAlloc/LoopAnnotations.hs:156TXT3Line Too Long
Warningcompiler/nativeGen/RegAlloc/LoopAnnotations.hs:174TXT3Line Too Long
Warningtestsuite/tests/regalloc/regalloc_unit_tests.hs:175TXT3Line Too Long
Warningtestsuite/tests/regalloc/regalloc_unit_tests.hs:179TXT3Line Too Long
Warningtestsuite/tests/regalloc/regalloc_unit_tests.hs:180TXT3Line Too Long
Unit
No Unit Test Coverage
Build Status
Buildable 13095
Build 18030: [GHC] Linux/amd64: Patch building
Build 18029: [GHC] OSX/amd64: Continuous Integration
Build 18028: [GHC] Windows/amd64: Continuous Integration
Build 18027: arc lint + arc unit
There are a very large number of changes, so older changes are hidden. Show Older Changes
tjakway updated this revision to Diff 10680.Jan 24 2017, 4:37 PM
tjakway edited edge metadata.

Integrating loop annotations with the linear register allocator plus general improvements.

tjakway updated this revision to Diff 10687.Jan 24 2017, 8:17 PM
tjakway edited edge metadata.

Added more comments

tjakway updated this revision to Diff 10707.Jan 25 2017, 4:16 PM
tjakway edited edge metadata.

fixed filtering for registers we can spill in chooseSpillSlowly

tjakway updated this revision to Diff 10721.Jan 26 2017, 11:26 AM
tjakway edited edge metadata.

choosing which register to spill using minimumBy instead of sortOn

tjakway updated this revision to Diff 10726.Jan 26 2017, 12:57 PM
tjakway edited edge metadata.

instead of dying on error now returning the first element if we dont have spill costs

tjakway updated this revision to Diff 10775.Jan 28 2017, 10:40 AM
tjakway edited edge metadata.

Now collecting loop annotations and spill costs in Linear.RegAllocStats so they can be used for unit tests

tjakway updated this revision to Diff 10918.Feb 3 2017, 3:39 PM
tjakway edited edge metadata.
  • fixing line length warnings
tjakway updated this revision to Diff 10942.Feb 4 2017, 12:57 PM
tjakway edited edge metadata.
  • refactored NcgImpl into its own module to eliminate AsmCodeGen.hs-boot
tjakway updated this revision to Diff 10947.Feb 4 2017, 7:13 PM
tjakway edited edge metadata.
  • updated todos

I saw you message @tjakway; I'll have a look at this tomorrow.

tjakway updated this revision to Diff 11002.Feb 7 2017, 8:28 AM
  • removed unused imports
  • applying lint suggestions
tjakway updated this revision to Diff 11007.Feb 7 2017, 2:13 PM
  • updated todos
tjakway updated this revision to Diff 11008.Feb 7 2017, 2:44 PM
  • fixed bug in pickSpill
  • fixed warnings
tjakway updated this revision to Diff 11019.Feb 7 2017, 8:42 PM
  • fixed comment

This is looking like a great start. However, a few things,

I'll happily do a more in-depth substantive review once the note is in place.

compiler/nativeGen/RegAlloc/Graph/SpillCost.hs
40

I think this would be better expressed as a record. Having a tuple of three Ints seems like a recipe for bugs. Moreover, a proper record would allow us to bang the fields, eliminating three indirections. Ideally SpillCostRecord would also be refactored, but this is something best left to another patch.

261

I guess this TODO just refers to tuning this heuristic?

compiler/nativeGen/RegAlloc/Linear/Main.hs
558 ↗(On Diff #11019)

Seems like another idea for future work would be to turn this Bool into a data AccessType = Read | Write

compiler/nativeGen/RegAlloc/Linear/Main.hs-boot
7 ↗(On Diff #11019)

Hmm, This hs-boot is quite unfortunate. I need to think about this.

compiler/nativeGen/RegAlloc/Linear/SpillCost.hs
36 ↗(On Diff #11019)

I suppose it would make it easier to tune. I think if we do give this a flag then we should document it, but do so out of sight in the "compiler debugging" section of the users guide.

40 ↗(On Diff #11019)

What are these Ints? Again, I would probably make this a proper type. It would likely be worth unboxing the Ints as well.

57 ↗(On Diff #11019)

Let's try to remain stylistically consistent with the rest of GHC. For instance,

linearSpill 
    :: (FR freeRegs, Instruction instr, Outputable instr)
    => Bool
    -> [VirtualReg]
    -> [instr]
    -> [RealReg]
    ...

Also, we need to document these arguments somewhere.

473 ↗(On Diff #11019)

Right, four spaces of indentation would be preferable here.

compiler/nativeGen/RegAlloc/LoopAnnotations.hs
57

This shouldn't be a haddock comment.

136

Again, you can't use -- ^ like this; in this case haddock will actually fail with a parse error.

142

This is a bit of a pet-peeve of mine: foldr is a poor in cases like this where we are building up a single "atomic" result which can't be deconstructed incrementally as we will build up a stack O(n) elements deep during evaluation. Really you want to foldl' here.

193

This is all a bit tricky to follow. In particular it's not clear to me why we drop the head. Perhaps it would help to have a comment with a running example which you could refer to in comments throughout the implementation.

bgamari requested changes to this revision.Feb 8 2017, 12:08 PM
This revision now requires changes to proceed.Feb 8 2017, 12:08 PM
tjakway updated this revision to Diff 12743.Jun 1 2017, 5:15 PM
tjakway edited edge metadata.
  • now exporting SpillLoc constructors
  • converted LoopInfo from a tuple to a record type
  • UniqFM -> UniqSet transition
  • fixed type annotation for linearSpillResult
  • wrote assertLinearSpills
  • wrote testSpillSingleCandidate
  • not testing unassigned regs because there might be unassigned regs that still aren't free
  • added stack map check and RegAllocStats check
  • renamed assertVRegSpilled -> checkVRegSpilled because it doesnt use assertIO
  • wrote assertEither and genCandidatesInBoth
  • added type annotations to top level filenames
  • refactored SpillCost data type and plusSpillCost into their own module SpillCostType
  • fixed haddock comment in canFallThrough
  • fixed haddock comment in annotateLoop
  • changed foldr -> foldl' in annotateLoop
  • explaining why we drop head
  • made spacing more consistent with GHC (4 spaces from the left)

RegAlloc unit tests are failing right now

bgamari requested changes to this revision.Jun 21 2017, 2:49 PM

Bumping out of the review queue. Let me know if you have any questions.

This revision now requires changes to proceed.Jun 21 2017, 2:49 PM

@tjakway, how is this going? Is there any way I can help?

bgamari updated the Trac tickets for this revision.Jul 28 2017, 9:54 AM
tjakway updated this revision to Diff 14342.EditedOct 15 2017, 2:07 PM

This patch was dormant for months because I couldn't figure out why the testsuite was failing and was busy with other things. I figured it out as I was writing an email to update ghc-devs [1] and set to work again. I've since written more tests and re-run the benchmarks.

[1] I misinterpreted 'allocs', the first item of Linear.ra_spillInstrs. I thought it meant which hreg was alloced, it actually means which vreg we're trying to allocate an hreg for.

  • changing to work with the Hoopl update
  • wrote instrSpillsHReg
  • remove redundant constraint & remove unused import
  • changed checkVRegAlloced to return an Either String () instead of a Bool
  • changed stackMapNotEmpty -> Either String ()
  • changed checkInstrsSpillHReg to return Right () instead of Bool
  • removed obsolete warning from genCandidatesInBoth
  • resolved ambiguity between Prelude.<> and Outputable.<>
  • changed free regs check from Bool -> Either String ()
  • changed checkVRegAlloced -> Either String ()
  • changed usage of checkSpillInstrs in assertLinearSpills to match the new Either String () test signature
  • changed testResults to match the new test format
  • removed unused uniqToRand function
  • fixed name shadowing in checkFreeRegs
  • wrote assertEithers
  • fixed hasRight (was causing a Prelude.head: empty list exception)
  • changed if x == True -> if x
  • applied hlint v2.0.9 suggestions
  • changed ways to only run on x86 or x86_64 and changed entry comment to reflect that
  • now hinting to the test runner that this test may use a lot of memory (relatively speaking)
tjakway added a comment.EditedOct 15 2017, 2:31 PM

I ran benchmarks comparing this patch vs. master (2be55b85):

  • with no flags
  • with -fannotate-loops (linear allocator + enable optimization)
  • with -fregs-graph -fannotate-loops (graph allocator + enable optimization)

The benchmark on master was not run with any extra flags.
All the percents are vs. running nofib on master.

Some observations:

  • The majority of the speed-up seems to be from RegAlloc.Linear.SpillCost, which ironically I only wrote because weighting loops requires some measure of spill cost in the first place. It just counts how many times a register was used in a basic block then spills the least-used one, which appears to give a speed-up of 4.1%.
  • Patch with loop annotations had a slightly worse mean runtime (-4.1%) vs. patch without loop annotations (-4.3%) but a slightly better min (-51.6% vs. -50.3%). Not sure if this is close enough to be definitive.
  • Code size was not affected, which is good. The patch doesn’t change the number of spills so if we ended up with larger executables that would be strange.
  • Is something up with exact-reals? Got
warning: dubious run time results for exact-reals [...]
warning: dubious mut time results for exact-reals [...]{F1492031}

at the top of all the comparisons. I’ll have to look at that.

  • The graph allocator with the optimization is slower (-3.4%) than both linear runs (-4.1% and -4.3%) but still better than master.

tjakway added a comment.EditedOct 15 2017, 2:37 PM

Also: will need to verify this, but it shouldn't slow down compilation if -O <2 because the spillQuickly path will be taken through linearSpill which preserves the behavior of HEAD (i.e. spill whichever reg we find first without performing any cost measures).

tjakway updated this revision to Diff 14343.EditedOct 15 2017, 3:28 PM
  • fix 'Defined but not used: ‘reading’'

Yay! I'm happy to see you pick this up again.

Patch with loop annotations had a slightly worse mean runtime (-4.1%) vs. patch without loop annotations (-4.3%) but a slightly better min (-51.6% vs. -50.3%). Not sure if this is close enough to be definitive.

Interesting. Were there significant regressions due to the annotations?

Yay! I'm happy to see you pick this up again.

Patch with loop annotations had a slightly worse mean runtime (-4.1%) vs. patch without loop annotations (-4.3%) but a slightly better min (-51.6% vs. -50.3%). Not sure if this is close enough to be definitive.

Interesting. Were there significant regressions due to the annotations?

The relevant data from master_vs_annotate_loops.txt:

           Min           -51.6%
           Max            +5.6% 
Geometric Mean            -4.1%

The benchmarks with regressions are:

  • CSD (+2.0%)
  • fannkuch-redux (+1.2%)
  • integer (only +0.1%)
  • mate (+5.6%)
  • wheel-sieve1 (+1.0%)

I'll have to look at the Cmm to puzzle that out but it suggests not putting -fannotate-loops in -O2, at least for now (which the patch doesn't). This might seem a little confusing because the patch does change the behavior of -O2 without any other flags, but by enabling spill costs, which had a max of +1.7%, a min of -50.3%, and a geometric mean of -4.3%. The patch also adds the -fannotate-loops flag, which the user must manually pass. ... Let me know if this is still unclear.

...

I'll have to look at the Cmm to puzzle that out but it suggests not putting -fannotate-loops in -O2, at least for now (which the patch doesn't). This might seem a little confusing because the patch does change the behavior of -O2 without any other flags, but by enabling spill costs, which had a max of +1.7%, a min of -50.3%, and a geometric mean of -4.3%. The patch also adds the -fannotate-loops flag, which the user must manually pass. ... Let me know if this is still unclear.

Quite clear, thanks. A flag likely for changing the spill cost model will likely would be a good idea.

Also, quite impressive numbers. Good work!

tjakway planned changes to this revision.EditedOct 16 2017, 1:45 PM

TODO: add a flag -fno-spill-costs to allow current behavior and update docs

I haven't reviewed the patch, but just to note that relying on the geometric mean of normalised results can be misleading. Browsing through the results it looks like

  • Many programs are 1-3% faster
  • One program is 12% faster
  • One program is 50% faster (exact-reals)

I would double-check the two programs that got a lot faster and ensure that the speedup is legit.

Nevertheless, these are good results, nice work!

How are things going here, @tjakway? Let me know if you would like review or help.

First of this is a great idea!

For the new fancy code layout I've introduced a control flow representation into the native GHC backend.
See commit 912fd2b6ca0bc51076835b6e3d1f469b715e2760 for the commit. I think we should aim to extend the CFG representation there to allow tagging of loops. That way we can also use the information for code layout purposes.

Ideally this would be extended to a more principled static analysis eventually which could add more information by for example estimating block entry rates.

I haven't look at the code here in detail but if you plan to pick this up again a few things stood out to me.

  • There seems to be no "big picture" description here. I think having a slightly more detailed "elevator pitch" in a Note would be worth having.
  • Use LabelMap for maps from BlockId -> X
  • Loops can easily be longer than 10 blocks. I imagine loop detection is a common need for compilers maybe there are algorithms that can be used that will scale even for large functions? Doesn't have to be in the first iteration though.
  • If you keep the arbitrary cutoff make it larger than 10 and it should definitely be a flag.
  • How does your current algorithm deal with irreducible control flow graphs? Iirc they are allowed in Cmm.

It's a great idea and I hope you get around to picking it back up.