DO NOT MERGE - Experimental code layout
Needs Review (Public)

Authored by AndreasK on Wed, May 23, 3:57 PM.

Details

Reviewers
bgamari
jmct
jrtc27
Trac Issues
#15124
Summary

This patch implements a different experimental code layout.
It has only been finished/tested for amd64.

Performance is slightly better (but within the margin of error)
on my machine. However, compile times are definitely worse, so
until something else can take advantage of this, it is not worth
implementing fully.

Things this patch does:

  • Move the code responsible for block layout into its own module.
  • Move the NcgImpl Class into NCGMonad.
  • Extract a control flow graph (CFG) from the input Cmm.
  • Update this CFG to keep it in sync with changes during asm codegen. I have only done the parts used by amd64, so this is definitely broken for x86 and likely broken on the other backends. Things that change the CFG are:
    • Blocks added by the linear register allocator.
    • Blocks added by the Cmm -> [Instr] translation.
    • Shortcutting.
  • Assign weights to the edges in the CFG which are then used for block layout.
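To make the list above concrete, here is a minimal sketch of a weighted CFG, two of the updates that keep it in sync, and a greedy layout driven by edge weights. It is an illustration only, not the code in this patch: `BlockId`, `Weight`, `CFG`, `addEdge`, `splitEdge`, `shortcut` and `layout` are hypothetical names, blocks are plain `Int`s, and the weights are made up.

```haskell
import qualified Data.Map.Strict as M
import qualified Data.Set as S
import Data.List (sortBy)
import Data.Ord (comparing, Down(..))

-- Hypothetical stand-ins; the patch's real types are not shown here.
type BlockId = Int
type Weight  = Int
type CFG     = M.Map BlockId (M.Map BlockId Weight)

-- | Add (or overwrite) a weighted edge; both endpoints become nodes.
addEdge :: BlockId -> BlockId -> Weight -> CFG -> CFG
addEdge from to w =
  M.insertWith M.union to M.empty . M.insertWith M.union from (M.singleton to w)

-- | A new block is placed on an existing edge (as a register allocator
-- might do for fix-up code): from -> to becomes from -> new -> to,
-- with both new edges carrying the old weight.
splitEdge :: BlockId -> BlockId -> BlockId -> CFG -> CFG
splitEdge from to new cfg = case M.lookup from cfg >>= M.lookup to of
  Nothing -> cfg
  Just w  -> addEdge from new w . addEdge new to w
               $ M.adjust (M.delete to) from cfg

-- | Shortcutting: 'mid' only jumps to 'dest' (assumed distinct), so every
-- edge p -> mid is redirected to p -> dest and 'mid' is dropped.
shortcut :: BlockId -> BlockId -> CFG -> CFG
shortcut mid dest cfg = M.map redirect (M.delete mid cfg)
  where
    redirect succs = case M.lookup mid succs of
      Nothing -> succs
      Just w  -> M.insertWith (+) dest w (M.delete mid succs)

-- | Greedy layout: from the entry block, keep following the heaviest edge
-- to a block that is not placed yet, so the hottest successor becomes the
-- fall-through. When a chain ends, restart at the smallest unplaced id.
layout :: BlockId -> CFG -> [BlockId]
layout entry cfg = chain entry (S.delete entry (M.keysSet cfg))
  where
    chain b todo = b : case next b todo of
      Just n  -> chain n (S.delete n todo)
      Nothing -> case S.minView todo of
        Just (n, rest) -> chain n rest
        Nothing        -> []
    -- Heaviest still-unplaced successor of b, if any.
    next b todo =
      case sortBy (comparing (Down . snd))
             [ e | e@(s, _) <- M.toList (M.findWithDefault M.empty b cfg)
                 , s `S.member` todo ] of
        ((s, _) : _) -> Just s
        []           -> Nothing
```

For a diamond built with `addEdge 1 2 10 . addEdge 1 3 90 . addEdge 2 4 100 . addEdge 3 4 100 $ M.empty`, `layout 1` yields `[1,3,4,2]`: the 90-weight branch falls through and the cold block 2 ends up last. A real layout pass would have more to consider (for example info tables and loops), which this sketch ignores.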
Test Plan

ci

Diff Detail

Repository
rGHC Glasgow Haskell Compiler
Branch
layoutOpt
Lint
Lint Warnings
Excuse: wip

Severity  Location                                Code  Message
Warning   compiler\nativeGen\AsmCodeGen.hs:250    TXT3  Line Too Long
Warning   compiler\nativeGen\AsmCodeGen.hs:304    TXT3  Line Too Long
Warning   compiler\nativeGen\AsmCodeGen.hs:588    TXT3  Line Too Long
Warning   compiler\nativeGen\AsmCodeGen.hs:668    TXT3  Line Too Long
Warning   compiler\nativeGen\AsmCodeGen.hs:704    TXT3  Line Too Long
Warning   compiler\nativeGen\BlockLayout.hs:374   TXT3  Line Too Long
Warning   compiler\nativeGen\BlockLayout.hs:393   TXT3  Line Too Long
Warning   compiler\nativeGen\CFG.hs:124           TXT3  Line Too Long
Warning   compiler\nativeGen\NCGMonad.hs:74       TXT3  Line Too Long
Warning   compiler\nativeGen\NCGMonad.hs:79       TXT3  Line Too Long
Warning   compiler\nativeGen\NCGMonad.hs:80       TXT3  Line Too Long
Warning   compiler\nativeGen\NCGMonad.hs:82       TXT3  Line Too Long
Warning   compiler\nativeGen\NCGMonad.hs:83       TXT3  Line Too Long
Warning   compiler\nativeGen\X86\CodeGen.hs:1972  TXT3  Line Too Long
Warning   compiler\nativeGen\X86\CodeGen.hs:2150  TXT3  Line Too Long
Unit
No Unit Test Coverage
Build Status
Buildable 20961
Build 46744: [GHC] Linux/amd64: Continuous Integration
Build 46743: [GHC] OSX/amd64: Continuous Integration
Build 46742: [GHC] Windows/amd64: Continuous Integration
Build 46741: arc lint + arc unit
AndreasK created this revision. Wed, May 23, 3:57 PM
AndreasK planned changes to this revision. Wed, May 23, 6:24 PM

As it stands, performance is about the same, but compile times are slightly worse.

There are still some knobs to turn here, and it should also be more amenable to taking advantage of control flow information, e.g. from profiling.
Hopefully I will be able to base some other work on it later.

AndreasK updated this revision to Diff 16519. Thu, May 24, 2:49 AM

Some adjustments to the weights.

I ran nofib overnight and the results actually look quite good after a bit of tweaking!

IgnoreCal is this patch. Likely04 is this patch on top of D4327.

---------------------------------------------------------------------------------------------------------------------------
        Program           Size      Size    Allocs    Allocs   Runtime   Runtime   Elapsed   Elapsed  TotalMem  TotalMem
                     IgnoreCal  Likely04 IgnoreCal  Likely04 IgnoreCal  Likely04 IgnoreCal  Likely04 IgnoreCal  Likely04
---------------------------------------------------------------------------------------------------------------------------
            Min          -0.1%     -0.1%     -0.0%      0.0%     -3.1%     -5.3%     -3.0%     -6.2%     -1.7%     -0.7%
            Max          +0.0%     +0.0%      0.0%     +0.0%     +1.6%     +7.3%     +1.6%     +7.0%     +0.5%     +0.8%
 Geometric Mean          -0.0%     +0.0%     -0.0%     -0.0%     -0.3%     -1.1%     -0.5%     -1.2%     -0.0%     +0.0%

However, I want to get this tested on at least one or two other CPUs before I put more work into getting this ready to land.

Full nofib log:

AndreasK updated this revision to Diff 16569. Tue, May 29, 6:20 AM
  • Only look at reachable graphs when constructing the CFG.
  • Import cleanup.

cmmToCmm finds some dead blocks and removes conditional jumps to them.
However, they are still members of the map used by CmmGraph.

By using revPostOrder we avoid adding these dead blocks to the CFG.
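The effect can be sketched with a small depth-first traversal. This is a stand-alone illustration under stated assumptions, not GHC's `revPostOrder`: `reachableRPO` is a hypothetical helper, blocks are plain `Int`s, and the successor map stands in for the block map of a CmmGraph.

```haskell
import qualified Data.Map.Strict as M
import qualified Data.Set as S

-- Hypothetical stand-in for real block labels.
type BlockId = Int

-- | Blocks reachable from 'entry', in reverse post-order.
-- 'succs' plays the role of the graph's block map: it may still contain
-- dead blocks that no live block jumps to any more.
reachableRPO :: BlockId -> M.Map BlockId [BlockId] -> [BlockId]
reachableRPO entry succs = snd (visit (S.empty, []) entry)
  where
    visit (seen, acc) b
      | b `S.member` seen = (seen, acc)
      | otherwise =
          let (seen', acc') =
                foldl visit (S.insert b seen, acc)
                      (M.findWithDefault [] b succs)
          -- Prepending b after all its successors were visited turns the
          -- post-order into a reverse post-order.
          in (seen', b : acc')
```

With `M.fromList [(1,[2,3]),(2,[4]),(3,[4]),(4,[]),(5,[4])]`, where block 5 is dead, `reachableRPO 1` gives `[1,3,2,4]`. Block 5 is still a key of the map but never shows up in the traversal, so building the CFG from the traversal rather than from the map's keys leaves it out.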

AndreasK updated this revision to Diff 16591. Wed, May 30, 4:50 PM
  • Performance improvements
AndreasK updated this revision to Diff 16626. Fri, Jun 1, 5:16 AM
  • Trim edges in chain linking passes.