This patch implements a different experimental code layout.
It has only been implemented and tested on amd64, and is therefore disabled on other platforms.
Performance varies slightly by CPU/machine, but in general seems to be better.
While nofib shows only small differences of about +/- ~0.5% overall, depending on
flags/machine, performance in other benchmarks improved significantly.
In an effort to avoid accidental regressions benchmarks were run on 3 platforms:
- Sandy Bridge Xeon/Linux
I had to skip some benchmarks on some platforms, as getting them
to build was too much work.
Library benchmark results summarized:
- containers: ~1.5% faster
- aeson: ~2% faster
- megaparsec: ~2-5% faster
- xml library benchmarks: 0.2%-1.1% faster
- vector-benchmarks: 1-4% faster
- text: 5.5% faster
On average, GHC compile times across all of nofib also went down slightly, by about half a percent.
Things this patch does:
- Move the code responsible for block layout into its own module.
- Move the NcgImpl Class into NCGMonad.
- Extract a control flow graph (CFG) from the input Cmm.
- Update this CFG to keep it in sync with changes made during asm codegen. This has been tested on x64 but should work on x86. Other platforms still use the old code layout.
- Assign weights to the edges in the CFG which are then used for block layout.
- Once we have the final code layout, try to eliminate some redundant jumps.
  In particular, turn a sequence of:
      jne .foo
      jmp .bar
    .foo:
  into
      je .bar
    .foo:
      ..
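
The jump elimination in the last item can be sketched roughly as follows. This is a minimal illustration, not the code in the patch: the `Instr` and `Cond` types, `invert` and `shortcutJumps` are hypothetical stand-ins for the NCG's real instruction types, and only the `e`/`ne` conditions are modelled.

```haskell
-- Hypothetical, heavily simplified instruction type (not GHC's real Instr).
data Cond = E | NE
  deriving (Eq, Show)

data Instr
  = Jcc Cond String   -- conditional jump to a label
  | Jmp String        -- unconditional jump to a label
  | Label String      -- start of a block
  | Other String      -- any other instruction, kept opaque
  deriving (Eq, Show)

-- Invert a condition so the jump can target the other successor.
invert :: Cond -> Cond
invert E  = NE
invert NE = E

-- Rewrite   jcc l1; jmp l2; l1:   into   j(not cc) l2; l1:
-- The label itself is kept, since other jumps may still target it.
shortcutJumps :: [Instr] -> [Instr]
shortcutJumps (Jcc c l1 : Jmp l2 : Label l1' : rest)
  | l1 == l1' = Jcc (invert c) l2 : Label l1' : shortcutJumps rest
shortcutJumps (i : rest) = i : shortcutJumps rest
shortcutJumps []         = []
```

Applied to `[Jcc NE "foo", Jmp "bar", Label "foo"]`, this yields `[Jcc E "bar", Label "foo"]`. The rewrite only applies once the final block order is known, because it relies on the conditional jump's target being the block that immediately follows.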