- User Since
- Nov 19 2016, 5:25 PM (96 w, 1 d)
Fri, Sep 14
I found some slowdowns in the binary benchmarks with this patch which can be traced back to how some Cmm files are compiled in the RTS.
Not sure if the resulting code is actually worse as that depends on the actual program, but it's definitely worse for the binary benchmarks.
Fri, Sep 7
This should probably be mentioned in the changelog, as it can break existing code (for example, it breaks the packman package).
Sun, Sep 2
- Update stats to account for OS X allocation results
Sat, Sep 1
- Massive speed improvements for new code layout in edge cases.
Thu, Aug 30
- Update flag documentation
- Remove unnecessary derived instances
Tue, Aug 28
Aug 23 2018
- Add copyright notice [skip ci]
Aug 22 2018
Aug 21 2018
- Update test allocations
Aug 20 2018
- Rename test
- Remove unused import
- Line widths and minor refactorings that go along with it.
- Update flag description, make vanilla default on old layout.
- Fix info table check
- Disable new layout code on unsupported target archs
Aug 18 2018
Thanks for the feedback!
- Update note
- Add regression test
Aug 15 2018
- Remove unused parameters/imports.
Aug 14 2018
- Be more aggressive in removing dead code.
- Add last2 to get the last two elements of a list efficiently
- Invert the conditional branch jump order at the asm level in order to eliminate one jump.
- Decouple conditional branch weights from Cmm branch order.
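A minimal sketch of what a last2 helper could look like, for readers of this feed. The Maybe-returning signature is an assumption made here for safety; the version in the patch may differ in type and in how it handles short lists.

```haskell
import Data.List (foldl')

-- | Return the last two elements of a list in a single pass.
-- Yields Nothing when the list has fewer than two elements.
-- (Hypothetical signature; not necessarily the one used in the patch.)
last2 :: [a] -> Maybe (a, a)
last2 (x : y : rest) = Just (foldl' step (x, y) rest)
  where
    -- Slide a two-element window one step to the right.
    step (_, prev) next = (prev, next)
last2 _ = Nothing
```

The single left fold avoids traversing the list twice, which is what a combination like `last` plus `last . init` would do.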
Aug 9 2018
- Remove space after assert
Jul 30 2018
- Only print nodes not covered by edges explicitly
- Small improvements from jmct's review
- Assert include fixup
- Remove unused bindings
Orthogonal to the issue at hand, but please keep within the line length limit unless there is a good reason not to.
Jul 24 2018
Jul 21 2018
- ASSERT that comparisons have been swapped as expected.
Jul 20 2018
- Fix a leftover GTT/GE mixup.
x < y == y > x instead of x < y == y >= x.
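The rule above can be sketched as a small model. The `Cond` type, its constructor names, and `evalCond` are hypothetical and only illustrate the operand-swap rule; they are not the definitions used in the native code generator.

```haskell
-- | Hypothetical condition codes for signed integer comparisons.
data Cond = CondLt | CondLe | CondGt | CondGe | CondEq | CondNe
  deriving (Eq, Show, Enum, Bounded)

-- | The condition to use after swapping the two operands.
-- Strict comparisons stay strict: x < y becomes y > x, never y >= x.
swapCond :: Cond -> Cond
swapCond CondLt = CondGt
swapCond CondLe = CondGe
swapCond CondGt = CondLt
swapCond CondGe = CondLe
swapCond CondEq = CondEq
swapCond CondNe = CondNe

-- | Evaluate a condition on two operands.
evalCond :: Cond -> Int -> Int -> Bool
evalCond CondLt = (<)
evalCond CondLe = (<=)
evalCond CondGt = (>)
evalCond CondGe = (>=)
evalCond CondEq = (==)
evalCond CondNe = (/=)
```

With this model, `evalCond c x y == evalCond (swapCond c) y x` holds for every condition and every pair of operands. The mixed-up mapping only gives a different answer when both operands are equal, which makes it easy to miss in testing.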
Jul 19 2018
At least imaginary/paraffins is still wrong with this patch.
Jul 15 2018
- Allow CFG weights to be specified dynamically for benchmarking (temporary)
Jul 12 2018
- Trim edges in chain linking passes.
- Explain branch weights, add backedge detection.
- Refactor where we look for loops
- Refactor some aspects of the layout algorithm and code.
- Change CFG weights, refactor CFG creation slightly.
- Ignore info tables
- Use edge weights for old algorithm too
Jul 10 2018
Jul 9 2018
- Remove redundant import(s)
- Replace a few more foldl instances.
- Use X namespace in GhcPrelude
- Incorporate phab feedback
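For context on the foldl replacements: the commit message does not say what they were replaced with, so the following assumes the usual switch to the strict `foldl'`. It is a minimal sketch with a hypothetical helper, not code from the patch.

```haskell
import Data.List (foldl')

-- | Count the elements that satisfy a predicate.
-- foldl' forces the accumulator at every step, so no chain of
-- unevaluated (+ 1) thunks builds up as it can with lazy foldl.
-- (countIf is a hypothetical helper used only for illustration.)
countIf :: (a -> Bool) -> [a] -> Int
countIf p = foldl' step 0
  where
    step n x
      | p x = n + 1
      | otherwise = n
```

For a strict accumulator such as an Int counter, `foldl'` gives the same result as `foldl` while running in constant space.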
Jul 6 2018
I tried adding the Length, Width and data type information without wrapping it in a Maybe, but that proved to be almost impossible.
Jul 4 2018
I plan to address the inline notes and incorporate the changes from the other foldl' patch.
Abandoned in favour of D4929
Jul 3 2018
- Remove redundant import
Jul 2 2018
Jul 1 2018
- Remove unused import
As soon as I get the time I will look at the liveness analysis. I find it highly unlikely that a simple Cabal function requires ~2200 spill slots.
Jun 30 2018
Jun 28 2018
Jun 22 2018
I've run nofib on a Xeon E3-1220 which jmct provided for a benchmark build, and the results seem to hold up.
Jun 21 2018
Jun 20 2018
Abandoned in favour of D4879.
- Revert back to using a fold for list insertion
Jun 19 2018
Jun 15 2018
Jun 14 2018
[14-Jun-18 16:07:59] <sjakobi> With the strictness changes in UniqFM, I'm actually getting a few "stat not good enough" failures for
[14-Jun-18 16:07:59] <sjakobi> TEST="T12227 T12545 T3064 T5030 T5321Fun T5631 T9872a T9872b T9872c"
This currently increases allocation. sjakobi was kind enough to upload the patch so I can check why.
More things to consider:
- CMOV doesn't support 8-bit operands.
- CMOV with a memory operand does NOT perform a conditional load. It performs a load followed by a conditional move.
- Some simple cases could be better encoded using things like setcc.
Jun 13 2018
Seems I was a bit sloppy with updating the docs when I extended rtsopts. Thanks a lot for cleaning this up!
Jun 12 2018
- Allocations are still far too high so there is some need for optimization.
- Currently only deals with integer conditions
- Assignments which read from memory should be directly translated to cmovs. Currently they also require an intermediate register.
- Some refactoring.
- Also tackle branching code where one branch is empty.
- Refactor CmmCondAssign type.
There is a bit of noise in the results but here are the nofib measurements.
The individual benchmarks listed are those where cmov was generated and
made a difference.
Jun 11 2018
I don't feel strongly about this either way.
But I think we should either dump this with verbose,
or otherwise change the user guide to mention that there is another pass which isn't dumped.
Assuming all changes in here are also in the other diff you could abandon this one to make it clear.
Jun 10 2018
It's nice how much duplication disappeared with that change. Good job :)