Add likely annotation to cmm files in a few obvious places.
ClosedPublic

Authored by AndreasK on Jan 17 2018, 2:54 PM.

Details

Summary

Provide information about paths more likely to be taken in
some cmm files used by the rts where it is obvious.

This leads to slightly better assembly being generated.

For the compacting GC (Compacting.cmm) the results for
nofib/gc/mutstore1 10000000 +RTS -c -s were:

|  Without likely info    |  With likely info       |

------------|-------------------------|-------------------------|
Total time | 47.250s ( 50.583s wall) | 45.656s ( 48.359s wall) |

Test Plan

ci

Diff Detail

Repository
rGHC Glasgow Haskell Compiler
Lint
Automatic diff as part of commit; lint not applicable.
Unit
Automatic diff as part of commit; unit tests not applicable.
AndreasK created this revision.Jan 17 2018, 2:54 PM
AndreasK updated this revision to Diff 15137.Jan 17 2018, 5:02 PM

Based on the performance difference (GETTAG(v) != 0) is likely to be true.

bgamari accepted this revision.Jan 17 2018, 6:21 PM
bgamari added inline comments.
rts/Updates.cmm
57

Is this really correct? It looks to me like this is a contention case.

This revision is now accepted and ready to land.Jan 17 2018, 6:21 PM
bgamari requested changes to this revision.Jan 17 2018, 6:22 PM

Didn't mean to accept.

This revision now requires changes to proceed.Jan 17 2018, 6:22 PM
AndreasK added inline comments.Jan 18 2018, 4:15 AM
rts/Updates.cmm
57

I thought so too. But setting it to true is faster in the tests I looked at.

Either way I want to look into it a bit more before it lands as this is true even in the singe threaded rts.

AndreasK added a comment.EditedJan 18 2018, 5:23 AM

So I looked into it and it's a Cmm/Codegen bug.

if (GETTAG(v) != 0) (likely: False) {
  foo;
}
bar;

Generates this assembly:

_cM:
	movq 8(%rbp),%rax
	movq 8(%rax),%rcx
	testb $7,%cl 
	je bar
foo:
	subq $8,%rsp

But what we want is for bar to be the fall-through case.
So setting likely to True was semantically false, but produced the correct code.

Compiling with -O0 produces the correct code. So likely a bug in the Cmm optimizer.

I opened a PR which should fix the bug (https://github.com/ghc/ghc/pull/97). If desired I can make it a differential later today.

AndreasK updated this revision to Diff 15144.Jan 18 2018, 11:11 AM

Adjustment to Updates.cmm

Depends on https://github.com/ghc/ghc/pull/97/commits to produce the intended code.

AndreasK marked 2 inline comments as done.Jan 18 2018, 11:11 AM
AndreasK edited the summary of this revision. (Show Details)Jan 29 2018, 6:39 PM
AndreasK edited the test plan for this revision. (Show Details)
This revision was automatically updated to reflect the committed changes.

Woah, I'm too late to the party to review this, but this is deeply suspicious:

For the compacting GC (Compacting.cmm) the results for
nofib/gc/mutstore1 10000000 +RTS -c -s were:

Without likely infoWith likely info

------------|-------------------------|-------------------------|
Total time | 47.250s ( 50.583s wall) | 45.656s ( 48.359s wall) |

because Compact.cmm has nothing to do with compacting GC, it's for compact regions (see the Data.Compact API). So I'm not sure where this difference came from? Perhaps from the changes in primops or updates?

Woah, I'm too late to the party to review this, but this is deeply suspicious:

For the compacting GC (Compacting.cmm) the results for
nofib/gc/mutstore1 10000000 +RTS -c -s were:

Without likely infoWith likely info

------------|-------------------------|-------------------------|
Total time | 47.250s ( 50.583s wall) | 45.656s ( 48.359s wall) |

because Compact.cmm has nothing to do with compacting GC, it's for compact regions (see the Data.Compact API). So I'm not sure where this difference came from? Perhaps from the changes in primops or updates?

I've looked into this again after your comment but didn't manage to reproduce that discrepancy again.

HEAD does produce different (suboptimal) assembly for the unannotated Update.cmm file than the annotated one.
But so far I'm not sure why and even so it didn't cause a difference as large when I retested.

I'm also not sure why there is a difference to begin with. There should be no way for it to invert unless it's tagged with an annotation or there are multiple blocks jumping into the false branch.
Both of which don't seem to be the case looking at the cmm dumps.

It's correct on 8.2.1 and when annotated on HEAD. But anyway that seems like a different issue.

The test uses quite a lot of RAM so my best bet is that either I had some build fragments I didn't intend to have or some low key background task kicked in and caused enough swapping to distort the result.
I've run both cases multiple times but not alternating.
So I ran:

  • build without patch
  • test1, test2, test3
  • build with patch
  • Run patched test1, test2, test3

I will look into the assembly difference. But I'm almost certain at this point (for better or worse) that it was a measurement error.

Regarding the assembly difference:

6d14c1485cb570cbd183bcdc0f858d9a6dc1eb31 changed it such that (a != 0) conditionals get inverted to (a == 0) and the branches swapped.

As we generate fall throughs for the code executed in the case that the conditional is false this has the side effect of changing which branch becomes the fallthrough.
So unless it was masked by something else this means the Update.cmm code was suboptimal since that commit landed.

Yeah, I'm suspicious of that change to CmmSink as I noted in the comment: https://phabricator.haskell.org/rGHC6d14c1485cb570cbd183bcdc0f858d9a6dc1eb31#inline-626

It's a bit of a blunt instrument given the more refined check we already have in CmmContFlowOpt. We should fix this - would you like to do it or shall I?

Yeah, I'm suspicious of that change to CmmSink as I noted in the comment: https://phabricator.haskell.org/rGHC6d14c1485cb570cbd183bcdc0f858d9a6dc1eb31#inline-626

For what it's worth the change was not wrong. It only causes issues when assumptions are made about how cmm if statements are translated to assembly.

It's a bit of a blunt instrument given the more refined check we already have in CmmContFlowOpt. We should fix this - would you like to do it or shall I?

I assume the best approach would be to simply limit this to float types? But I'm happy if you look into it.

I assume the best approach would be to simply limit this to float types? But I'm happy if you look into it.

See D4398