An overhaul of the SRT representation
ClosedPublic

Authored by simonmar on Apr 25 2018, 5:33 PM.

Details

Summary
  • Previously we would hvae a single big table of pointers per module, with a set of bitmaps to reference entries within it. The new representation is identical to a static constructor, which is much simpler for the GC to traverse, and we get to remove the complicated bitmap-traversal code from the GC.
  • Rewrite all the code to generate SRTs in CmmBuildInfoTables, and document it much better (see Note [SRTs]). This has been something I've wanted to do since we moved to the new code generator, I finally had the opportunity to finish it while on a transatlantic flight recently :)

There are a series of 4 diffs:

  1. D4632 (this one), which does the bulk of the changes
  1. D4633 which adds support for smaller CmmLabelDiffOff constants
  1. D4634 which takes advantage of D4632 and D4633 to save a word in info tables that have an SRT on x86_64. This is where most of the binary size improvement comes from.
  1. D4637 which makes a further optimisation to merge some SRTs with static FUN closures. This adds some complexity and the benefits are fairly modest, so it's not clear yet whether we should do this.

Results (after (3), on x86_64)

  • GHC itself (staticaly linked) is 5.2% smaller
  • -1.7% binary sizes in nofib, -2.9% module sizes. Full nofib results: P176
  • I measured the overhead of traversing all the static objects in a major GC in GHC itself by doing replicateM_ 1000 performGC as the first thing in Main.main. The new version was 5-10% faster, but the results did vary quite a bit.
  • I'm not sure if there's a compile-time difference, the results are too unreliable.
Test Plan

validate

simonmar created this revision.Apr 25 2018, 5:33 PM
simonmar edited the summary of this revision. (Show Details)Apr 25 2018, 5:39 PM
simonmar added a reviewer: osa1.
simonmar edited the summary of this revision. (Show Details)Apr 26 2018, 11:38 AM
simonmar requested review of this revision.Apr 30 2018, 10:41 AM

I've only taken a brief look. It sounds great!

You don't mention this, but if I remember right the current impl tries to build a giant long SRT list, and then use bitmaps to pick subsets from it, trying to take advantage of the structure of nested closures. (All in service of reducing SRT space usage.) Is that still extant today? You are throwing all that away... and actually declaring a win in terms of SRT space usage. That's great! The previous scheme was jolly complicated, and this is much much simpler.

But here is much I do not yet understand.

compiler/cmm/CLabel.hs
253

Nice to get rid of the small/large distinction.

compiler/cmm/CmmBuildInfoTables.hs
53

Does this field in the info-table have to point to an SRT or, for info-tables with exactly one static frree variable, can it point direct to that? That would fit with the statement

The entries of an SRT object point to either:
 - A static closure (FUN or THUNK)
 - Another SRT

only that statement would apply also to the SRT field of the info-table.

Also: what if the info table has no static free vars; what is in the SRT field?

60

This is a great idea. Very neat.

140–141

What do you mean "because there is no static closure"?

And it's odd: the line g_srt : [h_closure, ...] looks as if h_closure` is in g's srt, which surely it cannot be?

148

Ah. Maybe that's what I was asking earlier!

If so, fine -- but the explanation is quite confusing!

181

I don't understand this. Can you give an example?

188

I don't understand either of these points. Can you give an example of each?

194

But also if SRT A refers to closure C1 and closure C2; and C2 also refers to C1 (perhaps ttransitively) then we can omit C1 from A's SRT.

More generally, we can omit any entry X from A's SRT iff all the closures reachable from X are also reachable from the remaining entries in A's SRT. Your [Filter] is a special case. It may be particularly easy to implement, but worth giving the general rule too.

198

I don't understand this. Can you give an example?

224

I thought SRTs can contain SRTs.

dfeuer added a subscriber: dfeuer.Apr 30 2018, 6:33 PM
dfeuer added inline comments.
compiler/cmm/CLabel.hs
528

Can you drop some repetition? If that would lose sharing of labels or something, it would be worth commenting on that.

mkSRTInfoLabel :: Int -> CLabel
mkSRTInfoLabel n = CmmLabel rtsUnitId lit CmmInfo
  where
     !lit = case n of
                1 -> fsLit "stg_SRT_1"
                ...
compiler/cmm/Cmm.hs
142

This is a little lazier than what was there before, since the Maybe can be forced without its contents. Does that matter?

150

This looks surprising now that it just operates on a Maybe. I'd expect you'd just replace all uses of needsSRT with isJust, or change to

needsSRT :: CmmInfoTable -> Bool
needsSRT = isJust . cit_srt
compiler/cmm/CmmBuildInfoTables.hs
265

Use mapMaybe instead of catMaybes and map?

396

mapMaybe seems clearer here too.

408

Does performance matter here? If so, it should be better to use

data SRTMap = SRTMap {withoutSRT :: !(Set CAFLabel), withSRT :: !(Map CAFLabel SRTEntry)}
501

You may get better performance with Set.unions $ Map.elems $ Map.restrictKeys (flatSRTs topSRT) resolved, since restrictKeys is not naive.

505

Use Set.difference here!

604

Should this lookup be performed eagerly?

@simonpj thanks for the review!

You don't mention this, but if I remember right the current impl tries to build a giant long SRT list, and then use bitmaps to pick subsets from it, trying to take advantage of the structure of nested closures. (All in service of reducing SRT space usage.) Is that still extant today? You are throwing all that away... and actually declaring a win in terms of SRT space usage. That's great! The previous scheme was jolly complicated, and this is much much simpler.

Yes exactly (the commit summary at the top mentions the old implementation). The old scheme turned out to have some bad behaviour in some cases, e.g. I noticed it generating bitmaps covering two entries that were 1000s of elements apart. The bad cases could have been fixed by adding more heuristics, but that would have just made it more complicated.

I will expand the comments around the parts that were hard to understand.

compiler/cmm/CLabel.hs
528

will do

compiler/cmm/CmmBuildInfoTables.hs
53

Yes - this is mentioned below (it's optimisation "[Shortcut]")

Also: what if the info table has no static free vars; what is in the SRT field?

Presuming you mean "what if the code has no static free vars", in that case the SRT field would be zero. I'll update the comment.

140–141

Yes this is a bit confusing, I'll try to elaborate the comment.

There is no g_closure or h_closure: these are local functions, so they have no static constructors. But these labels are used as placeholders to refer to "g's SRT" and "h's SRT" until we've figured out what the label for "g's SRT" will actually be (maybe g won't have an SRT at all).

181

The second paragraph? Yes it's a bit subtle, I'll expand the comment.

194

I agree with the point about the general case. It might also be worth noting that thare are lots of other optimisations that aren't covered here, e.g. if we have

A = {X,Y,Z}
B = {Y,Z}
C = {X,B}

then I could replace C with just {A}.

224

An SRT label *is* a closure label!

simonmar added inline comments.May 3 2018, 1:29 PM
compiler/cmm/CmmBuildInfoTables.hs
408

Are you sure that's better? I have to do possibly two lookups instead of one.

dfeuer added inline comments.May 3 2018, 11:12 PM
compiler/cmm/CmmBuildInfoTables.hs
408

Ah, very good point. I was thinking about construction, and didn't think enough about lookups.

dfeuer added inline comments.May 3 2018, 11:56 PM
compiler/cmm/CmmBuildInfoTables.hs
408

Ah, I mixed up Map (Either k1 k2) v (which is bad) with Map k (Either v1 v2) (which is entirely reasonable). Sorry for the noise.

simonmar updated this revision to Diff 16306.May 4 2018, 7:38 AM

address comments

michalt added inline comments.May 6 2018, 10:08 AM
compiler/cmm/CmmBuildInfoTables.hs
234

Shouldn't that be "for which hasCAF is true"?

471

Nit: I find (Set.unions cafsets) `Set.difference` (Set.fromList lbls) a bit more readable (and should be also faster)

510–512

Two questions:

  • Map.union is left biased. Which means that this code might override existing entries in srtMap for some cafLbls. Is this desired? Doesn't this have potential to retain more CAFs than necessary? (since we'll be calling oneSRT in the dependency order, the later elements are more likely to cause the "No duplicates" case below).
  • updateSRTMap should only be called once in oneSRT (it's processing all lbls and it's referring to the srtMap from the start of the function and not modifying the current state). Maybe we could rename it to something like setSRTMap (or putSRTMap) to emphasize this? (at first I was quite surprised that this calls put and not modify)
bgamari requested changes to this revision.May 7 2018, 1:05 PM

There is plenty left to do here but this is looking great.

compiler/cmm/Cmm.hs
150

Carrying out the suggested needsSRT = isJust . cit_srt sounds like a nice piece of follow-up work but I think this patch is big enough as it is.

compiler/cmm/CmmBuildInfoTables.hs
68

Perhaps cite Note [STATIC_LINK fields]?

76

Is the fact that the last has 16 entries exploited in any way or are you just mentioning this here for completeness?

140–141

I agree that, for one, the wording here is rather awkward.

181

Presumably this is to anticipate the shrink of the srt field to 32-bits which you carry out in D4634?

This revision now requires changes to proceed.May 7 2018, 1:05 PM
simonmar updated this revision to Diff 16405.May 14 2018, 3:06 AM
simonmar marked 10 inline comments as done.

update

This revision was not accepted when it landed; it landed in state Needs Review.May 16 2018, 7:36 AM
This revision was automatically updated to reflect the committed changes.

I think this covers the case in Trac #15126?

Yes, though https://phabricator.haskell.org/rGHC2b0918c9834be1873728176e4944bec26271234a is responsible for the info table compression.

compiler/cmm/Cmm.hs
150

I killed needsSRT in the revision.

compiler/cmm/CmmBuildInfoTables.hs
76

"all *but* the last having 16 entries"

maybe I could reword that?

140–141

I did reword a lot of the comments around this in the revision.

510–512
  • cafLbl should not already be in the map. A CAFLabel is the identifier for the block, it's unique. So I'm not sure I follow how we could retain more CAFs than necessary here, could you explain what you had in mind?
  • I don't think it makes a difference that updateSRTMap is using put instead of modify, it's just a tiny optimisation since the srtMap hasn't changed between the get and the put. Or am I missing something here?
604

It was lazy before... I'lll leave it like this unless we have a good reason to add strictness.

michalt added inline comments.May 16 2018, 1:30 PM
compiler/cmm/CmmBuildInfoTables.hs
510–512
  • Yes, you're right - lbls will be always disjoint. I think I got confused by having lbls, cafs and cafLbl that is part of the former not the latter.
  • Agreed that it doesn't make difference in this case - I just found it a bit surprising. Based on naming, I would've expected that updateSRTMap a >> updateSrtMap b would just work. But that's not the case since we will simply overwrite the first "update".

Anyway, thanks for explanations! :)