An overhaul of the SRT representation

Authored by simonmar on Apr 25 2018, 5:33 PM.


  • Previously we would hvae a single big table of pointers per module, with a set of bitmaps to reference entries within it. The new representation is identical to a static constructor, which is much simpler for the GC to traverse, and we get to remove the complicated bitmap-traversal code from the GC.
  • Rewrite all the code to generate SRTs in CmmBuildInfoTables, and document it much better (see Note [SRTs]). This has been something I've wanted to do since we moved to the new code generator, I finally had the opportunity to finish it while on a transatlantic flight recently :)

There are a series of 4 diffs:

  1. D4632 (this one), which does the bulk of the changes
  1. D4633 which adds support for smaller CmmLabelDiffOff constants
  1. D4634 which takes advantage of D4632 and D4633 to save a word in info tables that have an SRT on x86_64. This is where most of the binary size improvement comes from.
  1. D4637 which makes a further optimisation to merge some SRTs with static FUN closures. This adds some complexity and the benefits are fairly modest, so it's not clear yet whether we should do this.

Results (after (3), on x86_64)

  • GHC itself (staticaly linked) is 5.2% smaller
  • -1.7% binary sizes in nofib, -2.9% module sizes. Full nofib results: P176
  • I measured the overhead of traversing all the static objects in a major GC in GHC itself by doing replicateM_ 1000 performGC as the first thing in Main.main. The new version was 5-10% faster, but the results did vary quite a bit.
  • I'm not sure if there's a compile-time difference, the results are too unreliable.
Test Plan


Diff Detail

rGHC Glasgow Haskell Compiler
Automatic diff as part of commit; lint not applicable.
Automatic diff as part of commit; unit tests not applicable.
simonmar created this revision.Apr 25 2018, 5:33 PM
simonmar edited the summary of this revision. (Show Details)Apr 25 2018, 5:39 PM
simonmar added a reviewer: osa1.
simonmar edited the summary of this revision. (Show Details)Apr 26 2018, 11:38 AM
simonmar requested review of this revision.Apr 30 2018, 10:41 AM

I've only taken a brief look. It sounds great!

You don't mention this, but if I remember right the current impl tries to build a giant long SRT list, and then use bitmaps to pick subsets from it, trying to take advantage of the structure of nested closures. (All in service of reducing SRT space usage.) Is that still extant today? You are throwing all that away... and actually declaring a win in terms of SRT space usage. That's great! The previous scheme was jolly complicated, and this is much much simpler.

But here is much I do not yet understand.


Nice to get rid of the small/large distinction.


Does this field in the info-table have to point to an SRT or, for info-tables with exactly one static frree variable, can it point direct to that? That would fit with the statement

The entries of an SRT object point to either:
 - A static closure (FUN or THUNK)
 - Another SRT

only that statement would apply also to the SRT field of the info-table.

Also: what if the info table has no static free vars; what is in the SRT field?


This is a great idea. Very neat.


What do you mean "because there is no static closure"?

And it's odd: the line g_srt : [h_closure, ...] looks as if h_closure` is in g's srt, which surely it cannot be?


Ah. Maybe that's what I was asking earlier!

If so, fine -- but the explanation is quite confusing!


I don't understand this. Can you give an example?


I don't understand either of these points. Can you give an example of each?


But also if SRT A refers to closure C1 and closure C2; and C2 also refers to C1 (perhaps ttransitively) then we can omit C1 from A's SRT.

More generally, we can omit any entry X from A's SRT iff all the closures reachable from X are also reachable from the remaining entries in A's SRT. Your [Filter] is a special case. It may be particularly easy to implement, but worth giving the general rule too.


I don't understand this. Can you give an example?


I thought SRTs can contain SRTs.

dfeuer added a subscriber: dfeuer.Apr 30 2018, 6:33 PM
dfeuer added inline comments.

Can you drop some repetition? If that would lose sharing of labels or something, it would be worth commenting on that.

mkSRTInfoLabel :: Int -> CLabel
mkSRTInfoLabel n = CmmLabel rtsUnitId lit CmmInfo
     !lit = case n of
                1 -> fsLit "stg_SRT_1"

This is a little lazier than what was there before, since the Maybe can be forced without its contents. Does that matter?


This looks surprising now that it just operates on a Maybe. I'd expect you'd just replace all uses of needsSRT with isJust, or change to

needsSRT :: CmmInfoTable -> Bool
needsSRT = isJust . cit_srt

Use mapMaybe instead of catMaybes and map?


mapMaybe seems clearer here too.


Does performance matter here? If so, it should be better to use

data SRTMap = SRTMap {withoutSRT :: !(Set CAFLabel), withSRT :: !(Map CAFLabel SRTEntry)}

You may get better performance with Set.unions $ Map.elems $ Map.restrictKeys (flatSRTs topSRT) resolved, since restrictKeys is not naive.


Use Set.difference here!


Should this lookup be performed eagerly?

@simonpj thanks for the review!

You don't mention this, but if I remember right the current impl tries to build a giant long SRT list, and then use bitmaps to pick subsets from it, trying to take advantage of the structure of nested closures. (All in service of reducing SRT space usage.) Is that still extant today? You are throwing all that away... and actually declaring a win in terms of SRT space usage. That's great! The previous scheme was jolly complicated, and this is much much simpler.

Yes exactly (the commit summary at the top mentions the old implementation). The old scheme turned out to have some bad behaviour in some cases, e.g. I noticed it generating bitmaps covering two entries that were 1000s of elements apart. The bad cases could have been fixed by adding more heuristics, but that would have just made it more complicated.

I will expand the comments around the parts that were hard to understand.


will do


Yes - this is mentioned below (it's optimisation "[Shortcut]")

Also: what if the info table has no static free vars; what is in the SRT field?

Presuming you mean "what if the code has no static free vars", in that case the SRT field would be zero. I'll update the comment.


Yes this is a bit confusing, I'll try to elaborate the comment.

There is no g_closure or h_closure: these are local functions, so they have no static constructors. But these labels are used as placeholders to refer to "g's SRT" and "h's SRT" until we've figured out what the label for "g's SRT" will actually be (maybe g won't have an SRT at all).


The second paragraph? Yes it's a bit subtle, I'll expand the comment.


I agree with the point about the general case. It might also be worth noting that thare are lots of other optimisations that aren't covered here, e.g. if we have

A = {X,Y,Z}
B = {Y,Z}
C = {X,B}

then I could replace C with just {A}.


An SRT label *is* a closure label!

simonmar added inline comments.May 3 2018, 1:29 PM

Are you sure that's better? I have to do possibly two lookups instead of one.

dfeuer added inline comments.May 3 2018, 11:12 PM

Ah, very good point. I was thinking about construction, and didn't think enough about lookups.

dfeuer added inline comments.May 3 2018, 11:56 PM

Ah, I mixed up Map (Either k1 k2) v (which is bad) with Map k (Either v1 v2) (which is entirely reasonable). Sorry for the noise.

simonmar updated this revision to Diff 16306.May 4 2018, 7:38 AM

address comments

michalt added inline comments.May 6 2018, 10:08 AM

Shouldn't that be "for which hasCAF is true"?


Nit: I find (Set.unions cafsets) `Set.difference` (Set.fromList lbls) a bit more readable (and should be also faster)


Two questions:

  • Map.union is left biased. Which means that this code might override existing entries in srtMap for some cafLbls. Is this desired? Doesn't this have potential to retain more CAFs than necessary? (since we'll be calling oneSRT in the dependency order, the later elements are more likely to cause the "No duplicates" case below).
  • updateSRTMap should only be called once in oneSRT (it's processing all lbls and it's referring to the srtMap from the start of the function and not modifying the current state). Maybe we could rename it to something like setSRTMap (or putSRTMap) to emphasize this? (at first I was quite surprised that this calls put and not modify)
bgamari requested changes to this revision.May 7 2018, 1:05 PM

There is plenty left to do here but this is looking great.


Carrying out the suggested needsSRT = isJust . cit_srt sounds like a nice piece of follow-up work but I think this patch is big enough as it is.


Perhaps cite Note [STATIC_LINK fields]?


Is the fact that the last has 16 entries exploited in any way or are you just mentioning this here for completeness?


I agree that, for one, the wording here is rather awkward.


Presumably this is to anticipate the shrink of the srt field to 32-bits which you carry out in D4634?

This revision now requires changes to proceed.May 7 2018, 1:05 PM
simonmar updated this revision to Diff 16405.May 14 2018, 3:06 AM
simonmar marked 10 inline comments as done.


This revision was not accepted when it landed; it landed in state Needs Review.May 16 2018, 7:36 AM
This revision was automatically updated to reflect the committed changes.

I think this covers the case in Trac #15126?

Yes, though is responsible for the info table compression.


I killed needsSRT in the revision.


"all *but* the last having 16 entries"

maybe I could reword that?


I did reword a lot of the comments around this in the revision.

  • cafLbl should not already be in the map. A CAFLabel is the identifier for the block, it's unique. So I'm not sure I follow how we could retain more CAFs than necessary here, could you explain what you had in mind?
  • I don't think it makes a difference that updateSRTMap is using put instead of modify, it's just a tiny optimisation since the srtMap hasn't changed between the get and the put. Or am I missing something here?

It was lazy before... I'lll leave it like this unless we have a good reason to add strictness.

michalt added inline comments.May 16 2018, 1:30 PM
  • Yes, you're right - lbls will be always disjoint. I think I got confused by having lbls, cafs and cafLbl that is part of the former not the latter.
  • Agreed that it doesn't make difference in this case - I just found it a bit surprising. Based on naming, I would've expected that updateSRTMap a >> updateSrtMap b would just work. But that's not the case since we will simply overwrite the first "update".

Anyway, thanks for explanations! :)