Implement late lambda lift
ClosedPublic

Authored by sgraf on Oct 12 2018, 5:19 AM.

Details

Summary

This implements a selective lambda-lifting pass late in the STG pipeline.

Lambda lifting has the effect of avoiding closure allocation at the cost of having to make former free vars available at call sites, possibly enlarging closures surrounding call sites in turn.

We identify beneficial cases by means of an analysis that estimates closure growth.

There's a Wiki page at https://ghc.haskell.org/trac/ghc/wiki/LateLamLift.

Diff Detail

Repository
rGHC Glasgow Haskell Compiler
Lint
Automatic diff as part of commit; lint not applicable.
Unit
Automatic diff as part of commit; unit tests not applicable.
sgraf created this revision.Oct 12 2018, 5:19 AM
sgraf added a comment.EditedOct 12 2018, 5:26 AM

I identified a few points worth discussing below.

I don't have the latest benchmarking numbers available on this machine, but https://ghc.haskell.org/trac/ghc/ticket/9476#comment:34 is still accurate.

The wiki page should be up to date now.

Edit: Also, this still lacks a unit test suite. Let's see what I can cook together...

compiler/simplStg/StgLiftLams/Analysis.hs
226

Is it OK to use demand info this late in the STG pipeline?

381

Note that this bit of code means we migh optimise differently when profiling. Is this acceptable behavior?

sgraf updated this revision to Diff 18303.Oct 13 2018, 10:47 AM
  • Move StgSubst to stgSyn
  • Comments only
  • Remove dead binder
sgraf updated this revision to Diff 18382.Oct 19 2018, 12:04 PM
  • Comments only

Here's a paste with the NoFib results: P189

Harbormaster edited this Differential Revision.Oct 19 2018, 2:30 PM

I have not read the code in detail, but broadly it looks fine.

Before finishing, we'll really need a sketch of what the analysis is analysing. And an overview of the whole thing: movtivation, placement in STG, moving parts. Without that, it's very opaque.

It's very modularly separated from the rest of the compiler, so I'm content to have it in -- and it can be very beneficial. Thanks for doing it.

compiler/basicTypes/Demand.hs
790

I don't object, but why did you need the extra polymorphism here?

sgraf added a comment.EditedOct 31 2018, 8:05 AM

I have not read the code in detail, but broadly it looks fine.

Before finishing, we'll really need a sketch of what the analysis is analysing. And an overview of the whole thing: movtivation, placement in STG, moving parts. Without that, it's very opaque.

Isn't the wiki page the right place to put it? Or, put another way, what do you find missing in the background chapter?

Or do you mean inline comments in the form of a number of Notes?

Edit: Now that I've read your comment on the ticket, it's clear you are talking about a note.

It's very modularly separated from the rest of the compiler, so I'm content to have it in -- and it can be very beneficial. Thanks for doing it.

compiler/basicTypes/Demand.hs
790

We need to call it on a DmdShell in line 223 of StgLiftLams.Analysis.

Edit: Now that I've read your comment on the ticket, it's clear you are talking about a note.

Yes, I think Notes tend to stay more up to date than wiki pages. One overview note giving the big picture and pointing to the moving parts, with subsidiary notes (eg for the analysis) would probably work. You can refer from the notes to the wiki page and tickets. The Notes don't need to discuss zillions of design alternatives: focus on describing what is actually in the code

Thanks!

SImion

sgraf updated this revision to Diff 18593.Nov 5 2018, 11:42 AM
  • Move StgSubst to stgSyn
  • Comments only
  • Remove dead binder
  • Comments only
  • Disallow lifting binders occuring in argument position while still allow to lift binders with undersaturated applications
  • Disallow lifting nullary applications
  • Add some notes
sgraf updated this revision to Diff 18600.Nov 6 2018, 4:41 AM
  • More comments
sgraf updated this revision to Diff 18602.Nov 6 2018, 6:44 AM
  • Not sure where I picked up the term 'required set'
sgraf updated this revision to Diff 18630.Nov 9 2018, 3:35 AM
  • Make haddock links work
sgraf updated this revision to Diff 18766.Mon, Nov 19, 6:49 AM
  • Rebased on top of Phab:D5339
sgraf updated this revision to Diff 18767.EditedMon, Nov 19, 6:52 AM
  • Testing if last update was successful

(Apparently, it wasn't)

sgraf added a comment.EditedMon, Nov 19, 9:25 AM

Unfortunately, rebasing this on top of the recent STG syntax changes led to a few complications:

  • Free vars in STG need to be DIdSets, because a deterministic ordering matters when we lift free vars to parameter binders. I could re-use the FV idea if that's needed
  • New TTG extension points for STG binders (BinderP, similar to IdP in Core), Lets (XLet), Let-no-escapes (XLetNoEscape), which take on the responsibility of the former LetBoundInfo

I currently can't validate locally, because my local master doesn't validate either, but I think this was mostly a safe refactoring.

Edit: Also have a look at the note on top of StgLiftLams.Analysis.

sgraf updated this revision to Diff 18772.Mon, Nov 19, 11:21 AM
  • Another rebase
sgraf updated this revision to Diff 18786.Tue, Nov 20, 9:06 AM
  • Remove unused copy of NoExtSilent in HsExpression
sgraf updated this revision to Diff 18787.Tue, Nov 20, 9:26 AM
  • Syntactic furbelows for new Notes
  • Bring comments up to date with the rebase
  • Move an anchor around
sgraf updated this revision to Diff 18788.Tue, Nov 20, 9:41 AM
  • Comments only
sgraf added a comment.Thu, Nov 22, 5:29 AM

I have not read the code in detail, but broadly it looks fine.

Before finishing, we'll really need a sketch of what the analysis is analysing. And an overview of the whole thing: movtivation, placement in STG, moving parts. Without that, it's very opaque.

It's very modularly separated from the rest of the compiler, so I'm content to have it in -- and it can be very beneficial. Thanks for doing it.

@simonpj Sorry to nag you, you are probably busy. But could you please give this another glance, so that we might be able to land it in 8.8? I think it's as ready as it could be.

I added a note in StgLiftLams.Analysis, see below. Other than that, there's the wiki page (which links to the paper) for people who want more background.

compiler/simplStg/StgLiftLams/Analysis.hs
40

New notes here

sgraf updated this revision to Diff 18813.Thu, Nov 22, 5:31 AM
  • Make liftedIdsExpander preserve the order of free vars

What impact does this have on compile time? I notice that you've enabled it at both -O and -O2.

sgraf added a comment.EditedFri, Nov 23, 3:13 AM

What impact does this have on compile time? I notice that you've enabled it at both -O and -O2.

I haven't actively measured, but while bootstrapping in flavor bench, allocations went up by 0.6% and runtime by 1.9%. Note that this also includes compilation of 5 new (more or less trivially to compile) modules.
So the perceived performance stayed rather the same (I also didn't perceive any 'outliers'), but it's not self-sustaining. I guess only doing it in -O2 would be a good idea.

Edit: On the other hand, seeing that StgCSE also runs in -O1 is probably indicative that we want to run this with -O1, too. SpecConstr is one of two optimisations that only run with -O2 and is probably much heavier than this analysis, with fix-pointing and all. I'm not sure I'm the right person to judge this, so just yell if you want this to only be enabled with -O2.

Do you have the nofib numbers for compilation time and allocations?

We know from the recent survey and previous feedback that users really care about compile times. A lot. And the most common compilation setting is -O, so we don't want to regress there unless we're making worthwhile gains in performance. We don't have specific guidelines but anything that's a >1% compile time regression should probably go in -O2 (that's after the compiler is bootstrapped, so optimisations that pay for themselves can stay in -O).

sgraf added a comment.Fri, Nov 23, 5:31 AM

Yes, I have: +0.1% allocations and +0.5% runtime. I agree with the general sentiment, so I'll exclude it from -O1.

sgraf updated this revision to Diff 18849.Fri, Nov 23, 5:33 AM
  • Only run with -O2

I'm content to commit this. It's a significant piece of code, and I have not reviewed it in detail. But it's completely separable from everything else; it only runs with -O2.

I would like an overview note as commented above. Apart from that, go for it.

compiler/simplStg/StgLiftLams/Analysis.hs
41

I think this is the root module of the new code. Can you add an overview Note? "When to lift" is good, but it assumes you know what "lift" means, and why you might want to do it.

Something like Note [Late lambda-lifting in STG]:

  • Point to the wiki page and Trac ticket(s)
  • Give an example, and say why it's useful
  • Explain the moving parts (analysis, floating, whatever)
  • Give the highlight of the perf impact, both on nofib and (perhaps) on some poster-child inner loops, eg from Trac tickets.
  • Mention the (absence of) compile time impact

Thanks!

simonpj accepted this revision.Fri, Nov 23, 6:49 AM
This revision is now accepted and ready to land.Fri, Nov 23, 6:49 AM
sgraf updated this revision to Diff 18854.Fri, Nov 23, 9:21 AM
  • Update users guide for -O2
  • Add an overview note in StgLiftLams
sgraf marked 6 inline comments as done.Fri, Nov 23, 9:22 AM

Thanks for the review! I'll land this.

compiler/simplStg/StgLiftLams/Analysis.hs
41

Added such a note in StgLiftLams.

This revision was automatically updated to reflect the committed changes.
sgraf marked an inline comment as done.