Don't track free variables in STG syntax by default
ClosedPublic

Authored by sgraf on Nov 11 2018, 12:08 PM.

Details

Summary

Currently, CoreToStg annotates StgRhsClosures with their set of non-global
free variables. This free variable information is only needed in the final
code generation step (i.e. StgCmm.codeGen), which leads to transformations
such as StgCse and StgUnarise having to maintain this information.

This is tiresome and unnecessary, so this patch introduces a trees-to-grow-like
approach that only introduces the free variable set into the syntax tree in the
code gen pass, along with a free variable analysis on STG terms to generate
that information.

Fixes Trac #15754.

Diff Detail

Repository
rGHC Glasgow Haskell Compiler
Lint
Automatic diff as part of commit; lint not applicable.
Unit
Automatic diff as part of commit; unit tests not applicable.
sgraf created this revision.Nov 11 2018, 12:08 PM
osa1 added a comment.Nov 11 2018, 10:58 PM

Thanks! I didn't review it line by line but the idea sounds good to me.

Simplification in unarise and removing the parameters on binder and occurance types are great.

Added some comments about the StgFVs API.

Any changes in compile-time runtimes and allocations/residency?

Is there a good reason for changing the printer? I don't feel strongly about it, but it may confuse and frustrate readers, so we better have a good reason for this.

compiler/stgSyn/StgFVs.hs
10

I think it may be good idea to remove the accumulator (first parameter) in annTopBindingsFreeVars. Then we can avoid exporting XRhsClosureUpdate, abstracting more stuff.

Any reasons to export annExprFreeVars and others? I think we only need annTopBindingsFreeVars.

Looks great -- but see my comment on annTopBindingsFreeVars

compiler/stgSyn/StgFVs.hs
43

This seems a bit over-abstracted. Why not stick to the simple type?

annTopBindingsFreeVars ::  [GenStgTopBinding Vanilla] -> [GenStgTopBinding CodeGen]
In D5324#146660, @osa1 wrote:

Any changes in compile-time runtimes and allocations/residency?

Is there a good reason for changing the printer? I don't feel strongly about it, but it may confuse and frustrate readers, so we better have a good reason for this.

I'll have some numbers by tomorrow, but AFAIR, validate didn't detect anything.

As for the printer, I don't think there's a reasonable alternative to omitting things. We ''could'' use NoExt and print NoExt instead, but that feels very noisy.
Note that in the CgStgRhsClosure case, it should still print free variables with the new syntax, as in

{x_s2fE, y_s2fF, z_s2fG, g_s2fH} \r [ds_s2fI] ...

But there currently is no debug output before the call to StgCmm.codeGen.

I also want to point out that CoreToStg isn't as simple as it could be: When trying to remove FreeVarInfos altogether, I realised that it still uses it at least for stuff like occurrence info: https://github.com/ghc/ghc/blob/5b98a38a32f2bc8491dc897631be8892919e2143/compiler/stgSyn/CoreToStg.hs#L446
I'm not sure if this is the only hack. So, this diff doesn't really do much less stuff than before. Quite the contrary, since we perform FV analysis twice now.
My reasoning is to do it in a follow-up which could happen in parallel with adjusting my LLF branch.

compiler/stgSyn/StgFVs.hs
10

Yeah, you're probably right. I'll monomorphise and simplify stuff.

43

Right, I was probably a bit overeager to generalise stuff. But there might be passes (looking at LLF) that have more info in their syntax tree than just free vars, such as occurrence info.

Let's worry about it later, though.

I also want to point out that CoreToStg isn't as simple as it could be: When trying to remove FreeVarInfos altogether, I realised that it still uses it at least for stuff like occurrence info

Gah! Rip it out! You can do this in StgFVs! I can see no good reason to gather free-var info in core-to-stg.

compiler/stgSyn/StgFVs.hs
43

If you leave it for now, it'll be there for ever, and puzzle future readers. Moreover, introducing complexity now for some unspecified future purpose is often fruitless; when the future purpose materialises, it needs something different anyway.

I suggest just doing the simple thing for now, and complicating it later if need be.

OK, I agree this still needs some work. I'm afraid that I won't have time to pick this up until thursday, so if someone wants to push this over the finish line in light of the approaching release deadline, be my guest. What remains to be done:

sgraf updated this revision to Diff 18691.Nov 15 2018, 6:36 AM
  • Monomorphise StgFVs
  • Copy an ancient occ hack from CoreToStg
  • Removed FreeVarInfo logic from CoreToStg
sgraf added a comment.Nov 15 2018, 7:39 AM

Validate says this is green, but moving the occurrence hack from CoreToStg to CoreFVs means that all STG-to-STG transformations don't have the generated dead case binder info available, which might have performance implications. I'm currently measuring.

Alternatively (to the ancient occurrence hack), we could run occurAnalysePgm between CorePrep and CoreToStg to refresh occurrence info. I'm not sure in what way occurrence analysis messes with the invariants put in place by CorePrep, though. We'll see.

osa1 added a comment.Nov 15 2018, 9:30 AM

moving the occurrence hack from CoreToStg to CoreFVs means that all STG-to-STG transformations don't have the generated dead case binder info available

This should be fine -- that information is only used in StgCmmExpr when generating code for case expressions.

sgraf added a comment.Nov 15 2018, 9:45 AM
In D5324#146952, @osa1 wrote:

moving the occurrence hack from CoreToStg to CoreFVs means that all STG-to-STG transformations don't have the generated dead case binder info available

This should be fine -- that information is only used in StgCmmExpr when generating code for case expressions.

Great! Measurements from nofib are also unaffected by this, so this patch should be good to go.

I'm currently also measuring how removing that hack without any replacement. Maybe we can get rid of it altogether.

Omitting the hack had no effect on nofib allocations (and counted instructions) whatsoever. Remove it?

sgraf edited the summary of this revision. (Show Details)Nov 15 2018, 10:31 AM
osa1 added a comment.Nov 15 2018, 10:55 AM

Omitting the hack had no effect on nofib allocations (and counted instructions) whatsoever. Remove it?

Right, I had also tried to remove it in the past and it made no difference in nofib. Looking at cgCase, when the scrutinee binder is not dead there should be one or two extra mov instructions, but I guess that's not enough to make measurable difference in runtime. Alternatively, perhaps a CmmSink (or some other Cmm pass) eliminates those instructions later because the temporaries are never used. I think CmmSink may be doing this. It'd be nice to know if this is the case, then we could just remove this without worrying about regressions.

@sgraf, do you think you could give this a try? I think compiling a case expression with dead binder, with debug prints in CmmSink (maybe print CmmGraph before and after) should be enough (don't forget to remove the optimisation code in cmmCase).

Let me know otherwise and I can give it a try this weekend probably.

osa1 added a comment.Nov 15 2018, 11:01 AM

and counted instructions

I missed this part. I think it's safe to remove then.

osa1 accepted this revision.Nov 15 2018, 11:15 AM

This looks good to me, thanks! Let's wait to hear from @simonpj before landing.

compiler/main/DynFlags.hs
619

This will break someone's workflow but maybe not a big deal...

compiler/simplStg/StgStats.hs
132

No need to align this anymore

compiler/stgSyn/StgSyn.hs
244

Good idea.

This revision is now accepted and ready to land.Nov 15 2018, 11:15 AM
sgraf updated this revision to Diff 18693.Nov 15 2018, 11:26 AM
sgraf marked an inline comment as done.
  • Whitespace only

Rectified that alignment thing.

compiler/main/DynFlags.hs
619

Note that I didn't rename the actual flag, but rather deprecated usage in favor of the new -dsuppress-stg-exts.

Also note that D5315 is still pending, which this and D5339 depend on.

LGTM

I don't know about this dead-default-binder business. Don't we need dead-var info for _every_ variable? Case-binders are just a special case. Reason: after a case-expression, avoid loading the field from the constructor into a register.

And the free-var pass can easily attach deadness-info to every binder. So what's the problem?

I think it is possible that, even if we generate Cmm with a redundant load, that a Cmm optimisation will drop the dead code. Could you confirm or deny that? If so, then it makes no difference in the end; it's just a fast way to make CoreToStg generate less code, and Cmm optimisations less code to discover is dead.

I'm content for you to go ahead at your discretion.

Oh, one other thing: can we have an overview Note that explains the invariants at each stage? Just how all the parts fit together.

osa1 added a comment.Nov 19 2018, 12:01 AM

@sgraf Could you add the note @simonpj asked? I'm hoping to land this.

sgraf updated this revision to Diff 18759.Nov 19 2018, 2:06 AM
  • Add Note [Extensible STG syntax]
sgraf added a comment.Nov 19 2018, 2:37 AM

Sorry, had a busy weekend.

I think it is possible that, even if we generate Cmm with a redundant load, that a Cmm optimisation will drop the dead code. Could you confirm or deny that? If so, then it makes no difference in the end; it's just a fast way to make CoreToStg generate less code, and Cmm optimisations less code to discover is dead.

Even without knowing how Cmm's (or LLVM's, for that matter) dead code elimination pass works, I think the overhead is negligible.

sgraf updated this revision to Diff 18770.Nov 19 2018, 10:48 AM
  • Rebase
This revision was automatically updated to reflect the committed changes.