Source Note Core Infrastructure
AbandonedPublic

Authored by scpmw on Aug 21 2014, 11:45 AM.

Details

Summary

This patch introduces "SourceNote" tickishs that carry a reference to the
original source code. They are meant to be passed along the compilation
pipeline with as little disturbance to optimization processes as possible.

Generation is triggered by command line parameter "-g". The flag is free and
fits with the intended end result (generation of DWARF). Internally we
say that we compile with "debugging", which is probably at least
slightly confusing given the plethora of other debugging options we have.

Keeping ticks from getting into the way of transformations is
tricky, but doable. The changes in this patch produce identical Core
in all cases I tested - we now even have a linting pass that makes
sure of it.

I have migrated the detailed design notes to the Haskell Wiki:
https://ghc.haskell.org/trac/ghc/wiki/SourceNotes

Test Plan

This is about changing a lot of parts of GHC all at once,
so at this point it is important that everybody is basically okay with the
design. I have run the testsuite on both Linux and Mac, and outside of
performance tests (!) everything seems to be in order.

Diff Detail

Repository
rGHC Glasgow Haskell Compiler
Branch
profiling-arc-169
Lint
Lint OK
Unit
No Unit Test Coverage
Build Status
Buildable 2637
Build 2648: GHC Patch Validation (amd64/Linux)
There are a very large number of changes, so older changes are hidden. Show Older Changes
scpmw edited edge metadata.Oct 23 2014, 10:18 AM
ezyang removed a subscriber: ezyang.Oct 27 2014, 2:05 PM
scpmw updated this revision to Diff 1069.Oct 28 2014, 2:20 PM

Rebased to HEAD (and a few code style fixes)

Cool! What is remaining for this patch set to be merged now?

scpmw updated this revision to Diff 1465.Nov 17 2014, 2:29 AM

Stabilisation...

  • Properly creating multiple tick types at the same time needed quite a bit more plumbing.
  • Changed diffExpr so it matches by Core and IdInfo instead of just Core. Actually ran into a case where the Core was identical but IdInfos were different...
  • Fixed cost centre placement - they shouldn't be reordered relative to each other. Also fixed some extra scc annotatations getting added to bindings.

Failing test cases left:

  • topHandler01 (threaded1,threaded2,profthreaded), signals002 (threaded1,threaded2,profthreaded) - quite puzzled where this is coming from.
  • T1969, T3294, T4801, T5642 - compiler performance tests, sadly to be expected. Might have to think of more efficient alternatives to stripTicks & co.
  • ioprof, scc01 - mismatching profile, but at this point the change looks mostly harmless. Will still have to recheck that.

Also we need a few test cases that target this code, that's up next.

Thanks for the update. Keep them coming. :)

scpmw added a comment.Nov 17 2014, 6:33 AM

Sure, working away at it... Don't want to break more than strictly necessary. Some minor notes:

  • Arc keeps complaining about going over 80 columns, are we taking that literally?
  • Also strange to have to give "excuses" for not fixing file endings. After a few revisions I caved now and added the magical newline to UnariseStg.lhs. Not sure this is a good idea - it should probably be a separate patch?
  • Not quite sure what Harbourmaster is trying to tell me, it always fails with strange errors - clearly I am declaring an Ord instance for RealSrcSpan?
  • The patch has grown quite a few trivial changes now. Had I known that splitting up patches in Arc is actually quite easy, I would probably have done that right away. If anybody wants to see the break-down, here's my branch on GitHub: https://github.com/scpmw/ghc/commits/profiling-arc-169
luite added a subscriber: luite.Nov 21 2014, 3:42 PM

Is there anything I can do to help at this point? I've been preparing the GHCJS code generator and pretty printer to keep track of source locations for source maps, so I'm very interested in this. Unfortunately the new codegen branch is still based on GHC 7.8, the 7.9 branch still uses jmacro, so I can't yet do a full end-to-end test. I'd be happy to help out resolving failing test cases or reviewing/testing parts of the code though.

I've reviewed most of the Cmm changes except for Debug.hs. Thanks for adding the comments, it was much easier to review this time.

T1969, T3294, T4801, T5642 - compiler performance tests, sadly to be expected. Might have to think of more efficient alternatives to stripTicks & co.

Do these tests fail only with -g, or also without -g? (if the latter, I'm more worried)

Please fix the lint - 80 columns is the limit we're using for new code.

compiler/cmm/CmmNode.hs
52

It occurs to me that calling these things "ticks" at this point is completely wrong, and using "SourceNote" would be a lot better.

55

What do the arguments mean?

459

Doesn't CmmUnwind have an expression inside it? If this is deliberate, probably deserves a comment.

490

Here again, you have an expr inside CmmUnwind.

543

ditto

569

delete "a"

603

Macro lookatall: look at all the comments

604

Why not UniqSet?

Also, instead of CmmTickScope, what about something more accurate, e.g. CmmSourceScope.

609–614

Seems UniqSet would have made this a whole lot easier.

compiler/cmm/MkGraph.hs
71

I think you could expand the pair here, also in flatten below.

compiler/codeGen/StgCmmMonad.hs
189

don't understand the comment, perhaps "scope new blocks should be added to"?

scpmw added a comment.Nov 23 2014, 5:16 PM

Do these tests fail only with -g, or also without -g? (if the latter, I'm more worried)

The latter - so yes, this *is* certainly something we should take a look at. My suspicions would be stripTicks (used quite often, and probably doesn't optimise well) as well as the tick scopes and general debug structures, which will be maintained even if there's no ticks to track the scope for. So maybe with a bit of refactoring and tactical disabling of code we can regain some ground.

compiler/cmm/CmmNode.hs
52

Well, we are using "tick" in the sense of "annotation" anyway, right? We could call it "CmmAnnot" or "CmmNote" if we want to put more emphasis on the fact that it's just pseudo instructions. Calling it "CmmSourceNote" would just sound like a generalisation waiting to happen...

Even ignoring possible future Cmm annotations - it might be an idea to make comments non-scoping, non-counting tickishs?

55

Assign expression value to register in order to reconstruct the register's old value. Should put that into the comment, yes.

459

No, my mistake - most likely left over from my first version where I didn't allow full CmmExpr as unwind expressions.

604

Well, the idea is that I am mirroring/retaining the semantics of tickishScoped on the Cmm stage, so it doesn't only apply to source notes necessarily. But I suppose I could change that.

I actually wrote a comment on an earlier version here, which unfortunately was lost in Arc's history:

Note that we *could* use sets, but that wouldn't actually buy us too much. Plus with lists, we have nice properties such as that *most* scopes have some sort of common postfix, and that the first element always uniquely identifies the scope.

Not the strongest reason, and I was tempted to put a comment here, but decided against it. If it's too confusing I can add a comment and/or change the implementation.

609–614

Admittedly.

compiler/cmm/MkGraph.hs
71

Sure, would that make more sense to you? I was trying to "package" the scope and the graph as often as I could, as it made for the easiest code.

compiler/codeGen/StgCmmMonad.hs
189

Yes, both. I'll clarify the comment.

simonmar added inline comments.Nov 24 2014, 4:26 AM
compiler/cmm/CmmNode.hs
52

What else can appear here other than a SourceNote? I have no idea what it would mean to put anything else here, so why not just restrict it to the case we're trying to implement?

I don't know whether merging comment and SourceNote into the same CmmNode constructor would make sense. Maybe it would. It depends how often we want to treat them the same.

604

If you have an extra invariant on the representation (and it looks like you do, because you're relying on it in combineTickScopes), then please document it here. In particular it looks like you could optimise equality, so perhaps a newtype is called for too.

compiler/cmm/MkGraph.hs
71

In this case the code is already dealing with the unpacked representation so I think unpacking it here would probably be a net win.

simonmar requested changes to this revision.Nov 24 2014, 4:28 AM
simonmar edited edge metadata.

Back to you for fixes to tests, lint and other minor issues.

This revision now requires changes to proceed.Nov 24 2014, 4:28 AM
tibbe added a comment.Dec 5 2014, 4:32 AM

Peter, we're technically past the freeze date, but I think people will accept this patch if you could just fix the remaining test failures. Any chance of that?

scpmw added a comment.EditedDec 5 2014, 5:07 AM

Yes, I'm aware. @austin told be that he'd still merge once it passes the tests. Trickiest part is still to get the performance tests running (still without -g). Current status:

  • Simplification & Preparation runs without significant regressions. Was tricky enough, as e.g. even changing the strictness properties of cheapEqExpr slightly has performance implications.
  • Code generation has an overall 2% slowdown. Apparently MkCgInfoDown is seriously sensitive to getting new fields - I get the same result from adding a completely unused Int field. This one might have to stay in, even though it causes a performance test to fail on my Linux box.
  • For native code generation it's clearly important to not hold on to data for too long. Right now I'm considering alternatives to accumulating natives per CmmGroup.
scpmw updated this revision to Diff 1890.Dec 7 2014, 11:56 AM
scpmw edited edge metadata.

Fixed lints and test cases, some tick scopes refactoring.

scpmw added a comment.Dec 7 2014, 12:50 PM

Okay, this one passes validate for me, and especially all performance tests. T4801 fails depending on build parameters, apparently due to the extra field in MkCgInfoDown.

This is rebased, which makes the diff a bit hard to follow. I have commented on the significant changes below.

compiler/cmm/CmmNode.hs
53

Ended up not changing this part. Simple matter is that my code has an extra tickish here that I want to pass around (CoreNote, for core-based profiling), and I would hate dealing with the conflicts. Also not entirely convinced that, say, retaining cost-centres profiling notes wouldn't make sense here. That might allow profiling tools to integrate different data sources.

But I have to admit that those are rather diffuse ideas, so if Simon feels strongly about this I can still implement a change. Passing a known constructor around is a design smell for sure.

612

This has changed quite a bit, but it's not as dramatic as it might look - I'm basically making constructors out of the two operations I need. There was actually no pressing reason for using lists (or sets for that matter) outside of pretty-printing, so now I'm doing the "path" conversion only in those cases. Cleaner, in my opinion.

625

Discovered while writing up the thesis - not sure whether it's a problem.

659

Actually the "proper" way of doing equivalence. For merged scopes we have no fresh unique, so we need to compare the underlying scopes. Could be optimised further, but CombinedScope isn't very common, so I'm erring on the side of clarity here.

684

It's important to short-cut a + a/b = a/b, as that will often happen when concatenating blocks.

compiler/cmm/Debug.hs
243

Separated into two stages so native code generation can free the native blocks sooner. It's a bit annoying that we need to do so much just to get the order right...

compiler/cmm/MkGraph.hs
71

Unpacking this actually slightly increased allocation, if I remember correctly.

compiler/codeGen/StgCmmMonad.hs
582

Skip scope creation entirely for non-debug builds...

compiler/coreSyn/CoreUtils.lhs
1453 ↗(On Diff #1890)

This was quite a head-scratcher - problem was that the mere existence of an go e1 (Tick t e2) equation makes cheapEqExpr strict in the right-hand expression, despite the guard being `False.

The new code works around that problem. I made sure that the Core for cheapEqExpr is perfect, but I suspect that cheapEqExpr' might have become slightly slower in the process.

compiler/nativeGen/AsmCodeGen.hs
269

Hope this makes sense. The stats list is now kept unzipped - easier code and performance stays the same. We could probably even flatten some of the nested lists, possibly using OrdList.

compiler/simplCore/OccurAnal.lhs
1349 ↗(On Diff #1890)

Another change made necessary by a performance test. Had to change the case into a strict binding, hope that's acceptable style-wise.

1371 ↗(On Diff #1890)

Changed it here too, for consistency.

compiler/specialise/Rules.lhs
378 ↗(On Diff #1890)

This was quite hard to pin down as well. Problem is that rules are checked quite often, but don't match too frequently. So it's a good idea to collect ticks only after we have actually matched something.

492 ↗(On Diff #1890)

This is now handled top-level, see above.

Okay, this one passes validate for me, and especially all performance tests. T4801 fails depending on build parameters, apparently due to the extra field in MkCgInfoDown.

Does that mean T4801 is failing, or not?

Code generation has an overall 2% slowdown.

Is that just codeGen? That's not a lot of compile time normally (1% if I remember correctly).

scpmw added a comment.Dec 8 2014, 10:54 AM

Does that mean T4801 is failing, or not?

Not for validate, yes when I run the test suite manually - there T4801 starts at +9%, which becomes +11% with my change. Not quite sure what part of my configuration causes that.

Is that just codeGen? That's not a lot of compile time normally (1% if I remember correctly).

No, it's 2% (allocation) overhead for the entire run. Here's the numbers, taken at different "patch levels" of D169. The "Scopes" part is where I introduce the tick scopes and therefore the MkCgInfoDown field:

master:   <<ghc: 410155584 bytes, 214 GCs, 10187837/25097424 avg/max bytes residency (11 samples), 68M in use, 0.001 INIT (0.001 elapsed), 0.198 MUT (0.230 elapsed), 0.421 GC (0.422 elapsed) :ghc>>
CorePrep: <<ghc: 412603584 bytes, 206 GCs, 11073452/25559224 avg/max bytes residency (10 samples), 70M in use, 0.002 INIT (0.002 elapsed), 0.208 MUT (0.254 elapsed), 0.414 GC (0.414 elapsed) :ghc>>
Cmm:      <<ghc: 412601216 bytes, 206 GCs, 10344117/25559224 avg/max bytes residency (11 samples), 70M in use, 0.001 INIT (0.001 elapsed), 0.199 MUT (0.230 elapsed), 0.424 GC (0.425 elapsed) :ghc>>
Scopes:   <<ghc: 422866184 bytes, 193 GCs, 10529981/26408464 avg/max bytes residency (11 samples), 68M in use, 0.001 INIT (0.001 elapsed), 0.201 MUT (0.232 elapsed), 0.440 GC (0.441 elapsed) :ghc>>

The same happens on my Mac, but on a different level - but here the test suite actually doesn't complain:

Cmm:    <<ghc: 458963656 bytes, 869 GCs, 11449077/26607912 avg/max bytes residency (13 samples), 73M in use, 0.001 INIT (0.001 elapsed), 0.245 MUT (0.456 elapsed), 0.691 GC (0.723 elapsed) :ghc>>
Scopes: <<ghc: 469229880 bytes, 895 GCs, 11102287/27717424 avg/max bytes residency (13 samples), 70M in use, 0.001 INIT (0.001 elapsed), 0.235 MUT (0.393 elapsed), 0.699 GC (0.732 elapsed) :ghc>>

The increase in allocation is remarkable, especially given that T4801 is such a small test-case. When I tried reducing the "Scopes" patch I found that it changes exactly when I remove the field in MkCgInfoDown. As I wrote above, I can also replace it by anything else, and the allocation values still changes. Adding further fields changes allocation much less, which makes me think that we just have tripped over some sort of threshold here.

scpmw updated this revision to Diff 1931.Dec 11 2014, 9:38 AM
scpmw edited edge metadata.

Added debug test

simonmar accepted this revision.Dec 15 2014, 10:18 AM
simonmar edited edge metadata.

Ok, let's go ahead with this. It's not great that compiler allocs increase by 2% for reasons that we don't understand, but I don't want that issue to block this going into 7.10.

scpmw updated this revision to Diff 1958.Dec 15 2014, 12:31 PM
scpmw edited edge metadata.

Fixed another rare Core change with annotations found by the testsuite

By the way - arc will squash all this into a single commit by default, right? That would be a shame, as I made sure to structure everything into steps, which was really useful when tracking down bugs and performance regressions. For reference, my arc branch is available on GitHub.

Finally, some test cases are actually expected to fail for the "debug" way. To be concrete, T7837 fails because it dumps rule firings - this fails because annotation linting will run all passes twice. Furthermore, a bunch of other tests timeout because annotation linting takes too long (barton-mangler-bug, for example). Should I add omit_ways for those?

austin accepted this revision.Dec 15 2014, 1:04 PM
austin edited edge metadata.
In D169#16166, @scpmw wrote:

By the way - arc will squash all this into a single commit by default, right? That would be a shame, as I made sure to structure everything into steps, which was really useful when tracking down bugs and performance regressions. For reference, my arc branch is available on GitHub.

Yes, arc squashes by default. However, I'm willing to forgo this right now and merge your original branch if you'd like to keep it separate - although in the future I'd recommend submitting the diffs one-by-one like they were in your branch, of course :)

Finally, some test cases are actually expected to fail for the "debug" way. To be concrete, T7837 fails because it dumps rule firings - this fails because annotation linting will run all passes twice. Furthermore, a bunch of other tests timeout because annotation linting takes too long (barton-mangler-bug, for example). Should I add omit_ways for those?

Yes, please mark those test with omit_ways.

Finally, if you can make sure to rebase this on HEAD (I didn't see if you had or not, sorry!) with those changes, I'll go ahead and merge this and D396 today for you. Thanks!

This revision is now accepted and ready to land.Dec 15 2014, 1:04 PM
scpmw updated this revision to Diff 1961.Dec 15 2014, 7:45 PM
scpmw edited edge metadata.

Rebased, omit failing tests.

scpmw abandoned this revision.Dec 16 2014, 6:48 PM

(landed as multiple commits)