Make tickishContains faster

Authored by niteria on Jan 22 2017, 4:46 PM.



This just reorders some inequality checks to
make the common case cheaper.

The results are quite dramatic for Trac #11095, but that's
probably because something else is causing it to do too
much work.

Before (full

13,589,495,832 bytes allocated in the heap

After (full

7,885,575,872 bytes allocated in the heap

This is with BuildFlavour = devel2, so take it with a
a grain of salt.

For reference, with no -g I get:

155,703,112 bytes allocated in the heap

so we're still quite a way off.

Test Plan

I still have to test locally

Diff Detail

rGHC Glasgow Haskell Compiler
Automatic diff as part of commit; lint not applicable.
Automatic diff as part of commit; unit tests not applicable.
niteria updated this revision to Diff 10624.Jan 22 2017, 4:46 PM
niteria retitled this revision from to Make tickishContains faster.
niteria updated this object.
niteria edited the test plan for this revision. (Show Details)
niteria added a reviewer: bgamari.
niteria updated the Trac tickets for this revision.
niteria updated this revision to Diff 10625.Jan 22 2017, 5:03 PM
niteria edited edge metadata.
  • remove HasCallStack
bgamari edited edge metadata.Jan 23 2017, 8:49 AM

Great catch. I wonder if we instead want to make SourceNote contain a FastString?

Also, you may want to have a look at Debug.cmmDebugGen which has this,

-- Collect ticks from all blocks inside the tick scope.
-- We attempt to filter out duplicates while we're at it.
ticks = nubBy (flip tickishContains) $
        bCtxsTicks bctxs ++ ticksToCopy scope

This looks like it might very easily blow up with large tick counts.

This revision was automatically updated to reflect the committed changes.
dfeuer added a subscriber: dfeuer.Jan 24 2017, 1:37 PM

Part of this needs to be reverted. I'm not convinced any of the changes in compiler/basicTypes/SrcLoc.hs are useful, and some of them are flat-out wrong.


This is semantically all wrong! The old definition almost certainly had nothing to do with the actual problem, and it was correct. The new definition seems to ask whether rectangles contain each other, or something like that.

niteria added inline comments.Jan 24 2017, 2:08 PM

Oops, I got carried away here. Sorry about that.

Ok, I remeasured without the semantic change (still devel2) and not all of it was in vain.

The baseline is 13.5 B allocations.
Comparing the String last in CoreSyn brings it down to 11.2B.
Checking the SrcLoc filename last brings it down to 9.9B.

scpmw added a subscriber: scpmw.EditedJan 25 2017, 6:56 AM

Sorry I didn't react in IRC, was away at the time - Debug.cmmDebugGen simply tries to do the same thing on the Cmm-level that happens in mkTick on Core level: Sometimes Cmm optimisation merges blocks, at which point more source notes enter an existing tick scope, and might become redundant in the process. This does not happen often, yet regularly enough that I noticed it in ThreadScope and added this quick fix.

Not sure whether this could become a performance problem - Cmm generation only happens once, whereas mkTick will get called every time the expression is rebuilt for every single tick. That must traverse the whole list of source ticks in order to look for duplicates, which obviously makes it quadratic in complexity. I had made a few attempts to change the Core definition to Tick [Tickish Id] (Expr b) so we could properly encode a group of Tickish that was known to not have redundancies, but this makes mkTick even more involved (and back then the correct semantics weren't exactly clear).

And yes, this is abusing nubBy quite a bit. I think I promised myself I'd replace it eventually, but never got around to do it. (Edit:) And yes, I think I am even relying on the fact that blocks with smaller ticks are generally added to the end of the list. Ugh, this is terrible.