Haskell land Stack Traces proposal

Authored by Tarrasch on Feb 15 2015, 5:22 PM.


Trac Issues

This is definitely not ready for merging. I mainly want to discuss the
Haskell interface we're exposing here. I rebased @scpmw's .debug_ghc section
patch and his Dwarf reading stuff here, just so that the Haskell interface has
some actual working implementation to rely on. As Peter said in D396, he is not
happy with the way the section is generated now (is that correct Peter?).

The other thing is that I want to show that we're not *that* far from having
low overhead stack traces now that D169 and D396 are landed. I counted this
patch set to about 2500 lines of code.

Please note that this patch set does not yet implement the useful part of being
able to reify the stack automatically on exceptions. So this patch set is still
pretty useless on it's own. But I wanted to keep that change separate, because
it makes changes to critical raise primitive in the RTS. This patch set is
mostly "safe additions", no existing code paths are changed I think.

Test Plan

I've added test cases that test the whole chain (including Peter's whole -g
architecture) but some of the tests I added actually fail now. Will have to
check into that.

Diff Detail

rGHC Glasgow Haskell Compiler
Lint WarningsExcuse: Laters
Warningcompiler/cmm/Debug.hs:328TXT3Line Too Long
Warningcompiler/cmm/Debug.hs:408TXT3Line Too Long
Warningcompiler/cmm/Debug.hs:411TXT3Line Too Long
Warningcompiler/cmm/Debug.hs:414TXT3Line Too Long
Warningcompiler/nativeGen/Dwarf/Types.hs:160TXT3Line Too Long
Warningincludes/rts/Dwarf.h:15TXT2Tab Literal
Warningincludes/rts/Dwarf.h:74TXT3Line Too Long
Warningincludes/rts/Dwarf.h:96TXT3Line Too Long
Warninglibraries/base/tests/executionStack003.hs:21TXT3Line Too Long
Warningrts/Dwarf.c:79TXT3Line Too Long
Warningrts/Dwarf.c:80TXT3Line Too Long
Warningrts/Dwarf.c:81TXT3Line Too Long
Warningrts/Dwarf.c:86TXT3Line Too Long
Warningrts/Dwarf.c:115TXT2Tab Literal
Warningrts/Dwarf.c:163TXT3Line Too Long
Warningrts/Dwarf.c:303TXT3Line Too Long
Warningrts/Dwarf.c:401TXT3Line Too Long
Warningrts/Dwarf.c:506TXT3Line Too Long
Warningrts/Dwarf.c:513TXT3Line Too Long
Warningrts/Dwarf.c:575TXT3Line Too Long
Warningrts/Dwarf.c:612TXT3Line Too Long
Warningrts/Dwarf.c:639TXT3Line Too Long
Warningrts/Dwarf.c:663TXT3Line Too Long
Warningrts/Dwarf.c:744TXT3Line Too Long
Warningrts/Dwarf.c:755TXT3Line Too Long
No Unit Test Coverage
Build Status
Buildable 3162
Build 3189: GHC Patch Validation (amd64/Linux)
Tarrasch updated this revision to Diff 2240.Feb 15 2015, 5:22 PM
Tarrasch retitled this revision from to Haskell land Stack Traces proposal.
Tarrasch updated this object.
Tarrasch edited the test plan for this revision. (Show Details)
Tarrasch added reviewers: scpmw, simonmar, austin, tibbe.
Tarrasch updated the Trac tickets for this revision.
Tarrasch added a subscriber: scpmw.
Tarrasch updated this revision to Diff 2241.Feb 15 2015, 5:30 PM
Tarrasch updated this object.
Tarrasch edited edge metadata.

Just append to summary

Hmm, Peter, how did you get Phabricator to understand that D396 depends on D169? I can't get this one to understand that this diff depends on D661. Hmm :-/

Tarrasch updated this revision to Diff 2270.Feb 20 2015, 1:54 PM
  • Validation fixes
Tarrasch updated this revision to Diff 2271.Feb 20 2015, 5:50 PM

Fixed test cases and amended stuff

Please note that this won't validate on Harbormaster because libdwarf is not installed.

Actually, once D661 is out of the way. I think I could submit this as another partial yet compilable patch by splitting out the stack-reification part of this into it's own patch set. The thing is that stack-reification (even when in Haskell land) does not require the dwarf parts. Would that make sense?

tibbe edited edge metadata.Feb 25 2015, 2:05 AM

I haven't had the time to read the code yet (just made one small comment on something I happened to spot during skimming).

What's the cost (operationally speaking) of getting the current execution stack? Is it a lazy operation that grabs a pointer to the stack frame and thereby keeping it alive as a GC root? If so I imagine getting the current execution stack would be almost free, which is very nice as we could always grab it e.g. when we call error.


You definitely want to UNPACK the Word16 fields.

scpmw edited edge metadata.Feb 25 2015, 3:39 AM

@tibbe: For this patch, reifyStack basically walks the stack and copies found return addresses into a buffer. That's obviously a O(n) operation (even if we introduce a bound), which could be undesirable for exceptions.

We had quite a bit of discussion with @simonmar about more efficient alternatives - for example he suggested "freezing" stack chunks when an exception gets thrown, which could then act as GC-able evidence of the stack at throw time. This is probably what you had in mind? I think the last verdict was that it would be rather tricky to implement, and it seems unclear whether this would be faster than limited reification in the first place.

For whatever it's worth, my suggestion has been to reify until the next exception handler. This should keep exceptions that are caught right away relatively quick, while retaining full information for crashes. Also opens up interesting possibilities for re-throwing exception, which has been another subject of debate.

I was about to answer you @tibbe, but I see that @scpmw already gave a very good and complete answer (as always :)). I would just like to say that I think that it would be very hard to implement freezing efficiently. I'm not even sure it's possible. If it's a GC safe data structure, doesn't it need to traversed during GC anyway? I wrote a lot about efficient stack reification in my master thesis. http://www.arashrouhani.com/papers/master-thesis.pdf, I think the tl;dr is to stick with the simplest solution (O(n) copy).

simonmar edited edge metadata.Feb 26 2015, 3:43 AM

I'm still reading through the patch, but meanwhile a couple of thoughts on reifying the stack. Yes we did think about how to do this lazily, the basic idea is

  • snapshot the current stack chunk (max 32k)
  • mark the rest of the stack chunks copy-on-write
  • the reified stack is some object that keeps a pointer to the original stack chunks
  • if the reified stack object is GC'd, we don't need to copy-on-write any more (the GC does this, I suppose)

it's complicated and doesn't save you from having to copy the first stack chunk anyway. So I agree we shouldn't do this.

I rather like the idea of reifying the stack up to the first catch frame, along with some explicit way to snapshot the rest of the stack in the exception handler. So we keep the part of the stack that is about to be overwritten, while the exception handler gets to choose whether to do a full expensive reification or not. We would want a way for an exception handler to re-throw and build up a chain of reified stack chunks, so an exception handler can delegate the choice of whether to reify fully to the next handler on the stack.

Still, we should have some benchmarks that measure the cost of always reifying up to the first exception handler.

I'm still reading through the patch

Cool! I just wouldn't study it too much in detail. Much of it (particularly Dwarf.c I believe) is going to change. I think.

@scpmw, did you start any Dwarf.c-minification yet? Keep me updated on your moves there. Otherwise I'll start looking into it ;). It would be great to have a super-slimmed version of this patch (D662) that only does symbol names. It's a good start probably as you said in D661.

I rather like the idea of reifying the stack up to the first catch frame, along with some explicit way to snapshot the rest of the stack in the exception handler. ...

Yep, this is what I had in mind as well. Though, as this patch is way too big already, I waited with all reification code related to exception and handlers.

scpmw added a comment.Feb 26 2015, 4:52 PM

@Tarrasch: I won't be able to do anything substantial until after next week. If you want to have a stab at it, go ahead.

One more thought: it occurs to me that we already have a way to save the stack up to the catch frame when throwing an exception. It's exactly what raiseAsync does for an asynchronous exception (well, almost). I don't know if this idea pans out, but it might be worth investigating.

bgamari added a subscriber: bgamari.Mar 2 2015, 3:56 PM
lelf added a subscriber: lelf.Apr 3 2015, 8:59 AM
tibbe added a comment.May 31 2015, 8:37 AM

Is this pull request still in progress?

Tarrasch abandoned this revision.May 31 2015, 10:55 AM

Abandoning to clear reviewers watch list. I just opened this diff to show a "direction" for Haskell land stack traces and I think this diff did that. Look forward to a new patch as I described in D661. :)