Provide DWARF-based backtraces to Haskell-land
ClosedPublic

Authored by bgamari on Sep 1 2015, 1:45 PM.

Details

There are a very large number of changes, so older changes are hidden. Show Older Changes
  • libdw: Add session to Capability
  • base: Add Haskell interface to ExecutionStack
bgamari updated this revision to Diff 4080.Sep 4 2015, 5:06 PM
bgamari edited edge metadata.

Reverse chunk order

bgamari updated this revision to Diff 4088.Sep 5 2015, 9:19 AM
bgamari edited edge metadata.
  • kill debug output
bgamari updated this revision to Diff 4097.Sep 6 2015, 12:10 PM
bgamari edited edge metadata.

Rebase

simonmar requested changes to this revision.EditedOct 13 2015, 8:53 AM
simonmar added a reviewer: simonmar.

I think adding the LibDwSession to the capability is the wrong way to do this. You would need multiple foreign calls into the libdw API to happen on the same OS thread, but there's no guarantee that this is the case unless you're using a bound thread. (edit: that's not right, what I really mean is you would need the Haskell thread to stay on the same capability, and there's no guarantee it will do that unless you use forkOn).

Maybe keep a pool of free LibDwSessions, protected by a lock, grab one from the pool each time you need one. You would need to pass it around in Haskell land, but I think that's ok, and maybe wrap it in a ForeignPtr or use bracket, or both, to ensure that it gets returned to the pool at the end.

This revision now requires changes to proceed.Oct 13 2015, 8:53 AM
simonmar added inline comments.Oct 14 2015, 7:27 AM
libraries/base/GHC/ExecutionStack/Internal.hsc
66

Can't we do this with hsc2hs and (#peek ...)?

73

ditto

78–80

ditto

129

Maybe this works just about, but the way you're using the ForeignPtr is really fragile - the use of unsafeForeignPtrToPtr above relies on the fact that iterChunk is consuming those chunks and does a withForeignPtr, which is a non-local property.

I think we could refactor this to be safer. The right way to do it is with unsafeInterleaveIO, the same way we do lazy I/O in the I/O library. e.g.

stackFrames (StackTrace fptr) = unsafePerformIO $ do
   chunks <- withForeignPtr fptr chunksList -- can do this up-front, because we need to reverse it anyhow
   go (reverse chunks)
 where
  go [] = return []
  go (chunk:chunks) = do
     this <- withForeignPtr fptr $ iterChunk chunk
     rest <- unsafeInterleaveIO chunks
     return (this ++ rest)
bgamari updated this revision to Diff 4612.Oct 23 2015, 4:05 AM
bgamari edited edge metadata.
  • rts: Expose more libdw symbols
  • rts: Add simple resource pool
  • rts: Add LibdwPool, a pool for libdw sessions
  • base: Add Haskell interface to ExecutionStack
bgamari marked 6 inline comments as done.Oct 23 2015, 4:14 AM

@simonmar, how does this look?

libraries/base/GHC/ExecutionStack/Internal.hsc
66

Sure.

129

Sounds reasonable to me.

bgamari updated this revision to Diff 4622.Oct 23 2015, 4:16 AM
bgamari marked 2 inline comments as done.
bgamari edited edge metadata.

Fix long lines

bgamari marked 2 inline comments as done.Oct 23 2015, 4:27 AM
bgamari added inline comments.
rts/LibdwPool.c
19 ↗(On Diff #4622)

I'm not entirely sure how to size the pool. Given that we have relatively little control over how long the user holds on to their StackTrace objects, I'm a bit worried that we may run out of sessions in the pool and consequently deadlock. Then again, allowing an unbounded pool size doesn't seem entirely reasonable either. Ideas?

bgamari updated this revision to Diff 4627.Oct 23 2015, 6:23 AM
bgamari edited edge metadata.
  • rts: Expose more libdw symbols
  • rts: Add simple resource pool
  • rts: Add LibdwPool, a pool for libdw sessions
  • base: Add Haskell interface to ExecutionStack
bgamari updated this revision to Diff 4628.Oct 23 2015, 6:26 AM
bgamari edited edge metadata.
  • Quote GHC path in configure so we can deal with multiple spaces.
  • Quote GHC_PKG in Makefile.
  • rts: Expose more libdw symbols
  • rts: Add simple resource pool
  • rts: Add LibdwPool, a pool for libdw sessions
  • base: Add Haskell interface to ExecutionStack
bgamari updated this revision to Diff 4629.Oct 23 2015, 6:36 AM
bgamari edited edge metadata.

Rebase

bgamari updated this revision to Diff 4632.Oct 23 2015, 6:51 AM
bgamari edited edge metadata.

Rebase?

For reasons beyond my understanding Harbormaster is now refusing to apply this patch.

@austin, do you have any idea what is going on here?

bgamari updated this revision to Diff 4633.Oct 23 2015, 7:07 AM
bgamari edited edge metadata.
  • rts: Expose more libdw symbols
  • rts: Add simple resource pool
  • rts: Add LibdwPool, a pool for libdw sessions
  • base: Add Haskell interface to ExecutionStack

For reasons beyond my understanding Harbormaster is now refusing to apply this patch.

@austin, do you have any idea what is going on here?

Never mind, looks like git silliness is to blame.

bgamari updated this revision to Diff 4636.Oct 23 2015, 8:07 AM
bgamari edited edge metadata.

Fix inline definitions

bgamari updated this revision to Diff 4637.Oct 23 2015, 8:14 AM
bgamari edited edge metadata.

Perhaps this is finally it

Cool! Looked through a bit. Did not read every line in detail though. Do you plan to migrate the test cases from my patch too? Would for sure boost my confidence in the correctness of the good.

Good job! I hope this can be merged sometime!! :)

libraries/base/GHC/ExecutionStack.hs
20

Remove stackFrames. While at it. Flit it into the more common >>=.

Cool! Looked through a bit. Did not read every line in detail though. Do you plan to migrate the test cases from my patch too? Would for sure boost my confidence in the correctness of the good.

Hi Tarrasch!

I have one end-to-end test in D1223 although more would probably be appropriate. I'll have a look at your Diff to see what can be pillaged.

Good job! I hope this can be merged sometime!! :)

Thanks! We're getting there!

simonmar requested changes to this revision.Oct 28 2015, 7:42 PM
simonmar edited edge metadata.
simonmar added inline comments.
libraries/base/GHC/ExecutionStack/Internal.hsc
69–72

I think it might be better to use bracket here, and don't use a ForeignPtr. Also see comments on stackFrames below.

125

can you use a (#const ...) ?

137

Could we grab the session each time we do a chunk? That would avoid holding onto the session for an indefinite amount of time, since it gets stored in a thunk at the end of the list.

167

Is it worth being lazy within a chunk? Maybe we just need to be lazy at the chunk level.

rts/LibdwPool.c
20 ↗(On Diff #4637)

I'd be fine with having just a single session in the pool, but holding onto it for shorter periods of time.

rts/Pool.h
26 ↗(On Diff #4637)

Wow, you've really gone to town on this. One style nit: we normally use camel case for APIs in the RTS.

This revision now requires changes to proceed.Oct 28 2015, 7:42 PM
bgamari added inline comments.Oct 29 2015, 4:51 AM
libraries/base/GHC/ExecutionStack/Internal.hsc
125

Sure.

137

This all gets very tricky when code is dynamically loaded/unloaded as the addresses in your captured stack may no longer be mapped (merely resulting in an unhelpful Location) or may be occupied by a different symbol entirely (resulting in an actively misleading Location). By holding on to the same session as we resolve the frames we know that we'll have the correct Locations even in the face of mapping changes. Do you think we should be worrying about this sort of thing?

167

I don't see how it hurts.

rts/Pool.h
26 ↗(On Diff #4637)

Ahh, right; I really need to fight habit to keep myself from falling back to snake case. I'll go back and fix this.

simonmar added inline comments.Oct 29 2015, 5:41 AM
libraries/base/GHC/ExecutionStack/Internal.hsc
137

No, I don't think we should worry about this. It's a choice between two bad things:

  • Possibility of wrong information if code is unloaded
  • Possibility of having to spin up new sessions if we have unevaluated stack traces that leak

Managing a pool of sessions in the face of dynamically loading and unloading code also seems hard. Have you thought about how to do this? You can only update sessions that are not in use, and when a session is released later we would have to update it then, or discard it. With a single session it's easier: the thread loading the new code would wait until the lock on the session is released and then update it.

We could always add a strict interface for people who care about resolving all the addresses right now, and make it hold the session until the end.

bgamari added inline comments.Oct 29 2015, 8:10 AM
libraries/base/GHC/ExecutionStack/Internal.hsc
137

Managing a pool of sessions in the face of dynamically loading and unloading code also seems hard.

I actually don't think it's hard at all so long as you hold on to the session for as long as you need it. The linker would simply discard every session in the pool and more those currently in use as "dirty" such that they will be discarded on release instead of returned to the pool. This should just work as things are currently implemented.

simonmar added inline comments.Oct 29 2015, 8:25 AM
libraries/base/GHC/ExecutionStack/Internal.hsc
137

Sure (that's what I meant by "discard it" in my comment), but then you end up creating a completely new session instead of updating the existing one.

How much space does a session require, and how long does it take to create a new one?

bgamari added inline comments.Oct 29 2015, 10:37 AM
libraries/base/GHC/ExecutionStack/Internal.hsc
137

My premise has been that updating a session would be too difficult and finicky to be worthwhile. That being said, I realize Facebook makes good use of dynamic loading so feel free to say if you disagree.

I'll try to make some measurements of the session size and initialization time.

bgamari updated this revision to Diff 4858.Nov 1 2015, 11:00 AM
bgamari edited edge metadata.
bgamari marked 5 inline comments as done.

Rebase, rename symbols

bgamari added a comment.EditedNov 1 2015, 11:13 AM

I've pushed 52c6e3d608ee3d010337d4405df86c2620d24887 which renames the libdw symbols to conform with the RTS's CamelCase convention. All of the patches in the DWARF series have been updated to reflect this.

This revision was automatically updated to reflect the committed changes.

I suspect this commit may have broken the build for me on 64-bit Windows:

$ inplace/bin/ghc-stage2.exe --interactive
GHCi, version 7.11.20151123: http://www.haskell.org/ghc/  :? for help
ghc-stage2.exe: unable to load package `base-4.9.0.0'
ghc-stage2.exe: C:\Users\ryanscot\Documents\Software\ghc\libraries\base\dist-install\build\HSbase-4.9.0.0.o: unknown symbol `libdwPoolClear'

Only interactive mode seems to be affected; just running ghc by itself still seems to work.