[RFC] Add libdw-based backtraces
AbandonedPublic

Authored by bgamari on Aug 20 2015, 9:40 AM.

Details

Summary

This is an initial cut at implementing DWARF-based backtraces. I use libdw
for DWARF parsing. There are no user-facing interfaces, so the current code is
tested by simply dumping a stacktrace in schedule.

Unfortunately it may be that I've run into a
bug
in libdw, so I don't have line information at the moment. That being said,
everything is plumbed in for it.

At the moment there are two distinct stacktraces that one can produce: a trace
from the C stack and a trace from the STG stack. I'm not entirely convinced
that the implementation of the latter is quite correct

It seems to be a bit slow, but I suspect it could be made substantially quicker
by caching recent address lookups as there is a great amount of repetition in
the STG stack.

Test Plan

TODO

Diff Detail

Repository
rGHC Glasgow Haskell Compiler
Branch
libdw
Lint
Lint WarningsExcuse: I'll fix it
SeverityLocationCodeMessage
Warningrts/Libdw.c:21TXT3Line Too Long
Unit
No Unit Test Coverage
Build Status
Buildable 5345
Build 5678: GHC Patch Validation (amd64/Linux)
Build 5677: arc lint + arc unit
bgamari updated this revision to Diff 3867.Aug 20 2015, 9:40 AM
bgamari retitled this revision from to [RFC] Add libdw-based backtraces.
bgamari updated this object.
bgamari edited the test plan for this revision. (Show Details)
bgamari added reviewers: scpmw, Tarrasch.
bgamari added a subscriber: simonmar.
bgamari added a comment.EditedAug 20 2015, 10:56 AM

Note that this is still fairly early prototype code. There are a number of stylistic issues which I'll address shortly.

We have, however, identified the reason for the lacking line information. It seems that libelf expects that all compilation units of interest are marked with address ranges (either DW_AT_low_pc and DW_AT_high_pc attributes, or, in the case of non-contiguous ranges, a DW_AT_low_pc and one or more DW_AT_ranges). GHC does not provide these on Haskell compilation units.

@scpmw, is there a reason why these aren't provided?

scpmw edited edge metadata.EditedAug 20 2015, 11:17 AM

Interesting. A few points:

  • Does this handle shared libraries properly? Maybe the part with dwfl_linux_proc_report takes care of that?
  • What does portability look like?
  • You choose to re-iterate the whole DWARF information for every symbol. Probably fine for the moment. There's probably some magic behind dwarf_getscopes that does efficient code lookups, right? Hm.

In any case, what's your plan for the code? Just a proof-of-concept, or do you want to integrate it with D963 at some point?

@scpmw, is there a reason why these aren't provided?

No particular reason, no. Feel free to add it, should be really straightforward.

rts/Libdw.c
258

For C it's right - outside of, maybe, some inline cases, I believe? For Haskell code this would be borderline wrong, and getting around this "feature" was a lot of headache for DWARF generation.

@scpmw,

  • I haven't checked the shared library case yet but I believe that dwfl_linux_proc_report should handle this.
  • I'm not terribly optimistic that elfutils will be portable. That being said, I have yet to find a library that can really do this sort of thing portably.
  • I need to look into the performance story; I believe there is some caching done by elfutils but I think I'll need to do some as well
rts/Libdw.c
258

Hmm, I'll need to think about this a bit more in that case.

Can you paste an example of the stack traces? I'd like to see in particular what we get when calling libdw_get_backtrace when the stack contains both C and Haskell code, e.g. from a function we called using the FFI, and also if we call back into Haskell again.

How does this compare with using libunwind? Are there things we can do this way but not the other, and vice versa?

rts/Libdw.c
358

PAP is not a valid stack frame, what's this doing here?

bgamari updated this revision to Diff 3889.Aug 21 2015, 6:24 AM
bgamari edited edge metadata.
  • Configure: Fix symbol name
  • Bag: Fix Foldable instance
  • Dwarf: Output DW_AT_{low,high}_pc attributes
  • Dwarf: Insert missing DW_AT_use_UTF8 attribute
  • Dwarf: Fix pc labels
  • Signals: Print backtrace on SIGUSR2
  • libdw: Clean up uninitialized Backtrace fields
  • libdw: Don't print failed backtraces
  • dwarf: Fix compilation unit {high,low}_pc attributes
  • Fix {low,high}_pc

Can you paste an example of the stack traces? I'd like to see in particular what we get when calling libdw_get_backtrace when the stack contains both C and Haskell code, e.g. from a function we called using the FFI, and also if we call back into Haskell again.

Here are some backtraces by issuing SIGUSR2 to the shootout/pidigits nofib benchmark,

$ inplace/bin/ghc-stage2 -o Main nofib/shootout/pidigits/Main.hs -g -fforce-recomp
[1 of 1] Compiling Main             ( nofib/shootout/pidigits/Main.hs, nofib/shootout/pidigits/Main.o )
Linking Main ...
[1442 ben@ben-laptop ghc(libdw)] $ ./Main 900000 > /dev/null
                  0x70b38f    set_initial_registers ((null):0.0)
            0x7f5cac581a68    dwfl_thread_getframes ((null):0.0)
            0x7f5cac5814bf    (null) ((null):0.0)
            0x7f5cac5817f7    dwfl_getthreads ((null):0.0)
            0x7f5cac581de3    dwfl_getthread_frames ((null):0.0)
                  0x70b91c    libdw_get_backtrace ((null):0.0)
                  0x6fd38e    backtrace_handler ((null):0.0)
            0x7f5cac1dc17f    (null) ((null):0.0)
                  0x70dbec    scavenge_fun_srt (rts/sm/Scav.c:1738.0)
                  0x6fa015    collect_gct_blocks (rts/sm/GC.c:986.0)
                  0x6f4a27    scheduleDoGC ((null):0.0)
                  0x6f5582    schedule (rts/Schedule.c:2380.0)
                  0x6f6f3a    hs_main ((null):0.0)
                  0x409683    (null) ((null):0.0)
            0x7f5cac1c8b45    __libc_start_main ((null):0.0)
                  0x4073a4    _start ((null):0.0)

Main: Failed to get stack frames of current process: No DWARF information found: Success
                  0x70b38f    set_initial_registers ((null):0.0)
            0x7f5cac581a68    dwfl_thread_getframes ((null):0.0)
            0x7f5cac5814bf    (null) ((null):0.0)
            0x7f5cac5817f7    dwfl_getthreads ((null):0.0)
            0x7f5cac581de3    dwfl_getthread_frames ((null):0.0)
                  0x70b91c    libdw_get_backtrace ((null):0.0)
                  0x6fd38e    backtrace_handler ((null):0.0)
            0x7f5cac1dc17f    (null) ((null):0.0)
            0x7f5cad0e948a    __gmpn_sub_n ((null):0.0)

                  0x70b38f    set_initial_registers ((null):0.0)
            0x7f5cac581a68    dwfl_thread_getframes ((null):0.0)
            0x7f5cac5814bf    (null) ((null):0.0)
            0x7f5cac5817f7    dwfl_getthreads ((null):0.0)
            0x7f5cac581de3    dwfl_getthread_frames ((null):0.0)
                  0x70b91c    libdw_get_backtrace ((null):0.0)
                  0x6fd38e    backtrace_handler ((null):0.0)
            0x7f5cac1dc17f    (null) ((null):0.0)
            0x7f5cac1dc3af    __sigprocmask ((null):0.0)
                  0x6fa78d    GarbageCollect ((null):0.0)
                  0x6f4a27    scheduleDoGC ((null):0.0)
                  0x6f5582    schedule (rts/Schedule.c:2380.0)
                  0x6f6f3a    hs_main ((null):0.0)
                  0x409683    (null) ((null):0.0)
            0x7f5cac1c8b45    __libc_start_main ((null):0.0)
                  0x4073a4    _start ((null):0.0)

I'm still working on getting source information working. It seems there may still be an issue in elfutils or our DWARF annotations.

How does this compare with using libunwind? Are there things we can do this way but not the other, and vice versa?

If we are confident that we can express everything necessary to unwind the STG stack in the DWARF call frame information, then libunwind should be fine. If not, however, then we would need to supplement it with another library (probably libdw or a home-grown library) as libunwind provides no ability to lookup symbol and line information for arbitrary addresses.

This is why I eventually just gave up with the higher level libraries (namely libbacktrace and libunwind) and just fell back to libdw. It didn't seem likely that the high-level interfaces would offer enough control to do what we needed without help or hacking..

rts/Libdw.c
358

This needs to be ripped out. As I said, this is still quite early prototype code.

bgamari added a comment.EditedAug 21 2015, 7:50 AM

Also, I have found that the unwinder appears to enter an infinite loop fairly often when unwinding the STG stack. If I manually terminal the unwinding, I find that it's getting stuck on stg_stop_thread_info,

                  0x70b38f    set_initial_registers ((null):0.0)
            0x7ffff7010c9e    dwfl_thread_getframes ((null):0.0)
            0x7ffff7010aa8    get_one_thread_frames_cb ((null):0.0)
            0x7ffff70108d8    get_one_thread_cb ((null):0.0)
            0x7ffff701082c    dwfl_getthreads ((null):0.0)
            0x7ffff7010a2a    getthread ((null):0.0)
            0x7ffff7010aee    dwfl_getthread_frames ((null):0.0)
                  0x70b91c    libdw_get_backtrace ((null):0.0)
                  0x6fd38e    backtrace_handler ((null):0.0)
            0x7ffff6c5c17f    (null) ((null):0.0)
                  0x4591b7    (null) ((null):0.0)
                  0x6ff1b8    stg_marked_upd_frame_info ((null):0.0)
                  0x6ff1b8    stg_marked_upd_frame_info ((null):0.0)
                  0x4591f0    base_GHCziBase_zpzp_info ((null):0.0)
                  0x6ff1b8    stg_marked_upd_frame_info ((null):0.0)
                  0x4591f0    base_GHCziBase_zpzp_info ((null):0.0)
                  0x6ff1b8    stg_marked_upd_frame_info ((null):0.0)
                  0x4eb2b8    base_GHCziIOziHandleziText_zdwa8_info ((null):0.0)
                  0x6ffcd0    stg_catch_frame_info ((null):0.0)
                  0x6fe4b8    stg_stop_thread_info ((null):0.0)
                  0x6fe4b8    stg_stop_thread_info ((null):0.0)
                  0x6fe4b8    stg_stop_thread_info ((null):0.0)
                  0x6fe4b8    stg_stop_thread_info ((null):0.0)
                  0x6fe4b8    stg_stop_thread_info ((null):0.0)
... ad infinitum

@scpmw, perhaps we should handle this symbol specially during DWARF generation to mark the end of the stack (perhaps continuing unwinding on the C stack)?

If we are confident that we can express everything necessary to unwind the STG stack in the DWARF call frame information, then libunwind should be fine. If not, however, then we would need to supplement it with another library (probably libdw or a home-grown library) as libunwind provides no ability to lookup symbol and line information for arbitrary addresses.

I think we need to make the DWARF description complete anyway, so that we can support other tools that inspect the stack. For instance, perf has stack trace support (might want to check what it uses). Ideally I would be able to get a stack trace in gdb that goes through Haskell code. And then if we have that working, would we need RTS support for stack tracing, other than an interface to whatever underlying library we use?

scpmw added a comment.EditedAug 21 2015, 9:07 AM

For instance, perf has stack trace support (might want to check what it uses).

It copies a portion of the memory found at rsp, then later makes sense of it. That's enough for most C programs, but for Haskell that's not too useful.

To get stack traces out of perf we would either have to tell it to do the same for rbp, or somehow program it to recognise the Haskell stack. Not too sure that's realistic though.

Ideally I would be able to get a stack trace in gdb that goes through Haskell code.

That should work - at least as long as you don't mind the source information being rather approximate.

@scpmw, perhaps we should handle this symbol specially during DWARF generation to mark the end of the stack (perhaps continuing unwinding on the C stack)?

... and also it can't go back from Haskell to C (yet?). The "default" unwind rule is to look for the return pointer at *rbp, which will obviously lead to such infinite loops at return targets unless overwritten. That's why low-level Cmm stack frame code always needs to start with an "unwind" clause.

For stg_stop_thread, there'd be two options:

  1. Figure out how to make it return to the C stack in an orderly fashion. Have looked at it a few times, and I remember it being not quite obvious.
  2. Communicate "end of stack" better. GDB normally stops when a frame repeats, so I didn't bother to research further, but if libdw actually loops that might be worth putting some thought into.
bgamari marked 2 inline comments as done.EditedAug 21 2015, 9:13 AM

perf has support for both libunwind and libdw. libdw support came later and I have seen evidence that it is noticeably faster at unwinding while browsing the perf mailing list.

When we get DWARF unwinding of the Haskell stack working then we indeed should have no need for the hack that is libdw_get_stg_backtrace. However I'm still quite unsure of how this should work. It's still not clear to me what in the C stack is supposed to push the unwinder towards the STG stack. This is what the stack looks like when I break into the execution of the pidigits benchmark,

(gdb) bt
#0  0x00007ffff7b84fdb in __gmpn_add_n () from /usr/lib/x86_64-linux-gnu/libgmp.so.10
#1  0x00007ffff7b84cc0 in __gmpn_add () from /usr/lib/x86_64-linux-gnu/libgmp.so.10
#2  0x0000000000659e84 in s9Q0_info () at libraries/integer-gmp/src/GHC/Integer/Type.hs:1621
Backtrace stopped: previous frame inner to this frame (corrupt stack?)
(gdb) x /40a $sp
0x7fffffff9e88: 0x7ffff7b84cc0 <__gmpn_add+272> 0x9b6e
0x7fffffff9e98: 0x9b6c  0x200964000
0x7fffffff9ea8: 0x2000f4e00     0x200072008
0x7fffffff9eb8: 0x659e84 <s9Q0_info+140>        0x9b6c
0x7fffffff9ec8: 0x0     0x0
0x7fffffff9ed8: 0x0     0x0
0x7fffffff9ee8: 0x0     0x0
0x7fffffff9ef8: 0x0     0x0
0x7fffffff9f08: 0x200304000     0x9b6c
0x7fffffff9f18: 0x9b6c  0x963c58
0x7fffffff9f28: 0x7ff   0x5
0x7fffffff9f38: 0x0     0x0
0x7fffffff9f48: 0x0     0x0
0x7fffffff9f58: 0x0     0x0
...

Perhaps we should expect the s9Q0_info frame should be pushing the unwinder towards the Haskell stack? @scpmw, what did you intend here?

scpmw added a comment.EditedAug 21 2015, 9:18 AM

Perhaps we should expect the s9Q0_info frame should be pushing the underwinder towards the Haskell stack? @scpmw, what did you intend here?

The unwind rule gets determined by the rip - if it's found inside Haskell object code, our unwind rules apply, if it's in C code, their unwind rules apply. Not particularly complicated, we just have to make sure that every frame properly reconstructs the rbp and rsp and it *should* be able to switch correctly.

bgamari added a comment.EditedAug 21 2015, 9:29 AM
In D1156#32130, @scpmw wrote:

For instance, perf has stack trace support (might want to check what it uses).

It copies a portion of the memory found at rsp, then later makes sense of it. That's enough for most C programs, but for Haskell that's not too useful.

Indeed, I think it actually even has kernel support for this to avoid round-trips to user-space.

To get stack traces out of perf we would either have to tell it to do the same for rbp, or somehow program it to recognise the Haskell stack. Not too sure that's realistic though.

I was under the impression that the DWARF CFI information did precisely this. Perhaps I'm missing something?

Ideally I would be able to get a stack trace in gdb that goes through Haskell code.

That should work - at least as long as you don't mind the source information being rather approximate.

So far I've had pretty poor luck with this. See, for instance, the gdb backtrace above.

@scpmw, perhaps we should handle this symbol specially during DWARF generation to mark the end of the stack (perhaps continuing unwinding on the C stack)?

... and also it can't go back from Haskell to C (yet?). The "default" unwind rule is to look for the return pointer at *rbp, which will obviously lead to such infinite loops at return targets unless overwritten. That's why low-level Cmm stack frame code always needs to start with an "unwind" clause.

What is this unwind clause that you speak of? I think I haven't encountered this code yet.

For stg_stop_thread, there'd be two options:

  1. Figure out how to make it return to the C stack in an orderly fashion. Have looked at it a few times, and I remember it being not quite obvious.

@simonmar, any thoughts here?

  1. Communicate "end of stack" better. GDB normally stops when a frame repeats, so I didn't bother to research further, but if libdw actually loops that might be worth putting some thought into.

For now I've special-cased the stop_thread closure in my STG unwinder. I would like to have a better solution than this though (or perhaps even eliminate the STG unwinder altogether).

For now, however, I'm trying to get the elfutils' folks attention so I can get line information.

In D1156#32138, @scpmw wrote:

The unwind rule gets determined by the rip - if it's found inside Haskell object code, our unwind rules apply, if it's in C code, their unwind rules apply. Not particularly complicated, we just have to make sure that every frame properly reconstructs the rbp and rsp and it *should* be able to switch correctly.

Right, this was pretty much the mental model I had built. So it should just be a matter of looking at rip for the last frame before the unwinder goes off the tracks and fixing the CFI generation for the construct found there. Correct?

scpmw added a comment.EditedAug 21 2015, 9:41 AM

Indeed, I think it actually even has kernel support for this to avoid round-trips to user-space.

Well, doing profiling in the kernel is sort-of the entire point of perf.

Just for completeness' sake: Yes we could use perf as a glorified timer, too, and do the stack tracing in user-space. But it would likely be very slow, and it's not quite obvious how we would make it compatible with perf front-end tools.

I was under the impression that the DWARF CFI information did precisely this. Perhaps I'm missing something?

Yes. But the kernel doesn't read DWARF, only the front-end tools do. By that point, the critical bits of information are lost.

Right, this was pretty much the mental model I had built. So it should just be a matter of looking at rip for the last frame before the unwinder goes off the tracks and fixing the CFI generation for the construct found there. Correct?

The rip seems to be correct, as it is (presumably correctly?) locating Haskell source code. I don't have an explanation right away (this normally works), but spontanously I'd guess that either rbp was overwritten and not restored by unwind rules (= doesn't point into the Haskell stack) or the unwound rip isn't actually a valid return pointer despite being Haskell code - somehow.

For the record, after implementing .debug_aranges support, I managed to get this backtrace from this patch,

      0x70b38f    set_initial_registers ((null):0.0)
0x7ffff7010c9e    dwfl_thread_getframes ((null):0.0)
0x7ffff7010aa8    get_one_thread_frames_cb ((null):0.0)
0x7ffff70108d8    get_one_thread_cb ((null):0.0)
0x7ffff701082c    dwfl_getthreads ((null):0.0)
0x7ffff7010a2a    getthread ((null):0.0)
0x7ffff7010aee    dwfl_getthread_frames ((null):0.0)
      0x70b92c    libdw_get_backtrace ((null):0.0)
      0x6fd38e    backtrace_handler ((null):0.0)
0x7ffff6c5c17f    (null) ((null):0.0)
      0x66629c    integerzmgmp_GHCziIntegerziType_timesInteger_info ((null):0.0)
      0x6ff0f8    stg_upd_frame_info ((null):0.0)
      0x6ff0f8    stg_upd_frame_info ((null):0.0)
      0x652338    integerzmgmp_GHCziIntegerziType_eqIntegerzh_info ((null):0.0)
      0x540538    base_GHCziReal_zdfIntegralIntegerzuzdcdivMod_info ((null):0.0)
      0x6ff0f8    stg_upd_frame_info ((null):0.0)
      0x700220    stg_sel_1_upd_info ((null):0.0)
      0x6ff0f8    stg_upd_frame_info ((null):0.0)
      0x666a48    integerzmgmp_GHCziIntegerziType_plusInteger_info ((null):0.0)
      0x6ff0f8    stg_upd_frame_info ((null):0.0)
      0x651448    integerzmgmp_GHCziIntegerziType_geIntegerzh_info ((null):0.0)
      0x6516f8    integerzmgmp_GHCziIntegerziType_geInteger_info ((null):0.0)
      0x6ff0f8    stg_upd_frame_info ((null):0.0)
      0x409110    s36b_info ((null):0.0)
      0x6ff1b8    stg_marked_upd_frame_info ((null):0.0)
      0x50e230    base_GHCziList_zdwsplitAtzq_info ((null):0.0)
      0x50e190    s3gT_info ((null):0.0)
      0x7001b0    stg_sel_0_upd_info ((null):0.0)
      0x459098    s44C_info ((null):0.0)
      0x6ff1b8    stg_marked_upd_frame_info ((null):0.0)
      0x6ff1b8    stg_marked_upd_frame_info ((null):0.0)
      0x4591f0    base_GHCziBase_zpzp_info ((null):0.0)
      0x6ff1b8    stg_marked_upd_frame_info ((null):0.0)
      0x4591f0    base_GHCziBase_zpzp_info ((null):0.0)
      0x6ff1b8    stg_marked_upd_frame_info ((null):0.0)
      0x4eb2b8    base_GHCziIOziHandleziText_zdwa8_info ((null):0.0)
      0x6ffcd0    stg_catch_frame_info ((null):0.0)
      0x6fe4b8    stg_stop_thread_info ((null):0.0)

This was from SIGUSR2 which doesn't explicitly print the STG stack so it appears that we are now getting from the C stack to the STG. As @scpmw stated, however, it seems that we can't unwind back to the C stack.

bgamari updated this revision to Diff 3898.Aug 21 2015, 3:16 PM
bgamari edited edge metadata.
  • Add libbacktrace-based backtraces
  • Dwarf: Fix DW_AT_use_UTF8 attribute
  • Dwarf: Produce {low,high}_pc attributes for compilation units
  • Dwarf: Produce .dwarf_aranges section
  • Signals: Print backtrace on SIGUSR2
bgamari updated this revision to Diff 3899.Aug 21 2015, 3:17 PM
bgamari edited edge metadata.

Fix base commit

Tarrasch edited edge metadata.Aug 21 2015, 4:18 PM

Cool! Awesome progress @bgamari, such productivity! :)

Sad to see that we still haven't found any portable and modern library yet. The symbol name and dwarf info lookups that we are doing are not different from how others are doing it, right? What are other language implementations? Do Go for instance have stack traces supported for all platforms?

bgamari added a comment.EditedAug 22 2015, 2:11 AM

It seems I finally have the GHC's DWARF output enough complete enough to get proper line numbers out. If I throw SIGUSR2 to nbody I get,

$ time ./Main 10000500
-0.169075164
                  0x6ff81f    set_initial_registers (rts/Libdw.c:323.0)
            0x7fd852eaba68    dwfl_thread_getframes ((null):0.0)
            0x7fd852eab4bf    (null) ((null):0.0)
            0x7fd852eab7f7    dwfl_getthreads ((null):0.0)
            0x7fd852eabde3    dwfl_getthread_frames ((null):0.0)
                  0x6ffbfc    libdw_get_backtrace (rts/Libdw.c:295.0)
                  0x6f187e    backtrace_handler (rts/posix/Signals.c:540.0)
            0x7fd852b0617f    (null) ((null):0.0)
                  0x407b58    s91t_info (nofib/shootout/n-body/Main.hs:66.27)
                  0x407cf8    r8YB_info (nofib/shootout/n-body/Main.hs:83.5)
                  0x408408    s99O_info (nofib/shootout/n-body/Main.hs:28.19)
                  0x409e80    Main_main1_info (nofib/shootout/n-body/Main.hs:27.26)
                  0x6f41c0    stg_catch_frame_info (rts/Exception.cmm:370.1)
                  0x6f29a8    stg_stop_thread_info (rts/StgStartup.cmm:42.1)
-0.169034834

real    0m2.195s
user    0m2.183s
sys     0m0.012s

For posterity, we were missing a few things.

  • I believe the .debug_info table was technically mis-formed, with a value being given for the DW_AT_use_UTF8 attribute yet no corresponding definition in the abbreviation
  • We were missing DW_AT_{high,low}_pc attributes on the compilation units. elfutils insists that these are present to allow for efficient lookup
  • We were also missing the entire .dwarf_aranges address range table. elfutils also insists that this is present, again to allow for for efficient lookup. It's not entirely clear to me why it wants both the {high,low}_pc attributes and the aranges table, but fair enough.

Note how the backtrace makes the jump from unwinding the C to the Haskell stack. However, it still can't go the other way around. This is something that will require more thought.

My next step will be to figure out how to expose this functionality to users. @Tarrasch, I think I'll be building off of your patch for this. I intend on keeping your symbol table implementation for cases where we lack libdw.

@Tarrasch, indeed the lack of a portable library is unfortunate. libunwind is at least simpler to use and better documented to libdw and it may to be useful to support it as an alternative to libdw (as done by perf). I have patches lying around implementing support for unwinding with libunwind. I had previously dropped them as the backtraces it produced were quite pitiful, but this may have been due to the issues in our DWARF annotations. I can try them again to see whether libunwind provides usable output now since our DWARF annotations are correct.

One thing I have noticed is that there are no DW_TAG_lexical_block entities in the produced debug information. From the description in the thesis, it sounds as though these should denote Cmm blocks and so I would have imagined they would be quite ubiquitous. @scpmw, did you ever have this working?

It seems that the reason for the lacking lexical scopes is that they are explicitly stripped in Dwarf.hs. Presumably this is because they aren't yet useful as we can't profile. @scpmw, could you comment here?

Nice that you build upon D963 so that patch doesn't get lost. ^^

rts/sm/GC.c
198 ↗(On Diff #3899)

How structured is this Backtrace value? Can it be (with some smaller transformations) exposed as a Haskell value? I just can't find anything on the internet about libdw_get_backtrace ...

For the record, Mark Wielaard has written up some great notes regarding our elf-utils usage. You can find lots of nice tidbits in this thread.

bgamari updated this revision to Diff 3911.Aug 24 2015, 4:10 AM
bgamari edited edge metadata.
  • Libdw: Initial commit of DWARF-based stack unwinding
  • Schedule: Test backtraces
  • GC: Test backtracing
  • Dwarf: Fix DW_AT_use_UTF8 attribute
  • Dwarf: Produce {low,high}_pc attributes for compilation units
  • Dwarf: Produce .dwarf_aranges section
  • Signals: Print backtrace on SIGUSR2
  • Libdw: whitespace
  • Libdw: Shuffle code
  • Libdw: Save object information as well
  • libdw: Add session to capability
  • base: Rework GHC.ExecutionStack.Internal
  • Fix stack
  • Dwarf: [HACK] Don't strip blocks
  • Schedule: Rip out backtrace test
  • GC: Rip out backtrace test

I've pushed a new version of this patch, we begins to expose the DWARF-based backtraces to the Haskell world. I apologize for the state of the code at the moment; there is a bunch of dead code which I'll need to cull but at the moment I'm just trying to get things running quickly. I'll eventually end up splitting up this into several diffs after I'm satisfied with the structure of things.

I intend to place both @Tarrasch's CodeMap and my libdw backend (and potentially even libbacktrace- or libunwind-based backends) behind some sort of abstraction, allowing us to use whatever stacktrace mechanism is available at compile time. At the moment I'm leaning towards having all of these use the same StackTrace type.

Ultimately I would like to implement some basic profiling support with this functionality. @scpmw gives a nice scheme for doing this in Chapter 5 of his thesis. He begins by encoding the tick structure of the executable in a separate .debug_ghc section, using the event-log format. As far as I understand it, this section would be copied to the eventlog at runtime, along with relocated address information for each symbol.

Sampling would consist of pulling the code pointer and writing a record to the event log. An extension to this that we may want to accommodate is to support sampling stack unwindings (probably of limited depth). Moreover, Peter's thesis also suggests that we might record sampling of things beyond the stack (e.g. RTS statistics).

Line information in profiling

The thesis seems to explicitly exclude line information from the debugging information included in the event-log. I still haven't decided how important line information is in this context. On one hand you'll likely be referring back to the Core more often than the original code. On the other, providing source locations to the user may be a great improvement in user experience as they can quickly orient themselves in the program.

Unfortunately, providing line information isn't trivial. The trouble is that this information tends to be large with sparse changes. DWARF for instance, uses a small state machine to generate this information. In principle we could just cut the .debug_line section out of the objects we are running and paste it wholesale into the event-log. However, making sense of this information later isn't entirely trivial.

scpmw added a comment.EditedAug 24 2015, 5:48 AM

I intend to place both @Tarrasch's CodeMap and my libdw backend (and potentially even libbacktrace- or libunwind-based backends) behind some sort of abstraction, allowing us to use whatever stacktrace mechanism is available at compile time. At the moment I'm leaning towards having all of these use the same StackTrace type.

Note that the stack tracing mechanism of @Tarrasch was especially meant to work with Haskell exceptions at some point - especially handling re-throws and such. Some things to think about:

  1. Before we could take stack traces without loading DWARF data. This is good because it allows us to save stack traces even for trivial throws and only print them if we're actually interested in them. This obviously won't work with most DWARF-based unwinders. On the other hand, @Tarrasch already built bracket functions for the code map, and we might repurpose those to turn the whole stack tracing feature on and off (globally overridable by RTS option?). Would load DWARF more eagerly, but probably also okay.
  2. We would need to integrate "piecewise" stack unwinding in some way (at least for my "chunked" call stack approach). Not a big thing, but we would probably have to identify catch frames at minimum.

Also let me ask again - what exactly is the goal here? At this point, you have basically reproduced something we had working about a year ago using a different library. You have added a nice extra feature (being able to trace through C code), but unless I'm wrong we have gained nothing on the portability front, which was the original reason for simplification of the code (plus making it easier to review, obviously). By all means continue working on this, just wanted to clarify where we are. Having multiple competing implementations in the RTS is exactly what I wanted to avoid here.

Peter's thesis also suggests that we might record sampling of things beyond the stack (e.g. RTS statistics).

Other way around - sample *by* RTS statistics. Sampling by allocation, for example, is cheap to the point of being almost free, and very, very useful for optimisation purposes. What we sample is a separate question, and as usual stack traces would be even more useful here.

In fact, if you want to go into profiling I would highly suggest you start here, because doing it from the RTS means that you know the stack state. If you do "proper" program state sampling you will run into all sorts of places where the stack unwinding information is not quite right, which will be a *lot* of headache.

The thesis seems to explicitly exclude line information from the debugging information included in the event-log.

Where did you get that impression from? That's what the "debug source" records refer to (see Listing 5.38). In fact, you have a lot more information than what .debug_line gives you: *All* source locations, including every function that was inlined at the point in question.

bgamari added a comment.EditedAug 24 2015, 7:04 AM
In D1156#32370, @scpmw wrote:

I intend to place both @Tarrasch's CodeMap and my libdw backend (and potentially even libbacktrace- or libunwind-based backends) behind some sort of abstraction, allowing us to use whatever stacktrace mechanism is available at compile time. At the moment I'm leaning towards having all of these use the same StackTrace type.

Note that the stack tracing mechanism of @Tarrasch was especially meant to work with Haskell exceptions at some point - especially handling re-throws and such. Some things to think about:

Thanks for the comments!

  1. Before we could take stack traces without loading DWARF data. This is good because it allows us to save stack traces even for trivial throws and only print them if we're actually interested in them. This obviously won't work with most DWARF-based unwinders. On the other hand, @Tarrasch already built bracket functions for the code map, and we might repurpose those to turn the whole stack tracing feature on and off (globally overridable by RTS option?). Would load DWARF more eagerly, but probably also okay.

Right, this is precisely why I want to keep @Tarrasch's symbol table unwinder as an option. As I point out below, I do not view the symbol table unwinder as competing with the DWARF-based unwinder. I believe DWARF unwinding will be a bit heavier than a symbol table lookup. User's who don't want to pay this cost can use the symbol table or disable backtraces altogether.

  1. We would need to integrate "piecewise" stack unwinding in some way (at least for my "chunked" call stack approach). Not a big thing, but we would probably have to identify catch frames at minimum.

Could you elaborate here? Do you mean that you only want to report the backtrace from the point where the exception is thrown to the point of the first catch frame?

Also let me ask again - what exactly is the goal here? At this point, you have basically reproduced something we had working about a year ago using a different library. You have added a nice extra feature (being able to trace through C code), but unless I'm wrong we have gained nothing on the portability front, which was the original reason for simplification of the code (plus making it easier to review, obviously).

@scpmw, the goal here is to offer stack trace support in the RTS to be consumed by both Haskell and RTS code. This could be used by Haskell code to request explicit backtraces, by exception handlers, and by the GHC runtime itself.

The latter usecase is what prompted me to enter down this rabbit-hole. In particular, I was motivated by GHC Trac Trac #9221. This bug points out the fact that GHC's -j option scales poorly over large core counts. There are two reasons that this may be the case, both of which proper stack-traces could be eludicated with proper stack traces,

  • there appear to be long pauses while capabilities enter GC. I would like to add tracing support in the RTS to dump a stack trace of a thread which takes an unexpectedly long time to enter GC (these threads may be in C or Haskell code, so it is important that we can unwind both stacks, allowing us to pin-point where these threads are stalling
  • there appears to be a substantial amount of blackhole blocking. Currently we have no way to identify which expressions are being black-holed or from where they are evaluated. I would like to add support for collecting stack-traces on black-hole events

In order to get to a point where I could attack either of these needed to learn a great deal about how DWARF debug information is structured. The change from libdwarf to libdw afforded me that opportunity. The decision was in part informed by the decisions made by other projects with similar goals to ours and in part driven by my very positive interactions with the project's developers. This switch had the pleasant side-effect of pointing out a number of issues with our existing DWARF annotations. I'm not by any means tied to this particular implementation but I found that starting with a clean slate was the quickest way for me to get up and running.

Mark Wielaard has described the current portability story of libdw in the thread I cited earlier,

It is ported across a lot of arches on GNU/Linux. i386, x86_64, ppc, ppc64, ppc64le, s390x, arm and aarch64 are at least regularly tested (should be zero fail at release time) and there are other ports in the backends both in tree [alpha, ia64, sparc, tilegx] and some not yet merged out of tree [mips, m68k, hppa]. In theory it should also work on other ELF/DWARF bases systems like *BSD, Debian has some limited success with kfreebsd, and Solaris. But there are some tricky dependencies of some of the dwfl functions on the /proc file system and ptrace, not all of them have clean backend/ebl functions. Darwin/MacOS is a bit harder since it doesn't use ELF and libdw currently depends on the DWARF container being ELF (and I have no idea what the ptrace/proc story is on Darwin). Windows is probably pretty hard given that it doesn't natively support ELF, DWARF or ptrace/proc.

I believe this is similar to libdwarf. Unfortunately this is about as good a portability story as I have seen from a library in this space that meets our interface requirements (although these things seem to be poorly documented in general).

By all means continue working on this, just wanted to clarify where we are. Having multiple competing implementations in the RTS is exactly what I wanted to avoid here.

I don't view the symbol table and DWARF implementations as at all competing. They serve different uses. Many distributions ship binaries stripped of debug symbols; in this case we are currently unable to provide any useful backtraces. @Tarrasch's patch addresses this, allowing us to offer symbol names that might at least point the user in the right direction. On the other hand, users with debug information would likely use the DWARF implementation, which offers substantially more context.

Peter's thesis also suggests that we might record sampling of things beyond the stack (e.g. RTS statistics).

Other way around - sample *by* RTS statistics. Sampling by allocation, for example, is cheap to the point of being almost free, and very, very useful for optimisation purposes. What we sample is a separate question, and as usual stack traces would be even more useful here.

Right, I was specially pointing to the fact that you may want to collect things in addition to stack traces. I mention this as it has direct bearing on the event log record format. I understand that the samples may be triggered by a number of different sources (e.g. allocations, GC synchronization pauses, or blackhole blocking events as I mentioned above).

In fact, if you want to go into profiling I would highly suggest you start here, because doing it from the RTS means that you know the stack state. If you do "proper" program state sampling you will run into all sorts of places where the stack unwinding information is not quite right, which will be a *lot* of headache.

I don't understand what "here" refers to. Do you mean "implement profiling in the RTS?" If so I agree wholeheartedly and that is precisely what I intend to do.

The thesis seems to explicitly exclude line information from the debugging information included in the event-log.

Where did you get that impression from? That's what the "debug source" records refer to (see Listing 5.38). In fact, you have a lot more information than what .debug_line gives you: *All* source locations, including every function that was inlined at the point in question.

Ahh, my apologies. It seems I missed these until now. Very good, then line numbers aren't problematic. I don't recall where I got the impression that you explicitly punted on line information. Silly me.

In the end (quite soon I believe) this Diff will be split into several independent Diffs,

  • Implement .debug_aranges
  • Add {low,high}_pc to compilation units
  • Add backtrace abstraction and StackTrace type
  • Add @Tarrasch's symbol table unwinder
  • Add libdw-based unwinder
  • Add SIGUSR2 signal handler to dump stacktrace
  • Add Haskell-land interfaces to stack tracing
  • Things that haven't yet been done,
    • Fix CFI information for stop_frame (I believe this is possible but I haven't yet tried)
    • Profiling...

I've done quite a bit of editing to my original message. Please disregard the text in the notification email and see the text on the Diff itself when you get a chance,

scpmw added a comment.EditedAug 24 2015, 7:54 AM

The latter case is what prompted me to enter down this rabbit-hole, namely Trac #9221. This bug points out the fact that GHC's -j option scales poorly over large core counts.

Okay, I knew I was missing something here. Yes, I remember having a quick look when you pinged me, but given that I don't have a build computer yet (working on it) and the investigation seemed to have moved on I lost track.

But yes, this makes sense. I think sampling on black-holes came up a few times during my work, but I never had a good example for it. Maybe this is one.

In order to get to a point where I could do this I needed to learn a great deal about how DWARF annotations are structured. The change from libdwarf to libdw afforded me that opportunity. The decision was in-part informed by what other projects with similar goals have done. Moreover, this switch had the pleasant side-effect of pointing out a number of issues with our existing DWARF annotations.

Not questioning the side benefits.

Unfortunately this is about as good a portability story as I have seen from a library in this space which meets our interface requirements.

Puzzling given that DWARF is basically the same for Mac Os and Windows. Yet the library support on these platforms seems basically non-existant.

I don't view the symbol table and DWARF implementations as at all competing. They serve different uses. Many distributions ship binaries stripped of debug symbols; in this case we are currently unable to provide any useful backtraces. @Tarrasch's patch addresses this, allowing us to offer symbol names that might at least point the user in the right direction. On the other hand, users with debug information would likely use the DWARF implementation, which offers substantially more context.

Code always implicitly competes for developer attention, and especially maintaining dependencies is rarely fun.

I feel we especially have to keep the concerns of getting a backtrace (bunch of rip pointers) and digging up information on them apart. Right now we have two stack trace providers within reach:

  • RTS (@Tarrasch's patch)
  • libdw (your patch)
  • (libunwind etc?)

And at minimum five address lookup facilities:

  • dladdr (untested)
  • symbol table, by libelf (my code)
  • DWARF, by libdw (your patch)
  • DWARF, by libdwarf (my code)
  • .debug_ghc (my code)
  • (libbfd? more?)

If libelf is shipped normally along with libdw, there is little reason to have separate code paths for them, especially because libdw can probably do symbol table lookups as well (right?). So it might make more sense to instead have dladdr be the fallback option?

My point is that experiments are good, but before thinking about generalisation we should take a hard look at what exactly we want to achieve and whether this is actually effective at getting us there.

I'm not sure I understand what "here" refers to. Do you mean "implement profiling in the RTS?" If so, I agree wholeheartedly and that is precisely what I intend to do.

I meant allocation profiling. But profiling by black holes should work just as well.

bgamari added a comment.EditedAug 24 2015, 9:21 AM
In D1156#32378, @scpmw wrote:

The latter case is what prompted me to enter down this rabbit-hole, namely Trac #9221. This bug points out the fact that GHC's -j option scales poorly over large core counts.

Okay, I knew I was missing something here. Yes, I remember having a quick look when you pinged me, but given that I don't have a build computer yet (working on it) and the investigation seemed to have moved on I lost track.

But yes, this makes sense. I think sampling on black-holes came up a few times during my work, but I never had a good example for it. Maybe this is one.

Indeed; the blackhole problem is something I've run into in the past as well. I think this represents a rather significant gap in our current performance characterization tools.

In order to get to a point where I could do this I needed to learn a great deal about how DWARF annotations are structured. The change from libdwarf to libdw afforded me that opportunity. The decision was in-part informed by what other projects with similar goals have done. Moreover, this switch had the pleasant side-effect of pointing out a number of issues with our existing DWARF annotations.

Not questioning the side benefits.

Unfortunately this is about as good a portability story as I have seen from a library in this space which meets our interface requirements.

Puzzling given that DWARF is basically the same for Mac Os and Windows. Yet the library support on these platforms seems basically non-existant.

Indeed this is perplexing. IIRC I believe it wouldn't be hard to add OS X support to libdw or libunwind. All that this would require is adding support for Mach-O. I think what is lacking is a motivated person to pick up the project.

I don't view the symbol table and DWARF implementations as at all competing. They serve different uses. Many distributions ship binaries stripped of debug symbols; in this case we are currently unable to provide any useful backtraces. @Tarrasch's patch addresses this, allowing us to offer symbol names that might at least point the user in the right direction. On the other hand, users with debug information would likely use the DWARF implementation, which offers substantially more context.

Code always implicitly competes for developer attention, and especially maintaining dependencies is rarely fun.

My point is ultimately I think we

I feel we especially have to keep the concerns of getting a backtrace (bunch of rip pointers) and digging up information on them apart.

Yes, I agree; there is no reason why these two concerns need to be coupled together.

Right now we have two stack trace providers within reach:

...

If libelf is shipped normally along with libdw, there is little reason to have separate code paths for them, especially because libdw can probably do symbol table lookups as well (right?).

Not quite. Yes, they are both included in elfutils but they are separate packages on Debian derivatives.

I think there is a much higher probability that libelf is available than libdw. For instance, I just checked on an Ubuntu-based Amazon EC2 instance I have running. Despite the fact that I have installed essentially nothing on the machine libelf is installed. libdw, on the other hand, is not. That being said, maybe we just want to say that maintaining these fine-grained dependencies isn't worth the effort.

So it might make more sense to instead have dladdr be the fallback option?

That may be a good option.

My point is that experiments are good, but before thinking about generalisation we should take a hard look at what exactly we want to achieve and whether this is actually effective at getting us there.

Sure. I see three use-cases,

  1. Grab a sample for a profile
  2. Grab a backtrace from the RTS for logging to the console
  3. Grab a backtrace from Haskell code

Of these, (1) will only include collection of a stack trace whereas (2) and (3) will involve both stack trace collection and symbol/source location lookup.

At the moment I think it is unavoidable that we will end up having two stack trace providers: while the STG stack unwinding implemented by @Tarrasch has minimal dependencies, it does not support unwinding into C code, which I think it an important use-case.

Moreover, I think we will have at least two symbol lookup mechanisms,

  • Symbol table lookup (offered by libelf or glibc)
  • DWARF (via libdw)

Moreover, in the event that a Windows or OS X contributor comes along with a library which supports their platform yet not Linux, we would want to support them as well.

On the other hand, I think we agree that the Haskell world should not care how we got our stacktrace. IMHO we should expose a Haskell interface like,

-- | A location in the original program source.
data SrcLoc = SrcLoc { sourceFile   :: String
                     , sourceLine   :: Int
                     , sourceColumn :: Int
                     }

-- | A frame of an execution stack
data StackFrame = StackFrame { objectName   :: String
                             , functionName :: String
                             , srcLoc       :: Maybe SrcLoc
                             }

data StackTrace = ...

-- | How many frames deep is this stack trace?
stackDepth :: StackTrace -> Int

-- | Return a list of the frames of the stack trace, starting with the inner-most frame.
getFrames :: StackTrace -> [StackFrame]

getStackTrace :: IO StackTrace

This provides access to nearly every useful piece of information we have in the DWARF case and yet can still be implemented by something as simple as dladdr. One thing we should keep in mind is Trac #10068, which points out that we currently have a number of representations for source locations in the base libraries which should be cleaned up. We certainly want to avoid contributing to this mess. One way would be to use the following stack frame representation instead,

-- | A frame of an execution stack
data StackFrame = StackFrame { executableObject :: Object
                             , functionName     :: String
                             , srcLoc           :: Maybe SrcLoc
                             }

-- | An object file. Either a Haskell module or native code
data ExecutableObject
    = HsModule Module       -- ^ A Haskell module
    | NativeModule FilePath -- ^ A native object file

Where Module is the type roughly laid out in Trac #10068. This may require some help to the compiler to ensure we can lookup the Module in any given Haskell object file but this shouldn't be so hard (simply generate an additional binding with a predictable name in every module).

Whether we want to abstract over the stack trace source in base, ghc-prim, or in the RTS is a question which I am relatively neutral on.

I'm not sure I understand what "here" refers to. Do you mean "implement profiling in the RTS?" If so, I agree wholeheartedly and that is precisely what I intend to do.

I meant allocation profiling. But profiling by black holes should work just as well.

Ahh, I see. Thanks for the clarification.

scpmw added a comment.EditedAug 24 2015, 10:20 AM

I think there is a much higher probability that libelf is available than libdw.

Well, but how much higher? I also initially assumed that basically every system had libelf, but that appears to be wrong. In my opinion, if we add another top-level dependency to GHC, we better make sure we are getting something good for it. And currently libelf seems to be a lot of trouble for little payoff.

Moreover, I think we will have at least two symbol lookup mechanisms,

  • Symbol table lookup (offered by libelf or glibc)
  • DWARF (via libdw)

Moreover, in the event that a Windows or OS X contributor comes along with a library which supports their platform yet not Linux, we would want to support them as well.

Still wouldn't want to completely dismiss the possibility of rolling a DWARF reader ourselves. Impractical for unwinding the stack using DWARF information, but for just linking .debug_ghc to addresses this will get us fairly far with little effort.

stackDepth :: StackTrace -> Int
getFrames :: StackTrace -> [StackFrame]
getStackTrace :: IO StackTrace

I am still not sure why we're not simply providing getStackTrace :: IO [StackFrame]... How often do you need the length, but not the items?

data ExecutableObject
     = HsModule Module       -- ^ A Haskell module
     | NativeModule FilePath -- ^ A native object file

Might also be useful to give some more documentation for known RTS symbols here.

In D1156#32380, @scpmw wrote:

I think there is a much higher probability that libelf is available than libdw.

Well, but how much higher? I also initially assumed that basically every system had libelf, but that appears to be wrong. In my opinion, if we add another top-level dependency to GHC, we better make sure we are getting something good for it. And currently libelf seems to be a lot of trouble for little payoff.

Well, as I mentioned earlier libelf was available on my essentially untouched EC2 instance yet libdw wasn't. It's not at all clear to me that libelf imposes that much cost above what we would have to accept to use libdw (I would put it at a couple of dozen lines at most). For this reason, in my (weakly held) opinion it makes sense to use libelf if that's all we have available. I don't really mind one way or another, however.

Moreover, I think we will have at least two symbol lookup mechanisms,

  • Symbol table lookup (offered by libelf or glibc)
  • DWARF (via libdw)

Moreover, in the event that a Windows or OS X contributor comes along with a library which supports their platform yet not Linux, we would want to support them as well.

Still wouldn't want to completely dismiss the possibility of rolling a DWARF reader ourselves. Impractical for unwinding the stack using DWARF information, but for just linking .debug_ghc to symbol information this will get us fairly far with little effort.

Fair enough but this would only apply to profiling, no? I wouldn't be opposed to trying something like this, but rolling our own might be a bit tricky (relocations come to mind as one rather sticky point).

stackDepth :: StackTrace -> Int
getFrames :: StackTrace -> [StackFrame]
getStackTrace :: IO StackTrace

I am still not sure why we're not simply providing getStackTrace :: IO [StackFrame]... How often do you need the length, but not the items?

I am not opposed to this per se then again I think it's very likely that any internal representation that we might use that would give us O(1) access to the length, so I don't think we are really tying our hands but exposing it.

data ExecutableObject
     = HsModule Module       -- ^ A Haskell module
     | NativeModule FilePath -- ^ A native object file

Might also be useful to give some more documentation for known RTS symbols here.

I'm not sure I follow. Do you mean add add a constructor to ExecutableObject to denote an RTS object?

I have opened D1172, D1173, and D1174 with the first three pieces of this set. I'll be updating this diff momentarily to base off of these.

bgamari updated this revision to Diff 3920.Aug 25 2015, 3:23 AM
bgamari edited edge metadata.
  • Libdw: Initial commit of DWARF-based stack unwinding
  • Libdw: Save object information as well
  • libdw: Add session to capability
  • Signals: Print backtrace on SIGUSR2
  • base: Rework GHC.ExecutionStack.Internal
  • Fix stack
  • Dwarf: [HACK] Don't strip blocks
  • Binary: Add diffBin
  • Binary: Add getBinMemBuf
  • Debug: Remove extraneous LANGUAGE CPP
  • Core notes
  • Generate .debug_ghc section
bgamari updated this revision to Diff 3921.Aug 25 2015, 3:25 AM
bgamari edited edge metadata.
  • Debug: Remove extraneous LANGUAGE CPP
bgamari updated this revision to Diff 3922.Aug 25 2015, 3:46 AM
bgamari edited edge metadata.
  • Libdw: Initial commit of DWARF-based stack unwinding
  • libdw: Add session to capability
  • Signals: Print backtrace on SIGUSR2
  • Debug: Remove extraneous LANGUAGE CPP
This comment was removed by bgamari.
bgamari abandoned this revision.EditedSep 1 2015, 1:40 PM

Abandoning in favor of D1196.