Add stack traces on crashes on Windows
Closed, Public

Authored by Phyx on Sep 1 2017, 4:36 PM.

Details

Summary

This patch adds the ability to generate stack traces on crashes on Windows.
When running in the interpreter, this attempts to use symbol information from
the interpreter and what we know about the loaded object files to
resolve addresses to symbols.

When running compiled code, this information is not available, so it falls back
to using symbol information from PDB files. For now this means that only
files compiled with ICC or MSVC will show traces when running compiled.

But I have a future patch that may address this shortcoming.

Also, since I don't know how to walk a pure Haskell stack, I can for now
only show the last entry. I'm hoping to figure out how Apply.cmm works so that
I can walk the stack and give more entries for pure Haskell code.

In GHCi

$ echo main | inplace/bin/ghc-stage2.exe --interactive ./testsuite/tests/rts/derefnull.hs
GHCi, version 8.3.20170830: http://www.haskell.org/ghc/  :? for help
Ok, 1 module loaded.
Prelude Main>
Access violation in generated code when reading 0x0

 Attempting to reconstruct a stack trace...

   Frame        Code address
 * 0x77cde10    0xc370229 E:\ghc-dev\msys64\home\Tamar\ghc\libraries\base\dist-install\build\HSbase-4.10.0.0.o+0x190031
                 (base_ForeignziStorable_zdfStorableInt4_info+0x3f)

and compiled

Access violation in generated code when reading 0x0

 Attempting to reconstruct a stack trace...

   Frame        Code address
 * 0xf0dbd0     0x40bb01 E:\ghc-dev\msys64\home\Tamar\ghc\testsuite\tests\rts\derefnull.run\derefnull.exe+0xbb01
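
The `symbol+0x3f` annotation in the GHCi trace above comes from mapping a code address to the closest preceding symbol. A minimal sketch of that lookup in C, assuming a flat in-memory symbol table; `SymbolEntry`, `nearest_symbol` and `describe_address` are hypothetical names for illustration and not the RTS's actual data structures:

```c
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical symbol table entry: a name and its start address. */
typedef struct {
    const char *name;
    uintptr_t   addr;
} SymbolEntry;

/* Return the entry with the greatest start address that is <= pc,
 * or NULL if pc lies before every known symbol.
 * The table does not need to be sorted. */
static const SymbolEntry *
nearest_symbol(const SymbolEntry *table, size_t n, uintptr_t pc)
{
    const SymbolEntry *best = NULL;
    for (size_t i = 0; i < n; i++) {
        if (table[i].addr <= pc &&
            (best == NULL || table[i].addr > best->addr)) {
            best = &table[i];
        }
    }
    return best;
}

/* Write "name+0xoffset" into buf, or the raw address when no symbol
 * precedes pc. Returns snprintf's result. */
static int
describe_address(const SymbolEntry *table, size_t n, uintptr_t pc,
                 char *buf, size_t len)
{
    const SymbolEntry *sym = nearest_symbol(table, n, pc);
    if (sym == NULL) {
        return snprintf(buf, len, "0x%llx", (unsigned long long) pc);
    }
    return snprintf(buf, len, "%s+0x%llx", sym->name,
                    (unsigned long long) (pc - sym->addr));
}
```

For instance, assuming `base_ForeignziStorable_zdfStorableInt4_info` starts at 0xc3701ea (the trace address minus 0x3f), looking up 0xc370229 yields that name with offset 0x3f. An address below every known symbol is reported as a bare hex value, much like the compiled trace where no symbol could be resolved.
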
Test Plan

./validate

Diff Detail

Repository
rGHC Glasgow Haskell Compiler
Lint
Lint Skipped
Excuse: linter got stuck in a loop and never finished; will re-enable when I add changelog
Unit
No Unit Test Coverage
Build Status
Buildable 17380
Build 33180: [GHC] Linux/amd64: Patch building
Build 33179: [GHC] OSX/amd64: Continuous Integration
Build 33178: [GHC] Windows/amd64: Continuous Integration
Build 33177: arc lint + arc unit
Phyx created this revision. Sep 1 2017, 4:36 PM
simonmar edited edge metadata. Sep 2 2017, 3:27 AM

Nice! I don't follow the Win32 code, but I've no objection to this going in. Don't forget to add it to the user guide and release notes.

rts/RtsStartup.c
609

Why do we need this?

bgamari requested changes to this revision. Edited Sep 2 2017, 6:12 PM

Looks pretty reasonable to me. A few small requests inline.

Also since I don't know how to walk a pure Haskell stack, I can for now
only show the last entry. I'm hoping to figure out how Apply.cmm works to
be able to walk the stack and give more entries for pure Haskell code.

A nice example of how to walk the Haskell stack can be found in printStackChunk in Printer.c. Essentially just start at Sp and walk down in increments of stack_frame_sizeW(sp). The stack frames themselves can be treated as closures.
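
The walk described above can be sketched with a toy model. This is not RTS code: `Word`, `toy_frame_size_w` and `walk_stack` are hypothetical, and the toy stores each frame's size in its first word, whereas the real `stack_frame_sizeW` derives it from the frame's info table. The bounds checks are an added assumption, meant to keep the loop from running away if the stack is inconsistent at crash time.

```c
#include <stddef.h>
#include <stdint.h>

typedef uintptr_t Word;

/* Toy stand-in for stack_frame_sizeW: in this model the first word of
 * every frame holds the frame's total size in words, header included. */
static size_t toy_frame_size_w(const Word *frame)
{
    return (size_t) frame[0];
}

/* Walk frames starting at sp (the top of the stack) until reaching
 * bottom (one past the last stack word), calling visit on each frame.
 * Stops early on a zero-sized frame or one that would overrun bottom.
 * Returns the number of frames visited. */
static size_t walk_stack(const Word *sp, const Word *bottom,
                         void (*visit)(const Word *frame, void *ctx),
                         void *ctx)
{
    size_t frames = 0;
    while (sp < bottom) {
        size_t size = toy_frame_size_w(sp);
        if (size == 0 || size > (size_t) (bottom - sp)) {
            break;              /* inconsistent frame: give up */
        }
        if (visit != NULL) {
            visit(sp, ctx);
        }
        frames++;
        sp += size;             /* step to the next (older) frame */
    }
    return frames;
}
```

Because the stack grows towards lower addresses, "walking down" the frames from Sp means stepping to increasing addresses, which is why the loop adds the frame size to `sp`.
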

rts/Linker.c
983

Would this really be a pathchar? If so, perhaps pathchar should be renamed?

rts/RtsFlags.c
441

Users guide documentation would be nice.

rts/RtsStartup.c
609

A good question.

This revision now requires changes to proceed. Sep 2 2017, 6:12 PM
Phyx added a comment. Sep 3 2017, 5:07 AM

be able to walk the stack and give more entries for pure Haskell code.

A nice example of how to walk the Haskell stack can be found in printStackChunk in Printer.c. Essentially just start at Sp and walk down in increments of stack_frame_sizeW(sp). The stack frames themselves can be treated as closures.

Ah, great, thanks! I'll go take a look.

rts/RtsStartup.c
609

We don't, I was debugging something and forgot to undo this hunk. :(

Phyx added inline comments. Sep 3 2017, 11:50 AM
rts/Linker.c
983

The message contains path information, which may be Unicode, which is why this entire function was forced to use pathchar. So it is correct in that sense.
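
A minimal sketch of the idea behind a platform-dependent path character type, assuming wide characters on Windows and plain `char` elsewhere. The names `my_pathchar`, `MY_PATH_LIT` and `my_path_len` are hypothetical and only illustrate the pattern; they are not the RTS's own definitions.

```c
#include <stddef.h>

#if defined(_WIN32)
#include <wchar.h>
typedef wchar_t my_pathchar;        /* wide characters for Unicode paths */
#define MY_PATH_LIT(s) L##s
#else
typedef char my_pathchar;           /* byte strings elsewhere */
#define MY_PATH_LIT(s) s
#endif

/* Length of a path in my_pathchar units, whichever width is in use. */
static size_t my_path_len(const my_pathchar *p)
{
    size_t n = 0;
    while (p[n] != 0) {
        n++;
    }
    return n;
}
```

Any function that formats or stores a message containing such a path has to use the same character type throughout, which is the constraint described above.
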

I think just kill the abort and we'll be all set.

rts/Linker.c
983

Alright, fair enough.

Phyx retitled this revision from "Add stack traces on crashed on Windows" to "Add stack traces on crashes on Windows". Sep 23 2017, 8:26 PM
Phyx updated this revision to Diff 14073. Sep 23 2017, 8:30 PM
Phyx edited edge metadata.

rebased & fix lint errors & changelog

Phyx added a comment. Sep 25 2017, 4:20 AM
In D3913#109869, @Phyx wrote:

be able to walk the stack and give more entries for pure Haskell code.

A nice example of how to walk the Haskell stack can be found in printStackChunk in Printer.c. Essentially just start at Sp and walk down in increments of stack_frame_sizeW(sp). The stack frames themselves can be treated as closures.

Ah, great, thanks! I'll go take a look.

I've left this for now; I couldn't get it to work properly. When the exception happens you may or may not be in the middle of a Haskell closure, but even when I was, after finding the starting address of the closure and looking at it, the first few bytes were always null, so I could never get its type.

This is made even more complicated by the fact that C and Haskell stacks can, I think, be interleaved, so I need to know which one I'm traversing. I'm probably doing something wrong, but will come back to it later.

Phyx updated this revision to Diff 14139. Sep 26 2017, 2:12 PM
  • rebase
bgamari requested changes to this revision. Oct 3 2017, 1:00 PM

I've left this for now, I couldn't get it to work properly. When the exception happens you may or may not be in the middle of a Haskell closure, but even when I was, finding the starting address of the closure and looking at it the first few bytes were always null, so I could never get the type of

Ahh, tricky. Yes, I suppose in the general case we may crash when the stack is in an inconsistent state. I wonder if we could be more careful in this regard though.

Anyways, bumping out of the review queue for now.

This revision now requires changes to proceed. Oct 3 2017, 1:00 PM
Phyx added a comment. Oct 19 2017, 1:14 AM

I've left this for now, I couldn't get it to work properly. When the exception happens you may or may not be in the middle of a Haskell closure, but even when I was, finding the starting address of the closure and looking at it the first few bytes were always null, so I could never get the type of

Ahh, tricky. Yes, I suppose in the general case we may crash when the stack is in an inconsistent state. I wonder if we could be more careful in this regard though.

Anyways, bumping out of the review queue for now.

Hmm, did you want me to fix the full Haskell stack now?
I was going to refine it later, since the C stack works fine and the Haskell one gives you only the top of the stack, which is still a lot more useful information than what it currently says.

So is it OK to use this version and expand it later?

bgamari accepted this revision. Oct 19 2017, 8:00 AM
In D3913#114819, @Phyx wrote:

Hmm, did you want me to fix the full Haskell stack now?
I was going to refine it later, since the C stack works fine and the Haskell one gives you only the top of the stack, which is still a lot more useful information than what it currently says.

So is it OK to use this version and expand it later?

Ahh, I think I misinterpreted,

I've left this for now, I couldn't get it to work properly.

I was under the impression that this would crash when taking a stack trace (as DWARF lookup used to do), which would have been a regression. If this patch strictly improves things then I am happy to take it. Having a C stacktrace is certainly better than nothing.

This revision is now accepted and ready to land. Oct 19 2017, 8:00 AM
Phyx updated this revision to Diff 14436. Oct 20 2017, 3:27 PM
  • rebase
Phyx updated this revision to Diff 14440. Oct 21 2017, 5:06 AM
  • fix werror on unused variables.
Phyx closed this revision. Oct 22 2017, 6:33 AM

Committed but harbormaster seems not to have noticed.

trofi added a subscriber: trofi. Dec 2 2017, 5:20 AM
trofi added inline comments.
rts/linker/PEi386.c
159

I suggest using lowercase filenames in library and header names to make cross-compilers
from Linux to Windows easier to build.

Sent https://phabricator.haskell.org/D4247 for review.