stack: fix stack allocations on Windows
ClosedPublic

Authored by Phyx on Sun, Jul 1, 5:52 AM.

Details

Summary

On Windows one is not allowed to drop the stack by more than a page size.
The reason for this is that the OS only allocates enough stack till what
the TEB specifies. After that a guard page is placed and the rest of the
virtual address space is unmapped.

The intention is that doing stack allocations will cause you to hit the
guard which will then map the next page in and move the guard. This is
done to prevent what in the Linux world is known as stack clash
vulnerabilities https://access.redhat.com/security/cve/cve-2017-1000364.

There are modules in GHC for which the liveliness analysis thinks the
reserved 8KB of spill slots isn't enough. One being DynFlags and the
other being Cabal.

Though I think the Cabal one is likely a bug:

4d6544:       81 ec 00 46 00 00       sub    $0x4600,%esp
4d654a:       8d 85 94 fe ff ff       lea    -0x16c(%ebp),%eax
4d6550:       3b 83 1c 03 00 00       cmp    0x31c(%ebx),%eax
4d6556:       0f 82 de 8d 02 00       jb     4ff33a <_cLpg_info+0x7a>
4d655c:       c7 45 fc 14 3d 50 00    movl   $0x503d14,-0x4(%ebp)
4d6563:       8b 75 0c                mov    0xc(%ebp),%esi
4d6566:       83 c5 fc                add    $0xfffffffc,%ebp
4d6569:       66 f7 c6 03 00          test   $0x3,%si
4d656e:       0f 85 a6 d7 02 00       jne    503d1a <_cLpb_info+0x6>
4d6574:       81 c4 00 46 00 00       add    $0x4600,%esp

It allocates nearly 18KB of spill slots for a simple 4 line function
and doesn't even use it. Note that this doesn't happen on x64 or
when making a validate build. Only when making a build without a
validate and build.mk.

This and the allocation in DynFlags means the stack allocation will jump
over the guard page into unmapped memory areas and GHC or an end program
segfaults.

The pagesize on x86 Windows is 4KB which means we hit it very easily for
these two modules, which explains the total DOA of GHC 32bit for the past
3 releases and the "random" segfaults on Windows.

0:000> bp 00503d29
0:000> gn
Breakpoint 0 hit
WARNING: Stack overflow detected. The unwound frames are extracted from outside normal stack bounds.
eax=03b6b9c9 ebx=00dc90f0 ecx=03cac48c edx=03cac43d esi=03b6b9c9 edi=03abef40
eip=00503d29 esp=013e96fc ebp=03cf8f70 iopl=0         nv up ei pl nz na po nc
cs=0023  ss=002b  ds=002b  es=002b  fs=0053  gs=002b             efl=00000202
setup+0x103d29:
00503d29 89442440        mov     dword ptr [esp+40h],eax ss:002b:013e973c=????????
WARNING: Stack overflow detected. The unwound frames are extracted from outside normal stack bounds.
WARNING: Stack overflow detected. The unwound frames are extracted from outside normal stack bounds.
0:000> !teb
TEB at 00384000
    ExceptionList:        013effcc
    StackBase:            013f0000
    StackLimit:           013eb000

This doesn't fix the liveliness analysis but does fix the allocations, by emitting a function call to
__chkstk_ms when doing allocations of larger than a page, this will make sure the stack is probed every
page so the kernel maps in the next page.

__chkstk_ms is provided by libGCC, which is under the GNU runtime exclusion license, so it's safe to link
against it, even for proprietary code. (Technically we already do since we link compiled C code in.)

For allocations smaller than a page we drop the stack and probe the new address. This avoids the function
call and still makes sure we hit the guard if needed.

PS: In case anyone is Wondering why we didn't notice this before, it's because we only test x86_64 and on
Windows 10. On x86_64 the page size is 8KB and also the kernel is a bit more lenient on Windows 10
in that it seems to catch the segfault and resize the stack if it was unmapped:

0:000> t
eax=03b6b9c9 ebx=00dc90f0 ecx=03cac48c edx=03cac43d esi=03b6b9c9 edi=03abef40
eip=00503d2d esp=013e96fc ebp=03cf8f70 iopl=0         nv up ei pl nz na po nc
cs=0023  ss=002b  ds=002b  es=002b  fs=0053  gs=002b             efl=00000202
setup+0x103d2d:
00503d2d 8b461b          mov     eax,dword ptr [esi+1Bh] ds:002b:03b6b9e4=03cac431
0:000> !teb
TEB at 00384000
    ExceptionList:        013effcc
    StackBase:            013f0000
    StackLimit:           013e9000

This fixes the stack allocations, and as soon as I get the time I will look at the liveliness analysis.
I find it highly unlikely that simple Cabal function requires ~2200 spill slots.

Test Plan

./validate

Diff Detail

Repository
rGHC Glasgow Haskell Compiler
Lint
Automatic diff as part of commit; lint not applicable.
Unit
Automatic diff as part of commit; unit tests not applicable.
Phyx created this revision.Sun, Jul 1, 5:52 AM
Phyx edited the summary of this revision. (Show Details)Sun, Jul 1, 6:03 AM
Phyx edited the summary of this revision. (Show Details)Sun, Jul 1, 6:10 AM
Phyx edited the summary of this revision. (Show Details)Sun, Jul 1, 6:14 AM

and as soon as I get the time I will look at the liveliness analysis. I find it highly unlikely that simple Cabal function requires ~2200 spill slots.

When you do open a ticket for this would you mind adding me in CC. I'm intrigued by this and would like to hear what caused this if it was a bug.

Phyx added a comment.Mon, Jul 2, 12:05 PM

and as soon as I get the time I will look at the liveliness analysis. I find it highly unlikely that simple Cabal function requires ~2200 spill slots.

When you do open a ticket for this would you mind adding me in CC. I'm intrigued by this and would like to hear what caused this if it was a bug.

Sure, no problem.

It allocates nearly 18KB of spill slots for a simple 4 line function and doesn't even use it.

This is orthogonal to the issue at hand but terrible nevertheless. Do you know which function causes such terrible code generation? Perhaps GHC can do better here (or perhaps Cabal just needs some refactoring).

Phyx added a comment.Mon, Jul 2, 2:34 PM

It allocates nearly 18KB of spill slots for a simple 4 line function and doesn't even use it.

This is orthogonal to the issue at hand but terrible nevertheless. Do you know which function causes such terrible code generation? Perhaps GHC can do better here (or perhaps Cabal just needs some refactoring).

This module is full of such allocations, buildExe for instance is one, but there are a lot of these generated info tables like _cNW6_info that all do this.
Interestingly enough this does not happen with validate or devel2 builds. So it must be something between the default optimization level and whatever validate and devel2 do.

Cabal isn't the only offender here though, DynFlags has a few functions which allocate huge chunks too for instance, these are what caused the reports where ghc-pkg or ghc died ungracefully when starting up.

Phyx added a comment.Tue, Jul 10, 11:27 AM

Ping... Ideally this should be in 8.6 it's critical enough that without it we shouldn't even make windows 32 bit builds.. 64 we'll probably get away with for now as before..

AndreasK added inline comments.Tue, Jul 10, 4:00 PM
compiler/nativeGen/X86/Instr.hs
909
-- These will clobber AX but since this will be the first thing in
-- an _info table this should be ok.

Shouldn't it say "first thing after"?

Maybe add a small explaination why this is fine too.

  • If we enter a proc from calls or global jumps eax is dead (since it's caller saved).
  • We never enter the block containing the stack allocation code from within the same proc.

The later is enforced by rewriting any jumps targeting this block to it's successor when we insert the dealloc
instructions.

Phyx added a comment.Wed, Jul 11, 1:38 PM

Thanks for the comments! I'll update the comment when I get home tomorrow.

bgamari requested changes to this revision.Thu, Jul 12, 8:27 AM
bgamari added inline comments.
compiler/nativeGen/X86/Instr.hs
909

I agree with @AndreasK; this definitely could use more explanation. Register liveness arguments are quite time-consuming to reconstruct so ample comments are warranted here.

Instead of "first thing in an _info table" I would just say "this is the thing we do when entering the closure".

This revision now requires changes to proceed.Thu, Jul 12, 8:27 AM

Looks quite good other than the comments.

I agree that this should be in 8.6.

Phyx updated this revision to Diff 17314.Sat, Jul 14, 6:33 AM
Phyx marked 2 inline comments as done.

update comments

bgamari accepted this revision.Sat, Jul 14, 6:49 PM

Much better; thanks @Phyx!

This revision is now accepted and ready to land.Sat, Jul 14, 6:49 PM
This revision was automatically updated to reflect the committed changes.