On Windows one is not allowed to drop the stack by more than a page size.
The reason for this is that the OS only allocates enough stack till what
the TEB specifies. After that a guard page is placed and the rest of the
virtual address space is unmapped.
The intention is that doing stack allocations will cause you to hit the
guard which will then map the next page in and move the guard. This is
done to prevent what in the Linux world is known as stack clash
There are modules in GHC for which the liveliness analysis thinks the
reserved 8KB of spill slots isn't enough. One being DynFlags and the
other being Cabal.
Though I think the Cabal one is likely a bug:
4d6544: 81 ec 00 46 00 00 sub $0x4600,%esp 4d654a: 8d 85 94 fe ff ff lea -0x16c(%ebp),%eax 4d6550: 3b 83 1c 03 00 00 cmp 0x31c(%ebx),%eax 4d6556: 0f 82 de 8d 02 00 jb 4ff33a <_cLpg_info+0x7a> 4d655c: c7 45 fc 14 3d 50 00 movl $0x503d14,-0x4(%ebp) 4d6563: 8b 75 0c mov 0xc(%ebp),%esi 4d6566: 83 c5 fc add $0xfffffffc,%ebp 4d6569: 66 f7 c6 03 00 test $0x3,%si 4d656e: 0f 85 a6 d7 02 00 jne 503d1a <_cLpb_info+0x6> 4d6574: 81 c4 00 46 00 00 add $0x4600,%esp
It allocates nearly 18KB of spill slots for a simple 4 line function
and doesn't even use it. Note that this doesn't happen on x64 or
when making a validate build. Only when making a build without a
validate and build.mk.
This and the allocation in DynFlags means the stack allocation will jump
over the guard page into unmapped memory areas and GHC or an end program
The pagesize on x86 Windows is 4KB which means we hit it very easily for
these two modules, which explains the total DOA of GHC 32bit for the past
3 releases and the "random" segfaults on Windows.
0:000> bp 00503d29 0:000> gn Breakpoint 0 hit WARNING: Stack overflow detected. The unwound frames are extracted from outside normal stack bounds. eax=03b6b9c9 ebx=00dc90f0 ecx=03cac48c edx=03cac43d esi=03b6b9c9 edi=03abef40 eip=00503d29 esp=013e96fc ebp=03cf8f70 iopl=0 nv up ei pl nz na po nc cs=0023 ss=002b ds=002b es=002b fs=0053 gs=002b efl=00000202 setup+0x103d29: 00503d29 89442440 mov dword ptr [esp+40h],eax ss:002b:013e973c=???????? WARNING: Stack overflow detected. The unwound frames are extracted from outside normal stack bounds. WARNING: Stack overflow detected. The unwound frames are extracted from outside normal stack bounds. 0:000> !teb TEB at 00384000 ExceptionList: 013effcc StackBase: 013f0000 StackLimit: 013eb000
This doesn't fix the liveliness analysis but does fix the allocations, by emitting a function call to
__chkstk_ms when doing allocations of larger than a page, this will make sure the stack is probed every
page so the kernel maps in the next page.
__chkstk_ms is provided by libGCC, which is under the GNU runtime exclusion license, so it's safe to link
against it, even for proprietary code. (Technically we already do since we link compiled C code in.)
For allocations smaller than a page we drop the stack and probe the new address. This avoids the function
call and still makes sure we hit the guard if needed.
PS: In case anyone is Wondering why we didn't notice this before, it's because we only test x86_64 and on
Windows 10. On x86_64 the page size is 8KB and also the kernel is a bit more lenient on Windows 10
in that it seems to catch the segfault and resize the stack if it was unmapped:
0:000> t eax=03b6b9c9 ebx=00dc90f0 ecx=03cac48c edx=03cac43d esi=03b6b9c9 edi=03abef40 eip=00503d2d esp=013e96fc ebp=03cf8f70 iopl=0 nv up ei pl nz na po nc cs=0023 ss=002b ds=002b es=002b fs=0053 gs=002b efl=00000202 setup+0x103d2d: 00503d2d 8b461b mov eax,dword ptr [esi+1Bh] ds:002b:03b6b9e4=03cac431 0:000> !teb TEB at 00384000 ExceptionList: 013effcc StackBase: 013f0000 StackLimit: 013e9000
This fixes the stack allocations, and as soon as I get the time I will look at the liveliness analysis.
I find it highly unlikely that simple Cabal function requires ~2200 spill slots.