The current OS memory allocator conflates the concepts of allocating
address space and allocating memory, which makes the HEAP_ALLOCED()
implementation excessively complicated (as the only thing it cares
about is address space layout) and slow. Instead, what we want
is to allocate a single insanely large contiguous block of address
space (to make HEAP_ALLOCED() checks fast), and then commit subportions
of that in 1MB blocks as we did before.
This is currently behind a flag, USE_LARGE_ADDRESS_SPACE, that is only enabled for
I think we want this to be a private symbol, so move it into rts/sm/MBlock.h, along with the related macros. That should mean one fewer memory reference when the RTS is a shared library.
We can always provide an external API for this if people want it, but I seriously doubt anyone is using HEAP_ALLOCED outside of the RTS.
I'm less convinced about moving block descriptors, so it's good this patch is separate.
Does this work on OS X where we reserve the memory using vm_allocate?
these should be sysErrorBelch
I see what you mean. I tried to keep the convention to reserve/commit following the windows naming, but yes, it is sort of confusing that to get the reserve behavior we need to pass noreserve to mmap.
Good question, I don't have an OS X machine to test on.
Yes, allocation is O(n), but only for allocation of large (>= 2 MB) contiguous blocks, which should not happen often.
On the issue of code duplication, one possibility would be to remove the free list in the block allocator altogether, and immediately release the memory to the mblock allocator in free_mega_group. The mblock layer would then decide whether to wait for returnMemoryToOS, decommit, or decommit+release.
Ugh, you're right, thanks for spotting this.
Yes - if you were to create a benchmark that stresses this code path and show that there's little difference with the malloc-based free list then I'd be happy to go with this.
Updated with some of the review comments - now builds fine,
passes validate and should be bug free. I'll work on addressing
the other comments soon, but I wanted to put out a working
version in case someone else wants to play with it.
Ready for merge, I hope.
Ok, all review comments have been addressed now, and this
passes validate locally on Linux and Windows.
I have two benchmarks for the O(n) allocation behavior
(in which the run time does not change significantly),
but I did not find how to submit nofib patches.
CR is all comment fixes, and one minor UI nit. But we need to test on Darwin before we can push this.
We need to validate this on Darwin, which hasn't been done yet.
The new state parameter is a bit funny, it would be best to say exactly why we need it (i.e. the specific use-case). I feel like there should be a better way to structure this.
say what this function does
And here, you should definitely explain what the state is for, in this concrete case.
I think it would be better for us to specifically report that we ran out of preallocated virtual address space, because this is not a strictly "out of memory" condition.
MB = megablock, not megabyte!
Hm, is it inevitable that we lose the request size?
Build error on Darwin:
rts/posix/OSMem.c: In function ‘my_mmap’:
rts/posix/OSMem.c:149:0: error: ‘MAP_RESERVE’ undeclared (first use in this function)
rts/posix/OSMem.c:149:0: error: (Each undeclared identifier is reported only once
I'll try and fix it.
After fixing the typo, Darwin validate still reports errors:
Unexpected failures:
   ../../libraries/unix/tests               user001 [bad stdout] (normal)
   cabal/cabal01                            cabal01 [bad exit code] (normal)
   cabal/cabal06                            cabal06 [bad stdout] (normal)
   cabal/sigcabal01                         sigcabal01 [bad stderr] (normal)
   ffi/should_run                           T2276_ghci [bad stdout or stderr] (ghci)
   ghc-api                                  T8628 [bad exit code] (normal)
   ghc-api                                  T8639_api [bad exit code] (normal)
   ghci/prog003                             prog003 [bad stderr] (ghci)
   ghci/scripts                             T8696 [bad stderr] (ghci)
   ghci/scripts                             ghci058 [bad stderr] (ghci)
   llvm/should_run/subsections_via_symbols  subsections_via_symbols [bad exit code] (normal)
   rts                                      T5435_dyn_asm [bad stdout] (normal)
   rts                                      linker_unload [bad exit code] (normal)

Unexpected stat failures:
   perf/compiler  T4801 [stat too good] (normal)
   perf/compiler  T9675 [stat not good enough] (optasm)
But I think a good chunk of these are unrelated.
Here is a baseline validate for comparison:
Unexpected failures:
   ../../libraries/unix/tests               user001 [bad stdout] (normal)
   cabal/cabal01                            cabal01 [bad exit code] (normal)
   cabal/cabal06                            cabal06 [bad stdout] (normal)
   cabal/sigcabal01                         sigcabal01 [bad stderr] (normal)
   ffi/should_run                           T2276_ghci [bad stdout or stderr] (ghci)
   ghci/prog003                             prog003 [bad stderr] (ghci)
   ghci/scripts                             T8696 [bad stderr] (ghci)
   ghci/scripts                             ghci058 [bad stderr] (ghci)
   llvm/should_run/subsections_via_symbols  subsections_via_symbols [bad exit code] (normal)
   rts                                      T5435_dyn_asm [bad stdout] (normal)

Unexpected stat failures:
   perf/compiler  T4801 [stat too good] (normal)
   perf/compiler  T9675 [stat not good enough] (optasm)
OK, I've verified that these failures are specific to this patch. We better fix them.
=====> T8639_api(normal) 953 of 4339 [0, 0, 0]
cd ./ghc-api && $MAKE -s --no-print-directory T8639_api </dev/null >T8639_api.run.stdout 2>T8639_api.run.stderr
Wrong exit code (expected 0, actual 2)
Stdout:
Stderr:
T8639_api: internal error: stg_ap_v_ret
    (GHC version 7.9.20141209 for x86_64_apple_darwin)
    Please report this as a GHC bug: http://www.haskell.org/ghc/reportabug
make: *** [T8639_api] Abort trap: 6
*** unexpected failure for T8639_api(normal)

=====> T8628(normal) 954 of 4339 [0, 1, 0]
cd ./ghc-api && $MAKE -s --no-print-directory T8628 </dev/null >T8628.run.stdout 2>T8628.run.stderr
Wrong exit code (expected 0, actual 2)
Stdout:
Stderr:
T8628: internal error: stg_ap_v_ret
    (GHC version 7.9.20141209 for x86_64_apple_darwin)
    Please report this as a GHC bug: http://www.haskell.org/ghc/reportabug
make: *** [T8628] Abort trap: 6
*** unexpected failure for T8628(normal)
Good news: the error tickles an ASSERT:
relocateSection: length = 2, thing = 0, baseValue = 0x10105803fc2
relocateSection: looking up external symbol _exp : type = 1 : sect = 0 : desc = 0 : value = 0x0
lookupSymbol: looking up _exp
lookupSymbol: symbol not found
lookupSymbol: looking up _exp with dlsym
relocateSection: external symbol _exp, address 0x7fff94b667f0
relocateSection: value = 0x7fff94b667f0
T8628: internal error: ASSERTION FAILED: file rts/Linker.c, line 6626
    (GHC version 7.9.20141209 for x86_64_apple_darwin)
    Please report this as a GHC bug: http://www.haskell.org/ghc/reportabug
A key insight: the bug does not occur with dynamic linking, so we suspect a bug in the Mach-O static linker where relocations are resolved improperly when they are too large.
gcampax and I figured out how to fix this: the problem is that PIC'd code still needs to be mapped in the low 8GB segment, but you run out of space there due to the terabyte reservation. gcampax is cooking up a fix.
This is the updated patch to fix the aforementioned bug.
It validates fine on OS X and Linux; unfortunately,
after rebasing it picked up a problem on Windows,
and it no longer works there.
I wanted to fix it before updating again, but I have realized
I do not have time at the moment, so here's the patch if
you want to take a look.
The error I get is "not enough swap space to perform
the operation", as if it was trying to commit 1TB of
memory. Debugging is unreliable though: it appears that
VirtualReserve succeeds, but then the program fails in
a way the debugger does not understand.
I hope someone can figure this out; otherwise it will be
necessary to disable this code on Windows.
I fixed the bug on Windows (update coming soon), but unfortunately I think this scheme might be doomed on Windows. If I run a make -j4, I get this:
VirtualAlloc MEM_RESERVE 1099512676352 bytes failed: The paging file is too small for this operation to complete.
So it looks like MEM_RESERVE has some limit that we're running into when there are multiple processes running simultaneously.
- Fix Windows implementation (piggybacking on the old support for memory allocation didn't work, and it's simpler to just call VirtualAlloc directly).
- disable use_large_address_space on Windows.
I say ship it. It's a bit disappointing that Windows can't handle this, but confirming that this was a Windows-specific problem was enough to lead me to Go, which ran into the same issue (indeed, the reason I thought MEM_RESERVE would work on Windows is that this is exactly what Go was doing):
// Number of bits in page to span calculations (4k pages).
// On Windows 64-bit we limit the arena to 32GB or 35 bits.
// Windows counts memory used by page table into committed memory
// of the process, so we can't reserve too much memory.
// See https://golang.org/issue/5402 and https://golang.org/issue/5236.
// On other 64-bit platforms, we limit the arena to 512GB, or 39 bits.
// On 32-bit, we don't bother limiting anything, so we use the full 32-bit address.
// On Darwin/arm64, we cannot reserve more than ~5GB of virtual memory,
// but as most devices have less than 4GB of physical memory anyway, we
// try to be conservative here, and only ask for a 2GB heap.
Note that the Go discussion mentions that relying on overcommit causes problems for people on systems that don't overcommit; this is not relevant to us, since we never actually commit the reserved memory, and we rely on the OS to lazily back pages only when they are actually used.
@ezyang, yes so the problem appears to be that Windows counts the size of page tables as committed memory, so the amount of memory you can reserve is limited by this.
We will need to find another solution on Windows though - the current HEAP_ALLOCED() implementation is much worse than I thought when using multiple cores with a large heap size, due to contention for the lock in the HEAP_ALLOCED_miss() path. We've been suffering pretty badly from this in our system.
Could someone validate on OS X for me? I don't have a machine set up to do this.
I observed the following test failures that might be related (but they might also be a problem in the test or the driver, because the test program behaves reasonably; only the stdout doesn't match):
   rts/T9579  T9579_outofheap_rtsall [bad stdout] (normal)
   rts/T9579  T9579_outofheap_rtsall_no_suggestions [bad stdout] (normal)
   rts/T9579  T9579_outofheap_rtsnone [bad stdout] (normal)
   rts/T9579  T9579_outofheap_rtssome [bad stdout] (normal)
   rts/T9579  T9579_stackoverflow_rtsall [bad stdout] (normal)
   rts/T9579  T9579_stackoverflow_rtsall_no_suggestions [bad stdout] (normal)
   rts/T9579  T9579_stackoverflow_rtsnone [bad stdout] (normal)
   rts/T9579  T9579_stackoverflow_rtssome [bad stdout] (normal)
OK nevermind, I checked again without the patch and I get the same failures (on 214596de224afa576a9c295bcf53c6941d6892e0, aka today's master, on OS X Yosemite, bootstrapped with ghc 7.10.1 from homebrew / clang 6.1.0).
Looks like this patch has no regressions on validate, so it should be good to go.