We fill mmap'ed pages as much as possible instead of allocating one page
per object. Huge memory gains (174MB -> 84KB in the test case given in
trac). Startup performance should be better too because of the reduced
number of mmap calls.
- Share the same allocator for several archives
Loaded objects are never freed. Hence we don't really have to flush the
allocator at the end of loadArchive. By doing this, objects from different
archives may share the same pages and the memory usage should be lower
On the whole this looks quite nice.
What do others think about placing m32 into a new source file? It exposes a rather clean interface and Linker.c is already rather unwieldy due to its size.
On the more pedantic side of things, I noticed a few places where the use of whitespace is a bit inconsistent (in particular after commas in function call argument lists).
Other thoughts inline.
We tend to denote longer comments like this as "notes" and format them such that they are easily referenced and grepped for. See, for instance, Note [LoopBreader OccInfo] in compiler/basicTypes/BasicTypes.hs.
I think it would be quite helpful to turn this comment into a Note.
It would be nice if this sentence were a bit more clear on precisely what is being counted here (e.g. the number of objects allocated into the page).
See my comment on m32_flush.
Perhaps we should abort here (see below)?
The purpose of this function could be described a bit better. My understanding is this,
Two spaces should be condensed to one.
Do we really want to continue execution in this case? If I'm not mistaken the only cases where munmap might fail are those where it is passed a region which was never mmap'd. If this happens it seems to me that something has gone seriously wrong and we should likely bail.
It would be easy to break up the argument list across multiple lines to stay under 80 characters.
Usually I don't like raising a fuss over line length but this is rather long and could be easily improved if broken at ||.
- Fix warnings and improve comments
I have fixed the warnings and taken into account comments from @bgamari.
Some remaining questions:
- Should we abort when munmap fails?
memory improvement is so important that you may not have the issue you had in
the first place anymore. If it is not enough, we could combine both approaches:
mmap files without the 32-bit constraint and then m32_alloc and memcpy
only the sections we are interested in (not rounded-up to page size). I tend to
prefer this solution with the allocator to the one with the linear search in
maybeShareSection (D975). It is more generic as we don't really distinguish
sections allocated in different ways (except for the extra symbol hack) as it
is encapsulated in the allocator code and it is very easy to make thread-safe.
- Should we place the M32 allocator in a new source file? I have no strong
opinion about this as I don't know the conventions for GHC (I'm quite new here
- About coding conventions: is there a guideline for C code in the GHC tree?
As noticed by @bgamari, the use of whitespace is inconsistent. Curly brace
positioning is too.
- I have used 8-byte alignment everywhere mmapForLinker has been replaced
with m32_alloc. I don't know if we can relax the alignment constraint for some
of the calls? It may spare a few bytes...
@hsyl20 no, this doesn't fix the issue I'm addressing in D975. The problem is that the object file is large (say 300MB), but only about 80MB of that is actual code or data, the rest is symbols and debug info. If we map the whole object into the 1-2GB region, we quickly run out of space. In D975 I'm re-mapping the code/data from the object into the 1-2GB region.
I agree that maybeShareSection is a bit of a hack - I did this to avoid creating too many separate mappings when loading .a files. It would be fine to replace that with m32_alloc, and it shouldn't be too hard.
I like not having to memcpy all the memory for an object file into the 1-2GB region, we just mmap it - this is also useful because various tools (like perf) understand the mapping and can give symbol names. For .a files I think it's fine to memcpy the bits though.