Updated PE linker, section alignment and cleanup.
ClosedPublic

Authored by Phyx on Sep 1 2017, 5:11 PM.

Details

Summary

This patch is to address a couple of short comings of the PE linker.

The first thing it does is properly honor section alignments, so SSE code
will work reliably.

While doing this I've also changed how it reads and stores ObjectFile
information. Previously the entire object file was read in and treated
as one blob, including headers, symbol tables etc.

Now the ObjectFile is read in but stored in chunks, tables go into a temporary
info struct and code/data into a new private heap. This allows me to free all
meta data once we're done relocating. Which means we can reclaim this memory.

As I've mentioned above I've also moved from using VirtualAlloc to HeapAlloc.
The reason is VirtualAlloc is meant to be used for more low level memory
allocation, it's very fast because it can only allocate whole blocks,
(64k) by default, and the memory must be paged (4k) aligned.

So when you ask for e.g. 30k of memory, you're given a whole block where 34k
will be wasted memory. Nothing else can ever access that untill you free the 30k.

One downside of HeapAlloc is that you're not in control of how the heap grows,
and heap memory is always committed. So it's harder to tell how much we're
actually using now.

Another big upside of splitting off the ObjectCode tables to info structs
is that I can adjust them, so that later addressings can just use array
subscripts to index into them. This simplifies the code a lot and a lot of
complicated casts and indexing can be removed. Leaving less and more simple
code.

This patch doesn't fix the memprotection but it doesn't regress it either.
It does however make the next changes smaller and fixes the alignments.

Test Plan

./validate , new test T13617

Diff Detail

Repository
rGHC Glasgow Haskell Compiler
Lint
Automatic diff as part of commit; lint not applicable.
Unit
Automatic diff as part of commit; unit tests not applicable.
There are a very large number of changes, so older changes are hidden. Show Older Changes
simonmar added inline comments.Sep 5 2017, 11:12 AM
rts/LinkerInternals.h
54

What does SECTION_COPY mean? How is the memory allocated? If the storage doesn't need to be freed, does this need to be distinguished from SECTION_NOMEM?

Phyx added a comment.Sep 5 2017, 12:26 PM

The approach sounds reasonable. One thing that occurs to me is: how does this interact with the security mechanism that doesn't allow executing code from normally allocated memory? (I forgot what it's called on Windows)

You mean DEP? The private heap is created with execute permissions

code_heap = HeapCreate (HEAP_CREATE_ENABLE_EXECUTE,
                        initHeapSizeMB * sSysInfo.dwPageSize , 0);

I was toying with the idea of creating two private heaps, one that's executable and one that's not. and put the sections in the one they belong in, but for now I've left it just as it was before that everything is executable.

rts/LinkerInternals.h
54

with COPY I mean that the memory is going to be moved. initially the OC is read in using fread and just stored on the main program heap. Once we finish parsing everything and know where the sections are, on addCopySection I'll copy the section data from the main heap to the newly created private heap.

this is done because the image data read in with fread is then completely freed leaving behind only the data we need to do relocations, and once relocations are done we free everything else except for the section data that's in the new heap.

The linker if I'm not mistaken treats NOMEM as don't do anything?

simonmar added inline comments.Sep 11 2017, 5:23 AM
rts/LinkerInternals.h
54

SectionAlloc is only used to decide how to release the memory when the object file is unloaded. So I think you're saying that the memory for the section doesn't need to be explicity freed (it's released with the heap?), in which case SECTION_NOMEM would be the right thing. I'm not seeing what extra semantics SECTION_COPY gives you.

Suggestion:

  • Remove SECTION_COPY, use SECTION_NOMEM
  • In places where you're calling addCopySection(..., SECTION_MALLOC,..), call addSection instead (similarly for SECTION_NOMEM)
  • in addCopySection, remove the conditional and always do the copying
Phyx added a comment.Sep 12 2017, 7:56 AM

Ah, fair enough. Will do

Phyx updated this revision to Diff 14078.Sep 24 2017, 11:56 AM
Phyx marked 3 inline comments as done.
  • Rebase
  • Add comments
  • Add test
  • Fix lint errors
Phyx retitled this revision from [RFC] Updated PE linker, section alignment, heap alloc and cleanup. to Updated PE linker, section alignment, heap alloc and cleanup..Sep 24 2017, 12:00 PM
Phyx edited the test plan for this revision. (Show Details)
Phyx updated this revision to Diff 14081.Sep 24 2017, 12:04 PM

cleaning diff

Phyx updated this revision to Diff 14082.Sep 24 2017, 1:18 PM
  • clean up messy rebase.
Phyx updated this revision to Diff 14083.Sep 24 2017, 1:23 PM
  • hopefully clean diff now.

This should fix hmatrix @RyanGlScott, could you give it a try when you get a chance?

In D3915#112389, @Phyx wrote:

This should fix hmatrix @RyanGlScott, could you give it a try when you get a chance?

This patch caused both the hmatrix example (and my minimized version here) to no longer segfault for me! Thanks, @Phyx!

Phyx added a comment.Sep 25 2017, 2:27 PM

@RyanGlScott thanks for testing! I'll make some time for the C++ support next for libllvm.a

Harbormaster failed remote builds in B17807: Diff 14081!
Harbormaster failed remote builds in B17808: Diff 14082!

I don't have any particular objection to the move to HeapAlloc. Seems reasonable to me.

rts/linker/PEi386.c
2104

Frankly I would probably have chosen more memory and security. IMHO it's quite scary how loose we are with memory protection in the RTS linker and this should be fixed. However, we can leave it for future work, perhaps done at the same time as fixing the ELF linker.

rts/linker/PEi386.h
139

Couldn't this be in the source file?

bgamari requested changes to this revision.Oct 3 2017, 12:22 PM

Whoops, submitted prematurely previously. I'll admit I do wish this patch were split up a bit but if this isn't easily doable then so be it.

However, after reading the Note it's not clear to me whether we can properly protect pages we get from HeapAlloc. If this indeed is the case then I may need to amend my previous statement of no objection to the switch. As I pointed out earlier, I think we really do want to protect our executable pages correctly; this is a common sense security precaution and, given the surface area of the RTS and the fact that we routinely link against C code, is something we really should be doing today. This issue is being tracked as (the perhaps poorly named) Trac #13678.

rts/linker/PEi386.c
158

Note that this may make debugging harder; at least it does when we do so on Linux since debuggers generally look at memory mappings to determine symbol information.

172

Why?

273

Perhaps this could be made const and initialised with sizeof(pe_alignments) / sizeof(Alignments)?

279

What are these? What is an LFH? Are these notionally an enumeration?

1067

Nice.

1252

Is this assignment actually used anywhere?

Frankly, I kind of wish that we declared the index variables in the for loops here; I find it difficult to keep track of which variables are just induction variables and which are actually live outside of the loop.

1261

Oh, I see. Perhaps we could declare j inside this block so at least it's clear where it's used?

1556

Do you mean 6.4.1? Where did this text come from? Is there a ticket somewhere?

1876–1877

Wibble: I would really prefer to see sizeof(symbol)-1 here (and below) instead of 1000-1.

rts/linker/PEi386.h
139

It seems to me like this is something that really could just live next to the alignments table since it's static anyways.

This revision now requires changes to proceed.Oct 3 2017, 12:22 PM
Phyx added a comment.Oct 3 2017, 3:35 PM

@bgamari Thanks for the review! I appreciate that these linker stuff are never fun to review..

Whoops, submitted prematurely previously. I'll admit I do wish this patch were split up a bit but if this isn't easily doable then so be it.

I'll think about if I can do this, but the problem is the majority of the rewrites is because I changed how sections are read so I can align them.
That said I think I can do a better job splitting this out..

However, after reading the Note it's not clear to me whether we can properly protect pages we get from HeapAlloc. If this indeed is the case then I may need to amend my previous statement of no objection to the switch. As I pointed out earlier, I think we really do want to protect our executable pages correctly; this is a common sense security precaution and, given the surface area of the RTS and the fact that we routinely link against C code, is something we really should be doing today. This issue is being tracked as (the perhaps poorly named) Trac #13678.

HeapAlloc only allows Heaps with ReadWrite or ReadWriteExecute permissions. Based on this patch setup it would be easy to add that level of protection. Technically speaking you can still call VirtualProtect on memory returned from the Heap manager, but msdn has an ominous warning that the heap manager expects all pages be at least ReadWrite. Which makes sense as it can add more data to a page later.

So if we want all three levels of granularity Read, ReadWrite and ReadWriteExecute, we have to go back to VirtualAlloc.

rts/linker/PEi386.c
158

hmm, I did not know this. But I'm wondering if that still holds for how we currently allocate memory using VirtualAlloc, you'd never be able to find the start of the COFF header from just an address in the section I think..

172

whoops, sorry this is support to say 8 bytes aligned

279

I'll add a comment, they're constants missing from the mingw headers, LFH = Low Fragmentation Heap and HeapOptimizeResources basically tells it to do some cache optimizations. Which is new since Windows 8.1

1203

should be "to access it using"

1261

fair enough, I'll give it a more descriptive name as well.

1556

That's a good question, it came from https://github.com/ghc/ghc/blob/master/rts/linker/PEi386.c#L1347

I just move it as is, the reason for the move is that I had to restructure the code a bit to only have to do one scan over the number of sections as opposed to the 2 it had to do before.

2104

Fair enough, Initially I wanted to lower the memory consumption since putting each section in it's own page would waste an awful lot of memory. especially for a C library compiled with -ffunction-sections. But with a little bit of elbow grease I think I can get the best of both worlds...

austin resigned from this revision.Nov 9 2017, 5:38 PM
Phyx retitled this revision from Updated PE linker, section alignment, heap alloc and cleanup. to [WIP] Updated PE linker, section alignment, TLSF2 memory allocator and cleanup..Jan 24 2018, 5:01 PM
Phyx removed a reviewer: austin.
Phyx updated this revision to Diff 15211.Jan 24 2018, 5:10 PM
Phyx marked 17 inline comments as done.
  • process feedback
  • Mem: added tlsf
  • Merge branch 'master' into alignment-diff
  • partial 1
  • partial 2
  • partial 3
  • partial 4
  • Working alloc
  • Mem: working on fixing implementation.
  • Fixing memory allocation
  • MemAlloc: fixed existing leaks.

First implementation of new memory allocator in place. Memory constraints are not yet being enforced but
allocations from loadObj are being separated into different access pools. Next comes using the new allocator
as allocator for the entire rts. and then the rest of the compiler.

After that will turn on memory protection and see what falls through.

Phyx updated this revision to Diff 17647.Aug 12 2018, 9:46 AM
Phyx marked 2 inline comments as done.
  • adress comments and rebase
Phyx retitled this revision from [WIP] Updated PE linker, section alignment, TLSF2 memory allocator and cleanup. to Updated PE linker, section alignment and cleanup. (round 1).Aug 12 2018, 9:51 AM
Phyx edited the summary of this revision. (Show Details)
Phyx added a comment.Aug 12 2018, 9:54 AM

@bgamari I have been spending the majority of my time on the I/O library. So I have reverted this patch to an earlier state that should be finished and want to get this one in for the next GHC for sure.

I'll then work on the mem protect for a follow up patch. But need some changes to loadLibrary, which will come after I get the I/O manager in a stable shape.

Harbormaster seems broken, I'd like to see it all green and test ok before committing, but review welcome.

rts/linker/PEi386.c
2104

Ok, so so far I have been working on heaving my cake and eating it too. I have had various methods that work.. but I need to make loadArchive lazier which is taking time and the complexity is getting to large. I would like to accept this part first as is as it doesn't regress the security/memprotection compared to the current implementation. and then submit a second request to fix the memprotect.

This will also give me some breathing room as the I/O manager has been taking the majority of my time but this shouldn't block the usage of more SIMD in the compiler then.

Phyx added a comment.Aug 22 2018, 1:45 PM

The windows build is stuck restarting. can an administrator unstick it please?

Phyx updated this revision to Diff 17753.Aug 22 2018, 4:05 PM
  • fix trampoline calculation.
Phyx updated this revision to Diff 17755.Aug 22 2018, 5:12 PM
  • fix use after free for import libraries
Phyx added a comment.EditedAug 22 2018, 5:14 PM

GHC seems to have become so fat that the extreme amount of memory we're wasting with the usage of VirtualAlloc means we quickly eat up the lower 4GB of virtual memory. This means the trampolines are allocated outside the 4GB range and are thus useless.

I can't seem to run GHCi HEAD at all anymore without this patch which fixes the wasting.


EDIT: nvm, seems this was caused by something in GHC triggering the heap verification that I had enabled on ghc-stage2.exe to find memory issues. After the rebase it went crazy. Probably still something wrong but memory usage seems to be down to normal 56mb with verification turned off. Will need to look into this at some point.

Phyx added a comment.Sat, Sep 8, 10:58 AM

Ping, this version is ready for review..

nickkuk added a subscriber: nickkuk.Sun, Sep 9, 2:44 AM
nickkuk added inline comments.
rts/LinkerInternals.h
41

Do these comments are in improper order?

Phyx marked an inline comment as not done.Tue, Sep 11, 12:45 PM
Phyx added inline comments.
rts/LinkerInternals.h
41

yup, I'll correct it soon. Thanks.

Phyx updated this revision to Diff 18008.Sat, Sep 15, 8:48 AM
Phyx marked 3 inline comments as done.
  • fix comments and made some fields const
Phyx retitled this revision from Updated PE linker, section alignment and cleanup. (round 1) to Updated PE linker, section alignment and cleanup..Sat, Sep 15, 8:49 AM
Phyx added a reviewer: angerman.

@angerman Mind giving this a review for me? thanks!

angerman accepted this revision.Sun, Sep 16, 9:00 PM

Two minor issues, I'd appreciate to see addressed otherwise LGTM.

rts/Linker.c
1500–1533

This is a mess. It used to be one before as well.

Can we add a comment here that clarifies the intention?

This change would result in calling ocAllocateSymbolExtras after the ocGetNames; are we sure that this is identical to calling ocAllocateSymbolExtras befor calling ocGetNames?

rts/linker/LoadArchive.c
493

didn't we drop ios_HOST_OS in favour of darwin_HOST_OS? Is it still used anywhere else?

Phyx added inline comments.Mon, Sep 17, 2:38 AM
rts/Linker.c
1500–1533

I'll add a comment, but yes it's perfectly safe. ocAllocateSymbolsExtras has only two pre-requisites, it must run after preloadObjectFile and ocVerify. Neither have changed. On most targets allocating the extras is independent on parsing the section data, so the order between these two never mattered.

On Windows, when we have an import library we (for now, as we don't honor the lazy loading semantics of the library and instead GHCi is already lazy) don't use the library after ocGetNames as it just populates the symbol table. Allocating space for jump tables in ocAllocateSymbolExtras would just be a waste then.

rts/linker/LoadArchive.c
493

I don't know, I only just removed the special casing for Windows, https://github.com/ghc/ghc/blob/master/rts/linker/LoadArchive.c#L498 seems like it's still there in head.

Phyx updated this revision to Diff 18017.Mon, Sep 17, 4:21 PM
Phyx marked 4 inline comments as done.
  • rebase and add comment to loadOc
This revision was not accepted when it landed; it landed in state Needs Review.Mon, Sep 17, 5:11 PM
This revision was automatically updated to reflect the committed changes.