[WIP] Updated PE linker, section alignment, TLSF2 memory allocator and cleanup.
Needs ReviewPublic

Authored by Phyx on Sep 1 2017, 5:11 PM.

Details

Reviewers
bgamari
erikd
simonmar
hvr
Trac Issues
#13617
Summary

This patch is to address a couple of short comings of the PE linker.

The first thing it does is properly honor section alignments, so SSE code
will work reliably.

While doing this I've also changed how it reads and stores ObjectFile
information. Previously the entire object file was read in and treated
as one blob, including headers, symbol tables etc.

Now the ObjectFile is read in but stored in chunks, tables go into a temporary
info struct and code/data into a new private heap. This allows me to free all
meta data once we're done relocating. Which means we can reclaim this memory.

As I've mentioned above I've also moved from using VirtualAlloc to HeapAlloc.
The reason is VirtualAlloc is meant to be used for more low level memory
allocation, it's very fast because it can only allocate whole blocks,
(64k) by default, and the memory must be paged (4k) aligned.

So when you ask for e.g. 30k of memory, you're given a whole block where 34k
will be wasted memory. Nothing else can ever access that untill you free the 30k.

One downside of HeapAlloc is that you're not in control of how the heap grows,
and heap memory is always committed. So it's harder to tell how much we're
actually using now.

Another big upside of splitting off the ObjectCode tables to info structs
is that I can adjust them, so that later addressings can just use array
subscripts to index into them. This simplifies the code a lot and a lot of
complicated casts and indexing can be removed. Leaving less and more simple
code.

This patch still has one or two failing tests, hence the RFC status.
Also wanted to get some feedback on the approach.

Test Plan

./validate , new test T13617

Diff Detail

Repository
rGHC Glasgow Haskell Compiler
Branch
alignment-diff
Lint
Lint WarningsExcuse: wip
SeverityLocationCodeMessage
Warningrts\linker\PEi386.c:1546TXT3Line Too Long
Warningrts\linker\PEi386.c:1843TXT3Line Too Long
Warningrts\win32\tlsf.c:94ExtJsonLint
Warningrts\win32\tlsf.c:156ExtJsonLint
Warningrts\win32\tlsf.c:278ExtJsonLint
Warningrts\win32\tlsf.c:385ExtJsonLint
Warningrts\win32\tlsf.c:412ExtJsonLint
Warningrts\win32\tlsf.c:469TXT3Line Too Long
Warningrts\win32\tlsf.c:543ExtJsonLint
Warningrts\win32\tlsf.c:567ExtJsonLint
Warningrts\win32\tlsf.c:600ExtJsonLint
Warningrts\win32\tlsf.c:619ExtJsonLint
Warningrts\win32\tlsf.c:630TXT3Line Too Long
Warningrts\win32\tlsf.c:657ExtJsonLint
Warningrts\win32\tlsf.c:680ExtJsonLint
Warningrts\win32\tlsf.c:765ExtJsonLint
Warningrts\win32\tlsf.c:783ExtJsonLint
Warningrts\win32\tlsf.h:1ExtJsonLint
Warningrts\win32\tlsf.h:47ExtJsonLint
Warningrts\win32\tlsf.h:89ExtJsonLint
Warningrts\win32\tlsf.h:116ExtJsonLint
Warningrts\win32\tlsf.h:130ExtJsonLint
Warningrts\win32\tlsf.h:134ExtJsonLint
Warningrts\win32\tlsf_utils.h:1ExtJsonLint
Unit
No Unit Test Coverage
Build Status
Buildable 19243
Build 40240: [GHC] Linux/amd64: Continuous Integration
Build 40239: [GHC] OSX/amd64: Continuous Integration
Build 40238: [GHC] Windows/amd64: Continuous Integration
Build 40237: arc lint + arc unit
Phyx created this revision.Sep 1 2017, 5:11 PM

The approach sounds reasonable. One thing that occurs to me is: how does this interact with the security mechanism that doesn't allow executing code from normally allocated memory? (I forgot what it's called on Windows)

rts/LinkerInternals.h
54

What does SECTION_COPY mean? How is the memory allocated? If the storage doesn't need to be freed, does this need to be distinguished from SECTION_NOMEM?

Phyx added a comment.Sep 5 2017, 12:26 PM

The approach sounds reasonable. One thing that occurs to me is: how does this interact with the security mechanism that doesn't allow executing code from normally allocated memory? (I forgot what it's called on Windows)

You mean DEP? The private heap is created with execute permissions

code_heap = HeapCreate (HEAP_CREATE_ENABLE_EXECUTE,
                        initHeapSizeMB * sSysInfo.dwPageSize , 0);

I was toying with the idea of creating two private heaps, one that's executable and one that's not. and put the sections in the one they belong in, but for now I've left it just as it was before that everything is executable.

rts/LinkerInternals.h
54

with COPY I mean that the memory is going to be moved. initially the OC is read in using fread and just stored on the main program heap. Once we finish parsing everything and know where the sections are, on addCopySection I'll copy the section data from the main heap to the newly created private heap.

this is done because the image data read in with fread is then completely freed leaving behind only the data we need to do relocations, and once relocations are done we free everything else except for the section data that's in the new heap.

The linker if I'm not mistaken treats NOMEM as don't do anything?

simonmar added inline comments.Sep 11 2017, 5:23 AM
rts/LinkerInternals.h
54

SectionAlloc is only used to decide how to release the memory when the object file is unloaded. So I think you're saying that the memory for the section doesn't need to be explicity freed (it's released with the heap?), in which case SECTION_NOMEM would be the right thing. I'm not seeing what extra semantics SECTION_COPY gives you.

Suggestion:

  • Remove SECTION_COPY, use SECTION_NOMEM
  • In places where you're calling addCopySection(..., SECTION_MALLOC,..), call addSection instead (similarly for SECTION_NOMEM)
  • in addCopySection, remove the conditional and always do the copying
Phyx added a comment.Sep 12 2017, 7:56 AM

Ah, fair enough. Will do

Phyx updated this revision to Diff 14078.Sep 24 2017, 11:56 AM
Phyx marked 3 inline comments as done.
  • Rebase
  • Add comments
  • Add test
  • Fix lint errors
Phyx retitled this revision from [RFC] Updated PE linker, section alignment, heap alloc and cleanup. to Updated PE linker, section alignment, heap alloc and cleanup..Sep 24 2017, 12:00 PM
Phyx edited the test plan for this revision. (Show Details)
Phyx updated this revision to Diff 14081.Sep 24 2017, 12:04 PM

cleaning diff

Phyx updated this revision to Diff 14082.Sep 24 2017, 1:18 PM
  • clean up messy rebase.
Phyx updated this revision to Diff 14083.Sep 24 2017, 1:23 PM
  • hopefully clean diff now.

This should fix hmatrix @RyanGlScott, could you give it a try when you get a chance?

In D3915#112389, @Phyx wrote:

This should fix hmatrix @RyanGlScott, could you give it a try when you get a chance?

This patch caused both the hmatrix example (and my minimized version here) to no longer segfault for me! Thanks, @Phyx!

Phyx added a comment.Sep 25 2017, 2:27 PM

@RyanGlScott thanks for testing! I'll make some time for the C++ support next for libllvm.a

I don't have any particular objection to the move to HeapAlloc. Seems reasonable to me.

rts/linker/PEi386.c
2069

Frankly I would probably have chosen more memory and security. IMHO it's quite scary how loose we are with memory protection in the RTS linker and this should be fixed. However, we can leave it for future work, perhaps done at the same time as fixing the ELF linker.

rts/linker/PEi386.h
140

Couldn't this be in the source file?

bgamari requested changes to this revision.Oct 3 2017, 12:22 PM

Whoops, submitted prematurely previously. I'll admit I do wish this patch were split up a bit but if this isn't easily doable then so be it.

However, after reading the Note it's not clear to me whether we can properly protect pages we get from HeapAlloc. If this indeed is the case then I may need to amend my previous statement of no objection to the switch. As I pointed out earlier, I think we really do want to protect our executable pages correctly; this is a common sense security precaution and, given the surface area of the RTS and the fact that we routinely link against C code, is something we really should be doing today. This issue is being tracked as (the perhaps poorly named) Trac #13678.

rts/linker/PEi386.c
158

Note that this may make debugging harder; at least it does when we do so on Linux since debuggers generally look at memory mappings to determine symbol information.

172

Why?

274

Perhaps this could be made const and initialised with sizeof(pe_alignments) / sizeof(Alignments)?

280

What are these? What is an LFH? Are these notionally an enumeration?

1044

Nice.

1225

Is this assignment actually used anywhere?

Frankly, I kind of wish that we declared the index variables in the for loops here; I find it difficult to keep track of which variables are just induction variables and which are actually live outside of the loop.

1234

Oh, I see. Perhaps we could declare j inside this block so at least it's clear where it's used?

1524

Do you mean 6.4.1? Where did this text come from? Is there a ticket somewhere?

1843

Wibble: I would really prefer to see sizeof(symbol)-1 here (and below) instead of 1000-1.

rts/linker/PEi386.h
140

It seems to me like this is something that really could just live next to the alignments table since it's static anyways.

This revision now requires changes to proceed.Oct 3 2017, 12:22 PM
Phyx added a comment.Oct 3 2017, 3:35 PM

@bgamari Thanks for the review! I appreciate that these linker stuff are never fun to review..

Whoops, submitted prematurely previously. I'll admit I do wish this patch were split up a bit but if this isn't easily doable then so be it.

I'll think about if I can do this, but the problem is the majority of the rewrites is because I changed how sections are read so I can align them.
That said I think I can do a better job splitting this out..

However, after reading the Note it's not clear to me whether we can properly protect pages we get from HeapAlloc. If this indeed is the case then I may need to amend my previous statement of no objection to the switch. As I pointed out earlier, I think we really do want to protect our executable pages correctly; this is a common sense security precaution and, given the surface area of the RTS and the fact that we routinely link against C code, is something we really should be doing today. This issue is being tracked as (the perhaps poorly named) Trac #13678.

HeapAlloc only allows Heaps with ReadWrite or ReadWriteExecute permissions. Based on this patch setup it would be easy to add that level of protection. Technically speaking you can still call VirtualProtect on memory returned from the Heap manager, but msdn has an ominous warning that the heap manager expects all pages be at least ReadWrite. Which makes sense as it can add more data to a page later.

So if we want all three levels of granularity Read, ReadWrite and ReadWriteExecute, we have to go back to VirtualAlloc.

rts/linker/PEi386.c
158

hmm, I did not know this. But I'm wondering if that still holds for how we currently allocate memory using VirtualAlloc, you'd never be able to find the start of the COFF header from just an address in the section I think..

172

whoops, sorry this is support to say 8 bytes aligned

280

I'll add a comment, they're constants missing from the mingw headers, LFH = Low Fragmentation Heap and HeapOptimizeResources basically tells it to do some cache optimizations. Which is new since Windows 8.1

1180

should be "to access it using"

1234

fair enough, I'll give it a more descriptive name as well.

1524

That's a good question, it came from https://github.com/ghc/ghc/blob/master/rts/linker/PEi386.c#L1347

I just move it as is, the reason for the move is that I had to restructure the code a bit to only have to do one scan over the number of sections as opposed to the 2 it had to do before.

2069

Fair enough, Initially I wanted to lower the memory consumption since putting each section in it's own page would waste an awful lot of memory. especially for a C library compiled with -ffunction-sections. But with a little bit of elbow grease I think I can get the best of both worlds...

austin resigned from this revision.Nov 9 2017, 5:38 PM
Phyx retitled this revision from Updated PE linker, section alignment, heap alloc and cleanup. to [WIP] Updated PE linker, section alignment, TLSF2 memory allocator and cleanup..Wed, Jan 24, 5:01 PM
Phyx removed a reviewer: austin.
Phyx updated this revision to Diff 15211.Wed, Jan 24, 5:10 PM
Phyx marked 17 inline comments as done.
  • process feedback
  • Mem: added tlsf
  • Merge branch 'master' into alignment-diff
  • partial 1
  • partial 2
  • partial 3
  • partial 4
  • Working alloc
  • Mem: working on fixing implementation.
  • Fixing memory allocation
  • MemAlloc: fixed existing leaks.

First implementation of new memory allocator in place. Memory constraints are not yet being enforced but
allocations from loadObj are being separated into different access pools. Next comes using the new allocator
as allocator for the entire rts. and then the rest of the compiler.

After that will turn on memory protection and see what falls through.