Big-obj support for the Windows runtime linker
Closed (Public)

Authored by Phyx on May 2 2017, 1:45 AM.

Details

Summary

The regular object file format on Windows limits an object file to 2^16
sections.

The big-obj format raises this to 2^32 sections.
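For illustration, a minimal sketch (not the patch's code) of where each header keeps its section count. Offsets follow the IMAGE_FILE_HEADER and ANON_OBJECT_HEADER_BIGOBJ layouts from the Windows SDK's winnt.h; the function names are hypothetical, and a real loader would also compare the 16-byte ClassID stored at offset 12.

```c
#include <stddef.h>
#include <stdint.h>

/* COFF is little-endian; decode byte by byte so host endianness is irrelevant. */
static uint16_t read_u16le(const uint8_t *p)
{
    return (uint16_t)(p[0] | (p[1] << 8));
}

static uint32_t read_u32le(const uint8_t *p)
{
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

/* Regular header (IMAGE_FILE_HEADER, 20 bytes): NumberOfSections is a
 * 16-bit field at offset 2, hence the 2^16 limit. */
static uint32_t regular_section_count(const uint8_t *hdr)
{
    return read_u16le(hdr + 2);
}

/* Big-obj header (ANON_OBJECT_HEADER_BIGOBJ, 56 bytes): NumberOfSections is
 * a 32-bit field at offset 44, after the ClassID GUID and metadata fields. */
static uint32_t bigobj_section_count(const uint8_t *hdr)
{
    return read_u32le(hdr + 44);
}

/* Hypothetical quick check: Sig1 == 0x0000, Sig2 == 0xFFFF and Version >= 2.
 * Short import-library members share Sig1/Sig2 but use Version 0. */
static int looks_like_bigobj(const uint8_t *hdr, size_t len)
{
    return len >= 56 && read_u16le(hdr) == 0x0000 &&
           read_u16le(hdr + 2) == 0xFFFF && read_u16le(hdr + 4) >= 2;
}
```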

The implementation is made difficult because we now need to support
two header formats and two symbol-table entry formats. The two symbol
formats differ only in the size of a single field, the section number,
and that field sits in the middle of the struct. Since these structs are
used to map regions of memory directly, we need to know which struct we
have whenever we do the mapping or any pointer arithmetic.
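To show why the entry size matters, here is a hedged sketch with packed mirrors of winnt.h's IMAGE_SYMBOL and IMAGE_SYMBOL_EX. The struct and function names are made up for this example, and #pragma pack is assumed to be available (GCC, Clang and MSVC accept it). The compile-time checks confirm that widening SectionNumber shifts every later field by two bytes and changes the stride of the symbol table.

```c
#include <stddef.h>
#include <stdint.h>

/* Packed mirrors of IMAGE_SYMBOL and IMAGE_SYMBOL_EX (hypothetical names). */
#pragma pack(push, 1)
typedef struct {
    uint8_t  Name[8];
    uint32_t Value;
    int16_t  SectionNumber;      /* 16-bit in a regular object file */
    uint16_t Type;
    uint8_t  StorageClass;
    uint8_t  NumberOfAuxSymbols;
} coff_symbol;

typedef struct {
    uint8_t  Name[8];
    uint32_t Value;
    int32_t  SectionNumber;      /* 32-bit in a big-obj file */
    uint16_t Type;
    uint8_t  StorageClass;
    uint8_t  NumberOfAuxSymbols;
} coff_symbol_ex;
#pragma pack(pop)

_Static_assert(sizeof(coff_symbol) == 18, "regular symbol entry is 18 bytes");
_Static_assert(sizeof(coff_symbol_ex) == 20, "big-obj symbol entry is 20 bytes");
/* Everything after SectionNumber moves by two bytes. */
_Static_assert(offsetof(coff_symbol_ex, Type) == offsetof(coff_symbol, Type) + 2,
               "fields after SectionNumber are shifted");

/* Address of entry i: the stride depends on the format, so the caller has to
 * know which layout the table uses before doing any pointer arithmetic. */
static const void *symbol_at(const void *table, size_t i, int is_bigobj)
{
    size_t stride = is_bigobj ? sizeof(coff_symbol_ex) : sizeof(coff_symbol);
    return (const uint8_t *)table + i * stride;
}
```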

This is the last object-code format that Windows compilers can generate
which GHCi does not yet support. All other major compilers on the platform
can produce it, and the major linkers (bfd and lld) can consume it.

See http://tinyurl.com/bigobj

This patch abstracts field retrieval into functions that each take
a struct describing which object format is currently being parsed.
These functions are always inlined: they are small, but copy-pasting
their bodies everywhere would look messy.
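The accessor approach can be sketched as below, assuming a small descriptor struct that is passed to every helper. The names (coff_format, sym_entry, sym_section_number, sym_storage_class) are hypothetical and not taken from PEi386.c; fields are decoded byte by byte because COFF is little-endian and the entries are not naturally aligned.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical descriptor of the object format being parsed. */
typedef enum { COFF_REGULAR, COFF_BIGOBJ } coff_kind;

typedef struct {
    coff_kind kind;
    uint16_t  header_size; /* bytes in the file header: 20 or 56 */
    uint16_t  symbol_size; /* bytes per symbol-table entry: 18 or 20 */
} coff_format;

static const coff_format FORMAT_REGULAR = { COFF_REGULAR, 20, 18 };
static const coff_format FORMAT_BIGOBJ  = { COFF_BIGOBJ, 56, 20 };

/* Entry i of the symbol table; the stride comes from the descriptor. */
static inline const uint8_t *
sym_entry(const coff_format *fmt, const uint8_t *table, size_t i)
{
    return table + i * fmt->symbol_size;
}

/* SectionNumber starts at offset 12 (after Name[8] and Value) in both
 * layouts, but is 16 bits wide in regular COFF and 32 bits in big-obj. */
static inline int32_t
sym_section_number(const coff_format *fmt, const uint8_t *sym)
{
    const uint8_t *p = sym + 12;
    if (fmt->kind == COFF_BIGOBJ)
        return (int32_t)((uint32_t)p[0] | ((uint32_t)p[1] << 8) |
                         ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24));
    return (int16_t)(uint16_t)(p[0] | (p[1] << 8));
}

/* Fields after SectionNumber are shifted by the 2-byte width difference. */
static inline uint8_t
sym_storage_class(const coff_format *fmt, const uint8_t *sym)
{
    return sym[fmt->kind == COFF_BIGOBJ ? 18 : 16];
}
```

In this design the descriptor is chosen once, after inspecting the file header, and `static inline` keeps the call sites short without a function-call cost in the common case.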

Test Plan

./validate and the new test big-obj

Tamar@Rage MINGW64 /r
$ gcc -c -Wa,-mbig-obj foo.c -o foo.o

Tamar@Rage MINGW64 /r
$ objdump -h foo.o

foo.o:     file format pe-bigobj-x86-64

Sections:
Idx Name          Size      VMA               LMA               File off  Algn
  0 .text         00000010  0000000000000000  0000000000000000  00000128  2**4
                  CONTENTS, ALLOC, LOAD, READONLY, CODE
  1 .data         00000000  0000000000000000  0000000000000000  00000000  2**4
                  ALLOC, LOAD, DATA
  2 .bss          00000000  0000000000000000  0000000000000000  00000000  2**4
                  ALLOC
  3 .xdata        00000008  0000000000000000  0000000000000000  00000138  2**2
                  CONTENTS, ALLOC, LOAD, READONLY, DATA
  4 .pdata        0000000c  0000000000000000  0000000000000000  00000140  2**2
                  CONTENTS, ALLOC, LOAD, RELOC, READONLY, DATA
  5 .rdata$zzz    00000030  0000000000000000  0000000000000000  0000014c  2**4
                  CONTENTS, ALLOC, LOAD, READONLY, DATA

Tamar@Rage MINGW64 /r
$ echo main | ~/ghc/inplace/bin/ghc-stage2.exe --interactive bar.hs foo.o
GHCi, version 8.3.20170430: http://www.haskell.org/ghc/  :? for help
[1 of 1] Compiling Main             ( bar.hs, interpreted )
Ok, modules loaded: Main.
*Main> 17
*Main> Leaving GHCi.

Diff Detail

Repository
rGHC Glasgow Haskell Compiler
Lint
Automatic diff as part of commit; lint not applicable.
Unit
Automatic diff as part of commit; unit tests not applicable.
Phyx created this revision. May 2 2017, 1:45 AM
awson added a subscriber: awson. May 2 2017, 2:21 AM

With the correct linker used, prelinked GHCi object files *should* have all "sub"-sections merged into their "parent" sections, and we don't need anything from this.

OTOH, IIRC, we still support handling of archives of "normal" object files, and this is the only use-case when we need bigobj handling in the runtime linker.

Phyx added a comment. May 2 2017, 2:25 AM
In D3523#100280, @awson wrote:

> With the correct linker used, prelinked GHCi object files *should* have all "sub"-sections merged into their "parent" sections, and we don't need anything from this.
>
> OTOH, IIRC, we still support handling of archives of "normal" object files, and this is the only use-case when we need bigobj handling in the runtime linker.

Yes, you said this before, and I don't see how. Merging sections has no bearing here, as it doesn't change the object format. If I pass -mbig-obj to the assembler,
the resulting object file uses the big-obj format regardless of whether the sections are merged. So I don't quite understand what you mean by "correct linker".

Unless you're saying linker scripts can also be used to change object formats.

awson added a comment. Edited May 2 2017, 2:26 AM

But I think this use-case can happen pretty rarely, since when we create a library with GHC, we can *also* create a prelinked object file for GHCi, which should *not* have this problem.

Thus we should (perhaps) only handle the possibility of such a library (containing bigobj modules) coming from another vendor.

awson added a comment. May 2 2017, 2:28 AM
In D3523#100282, @Phyx wrote:

> Yes, you said this before, and I don't see how. merging sections has no meaning as it doesn't change the object format. If I pass mbig-obj to the assembler, regardless of
> the sections are merged or not the resulting object file uses the big-obj format. So I don't quite understand what you mean with "correct linker".
>
> Unless you're saying linker scripts can also be used to change object formats.

Prelinked object files for GHCi are produced by the ld linker, not by the assembler.

awson added a comment. May 2 2017, 2:41 AM

Btw, I use GHC built with --split-sections, all my packages are built with --split-sections, and I use -mbig-obj on a couple of packages (at least haskell-src-exts and GHC's template-haskell require it).

And all this has worked for me for 1.5 years, and I have never needed big-obj support in the runtime linker.

Phyx added a comment. May 2 2017, 2:43 AM

> > Unless you're saying linker scripts can also be used to change object formats.
>
> Prelinked object files for GHCi are produced by ld linker, not an assembler.

Fair enough. Initially, I added -mbig-obj to every invocation of gcc from the driver, which means
every object file coming out of GHC will be in the big-obj format. Perhaps this isn't needed.

But I did this because with split-sections we get 3 sections for every symbol in an .hs file: the function
and two closures. That means a file with ~21,845 exports (2^16 / 3) would hit the limit. I'm pretty sure modules like DynFlags
come pretty close.
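The estimate above, written out as a tiny sketch: with 3 sections per symbol, a 2^16-section limit allows 65536 / 3 = 21,845 exported symbols, while big-obj's 2^32 limit allows roughly 1.43 billion. It ignores the fixed sections every object file carries anyway, so the real threshold is a little lower. The helper name is hypothetical.

```c
#include <stdint.h>

/* With split-sections: the function plus two closures per symbol. */
enum { SECTIONS_PER_SYMBOL = 3 };

/* Hypothetical helper: how many symbols fit if every symbol costs
 * `sections_per_symbol` sections and the format allows `max_sections`. */
static uint64_t max_symbols(uint64_t max_sections, uint64_t sections_per_symbol)
{
    return max_sections / sections_per_symbol;
}
```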

Phyx abandoned this revision. May 2 2017, 2:45 AM
In D3523#100286, @awson wrote:

> Btw, I use GHC built with --split-sections and all my packages are built with --split-sections, and I use mbig-obj on a couple of packages (at least haskell-src-exts and GHC's template-haskell require this).
>
> And all this works for me for 1.5 years and I never needed big-obj support in the runtime linker.

OK,

awson added a comment. May 2 2017, 4:13 AM
In D3523#100287, @Phyx wrote:

> Fair enough, initially, I added -mbig-obj to every invocation of gcc from the driver. Which means
> any object files coming out of GHC will be in the big-obj format. Perhaps this isn't needed.

GHC itself (aka the ghc package) was never built with --split-objs and should not be built with --split-sections either, so we definitely don't need -mbig-obj, at least when building GHC-the-compiler.

As I've already mentioned, in the current GHC only the template-haskell package needs it.

Phyx reclaimed this revision. May 2 2017, 7:43 AM
In D3523#100283, @awson wrote:

> But I think this use-case can happen pretty rarely, since when we create a library with GHC, we can *also* create a prelinked object file for GHCi, which should *not* have this problem.
>
> Thus we should (perhaps) only handle a possibility of such a library (containing bigobj modules) going from another vendor.

Or just a user specifying -mbig-obj themselves. Anyway, I didn't see this this morning, so perhaps I was too quick to discard this patch, as it may be useful. I'll reopen the review and keep the patch as a standalone new feature.

Phyx edited the summary of this revision. May 2 2017, 7:43 AM
Phyx updated the Trac tickets for this revision.
bgamari requested changes to this revision. May 4 2017, 2:38 PM

Thanks @Phyx, but I'm afraid I'm a tad lost here. Do we need this? If so, why? If perhaps, under what circumstances?

I do like that some comments were added where there were previously none, but none of these comments seem to discuss the principal point of this patch. If this really is needed, can we have a note briefly explaining what -mbig-obj is, when it's needed, and how the linker implements support for it?

This revision now requires changes to proceed. May 4 2017, 2:38 PM
Phyx added a comment. May 5 2017, 12:49 PM

@bgamari Initially I thought it was needed for split-sections but as @awson correctly pointed out, it's not.

So I left the patch as a standalone new feature. Basically, it adds support for the third object-code format on Windows.
This would be required if a user wants to load code compiled with -mbig-obj into GHCi, which would only be needed
if their object file has more than 2^16 sections. It's worth noting that all major compilers and linkers on Windows support this format.

I do think it makes the code a bit harder to follow, so I initially discarded the patch because of this and the fact it wasn't strictly needed.

Phyx edited the summary of this revision. Jun 11 2017, 5:55 AM
Phyx edited the summary of this revision. Jun 11 2017, 5:57 AM
Phyx requested review of this revision. Jun 11 2017, 6:01 AM
Phyx edited edge metadata.

@bgamari I've amended the summary and created a standalone feature request.

I would like this to be considered on its own as a new feature.

This looks great. However, can I ask you for a short note listing the various image formats that we support and when we expect to find them (perhaps with references to tickets as appropriate)?

rts/linker/PEi386.c
308

I'm a bit confused by this and the above comment. It looks like in the code the first two components are given in little-endian form, whereas the latter two are given in big-endian. Perhaps the comment is wrong?

822

Nit: Shouldn't this be checkAndLoadImportLibrary?

Phyx added inline comments. Jun 17 2017, 12:36 PM
rts/linker/PEi386.c
308

This is a quirk of how GUIDs are stored.

The struct for GUIDs is:

typedef struct _GUID {
  DWORD Data1;
  WORD  Data2;
  WORD  Data3;
  BYTE  Data4[8];
} GUID;

So the first 32-bit value and the next two 16-bit values are affected by endianness, but the rest are just eight 1-byte values, which are not. So the comment and code are correct :))
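The quirk is easiest to see by serialising a GUID. In this sketch (hypothetical names, with portable fixed-width types standing in for DWORD/WORD/BYTE), Data1, Data2 and Data3 are written little-endian while Data4 is copied byte for byte. The constant is the big-obj ClassID {D1BAA1C7-BAEE-4BA9-AF20-FAF66AA4DCB8} as I understand it from LLVM's COFF support; treat it as an assumption and check it against the patch.

```c
#include <stdint.h>
#include <string.h>

/* Portable mirror of the Windows GUID struct quoted above. */
typedef struct {
    uint32_t Data1;
    uint16_t Data2;
    uint16_t Data3;
    uint8_t  Data4[8];
} guid;

/* Serialise a GUID the way it is laid out on disk: the three integer
 * fields in little-endian byte order, Data4 copied unchanged. */
static void guid_to_bytes(const guid *g, uint8_t out[16])
{
    for (int i = 0; i < 4; i++)
        out[i] = (uint8_t)(g->Data1 >> (8 * i));
    out[4] = (uint8_t)g->Data2;
    out[5] = (uint8_t)(g->Data2 >> 8);
    out[6] = (uint8_t)g->Data3;
    out[7] = (uint8_t)(g->Data3 >> 8);
    memcpy(out + 8, g->Data4, 8);
}

/* Assumed big-obj ClassID {D1BAA1C7-BAEE-4BA9-AF20-FAF66AA4DCB8}. */
static const guid BIGOBJ_CLASS_ID = {
    0xD1BAA1C7, 0xBAEE, 0x4BA9,
    { 0xAF, 0x20, 0xFA, 0xF6, 0x6A, 0xA4, 0xDC, 0xB8 }
};
```

So Data1 = 0xD1BAA1C7 appears on disk as C7 A1 BA D1, whereas Data4 keeps its written order (AF 20 FA ...).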

bgamari added inline comments. Jun 17 2017, 12:42 PM
rts/linker/PEi386.c
308

Wow. Just... wow.

bgamari accepted this revision. Jun 21 2017, 2:33 PM

Except for the error message wobble, which I can fix on merge, looks good to me.

This revision is now accepted and ready to land. Jun 21 2017, 2:33 PM
Phyx added a comment. Jun 21 2017, 2:35 PM

> Except for the error message wobble which I can fix on merge, looks good to me.

The rebase will be a bit messy as it conflicts with my other import library change. I was planning on doing it this weekend.

rts/linker/PEi386.c
822

Sorry, I'm still planning on correcting this :) But I was busy getting my I/O manager into a workable state first, and this patch also needs a rebase against my other linker changes :)

bgamari requested changes to this revision. Jun 23 2017, 10:48 AM

Alright, bumping out of the review queue for now.

This revision now requires changes to proceed. Jun 23 2017, 10:48 AM
Phyx updated this revision to Diff 12923. Jun 25 2017, 4:14 PM
Phyx edited edge metadata.
  • rebase
Phyx planned changes to this revision. Jun 25 2017, 5:43 PM

rebase isn't clean.

Phyx updated this revision to Diff 13046. Jul 7 2017, 1:07 PM
  • Finish rebase.
Phyx planned changes to this revision. Jul 7 2017, 4:55 PM

Some very suspicious segfaults in the result need a look.

Phyx updated this revision to Diff 13060. Jul 8 2017, 4:05 AM
  • Finish rebase.
  • fix segfault and add note
Phyx edited the summary of this revision. Jul 8 2017, 10:57 AM
Phyx updated the Trac tickets for this revision.
Phyx updated the Trac tickets for this revision.
This revision was automatically updated to reflect the committed changes.