RTS linker refactoring
AbandonedPublic

Authored by bgamari on Nov 12 2015, 7:23 PM.

Details

Reviewers
simonmar
hsyl20
erikd
austin
Trac Issues
#9314
Summary

This diff started as a simple fix for a linker issue (Trac #9314) and ended up with this quite big refactoring to make it easier to ensure that the fix was correct in every case.

To fix the memory consumption issue in the RTS linker (Trac #9314), we fill the mmap'ed pages as much as possible instead of allocating one page per object code section. We obtain huge memory gains (153MB -> 86KB in the test case given in trac Trac #9314). [Startup performance may be better too because of the reduced number of mmap calls (~20k -> 470)]

During the refactoring, I figured out that section alignment constraints were not always correctly handled (iirc, the alignment had to be correct in the object code file in the first place, which is not always the case). Now it should always be correctly handled when we load a section.

Test Plan

We should test on Windows, OS X, PowerPC and ARM in addition to Linux/x86-64.

Diff Detail

Repository
rGHC Glasgow Haskell Compiler
Branch
D1470
Lint
Lint WarningsExcuse: Never fear, Quinn the Eskimo is here!
SeverityLocationCodeMessage
Warningrts/Linker.c:385TXT3Line Too Long
Warningrts/Linker.c:393TXT3Line Too Long
Warningrts/Linker.c:395TXT3Line Too Long
Warningrts/Linker.c:397TXT3Line Too Long
Unit
No Unit Test Coverage
Build Status
Buildable 7849
Build 9649: GHC Patch Validation (amd64/Linux)
Build 9648: arc lint + arc unit
There are a very large number of changes, so older changes are hidden. Show Older Changes
bgamari edited edge metadata.Nov 19 2015, 7:10 AM

A couple of comments.

rts/Linker.c
2585

Is there any need to set this here? Perhaps it would be safer to leave it uninitialized and ensure that each of the codepaths sets it, ensuring that the compiler can warn us if we've missed a codepath.

2916

Why not rtsBool?

hsyl20 updated this revision to Diff 5215.Nov 19 2015, 8:02 AM
hsyl20 edited edge metadata.

Fix for @bgamari's comments

hsyl20 planned changes to this revision.Nov 19 2015, 9:24 AM
erikd added a comment.EditedNov 19 2015, 1:43 PM

I tested the version from about 12 hours ago on powerpc/linux and there were no long any RTS segfaults or invalid free operations.

hsyl20 updated this revision to Diff 5242.Nov 21 2015, 2:42 AM
hsyl20 edited edge metadata.

Big refactoring: should be simpler to maintain

  • Use mmap for archive members too: reduce memory from 86kB to 72kB in Trac #9314 compared to the previous versions of this patch on Linux/x86-64
  • Support direct section mapping from object files in archives for large sections if alignment constraints are fulfilled
  • Correct handling of section alignment constraint: most sections end up not directly mappable from file (at least in the example in Trac #9314)
  • New static options
  • Documentation

I have tested on Linux/x86-64:

  • MAP_IMAGES set/unset
  • FORCE_CONTIGUOUS_SECTIONS set/unset
  • USE_MMAP set
  • USE_MMAP unset (with and without the X86_64_ELF_NONPIC_HACK): fail at runtime, but it's expected

@erikd: it would be good if you could test it on powerpc/linux.

We should test it on Windows, Darwin and maybe other Unices too. And also on 32-bit architectures if possible.

@simonmar: it should still work as before for large object files, except that now the linker only maps sections from files if they are correctly aligned (there may only be a few of them). In addition, we release the image of the original file just after having performed the relocations (leaving only the sections of interest and the symbol names in memory), except for PE files: maybe it will reduce the memory consumption in your case?

hsyl20 updated this revision to Diff 5243.Nov 21 2015, 3:19 AM
hsyl20 edited edge metadata.
  • Minor changes
erikd requested changes to this revision.Nov 21 2015, 4:50 AM
erikd edited edge metadata.

Fails to compile on PowerPC:

rts/Linker.c: In function 'ocFlushInstructionCache':

rts/Linker.c:2920:47: error:
     error: 'ObjectCode {aka struct _ObjectCode}' has no member named 'misalignment'
         ocFlushInstructionCacheFrom(oc->image + oc->misalignment, oc->fileSize);
                                                   ^

rts/Linker.c:2920:65: error:
     error: 'ObjectCode {aka struct _ObjectCode}' has no member named 'fileSize'
         ocFlushInstructionCacheFrom(oc->image + oc->misalignment, oc->fileSize);
                                                                     ^
rts/Linker.c: At top level:

rts/Linker.c:1594:1: error:
     warning: 'm32_alloc' defined but not used [-Wunused-function]
     m32_alloc(m32_allocator m32, unsigned int size,
     ^
`gcc' failed in phase `C Compiler'. (Exit code: 1)
This revision now requires changes to proceed.Nov 21 2015, 4:50 AM
hsyl20 updated this revision to Diff 5252.Nov 21 2015, 12:16 PM
hsyl20 edited edge metadata.
  • Fix ocFlushInstructionCache

Thanks @erikd. Should be fixed I hope.

I think I have somehow reintroduced Trac #8935. For instance, with T10508_api if I disable X86_64_ELF_NONPIC_HACK, 'puts' symbol is not in the 32-bit range, hence not relocated; if I enable it, the link for 'argv' (returned by getArgs) is not correct because of the hack.

erikd requested changes to this revision.Nov 21 2015, 2:16 PM
erikd edited edge metadata.
rts/Linker.c: In function 'ocFlushInstructionCache':

rts/Linker.c:2938:9: error:
     error: expected ';' before '}' token
             }
             ^
rts/Linker.c: At top level:

rts/Linker.c:1354:15: error:
     warning: 'mmapForLinker' defined but not used [-Wunused-function]
     static void * mmapForLinker (size_t bytes, nat flags, int fd, int offset)
                   ^
`gcc' failed in phase `C Compiler'. (Exit code: 1)

Basically just a missing semicolon. This patch:

diff --git a/rts/Linker.c b/rts/Linker.c
index 6fb87e2..7c0ea27 100644
--- a/rts/Linker.c
+++ b/rts/Linker.c
@@ -2934,7 +2934,7 @@ ocFlushInstructionCache( ObjectCode *oc )
     for (i=0; i<oc->n_sections; i++) {
         if (oc->sections[i].start != NULL) {
            ocFlushInstructionCacheFrom(oc->sections[i].start,
-                                       oc->sections[i].size)
+                                       oc->sections[i].size);
         }
     }

fixes the compile problem but I still haven't run the test suite.

This revision now requires changes to proceed.Nov 21 2015, 2:16 PM
hsyl20 updated this revision to Diff 5253.Nov 21 2015, 2:37 PM
hsyl20 edited edge metadata.
  • Minor fix for powerpc
erikd requested changes to this revision.Nov 21 2015, 4:36 PM
erikd edited edge metadata.

Sorry, something very, very wrong with this code on PowerPC. There are 500+ test failures. All the GHCi and all TH tests fail with what seems to be a segmentation fault.

Let me investigate.

This revision now requires changes to proceed.Nov 21 2015, 4:36 PM
erikd added a comment.Nov 21 2015, 4:41 PM

It segfaults with just inplace/bin/ghc-stage2 --interactive so I ran it under GDB and got:

Program received signal SIGSEGV, Segmentation fault.
0xf517ecd8 in ?? ()
(gdb) bt
#0  0xf517ecd8 in ?? ()
#1  0x1249f90c in ocRunInit_ELF (oc=0x12b0a538, oc=0x12b0a538) at rts/Linker.c:5924
#2  resolveObjs_ () at rts/Linker.c:2584
#3  resolveObjs () at rts/Linker.c:2624
#4  0x10ede348 in r4hb_info ()
#5  0x12498fbc in schedule (initialCapability=<optimized out>, task=0xf) at rts/Schedule.c:457
Backtrace stopped: previous frame inner to this frame (corrupt stack?)
erikd added a comment.Nov 21 2015, 4:52 PM

Also have the following warning:

rts/Linker.c:1354:15: error:
     warning: 'mmapForLinker' defined but not used [-Wunused-function]
     static void * mmapForLinker (size_t bytes, nat flags, int fd, int offset)
erikd added a comment.EditedNov 21 2015, 5:18 PM

Adding some debug:

ocRunInit_ELF 5905 : /home/erikd/Git/ghc-upstream/libraries/ghc-prim/dist-install/build/HSghc-prim-0.5.0.0.o
ocRunInit_ELF 5905 : /home/erikd/Git/ghc-upstream/libraries/integer-gmp/dist-install/build/HSinteger-gmp-1.0.0.0.o
ocRunInit_ELF 5905 : /home/erikd/Git/ghc-upstream/libraries/base/dist-install/build/HSbase-4.9.0.0.o
Segmentation fault

I find that two object files get loaded without a problem and the third goes bang.

Interestingly, if I add another printf to print the address of the init function that is called it turns out that the first two object files don't have any init functions while HSbase does have some and crashes when the first one is called.

simonmar requested changes to this revision.Nov 23 2015, 7:37 AM
simonmar edited edge metadata.

I love the comments and cleanup, however this diff is now a lot more than just "reduce linker memory usage". A couple of suggestions for the future that would make life easier for your reviewers:

  • Please keep cleanup changes separate from functionality changes and bugfixes
  • Please don't substantially change a revision after it has been reviewed, other than to fix bugs and address issues that came up during review

It costs a bit more time to make things more reviewer-digestable, but it's time well spent. Thanks :)

Requesting changes because it looks like this will be significantly worse for archives, unless I've missed something.

rts/Linker.c
220

I'm not sure it's worth having configuration options in the source file with fixed settings, ie. not dependent on platform or environmental conditions. Why would anyone change this, and how would they know to do so?

239–251

what does this do, and what goes wrong if we change it?

1994–2013

Doesn't this mean that for an archive we're going to mmap each member of the archive separately?

hsyl20 updated this revision to Diff 5303.Nov 24 2015, 9:26 AM
hsyl20 edited edge metadata.
  • Fix direct section mapping bug
  • Fix for PowerPC (use executable memory!) and doc

@erikd: I forgot that malloc'ed memory is usually not executable... I guess that's why you got the error. Now I reuse mmap by default.

@simonmar: Yes I'm very sorry for the transformation of the (too) simple original diff into this big one. I will try to follow your suggestions as much as possible for the next ones. I will update the diff description when I get my Internet access back (I'm currently using my phone as a modem with a Edge connection and arcanist is already very slow).

Regarding archives, I don't think it will be worse than before (except for badly aligned sections, but it was bug prone to load them unaligned). I have explained what we do in the Note at the beginning of the file (in the "Loading" section). Do you think there is something wrong in it?

Using mmap or fread for object codes (either from .o or from .o in .a) should be equivalent. Maybe mmap would be faster? The MAP_IMAGES option is there to easily test both, but maybe we should commit to mmap if USE_MMAP is set and remove MAP_IMAGES.

Regarding the alignment of the jump islands (symbol extras), initially they were only 4-byte aligned for PowerPC. It is not required for x86-64. I set it to 8 to be safe as it doesn't cost much... I don't know about arm or ia64 (do we really support ia64? There is some conditional PLT code for it in LinkerInternals.h).

Regarding archives, I don't think it will be worse than before (except for badly aligned sections, but it was bug prone to load them unaligned). I have explained what we do in the Note at the beginning of the file (in the "Loading" section). Do you think there is something wrong in it?

Well, there's a behaviour change: previously when loading an archive we would malloc+fread() each .o file, and now we're mmap()ing each .o file. Maybe that's ok now that we're not doing split-objs, but it's not obvious to me that this is right given that the original goal was to reduce the number of mmap calls.

Regarding the alignment of the jump islands ...

My question was more aimed at: shouldn't this be documented? (maybe just copy your answer into a comment)

erikd added a subscriber: trofi.Nov 24 2015, 4:56 PM

Builds on PowerPC and there are no new test failures.

I'm not able to test on Arm right now because my Arm machine is suffering from the problem in Trac #11123 . Maybe @bgamari can test this patch on Arm.

As for ia64, there are people with access to one of those, but i can't remember who. Maybe its @trofi or @kgardas ?

erikd accepted this revision.Nov 30 2015, 3:51 PM
erikd edited edge metadata.

I tested this patch on Arm and ran into problems explained here : https://ghc.haskell.org/trac/ghc/ticket/11123#comment:5 .

Since those problems are un-related to this patch, I'm happy with it.

bgamari requested changes to this revision.Dec 7 2015, 5:39 AM
bgamari edited edge metadata.

I now have an ARM test of my own running on this patch.

However, I'm sorry to say that this now needs to be rebased.

This revision now requires changes to proceed.Dec 7 2015, 5:39 AM
hsyl20 updated this revision to Diff 5527.Dec 7 2015, 9:39 AM
hsyl20 edited edge metadata.
  • Rebase
hsyl20 retitled this revision from Reduce linker memory usage to RTS linker refactoring.Dec 8 2015, 7:00 AM
hsyl20 updated this object.
hsyl20 edited the test plan for this revision. (Show Details)
hsyl20 edited edge metadata.
hsyl20 updated this revision to Diff 5568.Dec 8 2015, 9:25 AM
hsyl20 edited the test plan for this revision. (Show Details)
  • Fix for Windows and MacOS
hsyl20 updated this revision to Diff 5574.Dec 8 2015, 12:53 PM
hsyl20 edited edge metadata.
  • More fixes for Windows and MacOS
bgamari requested changes to this revision.Dec 11 2015, 7:29 AM
bgamari edited edge metadata.

@hsyl20, I found the source of the Windows and OS X build errors. See inline comments.

rts/Linker.c
4372

Block starts here.

4940–4941

This won't get defined on non-ELF platforms as it is defined inside a #if defined(OBJFORMAT_ELF) block starting on line 4331.

This revision now requires changes to proceed.Dec 11 2015, 7:29 AM

I think one of the next things we should do is try to break up Linker.c a bit. It's currently just terribly unwieldy.

bgamari commandeered this revision.Dec 27 2015, 1:26 PM
bgamari edited reviewers, added: hsyl20; removed: bgamari.

I've rebased this.

bgamari updated this revision to Diff 5999.Dec 27 2015, 1:27 PM
bgamari edited edge metadata.
  • Linker: Clarify type of getSectionKind_ELF
  • Linker: Factor out getSectionKind_MachO
bgamari planned changes to this revision.Dec 27 2015, 2:25 PM
bgamari marked an inline comment as done.

@hsyl20, I believe the handling of MachO relocations may need some attention. Currently the relocations are computed relative to the wrong address. See comment inline.

rts/Linker.c
6250

This should be relative to the oc->sections[i]->start, not image, I believe. Unfortunately it appears that we have no way to correlate the ObjectCode Sections with MachO sections at this point.

bgamari abandoned this revision.Feb 3 2016, 1:06 PM

@hsyl20, I've fixed a few issues with this but won't have time to look into the OS X issues.

hsyl20 edited edge metadata.Feb 3 2016, 5:26 PM

@bgamari No problem. I'll try to get back to it when I will have more time.