Implement function-sections for Haskell code, #8405
ClosedPublic

Authored by olsner on Sep 13 2015, 6:15 PM.

Details

Summary

This adds a flag -split-sections that does similar things to -split-objs, but
using sections in single object files instead of relying on the Satanic
Splitter and other abominations. This is very similar to the GCC flags
-ffunction-sections and -fdata-sections.

The --gc-sections linker flag, which allows unused sections to actually
be removed, is added to all link commands (if the linker supports it) so that
space savings from having base compiled with sections can be realized.

Supported both in LLVM and the native code-gen, in theory for all
architectures, but really tested on x86 only.

In the GHC build, a new SplitSections variable enables -split-sections for
relevant parts of the build.

Test Plan

validate with both settings of SplitSections

Diff Detail

Repository
rGHC Glasgow Haskell Compiler
Branch
D1242_rebased
Lint
Lint OK
Unit
No Unit Test Coverage
Build Status
Buildable 6157
Build 6950: GHC Patch Validation (amd64/Linux)
Build 6949: arc lint + arc unit
There are a very large number of changes, so older changes are hidden. Show Older Changes

It doesn't validated on Windows, most likely because of Trac #10672 (D1244).

ghc: internal error: checkProddableBlock: invalid fixup in runtime linker: 0000000006aa7320
    (GHC version 7.11.20150919 for x86_64_unknown_mingw32)
    Please report this as a GHC bug:  http://www.haskell.org/ghc/reportabug

This application has requested the Runtime to terminate it in an unusual way.
Please contact the application's support team for more information.
bindisttest/ghc.mk:21: recipe for target 'test_bindist' failed

Ok, good news. Using both D1244 and this patch, the build does succeed on Windows 64 bit. The testsuite only reports a few failing tests, some of which are known to fail:

Unexpected passes:
   ghci/scripts  T9878b (ghci)

Unexpected failures:
   rts         T7037 [bad stdout] (normal)
   rts         outofmem [bad stdout] (normal)
   rts/T10672  T10672_x64 [bad exit code] (normal)

Unexpected stat failures:
   perf/haddock  haddock.base [stat too good] (normal)

I suggest we first review and land D1244, and then come back to this patch.

Phyx edited edge metadata.Sep 19 2015, 3:23 PM

Hmmm

rts/T10672  T10672_x64 [bad exit code] (normal)

That shouldn't be failing though.. I'll take a look at that..

austin requested changes to this revision.Sep 21 2015, 9:53 PM
austin edited edge metadata.

Since it looks like this will need D1244, marking as 'Request Changes' to get out of the queue. Otherwise it looks pretty good! (I'll do another pass soon)

This revision now requires changes to proceed.Sep 21 2015, 9:53 PM

perf.haskell.org results:

nofib/time/k-nucleotide 	4.382	- 3.65%	4.222	seconds
nofib/size/compress 	953121	- 10.88%	849389	bytes
nofib/size/grep 	912623	- 4.92%	867733	bytes
nofib/size/knights 	922645	- 3.33%	891955	bytes
nofib/size/veritas 	1803176	- 5.25%	1708470	bytes
thomie added a comment.Oct 3 2015, 9:30 AM

@kgardas: do you have the Solaris/x86/x64/SPARC results?

Phyx added inline comments.Oct 4 2015, 10:33 AM
compiler/nativeGen/X86/CodeGen.hs
2621–2626

Is there any distro that still uses this older version? 2.17 seems c.a. 2006.
Could we remove the hack?

@thomie solaris results will be done this week (up to Wednesday). Last week I needed to be out of town. W.r.t binutils < 2.17 I just signal that Solaris is using 2.23 and OpenBSD/amd64 is using 2.17 -- so from those two platforms it's safe to remove mentioned hack.

simonmar edited edge metadata.Oct 5 2015, 3:18 AM

There might be some interaction between this patch and D975, but I haven't tested it. My worry is that the linker will end up making a million tiny mmap calls.

olsner added a comment.Oct 6 2015, 3:36 AM

There might be some interaction between this patch and D975, but I haven't tested it. My worry is that the linker will end up making a million tiny mmap calls.

I take it you would need to test the combination of the patches with your specific project to be able to tell if it works?

The total number of sections with split-sections (and the size of each section) should be comparable to what you get with split-objs though, so if split-objs works with your patch (and project) I would expect split-sections to work too. If I understand D975 correctly, small sections should get allocated with the m32 allocator and be able to share pages.

Well, one difference is that with -split-objs, I can make a single object using ld -r and then use that with the RTS linker. With this patch I can still do that, but the object file will have a million tiny sections each of which will get mmap'd separately by the RTS linker. Is there any way to combine the sections into a single text section using ld?

olsner added a comment.Oct 8 2015, 4:01 AM

Well, one difference is that with -split-objs, I can make a single object using ld -r and then use that with the RTS linker. With this patch I can still do that, but the object file will have a million tiny sections each of which will get mmap'd separately by the RTS linker. Is there any way to combine the sections into a single text section using ld?

ld -r does indeed produce an object with all the sections. I can't see a flag for telling ld to merge sections, but it can be done with a linker script (e.g. https://github.com/olsner/ghc/commit/D1242_merge_sections). This is fine when using GNU ld, and Mac OS doesn't need the script anyway (it doesn't use sections), but if e.g. Solaris or BSD doesn't use GNU ld those platforms would still get lots of sections to load (as I believe they don't support linker scripts?).

I also just noticed that without the merging, disabling DYNAMIC_GHC_PROGRAMS and enabling sections gives a broken ghci. The RTS linker only supports 64k ELF sections, and HSbase.o gets more sections than that...

I had a patch lined up for fixing this in the linker, but didn't include it here since everything seemingly worked fine without it. I thought it had just become unnecessary due to changes to the splitting code, but apparently not :)
I'll aim to clean that up for review this evening. Should I submit the linker fix as another diff that we can merge separately, or just add it to this diff?

Sorry for delay, I've done promised Solaris 11 testing on i386 and amd64 on HEAD + arc patch D1242 and have not found any new regression. To my surprise arc patch D1242 did from previously checked out HEAD something more older so I've also needed to manually patch issue in testlib.py to make it compatible with python 2.6 (original patch by Thomas is already in HEAD).
So far I've not been able to test SPARC NCG, as this does have its own set of troubles.

In D1242#37263, @olsner wrote:

ld -r does indeed produce an object with all the sections. I can't see a flag for telling ld to merge sections, but it can be done with a linker script (e.g. https://github.com/olsner/ghc/commit/D1242_merge_sections). This is fine when using GNU ld, and Mac OS doesn't need the script anyway (it doesn't use sections), but if e.g. Solaris or BSD doesn't use GNU ld those platforms would still get lots of sections to load (as I believe they don't support linker scripts?).

Yes, Solaris 11 is not using GNU ld and yes it does not support GNU ld linker scripts. On the other hand my OpenBSD 5.8 is using GNU ld coming from binutils 2.17 so I guess it also supports linker scripts at least on a level of 2.17 release.

Any progress on this @olsner? It would be great to have this for 8.0.

thomie added a comment.EditedOct 22 2015, 2:40 PM

I can't see a flag for telling ld to merge sections, but it can be done with a linker script (e.g. https://github.com/olsner/ghc/commit/D1242_merge_sections).

driver/utils/merge_sections.ld seems like a fine location to me.

Should I submit the linker fix as another diff that we can merge separately

Yes please.

olsner updated this revision to Diff 4607.Oct 22 2015, 4:35 PM
olsner edited edge metadata.
  • Merge sections when incrementally linking libraries (if ld is GNU ld)

So, with the updated patch the merging for ghci libs is done on GNU ld platforms, and the RTS linker fix is submitted as D1357. Other ELF platforms will not properly support -split-sections in ghci until that's in.
I also did some testing without the merging (and with the linker fix) to see if there was any bad interaction with D975, but didn't notice anything.

Windows should be saved by ld -r too since it uses GNU ld, but otherwise I think many-sections support would require linking libraries with the -mbig-obj flag (which switches to a different format of some headers that has room for more than 32k sections), as well as adding RTS linker support for that new "big object" format. (Although it sounds scary, it looks like the big object format reuses almost all of the rest of COFF, so it shouldn't be too difficult if we end up needing it.)

bgamari requested changes to this revision.Oct 25 2015, 9:31 AM
bgamari edited edge metadata.

A few small changes. Looks great on the whole, however!

compiler/main/DriverPipeline.hs
1901

Do we really always want to enable this? How expensive is this? If it has non-trivial cost then we probably only want to enable this when Opt_SplitSections is enabled.

compiler/nativeGen/Dwarf.hs
87

Could you make emission of separate aranges dependent upon whether split sections is enabled?

compiler/nativeGen/X86/CodeGen.hs
2621–2626

Perhaps but let's do it in another patch if so.

utils/mkUserGuidePart/Options/Linking.hs
109

A few more words here wouldn't hurt. Perhaps mention that the intent is to produce smaller executables.

This revision now requires changes to proceed.Oct 25 2015, 9:31 AM

Several nits and a few questions, but overall this looks good to me.

compiler/cmm/PprCmmDecl.hs
158

char '.' instead of text "."

158

char '.' instead of text "."

163

ptext plesae, and apply sLit to each of the strings below

compiler/main/DriverPipeline.hs
1901

I agree, it would be good to know what the impact on link time is. Unfortunately it doesn't make sense to make it conditional on Opt_SplitSections, because we need to know how the libraries were built, which is unrelated to whether the flag is set now.

compiler/nativeGen/PPC/Ppr.hs
318

these should be pText (sLit "foo") too

compiler/nativeGen/PprBase.hs
99

pText and sLit please

compiler/nativeGen/SPARC/Ppr.hs
330

pText

compiler/nativeGen/X86/Ppr.hs
92–94

Is it possible to make this fail much earlier, during flag parsing, rather than a panic?

394

pText

mk/config.mk.in
313

Should it be enabled by default on supported platforms? See the equivalent default setting for SplitObjs.

utils/mkUserGuidePart/Options/Linking.hs
108

copy/paste-o

olsner updated this revision to Diff 4773.Oct 29 2015, 6:37 PM
olsner edited edge metadata.

Various review fixes

  • Use ptext+sLit instead of text
  • Check for -split-section vs SubsectionsViaSymbols earlier (Keeps the "panic" in codegen though, since Section demands a CLabel)
  • Only generate split-up dwarf aranges if splitting sections
  • Documentation tweaks (+ fix copy-pasto...)
olsner added inline comments.Oct 29 2015, 7:47 PM
compiler/main/DriverPipeline.hs
1901

For linking stage2 GHC with split sections enabled (i.e. libraries have many sections but the GHC code itself doesn't), --gc-sections adds 2.5s of link time, or roughly doubles it (3s vs 5.5s).
For a sort of worst-case comparison, enabling sections for the compiler as well makes --gc-sections link about 4x slower than without gc (9s vs 38s). libHSghc has 600k sections in that config...

In the corresponding test without sections enabled for anything, --gc-sections hardly affects link time at all. For a smaller program like hello world (with libraries built with sections), the overhead from --gc-sections is also minimal.

compiler/main/HscMain.hs
1428

Random thought here - when only using subsections via symbols (= on Mac), we don't seem to do anything special with the SRT tables, so either I'm missing something or Mac doesn't actually get much useful dead code stripping at all (the SRT tables would keep everything alive). Perhaps subsections-via-symbols should also enable srt splitting here.

compiler/nativeGen/X86/Ppr.hs
92–94

Added a check to DynFlags, but just realized I could do even better - the line that prints dspSection comes directly after another pprSectionAlign for Text, so the whole thing is unnecessary.

mk/config.mk.in
313

Maybe :) I was vaguely planning to leave this as an optional feature for now, and later revisit if it's stable enough to make the default (and on which platforms), but maybe that's just me being overly risk-averse.

olsner updated this revision to Diff 4775.Oct 29 2015, 7:50 PM
olsner edited edge metadata.
  • Remove dspSection panic hack
simonmar accepted this revision.Oct 30 2015, 3:30 AM
simonmar edited edge metadata.

LGTM

Nice work. Some small comments. Only address them if you still feel like, because this patch has been delayed quite a bit already.

compiler/llvmGen/LlvmCodeGen/Data.hs
78

Missing docstring. From https://ghc.haskell.org/trac/ghc/wiki/Commentary/CodingStyle#Commentsontop-levelentities:
"Every top-level entity should have a Haddock comment that describes what it does and, if needed, why it's there."

compiler/main/HscMain.hs
1428

Maybe create a ticket for this, with operating system=MacOSX, so this comment doesn't get lost.

driver/utils/merge_sections.ld
3

I would be nice to explain here that the sections need to be merged again because otherwise "the object file will have a million tiny sections each of which will get mmap'd separately by the RTS linker. "

olsner updated this revision to Diff 4871.Nov 1 2015, 1:01 PM
olsner marked 6 inline comments as done.
olsner edited edge metadata.

Oops, the conditional I added for dwarf aranges was the wrong way around...

  • Haddocks for llvmSection/Type
  • Fix flipped conditional
thomie requested changes to this revision.Nov 2 2015, 7:46 AM
thomie added a reviewer: thomie.

This needs looking into, from https://phabricator.haskell.org/harbormaster/build/7466/:

Actual stdout output differs from expected:
987	--- ./driver/recomp011/recomp011.stdout.normalised	2015-11-01 20:09:59.954940998 +0000
988	+++ ./driver/recomp011/recomp011.run.stdout.normalised	2015-11-01 20:09:59.954940998 +0000
989	@@ -7,4 +7,5 @@
990	 [1 of 1] Compiling Main             ( Main.hs, Main.o ) [A.hsinc changed]
991	 Linking Main ...
992	 4343
993	+Linking Main ...
994	 4343
995	\ No newline at end of file
996	*** unexpected failure for recomp011(normal)
997	
998	Unexpected results from:
999	TEST="recomp011"

A comment in testsuite/tests/driver/recomp011/Makefile says:

# don't change anything; check that no compilation happened

When no compilation happens, no linking should happen either (I think). Trac #10974 and D1381 are related.

This revision now requires changes to proceed.Nov 2 2015, 7:46 AM
hsyl20 added a subscriber: hsyl20.Nov 4 2015, 5:00 AM

function-sections has a bad interaction with D1375: the link-info section added by GHC is garbage collected by --gc-sections when the PROGBITS section type is used (confirmed by @olsner with --print-gc-sections).

A fix would be to revert D1375 (i.e. use NOTE section type to avoid garbage section collection) but to use the correct NOTE format this time. Then to read it back, we could either parse "readelf -n" output or directly read the binary. I am much in favor of the latter approach and I could easily add a few lines to D1381 to do it. Before I do it, we must arbitrate between the readelf approach (cf D1326) and D1381.

D1381 is now in so this should be ready to test again.

hsyl20 added inline comments.Nov 11 2015, 8:10 AM
driver/utils/merge_sections.ld
4

Is it still the case? I thought i fixed that when i fixed Trac #9314 with the "m32 allocator" in D985 which has been merged in D975 and applied.

olsner updated this revision to Diff 5047.Nov 11 2015, 7:17 PM
olsner edited edge metadata.
olsner marked 4 inline comments as done.

Rebased on master, and checked that recomp011 and recomp015 still work.

Also found a problem when combining -split-sections with -g. Fixed by not
splitting debug info when splitting sections.

thomie accepted this revision.Nov 12 2015, 3:49 AM
thomie edited edge metadata.

Ok, let's go!

bgamari accepted this revision.Nov 12 2015, 4:06 AM
bgamari edited edge metadata.
In D1242#42663, @olsner wrote:

Rebased on master, and checked that recomp011 and recomp015 still work.

Great, I'll go ahead and merge.

Also found a problem when combining -split-sections with -g. Fixed by not
splitting debug info when splitting sections.

Can you describe this problem a bit more?

This revision was automatically updated to reflect the committed changes.
In D1242#42663, @olsner wrote:

Also found a problem when combining -split-sections with -g. Fixed by not
splitting debug info when splitting sections.

Can you describe this problem a bit more?

The labels emitted at the start of each debug section (dwarfLineLabel, etc) get duplicated once for each split - with -split-objs those end up in different assembly files compiled to different object files (and they'll be needed once in each file), but in this case where all the sections will be assembled in the same file you get conflicting symbol errors instead.

Doing something like section splitting on the debug info itself is not very useful (and after some reading on that it seems it would not have worked anyway). When linking debug info with --gc-sections ld rather keeps all the debug info and replaces references to GC'd functions with placeholder values. (and tools like gdb are aware of that workaround)