Hadrian: support dynamically linking ghc
Concern Raised 79d5427e1f9d

Authored by DavidEichmann on Nov 29 2018, 11:22 AM.

Description

Hadrian: support dynamically linking ghc

  • (Trac #15837 point 5) Use the -rpath gcc option with the $ORIGIN
    variable, which the dynamic linker sets to the location of the ghc
    binary.

  • (Trac #15837 point 4) "-fPIC -dynamic" options are used when building ghc
    when either ghc or the rts have a dynamic way.

  • (Trac #15837 point 7) "-shared -dynload deploy" options are only used when
    linking a library (no longer when linking a program).

Reviewers: bgamari, alpmestan

Reviewed By: alpmestan

Subscribers: adamse, rwbarton, carter

Differential Revision: https://phabricator.haskell.org/D5281

Details

Auditors
harpocrates
Committed
alpmestan Nov 29 2018, 11:22 AM
Reviewer
alpmestan
Differential Revision
D5281: Hadrian: support dynamically linking ghc
Parents
rGHCfb9971607c5a: Hadrian: bump Cabal submodule, install extra dynamic flavours of RTS
Branches
Unknown
Tags
Unknown

This breaks the default ./hadrian/build.sh on OSX. :(

I'd love to help debug this. Also, why is it that GHC needs to be built dynamically if any of the RTS ways is dynamic?

/cc @alpmestan

harpocrates raised a concern with this commit. Dec 1 2018, 8:59 PM

I should have marked my previous comment (https://phabricator.haskell.org/rGHC79d5427e1f9de02c0b171bf5db46b6b49c6f85e3#141680) as a concern.

This commit now has outstanding concerns. Dec 1 2018, 8:59 PM

This breaks the default ./hadrian/build.sh on OSX. :(

Uh oh, sorry about this.

I'd love to help debug this.

Great, that'd be helpful since I don't have an OS X machine around.

Also, why is it that GHC needs to be built dynamically if any of the RTS ways is dynamic?

@DavidEichmann checked, and this is what the make build system does: it defaults to building a dynamic GHC when the circumstances allow it (not on Windows, etc.). We may, however, have missed some special work done on OS X.

Could you show us how exactly the build fails? Ideally running with --trace and showing the failing command, in addition to the error message.

Then we can start poking at the patch to see what's causing trouble on OS X. Maybe the -rpath business?

I did some sleuthing. Apologies if this is overly verbose - I'm out of my depth, so it's difficult for me to gauge what is obvious from what isn't. Also, please call out anything that sounds incorrect!

With Hadrian

The _build/stage1/bin/ghc binary (with dynamic linking) builds just fine, but as soon as you call it, things go wrong:

$ _build/stage1/bin/ghc
dyld: Symbol not found: _ffi_type_double
  Referenced from: _build/stage1/lib/../lib/x86_64-osx-ghc-8.7.20181202/libHSghci-8.7-ghc8.7.20181202.dylib
  Expected in: flat namespace
 in _build/stage1/lib/../lib/x86_64-osx-ghc-8.7.20181202/libHSghci-8.7-ghc8.7.20181202.dylib
[1]    52701 abort      _build/stage1/bin/ghc

The missing symbol _ffi_type_double is supposed to come from libffi.dylib. That suggests that dylibs are getting yanked out of _build/stage1/bin/ghc. Sure enough, looks like libffi.dylib is missing:

$ otool -L _build/stage1/bin/ghc
_build/stage1/bin/ghc:
        /usr/lib/libSystem.B.dylib (compatibility version 1.0.0, current version 1238.60.2)
        @rpath/libHShaskeline-0.7.4.3-ghc8.7.20181202.dylib (compatibility version 0.0.0, current version 0.0.0)
        @rpath/libHSghc-8.7-ghc8.7.20181202.dylib (compatibility version 0.0.0, current version 0.0.0)
        @rpath/libHSprocess-1.6.3.0-ghc8.7.20181202.dylib (compatibility version 0.0.0, current version 0.0.0)
        @rpath/libHSghci-8.7-ghc8.7.20181202.dylib (compatibility version 0.0.0, current version 0.0.0)
        @rpath/libHStransformers-0.5.5.0-ghc8.7.20181202.dylib (compatibility version 0.0.0, current version 0.0.0)
        @rpath/libHSghc-boot-8.7-ghc8.7.20181202.dylib (compatibility version 0.0.0, current version 0.0.0)
        @rpath/libHSghc-boot-th-8.7-ghc8.7.20181202.dylib (compatibility version 0.0.0, current version 0.0.0)
        @rpath/libHSdirectory-1.3.3.1-ghc8.7.20181202.dylib (compatibility version 0.0.0, current version 0.0.0)
        @rpath/libHSunix-2.7.2.2-ghc8.7.20181202.dylib (compatibility version 0.0.0, current version 0.0.0)
        @rpath/libHStime-1.9.2-ghc8.7.20181202.dylib (compatibility version 0.0.0, current version 0.0.0)
        @rpath/libHSfilepath-1.4.2.1-ghc8.7.20181202.dylib (compatibility version 0.0.0, current version 0.0.0)
        @rpath/libHScontainers-0.6.0.1-ghc8.7.20181202.dylib (compatibility version 0.0.0, current version 0.0.0)
        @rpath/libHSbytestring-0.10.9.0-ghc8.7.20181202.dylib (compatibility version 0.0.0, current version 0.0.0)
        @rpath/libHSbase-4.12.0.0-ghc8.7.20181202.dylib (compatibility version 0.0.0, current version 0.0.0)
        @rpath/libHSinteger-gmp-1.0.2.0-ghc8.7.20181202.dylib (compatibility version 0.0.0, current version 0.0.0)
        @rpath/libHSghc-prim-0.5.3-ghc8.7.20181202.dylib (compatibility version 0.0.0, current version 0.0.0)
        @rpath/libHSrts-1.0_thr-ghc8.7.20181202.dylib (compatibility version 0.0.0, current version 0.0.0)      

So how is _build/stage1/bin/ghc being produced? Take a look at the final gcc command (which I got to by passing -v -keep-tmp-files to the ghc -o _build/stage1/bin/ghc ... call):

$ gcc -fno-stack-protector -DTABLES_NEXT_TO_CODE -Wl,-rpath -Wl,/../lib/x86_64-darwin-ghc-8.7.20181202 \
 -lgmp -o _build/stage1/bin/ghc -lm -fno-common -U__PIC__ -D__PIC__ -Wl,-no_compact_unwind \
 _build/stage1/ghc/build/c/hschooks.o _build/stage1/ghc/build/Main.o \
 _build/stage1/ghc/build/GHCi/Leak.o _build/stage1/ghc/build/GHCi/UI.o \
 _build/stage1/ghc/build/GHCi/UI/Info.o _build/stage1/ghc/build/GHCi/UI/Monad.o \
 _build/stage1/ghc/build/GHCi/UI/Tags.o -L_build/stage1/lib/../lib/x86_64-osx-ghc-8.7.20181202 \
 -Xlinker -rpath -Xlinker _build/stage1/lib/../lib/x86_64-osx-ghc-8.7.20181202 \
 /var/folders/n5/0p2l4ydj6b34mxcd3bfz7djc0000gp/T/ghc44686_0/ghc_2.o \
 -Wl,-u,_base_GHCziTopHandler_runIO_closure -Wl,-u,_base_GHCziTopHandler_runNonIO_closure \
 -Wl,-u,_ghczmprim_GHCziTuple_Z0T_closure -Wl,-u,_ghczmprim_GHCziTypes_True_closure \
 -Wl,-u,_ghczmprim_GHCziTypes_False_closure -Wl,-u,_base_GHCziPack_unpackCString_closure \
 -Wl,-u,_base_GHCziWeak_runFinalizzerBatch_closure \
 -Wl,-u,_base_GHCziIOziException_stackOverflow_closure \
 -Wl,-u,_base_GHCziIOziException_heapOverflow_closure \
 -Wl,-u,_base_GHCziIOziException_allocationLimitExceeded_closure \
 -Wl,-u,_base_GHCziIOziException_blockedIndefinitelyOnMVar_closure \
 -Wl,-u,_base_GHCziIOziException_blockedIndefinitelyOnSTM_closure \
 -Wl,-u,_base_GHCziIOziException_cannotCompactFunction_closure \
 -Wl,-u,_base_GHCziIOziException_cannotCompactPinned_closure \
 -Wl,-u,_base_GHCziIOziException_cannotCompactMutable_closure \
 -Wl,-u,_base_ControlziExceptionziBase_absentSumFieldError_closure \
 -Wl,-u,_base_ControlziExceptionziBase_nonTermination_closure \
 -Wl,-u,_base_ControlziExceptionziBase_nestedAtomically_closure \
 -Wl,-u,_base_GHCziEventziThread_blockedOnBadFD_closure \
 -Wl,-u,_base_GHCziConcziSync_runSparks_closure \
 -Wl,-u,_base_GHCziConcziIO_ensureIOManagerIsRunning_closure \
 -Wl,-u,_base_GHCziConcziIO_ioManagerCapabilitiesChanged_closure \
 -Wl,-u,_base_GHCziConcziSignal_runHandlersPtr_closure \
 -Wl,-u,_base_GHCziTopHandler_flushStdHandles_closure -Wl,-u,_base_GHCziTopHandler_runMainIO_closure \
 -Wl,-u,_ghczmprim_GHCziTypes_Czh_con_info -Wl,-u,_ghczmprim_GHCziTypes_Izh_con_info \
 -Wl,-u,_ghczmprim_GHCziTypes_Fzh_con_info -Wl,-u,_ghczmprim_GHCziTypes_Dzh_con_info \
 -Wl,-u,_ghczmprim_GHCziTypes_Wzh_con_info -Wl,-u,_base_GHCziPtr_Ptr_con_info \
 -Wl,-u,_base_GHCziPtr_FunPtr_con_info -Wl,-u,_base_GHCziInt_I8zh_con_info \
 -Wl,-u,_base_GHCziInt_I16zh_con_info -Wl,-u,_base_GHCziInt_I32zh_con_info \
 -Wl,-u,_base_GHCziInt_I64zh_con_info -Wl,-u,_base_GHCziWord_W8zh_con_info \
 -Wl,-u,_base_GHCziWord_W16zh_con_info -Wl,-u,_base_GHCziWord_W32zh_con_info \
 -Wl,-u,_base_GHCziWord_W64zh_con_info -Wl,-u,_base_GHCziStable_StablePtr_con_info \
 -Wl,-u,_hs_atomic_add8 -Wl,-u,_hs_atomic_add16 -Wl,-u,_hs_atomic_add32 -Wl,-u,_hs_atomic_sub8 \
 -Wl,-u,_hs_atomic_sub16 -Wl,-u,_hs_atomic_sub32 -Wl,-u,_hs_atomic_and8 -Wl,-u,_hs_atomic_and16 \
 -Wl,-u,_hs_atomic_and32 -Wl,-u,_hs_atomic_nand8 -Wl,-u,_hs_atomic_nand16 -Wl,-u,_hs_atomic_nand32 \
 -Wl,-u,_hs_atomic_or8 -Wl,-u,_hs_atomic_or16 -Wl,-u,_hs_atomic_or32 -Wl,-u,_hs_atomic_xor8 \
 -Wl,-u,_hs_atomic_xor16 -Wl,-u,_hs_atomic_xor32 -Wl,-u,_hs_cmpxchg8 -Wl,-u,_hs_cmpxchg16 \
 -Wl,-u,_hs_cmpxchg32 -Wl,-u,_hs_atomicread8 -Wl,-u,_hs_atomicread16 -Wl,-u,_hs_atomicread32 \
 -Wl,-u,_hs_atomicwrite8 -Wl,-u,_hs_atomicwrite16 -Wl,-u,_hs_atomicwrite32 -Wl,-search_paths_first \
 -Wl,-dead_strip \
 -lHShaskeline-0.7.4.3-ghc8.7.20181202 -lHSstm-2.5.0.0-ghc8.7.20181202 \
 -lHSghc-8.7-ghc8.7.20181202 -lHSterminfo-0.4.1.2-ghc8.7.20181202 -lHSprocess-1.6.3.0-ghc8.7.20181202 \
 -lHShpc-0.6.0.3-ghc8.7.20181202 -lHSghci-8.7-ghc8.7.20181202 \
 -lHStransformers-0.5.5.0-ghc8.7.20181202 -lHStemplate-haskell-2.15.0.0-ghc8.7.20181202 \
 -lHSpretty-1.1.3.6-ghc8.7.20181202 -lHSghc-heap-8.7-ghc8.7.20181202 -lHSghc-boot-8.7-ghc8.7.20181202 \
 -lHSghc-boot-th-8.7-ghc8.7.20181202 -lHSbinary-0.8.6.0-ghc8.7.20181202 \
 -lHSdirectory-1.3.3.1-ghc8.7.20181202 -lHSunix-2.7.2.2-ghc8.7.20181202 \
 -lHStime-1.9.2-ghc8.7.20181202 -lHSfilepath-1.4.2.1-ghc8.7.20181202 \
 -lHScontainers-0.6.0.1-ghc8.7.20181202 -lHSbytestring-0.10.9.0-ghc8.7.20181202 \
 -lHSdeepseq-1.4.4.0-ghc8.7.20181202 -lHSarray-0.5.2.0-ghc8.7.20181202 \
 -lHSbase-4.12.0.0-ghc8.7.20181202 -lHSinteger-gmp-1.0.2.0-ghc8.7.20181202 \
 -lHSghc-prim-0.5.3-ghc8.7.20181202 -lHSrts-1.0_thr-ghc8.7.20181202 \
 -lffi -lncurses -liconv -lgmp -lm -ldl -Wl,-dead_strip_dylibs

Note in particular the very last flag, -Wl,-dead_strip_dylibs. If we remove that, _build/stage1/bin/ghc starts working again and otool lists libffi (along with two other system libs which also seem important, and a bunch of unimportant Haskell dylibs):

$ otool -L _build/stage1/bin/ghc
_build/stage1/bin/ghc:
        /usr/local/opt/gmp/lib/libgmp.10.dylib (compatibility version 14.0.0, current version 14.2.0)
        /usr/lib/libSystem.B.dylib (compatibility version 1.0.0, current version 1238.60.2)
        @rpath/libHShaskeline-0.7.4.3-ghc8.7.20181202.dylib (compatibility version 0.0.0, current version 0.0.0)
        @rpath/libHSstm-2.5.0.0-ghc8.7.20181202.dylib (compatibility version 0.0.0, current version 0.0.0)
        @rpath/libHSghc-8.7-ghc8.7.20181202.dylib (compatibility version 0.0.0, current version 0.0.0)
        @rpath/libHSterminfo-0.4.1.2-ghc8.7.20181202.dylib (compatibility version 0.0.0, current version 0.0.0)
        @rpath/libHSprocess-1.6.3.0-ghc8.7.20181202.dylib (compatibility version 0.0.0, current version 0.0.0)
        @rpath/libHShpc-0.6.0.3-ghc8.7.20181202.dylib (compatibility version 0.0.0, current version 0.0.0)
        @rpath/libHSghci-8.7-ghc8.7.20181202.dylib (compatibility version 0.0.0, current version 0.0.0)
        @rpath/libHStransformers-0.5.5.0-ghc8.7.20181202.dylib (compatibility version 0.0.0, current version 0.0.0)
        @rpath/libHStemplate-haskell-2.15.0.0-ghc8.7.20181202.dylib (compatibility version 0.0.0, current version 0.0.0)
        @rpath/libHSpretty-1.1.3.6-ghc8.7.20181202.dylib (compatibility version 0.0.0, current version 0.0.0)
        @rpath/libHSghc-heap-8.7-ghc8.7.20181202.dylib (compatibility version 0.0.0, current version 0.0.0)
        @rpath/libHSghc-boot-8.7-ghc8.7.20181202.dylib (compatibility version 0.0.0, current version 0.0.0)
        @rpath/libHSghc-boot-th-8.7-ghc8.7.20181202.dylib (compatibility version 0.0.0, current version 0.0.0)
        @rpath/libHSbinary-0.8.6.0-ghc8.7.20181202.dylib (compatibility version 0.0.0, current version 0.0.0)
        @rpath/libHSdirectory-1.3.3.1-ghc8.7.20181202.dylib (compatibility version 0.0.0, current version 0.0.0)
        @rpath/libHSunix-2.7.2.2-ghc8.7.20181202.dylib (compatibility version 0.0.0, current version 0.0.0)
        @rpath/libHStime-1.9.2-ghc8.7.20181202.dylib (compatibility version 0.0.0, current version 0.0.0)
        @rpath/libHSfilepath-1.4.2.1-ghc8.7.20181202.dylib (compatibility version 0.0.0, current version 0.0.0)
        @rpath/libHScontainers-0.6.0.1-ghc8.7.20181202.dylib (compatibility version 0.0.0, current version 0.0.0)
        @rpath/libHSbytestring-0.10.9.0-ghc8.7.20181202.dylib (compatibility version 0.0.0, current version 0.0.0)
        @rpath/libHSdeepseq-1.4.4.0-ghc8.7.20181202.dylib (compatibility version 0.0.0, current version 0.0.0)
        @rpath/libHSarray-0.5.2.0-ghc8.7.20181202.dylib (compatibility version 0.0.0, current version 0.0.0)
        @rpath/libHSbase-4.12.0.0-ghc8.7.20181202.dylib (compatibility version 0.0.0, current version 0.0.0)
        @rpath/libHSinteger-gmp-1.0.2.0-ghc8.7.20181202.dylib (compatibility version 0.0.0, current version 0.0.0)
        @rpath/libHSghc-prim-0.5.3-ghc8.7.20181202.dylib (compatibility version 0.0.0, current version 0.0.0)
        @rpath/libHSrts-1.0_thr-ghc8.7.20181202.dylib (compatibility version 0.0.0, current version 0.0.0)
        /usr/lib/libffi.dylib (compatibility version 1.0.0, current version 1.0.0)
        /usr/lib/libncurses.5.4.dylib (compatibility version 5.4.0, current version 5.4.0)
        /usr/lib/libiconv.2.dylib (compatibility version 7.0.0, current version 7.0.0)

So: the problem is figuring out how to prevent a handful of important libraries from being stripped out (without also including the libraries that really aren't needed).
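To see exactly which load commands -Wl,-dead_strip_dylibs dropped, the two otool -L listings can be diffed mechanically. This is a minimal sketch that assumes the output format shown above; the helper name linked_dylibs is made up for illustration, and the sample strings are abbreviated copies of the two listings rather than the full output.

```python
def linked_dylibs(otool_output: str) -> set[str]:
    """Return the install names listed by `otool -L` (the first line names the binary)."""
    names = set()
    for line in otool_output.splitlines()[1:]:
        line = line.strip()
        if not line:
            continue
        # Each entry looks like "<install name> (compatibility version ..., current version ...)".
        names.add(line.split(" (compatibility", 1)[0])
    return names


# Abbreviated sample: linked WITH -Wl,-dead_strip_dylibs.
with_strip = """_build/stage1/bin/ghc:
        /usr/lib/libSystem.B.dylib (compatibility version 1.0.0, current version 1238.60.2)
        @rpath/libHSghci-8.7-ghc8.7.20181202.dylib (compatibility version 0.0.0, current version 0.0.0)
"""

# Abbreviated sample: linked WITHOUT -Wl,-dead_strip_dylibs.
without_strip = """_build/stage1/bin/ghc:
        /usr/local/opt/gmp/lib/libgmp.10.dylib (compatibility version 14.0.0, current version 14.2.0)
        /usr/lib/libSystem.B.dylib (compatibility version 1.0.0, current version 1238.60.2)
        @rpath/libHSstm-2.5.0.0-ghc8.7.20181202.dylib (compatibility version 0.0.0, current version 0.0.0)
        @rpath/libHSghci-8.7-ghc8.7.20181202.dylib (compatibility version 0.0.0, current version 0.0.0)
        /usr/lib/libffi.dylib (compatibility version 1.0.0, current version 1.0.0)
        /usr/lib/libncurses.5.4.dylib (compatibility version 5.4.0, current version 5.4.0)
        /usr/lib/libiconv.2.dylib (compatibility version 7.0.0, current version 7.0.0)
"""

# Everything present only in the unstripped binary was removed by the flag.
stripped = linked_dylibs(without_strip) - linked_dylibs(with_strip)
# Non-@rpath entries are the system libraries, which are the interesting ones here.
system_libs = sorted(name for name in stripped if not name.startswith("@rpath/"))
print(system_libs)
```

On a real checkout the two strings would come from running otool -L on the binary before and after removing the flag.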

With the Make system

So how on earth does the Make system answer this question? It doesn't. It (dynamically) links in only non-Haskell libraries, and makes a wrapper script (inplace/bin/ghc-stage1) which sets DYLD_LIBRARY_PATH to include the locations of _all_ Haskell dylibs that the linker may end up being interested in. As you can see, the underlying binary doesn't actually have any Haskell libraries linked into it:

$ otool -L inplace/lib/bin/ghc-stage1
inplace/lib/bin/ghc-stage1:
        /usr/lib/libSystem.B.dylib (compatibility version 1.0.0, current version 1238.60.2)
        /usr/lib/libncurses.5.4.dylib (compatibility version 5.4.0, current version 5.4.0)
        /usr/lib/libiconv.2.dylib (compatibility version 7.0.0, current version 7.0.0)
        /usr/local/opt/gmp/lib/libgmp.10.dylib (compatibility version 14.0.0, current version 14.2.0)
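For readers who have not looked at such a wrapper, here is a rough sketch of the idea. It is not the script the Make build actually generates: render_wrapper and the /abs/... directories are hypothetical, and the only details taken from the description above are that the wrapper sets DYLD_LIBRARY_PATH and then runs the real binary.

```python
import shlex


def render_wrapper(real_binary: str, dylib_dirs: list[str]) -> str:
    """Render a shell wrapper that points dyld at the Haskell dylib directories."""
    search_path = ":".join(dylib_dirs)
    lines = [
        "#!/bin/sh",
        # Absolute directories are baked in here, so the wrapper breaks if the tree moves.
        "DYLD_LIBRARY_PATH=" + shlex.quote(search_path),
        "export DYLD_LIBRARY_PATH",
        "exec " + shlex.quote(real_binary) + ' "$@"',
        "",
    ]
    return "\n".join(lines)


# Hypothetical paths, for illustration only.
script = render_wrapper(
    "/abs/ghc/inplace/lib/bin/ghc-stage1",
    ["/abs/ghc/libs/base", "/abs/ghc/libs/ghc-prim"],
)
print(script)
```

The hard-coded absolute paths in the generated text are what make this style of wrapper non-relocatable.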

The solution?

There's got to be a clean way to exempt some libraries from being stripped by -Wl,-dead_strip_dylibs. One not-so-clean way would be to pass -Wl,-u for one symbol from each library we want to keep (that should convince the linker to leave that library in). I'll keep looking for the right settings...
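A sketch of what the -Wl,-u workaround would look like as generated link arguments. The function name force_keep_flags is made up, and the only anchor symbol known from this thread is _ffi_type_double for libffi; libncurses and libiconv would each need a symbol of their own, which is not filled in here. Whether this actually keeps the dylib in the presence of -Wl,-dead_strip_dylibs is untested.

```python
def force_keep_flags(anchor_symbols: dict[str, str]) -> list[str]:
    """Emit one -Wl,-u,<symbol> flag per library the linker should keep."""
    # Sorting by library name keeps the generated command line stable between runs.
    return ["-Wl,-u," + symbol for _lib, symbol in sorted(anchor_symbols.items())]


# Library name -> one symbol it exports (only the libffi entry comes from the thread).
flags = force_keep_flags({"ffi": "_ffi_type_double"})
print(flags)
```

These flags would go on the same gcc command line as the -l options, ahead of the final -Wl,-dead_strip_dylibs.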

In any case, the rpath approach is the one we want since it avoids the non-relocatable DYLD_LIBRARY_PATH-based wrapper script.
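To make the relocatable variant concrete, here is a small sketch of how a binary-relative rpath could be computed. It assumes $ORIGIN on ELF platforms and @loader_path on macOS, since dyld does not expand $ORIGIN; the function names and the target_os strings are illustrative and are not Hadrian's API.

```python
import posixpath


def relative_rpath(bin_dir: str, lib_dir: str, target_os: str) -> str:
    """Build an rpath entry that locates lib_dir relative to the binary's own directory."""
    # ELF dynamic linkers expand $ORIGIN; dyld on macOS uses @loader_path instead.
    anchor = "@loader_path" if target_os == "darwin" else "$ORIGIN"
    return anchor + "/" + posixpath.relpath(lib_dir, start=bin_dir)


def rpath_link_args(rpath: str) -> list[str]:
    """Pass the rpath through the compiler driver to the linker."""
    return ["-Wl,-rpath", "-Wl," + rpath]


bin_dir = "_build/stage1/bin"
lib_dir = "_build/stage1/lib/x86_64-osx-ghc-8.7.20181202"
print(rpath_link_args(relative_rpath(bin_dir, lib_dir, "linux")))
print(rpath_link_args(relative_rpath(bin_dir, lib_dir, "darwin")))
```

Keeping the arguments as a list, rather than a single shell string, avoids a shell expanding $ORIGIN to nothing before the linker sees it.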

There is a similar issue on linux where we link the system ffi instead of the local one. This fails e.g. on CircleCI as ld cannot find the ffi library. I don't know if this is related to the OSX failure, but I'll continue investigating this on linux.

There is a similar issue on linux where we link the system ffi instead of the local one. This fails e.g. on CircleCI as ld cannot find the ffi library. I don't know if this is related to the OSX failure, but I'll continue investigating this on linux.

I don't think this is related, but you are totally right - the linker is picking up the wrong ffi library.

Back to the original issue: it looks like GHC is passing -Wl,-dead_strip_dylibs only as a way of working around a limitation macOS has on the number of dylibs linked into an executable (relevant commit: https://git.haskell.org/ghc.git/commitdiff/b592bd98ff25730bbe3c13d6f62a427df8c78e28). Since the ghc binary itself depends on so few dylibs, though, I think finding a way to drop -Wl,-dead_strip_dylibs in this particular case would solve the whole issue. That's easier said than done: -Wl,-dead_strip_dylibs is unconditionally emitted by GHC itself (and I don't think the Mac linker supports inverse flags)...

adamse added a subscriber: adamse. Dec 2 2018, 3:28 PM

The question (to me) is: why doesn't ghc reference libffi if it needs it? Or are we bundling libffi just for convenience?

For completeness' sake, here's the -dead_strip_dylibs documentation:

-dead_strip_dylibs
            Remove dylibs that are unreachable by the entry point or exported symbols. That is, suppresses the generation of load command commands for dylibs
            which supplied no symbols during the link. This option should not be used when linking against a dylib which is required at runtime for some
            indirect reason such as the dylib has an important initializer.

So we are looking for libffi as a dependency of ghci. That makes some sense, as ghci does reference quite a few libffi symbols.

However ghci.cabal does *not* list

extra-libraries: ffi

As such I assume we are not linking it, and expect the symbols to magically appear later?
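A toy model of the rule quoted from the linker documentation may help show why libffi disappears. It is deliberately simplified (no re-exports, no initializers), _ghci_example_closure is a made-up symbol name, and the assumption that ghc's own object files reference no libffi symbol directly is exactly the hypothesis under discussion, not an established fact.

```python
def surviving_dylibs(undefined_in_objects: set[str], exports: dict[str, set[str]]) -> list[str]:
    """Keep a dylib only if it supplied a symbol during the link of the executable."""
    return [lib for lib, symbols in exports.items() if symbols & undefined_in_objects]


def unresolved_at_load(kept: list[str], exports: dict[str, set[str]],
                       needs: dict[str, set[str]]) -> set[str]:
    """Symbols the kept dylibs still want (flat namespace) that no kept dylib provides."""
    available = set().union(*(exports[lib] for lib in kept))
    wanted = set().union(*(needs.get(lib, set()) for lib in kept))
    return wanted - available


exports = {
    "libHSghci": {"_ghci_example_closure"},  # hypothetical symbol name
    "libffi": {"_ffi_type_double"},
}
# Assumption: ghc's object files reference libHSghci directly, but never libffi.
undefined_in_objects = {"_ghci_example_closure"}
# libHSghci itself has an undefined reference to a libffi symbol.
needs = {"libHSghci": {"_ffi_type_double"}}

kept = surviving_dylibs(undefined_in_objects, exports)
missing = unresolved_at_load(kept, exports, needs)
print(kept, missing)
```

In this model libffi supplies nothing to the executable's own link, so its load command is dropped, and the symbol that libHSghci expects is then missing at load time, which matches the dyld error above.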