RTS linker: don't crash early when not finding extra DLL, most likely it's not needed
ClosedPublic

Authored by awson on Sep 22 2018, 1:36 AM.

Details

Summary

Signed-off-by: Kyrill Briantsev <kyrab@mail.ru>

Allow GHCi to not crash when no assumed DLL is found in the standard location.
E.g. when loading the package built "dyn" way, we may well have the package's
DLL around, and it's the system linker which loads necessary dependencies.

Why does this (partially) fix Trac #11042? It's because we often (and when having packages built dyn way — almost always) don't need to load anything recorded in the extra-libraries stanza, since if the package DLL exists, GHCi linker simply calls the system linker (via dlopen/ LoadLibrary APIs) to load it and doesn't bother to load package prelinked object file (if any) or package static library.

Thus, all "regular" (with no fancy low-level package content manipulation) packages built "dyn" way should be OK after this fix.

Diff Detail

Repository
rGHC Glasgow Haskell Compiler
Lint
Automatic diff as part of commit; lint not applicable.
Unit
Automatic diff as part of commit; unit tests not applicable.
awson created this revision.Sep 22 2018, 1:36 AM
awson edited the summary of this revision. (Show Details)Sep 22 2018, 1:37 AM
awson retitled this revision from Make an excape hatch to not crash early. to Make an escape hatch to not crash early..Sep 22 2018, 4:38 AM
awson edited the summary of this revision. (Show Details)Oct 7 2018, 4:06 AM
awson retitled this revision from Make an escape hatch to not crash early. to RTS linker: don't crash early when not finding extra DLL, most likely it's not needed.Oct 8 2018, 4:04 AM
awson edited the summary of this revision. (Show Details)Oct 15 2018, 3:03 AM

I have tested this patch using https://github.com/serokell/trac11042 and it indeed has the intended effect – the error is now a warning:

[nix-shell:~/trac11042]$ ghc -package zlib ./Main.hs
[1 of 1] Compiling Main             ( Main.hs, Main.o )
Warning: can't load .so/.DLL for: libz.so (libz.so: cannot open shared object file: No such file or directory)
Linking Main ...

To make sure that we indeed have access to zlib in TH, I also tested a modified version:

{-# LANGUAGE TemplateHaskell, OverloadedStrings #-}

module Main where

import Control.Monad.IO.Class (liftIO)
import Codec.Compression.Zlib (compress)

$(do liftIO $ print (compress "check"); return [])

main = print (compress "test")

It worked correctly as well:

[nix-shell:~/trac11042]$ ghc -package zlib ./Main.hs
[1 of 1] Compiling Main             ( Main.hs, Main.o )
Warning: can't load .so/.DLL for: libz.so (libz.so: cannot open shared object file: No such file or directory)
"x\156K\206HM\206\ACK\NUL\ENQ\244\SOH\255"
Linking Main ...

[nix-shell:~/trac11042]$ ./Main
"x\156+I-.\SOH\NUL\EOT]\SOH\193"

There are two worries I have:

  1. What if future modifications to the linker cause a regression? We need to figure out an OS-independent test case for this (reproduction instructions I have work only on NixOS).
  2. There's no way to disable this warning. I don't think it provides useful information on systems like NixOS where this is a normal occurrence.

Problem (2) should be easy to address, as load_dyn has access to HscEnv:

  • create a new flag -Wlinker-missing-shared-object enabled by default: add Opt_WarnLinkerMissingSharedObject to WarningFlag in DynFlags.hs, add a matching flagSpec to wWarningFlagsDeps, include it in standardWarnings, update the User's Guide.
  • check for Opt_WarnLinkerMissingSharedObject in hsc_dflags before reporting the warning in load_dyn.

This will allow the user to pass -Wno-linker-missing-shared-object.

Problem (1) is harder to address, I have no good suggestions. But we do need a regression test for this, it's a delicate issue.

I would also suggest that you copy your explanation about the approach and why it fixes Trac #11042 from Diff Summary into a Note and reference it from load_dyn.

compiler/ghci/Linker.hs
1402

Could we improve the wording here to clarify that we searched in standard locations only and to inform the user that it's alright because the system linker may take care of it?

"Couldn't load .so/.DLL from a standard location: " ++ dll ++ " (" ++ err ++ "). Delegating the job to the system linker..."
bgamari requested changes to this revision.Oct 15 2018, 12:25 PM

As another NixOS user, I agree that there should be some way to disable this warning.

compiler/ghci/Linker.hs
1393

Can you add some discussion to the comment describing what the crash_early parameter means?

This revision now requires changes to proceed.Oct 15 2018, 12:25 PM
awson updated this revision to Diff 18331.EditedOct 16 2018, 9:05 AM

I've amended comments slightly and added a large Note.

Does the explanation in the Note make things more clear?

awson updated this revision to Diff 18332.Oct 16 2018, 9:14 AM
awson marked an inline comment as done.

Fix a bug in Note.

awson updated this revision to Diff 18334.Oct 16 2018, 10:22 AM
  • Improve wording.
Harbormaster failed remote builds in B23100: Diff 18332!
Harbormaster failed remote builds in B23099: Diff 18331!
awson updated this revision to Diff 18335.Oct 16 2018, 11:09 AM
  • Improve Note.
awson updated this revision to Diff 18381.Oct 19 2018, 8:01 AM
awson marked an inline comment as done.
  • Add flag to control the 'missed extra shared lib' warning

Current documentation talks about GHCi, but we actually face the same issue when using TemplateHaskell because it loads library dynamically, too. Could you mention TemplateHaskell there?

compiler/ghci/Linker.hs
1406

When GHC prints other warnings, it also mentions the flag that enabled it:

Prelude> t = \a -> \a -> a

<interactive>:3:12: warning: [-Wname-shadowing]
    This binding for ‘a’ shadows the existing binding
      bound at <interactive>:3:6

Note the [-Wname-shadowing].

Could we do that here, too?

Warning [-Wmissed-extra-shared-lib]:
  Can't load .so/.DLL for zlib.
  ...
1407

Would it be possible to use proper formatting primitives for this message?

  1. $$ instead of "\n" above
  2. A single nest instead of spaces inside each string.
bgamari added inline comments.Oct 19 2018, 12:21 PM
compiler/ghci/Linker.hs
1373

Shouldn't we add a testcase for this situation?

1404

I might reword this to "can't load dynamic library for '" ++ dll ++ ": " ++ err.

1406

Right, I think we want to use printBagOfErrors here. This takes care of all of the usual error formatting.

awson added inline comments.Oct 22 2018, 9:46 AM
compiler/ghci/Linker.hs
1406

AFAIUI, printBagOfErrors is used to report errors in Haskell source code, it requires ErrMsg which contains SrcSpan, which is meaningless here, since the problem is not in the Haskell source code.

GHC code contains places where warnings are reported the same way I've done it. More or less all of RTS does the same thing too. And this problem is an RTS-related one.

And yeah, you mostly can't selectively disable RTS warnings, and I don't think we need a separate warning flag here too. Yes, I've introduced it, but I'm absolutely not happy about this.

int-index added inline comments.Oct 22 2018, 11:21 AM
compiler/ghci/Linker.hs
1406

which contains SrcSpan, which is meaningless here

We have noSrcSpan for this.

GHC code contains places where warnings are reported the same way I've done it.

Is there a reason for this? Maybe we should update these other places to use printBagOfErrors too, as it has superior formatting?

I don't think we need a separate warning flag here too. Yes, I've introduced it, but I'm absolutely not happy about this.

Why?

awson added inline comments.Oct 22 2018, 2:09 PM
compiler/ghci/Linker.hs
1406

Thanks for noSrcSpan suggestion.

Regarding the warning flag: this patch should mostly satisfy those who don't care. Things crashed before this patch and stopped crashing after.

There is no much difference between supplying an extra command-line option (disable the warning) or setting some environment variable (e.g. allowing the linker to finally find the problematic shared library).

Thus, if you care, you simply devise a workaround, like e.g. NixOS does.

int-index added inline comments.Oct 22 2018, 2:45 PM
compiler/ghci/Linker.hs
1406

Seeing that this patch does not solve the actual issue and there are cases when linking still fails, I can see why you don't want a flag to disable this warning – so that users are inclined to set up their build environment properly.

My suggestion to add the flag was coming from a misconception that this fix is sufficient in all cases and could be used on NixOS instead of the LD_LIBRARY_PATH hack.

awson added inline comments.Oct 22 2018, 2:56 PM
compiler/ghci/Linker.hs
1406

Pretty much this, but still the patch solves the cases the ticket mentions.

awson added inline comments.Oct 27 2018, 2:35 AM
compiler/ghci/Linker.hs
1406

And the patch can be used on NixOS instead of the LD_LIBRARY_PATH hack.

awson updated this revision to Diff 18482.Oct 27 2018, 3:04 AM
  • Remove clutter. Use formatting primitives.
awson marked 10 inline comments as done.Oct 27 2018, 3:13 AM
awson added inline comments.
compiler/ghci/Linker.hs
1406

ErrMsg also wants absolutely irrelevant PrintUnqualified parameter. We absolutely don't need this.

awson added a comment.EditedOct 27 2018, 3:29 AM

I think I'm mostly done with this.

I understand that the patch is a bit unusual for people to be comfort with it.

But it fixes some known use cases right now and, perhaps, fixes others which we don't know about. It fixes the ticket use case.

And, most important, it doesn't make a single thing any worse.

int-index requested changes to this revision.Oct 27 2018, 3:31 AM

You marked https://phabricator.haskell.org/D5170#inline-41160 as "Done" but it's not done.

compiler/ghci/Linker.hs
1406

And the patch can be used on NixOS instead of the LD_LIBRARY_PATH hack.

It does not cover all cases while setting LD_LIBRARY_PATH does. For instance, if a package uses foreign import and then refers to the imported entity in a TemplateHaskell splice in the same package. (At least according to my current understanding of the change).

Right now Nix integration in stack sets LD_LIBRARY_PATH and it wouldn't be right to remove this functionality after this patch.

1409

Why would you apply nest to each line individually and introduce a helper instead of simply applying to all lines at once?

nest 2 (vcat
  [ text ...
  , text ...
  , text ... ])
This revision now requires changes to proceed.Oct 27 2018, 3:31 AM
awson marked an inline comment as done.Oct 27 2018, 3:45 AM
awson added inline comments.
compiler/ghci/Linker.hs
1409

Because the first line has no nest.

int-index added inline comments.Oct 27 2018, 3:48 AM
compiler/ghci/Linker.hs
1409

I don't see how the first line is relevant.

text first_line $$
nest 2 (vcat
  [ text ...
  , text ...
  , text ... ])
awson marked an inline comment as done.Oct 27 2018, 4:32 AM
awson added inline comments.
compiler/ghci/Linker.hs
1406

Yeah, we discussed this problem before.

But AFAIUI the problem is that now GHCi don't even try to load anything listed in extra-libraries of the package when compiling this package.

It only tries to load only extra-libraries of dependency package when loading it.

awson added inline comments.Oct 27 2018, 4:42 AM
compiler/ghci/Linker.hs
1406

So your scenario won't work anyway on all OSes, not only on NixOS.

awson updated this revision to Diff 18484.Oct 27 2018, 6:44 AM
  • Polish.
awson marked 7 inline comments as done.Oct 27 2018, 7:00 AM
awson added inline comments.
compiler/ghci/Linker.hs
1373

It's exist in a slightly different form (when a package doesn't expose anything except the library in extra-libraries), see, ghcilink004 for example.

In fact this double-purpose semantics of extra-libraries (to list dependencies and to expose the purely non-haskell code of the package) is a mess IMHO and the huge sources of problems.

awson marked 2 inline comments as done.Oct 27 2018, 7:03 AM
int-index accepted this revision.Oct 27 2018, 7:03 AM

I'd be happy to see a test case as I asked before, but overall looks good now.

Phyx added a subscriber: Phyx.Wed, Nov 21, 2:25 AM

Is there any outstanding work here or is everyone happy?

bgamari accepted this revision.Tue, Dec 11, 12:44 PM

Alright, this looks okay to me. Thanks for sticking with it @awson!

This revision is now accepted and ready to land.Tue, Dec 11, 12:44 PM
This revision was automatically updated to reflect the committed changes.