RTS linker: don't crash early when not finding extra DLL, most likely it's not needed
Needs ReviewPublic

Authored by awson on Sep 22 2018, 1:36 AM.

Details

Reviewers
hvr
bgamari
Trac Issues
#11042
Summary

Signed-off-by: Kyrill Briantsev <kyrab@mail.ru>

Allow GHCi to not crash when no assumed DLL is found in the standard location.
E.g. when loading the package built "dyn" way, we may well have the package's
DLL around, and it's the system linker which loads necessary dependencies.

Why does this (partially) fix Trac #11042? It's because we often (and when having packages built dyn way — almost always) don't need to load anything recorded in the extra-libraries stanza, since if the package DLL exists, GHCi linker simply calls the system linker (via dlopen/ LoadLibrary APIs) to load it and doesn't bother to load package prelinked object file (if any) or package static library.

Thus, all "regular" (with no fancy low-level package content manipulation) packages built "dyn" way should be OK after this fix.

awson created this revision.Sep 22 2018, 1:36 AM
awson edited the summary of this revision. (Show Details)Sep 22 2018, 1:37 AM
awson retitled this revision from Make an excape hatch to not crash early. to Make an escape hatch to not crash early..Sep 22 2018, 4:38 AM
awson edited the summary of this revision. (Show Details)Sun, Oct 7, 4:06 AM
awson retitled this revision from Make an escape hatch to not crash early. to RTS linker: don't crash early when not finding extra DLL, most likely it's not needed.Mon, Oct 8, 4:04 AM
awson edited the summary of this revision. (Show Details)Mon, Oct 15, 3:03 AM

I have tested this patch using https://github.com/serokell/trac11042 and it indeed has the intended effect – the error is now a warning:

[nix-shell:~/trac11042]$ ghc -package zlib ./Main.hs
[1 of 1] Compiling Main             ( Main.hs, Main.o )
Warning: can't load .so/.DLL for: libz.so (libz.so: cannot open shared object file: No such file or directory)
Linking Main ...

To make sure that we indeed have access to zlib in TH, I also tested a modified version:

{-# LANGUAGE TemplateHaskell, OverloadedStrings #-}

module Main where

import Control.Monad.IO.Class (liftIO)
import Codec.Compression.Zlib (compress)

$(do liftIO $ print (compress "check"); return [])

main = print (compress "test")

It worked correctly as well:

[nix-shell:~/trac11042]$ ghc -package zlib ./Main.hs
[1 of 1] Compiling Main             ( Main.hs, Main.o )
Warning: can't load .so/.DLL for: libz.so (libz.so: cannot open shared object file: No such file or directory)
"x\156K\206HM\206\ACK\NUL\ENQ\244\SOH\255"
Linking Main ...

[nix-shell:~/trac11042]$ ./Main
"x\156+I-.\SOH\NUL\EOT]\SOH\193"

There are two worries I have:

  1. What if future modifications to the linker cause a regression? We need to figure out an OS-independent test case for this (reproduction instructions I have work only on NixOS).
  2. There's no way to disable this warning. I don't think it provides useful information on systems like NixOS where this is a normal occurrence.

Problem (2) should be easy to address, as load_dyn has access to HscEnv:

  • create a new flag -Wlinker-missing-shared-object enabled by default: add Opt_WarnLinkerMissingSharedObject to WarningFlag in DynFlags.hs, add a matching flagSpec to wWarningFlagsDeps, include it in standardWarnings, update the User's Guide.
  • check for Opt_WarnLinkerMissingSharedObject in hsc_dflags before reporting the warning in load_dyn.

This will allow the user to pass -Wno-linker-missing-shared-object.

Problem (1) is harder to address, I have no good suggestions. But we do need a regression test for this, it's a delicate issue.

I would also suggest that you copy your explanation about the approach and why it fixes Trac #11042 from Diff Summary into a Note and reference it from load_dyn.

compiler/ghci/Linker.hs
1402

Could we improve the wording here to clarify that we searched in standard locations only and to inform the user that it's alright because the system linker may take care of it?

"Couldn't load .so/.DLL from a standard location: " ++ dll ++ " (" ++ err ++ "). Delegating the job to the system linker..."
bgamari requested changes to this revision.Mon, Oct 15, 12:25 PM

As another NixOS user, I agree that there should be some way to disable this warning.

compiler/ghci/Linker.hs
1393

Can you add some discussion to the comment describing what the crash_early parameter means?

This revision now requires changes to proceed.Mon, Oct 15, 12:25 PM
awson updated this revision to Diff 18331.EditedTue, Oct 16, 9:05 AM

I've amended comments slightly and added a large Note.

Does the explanation in the Note make things more clear?

awson updated this revision to Diff 18332.Tue, Oct 16, 9:14 AM
awson marked an inline comment as done.

Fix a bug in Note.

awson updated this revision to Diff 18334.Tue, Oct 16, 10:22 AM
  • Improve wording.
Harbormaster failed remote builds in B23100: Diff 18332!
Harbormaster failed remote builds in B23099: Diff 18331!
awson updated this revision to Diff 18335.Tue, Oct 16, 11:09 AM
  • Improve Note.
awson updated this revision to Diff 18381.Fri, Oct 19, 8:01 AM
awson marked an inline comment as done.
  • Add flag to control the 'missed extra shared lib' warning

Current documentation talks about GHCi, but we actually face the same issue when using TemplateHaskell because it loads library dynamically, too. Could you mention TemplateHaskell there?

compiler/ghci/Linker.hs
1406

When GHC prints other warnings, it also mentions the flag that enabled it:

Prelude> t = \a -> \a -> a

<interactive>:3:12: warning: [-Wname-shadowing]
    This binding for ‘a’ shadows the existing binding
      bound at <interactive>:3:6

Note the [-Wname-shadowing].

Could we do that here, too?

Warning [-Wmissed-extra-shared-lib]:
  Can't load .so/.DLL for zlib.
  ...
1407

Would it be possible to use proper formatting primitives for this message?

  1. $$ instead of "\n" above
  2. A single nest instead of spaces inside each string.
bgamari added inline comments.Fri, Oct 19, 12:21 PM
compiler/ghci/Linker.hs
1373

Shouldn't we add a testcase for this situation?

1404

I might reword this to "can't load dynamic library for '" ++ dll ++ ": " ++ err.

1406

Right, I think we want to use printBagOfErrors here. This takes care of all of the usual error formatting.

awson added inline comments.Mon, Oct 22, 9:46 AM
compiler/ghci/Linker.hs
1406

AFAIUI, printBagOfErrors is used to report errors in Haskell source code, it requires ErrMsg which contains SrcSpan, which is meaningless here, since the problem is not in the Haskell source code.

GHC code contains places where warnings are reported the same way I've done it. More or less all of RTS does the same thing too. And this problem is an RTS-related one.

And yeah, you mostly can't selectively disable RTS warnings, and I don't think we need a separate warning flag here too. Yes, I've introduced it, but I'm absolutely not happy about this.

int-index added inline comments.Mon, Oct 22, 11:21 AM
compiler/ghci/Linker.hs
1406

which contains SrcSpan, which is meaningless here

We have noSrcSpan for this.

GHC code contains places where warnings are reported the same way I've done it.

Is there a reason for this? Maybe we should update these other places to use printBagOfErrors too, as it has superior formatting?

I don't think we need a separate warning flag here too. Yes, I've introduced it, but I'm absolutely not happy about this.

Why?

awson added inline comments.Mon, Oct 22, 2:09 PM
compiler/ghci/Linker.hs
1406

Thanks for noSrcSpan suggestion.

Regarding the warning flag: this patch should mostly satisfy those who don't care. Things crashed before this patch and stopped crashing after.

There is no much difference between supplying an extra command-line option (disable the warning) or setting some environment variable (e.g. allowing the linker to finally find the problematic shared library).

Thus, if you care, you simply devise a workaround, like e.g. NixOS does.

int-index added inline comments.Mon, Oct 22, 2:45 PM
compiler/ghci/Linker.hs
1406

Seeing that this patch does not solve the actual issue and there are cases when linking still fails, I can see why you don't want a flag to disable this warning – so that users are inclined to set up their build environment properly.

My suggestion to add the flag was coming from a misconception that this fix is sufficient in all cases and could be used on NixOS instead of the LD_LIBRARY_PATH hack.

awson added inline comments.Mon, Oct 22, 2:56 PM
compiler/ghci/Linker.hs
1406

Pretty much this, but still the patch solves the cases the ticket mentions.