Always check the relocation value for x86_64
ClosedPublic

Authored by watashi on Sep 20 2018, 10:01 AM.

Details

Summary

Always check that no overflow happens during relocation for x86_64.
It's not safe to assume the result returned by ocAllocateSymbolExtras is
always valid if we allocate it neither in lower 2G nor in a contiguous range
with the image.

There are also some minor fixes in this diff:

  • off >= 0x7fffffffL should be >
  • use of unaligned pointer is undefined behavior, use memcpy instead, gcc will be able to optimize it to mov %edx, (%rax).
Test Plan

build ghci with:

DYNAMIC_GHC_PROGRAMS = NO
DYNAMIC_BY_DEFAULT = NO

and play with it.

./validate

Diff Detail

Repository
rGHC Glasgow Haskell Compiler
Lint
Automatic diff as part of commit; lint not applicable.
Unit
Automatic diff as part of commit; unit tests not applicable.
watashi created this revision.Sep 20 2018, 10:01 AM

Great! I believe this would have helped me during my investigation from Trac #15570.

simonmar added inline comments.Sep 20 2018, 3:11 PM
rts/linker/Elf.c
1670–1677

How can this be out of range? I thought makeSymbolExtra should allocate the indirection next to the object file, so the offset shouldn't be over 2GB.

1673

Do you mean -fPIC instead of -split-objs here?

1698

-fPIC?

watashi added inline comments.Sep 20 2018, 4:38 PM
rts/linker/Elf.c
1670–1677

Right now, I don't think this can happen when we use mmap in x86_64 as we always allocate to the lower 2G.
The plan is to put PIC code anywhere, then we must make sure the SymbolExtras is within 2G of the image (e.g., by using contiguous memory).

We cannot really use #define ALWAYS_PIC for the PIC object code generated by ghc in x86_64 because it contains relocations like COMPAT_R_X86_64_PC32 to a different section in the same object. This also requires us to put sections from the same object within 2G range (e.g. by not loading sections separately).

1673

Assume this caused by an object file is bigger than 2G.
It can also be caused by using a RTS comiled with bad combination of flags. (e.g. not using contiguous mmap while not allocate to lower 2G)

I think if we see COMPAT_R_X86_64_GOTPCREL, it's likely this is already compiled with -fPIC.

watashi added inline comments.Sep 20 2018, 6:42 PM
rts/linker/Elf.c
1673

OK, I don't think -split-objs is the right error message here. As currently the split objects can have R_X86_64_PC32 relocation to symbol defined in a different split:

watashi % cat B.hs
module B where

foo :: Int -> Int
foo = subtract 42
{-# NOINLINE foo #-}

bar :: Int -> Int
bar = foo . foo
watashi % ghc -fPIC -dynamic -fforce-recomp -split-objs B.hs -c
watashi % objdump -r B_o_split/B__8.o | grep foo
0000000000000050 R_X86_64_PC32     B_foo_closure-0x0000000000000004
0000000000000057 R_X86_64_PC32     B_foo_closure-0x0000000000000004
0000000000000000 R_X86_64_64       B_foo_closure
watashi % objdump -t B_o_split/B__8.o | grep foo
0000000000000000         *UND*	0000000000000000 B_foo_closure
watashi % objdump -t B_o_split/B__7.o | grep foo
0000000000000000 g     O .data	0000000000000000 B_foo_closure
0000000000000018 g     O .text	000000000000005a B_foo_info

Probably we should just panic.

simonmar added inline comments.Sep 21 2018, 2:11 AM
rts/linker/Elf.c
1670–1677

The SymbolExtras area is allocated contiguously with the image, wherever it is in the address space, see https://phabricator.haskell.org/diffusion/GHC/browse/master/rts%2Flinker%2FSymbolExtras.c$53

Indeed it *must* be adjacent to the image, since that's the point of the GOT/PLT.

Do we really have relative relocations between sections? I thought the linker would reject those. But in any case, we can probably arrange to map the whole image rather than mapping sections separately.

1673

Object files cannot be larger than 2GB in the x86_64 small memory model, and GHC doesn't support any other models. So it's just an error for a single object or shared library to be larger than 2GB. (or perhaps for any individual section to be larger than 2GB)

-split-objs is deprecated, we use -ffunction-sections now (from GHC 8.2+ I think). (and that might complicate things). We definitely don't want to use -split-objs in conjunction with dynamic linking because loading all the small objects separately is really slow.

For -ffunction-sections we have a linker script to put everything in the same section for GHCi, otherwise linking is really slow, see https://phabricator.haskell.org/diffusion/GHC/browse/master/driver/utils/merge_sections.ld

watashi added inline comments.Sep 21 2018, 5:26 AM
rts/linker/Elf.c
1670–1677

I think we actually go through this code path instead: https://phabricator.haskell.org/diffusion/GHC/browse/master/rts%2Flinker%2FSymbolExtras.c$77-80
It always allocate to lower 2G, so it's also fine.

We will want the one you pointed to instead.
The check here can help us detect errors if we don't use the correct one, this bug is otherwise very hard to debug.

Do we really have relative relocations between sections?

Yes, see we earlier comment. It can even reference to a symbol in another split object.

I thought the linker would reject those.

Yes, very interesting.

ld would reject it, but ghc is able to link it.

But in any case, we can probably arrange to map the whole image rather than mapping sections separately.

Right, that's the plan.

1673

What error message will you suggest here?
-split-objs is definitely wrong, -fPIC is not accurate either, there are many possible ways we can end up with this overflow.

simonmar added inline comments.Sep 25 2018, 5:09 PM
rts/linker/Elf.c
1673

Would -fPIC -fexternal-dynamic-refs be enough?

watashi added inline comments.Sep 25 2018, 5:27 PM
rts/linker/Elf.c
1673

Yes, assume RTS is built with current flags.
I will see if I can measure the overhead of this and decide if we should remove this check.
But as we have makeSymbolExtra, which is relatively expensive, before this check, so it should not be too much.

simonmar added inline comments.Sep 27 2018, 7:01 AM
rts/linker/Elf.c
1673

I'm not all that concerned about the overhead of the check, the main issue for me is understanding in what circumstances the check may fail, so that we can add appropriate comments and give a helpful error message.

watashi added inline comments.Sep 27 2018, 9:39 AM
rts/linker/Elf.c
1673

The old ones will fail if the it's not built with -fPIC -fexternal-dynamic-refs and the error message is correct.

the new ones added in this diff, which are related to got/plt, will fail if the rts is built with neither of following conditions to be true:

  • the object is always load to lower 2G address
  • the extra symbol is contiguous with the object file image.

So how about a message like

Make sure the RTS linker is compiled with correct flags to either allocate in lower 2G address space or use contiguous address for GOT/PLT.

or, in the future, when those flags are actually exposed, change this to

Make sure the RTS linker is compiled with either RTS_LINKER_MMAP_32BIT or RTS_LINKER_CONTIGUOUS_MMAP

Suggestions on how to make this message more concise are welcomed.

simonmar added inline comments.Sep 27 2018, 11:02 AM
rts/linker/Elf.c
1673

Can we make it so that the GOT/PLT is always within 2Gb of the object file? There's no reason to allow the GOT/PLT to be more than 2Gb away from the object, because it just can't work.

If we did that, then this error will never happen, right?

watashi added inline comments.Sep 27 2018, 12:35 PM
rts/linker/Elf.c
1673

Good point!
Yes, if we don't always load to the lower 2G, then we should always put image and symbol extra in contiguous address space.

Does an error message like

This should never happen, please report this bug.

make sense to you?

This error message will be helpful for ghc developers, if the overhead is a concern, we can enable if only for debug build

simonmar added inline comments.Sep 27 2018, 1:58 PM
rts/linker/Elf.c
1673

Sure, making it a barf would be fine, or an ASSERT if the overhead is a concern, but my guess is a barf would be the best option.

watashi planned changes to this revision.Sep 27 2018, 1:59 PM
watashi updated this revision to Diff 18142.Sep 28 2018, 7:32 AM

Improve error message:

  • For non-pic relocation, suggest recompile with -fPIC -fexternal-dynamic-refs
  • For pic relocation, suggest report bug of ghc
watashi marked 17 inline comments as done.Sep 28 2018, 7:33 AM
This revision is now accepted and ready to land.Sep 28 2018, 6:27 PM
This revision was automatically updated to reflect the committed changes.