We currently clear the lowest bit (the Thumb indicator bit) of *all* symbols on ARM. This leads to invalid offsets for non-function symbols.
rwbarton bgamari austin erikd simonmar trofi
rGHCDIFFe662a6cb9fb6: [Elf/arm] Thumb indicator bit only for STT_FUNC
rGHCDIFFdf58be5bd0d9: [Elf/arm] Thumb indicator bit only for STT_FUNC
rGHCe662a6cb9fb6: [Elf/arm] Thumb indicator bit only for STT_FUNC
rGHCdf58be5bd0d9: [Elf/arm] Thumb indicator bit only for STT_FUNC
Two questions here:
/*
  Note [The ARM/Thumb Story]
  ~~~~~~~~~~~~~~~~~~~~~~~~~~

  Support for the ARM architecture is complicated by the fact that ARM has not
  one but several instruction encodings. The two relevant ones here are the
  original ARM encoding and Thumb, a more dense variant of ARM supporting only
  a subset of the instruction set.

  How the CPU decodes a particular instruction is determined by a mode bit.
  This mode bit is set on jump instructions, the value being determined by the
  low bit of the target address: An odd address means the target is a
  procedure encoded in the Thumb encoding whereas an even address means it's a
  traditional ARM procedure (the actual address jumped to is even regardless
  of the encoding bit).

  Interoperation between Thumb- and ARM-encoded object code (known as
  "interworking") is tricky. If the linker needs to link a call by an ARM
  object into Thumb code (or vice-versa) it will produce a jump island using
  makeArmSymbolExtra. This, however, is incompatible with GHC's
  tables-next-to-code since pointers fixed-up in this way will point to a bit
  of generated code, not a info table/Haskell closure like TNTC expects. For
  this reason, it is critical that GHC emit exclusively ARM or Thumb objects
  for all Haskell code.

  We still do, however, need to worry about calls to foreign code, hence the
  need for makeArmSymbolExtra.
*/
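To make the low-bit convention from the note concrete, here is a minimal C sketch. The helper names `is_thumb_target` and `branch_destination` are hypothetical and are not part of the RTS linker; they only illustrate how an address with the indicator bit splits into a mode and an even destination.

```c
#include <stdint.h>

/* Hypothetical helper (not RTS code): an odd target address selects the
   Thumb encoding, an even one selects the traditional ARM encoding. */
static int is_thumb_target(uint32_t target)
{
    return (int)(target & 1u);
}

/* Hypothetical helper (not RTS code): the address actually jumped to is
   even regardless of the encoding bit. */
static uint32_t branch_destination(uint32_t target)
{
    return target & ~UINT32_C(1);
}
```

For a made-up target of `0x8001`, `is_thumb_target` reports Thumb mode and `branch_destination` gives `0x8000`. This interpretation only makes sense for code addresses, which is why applying it to data symbols is a problem.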
The case where this came up was the following, when using the GHC interpreter.
To obtain the name of some object, we use a relative offset to the function, which
in this case happened to have the lowest bit set. The relocation then wrote back a value for
the symbol that had the lowest bit cleared (-295856 instead of -295855), and thus
we ended up pointing to a NUL byte, which for a string is perfectly legal, yet wrong.
(lldb) mem read `(char*)(con_info+1)-295856`
0xa0aa9710: 00 67 68 63 2d 70 72 69 6d 3a 47 48 43 2e 54 79 .ghc-prim:GHC.Ty
0xa0aa9720: 70 65 73 2e 49 23 00 67 68 63 2d 70 72 69 6d 3a pes.I#.ghc-prim:
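The off-by-one follows directly from masking the low bit of an odd value. Below is a minimal C sketch of the intended behaviour; it is not the code from the patch. The helper `strip_thumb_bit` and the `SKETCH_` constants are hypothetical names, with the constant values taken from the ELF symbol type definitions (`STT_OBJECT` is 1, `STT_FUNC` is 2).

```c
#include <stdint.h>

/* Symbol types with the values the ELF specification assigns to
   STT_OBJECT and STT_FUNC; the SKETCH_ prefix marks them as local
   stand-ins for this example. */
#define SKETCH_STT_OBJECT 1
#define SKETCH_STT_FUNC   2

/* Hypothetical helper (not the RTS code): clear the Thumb indicator bit
   only for function symbols. Data symbols keep every bit, so an odd
   offset to a string stays odd. */
static int32_t strip_thumb_bit(int32_t value, unsigned char sym_type)
{
    if (sym_type == SKETCH_STT_FUNC) {
        return value & ~(int32_t)1;
    }
    return value;
}
```

With this guard, `strip_thumb_bit(-295855, SKETCH_STT_OBJECT)` stays at -295855, while masking unconditionally (the `SKETCH_STT_FUNC` path) yields -295856, the value that pointed at the NUL byte.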