llvmGen: Compatibility with LLVM 3.5
ClosedPublic

Authored by bgamari on Aug 13 2014, 11:42 PM.

Details

Summary

Due to changes in LLVM 3.5 aliases now may only refer to definitions.
Previously to handle symbols defined outside of the current commpilation
unit GHC would emit both an external declaration, as well as an alias
pointing to it, e.g.,

@stg_BCO_info = external global i8
@stg_BCO_info$alias = alias private i8* @stg_BCO_info

Where references to stg_BCO_info will use the alias
stg_BCO_info$alias. This is not permitted under the new alias
behavior, resulting in errors resembling,

Alias must point to a definition
i8* @"stg_BCO_info$alias"

To fix this, we invert the naming relationship between aliases and
definitions. That is, now the symbol definition takes the name
@stg_BCO_info$def and references use the actual name, @stg_BCO_info.
This means the external symbols can be handled by simply emitting an
external declaration,

@stg_BCO_info = external global i8

Whereas in the case of a forward declaration we emit,

@stg_BCO_info = alias private i8* @stg_BCO_info$def
Test Plan

Validate with LLVM 3.4 and 3.5

Diff Detail

Repository
rGHC Glasgow Haskell Compiler
Lint
Automatic diff as part of commit; lint not applicable.
Unit
Automatic diff as part of commit; unit tests not applicable.
bgamari updated this revision to Diff 354.Aug 13 2014, 11:42 PM
bgamari retitled this revision from to llvmGen: Compatibility with LLVM 3.5.
bgamari updated this object.
bgamari edited the test plan for this revision. (Show Details)
bgamari added reviewers: austin, scpmw.
bgamari updated the Trac tickets for this revision.

While this set is still preliminary (e.g. still waiting on builds), it would be good to get some eyes on it to make sure I haven't overlooked anything major.

scpmw edited edge metadata.Aug 14 2014, 5:04 AM

Hm, quick thoughts:

  • Do the new aliases survive optimisations? Up to now we could be sure that LLVM would short-cut aliases completely, not sure whether this still holds. This means that we could get more complicated assembly, and in the worst case less optimised code.
  • What do the symbol tables look like? We would not want the "$def" symbols to appear there, so they should probably receive "Internal" linkage. I believe you mentioned this earlier in the discussion, but I'm not quite sure where this is implemented.
  • Finally note that your solution aliases every single function, therefore increasing output size. No major problem, just something to keep in mind.

Looks good otherwise, I think this has a decent chance of working out. I'll test it once I get around to it.

bgamari added a comment.EditedAug 16 2014, 7:09 PM

Testing

Tested on x86_64.

*Note*: For some reason -fno-warn-unsupported-llvm-version doesn't
appear to be effective for some testsuite tests. I've not had a chance
to look into this but instead just bumped MaxSupportedLlvmVersion.

LLVM 3.4.2, dynllvm

$ make test THREADS=4 WAY=dynllvm
Unexpected results from:
TEST="T367_letnoescape T7702 derefnull"

OVERALL SUMMARY for test run started at Sat Aug 16 20:00:03 2014 EDT
 0:05:47 spent to go through
    4087 total tests, which gave rise to
   18567 test cases, of which
   16888 were skipped

      28 had missing libraries
    1631 expected passes
      17 expected failures

       0 caused framework failures
       0 unexpected passes
       3 unexpected failures

Unexpected failures:
   concurrent/should_run     T367_letnoescape [bad exit code] (dynllvm)
   rts                       derefnull [bad exit code] (dynllvm)
   simplCore/should_compile  T7702 [stat not good enough] (dynllvm)

LLVM 3.4.2, llvm

$ make test THREADS=4 WAY=llvm
Unexpected results from:
TEST="T7571"

OVERALL SUMMARY for test run started at Sat Aug 16 17:57:41 2014 EDT
 0:04:40 spent to go through
    4087 total tests, which gave rise to
   18567 test cases, of which
   16882 were skipped

      31 had missing libraries
    1634 expected passes
      19 expected failures

       0 caused framework failures
       0 unexpected passes
       1 unexpected failures

Unexpected failures:
   llvm/should_compile  T7571 [stderr mismatch] (llvm)

LLVM 3.5.0, dynllvm

$ make test THREADS=5 WAY=dynllvm EXTRA_HC_OPTS="-pgmlo /home/ben/trees/root-llvm-3.5/bin/opt -pgmlc /home/ben/trees/root-llvm-3.5/bin/llc"
Unexpected results from:
TEST="T367_letnoescape T7702 derefnull"

OVERALL SUMMARY for test run started at Sun Aug 17 10:49:33 2014 EDT
 0:05:53 spent to go through
    4087 total tests, which gave rise to
   18567 test cases, of which
   16888 were skipped

      28 had missing libraries
    1631 expected passes
      17 expected failures

       0 caused framework failures
       0 unexpected passes
       3 unexpected failures

Unexpected failures:
   concurrent/should_run     T367_letnoescape [bad exit code] (dynllvm)
   rts                       derefnull [bad exit code] (dynllvm)
   simplCore/should_compile  T7702 [stat not good enough] (dynllvm)

LLVM 3.5.0, llvm

$ make test THREADS=5 WAY=llvm EXTRA_HC_OPTS="-pgmlo /home/ben/trees/root-llvm-3.5/bin/opt -pgmlc /home/ben/trees/root-llvm-3.5/bin/llc"
Unexpected results from:
TEST="T7571"

OVERALL SUMMARY for test run started at Sun Aug 17 10:44:24 2014 EDT
 0:04:50 spent to go through
    4087 total tests, which gave rise to
   18567 test cases, of which
   16882 were skipped

      31 had missing libraries
    1634 expected passes
      19 expected failures

       0 caused framework failures
       0 unexpected passes
       1 unexpected failures

Unexpected failures:
   llvm/should_compile  T7571 [stderr mismatch] (llvm)
scpmw added a comment.Oct 2 2014, 1:24 PM

Just as a quick bump - even though I'm mainly preoccupied with D169, I have not forgotten about this. The new aliases definitely survive optimisations, which makes it slightly dangerous. As it happens, on my computer some regular expression in the Mac Os splitter actually chokes on the new $def labels that now appear in the assembly. I'll have to investigate that a bit more.

David, please take a look at this if you get a chance - it'd be nice to have for 7.10.

scpmw added a comment.EditedOct 25 2014, 8:50 AM

Had another look at this - good news is that it seems to be compatible with LLVM versions down to 2.8 (which is as far as we care).

My main complaint would be all the extra $def labels we get, especially because they appear as global symbols in the symbol table. And apparantly getting rid of them isn't as easy as declaring the definitions Internal - if we do that, LLVM will actually short-cut some of the aliases (which is good), but "forget" the section we want the definition in (which is very, very bad, as it will mess up tables-next-to-code at minimum). Best fix I could find was to declare the definitions "used" so they never get eliminated. The end result is that we still get all $def symbols, but at least now they are local like they should be.

The part that is still puzzling me is this code in the splitter (driver/split/ghc-split.lprl):

$str =~ s/L_.*\$.*:\n(.|\n)*//m;

This will actually throw all code away after local symbols that contain a $ character. This applies to the above-mentioned definitions, and effectively break compilation on Mac Os when using -split-objs (which, admittedly, is a bit silly). I have no idea whatsoever what this regular expression is trying to achieve - it goes back to a 2002 commit by Wolfgang Thaller. Maybe we should just disable the object splitter for the LLVM back-end, given that we are looking for a more principled solution anyway?

ezyang removed a subscriber: ezyang.Oct 27 2014, 2:05 PM

Huh, that might explain some hard to reproduce bugs i've hit when building split objects code on OS X.

I assume that should get fixed?

In D155#8775, @scpmw wrote:

The part that is still puzzling me is this code in the splitter (driver/split/ghc-split.lprl):

$str =~ s/L_.*\$.*:\n(.|\n)*//m;

This will actually throw all code away after local symbols that contain a $ character. This applies to the above-mentioned definitions, and effectively break compilation on Mac Os when using -split-objs (which, admittedly, is a bit silly). I have no idea whatsoever what this regular expression is trying to achieve - it goes back to a 2002 commit by Wolfgang Thaller. Maybe we should just disable the object splitter for the LLVM back-end, given that we are looking for a more principled solution anyway?

With HEAD as of 8:52 UTC on 31/10/2014, with llvm 3.5, and with this patch applied, nofib with -fllvm freezes on fannkuch-redux.

Thanks for the review, Peter. The symbols are indeed a bit more visible than they should be.

It's quite unfortunate that TNTC is preventing us from simply marking these as Internal; I'm hoping that sometime in the near future we will be able to drop the section mangling approach to TNTC. Unfortunately, this requires changes on the LLVM side that I've been gently pushing for but the conversation keeps stalling. I'll try to prod things back into motion.

If 7.10 weren't so imminent I'd say we might be better off simply expediting Austin's package-LLVM-with-GHC proposal and moving to LLVM 3.6. Unfortunately it's pretty much certain that 3.6 will be released in time for this to be viable.

For the time being I'll examine marking the symbols as Internal and used.

scpmw added a comment.EditedNov 2 2014, 11:21 AM

Hm, interesting - I can reproduce the problem @spacekitteh mentions with and without the patch, as well as with GHC 7.8.3 and 7.6.1. It is gone for GHC 7.4.2, as well as for a random 7.7 version I had lying around (mid 2013). The latter has LLVM streaming (my first suspicion), so the bug must be something different here.

@spacekitteh: This is likely unrelated to this patch, and needs further investigation. Can you open a ticket for this?

bgamari updated this revision to Diff 1377.Nov 10 2014, 9:46 AM
bgamari edited edge metadata.
  • llvmGen: Bump maximum supported LLVM version
  • LlvmGen: Mark $def symbols with Internal linkage

The updates I pushed a few days ago should fix up the concerns brought up by Peter. I mark the $def symbols as having Internal linkage and place them on the used list to ensure that they aren't dropped. This validates for me with 3.5 (and apparently Phabricator). Unfortunately T5681 and T5486 both fail with LLVM 3.4 with errors of the form,

/tmp/ghc15683_0/ghc15683_5.s:125:0:
     Error: can't resolve `.rodata' {.rodata section} - `ZCMain_main_info$def' {.text section}

@scpmw is this what you saw?

austin accepted this revision.Nov 20 2014, 11:24 PM
austin edited edge metadata.

Perhaps we should blacklist 3.4 with GHC 7.10 and say it's unsupported? The patch looks OK to me. Does this work on e.g. ARM or anything?

This revision is now accepted and ready to land.Nov 20 2014, 11:24 PM

@austin, this has been tested on ARM and x86_64 (Debian Jessie in both cases) at some point. I actually don't believe I've tested this particular version of the patch on ARM yet; I'll start a build now.

@scpmw, does this now address your concerns?

This revision was automatically updated to reflect the committed changes.

Uh, sorry, wanted to get a closer look at that build error before I say something. I now had a chance to check the LLVM 3.4 behaviour - apparently what is happening here is (indeed) exactly what I had before. Except that for LLVM 3.4 setting the definition as "used" seemingly doesn't suffice to protect it.

Hm. To be honest it is starting to scare me a bit how erratic LLVM is behaving here...

@spacekitteh have you opened an issue for the nofib hang you reported above?

one support issue that will come up is that the current Xcode release of Clang doesn't support the full range of assembly that LLVM 3.5 emits, (though a built of the official llvm 3.5 CLANG works fine)

It seems that by marking the $def symbols as internal we uncover new optimization opportunities to LLVM (yay!), which then compiles some functions down to tail recursions. This would be fine, but it appears that LLVM's implementation of the GHC calling convention doesn't use the LLVM's emitEpilogue function. Among other things, this handles lowering of LLVM's TCReturn pseudo-instruction introduced during construction of the tail-call down to machine code. This means that AsmWriter gets passed a non-ARM instruction which causes things to fall over with an error like,

llc: /mnt/ext/ghc/llvm/lib/Target/ARM/InstPrinter/../ARMGenAsmWriter.inc:6048: void llvm::ARMInstPrinter::printInstruction(const llvm::MCInst *, llvm::raw_ostream &): Assertion `Bits != 0 && "Cannot print this instruction."' failed.

I have a patch fixing this issue unfortunately it likely won't make it in until 3.6. Looks like we'll have yet another release with broken ARM support.

we can at least make sure / keep an eye on making 3.6 work with 7.10 right?

@carter, at this point that is my goal. D530 moves the LLVM backend to LLVM 3.6 exclusively. This was just my first hacked-together attempt, but the fact that it went together so quickly makes me optimistic that this might not be so hard. Unfortunately it requires http://reviews.llvm.org/D6454 which may take some time to get merged.

@scpmw, just to make sure we are on the same page here: I wrote a bit of a wall of text last week about the current state of the LLVM backend; in said wall I discussed the LLVM 3.4 issue a bit. To summarize, I'm a bit concerned that we'll freeze out a good portion of users if we simply blacklist 3.4. What do you think about simply keeping the $def symbols as External.

scpmw added a comment.EditedDec 3 2014, 4:09 AM

You are referring to your blog post, I presume? Not too keen on leaving the symbols external - are we sure this won't cause problems when names collide? Maybe just do it for LLVM 3.4, if we really must?

At this point I'm actually in favour of just removing streaming for the moment... If it locks us out of entire LLVM versions, it's probably not worth it. Plus with object file splitting we might have to revisit this anyway: If we end up splitting the LLVM file (which is probably necessary?) most of the problematic symbol references would become external anyway, removing the typing restrictions. Maybe we could even abuse llvm-link here somehow.

Anyway, right now I'm a bit tied up with D169, but we have enough options left to test. The real question is what appears most stable for 7.10 - special case for LLVM 3.4 or last-ditch effort to restore pre-streaming behaviour? I have a half-working patch, it would surely need some work/testing. So in the end, the former has more risks on LLVM's side, the latter on GHC's. Hm.

erikd added a subscriber: erikd.Mar 29 2015, 6:25 PM
This comment was removed by erikd.