Replace opt and llc with clang
Needs Review (Public)

Authored by angerman on Mar 15 2017, 9:01 PM.

Details

Summary

The LLVM project suggests using clang as the unified interface instead of
relying on opt and llc, unless absolutely necessary.

This diff drops the dependency on opt and llc, so GHC no longer needs the
full llvm package and is satisfied with clang alone.

Diff Detail

Repository
rGHC Glasgow Haskell Compiler
Branch
feature/use-clang-D3352
Lint
Lint Warnings
Excuse: readability
Severity  Location                                   Code  Message
Warning   compiler/llvmGen/LlvmCodeGen.hs:52         TXT3  Line Too Long
Warning   compiler/llvmGen/LlvmCodeGen.hs:61         TXT3  Line Too Long
Warning   compiler/llvmGen/LlvmCodeGen/Base.hs:197   TXT3  Line Too Long
Warning   compiler/main/DriverPhases.hs:139          TXT3  Line Too Long
Warning   compiler/main/DriverPipeline.hs:1453       TXT3  Line Too Long
Warning   compiler/main/DriverPipeline.hs:1473       TXT3  Line Too Long
Warning   compiler/main/DriverPipeline.hs:1474       TXT3  Line Too Long
Warning   compiler/main/DriverPipeline.hs:1475       TXT3  Line Too Long
Warning   compiler/main/DriverPipeline.hs:1476       TXT3  Line Too Long
Warning   compiler/main/DriverPipeline.hs:1477       TXT3  Line Too Long
Warning   compiler/main/DriverPipeline.hs:1478       TXT3  Line Too Long
Warning   compiler/main/DriverPipeline.hs:1479       TXT3  Line Too Long
Warning   compiler/main/DriverPipeline.hs:1481       TXT3  Line Too Long
Warning   compiler/main/DriverPipeline.hs:1482       TXT3  Line Too Long
Warning   compiler/main/DriverPipeline.hs:1483       TXT3  Line Too Long
Warning   compiler/main/DriverPipeline.hs:1505       TXT3  Line Too Long
Warning   compiler/main/SysTools.hs:67               TXT3  Line Too Long
Warning   compiler/main/SysTools.hs:594              TXT3  Line Too Long
Warning   compiler/main/SysTools.hs:598              TXT3  Line Too Long
Warning   compiler/main/SysTools.hs:628              TXT3  Line Too Long
Unit
No Unit Test Coverage
Build Status
Buildable 15478
Build 26170: [GHC] Linux/amd64: Patch building
Build 26169: [GHC] OSX/amd64: Continuous Integration
Build 26168: [GHC] Windows/amd64: Continuous Integration
Build 26167: arc lint + arc unit
angerman created this revision. Mar 15 2017, 9:01 PM

This is my first stab at simplifying the llvm pipeline; any suggestions are welcome. I'm uploading this rough diff to solicit some early feedback before sinking more time into this.

I've been using this setup for aarch64-iOS/x86_64-macOS/aarch64-android/armv7-android, with no visible issues so far.
(Note that for iOS, a patched clang is required that corrects dead-stripping; see https://reviews.llvm.org/D30770)

erikd added a comment. Mar 16 2017, 2:37 AM

Currently, I can build GHC on, say, Arm/Linux with ghc, gcc, opt and llc. That is, specifically *without* clang.

Is this patch forcing me to install clang?

In D3352#96290, @erikd wrote:

Currently, I can build GHC on, say, Arm/Linux with ghc, gcc, opt and llc. That is, specifically *without* clang.

Is this patch forcing me to install clang?

Yes. This will require a clang binary. But it will specifically *not* require opt and llc ;-)
What are the reasons to prefer opt and llc over clang for you?

erikd added a comment. Edited Mar 16 2017, 3:23 AM

What are the reasons to prefer opt and llc over clang for you?

At the moment I know that opt and llc work and I don't know enough about the implications and mechanics of this change.

This patch raises so many questions. Some of those questions:

  • On arm/linux and aarch64/linux does GHC still produce LLVM text IR and run that through Clang?
  • Are we going to be locked to a single version of Clang like we are currently locked to a single version of LLVM?
  • Previously changes in the LLVM version often required changes in the LLVM IR codegen. How do we manage that if we switch to Clang?
  • The opt and llc tools are built by default to be multi-arch which can make bootstrapping new platforms much easier. Is clang built to be multi-arch?
  • If GHC ever learns to generate binary LLVM IR, can Clang handle that?
In D3352#96302, @erikd wrote:

What are the reasons to prefer opt and llc over clang for you?

At the moment I know that opt and llc work and I don't know enough about the implications and mechanics of this change.

This patch raises so many questions. Some of those questions:

  • On arm/linux and aarch64/linux does GHC still produce LLVM text IR and run that through Clang?

In principle yes; in this draft state, however, not yet, as these lines are currently missing (in runPhase (RealPhase LlvmAs) input_fn dflags):

(ArchARM ARMv7 _ _, OSLinux) -> return "armv6-unknown-linux-gnueabihf"
(ArchARM64,         OSLinux) -> return "aarch64-none-linux-gnu"
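The mapping above can be sketched as a small standalone function. This is only an illustration: the `Arch`, `ArmISA` and `OS` types below are simplified, hypothetical stand-ins for GHC's platform types (the real `ArchARM` constructor carries more fields), and `targetTriple` is a made-up name. The two triples are copied verbatim from the lines quoted above.

```haskell
-- Simplified, hypothetical stand-ins for GHC's platform types.
data ArmISA = ARMv5 | ARMv6 | ARMv7 deriving (Eq, Show)
data Arch   = ArchX86_64 | ArchARM ArmISA | ArchARM64 deriving (Eq, Show)
data OS     = OSLinux | OSDarwin deriving (Eq, Show)

-- | Pick the target triple to hand to clang for a given platform.
-- Only the two cases quoted in the comment above are covered; everything
-- else is reported as an error rather than guessed.
targetTriple :: Arch -> OS -> Either String String
targetTriple (ArchARM ARMv7) OSLinux = Right "armv6-unknown-linux-gnueabihf"
targetTriple ArchARM64       OSLinux = Right "aarch64-none-linux-gnu"
targetTriple arch            os      =
  Left ("no target triple known for " ++ show arch ++ " on " ++ show os)
```

Returning `Either` rather than calling `error` keeps the unsupported-platform case visible to the caller, which seems useful while the list of platforms is still incomplete.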
  • Are we going to be locked to a single version of Clang like we are currently locked to a single version of LLVM?

clang should come hand in hand with the rest of the llvm tools. So again in principle it will be the same as before.
You get clang as part of the llvm suite (http://releases.llvm.org/download.html); it is often broken into clang and tools, as you do not
necessarily need the tools unless you want to do something very specific.
Think of clang as the unified interface to llvm, and opt, llc, ... as the development tools for the llvm suite.

  • Previously changes in the LLVM version often required changes in the LLVM IR codegen. How do we manage that if we switch to Clang?

For 8.3, however, we can currently use clang from llvm3.9 and llvm4.0 (if we do not -dead_strip on mach-o platforms; that requires a patched clang).
There is now a supported versions value. However, Apple being Apple, they need to brand their clang with their own version... m(

  • The opt and llc tools are built by default to be multi-arch which can make bootstrapping new platforms much easier. Is clang built to be multi-arch?

I don't know if clang comes with multi-arch support by default in every distribution; all the ones I have had my hands on, though, did come with multi-arch support.
Compiling clang from source defaults to multi-arch as well (see ghc-clang.sh).

  • If GHC ever learns to generate binary LLVM IR, can Clang handle that?

Yes. clang is the unified interface to llvm, and can compile, link, etc. .c, .ll, .bc, .s and other files.

In general this patch is trying to address Trac #10074.
The switch to clang as the interface originates here: http://lists.llvm.org/pipermail/llvm-dev/2017-March/110962.html


In general, almost everything is going to stay as it was; we are just swapping out opt and llc for clang.

We used to have:

ghc -> textual ir -> opt -> llc -> mangle -> as -> link

and now we have

ghc -> textual ir -> clang -> link

I.e. we are assembling straight from the IR now. We could even ask clang to emit bitcode and start plugging an LTO bitcode linker in at the end, or swap the textual IR generator for a bitcode IR generator.
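To make the difference between the two pipelines concrete, here is a rough sketch that models each one as a list of external tool invocations. The `Cmd` type and the function names are hypothetical, the intermediate file names are chosen for illustration, and optimisation flags are left out on purpose; this is not what the patch itself emits. The only clang options used are `--target=`, `-c`, `-emit-llvm` and `-o`.

```haskell
-- | One external tool invocation: program name plus arguments (hypothetical type).
data Cmd = Cmd String [String] deriving (Eq, Show)

-- | Old pipeline for a module with base name @base@: opt, llc, then the assembler.
-- GHC's mangler runs in-process between llc and as, so it has no command here.
oldPipeline :: String -> [Cmd]
oldPipeline base =
  [ Cmd "opt" [base ++ ".ll", "-o", base ++ ".bc"]
  , Cmd "llc" [base ++ ".bc", "-o", base ++ ".s"]
  , Cmd "as"  [base ++ ".s",  "-o", base ++ ".o"]
  ]

-- | New pipeline: clang goes from textual IR to an object file in one step.
newPipeline :: String -> String -> [Cmd]
newPipeline triple base =
  [ Cmd "clang" ["--target=" ++ triple, "-c", base ++ ".ll", "-o", base ++ ".o"] ]

-- | Variant that stops at bitcode, e.g. as input for a bitcode-level linker.
bitcodePipeline :: String -> String -> [Cmd]
bitcodePipeline triple base =
  [ Cmd "clang" ["--target=" ++ triple, "-c", "-emit-llvm", base ++ ".ll", "-o", base ++ ".bc"] ]

-- | Render a command the way it would be typed in a shell (no quoting).
render :: Cmd -> String
render (Cmd prog args) = unwords (prog : args)
```

For example, `map render (newPipeline "aarch64-none-linux-gnu" "Foo")` yields a single clang command line, where the old pipeline needs three tool runs and two intermediate files.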

erikd requested changes to this revision. Mar 16 2017, 4:26 AM

You get clang as part of the llvm suite (http://releases.llvm.org/download.html); it is often broken into clang and tools, as you do not necessarily need the tools unless you want to do something very specific.

Up until now, I have just installed whatever LLVM (and also Clang) version is needed from the Debian repositories. Debian provides packages for multiple versions of LLVM and they can be installed in parallel.

clang should come hand in hand with the rest of the llvm tools. So again in principle it will be the same as before.

So where are the configure script changes to validate the Clang version? I repeat, building a local clang is not IMO a reasonable approach.

The switch to clang as the interface originates here: http://lists.llvm.org/pipermail/llvm-dev/2017-March/110962.html

They suggest clang and then say "or better link to the llvm libraries and generate the code from there". So according to the LLVM devs, we are going from one sub-optimal solution to another.

I don't know if clang comes with multi-arch support

Well I suggest we figure that out, at least for OS X and Debian/Ubuntu which is probably at least 50% of GHC users.

For 8.3 however we can currently use clang from llvm3.9 and llvm4.0

What about people on OS X? The Clang Apple provides with XCode is the same as the upstream Clang and their version numbers seem mostly unrelated.

I've marked this as "request changes" but it's more like "what you are proposing is a huge change and I want to make sure we're not burning any bridges here".

ghc-clang.sh:6

What? Why? Where is this script being used?

I can guarantee that while my Arm board has enough storage to build GHC, there is no way in the world it has enough to build LLVM.

I would *much* prefer to install Clang from a Debian package like I currently do for the LLVM tools. If Clang needs to be patched for this work, then we should wait for that patch to hit a released version of Clang.

This revision now requires changes to proceed. Mar 16 2017, 4:26 AM
In D3352#96309, @erikd wrote:

You get clang as part of the llvm suite (http://releases.llvm.org/download.html); it is often broken into clang and tools, as you do not necessarily need the tools unless you want to do something very specific.

Up until now, I have just installed whatever LLVM (and also Clang) version is needed from the Debian repositories. Debian provides packages for multiple versions of LLVM and they can be installed in parallel.

clang should come hand in hand with the rest of the llvm tools. So again in principle it will be the same as before.

So where are the configure script changes to validate the Clang version? I repeat, building a local clang is not IMO a reasonable approach.

There are none. The only validation against multiple versions is done when ghc checks the supportedLlvmVersions.

The switch to clang as the interface originates here: http://lists.llvm.org/pipermail/llvm-dev/2017-March/110962.html

They suggest clang and then say "or better link to the llvm libraries and generate the code from there". So according to the LLVM devs, we are going from one sub-optimal solution to another.

As far as I know, linking with llvm is not an acceptable option; has this changed? But yes, we are only getting a bit better here, by using clang and hence a better tested and more stable interface.
The benefit is, again, a unified interface, and fewer intermediate steps/files.

I don't know if clang comes with multi-arch support

Well I suggest we figure that out, at least for OS X and Debian/Ubuntu which is probably at least 50% of GHC users.

I'll try to see how to figure this out properly. Sadly clang doesn't provide a convenient way to list the supported architectures, similar to what llc --version gives.
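For comparison, the target list that `llc --version` prints is easy to consume mechanically. The sketch below assumes the output has a `Registered Targets:` header followed by one indented `name - description` line per target, ending at a blank line or end of output; the function name is hypothetical and the exact layout may differ between LLVM versions.

```haskell
-- | Extract target names from the text printed by `llc --version`.
-- Assumes a "Registered Targets:" header followed by one
-- "name - description" line per target.
registeredTargets :: String -> [String]
registeredTargets out =
  [ name
  | l <- takeWhile (not . null . words) afterHeader  -- stop at the first blank line
  , (name : _) <- [words l]                          -- first word is the target name
  ]
  where
    isHeader l  = words l == ["Registered", "Targets:"]
    afterHeader = drop 1 (dropWhile (not . isHeader) (lines out))
```

If the header is missing, the result is simply the empty list, so a caller can treat "no targets found" and "could not parse" the same way.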

For 8.3 however we can currently use clang from llvm3.9 and llvm4.0

What about people on OS X? The Clang Apple provides with XCode is the same as the upstream Clang and their version numbers seem mostly unrelated.

Yes this is quite annoying. Which is why I've added a vendor flag to the supportedLlvmVersions, such that one can test against that.
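A vendor-aware version check along these lines might look roughly as follows. This is a sketch, not the code in the diff: the `Vendor` type, the function names and the shape of the `supported` list are assumptions, and the parser only handles first lines of `clang --version` that start with `clang version ` or `Apple LLVM version `. The upstream entries mirror the llvm3.9 and llvm4.0 mentioned earlier; no Apple versions are listed because the thread does not say which ones work.

```haskell
import Data.Char (isDigit)
import Data.List (stripPrefix)

-- | Who branded the clang binary (hypothetical type).
data Vendor = Upstream | Apple deriving (Eq, Show)

-- | Parse the first line of `clang --version` into vendor and (major, minor).
parseClangVersion :: String -> Maybe (Vendor, (Int, Int))
parseClangVersion line
  | Just rest <- stripPrefix "Apple LLVM version " line = (,) Apple    <$> majorMinor rest
  | Just rest <- stripPrefix "clang version "      line = (,) Upstream <$> majorMinor rest
  | otherwise                                           = Nothing

-- | Read a leading "MAJOR.MINOR" and ignore whatever follows.
majorMinor :: String -> Maybe (Int, Int)
majorMinor s =
  case span isDigit s of
    (major@(_ : _), '.' : rest) ->
      case span isDigit rest of
        (minor@(_ : _), _) -> Just (read major, read minor)
        _                  -> Nothing
    _ -> Nothing

-- | Supported (vendor, version) pairs. Assumption: only the upstream
-- versions named in this thread; Apple entries would have to be added
-- once it is known which Apple releases correspond to them.
supported :: [(Vendor, (Int, Int))]
supported = [(Upstream, (3, 9)), (Upstream, (4, 0))]

-- | Is the clang that printed this version line on the supported list?
isSupported :: String -> Bool
isSupported line = maybe False (`elem` supported) (parseClangVersion line)
```

Keeping the vendor in the key means an Apple "8.0" is never mistaken for an upstream 8.0, which is the confusion the vendor flag is meant to avoid.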

I've marked this as "request changes" but it's more like "what you are proposing is a huge change and I want to make sure we're not burning any bridges here".

I know, which is why I'm looking for early feedback instead of adding many more changes just to get this shot down.

ghc-clang.sh:6

It is not used. And I would hope we do not have to use it at all.

However, if you want to use -dead_strip on macOS/iOS, you will need a patched clang. https://reviews.llvm.org/D30770 has landed, but did
not make the cut for llvm4, so the first llvm version supporting both prefix data and -dead_strip will be llvm5.

If we end up needing to actually ship an llvm version with ghc, we'd provide binaries for the relevant platforms, in which case this script can be
quite helpful; it also doesn't build all of llvm, just clang (and whatever clang requires).

Trac #10074 was about actually shipping llvm tooling with ghc. My hope is we don't have to. However, if we end up needing to do so, shipping
just clang seems like reduced complexity.

So in essence: I hope we will never have to use this script to build clang binaries to ship alongside ghc. For now, if you want '-dead_strip', you will
need a patched clang for mach-o. If we end up shipping clang and you don't trust our binaries, you can use this script to build exactly the same
clang on your own.

In D3352#96309, @erikd wrote:

You get clang as part of the llvm suite (http://releases.llvm.org/download.html); it is often broken into clang and tools, as you do not necessarily need the tools unless you want to do something very specific.

Up until now, I have just installed whatever LLVM (and also Clang) version is needed from the Debian repositories. Debian provides packages for multiple versions of LLVM and they can be installed in parallel.

I don't believe this patch changes this. You can use GHC against Debian's clang, just as you can use it against Debian's llc and opt.

However, @angerman does have some orthogonal changes in the pipeline which will require a patched LLVM (at least until the next release). We will need to figure out how we want to manage this.

The switch to clang as the interface originates here: http://lists.llvm.org/pipermail/llvm-dev/2017-March/110962.html

They suggest clang and then say "or better link to the llvm libraries and generate the code from there". So according to the LLVM devs, we are going from one sub-optimal solution to another.

I personally don't think we want to go down the road of linking against LLVM. The binary distributions offered by the LLVM project itself don't offer shared libraries, so linking against LLVM would require that either,

  1. we link statically and ship LLVM with every GHC distribution, blowing up bindist sizes significantly with a feature that only a fraction of users will ever use. Moreover, distributions like Debian will hate us for this.
  2. we build and ship LLVM shared object distributions ourselves, allowing us to ship an LLVM distribution separately from GHC. However, this would require that we have in place the requisite linking glue to allow GHC to load LLVM only when necessary.

Frankly I am not excited about either of these options; both come with a non-negligible amount of complexity and, as far as I can tell, neither offers any new functionality.

It seems to me that the sweet-spot is @angerman's current approach of emitting bitcode. This gives us a stable interface to LLVM while still allowing us to piggy-back on LLVM's existing binary distributions.

I don't know if clang comes with multi-arch support

Well I suggest we figure that out, at least for OS X and Debian/Ubuntu which is probably at least 50% of GHC users.

If I understand your question correctly, it does. Unlike gcc, LLVM is natively a cross-compiler. Any Clang/LLVM build can build for any supported target.

In D3352#96309, @erikd wrote:

However, @angerman does have some orthogonal changes in the pipeline which will require a patched LLVM (at least until the next release). We will need to figure out how we want to manage this.

Just to add what this change is. We currently use the Mangler to basically prevent any ghc generated code from being -dead_stripped (by the mach-o linker) by removing the .subsections_via_symbols directive
from the assembly, which llvm ingeniously and unconditionally injects into any assembly it produces for mach-o. The basic issue here is that the linker does not understand that the prefix data (our info tables) belongs
to the function, and just strips it away. After some digging we found the .alt_entry directive for mach-o. An upstream patch in llvm (which has landed by now, and will therefore be available in llvm5,
I believe) emits a temporary symbol at the start of the info table, and marks the function entry as an .alt_entry. This is sufficient for the linker to understand that the prefix data and the function belong together
and must be neither separated nor stripped. This should allow proper dead-stripping of ghc generated code on mach-o!
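The mangler step described here, dropping the `.subsections_via_symbols` directive, amounts to a line filter over the assembly text. A minimal sketch follows, working on `String` for brevity; the function name is hypothetical, and GHC's real mangler is a separate piece of code that does more than this.

```haskell
import Data.Char (isSpace)

-- | Remove every line that consists solely of the
-- .subsections_via_symbols directive (ignoring surrounding whitespace).
-- Without the directive the mach-o linker will not dead-strip the
-- object's contents at symbol granularity.
stripSubsectionsViaSymbols :: String -> String
stripSubsectionsViaSymbols = unlines . filter (not . isDirective) . lines
  where
    isDirective l = trim l == ".subsections_via_symbols"
    trim          = dropWhile isSpace . reverse . dropWhile isSpace . reverse
```

With the `.alt_entry` approach from the upstream patch this filter should become unnecessary, since the directive can then stay in place.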

If I understand your question correctly, it does. Unlike gcc, LLVM is natively a cross-compiler. Any Clang/LLVM build can build for any supported target.

The only annoyance is that we can't properly query clang as to whether it was specifically built without support for some targets. If only the proposed solutions from these slides had made it into clang:
http://llvm.org/devmtg/2014-04/PDFs/LightningTalks/2014-3-31_ClangTargetSupport_LighteningTalk.pdf

angerman updated this revision to Diff 12030. Fri, Apr 7, 4:59 AM
  • disable thumb code generation.
angerman updated this revision to Diff 12071. Mon, Apr 10, 10:04 PM
  • [clang-as] disable thumb code generation.
  • rebase onto master
angerman updated this revision to Diff 12277. Tue, Apr 25, 9:15 PM
  • [clang-as] disable thumb code generation.
  • rebase