Add new mbmi and mbmi2 compiler flags
Closed, Public

Authored by newhoggy on Oct 3 2017, 9:56 AM.

Details

Summary

This adds support for the bit deposit and extraction operations provided by the
BMI and BMI2 instruction set extensions on modern amd64 machines.

Test Plan

Validate

Diff Detail

Repository
rGHC Glasgow Haskell Compiler
Lint
Automatic diff as part of commit; lint not applicable.
Unit
Automatic diff as part of commit; unit tests not applicable.
bgamari created this revision. Oct 3 2017, 9:56 AM
bgamari added a comment. Edited Oct 3 2017, 9:57 AM

This was authored by John Ky, who should soon commandeer it. See https://github.com/ghc/ghc/pull/76.

bgamari planned changes to this revision. Oct 3 2017, 9:58 AM

Tests needed.

newhoggy commandeered this revision. Edited Oct 3 2017, 4:31 PM
newhoggy added a reviewer: bgamari.
newhoggy added a subscriber: newhoggy.

Outstanding issues:

(x) Ensure that -mbmi flag calls the relevant code to generate native instructions
(x) Implement the native instructions for x64

Does Debug.Trace work in the GHC source? It's possible my isBmi2Enabled function is never called and I'm trying to figure out why.

I added tracing to isBmi2Enabled and I don't get any output:

$ git diff compiler/main/DynFlags.hs | cat
diff --git a/compiler/main/DynFlags.hs b/compiler/main/DynFlags.hs
index b580e81c31..7f7f83fece 100644
--- a/compiler/main/DynFlags.hs
+++ b/compiler/main/DynFlags.hs
@@ -235,6 +235,8 @@ import qualified GHC.LanguageExtensions as LangExt

 import Foreign (Ptr) -- needed for 2nd stage

+import Debug.Trace
+
 -- Note [Updating flag description in the User's Guide]
 -- ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
 --
@@ -5364,13 +5366,13 @@ data BmiVersion = BMI1
                 deriving (Eq, Ord)

 isBmiEnabled :: DynFlags -> Bool
-isBmiEnabled dflags = case platformArch (targetPlatform dflags) of
+isBmiEnabled dflags = trace "isBmiEnabled called" $ case platformArch (targetPlatform dflags) of
     ArchX86_64 -> bmiVersion dflags >= Just BMI1
     ArchX86    -> bmiVersion dflags >= Just BMI1
     _          -> False

 isBmi2Enabled :: DynFlags -> Bool
-isBmi2Enabled dflags = case platformArch (targetPlatform dflags) of
+isBmi2Enabled dflags = trace "isBmi2Enabled called" $ case platformArch (targetPlatform dflags) of
     ArchX86_64 -> bmiVersion dflags >= Just BMI2
     ArchX86    -> bmiVersion dflags >= Just BMI2
     _          -> False
$ /usr/local/bin/ghci-8.3.20171010 -mbmi2
GHCi, version 8.3.20171010: http://www.haskell.org/ghc/  :? for help
Loaded GHCi configuration from /Users/jky/.jky-secure/.ghci
λ> import GHC.Base
(0.00 secs, 0 bytes)
λ> :set -XMagicHash
λ> I# (word2Int# (case 1 of W# n -> pdep# n n))
Called hs_pdep64
1
it :: Int
(0.01 secs, 69,000 bytes)
λ>
Leaving GHCi.

> Does Debug.Trace work in the GHC source? It's possible my isBmi2Enabled function is never called and I'm trying to figure out why.

I generally find pprTrace to be more useful, but yes, either should work.

> I added tracing to isBmi2Enabled and I don't get any output:

I suspect this is because you are working in ghci, which doesn't use the native code generator (where all of the references to isBmi2Enabled reside). Do you see your trace output if you compile a test program?

You are right. I must use ghc rather than ghci. It seems I must also use at least -O.

Doing the above, I can see the trace for sse4.2, but not for bmi2. Something else must be missing.

This is how I patched the code to add tracing for both sse4.2 and bmi2:

$ git diff ./compiler/main/DynFlags.hs | cat
diff --git a/compiler/main/DynFlags.hs b/compiler/main/DynFlags.hs
index b580e81c31..a836e7e7d3 100644
--- a/compiler/main/DynFlags.hs
+++ b/compiler/main/DynFlags.hs
@@ -235,6 +235,8 @@ import qualified GHC.LanguageExtensions as LangExt

 import Foreign (Ptr) -- needed for 2nd stage

+import Debug.Trace
+
 -- Note [Updating flag description in the User's Guide]
 -- ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
 --
@@ -5319,13 +5321,13 @@ data SseVersion = SSE1
                 deriving (Eq, Ord)

 isSseEnabled :: DynFlags -> Bool
-isSseEnabled dflags = case platformArch (targetPlatform dflags) of
+isSseEnabled dflags = pprTrace "isSseEnabled called" (ppr ()) $ case platformArch (targetPlatform dflags) of
     ArchX86_64 -> True
     ArchX86    -> sseVersion dflags >= Just SSE1
     _          -> False

 isSse2Enabled :: DynFlags -> Bool
-isSse2Enabled dflags = case platformArch (targetPlatform dflags) of
+isSse2Enabled dflags = pprTrace "isSse2Enabled called" (ppr ()) $ case platformArch (targetPlatform dflags) of
     ArchX86_64 -> -- SSE2 is fixed on for x86_64.  It would be
                   -- possible to make it optional, but we'd need to
                   -- fix at least the foreign call code where the
@@ -5336,7 +5338,7 @@ isSse2Enabled dflags = case platformArch (targetPlatform dflags) of
     _          -> False

 isSse4_2Enabled :: DynFlags -> Bool
-isSse4_2Enabled dflags = sseVersion dflags >= Just SSE42
+isSse4_2Enabled dflags = pprTrace "isSse4_2Enabled called" (ppr ()) $ sseVersion dflags >= Just SSE42

 isAvxEnabled :: DynFlags -> Bool
 isAvxEnabled dflags = avx dflags || avx2 dflags || avx512f dflags
@@ -5364,13 +5366,13 @@ data BmiVersion = BMI1
                 deriving (Eq, Ord)

 isBmiEnabled :: DynFlags -> Bool
-isBmiEnabled dflags = case platformArch (targetPlatform dflags) of
+isBmiEnabled dflags = pprTrace "isBmiEnabled called" (ppr ()) $ case platformArch (targetPlatform dflags) of
     ArchX86_64 -> bmiVersion dflags >= Just BMI1
     ArchX86    -> bmiVersion dflags >= Just BMI1
     _          -> False

 isBmi2Enabled :: DynFlags -> Bool
-isBmi2Enabled dflags = case platformArch (targetPlatform dflags) of
+isBmi2Enabled dflags = pprTrace "isBmi2Enabled called" (ppr ()) $ case platformArch (targetPlatform dflags) of
     ArchX86_64 -> bmiVersion dflags >= Just BMI2
     ArchX86    -> bmiVersion dflags >= Just BMI2
     _          -> False

This is the code I was compiling:

$ cat Main.hs
{-# LANGUAGE MagicHash #-}

module Main where

import Data.Bits
import Data.Monoid
import Data.Word
import GHC.Base

main :: IO ()
main = do
  putStrLn $ "popCount: " <> show (popCount (1 :: Word64))
  putStrLn $ "pdep: " <> show (I# (word2Int# (case 1 of W# n -> pdep# n n)))
  return ()

This is the output when using the compiler correctly:

$ ghc -O -mbmi2 -msse4.2 Main.hs
[1 of 1] Compiling Main             ( Main.hs, Main.o )
isSse2Enabled called ()
isSse2Enabled called ()
isSse2Enabled called ()
isSse2Enabled called ()
isSse2Enabled called ()
isSse2Enabled called ()
isSse4_2Enabled called ()
isSse2Enabled called ()
isSse2Enabled called ()
isSse2Enabled called ()
isSse2Enabled called ()
isSse2Enabled called ()
Linking Main ...

I can see the trace for sse4.2 but not for bmi2.

I've found the problem.

There is a failed pattern match here:

genCCall dflags is32Bit (PrimTarget (MO_Pdep width)) dest_regs@[dst]
         args@[src] = do

One of the last two arguments is not a single element list. This makes sense because pdep takes two arguments. I will have to look more closely to see what I need to do here.

I tried the following:

PDEP   _ src mask dst -> mkRU (use_R src $ use_R mask) (def_W dst)

As suggested in https://github.com/ghc/ghc/pull/76, but I get the following compiler error:

compiler/nativeGen/X86/Instr.hs:468:48: error:
    • Couldn't match expected type ‘[Reg]’
                  with actual type ‘[Reg] -> [Reg]’
    • Probable cause: ‘use_R’ is applied to too few arguments
      In the second argument of ‘($)’, namely ‘use_R mask’
      In the first argument of ‘mkRU’, namely ‘(use_R src $ use_R mask)’
      In the expression: mkRU (use_R src $ use_R mask) (def_W dst)
    |
468 |     PDEP   _ src mask dst -> mkRU (use_R src $ use_R mask) (def_W dst)
    |                                                ^^^^^^^^^^

The original line was:

PDEP   _ src dst -> mkRU (use_R src []) [dst]

Looks like it works now on my Macbook:

$ ghc -O -mbmi2 -msse4.2 Main.hs
[1 of 1] Compiling Main             ( Main.hs, Main.o )
Linking Main ...
$ ./Main
popCount: 1
pdep: 1

How can I push these changes?

I did some more testing and found that the arguments are actually swapped around.

The function is defined as pdep src mask and I consistently used that ordering everywhere, but upon testing I found it is actually doing pdep mask src. Does assembly apply the arguments in reverse order? I'm not familiar enough to know.

Anyhow, I'm going to swap the order.

Also, I still need help pushing my latest code here with arc. Can anyone help?

> I did some more testing and found that the arguments are actually swapped around.
>
> The function is defined as pdep src mask and I consistently used that ordering everywhere, but upon testing I found it is actually doing pdep mask src. Does assembly apply the arguments in reverse order? I'm not familiar enough to know.

Note that AT&T and Intel syntax may differ in this regard. I don't know how they treat the pdep instruction.
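
To make the operand roles concrete, here is a minimal portable sketch of what pdep computes, assuming the usual definition: the bits of src are deposited, from low to high, into the positions where mask has a 1. The name pdep64_ref is made up for this illustration; it is not the hs_pdep64 code from the patch.

```c
#include <assert.h>
#include <stdint.h>

/* Reference pdep: scatter the low bits of src into the set bits of mask.
 * pdep64_ref is a hypothetical name used only for this sketch. */
uint64_t pdep64_ref(uint64_t src, uint64_t mask)
{
    uint64_t result = 0;
    /* bit walks src from its least significant bit upwards */
    for (uint64_t bit = 1; mask != 0; bit <<= 1) {
        uint64_t lowest = mask & (~mask + 1); /* isolate lowest set bit of mask */
        if (src & bit)
            result |= lowest;
        mask &= mask - 1;                     /* clear that bit of mask */
    }
    return result;
}

/* Swapping the operands changes the answer unless the inputs are symmetric. */
void pdep64_ref_check(void)
{
    assert(pdep64_ref(0x05, 0xF0) == 0x50); /* src bits 0 and 2 land on mask bits 4 and 6 */
    assert(pdep64_ref(0xF0, 0x05) == 0x00); /* same values, operands swapped */
    assert(pdep64_ref(1, 1) == 1);          /* symmetric input cannot reveal a swap */
}
```

A check such as pdep# n n with n = 1 passes regardless of operand order, so asymmetric inputs like the first two assertions are needed to catch a swap. As far as I know, AT&T syntax lists operands in the reverse order of Intel syntax (destination last), which is a plausible source of this kind of mix-up.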

> Anyhow, I'm going to swap the order.
>
> Also, I still need help pushing my latest code here with arc. Can anyone help?

Sure, you should be able to update the differential with arc diff --update D4063 $base_commit (where $base_commit is the commit before the first commit of your change). Does this not work for you?

newhoggy updated this revision to Diff 14457. Oct 24 2017, 4:13 PM

Working pdep code for x86:

  • Properly initialise bmiVersion field
  • Implement x86 code generator for pdep and pext

Thanks for the help.

I tried arc diff --update D4063, which worked in that it did something :D.

I see quite a few changes that I didn't expect though. For example validate:184

Right, it looks like you need to manually specify a base commit.

I'm creating a new test case cgrun075. I'm presuming the number I choose is not significant.

Is there a way for me to invoke this specific test case rather than having to run make test at the top level to run all of them?

> I'm creating a new test case cgrun075. I'm presuming the number I choose is not significant.
>
> Is there a way for me to invoke this specific test case rather than having to run make test at the top level to run all of them?

make test TEST=cgrun075.

Thanks! That helps a lot.

newhoggy updated this revision to Diff 14534. Nov 2 2017, 5:57 AM

pdep and pext test cases

newhoggy updated this revision to Diff 14535. Nov 2 2017, 6:20 AM
  • pdep and pext test cases

So it looks like both the native x86 instruction implementation and the C-function emulation work now.

I have added some tests under cgrun075 and cgrun076.

compiler/codeGen/StgCmmPrim.hs
885

I still need to implement these, but I don't know what these cases are for or how to test them.

> So it looks like both the native x86 instruction implementation and the C-function emulation work now.

Yay!

> I have added some tests under cgrun075 and cgrun076.

Great.

compiler/codeGen/StgCmmPrim.hs
885

The generic cases are for all of the code-generation backends that aren't included in the guard above. I believe in this case this only includes the C code generator. Indeed testing this is a bit tricky. Ideally we would periodically test this backend but currently that doesn't happen.

austin resigned from this revision. Nov 6 2017, 3:56 PM
bgamari requested changes to this revision. Nov 9 2017, 3:47 PM

Requesting changes due to the missing generic C-- implementations.

This revision now requires changes to proceed. Nov 9 2017, 3:47 PM

Thanks Ben, I'll try to fill in the missing generic bits. Will appreciate any help on how to test it.

If it turns out that the cases in callishPrimOpSupported for pdep and pext aren't necessary, then I can remove them.

After that, I haven't been able to identify any other necessary changes for this patch, in which case, this patch would be done.

Can someone confirm?

compiler/codeGen/StgCmmPrim.hs
885

I was just looking at this again and realised that the match of op in callishPrimOpSupported dflags op does not have a case for popCnt either.

For example, these aren't matched:

PopCnt8Op
PopCnt16Op
PopCnt32Op
PopCnt64Op
PopCntOp
Pdep8Op
Pdep16Op
Pdep32Op
Pdep64Op
PdepOp

Could it be that I don't actually need to implement this?

bgamari accepted this revision. Nov 12 2017, 4:27 PM

> Thanks Ben, I'll try to fill in the missing generic bits. Will appreciate any help on how to test it.

I think the only way is to try building an unregisterised compiler. Unfortunately I just realized that the unregisterised build is currently broken (Trac #14454).

> If it turns out that the cases in callishPrimOpSupported for pdep and pext aren't necessary, then I can remove them.
>
> After that, I haven't been able to identify any other necessary changes for this patch, in which case, this patch would be done.
>
> Can someone confirm?

I believe you have covered it. Just remove the extraneous cases and you are done. Thanks @newhoggy!

compiler/codeGen/StgCmmPrim.hs
885

Well, I'm not entirely keen on letting the C backend fall too far out of sync with the NCG, as it's an important tool for bootstrapping. That being said, I suppose these instructions really aren't necessary and likely won't find use in GHC itself.

In light of this I would be okay with merging without the generic implementations. Feel free to drop these.

Sorry for the change of course!

This revision is now accepted and ready to land. Nov 12 2017, 4:27 PM
newhoggy updated this revision to Diff 14641. Nov 13 2017, 2:11 AM
  • Fix pattern match for pdep and pext instructions

I only just noticed after running my tests, the following output:

$ make test TEST="cgrun075 cgrun076"
...
SUMMARY for test run started at Mon Nov 13 19:12:21 2017 AEDT
 0:00:04 spent to go through
       2 total tests, which gave rise to
      20 test cases, of which
      16 were skipped

       0 had missing libraries
       4 expected passes
       0 expected failures

       0 caused framework failures
       0 caused framework warnings
       0 unexpected passes
       0 unexpected failures
       0 unexpected stat failures

In particular, note that 16 test cases were skipped. If this is not a problem, the patch is good to go.

compiler/codeGen/StgCmmPrim.hs
599

After removing cases from callishPrimOpSupported, I discovered that the above pattern matches were failing because they only took a single argument [x].

I've modified the pattern match to accept [src, mask] instead.

The tests still pass after this change.

bgamari accepted this revision. Nov 13 2017, 11:17 AM

> I only just noticed after running my tests, the following output:
>
> $ make test TEST="cgrun075 cgrun076"
> ...
> SUMMARY for test run started at Mon Nov 13 19:12:21 2017 AEDT
>  0:00:04 spent to go through
>        2 total tests, which gave rise to
>       20 test cases, of which
>       16 were skipped
>
>        0 had missing libraries
>        4 expected passes
>        0 expected failures
>
>        0 caused framework failures
>        0 caused framework warnings
>        0 unexpected passes
>        0 unexpected failures
>        0 unexpected stat failures
>
> In particular, note that 16 test cases were skipped. If this is not a problem, the patch is good to go.

Yes, this should be fine.

Thanks @newhoggy. Great work!

This is exciting. Thanks, Ben, for all your help!

This revision was automatically updated to reflect the committed changes.
angerman added inline comments.
compiler/llvmGen/LlvmCodeGen/CodeGen.hs
738–739

I'm suspicious that this is enough. As far as I can see my LLVM (5.0) has no llvm.pdep. or llvm.pext. intrinsics. Only llvm.x86.bmi.pdep.{32,64} and the same for pext.
This also does not seem to call the hs_pdep and hs_pext on non-x86_64 platforms. (e.g. arm, aarch64, ...)

I stumbled over this, implementing it in my llvm-ng branch (wip/angerman/llvmng).

Given the following .c file:

#include <stdint.h>
#include <stdio.h>
#include <x86intrin.h>
uint64_t hello(uint64_t x) { return _pdep_u64(x, 4); }

int main(int argc, char ** argv) {
  printf("%ld\n", hello(1));
  return 0;
}

clang will emit the following LLVM IR:

; ModuleID = '<stdin>'
source_filename = "tmp.c"
target datalayout = "e-m:o-i64:64-f80:128-n8:16:32:64-S128"
target triple = "x86_64-apple-macosx10.12.0"

@.str = private unnamed_addr constant [5 x i8] c"%ld\0A\00", align 1

; Function Attrs: noinline nounwind optnone ssp uwtable
define i64 @hello(i64) #0 {
  %2 = alloca i64, align 8
  %3 = alloca i64, align 8
  %4 = alloca i64, align 8
  store i64 %0, i64* %4, align 8
  %5 = load i64, i64* %4, align 8
  store i64 %5, i64* %2, align 8
  store i64 4, i64* %3, align 8
  %6 = load i64, i64* %2, align 8
  %7 = load i64, i64* %3, align 8
  %8 = call i64 @llvm.x86.bmi.pdep.64(i64 %6, i64 %7) #3
  ret i64 %8
}

; Function Attrs: noinline nounwind optnone ssp uwtable
define i32 @main(i32, i8**) #0 {
  %3 = alloca i32, align 4
  %4 = alloca i32, align 4
  %5 = alloca i8**, align 8
  store i32 0, i32* %3, align 4
  store i32 %0, i32* %4, align 4
  store i8** %1, i8*** %5, align 8
  %6 = call i64 @hello(i64 1)
  %7 = call i32 (i8*, ...) @printf(i8* getelementptr inbounds ([5 x i8], [5 x i8]* @.str, i32 0, i32 0), i64 %6)
  ret i32 0
}

declare i32 @printf(i8*, ...) #1

; Function Attrs: nounwind readnone
declare i64 @llvm.x86.bmi.pdep.64(i64, i64) #2

attributes #0 = { noinline nounwind optnone ssp uwtable "correctly-rounded-divide-sqrt-fp-math"="false" "disable-tail-calls"="false" "less-precise-fpmad"="false" "no-frame-pointer-elim"="true" "no-frame-pointer-elim-non-leaf" "no-infs-fp-math"="false" "no-jump-tables"="false" "no-nans-fp-math"="false" "no-signed-zeros-fp-math"="false" "no-trapping-math"="false" "stack-protector-buffer-size"="8" "target-cpu"="penryn" "target-features"="+bmi2,+cx16,+fxsr,+mmx,+sse,+sse2,+sse3,+sse4.1,+ssse3,+x87" "unsafe-fp-math"="false" "use-soft-float"="false" }
attributes #1 = { "correctly-rounded-divide-sqrt-fp-math"="false" "disable-tail-calls"="false" "less-precise-fpmad"="false" "no-frame-pointer-elim"="true" "no-frame-pointer-elim-non-leaf" "no-infs-fp-math"="false" "no-nans-fp-math"="false" "no-signed-zeros-fp-math"="false" "no-trapping-math"="false" "stack-protector-buffer-size"="8" "target-cpu"="penryn" "target-features"="+bmi2,+cx16,+fxsr,+mmx,+sse,+sse2,+sse3,+sse4.1,+ssse3,+x87" "unsafe-fp-math"="false" "use-soft-float"="false" }
attributes #2 = { nounwind readnone }
attributes #3 = { nounwind }

!llvm.module.flags = !{!0, !1}
!llvm.ident = !{!2}

!0 = !{i32 1, !"wchar_size", i32 4}
!1 = !{i32 7, !"PIC Level", i32 2}
!2 = !{!"clang version 5.0.0 (tags/RELEASE_500/final)"}

Changing llvm.x86.bmi.pdep.64 to llvm.pdep.i64 results in:

Undefined symbols for architecture x86_64:
  "_llvm.pdep.i64", referenced from:
      _hello in tmp-3c9daa.o
ld: symbol(s) not found for architecture x86_64
clang-5.0: error: linker command failed with exit code 1 (use -v to see invocation)

Using the llvm.x86.bmi.pdep. family of functions on aarch64 results in:

LLVM ERROR: Cannot select: intrinsic %llvm.x86.bmi.pdep.32
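
The same dispatch concern can be sketched at the C level: use the BMI2 intrinsic only where the target supports it, and fall back to a portable routine elsewhere. This is only an illustration under assumptions: __BMI2__ is the macro GCC and clang predefine when BMI2 is enabled (for example with -mbmi2), _pdep_u64 comes from <immintrin.h>, and the names pdep64_soft and pdep64_dispatch are invented here, with pdep64_soft standing in for the role hs_pdep64 plays in the patch.

```c
#include <assert.h>
#include <stdint.h>

#if defined(__x86_64__) && defined(__BMI2__)
#include <immintrin.h> /* _pdep_u64 */
#endif

/* Portable fallback (hypothetical name): deposit the low bits of src
 * into the positions of the set bits of mask. */
uint64_t pdep64_soft(uint64_t src, uint64_t mask)
{
    uint64_t result = 0;
    for (uint64_t bit = 1; mask != 0; bit <<= 1) {
        uint64_t lowest = mask & (~mask + 1); /* lowest set bit of mask */
        if (src & bit)
            result |= lowest;
        mask &= mask - 1;                     /* clear it */
    }
    return result;
}

/* Hypothetical dispatcher: hardware instruction on x86-64 with BMI2,
 * software fallback on every other target (arm, aarch64, ...). */
uint64_t pdep64_dispatch(uint64_t src, uint64_t mask)
{
#if defined(__x86_64__) && defined(__BMI2__)
    return _pdep_u64(src, mask);
#else
    return pdep64_soft(src, mask);
#endif
}

/* Both paths must agree wherever both are available. */
void pdep64_dispatch_check(void)
{
    assert(pdep64_dispatch(0x05, 0xF0) == pdep64_soft(0x05, 0xF0));
    assert(pdep64_dispatch(0xFF, 0x8000000000000001ULL)
           == pdep64_soft(0xFF, 0x8000000000000001ULL));
}
```

The selection here happens at compile time; a code generator would make the equivalent choice per target when deciding whether to emit the x86-specific intrinsic or a call to the fallback.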

In addition to @angerman's concerns, I'm afraid there are a few more issues which only came to light now. I'm afraid I need to revert this for now since it breaks the 32-bit build.

Do you suppose you could have a look, @newhoggy?

libraries/ghc-prim/cbits/pdep.c
6

I should have noticed this earlier but unfortunately this is broken on 32-bit architectures. I believe both the result and the mask should be StgWord64.

65

Why is this repeated?

libraries/ghc-prim/cbits/pext.c
6

Same here; I believe all of these should be StgWord64. However, then this operation will presumably be slower on 32-bits (although perhaps we don't mind).

61

Why the repetition?
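
For reference, a minimal sketch of a 64-bit pext with every operand declared as a fixed-width 64-bit type, which is the shape the comments above ask for. It assumes the usual definition (the bits of src selected by mask are packed into the low bits of the result) and uses uint64_t in place of StgWord64; pext64_ref is a hypothetical name, not the code in pext.c.

```c
#include <assert.h>
#include <stdint.h>

/* Reference pext: gather the bits of src selected by mask into the low
 * bits of the result. All values are 64 bits wide on every platform. */
uint64_t pext64_ref(uint64_t src, uint64_t mask)
{
    uint64_t result = 0;
    /* bit is the next free position in the result */
    for (uint64_t bit = 1; mask != 0; bit <<= 1) {
        uint64_t lowest = mask & (~mask + 1); /* lowest set bit of mask */
        if (src & lowest)
            result |= bit;
        mask &= mask - 1;                     /* clear it */
    }
    return result;
}

void pext64_ref_check(void)
{
    assert(pext64_ref(0x50, 0xF0) == 0x05);
    /* Bits above position 31 must survive, which a 32-bit word type would lose. */
    assert(pext64_ref(0xFFFFFFFF00000000ULL, 0xFF00000000000000ULL) == 0xFF);
}
```

The loop runs once per set bit of mask, so on a 32-bit machine the cost is mostly the 64-bit arithmetic being split across two registers.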

Thanks Ben,

I'll take a look tonight.

Cheers!

Just for reference, I believe this: https://github.com/ghc/ghc/commit/39f7fc86bb0a4cbf0476f98819d597c0a00d1210 will be needed for the LLVM backend as well.
The llvm code should probably look similar to: https://github.com/ghc/ghc/commit/8427df332d2db338f0fc0c1a1976696227a280f6.

Note: all those commits are from my wip/angerman/llvmng branch, which adds a completely new llvm backend to GHC.

I have the following problem when attempting to build on 32-bit Ubuntu Linux:

configure: error: in `/home/jky/ghc/libffi/build/i386-unknown-linux-gnu':
configure: error: C++ preprocessor "/lib/cpp" fails sanity check
See `config.log' for more details
libffi/ghc.mk:47: recipe for target 'libffi/stamp.ffi.static-shared.configure' failed
make[1]: *** [libffi/stamp.ffi.static-shared.configure] Error 1
Makefile:122: recipe for target 'all' failed
make: *** [all] Error 2
$ uname -a
Linux ubuntu 4.13.0-16-generic #19-Ubuntu SMP Wed Oct 11 18:33:49 UTC 2017 i686 i686 i686 GNU/Linux
$ cat /etc/issue
Ubuntu 17.10 \n \l

I set up another 32-bit environment, which seems to work better and have reproduced the build problem for 32-bit systems:

libraries/ghc-prim/cbits/pdep.c:6:1: error:
     error: conflicting types for 'hs_pdep64'
     hs_pdep64(StgWord src, StgWord mask)
     ^
  |
6 | hs_pdep64(StgWord src, StgWord mask)
  | ^

libraries/ghc-prim/cbits/pdep.c:4:16: error:
     note: previous declaration of 'hs_pdep64' was here
     extern StgWord hs_pdep64(StgWord64 src, StgWord mask);
                    ^
  |
4 | extern StgWord hs_pdep64(StgWord64 src, StgWord mask);
  |                ^
libraries/ghc-prim/cbits/pdep.c: In function 'hs_pdep64':

libraries/ghc-prim/cbits/pdep.c:18:51: error:
     warning: left shift count >= width of type [-Wshift-count-overflow]
         const uint64_t lsb = (uint64_t)((int64_t)(src << 63) >> 63);
                                                       ^
   |
18 |     const uint64_t lsb = (uint64_t)((int64_t)(src << 63) >> 63);
   |                                                   ^
`gcc' failed in phase `C Compiler'. (Exit code: 1)
libraries/ghc-prim/ghc.mk:4: recipe for target 'libraries/ghc-prim/dist-install/build/cbits/pdep.o' failed
make[1]: *** [libraries/ghc-prim/dist-install/build/cbits/pdep.o] Error 1
Makefile:122: recipe for target 'all' failed
make: *** [all] Error 2
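
The shift warning follows from the operand type: on a 32-bit target StgWord is 32 bits wide, so src << 63 shifts by more than the width of the type. A hedged sketch of the flagged idiom with a 64-bit operand, next to an alternative that avoids signed shifts altogether; both function names are invented for this illustration.

```c
#include <assert.h>
#include <stdint.h>

/* The flagged idiom, with src widened to 64 bits: all ones if the low bit
 * of src is set, zero otherwise. The conversion to int64_t and the right
 * shift of a negative value are implementation-defined in C; GCC and clang
 * wrap and shift arithmetically, which is what this relies on. */
uint64_t lsb_broadcast_shift(uint64_t src)
{
    return (uint64_t)((int64_t)(src << 63) >> 63);
}

/* Alternative using only unsigned arithmetic: 0 - 1 wraps to all ones. */
uint64_t lsb_broadcast_neg(uint64_t src)
{
    return (uint64_t)0 - (src & 1);
}

void lsb_broadcast_check(void)
{
    for (uint64_t x = 0; x < 8; x++)
        assert(lsb_broadcast_shift(x) == lsb_broadcast_neg(x));
}
```

Either form is safe once the operand is a 64-bit type on every platform; the unsigned variant has the small advantage of not depending on implementation-defined behaviour.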

Looks like I'm unable to push to this revision anymore because it's closed?

$ arc diff --update D4063 a36eea1af4faabdf8fcf0a68dbd4f9946bf6d65a
...
To ssh://phabricator-origin.haskell.org:2222/diffusion/GHCDIFF/GHC-Differentials.git
 * [new tag]               a36eea1af4faabdf8fcf0a68dbd4f9946bf6d65a -> phabricator/base/14813
 * [new tag]               bd88568e8ca65f58923b5612ec06133e98637651 -> phabricator/diff/14813
 Exception
ERR_CLOSED: This revision has already been closed.
(Run with `--trace` for a full exception trace.)

Just create a new one with --create.

Cool. Thanks! I've created a new revision: https://phabricator.haskell.org/D4236

I also seem to have broken something. I can't run make anymore:

$ make
<command line>: does not exist: libraries/text/cbits/cbits.c
make[1]: *** [utils/ghc-cabal/dist/build/tmp/ghc-cabal] Error 1
make: *** [all] Error 2

Not sure if it happened after $ git checkout wip/angerman/llvmng then git checkout arcpatch-D4236 or if I broke it earlier whilst installing llvm.

> I also seem to have broken something. I can't run make anymore:
>
> $ make
> <command line>: does not exist: libraries/text/cbits/cbits.c
> make[1]: *** [utils/ghc-cabal/dist/build/tmp/ghc-cabal] Error 1
> make: *** [all] Error 2
>
> Not sure if it happened after $ git checkout wip/angerman/llvmng then git checkout arcpatch-D4236 or if I broke it earlier whilst installing llvm.

Don't try the llvmng branch: it's highly in flux, requires a bunch of additional packages, and is barely buildable with a custom hadrian branch right now.
ghc HEAD should be fine for this patch.

The easiest will be:

./boot
./configure ...
# edit mk/build.mk and set `quick-cross` as flavour.

Cloning from scratch seems to help. I'm letting this compile overnight.