Add new mbmi and mbmi2 compiler flags
Closed · Public

Authored by newhoggy on Nov 25 2017, 8:20 AM.

Details

Summary

This adds support for the bit deposit and extraction operations provided by the
BMI and BMI2 instruction set extensions on modern amd64 machines.
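As a reminder of what the two operations compute, here is a minimal portable sketch in C, the language of the ghc-prim cbits fallbacks. The names ref_pdep64 and ref_pext64 are hypothetical; this is not GHC's implementation, just a loop over the set bits of the mask.

```c
#include <stdint.h>

/* pdep (bit deposit): scatter the low-order bits of src, one by one, into
 * the positions of the set bits of mask, lowest mask bit first. */
uint64_t ref_pdep64(uint64_t src, uint64_t mask)
{
    uint64_t result = 0;
    for (uint64_t bit = 1; mask != 0; bit <<= 1) {
        uint64_t lowest = mask & (~mask + 1);  /* isolate lowest set bit of mask */
        if (src & bit)
            result |= lowest;
        mask &= mask - 1;                      /* clear that mask bit */
    }
    return result;
}

/* pext (bit extract): gather the bits of src selected by mask and pack
 * them into the low-order bits of the result. */
uint64_t ref_pext64(uint64_t src, uint64_t mask)
{
    uint64_t result = 0;
    for (uint64_t bit = 1; mask != 0; bit <<= 1) {
        uint64_t lowest = mask & (~mask + 1);  /* next selected position */
        if (src & lowest)
            result |= bit;
        mask &= mask - 1;
    }
    return result;
}
```

With mask 0x1A (bits 1, 3 and 4 set), ref_pdep64(0x5, 0x1A) scatters the low three bits of 0x5 into those positions to give 0x12, and ref_pext64(0x12, 0x1A) gathers them back to 0x5.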

Implement x86 code generator for pdep and pext. Properly initialise bmiVersion field.

pdep and pext test cases

Fix pattern match for pdep and pext instructions

Fix build of pdep and pext code for 32-bit architectures

Test Plan

Validate

Diff Detail

Repository
rGHC Glasgow Haskell Compiler
Lint
Automatic diff as part of commit; lint not applicable.
Unit
Automatic diff as part of commit; unit tests not applicable.
newhoggy created this revision. Nov 25 2017, 8:20 AM
angerman requested changes to this revision. Nov 25 2017, 8:28 AM
angerman added a subscriber: angerman.

I'm also missing the changes to the driver pipeline that add the relevant attributes to the LLVM tool invocations. See https://github.com/ghc/ghc/commit/39f7fc86bb0a4cbf0476f98819d597c0a00d1210; I believe that code needs to be part of this diff as well.

compiler/llvmGen/LlvmCodeGen/CodeGen.hs
782–783

I'm afraid this is likely still incorrect. Was this tested with the LLVM backend?
As pointed out in D4063, I'm pretty certain this needs to be llvm.x86.bmi.pdep. and llvm.x86.bmi.pext.

I also believe this needs guards:

dflags <- getDynFlags
let hasBmi = isBmiEnabled dflags
    lit    = if hasBmi then "llvm.x86.bmi.pdep." else "hs_pdep"
return $ fsLit $ lit ++ showSDoc dflags (ppr $ widthToLlvmInt w)
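The guard above picks between the hardware instruction and a C fallback when code is generated. A rough C analogue of the same idea, assuming GCC or Clang (which define __BMI2__ under -mbmi2 and provide _pdep_u64 in <immintrin.h>); pdep64 and soft_pdep64 are hypothetical names, not GHC functions:

```c
#include <stdint.h>

#if defined(__BMI2__) && defined(__x86_64__)
#include <immintrin.h>
#endif

/* Hypothetical software fallback: deposit the low bits of src into the
 * set-bit positions of mask. */
static uint64_t soft_pdep64(uint64_t src, uint64_t mask)
{
    uint64_t result = 0;
    for (uint64_t bit = 1; mask != 0; bit <<= 1) {
        if (src & bit)
            result |= mask & (~mask + 1);   /* lowest remaining mask bit */
        mask &= mask - 1;                   /* drop that mask bit */
    }
    return result;
}

/* Use the PDEP instruction only when the compiler was told BMI2 is
 * available; otherwise fall back to the portable loop. */
uint64_t pdep64(uint64_t src, uint64_t mask)
{
#if defined(__BMI2__) && defined(__x86_64__)
    return _pdep_u64(src, mask);
#else
    return soft_pdep64(src, mask);
#endif
}
```

Both paths must agree bit for bit, which is why the fallback is worth testing against the instruction on hardware that has it.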
This revision now requires changes to proceed. Nov 25 2017, 8:28 AM
newhoggy updated this revision to Diff 14819. Nov 25 2017, 5:18 PM

Add new mbmi and mbmi2 compiler flags

Summary:
This adds support for the bit deposit and extraction operations provided by the
BMI and BMI2 instruction set extensions on modern amd64 machines.

Test Plan: Validate

Reviewers: austin, simonmar, bgamari

Subscribers: newhoggy, rwbarton, thomie

GHC Trac Issues: Trac #14206

Differential Revision: https://phabricator.haskell.org/D4063

Implement x86 code generator for pdep and pext. Properly initialise bmiVersion field.

pdep and pext test cases

newhoggy updated this revision to Diff 14828. Nov 27 2017, 6:01 AM
  • Add llvm options for bmi & bmi2
  • Use llvm.x86.bmi.pdep and llvm.x86.bmi.pext instead

I've installed LLVM on macOS and updated the GHC code to use llvm.x86.bmi.pext and llvm.x86.bmi.pdep, but I haven't been able to get compilation to work. I may need a bit of hand-holding here since I have no prior LLVM experience.

Here is where I'm at:

$ brew install --with-toolchain llvm
$ export PATH=/usr/local/opt/llvm/bin:$PATH
$ export CPLUS_INCLUDE_PATH=$(llvm-config --includedir):$CPLUS_INCLUDE_PATH
$ export LD_LIBRARY_PATH=$(llvm-config --libdir):$LD_LIBRARY_PATH
$ ghc -fllvm Main.hs
Linking Main ...
Undefined symbols for architecture x86_64:
  "_llvm.x86.bmi.pext.i64", referenced from:
      _c2cz_info$def in Main.o
      _c2gb_info$def in Main.o
  "_llvm.x86.bmi.pdep.i64", referenced from:
      _c2a5_info$def in Main.o
      _c2hi_info$def in Main.o
ld: symbol(s) not found for architecture x86_64
clang: error: linker command failed with exit code 1 (use -v to see invocation)
`gcc' failed in phase `Linker'. (Exit code: 1)

Please let me know what I'm doing wrong.

Thanks!

I also tried this in case it made a difference:

$ ghc -fllvm -mbmi2 Main.hs
Linking Main ...
Undefined symbols for architecture x86_64:
  "_llvm.x86.bmi.pext.i64", referenced from:
      _c2cz_info$def in Main.o
      _c2gb_info$def in Main.o
  "_llvm.x86.bmi.pdep.i64", referenced from:
      _c2a5_info$def in Main.o
      _c2hi_info$def in Main.o
ld: symbol(s) not found for architecture x86_64
clang: error: linker command failed with exit code 1 (use -v to see invocation)
`gcc' failed in phase `Linker'. (Exit code: 1)
angerman requested changes to this revision. Nov 27 2017, 6:32 AM

See inline comments.

compiler/llvmGen/LlvmCodeGen/CodeGen.hs
786–787

For whatever reason, and somewhat inconsistently, they are called

; Function Attrs: nounwind readnone
declare i64 @llvm.x86.bmi.pdep.64(i64, i64) #2

Specifically, note that there is no i in front of the 64. As such, you won't be able to use (ppr $ widthToLlvmInt w), as that will output i64 instead of 64.
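Since the suffix is the bare bit width, the name can be built from the width in bits rather than from the LLVM integer type. A small C sketch of that naming rule; intrinsic_name is a hypothetical helper, not part of GHC or LLVM:

```c
#include <stdio.h>

/* Build the LLVM intrinsic name for a pdep/pext call of the given width.
 * The suffix is the plain bit count ("64"), not an LLVM type name ("i64"),
 * matching the declaration quoted above. Returns the snprintf result,
 * i.e. the length of the full name. */
int intrinsic_name(char *buf, size_t len, const char *op, int width_bits)
{
    return snprintf(buf, len, "llvm.x86.bmi.%s.%d", op, width_bits);
}
```

For a 32-bit extract, intrinsic_name(buf, sizeof buf, "pext", 32) yields llvm.x86.bmi.pext.32.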

This revision now requires changes to proceed. Nov 27 2017, 6:32 AM

Thanks again for looking at this, @newhoggy!

I have a small comment regarding the C functions.

compiler/nativeGen/X86/CodeGen.hs
2758

I think you want fsLit $ pdepLabel w here rather than ignoring the w(idth) argument, and similarly below for MO_Pext.

libraries/ghc-prim/cbits/pdep.c
51–64

This could be deleted if you call the C function hs*32 or hs*64 according to word width in the backends, which you do except in the X86 native code backend (see comment above).

libraries/ghc-prim/cbits/pext.c
47–60

Same as above for hs_pdep.
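One way to read this suggestion: if every backend calls a width-specific fallback, the C side needs no generic dispatcher, and the narrow variants can truncate their operands and reuse a single 64-bit routine. A sketch under that assumption; the soft_pdep* names are hypothetical and GHC's actual cbits may be structured differently:

```c
#include <stdint.h>

/* Deposit the low-order bits of src into the set-bit positions of mask. */
uint64_t soft_pdep64(uint64_t src, uint64_t mask)
{
    uint64_t result = 0;
    for (uint64_t bit = 1; mask != 0; bit <<= 1) {
        if (src & bit)
            result |= mask & (~mask + 1);   /* lowest remaining mask bit */
        mask &= mask - 1;                   /* remove it from the mask */
    }
    return result;
}

/* Narrow variants: the parameter types zero-extend the operands, and the
 * result has no bits outside the mask, so truncating back loses nothing. */
uint32_t soft_pdep32(uint32_t src, uint32_t mask) { return (uint32_t)soft_pdep64(src, mask); }
uint16_t soft_pdep16(uint16_t src, uint16_t mask) { return (uint16_t)soft_pdep64(src, mask); }
uint8_t  soft_pdep8(uint8_t src, uint8_t mask)    { return (uint8_t)soft_pdep64(src, mask); }
```

A code generator then only has to map the operation width to the matching symbol, which is what passing w through to the label function achieves.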

newhoggy updated this revision to Diff 14837. Nov 27 2017, 6:34 PM
  • Use 32/64 suffix instead of i32/i64 suffix
newhoggy added inline comments. Nov 27 2017, 6:36 PM
compiler/llvmGen/LlvmCodeGen/CodeGen.hs
787

@angerman Your advice helped.

I've changed it to use widthInBits instead.

I now have a different issue:

$ rm -f Main.hi Main.o Main; ghc -fllvm -mbmi2 Main.hs
[1 of 1] Compiling Main             ( Main.hs, Main.o )
Callsite was not defined with variable arguments!
i64 (i64)* @llvm.x86.bmi.pdep.64
Callsite was not defined with variable arguments!
i64 (i64)* @llvm.x86.bmi.pext.64
Callsite was not defined with variable arguments!
i64 (i64)* @llvm.x86.bmi.pext.64
Callsite was not defined with variable arguments!
i64 (i64)* @llvm.x86.bmi.pdep.64
opt: /var/folders/8d/3xbnllbx76gcbk3wwy086vlm0000gn/T/ghc11619_0/ghc_2.ll: error: input module is broken!
`opt' failed in phase `LLVM Optimiser'. (Exit code: 1)

Thanks!

I performed an llvm dump like so:

$ rm -f Main.hi Main.o Main; ghc -fllvm -ddump-llvm -mbmi2 Main.hs > Main.llvm
Callsite was not defined with variable arguments!
i64 (i64)* @llvm.x86.bmi.pdep.64
Callsite was not defined with variable arguments!
i64 (i64)* @llvm.x86.bmi.pext.64
Callsite was not defined with variable arguments!
i64 (i64)* @llvm.x86.bmi.pext.64
Callsite was not defined with variable arguments!
i64 (i64)* @llvm.x86.bmi.pdep.64
opt: /var/folders/8d/3xbnllbx76gcbk3wwy086vlm0000gn/T/ghc43023_0/ghc_2.ll: error: input module is broken!
`opt' failed in phase `LLVM Optimiser'. (Exit code: 1)
$ vim Main.llvm
$ cat Main.llvm | grep bmi
  %ln2bU = call ccc i64 (i64) @llvm.x86.bmi.pdep.64( i64 %ln2bS )
declare ccc i64 @llvm.x86.bmi.pdep.64(i64)
  %ln2en = call ccc i64 (i64) @llvm.x86.bmi.pext.64( i64 %ln2el )
declare ccc i64 @llvm.x86.bmi.pext.64(i64)
  %ln2mg = call ccc i64 (i64) @llvm.x86.bmi.pext.64( i64 %ln2me )
  %ln2tw = call ccc i64 (i64) @llvm.x86.bmi.pdep.64( i64 %ln2tu )

Although I don't know LLVM, I suspect the failure is related to @llvm.x86.bmi.pdep.64 and @llvm.x86.bmi.pext.64 needing two arguments: in the generated calls I can only see what looks like a single argument.

newhoggy updated this revision to Diff 14842. Nov 28 2017, 6:14 AM
  • Fix number of arguments in pdep and pext calls
newhoggy updated this revision to Diff 14843. Nov 28 2017, 6:18 AM
  • Fix number of arguments in pdep and pext calls

I'm still missing the guards that fall back to hs_pdep.

compiler/llvmGen/LlvmCodeGen/CodeGen.hs
592

I added an extra width element here too. No idea why that was necessary.

599

I discovered this zip was causing only one argument to show up, so I added an extra element to the second list in a new copy of the genCallSimpleCast function.

It looks like the generated LLVM call is now working.

Is there a way I can run make test TEST="cgrun075 cgrun076" on llvm backend?

bgamari requested changes to this revision. Nov 29 2017, 9:10 AM

Getting there!

compiler/llvmGen/LlvmCodeGen/CodeGen.hs
592

Indeed, something seems fishy here. I believe you want this list to be the same length as args, so I think map (const width) args would be more appropriate here.

599

Ahh yes. I've been bitten in this same way by zip.

Is there a way I can run make test TEST="cgrun075 cgrun076" on llvm backend?

make test TEST="cgrun075 cgrun076" WAY=optllvm

This revision now requires changes to proceed. Nov 29 2017, 9:10 AM

Seems like make test TEST="cgrun075 cgrun076" WAY=optllvm is insufficient to test the llvm backend.

I can tell because I intentionally injected an error and it did not trigger.

newhoggy updated this revision to Diff 14877. Dec 3 2017, 4:38 AM
  • Add size suffix
  • Support more than two arguments
newhoggy updated this revision to Diff 14884. Dec 5 2017, 10:22 AM
  • Fallback to C functions when BMI2 is unavailable
newhoggy updated this revision to Diff 14885. Dec 5 2017, 10:31 AM
  • Remove unused C functions

I think I've incorporated all the requested changes. Let me know if I've missed anything. Thanks!

Attached is the llvm-ng patch for this logic, just so I don't lose it.

Looks better. I'll give it a test before merging.

How's it looking? 😄

Sorry for the delay, @newhoggy; the holidays ended up being a bit more busy than I expected. Unfortunately this still doesn't pass the testsuite with the LLVM way. In particular, cgrun075 fails with (using LLVM 5.0),

=====> cgrun075(optllvm) 1 of 1 [0, 0, 0]
cd "./codeGen/should_run/cgrun075.run" &&  "/mnt/work/ghc/ghc-testing/inplace/test   spaces/ghc-stage2" -o cgrun075 cgrun075.hs -dcore-lint -dcmm-lint -no-user-package-db -rtsopts -fno-warn-missed-specialisations -fshow-warning-groups -fdiagnostics-color=never -fno-diagnostics-show-caret -dno-debug-output  -O -fllvm 
cd "./codeGen/should_run/cgrun075.run" && ./cgrun075  
Actual stdout output differs from expected:
diff -uw "./codeGen/should_run/cgrun075.run/cgrun075.stdout.normalised" "./codeGen/should_run/cgrun075.run/cgrun075.run.stdout.normalised"
--- ./codeGen/should_run/cgrun075.run/cgrun075.stdout.normalised	2018-01-02 18:53:28.847209031 -0500
+++ ./codeGen/should_run/cgrun075.run/cgrun075.run.stdout.normalised	2018-01-02 18:53:28.847209031 -0500
@@ -1,5 +1,14 @@
 OK
-OK
-OK
-OK
+FAIL
+   Input: (40,150)
+Expected: 128
+  Actual: -128
+FAIL
+   Input: (48250,63528)
+Expected: 61472
+  Actual: -4064
+FAIL
+   Input: (824360058,3457084310)
+Expected: 2316176260
+  Actual: -1978791036
 OK
*** unexpected failure for cgrun075(optllvm)

Did this work for you under LLVM?
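The three failures share a pattern: each Actual value equals Expected minus 2^8, 2^16 and 2^32 respectively (128 - 256 = -128, 61472 - 65536 = -4064, 2316176260 - 4294967296 = -1978791036). That is consistent with the 8-, 16- and 32-bit results being sign-extended rather than zero-extended when widened; this is an inference from the numbers only. A small C sketch of the two widenings (zero_extend and sign_extend are illustrative helpers, not GHC code):

```c
#include <stdint.h>

/* Widen the low `bits` bits of x (bits in 1..63) as an unsigned value:
 * all higher bits become zero. */
uint64_t zero_extend(uint64_t x, unsigned bits)
{
    return x & ((UINT64_C(1) << bits) - 1);
}

/* Widen the low `bits` bits of x as a two's-complement signed value: the
 * top bit of the field is copied into every higher bit. The final cast
 * relies on the usual two's-complement conversion of mainstream compilers. */
int64_t sign_extend(uint64_t x, unsigned bits)
{
    uint64_t sign = UINT64_C(1) << (bits - 1);
    x = zero_extend(x, bits);
    return (int64_t)((x ^ sign) - sign);
}
```

Applied to the expected values, sign_extend(128, 8), sign_extend(61472, 16) and sign_extend(2316176260, 32) give -128, -4064 and -1978791036, exactly the Actual values above, while zero_extend leaves all three unchanged.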

newhoggy added a comment. Edited Jan 5 2018, 4:29 PM

This worked for my setup. I'm on OS X:

$ uname -a
Darwin galois.lan 17.3.0 Darwin Kernel Version 17.3.0: Thu Nov  9 18:09:22 PST 2017; root:xnu-4570.31.3~1/RELEASE_X86_64 x86_64

I'm using llvm 5.0.1:

$ brew info llvm
llvm: stable 5.0.1 (bottled), HEAD [keg-only]
Next-gen compiler infrastructure
https://llvm.org/
/usr/local/Cellar/llvm/5.0.1 (4,890 files, 2.3GB)
  Built from source on 2018-01-05 at 11:19:37 with: --with-toolchain

llvm was installed with:

brew install --with-toolchain llvm

My git hash is 8a528f7126dd643ec6b3e61f59741cd546610344

This is my test output:

$ make test TEST="cgrun075 cgrun076" WAY=optllvm
/Applications/Xcode.app/Contents/Developer/usr/bin/make -f ghc.mk testsuite_utils
Configured with: --prefix=/Applications/Xcode.app/Contents/Developer/usr --with-gxx-include-dir=/Applications/Xcode.app/Contents/Developer/Platforms/MacOSX.platform/Developer/SDKs/MacOSX10.13.sdk/usr/include/c++/4.2.1
make[1]: Nothing to be done for `testsuite_utils'.
/Applications/Xcode.app/Contents/Developer/usr/bin/make -C testsuite/tests CLEANUP=1 SUMMARY_FILE=../../testsuite_summary.txt
PYTHON="python3" "python3" ../driver/runtests.py  -e "ghc_compiler_always_flags='-dcore-lint -dcmm-lint -no-user-package-db -rtsopts  -fno-warn-missed-specialisations -fshow-warning-groups -fdiagnostics-color=never -fno-diagnostics-show-caret -dno-debug-output'" -e config.compiler_debugged=False -e ghc_with_native_codegen=1 -e config.have_vanilla=True -e config.have_dynamic=True -e config.have_profiling=True -e ghc_with_threaded_rts=1 -e ghc_with_dynamic_rts=1 -e config.have_interp=True -e config.unregisterised=False -e config.ghc_dynamic_by_default=False -e config.ghc_dynamic=True -e ghc_with_smp=1 -e ghc_with_llvm=0 -e windows=False -e darwin=True -e config.in_tree_compiler=True -e config.cleanup=True -e config.local=True --rootdir=. --config-file=../config/ghc -e 'config.confdir="../config"' -e 'config.platform="x86_64-apple-darwin"' -e 'config.os="darwin"' -e 'config.arch="x86_64"' -e 'config.wordsize="64"' -e 'config.timeout=int() or config.timeout' -e 'config.exeext=""' -e 'config.top="/Users/jky/wrk/haskell-works/ghc/testsuite"' --config 'compiler="/Users/jky/wrk/haskell-works/ghc/inplace/test   spaces/ghc-stage2"' --config 'ghc_pkg="/Users/jky/wrk/haskell-works/ghc/inplace/test   spaces/ghc-pkg"' --config 'haddock="/Users/jky/wrk/haskell-works/ghc/inplace/test   spaces/haddock"' --config 'hp2ps="/Users/jky/wrk/haskell-works/ghc/inplace/test   spaces/hp2ps"' --config 'hpc="/Users/jky/wrk/haskell-works/ghc/inplace/test   spaces/hpc"' --config 'gs="gs"' --config 'timeout_prog="../timeout/install-inplace/bin/timeout"' -e "config.stage=2" --summary-file "../../testsuite_summary.txt"   --rootdir=../../libraries/array/tests  --rootdir=../../libraries/base/tests  --rootdir=../../libraries/binary/tests  --rootdir=../../libraries/bytestring/tests  --rootdir=../../libraries/containers/tests  --rootdir=../../libraries/deepseq/tests  --rootdir=../../libraries/directory/tests  --rootdir=../../libraries/filepath/tests  --rootdir=../../libraries/ghc-compact/tests  
--rootdir=../../libraries/ghc-prim/tests  --rootdir=../../libraries/haskeline/tests  --rootdir=../../libraries/hpc/tests  --rootdir=../../libraries/pretty/tests  --rootdir=../../libraries/process/tests  --rootdir=../../libraries/stm/tests  --rootdir=../../libraries/template-haskell/tests  --rootdir=../../libraries/text/tests  --rootdir=../../libraries/unix/tests \
		 --only=cgrun075  --only=cgrun076 \
		 \
		 --way=optllvm \
		 \
		 \
		 \

WARNING: Unknown WAY optllvm in --way
"gs" -dNODISPLAY -dBATCH -dQUIET -dNOPAUSE "./good.ps"
GhostScript not available for hp2ps tests
Timeout is 300
Found 398 .T files...
Beginning test run at Sat Jan  6 09:43:25 2018 AEDT
====> Scanning ./ado/all.T
...
====> Scanning ../../libraries/unix/tests/libposix/all.T
=====> cgrun075(normal) 1 of 2 [0, 0, 0]
cd "./codeGen/should_run/cgrun075.run" &&  "/Users/jky/wrk/haskell-works/ghc/inplace/test   spaces/ghc-stage2" -o cgrun075 cgrun075.hs -dcore-lint -dcmm-lint -no-user-package-db -rtsopts -fno-warn-missed-specialisations -fshow-warning-groups -fdiagnostics-color=never -fno-diagnostics-show-caret -dno-debug-output
cd "./codeGen/should_run/cgrun075.run" && ./cgrun075
=====> cgrun075(g1) 1 of 2 [0, 0, 0]
cd "./codeGen/should_run/cgrun075.run" &&  "/Users/jky/wrk/haskell-works/ghc/inplace/test   spaces/ghc-stage2" -o cgrun075 cgrun075.hs -dcore-lint -dcmm-lint -no-user-package-db -rtsopts -fno-warn-missed-specialisations -fshow-warning-groups -fdiagnostics-color=never -fno-diagnostics-show-caret -dno-debug-output
cd "./codeGen/should_run/cgrun075.run" && ./cgrun075 +RTS -G1 -RTS
=====> cgrun076(normal) 2 of 2 [0, 0, 0]
cd "./codeGen/should_run/cgrun076.run" &&  "/Users/jky/wrk/haskell-works/ghc/inplace/test   spaces/ghc-stage2" -o cgrun076 cgrun076.hs -dcore-lint -dcmm-lint -no-user-package-db -rtsopts -fno-warn-missed-specialisations -fshow-warning-groups -fdiagnostics-color=never -fno-diagnostics-show-caret -dno-debug-output
cd "./codeGen/should_run/cgrun076.run" && ./cgrun076
=====> cgrun076(g1) 2 of 2 [0, 0, 0]
cd "./codeGen/should_run/cgrun076.run" &&  "/Users/jky/wrk/haskell-works/ghc/inplace/test   spaces/ghc-stage2" -o cgrun076 cgrun076.hs -dcore-lint -dcmm-lint -no-user-package-db -rtsopts -fno-warn-missed-specialisations -fshow-warning-groups -fdiagnostics-color=never -fno-diagnostics-show-caret -dno-debug-output
cd "./codeGen/should_run/cgrun076.run" && ./cgrun076 +RTS -G1 -RTS

SUMMARY for test run started at Sat Jan  6 09:43:25 2018 AEDT
 0:00:05 spent to go through
       2 total tests, which gave rise to
      20 test cases, of which
      16 were skipped

       0 had missing libraries
       4 expected passes
       0 expected failures

       0 caused framework failures
       0 caused framework warnings
       0 unexpected passes
       0 unexpected failures
       0 unexpected stat failures

One thing that seems suspicious is the absence of -fllvm in my output despite having supplied WAY=optllvm.

I've reproduced the problem. Looks like for whatever reason, I need to use WAY=llvm instead of WAY=optllvm.

angerman requested changes to this revision.Jan 5 2018, 7:04 PM

This is rather concerning. It seems your test driver decided you did not have optllvm:

WARNING: Unknown WAY optllvm in --way

According to the testsuite/config/ghc script:

if (ghc_with_llvm == 1 and not config.unregisterised):
    config.compile_ways.append('optllvm')
    config.run_ways.append('optllvm')

optllvm should be included whenever GHC is built with LLVM and is not unregisterised.

Compared to llvm, optllvm only adds -O to the invocation:

'llvm'         : ['-fllvm'],
'optllvm'      : ['-O', '-fllvm'],

So it seems the optimizer breaks BMI somewhere along the line.

This is something we must fix, I'm afraid!


You could just compile the test by hand and bypass the testsuite, which might be the quicker way forward if the testsuite refuses to work with optllvm for you.

This revision now requires changes to proceed.Jan 5 2018, 7:04 PM
newhoggy added a comment. Edited Jan 7 2018, 12:42 AM

I made the following changes to truncate the word to the correct word size and the tests pass.

 instance Pdep Word where
-  pdep (W#   src#) (W#   mask#) = W#   (pdep#   src# mask#)
+  pdep (W#   src#) (W#   mask#) = W#   (pdep#   src# mask#) .&. 0xffffffffffffffff

 instance Pdep Word8 where
-  pdep (W8#  src#) (W8#  mask#) = W8#  (pdep8#  src# mask#)
+  pdep (W8#  src#) (W8#  mask#) = W8#  (pdep8#  src# mask#) .&. 0xff

 instance Pdep Word16 where
-  pdep (W16# src#) (W16# mask#) = W16# (pdep16# src# mask#)
+  pdep (W16# src#) (W16# mask#) = W16# (pdep16# src# mask#) .&. 0xffff

 instance Pdep Word32 where
-  pdep (W32# src#) (W32# mask#) = W32# (pdep32# src# mask#)
+  pdep (W32# src#) (W32# mask#) = W32# (pdep32# src# mask#) .&. 0xffffffff

I shouldn't have to do this. It would seem that there is something about unboxed functions that I don't quite understand, or something about the way I should be generating the LLVM code.

Maybe the generated LLVM code needs extra instructions to mask away the extraneous bits?
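The masks in the workaround are simply "the low n bits" for each width (8, 16, 32, 64). A minimal sketch of that narrowing on a 64-bit word, assuming the stray upper bits come from how the narrow result was widened; `widthMask` and `narrowTo` are hypothetical helper names, and 0xFFFFFFFF8A0E0B84 is the 32-bit value 0x8A0E0B84 (2316176260) after sign extension to 64 bits:

```haskell
import Data.Bits (bit, (.&.))
import Data.Word (Word64)
import Numeric (showHex)

-- A mask with the low n bits set, e.g. widthMask 8 == 0xff.
widthMask :: Int -> Word64
widthMask n
  | n >= 64   = maxBound   -- all 64 bits set
  | otherwise = bit n - 1

-- Keep only the low n bits of a 64-bit word, discarding whatever
-- sits above the narrow result (what the .&. workaround does).
narrowTo :: Int -> Word64 -> Word64
narrowTo n w = w .&. widthMask n

main :: IO ()
main = do
  -- The same masks as in the workaround, one per width: prints True
  print (map widthMask [8, 16, 32, 64] == [0xff, 0xffff, 0xffffffff, 0xffffffffffffffff])
  -- Hypothetical register contents: a 32-bit result with stray upper bits
  let dirty = 0xFFFFFFFF8A0E0B84 :: Word64
  putStrLn (showHex (narrowTo 32 dirty) "")  -- prints 8a0e0b84
```

Masking only hides the symptom; the upper bits should not be set in the first place.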

It would be nice to see the diff in the generated bytecode.

The diff was hard to produce because the generated variable names differ between the two builds, but after renaming the variables by hand I managed to pin it down.

This is the diff I have:

   %aaaaa = call ccc i32 (i32, i32) @llvm.x86.bmi.pdep.32( i32 xxxxx, i32 yyyyy )
   %bbbbb = sext i32 %aaaaa to i64
   store i64 %bbbbb, i64* %ccccc
   %ddddd = load i64, i64* %ccccc
   store i64 %ddddd, i64* %eeeee
+  %fffff = load i64, i64* %eeeee
+  %ggggg = and i64 %fffff, 4294967295
+  store i64 %ggggg, i64* %hhhhh
+  %iiiii = load i64, i64* %hhhhh
+  store i64 %iiiii, i64* %jjjjj
   %kkkkk = ptrtoint i8* @base_GHCziWord_W32zh_con_info to i64
   %lllll = load i64*, i64** %Hp_Var
   %mmmmm = getelementptr inbounds i64, i64* %lllll, i32 -1
   store i64 %kkkkk, i64* %mmmmm, !tbaa !3
-  %nnnnn = load i64, i64* %eeeee
+  %nnnnn = load i64, i64* %jjjjj
   %ooooo = load i64*, i64** %Hp_Var
   %ppppp = getelementptr inbounds i64, i64* %ooooo, i32 0
   store i64 %nnnnn, i64* %ppppp, !tbaa !3

This is the code before the fix:

{-# LANGUAGE MagicHash #-}

module Main where

import Data.Bits
import Data.Monoid
import Data.Word
import GHC.Base
import GHC.Word
import System.Environment

pdep :: Word32 -> Word32 -> Word32
pdep (W32# s) (W32# m) = W32# (pdep32# s m)

main :: IO ()
main = putStrLn $ "pdep: " <> show (pdep 824360058 3457084310)

This is the code after:

{-# LANGUAGE MagicHash #-}

module Main where

import Data.Bits
import Data.Monoid
import Data.Word
import GHC.Base
import GHC.Word
import System.Environment

pdep :: Word32 -> Word32 -> Word32
pdep (W32# s) (W32# m) = W32# (pdep32# s m `and#` 0xffffffff##)

main :: IO ()
main = putStrLn $ "pdep: " <> show (pdep 824360058 3457084310)
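To know what either program ought to print independently of any backend, a portable bit-by-bit reference for 32-bit pdep can act as an oracle. This is a sketch; `refPdep32` is a hypothetical helper, not how GHC implements `pdep32#`:

```haskell
import Data.Bits (setBit, testBit)
import Data.List (foldl')
import Data.Word (Word32)

-- Walk the mask from bit 0 upward; at each set bit of the mask,
-- deposit the next unused low-order bit of src at that position.
refPdep32 :: Word32 -> Word32 -> Word32
refPdep32 src mask = snd (foldl' step (0, 0) [0 .. 31])
  where
    -- k: index of the next src bit to deposit; acc: result so far
    step (k, acc) pos
      | testBit mask pos = (k + 1, if testBit src k then setBit acc pos else acc)
      | otherwise        = (k, acc)

main :: IO ()
main = print (refPdep32 824360058 3457084310)  -- prints 2316176260
```

2316176260 is 0x8A0E0B84, which has bit 31 set, so how the 32-bit result is widened to a 64-bit word matters.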

I've reproduced the problem with the following snippet of llvm code:

declare ccc i32 @llvm.x86.bmi.pdep.32(i32, i32)

@.str2 = private unnamed_addr constant [6 x i8] c"%lld\0A\00", align 1

declare i32 @printf(i8*, ...) nounwind

; Definition of main function
define i32 @main() {
  %ss = getelementptr inbounds [6 x i8], [6 x i8]*  @.str2, i32 0, i32 0

  %res1 = call ccc i32 (i32, i32) @llvm.x86.bmi.pdep.32( i32 824360058, i32 3457084310 )
  %res2 = sext i32 %res1 to i64
  call i32 (i8*, ...) @printf( i8* %ss, i64 %res2)

  ret i32 0
}

When I run this code, I get the incorrect result:

$ lli -mcpu=haswell helloWorld.ll
-1978791036

If I then change the sext to a zext, I get the correct result:

$ lli -mcpu=haswell helloWorld.ll
2316176260

What's happening is that the current implementation emits a sign extension (sext) when widening the 32-bit intrinsic result to a 64-bit word, so any result with bit 31 set picks up 1s in the upper 32 bits.

I need to modify the code to zero extend the result instead.
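The two extensions can be mimicked in plain Haskell: converting through `Int32` behaves like `sext i32 to i64`, while converting straight from `Word32` behaves like `zext`. A small sketch with hypothetical helper names that reproduces both lli outputs above:

```haskell
import Data.Int (Int32, Int64)
import Data.Word (Word32, Word64)

-- Like LLVM's `sext i32 to i64`: reinterpret the 32 bits as signed,
-- then widen, copying bit 31 into bits 32..63.
sext32to64 :: Word32 -> Word64
sext32to64 w = fromIntegral (fromIntegral (fromIntegral w :: Int32) :: Int64)

-- Like LLVM's `zext i32 to i64`: widen, filling bits 32..63 with zeros.
zext32to64 :: Word32 -> Word64
zext32to64 = fromIntegral

main :: IO ()
main = do
  let r = 2316176260 :: Word32                 -- the pdep result from above
  print (fromIntegral (sext32to64 r) :: Int64) -- prints -1978791036, as %lld shows it
  print (zext32to64 r)                         -- prints 2316176260
```

For results with bit 31 clear the two agree, which is why the bug only shows up for some inputs.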

newhoggy updated this revision to Diff 15157.Jan 19 2018, 5:58 AM
  • Use zero extension rather than sign extension for pdep and pext intrinsics
  • Remove unused code
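The same widening concern applies to pext. A portable reference for both operations, plus a round-trip check, is one way to validate backend output; `refPdep32`, `refPext32` and `roundTrips` are hypothetical helpers, not GHC's implementations:

```haskell
import Data.Bits (bit, popCount, setBit, testBit, (.&.))
import Data.List (foldl')
import Data.Word (Word32)

-- pdep: scatter the low-order bits of src into the set-bit positions of mask.
refPdep32 :: Word32 -> Word32 -> Word32
refPdep32 src mask = snd (foldl' step (0, 0) [0 .. 31])
  where
    step (k, acc) pos
      | testBit mask pos = (k + 1, if testBit src k then setBit acc pos else acc)
      | otherwise        = (k, acc)

-- pext: gather the bits of src selected by mask and pack them, in order,
-- into the low-order bits of the result.
refPext32 :: Word32 -> Word32 -> Word32
refPext32 src mask = snd (foldl' step (0, 0) [0 .. 31])
  where
    step (k, acc) pos
      | testBit mask pos = (k + 1, if testBit src pos then setBit acc k else acc)
      | otherwise        = (k, acc)

-- Extracting what was deposited recovers the low (popCount mask) bits of src.
roundTrips :: Word32 -> Word32 -> Bool
roundTrips src mask = refPext32 (refPdep32 src mask) mask == src .&. lowBits
  where
    n       = popCount mask
    lowBits = if n >= 32 then maxBound else bit n - 1

main :: IO ()
main = do
  print (refPext32 2316176260 3457084310)                -- prints 179322
  print (and [roundTrips s m | s <- samples, m <- samples]) -- prints True
  where
    samples = [0, 1, 0xff, 824360058, 3457084310, 0x8A0E0B84, maxBound]
```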

Tests now also pass with the LLVM backend:

$ make test TEST="cgrun075 cgrun076"
SUMMARY for test run started at Sat Jan 20 01:29:39 2018 AEDT
 0:00:06 spent to go through
       2 total tests, which gave rise to
      20 test cases, of which
      16 were skipped

       0 had missing libraries
       4 expected passes
       0 expected failures

       0 caused framework failures
       0 caused framework warnings
       0 unexpected passes
       0 unexpected failures
       0 unexpected stat failures

$ make test TEST="cgrun075 cgrun076" WAY=llvm
SUMMARY for test run started at Sat Jan 20 01:30:36 2018 AEDT
 0:00:05 spent to go through
       2 total tests, which gave rise to
      22 test cases, of which
      20 were skipped

       0 had missing libraries
       2 expected passes
       0 expected failures

       0 caused framework failures
       0 caused framework warnings
       0 unexpected passes
       0 unexpected failures
       0 unexpected stat failures
bgamari accepted this revision.Jan 21 2018, 10:48 AM

Yay, great work @newhoggy. Thanks for sticking with this! I'll test and merge shortly.

This revision was automatically updated to reflect the committed changes.

Thanks Ben!

maoe added a subscriber: maoe.Mar 2 2018, 4:06 PM
newhoggy added a comment. Edited Jun 16 2018, 10:25 AM
This comment has been deleted.