WIP Add support for SIMD integer instructions
Needs Revision, Public

Authored by newhoggy on Jun 21 2018, 7:33 AM.

Details

Summary

Add support for SIMD integer instructions

Diff Detail

Repository
rGHC Glasgow Haskell Compiler
Branch
arcpatch-D4882_1
Lint
Lint Warnings
Excuse: Ignore Line Too Long
Severity  Location                                Code  Message
Warning   compiler/cmm/CmmCommonBlockElim.hs:162  TXT3  Line Too Long
Warning   compiler/cmm/CmmExpr.hs:215             TXT3  Line Too Long
Warning   compiler/cmm/CmmExpr.hs:248             TXT3  Line Too Long
Warning   compiler/cmm/CmmInfo.hs:442             TXT3  Line Too Long
Warning   compiler/cmm/CmmInfo.hs:464             TXT3  Line Too Long
Warning   compiler/cmm/CmmInfo.hs:472             TXT3  Line Too Long
Warning   compiler/cmm/CmmNode.hs:459             TXT3  Line Too Long
Warning   compiler/cmm/CmmNode.hs:490             TXT3  Line Too Long
Warning   compiler/cmm/CmmOpt.hs:56               TXT3  Line Too Long
Warning   compiler/cmm/CmmOpt.hs:97               TXT3  Line Too Long
Warning   compiler/cmm/CmmOpt.hs:100              TXT3  Line Too Long
Warning   compiler/cmm/CmmOpt.hs:190              TXT3  Line Too Long
Warning   compiler/cmm/CmmOpt.hs:197              TXT3  Line Too Long
Warning   compiler/cmm/CmmOpt.hs:199              TXT3  Line Too Long
Warning   compiler/cmm/CmmOpt.hs:252              TXT3  Line Too Long
Warning   compiler/cmm/CmmOpt.hs:353              TXT3  Line Too Long
Warning   compiler/cmm/CmmOpt.hs:356              TXT3  Line Too Long
Warning   compiler/cmm/CmmOpt.hs:359              TXT3  Line Too Long
Warning   compiler/cmm/CmmOpt.hs:395              TXT3  Line Too Long
Warning   compiler/cmm/CmmOpt.hs:401              TXT3  Line Too Long
Warning   compiler/cmm/CmmSink.hs:752             TXT3  Line Too Long
Warning   compiler/cmm/CmmUtils.hs:230            TXT3  Line Too Long
Warning   compiler/cmm/CmmUtils.hs:240            TXT3  Line Too Long
Warning   compiler/cmm/CmmUtils.hs:244            TXT3  Line Too Long
Warning   compiler/cmm/CmmUtils.hs:286            TXT3  Line Too Long
Unit
No Unit Test Coverage
Build Status
Buildable 21525
Build 48950: [GHC] Linux/amd64: Continuous Integration
Build 48949: [GHC] OSX/amd64: Continuous Integration
Build 48948: [GHC] Windows/amd64: Continuous Integration
Build 48947: arc lint + arc unit
newhoggy created this revision. Jun 21 2018, 7:33 AM

I've pushed the code changes I have so far. GHC itself compiles, but the change does not work yet: the assembly it generates is rejected by the assembler.

Here is the Haskell code that I'm compiling and the errors in the resulting assembly:

$ cat Test.hs
{-# LANGUAGE BangPatterns  #-}
{-# LANGUAGE MagicHash     #-}
{-# LANGUAGE UnboxedTuples #-}

module Main where

import GHC.Exts

main :: IO ()
main = do
 case unpackInt64X4# (broadcastInt64X4# 1#) of
   (# a, b, c, d #) -> print (I# a)
$ ./bin/ghc -mavx512f Test.hs
[1 of 1] Compiling Main             ( Test.hs, Test.o )

/var/folders/8d/3xbnllbx76gcbk3wwy086vlm0000gn/T/ghc42845_0/ghc_2.s:72:2: error:
     error: instruction requires: AVX-512 ISA AVX-512 VL ISA
            vmovdqu64 %xmm0,%xmm0
            ^
   |
72 |         vmovdqu64 %xmm0,%xmm0
   |  ^

/var/folders/8d/3xbnllbx76gcbk3wwy086vlm0000gn/T/ghc42845_0/ghc_2.s:73:2: error:
     error: instruction requires: AVX-512 ISA AVX-512 VL ISA
            vmovdqu64 %xmm0,%xmm0
            ^
   |
73 |         vmovdqu64 %xmm0,%xmm0
   |  ^

/var/folders/8d/3xbnllbx76gcbk3wwy086vlm0000gn/T/ghc42845_0/ghc_2.s:74:18: error:
     error: invalid operand for instruction
            vmovdqu32 %xmm0,%rax
                            ^~~~
   |
74 |         vmovdqu32 %xmm0,%rax
   |                  ^

/var/folders/8d/3xbnllbx76gcbk3wwy086vlm0000gn/T/ghc42845_0/ghc_2.s:75:19: error:
     error: invalid operand for instruction
            vpshufd $1,%xmm0,%rbx
                             ^~~~
   |
75 |         vpshufd $1,%xmm0,%rbx
   |                   ^

/var/folders/8d/3xbnllbx76gcbk3wwy086vlm0000gn/T/ghc42845_0/ghc_2.s:76:19: error:
     error: invalid operand for instruction
            vpshufd $2,%xmm0,%rbx
                             ^~~~
   |
76 |         vpshufd $2,%xmm0,%rbx
   |                   ^

/var/folders/8d/3xbnllbx76gcbk3wwy086vlm0000gn/T/ghc42845_0/ghc_2.s:77:19: error:
     error: invalid operand for instruction
            vpshufd $3,%xmm0,%rbx
                             ^~~~
   |
77 |         vpshufd $3,%xmm0,%rbx
   |                   ^
`gcc' failed in phase `Assembler'. (Exit code: 1)

I'm also pasting the problematic generated assembly here:

$ ./bin/ghc -mavx2 Test.hs -S
$ cat Test.s
.section	__TEXT,__cstring,cstring_literals
.align 1
.align 0
_r1yJ_bytes:
	.asciz "main"
.data
.align 3
.align 0
_r1z4_closure:
	.quad	_ghczmprim_GHCziTypes_TrNameS_con_info
	.quad	_r1yJ_bytes
.section	__TEXT,__cstring,cstring_literals
.align 1
.align 0
_r1z5_bytes:
	.asciz "Main"
.data
.align 3
.align 0
_r1z6_closure:
	.quad	_ghczmprim_GHCziTypes_TrNameS_con_info
	.quad	_r1z5_bytes
.data
.align 3
.align 0
.globl _Main_zdtrModule_closure
_Main_zdtrModule_closure:
	.quad	_ghczmprim_GHCziTypes_Module_con_info
	.quad	_r1z4_closure+1
	.quad	_r1z6_closure+1
	.quad	3
.data
.align 3
.align 0
_u1JW_srt:
	.quad	_stg_SRT_2_info
	.quad	_base_SystemziIO_print_closure
	.quad	_base_GHCziShow_zdfShowInt_closure
	.quad	0
.text
.align 3
_Main_main_info_dsp:
.align 3
	.quad	0
	.long	21
	.long	_u1JW_srt-(_Main_main_info)+0
.globl _Main_main_info
_Main_main_info:
Lc1JP:
	leaq -16(%rbp),%rax
	cmpq %r15,%rax
	jb Lc1JS
Lc1JT:
	addq $16,%r12
	cmpq 856(%r13),%r12
	ja Lc1JV
Lc1JU:
	subq $8,%rsp
	movq %r13,%rax
	movq %rbx,%rsi
	movq %rax,%rdi
	xorl %eax,%eax
	call _newCAF
	addq $8,%rsp
	testq %rax,%rax
	je Lc1JL
Lc1JK:
	leaq _stg_bh_upd_frame_info(%rip),%rbx
	movq %rbx,-16(%rbp)
	movq %rax,-8(%rbp)
	insertps $0,Ln1K1(%rip),%xmm0
	vmovdqu64 %xmm0,%xmm0
	vmovdqu64 %xmm0,%xmm0
	vmovdqu32 %xmm0,%rax
	vpshufd $1,%xmm0,%rbx
	vpshufd $2,%xmm0,%rbx
	vpshufd $3,%xmm0,%rbx
	leaq _ghczmprim_GHCziTypes_Izh_con_info(%rip),%rbx
	movq %rbx,-8(%r12)
	movq %rax,(%r12)
	leaq -7(%r12),%rax
	movq %rax,%rsi
	leaq _base_GHCziShow_zdfShowInt_closure(%rip),%r14
	leaq _base_SystemziIO_print_closure(%rip),%rbx
	addq $-16,%rbp
	jmp _stg_ap_pp_fast
Lc1JL:
	jmp *(%rbx)
Lc1JV:
	movq $16,904(%r13)
Lc1JS:
	jmp *-16(%r13)
	.long  _Main_main_info - _Main_main_info_dsp
.const
.align 3
.align 3
Ln1K1:
	.quad	1
.data
.align 3
.align 0
.globl _Main_main_closure
_Main_main_closure:
	.quad	_Main_main_info
	.quad	0
	.quad	0
	.quad	0
.data
.align 3
.align 0
_u1Kf_srt:
	.quad	_stg_SRT_2_info
	.quad	_base_GHCziTopHandler_runMainIO_closure
	.quad	_Main_main_closure
	.quad	0
.text
.align 3
_ZCMain_main_info_dsp:
.align 3
	.quad	0
	.long	21
	.long	_u1Kf_srt-(_ZCMain_main_info)+0
.globl _ZCMain_main_info
_ZCMain_main_info:
Lc1Kc:
	leaq -16(%rbp),%rax
	cmpq %r15,%rax
	jb Lc1Kd
Lc1Ke:
	subq $8,%rsp
	movq %r13,%rax
	movq %rbx,%rsi
	movq %rax,%rdi
	xorl %eax,%eax
	call _newCAF
	addq $8,%rsp
	testq %rax,%rax
	je Lc1Kb
Lc1Ka:
	leaq _stg_bh_upd_frame_info(%rip),%rbx
	movq %rbx,-16(%rbp)
	movq %rax,-8(%rbp)
	leaq _Main_main_closure(%rip),%r14
	leaq _base_GHCziTopHandler_runMainIO_closure(%rip),%rbx
	addq $-16,%rbp
	jmp _stg_ap_p_fast
Lc1Kb:
	jmp *(%rbx)
Lc1Kd:
	jmp *-16(%r13)
	.long  _ZCMain_main_info - _ZCMain_main_info_dsp
.data
.align 3
.align 0
.globl _ZCMain_main_closure
_ZCMain_main_closure:
	.quad	_ZCMain_main_info
	.quad	0
	.quad	0
	.quad	0
.subsections_via_symbols
.ident "GHC 8.5.20180620"

What I have trouble understanding is why there are two additional uses of the vmovdqu64 instruction preceding the vmovdqu32 %xmm0,%rax that I generate:

	vmovdqu64 %xmm0,%xmm0
	vmovdqu64 %xmm0,%xmm0
        vmovdqu32 %xmm0,%rax

Also, I don't understand assembly well enough to know what's wrong with the use of the %rax register in the second operand and how to fix it:

/var/folders/8d/3xbnllbx76gcbk3wwy086vlm0000gn/T/ghc42869_0/ghc_2.s:74:18: error:
     error: invalid operand for instruction
            vmovdqu32 %xmm0,%rax
                            ^~~~
   |
74 |         vmovdqu32 %xmm0,%rax
   |                  ^

I figured I'm going to need to figure out how gcc compiles code into assembly to inform me of the kind of assembly I should be generating, so I put together a small GitHub project where I can drop C files and have them compiled into assembly:

https://github.com/haskell-works/gcc-reverse-engineering

For example:

$ git clone git@github.com:haskell-works/gcc-reverse-engineering.git
$ make
/Applications/Xcode.app/Contents/Developer/usr/bin/make -C avx2
gcc -O2 -mavx2 -S _mm512_extracti32x8_epi32.c
gcc -O2 -mavx2 -S _mm512_extracti64x4_epi64.c
$ cat avx2/_mm512_extracti64x4_epi64.s
	.section	__TEXT,__text,regular,pure_instructions
	.macosx_version_min 10, 13
	.globl	_example                ## -- Begin function example
	.p2align	4, 0x90
_example:                               ## @example
	.cfi_startproc
## BB#0:
	pushq	%rbp
Lcfi0:
	.cfi_def_cfa_offset 16
Lcfi1:
	.cfi_offset %rbp, -16
	movq	%rsp, %rbp
Lcfi2:
	.cfi_def_cfa_register %rbp
	vmovups	%ymm0, (%rdi)
	movq	%rdi, %rax
	popq	%rbp
	vzeroupper
	retq
	.cfi_endproc
                                        ## -- End function
	.globl	_main                   ## -- Begin function main
	.p2align	4, 0x90
_main:                                  ## @main
	.cfi_startproc
## BB#0:
	pushq	%rbp
Lcfi3:
	.cfi_def_cfa_offset 16
Lcfi4:
	.cfi_offset %rbp, -16
	movq	%rsp, %rbp
Lcfi5:
	.cfi_def_cfa_register %rbp
	popq	%rbp
	retq
	.cfi_endproc
                                        ## -- End function

.subsections_via_symbols

I figured I'm going to need to figure out how gcc compiles code into assembly to inform me of the kind of assembly I should be generating, so I put together a small GitHub project where I can drop C files and have them compiled into assembly:

I always used https://godbolt.org/ for that.

@newhoggy Some of the assembler errors are quite clear. For vmovdqu64 the message is error: instruction requires: AVX-512 ISA AVX-512 VL ISA, and the Intel documentation also lists these as AVX-512 operations. AVX-512 is supported only on very high-end Xeon Phi processors, or on the X generation of Core i3, i5 and i7 processors.
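To make the ISA requirement concrete, here is a minimal sketch of why the assembler asks for both extensions. It is a toy model, not GHC code: the names VecReg, Feature, evexMoveRequires and missingFeatures are made up for illustration, and it assumes that -mavx512f enables only the AVX-512 Foundation subset. The one fact it encodes is that the 128-bit and 256-bit forms of vmovdqu32/vmovdqu64 need AVX-512 VL on top of AVX-512 F, while the 512-bit form needs only AVX-512 F.

```haskell
module Main where

-- Toy model (not GHC code): which ISA extensions the EVEX-encoded
-- vmovdqu32/vmovdqu64 instructions need for a given register width.
data VecReg = XMM | YMM | ZMM
  deriving (Eq, Show)

data Feature = AVX512F | AVX512VL
  deriving (Eq, Show)

-- The 512-bit form needs only AVX-512 Foundation; the 128- and 256-bit
-- forms additionally need the Vector Length extension.
evexMoveRequires :: VecReg -> [Feature]
evexMoveRequires ZMM = [AVX512F]
evexMoveRequires _   = [AVX512F, AVX512VL]

-- Features that are required but not in the enabled set.
missingFeatures :: [Feature] -> VecReg -> [Feature]
missingFeatures enabled reg =
  filter (`notElem` enabled) (evexMoveRequires reg)

main :: IO ()
main = do
  -- Assumption: -mavx512f enables AVX512F only.
  print (missingFeatures [AVX512F] XMM)
  print (missingFeatures [AVX512F] ZMM)
```

Under that assumption, vmovdqu64 %xmm0,%xmm0 is still missing AVX-512 VL even when AVX-512 F is available.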

Also, I don't understand assembly well enough to know what's wrong with the use of the %rax register in the second operand and how to fix it:

The instruction you pointed out, VMOVDQU32, accepts only vector registers (XMM, YMM, ZMM) or a memory location as its operands, according to the documentation (https://www.felixcloutier.com/x86/MOVDQU:VMOVDQU8:VMOVDQU16:VMOVDQU32:VMOVDQU64.html). %rax is a general-purpose register, so the assembler rejects it.
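The operand rule can be sketched as a small check. This is an illustrative model only: Operand, validVmovdqu and lowQuadMove are hypothetical names, not part of GHC's native code generator. The sketch uses vmovq as one AVX instruction that copies the low 64 bits of an XMM register into a 64-bit general-purpose register.

```haskell
module Main where

-- Toy operand model (not GHC's Reg/Operand types).
data Operand
  = GPR String  -- general-purpose register, e.g. "%rax"
  | Vec String  -- vector register, e.g. "%xmm0"
  | Mem String  -- memory operand, e.g. "(%r12)"
  deriving (Eq, Show)

render :: Operand -> String
render (GPR r) = r
render (Vec r) = r
render (Mem m) = m

-- vmovdqu32/vmovdqu64 move between vector registers and memory only,
-- and at most one of the two operands may be memory.
validVmovdqu :: Operand -> Operand -> Bool
validVmovdqu (GPR _) _       = False
validVmovdqu _ (GPR _)       = False
validVmovdqu (Mem _) (Mem _) = False
validVmovdqu _ _             = True

-- One way to copy the low 64-bit lane of a vector register into a
-- general-purpose register.
lowQuadMove :: Operand -> Operand -> Maybe String
lowQuadMove src@(Vec _) dst@(GPR _) =
  Just ("vmovq " ++ render src ++ "," ++ render dst)
lowQuadMove _ _ = Nothing

main :: IO ()
main = do
  print (validVmovdqu (Vec "%xmm0") (GPR "%rax"))
  print (lowQuadMove (Vec "%xmm0") (GPR "%rax"))
```

The same restriction explains the vpshufd errors: its destination must also be a vector register. An extract-style instruction such as vpextrq $1,%xmm0,%rbx is the kind of thing to look at for the other lanes.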

why there is an additional two uses of the vmovdqu64 instruction preceding the vmovdqu32 %xmm0,%rax

The root of this might lie in the generated C--. I have had to modify certain parts of the Stg -> Cmm layer to rectify certain translations. You would have to stare at the generated C-- for a while to debug this one.

Another concern you will face shortly with the AVX2 family is that the majority of its instructions work on YMM registers. Your current operations seem to use the XMM registers, but once you start using YMM-based operations I think we will have to extend some portions of the infrastructure to support YMM registers, which I believe are currently unsupported.
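A quick way to see which register file a vector type needs is to multiply the lane count by the element width. The sketch below is a standalone illustration; VecRegClass, vectorBits and regClassFor are hypothetical helpers, not existing GHC functions.

```haskell
module Main where

data VecRegClass = XMM | YMM | ZMM
  deriving (Eq, Show)

-- Total width of a vector: number of lanes times bits per lane.
vectorBits :: Int -> Int -> Int
vectorBits lanes elemBits = lanes * elemBits

-- Map the total width to the x86 register class that holds it.
regClassFor :: Int -> Int -> Maybe VecRegClass
regClassFor lanes elemBits =
  case vectorBits lanes elemBits of
    128 -> Just XMM
    256 -> Just YMM
    512 -> Just ZMM
    _   -> Nothing

main :: IO ()
main = do
  print (regClassFor 4 64)  -- Int64X4#: 4 lanes of 64 bits
  print (regClassFor 2 64)  -- Int64X2#: 2 lanes of 64 bits
```

By this arithmetic Int64X4# is a 256-bit value, so the %xmm0 operands in the generated assembly above can hold only half of it.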

@newhoggy Regarding your comment on D4813 about how to merge your code: I suggest raising a PR against my wip/simd-ncg-support branch. Otherwise you might end up duplicating a lot of the effort, and some of the more important types like Format or important functions like getVecRegisterReg might change in their representation, in which case you might have to rewrite a lot of your code. A PR against my branch would let it accumulate all the changes related to vectorization in the NCG.

However, I am not fully aware of how the merging of a long-running branch takes place in GHC, so @bgamari or @carter might suggest some better alternatives for you to merge your code.

Unfortunately, I've not been able to clone from https://github.com/Abhiroop/ghc and do a build there.

I get errors like these.

Cloning into '/Users/jky/wrk/haskell-works/ghc-simd/.arc-linters/arcanist-external-json-linter'...
Username for 'https://github.com':

I will try working off a clone of the ghc git repository and submit a patch to your fork that way.

@newhoggy The project that you are trying to build is an old fork of GHC which I haven't updated in a long time. My changes are all available here: https://github.com/Abhiroop/ghc-1

And all the changes are available in this branch: https://github.com/Abhiroop/ghc-1/tree/wip/simd-ncg-support

This should build fine.

Thanks for your help.

I'm still having problems, which seem to be related to cloning the project rather than the building:

Cloning into '/Users/jky/wrk/haskell-works/ghc-simd/.arc-linters/arcanist-external-json-linter'...
Username for 'https://github.com': newhoggy
Password for 'https://newhoggy@github.com':
remote: Invalid username or password.
fatal: Authentication failed for 'https://github.com/Abhiroop/arcanist-external-json-linter.git/'
fatal: clone of 'https://github.com/Abhiroop/arcanist-external-json-linter.git' into submodule path '/Users/jky/wrk/haskell-works/ghc-simd/.arc-linters/arcanist-external-json-linter' failed
Failed to clone '.arc-linters/arcanist-external-json-linter'. Retry scheduled
Cloning into '/Users/jky/wrk/haskell-works/ghc-simd/hadrian'...
Username for 'https://github.com':

I just had a thought. Perhaps the problem might be my use of the https URL instead of a git url.

I'm trying this now: git clone git@github.com:Abhiroop/ghc-1.git ghc-simd --recursive

I've found there were three call sites at which the MO_V_Insert constructor was called.

For the code:

unpackInt64X4# (broadcastInt64X4# 1#)

The call site that created the CmmMachOp (MO_V_Insert _ _) value that was passed into the vector_unpack function is this one:

doVecBroadcastOp :: Maybe MachOp  -- Cast from element to vector component
                 -> CmmType       -- Type of vector
                 -> CmmExpr       -- Initial vector
                 -> CmmExpr     -- Elements
                 -> CmmFormal     -- Destination for result
                 -> FCode ()
doVecBroadcastOp maybe_pre_write_cast ty _ es res = do
    --emitAssign (CmmLocal dst) z
    vecBroadcast es 0
  where
    vecBroadcast :: CmmExpr -> Int -> FCode ()
    vecBroadcast e i = do
        dst <- newTemp ty
        if isFloatType (vecElemType ty)
          then emitAssign (CmmLocal dst) (CmmMachOp (MO_VF_Broadcast len wid)
                                                    [cast e, iLit])
               --TODO : Add the MachOp MO_V_Broadcast
          else emitAssign (CmmLocal dst) (CmmMachOp (MO_V_Insert len wid) -- <-------
                                                    [cast e, iLit])
        emitAssign (CmmLocal res) (CmmReg (CmmLocal dst))
      where
        -- vector indices are always 32-bits
        iLit = CmmLit (CmmInt (toInteger i) W32)

    cast :: CmmExpr -> CmmExpr
    cast val = case maybe_pre_write_cast of
                 Nothing   -> val
                 Just cast -> CmmMachOp cast [val]

    len :: Length
    len = vecLength ty

    wid :: Width
    wid = typeWidth (vecElemType ty)

For the code:

unpackInt64X4# (packInt64X4# (# 1#, 2#, 3#, 4# #))

The call site that created the CmmMachOp (MO_V_Insert _ _) value that was passed into the vector_unpack function is this one:

doVecPackOp :: Maybe MachOp  -- Cast from element to vector component
            -> CmmType       -- Type of vector
            -> CmmExpr       -- Initial vector
            -> [CmmExpr]     -- Elements
            -> CmmFormal     -- Destination for result
            -> FCode ()
doVecPackOp maybe_pre_write_cast ty z es res = do
    dst <- newTemp ty
    vecPack dst es 0
  where
    vecPack :: CmmFormal -> [CmmExpr] -> Int -> FCode ()
    vecPack src [] _ =
        emitAssign (CmmLocal res) (CmmReg (CmmLocal src))

    vecPack src (e : es) i = do
        if isFloatType (vecElemType ty)
          then emitAssign (CmmLocal src) (CmmMachOp (MO_VF_Insert len wid)
                                           [cast e, iLit])
          else emitAssign (CmmLocal src) (CmmMachOp (MO_V_Insert len wid)
                                           [cast e, iLit])
        vecPack src es (i + 1)
      where
        -- vector indices are always 32-bits
        iLit = CmmLit (CmmInt ((toInteger i) * 80) W32)

    cast :: CmmExpr -> CmmExpr
    cast val = case maybe_pre_write_cast of
                 Nothing   -> val
                 Just cast -> CmmMachOp cast [val]

    len :: Length
    len = vecLength ty

    wid :: Width
    wid = typeWidth (vecElemType ty)

Not sure if that brings me any closer to solving the issue.

newhoggy added a comment. Edited Jun 26 2018, 11:52 PM

I traced the problem code through these functions, which seem to have something to do with constant folding.

cmmMachOpFold dflags op args = fromMaybe (CmmMachOp op args) (cmmMachOpFoldM dflags op args)
wrapRecExp f (CmmMachOp op es)    = f (CmmMachOp op $ map (wrapRecExp f) es)
wrapRecExp f (CmmLoad addr ty)    = f (CmmLoad (wrapRecExp f addr) ty)
wrapRecExp f e                    = f e
CmmMachOp s mop args
   -> do args' <- mapM (cmmExprNative DataReference) args
         return $ CmmMachOp mop args'
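To illustrate what a traversal shaped like wrapRecExp does, here is a self-contained toy version. The Expr type is a much smaller, hypothetical stand-in for CmmExpr, and foldAdd is an invented one-rule folder, not GHC's cmmMachOpFold. The point is only that children are rewritten first and the function is then applied to the rebuilt node, which is what lets constant folding work bottom-up.

```haskell
module Main where

-- Toy expression type standing in for CmmExpr (hypothetical).
data Expr
  = Lit Int
  | Reg String
  | Load Expr
  | MachOp String [Expr]
  deriving (Eq, Show)

-- Same shape as the wrapRecExp clauses quoted above: rewrite the
-- children first, then apply f to the rebuilt node.
wrapRecExp :: (Expr -> Expr) -> Expr -> Expr
wrapRecExp f (MachOp op es) = f (MachOp op (map (wrapRecExp f) es))
wrapRecExp f (Load addr)    = f (Load (wrapRecExp f addr))
wrapRecExp f e              = f e

-- A one-rule folder: add two literals, leave everything else alone.
foldAdd :: Expr -> Expr
foldAdd (MachOp "add" [Lit a, Lit b]) = Lit (a + b)
foldAdd e                             = e

main :: IO ()
main = do
  print (wrapRecExp foldAdd (MachOp "add" [MachOp "add" [Lit 1, Lit 2], Lit 3]))
  print (wrapRecExp foldAdd (MachOp "add" [Reg "x", MachOp "add" [Lit 1, Lit 2]]))
```

In this toy, a MachOp that has no folding rule is simply rebuilt and passed along unchanged.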

I also saw this comment:

there's one case it can't handle: when the comparison is over
floating-point values, we can't invert it, because floating-point
comparisons aren't invertible (because NaN).

Is it possible that floats are handled differently because of this and I can't just emulate vectorised float code?
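For reference, the NaN issue that the quoted comment describes can be reproduced in a few lines of plain Haskell. This only demonstrates why a floating-point comparison cannot be replaced by its negated counterpart. It makes no claim about how GHC treats vector operations, and the helper name invertible is made up for this example.

```haskell
module Main where

nan :: Double
nan = 0 / 0

-- Rewriting "not (x < y)" as "x >= y" is only sound if the two always agree.
invertible :: Double -> Double -> Bool
invertible x y = not (x < y) == (x >= y)

main :: IO ()
main = do
  print (invertible 1 2)    -- ordinary values: the rewrite is fine
  print (invertible nan 1)  -- NaN: both (<) and (>=) are False
```

With NaN, both comparisons are False, so the rewrite is unsound.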

newhoggy updated this revision to Diff 17107. Jun 27 2018, 5:37 AM
  • Add tracing

I added some tracing and have pushed the code so that I can better illustrate the problem.

This is the program I'm compiling:

{-# OPTIONS_GHC -mavx #-}
{-# OPTIONS_GHC -msse4 #-}
{-# LANGUAGE MagicHash #-}
{-# LANGUAGE UnboxedTuples #-}

import GHC.Exts

main :: IO ()
main = do
 case unpackInt64X4# (broadcastInt64X4# 1#) of
   (# a, b, c, d #) -> print (I# a)

When I compile it, it triggers a panic that I intentionally added to print some information:

$ ./bin/ghc -O2 -mavx512f -mavx2 -S Int32x8PackUnpack.hs
ghc: panic! (the 'impossible' happened)
  (GHC version 8.5.20180624 for x86_64-apple-darwin):
	Unpack not supported for XXX 2:
  [A, A, A]   [D, D, D]   [C, m, m, O, p, t, 2, :, 3, :, c, m, m, E,
                           x, p, r, N, a, t, i, v, e]   [C, m, m, O, p, t, 2, :, 3, :, c, m,
                                                         m, E, x, p, r, N, a, t, i, v, e]
  Call stack:
      CallStack (from HasCallStack):
        callStackDoc, called at compiler/utils/Outputable.hs:1162:37 in ghc:Outputable
        pprPanic, called at compiler/nativeGen/X86/CodeGen.hs:1070:9 in ghc:X86.CodeGen

Please report this as a GHC bug:  http://www.haskell.org/ghc/reportabug

I can continue the investigation to find out where the call to cmmMachOpFold comes from, but if any of this rings a bell and the behaviour can be explained, please let me know.

compiler/cmm/CmmNode.hs:459

The 3 infix in CmmOpt2:3:cmmExprNative comes from here.

compiler/cmm/CmmOpt.hs:56

And the CmmOpt2 prefix in CmmOpt2:3:cmmExprNative comes from here.

This is where I lose the trail. But fold here presumably means constant folding.

compiler/codeGen/StgCmmPrim.hs:1686

The "AAA" value comes from here.

Seeing as this is in doVecBroadcastOp and I call broadcastInt64X4# in my example program, this is *kinda* consistent.

My first thought is that this should perhaps be a MO_V_Broadcast instead of a MO_V_Insert, to match the similar code three lines up.

Secondly, whatever constructor I use here (MO_V_Broadcast or MO_V_Insert), somehow this value is packed in a CmmMachOp and then passed to my vector_unpack, and I don't know why.
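The semantic difference between the two operations can be shown with a list-based model. This is a sketch of the intended lane behaviour only: Vec, zeroVec, insertLane and broadcast are illustrative names and say nothing about how the MachOps are lowered.

```haskell
module Main where

-- A vector modelled as a plain list of lanes.
type Vec = [Int]

zeroVec :: Int -> Vec
zeroVec n = replicate n 0

-- Insert: overwrite one lane and keep the others.
insertLane :: Int -> Int -> Vec -> Vec
insertLane i x v = [ if j == i then x else y | (j, y) <- zip [0 ..] v ]

-- Broadcast: every lane receives the same scalar.
broadcast :: Int -> Int -> Vec
broadcast n x = replicate n x

main :: IO ()
main = do
  print (insertLane 0 1 (zeroVec 4))
  print (broadcast 4 1)
```

The two results agree on lane 0 and differ on the rest. The test program prints only the first component, a, so it cannot tell an insert at index 0 from a real broadcast. Printing b, c and d as well would.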

compiler/nativeGen/AsmCodeGen.hs:1195

The CmmOpt2:3:cmmExprNative string in my debug string derives its suffix :cmmExprNative from here.

compiler/nativeGen/X86/CodeGen.hs:869

The t and x come from the CmmMachOp passed into the third argument of getRegister'.

At this point the third argument probably looks something like this

CmmMachOp "CmmOpt2:3:cmmExprNative"
  (MO_V_Extract _ _ _)
  [CmmMachOp u (MO_V_Insert ...), y]

So I am being given information about the Insert (actually broadcast) despite the fact that I'm meant to emit something for Extract. Maybe this is intentional and information about the prior instruction is somehow useful for optimisations?

compiler/nativeGen/X86/CodeGen.hs:1070

This is the site of the panic that was triggered.

The issue here is that I am getting a CmmMachOp as my CmmExpr value. This differs from the behaviour seen in vector_float_unpack which gets a CmmReg passed into the third argument.

This is why my vector_unpack implementation, which I based on vector_float_unpack, doesn't work.

The pattern that actually matches is (CmmMachOp u (MO_V_Insert reason len wid) args) rather than (CmmReg reg) as expected.

Curiously, I'm also getting a MO_V_Insert embedded in this value. I don't understand why it is insert when I'm calling vector_unpack in response to MO_V_Extract.
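One possible explanation, not confirmed anywhere in this thread, is that an earlier Cmm pass inlined the assignment of the inserted vector into its single use, so the Extract now has the Insert expression as its operand instead of a register. The toy model below shows that effect and one way a code generator can cope with it, by first computing any operand into a register. All names here (Expr, inline, extractRegOnly, toReg, unary) and the pseudo-instructions are hypothetical and are not GHC's API.

```haskell
module Main where

-- Toy stand-ins for CmmExpr and two vector MachOps (not GHC's types).
data Expr
  = Reg String
  | Lit Int
  | Insert Expr Int   -- scalar operand, lane index
  | Extract Expr Int  -- vector operand, lane index
  deriving (Eq, Show)

-- Replace uses of register `name` with the expression assigned to it.
inline :: String -> Expr -> Expr -> Expr
inline name rhs = go
  where
    go (Reg r) | r == name = rhs
    go (Insert e i)        = Insert (go e) i
    go (Extract e i)       = Extract (go e) i
    go e                   = e

-- Lowering that assumes the vector operand is already a register.
extractRegOnly :: Expr -> Either String [String]
extractRegOnly (Extract (Reg r) i) = Right ["extract $" ++ show i ++ "," ++ r]
extractRegOnly e                   = Left ("Unpack not supported for " ++ show e)

-- Lowering that first computes any operand into a register.
-- Returns (result register, pseudo-instructions, next fresh temp number).
toReg :: Int -> Expr -> (String, [String], Int)
toReg n (Reg r)       = (r, [], n)
toReg n (Lit k)       = (t, ["load $" ++ show k ++ "," ++ t], n + 1)
  where t = "%t" ++ show n
toReg n (Insert e i)  = unary "insert" n e i
toReg n (Extract e i) = unary "extract" n e i

unary :: String -> Int -> Expr -> Int -> (String, [String], Int)
unary op n e i = (t, code ++ [op ++ " $" ++ show i ++ "," ++ r ++ "," ++ t], n' + 1)
  where
    (r, code, n') = toReg n e
    t = "%t" ++ show n'

main :: IO ()
main = do
  let direct  = Extract (Reg "%v") 0
      inlined = inline "%v" (Insert (Lit 1) 0) direct
  print (extractRegOnly direct)
  print (extractRegOnly inlined)
  let (_, code, _) = toReg 0 inlined
  mapM_ putStrLn code
```

In this model the register-only lowering fails on the inlined expression in the same way the panic does, while toReg handles both shapes because it never assumes its operand is already a register.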

I reproduce the panic message here because I've added some information to the panic message to help me find where these values are coming from:

	Unpack not supported for XXX 2:
  [A, A, A]   [D, D, D]   [C, m, m, O, p, t, 2, :, 3, :, c, m, m, E,
                           x, p, r, N, a, t, i, v, e]   [C, m, m, O, p, t, 2, :, 3, :, c, m,
                                                         m, E, x, p, r, N, a, t, i, v, e]

From the message, I know the variables I'm printing have the following values:

reason = "AAA"
s = "DDD"
t = "CmmOpt2:3:cmmExprNative"
u = "CmmOpt2:3:cmmExprNative"

The values come from the extra fields I added to the MO_V_Insert and CmmMachOp constructors so that I can figure out which call to these constructors was responsible for these values.

compiler/nativeGen/X86/CodeGen.hs:1070

The "CmmOpt2:3:cmmExprNative" in the vector_unpack helps me find out where the CmmMachOp value containing the MO_V_Insert comes from.

"CmmOpt2:3:cmmExprNative" is bound to t, which is an additional argument I added to vector_unpack for debugging purposes. I did this because CmmMachOp was given by the caller to vector_unpack, so I need to pass some information from the caller to find the origin of my value.

newhoggy updated this revision to Diff 17109. Jun 27 2018, 6:51 AM
  • Add more tracing
carter requested changes to this revision. Jun 27 2018, 6:36 PM

There are some confusing design choices in here.

  1. What's this shuffle op? Doing the naive LLVM one isn't easy to support on the NCG or other code gens...

1a) What are the semantics for it?

  2. The string field in MachOps seems WRONG. What's that for?

2a) What's the use of the strings?

  3. Shuffle needs static data, and GHC doesn't have support for that yet, at least not in any way that won't end in GHC panics if you are able to optimize them. Shuffle args need to already be "primop" fields by the time they reach Core, or GHC will necessarily panic if the literals aren't kept with the operation, etc.
compiler/cmm/CmmExpr.hs:56

this seems DEEPLY wrong. why are you doing this?

compiler/cmm/CmmMachOp.hs:114

again why are you adding strings everywhere?

compiler/codeGen/StgCmmPrim.hs:1137

why are we adding strings everywhere?

compiler/prelude/primops.txt.pp:3273

ummm, which shuffle operation is this?

interleave? or what?

There are a LOT of shuffle operations out there.

This revision now requires changes to proceed. Jun 27 2018, 6:36 PM

Additionally: you're going to want to add an IntSIMD# type, because the GPR / R*X / normal int pointer registers and the XMM / YMM / ZMM registers can only communicate via a read/write to memory
(recent GHC has built in a version of https://github.com/cartazio/random/blob/22a2a16bd62edd553b4f7f2e9eedc26cbf8850d8/cmmbits/floatsAndBits.cmm#L10)
as an out-of-band, explicitly Cmm primop.

The register allocator can't/won't do that correctly for you today (and this was a source of GHC panics before runtime reps in types were a thing).