Add MO_AddIntC, MO_SubIntC MachOps and implement in X86 backend
Closed, Public

Authored by rwbarton on Aug 10 2014, 7:07 PM.

Details

Summary

These MachOps are used by addIntC# and subIntC#, which in turn are
used in integer-gmp when adding or subtracting small Integers. The
following benchmark showed a ~6% speedup with an earlier version of
this commit (building with BuildFlavour=perf).

{-# LANGUAGE MagicHash #-}

import GHC.Exts
import Criterion.Main

count :: Int -> Integer
count (I# n#) = go n# 0
  where go :: Int# -> Integer -> Integer
        go 0# acc = acc
        go n# acc = go (n# -# 1#) $! acc + 1

main = defaultMain [bgroup "count"
                      [bench "100" $ whnf count 100]]
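
For context, here is a minimal sketch of how addIntC# and subIntC# can be used to detect overflow, roughly the pattern that small-Integer arithmetic relies on before falling back to big integers. The names checkedAdd and checkedSub are hypothetical and do not come from integer-gmp. The flag is compared against 0# because the primops only promise a nonzero value on overflow.

```haskell
{-# LANGUAGE MagicHash, UnboxedTuples #-}

import GHC.Exts (Int (I#), addIntC#, subIntC#, isTrue#, (/=#))

-- Hypothetical helper: add two Ints, returning Nothing on signed overflow.
checkedAdd :: Int -> Int -> Maybe Int
checkedAdd (I# x) (I# y) =
  case addIntC# x y of
    (# r, c #)
      | isTrue# (c /=# 0#) -> Nothing      -- any nonzero flag means overflow
      | otherwise          -> Just (I# r)

-- Hypothetical helper: subtract two Ints, returning Nothing on signed overflow.
checkedSub :: Int -> Int -> Maybe Int
checkedSub (I# x) (I# y) =
  case subIntC# x y of
    (# r, c #)
      | isTrue# (c /=# 0#) -> Nothing
      | otherwise          -> Just (I# r)

main :: IO ()
main = do
  print (checkedAdd 1 2)         -- Just 3
  print (checkedAdd maxBound 1)  -- Nothing
  print (checkedSub minBound 1)  -- Nothing
```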

Test Plan

I intend to "validate --slow" and re-benchmark this
to double check my earlier findings.

Diff Detail

Repository: rGHC Glasgow Haskell Compiler
Lint: Lint Skipped
Unit: Unit Tests Skipped
rwbarton updated this revision to Diff 316. Aug 10 2014, 7:07 PM
rwbarton retitled this revision to Add MO_AddIntC, MO_SubIntC MachOps and implement in X86 backend.
rwbarton updated this object.
rwbarton edited the test plan for this revision.
rwbarton added a reviewer: simonmar.
rwbarton added a subscriber: hvr.
rwbarton updated this object.
rwbarton edited edge metadata.

It occurs to me that these MachOp names are rather inconsistent. How about MO_Add2 -> MO_U_AddC, MO_AddIntC -> MO_S_AddO, MO_SubIntC -> MO_S_SubO?

The semantics are not quite consistent either: MO_AddIntC and MO_SubIntC produce their results in the opposite order from MO_Add2, and are permitted to produce any nonzero overflow flag value on overflow, while MO_Add2 can only produce a carry of 0 or 1. These are the existing semantics of the addIntC#, subIntC#, and plusWord2# primops, but the MachOp semantics don't necessarily have to be identical. I'm not sure what the least confusing thing to do is.
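
The difference in result order can be seen at the primop level. The sketch below wraps plusWord2# and addIntC# in two hypothetical helpers (addWithCarry and addWithOverflow are illustrative names only): plusWord2# returns the carry first and the low word second, while addIntC# returns the sum first and the overflow flag second.

```haskell
{-# LANGUAGE MagicHash, UnboxedTuples #-}

import GHC.Exts (Int (I#), Word (W#), addIntC#, plusWord2#)

-- plusWord2# yields (# high, low #): the carry (0 or 1) comes first.
addWithCarry :: Word -> Word -> (Word, Word)
addWithCarry (W# x) (W# y) =
  case plusWord2# x y of
    (# hi, lo #) -> (W# hi, W# lo)

-- addIntC# yields (# result, flag #): the wrapped sum comes first,
-- and the flag is only guaranteed to be nonzero on overflow.
addWithOverflow :: Int -> Int -> (Int, Bool)
addWithOverflow (I# x) (I# y) =
  case addIntC# x y of
    (# r, c #) -> (I# r, I# c /= 0)

main :: IO ()
main = do
  print (addWithCarry maxBound 1)     -- carry 1, low word 0
  print (addWithOverflow maxBound 1)  -- sum wraps to minBound, flag is True
```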

Oh: when generating 32-bit code will the register allocator automatically give me a register from eax through edx so that I can refer to its low byte with SETCC/MOVZxL? Or do I need to be careful about that?

austin accepted this revision. Aug 12 2014, 8:46 AM
austin edited edge metadata.

LGTM.

This revision is now accepted and ready to land. Aug 12 2014, 8:46 AM

To sort of answer my own question, condIntReg and condFltReg_x87 do request a temporary 8-bit register for SETCCing, while condFltReg_sse2 requests two whole-word registers. I'm still not sure whether it's necessary but I suppose it would not hurt to use an 8-bit temporary register for the result of SETCC.

compiler/nativeGen/X86/CodeGen.hs, lines 2016–2017

Note to self: move this to the end of the where block

I determined that I do need to allocate a temporary register, because the destination register might well be R1, which is not byte-addressable on x86. But all allocatable registers are byte-addressable, so I do not have to communicate to the register allocator that I need a byte-addressable temporary register.
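
The constraint can be illustrated with a toy model. Everything below is hypothetical (the Reg and Instr types and overflowFlagVia are not GHC's own definitions); it only assumes that on 32-bit x86 a register such as esi has no addressable low byte, so a SETcc-style instruction has to target a byte-addressable temporary, which is then zero-extended into the real destination.

```haskell
-- Toy model only: these types are hypothetical and are not the
-- instruction or register types used by GHC's X86 backend.
data Reg = EAX | EBX | ECX | EDX | ESI | EDI
  deriving (Eq, Show)

data Instr
  = SETO Reg       -- write the overflow flag (0 or 1) to the low byte of a register
  | MOVZX Reg Reg  -- zero-extend the low byte of the first register into the second
  deriving (Eq, Show)

-- In 32-bit mode only eax, ebx, ecx and edx have an addressable low byte.
byteAddressable :: Reg -> Bool
byteAddressable r = r `elem` [EAX, EBX, ECX, EDX]

-- Materialise the overflow flag in dst by way of a temporary register.
-- Fails if the temporary itself is not byte-addressable.
overflowFlagVia :: Reg -> Reg -> Either String [Instr]
overflowFlagVia tmp dst
  | byteAddressable tmp = Right [SETO tmp, MOVZX tmp dst]
  | otherwise           = Left (show tmp ++ " has no addressable low byte")

main :: IO ()
main = do
  print (overflowFlagVia EAX ESI)  -- succeeds: flag goes through eax into esi
  print (overflowFlagVia ESI ESI)  -- fails: esi cannot be the SETO target
```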

rwbarton updated this revision to Diff 428. Aug 22 2014, 4:28 PM
rwbarton updated this object.
rwbarton edited edge metadata.

Use a temporary register for the result of SETCC

This doesn't affect the generated code for a simple call to addIntC#,
but it might matter if we could somehow use R1 as the result directly.

Also move addSubIntC to where it should go.

rwbarton closed this revision. Aug 23 2014, 1:56 PM
rwbarton updated this revision to Diff 433.

Closed by commit rGHCcfd08a992c91 (authored by @rwbarton).