Implement new CLZ and CTZ primops (re #9340)
ClosedPublic

Authored by hvr on Aug 11 2014, 1:49 PM.

Details

Summary

This implements the new primops

clz#, clz32#, clz64#,
ctz#, ctz32#, ctz64#

which provide efficient implementations of the popular
count-leading-zero and count-trailing-zero respectively
(see testcase for a pure Haskell reference implementation).

On x86, NCG as well as LLVM generates code based on the BSF/BSR
instructions (which need extra logic to make the 0-case well-defined).

Test Plan

validate and succesful tests on i686 and amd64

Diff Detail

Repository
rGHC Glasgow Haskell Compiler
Lint
Lint Skipped
Unit
Unit Tests Skipped
hvr updated this revision to Diff 327.Aug 11 2014, 1:49 PM
hvr retitled this revision from to Implement new CLZ and CTZ primops (re #9340).
hvr edited the test plan for this revision. (Show Details)
hvr added reviewers: rwbarton, simonmar, ezyang.
hvr updated the Trac tickets for this revision.
hvr updated this object.

Yay! Build B395: Diff 327 (D144) has succeeded! Full logs available at F12443.

tibbe accepted this revision.Aug 12 2014, 4:03 AM
tibbe added a reviewer: tibbe.
tibbe added a subscriber: tibbe.
tibbe added inline comments.
compiler/prelude/primops.txt.pp
390

Does it also make sense to offer a 16-bit version or is there no corresponding instruction?

hvr added inline comments.Aug 12 2014, 4:36 AM
compiler/prelude/primops.txt.pp
390

If there's use, 16bit variants can be offered easily as well. See http://en.wikipedia.org/wiki/Find_first_set#Hardware_support to get an idea what's supported at the HW level.

For the __builtin_c{t,l}z() fallbacks I'll have to workaround a bit, as those aren't provided for smaller types than int, bit the LLVM codegen happily supports generating code for 16bit and 8bit llvm.ct{l,t}z.*

tibbe added a comment.Aug 12 2014, 4:39 AM

Will you also add the relevant things to Data.Bits?

compiler/prelude/primops.txt.pp
390

I think it makes sense to support all hardware supported sizes. For example, unordered-containers happens to use 16-bit wide bitmasks. I doesn't need this particular instruction but you could imagine a case where it was needed and then it would be a shame to not have it.

hvr updated this revision to Diff 342.Aug 12 2014, 2:52 PM
hvr edited edge metadata.

Implement also 8-bit and 16-bit variants; make unit-test smaller/faster

Yay! Build B424: Diff 342 (D144) has succeeded! Full logs available at F12546.

I just tested this on Solaris 11.1 and bootstrapping went fine and gmake TEST=T9340 also passes. Good work!

ezyang accepted this revision.Aug 13 2014, 5:19 PM
ezyang edited edge metadata.

LGTM.

tibbe accepted this revision.Aug 14 2014, 2:09 AM
tibbe edited edge metadata.
simonmar accepted this revision.Aug 14 2014, 3:29 AM
simonmar edited edge metadata.
hvr closed this revision.Aug 14 2014, 5:33 AM
hvr updated this revision to Diff 357.

Closed by commit rGHCe0c1767d0ea8 (authored by @hvr).