Implement new CLZ and CTZ primops (re #9340)

Authored by hvr on Aug 11 2014, 1:49 PM.



This implements the new primops

clz#, clz32#, clz64#,
ctz#, ctz32#, ctz64#

which provide efficient implementations of the popular
count-leading-zero and count-trailing-zero respectively
(see testcase for a pure Haskell reference implementation).

On x86, NCG as well as LLVM generates code based on the BSF/BSR
instructions (which need extra logic to make the 0-case well-defined).

Test Plan

validate and succesful tests on i686 and amd64

Diff Detail

rGHC Glasgow Haskell Compiler
Lint Skipped
Unit Tests Skipped
hvr updated this revision to Diff 327.Aug 11 2014, 1:49 PM
hvr retitled this revision from to Implement new CLZ and CTZ primops (re #9340).
hvr edited the test plan for this revision. (Show Details)
hvr added reviewers: rwbarton, simonmar, ezyang.
hvr updated the Trac tickets for this revision.
hvr updated this object.

Yay! Build B395: Diff 327 (D144) has succeeded! Full logs available at F12443.

tibbe accepted this revision.Aug 12 2014, 4:03 AM
tibbe added a reviewer: tibbe.
tibbe added a subscriber: tibbe.
tibbe added inline comments.

Does it also make sense to offer a 16-bit version or is there no corresponding instruction?

hvr added inline comments.Aug 12 2014, 4:36 AM

If there's use, 16bit variants can be offered easily as well. See to get an idea what's supported at the HW level.

For the __builtin_c{t,l}z() fallbacks I'll have to workaround a bit, as those aren't provided for smaller types than int, bit the LLVM codegen happily supports generating code for 16bit and 8bit llvm.ct{l,t}z.*

tibbe added a comment.Aug 12 2014, 4:39 AM

Will you also add the relevant things to Data.Bits?


I think it makes sense to support all hardware supported sizes. For example, unordered-containers happens to use 16-bit wide bitmasks. I doesn't need this particular instruction but you could imagine a case where it was needed and then it would be a shame to not have it.

hvr updated this revision to Diff 342.Aug 12 2014, 2:52 PM
hvr edited edge metadata.

Implement also 8-bit and 16-bit variants; make unit-test smaller/faster

Yay! Build B424: Diff 342 (D144) has succeeded! Full logs available at F12546.

I just tested this on Solaris 11.1 and bootstrapping went fine and gmake TEST=T9340 also passes. Good work!

ezyang accepted this revision.Aug 13 2014, 5:19 PM
ezyang edited edge metadata.


tibbe accepted this revision.Aug 14 2014, 2:09 AM
tibbe edited edge metadata.
simonmar accepted this revision.Aug 14 2014, 3:29 AM
simonmar edited edge metadata.
hvr closed this revision.Aug 14 2014, 5:33 AM
hvr updated this revision to Diff 357.

Closed by commit rGHCe0c1767d0ea8 (authored by @hvr).