Some Intel processors appear to have a false dependency on the destination of the popcnt instruction. This could lead to poor performance. A simple way to prevent this is to clear the destination register immediately before the popcnt instruction. Currently I can't produce code from GHC where the source and destination registers are not the same (perhaps someone is interested in producing a test case that does). I'm putting this here in case anyone is interested in investigating further.