Generate better fp abs for X86 and llvm with default cmm otherwise

Authored by idontgetoutmuch on Mar 2 2017, 11:36 PM.



Currently we have this in libraries/base/GHC/Float.hs:

abs x | x == 0    = 0 -- handles (-0.0)
      | x >  0    = x
      | otherwise = negateFloat x

But 3-4 years ago it was noted that this was inefficient:

We can generate better code for X86 and llvm and for others generate some custom cmm code which is similar to what the compiler generates now.

Diff Detail

rGHC Glasgow Haskell Compiler
Automatic diff as part of commit; lint not applicable.
Automatic diff as part of commit; unit tests not applicable.

Add correct cmm.

Generate better fp abs for X86 and llvm with default cmm otherwise.

idontgetoutmuch retitled this revision from Generate the default cmm. to Generate better fp abs for X86 and llvm with default cmm otherwise.Mar 3 2017, 12:09 AM
idontgetoutmuch edited the summary of this revision. (Show Details)Mar 3 2017, 4:29 AM
idontgetoutmuch edited the summary of this revision. (Show Details)


	xorps %xmm1,%xmm1
	ucomiss %xmm1,%xmm0
	jp _c41v
	je _c41w
	xorps %xmm1,%xmm1
	ucomiss %xmm1,%xmm0
	ja _c41t
	leaq GHC.Types.F#_con_info(%rip),%rax
	movq %rax,-8(%r12)
	movss _n41H(%rip),%xmm1
	xorps %xmm1,%xmm0
	movss %xmm0,(%r12)
	leaq -7(%r12),%rbx
	addq $-16,%rbp
	jmp *(%rbp)
	addq $-16,%r12
	leaq GHC.Float.rationalToFloat4_closure(%rip),%rbx
	addq $-16,%rbp
	jmp *(%rbx)

is replaced with

	movss _n40P(%rip),%xmm1
	andps %xmm1,%xmm0

and llvm actually has llvm.fabs.f32 and llvm.fabs.f64.

Address comment.

Address comment.

dfeuer added a subscriber: dfeuer.Mar 3 2017, 12:51 PM

I canceled the CI builds for several versions of this differential because our build bots are a bit swamped right now. If there were specific revisions you wanted built, feel free to restart those builds.

dfeuer added a comment.Mar 3 2017, 1:05 PM

Have you verified that the test suite includes tests for abs behavior, including for NaN and infinities? If not, we need to add them before merging this.


I doubt it matters, but you should be able to use <$> rather than liftM.


Is there no way to optimize this in 32-bit mode?


Why not add a function for this panic?

x@(II8) -> wrongFmt x
  wrongFmt x = panic $ "sse2NegCode: " ++ show x
bgamari edited edge metadata.Mar 3 2017, 4:34 PM

I couldn't find a test exercising floating-point abs so I added one D3273.


I suspect we should use isSse2Enabled here instead of is32Bit; this would at least give us better behavior on some 32-bit platforms.

With respect to x87, it does provide a fabs instructor but I'm fine with punting on this. Pre-SSE chips are pretty rare nowadays and if someone really cares they can always submit a patch.

bgamari requested changes to this revision.Mar 3 2017, 8:08 PM

This looks great save the one inline comment.

This revision now requires changes to proceed.Mar 3 2017, 8:08 PM
idontgetoutmuch marked 4 inline comments as done.Mar 5 2017, 3:45 AM
idontgetoutmuch added inline comments.

I think just replacing is32Bit with isSse2Enabled addresses both comments. Let me know if this is incorrect.

idontgetoutmuch edited edge metadata.

Comments addressed.

bgamari accepted this revision.Mar 6 2017, 12:35 PM

Yes, I believe so (modulo the x87-only case).

This revision is now accepted and ready to land.Mar 6 2017, 12:35 PM
Closed by commit rGHC12ccf767af33: Generate better fp abs for X86 and llvm with default cmm otherwise (authored by Dominic Steinitz <>, committed by bgamari). · Explain WhyMar 7 2017, 12:32 PM
This revision was automatically updated to reflect the committed changes.
carter added a subscriber: carter.Mar 11 2017, 7:33 AM

Is it correct to say unsupported ? Seems more like it should be an impossible error for those cases. Cause you're just generating it early.

I do think for the c backend it probably makes more sense to do the abs function directly, but it probably doesn't matter too much


Why aren't we making sure the c backend has the correct code ?

bgamari added inline comments.Mar 13 2017, 1:13 PM

Yes, this was indeed a bug and it has been fixed in master.