Generate better fp abs for X86 and llvm with default cmm otherwise
ClosedPublic

Authored by idontgetoutmuch on Mar 2 2017, 11:36 PM.

Details

Summary

Currently we have this in libraries/base/GHC/Float.hs:

abs x | x == 0    = 0 -- handles (-0.0)
      | x >  0    = x
      | otherwise = negateFloat x

But 3-4 years ago it was noted that this was inefficient:
https://mail.haskell.org/pipermail/libraries/2013-April/019690.html

We can generate better code for X86 and llvm and for others generate some custom cmm code which is similar to what the compiler generates now.

Add correct cmm.

Generate better fp abs for X86 and llvm with default cmm otherwise.

idontgetoutmuch retitled this revision from Generate the default cmm. to Generate better fp abs for X86 and llvm with default cmm otherwise.Mar 3 2017, 12:09 AM
idontgetoutmuch edited the summary of this revision. (Show Details)Mar 3 2017, 4:29 AM
idontgetoutmuch edited the summary of this revision. (Show Details)

So

	xorps %xmm1,%xmm1
	ucomiss %xmm1,%xmm0
	jp _c41v
	je _c41w
_c41v:
	xorps %xmm1,%xmm1
	ucomiss %xmm1,%xmm0
	ja _c41t
_c41q:
	leaq GHC.Types.F#_con_info(%rip),%rax
	movq %rax,-8(%r12)
	movss _n41H(%rip),%xmm1
	xorps %xmm1,%xmm0
	movss %xmm0,(%r12)
	leaq -7(%r12),%rbx
	addq $-16,%rbp
	jmp *(%rbp)
_c41w:
	addq $-16,%r12
	leaq GHC.Float.rationalToFloat4_closure(%rip),%rbx
	addq $-16,%rbp
	jmp *(%rbx)
_c41t:

is replaced with

	movss _n40P(%rip),%xmm1
	andps %xmm1,%xmm0

and llvm actually has llvm.fabs.f32 and llvm.fabs.f64.

Address comment.

Address comment.

dfeuer added a subscriber: dfeuer.Mar 3 2017, 12:51 PM

I canceled the CI builds for several versions of this differential because our build bots are a bit swamped right now. If there were specific revisions you wanted built, feel free to restart those builds.

dfeuer added a comment.Mar 3 2017, 1:05 PM

Have you verified that the test suite includes tests for abs behavior, including for NaN and infinities? If not, we need to add them before merging this.

compiler/codeGen/StgCmmPrim.hs
1091

I doubt it matters, but you should be able to use <$> rather than liftM.

compiler/nativeGen/X86/CodeGen.hs
2059

Is there no way to optimize this in 32-bit mode?

3089

Why not add a function for this panic?

x@(II8) -> wrongFmt x
...
where
  wrongFmt x = panic $ "sse2NegCode: " ++ show x
bgamari edited edge metadata.Mar 3 2017, 4:34 PM

I couldn't find a test exercising floating-point abs so I added one D3273.

compiler/nativeGen/X86/CodeGen.hs
2059

I suspect we should use isSse2Enabled here instead of is32Bit; this would at least give us better behavior on some 32-bit platforms.

With respect to x87, it does provide a fabs instructor but I'm fine with punting on this. Pre-SSE chips are pretty rare nowadays and if someone really cares they can always submit a patch.

bgamari requested changes to this revision.Mar 3 2017, 8:08 PM

This looks great save the one inline comment.

This revision now requires changes to proceed.Mar 3 2017, 8:08 PM
idontgetoutmuch marked 4 inline comments as done.Mar 5 2017, 3:45 AM
idontgetoutmuch added inline comments.
compiler/nativeGen/X86/CodeGen.hs
2059

I think just replacing is32Bit with isSse2Enabled addresses both comments. Let me know if this is incorrect.

idontgetoutmuch edited edge metadata.

Comments addressed.

bgamari accepted this revision.Mar 6 2017, 12:35 PM
compiler/nativeGen/X86/CodeGen.hs
2059

Yes, I believe so (modulo the x87-only case).

This revision is now accepted and ready to land.Mar 6 2017, 12:35 PM
Closed by commit rGHC12ccf767af33: Generate better fp abs for X86 and llvm with default cmm otherwise (authored by Dominic Steinitz <dominic@steinitz.org>, committed by bgamari). · Explain WhyMar 7 2017, 12:32 PM
This revision was automatically updated to reflect the committed changes.
carter added a subscriber: carter.Mar 11 2017, 7:33 AM

Is it correct to say unsupported ? Seems more like it should be an impossible error for those cases. Cause you're just generating it early.

I do think for the c backend it probably makes more sense to do the abs function directly, but it probably doesn't matter too much

compiler/cmm/PprC.hs
757

Why aren't we making sure the c backend has the correct code ?

bgamari added inline comments.Mar 13 2017, 1:13 PM
compiler/cmm/PprC.hs
757

Yes, this was indeed a bug and it has been fixed in master.