base: Implement bit casts between word and float types
ClosedPublic

Authored by erikd on Mar 17 2017, 4:53 AM.

Diff Detail

Repository
rGHC Glasgow Haskell Compiler
Lint
Automatic diff as part of commit; lint not applicable.
Unit
Automatic diff as part of commit; unit tests not applicable.
There are a very large number of changes, so older changes are hidden. Show Older Changes
This revision is now accepted and ready to land.Mar 17 2017, 3:32 PM
erikd updated this revision to Diff 11792.Mar 17 2017, 5:35 PM

Improve comments as per @duncan's suggestion.

erikd updated this revision to Diff 11793.Mar 17 2017, 5:53 PM

Still more tweaking of comments.

It almost seems like it would be easier to implement these in cmm; and more efficient too as you could transfer via the stack rather than allocating a ByteArray.

erikd added a comment.Mar 17 2017, 6:02 PM

It almost seems like it would be easier to implement these in cmm; and more efficient too as you could transfer via the stack rather than allocating a ByteArray.

@simonmar tried 6 years ago https://ghc.haskell.org/trac/ghc/ticket/4092#comment:9 but couldn't make it work. Has Cmm changed sufficiently since then to make this possible?

I was imagining doing something like

#include "Cmm.h"

stg_wordToDoublezh(W_ w)
{
    F_ a;
    STK_CHK_P_LL(WDS(1), stg_wordToDoublezh, w);
    Sp(-1) = w;
    a = Sp(-1);
    return (a);
}

which appears to generate the correct assembly, though I'm not 100% sure about the stack check (or even whether it is necessary; do we have a red zone for the Haskell stack?)

erikd added a comment.EditedMar 17 2017, 9:11 PM

@rwbarton yes, the F_ needs to be D_ for Double but otherwise it seems to work. Worth investigating more.

Also need to make this work for 32 bit values and 32 bit systems.

erikd added a comment.Mar 18 2017, 1:14 AM

I had the CMM code working external to GHC and when I tried to move it to rts (where all the other CMM files seem to be) I get the following CMM lint error:

Cmm lint error:
 in basic block c5
   in assignment: 
     _c2::F64 = I64[Sp + -1 * 8];
     Reg ty: F64
     Rhs ty: I64

The code in question is:

c5: // global
    I64[Sp + -1 * 8] = _c1::I64;
    _c2::F64 = I64[Sp + -1 * 8];

I've not dealt with CMM much before. The linter seems to be complaining about an I64 stack location being assigned to an F64 register? Is there a way to override this?

erikd planned changes to this revision.Mar 18 2017, 1:14 AM
erikd added a comment.Mar 18 2017, 2:05 AM

Raised https://ghc.haskell.org/trac/ghc/ticket/13442 . Going back to the newByteArray solution for now to prevent this turning into a massive yak shave.

erikd edited the summary of this revision. (Show Details)Mar 18 2017, 2:16 AM
erikd updated this revision to Diff 11797.Mar 18 2017, 2:51 AM
  • Add comments about CMM solution to this problem and why it doesn't currently work.
  • Rebase against master.
This revision is now accepted and ready to land.Mar 18 2017, 2:51 AM

Yeah, in practice it seems to not be as easy as I hoped :(

bgamari added inline comments.Mar 18 2017, 7:37 PM
libraries/base/GHC/Float.hs
1290

I'm confused; isn't there still the possibility that the newByteArray# 4# realWorld# will be floated out? Afterall, it has no free variables. It doesn't seem to me like this is necessarily sufficient (even if it does work in practice). Ideally it seems like newByteArray# would take a state token to ensure that it can't be shared.

erikd added a comment.EditedMar 19 2017, 4:18 AM

The CMM version does actually work for everything except Word64 <--> Double on 32 bit systems (hardly surprising).

The problem is that I can't figure out the CMM syntax without adding a cast operator as mentioned in Trac Trac #13442 .

If Cmm is not a convenient option, it seems to me like even an FFI call to C would be more efficient (no allocation or branches, just a call/ret).

libraries/base/GHC/Float.hs
1290

Yeah, the right way to write this nowadays is using runRW# where the argument w# is used inside the function passed to runRW#.

erikd updated this revision to Diff 11815.Mar 19 2017, 3:06 PM
  • Actually get CMM working at least on x86_64.
  • Still looking at fixing x86.
erikd planned changes to this revision.Mar 19 2017, 3:07 PM

The 32 bit code path needs work.

rwbarton requested changes to this revision.Mar 19 2017, 5:21 PM

This seems to have turned into a mess...

libraries/base/GHC/Float.hs
1311

I'm so lost now, how can stg_wordToFloatzh implement both Word# -> Float# and Word# -> Double#? I can't imagine that it works on both big-endian and little-endian systems.

1357

I don't understand how it could not work. There is a 64-bit word type in Cmm too (L_).

erikd added inline comments.Mar 19 2017, 5:50 PM
libraries/base/GHC/Float.hs
1357

Thanks, that's useful. Will try it.

Also, the stack check code I wrote is definitely wrong, and needs to be fixed in a different way for each argument type (Word# or Float# or Double#). I'm not sure exactly what the right form is, and it would be rather difficult to test; indeed I'm not completely sure whether it is even necessary.

This seems horribly inefficient. It seems that the best approach would be something integrated into the compiler backend. Especially on x64 where the SSE registers are used.

erikd added a comment.Mar 20 2017, 3:05 AM

This seems horribly inefficient.

The CMM solution or the newByteArray solution? I agree on the second one being in-efficient, but that is the current state-of-the-art. The CMM version is a huge improvement.

You should also be aware that trac ticket requesting this feature has been open for 7 years. @simonmar tried and failed to come up with a solution See https://ghc.haskell.org/trac/ghc/ticket/4092#comment:9 .

erikd added inline comments.Mar 20 2017, 3:07 AM
libraries/base/GHC/Float.hs
1311

@rwbarton You are responding to a WIP version of the code. I marked the review as "changes planned".

simonmar requested changes to this revision.Mar 20 2017, 4:02 AM

The right way to do this is to add new MachOps and implement them for all the native backends and LLVM. However, that's a lot of work and the Cmm versions are a reasonable second-best. (let's avoid the ByteArray# versions if we can)

rts/CastFloatWord.cmm
18 ↗(On Diff #11815)

See decodeFloat_Int# for the right way to implement these, with a stack check and without using explicit Sp references: https://phabricator.haskell.org/diffusion/GHC/browse/master/rts/PrimOps.cmm;bf3952ed0f3d4c3ed2ddeb4a0de1b4c16df4a250$778-799

29 ↗(On Diff #11815)

Here too.

erikd added a comment.EditedMar 26 2017, 1:31 AM

@simonmar Does this seem right to you?

stg_word64ToDoublezh(I64 w)
{
    D_ d;
    P_ ptr;

    STK_CHK_GEN_N (1);

    reserve 1 = ptr {
        I64[ptr] = w;
        d = D_[ptr];
    }

    return (d);
}

stg_doubleToWord64zh(D_ d)
{
    I64 w;
    P_ ptr;

    STK_CHK_GEN_N (1);

    reserve 1 = ptr {
        I64[ptr] = d;
        w = I64[ptr];
    }

    return (w);
}

It seems to work.

erikd updated this revision to Diff 11990.Apr 4 2017, 3:16 AM
erikd edited edge metadata.

Make it actually work

erikd edited reviewers, added: trofi; removed: austin, hvr.Apr 4 2017, 3:19 AM
erikd added a subscriber: Phyx.Apr 4 2017, 4:35 AM

The Linux build failure seems totally unrelated to this patch, but the Windows one very definitely is this patch. @Phyx Any clue what's going on here?

erikd updated this revision to Diff 11991.EditedApr 4 2017, 6:37 AM

Fix Windows by adding needed symbols to RTS_SYMBOLS. Thanks @Phyx !

bgamari edited edge metadata.Apr 4 2017, 9:23 AM

Thanks for pushing this one through!

libraries/base/GHC/Float.hs
1289

Can we have a comment mentioning that these would ideally be primops, but this is currently the compromise we have?

trofi added inline comments.Apr 4 2017, 4:08 PM
libraries/base/GHC/Float.hs
1338

Should be s/"stg_word32ToFloatzh"/"stg_word64ToDoublezh"/

1351

Perhaps should be s/"stg_floatToWord32zh"/"stg_doubleToWord64zh"

rts/CastFloatWord.cmm
2 ↗(On Diff #11991)

"MachDeps.h" include looks unused.

27 ↗(On Diff #11991)

Type-wise I guess this should be D_[ptr] = d; to match types in lhs/rhs.

erikd updated this revision to Diff 12002.Apr 4 2017, 4:14 PM
  • Remove last vestigaes of Haskell land allocation method.
  • Comments.
erikd planned changes to this revision.Apr 4 2017, 4:16 PM
erikd added inline comments.
libraries/base/GHC/Float.hs
1351

Yeah, fixed in the latest version. I hate how our nice strongly typed Haskell language ends up being converted to very loosely typed CMM.

erikd requested review of this revision.Apr 4 2017, 4:25 PM
erikd planned changes to this revision.
trofi added inline comments.Apr 5 2017, 2:36 AM
rts/CastFloatWord.cmm
9 ↗(On Diff #12002)

For 32-bit arch Word64 will require storage of 2 words.

As we do raw pointers here alignment might be tricky. I have suspiction
arm needs 8-byte address alignment for loads/stores for doubles on pre armv7.

For armv7+ it's 1 cycle to fetch from aligned address and 2 cycles from unaligned:
http://infocenter.arm.com/help/index.jsp?topic=/com.arm.doc.ddi0360f/CHDDEEDH.html

Might be negligible.

24 ↗(On Diff #12002)

Same here: For 32-bit arch Word64 will require storage of 2 words.

27 ↗(On Diff #12002)

[copying from previous review] Type-wise I guess this should be D_[ptr] = d; to match types in lhs/rhs.

rwbarton added inline comments.Apr 8 2017, 11:10 AM
libraries/base/GHC/Float.hs
1286

typo: integral

1300

typo: integral

1313

typo: integral

1331

typo: integral

rts/CastFloatWord.cmm
9 ↗(On Diff #12002)

I'm pretty sure we neglect the issue of alignment when dealing with Double# generally, for example when we have a boxed Double on the heap (how would we keep the Double# inside 8-byte aligned after a GC?) So I think it's safe to ignore the alignment issue here as well.

42 ↗(On Diff #12002)

W_TO_INT isn't necessarily right; it is for converting to the C type int which might not be 32 bits. Better use %lobits32(w) explicitly.

erikd updated this revision to Diff 12039.Apr 9 2017, 7:15 AM

Fix typos in comments.
Use %lobits32.

This revision is now accepted and ready to land.Apr 9 2017, 7:15 AM
erikd marked 23 inline comments as done.Apr 9 2017, 3:20 PM

I'd be happier if we had some more decent tests, that would be less likely to pass under a completely broken implementation.

erikd added a comment.Apr 10 2017, 3:57 AM

I'd be happier if we had some more decent tests, that would be less likely to pass under a completely broken implementation.

What would you suggest?

In an external project I did QuickCheck round tripping of word -> float -> word and float -> word -> float for both Double/Word64 and Float/Word32. This works fine on x86_64, but on x86 the word -> float -> word round trip fails if the value of the f is NaN (for both Float and Double). I'm pretty sure this is a quirk of the x86 floating point unit.

erikd updated this revision to Diff 12046.Apr 10 2017, 7:16 AM

Improve the test.

erikd added a comment.Apr 10 2017, 7:17 AM

Tested on x86/linux and x86_64/linux. Maybe @trofi or someone could test on other architectures.

By the way, you could also just put the cmm file in the base package, and not have to touch the rts at all.

Indeed, I like the idea of including this in base. Only truly foundational operations should be in rts/.

erikd updated this revision to Diff 12082.EditedApr 11 2017, 3:25 AM

Move CMM code to base library as requested.

This revision was automatically updated to reflect the committed changes.