Add unaligned bytearray access primops. Fixes #4442.
ClosedPublic

Authored by reinerp on Mar 9 2018, 12:22 AM.

Diff Detail

Repository
rGHC Glasgow Haskell Compiler
Lint
Automatic diff as part of commit; lint not applicable.
Unit
Automatic diff as part of commit; unit tests not applicable.
reinerp created this revision.Mar 9 2018, 12:22 AM
reinerp updated this revision to Diff 15715.Mar 9 2018, 12:32 AM

Remove addressed TODOs.

reinerp updated this revision to Diff 15723.Mar 10 2018, 11:16 AM

Increase -fmax-pmcheck-iterations.

bgamari requested changes to this revision.Mar 10 2018, 12:15 PM

I suspect arc is confused about your base commit. Try explicitly specifying it (e.g. arc diff --update D4488 origin/master).

This revision now requires changes to proceed.Mar 10 2018, 12:15 PM

I'm new to Phabricator. What signs of confusion are you seeing in the diff? As far as I can tell from the Phabricator UI, the diff looks correct to what intended to upload.

Hmm, that is a good question. This actually now looks fine to me. Disregard my noise. Sorry for the confusion!

reinerp requested review of this revision.Mar 13 2018, 1:06 PM

Ok thanks. Marking as ready for review.

Gentle ping -- could you take a look when you get a chance?

On the whole this looks reasonable. However, I do wonder whether it will work correctly on architectures which disallow non-naturally-aligned accesses (e.g. some ARMs). Have you tried this?

Yes, I wondered about that too. My development machine is an Intel machine. I haven't tested on other architectures, and I don't know of a convenient way to do so. Does GHC's continuous integration infrastructure provide some way for me to do so?

The main reason I expected this patch to be safe is that GHC already supports primops for unaligned memory accesses: plusAddr# followed by indexWord64OffAddr# and friends. Most serialization packages seem to rely on this functionality, in that if you serialize a Word8 followed by a Word64, the second write will be an unaligned write. My intention for this CL was to be able to generate the same code, but for non-pinned MutableByteArrays.

dfeuer added a subscriber: dfeuer.Mar 20 2018, 12:43 AM

Overall looks fine to me. But I'm wondering if some of these are redundant.

compiler/prelude/primops.txt.pp
1265

Why do we need these to be primops? Can't people read a Word8#, Int8#, Word32#, or Int32# and convert it to a Char# in library code?

1270

Why do we need this when we have indexWord8ArrayAsInt# and int2Addr#?

1285

Unlike Addr#, we don't currently have primops to convert between StablePtr# and Int# or Word#. Should we add those instead? Is there some reason we don't have them?

1325

We have int2Word# and word2Int#. Do we need to have both Int* and Word*-based operations here?

Overall looks fine to me. But I'm wondering if some of these are redundant.

Thanks David. Indeed, we could probably make a more minimal set of these. I went with this set for consistency with what the rest of the array ops in GHC.Prim provide, namely the index*Array#/read*Array#/write*Array# family and the index*OffAddr#/read*OffAddr#/write*OffAddr# family. But of course, we could set a new precedent now.

Added a couple of more specific replies below.

compiler/prelude/primops.txt.pp
1265

I think that should work. The equivalences that seems to be correct are:

indexWord8ArrayAsChar# arr i = chr# (word2Int# (indexWord8Array# arr i))
indexWord8ArrayAsWideChar# arr i = chr# (indexWord8ArrayAsInt32# arr i)

Note that in the first form, we need to read as a Word8 (not an Int8) because we don't want sign extension during the read. In the second form we don't care, because the sign bit is assumed not to be set (we're reading 31-bit characters).

1270

You're right, we probably don't.

On the other hand, int2Addr# and addr2Int# are marked "strongly deprecated". If this is an intended pattern, we should probably remove that comment. Similarly, we might want to do something about this comment at the top of GHC.Prim:

"Finally, there are strongly deprecated primops for coercing between Addr#, the primitive type of machine addresses, and Int#. These are pretty bogus anyway, but will work on existing 32-bit and 64-bit GHC targets; they are completely bogus when tag bits are used in Int#, so are not available in this case."

1285

GHC.Stable (and transitively Foreign.StablePtr) provides the following functions:

castStablePtrToPtr :: StablePtr a -> Ptr ()
castStablePtrToPtr (StablePtr s) = Ptr (unsafeCoerce# s)

castPtrToStablePtr :: Ptr () -> StablePtr a
castPtrToStablePtr (Ptr a) = StablePtr (unsafeCoerce# a)

Perhaps we could require users to use unsafeCoerce# here instead. Alternatively, we could add the conversion functions directly.

1325

It turns out that the Word* operations followed by word2Int# behave differently than the corresponding Int* operations. For example, consider:

let a = indexWord8ArrayAsWord16# arr i
    b = word2Int# a
    c = indexWord8ArrayAsInt16# arr i

If a == 0xFFFF then b == 0xFFFF, but c == 0xFFFFFFFF. That's because indexWord8ArrayAsInt16 sign-extends the 16-bit loaded value to a 64-bit value, whereas indexWord8ArrayAsWord16 zero-extends it.

If we wanted to remove the multiple read/write/index operations in favor of conversions, we'd want to have some primitives for sign-extension, perhaps these:

signExtendWord8ToWord# :: Word# -> Word#
signExtendWord16ToWord# :: Word# -> Word#
signExtendWord32ToWord# :: Word# -> Word#

Furthermore, we'd want to make sure that instruction selection correctly identifies that the sign extension and the memory-to-register load can be fused into a single load instruction.

That's more complicated than I had hoped for in this patch.

Ah. It sounds like you have very good cause for some of these, and tradition for the rest.

bgamari accepted this revision.Mar 25 2018, 10:59 AM

Yes, I wondered about that too. My development machine is an Intel machine. I haven't tested on other architectures, and I don't know of a convenient way to do so. Does GHC's continuous integration infrastructure provide some way for me to do so?

I wish. Sadly we are lacking in CI for non-amd64 platforms at the moment.

The main reason I expected this patch to be safe is that GHC already supports primops for unaligned memory accesses: plusAddr# followed by indexWord64OffAddr# and friends. Most serialization packages seem to rely on this functionality, in that if you serialize a Word8 followed by a Word64, the second write will be an unaligned write. My intention for this CL was to be able to generate the same code, but for non-pinned MutableByteArrays.

I suppose that is true; I suppose if this is broken currently then this is a pre-existing bug.

This looks reasonable to me.

This revision is now accepted and ready to land.Mar 25 2018, 10:59 AM

Thanks Ben and David for the review.

As I understand it, since Ben has marked this code Accepted, it's now ready for submission. Is there some command I should run in order to make that happen? Or do I wait for someone else to merge code into ghc's repo?

I'll push it.

This revision was automatically updated to reflect the committed changes.

@reinerp We should mention these changes in changelog.md of the ghc-prim package.