Make GHC generics capable of handling unboxed types (#10868)
ClosedPublic

Authored by RyanGlScott on Sep 11 2015, 8:38 PM.

Details

Summary

This adds a data family (URec) and six data family instances (UAddr, UChar, UDouble, UFloat, UInt, and UWord)
which a deriving Generic(1) clause will generate if it sees Addr#, Char#,
Double#, Float#, Int#, or Word#, respectively. The programmer can then
provide instances for these data family instances to provide custom implementations
for unboxed types, similar to how derived Eq, Ord, and Show instances
currently special-case unboxed types.

Test Plan

./validate

Diff Detail

Repository
rGHC Glasgow Haskell Compiler
Lint
Automatic diff as part of commit; lint not applicable.
Unit
Automatic diff as part of commit; unit tests not applicable.
RyanGlScott updated this revision to Diff 4144.Sep 11 2015, 8:38 PM
RyanGlScott retitled this revision from to Make GHC generics capable of handling unboxed types (#10868).
RyanGlScott updated this object.
RyanGlScott edited the test plan for this revision. (Show Details)
RyanGlScott updated the Trac tickets for this revision.
RyanGlScott updated this object.Sep 11 2015, 8:41 PM
RyanGlScott edited edge metadata.
dreixel accepted this revision.Sep 12 2015, 11:32 AM
dreixel edited edge metadata.

Looks good to me, thanks. Perhaps just worth adding somewhere that the reason why the new datatypes are introduced is that the kinds of the unlifted types prevent Generic instances being given to them.

This revision is now accepted and ready to land.Sep 12 2015, 11:32 AM
RyanGlScott updated this revision to Diff 4145.Sep 12 2015, 1:35 PM
RyanGlScott edited edge metadata.
  • Expand documentation

This look generally plausible to me, and I like the general idea, but I'd want sign off from Andres Loh (kosmikus).

Simon

RyanGlScott planned changes to this revision.EditedSep 21 2015, 9:03 PM

It occurred to me that we probably shouldn't derive Show instances for UChar et al. The reason is that if a programmer wanted to make a generic Show implementation, she'd need an instance like this:

class GShow f where
    gshow :: f a -> String

...

instance Show c => GShow (K1 c) where
    gshow (K1 x) = gshow x


...

data CharHash = Char C# deriving (Generic)
instance Show CharHash where show = gshow

But with a derived Show instance, showing CharHash 'a'# would display as CharHash (UChar {uChar# = 'a'})! This is clearly not what she would want, since a derived Show instance for CharHash would render it as CharHash 'a'#.

Fixing this should be a simple matter of defining Show instances by hand. The derived Eq and Ord instances should be fine, since they're defined structurally (as opposed to Show, which is name-aware).

RyanGlScott updated this revision to Diff 4255.Sep 21 2015, 9:32 PM
RyanGlScott edited edge metadata.
  • Hand-implement Show instances for unboxed representation types
This revision is now accepted and ready to land.Sep 21 2015, 9:32 PM
RyanGlScott planned changes to this revision.EditedSep 26 2015, 2:11 PM

I've been thinking about this some more, and I've come to conclusion that I don't really like the solution I have at the moment. What makes me uncomfortable about this is that UAddr, UChar et. al aren't really generic representation types in the same vein as the other datatypes in GHC.Generics—rather, they're proxy types that are used to signify the presence of different datatypes.

The distinction is subtle but vitally important when it comes to defining instances for them. Normally, implementing Generic behavior is done via some separate typeclass, e.g.:

class GShow f where
    gshow :: f a -> String

...

instance Show c => GShow (K1 c) where
    gshow (K1 x) = gshow x

...
data Example = Example deriving Generic
instance Show Example where show = gshow . from

But when GHC creates the Generic Example instance, UAddr appears as a type parameter of K1 constructor, which means that one can't use GShow for define how UInts should be shown. Rather, one must use Show directly! This is a bit awkward, since GHC.Generics provides Show instances for UAddr and friends already, so the programmer has no control over how unboxed types should be shown generically.

There's also a matter of semantics to deal with. According to GHC.Generics's documentation, K1/Rec0 denote occurrences of kind *. But UAddr and crew all encapsulate data types of kind #!

I think the neatest solution would be to change the way deriving Generic works a bit further. The following code

data IntHash = IntHash Int# deriving Generic

shouldn't generate this representation:

type Rep IntHash = D1 D1IntHash (C1 C1_0IntHash (S1 NoSelector (Rec0 UInt)))

for the above reasons, but instead, should generate something like this:

type Rep IntHash = D1 D1IntHash (C1 C1_0IntHash (S1 NoSelector UInt))

That is, we should redefine UInt to be data UInt (p :: *) = UInt Int#, and generate that to mark any occurrence of Int#. This way, the user would be able to define their own GShow UInt instance instead of relying on a pre-cooked Show instance, and it makes a distinction between occurrences of kinds * and #. And importantly, this wouldn't break any existing Generic code.

@kosmikus and @dreixel, does this redesign sound reasonable?

Have you considered adding one single datatype to the universe, like Rec0 but of a different kind (perhaps URec0 :: # -> *)? That way you can still address all unboxed cases with a single URec0 a instance. Alternatively, make Rec0 poly-kinded.

I don't think you can use unlifted types (things of kind #) in a polymorphic fashion like that, unfortunately. The cleanest workaround I can think of is to simply make a new data type for each unlifted type of interest, and GHC itself only special-cases the six aforementioned unlifted types in its deriving machinery.

RyanGlScott updated this revision to Diff 4372.Oct 1 2015, 12:58 PM
RyanGlScott edited edge metadata.
  • Make UAddr and friends actual representation types
This revision is now accepted and ready to land.Oct 1 2015, 12:58 PM
kosmikus edited edge metadata.Oct 2 2015, 1:28 AM

Sorry for joining the discussion this late.

Thanks for working on this. I agree with the original sentiment that adding UInt and others as kind * datatypes is somewhat unfortunate, and that your proposed redesign in making them proper representation types of kind * -> * is better.

If you wanted to group them all so that one could give a catch-all instance for unboxed types, then one option would be to use a data family, along the lines of

data family URec (a :: *) (p :: *) :: *
data instance URec Int p = UInt { uInt# :: Int# }
...

(Of course, it would be nicer if we could index over the unboxed types rather than their boxed counterparts, but GHC does not allow this.)

Then one could decide whether to do

instance GShow (URec a) where ...

or

instance GShow (URec Int) where ...

I don't have a concrete use case for this though. What do you think?

RyanGlScott added a comment.EditedOct 2 2015, 8:33 AM

I agree that this approach is not the most, ahem, generic, so let me see if I understand what you are proposing.

Your suggestion is that the user could dispatch which representation type to use via a data family like this?

{-# LANGUAGE MagicHash, TypeFamilies #-}

import GHC.Exts

data family URec (a :: *) (p :: *)
data instance URec Char p = UChar { uChar# :: Char# }
data instance URec Int  p = UInt  { uInt#  :: Int#  }

class GShow (f :: * -> *) where
  gshow :: f a -> String

instance GShow (URec a) where
    gshow (UChar c) = show (C# c)
    gshow (UInt  i) = show (I# i)

If so, that won't compile, since you can't use separate data family instances together in a single typeclass instance like that.

We could, however, use a GADT to accomplish that:

{-# LANGUAGE GADTs, KindSignatures, MagicHash #-}

import GHC.Exts

data URec (a :: *) (p :: *) where
    UChar :: { uChar# :: Char# } -> URec Char p
    UInt  :: { uInt#  :: Int#  } -> URec Int  p

class GShow (f :: * -> *) where
  gshow :: f a -> String

instance GShow (URec a) where
    gshow (UChar c) = show (C# c)
    gshow (UInt  i) = show (I# i)

Somewhat ironically, this would prevent URec itself from having a derived Generic instance due to its use of existential constraints, but I could live with that.

RyanGlScott added a comment.EditedOct 2 2015, 9:18 AM

Also, it's not clear to me what type we'd use to index UAddr. We could do

UAddr :: { uAddr# :: Addr# } -> URec String p

since primitive string literals (e.g., "abc"#) are of type Addr#. On the other hand, we could just as well do

UAddr :: { uAddr# :: Addr# } -> URec (Ptr a) p

or

UAddr :: { uAddr# :: Addr# } -> URec (FunPtr a) p

since Addr# is the underlying representation of a Ptr and FunPtr.

Some small comments.

libraries/base/GHC/Generics.hs
637

This change doesn't look to be necessary.

639

This change doesn't look to be necessary.

testsuite/tests/generics/GEq/GEq1A.hs
8

Are GHC.Exts and GHC.Prim both needed here?

kosmikus added a comment.EditedOct 2 2015, 9:38 AM
instance GShow (URec a) where
    gshow (UChar c) = show (C# c)
    gshow (UInt  i) = show (I# i)

No, I never meant you'd do this. URec would only be useful if you'd want to give a "generic" instance for all unboxed things, e.g.,

instance GShow (URec a) where
  gshow _ = "<unboxed>"

If you'd still want to do separate instance, that's possible with the data family, but has to be done like this:

instance GShow (URec Char) where
    gshow (UChar c) = show (C# c)

instance GShow (URec Int) where
    gshow (UInt  i) = show (I# i)

We could, however, use a GADT to accomplish that:
[...]

I don't much like the idea of using a GADT for this, because conceptually, this should be an "open" thing. It's true that the number of unlifted things we handle is fixed per GHC version, but it could very much change between versions. Using a GADT suggests that you can write functions using exhaustive pattern matching, but the number of cases may increase.

Your point about Addr not having a direct boxed counterpart is a good one, but I agree that Ptr might work well.

Can I urge, strongly, that whatever conclusions you come to (and you seem to be making progress) you document the design on a wiki page.

I'm always anxious that in 5 yrs time someone fresh will come to this; it's SO helpful to have a clearly described design. Of course the paper is a good start, but there are many differences of detail, this among them.

Thanks!

Simon

No, I never meant you'd do this. URec would only be useful if you'd want to give a "generic" instance for all unboxed things, e.g.,

instance GShow (URec a) where
  gshow _ = "<unboxed>"

OK, that does sound like a reasonable thing to want. In that case, I'll march forth with a data-family–oriented design.

Can I urge, strongly, that whatever conclusions you come to (and you seem to be making progress) you document the design on a wiki page.

I'm always anxious that in 5 yrs time someone fresh will come to this; it's SO helpful to have a clearly described design. Of course the paper is a good start, but there are many differences of detail, this among them.

A good point. Luckily, there's already a wiki page dedicated to DeriveGeneric, so that looks like a good place to put the design rationale for these additions. I'll incorporate the lessons learned into a section on that page.

RyanGlScott updated this revision to Diff 4396.Oct 2 2015, 12:36 PM
RyanGlScott edited edge metadata.
  • Data family redesign
RyanGlScott updated this revision to Diff 4402.Oct 2 2015, 6:41 PM
RyanGlScott edited edge metadata.
  • Add note, refer to design on wiki page
RyanGlScott updated this object.Oct 2 2015, 6:42 PM
RyanGlScott edited edge metadata.
kosmikus accepted this revision.Oct 3 2015, 4:53 AM
kosmikus edited edge metadata.
bgamari accepted this revision.Oct 3 2015, 12:29 PM
bgamari edited edge metadata.

Looks good to me. I can handle rebasing the user manual.

This revision was automatically updated to reflect the committed changes.