Update hsSyn AST to use Trees that Grow
ClosedPublic

Authored by alanz on May 24 2017, 4:43 PM.

Details

Summary

See https://ghc.haskell.org/trac/ghc/wiki/ImplementingTreesThatGrow

This commit prepares the ground for a full extensible AST, by replacing the type
parameter for the hsSyn data types with a set of indices into type families,

data GhcPs -- ^ Index for GHC parser output
data GhcRn -- ^ Index for GHC renamer output
data GhcTc -- ^ Index for GHC typechecker output

These are now used instead of RdrName, Name and Id/TcId/Var

Where the original name type is required in a polymorphic context, this is
accessible via the IdP type family, defined as

type family IdP p
type instance IdP GhcPs = RdrName
type instance IdP GhcRn = Name
type instance IdP GhcTc = Id

These types are declared in the new 'hsSyn/HsExtension.hs' module.

To gain a better understanding of the extension mechanism, it has been applied
to HsLit only, also replacing the SourceText fields in them with extension
types.

To preserve extension generality, a type class is introduced to capture the
SourceText interface, which must be honoured by all of the extension points
which originally had a SourceText. The class is defined as

class HasSourceText a where
  -- Provide setters to mimic existing constructors
  noSourceText  :: a
  sourceText    :: String -> a

  setSourceText :: SourceText -> a
  getSourceText :: a -> SourceText

And the constraint is captured in SourceTextX, which is a constraint type
listing all the extension points that make use of the class.

Updating Haddock submodule to match.

Test Plan

./validate

Diff Detail

Repository
rGHC Glasgow Haskell Compiler
Lint
Automatic diff as part of commit; lint not applicable.
Unit
Automatic diff as part of commit; unit tests not applicable.
alanz created this revision.May 24 2017, 4:43 PM
alanz updated this revision to Diff 12704.May 25 2017, 4:15 PM

Rename ClassX to ForallX, clean up, rebase

goldfire edited edge metadata.May 30 2017, 9:45 AM

I just took a look here to see what I thought... and it's hard to know where to begin. I took a look at HsExpr.hs and TcExpr.hs and didn't see much interesting. Also saw HsExtension.hs, which was more interesting. But I still don't know where to focus.

So: is there anything in here you think is controversial? Design decisions you weren't sure of? Do we have any performance measurements?

Thanks for doing this. I do like the direction of travel here, though I shudder to think about the amount of boilerplate this will grow to...

alanz added a comment.May 30 2017, 1:26 PM

So: is there anything in here you think is controversial? Design decisions you weren't sure of? Do we have any performance measurements?

This particular patch is basically clearing the way for the Trees that Grow implementation, in that it changes the type parameter to the
hsSyn AST, and not much else. And hsSyn/Extension.hs is where it happens, the rest is pretty much mechanically replacing the old type
parameter with the new one.

In terms of performance, it should only affect compilation time at this stage. The single experiment I did, on my machine, compiling via

$ make maintainer-clean
$ CPUS=5 time ./validate

gives the following

new branch: 12428.44 user 833.69 system 52:48.24 elapsed
master    : 12605.34 user 866.27 system 53:33.50 elapsed

It's not very scientific, but the numbers show the new branch being built and tested quicker than master.
To me the main point is that they are of a similar magnitude.

I have not checked memory usage during compilation, but can do that if someone offers some pointers
as to the best way to do it.

Thanks for doing this. I do like the direction of travel here, though I shudder to think about the amount of boilerplate this will grow to...

I think the amount of boilerplate will depend on the exact approach taken after this patch.

Personally I believe that the extension mechanism should be built in as described in the paper so that it is generally available,
but then used sparingly in the main GHC compilation process. And the candidates here would be the phase specific constructors,
such as ValBindsOut and similar.

But having it fully implemented will open many options for tool writers, which is where my interest lies.

goldfire accepted this revision.May 30 2017, 1:39 PM

I'm satisfied with this, then.

This revision is now accepted and ready to land.May 30 2017, 1:39 PM
bgamari retitled this revision from Udate hsSyn AST to use Trees that Grow to Update hsSyn AST to use Trees that Grow.May 30 2017, 2:05 PM
RyanGlScott added inline comments.
compiler/hsSyn/HsExtension.hs
53–57

It would be cleaner not to derive instances for each individual index, and instead do:

deriving instance Eq (GHC c)
deriving instance Typeable c => Data (GHC c)

etc.

(For the record, the bug that you ran into is being tracked separately as Trac #13759.)

Also, a minor point, but we already have another prominent type in the GHC API named Ghc. Since GHC is being exported from HsExtension, perhaps we ought to give it a more distinctive name to differentiate it from Ghc?

alanz updated this revision to Diff 12729.May 31 2017, 3:25 PM

Rename GHC to GhcPass, clean lint warnings, rebase

shayan-najd accepted this revision.Jun 1 2017, 5:54 PM

I am not sure if I really qualify as a reviewer for this.
This is a minor, yet enabling step in practice.
All is as planned and described in the wiki, and if other reviewers are fine with it, there should be no problem in landing it as soon as possible.

alanz updated this revision to Diff 12756.Jun 3 2017, 2:15 PM

rebase (again)

bgamari requested changes to this revision.Jun 3 2017, 10:51 PM

On the whole this seems fine but I would really like to see the comments finished up before we move ahead with merging. In fact, I think it would be reasonable to write the beginning of a release notes entry (under the "GHC API" section) describing the change, it's motivations, and generally how this will affect API users. In general I fear that this change may make it quite tricky for API users (think ghc-mod) to support GHC releases pre- and post-growable trees. Have you tried porting such a package to this new scheme?

compiler/hsSyn/HsExtension.hs
36

What is going to happen with this? We should likely write this note before merging as this business is far from trivial.

207

s/accross/across/

216

I'm not sure I follow here. What is Convertible intended to be used for?

This revision now requires changes to proceed.Jun 3 2017, 10:51 PM
alanz marked an inline comment as done.Jun 4 2017, 3:15 AM

On the whole this seems fine but I would really like to see the comments finished up before we move ahead with merging. In fact, I think it would be reasonable to write the beginning of a release notes entry (under the "GHC API" section) describing the change, it's motivations, and generally how this will affect API users. In general I fear that this change may make it quite tricky for API users (think ghc-mod) to support GHC releases pre- and post-growable trees. Have you tried porting such a package to this new scheme?

The conversion of check-api-annotations and HsDumpAst for this patch was basically a GHC API conversion. It is a simple but pervasive replacement of the original type parameter with another. The problems will come in trying to support multiple versions of GHC simultaneously using CPP.

I do know @shayan-najd has some ideas around making this transition easier using patterns, but I have not investigated these in detail.

compiler/hsSyn/HsExtension.hs
36

I was thinking that as this is the enabling commit for the work that @shayan-najd will be doing, it should be filled in when the actual Trees that Grow work goes in.

I suppose it should be updated to reflect that.

216

I will extend the comment. It is for when you are converting an AST from one pass to another, e.g. in the renamer. By design you do not know what is in the extension points, but they need to be brought along into the updated AST.

So for the HsLit SourceText annotations this function ends up being id, but needs a type signature to make GHC happy.

alanz marked 5 inline comments as done.Jun 4 2017, 1:59 PM

I have updated ghc-exactprint to use the GHC from this patch.

The commit is https://github.com/alanz/ghc-exactprint/commit/6a6e3cdeb40f9623e9a7fc6f1be90da981cb921b,
and the clean build is https://travis-ci.org/alanz/ghc-exactprint/builds/239364977

The main thing is to define

#if MIN_VERSION_ghc(8,3,0)
type ParseI     = GhcPs
type RenameI    = GhcRn
type TypecheckI = GhcTc
#else
type ParseI     = RdrName
type RenameI    = Name
type TypecheckI = Var
#endif

and then replace all usages of of the RdrName, Name and Var type parameters accordingly.

For polymorphic functions, a helper constraint can be defined along the lines of

#if MIN_VERSION_ghc(8,3,0)
-- |bundle up the constraints required for a trees that grow pass
type IsPass pass = (DataId pass, OutputableBndrId pass, SourceTextX pass)
#else
type IsPass pass = (DataId pass, OutputableBndrId pass)
#endif

Both of these require a change to the code, but it then cleanly compiles with both older and newer GHC versions.

alanz updated this revision to Diff 12757.Jun 4 2017, 2:56 PM
alanz edited edge metadata.

Update per @bgamari comments

This revision is now accepted and ready to land.Jun 4 2017, 2:56 PM
In D3609#103097, @alanz wrote:

...

Both of these require a change to the code, but it then cleanly compiles with both older and newer GHC versions.

Lovely. That doesn't seem too bad at all. However, it's certainly something that should be documented in the release notes (or perhaps the migration guide).

bgamari accepted this revision.Jun 5 2017, 12:08 PM

Fire when ready.

alanz updated this revision to Diff 12764.Jun 5 2017, 5:10 PM

Rebase

This revision was automatically updated to reflect the committed changes.