Initial POC for a parallel annotation structure, retired in favour of D428,D438 and D412
AbandonedPublic

Authored by alanz on Oct 1 2014, 10:00 AM.

Details

Reviewers
simonpj
austin
Trac Issues
#9628
Summary

The final implementation is in D428,D438 and D412

This is a proof of concept implementation of a completely
separate annotation structure, populated in the parser,and tied to the
AST by means of a virtual "node-key" comprising the surrounding
SrcSpan and a value derived from the specific constructor used for the
node.

The key parts of the design are the following.

The Annotations

In hsSyn/ApiAnnotation.hs

type ApiAnns = Map.Map ApiAnnKey SrcSpan

data ApiAnnKey = AK SrcSpan Ann
                  deriving (Eq,Ord,Show)

-- ---------------------------------------------------------------------

-- | Retrieve an annotation based on the SrcSpan of the annotated AST
-- element, and the known type of the annotation.
getAnnotation :: ApiAnns -> SrcSpan -> Ann -> Maybe SrcSpan
getAnnotation anns span ann = Map.lookup (AK span ann) anns


-- | Note: in general the names of these are taken from the
-- corresponding token, unless otherwise noted
data Ann = AnnAs
         | AnnBang
         | AnnClass
         | AnnClose -- ^ } or ] or ) or #) etc
         | AnnComma
         | AnnDarrow
         | AnnData
         | AnnDcolon
         ....

Capturing in the lexer/parser

The annotations are captured in the lexer / parser by extending PState to include a field

In parser/Lexer.x

data PState = PState {
        ....
        annotations :: [(ApiAnnKey,SrcSpan)]
        -- Annotations giving the locations of 'noise' tokens in the
        -- source, so that users of the GHC API can do source to
        -- source conversions.
     }

The lexer exposes a helper function to add an annotation

addAnnotation :: SrcSpan -> Ann -> SrcSpan -> P ()
addAnnotation l a v = P $ \s -> POk s {
  annotations = ((AK l a), v) : annotations s
  } ()

The parser also has some helper functions of the form

type MaybeAnn = Maybe (SrcSpan -> P ())

gl = getLoc
gj x = Just (gl x)

ams :: Located a -> [MaybeAnn] -> P (Located a)
ams a@(L l _) bs = (mapM_ (\a -> a l) $ catMaybes bs) >> return a

This allows annotations to be captured in the parser by means of

ctypedoc :: { LHsType RdrName }
        : 'forall' tv_bndrs '.' ctypedoc {% hintExplicitForall (getLoc $1) >>
                                            ams (LL $ mkExplicitHsForAllTy $2 (noLoc []) $4)
                                                [mj AnnForall $1,mj AnnDot $3] }
        | context '=>' ctypedoc         {% ams (LL $ mkQualifiedHsForAllTy   $1 $3)
                                               [mj AnnDarrow $2] }
        | ipvar '::' type               {% ams (LL (HsIParamTy (unLoc $1) $3))
                                               [mj AnnDcolon $2] }
        | typedoc                       { $1 }

Parse result

lang-haskell
data HsParsedModule = HsParsedModule {
    hpm_module    :: Located (HsModule RdrName),
    hpm_src_files :: [FilePath],
       -- ^ extra source files (e.g. from #includes).  The lexer collects
       -- these from '# <file> <line>' pragmas, which the C preprocessor
       -- leaves behind.  These files and their timestamps are stored in
       -- the .hi file, so that we can force recompilation if any of
       -- them change (#3589)
    hpm_annotations :: ApiAnns
  }

-- | The result of successful parsing.
data ParsedModule =
  ParsedModule { pm_mod_summary   :: ModSummary
               , pm_parsed_source :: ParsedSource
               , pm_extra_src_files :: [FilePath]
               , pm_annotations :: ApiAnns }

hsSyn changes

To make sure each annotatoin has a SrcSpan to look up in the index, some parts of the AST have had locations added.

in particular


-- in BasicTypes
data WarningTxt = WarningTxt [Located FastString]  -- now Located
                | DeprecatedTxt [Located FastString]       -- now Located
    deriving (Eq, Data, Typeable)


LRuleBndr
...
    HsDataDefn { dd_ND     :: NewOrData,
                 ...
                 dd_cType  :: Maybe (Located CType), -- now located
                 dd_cons   :: [Located [LConDecl name]], -- now list of located list
                 ...
...
type HsConDeclDetails name
        = HsConDetails (LBangType name) [Located [ConDeclField name]] -- located to match dd_cons
...
data RuleDecl name
  = HsRule                      -- Source rule
        (Located RuleName)      -- now located
        Activation
        [LRuleBndr name]        -- now located
...
data HsExpr id
  ..
  | ExplicitTuple
        [LHsTupArg id] -- now located
        Boxity
...
data IE name
  = ...
  | IEThingWith         name [Located name] -- now located
...
data HsRecFields id arg         -- A bunch of record fields
                                --      { x = 3, y = True }
        -- Used for both expressions and patterns
  = HsRecFields { rec_flds   :: [LHsRecField id arg],   -- now located
...
data HsModule name
  = HsModule {
      ....
      hsmodExports :: Maybe (Located [LIE name]), -- now located
      hsmodImports :: Located [LImportDecl name], -- now located
...
data HsType name
   ...
  | HsRecTy [Located [ConDeclField name]] -- now located

Other

See https://ghc.haskell.org/trac/ghc/wiki/GhcAstAnnotations

The current design as implemented here is detailed at https://ghc.haskell.org/trac/ghc/wiki/GhcAstAnnotations#design

Test Plan

sh ./validate

Diff Detail

Repository
rGHC Glasgow Haskell Compiler
Branch
wip/ast-annotations
Lint
Lint WarningsExcuse: todo item
SeverityLocationCodeMessage
Warningcompiler/parser/Parser.y.pp:1754TXT3Line Too Long
Unit
No Unit Test Coverage
Build Status
Buildable 1592
Build 1598: GHC Patch Validation (amd64/Linux)
alanz updated this revision to Diff 743.Oct 1 2014, 10:00 AM
alanz retitled this revision from to Initial POC for a parallel annotation structure.
alanz updated this object.
alanz edited the test plan for this revision. (Show Details)
alanz added a reviewer: simonpj.
alanz updated the Trac tickets for this revision.
alanz updated this revision to Diff 760.Oct 2 2014, 10:35 AM
alanz edited edge metadata.
  • Work in feedback from Neil Mitchell

ApiAnn has been split into separate stuctures, one per annotation.

This obviates the need to pattern match on the returned annotation
when retrieving it.

The storage/lookup is changed to match this.

rodlogic added inline comments.
compiler/ghc.mk
523

Excuse my bike shedding, but wouldn't it be more consistent with the other modules if it were called HsAnn or HsAnnotation instead? I can't say I agree with the existing naming conventions in the code base, but consistency trumps that imo. (feel free to ignore me :-) )

alanz updated this revision to Diff 762.Oct 2 2014, 1:14 PM
  • Fix lint and harbourmaster, merge master
alanz updated this revision to Diff 765.Oct 2 2014, 4:47 PM
  • Cleaning up and more annotation types.

Introduced additional SrcSpan to exports in HsModule to be able to
track parens location in an annotation.

alanz updated this revision to Diff 771.Oct 3 2014, 9:12 AM
  • Explicitly tracking commas in exports

Introduce a type HsCommaList to provide a point to attach an annotation.

This is modelled on the SPJ suggestion for tracking extra commas, and
currently includes an ExtraComma constructor, which will be needed for
annotations on tuple sections.

alanz updated this revision to Diff 832.Oct 8 2014, 4:34 PM
  • Fixing lint and reviewed issues
  • Merge remote-tracking branch 'origin/master' into wip/ast-annotations-separate
  • Progress on annotations for the header area
  • Using Value as per Shake instead of Dynamic, tks @ndmitchell
  • Remove unused Shake Key type
  • Fix test to show type-specific annotation lookup
  • Replacing more instances of OrdList with HsCommaList in parser
  • Worked in all exports and top level decls
  • Working in more instances of HsCommaList
  • Merge remote-tracking branch 'origin/master' into wip/ast-annotations-separate
  • Working in more API Annotations
alanz updated this object.Oct 9 2014, 2:15 PM
alanz updated this revision to Diff 861.Oct 12 2014, 4:19 PM
alanz updated this object.
  • validates again with new design. Incomplete though.
  • Merge remote-tracking branch 'origin/master' into wip/ast-annotations-separate
alanz updated this object.Oct 19 2014, 3:05 PM
alanz updated this revision to Diff 916.Oct 19 2014, 3:09 PM
  • Warnings and deprecations are now located
  • Make data constructors in HsDataDefn [Located [LConDecl name]]
  • Replace [ConDeclField RdrName] with [Located [ConDeclField RdrName]]
  • Make HsTupArg Located
  • Generate annotations for all commas in ExplicitTuple
  • Completed introduction of LHsRecField
  • Annotations added
alanz updated this object.Oct 19 2014, 3:32 PM
alanz updated this revision to Diff 917.Oct 19 2014, 4:38 PM
  • Fix lint warnings
alanz added a comment.Oct 19 2014, 4:46 PM

I think this is ready for review now.

There are just some lint cleanups to come on the grammar file, but the annotations are all in.

To complete the round-tripping of source there still needs to be a modification to the AST to record the original source text for literals so they can be reproduced. I am not sure if this should be a separate diff, or included in this one.

And once this is accepted I will begin on a exact-print tool to do the round tripping, to confirm that all the detail is correct.

simonpj edited edge metadata.Oct 20 2014, 2:41 AM

See my notes added to Trac #9628

alanz added a comment.Oct 20 2014, 2:47 AM

I agree with your comments on motivating each individual literal in terms
of whether it needs a "raw" field or not.

I am not sure whether it should be included as part of this change, or put
through separately.

alanz updated this revision to Diff 922.Oct 20 2014, 1:19 PM
alanz edited edge metadata.
  • More lint cleanups
  1. Updating D297: Initial POC for a parallel annotation structure #
  2. Enter a brief description of the changes included in this update.

More lint fixups

alanz updated this revision to Diff 923.Oct 20 2014, 1:27 PM
  • More lint fixups
alanz updated this revision to Diff 934.Oct 21 2014, 10:02 AM
  • Merge branch 'wip/ast-annotations-comments' into wip/ast-annotations-separate
ezyang removed a subscriber: ezyang.Oct 27 2014, 2:05 PM
alanz updated this revision to Diff 1051.Oct 27 2014, 4:22 PM
  • Make name in IE Located for pattern export annotations
alanz updated this revision to Diff 1054.Oct 27 2014, 4:57 PM
  • Fix lint warnings
alanz updated this revision to Diff 1066.Oct 28 2014, 11:01 AM

Apply existing D297 to current HEAD

alanz updated this revision to Diff 1076.Oct 28 2014, 4:13 PM
  • Make sure export_subspec annotations are captured.
  • Make IEModuleContents have Located ModuleName
alanz updated this revision to Diff 1098.Oct 29 2014, 3:22 PM
  • Tweaks for returning comments

In general I'm ok with this.

There are really two separate patches here.

(A) One adds extra locations to the source tree, causing simple but pervasive changes.
(B) The other does the ApiAnns thing, with much more localised changes.

Would it be possible to separate them? They really do different things.

For (A) as you'll see from my comments I'm quite confused about the intended semantics of the locations. Some look redundant to me, but doubtless has subtly different semantics for you. Some comments would help; perhaps the general principles in one place, and some clarifying specifics elsewhere.

The only change I really dislike is the splitting up of dd_cons, and HsConDeclDetails.

For (B) the following question arises: for each of the myriad SrcSpans in the tree, which ones have associated Ann keys? How is a user to know?

The only thing I can think of is to add a comment, in the source code, thus
{{{
type LConDecl id = Located (ConDecl id)

  • AnnKeywordIds = AnnForAll, AnnOpen, ...

}}}

Simon

compiler/hsSyn/HsDecls.lhs
793

I don't get this. Why the extra locations? What are the locations on the sub-lists saying?

Is it really necessary -- it really clutters up the code elsewhere.

Please add a comment to explain.

compiler/hsSyn/HsExpr.lhs
346

Since the HsTupArg is always located, you can have an HsExpr inside here, rather than a located one, can't you? Just one less layer.

compiler/hsSyn/HsImpExp.lhs
107–112

Don't we have a location for the whole LIE thing? Or does that include the comma but this doesn't? Please say

compiler/hsSyn/HsSyn.lhs
75

Here and for hsmodExports, why do you want a location for the entire list?

compiler/parser/Lexer.x
2610

"Ann" is a terribly short name for this. Perhaps "AnnKey" or something? Or "AnnKeywordId" Something a bit longer would help.

alanz updated this revision to Diff 1197.Nov 2 2014, 2:56 PM

Final cleanup before splitting into AST structure change diff and
annotation diff.

  • Remove leading '{-' from ITblockComment string
  • Include missing annotation
  • Add a comment motivating dd_cons as list of lists
alanz abandoned this revision.Nov 4 2014, 3:21 PM
alanz retitled this revision from Initial POC for a parallel annotation structure to Initial POC for a parallel annotation structure, retired in favour of D428,D438 and D412.
alanz updated this object.

The proof of concept has served its purpose the final version is at D428,D438 and D412

compiler/hsSyn/HsDecls.lhs
793

This exists because a gadt constructor can be of the form

A,B :: TWhatever

so the ConDecls for A and B are clustered together for the annotation

compiler/hsSyn/HsExpr.lhs
346

In a tuple section we have commas indicating missing arguments. These need to carry annotations for the comma, and so need a location.

compiler/hsSyn/HsImpExp.lhs
107–112

A pattern Foo export uses an IEVar, in which case the name does not start with the surrounding location.

compiler/hsSyn/HsSyn.lhs
75

We have AnnOpen and AnnClose.

At the HsModule level we need to potentially track the ( and ) for the exports, as well as the { and } around the imports/declarations.

There are two options

  1. Introduce AnnOpen1 / AnnOpen2 or similar
  2. Keep the Located on hsmodExports