[WIP] Remove Haddock tokens from the grammar
Changes PlannedPublic

Authored by harpocrates on Aug 10 2018, 10:55 AM.

Details

Reviewers
bgamari
simonmar
Trac Issues
#8944
Summary

This removes all Haddock-related tokens from the grammar. When the lexer encounters
a Haddock token, it plucks it out and puts it into some store in the 'PState'. Then,
the parser can query this store directly to get Haddocks. The actual docstrings are
keyed by their left or right 'RealSrcLoc' position.

The main benefit of this is that misplaced Haddock tokens no longer cause parse
errors - only warnings.

Not everything is rosy though. Here are the main issues:

  • 'parseDeclaration' doesn't return 'DocD' variants
  • we have to lookahead on every token (when in Haddock mode), so that means stashing and restoring the PState. Crucially, it means relexing twice the same thing. This could probably be fixed (by storing more stuff in PState)
Test Plan

./validate

Unit TestsFailed

TimeTest
0 msTest: Test
#!/bin/bash -eo pipefail mkdir -p test-results make test THREADS=`mk/detect-cpu-count.sh` SKIP_PERF_TESTS=YES JUNIT_FILE=../../test-results/junit.xml
0 msBoot: Boot
#!/bin/bash -eo pipefail ./boot
0 msBuild: Build
#!/bin/bash -eo pipefail make -j`mk/detect-cpu-count.sh` V=0
0 msCheckout code: Checkout code
#!/bin/sh set -e
0 msConfigure: Configure
#!/bin/bash -eo pipefail ./configure
View Full Test Results (1 Failed · 10 Passed)
harpocrates created this revision.Aug 10 2018, 10:55 AM

There are still some pretty big TODOs.

  • clean up the grammar around semicolon tokens which can be interleaved with DocDecls (this should also get rid of 3 unnecessary shift reduce conflicts)
  • make sure behaviour around annotations hasn't changed
  • clarify exactly how the haddock, raw token stream, and queued comments features are supposed to interact
  • clarify that lexer performance isn't substantially affected by this. If it is, cache the looked-ahead tokens in PState

However, I would appreciate some feedback on the general approach.

  • Add some tests

I rebased, did some work, then did arc diff. Unfortunately, it looks like the base commit of this diff is the still the pre-rebase one. How would I go about fixing this?

thomie added a subscriber: thomie.Aug 16 2018, 10:14 PM

I rebased, did some work, then did arc diff. Unfortunately, it looks like the base commit of this diff is the still the pre-rebase one. How would I go about fixing this?

Try arc diff <basecommit>. For example, if you want this Differential to only include a single commit, you would do: arc diff HEAD^.

See also the Note in https://ghc.haskell.org/trac/ghc/wiki/Phabricator#Startingoff:Fixingabugsubmittingareview.

Rebased and fixed base commit.

Really fix base commit

I suspect this might fix https://ghc.haskell.org/trac/ghc/ticket/15029 too?!

Yeah. There's a handful of other parser issues this covers. These are from the Haddock bug tracker really despite being about the GHC parser:

This is a really cool patch—thanks for working on this.

One particularly intriguing avenue that this might open up is the possibility of removing the type production and moving -> to atype—in other words, making -> be parsed exactly the same as any other type constructor. Not only would this be a nice simplification, but it would also eliminate some annoying bugs like Trac #15235 (where -> is effectively infixr -∞ due to the special treatment it receives in the parser).

This patch doesn't get us all the way there, but it does remove a good number of special cases for ->, so it's certainly a step in the right direction, I think.

harpocrates planned changes to this revision.Sep 22 2018, 10:52 PM

Apologize for letting this languish for so long - I expect to pick this work back up in the coming weeks.

One particularly intriguing avenue that this might open up is the possibility of removing the type production and moving -> to atype—in other words, making -> be parsed exactly the same as any other type constructor.

That is a very interesting idea! I'm dying to try it out now...

x - Add some tests

  • Rebase sludge
  • Promising WIP optimization
  • Fix an off-by one reverse
  • Fix issue around 'pushCurrentContext'
harpocrates planned changes to this revision.Nov 1 2018, 11:26 AM

This is almost done. The perf/haddock tests all pass again. All that remains is:

  • figuring out the interaction between Opt_KeepRawTokenStream and Opt_Haddock (this is what is causing the remaining two test failures)
  • check that -haddock has not been broken in the presence of ALR
  • write a note explaining the approach taken and the lexer-level gotchas
  • address misc. TODOs I've left lying around