[WIP] Lex and rename docstrings

Authored by sjakobi on May 29 2018, 7:23 AM.




The lexer parses a HsDoc RdrName from each docstring. This is a raw
docstring combined with a list of identifiers found in the docstring
and the RdrNames created from each identifier.
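
As a rough illustration of the lexing step (not the code in this diff), the sketch below scans a docstring for identifiers wrapped in `'` or a backtick and pairs each with a reader name. `ToyRdrName`, `LexedIdent`, `ToyHsDoc`, `lexIdents` and `lexDoc` are hypothetical names, the docstring is a plain `String`, and the delimiter and identifier-character rules are simplifying assumptions. Each identifier gets exactly one name here, whereas the summary allows several.

```haskell
import Data.Char (isAlphaNum)

-- Hypothetical stand-in for GHC's RdrName: just the identifier text.
newtype ToyRdrName = ToyRdrName String
  deriving (Eq, Show)

-- One lexed identifier together with the names created from it.
data LexedIdent name = LexedIdent
  { identText  :: String
  , identNames :: [name]
  } deriving (Eq, Show)

-- Toy analogue of the HsDoc type described above: the raw docstring
-- plus the identifiers found in it.
data ToyHsDoc name = ToyHsDoc
  { docString :: String
  , docIdents :: [LexedIdent name]
  } deriving (Eq, Show)

-- Assumption: identifiers are delimited by a single quote or a backtick.
isDelim :: Char -> Bool
isDelim c = c == '\'' || c == '`'

-- Assumption: identifiers consist of alphanumerics, '_' and '.'.
isIdentChar :: Char -> Bool
isIdentChar c = isAlphaNum c || c == '_' || c == '.'

-- Collect every non-empty run of identifier characters that is
-- enclosed by delimiters on both sides.
lexIdents :: String -> [String]
lexIdents [] = []
lexIdents (c : rest)
  | isDelim c
  , (ident, d : rest') <- span isIdentChar rest
  , not (null ident)
  , isDelim d
  = ident : lexIdents rest'
  | otherwise = lexIdents rest

-- Build the toy HsDoc, creating one reader name per identifier.
lexDoc :: String -> ToyHsDoc ToyRdrName
lexDoc s = ToyHsDoc s [LexedIdent i [ToyRdrName i] | i <- lexIdents s]
```

With these rules, ``lexDoc "See 'foo' and `Data.Map.insert`, but not it's."`` records the identifiers `foo` and `Data.Map.insert`; the apostrophe in "it's" is skipped because no closing delimiter follows it.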

The renamer then turns each HsDoc RdrName into a HsDoc Name,
replacing each RdrName with a list of Names.

When including docstrings in ModGuts or ModIface, we split a
HsDocNamesMap off the collected docs. This map collects all lexed
identifiers and the Names they may correspond to.
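
The renaming and splitting steps can be sketched in the same spirit. This is a toy model under stated assumptions, not the patch's implementation: `ReaderName`, `ResolvedName`, `ToyHsDoc`, `ToyEnv`, `ToyNamesMap`, `renameDoc` and `splitNames` are all hypothetical, names are plain strings, and the renamer's environment is modelled as a `Data.Map` (from the `containers` package that ships with GHC).

```haskell
import Data.List (union)
import qualified Data.Map.Strict as Map

-- Hypothetical stand-ins for GHC's RdrName and Name.
type ReaderName = String

newtype ResolvedName = ResolvedName String
  deriving (Eq, Show)

-- Toy analogue of HsDoc: the raw docstring plus, for each lexed
-- identifier, the names attached to it.
data ToyHsDoc name = ToyHsDoc
  { docString :: String
  , docIdents :: [(String, [name])]
  } deriving (Eq, Show)

-- Assumption: the renamer's environment maps a reader name to every
-- entity it may refer to.
type ToyEnv = Map.Map ReaderName [ResolvedName]

-- Replace each reader name with the list of names it resolves to.
-- Unresolved names contribute an empty list.
renameDoc :: ToyEnv -> ToyHsDoc ReaderName -> ToyHsDoc ResolvedName
renameDoc env (ToyHsDoc s idents) =
  ToyHsDoc s [(i, concatMap resolve rdrs) | (i, rdrs) <- idents]
  where
    resolve r = Map.findWithDefault [] r env

-- Toy analogue of HsDocNamesMap: identifier text to candidate names.
type ToyNamesMap = Map.Map String [ResolvedName]

-- Split the collected docs into the bare docstrings and one shared
-- map of identifiers, merging duplicates across docstrings.
splitNames :: [ToyHsDoc ResolvedName] -> ([String], ToyNamesMap)
splitNames docs =
  ( map docString docs
  , Map.fromListWith (flip union) (concatMap docIdents docs)
  )
```

Keeping one shared map means an identifier mentioned in many docstrings is stored only once, which is one plausible motivation for splitting the names off before writing them to an interface file.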

https://github.com/haskell/haddock/pull/828 demonstrates how haddock
may use the new .hi-file output.

Measurements (outdated)

We compare this patch with a version of GHC before the changes but
which includes the ByteString docstring patch from D4743.

The modules we compile are Options.Generic,
a medium-sized and well documented module, and Options.GenericNoDocs,
a version of Options.Generic with all docstrings removed.

Before = baseline that includes the ByteString docstring patch; Now = this patch.

                                Before                  Now

                                        Module:  Options.Generic

                                        -haddock: off

Compile time:                   810.0 ms                816.3 ms
max_bytes_used:                 52,170,808              52,805,264
peak_megabytes_allocated:       172                     174
Size of hi-file in bytes:       33,864                  33,792

                                        -haddock: on

Compile time:                   807.1 ms                820.2 ms
max_bytes_used:                 51,838,320              52,957,272
peak_megabytes_allocated:       170                     175
Size of hi-file in bytes:       33,864                  50,081

                                        Module:  Options.GenericNoDocs

                                        -haddock: off

Compile time:                   813.9 ms                818.5 ms
max_bytes_used:                 51,845,232              52,789,944
peak_megabytes_allocated:       171                     174
Size of hi-file in bytes:       33,864                  33,792

                                        -haddock: on

Compile time:                   809.7 ms                817.3 ms
max_bytes_used:                 51,844,840              52,801,504
peak_megabytes_allocated:       171                     174
Size of hi-file in bytes:       33,864                  33,792

Planned next

Add a :doc command to GHCi that shows the documentation for an identifier.

Ideas for further work

  • Remove the dependency on parsec.
  • Add an option to configure the identifier delimiters the lexer looks for.
  • Add an option to make the lexer exclude birdtracks.

Test Plan

make test TEST="DocsInHiFile0 DocsInHiFile1"

Diff Detail

rGHC Glasgow Haskell Compiler
Lint Warnings
Excuse: The surrounding code also violates the line length convention.
Warning  compiler/deSugar/Desugar.hs:187      TXT3  Line Too Long
Warning  compiler/deSugar/ExtractDocs.hs:41   TXT3  Line Too Long
Warning  compiler/deSugar/ExtractDocs.hs:43   TXT3  Line Too Long
Warning  compiler/deSugar/ExtractDocs.hs:66   TXT3  Line Too Long
Warning  compiler/deSugar/ExtractDocs.hs:97   TXT3  Line Too Long
Warning  compiler/deSugar/ExtractDocs.hs:101  TXT3  Line Too Long
Warning  compiler/deSugar/ExtractDocs.hs:108  TXT3  Line Too Long
Warning  compiler/deSugar/ExtractDocs.hs:110  TXT3  Line Too Long
Warning  compiler/deSugar/ExtractDocs.hs:149  TXT3  Line Too Long
Warning  compiler/deSugar/ExtractDocs.hs:158  TXT3  Line Too Long
Warning  compiler/deSugar/ExtractDocs.hs:159  TXT3  Line Too Long
Warning  compiler/deSugar/ExtractDocs.hs:161  TXT3  Line Too Long
Warning  compiler/deSugar/ExtractDocs.hs:162  TXT3  Line Too Long
Warning  compiler/deSugar/ExtractDocs.hs:190  TXT3  Line Too Long
Warning  compiler/deSugar/ExtractDocs.hs:234  TXT3  Line Too Long
Warning  compiler/deSugar/ExtractDocs.hs:235  TXT3  Line Too Long
Warning  compiler/deSugar/ExtractDocs.hs:236  TXT3  Line Too Long
Warning  compiler/deSugar/ExtractDocs.hs:237  TXT3  Line Too Long
Warning  compiler/deSugar/ExtractDocs.hs:258  TXT3  Line Too Long
Warning  compiler/deSugar/ExtractDocs.hs:283  TXT3  Line Too Long
Warning  compiler/deSugar/ExtractDocs.hs:302  TXT3  Line Too Long
Warning  compiler/deSugar/ExtractDocs.hs:329  TXT3  Line Too Long
Warning  compiler/hsSyn/HsDecls.hs:147        TXT3  Line Too Long
Warning  compiler/hsSyn/HsDecls.hs:1253       TXT3  Line Too Long
Warning  compiler/hsSyn/HsDoc.hs:2            TXT3  Line Too Long
No Unit Test Coverage
Build Status
Buildable 20890
Build 46463: [GHC] Linux/amd64: Continuous Integration
Build 46462: [GHC] OSX/amd64: Continuous Integration
Build 46461: [GHC] Windows/amd64: Continuous Integration
Build 46460: arc lint + arc unit
sjakobi created this revision. May 29 2018, 7:23 AM
sjakobi retitled this revision from [DON' MERGE] Lex docstrings and include them in .hi-files to [DON'T MERGE] Lex docstrings and include them in .hi-files. May 29 2018, 7:23 AM
sjakobi updated this revision to Diff 16571. May 29 2018, 9:52 AM
  • Fix build with ghc-8.2
  • Fix line length issues
alexbiehl added a comment. Edited May 29 2018, 1:00 PM

Thank you Simon for your non-stop, all-night effort! Impressive how fast we got to some POC!


This whole file is basically copied from Haddock... and is super awful. Maybe we can find some other way to get documentation out of a HsGroup, so that we can finally get rid of this.

sjakobi added inline comments. May 29 2018, 1:16 PM

Yes, apart from extractDocs, combineDocs and splitMbHsDoc, everything in this file was copied from haddock and partially modified.

Can you be more specific about what you don't like about this code and what I should try to do differently, though?

BTW the build fails only because haddock hasn't been adapted yet.

sjakobi updated this revision to Diff 16724. Jun 5 2018, 10:21 AM

Rebase on master

sjakobi retitled this revision from [DON'T MERGE] Lex docstrings and include them in .hi-files to [WIP] Lex and rename docstrings. Jun 5 2018, 10:28 AM
sjakobi edited the summary of this revision.
bgamari requested changes to this revision. Jun 15 2018, 1:13 PM

Bumping out of review queue since we already have a limited version of this in.

This revision now requires changes to proceed. Jun 15 2018, 1:13 PM
sjakobi abandoned this revision. Nov 4 2018, 7:28 AM

Superseded by D5067.