[WIP] Hi Haddock: Enable haddock to generate docs from .hi-files
Needs RevisionPublic

Authored by sjakobi on Aug 14 2018, 8:14 AM.

Details

Summary

This extends my work from D4758 and enables haddock to generate docs
from information enclosed in the .hi interface files.

Main changes

  • HaddockLex contains an Alex-based lexer for docstrings that detects identifiers delimited with ' or backticks.
  • HsDoc.Docs comprises the DeclDocMap and ArcDocMap structures added in D4758 and other data needed by haddock.
  • haddock now uses GhcMake.load' to load .hi-files, hooking into GHC's recompilation avoidance. In order to avoid recompilation when a .hi-file already contains documentation we use a new option Opt_SkipIfaceVersionCheck.

TODOs and requests for comment

  • Work out a specification for a deterministic module structure (docs_structure) and update the existing code to implement it. The HaddockIssue849 test case is currently flaky because the ordering of constructors in the Avail for Maybe fluctuates.
  • Possibly fuse mi_exports and docs_structure. Currently both serialize the same Avails.
  • Get the recompilation avoidance right: While some of the checks performed in checkVersions are irrelevant for haddock, we may want to keep some of them. We probably also want -haddock to cause recompilation if an existing .hi-file doesn't contain the Docs.
  • Fix the tests. ./validate currently reports failures for TEST="HaddockIssue849 ImpSafeOnly01 ImpSafeOnly02 ImpSafeOnly03 ImpSafeOnly04 ImpSafeOnly05 ImpSafeOnly06 ImpSafeOnly07 ImpSafeOnly08 ImpSafeOnly09 ImpSafeOnly10 Splices T13168 T14304 T4029 T9881 break006 break011 break013 break024 cabal01 dynamicToo002 dynamicToo005 haddock.base haddock.compiler hist001 hist002 plugins09 plugins11 print026 safePkg01"

Several more (minor) issues are marked TODO in the source code.


The corresponding changes in haddock are here.

Possible further work

  • Add an option to configure the identifier delimiters the lexer looks for.
  • Add an option to make the lexer exclude birdtracks.
  • Make the :doc command also pretty-print arg docs.

Diff Detail

Repository
rGHC Glasgow Haskell Compiler
Branch
wip/hi-haddock
Lint
Lint WarningsExcuse: Sorry for the long lines!
SeverityLocationCodeMessage
Warningcompiler/deSugar/ExtractDocs.hs:68TXT3Line Too Long
Warningcompiler/deSugar/ExtractDocs.hs:84TXT3Line Too Long
Warningcompiler/deSugar/ExtractDocs.hs:139TXT3Line Too Long
Warningcompiler/deSugar/ExtractDocs.hs:149TXT3Line Too Long
Warningcompiler/deSugar/ExtractDocs.hs:167TXT3Line Too Long
Warningcompiler/hsSyn/HsDoc.hs:249TXT3Line Too Long
Warningcompiler/iface/MkIface.hs:1200TXT3Line Too Long
Warningcompiler/iface/MkIface.hs:1203TXT3Line Too Long
Warningcompiler/main/HscTypes.hs:2804TXT3Line Too Long
Warningcompiler/rename/RnEnv.hs:68TXT3Line Too Long
Warningcompiler/rename/RnNames.hs:55TXT3Line Too Long
Warningtestsuite/tests/ghc-api/annotations/stringSource.hs:59TXT3Line Too Long
Warningtestsuite/tests/ghc-api/annotations/stringSource.hs:67TXT3Line Too Long
Unit
No Unit Test Coverage
Build Status
Buildable 22268
Build 51620: [GHC] Linux/amd64: Continuous Integration
Build 51619: [GHC] OSX/amd64: Continuous Integration
Build 51618: [GHC] Windows/amd64: Continuous Integration
Build 51617: arc lint + arc unit
sjakobi created this revision.Aug 14 2018, 8:14 AM
sjakobi edited the summary of this revision. (Show Details)Aug 14 2018, 8:39 AM
sjakobi edited the summary of this revision. (Show Details)
thomie added a subscriber: thomie.Aug 16 2018, 10:23 PM

Please mention the Trac ticket number in the summary (click "Edit revision"). Create a ticket first if it doesn't exist yet.

Is it possible to have a flag to have old behaviour too (and possibly enabled by -Wall)? We currently use -haddock at work and rely on the behaviour to make sure none of our haddock comments are malformed even though we don't actually generate the docs themselves. Perhaps the code already does that: the diff is quite large so I haven't looked through it.

sjakobi added a comment.EditedAug 20 2018, 5:07 PM

Please mention the Trac ticket number in the summary (click "Edit revision"). Create a ticket first if it doesn't exist yet.

There is no ticket yet. What's the intention though? Do we need a separate place for discussing the design or do we simply need a stub ticket that we can reference?

Is it possible to have a flag to have old behaviour too (and possibly enabled by -Wall)? We currently use -haddock at work and rely on the behaviour to make sure none of our haddock comments are malformed even though we don't actually generate the docs themselves. Perhaps the code already does that: the diff is quite large so I haven't looked through it.

Do I understand correctly that you're worried about the overhead (but not the recompilation behaviour)? Which old behaviour do you want then? The behaviour as of 8.6 or the one we had before? If you're unhappy with the behaviour in 8.6, can you please make a ticket?

(Also note that D5057 already changes GHC's behaviour with malformed or misplaced docstrings.)

harpocrates requested changes to this revision.Aug 20 2018, 5:39 PM
harpocrates added inline comments.
compiler/parser/HaddockLex.x
79

This isn't right - some of the flags stored in ParserFlags impact how things get lexed. For instance, this is never going to properly lex Addr# as an identifier.

I think you probably want to thread the ParserFlags through from the PState of the main lexer. If that is too onerous, we should at least figure out what set of extensions to enable (maybe MagicHash really is the only one that impacts lexing of identifiers...).

This revision now requires changes to proceed.Aug 20 2018, 5:39 PM
sjakobi added inline comments.Aug 20 2018, 6:41 PM
compiler/parser/HaddockLex.x
79

Good catch! :)

I'm wondering though if we really want the current module's language extensions to impact identifier lexing in the docstrings. I think I'd want the identifier lexing always to be as liberal as possible. So my preference would be to simply enable MagicHash by default as I, too, am not aware of any other extensions that impact identifier lexing.

Do we need a separate place for discussing the design or do we simply need a stub ticket that we can reference?

Mostly the latter.

In general, every non-trivial diff/commit should have a ticket number. The idea is that Phabricator might disappear one day, but trac ticket numbers are forever. You can refer to them in Notes, refer to them from other tickets etc. If a patch causes a regression, or needs a few more tweaks later, or needs to be merged to stable branch, the ticket is where this discussion takes place. Tickets have milestones attached, so you can later find out in which release the feature/bugfix was first included. Some people get email notifications on tickets, but not on phabricator diffs.

alexbiehl requested changes to this revision.Aug 21 2018, 4:05 AM

I think this is a good start, thanks @sjakobi! However, there are a few serious problems to be solved.

  • DocStructure <-> Avails. We are serializing Avails twice in case of -haddock. We need a way to unify these.
  • mkMaps and other Haddock survivors. Some of the nasty bits from haddock moved into GHC with this patch (and the prior one). We should look for an alternative to store and extract documentation. Simon suggest putting documentation into IfaceDecl directly. Rendering would then have to traverse the IfaceDecls only.
compiler/deSugar/ExtractDocs.hs
44

I think this is fishy: Ideally we would want to avoid "extracting docs" iff -haddock is omitted. But this code seems to do the work nevertheless.

70

+1

89

I wonder why we need this function? It seems to strip off the Names of the document structures to avoid duplication of serialized Names. But Names are deduplicated anyways in the hi symbol table.

125

This is the explicit export list, right? Please say so.

130

I find these combinators far too generic. Please use explicit pattern matching ala

mkDocStructure .... (Just rn_exports) ... = 
  ... 
mkDocStructure ... (Just rn_decls) ... = 
  ...

This implies the need for handling the Nothing cases.

141

Explicit export list, ditto.

184

case for no explicit export list

221

Modules without explicit export lists can't have named chunks.

compiler/hsSyn/HsDoc.hs
136

magic? I assume the 2 here has something todo with delimiters?

262

Ok, this seems to include the Avails in several constructors. Means that we might serialize them twice (once with mi_exports and once with this type. Instead we should have a unified structure which can give all exports efficiently and resembles the doc structure more closely. (I think this is in TODOs already)

compiler/parser/HaddockLex.x
55

We should prepare to switch ByteStrings here. Strings really are expensive.

79

+1

compiler/utils/Binary.hs
1207

For safety reasons it might make sense to use fromList. We had serious fallouts with (accidental) non deterministic keys.

harpocrates added inline comments.Tue, Aug 21, 10:07 PM
compiler/deSugar/ExtractDocs.hs
277

Could you add in these changes: https://github.com/haskell/haddock/commit/133e9c2c168db19c1135479f7ab144c4e33af2a4#diff-f79609cfae78e8f536ce2b8f3c706f0f? This will will make docs on type families re-appear (so the TypeFamilies2 test case from the Haddock suite will pass).

382

Could you add back in the change from https://github.com/haskell/haddock/commit/5cd53e09b9bd268edbbdd72af9676edddf82f886? This will make the ConstructorArgs test case in the Haddock test suite pass again.

compiler/utils/Binary.hs
1207

Might it be possible to just add a (cheap) check that the list really is ascending? I recently opened a PR for this: https://github.com/kolmodin/binary/pull/153/files. Ditto for Set.

sjakobi added inline comments.Wed, Aug 22, 4:21 AM
compiler/utils/Binary.hs
1207

I think that is what fromList already does. From http://hackage.haskell.org/package/containers-0.6.0.1/docs/Data-Map-Lazy.html#v:fromList:

If the keys of the list are ordered, linear-time implementation is used, with the performance equal to fromDistinctAscList.

harpocrates added inline comments.Wed, Aug 22, 10:11 AM
compiler/utils/Binary.hs
1207

I was suggesting we fail early when the list is not ascending (or at least emit a warning). I'm not sure which is worse: silently taking longer to deserialize or panicking.

  • mkMaps and other Haddock survivors. Some of the nasty bits from haddock moved into GHC with this patch (and the prior one). We should look for an alternative to store and extract documentation. Simon suggest putting documentation into IfaceDecl directly. Rendering would then have to traverse the IfaceDecls only.

Extending IfaceDecl with docstrings has been considered before but was ruled out at the time as it wouldn't have been compatible with haddock's ignore-exports option. Now that we're removing that option anyway, we could try this avenue again. I'm not too keen on trying it in this patch though, as I feel it is sufficiently large and complex already.

It also isn't clear whether extending IfaceDecl would really get rid the mkMaps code or simply move it somewhere else where we want to access docstrings by Name.