[WIP] Hi Haddock: Enable haddock to generate docs from .hi-files
Needs ReviewPublic

Authored by sjakobi on Aug 14 2018, 8:14 AM.

Details

Summary

This extends my work from D4758 and enables haddock to generate docs
from information enclosed in the .hi interface files.

Main changes

  • HaddockLex contains an Alex-based lexer for docstrings that detects identifiers delimited with ' or backticks.
  • HsDoc.Docs comprises the DeclDocMap and ArcDocMap structures added in D4758 and other data needed by haddock.
  • haddock now uses GhcMake.load' to load .hi-files, hooking into GHC's recompilation avoidance. In order to avoid recompilation when a .hi-file already contains documentation we use a new option Opt_SkipIfaceVersionCheck.

TODOs and requests for comment

  • Work out a specification for a deterministic module structure (docs_structure) and update the existing code to implement it. The HaddockIssue849 test case is currently flaky because the ordering of constructors in the Avail for Maybe fluctuates.
  • Possibly fuse mi_exports and docs_structure. Currently both serialize the same Avails.
  • Get the recompilation avoidance right: While some of the checks performed in checkVersions are irrelevant for haddock, we may want to keep some of them. We probably also want -haddock to cause recompilation if an existing .hi-file doesn't contain the Docs.
  • Fix the tests. ./validate currently reports failures for TEST="HaddockIssue849 ImpSafeOnly01 ImpSafeOnly02 ImpSafeOnly03 ImpSafeOnly04 ImpSafeOnly05 ImpSafeOnly06 ImpSafeOnly07 ImpSafeOnly08 ImpSafeOnly09 ImpSafeOnly10 Splices T13168 T14304 T4029 T9881 break006 break011 break013 break024 cabal01 dynamicToo002 dynamicToo005 haddock.base haddock.compiler hist001 hist002 plugins09 plugins11 print026 safePkg01"

Several more (minor) issues are marked TODO in the source code.


The corresponding changes in haddock are here.

Possible further work

  • Add an option to configure the identifier delimiters the lexer looks for.
  • Add an option to make the lexer exclude birdtracks.
  • Make the :doc command also pretty-print arg docs.
There are a very large number of changes, so older changes are hidden. Show Older Changes
sjakobi edited the summary of this revision. (Show Details)Aug 14 2018, 8:40 AM
thomie added a subscriber: thomie.Aug 16 2018, 10:23 PM

Please mention the Trac ticket number in the summary (click "Edit revision"). Create a ticket first if it doesn't exist yet.

Is it possible to have a flag to have old behaviour too (and possibly enabled by -Wall)? We currently use -haddock at work and rely on the behaviour to make sure none of our haddock comments are malformed even though we don't actually generate the docs themselves. Perhaps the code already does that: the diff is quite large so I haven't looked through it.

sjakobi added a comment.EditedAug 20 2018, 5:07 PM

Please mention the Trac ticket number in the summary (click "Edit revision"). Create a ticket first if it doesn't exist yet.

There is no ticket yet. What's the intention though? Do we need a separate place for discussing the design or do we simply need a stub ticket that we can reference?

Is it possible to have a flag to have old behaviour too (and possibly enabled by -Wall)? We currently use -haddock at work and rely on the behaviour to make sure none of our haddock comments are malformed even though we don't actually generate the docs themselves. Perhaps the code already does that: the diff is quite large so I haven't looked through it.

Do I understand correctly that you're worried about the overhead (but not the recompilation behaviour)? Which old behaviour do you want then? The behaviour as of 8.6 or the one we had before? If you're unhappy with the behaviour in 8.6, can you please make a ticket?

(Also note that D5057 already changes GHC's behaviour with malformed or misplaced docstrings.)

harpocrates requested changes to this revision.Aug 20 2018, 5:39 PM
harpocrates added inline comments.
compiler/parser/HaddockLex.x
80

This isn't right - some of the flags stored in ParserFlags impact how things get lexed. For instance, this is never going to properly lex Addr# as an identifier.

I think you probably want to thread the ParserFlags through from the PState of the main lexer. If that is too onerous, we should at least figure out what set of extensions to enable (maybe MagicHash really is the only one that impacts lexing of identifiers...).

This revision now requires changes to proceed.Aug 20 2018, 5:39 PM
sjakobi added inline comments.Aug 20 2018, 6:41 PM
compiler/parser/HaddockLex.x
80

Good catch! :)

I'm wondering though if we really want the current module's language extensions to impact identifier lexing in the docstrings. I think I'd want the identifier lexing always to be as liberal as possible. So my preference would be to simply enable MagicHash by default as I, too, am not aware of any other extensions that impact identifier lexing.

Do we need a separate place for discussing the design or do we simply need a stub ticket that we can reference?

Mostly the latter.

In general, every non-trivial diff/commit should have a ticket number. The idea is that Phabricator might disappear one day, but trac ticket numbers are forever. You can refer to them in Notes, refer to them from other tickets etc. If a patch causes a regression, or needs a few more tweaks later, or needs to be merged to stable branch, the ticket is where this discussion takes place. Tickets have milestones attached, so you can later find out in which release the feature/bugfix was first included. Some people get email notifications on tickets, but not on phabricator diffs.

alexbiehl requested changes to this revision.Aug 21 2018, 4:05 AM

I think this is a good start, thanks @sjakobi! However, there are a few serious problems to be solved.

  • DocStructure <-> Avails. We are serializing Avails twice in case of -haddock. We need a way to unify these.
  • mkMaps and other Haddock survivors. Some of the nasty bits from haddock moved into GHC with this patch (and the prior one). We should look for an alternative to store and extract documentation. Simon suggest putting documentation into IfaceDecl directly. Rendering would then have to traverse the IfaceDecls only.
compiler/deSugar/ExtractDocs.hs
42

I think this is fishy: Ideally we would want to avoid "extracting docs" iff -haddock is omitted. But this code seems to do the work nevertheless.

70

+1

80

I wonder why we need this function? It seems to strip off the Names of the document structures to avoid duplication of serialized Names. But Names are deduplicated anyways in the hi symbol table.

116

This is the explicit export list, right? Please say so.

121

I find these combinators far too generic. Please use explicit pattern matching ala

mkDocStructure .... (Just rn_exports) ... = 
  ... 
mkDocStructure ... (Just rn_decls) ... = 
  ...

This implies the need for handling the Nothing cases.

132

Explicit export list, ditto.

175

case for no explicit export list

212

Modules without explicit export lists can't have named chunks.

compiler/hsSyn/HsDoc.hs
130

magic? I assume the 2 here has something todo with delimiters?

283

Ok, this seems to include the Avails in several constructors. Means that we might serialize them twice (once with mi_exports and once with this type. Instead we should have a unified structure which can give all exports efficiently and resembles the doc structure more closely. (I think this is in TODOs already)

compiler/parser/HaddockLex.x
56

We should prepare to switch ByteStrings here. Strings really are expensive.

80

+1

compiler/utils/Binary.hs
1226

For safety reasons it might make sense to use fromList. We had serious fallouts with (accidental) non deterministic keys.

harpocrates added inline comments.Aug 21 2018, 10:07 PM
compiler/deSugar/ExtractDocs.hs
237–238

Could you add in these changes: https://github.com/haskell/haddock/commit/133e9c2c168db19c1135479f7ab144c4e33af2a4#diff-f79609cfae78e8f536ce2b8f3c706f0f? This will will make docs on type families re-appear (so the TypeFamilies2 test case from the Haddock suite will pass).

347–349

Could you add back in the change from https://github.com/haskell/haddock/commit/5cd53e09b9bd268edbbdd72af9676edddf82f886? This will make the ConstructorArgs test case in the Haddock test suite pass again.

compiler/utils/Binary.hs
1226

Might it be possible to just add a (cheap) check that the list really is ascending? I recently opened a PR for this: https://github.com/kolmodin/binary/pull/153/files. Ditto for Set.

sjakobi added inline comments.Aug 22 2018, 4:21 AM
compiler/utils/Binary.hs
1226

I think that is what fromList already does. From http://hackage.haskell.org/package/containers-0.6.0.1/docs/Data-Map-Lazy.html#v:fromList:

If the keys of the list are ordered, linear-time implementation is used, with the performance equal to fromDistinctAscList.

harpocrates added inline comments.Aug 22 2018, 10:11 AM
compiler/utils/Binary.hs
1226

I was suggesting we fail early when the list is not ascending (or at least emit a warning). I'm not sure which is worse: silently taking longer to deserialize or panicking.

  • mkMaps and other Haddock survivors. Some of the nasty bits from haddock moved into GHC with this patch (and the prior one). We should look for an alternative to store and extract documentation. Simon suggest putting documentation into IfaceDecl directly. Rendering would then have to traverse the IfaceDecls only.

Extending IfaceDecl with docstrings has been considered before but was ruled out at the time as it wouldn't have been compatible with haddock's ignore-exports option. Now that we're removing that option anyway, we could try this avenue again. I'm not too keen on trying it in this patch though, as I feel it is sufficiently large and complex already.

It also isn't clear whether extending IfaceDecl would really get rid the mkMaps code or simply move it somewhere else where we want to access docstrings by Name.

Something to not forget: once we bump the Haddock submodule again, we'll need to add the {-# OPTIONS_HADDOCK print-explicit-runtime-reps #-} option (introduced here) to all the modules where we really want RuntimeRep to show up in the generated docs. Off the top of my head, I'd want that to include:

  • all of the ghc-prim modules
  • GHC.Exts and maybe even Data.Function (so we'll have a levity polymorphic signature of ($) somewhere)

It is somewhat unfortunate that there is no place to document the levity polymorphic signatures of error and undefined (since those only show up in the Prelude docs, and we surely do not want mentions of RuntimeRep there).

sjakobi updated the Trac tickets for this revision.Sat, Oct 20, 6:29 AM
sjakobi updated this revision to Diff 18406.Mon, Oct 22, 12:08 PM
  • Port over some fixes to 'ExtractDocs'
  • Add Haddock 'print-explicit-runtime-reps' in libs
sjakobi updated this revision to Diff 18409.Mon, Oct 22, 5:23 PM
  • Use extractDocs' iff we have -haddock
  • mkDocStructure: Clarify handling of explicit export list
  • mkDocStructureFromExportList: Misc cleanup
sjakobi marked 5 inline comments as done.Mon, Oct 22, 5:28 PM
sjakobi updated this revision to Diff 18411.Mon, Oct 22, 7:24 PM
  • Remove redundant import
sjakobi updated this revision to Diff 18448.Thu, Oct 25, 2:22 PM
  • Don't split identifiers off docstrings for serialization
sjakobi marked an inline comment as done.Thu, Oct 25, 2:23 PM
sjakobi added inline comments.Fri, Oct 26, 12:52 AM
compiler/deSugar/ExtractDocs.hs
237–238

I think this was fixed during a rebase or so?!

347–349

Dito.

harpocrates added inline comments.Fri, Oct 26, 12:54 AM
compiler/deSugar/ExtractDocs.hs
237–238

Yep, right after the rebase (in Port over some fixes to 'ExtractDocs'). :)

347–349

Dito.

sjakobi marked 6 inline comments as done.Fri, Oct 26, 12:57 AM
sjakobi updated this revision to Diff 18464.Fri, Oct 26, 10:27 AM
  • Binary instances for ordered containers: Simply use fromList
sjakobi added inline comments.Fri, Oct 26, 10:33 AM
compiler/utils/Binary.hs
1226

Since these questions are kind of orthogonal to the purpose of this patch, let's just use fromList for now:

According to containers docs this shouldn't be much slower:

> If the keys of the list are ordered, linear-time implementation is
> used, with the performance equal to fromDistinctAscList.

Once https://github.com/haskell/containers/issues/405 is resolved,
there may be more point in trying for faster deserialization of
containers with deterministically ordered keys.

sjakobi marked 5 inline comments as done.Fri, Oct 26, 10:34 AM
sjakobi marked an inline comment as done.Fri, Oct 26, 10:43 AM
sjakobi updated this revision to Diff 18466.Fri, Oct 26, 10:51 AM
  • getNamedChunks: Add docs
sjakobi marked an inline comment as done.Fri, Oct 26, 10:52 AM
sjakobi updated this revision to Diff 18467.Fri, Oct 26, 11:16 AM
  • extractDocs': Rely on tcg_rn_decls to be Just
  • A small haddock improvement
sjakobi updated this revision to Diff 18468.Fri, Oct 26, 11:38 AM
  • appendHsDoc: Clarify
sjakobi marked an inline comment as done.Fri, Oct 26, 11:39 AM
sjakobi added inline comments.Fri, Oct 26, 11:48 AM
compiler/parser/HaddockLex.x
56

Are you asking for a change in this patch here, Alex?

sjakobi updated this revision to Diff 18469.Fri, Oct 26, 12:02 PM
  • HaddockLex: Simplify a type
harpocrates added inline comments.Fri, Oct 26, 11:54 PM
compiler/parser/HaddockLex.x
80

This is more involved that just

-    let pflags = ParserFlags EnumSet.empty EnumSet.empty (stringToUnitId "") 0
+    let pflags = ParserFlags EnumSet.empty (EnumSet.fromList [MagicHash]) (stringToUnitId "") 0

because the 0 is a bitset that also takes into account language extensions. In order to work around this is a sane fashion, I've opened D5269. With that, we'll want:

-    let pflags = ParserFlags EnumSet.empty EnumSet.empty (stringToUnitId "") 0
+    let pflags = mkParserFlags'
+                   EnumSet.empty (EnumSet.fromList [MagicHash])
+                   (stringToUnitId "")
+                   False False False False False
sjakobi updated this revision to Diff 18642.Fri, Nov 9, 7:03 PM
  • Bump haddock
  • Speed up utf8SplitAtByteString
sjakobi updated this revision to Diff 18643.Fri, Nov 9, 9:35 PM
  • Fix serialization of language extensions
sjakobi updated this revision to Diff 18644.Fri, Nov 9, 10:52 PM
  • Break some overlong lines
  • Strictly serialize the mi_docs constructor, lazily serialize the Docs inside
sjakobi added inline comments.Sat, Nov 10, 2:54 AM
compiler/parser/HaddockLex.x
6
sjakobi updated this revision to Diff 18646.Sat, Nov 10, 3:15 AM
  • HaddockLex: Add TODO
  • rnHsDocIdentifier: Check an invariant
sjakobi added a subscriber: Tritlo.Sat, Nov 10, 4:00 AM
sjakobi added inline comments.
compiler/typecheck/TcHoleErrors.hs
478

@Tritlo What do you think about this? mb_docs == Nothing implies that -haddock is turned off.

If we want a warning for this case, do we also want them for the modules addressed by lookupInIface below? Potentially this could get very verbose…

Tritlo added inline comments.Sat, Nov 10, 6:11 AM
compiler/typecheck/TcHoleErrors.hs
478

I think we should avoid being too verbose if we can, but I think a warning should be in order. Maybe the following procedure: if the user asked for documentation, and one or more of the displayed suggestions does not have documentation (due to haddock being off), then add a message to the end to the effect of "Some documentation could not be found. Make sure the modules were compiled with '-haddock'". Would that be preferable?