Introduce module hierarchy
Needs RevisionPublic

Authored by hsyl20 on Wed, Jun 14, 8:20 PM.

Details

Summary

Before this patch GHC modules were (almost) all directly in the root
namespace and had short cryptic names (e.g., UniqDFM, RdrHsSyn).

  1. the structure of the codebase was hard to understand and could be

confusing/intimidating for newcomers

  1. using GHC API could easily lead to module name clashes (e.g., Config,

Parser and Name module names were taken by GHC)

  1. GHC API documentation was hard to navigate

This patch renames modules only! Some of them should be split and edited
to better reflect the compiler structure but this is left for future
work.

Note, however, that it's not a quick-and-dirty query-replace. I have
tried to fix all references to GHC modules (in comments, notes, etc.).
Please use module names (e.g., GHC.IR.Haskell.Parser.Syntax) instead of
file names (e.g., parser/RdrHsSyn.hs) from now on to make future renaming
easier. As new module names are longer, I have had to fix some style
issues too (import list alignment, 80-char limit, etc.).

Here is the outline of the new hierarchy (in GHC. namespace):

  • IR: intermediate representations. Each one contains its syntax and stuff manipulating it
    • Haskell
      • Syntax, Parser, Lexer, Printer, Analyser, TypeChecker, Renamer, Deriver
    • Core
      • Syntax, Analyser
      • Transformer.{Simplifier,Specialiser,Vectoriser,WorkerWrapper,FloatIn,FloatOut...}
    • Stg
      • Syntax, Analyser
      • Transformer.{CommonSubExpr,CostCentreCollector,Unariser}
    • Cmm
      • Syntax, Parser, Lexer, Printer, Analyser
      • Transformer.{CommonBlockElim,ConstantFolder,Dataflow,ShortCutter,Sinker}
    • ByteCode.{Assembler,Linker...}
    • Interface.{Loader,Renamer,TypeChecker, Transformer.Tidier}
    • Llvm.{Syntax, Printer}
  • Compiler: converters between representations
    • HaskellToCore, CoreToStg, StgToCmm, CmmToAsm
    • CmmToLlvm, CmmToC, CoreToByteCode, CoreToInterface
    • TemplateToHaskell
  • Entity: entities shared by different phases of the compiler (Class, Id, Name, Unique, etc.)
  • Builtin: builtin stuff
    • Primitive.{Types,Operations}: primitives
    • Names, Types, Uniques: other wired-in stuff
  • Program: GHC-the-program (command-line parser, etc.) and its modes
    • Driver.{Phases,Pipeline}
    • Backpack
    • Make, MakeDepend
  • Interactive: interactive stuff (debugger, closure inspector, interpreter, etc.)
  • Data: data structures (Bag, Tree, etc.)
  • Config: GHC configuration
    • HostPlatform: host platform constants
    • Build: constants generated at build time
    • Flags: dynamic configuration (DynFlags)
  • Packages: package management stuff
  • RTS: interaction with the runtime system (closure and table representation)
  • Utils: utility code (e.g., Outputable, SysTools, Elf, Finder, etc.)
  • Plugin: modules to import to write compiler plugins

The actual renaming is stored on the wiki to keep a cross-reference with the old
module names: https://ghc.haskell.org/trac/ghc/wiki/CodeBaseCleanup/ModuleRenaming

Important: to make GIT merge/diff detect all renamed files, you may need
to increase the renameLimit:

git config --global diff.renameLimit 600

and to decrease the "find-renames" threshold. E.g.,

git merge this_patch --find-renames=10%

git diff origin/master --summary --find-renames=10%

The latter command should only report a single deleted file. All the others
are considered to be renamed. This is important to track file history!

Test Plan

validate

Diff Detail

Repository
rGHC Glasgow Haskell Compiler
Branch
T13009
Lint
Lint WarningsExcuse: core-spec.pdf is in the repo for convenience
SeverityLocationCodeMessage
Warningdocs/core-spec/core-spec.pdf:ExtJsonLint
Unit
No Unit Test Coverage
Build Status
Buildable 16249
Build 28983: [GHC] Linux/amd64: Patch building
Build 28982: [GHC] OSX/amd64: Continuous Integration
Build 28981: [GHC] Windows/amd64: Continuous Integration
Build 28980: arc lint + arc unit
hsyl20 created this revision.Wed, Jun 14, 8:20 PM
ezyang added a subscriber: ezyang.Wed, Jun 14, 9:51 PM

I'm a grumpy curmudgeon and don't want to have to learn new locations for modules, but don't let me stop you :) One thing that would be kinder to all GHC API users (who are now going to have to rewrite all of their imports) is to create an old-ghc package which reexports all of the modules under their old names. Might be helpful.

hsyl20 updated this revision to Diff 12838.Wed, Jun 14, 10:49 PM
  • Minor fixes

@ezyang Good idea!

Gosh, tha's a BIG patch. Thank you for investing so much effort.

However I wish we'd discussed the specific proposed new hierarchy before you invested all that time. For example, I'd like to remove the IR/ level, and just have Cmm, Core etc. It doesn't pay its way in my view.

More generally, I think we should check with ghc-devs that we really want to do this. (Only I responded on the ticket at the time, and I was neutral.) There's a significant cost to re-learning where everything is.

  1. Do we want to do this? I think we should, being careful to provide a smooth migration path for existing users. Let's discuss how exactly to do this before any changes take place.
  1. If What specifically should the new module hierarchy look like? The structure here is not the one I would choose (I agree with @simonpj that IR should go away for example). I imagine we'll need to go a few rounds on this to converge on something that everyone is happy with.

@hsyl20, this is quite an impressive effort. I agree with @simonpj that it probably would have been wise to discuss the concrete renaming a bit before committing all of this work to it, but oh well.

I'm not opposed to this, assuming that it doesn't make rebasing significantly harder. I think @ezyang's concern about the cost to developers isn't too serious; tools like hasktags are capable of answering these sorts of queries in a fraction of a second. I am, however, a bit concerned about the cost to plugin maintainers. While we technically only guarantee the stability of the GHC module, reshuffling the entire module hierarchy is another thing entirely. How much work is this going to create for the maintainers of such projects (especially since they likely need to retain compatibility with older compilers as well)? Can we at least provide some sed or ghc-exactprint scripts to help here?

Just so we understand what the costs to you are to change this patch, have you automated this refactoring at all or was this done entirely by hand? How much does it take to rebase it? On that note, will git be able to robustly cherry pick across such a massive renaming? Have you tried, for instance, taking an open Differential from Phabricator, applying it to your un-renamed tree, then rebasing it on to your renamed branch?

Also, I should mention that I'll be off hiking the White Mountains with @philderbeast today so I'm afraid I'll not be able to respond to discussion until tomorrow morning.

And finally, thanks for your effort, @hsyl20!

hsyl20 added a comment.EditedThu, Jun 15, 8:47 AM

I didn't want to discuss too much the details of the new hierarchy beforehand because I wasn't sure I would have the patience to finish this patch anyway ;-).

Considering the proposed hierarchy, I have been loosely following the old proposal on the wiki (https://ghc.haskell.org/trac/ghc/wiki/ModuleDependencies/Hierarchical) but adapted it.
I have iterated quite a few times and indeed I started without the IR level but it felt wrong for GHC.Haskell (isn't GHC all about Haskell?) and GHC.Core ("core" is misleading and I wouldn't think it is an IR if I wouldn't already know it is). Moreover I didn't want to pollute too much the top-level.
In general I have tried to make module names meaningful and unambiguous, even to someone that is first exposed to them. For instance, just by looking at the outline (e.g., http://hsyl20.fr/ghc_doc/), even if we don't know what FloatOut is, we know it's a transformation on Core IR.

My main concern was that main (and former) GHC devs may consider GHC module names to be immutable: so I would like to thank you (@simonpj, @simonmar, @ezyang, @bgamari) already for not being opposed to this kind of invasive change. I totally understand how replacing Desugarer with HaskellToCore or NCG (that isn't new anymore) with CmmToAsm could feel, but in my opinion it reduces a lot of the cognitive overhead for newcomers and future maintainers/developers that don't have to learn about GHC's history to understand the names in the code. The trade-off is that it puts cognitive overhead on you for some time...

Considering the transition period and how to make it smoother:

  • I have tried to apply a patch on my branch (arc patch --nobranch D3646) and it works as expected: git correctly finds renamed files.
  • I think we should provide an old-ghc package as proposed by @ezyang for ghc-api users that don't want to switch to the new API right now.
  • Plugin writers shouldn't be affected much if they included GhcPlugins (which reexports most of the other stuff): they just have to switch to GHC.Plugin instead (or add '-package "ghc (GHC.Plugin as GhcPlugins)"' command-line parameter).

Considering changing the renaming:

  • of course, the current renaming is only a proposal so we can debate it (in the Trac #13009 ticket).
  • replacing "import Whatever" with "import GHC.XXX.Whatever" wasn't the most difficult thing:
    • I have used sed for that. Except for name clashes (e.g. "import Parser" in the testsuite), mistakes (GHC.Types already exists in ghc-prim) and qualified imports (replacing "import X" with "import GHC.XXX as X"), it worked well.
    • we can provide such sed scripts to users. I have never used ghc-exactprint but it seems to be overkill for this.
    • the boring part was to make modified imports look nice manually (aligning "(", 80-char limit, etc.), but this is not mandatory and another independent tool could do it (does it exist?)
  • the worst part was to fix references to modules in comments. But now that I have replaced those that I have found (most file paths, etc.) with qualified module names we can relatively easily sed them too!
    • we still have to fix style manually (80-char limit)
    • this shouldn't be a problem for codes outside of GHC
  • Summary: yes we can relatively easily change the proposed renaming (using sed), even more easily if we don't care about style (80-char limit, alignment). We can provide some sed scripts to help users convert their imports.

As we use the textual IR for LLVM that should be explicit in the. Naming as well. IR.LLVM.Textual. I have the Data.Bitcode packages which I hope to eventually bring into GHC (it's just not high priority right now) and they would live in IR.LLVM.Bitcode with the proposed outline.

bgamari requested changes to this revision.Wed, Jun 21, 2:56 PM

As noted on the ticket, we are going to run the propose module structure through the proposal process.

This revision now requires changes to proceed.Wed, Jun 21, 2:56 PM