Remove Cabal dependency from GHC
AbandonedPublic

Authored by hvr on Aug 22 2014, 10:44 AM.

Details

Reviewers
ezyang
austin
duncan
Apply Patch
arc patch D172
Trac Issues
#8244
Summary

(Uploaded on behalf of Duncan)

We can serialise directly, without having to convert some fields to
string first.

(Part of preparitory work for removing the compiler's dep on Cabal)

Drop support for single-file style package databases

Historically the package db format was a single text file in Read/Show
format containing [InstalledPackageInfo]. For several years now the
default format has been a directory with one file per package, plus a
binary cache.

The old format cannot be supported under the new scheme where the
compiler will not depend on the Cabal library (because it will not
have access to the InstalledPackageInfo type) so we must drop support.
It would still technically be possible to support a single text file
style db (but containing a different type), but there does not seem to
be any compelling reason to do so.

(Part of preparitory work for removing the compiler's dep on Cabal)

Improve the ghc-pkg warnings for missing and out of date package cache files

In particular, report when it's missing, and also report it for ghc-pkg check.
Also make the warning message more explicit, that ghc will not be able to
read these dbs, even though ghc-pkg may be able to.

Introduce new file format for the package database binary cache

The purpose of the new format is to make it possible for the compiler
to not depend on the Cabal library. The new cache file format contains
more or less the same information duplicated in two different sections
using different representations.

One section is basically the same as what the package db contains now,
a list of packages using the types defined in the Cabal library. This
section is read back by ghc-pkg, and used for things like ghc-pkg dump
which have to produce output using the Cabal InstalledPackageInfo text
representation.

The other section is a ghc-local type which contains a subset of the
information from the Cabal InstalledPackageInfo -- just the bits that
the compiler cares about.

The trick is that the compiler can read this second section without
needing to know the representation (or types) of the first part. The
ghc-pkg tool knows about both representations and writes both.

This patch introduces the new cache file format but does not yet use it
properly. More patches to follow. (As of this patch, the compiler reads
the part intended for ghc-pkg so it still depends on Cabal and the
ghc-local package type is not yet fully defined.)

Use ghc-local types for packages, rather than Cabal types

Also start using the new package db file format properly, by using the
ghc-specific section.

This is the main patch in the series for removing the compiler's dep
on the Cabal lib.

Move Cabal Binary instances from bin-package-db to ghc-pkg itself

The ghc-pkg program of course still depends on Cabal, it's just the
bin-package-db library (shared between ghc and ghc-pkg) that does not.

Drop ghc library dep on Cabal

Make binary a boot package

Since ghc-pkg needs a relatively recent version.

Fix warnings arising from the package db refactoring

Fix more warnings, remove now-dead code

Also remove more of the old file style ghc-pkg dbs that I missed previously.

Test Plan

validate

Diff Detail

Repository
rGHC Glasgow Haskell Compiler
Branch
wip/remove-cabal-dep
Lint
Lint WarningsExcuse: don't trust auto-fixes
SeverityLocationCodeMessage
Warningcompiler/main/Packages.lhs:1366TXT3Line Too Long
Auto-Fixcompiler/main/PackageConfig.hs:110TXT6Trailing Whitespace
Unit
No Unit Test Coverage
Build Status
Buildable 626
Build 960: GHC Patch Validation (amd64/Linux)
Build 628: GHC Patch Validation (amd64/Linux)
ezyang created this revision.Aug 22 2014, 10:44 AM
ezyang added reviewers: austin, hvr.
ezyang updated the Trac tickets for this revision.
ezyang updated the summary for this revision. (Show Details)Aug 22 2014, 10:46 AM

Some minor comments, but looks good! Ship it!

compiler/main/Finder.lhs
618–625

Hm, a type signature on this function would be handy.

compiler/main/PackageConfig.hs
49

Maybe a remark about why it's OK to represent these data types as strings, as opposed to ModuleName and PackageKey (namely, because the equality comparisons we do for them are not on the hotpath)

66

Do we seriously not have a bytestring to faststring conversion function?

compiler/main/Packages.lhs
385

Isn't this TODO done?

562

I think it might be better to convert the string to a FastString than vice versa.

ghc.mk
386

Oh, do we really assume binary isn't recent enough?

libraries/bin-package-db/GHC/PackageDb.hs
155

It's worth remarking that the instance here is not defined in this package!

229

I added a PR to put this in binary proper, maybe mention that.

utils/ghc-pkg/Main.hs
598–599

Wait, but the conditional doesn't look and see if the command is check/modify?

697–698

Can we refactor out this condition?

ezyang retitled this revision from "Simplify conversion in binary serialisation of ghc-pkg db" to "Remove Cabal dependency from GHC".Aug 24 2014, 5:51 PM
ezyang updated this revision to Diff 438.Aug 24 2014, 5:52 PM
  • Drop support for single-file style package databases
  • Improve the ghc-pkg warnings for missing and out of date package cache files
  • Introduce new file format for the package database binary cache
  • Use ghc-local types for packages, rather than Cabal types
  • Move Cabal Binary instances from bin-package-db to ghc-pkg itself
  • Drop ghc library dep on Cabal
  • Make binary a boot package
  • Fix warnings arising from the package db refactoring
  • Fix long lines and trailing whitespace
  • Remove a TODO that is now done
  • Add a ghc -show-packages mode to display ghc's view of the package env
  • Make mkFastStringByteString pure and fix up uses
  • Switch the package id types to use FastString (rather than String)
  • Fix string conversions in ghc-pkg to be correct w.r.t. Unicode
  • Address a number of Edward's code review comments

LGTM.

compiler/utils/FastString.lhs
383–388

Why did this get inlinePerformIO'ified?

Question thats related: is Duncan's WIP new binary major version out yet? (or is that unrelated to the use of Binary in ghc?)

duncan added a subscriber: duncan.
duncan removed a subscriber: duncan.
In D172#12, @carter wrote:

Question thats related: is Duncan's WIP new binary major version out yet? (or is that unrelated to the use of Binary in ghc?)

Unrelated.

compiler/utils/FastString.lhs
383–388

as opposed to unsafePerformIO? Because that's what FastString uses for all of these (though I suspect it's unwise). For example see the appendFS which is using inlinePerformIO.

If you mean, why do we need it to be pure? Because we need to use it in a pure context elsewhere. It's being used in a pure context already (appendFS), and it is morally pure so it's the right way round.

As an independent thing, I think all the uses of inlinePerformIO throughout the FastString module should be audited. I would avoid allocating inside an inlinePerformIO.

ezyang added inline comments.Aug 27 2014, 8:38 AM
compiler/utils/FastString.lhs
383–388

OK.

duncan commandeered this revision.Aug 27 2014, 10:44 AM
duncan edited reviewers, added: ezyang; removed: duncan.

Now regained my logon credentials :-)

duncan updated this revision to Diff 448.Aug 27 2014, 10:51 AM

Hopefully add a few extra patches...

No idea if I'm using arc correctly. We'll see...
Add patches for the testsuite, haddock and Cabal updates
The ./validate now passes, except for a couple unrelated test failures.

duncan updated this revision to Diff 449.Aug 27 2014, 10:56 AM

Ok, that didn't work, lets try again...

ezyang accepted this revision.Aug 27 2014, 11:07 AM

LGTM.

austin requested changes to this revision.Aug 27 2014, 12:31 PM

Generally LGTM, although please kill the dead code/trailing comments.

Also, should the behavioral change of ghc-pkg now requiring ghc-pkg init (e.g. in the testsuite) be documented? I am not sure if anyone other than us relied on it, but perhaps noting it if worthy (in the release notes).

I think the change should at least also include a release note modification saying the compiler no longer depends on the Cabal library.

compiler/main/PackageConfig.hs
110

Kill this trailing whitespace.

(NB, regarding the linting excuse: 'auto fixes' are only suggested in the case the lint rule actively suggests a replacement for the given line. Not every linting rule does this: for example, the 'trailing whitespace' suggestion can autofix - the suggested line is trivial - but 80-column violations can not be reliably autofixed by the generic linter. In general it should be quite safe to let Arcanist fix things like this if you miss them.)

compiler/main/Packages.lhs
1366

Fixing this would be nice, JFYI. :)

compiler/utils/Binary.hs
684

Is there any reason for this change? If it improves things or fixes a leak (somehow) it should be noted, at least (aside: unfortunately we can't use <$!> or anything here yet as an alternative.)

utils/ghc-pkg/Main.hs
60

Kill this line.

This revision now requires changes to proceed.Aug 27 2014, 12:31 PM

I can't comment on the details, but I'm very supportive of the direction of travel.

Would it be possible to describe the new architecture -- the pieces and how they fit together -- on a GHC Wiki page? That would be super-helpful.

Simon

In D172#33, @simonpj wrote:

I can't comment on the details, but I'm very supportive of the direction of travel.

Cheers.

Would it be possible to describe the new architecture -- the pieces and how they fit together -- on a GHC Wiki page? That would be super-helpful.

https://ghc.haskell.org/trac/ghc/wiki/CabalDependency

compiler/utils/Binary.hs
684

It's really not a change. Previously it was in IO and so ran strictly. Now that it's pure, the equivalent is to evaluate it strictly here.

I didn't know if it was important or not that it be evaluated strictly here, so I kept it strict rather than change it to be lazy.

duncan added a comment.EditedAug 27 2014, 11:58 PM
In D172#30, @austin wrote:

Generally LGTM,

Thanks for the review.

although please kill the dead code/trailing comments.

Done.

Also, should the behavioral change of ghc-pkg now requiring ghc-pkg init (e.g. in the testsuite) be documented? I am not sure if anyone other than us relied on it, but perhaps noting it if worthy (in the release notes).

Documented in the release notes.

I think the change should at least also include a release note modification saying the compiler no longer depends on the Cabal library.

Done.

All the changes are in the wip/remove-cabal-dep branch.

So as far as I know, we're good to go. I did have a couple test failures, but I'm fairly sure they're nothing to do with these changes and were pre-existing.

hvr commandeered this revision.Sep 23 2014, 12:02 PM
hvr edited reviewers, added: duncan; removed: hvr.

This has been committed. We used the wip/ branch rather than this diff since it was a whole series of patches that we didn't want to squash into one.

I can't abandon the revision however apparently as I'm not the owner (though I am the author -- clearly something I don't understand)

hvr abandoned this revision.Sep 23 2014, 12:03 PM

This was landed manually, see commits 6d8c70c1262a0f8b02ee685905f469f94d742af2 and earlier