Compile modules that are needed by template haskell, even with -fno-code.
ClosedPublic

Authored by duog on Apr 10 2017, 5:49 PM.

Details

Summary

This patch relates to Trac Trac #8025

The goal here is to enable typechecking of packages that contain some
template haskell. Prior to this patch, compilation of a package with -fno-code
would fail if any functions in the package were called from within a splice.

downsweep is changed to do an additional pass over the modules, targetting
any ModSummaries transitively depended on by a module that has
LangExt.TemplateHaskell enabled. Those targeted modules have hscTarget changed
from HscNothing to the default target of the platform.

There is a small change to the prevailing_target logic to enable this.

A simple test is added.

I have benchmarked with and without a patched haddock (available:https://github.com/duog/haddock/tree/wip-no-explicit-th-compilation).
Running cabal haddock on the wreq package results in a 25% speedup on my machine:

time output from patched cabal haddock:

real 0m5.780s
user 0m5.304s
sys 0m0.496s
time output from unpatched cabal haddock:

real 0m7.712s
user 0m6.888s
sys 0m0.736s

Diff Detail

Repository
rGHC Glasgow Haskell Compiler
Lint
Automatic diff as part of commit; lint not applicable.
Unit
Automatic diff as part of commit; unit tests not applicable.
duog created this revision.Apr 10 2017, 5:49 PM
duog updated this revision to Diff 12057.Apr 10 2017, 5:54 PM
  • submodule fix
duog updated this revision to Diff 12058.Apr 10 2017, 6:16 PM
  • Removing wreq benchmark
duog edited the summary of this revision. (Show Details)Apr 10 2017, 6:19 PM
bgamari edited edge metadata.
bgamari added a subscriber: DanielG.

This looks pretty reasonable to me.

The question about what HscTarget we want to use for the Template Haskell modules is still outstanding. I suppose given that Template Haskell modules are probably relatively rare, we should probably just use the conservative option, compiling to object code as you have done here. That being said, as I've expressed elsewhere I think HscInterpreted really does feel more appropriate here. You might think that you could just make the target conditional on whether or not the module uses UnboxedTuples, but 1) this seems a tad complicated, and 2) it won't handle the case where an unboxed tuple is introduced via inlining (which is likely rare, but not impossible).

duog added a comment.Apr 10 2017, 7:45 PM

Are there cases where one would want to use asm or llvm instead of the platform default? I think a flag specifying the template-haskell-in-nocode-mode-target, defaulting to interpreted, would work, but is it worth adding?

Also, after this change -fno-code is not quite true anymore, more like -falmost-no-code. Is that a concern?

In D3441#98338, @duog wrote:

Are there cases where one would want to use asm or llvm instead of the platform default? I think a flag specifying the template-haskell-in-nocode-mode-target, defaulting to interpreted, would work, but is it worth adding?

I don't know; I really would like this to "just work", even if this does mean that things are a tad slower. Making the user work around (relatively arbitrary, from the user's perspective) limitations of GHC's interpreter makes for a rather poor user experience. If nothing else we should at very least compile the TH modules with -O0. I wonder how much interpreted and the NCG would differ in that case.

Also, after this change -fno-code is not quite true anymore, more like -falmost-no-code. Is that a concern?

I agree that no-code is a misnomer which is a bit unfortunate. If it weren't for the fact that we will now unexpectedly produce object files I would say this is fine; however, the fact that we are modifying the user's filesystem means that this is not necessarily an innocent lie. I can think of a three options,

  1. Figure out how to write object files to a temporary location so the user's tree isn't touched
  2. Call this, e.g., -fth-only-code instead of -fno-code.
  3. Document our way out of the situation

None of these are terribly enticing, however.

@ezyang, do you have an opinion on the above?

I would be inclined to go with HscAsm. Is bytecode that much faster?

ezyang edited edge metadata.Apr 11 2017, 2:47 PM

I lean towards interpreter (because I want -fno-code to run as quickly as possible), but what seems important to me is the ability to toggle between the two, similarly to how -fobject-code can be used to force GHCi to asm rather than interpret.

I don't know, hopefully asm isn't too bad with optimization disabled. However, I still am a bit concerned about the compiler touching object files despite the user's rather explicit instructions not to. Maybe I am making too much of this issue though.

I don't know, hopefully asm isn't too bad with optimization disabled. However, I still am a bit concerned about the compiler touching object files despite the user's rather explicit instructions not to. Maybe I am making too much of this issue though.

Oh yes, that certainly would not be very nice. Can we use the -odir and -hidir flag internals to this end?

Oh yes, that certainly would not be very nice. Can we use the -odir and -hidir flag internals to this end?

Yes, I think that would be the right solution.

duog added a comment.Apr 11 2017, 5:16 PM

I would be inclined to go with HscAsm. Is bytecode that much faster?

I will do some tests to determine this.

Oh yes, that certainly would not be very nice. Can we use the -odir and -hidir flag internals to this end?

Yes, I think that would be the right solution.

What about the case where the .hi or .o files already exist? If are not out of date, we should surely use them where they are. If they are out of date then I guess it is best to not touch them.

Writing them to temporary files would mean that the modules will be codegened on every ghc -fno-code invocation.

-fwrite-interface seems to be exclusively for writing interface files in -fno-code. It makes some sense to also write the object files if that flag is given, though again it's name becomes worse. Optimisation should not be disabled if we did that.

So, to explain the history behind -fwrite-interface, it was added (ab105f83dcd5f9094a9edb0f0c8266fba6f3c808) because I noticed that -fno-code wasn't writing out interface files, and this meant that recompilation didn't work in "flycheck" mode because you couldn't avoid retypechecking a file that hadn't changed. I added a new flag rather than change the behavior of -fno-code because it was (a) more convenient from an implementation perspective, and (b) preserved BC (for arguably bad behavior.)

The contract that -fno-code -fwrite-interface currently establishes is that we will write hi files to the appropriate places, but we will NOT write o files. This is reflected in how we do recompilation avoidance; in normal compilation the object file is the final source of truth w.r.t. whether or not we need to recompile, whereas with -fno-code -fwrite-interface the interface file is the source of truth. (2223e196b2dc5340d70e58be011c279d381b4319). If you change these semantics, you will need to make sure that recomp avoidance correctly switches to checking object files in this case. It seems to me that it would be easier if we continued not to write object files.

duog added a comment.Apr 11 2017, 9:35 PM

I've done some benchmarks of using HscInterpreted vs HscAsm(no optimisation), and I'm seeing HscAsm being slightly faster. Of course this is on a DEVEL flavour build, so it's probably meaningless. I don't have the horsepower to build an optimized stage 2.

Here is a gist of all my benchmark code
https://gist.github.com/duog/d1e03c73183a8ce23bbe1ab0cf549f6e

duog updated this revision to Diff 12088.EditedApr 11 2017, 9:43 PM

I've pushed some improvements. They are:

  • Disable optimisations
  • Use temporary files for .o and maybe .hi
  • rebase

I'm still using defaultObjectTarget (targetPlatform dflags) to determine the HscTarget.

.o files are now unconditionally temporary, .hi files are temporary or not depending on -fwrite-interface.

I've done some benchmarks of using HscInterpreted vs HscAsm(no optimisation),

Interesting.

and I'm seeing HscAsm being slightly faster. Of course this is on a DEVEL flavour build, so it's probably meaningless. I don't have the horsepower to build an optimized stage 2.

Indeed this does call the result into question. However, I suppose for the time being using the default object target is fine. We can always fine-tune things if we do find there is a significant bit of performance left on the table.

.o files are now unconditionally temporary, .hi files are temporary or not depending on -fwrite-interface.

This sounds like the right solution. Thanks!

However, can we document this? In general it seems that we are missing a Note documenting -fno-code and -fwrite-interface, their purposes, interaction, and implementation. Do you suppose you could provide one, @duog? See Note [Recompilation checking when typechecking only] in GhcMake.hs for an example (and this likely would be a good place to put the new Note as well).

duog added a comment.Apr 26 2017, 4:00 PM

I've done some benchmarks of using HscInterpreted vs HscAsm(no optimisation),

Interesting.

and I'm seeing HscAsm being slightly faster. Of course this is on a DEVEL flavour build, so it's probably meaningless. I don't have the horsepower to build an optimized stage 2.

Indeed this does call the result into question. However, I suppose for the time being using the default object target is fine. We can always fine-tune things if we do find there is a significant bit of performance left on the table.

.o files are now unconditionally temporary, .hi files are temporary or not depending on -fwrite-interface.

This sounds like the right solution. Thanks!

However, can we document this? In general it seems that we are missing a Note documenting -fno-code and -fwrite-interface, their purposes, interaction, and implementation. Do you suppose you could provide one, @duog? See Note [Recompilation checking when typechecking only] in GhcMake.hs for an example (and this likely would be a good place to put the new Note as well).

The current version doesn't actually work, and the added test T8025 fails. There is an error trying to load the .dyn_o file. I tried quite a few things without success, then got busy elsewhere. Apologies for having left this unfinished. I'll have another go, but will likely need some help working out what's going on.

Once it is working I would be happy to add the note.

In D3441#99854, @duog wrote:

I've done some benchmarks of using HscInterpreted vs HscAsm(no optimisation),

Interesting.

and I'm seeing HscAsm being slightly faster. Of course this is on a DEVEL flavour build, so it's probably meaningless. I don't have the horsepower to build an optimized stage 2.

Indeed this does call the result into question. However, I suppose for the time being using the default object target is fine. We can always fine-tune things if we do find there is a significant bit of performance left on the table.

.o files are now unconditionally temporary, .hi files are temporary or not depending on -fwrite-interface.

This sounds like the right solution. Thanks!

However, can we document this? In general it seems that we are missing a Note documenting -fno-code and -fwrite-interface, their purposes, interaction, and implementation. Do you suppose you could provide one, @duog? See Note [Recompilation checking when typechecking only] in GhcMake.hs for an example (and this likely would be a good place to put the new Note as well).

The current version doesn't actually work, and the added test T8025 fails. There is an error trying to load the .dyn_o file. I tried quite a few things without success, then got busy elsewhere. Apologies for having left this unfinished. I'll have another go, but will likely need some help working out what's going on.

Once it is working I would be happy to add the note.

No worries! Let me know if you'd like to chat. IRC and email are both fine ways to reach me.

duog updated this revision to Diff 12340.May 1 2017, 8:17 PM

Rebased
Now correctly using temporary .hi and .o files
The test T8025 is failing only due to the "Compiling" message including the temporary file name.

duog added a comment.May 1 2017, 11:31 PM

Remaining issues are:

  • When compiling a module that has codegen enabled for template haskell ghc says:
[1 of 2] Compiling A                ( A.hs, /tmp/ghcblah/blah.o )

This should say

[1 of 2] Compiling A                ( A.hs, nothing )

I can't see a good way of achieving this, short of putting something in DynFlags or ModLocation which seems way overkill.
If the current behavior is acceptable, then the test needs to be changed in style, because the temporary file name ends up in the output.
Please advise.

  • some documentation:

However, can we document this? In general it seems that we are missing a Note documenting -fno-code and -fwrite-interface, their purposes, interaction, and implementation. Do you suppose you could provide one, @duog? See Note [Recompilation checking when typechecking only] in GhcMake.hs ? > for an example (and this likely would be a good place to put the new Note as well).

I can't see a good way of achieving this, short of putting something in DynFlags or ModLocation which seems way overkill.
If the current behavior is acceptable, then the test needs to be changed in style, because the temporary file name ends up in the output.
Please advise.

Sorry for the delay; I missed the notification of your message.

I agree that there are few good options here. On the whole the complexity associated with modifying ModLocation hardly seems worth it. Let's just keep it as is.

As far as the test is concerned, we should be able to just ignore stderr and determine success via the exit code, no?

duog updated this revision to Diff 12478.May 9 2017, 11:52 PM
  • Rebase
  • Add a note
  • Update some comments
  • Remove T8025.stderr, test should pass noW
duog updated this revision to Diff 12480.May 10 2017, 2:32 AM

Fix a typo

duog updated this revision to Diff 12481.May 10 2017, 4:17 AM

fix test

duog added a comment.May 10 2017, 7:22 AM

Hmm, a couple of problems.

This is failing because multimod_compile does not respect opts.ignore_stderr. Is there a good reason for this?

Also, this test fails on my profiled build, complaining that it can't find <temp_file>.p_o
see output of
make slowtest TEST=T8025
here:P149

I don't understand how the fact that my ghc is profiled is leaking into the test. Is it because my base is profiled?

In D3441#101251, @duog wrote:

Hmm, a couple of problems.

This is failing because multimod_compile does not respect opts.ignore_stderr. Is there a good reason for this?

I doubt it. However, I would probably just add -v0 to the command line and call it a day.

Also, this test fails on my profiled build, complaining that it can't find <temp_file>.p_o
see output of
make slowtest TEST=T8025
here: P149

I don't understand how the fact that my ghc is profiled is leaking into the test. Is it because my base is profiled?

Hmm, that is indeed curious. I'll try to reproduce.

testsuite/tests/th/should_compile/T8025/all.T
3

Adding a -v0 here should fix your test woes.

duog updated this revision to Diff 12489.May 10 2017, 2:18 PM

Fix test

duog marked an inline comment as done.May 10 2017, 4:23 PM
In D3441#101251, @duog wrote:

Hmm, a couple of problems.

This is failing because multimod_compile does not respect opts.ignore_stderr. Is there a good reason for this?

I doubt it. However, I would probably just add -v0 to the command line and call it a day.

Wow, that's probably why no-one has needed to ignore_stderr on multimod_compile before.

Also, this test fails on my profiled build, complaining that it can't find <temp_file>.p_o
see output of
make slowtest TEST=T8025
here: P149

I don't understand how the fact that my ghc is profiled is leaking into the test. Is it because my base is profiled?

Hmm, that is indeed curious. I'll try to reproduce.

Is make slowtest being run regularly?
I'm a bit concerned that this may be broken in non-normal ways. I have a hard-won profiling build in my worktree now that I don't want to lose, but I'll investigate at some point.

In D3441#101344, @duog wrote:

I doubt it. However, I would probably just add -v0 to the command line and call it a day.

Wow, that's probably why no-one has needed to ignore_stderr on multimod_compile before.

😄

Hmm, that is indeed curious. I'll try to reproduce.

Is make slowtest being run regularly?
I'm a bit concerned that this may be broken in non-normal ways. I have a hard-won profiling build in my worktree now that I don't want to lose, but I'll investigate at some point.

Sadly it's not run terribly often and, sadly, it is generally expected that results will be... less than perfect.

duog updated this revision to Diff 12614.May 16 2017, 8:57 PM

Rebase. I think this is good to go on my end once it validates.

bgamari accepted this revision.May 17 2017, 10:21 AM

Thanks @duog, great work!

This revision is now accepted and ready to land.May 17 2017, 10:21 AM
Closed by commit rGHC53c78be0aab7: Compile modules that are needed by template haskell, even with -fno-code. (authored by Douglas Wilson <douglas.wilson@gmail.com>, committed by bgamari). · Explain WhyMay 20 2017, 3:29 PM
This revision was automatically updated to reflect the committed changes.