Extend the Quasi Monad
Needs Review | Public

Authored by angerman on May 23 2017, 9:45 PM.

Details

Summary

This adds File IO and Process IO commands to the Quasi monad. This
makes Template Haskell code more declarative and allows TH to read and
write files, and run processes, on the build system even when the
interpreter runs on a different host (e.g. when cross compiling).

My apologies for not being familiar with the previous work on this front, but I don't see why this is necessary. Why does adding a bunch of hardcoded IO operations to Quasi resolve this issue, but qRunIO alone does not? Where in the code are cross compilers making the distinction between files on the build machine and files on the host machine when running, e.g., qFindExecutables?

You might have seen:

The short version is: in the cross compilation setting, we use the External Interpreter, which runs on a different machine than the one ghc is running on:

    build machine            host machine
.-----------------------.   .------------.
| GHC <-> iserv-proxy <-+---+-> GHCSlave |
'-----------------------' ^ '------------'
       ^                  |
       |                  '-- communication via sockets
       '--------------------- communication via pipes

For TH in this case, we send the libraries over to the host machine, wrap the TH call into a ResolvedBCO, and
evaluate it on the host.

This poses issues when using File or Process IO, as we end up in qRunIO on the host. Packages which demonstrate this rather
clearly are file-embed and gitrev. file-embed is meant to read a file and embed its contents. However, the file is on the file system
of the build machine, not the host. Similarly, gitrev provides utilities to embed the git hash at compile time. It looks up the
git command and runs it to obtain the git hash, but both the git executable and the repository live on the build machine.
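
To make the failure mode concrete, here is a minimal sketch of a file-embed-style splice written with runIO. The names embedViaRunIO and runSpliceLocally are hypothetical and are not file-embed's actual API. The point is only that the readFile executes in whichever process evaluates the splice, which under the external interpreter in a cross-compilation setup is the host rather than the build machine.

```haskell
import Language.Haskell.TH (Exp, Q, runIO, runQ, stringE)

-- | Embed a file's contents as a string literal (hypothetical helper).
-- The 'readFile' is lifted with 'runIO', so it touches the file system
-- of whatever machine evaluates the splice.
embedViaRunIO :: FilePath -> Q Exp
embedViaRunIO path = do
  contents <- runIO (readFile path)
  stringE contents

-- | Evaluate a splice in the current process (hypothetical helper).
-- With the IO instance of Quasi, 'runIO' simply runs the action here.
runSpliceLocally :: Q Exp -> IO Exp
runSpliceLocally = runQ
```

Used as $(embedViaRunIO "Main.hs") with a native compiler this finds the file; with the interpreter on another machine, the same splice would look for Main.hs there.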

The following is a rather contrived example that embeds parts of file-embed and gitrev, with a slightly altered API, to use
the new qXXX functions instead of qRunIO. It also happens to use TH.findExecutables.

# Pastebin iuaW4a2s
pi@raspberrypi:~ $ LD_LIBRARY_PATH=$HOME ./THMain
Hello World
Build directory contains:
> .
> ..
> .git
> Lib.hi
> Lib.hs
> Lib.o
> Main.hs

This File
> {-# LANGUAGE TemplateHaskell #-}
> module Main where
>
> import Lib
>
> main :: IO ()
> main = do
>   putStrLn $ "Hello World"
>   putStrLn "Build directory contains: "
>   putStrLn $ quoteLines $(currentDirectoryContents)
>   putStrLn "This File"
>   putStrLn $ quoteLines $(embedFile "Main.hs")
>   putStrLn "The Lib"
>   putStrLn $ quoteLines $(embedFile "Lib.hs")
>   putStrLn $ "fibQ 2 = "++ show $(fibQ 2)
>   putStrLn $ "GitRoot is " ++ show $(gitRoot)
>   putStrLn $ "Git rev is " ++ show $(gitHash)
>
>   where
>     quoteLines = unlines . map ("> "++) . lines

The Lib
> {-# LANGUAGE TemplateHaskell #-}
> {-# LANGUAGE MultiWayIf      #-}
> {-# LANGUAGE LambdaCase      #-}
>
> module Lib where
>
> import System.Exit
> import Control.Exception
> import Control.Monad
>
> import System.FilePath
>
> import Language.Haskell.TH
> import Language.Haskell.TH.Syntax as TH
>
> fibs :: [Int]
> fibs = 0 : 1 : zipWith (+) fibs (tail fibs)
>
> fibQ :: Int -> Q Exp
> fibQ n = [| fibs !! n |]
>
> embedFile :: FilePath -> Q Exp
> embedFile f = LitE . StringL <$> TH.readFile f
>
> -- | from the gitrev package
> getGitRoot :: Q FilePath
> getGitRoot = do
>   pwd <- TH.getCurrentDirectory
>   (code, out, _) <-
>     TH.readProcessWithExitCode "git" ["rev-parse", "--show-toplevel"] ""
>   case code of
>     ExitSuccess   -> return $ takeWhile (/= '\n') out
>     ExitFailure _ -> return pwd -- later steps will fail, that's fine
>
> gitRoot :: Q Exp
> gitRoot = LitE . StringL <$> getGitRoot
>
> -- | Run git with the given arguments and no stdin, returning the
> -- stdout output. If git isn't available or something goes wrong,
> -- return the second argument.
> runGit :: [String] -> String -> IndexUsed -> Q String
> runGit args def useIdx = do
>   let oops :: SomeException -> Q (ExitCode, String, String)
>       oops _e = return (ExitFailure 1, def, "")
>   gits <- TH.findExecutables "git"
>   case gits of
>     gitFound:_ -> do
>       -- a lot of bookkeeping to record the right dependencies
>       pwd <- getDotGit
>       let hd         = pwd </> ".git" </> "HEAD"
>           index      = pwd </> ".git" </> "index"
>           packedRefs = pwd </> ".git" </> "packed-refs"
>       hdExists  <- TH.doesFileExist hd
>       when hdExists $ do
>         -- the HEAD file either contains the hash of a detached head
>         -- or a pointer to the file that contains the hash of the head
>         splitAt 5 `fmap` (TH.readFile hd) >>= \case
>           -- pointer to ref
>           ("ref: ", relRef) -> do
>             let ref = pwd </> ".git" </> relRef
>             refExists <- TH.doesFileExist ref
>             when refExists $ addDependentFile ref
>           -- detached head
>           _hash -> addDependentFile hd
>       -- add the index if it exists to set the dirty flag
>       indexExists <- TH.doesFileExist index
>       when (indexExists && useIdx == IdxUsed) $ addDependentFile index
>       -- if the refs have been packed, the info we're looking for
>       -- might be in that file rather than the one-file-per-ref case
>       -- handled above
>       packedExists <- TH.doesFileExist packedRefs
>       when packedExists $ addDependentFile packedRefs
>       do
>         (code, out, _err) <- TH.readProcessWithExitCode "git" args "" -- `catch` oops
>         case code of
>           ExitSuccess   -> return (takeWhile (/= '\n') out)
>           ExitFailure _ -> return def
>     _ -> return def
>
> -- | Determine where our @.git@ directory is, in case we're in a
> -- submodule.
> getDotGit :: Q FilePath
> getDotGit = do
>   pwd <- getGitRoot
>   let dotGit = pwd </> ".git"
>       oops = return dotGit -- it's gonna fail, that's fine
>   isDir <- TH.doesDirectoryExist dotGit
>   isFile <- TH.doesFileExist dotGit
>   if | isDir -> return dotGit
>      | not isFile -> oops
>      | isFile ->
>          splitAt 8 `fmap` TH.readFile dotGit >>= \case
>            ("gitdir: ", relDir) -> do
>              isRelDir <- TH.doesDirectoryExist relDir
>              if isRelDir
>                then return relDir
>                else oops
>            _ -> oops
>
> -- | Type to flag if the git index is used or not in a call to runGit
> data IndexUsed = IdxUsed -- ^ The git index is used
>                | IdxNotUsed -- ^ The git index is /not/ used
>     deriving (Eq)
>
> -- | Return the hash of the current git commit, or @UNKNOWN@ if not in
> -- a git repository
> gitHash :: ExpQ
> gitHash =
>   stringE =<< runGit ["rev-parse", "HEAD"] "UNKNOWN" IdxNotUsed
>
> currentDirectoryContents :: Q Exp
> currentDirectoryContents = LitE . StringL . unlines <$> (TH.getDirectoryContents =<< TH.getCurrentDirectory)

fibQ 2 = 1
GitRoot is "/Users/angerman/Projects/zw3rk/Sample"
Git rev is "00fe8976e1dcb7886131ab1e4f7d4747889c9be9"

Thank you for the detailed explanation. But I'm no clearer than before on my second question: why are the file/process operations you just added not subject to the same issues that qRunIO has? After all, they're also IO operations. Where specifically in the GHC codebase does a cross compiler make the decision to, e.g., look up executables on the build machine instead of the host machine when running qFindExecutables?

Depending on what kind of IO we do, we might be fine (if the IO doesn't touch processes or files, I do not (yet) see any issue with that kind of IO). When running ghc with -fexternal-interpreter, the qXXX functions are evaluated in GHCiQ (see libraries/ghci/GHCi/TH.hs), which, while running on the host, has the capability to query the ghc instance on the build machine.

Thus we have two-way communication, and splitting qRunIO into separate file and process calls allows us to ask the ghc process on the build machine to provide the values for them.

Say ghc wants to compile $(qFindExecutables "git"). This is transmitted as a ResolvedBCO to the GHCSlave running on the host, which evaluates the splice in GHCiQ.
GHCiQ in turn evaluates qFindExecutables by sending a FindExecutables message back to the ghc process on the build machine. GHC then evaluates
Dir.findExecutables and sends the result back to the GHCSlave on the host.

A somewhat high-level description of the communication is as follows:

ghc -> slave: send library X
ghc -> slave: link + load library X
ghc -> slave: send ResolvedBCO for splice
ghc -> slave: run BCO
ghc <- slave: ask for findExecutables
ghc -> slave: findExecutables result
ghc <- slave: send back splice result

Thus the additional qXXX functions allow the slave to decide how to handle each call. qRunIO is true IO on the host, while the others
ask the ghc process to perform the IO on the build machine on behalf of the slave.
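
The round trip described above can be sketched as a small pure simulation. Everything here is an assumption made for illustration: the Request and Reply types, the BuildMachine record, and the use of show/read as a wire format are hypothetical and are not the message types GHC actually uses.

```haskell
-- | Requests the interpreter may send back to GHC (hypothetical subset).
data Request
  = FindExecutables String
  | ReadFileReq FilePath
  deriving (Show, Read, Eq)

-- | Replies sent by GHC on the build machine.
data Reply
  = Paths [FilePath]
  | Contents String
  | Failed String
  deriving (Show, Read, Eq)

-- | Stand-in for the build machine: executables on PATH and files on disk.
data BuildMachine = BuildMachine
  { executables :: [(String, FilePath)]
  , files       :: [(FilePath, String)]
  }

-- | GHC side: answer a request from the build machine's point of view.
answer :: BuildMachine -> Request -> Reply
answer bm (FindExecutables name) =
  Paths [path | (n, path) <- executables bm, n == name]
answer bm (ReadFileReq path) =
  maybe (Failed ("no such file: " ++ path)) Contents (lookup path (files bm))

-- | The connection as the interpreter sees it: serialised request in,
-- serialised reply out (show/read stands in for the real encoding).
type Channel = String -> String

channelTo :: BuildMachine -> Channel
channelTo bm = show . answer bm . read

-- | Interpreter side: the operation is a message, not local IO.
remoteFindExecutables :: Channel -> String -> [FilePath]
remoteFindExecutables chan name =
  case read (chan (show (FindExecutables name))) of
    Paths paths -> paths
    _           -> []

remoteReadFile :: Channel -> FilePath -> Either String String
remoteReadFile chan path =
  case read (chan (show (ReadFileReq path))) of
    Contents text -> Right text
    Failed reason -> Left reason
    other         -> Left ("unexpected reply: " ++ show other)
```

Because each operation is a first-order value, the interpreter can forward it; an opaque IO action passed to qRunIO offers nothing to forward.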

Depending on what kind of IO we do, we might be fine (if the IO doesn't touch processes or files, I do not (yet) see any issue with that kind of IO). When running ghc with -fexternal-interpreter, the qXXX functions are evaluated in GHCiQ (see libraries/ghci/GHCi/TH.hs), which, while running on the host, has the capability to query the ghc instance on the build machine.

Hm, OK. I'm still a bit unclear where in libraries/ghci/GHCi/TH.hs this decision to query the build machine instead of the host one takes place, but I'll trust your word on this matter.

More importantly, I find the prospect of cramming a bunch of (fairly ad hoc) file/process IO operations into Quasi to be very unsettling. I know that Quasi is already a grab bag of assorted things, but this increases the API surface area by an extraordinary amount. Moreover, there doesn't appear to be any end in sight: what happens when a user needs even more operations from the directory/process library in Template Haskell? They'd either need to add even more Quasi class methods, or they'd need to completely reimplement their desired functions from scratch, but using Q operations instead of IO ones. Neither approach is very satisfying.

Instead, why not have functionality to toggle which machine to search for files on?

qWithMachine :: BuildOrHost -> Q a -> Q a

(API subject to bikeshedding.) That way, you can continue to use qRunIO as before for anything that queries files or processes, and you won't need to have a Quasi counterpart to every basic file/process op under the sun.

This sadly would not work, as we'd still have a far too generic IO action. With a cross compiler we do not have access to the same libraries on the build machine that we have on the host. qRunIO can run an arbitrary IO action and by extension call
any arbitrary function. This in turn requires those functions to be available on the build machine, which they are not necessarily, unless we start to build each and every library for both the build machine (but the cross compiler can't do this)
and the host. The ultimate plan is to eventually make ghc multi-target aware; once we have that (though this is *far* out of the scope of this diff), this could become feasible.

The key here is that we provide specific functions instead of a generic one. Only by doing this can we provide the special handling.

If we wanted to support arbitrary file or process IO through qRunIO, we'd need to hook into the rts (e.g. the approach taken in D3502).

Regarding the increase in API surface, this is a valid concern, and the goal should be to implement only the minimal necessary set from which anything else can be built. This is the option where you'd need to build everything else
from Q operations. And yes, while this is not very satisfying, it is a tradeoff I'm willing to make and advocate for, as it allows for proper cross compilation support. With something like Backpack, this could even become less painful for
downstream consumers.
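
As a rough illustration of the "minimal set plus combinators" idea, the sketch below derives two helpers from just two primitives. The class MonadBuildFS, its methods, and the FakeFS test monad are hypothetical names invented for this example; in the diff, the role of the primitives would be played by the new qXXX methods.

```haskell
import Data.Maybe (fromMaybe, isJust)

-- | A deliberately small set of primitives (hypothetical class).
class Monad m => MonadBuildFS m where
  doesFileExistB :: FilePath -> m Bool
  readFileB      :: FilePath -> m String

-- | Derived: read a file only if it exists.
readFileMaybe :: MonadBuildFS m => FilePath -> m (Maybe String)
readFileMaybe path = do
  exists <- doesFileExistB path
  if exists then Just <$> readFileB path else pure Nothing

-- | Derived: the first candidate path that exists, if any.
firstExisting :: MonadBuildFS m => [FilePath] -> m (Maybe FilePath)
firstExisting [] = pure Nothing
firstExisting (path : rest) = do
  exists <- doesFileExistB path
  if exists then pure (Just path) else firstExisting rest

-- | A pure in-memory file system, used only to exercise the combinators.
newtype FakeFS a = FakeFS { runFakeFS :: [(FilePath, String)] -> a }

instance Functor FakeFS where
  fmap f (FakeFS g) = FakeFS (f . g)

instance Applicative FakeFS where
  pure x = FakeFS (const x)
  FakeFS f <*> FakeFS g = FakeFS (\fs -> f fs (g fs))

instance Monad FakeFS where
  FakeFS g >>= k = FakeFS (\fs -> runFakeFS (k (g fs)) fs)

instance MonadBuildFS FakeFS where
  doesFileExistB path = FakeFS (\fs -> isJust (lookup path fs))
  readFileB path =
    FakeFS (\fs -> fromMaybe (error ("no such file: " ++ path)) (lookup path fs))
```

The cost mentioned in the thread is visible here: helpers that already exist for IO have to be rewritten against the primitives.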

This sadly would not work, as we'd still have a far too generic IO action. With a cross compiler we do not have access to the same libraries on the build machine that we have on the host. qRunIO can run an arbitrary IO action and by extension call
any arbitrary function. This in turn requires those functions to be available on the build machine, which they are not necessarily, unless we start to build each and every library for both the build machine (but the cross compiler can't do this)
and the host. The ultimate plan is to eventually make ghc multi-target aware; once we have that (though this is *far* out of the scope of this diff), this could become feasible.

Alas, I was afraid it wouldn't be that simple.

If we wanted to support arbitrary file or process IO through qRunIO, we'd need to hook into the rts (e.g. the approach taken in D3502).

Ah, I hadn't seen D3502. Well, so much for that idea :)

Regarding the increase in API surface, this is a valid concern, and the goal should be to implement only the minimal necessary set from which anything else can be built. This is the option where you'd need to build everything else
from Q operations. And yes, while this is not very satisfying, it is a tradeoff I'm willing to make and advocate for, as it allows for proper cross compilation support. With something like Backpack, this could even become less painful for
downstream consumers.

Well, either path towards short-term cross-compiler support for TH is going to involve some unsightly hacks, and I suppose this is at least a manageable hack. All I can do is grumble from the sidelines until we have proper multi-target awareness in GHC ;)

Just to be clear here, I'm not a big fan of blowing up Quasi either. This, however, seems to be the least bad option. I've come to find a certain benefit in being explicit in the qAction though, as it provides more information to ghc about the actual intent.

It would also allow a custom Quasi instance to be restricted to read-only file IO. And while we could provide default implementations, doing so at the definition site of the class would add additional dependencies to the template-haskell package, which I
believe we'd rather not get into.
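
One way to picture the read-only restriction: once each operation is named explicitly, a policy can inspect it before anything runs. This is a sketch under assumptions; FileOp, Policy, and authorise are hypothetical names and do not appear in the diff.

```haskell
-- | Explicit file operations (hypothetical; mirrors the idea of
-- separate qXXX methods rather than one opaque IO action).
data FileOp
  = ReadFileOp FilePath
  | DoesFileExistOp FilePath
  | WriteFileOp FilePath String
  | RemoveFileOp FilePath
  deriving (Show, Eq)

-- | Does the operation leave the file system unchanged?
isReadOnly :: FileOp -> Bool
isReadOnly (ReadFileOp _)      = True
isReadOnly (DoesFileExistOp _) = True
isReadOnly (WriteFileOp _ _)   = False
isReadOnly (RemoveFileOp _)    = False

-- | What a particular (custom) instance is willing to execute.
data Policy = AllowAll | ReadOnly
  deriving (Show, Eq)

-- | Accept or reject an operation before it is performed.
authorise :: Policy -> FileOp -> Either String FileOp
authorise AllowAll op = Right op
authorise ReadOnly op
  | isReadOnly op = Right op
  | otherwise     = Left ("rejected by read-only policy: " ++ show op)
```

No such check is possible for an action passed to qRunIO, since its effects cannot be inspected in advance.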

angerman updated this revision to Diff 12708. May 27 2017, 2:08 AM
  • add time accessors
bgamari requested changes to this revision. May 29 2017, 11:20 PM

I agree that this is much better than the hooked base option we were looking at earlier. However, some documentation is in order.

libraries/template-haskell/Language/Haskell/TH/Syntax.hs
570

I know we haven't done a great job of documenting this module, but can we have a nice section with a Haddock comment explaining what these are and why they exist?

This revision now requires changes to proceed. May 29 2017, 11:20 PM
angerman updated this revision to Diff 12771. Jun 6 2017, 10:18 AM
  • Add AppendFile
  • Adds removeFile
bgamari requested changes to this revision. Edited Jun 8 2017, 1:50 PM

There are still things to be done here.

libraries/template-haskell/Language/Haskell/TH/Syntax.hs
570

I stand by this request :)

This revision now requires changes to proceed. Jun 8 2017, 1:50 PM

There are still things to be done here.

I know, I know. I'm just adding stuff as needed...

I'll try to clean this up once I'm back in sG by the end of the month :-/

angerman updated this revision to Diff 13029. Jul 4 2017, 10:48 PM
  • rebase onto master
bgamari requested changes to this revision. Jul 7 2017, 10:02 AM

Bump out of review queue while this is finished up.

This revision now requires changes to proceed. Jul 7 2017, 10:02 AM
angerman updated this revision to Diff 13083. Jul 9 2017, 8:46 PM
  • rebase

What is the status of this, @angerman? It seems to fail validation.

angerman updated this revision to Diff 13124. Jul 11 2017, 8:08 PM
  • rebase & relax time to 1.5
angerman updated this revision to Diff 13133. Jul 12 2017, 2:58 AM
  • rebase onto fixed master
angerman updated this revision to Diff 13146. Jul 12 2017, 9:26 PM
  • proper rebase
bgamari requested changes to this revision. Aug 18 2017, 7:38 AM

It looks like the output of the test needs to be updated.

This revision now requires changes to proceed. Aug 18 2017, 7:38 AM
angerman updated this revision to Diff 13771. Sep 7 2017, 12:55 AM
  • rebase; fix TH_Roles2
angerman updated this revision to Diff 13801. Sep 9 2017, 2:40 AM
  • rebase
angerman added a subscriber: luite. Sep 9 2017, 6:20 AM
simonmar requested changes to this revision. Sep 11 2017, 2:48 AM
simonmar added inline comments.
libraries/template-haskell/Language/Haskell/TH/Syntax.hs
110–137

I'd like to suggest an alternative approach that should be a bit more modular and extensible, and require fewer changes overall.

Let's define a datatype for the IO operations we want to perform on the build machine:

data BuildIO r where
  BuildIOReadFile :: FilePath -> BuildIO String
  BuildIOWriteFile :: FilePath -> String -> BuildIO ()
  ...

Now define a way to execute these in IO:

performBuildIO :: BuildIO r -> IO r
performBuildIO (BuildIOReadFile f) = readFile f
performBuildIO (BuildIOWriteFile f s) = writeFile f s
...

and in the Q monad we only need one new method:

class Quasi m where
  qBuildIO :: BuildIO r -> m r
  ...

and in the remote GHCi code we can serialise/deserialize BuildIO, so we just need one additional message. We can reuse performBuildIO to actually execute the IO on the build machine.

Does that sound reasonable?
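
A minimal runnable sketch of this proposal, under stated assumptions: only the two constructors shown above are included, and the Wire type, SomeBuildIO wrapper, serve, and remoteBuildIO are hypothetical helpers, with show/read standing in for whatever serialisation the remote GHCi code would really use.

```haskell
{-# LANGUAGE GADTs #-}

-- | IO operations to perform on the build machine (two of them only).
data BuildIO r where
  BuildIOReadFile  :: FilePath -> BuildIO String
  BuildIOWriteFile :: FilePath -> String -> BuildIO ()

-- | Execute an operation in the current process.
performBuildIO :: BuildIO r -> IO r
performBuildIO (BuildIOReadFile f) = do
  s <- readFile f
  length s `seq` pure s          -- force the lazy read so the handle closes
performBuildIO (BuildIOWriteFile f s) = writeFile f s

-- | Untyped wire form of an operation (hypothetical).
data Wire = WReadFile FilePath | WWriteFile FilePath String
  deriving (Show, Read, Eq)

encode :: BuildIO r -> Wire
encode (BuildIOReadFile f)    = WReadFile f
encode (BuildIOWriteFile f s) = WWriteFile f s

-- | Decoding forgets the result type, so it is wrapped existentially.
data SomeBuildIO where
  SomeBuildIO :: Show r => BuildIO r -> SomeBuildIO

decode :: Wire -> SomeBuildIO
decode (WReadFile f)    = SomeBuildIO (BuildIOReadFile f)
decode (WWriteFile f s) = SomeBuildIO (BuildIOWriteFile f s)

-- | Build-machine side: one handler covers every operation.
serve :: String -> IO String
serve msg = case decode (read msg) of
  SomeBuildIO op -> show <$> performBuildIO op

-- | Interpreter side: send the operation, then parse the reply at the
-- result type that the constructor fixes.
remoteBuildIO :: (String -> IO String) -> BuildIO r -> IO r
remoteBuildIO send op = do
  reply <- send (show (encode op))
  case op of
    BuildIOReadFile _    -> pure (read reply)
    BuildIOWriteFile _ _ -> pure (read reply)
```

In this sketch, adding an operation means one new constructor plus one line each in performBuildIO, encode, decode, and remoteBuildIO, with no new class method and no new message.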

This revision now requires changes to proceed. Sep 11 2017, 2:48 AM
angerman added inline comments. Sep 11 2017, 7:21 AM
libraries/template-haskell/Language/Haskell/TH/Syntax.hs
110–137

Thanks for taking the time to review this. That does seem like a sensible idea indeed. I'll have to play with it for a bit. Will hopefully get around to it this week.

bgamari added inline comments. Sep 11 2017, 9:21 AM
libraries/template-haskell/Language/Haskell/TH/Syntax.hs
110–137

In principle we could even drop the ability to use liftIO and require that the user be explicit by introducing two lifting primitives,

class Quasi m where
  qHostIO :: BuildIO r -> m r
  qTargetIO :: BuildIO r -> m r

I suspect the breakage would be far more severe than we are willing to stomach, but on the bright side it would force TH users to consider how their code should behave in cross-compiled environments.

Regardless of what we do, we should take care to note the build/target distinction in the Haddocks for runIO and the MonadIO Q instance. Perhaps a reference-able Haddock section in Language.Haskell.TH.Syntax is in order.

angerman updated this revision to Diff 14426. Oct 19 2017, 9:48 PM
  • Rebase. Prior to adapting.
angerman updated this revision to Diff 14427. Oct 19 2017, 9:51 PM
  • Rebase. Again.
bgamari requested changes to this revision. Nov 5 2017, 9:33 AM

Requesting changes pending rework.

This revision now requires changes to proceed. Nov 5 2017, 9:33 AM
austin resigned from this revision. Nov 9 2017, 11:38 AM

Alright, let's do this. I'll rework this as suggested.

How is this going, @angerman?

angerman updated this revision to Diff 15465. Thu, Feb 15, 1:58 AM
  • rebase onto master