StaticValues language extension
Abandoned
Public

Authored by facundominguez on Aug 4 2014, 1:02 PM.

Details

Reviewers
simonmar
simonpj
austin
Trac Issues
#7015
Summary

Hello,

We are submitting an implementation of a compiler extension for static values following the outline of [1]. This work was developed with the support of Tweag I/O.

The extension introduces a new syntactic form (static e), where e :: a can be any closed expression. The static form produces a value of type Ref a, which works as a reference that programs can "dereference" to get the value of e back. References are like Ptrs, except that they are stable across invocations of a program.

While the body of the static form may not have a direct serializable representation, references of type Ref uniquely identify it and can be serialized and meaningfully transmitted across a network.

The implementation of a Ref contains information that is useful to locate the referenced value in symbol tables of libraries and object files. While static e is a baked in syntactic form, resolving (dereferencing) references to values is implemented entirely in userland, as a library function (i.e. what would correspond to unstatic as formulated in [1]). We provide a basic implementation of lookups in the module GHC.Ref of the base package.

In essence the extension makes sure that the argument of the static form does appear in linker symbol tables, and it fills in the information carried by references (package name, installation identifier, module name, symbol name). For more details we refer to the users guide section contained in the patch.
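As a rough illustration of why such a reference is easy to transmit, here is a minimal sketch that models the information listed above as a plain record and round-trips it through a string. The record, its field names, and the use of derived Show/Read as a wire format are all assumptions made for this sketch; they are not taken from the GHC.Ref module in the patch.

```haskell
-- Hypothetical model of the data a reference carries. The field names are
-- illustrative only and are not the ones used by GHC.Ref.
data GlobalName = GlobalName
  { gnPackage   :: String  -- package name
  , gnInstallId :: String  -- installation identifier
  , gnModule    :: String  -- module name
  , gnSymbol    :: String  -- symbol name
  } deriving (Eq, Ord, Show, Read)

-- A reference is a name tagged (via a phantom parameter) with the type of
-- the value it denotes. The value itself is never part of the reference.
newtype Ref a = Ref GlobalName
  deriving (Eq, Show)

-- Serialize a reference for transmission. Derived Show stands in for a real
-- wire format here.
encodeRef :: Ref a -> String
encodeRef (Ref name) = show name

-- Parse a reference received from a peer. Returns Nothing on malformed
-- input. Note that nothing checks the phantom type: the receiver simply
-- trusts the type at which it decodes.
decodeRef :: String -> Maybe (Ref a)
decodeRef s = case reads s of
  [(name, rest)] | all (== ' ') rest -> Just (Ref name)
  _                                  -> Nothing
```

Only the name crosses the wire, so the receiving process must still resolve it against its own symbol tables, which is the part the patch implements as a library function.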

The extension is a contribution to the Cloud Haskell ecosystem (distributed-process and related), and thus has the potential to foster Haskell as a programming language for distributed systems.

The immediate improvement brought by the extension is the elimination of remote tables from Cloud Haskell applications. Such applications contain table fragments spread throughout multiple modules and packages. Eliminating these tables spares the programmer from constructing and assembling the global remote table from fragments, a verbose and error-prone process even with the help of Template Haskell, and one that also pollutes the export lists of all modules.

As a nice convenience, the extension implemented here makes it possible to write shorter code by allowing any closed expression as argument of the static form, not just single identifiers the way Cloud Haskell currently requires [3].

In the future, the extension could open the door for implementing a form of sending code between the members of a distributed application in the form of libraries or object files.

A notable limitation of the extension is that it cannot create references to values with qualified types. Thus, static show is an illegal term because show has a constraint Show a in its type. For the time being, this limitation can be sidestepped as explained in the contributed documentation to the user’s guide.

Comments and suggestions will be much appreciated.

Bikeshedding Addendum:

It has been pointed out that “static values” is probably not the best name for this extension. And we mostly agree. By default, we’re using the name from the original paper and would like to get this proposal to a strong technical basis first. But here are a few alternative names that have shown up:

  • -XStaticPtr and have Ref be named StaticPtr. A StaticPtr is a StablePtr that’s so stable it’s portable across different processes.
  • -XStaticNames (carter)
  • -XPinnedPointers (hvr)

[1] Jeff Epstein, Andrew P. Black, and Simon Peyton-Jones. Towards Haskell in the cloud. SIGPLAN Not., 46(12):118–129, September 2011. ISSN 0362-1340.
[2] https://ghc.haskell.org/trac/ghc/ticket/7015
[3] http://hackage.haskell.org/package/distributed-process-0.4.2/docs/Control-Distributed-Process-Closure.html#v:mkStatic

Test Plan

validate

Diff Detail

Repository
rGHC Glasgow Haskell Compiler
Branch
static-values-phab
Lint
Lint Errors
Excuse: We have a long error message that we can split if needed. Lint errors are also pointing at pre-existing tabs.
Unit
No Unit Test Coverage
Build Status
Buildable 304
Build 305: GHC Patch Validation (amd64/Linux)
facundominguez retitled this revision to StaticValues language extension.
facundominguez updated this object.
facundominguez edited the test plan for this revision.
facundominguez added reviewers: austin, simonmar.
facundominguez set the repository for this revision to rGHC Glasgow Haskell Compiler.
facundominguez updated the Trac tickets for this revision.
facundominguez added subscribers: carter, mboes, relrod and 2 others.
austin awarded a token.Aug 4 2014, 1:05 PM
austin added a subscriber: simonpj.

I've only done a cursory glance so far (I'm doing over 9,000 other things at this moment), but excellent job - and the 'meat' of the diff is far smaller than I expected!

@simonpj You should definitely look over this; typechecker modifications are your territory.

hvr added a subscriber: hvr.Aug 4 2014, 1:32 PM
hvr added inline comments.
docs/users_guide/glasgow_exts.xml
9969

This seems to be a different URL than intended

libraries/base/GHC/Ref.hs
83–84

would a safe version of deRef be technically possible? (e.g. by including/encoding the type into GlobalName?)

mboes added inline comments.Aug 4 2014, 1:57 PM
libraries/base/GHC/Ref.hs
83–84

It would, and in fact we have an implementation of this here. Because in general providing a safe version of deRef depends on a lot of higher level concepts (such as the representation of a type), we believe that a safe version of deRef is best provided at a higher level. This used to be done entirely by distributed-static, but for a weaker notion of GlobalName, so we simply extended that library to handle GHC.Ref as well. That required refactoring distributed-static so that it is completely generic in the label type. In this way, we can reuse the same set of combinators to deal with type safe dereferencing and combination of references, essentially for free.

Side note: we have yet to update the branch mentioned above so that it doesn't duplicate code from the GHC.Ref module.

rwbarton added inline comments.
libraries/base/GHC/Ref.hs
119

The main configure script defines a variable LeadingUnderscore for this purpose. We should use it here; I see this is in base so it might take a bit of plumbing.

testsuite/tests/typecheck/should_fail/all.T
333

You want 7.9, not 7.7.

mietek added a subscriber: mietek.Aug 4 2014, 2:30 PM
mietek added inline comments.Aug 4 2014, 2:54 PM
compiler/rename/RnExpr.lhs
338

Should this say "referenced" instead of "referred"?

facundominguez added inline comments.Aug 4 2014, 3:01 PM
docs/users_guide/glasgow_exts.xml
9969
libraries/base/GHC/Ref.hs
83–84

As Mathieu says, without hard-wiring a representation of types it doesn't look very simple.

The user could achieve some safety by always creating references of type Ref Data.Dynamic.Dynamic and then refusing to deRef globals which are not of his crafting (say, they come from other modules).
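One possible reading of the Dynamic idea, as a minimal sketch: if every entry reachable through a reference is stored as a Data.Dynamic.Dynamic, the dereferencing side can check the type before handing the value back. The SymbolTable type, the string-keyed Ref, and deRefChecked are hypothetical names invented for this sketch; the real deRef in the patch looks symbols up in the linker's tables and performs no such check.

```haskell
import Data.Dynamic (Dynamic, fromDynamic, toDyn)
import Data.Typeable (Typeable)

-- Hypothetical stand-in for the table consulted when dereferencing: every
-- entry is a Dynamic, so it carries its own type representation.
type SymbolTable = [(String, Dynamic)]

-- A reference tagged with the type the holder believes the value has.
newtype Ref a = Ref String

-- Checked dereference: Nothing if the symbol is missing, or if the stored
-- value does not have the type the reference claims.
deRefChecked :: Typeable a => SymbolTable -> Ref a -> Maybe a
deRefChecked table (Ref name) = lookup name table >>= fromDynamic

-- Example entries, made up for this sketch.
exampleTable :: SymbolTable
exampleTable =
  [ ("Main.greeting", toDyn ("hello" :: String))
  , ("Main.answer",   toDyn (42 :: Int))
  ]
```

With this table, dereferencing "Main.answer" at type Int succeeds, while dereferencing the same name at type String, or any unknown name, yields Nothing instead of an ill-typed value.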

facundominguez updated this object.
facundominguez edited edge metadata.

Addressing some of the various comments so far.

In particular, using the LeadingUnderscore variable in GHC.Ref is not done yet.

Whoops, Build B294: Diff 265 (D119) has failed! Full logs available at F11468. The testsuite summary sez:

testsuite_summary.txt
Unexpected results from:
TEST="linker_unload haddock.base"

OVERALL SUMMARY for test run started at Mon Aug  4 21:50:59 2014 UTC
 0:06:37 spent to go through
    4068 total tests, which gave rise to
   13421 test cases, of which
    9694 were skipped

      26 had missing libraries
    3639 expected passes
      60 expected failures

       0 caused framework failures
       0 unexpected passes
       2 unexpected failures

Unexpected failures:
   perf/haddock  haddock.base [stat too good] (normal)
   rts           linker_unload [bad exit code] (normal)

hm, reviewers may stick to Diff 1 263. I'm trying to figure out how to get arc to update the revision without overriding the existing diffs.

ezyang added a comment.Aug 5 2014, 6:34 AM

Basically, you need to say arc diff REV_START, where REV_START is the latest revision which is not part of your patchset. So, for example, if your patchset has two commits, arc diff HEAD~~ would do the trick.

facundominguez edited edge metadata.

Changes:

  • replace "argument of static" with "body of static"
  • update ghc version in test guards
  • remove trailing empty lines
  • Fix link to CH paper in user's guide.

Whoops, Build B302: Diff 271 (D119) has failed! Full logs available at F11525.

The latest diff is now good to review. My apologies for the hiccup.

Fixed some lint warnings about line lengths.

Whoops, Build B305: Diff 273 (D119) has failed! Full logs available at F11533.

on a bike shedding note, before this gets merged in, could we please change all the notes/comments/wiki pages to NOT call it static values? I'm (slowly) working on something that would be about using compile time fixed (static) values, and as has been noted here, this is more like "module names/handles that are stable across multiple compilations"

tibbe added a subscriber: tibbe.Aug 25 2014, 9:34 AM
tibbe added inline comments.
libraries/base/GHC/Ref.hs
40

I don't think this is necessary. This is a GHC.* module after all. Even if it were not, #ifdefs in module export lists are evil, as they make all your callers reproduce your exact #ifdefs. Such #ifdefs were recently removed from the network package (and replaced by runtime errors) for that reason.

luite added a subscriber: luite.Sep 16 2014, 7:00 PM

I just read the patch and I wonder if CAFs for statics are kept alive correctly. Since symbols are loaded directly from the library by name, it looks like they may resurrect objects that have already been finalized. At first I thought mkExport was used for this, but I think it generates an exported binding and not a foreign export.

Also I'd like to be able to use this extension in GHCJS for client<->server RPC, but it looks like in the current form there are three problems with this:

  • GHCJS removes unreachable code from all modules (similar to split-objects linking). One would have to make sure that all required symbols are linked correctly. With conditional compilation (using CPP or the impl(ghcjs) flag) this could be rather tricky: A static being in the library is not enough, it would have to be reachable from main.
  • Currently, GHCJS uses z-encoding to encode the symbol names to JavaScript variable names. Unfortunately the names are pretty long and code size is at a premium for web (in particular for mobile), so I'd rather not store the whole name for every symbol. The new optimizer will likely include a link-time renaming step that shortens symbol names that are not used externally (You can already get a similar result on output from the current version using Google Closure Compiler with ADVANCED_OPTIMIZATIONS, but doing this in GHCJS is less fragile and possibly more effective).
  • For bindings that are not a single variable, an automatically numbered top-level name is generated. Some forgotten static inside an #ifdef ghcjs_HOST_OS can skew all the static names.

It looks like the CAF issue and all these GHCJS issues can be fixed by generating a foreign export for every static binding, which would return a pointer to the static value when called.

To make things a bit more robust, the name of the exported symbol could be based on the source code location instead of a simple counter. In addition, I think it would be good to provide some level of sanity checking in GHC.Ref.deRef, which could also be done in the foreign export wrapper (comparing a type fingerprint for example). The current version blindly coerces a pointer to a heap object, and I don't see a way to write a safer wrapper on top of this. There is just no information about the symbols available whatsoever, other than the info table.
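The fingerprint comparison could look roughly like the following sketch. It is not code from the patch and not a proposal for the actual wrapper: Exported, export and importChecked are hypothetical names, and the "wrapper" is modelled as an ordinary Haskell value rather than a foreign export.

```haskell
{-# LANGUAGE ScopedTypeVariables #-}
import Data.Proxy (Proxy (..))
import Data.Typeable (Typeable, typeRep, typeRepFingerprint)
import GHC.Exts (Any)
import GHC.Fingerprint (Fingerprint)
import Unsafe.Coerce (unsafeCoerce)

-- Hypothetical shape of what an export wrapper could hand out: the value
-- with its type erased, plus a fingerprint of the type it was built at.
data Exported = Exported Fingerprint Any

-- Build the exported form, recording the fingerprint of the static type.
export :: forall a. Typeable a => a -> Exported
export x =
  Exported (typeRepFingerprint (typeRep (Proxy :: Proxy a))) (unsafeCoerce x)

-- Coerce back only when the fingerprint of the requested type matches the
-- recorded one; otherwise refuse instead of returning a mistyped object.
importChecked :: forall a. Typeable a => Exported -> Maybe a
importChecked (Exported fp v)
  | fp == typeRepFingerprint (typeRep (Proxy :: Proxy a)) = Just (unsafeCoerce v)
  | otherwise                                             = Nothing
```

The unsafe coercion is still there, but it is guarded: a request at the wrong type returns Nothing rather than a pointer to an object of a different type.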

In D119#39, @luite wrote:
  • GHCJS removes unreachable code from all modules (similar to split-objects linking). One would have to make sure that all required symbols are linked correctly. With conditional compilation (using CPP or the impl(ghcjs) flag) this could be rather tricky: A static being in the library is not enough, it would have to be reachable from main.

Do you mean something like

#ifndef ghcjs_HOST_OS
f = ... static g ...   -- the only reference to g
#endif

g = ...

Here GHCJS will not be able to see that g is live code (assuming that f is live code on the server). And this is a plausible scenario: perhaps f can't be built as JS for some reason; or maybe we just don't want to ship f to the client.

In D119#41, @rwbarton wrote:

Do you mean something like

#ifndef ghcjs_HOST_OS
f = ... static g ...   -- the only reference to g
#endif

g = ...

Here GHCJS will not be able to see that g is live code (assuming that f is live code on the server). And this is a plausible scenario: perhaps f can't be built as JS for some reason; or maybe we just don't want to ship f to the client.

Right, that's one of the potential problems, but I think there is a worse problem:

#ifndef ghcjs_HOST_OS
f1 = ... static (g1 10)
#endif
f2 = ... static (g2 11) .... static (g3 12)
g1 :: Int -> String -> ...
g1 x = ...
g2 :: Int -> Char -> ...
g2 x = ...
g3 :: Int -> Int -> ...
g3 x = ...

Since none of the static expressions is a simple toplevel name, GHC has to insert the bindings. For GHC:

GlobalName "main" "Main" "static:0" -> (g1 10)
GlobalName "main" "Main" "static:1" -> (g2 11)
GlobalName "main" "Main" "static:2" -> (g3 12)

and GHCJS, which does not have the first static:

GlobalName "main" "Main" "static:0" -> (g2 11)
GlobalName "main" "Main" "static:1" -> (g3 12)

So all remaining statics are mapped to a different GlobalName with a different type. At least completely dead code would give you Nothing on deRef, but this would give you the wrong heap object.
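The skew can be reproduced with a small model, sketched below. StaticBody and both naming functions are illustrative inventions, not GHC data structures, and the line and column numbers are made up. The second function models the source-location naming suggested earlier in this comment.

```haskell
-- A static body as a compiler might see it: where it occurs and what it is.
-- Illustrative only; not a GHC data structure.
data StaticBody = StaticBody
  { bodyLine :: Int
  , bodyCol  :: Int
  , bodyText :: String
  } deriving (Eq, Show)

-- Counter-based naming: a name depends on how many static forms precede
-- this one in the module.
counterNames :: [StaticBody] -> [(String, String)]
counterNames bodies =
  [ ("static:" ++ show i, bodyText b) | (i, b) <- zip [0 :: Int ..] bodies ]

-- Location-based naming: a name depends only on the form's own position.
locationNames :: [StaticBody] -> [(String, String)]
locationNames bodies =
  [ ("static:" ++ show (bodyLine b) ++ ":" ++ show (bodyCol b), bodyText b)
  | b <- bodies ]

-- The server build sees all three static forms; the client build does not
-- see the first one because it sits behind the CPP guard.
serverBodies, clientBodies :: [StaticBody]
serverBodies =
  [ StaticBody 2 10 "g1 10"
  , StaticBody 4 10 "g2 11"
  , StaticBody 4 30 "g3 12"
  ]
clientBodies = drop 1 serverBodies
```

Under counterNames, "static:0" denotes g1 10 on the server but g2 11 on the client. Under locationNames, every name the client knows maps to the same body on the server, and the guarded form is simply absent.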

The current implementation does not appear to keep the heap references alive correctly:

{-# LANGUAGE StaticValues #-}
module Main where

import GHC.Ref

import Control.Concurrent
import System.Environment
import System.Mem
import System.Mem.Weak

import Data.Numbers.Primes -- from Numbers

-- Just a Ref to some CAF from another package so that we can deRef it when linked with -dynamic
primes_ref :: Ref [Integer]
primes_ref = static primes

main = do
  as <- getArgs
  let z = primes!!(length as+200)
  print z
  performGC

  -- addFinalizer z (putStrLn "finalizer z")
  print z
  performGC
  threadDelay 1000000
  print . fmap (!!300) =<< deRef primes_ref
  -- uncommenting the next line keeps primes alive and prevents the segfault
  -- print (primes!!300)

$ ghc -dynamic -o primes primes.hs
[1 of 1] Compiling Main             ( primes.hs, primes.o )
Linking primes ...
 
$ ./primes
1229
1229
Segmentation fault (core dumped)
simonpj edited edge metadata.Sep 17 2014, 2:23 AM

There is discussion going on about the design, at https://ghc.haskell.org/trac/ghc/ticket/7015, so don't invest too much in reviewing the details here, yet anyway.

Simon

facundominguez abandoned this revision.Sep 17 2014, 6:20 AM

Hello,

We are working on a reimplementation of the extension based on these design notes. I'm abandoning this revision since we don't intend it to be merged as it is.

I want to thank all participants for helping us to identify problems and to improve the submission.