Implement unsafeInterleaveIO using MVars
Changes Planned · Public

Authored by dfeuer on May 2 2017, 4:10 PM.

Details

Summary

Previously, unsafeInterleaveIO used noDuplicate to prevent the
computation from being run twice. noDuplicate needs to walk the
evaluation stack. This experimental implementation instead uses
MVars to prevent duplication.

dfeuer created this revision. May 2 2017, 4:10 PM
dfeuer added a comment. May 2 2017, 4:23 PM

Using two MVars seems a little weird, but it's simpler than the alternatives I've been able to think of thus far. Another option is to use MVar (Maybe a), which starts out containing Nothing. That seems rather uglier:

unsafeInterleaveIO :: IO a -> IO a
unsafeInterleaveIO m = do
  v <- newMVar Nothing
  unsafeDupableInterleaveIO $ do
    r <- tryTakeMVar v
    case r of
      -- Someone else has taken the MVar and is performing
      -- the action, so we wait for their result.
      Nothing -> do
        res <- readMVar v
        case res of
          Nothing -> error "Impossible Nothing"
          Just a  -> pure a

      -- Someone else has performed the action, so we use
      -- their result and put it back in the MVar.
      Just j@(Just a) -> putMVar v j >> pure a

      -- We're the first ones to get the MVar, so we actually
      -- do the work.
      Just Nothing -> do
        res <- m
        putMVar v (Just res)
        pure res

I have no particular sense of what's likely to be fastest.
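For reference, the two-MVar version itself never appears in this thread. A plausible sketch of what it might look like (my reconstruction, not the actual diff): one MVar serves as a claim token and the other carries the result.

```haskell
import Control.Concurrent.MVar
import GHC.IO.Unsafe (unsafeDupableInterleaveIO)

-- Hypothetical reconstruction of the two-MVar scheme; not dfeuer's diff.
unsafeInterleaveIO :: IO a -> IO a
unsafeInterleaveIO m = do
  lock   <- newMVar ()    -- full while no thread has claimed the action
  result <- newEmptyMVar  -- filled once the action has run
  unsafeDupableInterleaveIO $ do
    claim <- tryTakeMVar lock
    case claim of
      -- We emptied the lock first: run the action and publish the result.
      Just () -> do
        res <- m
        putMVar result res
        pure res
      -- Another thread claimed the action: block until its result appears.
      Nothing -> readMVar result
```

The claim token makes `tryTakeMVar` succeed for exactly one thread per call, and `readMVar` lets any number of losers wait without stealing the result from each other.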

dfeuer added a comment. May 2 2017, 7:10 PM

I imagine the allocation failures come from the MVars this allocates. Using an unsafe Kmett trick to implement a null pointer, I think we can cut this to just one MVar.

bgamari edited edge metadata. May 2 2017, 7:28 PM

If the point is performance I would have thought you would want to avoid multiple MVars if at all possible. MVar operations aren't particularly cheap.

dfeuer added a comment. May 2 2017, 8:26 PM

@bgamari I can definitely reduce the number of MVars to one, but I don't see a way to reduce the number of MVar *operations* too much. That doesn't mean it's impossible; I just don't see it. I'll take the MVar (Maybe a) approach I gave above for a spin and see how that goes, and then maybe try the evil "null pointer" version.

Minor nit: it would be better if the commit title were more goal-oriented. Using MVars is a means to an end; it's the end that matters.

dfeuer updated this revision to Diff 12371. May 3 2017, 9:18 AM
  • Switch to single-MVar unsafeInterleaveIO
fryguybob added inline comments.
libraries/base/GHC/IO/Unsafe.hs:117

Why tryTakeMVar and not just takeMVar here? If it is necessary could there be a comment explaining why?

dfeuer updated this revision to Diff 12372. May 3 2017, 9:44 AM
  • Switch to single-MVar unsafeInterleaveIO
simonmar edited edge metadata. May 3 2017, 10:16 AM

@dfeuer what's this aiming at? Is there a problem with the old implementation, or are you trying to optimise it? If so, what's the benchmark?

I'm very wary of modifying this code, the bugs are really hard to find.

> @dfeuer what's this aiming at? Is there a problem with the old implementation, or are you trying to optimise it? If so, what's the benchmark?
>
> I'm very wary of modifying this code, the bugs are really hard to find.

I'm trying to optimize it. You chose to use an MVar rather than noDuplicate# for fixIO to try to optimize it. unsafeInterleaveIO gets a lot more use, so we should worry at least as much about its performance. I can't guarantee that I can improve performance, but I think it's worth a shot. As for the benchmark, I imagined that we must have some benchmarks that use lazy IO in multi-threaded code. If that's wrong, I'll have to try to come up with something.

dfeuer added inline comments. May 3 2017, 10:32 AM
libraries/base/GHC/IO/Unsafe.hs:117

Good question. I guess it's not strictly necessary. The potential advantage is that I imagine taking the MVar and putting it back to be more expensive than reading it. Maybe that's wrong.

Hmm, I'm slightly dubious about all this. It seems like an unforced change to code that has historically been very difficult to get right. We don't have an open performance bug related to this code, and it doesn't affect the performance of GHC itself. Is it really worth the effort? You'd have to find a benchmark or synthesise one, which is a lot of work; getting confidence in correctness will be hard; and at the end there might not be much to be gained here.

dfeuer planned changes to this revision. May 4 2017, 1:51 PM

It turns out that in simple cases (without deep thunk chains that would make this more likely to be useful), the simple two-MVar solution is the best of the MVar options, with the crazy null-pointer one a close second. When run with -N1, all are inferior (in this simple case) to the noDuplicate version. However, when run with -N2 (even without any actual parallel or concurrent code), the two-MVar version pulls ahead.

@simonmar, why do you say this does not affect GHC performance? GHC uses unsafeInterleaveIO directly in multiple places, although I certainly couldn't say if the differences are significant enough to be easily measurable in such a large program. You're right that finding/writing good benchmarks could be a bit of a challenge. I wonder if it would be possible to produce a test demonstrating quadratic behavior from repeated stack walking.

I'm not personally too concerned about correctness in the two-MVar implementation. It's quite simple, and the failures of the past have left us lots of good tests to pound on any new implementation.

A final thought: whichever way we go on this, it seems quite peculiar to me to use one approach for fixIO and another for unsafeInterleaveIO.
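As a starting point, a crude single-threaded sanity check (not the real benchmark the thread is asking for; the program is mine, not from the review) that any replacement implementation has to pass: each delayed action must run exactly once, no matter how the resulting thunks are shared.

```haskell
import Control.Monad (forM)
import Data.IORef
import System.IO.Unsafe (unsafeInterleaveIO)

-- Forcing the lazily produced list must run each interleaved action
-- exactly once, so the counter must end up equal to the list length.
main :: IO ()
main = do
  counter <- newIORef (0 :: Int)
  xs <- forM [1 .. 100000 :: Int] $ \i ->
          unsafeInterleaveIO (modifyIORef' counter (+ 1) >> pure i)
  print (sum xs)               -- forces every delayed action; prints 5000050000
  readIORef counter >>= print  -- prints 100000: each action ran once
```

A real benchmark would run something like this from several threads under -threaded with +RTS -N2, which is where the comment above expects the MVar versions to pull ahead.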

Well ok, if you can demonstrate a perf improvement then I won't object. GHC does use unsafeInterleaveIO, but as far as I know we've never identified it as a significant factor for performance (that could be because our profiling tools aren't good enough, of course). So my concerns are really around whether this is the most productive area to focus effort, rather than whether it's a good idea in isolation.

It might be better to use an IORef for the exclusion part and an MVar for the blocking part. Something like

unsafeInterleaveIO m = do
  ref <- newIORef False
  mv  <- newEmptyMVar
  unsafeDupableInterleaveIO $ do
    r <- atomicModifyIORef' ref (\b -> (True, b))
    if r then takeMVar mv else do res <- m; putMVar mv res; return res

Are we assured that multiple threads evaluating an unsafeInterleaveIO like the one defined above will see the same ref and m? Doesn't the allocation of these objects need to be protected by noDuplicate? I'm happy to be completely wrong, but I would like to know why.

dfeuer added a comment. May 5 2017, 9:59 AM

Ah, that makes sense (both the text and the code), except that the takeMVar should be readMVar. I did think about combining an IORef with an MVar, but my ideas in that direction were all obviously wrong. I'll give your approach a shot and see how it performs.
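With that one-word fix applied (readMVar instead of takeMVar, so every waiting thread can see the result), simonmar's IORef-plus-MVar proposal would read something like the following sketch (variable names mine):

```haskell
import Data.IORef
import Control.Concurrent.MVar
import GHC.IO.Unsafe (unsafeDupableInterleaveIO)

-- Sketch of the IORef + MVar scheme with the readMVar correction applied.
unsafeInterleaveIO :: IO a -> IO a
unsafeInterleaveIO m = do
  ref <- newIORef False  -- has any thread claimed the action yet?
  mv  <- newEmptyMVar    -- will hold the action's result
  unsafeDupableInterleaveIO $ do
    claimed <- atomicModifyIORef' ref (\b -> (True, b))
    if claimed
      then readMVar mv  -- another thread claimed the action; wait for its result
      else do
        res <- m        -- we claimed it: run the action and publish the result
        putMVar mv res
        pure res
```

The atomicModifyIORef' guarantees exactly one thread sees False and runs the action; readMVar (rather than takeMVar) leaves the result in place so any number of duplicated thunks can read it.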

simonmar added a comment. Edited May 11 2017, 3:09 AM
> Are we assured that multiple threads evaluating an unsafeInterleaveIO like defined above will see the same ref and m? Doesn't the allocation of these objects need to be protected by noDuplicate? I'm happy to be completely wrong, but I would like to know why.

The objects are allocated in the outer IO thread, so we are guaranteed that these allocations happen exactly once for this call to unsafeInterleaveIO. Inside the unsafeDupableInterleaveIO we lose that guarantee, and we have to implement mutual exclusion ourselves.

I was able to improve performance a bit using casByteArray# instead of atomicModifyIORef, but I still don't have an absolute improvement. Further, performance in this realm seems fragile in the extreme, in sometimes surprising ways.

> The objects are allocated in the outer IO thread, so we can be guaranteed that they are executed exactly once for this call to unsafeInterleaveIO. Inside the unsafeDupableInterleaveIO we lose that guarantee and we have to implement mutual exclusion.

Ah, right. I had my head in the wrong context. This makes sense.

austin resigned from this revision. Nov 9 2017, 11:37 AM