Previously, unsafeInterleaveIO used noDuplicate to prevent the
computation from being run twice. noDuplicate needs to walk the
evaluation stack. This experimental implementation instead uses
MVars to prevent duplication.
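The patch itself isn't reproduced in this excerpt. As a rough illustration of the idea (names and details hypothetical, not taken from the actual diff), a two-MVar implementation might look like this: one MVar serves as a one-shot lock deciding which thread gets to run the action, and the other carries the result to any threads that lost the race.

```haskell
import Control.Concurrent.MVar
import Control.Exception (evaluate)
import Data.IORef
import GHC.IO.Unsafe (unsafeDupableInterleaveIO)

-- Hypothetical sketch of a two-MVar unsafeInterleaveIO.
interleaveOnce :: IO a -> IO a
interleaveOnce m = do
  lock   <- newMVar ()     -- full: the action has not been claimed yet
  result <- newEmptyMVar   -- filled exactly once, with the result of m
  unsafeDupableInterleaveIO $ do
    won <- tryTakeMVar lock
    case won of
      -- We emptied the lock first, so we run the action.
      Just () -> do
        res <- m
        putMVar result res
        pure res
      -- Another thread claimed the lock; block until its result arrives.
      Nothing -> readMVar result

main :: IO ()
main = do
  ref <- newIORef False
  x <- interleaveOnce (writeIORef ref True >> pure (42 :: Int))
  early <- readIORef ref   -- still False: the action is deferred
  _ <- evaluate x          -- demanding the value runs the action
  late <- readIORef ref    -- now True
  print (early, late, x)
```

Even if the thunk is entered from several threads, only the one that wins `tryTakeMVar` performs the action; everyone else blocks in `readMVar` until the result is published.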
- rGHC Glasgow Haskell Compiler
Using two MVars seems a little weird, but it's simpler than the alternatives I've been able to think of thus far. Another option is to use MVar (Maybe a), which starts out containing Nothing. That seems rather uglier:
  unsafeInterleaveIO :: IO a -> IO a
  unsafeInterleaveIO m = do
    v <- newMVar Nothing
    unsafeDupableInterleaveIO $ do
      r <- tryTakeMVar v
      case r of
        -- Someone else has taken the MVar and is performing
        -- the action, so we will wait for their result.
        Nothing -> do
          res <- readMVar v
          case res of
            Nothing -> error "Impossible Nothing"
            Just a  -> pure a
        -- Someone else has performed the action, so we use
        -- their result and put it back in the MVar.
        Just j@(Just r) -> putMVar v j >> pure r
        -- We're the first ones to get the MVar, so we actually
        -- do the work.
        Just Nothing -> do
          res <- m
          putMVar v (Just res)
          pure res
I have no particular sense of what's likely to be fastest.
@bgamari I can definitely reduce the number of MVars to one, but I don't see a way to reduce the number of MVar *operations* too much. That doesn't mean it's impossible; I just don't see it. I'll take the MVar (Maybe a) approach I gave above for a spin and see how that goes, and then maybe try the evil "null pointer" version.
I'm trying to optimize it. You chose to use an MVar rather than noDuplicate# for fixIO in order to optimize it, and unsafeInterleaveIO gets a lot more use, so we should worry at least as much about its performance. I can't guarantee that I can improve performance, but I think it's worth a shot. As for the benchmark, I imagined that we must have some benchmarks that use lazy IO in multi-threaded code. If that's wrong, I'll have to try to come up with something.
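For what it's worth, a crude micro-benchmark along these lines (entirely hypothetical, not from the patch) could allocate many interleaved thunks and then force them from a couple of competing threads, run with `+RTS -N2`:

```haskell
import Control.Concurrent
import Control.Exception (evaluate)
import Control.Monad
import System.IO.Unsafe (unsafeInterleaveIO)

-- Hypothetical micro-benchmark: build many lazy-IO thunks, then force
-- them all from two competing threads so the duplication-prevention
-- machinery is actually exercised.
main :: IO ()
main = do
  xs <- replicateM 100000 (unsafeInterleaveIO (pure (1 :: Int)))
  done <- newEmptyMVar
  replicateM_ 2 $ forkIO $ do
    s <- evaluate (sum xs)
    putMVar done s
  sums <- replicateM 2 (takeMVar done)
  print sums   -- both threads should agree on the total
```

Timing this under `-N1` versus `-N2`, against both implementations, would give at least a first-order comparison.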
Good question. I guess it's not strictly necessary. The potential advantage is that I imagine taking the MVar and putting it back to be more expensive than reading it. Maybe that's wrong.
Hmm, I'm slightly dubious about all this. It seems like an unforced change in code that's been historically very difficult to get right. We don't have an open performance bug related to this code, and it doesn't affect the performance of GHC itself. Is it really worth the effort? You'd have to find a benchmark or synthesise one, and that's a lot of work, getting confidence in correctness will be hard, and at the end there might not be much to be gained here.
It turns out that in simple cases (without deep thunk chains that would make this more likely useful), the simple two-MVar solution is the best of the MVar options, with the crazy null-pointer one a close second. When run with -N1, all are inferior (in this simple case) to the noDuplicate version. However, when run with -N2 (even without any actual parallel or concurrent code), the two-MVar version pulls ahead.
@simonmar, why do you say this does not affect GHC performance? GHC uses unsafeInterleaveIO directly in multiple places, although I certainly couldn't say if the differences are significant enough to be easily measurable in such a large program. You're right that finding/writing good benchmarks could be a bit of a challenge. I wonder if it would be possible to produce a test demonstrating quadratic behavior from repeated stack walking.
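One hypothetical shape for such a test (my own sketch, not from the patch): a chain of interleaved actions where forcing the outermost thunk forces each inner one while the enclosing frames are still live, so each noDuplicate pass would walk an ever-deeper stack.

```haskell
import Control.Exception (evaluate)
import System.IO.Unsafe (unsafeInterleaveIO)

-- Hypothetical stress test: forcing the outermost thunk enters each
-- inner thunk in turn while the outer frames are still on the stack.
-- If each entry triggers a stack walk proportional to the current
-- depth, total work could approach O(n^2) in the chain length.
deepChain :: Int -> IO Int
deepChain 0 = pure 0
deepChain n = unsafeInterleaveIO $ do
  x <- deepChain (n - 1)
  evaluate (x + 1)   -- force the inner thunk before returning

main :: IO ()
main = do
  r <- deepChain 10000
  v <- evaluate r
  print v
```

Plotting run time against the chain length should show whether the stack-walking version is superlinear where the MVar version stays linear.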
I'm not personally too concerned about correctness in the two-MVar implementation. It's quite simple, and the failures of the past have left us lots of good tests to pound on any new implementation.
A final thought: whichever way we go on this, it seems quite peculiar to me to use one approach for fixIO and another for unsafeInterleaveIO.
Well ok, if you can demonstrate a perf improvement then I won't object. GHC does use unsafeInterleaveIO, but as far as I know we've never identified it as a significant factor for performance (that could be because our profiling tools aren't good enough, of course). So my concerns are really around whether this is the most productive area to focus effort, rather than whether it's a good idea in isolation.
It might be better to use an IORef for the exclusion part and an MVar for the blocking part. Something like
  unsafeInterleaveIO m = do
    ref <- newIORef False
    mv  <- newEmptyMVar
    unsafeDupableInterleaveIO $ do
      r <- atomicModifyIORef' ref (\b -> (True, b))
      if r
        then takeMVar mv
        else do res <- m; putMVar mv res; return res
Are we assured that multiple threads evaluating an unsafeInterleaveIO like defined above will see the same ref and m? Doesn't the allocation of these objects need to be protected by noDuplicate? I'm happy to be completely wrong, but I would like to know why.
Ah, that makes sense (both the text and the code) except that the
takeMVar should be readMVar. I did think about combining an IORef
with an MVar, but my ideas in that direction were all obviously wrong.
I'll give your approach a shot and see how it performs.
The objects are allocated in the outer IO thread, so we can be guaranteed that they are executed exactly once for this call to unsafeInterleaveIO. Inside the unsafeDupableInterleaveIO we lose that guarantee and we have to implement mutual exclusion.
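Putting the pieces of this subthread together, the variant with the readMVar correction applied (so every blocked thread, not just the first, can see the result) would read roughly as follows; this is a sketch, not the committed code:

```haskell
import Control.Concurrent.MVar
import Data.IORef
import GHC.IO.Unsafe (unsafeDupableInterleaveIO)

-- Sketch of the IORef-plus-MVar variant with readMVar in place of
-- takeMVar, per the correction above.
interleaveOnce :: IO a -> IO a
interleaveOnce m = do
  ref <- newIORef False    -- allocated once, in the outer IO thread
  mv  <- newEmptyMVar
  unsafeDupableInterleaveIO $ do
    claimed <- atomicModifyIORef' ref (\b -> (True, b))
    if claimed
      then readMVar mv           -- someone else is running m; wait
      else do res <- m           -- we won the race; run the action
              putMVar mv res
              pure res

main :: IO ()
main = do
  x <- interleaveOnce (pure (7 :: Int))
  print x
```

The IORef handles mutual exclusion (only the thread that reads the old value False runs the action), and the MVar handles blocking and result delivery.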
I was able to improve performance a bit using casByteArray# instead of
atomicModifyIORef. But I still don't have an absolute improvement.
Further, performance in this realm seems fragile in the extreme, in
sometimes surprising ways.