Replace atomicModifyMutVar#
ClosedPublic

Authored by dfeuer on Jun 21 2018, 8:37 PM.

Details

Summary

Implement at least part of Proposal 27 to replace atomicModifyMutVar#.

Diff Detail

Repository
rGHC Glasgow Haskell Compiler
Lint
Automatic diff as part of commit; lint not applicable.
Unit
Automatic diff as part of commit; unit tests not applicable.
dfeuer created this revision.Jun 21 2018, 8:37 PM
dfeuer updated this revision to Diff 17044.Jun 21 2018, 8:49 PM

Update docs; clean up

dfeuer updated this revision to Diff 17045.Jun 21 2018, 9:52 PM

Remove unused binding

dfeuer updated this revision to Diff 17046.Jun 21 2018, 11:52 PM
  • Fix up Windows-specific code
dfeuer updated this revision to Diff 17047.Jun 22 2018, 2:53 AM

Update stat (too good)

The commit title and summary don't really tell me what this does - can you say what the goal is here? Is there a ticket?

The commit title and summary don't really tell me what this does - can you say what the goal is here? Is there a ticket?

Ah, that was pretty silly. I opened a GHC proposal for this, but I'm beginning to wonder if that was overkill. https://github.com/ghc-proposals/ghc-proposals/pull/149

dfeuer added inline comments.Jun 22 2018, 2:56 PM
libraries/base/GHC/ForeignPtr.hs
324

Do we still need this bang? I suspect not.

dfeuer added inline comments.Jun 22 2018, 3:39 PM
rts/PrimOps.cmm
625

Why do we do this? Why not this?

...
StgThunk_payload(y,0) = z;

x = StgMutVar_var(mv);
retry:
  StgThunk_payload(z,1) = x;
  (h) = prim ...
  if (h != x) { x = h; goto retry; }
fryguybob added inline comments.
rts/PrimOps.cmm
625

I'm not sure there is a one-size-fits all way to write this loop, but an argument for how it is now is that the common case should be that the cas succeeds (if not we won't get good performance anyway). We would hope that branch prediction would follow that common case (again, bad performance if it doesn't) so there is some penalty regardless to get back to the cas. If you are waiting a little bit anyway, getting a fresh load could only help (if it is a cache miss the cas would be a cache miss too). You could argue for a pause instruction inside the loop as well as this gives a semantic hint that this loop is waiting for concurrent work. But this would only be the case if we expected longish phases where the cas does not succeed. If that is the case this isn't really the appropriate synchronization.

dfeuer added inline comments.Jun 22 2018, 11:33 PM
rts/PrimOps.cmm
625

I don't really understand. If we fail, it's precisely because someone else succeeded. I would think we'd just want to loop as fast as possible. I don't see the advantage in getting a fresh load. We haven't done anything since the last one except an equality test; shouldn't it still be fresh? Please take this speciation with a grain of salt; I'm no assembly programmer. BTW, do you have any thoughts on the patch itself? Comments on the proposal would be particularly welcome.

dfeuer added inline comments.Jun 23 2018, 1:33 AM
rts/PrimOps.cmm
625

Ahhhh! Now I understand what you mean about refreshing the value. Even that tiny equality test and conditional jump could be too much. The shorter the time between looking up the value and performing the CAS the better.

Ah, that was pretty silly. I opened a GHC proposal for this, but I'm beginning to wonder if that was overkill. https://github.com/ghc-proposals/ghc-proposals/pull/149

Great, but could you also update the title & summary for the diff please?

(btw looking at it briefly I really like the idea, but I'll take a closer look later)

dfeuer edited the summary of this revision. (Show Details)Jun 23 2018, 2:08 PM
dfeuer added reviewers: fryguybob, rrnewton.
dfeuer updated this revision to Diff 17070.Jun 23 2018, 5:09 PM
  • Return old value too
dfeuer updated this revision to Diff 17071.Jun 23 2018, 7:22 PM
  • Return old value too
dfeuer updated this revision to Diff 17072.Jun 23 2018, 7:37 PM
  • Return old value too
dfeuer updated this revision to Diff 17073.Jun 23 2018, 7:41 PM
  • Return old value too
dfeuer updated this revision to Diff 17074.Jun 23 2018, 9:14 PM
  • Return old value too
dfeuer updated this revision to Diff 17075.Jun 23 2018, 10:34 PM
  • Return old value too
dfeuer updated this revision to Diff 17274.Jul 11 2018, 2:13 PM

Implement atomicModifyMutVar_#, change some names, and remove the
unaccepted additions to Data.IORef.

@simonmar, @fryguybob, I've implemented atomicModifyMutVar2# and atomicModifyMutVar_#, but I could use some help with atomicSwapMutVar#. Could one of you give me a hand?

dfeuer edited the summary of this revision. (Show Details)Jul 11 2018, 2:44 PM
dfeuer updated the Trac tickets for this revision.
dfeuer added inline comments.Jul 11 2018, 5:17 PM
libraries/base/Data/IORef.hs
94–96

Whoops... Forgot to force _new.

dfeuer updated this revision to Diff 17276.Jul 11 2018, 5:49 PM

Fix atomicModifyIORef' strictness

dfeuer marked an inline comment as done.Jul 11 2018, 5:50 PM

For atomicSwapMutVar# it looks like we don't have the right primitive in Cmm yet, namely %xchg. I suppose we could defer that for a separate diff?

What testing do we have for atomicModifyIORef and friends? It would be good to have some tests to check the strictness properties at least.

libraries/base/Data/IORef.hs
94–96

Because I'm slightly paranoid, would you mind checking the Core for this to ensure that the optimiser is doing what we hope? There are a few layers and a pair to eliminate.

libraries/base/GHC/IORef.hs
64–170

Maybe put an INLINE on this?

64–170

Uh, ignore this, I guess I added this comment on an earlier version of the diff, I'm not sure what it was referring to.

dfeuer marked an inline comment as done.Jul 12 2018, 11:07 AM
dfeuer added inline comments.
libraries/base/Data/IORef.hs
94–96

Looks good to me (including the unfoldings and inlining guidance). You agree?

GHC.IORef.atomicModifyIORef'_1
  :: forall a.
     IORef a
     -> (a -> a) -> State# RealWorld -> (# State# RealWorld, (a, a) #)
[GblId,
 Arity=3,
 Caf=NoCafRefs,
 Str=<S(S),1*U(U)><L,U><S,U>,
 Unf=Unf{Src=InlineStable, TopLvl=True, Value=True, ConLike=True,
         WorkFree=True, Expandable=True,
         Guidance=ALWAYS_IF(arity=3,unsat_ok=True,boring_ok=False)
         Tmpl= \ (@ a_a2uv)
                 (ref_a217 [Occ=Once] :: IORef a_a2uv)
                 (f_a218 [Occ=Once] :: a_a2uv -> a_a2uv)
                 (s_a2CN [Occ=Once] :: State# RealWorld) ->
                 case ref_a217 `cast` <Co:2> of { STRef ref1_a211 [Occ=Once] ->
                 case atomicModifyMutVar_#
                        @ RealWorld @ a_a2uv ref1_a211 f_a218 s_a2CN
                 of
                 { (# ipv_s2E9 [Occ=Once], ipv1_s2Ea [Occ=Once],
                      ipv2_s2Eb [Occ=Once] #) ->
                 case ipv2_s2Eb of new_X21s [Occ=Once] { __DEFAULT ->
                 (# ipv_s2E9, (ipv1_s2Ea, new_X21s) #)
                 }
                 }
                 }}]
ORef.atomicModifyIORef'_1
  = \ (@ a_a2uv)
      (ref_a217 :: IORef a_a2uv)
      (f_a218 :: a_a2uv -> a_a2uv)
      (s_a2CN :: State# RealWorld) ->
      case ref_a217 `cast` <Co:2> of { STRef ref1_a211 ->
      case atomicModifyMutVar_#
             @ RealWorld @ a_a2uv ref1_a211 f_a218 s_a2CN
      of
      { (# ipv_s2E9, ipv1_s2Ea, ipv2_s2Eb #) ->
      case ipv2_s2Eb of new_X21s { __DEFAULT ->
      (# ipv_s2E9, (ipv1_s2Ea, new_X21s) #)
      }
      }
      }

-- RHS size: {terms: 1, types: 0, coercions: 14, joins: 0/0}
atomicModifyIORef'_ :: forall a. IORef a -> (a -> a) -> IO (a, a)
[GblId,
 Arity=3,
 Caf=NoCafRefs,
 Str=<S(S),1*U(U)><L,U><S,U>,
 Unf=Unf{Src=InlineStable, TopLvl=True, Value=True, ConLike=True,
         WorkFree=True, Expandable=True,
         Guidance=ALWAYS_IF(arity=0,unsat_ok=True,boring_ok=True)
         Tmpl= GHC.IORef.atomicModifyIORef'_1 `cast` <Co:14>}]
atomicModifyIORef'_ = GHC.IORef.atomicModifyIORef'_1 `cast` <Co:14>
libraries/base/GHC/IORef.hs
64–170

I haven't double checked with the very latest edition, but when I dug through all the core of an earlier draft, I saw what consistently looked like very good unfoldings and very aggressive inlining guidance. If someone finds a situation where things don't inline enough, we can revisit it, of course.

dfeuer marked an inline comment as done.Jul 12 2018, 11:09 AM

For atomicSwapMutVar# it looks like we don't have the right primitive in Cmm yet, namely %xchg. I suppose we could defer that for a separate diff?

What testing do we have for atomicModifyIORef and friends? It would be good to have some tests to check the strictness properties at least.

Yes, we could definitely do that in a separate diff. This is already an improvement. I don't know what testing we have exactly, but we definitely don't have enough. When I forgot to force the new value in atomicModifyIORef', that didn't make any tests fail! On the plus side, it's a heck of a lot easier now to see what is being forced when than it was before.

dfeuer added inline comments.Jul 12 2018, 10:15 PM
libraries/base/Data/IORef.hs
94–96

Er .... sorry... that was the wrong function. I meant

Data.IORef.atomicModifyIORef'1
  :: forall a b.
     IORef a
     -> (a -> (a, b)) -> State# RealWorld -> (# State# RealWorld, b #)
[GblId,
 Arity=3,
 Caf=NoCafRefs,
 Str=<S(S),1*U(U)><L,U><S,U>,
 Unf=Unf{Src=InlineStable, TopLvl=True, Value=True, ConLike=True,
         WorkFree=True, Expandable=True,
         Guidance=ALWAYS_IF(arity=3,unsat_ok=True,boring_ok=False)
         Tmpl= \ (@ a_a1qc)
                 (@ b_a1qd)
                 (ref_aYC [Occ=Once] :: IORef a_a1qc)
                 (f_aYD [Occ=Once] :: a_a1qc -> (a_a1qc, b_a1qd))
                 (s_a1uk [Occ=Once] :: State# RealWorld) ->
                 case ref_aYC `cast` <Co:2> of { STRef r#_a1uX [Occ=Once] ->
                 case atomicModifyMutVar2#
                        @ RealWorld @ a_a1qc @ (a_a1qc, b_a1qd) r#_a1uX f_aYD s_a1uk
                 of
                 { (# ipv_a1v1 [Occ=Once], _ [Occ=Dead], ipv2_a1v3 [Occ=Once!] #) ->
                 case ipv2_a1v3 of { (_new_a1v7 [Occ=Once], _res_a1v8 [Occ=Once]) ->
                 case _new_a1v7 of { __DEFAULT ->
                 case _res_a1v8 of res_XYX [Occ=Once] { __DEFAULT ->
                 (# ipv_a1v1, res_XYX #)
                 }
                 }
                 }
                 }
                 }}]
Data.IORef.atomicModifyIORef'1
  = \ (@ a_a1qc)
      (@ b_a1qd)
      (ref_aYC :: IORef a_a1qc)
      (f_aYD :: a_a1qc -> (a_a1qc, b_a1qd))
      (s_a1uk :: State# RealWorld) ->
      case ref_aYC `cast` <Co:2> of { STRef r#_a1uX ->
      case atomicModifyMutVar2#
             @ RealWorld @ a_a1qc @ (a_a1qc, b_a1qd) r#_a1uX f_aYD s_a1uk
      of
      { (# ipv_a1v1, ipv1_a1v2, ipv2_a1v3 #) ->
      case ipv2_a1v3 of { (_new_a1v7, _res_a1v8) ->
      case _new_a1v7 of { __DEFAULT ->
      case _res_a1v8 of res_XYX { __DEFAULT -> (# ipv_a1v1, res_XYX #) }
      }
      }
      }
      }

atomicModifyIORef' :: forall a b. IORef a -> (a -> (a, b)) -> IO b
[GblId,
 Arity=3,
 Caf=NoCafRefs,
 Str=<S(S),1*U(U)><L,U><S,U>,
 Unf=Unf{Src=InlineStable, TopLvl=True, Value=True, ConLike=True,
         WorkFree=True, Expandable=True,
         Guidance=ALWAYS_IF(arity=0,unsat_ok=True,boring_ok=True)
         Tmpl= Data.IORef.atomicModifyIORef'1 `cast` <Co:17>}]
atomicModifyIORef' = Data.IORef.atomicModifyIORef'1 `cast` <Co:17>

This one also looks right.

simonmar accepted this revision.Jul 13 2018, 7:50 AM

Core looks good. Would you mind adding tests? (ok to do it in a separate diff)

This revision is now accepted and ready to land.Jul 13 2018, 7:50 AM
dfeuer updated this revision to Diff 17310.Jul 13 2018, 1:48 PM
  • Tweak the definition of atomicModifyIORef'
dfeuer requested review of this revision.Jul 13 2018, 1:54 PM

@simonmar, I realized last night that my nice, simple definition of atomicModifyIORef' could lead to extra thunks in certain cases. I believe I've cleaned that up, but I'd appreciate a second look.

I just realized my last change makes a slight semantic difference, but one I expect is probably tolerable. With the old version,

atomicModifyIORef' ref \x -> (x + 1, undefined)

would increment the IORef and throw an exception in the current thread. With the new definition, it will install an undefined value in the IORef. If that's bad, I can change the definition to only force the new value on the inside and force the result on the outside. That would be simpler: no unsafeCoerce. But unless code is actually relying on this weird corner of the current behavior, I think the current version is somewhat better.

I've restored the original semantics for now. We can revisit that later. The latest version is less aggressive, but at least shouldn't have the earlier regression.

dfeuer updated this revision to Diff 17312.Jul 13 2018, 10:05 PM
  • Restore original semantics
dfeuer added inline comments.Jul 14 2018, 8:33 PM
libraries/base/GHC/ForeignPtr.hs
89

Why not

| HaskellFinalizers (IO ()) Finalizers

or even

| forall a. HaskellFinalizers (IO a) Finalizers

That would cut one constructor allocation when adding a finalizer.

324

Actually, what we want here is probably atomicModifyIORef'_. We can case match on the old value to figure out if we need to kill; no need to allocate a pair and a selector thunk.

347

We can actually do better here, I believe. This is almost a single CAS. The only reason it's not is that the IORef could contain a thunk that evaluates to NoFinalizers. I believe this happens when either:

  1. Someone imports this module and fiddles with internals, or
  2. foreignPtrFinalizer started in another thread, and the atomicSwapIORef call (not yet a primop) is in progress.

But we can certainly write it as a CAS loop if we want, and know that it'll soon be almost guaranteed to run once.

This revision was not accepted when it landed; it landed in state Needs Review.Jul 15 2018, 9:19 AM
This revision was automatically updated to reflect the committed changes.