Use an IORef for QSemN

Replace the outer MVar in QSemN with an IORef. This is probably
lighter, and it removes the need for uninterruptibleMask.

While the code no longer uses uninterruptibleMask, contention can
still leave a thread unkillable for a time. A thread can always be
descheduled between performing its atomicModifyIORef and forcing the
result, in which case it leaves a thunk in the IORef. In particular,
signalQSemN can leave a good chunk of reversing work behind. (It can
also leave dead-thread-skipping work behind, but I think it's
reasonable to assume there won't be too much of that.)

I don't think there's any way to give an absolute guarantee that no
thread will be unkillable for long, but if that's an important goal,
I think we can get pretty close by using a non-amortized queue.

At first, I thought maybe we could just force the IORef contents at
the beginning of each waitQSemN (before masking), but that would
violate the ordering guarantee: threads attempting waitQSemN while an
expensive thunk was being evaluated would be reordered arbitrarily.

I don't know if the benefits of a fancier queue would be worth the
price.
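To illustrate the thunk problem, here is a minimal sketch. It is not the
QSemN code: Queue, emptyQ, push, pop and popFront are hypothetical names,
and the two-list queue only stands in for any amortized queue whose pop
sometimes has to reverse a list. The point is that the lazy
atomicModifyIORef returns as soon as the thunks are installed, and the
reversal runs only when the result is forced.

```haskell
import Control.Exception (evaluate)
import Data.IORef (IORef, atomicModifyIORef, newIORef)
import Data.List (foldl')

-- Hypothetical amortized FIFO queue: a front list plus a reversed back list.
data Queue a = Queue [a] [a]

emptyQ :: Queue a
emptyQ = Queue [] []

push :: a -> Queue a -> Queue a
push x (Queue front back) = Queue front (x : back)

-- When the front is empty, pop must reverse the whole back list: O(n) at once.
pop :: Queue a -> Maybe (a, Queue a)
pop (Queue (x : front) back) = Just (x, Queue front back)
pop (Queue [] [])            = Nothing
pop (Queue [] back)          = pop (Queue (reverse back) [])

-- Step 1 installs unevaluated thunks in the IORef and returns immediately.
-- Step 2 forces the result, which is where the reversal actually runs.
-- A thread descheduled between the two steps leaves the thunk for whichever
-- thread needs the IORef contents next.
popFront :: IORef (Queue a) -> IO (Maybe a)
popFront ref = do
  r <- atomicModifyIORef ref $ \q ->
         case pop q of
           Nothing      -> (q, Nothing)
           Just (x, q') -> (q', Just x)
  evaluate r

main :: IO ()
main = do
  ref <- newIORef (foldl' (flip push) emptyQ [1 .. 100000 :: Int])
  first <- popFront ref
  print first  -- prints "Just 1"
```

A non-amortized (worst-case constant time) queue would bound the work
hidden in any one such thunk, which is the trade-off discussed above.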

