Use a priority queue for Windows delays
Needs RevisionPublic

Authored by dfeuer on Jul 15 2018, 12:28 PM.

Details

Reviewers
simonmar
hvr
bgamari
Trac Issues
#15366
Summary

GHC.Conc.Windows used to store delays in a sorted list, leading
to quadratic time and space usage. Use a pairing heap instead.

dfeuer created this revision.Jul 15 2018, 12:28 PM

It doesn't look like this compiles.

Did you consider using GHC.Event.PSQ instead?

It doesn't look like this compiles.

Did you consider using GHC.Event.PSQ instead?

I'll get that fixed. I considered using that, but it seems to be more than we need here: we never search.

dfeuer updated this revision to Diff 17343.Jul 16 2018, 9:44 AM

Fix module name; add documentation.

dfeuer updated this revision to Diff 17346.Jul 16 2018, 12:20 PM

Remove module import cycle

dfeuer updated this revision to Diff 17347.Jul 16 2018, 1:39 PM

Fix -Wall

dfeuer updated this revision to Diff 17375.Jul 17 2018, 4:47 PM

Another fixup

@simonmar, I see what you likely mean about the PSQ implementation now: it has the advantage of being specialized to fixed-width priorities. We could probably find/create something similarly appropriate for the Windows module, but I don't have the time to invest in it right now (note: we'd need something with a merge operation to maintain the current behavior; the PSQs here don't currently have one).

I'm really not tied to a pairing heap. Something like a leftist heap might be better because it offers worst-case logarithmic dequeueing. Fortunately, it will be trivial to swap out one implementation for another once the queue is in a separate module.

simonmar added a subscriber: Phyx.Jul 18 2018, 2:43 AM

@Phyx should have a look at this. Also, can you run a benchmark? It's easy to make things theoretically faster but practically slower.

I believe there were people working on porting the IO manager to Windows which would presumably make this redundant, but I'm not sure what the status of those efforts is, perhaps @Phyx knows.

@Phyx should have a look at this. Also, can you run a benchmark? It's easy to make things theoretically faster but practically slower.

I believe there were people working on porting the IO manager to Windows which would presumably make this redundant, but I'm not sure what the status of those efforts is, perhaps @Phyx knows.

I cannot run a benchmark as I have no Windows machine. The pairing heap is simple enough that it shouldn't be (significantly) slower than even a list with just one or two elements. Since the order of (time and space) complexity is much better, that seems like a no-brainer. Note that this application is likely close to the worst case for an ordered list most of the time. Assuming delays are fairly evenly distributed, new delays are most likely to be added near the end of the list, meaning almost the entire spine has to be replaced for each added delay.

Phyx added a comment.Jul 18 2018, 4:38 PM

@Phyx should have a look at this. Also, can you run a benchmark? It's easy to make things theoretically faster but practically slower.

I have in principle nothing against this patch, even with the new I/O manager we'll be keeping the old one around for a big, particularly until libraries like network and directory etc can be updated.
However I too wonder about a non-synthetic use case for which this approach would be significantly faster (so where the list operations are the actual overhead). I was thinking about e.g. network but that uses threadWaitWrite.

I believe there were people working on porting the IO manager to Windows which would presumably make this redundant, but I'm not sure what the status of those efforts is, perhaps @Phyx knows.

I'm still working on that. It's the top priority for me for the next GHC release. I'm currently working on plumbing everything in GHC. Though I have been struggling with some scheduler issues which has slowed my progress a bit.
In any case, indeed that implementation has a completely different implementation for these delays, one that also works for the non-threaded runtime and gets merged with the completion port dequeuing code https://github.com/Mistuke/ghc/blob/io-manager-simple/libraries/base/GHC/Event/Windows/Thread.hs , that one just uses the existing PSQ.

Phyx added a comment.Jul 18 2018, 4:38 PM

@Phyx should have a look at this. Also, can you run a benchmark? It's easy to make things theoretically faster but practically slower.

I believe there were people working on porting the IO manager to Windows which would presumably make this redundant, but I'm not sure what the status of those efforts is, perhaps @Phyx knows.

I cannot run a benchmark as I have no Windows machine. The pairing heap is simple enough that it shouldn't be (significantly) slower than even a list with just one or two elements. Since the order of (time and space) complexity is much better, that seems like a no-brainer. Note that this application is likely close to the worst case for an ordered list most of the time. Assuming delays are fairly evenly distributed, new delays are most likely to be added near the end of the list, meaning almost the entire spine has to be replaced for each added delay.

If you have the a benchmark I can run it for you.

bgamari requested changes to this revision.EditedAug 6 2018, 12:02 PM

It sounds like this needs some more work (e.g. some performance characterisation at least)

This revision now requires changes to proceed.Aug 6 2018, 12:02 PM