Optimize RTS HashTable (Fixes #13165)
Needs Revision · Public

Authored by AnthonySuper on Aug 16 2018, 4:23 PM.

Details

Reviewers
bgamari
erikd
simonmar
Trac Issues
#13165
Summary

This patch rewrites the RTS hash table to use linear probing, which is
roughly twice as fast according to my (admittedly meager) benchmarks. I
have tested the change under Valgrind and believe it to be free of
memory errors.

AnthonySuper created this revision. Aug 16 2018, 4:23 PM

Fix an unused variable error

During refactoring I apparently forgot to remove an index variable. This is now fixed.

One concern I have here is that changing the O(1) incremental expansion to an O(n) growth operation means that we have an unbounded, uninterruptible delay during compaction. This means that a compaction operation might delay GC arbitrarily, blocking all threads until the growth is complete.

> One concern I have here is that changing the O(1) incremental expansion to an O(n) growth operation means that we have an unbounded, uninterruptible delay during compaction. This means that a compaction operation might delay GC arbitrarily, blocking all threads until the growth is complete.

That's definitely a pretty big concern with my current approach. I did it this way out of concern for cache-friendliness, although I'll be the first to admit that I don't really have the benchmarks to back my suspicions up. Another way to implement this would be to modify the current implementation to use linear probing within the buckets, although that might be a bit tricky. Should I try that instead?

>> One concern I have here is that changing the O(1) incremental expansion to an O(n) growth operation means that we have an unbounded, uninterruptible delay during compaction. This means that a compaction operation might delay GC arbitrarily, blocking all threads until the growth is complete.

> That's definitely a pretty big concern with my current approach. I did it this way out of concern for cache-friendliness, although I'll be the first to admit that I don't really have the benchmarks to back my suspicions up. Another way to implement this would be to modify the current implementation to use linear probing within the buckets, although that might be a bit tricky. Should I try that instead?

It's up to you, but I think for compaction it's pretty important that insert is O(1) (or very close to it, traversing bucket chains notwithstanding). Not amortized O(1). Maybe there are alternative ways to do incremental growth?

bgamari requested changes to this revision. Oct 15 2018, 12:38 PM

Gently pinging @AnthonySuper.

This revision now requires changes to proceed. Oct 15 2018, 12:38 PM

Apologies for the period of silence there. In truth, I am not sure exactly which implementation to use. Getting non-amortized O(1) with good cache behavior is extremely tricky, since the easiest way to get good cache behavior is to have one large block of contiguous memory on which all accesses are performed. That makes growing the table without a lot of copying quite difficult, unfortunately, and I don't know of any good methods for doing it. I'd love to hear suggestions, however!