Rewrite FastString table in concurrent hashtable

Authored by watashi on Oct 7 2018, 10:22 AM.



Reimplement global FastString table using a concurrent hashatable with
fixed size segments and dynamically growing buckets instead of fixed size

This addresses the problem that mkFastString was not linear when the
total number of entries was large.

Test Plan


inplace/bin/ghc-stage2 --interactive -dfaststring-stats < /dev/null
GHCi, version 8.7.20181005:  :? for help
Prelude> Leaving GHCi.
FastString stats:
    segments:          256
    buckets:           16384
    entries:           7117
    largest segment:   64
    smallest segment:  64
    longest bucket:    5
    has z-encoding:    0%

Also comapre the two implementation using


The new implementation is on a par with the old version with different
conbination of parameters and perform better when the number of
FastString's are large.


Diff Detail

rGHC Glasgow Haskell Compiler
Automatic diff as part of commit; lint not applicable.
Automatic diff as part of commit; unit tests not applicable.
watashi created this revision.Oct 7 2018, 10:22 AM

Wow! Some questions below.


perhaps call this numSegments?




So we use a load factor of 1? Did you experiment with other choices here?


Is MVar better than atomicModifyIORef''? Did you try both?

watashi planned changes to this revision.Oct 10 2018, 7:01 AM
watashi added inline comments.

We perform more IO including maybeResizeSegment in writes now, so MVar is just more natural to use without unsafePerformIO.

watashi updated this revision to Diff 18420.Oct 23 2018, 5:54 PM
  • comments on naming
  • use Array# (IORef FastStringTableSegment)
  • didn't see significant difference with different load factor, keep it as 1 as it's common and simple
  • the benchmark in exterm case looks better, it's hard to measure in normal build run as this part is usually not dominator
watashi marked 3 inline comments as done.Oct 23 2018, 5:56 PM
simonmar accepted this revision.Oct 24 2018, 8:27 AM

This is great. It could be even better if

  • We had your benchmark in the test suite, perhaps testsuite/tests/perf? We can use small parameters so it doesn't run for long.
  • We point to the benchmark from comments in the code, so that others can use it when making future changes.
This revision is now accepted and ready to land.Oct 24 2018, 8:27 AM
watashi updated this revision to Diff 18443.Oct 25 2018, 2:17 AM

add test and comment as @simonmar suggested:

  • we don't test perf in the test as we cannot see significant difference with small numbers;
  • it should be easy to write a benchmark on top of it;
  • we can verify that there is no duplicate in FastString table in the test.

Thanks for adding the test!


Say explicitly where this lives - testsuite/tests/utils/T14854.hs

watashi updated this revision to Diff 18471.Oct 26 2018, 12:50 PM

refer tests as full path

watashi marked an inline comment as done.Oct 26 2018, 12:51 PM
bgamari accepted this revision.Oct 28 2018, 11:39 AM

Let's merge this. Great work @watashi!

This revision was automatically updated to reflect the committed changes.