rts: Specialize hashing at call site rather than in struct.
AbandonedPublic

Authored by Crazycolorz5 on Jun 24 2018, 1:05 AM.

Details

Reviewers
bgamari
erikd
simonmar
Trac Issues
#13165
Summary

Separate word and string hash tables on the type level, and do not store the hashing function.
Thus when a different hash function is desire it is provided upon accessing the table. This is worst case the same as
before the change, and in the majority of cases is better. Also mark the functions for aggressive inlining to improve performance.

Diff Detail

Repository
rGHC Glasgow Haskell Compiler
Branch
HashtableImprovement
Lint
Lint WarningsExcuse: Function prototypes.
SeverityLocationCodeMessage
Warningrts/Hash.c:56TXT3Line Too Long
Warningrts/Hash.c:152TXT3Line Too Long
Warningrts/Hash.c:174TXT3Line Too Long
Warningrts/Hash.c:188TXT3Line Too Long
Warningrts/Hash.c:259TXT3Line Too Long
Warningrts/Hash.c:287TXT3Line Too Long
Warningrts/Hash.c:305TXT3Line Too Long
Warningrts/Hash.c:336TXT3Line Too Long
Warningrts/Hash.c:350TXT3Line Too Long
Warningrts/Hash.h:49TXT3Line Too Long
Warningrts/Hash.h:51TXT3Line Too Long
Warningrts/Hash.h:87ExtJsonLint
Warningrts/Hash.h:104TXT3Line Too Long
Warningrts/Hash.h:105TXT3Line Too Long
Warningrts/Hash.h:106TXT3Line Too Long
Unit
No Unit Test Coverage
Build Status
Buildable 21478
Build 48768: [GHC] Linux/amd64: Continuous Integration
Build 48767: [GHC] OSX/amd64: Continuous Integration
Build 48766: [GHC] Windows/amd64: Continuous Integration
Build 48765: arc lint + arc unit
Crazycolorz5 created this revision.Jun 24 2018, 1:05 AM

Seems like a sensible thing to do.

Have you characterised how much of an effect this has on real hashing performance?

rts/Hash.h
14

Perhaps we should made these nominally distinct types (e.g. typedef struct { struct hashtable table; } StrHashTable;) to ensure they aren't accidentally swapped?

  • Removed inline in header due to reference of implementation of HashTable.

Seems like a sensible thing to do.

Have you characterised how much of an effect this has on real hashing performance?

I'm not sure how much of a difference it makes for the string hash table, nor am I sure if it has negative consequences on other, non-pointer hash tables (e.g. those for file handles). However, I did test it against the current public GHC build for compacting. See the attached file.

Also, I'm not able to get to it at the moment, but I'll try to address some of those lint errors.
Any convention for the function prototypes, though? Seems like splitting them up would just be a bit awkward.

rts/Hash.h
14

Check out Hash.c; they are different. In fact, that is how strhashtable is defined. I kept it in this style as that's how the original hashtable was defined.

simonmar requested changes to this revision.Jun 25 2018, 3:26 AM

Seems like a sensible idea. What is the effect on code size?

Some nits:

  • Please update the diff title to say what the change is.
  • Please add the benchmark results (or a link to them) to the summary
  • Please fix the long lines (we have an 80-column policy in the RTS)
This revision now requires changes to proceed.Jun 25 2018, 3:26 AM

Seems like a sensible idea. What is the effect on code size?

Some nits:

  • Please update the diff title to say what the change is.
  • Please add the benchmark results (or a link to them) to the summary
  • Please fix the long lines (we have an 80-column policy in the RTS)

This may be affected by other changes since the most recently publically released GHC, but here is the effect compiling with -O2 and -dynamic:
-rwxr-xr-x 1 crazycolorz5 users 18216 Jun 28 21:07 original_ghc
-rwxr-xr-x 1 crazycolorz5 users 18656 Jun 28 21:07 test
As you can see, there's about a 400 byte increase. I'm still unsure how much other compilation options make an effect.

Also, do you have any suggestions on what the norm is for prototypes that are over 80 columns long?

Crazycolorz5 retitled this revision from rts: address trac issue #13165. to rts: address trac issue #13165; Specialize hashing at call site rather than in struct..Jun 28 2018, 8:22 PM
Crazycolorz5 edited the summary of this revision. (Show Details)
  • Lint fix changes.

I'm unsure why the tests are failing. The logs seem to indicate library loading issues, but I haven't touched anything in the process of compilation to Core?

I'm unsure why the tests are failing. The logs seem to indicate library loading issues, but I haven't touched anything in the process of compilation to Core?

I would guess that this patch is subtly wrong. Afterall, the linker uses the RTS's hashtable implementation.

rts/Hash.h
14

Check out Hash.c; they are different.

Whoops, so they are.

  • Fixed expand always using hashWord.
Crazycolorz5 updated this revision to Diff 17163.EditedJul 2 2018, 11:41 PM
  • Fixed expand always using hashWord.

(Previous diff missed a change)

-rwxr-xr-x 1 crazycolorz5 users 18216 Jun 28 21:07 original_ghc
-rwxr-xr-x 1 crazycolorz5 users 18656 Jun 28 21:07 test
As you can see, there's about a 400 byte increase. I'm still unsure how much other compilation options make an effect.

What were you trying to measure here? This is a dynamically linked binary and so doesn't include the code of the RTS itself.

What were you trying to measure here? This is a dynamically linked binary and so doesn't include the code of the RTS itself.

Ah, I wasn't aware that the RTS could be dynamically linked.
When I compile it statically, the executable this GHC gives me is about 11x the size of the current GHC's, which I find highly dubious. I'll see if I can build it before these diffs and after and looking for a change in size.

Sorry for the delay. Aside from additional benchmarks (which I'll upload soon, after ghc builds), is there any additional work needed on this?

Crazycolorz5 edited the summary of this revision. (Show Details)Jul 13 2018, 8:53 PM

Just the benchmarks. Could you run the benchmarks in nofib please? See https://ghc.haskell.org/trac/ghc/wiki/Building/RunningNoFib

Just the benchmarks. Could you run the benchmarks in nofib please? See https://ghc.haskell.org/trac/ghc/wiki/Building/RunningNoFib

Took me a while to set up. Not sure if this is the correct thing. Let me know.

Ahh, good work finding the bug.

Just the benchmarks. Could you run the benchmarks in nofib please? See https://ghc.haskell.org/trac/ghc/wiki/Building/RunningNoFib

Took me a while to set up. Not sure if this is the correct thing. Let me know.

I suspect @simonmar was suggesting to run a nofib comparison; that is, run nofib before and after your patch and compare the two with nofib-analyse. Absolute nofib figures are rarely useful.

That being said, we have perf.haskell.org to do this for us. I have pushed this to wip/D4889 so it can work its magic. The results will eventually be found at https://perf.haskell.org/ghc/.

bgamari retitled this revision from rts: address trac issue #13165; Specialize hashing at call site rather than in struct. to rts: Specialize hashing at call site rather than in struct..Aug 2 2018, 4:33 PM

That being said, we have perf.haskell.org to do this for us. I have pushed this to wip/D4889 so it can work its magic. The results will eventually be found at https://perf.haskell.org/ghc/.

It looks like I neglected to actually apply the patch before pushing. I just pushed the patch to the above branch. Hopefully we'll have results in a few hours.