Always expose unfoldings for overloaded functions
Needs RevisionPublic

Authored by mpickering on Jan 6 2017, 7:03 AM.

Details

Summary

Users expect their overloaded functions to be specialised at call sites,
however, this is only the case if they are either lucky and GHC chooses to
include the unfolding or they mark their definition with an INLINABLE pragma.
This leads to library authors marking all their functions with INLINABLE (or
more accurately INLINE) so they ensure that downstream consumers pay no cost
for their abstraction.

A more sensible default is to do this job for the library author and give more
predictable guarantees about specialisation.

Empirically, I compiled a selection of 1150 packages with (a similar) patch applied. The total size of the interface files before the patch was 519mb and after 634mb. On modern machines, I think this increase is justified for the result.

mpickering retitled this revision from to Always expose unfoldings for overloaded functions..Jan 6 2017, 7:03 AM
mpickering edited the test plan for this revision. (Show Details)
mpickering added a reviewer: simonpj.
mpickering updated this object.
mpickering retitled this revision from Always expose unfoldings for overloaded functions. to Always expose unfoldings for overloaded functions.Jan 6 2017, 7:03 AM
mpickering updated this revision to Diff 10299.Jan 6 2017, 7:08 AM
  • missing import

Seems reasonable. But you say "I think this increase is justified for the result". Can you say what "the result" is. That is, does this change actually make things go faster?

Simon

Seems reasonable. But you say "I think this increase is justified for the result". Can you say what "the result" is. That is, does this change actually make things go faster?

Simon

I have no examples. My argument is something like

  1. The size of interface went up because more things were included.
  2. The inclusion of these functions will allow them to be specialised at call sites in other modules.
  3. Users expect functions to be specialised at call sites so this will make specialisation more predictable.
  4. The increase in the size of interface files is not too large.

Seems reasonable. But you say "I think this increase is justified for the result". Can you say what "the result" is. That is, does this change actually make things go faster?

Simon

I have no examples. My argument is something like

  1. The size of interface went up because more things were included.
  2. The inclusion of these functions will allow them to be specialised at call sites in other modules.
  3. Users expect functions to be specialised at call sites so this will make specialisation more predictable.
  4. The increase in the size of interface files is not too large.

What happens to compiler allocations in the performance testsuite?

It would be nice to have some nofib results. That being said, I suspect nofib would be see less of an improvement from this change than more "modern" Haskell code, which tends to make great use of overloading in performance-sensitive areas.

carter added a subscriber: carter.Jan 6 2017, 10:36 AM

How is this similar or different from specializable pragma? Does this replace needing that on newer ghc?

Additionally: one possible library that could be a stress test for this would be the vector-algorithms library, which I think currently does very very aggressive in lining to make sure specialization happens

In D2929#85268, @carter wrote:

How is this similar or different from specializable pragma? Does this replace needing that on newer ghc?

SPECIALISABLE creates a specialised version of the function at the site of the pragma which is then used at call sites so GHC only ever specialises once at that specific type. This exposes the unspecialised unfoldings so that if ever called, the GHC can use it to perform the specialisation as normal.

i was thinking of https://ghc.haskell.org/trac/ghc/ticket/12463, which seem to have not been implemented ever, or something like it

In D2929#85268, @carter wrote:

How is this similar or different from specializable pragma? Does this replace needing that on newer ghc?

I think this comment answers the question:

https://ghc.haskell.org/trac/ghc/ticket/12963#comment:6

In simplification, exposing unfoldings doesn't force specialization, while the proposed SPECIALISABLE, SPECIALISABLE_RECURSIVE, SPECIALISABLE_ALL (and INLINABLE and INLINE) does (and differs from SPECIALIZE in that it doesn't pertain only to a single type).

Maybe have a flag to control this, and switch it on with -O2?

Check out Specialise.lhs

Note [Specialise imported INLINABLE things]
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
What imported functions do we specialise?  The basic set is
 * DFuns and things with INLINABLE pragmas.
but with -fspecialise-aggressively we add
 * Anything with an unfolding template

Trac #8874 has a good example of why we want to auto-specialise DFuns.

We have the -fspecialise-aggressively flag (usually off), because we
risk lots of orphan modules from over-vigorous specialisation.
However it's not a big deal: anything non-recursive with an
unfolding-template will probably have been inlined already.

So these extra exposed unfoldings will only take effect if you have -fspecialise-aggressively on as well. (In fact it might be sensible to have one flag to control both exposing and exploiting overloaded unfoldings.)

So I think that you are currently increasing the size of interface files to no benefit whatsoever! That's partly why I'd like to see performance gains actually measured before doing this.

I vaguely recall that -fspecialise-aggressively made binaries a lot bigger before I got distracted with something else. By all means try it out!

Oh, and it'll make a lot more orphan modules (with RULES for specialised functions) which might also have an effect on compile time.

PS: worth making a ticket for this to track progress

To be clear about this Simon. The note says "What imported functions do we specialise?", it doesn't talk at all about which unfoldings are included in the interface files to begin with from my reading?

This lines up with my intuition as if we already included all usefully specialisable things in interface files then adding the additional clause to show_unfolding shouldn't have any effect?

mpickering updated the Trac tickets for this revision.Jan 9 2017, 3:31 AM

To be clear about this Simon. The note says "What imported functions do we specialise?", it doesn't talk at all about which unfoldings are included in the interface files to begin with from my reading?

That's true. So the patch will indeed expose more unfoldings. But without additionally specifying fspecialise-aggressively those unfoldings will no effect.

This lines up with my intuition as if we already included all usefully specialisable things in interface files then adding the additional clause to show_unfolding shouldn't have any effect?

I'm sorry, but I don't understand the question.

I understand why you are confused now Simon. I was under the impression that if an unfolding was in an interface file that it would be specialised in other modules. In fact, this isn't the case, only things which are either DFunIds or marked INLINABLE are able to be specialised.

The motivation for this ticket is that if you go to any popular library, ALL definitions are marked as INLINE. @danharaj has also experimented with a plugin which marks every single identifier as INLINABLE and using -fspecialise-aggresively. It is something that is appearing downstream more and more.

OK, so you understand now?

Currently you need both (a) to expose the unfolding and (b) have -fspecialise-aggressively for downsteam specialisations to always happen.

I recall someone saying that -fspecialise-aggressively cost a lot of compile time but I have no clear picture of why, or how big a problem it is.

I did some testing yesterday after some prompting by @rwbarton. I modified the code to essentially always turn on -fspecialise-aggresively, it is on the wip/all-inlinable branch.

Here is a harbourmaster build with the changes, three tests significantly improved and one regressed.
https://phabricator.haskell.org/harbormaster/build/17926/?l=100

Here is a nofib run
https://perf.haskell.org/ghc/#compare/6c869f906b879bc746ea1aa3e79e02f146d85093/9dec87ea33e7b398db7d86a30756d1d655895e8b

Compile time of GHC increased by 14%.

The size of libGHC.a increased from 118mb to 145mb.

In any case, I think that it could be worthwhile to add these options to -O3 so that users can try them out easily if they really want to?

mpickering updated this revision to Diff 10371.Jan 10 2017, 4:18 AM
  • Unused identifiers

I did some testing yesterday after some prompting by @rwbarton. I modified the code to essentially always turn on -fspecialise-aggresively, it is on the wip/all-inlinable branch.

Here is a harbourmaster build with the changes, three tests significantly improved and one regressed.
https://phabricator.haskell.org/harbormaster/build/17926/?l=100

Here is a nofib run
https://perf.haskell.org/ghc/#compare/6c869f906b879bc746ea1aa3e79e02f146d85093/9dec87ea33e7b398db7d86a30756d1d655895e8b

Compile time of GHC increased by 14%.

The size of libGHC.a increased from 118mb to 145mb.

In any case, I think that it could be worthwhile to add these options to -O3 so that users can try them out easily if they really want to?

This sounds quite reasonable to me given we currently don't even have a -O3.

bgamari requested changes to this revision.Feb 14 2017, 4:27 PM

Bumping out of the review queue for now.

This revision now requires changes to proceed.Feb 14 2017, 4:27 PM
austin resigned from this revision.Nov 9 2017, 11:32 AM