Inline the default `enumFromTo` and `enumFromThenTo` methods in `Enum` typeclass.
Changes PlannedPublic

Authored by sighingnow on Jun 9 2018, 2:46 AM.

Details

Reviewers
hvr
bgamari
Trac Issues
#15185
Summary

We have already marked enumFromTo and enumFromThenTo methods for Int
and Word type. But for Int32, Word32, Int64 type, (as well as Int16,
Word16 and such things), these two methods will fall back to the default
implmentations in Enum typeclass, which are not marked as INLINE.

The lack of INLINE breaks stream fusion, leading to degeneration in
performance for Int32, Int64 and similar types. See Trac Trac #15185.
This patch fixs that.

For benchmark program in Trac #15185:

fact1 :: Integral t => t -> t
fact1 n = product [1..n]

fact2 :: Integral t => t -> t
fact2 n = go 1 n
  where go acc 1 = acc
        go acc n = go (acc * n) (n - 1)

Without INLINE, we have:

fact1 20/Int                              mean 19.27 ns  ( +- 437.5 ps  )
fact1 20/Word                             mean 19.85 ns  ( +- 350.5 ps  )
fact1 20/Int64                            mean 156.8 ns  ( +- 3.221 ns  )
fact1 20/Word64                           mean 836.9 ns  ( +- 15.94 ns  )

We can see a huge gap between the performance of Int and Int64, as well as
Word and Word64.

After marking these methods (especially the default implementations in
Enum class), we have:

fact1 20/Int                              mean 21.20 ns  ( +- 460.2 ps  )
fact1 20/Word                             mean 19.72 ns  ( +- 446.8 ps  )
fact1 20/Int64                            mean 22.35 ns  ( +- 531.1 ps  )
fact1 20/Word64                           mean 632.1 ns  ( +- 16.81 ns  )

Now the Int64 has the same performance with the Int case. The variance
introduced by outliers is a bit of inflated, but the result is enough to
demonstrate the problem and the performance improvement.

We still cannot optimize the Word64 case to have the similar measures
with Word. The enumFromTo for Word64 (and for Word32 on 32-bit platform)
is integralEnumFromTo, whose overhead is significant. However the INLINE
does make some improvement.

The enumFromTo method for Int16, Word16 and such things will also
benefited from the INLINE pragma.

Signed-off-by: HE, Tao <sighingnow@gmail.com>

Test Plan

[skip ci]

Diff Detail

Repository
rGHC Glasgow Haskell Compiler
Branch
enum-inline
Lint
Lint OK
Unit
No Unit Test Coverage
Build Status
Buildable 21155
Build 47494: arc lint + arc unit
sighingnow created this revision.Jun 9 2018, 2:46 AM

Thanks for doing this!

The lack of INLINE breaks stream fusion, leading to degeneration in performance for Int32, Int64 and similar types

Do you understand *why* the lack of INLINE breaks fusion. It's be really good to give a little example to illustrate the steps, and where they break. Otherwise it's too much like alchemy... "try it and see".

Also could you add this info in a Note? Otherwise someone in 5 years time might wonder what those INLINE pragmas are doing, and remove them. (Yes, if they are careful enough they could do a git-annotate, but it's not so clear.)

Thanks for doing this!

The lack of INLINE breaks stream fusion, leading to degeneration in performance for Int32, Int64 and similar types

Do you understand *why* the lack of INLINE breaks fusion. It's be really good to give a little example to illustrate the steps, and where they break. Otherwise it's too much like alchemy... "try it and see".

Also could you add this info in a Note? Otherwise someone in 5 years time might wonder what those INLINE pragmas are doing, and remove them. (Yes, if they are careful enough they could do a git-annotate, but it's not so clear.)

Expressing these concerns in tests would be great on top. Surely coming up with a perf test that allocates drastically more when fusion breaks shouldn't be difficult, right?

sighingnow planned changes to this revision.Jun 13 2018, 5:48 AM

I will look into how INLINE works here and add perf tests for this ticket.