Provide an optimized replicateM_ implementation #11795
ClosedPublic

Authored by snoyberg on Apr 6 2016, 7:58 AM.

Details

Summary

In my testing, the worker/wrapper transformation applied here
significantly decreases the number of allocations performed when using
replicateM_. Additionally, this version of the function behaves
correctly for negative numbers (namely, it will behave the same as
replicateM_ 0, which is what previous versions of base have done).

Diff Detail

Repository
rGHC Glasgow Haskell Compiler
Lint
Automatic diff as part of commit; lint not applicable.
Unit
Automatic diff as part of commit; unit tests not applicable.
snoyberg retitled this revision from to Provide an optimized replicateM_ implementation #11795.Apr 6 2016, 7:58 AM
snoyberg updated this object.
snoyberg edited the test plan for this revision. (Show Details)
snoyberg added a reviewer: bgamari.
snoyberg updated the Trac tickets for this revision.
snoyberg updated this revision to Diff 7196.Apr 6 2016, 8:24 AM
  • Add missing (-) import

Not wholly related but it seems like replicateM has the same problem with negative numbers?

I had serious tunnel vision when writing this patch. I don't mind doing a pass through the other functions in that module to see if similar optimizations and correctness tweaks could be applied, if the reviewers don't mind mixing that in here.

bgamari accepted this revision.Apr 6 2016, 12:30 PM

@snoyberg, good catch! It would be great if you could fix replicateM as well.

This revision is now accepted and ready to land.Apr 6 2016, 12:30 PM
snoyberg updated this revision to Diff 7198.Apr 6 2016, 12:55 PM
  • Add missing (-) import
  • Apply similar worker/wrapper and negative fixes to replicateM

No problem Ben, I've updated replicateM as well.

austin accepted this revision.Apr 6 2016, 10:56 PM
austin added a subscriber: simonpj.

Looks good. Generally I'd ask for a comment here, but for hackers the W/W transform here should be fairly obvious, I guess (independent of the upstream issue @simonpj mentioned, Trac #1168, which I think you're right, we're best not to confuse too much).

which is what previous versions of base have done

Then this change should also be merged to 8.0.1 as well (since it presumably regressed somewhat recently in HEAD-only?)

simonpj accepted this revision.Apr 7 2016, 8:01 AM
simonpj added a reviewer: simonpj.

As my notes above say, I would welcome a bit more explanatory help

libraries/base/Control/Monad.hs
214–220

I do think that a Note would help, shared between replicateM and repliacteM_, explaining why the definition looks like it does.

I *think* it may be that the local loop makes it possible to inline the entire definition (as hapens for foldr, for example) thereby specialising for particular arguments x. Is that in fact what happens, and is it the motivation for the change? If not, what is?

To be honest I still do not have a good understanding of why this code is fast and the previous code is slow. (E.g. the ticket has no side-by-side comparisons of the Core produced.) But just a clear explanation of why this is better would be fine. (Other than "it just works fast in 8.0")

bgamari added inline comments.Apr 7 2016, 8:07 AM
libraries/base/Control/Monad.hs
214–220

My understanding is that worker/wrapper here would allow us to unbox the counter accumulator. It's not clear to me that you gain anything from specialising a.

bgamari added inline comments.Apr 7 2016, 8:17 AM
libraries/base/Control/Monad.hs
214–220

Let's move this discussion to Trac Trac #11795 where it can be more easily found later.

bgamari added inline comments.Apr 7 2016, 8:36 AM
libraries/base/Control/Monad.hs
214–220

It's not clear to me that you gain anything from specialising a.

Ahh, but @simonpj said x not a. This would indeed be helpful so you could specialise of m. Excuse my mis-read.

snoyberg updated this revision to Diff 7216.Apr 8 2016, 3:10 AM
  • Add missing (-) import
  • Apply similar worker/wrapper and negative fixes to replicateM
  • Add a Note about worker/wrapper transform

I've updated the diff with a note, which borrows heavily from both @simonpj and @bgamari's comments.

nomeata added a subscriber: nomeata.Apr 8 2016, 4:11 AM
nomeata added inline comments.
libraries/base/Control/Monad.hs
194

What is written in this comment is not very specific to replicateM. The pattern of using a local recursive function happens all over the place.

Maybe it would make sense to make the note talk about this pattern in general, and then refer to it from here, and also from other instances. E.g. it could replace the comment in the code for foldr (in GHC.Base):

foldr            :: (a -> b -> b) -> b -> [a] -> b
-- foldr _ z []     =  z
-- foldr f z (x:xs) =  f x (foldr f z xs)
{-# INLINE [0] foldr #-}
-- Inline only in the final stage, after the foldr/cons rule has had a chance
-- Also note that we inline it when it has *two* parameters, which are the
-- ones we are keen about specialising!
foldr k z = go
          where
            go []     = z
            go (y:ys) = y `k` go ys
This revision was automatically updated to reflect the committed changes.