ByteCodeAsm: Fix AMP performance regression
AbandonedPublic

Authored by bgamari on Jul 24 2015, 5:19 PM.

Details

Summary

D924 proposed to change,

mapM_ = foldr ((>>) . f) (return ())

to,

mapM_ = foldr ((*>) . f) (return ()) = traverse_

This, however, blew up the runtime of
:load testsuite/tests/ghci/should_run/ghcirun004.hs in ghci from 4
seconds to 4 minutes.

With some thinking and luck I managed to localize the issue to a
particular mapM_ in ByteCodeAsm.assembleBCO, which is called at the
Assemble monad. Overriding the definition of (*>) for Assemble
brought the behavior back to the expected regime.

I'll need pen and paper to work out why this works.

Test Plan

Validate

bgamari updated this revision to Diff 3661.Jul 24 2015, 5:19 PM
bgamari retitled this revision from to ByteCodeAsm: Fix AMP performance regression.
bgamari updated this object.
bgamari edited the test plan for this revision. (Show Details)
bgamari added reviewers: simonpj, ekmett, hvr.

Nice catch, how did you notice this performance regression?

Oh I see, it regressed so much that the test driver killed the test due to timeout.

simonpj edited edge metadata.Jul 27 2015, 1:41 AM

Please do work out why it works -- or at least produce a small example? Maybe there are other less-visible examples of the same bug. And there's danger that someone else will just put it back in a few yrs time if it's not well documented.

Simon

bgamari planned changes to this revision.Jul 27 2015, 3:34 AM

Right, I actually don't really intend on merging this (at least not as-is); I just wanted something to point to to demonstrate the fix.

For the record, here is a minimal example showing the shape of the problem. On my machine doTestM can be evaluated in essentially no time, whereas doTestA requires 800ms. If one compares the Core produced by ghc -O the reason becomes obvious,

-- from mapM_ using (*>)
mapA_3 :: Assembler ()
mapA_3 = Main.Pure ()

mapA_2 :: Assembler (() -> ())
mapA_2 = Main.Pure id

lvl4_r3JE :: Integer -> Assembler (() -> ())
lvl4_r3JE = \ _ -> Main.mapA_2

doTestA_go :: [Assembler Integer] -> Assembler ()
doTestA_go =
  \ ds_a1XT ->
    case ds_a1XT of _ {
      [] -> mapA_3;
      : y_a1XY ys_a1XZ ->
        let {
          m2_a1W7
          m2_a1W7 = doTestA_go ys_a1XZ } in
        $c>>=
          ($c>>= y_a1XY lvl4_r3JE)
          (\ x1_a1W8 -> $c>>= m2_a1W7 (\ x2_a1W9 -> Pure (x1_a1W8 x2_a1W9)))
    }

doTestA :: Assembler ()
doTestA = doTestA_go test

-- from mapM_ using (>>)
doTestM_go :: [Assembler Integer] -> Assembler ()
doTestM_go =
  \ ds_a1XT ->
    case ds_a1XT of _ {
      [] -> mapA_3;
      : y_a1XY ys_a1XZ ->
        let {
          k_a1VW
          k_a1VW = doTestM_go ys_a1XZ } in
        $c>>= y_a1XY (\ _ -> k_a1VW)
    }

doTestM :: Assembler ()
doTestM = doTestM_go test

Note that the applicative version requires three binds, whereas monad requires only one. Now I'll try to work out why.

For the record, here is a minimal example showing the shape of the problem. On my machine doTestM can be evaluated in essentially no time, whereas doTestA requires 800ms. If one compares the Core produced by ghc -O the reason becomes obvious,

-- from mapM_ using (*>)
mapA_3 :: Assembler ()
mapA_3 = Main.Pure ()

mapA_2 :: Assembler (() -> ())
mapA_2 = Main.Pure id

lvl4_r3JE :: Integer -> Assembler (() -> ())
lvl4_r3JE = \ _ -> Main.mapA_2

doTestA_go :: [Assembler Integer] -> Assembler ()
doTestA_go =
  \ ds_a1XT ->
    case ds_a1XT of _ {
      [] -> mapA_3;
      : y_a1XY ys_a1XZ ->
        let {
          m2_a1W7
          m2_a1W7 = doTestA_go ys_a1XZ } in
        $c>>=
          ($c>>= y_a1XY lvl4_r3JE)
          (\ x1_a1W8 -> $c>>= m2_a1W7 (\ x2_a1W9 -> Pure (x1_a1W8 x2_a1W9)))
    }

doTestA :: Assembler ()
doTestA = doTestA_go test

-- from mapM_ using (>>)
doTestM_go :: [Assembler Integer] -> Assembler ()
doTestM_go =
  \ ds_a1XT ->
    case ds_a1XT of _ {
      [] -> mapA_3;
      : y_a1XY ys_a1XZ ->
        let {
          k_a1VW
          k_a1VW = doTestM_go ys_a1XZ } in
        $c>>= y_a1XY (\ _ -> k_a1VW)
    }

doTestM :: Assembler ()
doTestM = doTestM_go test

Note that the applicative version requires three binds, whereas monad requires only one. Now I'll try to work out why.

bgamari abandoned this revision.Aug 3 2015, 11:17 AM

Abandoning this as we won't be redefining mapM_.