Fix dataToTag# argument evaluation
ClosedPublic

Authored by osa1 on Oct 4 2018, 2:46 AM.

Details

Summary

See Trac #15696 for more details. We now always enter dataToTag# argument (done in
generated Cmm, in StgCmmExpr). Any high-level optimisations on dataToTag#
applications are done by the simplifier. Looking at tag bits (instead of
reading the info table) for small types is left to another diff.

Incorrect test T14626 is removed. We no longer do this optimisation (see
comment:44, comment:45, comment:60).

Comments and notes about special cases around dataToTag# are removed. We no
longer have any special cases around it in Core.

Other changes related to evaluating primops (seq# and dataToTag#) will be
pursued in follow-up diffs.

Test Plan

Validates with three regression tests

Diff Detail

Repository
rGHC Glasgow Haskell Compiler
Lint
Automatic diff as part of commit; lint not applicable.
Unit
Automatic diff as part of commit; unit tests not applicable.
osa1 created this revision.Oct 4 2018, 2:46 AM
dfeuer added a subscriber: dfeuer.Oct 4 2018, 3:11 AM
dfeuer added inline comments.
compiler/codeGen/StgCmmExpr.hs
75

If the constructor is already known, we should surely be able to catch that in PrelRules, no? That particular situation seems much better to handle in core2core if we can.

83

Exactly. What I meant by "type-driven rewriting" is that we can use either a different primop or a primop taking an extra argument representing family size. We can rewrite that in core2core. Alternatively, we can make the determination in Core Prep or when lowering to STG. Any later looks dicy.

dfeuer added inline comments.Oct 4 2018, 3:16 AM
compiler/codeGen/StgCmmExpr.hs
83

Like I said before, I think either:

  1. Add a dataToTagSmall# primop and rewrite to that between the end of core2core and lowering to STG, or
  2. Replace the dataToTag# primop with one that takes an extra argument indicating whether the family is small.
osa1 added inline comments.Oct 4 2018, 5:19 AM
compiler/codeGen/StgCmmExpr.hs
75

Explained in https://ghc.haskell.org/trac/ghc/ticket/15696#comment:44. TLDR: Yes, we should be able to do this in Core, but somehow we don't.

We can remove this case but let's track this optimisation in another ticket.

83

Right, two primops for small and large families would work.

osa1 added inline comments.Oct 4 2018, 5:30 AM
compiler/codeGen/StgCmmExpr.hs
83

Actually, we don't need two primops. We can pass whatever information we want here in CoreToStg. That's simpler, no need to do any rewriting, no need for new primops, no worries about using the wrong primop etc.

osa1 updated this revision to Diff 18229.Oct 5 2018, 1:16 AM
  • Update
osa1 updated this revision to Diff 18230.Oct 5 2018, 1:24 AM
  • Reword
osa1 retitled this revision from Simplify dataToTag# and getTag handling to Fix dataToTag# argument evaluation.Oct 5 2018, 1:31 AM
osa1 edited the summary of this revision. (Show Details)
osa1 edited the test plan for this revision. (Show Details)
osa1 added a reviewer: dfeuer.Oct 5 2018, 1:49 AM
osa1 removed a subscriber: dfeuer.
dfeuer added a comment.Oct 5 2018, 1:57 AM

The small type optimization is low priority. I realized we never use dataToTag# for small types in Ord deriving anyway!

osa1 updated this revision to Diff 18235.Oct 5 2018, 11:04 AM
  • Remove more comments
  • Revert can_fail, put getTag argument back

I'd be really interested to see what effect this has on performance and code size too.

compiler/codeGen/StgCmmExpr.hs
70–76

This looks like it would do the right thing, but there are probably assumptions elsewhere that primops don't do any evaluation. e.g. I think isSimpleScrut will be wrong for dataToTag# and cgCase will make the wrong decision about whether to emit a heap check. Whether this leads to a crash or not I'm not sure, but it seems likely.

osa1 added a comment.Oct 7 2018, 3:20 AM

I'd be really interested to see what effect this has on performance and code size too.

Why do you think there will be any difference? We already generate a case around the argument to evaluate it, but we used to do it in Core and sometimes avoid it (incorrectly). Now we do it in Cmm. So the code should really be the same.

I can still test of course, but I'm just curious why you think this makes a difference in code size or performance.

compiler/codeGen/StgCmmExpr.hs
70–76

Ah, you're right! If I add a special case in isSimpleOp

isSimpleOp (StgPrimOp DataToTagOp) _ = return False

this validates!

What's the best way to deal with this? I thought what we do is very similar to seq# so checked the code for it. I see a special case in cgCase for seq#, but it doesn't look like a good place to add a special case for dataToTag# because we'd have to duplicate code for the general case in cgCase (with some simplifications).

What do you think about adding a special case in isSimpleOp for DataToTagOp? (the code shown above)

In D5201#143481, @osa1 wrote:

I'd be really interested to see what effect this has on performance and code size too.

Why do you think there will be any difference? We already generate a case around the argument to evaluate it, but we used to do it in Core and sometimes avoid it (incorrectly). Now we do it in Cmm. So the code should really be the same.

I can still test of course, but I'm just curious why you think this makes a difference in code size or performance.

The old code was omitting the eval in more cases. Some of those cases were wrong, yes, but some of them were OK - e.g. the example that @simonpj mentioned on the ticket where we have case x of y { _ -> case dataToTag# y of { ... }}. Measuring the difference would give us some indication of how much there might be to gain by further optimisations.

osa1 updated this revision to Diff 18245.Oct 7 2018, 3:33 AM
  • Update note again
  • Fix dataToTag# code gen
osa1 added a comment.Oct 7 2018, 6:13 AM

Nofib results:

--------------------------------------------------------------------------------
        Program           Size    Allocs   Runtime   Elapsed  TotalMem
--------------------------------------------------------------------------------
             CS           0.0%      0.0%     0.092     0.092      0.0%
            CSD           0.0%      0.0%     +1.1%     +1.2%      0.0%
             FS           0.0%      0.0%     0.189     0.189      0.0%
              S           0.0%      0.0%     +0.2%     +0.3%      0.0%
             VS           0.0%      0.0%     0.182     0.182      0.0%
            VSD           0.0%      0.0%     0.005     0.005      0.0%
            VSM           0.0%      0.0%     -0.7%     -0.5%      0.0%
           anna           0.0%      0.0%     0.039     0.039      0.0%
           ansi           0.0%      0.0%     0.000     0.000      0.0%
           atom           0.0%      0.0%     0.096     0.096      0.0%
         awards           0.0%      0.0%     0.000     0.000      0.0%
         banner           0.0%      0.0%     0.000     0.000      0.0%
     bernouilli           0.0%      0.0%     0.064     0.064      0.0%
   binary-trees           0.0%      0.0%     -0.8%     -0.8%      0.0%
          boyer           0.0%      0.0%     0.015     0.015      0.0%
         boyer2           0.0%      0.0%     0.005     0.005      0.0%
           bspt           0.0%      0.0%     0.003     0.003      0.0%
      cacheprof          +0.0%     +0.0%     0.128     0.129     -7.0%
       calendar           0.0%      0.0%     0.000     0.000      0.0%
       cichelli           0.0%      0.0%     0.028     0.028      0.0%
        circsim           0.0%      0.0%     -0.2%     -0.3%      0.0%
       clausify           0.0%      0.0%     0.013     0.013      0.0%
  comp_lab_zift           0.0%      0.0%     0.065     0.065      0.0%
       compress           0.0%      0.0%     0.053     0.054      0.0%
      compress2           0.0%      0.0%     0.046     0.047      0.0%
    constraints           0.0%      0.0%     +0.1%     +0.1%      0.0%
   cryptarithm1           0.0%      0.0%      0.0%     -0.2%      0.0%
   cryptarithm2           0.0%      0.0%     0.003     0.003      0.0%
            cse           0.0%      0.0%     0.000     0.000      0.0%
   digits-of-e1           0.0%      0.0%     -0.5%     -0.6%      0.0%
   digits-of-e2           0.0%      0.0%     +0.2%     +0.2%      0.0%
          eliza           0.0%      0.0%     0.000     0.000      0.0%
          event           0.0%      0.0%     0.051     0.051      0.0%
    exact-reals           0.0%      0.0%     -2.8%     -2.8%      0.0%
         exp3_8           0.0%      0.0%     0.089     0.089      0.0%
         expert           0.0%      0.0%     0.000     0.000      0.0%
 fannkuch-redux           0.0%      0.0%     -0.8%     -0.8%      0.0%
          fasta           0.0%      0.0%     -1.2%     -1.2%      0.0%
            fem           0.0%      0.0%     0.008     0.008      0.0%
            fft           0.0%      0.0%     0.013     0.013      0.0%
           fft2           0.0%      0.0%     0.016     0.016      0.0%
       fibheaps           0.0%      0.0%     0.009     0.009      0.0%
           fish           0.0%      0.0%     0.004     0.004      0.0%
          fluid           0.0%      0.0%     0.004     0.004      0.0%
         fulsom           0.0%      0.0%     0.099     0.099      0.0%
         gamteb           0.0%      0.0%     0.012     0.012      0.0%
            gcd           0.0%      0.0%     0.015     0.015      0.0%
    gen_regexps           0.0%      0.0%     0.000     0.000      0.0%
         genfft           0.0%      0.0%     0.011     0.011      0.0%
             gg           0.0%      0.0%     0.004     0.004      0.0%
           grep           0.0%      0.0%     0.000     0.000      0.0%
         hidden           0.0%      0.0%     0.141     0.141      0.0%
            hpg           0.0%      0.0%     0.059     0.059      0.0%
            ida           0.0%      0.0%     0.033     0.033      0.0%
          infer           0.0%      0.0%     0.019     0.019      0.0%
        integer           0.0%      0.0%     -1.4%     -1.4%      0.0%
      integrate           0.0%      0.0%     0.041     0.041      0.0%
   k-nucleotide           0.0%     +0.0%     +2.0%     +2.0%      0.0%
          kahan           0.0%      0.0%     0.133     0.133      0.0%
        knights           0.0%      0.0%     0.002     0.002      0.0%
         lambda           0.0%      0.0%     +0.9%     +0.9%      0.0%
     last-piece           0.0%      0.0%     +0.5%     +0.5%      0.0%
           lcss           0.0%      0.0%     0.152     0.152      0.0%
           life           0.0%      0.0%     0.085     0.086      0.0%
           lift           0.0%      0.0%     0.000     0.000      0.0%
         linear           0.0%      0.0%     +1.1%     +1.1%      0.0%
      listcompr           0.0%      0.0%     0.033     0.033      0.0%
       listcopy           0.0%      0.0%     0.036     0.036      0.0%
       maillist           0.0%      0.0%     0.021     0.021     +3.5%
         mandel           0.0%      0.0%     0.020     0.020      0.0%
        mandel2           0.0%      0.0%     0.001     0.002      0.0%
           mate           0.0%      0.0%     +0.2%     +0.2%      0.0%
        minimax           0.0%      0.0%     0.001     0.001      0.0%
        mkhprog           0.0%      0.0%     0.001     0.001      0.0%
     multiplier           0.0%      0.0%     0.035     0.035      0.0%
         n-body           0.0%      0.0%     -0.5%     -0.5%      0.0%
       nucleic2           0.0%      0.0%     0.022     0.022      0.0%
           para           0.0%      0.0%     0.102     0.102      0.0%
      paraffins           0.0%      0.0%     0.046     0.046      0.0%
         parser           0.0%      0.0%     0.009     0.009      0.0%
        parstof           0.0%      0.0%     0.002     0.002      0.0%
            pic           0.0%      0.0%     0.002     0.002      0.0%
       pidigits           0.0%      0.0%     -0.5%     -0.6%      0.0%
          power           0.0%      0.0%     0.126     0.126      0.0%
         pretty           0.0%      0.0%     0.000     0.000      0.0%
         primes           0.0%      0.0%     0.031     0.031      0.0%
      primetest           0.0%      0.0%     0.038     0.038      0.0%
         prolog           0.0%      0.0%     0.001     0.001      0.0%
         puzzle           0.0%      0.0%     0.046     0.046      0.0%
         queens           0.0%      0.0%     0.005     0.005      0.0%
        reptile           0.0%      0.0%     0.004     0.004      0.0%
reverse-complem           0.0%      0.0%     0.049     0.049      0.0%
        rewrite           0.0%      0.0%     0.005     0.005      0.0%
           rfib           0.0%      0.0%     0.004     0.004      0.0%
            rsa           0.0%      0.0%     0.009     0.009      0.0%
            scc           0.0%      0.0%     0.000     0.000      0.0%
          sched           0.0%      0.0%     0.008     0.008      0.0%
            scs           0.0%      0.0%     0.144     0.144      0.0%
         simple           0.0%      0.0%     0.063     0.063      0.0%
          solid           0.0%      0.0%     0.045     0.045      0.0%
        sorting           0.0%      0.0%     0.000     0.000      0.0%
  spectral-norm           0.0%      0.0%     +3.1%     +3.1%      0.0%
         sphere           0.0%      0.0%     0.015     0.015      0.0%
         symalg           0.0%      0.0%     0.003     0.003      0.0%
            tak           0.0%      0.0%     0.004     0.004      0.0%
      transform           0.0%      0.0%     0.123     0.123      0.0%
       treejoin           0.0%      0.0%     0.047     0.047      0.0%
      typecheck           0.0%      0.0%     0.095     0.095      0.0%
        veritas          +0.0%      0.0%     0.000     0.000      0.0%
           wang           0.0%      0.0%     0.034     0.034      0.0%
      wave4main           0.0%      0.0%     0.083     0.083      0.0%
   wheel-sieve1           0.0%      0.0%     0.184     0.184      0.0%
   wheel-sieve2           0.0%      0.0%     0.076     0.076      0.0%
           x2n1           0.0%      0.0%     0.001     0.001      0.0%
--------------------------------------------------------------------------------
            Min           0.0%      0.0%     -2.8%     -2.8%     -7.0%
            Max          +0.0%     +0.0%     +3.1%     +3.1%     +3.5%
 Geometric Mean          -0.0%     -0.0%     -0.0%     -0.0%     -0.0%

Note that -7.0% TotalMem is in cacheprof, which is known to be non-deterministic.

I checked maillist to see what changed. Cores and STGs are identical as the program doesn't have any dataToTag#s, so the difference is just noise. (one thing I realized in Cmm is that order of SRT entries are different between two versions, probably a non-determinism during compilation)

Code sizes don't change -- 0.0% in all programs.

So I think we don't introduce any regressions.

@simonmar anything else you want me to check?

osa1 updated this revision to Diff 18246.Oct 7 2018, 6:38 AM
  • Rebase

Looks good. Was that with the libraries recompiled too?

I imagine we won't see any difference with this set of benchmarks. If there's any difference at all it would probably only show up in an Ord-heavy workload.

osa1 added a comment.Oct 7 2018, 3:06 PM

Was that with the libraries recompiled too?

Right. I have two trees, with and without the patch. Both are built after a distclean and git clean -xfd.

I'll try comparing containers benchmarks. @dfeuer do you know any other libraries/programs that use dataToTag# that we can use to compare binary sizes?

simonmar accepted this revision.Oct 8 2018, 2:07 AM
simonmar added inline comments.
compiler/prelude/primops.txt.pp
3007–3011

"don't want to float it out, because that would cause the argument to be evaluated more eagerly"

This revision is now accepted and ready to land.Oct 8 2018, 2:07 AM

Lots of suggestions.

And we really need an overview Note like this

Note [dataToTag#]
~~~~~~~~~~~~~~~~~
The primop dataToTag# is unusual because it evaluates its argument.
Only `SeqOp` shares that property.  (Other primops do not do anything
as fancy as argument evaluation.)  The special handling for dataToTag#
is:

* There is a special case for DataToTagOp in StgCmmExpr.cgExpr,
  that evaluates its argument and then extracts the tag from
  the returned value.

* An application like (dataToTag# (Just x)) is optimised by
  dataToTagRule in PrelRules

* A case expression like
     case (dataToTag# e) of <alts>
  gets transformed t
     case e of <transformed alts>
  by PrelRules.caseRules; see Note [caseRules for dataToTag]

We may need more bullets when we've decided about ok-for-speculation etc.

compiler/codeGen/StgCmmExpr.hs
74

This is remarkably nice code!

Can I check, though? I think (but I am not certain) that cgIdApp will not actually enter a if its tag-bits say that it is already evaluated. (Like any normal case-expression.) Correct? It might be worth showing the Cmm that you expect to get, in a comment

563

See Note ...

compiler/coreSyn/CorePrep.hs
1614–1615

Are we actually making use of this optimisation in the code generator? If so where?

I think it is INCORRECT to do so, for the reasons we have uncovered in Trac #15696. That is, it's not enough for Core to think it's evaluated.

The only case where it IS ok is where we have a lexcially enclosing case like the example above; and that is something the code generator can track all by itself without help from the evaluated-ness flag.

So I think we should *stop* preserving the evluatedness flag in CorePrep.

But we should make a separate ticket for this.

compiler/coreSyn/CoreUtils.hs
1697

Hang on! dataToTag# x is *not* ok-for-speculation, unless x is; dataToTag# now evaluates its argument!!

So I think we need a special case for DataToTagOp just like that for SeqOp in CoreUtils.app_ok.

In fact I think we want applications of DataToTagOp to be NOT ok-for-speculation, even if the argument is evaluated. Reason: as we've seen, we need something stronger than "evaluated". And suppose we have this

\z. case x of y -> let v = dataToTag# y in ...

If we allow that let to stand (because dataToTag# y can't diverge), then after the binder-swap in SetLevels we'd get

\z. case x of y -> let v = dataToTag# x in ...

which does not satisfy the let/app invariant. Exactly the same is true for SeqOp.

Solutions (a) avoid binder-swapping in the lifted arguments of dataToTag and SeqOp, or (b) declare that dataToTag and SeqOp are never ok-for-speculation.

I incline towards (b) because it is simpler.

compiler/prelude/PrelRules.hs
1033

You are deleting this comment. Why? Isn't it still true? Or do we optimise (tagToEnum# (dataToTag# x)) to x? If so how?

compiler/prelude/primops.txt.pp
3007–3011

I don't think this is right. Consider this example (from the comment you deleted, although it remains as relevant as ever):

\z. case x of y -> let v = dataToTag# y in ...

The concern is apparently that the v-binding might float out.

But *exactly the same* would apply to

\z. case x of y -> let v =case y of I# y2 -> y2 in ...

So it would be suspicious to treat dataToTag# differently.

In fact the exprOkForSpeculation fix in CoreUtils will ensure that the code looks like

\z. case x of y -> case dataToTag# y of r -> ...

so all is well.

osa1 added inline comments.Oct 8 2018, 7:04 AM
compiler/codeGen/StgCmmExpr.hs
74

cgIdApp will not actually enter a if its tag-bits say that it is already evaluated

Exactly. Here's an example:

    // dataToTag#
    I64[Sp - 24] = c3si;   // CmmStore
    R1 = Main.main_a_closure;   // CmmAssign
    Sp = Sp - 24;   // CmmAssign
    if (R1 & 7 != 0) goto c3si; else goto c3sj;   // CmmCondBranch
c3sj: // global
    call (I64[R1])(R1) returns to c3si, args: 8, res: 8, upd: 24;   // CmmCall
c3si: // global
    _s3rj::I64 = %MO_UU_Conv_W32_W64(I32[I64[R1 & (-8)] - 4]);   // CmmAssign

I'll add a comment.

compiler/coreSyn/CorePrep.hs
1614–1615

Are we actually making use of this optimisation in the code generator? If so where?

I don't know how to answer this question. exprIsHNF is used in dozens of places in the simplifier (in Simplifier, CorePrep, WorkWrap, FloatIn ...). Do we need to review all use sites? It seems to me that using exprIsHNF (which uses evaldUnfolding) is mostly just changes when we evaluate an object, which does not cause segfaults or correctness issues (except when we do low-level stuff as in the case of dataToTag#, which we're fixing).

I'll file a ticket for this.

compiler/coreSyn/CoreUtils.hs
1697

Agreed -- I misunderstood the concern with floating out / speculation. I'll update.

I'm happy to do (b) too if you think that'd be better, but I'll need some pointers. However perhaps for the time being we should focus on getting this right (rather than fast/efficient).

compiler/prelude/PrelRules.hs
1033

I realized that this case is already somehow optimised so deleted the comment. Example:

{-# LANGUAGE MagicHash #-}

import GHC.Exts
import GHC.Prim

data T = T1 | T2
  deriving (Show)

main :: IO ()
main = print (tagToEnum# (dataToTag# a) :: T)
  where
    {-# NOINLINE f #-}
    f = T2
    {-# NOINLINE a #-}
    a = f

Compile with -O with GHC 8.6 and you'll get a Core like this:

-- RHS size: {terms: 1, types: 0, coercions: 0, joins: 0/0}
main_f
main_f = T2

-- RHS size: {terms: 1, types: 0, coercions: 0, joins: 0/0}
main_a
main_a = main_f

-- RHS size: {terms: 6, types: 1, coercions: 0, joins: 0/0}
main1
main1
  = case main_a of {
      T1 -> $fShowT4;
      T2 -> $fShowT2
    }

You get the same code with this patch.

I looked at rule firings and simplifier iterations. I see that dataToTag# rule fires once. At some point we have this core:

-- RHS size: {terms: 1, types: 0, coercions: 0, joins: 0/0}
f_a1ey
f_a1ey = T2

-- RHS size: {terms: 1, types: 0, coercions: 0, joins: 0/0}
a_a1ez
a_a1ez = f_a1ey

-- RHS size: {terms: 30, types: 24, coercions: 0, joins: 0/3}
main_s21k
main_s21k
  = case a_a1ez of lwild_s3pU {
      __DEFAULT ->
        case dataToTag# lwild_s3pU of wild_Xz { __DEFAULT ->
        let {
          lwild_s3pT
          lwild_s3pT = wild_Xz } in
        build (\ @ b_X21H c_X21J n_X21L -> foldr c_X21J n_X21L lvl_s3on)
        };
      T2 ->
        let {
          wild_Xz
          wild_Xz = 1# } in
        let {
          lwild_s3pT
          lwild_s3pT = wild_Xz } in
        build (\ @ b_X21H c_X21J n_X21L -> foldr c_X21J n_X21L lvl_s3or)
    }

in the next iteration this becomes:

-- RHS size: {terms: 20, types: 19, coercions: 0, joins: 0/0}
main_s21k
main_s21k
  = case a_a1ez of lwild_s3pU {
      T1 ->
        build (\ @ b_X21H c_X21J n_X21L -> foldr c_X21J n_X21L lvl_s3on);
      T2 ->
        build (\ @ b_X21H c_X21J n_X21L -> foldr c_X21J n_X21L lvl_s3or)
    }
compiler/prelude/primops.txt.pp
3007–3011

Hmm, I now realize that I misunderstood the concern here. I agree that we still don't want to float out bindings to dataToTag#. I'll revert this change, but make an update the original comment to make the example (hopefully) more clear.

simonpj added inline comments.Oct 8 2018, 7:29 AM
compiler/coreSyn/CorePrep.hs
1614–1615

I really did mean "in the code generator", not in coreprep, workwrap etc. The comment says:

* In the code generator if we have
     case x of y { Red -> e1; DEFAULT -> y }
  we can return 'y' rather than entering it, if we know
  it is evaluated (Trac #14626)

So if we are indeed doing this (returning , not entering) where is the test that does so?

Ah.... I have just looked at Trac #14626, which led me to Trac #14677, which led me to Trac #15155, which appears to be stalled.

So it looks as if

My conclusion: we should ignore the "evaluated" flag from Core, and just track case-binders in the code-gen

osa1 updated this revision to Diff 18254.Oct 8 2018, 7:44 AM
  • Update Note [dataToTag#]
  • Add a reference to Note [dataToTag#]
osa1 marked 10 inline comments as done.Oct 8 2018, 7:45 AM
osa1 added inline comments.Oct 8 2018, 8:06 AM
compiler/coreSyn/CorePrep.hs
1614–1615

My conclusion: we should ignore the "evaluated" flag from Core, and just track case-binders in the code-gen

Do you mean only when speculating seq# and dataToTag# (in app_ok), or in general?

dfeuer added inline comments.Oct 8 2018, 10:51 AM
compiler/coreSyn/CorePrep.hs
1614–1615

When we see an expression is evaluated, we just remove seq# altogether. Not sure what you mean.

My conclusion: we should ignore the "evaluated" flag from Core, and just track case-binders in the code-gen

Sorry -- I meant specifically and only in the context of Note [Preserve evaluated-ness in CorePrep] in CorePrep. That is

  • I think that *both* bullets in that comment are bogus
  • But actually I think we should still preserve evaluated-ness for the reasons described in Note [Preserve evaluatedness] in CoreTidy. That is, ot satisfy Core Lint (only).

But the code generator should IGNORE that info, as indeed it currently does.

osa1 updated this revision to Diff 18260.Oct 8 2018, 1:04 PM
  • Remove bogus comments in CorePrep
osa1 marked 5 inline comments as done.Oct 8 2018, 1:05 PM
osa1 added inline comments.
compiler/coreSyn/CoreUtils.hs
1697

declare that dataToTag and SeqOp are never ok-for-speculation

There's a special case for SeqOp in app_ok, and DataToTagOp is marked as can_fail so it's not OK for speculation. So I think we don't have to do anything for this.

simonpj added inline comments.Oct 8 2018, 4:12 PM
compiler/coreSyn/CoreUtils.hs
1697

But the can_fail thing is really bogus. Why treat SeqOp one way and DataToTagOp entirely differently?! I think we should just treat them both uniformly, in app_ok. And get rid of can_fail in DataToTag.

simonpj added inline comments.Oct 8 2018, 4:58 PM
compiler/prelude/PrelRules.hs
1033

I realized that this case is already somehow optimised so deleted the comment

You mean that's how it came out in this particular case? Or are you saying that you are sure that in the general case dataToTag# (tagToToEnum# e) always optimises to e? How?

I have tried to summarise the various moving parts in the ticket Trac Trac #15696

osa1 added inline comments.Oct 9 2018, 2:35 AM
compiler/prelude/PrelRules.hs
1033

are you saying that you are sure that in the general case

I'm not giving any guarantees but I think most cases are optimised. Here's a more general case where we don't know anything about what t is:

{-# LANGUAGE MagicHash #-}
{-# LANGUAGE ScopedTypeVariables #-}

import GHC.Exts
import GHC.Prim

data T = T1 | T2
  deriving (Show, Read)

main :: IO ()
main = do
    t :: T <- readLn
    print (tagToEnum# (dataToTag# t) :: T)

Both tagToEnum# and dataToTag# are optimised away.

-ddump-simpl-iterations and -dverbose-core2core don't give enough details, but with a little bit experimentation I found out that this program is optimised with the case rules for dataToTag# and tagToEnum# and the usual simplifier steps (the non-case rules for the primops are not used).

I also found out that (in simpl-iterations output) tagToEnum# is eliminated before dataToTag#. Also, neither of them are optimised away if I have 10 constructors in the type (with 9 constructor they're still both eliminated). I guess related with when case-of-case happens (e.g. how much duplication is allowed).

So I think it goes like this:

$cshow_a1Ti
  = \ (x_a2rT :: T) ->
      case x_a2rT of {
        T1 -> lvl_s4q4;
        T2 -> lvl_s4q8
      }

$cshow_a1Ti
  (case dataToTag# @ T x_a4pw of wild_XO { __DEFAULT ->
   tagToEnum# @ T wild_XO
   })

==> unfold cshow_a1Ti

case
  (case dataToTag# @ T x_a4pw of wild_XO { __DEFAULT ->
   tagToEnum# @ T wild_XO
   }) of T1 -> lvl_s4q4; T2 -> lvl_s4q8

==> case-of-case

case dataToTag# @ T x_a4pw of wild_XO {
  __DEFAULT -> case tagToEnum# @ T wild_XO of
                 T1 -> lvl_s4q4
                 T2 -> lvl_s4q8
}

==> apply tagToEnum# case rule

case dataToTag# @ T x_a4pw of wild_XO {
  __DEFAULT -> case wild_XO of
                 0# -> lvl_s4q4
                 1# -> lvl_s4q8
}

==> Not sure what happens here, but we somehow apply `dataToTag#` case rule, and take some other simplifier steps:

case x_a4pw of _ {
   T1 -> lvl_s4q4;
   T2 -> lvl_s4q8
}
osa1 edited the summary of this revision. (Show Details)Oct 10 2018, 1:29 AM
osa1 edited the test plan for this revision. (Show Details)
This revision was automatically updated to reflect the committed changes.