Nested CPR
Needs ReviewPublic

Authored by mpickering on Nov 29 2017, 6:27 PM.

Details

Summary

This patch implements an extension to the constructed product result analysis which recursively unpacks nested products.

For example, previously if a function foo :: Int -> (Int, Int) had the cpr property then the worker would have type foo :: Int -> (# Int, Int #) leaving the Ints boxed. With nested CPR, we also unbox the integers (if they have the correct demand) to leave a worker with type foo :: Int -> (# Int#, Int# #).

nomeata originally implemented this patch which was subsequently carefully refined by akio. mpickering and sgraf then rebased and polished the patch.

Diff Detail

Repository
rGHC Glasgow Haskell Compiler
Branch
arcpatch-D4244_1
Lint
Lint WarningsExcuse: todo
SeverityLocationCodeMessage
Warningcompiler\basicTypes\Demand.hs:26TXT3Line Too Long
Warningcompiler\basicTypes\Demand.hs:1031TXT3Line Too Long
Warningcompiler\basicTypes\Demand.hs:1086TXT3Line Too Long
Warningcompiler\basicTypes\Demand.hs:1980TXT3Line Too Long
Warningcompiler\basicTypes\Demand.hs:1983TXT3Line Too Long
Warningcompiler\coreSyn\CoreUnfold.hs:351TXT3Line Too Long
Warningcompiler\prelude\PrimOp.hs:568TXT3Line Too Long
Warningcompiler\prelude\PrimOp.hs:569TXT3Line Too Long
Warningcompiler\stranal\DmdAnal.hs:162TXT3Line Too Long
Warningcompiler\stranal\DmdAnal.hs:163TXT3Line Too Long
Warningcompiler\stranal\DmdAnal.hs:246TXT3Line Too Long
Warningcompiler\stranal\DmdAnal.hs:307TXT3Line Too Long
Warningcompiler\stranal\DmdAnal.hs:308TXT3Line Too Long
Warningcompiler\stranal\DmdAnal.hs:533TXT3Line Too Long
Warningcompiler\stranal\DmdAnal.hs:583TXT3Line Too Long
Warningcompiler\stranal\DmdAnal.hs:603TXT3Line Too Long
Warningcompiler\stranal\DmdAnal.hs:604TXT3Line Too Long
Warningcompiler\stranal\DmdAnal.hs:609TXT3Line Too Long
Warningcompiler\stranal\DmdAnal.hs:610TXT3Line Too Long
Warningcompiler\stranal\DmdAnal.hs:616TXT3Line Too Long
Warningcompiler\stranal\DmdAnal.hs:623TXT3Line Too Long
Warningcompiler\stranal\DmdAnal.hs:630TXT3Line Too Long
Warningcompiler\stranal\DmdAnal.hs:632TXT3Line Too Long
Warningcompiler\stranal\WwLib.hs:806TXT3Line Too Long
Warningcompiler\stranal\WwLib.hs:814TXT3Line Too Long
Unit
No Unit Test Coverage
Build Status
Buildable 18978
Build 39256: [GHC] Linux/amd64: Continuous Integration
Build 39255: [GHC] OSX/amd64: Continuous Integration
Build 39254: [GHC] Windows/amd64: Continuous Integration
Build 39253: arc lint + arc unit
There are a very large number of changes, so older changes are hidden. Show Older Changes
sgraf added inline comments.Dec 18 2017, 12:59 PM
compiler/stranal/DmdAnal.hs
1344

This seems a little dangerous to call for a HyperStrict id dmd, in which case forgetLazyCPR would not return a finite DmdResult here. Although I like the elegance of that formulation, that could make for hideous non-termination in pprTraceing behavior while debugging..

1356

I feel like nestedCPRResultForType should be moved to Demand.hs and ae_fam_envs extracted as a parameter.

1385

This needs some reformatting

docs/nested-cpr/nested-cpr.tex
131 ↗(On Diff #14862)

This can probably wait until we commited this, but the first 'return' should be deleted

Thanks for your review!

compiler/basicTypes/Demand.hs
1882

What you write makes sense to me. Did I write that Note? Not sure anymore what I was thinking.

compiler/stranal/DmdAnal.hs
730

That’s probably just an oversight on my part.

1299

This is a pattern where helper functions are simply called …1, see for example simplExprF and simplExprF1. But a better name is of course appreciated.

sgraf added a comment.EditedDec 19 2017, 7:16 AM

I updated the test output for most of the failing test cases.

One which I was unsure about was Trac #4908 which is a test for SpecConstr. The core looks quite similar to the expected output but the SpecConstr rule is not present.

See P165 for the new core and P166 for the expected output.

This is an artifact of the rebase. P168 is the current master's output and P169 is nested-cpr's. It seems like the worker f_$wf has been inlined into f, so f calls f_$s$wf (which came into life as a specialisation of the worker for the case that it's called with a reboxed snd tuple compenent) directly after unpacking and matching.
After that, the original worker is dead and so is the specialisation rule.

That one looks good to me! In fact, I wonder why current master doesn't do this.

sgraf added a comment.EditedDec 19 2017, 10:29 AM

I wonder why T5644 is broken.

It's testing for some RTS regression and consistently fails when I run it through the python test harness (make -C testsuite/ TEST=T5644), but succeeds when I do the compile and run manually. CLEANUP=0 suggests it's because of the exact regression it's ought to fix (e.g. exhausts its 20MB of heap space), but I can't reproduce.
Note that I can even run it successfully with -M4k instead of -M20m outside of the test harness.

sgraf added a comment.Dec 20 2017, 6:14 AM
In D4244#119192, @sgraf wrote:

I wonder why T5644 is broken.

It's testing for some RTS regression and consistently fails when I run it through the python test harness (make -C testsuite/ TEST=T5644), but succeeds when I do the compile and run manually. CLEANUP=0 suggests it's because of the exact regression it's ought to fix (e.g. exhausts its 20MB of heap space), but I can't reproduce.
Note that I can even run it successfully with -M4k instead of -M20m outside of the test harness.

I totally misinterpreted the test: It was *expected to fail* with an out of heap memory error, but it doesn't anymore, so it's actually an unexpected pass.
It seems that we can reproduce that behavior with -M1m. That still produces the expected error message, but I'm a little unsure if that still tests the regression outlined in T5644 or if that has been optimized away.

libraries/base/GHC/Float.hs
568

This should probably be removed before the merge and be put in a separate diff.

sgraf added a comment.Dec 20 2017, 8:56 AM

So I just installed arcanist and am playing around with it to get familiar.
I'm not sure what the workflow is. What I've done so far:

  1. Manually downloaded the diff
  2. git apply D4244.diff on master
  3. Fixed T4908 and T5644, also removed that INLINE pragma from GHC/Float.hs
  4. git checkout -b nested-cpr && git commit -m "Fixed 2 tests"
  5. arc diff --update 4244 HEAD~ at which point it asks me to proceed because I'm not the owner of this revision.

This feels very weird, because the diff doesn't contain any git metadata. Also I'm not sure where to upload the diff: update this revision or open a new one?

Any tips on workflow appreciated.

I attached an updated diff:

In D4244#119206, @sgraf wrote:

So I just installed arcanist and am playing around with it to get familiar.
I'm not sure what the workflow is. What I've done so far:

  1. Manually downloaded the diff
  2. git apply D4244.diff on master
  3. Fixed T4908 and T5644, also removed that INLINE pragma from GHC/Float.hs
  4. git checkout -b nested-cpr && git commit -m "Fixed 2 tests"
  5. arc diff --update 4244 HEAD~ at which point it asks me to proceed because I'm not the owner of this revision.

    This feels very weird, because the diff doesn't contain any git metadata. Also I'm not sure where to upload the diff: update this revision or open a new one?

    Any tips on workflow appreciated.

    I attached an updated diff:

You want to use arc patch D4244 to apply the differential (it will make a new branch unless you pass --nobranch).

After that, commit your changes and the final command is correct to update this diff.

You can also commandeer this differential if you want to take total ownership by selecting the right option in the drop down above the comment box.

In D4244#119189, @sgraf wrote:

I updated the test output for most of the failing test cases.

One which I was unsure about was Trac #4908 which is a test for SpecConstr. The core looks quite similar to the expected output but the SpecConstr rule is not present.

See P165 for the new core and P166 for the expected output.

This is an artifact of the rebase. P168 is the current master's output and P169 is nested-cpr's. It seems like the worker f_$wf has been inlined into f, so f calls f_$s$wf (which came into life as a specialisation of the worker for the case that it's called with a reboxed snd tuple compenent) directly after unpacking and matching.
After that, the original worker is dead and so is the specialisation rule.

That one looks good to me! In fact, I wonder why current master doesn't do this.

If the test is meant to test SpecConstr then perhaps it should now be run with -fno-nested-cpr?

If it is also a useful test for nested-cpr then perhaps it should be duplicated into the right place and run as normal as well?

sgraf added a comment.EditedDec 20 2017, 4:45 PM

You want to use arc patch D4244 to apply the differential (it will make a new branch unless you pass --nobranch).

Thanks, works like a charm.

If the test is meant to test SpecConstr then perhaps it should now be run with -fno-nested-cpr?

If it is also a useful test for nested-cpr then perhaps it should be duplicated into the right place and run as normal as well?

Ah, right. I still think that we don't need to exclude nested-cpr, because the relevant specialisation (f_$s$wf for the call pattern (_, Int# _) ) still happens and doesn't happen for -fno-spec-constr (P170). Unrelated yet unfortunate is that the absent argument doesn't get optimized away... A second WW run could help here.

I don't know if the same applies to T5644, so I'll add -fcpr-depth=1 there.

sgraf updated this revision to Diff 14960.Dec 20 2017, 4:49 PM
  • Removed an INLINE statement in GHC.Float that should be part of another Diff
  • Set cpr-depth to 1 for T5644
  • Fixed T4908, which still tests for the right regression
sgraf updated this revision to Diff 14962.Dec 20 2017, 5:14 PM

Yuck, messed up the diff. Wrong base commit

sgraf added a comment.Dec 21 2017, 2:48 AM

Turns out the INLINE pragma I deleted on decodeFloat in libraries/base/GHC/Float.hs was a workaround for integerConstantFolding.
See P171 for the current simplifier output without the pragma: Actually, decodeFloat *was* inlined, but the worker for GHC.Integer.decodeDoubleInteger wasn't.

I'm not sure what went wrong here (inlining probably? annotate GHC.Integer.decodeDoubleInteger?), so a pointer for how to proceed would be appreciated.

In D4244#119248, @sgraf wrote:

Turns out the INLINE pragma I deleted on decodeFloat in libraries/base/GHC/Float.hs was a workaround for integerConstantFolding.
See P171 for the current simplifier output without the pragma: Actually, decodeFloat *was* inlined, but the worker for GHC.Integer.decodeDoubleInteger wasn't.

I'm not sure what went wrong here (inlining probably? annotate GHC.Integer.decodeDoubleInteger?), so a pointer for how to proceed would be appreciated.

Did you read the comments about this on the ticket? Trac #1600

sgraf added a comment.EditedDec 21 2017, 4:26 AM

Did you read the comments about this on the ticket? Trac #1600

Not with my particular issue in mind. Although there was some discussion about the S# :: Int# -> Integer constructor, I don't think it's related to that INLINE pragma: The fix there was that such constructor wrappers are now inlined by default.

I think the problem here is that the worker for decodeDouble_Integer amounts to $wdecodeDouble_Integer = decodeDouble_Int64 with a NOINLINE pragma and also has no CONSTANT_FOLDED pragma (P172). The INLINE pragma on decodeFloat only fixes the symptoms (probably by storing the unoptimized unfolding), but $wdecodeDouble_Integer will still not be folded in a number of places.

So CONSTANT_FOLDED should probably inherited to workers/specialisations, right?

Edit: Or rather the NOINLINE pragma (probably implied by CONSTANT_FOLDED) should be respected at call sites.

In D4244#119251, @sgraf wrote:

Did you read the comments about this on the ticket? Trac #1600

Not with my particular issue in mind. Although there was some discussion about the S# :: Int# -> Integer constructor, I don't think it's related to that INLINE pragma: The fix there was that such constructor wrappers are now inlined by default.

I think the problem here is that the worker for decodeDouble_Integer amounts to $wdecodeDouble_Integer = decodeDouble_Int64 with a NOINLINE pragma and also has no CONSTANT_FOLDED pragma (P172). The INLINE pragma on decodeFloat only fixes the symptoms (probably by storing the unoptimized unfolding), but $wdecodeDouble_Integer will still not be folded in a number of places.

So CONSTANT_FOLDED should probably inherited to workers/specialisations, right?

Edit: Or rather the NOINLINE pragma (probably implied by CONSTANT_FOLDED) should be respected at call sites.

CONSTANT_FOLDED is just a synonym for NOINLINE. I'll try to understand your comment tomorrow. Apart from this, what else does this patch need? Is it correct from your understanding? Should Simon have a look at it you think?

sgraf updated this revision to Diff 14969.Dec 22 2017, 10:45 AM
  • Using foldl' instead of foldl
  • Cleaned up dmdAnalVarApp
  • Using id instead of var for consistent variable names
  • Renamed extendAnalEnvs1 to extendAnalEnvsAssocs
  • Cutting the CPR depth in nestedCPRResultForType
  • Formating nits

CONSTANT_FOLDED is just a synonym for NOINLINE. I'll try to understand your comment tomorrow. Apart from this, what else does this patch need? Is it correct from your understanding? Should Simon have a look at it you think?

I'm not sure what a patch needs to be accepted, but by now I think it's functionally correct. Simon can probably take a look.

compiler/basicTypes/Demand.hs
1603

Let's try to re-enact why this is the right thing to do.

The postProcess* family of functions is called by dmdAnalStar, which is responsible for 'multiplying' with the DmdShell of the demand the argument is put under (e.g. if some thunk is used lazily, we analyse its RHS and weaken all strict demands in the result to lazy).

In the case of an application, we multiply with the shell of the argument demand, so if some function has a sig of <L>t (which means that the argument is not evaluated for the function to reduce to WHNF, otherwise halting problem), then we want to keep the t result, regardless if the argument diverges or not.

Which functions take arguments and surely terminate? Data Constructors: If we don't need to evaluate the argument (think: bomb) to construct some record, then we surely can't blow up. Note that this doesn't mean that the argument is completely absent (from a usage perspective), rather that we don't evaluate it to get to WHNF.

So this line is OK after all: A proper explanation would be in order, though.

If you modify the integerConstantFolding test to pass -fcpr-depth=1 then it still fails. So is this a mistake in the patch?

What was the motivation to remove the pragma from the patch? It seems like a good fix (or perhaps an INLINABLE pragma) to me.

docs/nested-cpr/nested-cpr.tex
1 ↗(On Diff #14962)

I think this file should be removed.

If you modify the integerConstantFolding test to pass -fcpr-depth=1 then it still fails. So is this a mistake in the patch?

What was the motivation to remove the pragma from the patch? It seems like a good fix (or perhaps an INLINABLE pragma) to me.

The problem is that GHC.Integer.Type was compiled with -fcpr-depth=3, so decodeDoubleInteger gets WW'd. decodeDoubleInteger is probably recognized by constant folding, its worker $wdecodeDoubleInteger is not.

mpickering added a comment.EditedDec 22 2017, 12:02 PM
In D4244#119280, @sgraf wrote:

If you modify the integerConstantFolding test to pass -fcpr-depth=1 then it still fails. So is this a mistake in the patch?

What was the motivation to remove the pragma from the patch? It seems like a good fix (or perhaps an INLINABLE pragma) to me.

The problem is that GHC.Integer.Type was compiled with -fcpr-depth=3, so decodeDoubleInteger gets WW'd. decodeDoubleInteger is probably recognized by constant folding, its worker $wdecodeDoubleInteger is not.

Thanks for clearing that up. Makes perfect sense.

In that case, is this not analogous to using RULES where you have to be careful in this regard? If that is so, adding an INLINABLE pragma would be the correct solution to ensure that the constant folding rule has chance to file before the wrapper gets inlined?

(I confirmed that an INLINABLE pragma works.)

sgraf added a comment.Dec 22 2017, 5:42 PM

Thanks for clearing that up. Makes perfect sense.

In that case, is this not analogous to using RULES where you have to be careful in this regard? If that is so, adding an INLINABLE pragma would be the correct solution to ensure that the constant folding rule has chance to file before the wrapper gets inlined?

(I confirmed that an INLINABLE pragma works.)

Thanks for the RULES analogy. INLINABLE indeed seems like the best solution, although I wonder why the NOINLINE/CONSTANT_FOLDED wrapper for decodeDoubleInteger is inlined in the first place. There are probably other functions that would need the same treatment by that reasoning.

mpickering added inline comments.Dec 23 2017, 6:06 AM
compiler/basicTypes/MkId.hs
494

Where does mAX_CPR_SIZE apply now?

compiler/stranal/DmdAnal.hs
618

I think this is where the check is now.

docs/users_guide/using-optimisation.rst
233

The default is actually 4.

mpickering updated this revision to Diff 14974.Dec 23 2017, 8:02 AM
  • Add test 1 from wiki page
  • Add akio's tests from the wiki page
  • Add back comment about mAX_CPR_SIZE
  • Correct -fcpr-depth default
  • Add back INLINABLE pragma to decodeFloat
mpickering updated this revision to Diff 14975.Dec 23 2017, 8:05 AM
  • Remove Nested CPR tex document

@dfeuer I added the tests from the wiki page now. It should be clear that *something* is happening as the strictness signature is from the extended domain. I don't think it is necessary to commit them separately from the rest of the patch.

mpickering updated this revision to Diff 14976.Dec 23 2017, 9:47 AM
  • Add space for clang cpp
  • rebase
mpickering retitled this revision from WIP: Nested CPR to Nested CPR.Dec 23 2017, 9:51 AM
mpickering edited the summary of this revision. (Show Details)
mpickering updated this revision to Diff 14977.Dec 23 2017, 4:06 PM
  • Fix OSX CPP
sgraf updated this revision to Diff 14987.Dec 27 2017, 4:34 PM
  • Elaborated Note [Termination information and arguments]
  • More in-depth inline description of CPRResult
sgraf added a comment.Jan 6 2018, 4:53 AM

These are the NoFib results counting instructions with cachegrind:

Digest (delta >= 0.2%):

ProgramAllocsInstrs
compress2-0.9%+0.5%
constraints0.0%-0.2%
fasta-0.0%-1.6%
fft+0.0%+0.4%
gamteb+0.0%-0.3%
gg-0.0%+1.2%
hidden+0.0%+0.2%
hpg+0.7%+1.6%
infer-1.2%+0.2%
k-nucleotide-0.2%-0.2%
linear-0.0%-0.2%
maillist-0.2%+0.1%
mandel+0.0%-2.0%
n-body-0.1%-0.3%
pic-0.6%-0.1%
reverse-complem-0.0%+1.1%
scs+0.0%+0.2%
solid-6.6%-3.6%
sphere0.0%-0.2%
x2n1-84.8%-11.6%
Min-84.8%-11.6%
Max+0.7%+1.6%
Geometric Mean-1.7%-0.2%
compiler/basicTypes/Demand.hs
1138

Done. Bounding by the configured nesting depth is probably the better behavior.

compiler/stranal/DmdAnal.hs
613

Done

730

This was due to handle_sum_cpr and handle_thunk_cpr calling Prelude.id. I instead inlined id at those call sites, so this is Done.

Thanks for the results!

In D4244#119697, @sgraf wrote:
x2n1-84.8%-11.6%

I believe x2n1 is, since a while, a basically non-allocating benchmark, so this enormous looking improvement might just be a small change in absolute numbers that somehow affects the main function.

nomeata added inline comments.Jan 29 2018, 10:29 PM
compiler/basicTypes/Demand.hs
1013

typo: “we cause” instead of “because”

This patch does a very good job on this program with nested state monads.

https://phabricator.haskell.org/P174

@mpickering, I have been periodically reminding @simonpj of this and he says he will get to it soon.

It would really help to have an overview of how all this works, perhaps in a Note.

Is https://ghc.haskell.org/trac/ghc/wiki/NestedCPR/Akio2017 up to date? And if so

  • Why do we need NeverReturns? Example!
  • I don't understand how the example 'strictness.hs' has a nested CPR property.

In general the examples on that page are super-helpful, but each needs more explanation. Eg. even the simplest simple.hs requires reasoning that the returned a is non-bottom, and hence a-1 is guaranteed bounded

Some comments... partial review only so far

compiler/basicTypes/Demand.hs
972

Dunno res means "I can't say whether this function converges or diverges, but if it converges it'll return a result of shape described by res.

974

Converges res is same, but guarantees convergence.

983

What is NeverReturns? Does not seem to describe a shape at all!

999

This is a good comment, but would be FAR easier to understand if you gave an example -- and an example of when it's not sound to do nested CPR

1549

This seems to be the unique raison d'etre for NeverReturns. If that is true, say so, and give an example of why (say) topRes won't do.

1882

This is all a bit mysterious. Would you like to agree a common Note that you understand?!

compiler/basicTypes/MkId.hs
494

What happened to dataConCPR????

compiler/coreSyn/CoreUnfold.hs
349

I don't get this. Why isn't the strictness signature of $WP enough to tell us this?

353

Is this orthogonal to everything else? It'd apply equally to any wrapper, even for an ordinary strict function wouldn't it? Did you measure its effect in isolation?

compiler/stranal/DmdAnal.hs
533

What is going on here? What is cutSigResult? How did you get rid of dmdTranformDataConSig?

simonpj added inline comments.Apr 2 2018, 6:24 PM
compiler/basicTypes/Demand.hs
1077

I know that I'm at fault here, but could you add a BNF somwhere for how to read strictness and CPR signatures?

compiler/stranal/DmdAnal.hs
236

I have absolutely no idea what is happening here, or why all this code duplication is necessary, or what it achieves.

Please, at least offer a concrete example!!

sgraf added a comment.EditedApr 3 2018, 8:53 AM

Added a few explanations where I could. I'm not sure I really understand everything in this diff, so we maybe should try some changes in isolation, as @simonpj suggested? Then we can rebase this based on those changes.

Here is a list of commits/ideas that could be tested in isolation:

Also (sorry for nagging) I still think that CPR and this diff in particular would benefit from being split off from the demand analyser or at least being moved into a pass separate from usage and strictness analysis.

compiler/basicTypes/Demand.hs
1549

The wiki page has this explanation to offer:

It's different from ThrowsExn in that it might perform some side effect. Indeed it might be an IO-performing infinite loop. The addition of this constructor is necessary to infer a nested CPR property for a tail- recursive function that does some I/O.

I'm not sure I understand what exactly makes this necessary.

1882

I elaborated the Note after I understood what this means. In particular, I wrote the following paragraph, explaining why this doesn't mean that a lazy arg has to be absent. Does this help?

A function like f x = Just x is lazy in its argument (assuming a WHNF demand <S,U> on a call like f x as usual), but always terminates, because the potential bottom is hidden inside the Just constructor. Thus, f gets a demand signature of <L,U>t.

compiler/basicTypes/MkId.hs
494

Relevant commit. Seems to be unleashed in dmdAnalVarApp (DmdAnal.hs:596) now.

compiler/coreSyn/CoreUnfold.hs
349

It should be. I'm afraid the original version of this code is over 4 years old, so it's hard to say if these changes are really needed.

353

Responsible commit, for reference.

compiler/stranal/DmdAnal.hs
236

It's a radical extension of the special case for Var here. The idea is that CPR is fundamentally a forward analysis: Analysing the scrutinee before alts means we know things about the case binder and the data-con binders. At the same time, strictness and usage are backward analyses.

This additional branch is a trade-off for better CPR information at a negligible cost for strictness and usage (I'm referring to that by cardinality). It will analyse the scrutinee before any RHS in a forward fashion for a case where there is (almost?) no cardinality info flowing from the alts to the scrutinee in backward style.

It will analyse forward if

  1. The scrutinee is non-trivial
  2. The scrutinee evaluates to a product (with only a single matching alt, consequently)

Cardinality info gained from this scenario is limited, because the scrutinee will be some kind of application (proof needed), where the product demand coming up from the alt isn't worth more than knowing that the scrutinee is evaluated at all: If case f a of p@(n, m) -> if n == 0 then p else (m,n) puts its scrutinee under demand <S(S(S)L),U>, this isn't worth more than <S,U> when we finally arrive at the callee f where we only have a demand signature for the <S,U> case anyway.

By contrast, CPR gets to know the DmdResult from fs demand signature which it can pass on to the case binder and data-con binders in the alt. This makes only sense for such applications and when it's a product we are constructing. For the above, if f a returned a DmdResult of m(tm(t),tm(t)) (hope I got the syntax right), we can annotate the case binder p with tm(tm(t),tm(t)) because in the alt we can assume that f a terminated. Also we get to know that n and m have the CPR property themselves: tm(t) for each one. Now we can see that both branches return a constructed product and, consequently, that the whole expression has the nested CPR property with m(tm(t),tm(t)) (m because of bothDmdType). If we had analysed backwards, we would have inferred no CPR at all, because we wouldn't know anything about p in the alt.

I agree that this is not obvious at all and cost me quite some time to understand when I first looked at this. Also I still find the amount of duplicate code frightening. See the comment below.

533

This is because this passes around infinite DmdResults, so that each usage can decide how much info it wants to 'consume' by cutSigResulting the sig to appropriate depth. See line 1344. While this is an elegant use of laziness, I find it quite surprising when suddenly pprTraces lead to non-termination.

As for why dmdTransformDataConSig vanished: I believe because Vars are now handled by dmdAnalVarApp (line 596), which doesn't call dmdTransform anymore, the logic ended up in there. That logic seems to take some shortcuts wrt. to saturation though. I'd expect it to look at dmd to respect incoming call demands, but it doesn't seem to. Instead it just looks at syntactic saturation, which is OK for CPR, I guess, but less so for usage/strictness. Question is if this matters somewhere. Seems very ad-hoc, though.

Indeed, I found https://github.com/ghc/ghc/commit/d87b14a3a6a07f0d70cdf8d4312c6b44e13c643d which confirms the last paragraph.

Like you, I am very uncomfortable about making changes that we don't understand. Let's not do that. Instead, perhaps we should treat this Phab as a source of ideas, and make fresh patches for individual pieces, as you suggest.

Let's make sure that everything is illustrated with a concrete example.

I'm entirely open to separating CPR, but I'm not sure what you intend. Do you really mean an entirely separate pass? That seems quite drastic, because it does fit very neatly into DmdResult. Or do you mean making DmdResult = (TermResult, CPRResult), that is, a pair treating termination and CPR separately? Also why do you feel so strongly that this will be a win?

Finally, Phab is a bad place to have this conversation. Trac and the wiki are better. We could continue on Trac Trac #1600 or start a new ticket. It would be fantastic to garbage collect the wiki pages too.

akio added a comment.Apr 23 2018, 1:32 AM

I'm really sorry to getting here so late, and making people (in particular @sgraf) decipher code I wrote. I needed some energy to come back to this problem, which I basically abondoned because it overwhelmed me.

I replied to some unanswered questions.

compiler/basicTypes/Demand.hs
983

NeverReturns means "there is no possible shape". It's the converse of NoCPR, which means "every shape is possible".

1549

Consider this function:

f :: Int -> IO Int
f n | n <= 0 = return $! n + 3
f n = do
  doIO n
  f (n - 10)

Here, suppose we don't know what doIO does. It may perform an observable action, and/or fail to terminate.

This f has a nested CPR property, and the worker could return a (# State# RealWorld, Int# #). However, without the change to defer_res above, the compiler is unable to spot the property, because in the first iteration of the fixpoint search, f (n - 10) gets the Diverges result, which is turned into topRes by defer_res. The use of NeverReturns fixes the problem by exploiting the following fact: if we know that r diverges, then we can at least say that someIO >> r will never return a value: it may or may not diverge, it may or may not throw an exception, and indeed it may do a lot of useful work before executing r. Still we know that this expression doesn't return a value. The NeverReturns constructor is introduced to capture this idea.

compiler/coreSyn/CoreUnfold.hs
349

The strictness signature of $WP will be just <S,U><S,U>m(,), and doesn't show the fact that it's a constructor (or a constructor wrapper). This information is insufficient for us to infer a nested CPR property for $WP x x.

compiler/stranal/DmdAnal.hs
253

No, we don't need it anymore because this is now done in dmdTransform.

1344

I agree that the use of infinite data structures seems suboptimal. I just wasn't able to find a better alternative.

michalt added a subscriber: michalt.Mon, May 7, 1:17 PM