fix -ddump-splices to parenthesize ((\x -> x) a) correctly

Authored by rodlogic on Jul 30 2015, 2:17 PM.

Diff Detail

rGHC Glasgow Haskell Compiler
Automatic diff as part of commit; lint not applicable.
Automatic diff as part of commit; unit tests not applicable.
rodlogic updated this revision to Diff 3715.Jul 30 2015, 2:17 PM
rodlogic retitled this revision from to fix -ddump-splices to parenthesize ((\x -> x) a) correctly.
rodlogic updated this object.
rodlogic edited the test plan for this revision. (Show Details)
rodlogic updated the Trac tickets for this revision.
bgamari requested changes to this revision.Jul 31 2015, 3:26 AM
bgamari edited edge metadata.

It looks like you still need to update the testsuite. See for details.

This revision now requires changes to proceed.Jul 31 2015, 3:26 AM
rodlogic added a comment.EditedJul 31 2015, 10:50 AM

My bad. I could swear I did run validate.

Now, looking at the tests that failed, it seems that this patch is not in good shape as it is adding parenthesis on expressions when not really needed. I will take a second look.

osa1 added a subscriber: osa1.Aug 2 2015, 12:49 PM

This patch actually fixes some illegal syntax made it to the testsuite -- we have lots of cases like this(D1067 fixed lots of cases like this, for example).

For example, quick look at the validation failure showed that this patch makes this change:

-      it <- ghciStepIO :: IO a -> IO a (Just <$> _)
+      it <- (ghciStepIO :: IO a -> IO a) (Just <$> _)

which makes the validate fail even though it's actually a fix.

If you decide not to work on this further please ping me and I can give it a try.

osa1 added a comment.Aug 2 2015, 12:49 PM
This comment was removed by osa1.
rodlogic added a comment.EditedAug 2 2015, 1:22 PM

@osa1 I don't have many spare cycles, so if you know what to do, please go ahead.

Aside from fixing the test cases, I am not 100% why I am seeing, for instance, (return) () since hsExprNeedsParens properly disables parenthesis for HsVar. Although it is correct, it is a bit noisy for no reason.

Just to be a bit more specific, here are the failures:

--- ./th/T8987.stderr.normalised	2015-08-02 14:50:27.000000000 -0400
+++ ./th/T8987.comp.stderr.normalised	2015-08-02 14:50:27.000000000 -0400
@@ -2,4 +2,4 @@
     Exception when trying to run compile-time code:
-    Code: (>>) reportWarning ['1', undefined] return []
\ No newline at end of file
+    Code: (>>) reportWarning ['1', undefined] (return) []
\ No newline at end of file
*** unexpected failure for T8987(normal)
--- ./th/T5976.stderr.normalised	2015-08-02 14:50:21.000000000 -0400
+++ ./th/T5976.comp.stderr.normalised	2015-08-02 14:50:21.000000000 -0400
@@ -2,4 +2,4 @@
     Exception when trying to run compile-time code:
-      Code: error ((++) "foo " error "bar")
\ No newline at end of file
+    Code: (error) ((++) "foo " (error) "bar")
\ No newline at end of file
*** unexpected failure for T5976(normal)
--- ./th/T5358.stderr.normalised	2015-08-02 14:50:17.000000000 -0400
+++ ./th/T5358.comp.stderr.normalised	2015-08-02 14:50:17.000000000 -0400
@@ -3,7 +3,7 @@
     Exception when trying to run compile-time code:
       runTest called forall (t_0 :: *) . t_0 -> GHC.Types.Bool
     Code: do { VarI _ t _ _ <- reify (mkName "prop_x1");
-               ($) error ((++) "runTest called " pprint t) }
+               ($) error ((++) "runTest called " (pprint) t) }
     In the untyped splice:
       $(do { VarI _ t _ _ <- reify (mkName "prop_x1");
              error $ ("runTest called " ++ pprint t) })
\ No newline at end of file
*** unexpected failure for T5358(normal)
--- ./th/TH_runIO.stderr.normalised	2015-08-02 14:50:12.000000000 -0400
+++ ./th/TH_runIO.comp.stderr.normalised	2015-08-02 14:50:12.000000000 -0400
@@ -2,5 +2,5 @@
     Exception when trying to run compile-time code:
       user error (hi)
-    Code: runIO (fail "hi")
+    Code: (runIO) ((fail) "hi")
     In the untyped splice: $(runIO (fail "hi"))
\ No newline at end of file
*** unexpected failure for TH_runIO(normal)
--- ./th/TH_exn2.stderr.normalised	2015-08-02 14:50:11.000000000 -0400
+++ ./th/TH_exn2.comp.stderr.normalised	2015-08-02 14:50:11.000000000 -0400
@@ -3,4 +3,4 @@
     Exception when trying to run compile-time code:
       Prelude.tail: empty list
       Code: do { ds <- [d| |];
-                 return (tail ds) }
\ No newline at end of file
+               (return) ((tail) ds) }
\ No newline at end of file
*** unexpected failure for TH_exn2(normal)
--- ./th/TH_exn1.stderr.normalised	2015-08-02 14:50:11.000000000 -0400
+++ ./th/TH_exn1.comp.stderr.normalised	2015-08-02 14:50:11.000000000 -0400
@@ -3,4 +3,4 @@
     Exception when trying to run compile-time code:
       TH_exn1.hs:(9,4)-(10,23): Non-exhaustive patterns in case
-    Code: case reverse "no" of { [] -> return [] }
\ No newline at end of file
+    Code: case (reverse) "no" of { [] -> (return) [] }
\ No newline at end of file
*** unexpected failure for TH_exn1(normal)
goldfire added a subscriber: goldfire.

Let me know if I can be of help here.

@goldfire How is return in (return) [] represented in HsExpr? In general how do you debug HsExpr trees since there is no Show instance?

osa1 added a comment.Aug 3 2015, 1:33 PM

Hm, so I had a quick look at HsExpr.hs and I don't understand how can this printer works well without passing current precedence to printers to be able to add parens when necessary. Am I missing anything?

goldfire edited edge metadata.Aug 3 2015, 4:59 PM

This is more interesting than I thought.

The problem is that HsExpr is intended to be parsed Haskell source. Printing should thus be a breeze: the programmer had to insert all the necessary parens, so we just spit back out what the programmer said.

However, splicing in TH also produces HsExpr. Because it's not programmer-specified, the parentheses are unnecessary here, because no parser will ever look at it. But here we do want parentheses.

We could try to get the HsExpr pretty-printer to be smarter about all of this. But I think that's barking up the wrong tree -- ideally, the HsExprpretty-printer should be very dumb about parentheses. (Indeed, I believe that ppr_expr should never recur into pprParendExpr. But that's more than we need to accomplish today.)

Instead, I think a better approach is to have the Convert module add HsPar nodes. By doing this, we make it more possible that printing HsExpr will be exactly what the programmer wrote, while still getting valid output from TH.

I have no idea why you're seeing (return). Something is wrong somewhere. Personally, I would debug by adding extra debug output within ppr_expr, but you're under no obligation to figure out what's happening here.

Does that make sense?

@goldfire The notion that ppr_expr should be dumb seems reasonable, but there are a few cases at fault here:

ppr_expr (HsApp e1 e2)
  = let (fun, args) = collect_args e1 [e2] in
    hang (ppt_lexpr fun) 2 (sep (map pprParendExpr args))

ppr_expr (ELazyPat e)   = char '~' <> pprParendExpr e
ppr_expr (EAsPat v e)   = ppr v <> char '@' <> pprParendExpr e

ppr_expr (HsSCC _ (_,lbl) expr)
  = sep [ ptext (sLit "{-# SCC") <+> doubleQuotes (ftext lbl) <+> ptext (sLit "#-}"),
          pprParendExpr expr ]

ppr_expr (HsStatic e)
  = hsep [ptext (sLit "static"), pprParendExpr e]

Any reason not to fix these now? Or were you considering a separate ticket/change for that?

I will undo the current trivial fix and move to Convert.hs to try and resolve this issue there.

I just saw the following I missed before in HsSyn.hs:

HsSyn records exactly where the user put parens, with HsPar.
So generally speaking we print without adding any parens.
However, some code is internally generated, and in some places
parens are absolutely required; so for these places we use
pprParendExpr (but don't print double parens of course).

For operator applications we don't add parens, because the operator
fixities should do the job, except in debug mode (-dppr-debug) so we
can see the structure of the parse tree.

This is clear, but it seems a bit fragile (I understand that ppr_expr seems to be a low value area of the compiler so maybe there is no incentive to make it better considering all the other priorities).

In any case, it strikes me as interesting that there are 'other' areas in the compiler that generate code programmatically and without HsPar expressions. One other area that does the same is TH Syntax created programmatically. Another area that comes to mind is TH reification which provides a view into HsSyn by converting it back to TH Syntax. Shouldn't those 'other' areas of the compiler use a more abstract syntax for generating Haskell code programmatically in the compiler in the same way that TH does?

I recently came across and was wondering if a typed tagless interpretation could be used as a single user visible API for the programmatic generation of Haskell's AST in a way that is correct by construction and hygienic. Maybe it is possible to have HsSyn directly as one interpretation without the need for to/from conversions. What if this typed tagless interpretation could be exposed as a TH Syntax alternative? A bit off topic, but curious about opinions here :-)

In my opinion, removing all the pprParendExpr uses from within ppr_expr is a good idea, as long as we're then careful to insert HsPars everywhere code is generated. (The TcGenDeriv module is one such place.) Otherwise, ppr_expr is always going to be a bit wrong in corner cases, as we're seeing here. There's no reason not to do this now -- I just didn't want to say that fixing this ticket required doing all the other refactoring. If you want more work, though, go for it!

As to the tagless approach: I've skimmed the linked PDF, but I don't see exactly what you're proposing. If you care to flesh it out a little, it might be an interesting conversation.

Removing pprParendExpr from ppr_expr will cause the following test failures (and they didn't seem obvious):

Unexpected failures:
   deriving/should_fail          T4846 [stderr mismatch] (normal)
   generics                      GenDerivOutput [stderr mismatch] (normal)
   generics                      GenDerivOutput1_0 [stderr mismatch] (normal)
   generics                      GenDerivOutput1_1 [stderr mismatch] (normal)
   ghci/scripts                  T10248 [bad stderr] (ghci)
   indexed-types/should_compile  PushedInAsGivens [stderr mismatch] (normal)
   safeHaskell/safeLanguage      SafeLang10 [stderr mismatch] (normal)
   safeHaskell/safeLanguage      SafeLang17 [stderr mismatch] (normal)
   typecheck/should_fail         T8603 [stderr mismatch] (normal)
   typecheck/should_fail         tcfail177 [stderr mismatch] (normal)

So we need to tread carefully. For now, I will focus only on the fix in Convert.hs to generate HsPar. It seems that we need something similar to:

type Precedence = Int
appPrec, unopPrec, opPrec, noPrec :: Precedence
appPrec = 3    -- Argument of a function application
opPrec  = 2    -- Argument of an infix operator
unopPrec = 1   -- Argument of an unresolved infix operator
noPrec  = 0    -- Others

@goldfire I will try to get some more meet into this tagless idea and see if you guys think it has legs.

As I've said, I'm fine with skipping the rest of the refactoring.

But I don't quite agree with your choice of precedences within Convert. For example, an unresolved infix operator poses a real challenge.

In fact, in thinking about this all, I worry that the behavior of unresolved infix operators is ill-specified. Consider the TH AST for (1 + 2) * 3 vs 1 + (2 * 3) (without using ParensE), and consider mixing and matching the choice of UInfixE and InfixE for the two operators. I haven't a clue as to what would happen in the 4 scenarios. And expressions can, of course, be much larger. We'll have to get to the bottom of that before getting all of this right.

Depending on how deep you want to go here, it's also OK just to fix the case in the ticket. I don't want progress to be held up while we're stuck analyzing strange pathological cases!

rodlogic updated this revision to Diff 3778.Aug 5 2015, 7:43 PM
rodlogic edited edge metadata.

Now fixing Convert.hs instead since HsExpr's ppr_expr should be a one-to-one pretty-printer for HsExpr

rodlogic updated this revision to Diff 3779.Aug 5 2015, 7:44 PM
rodlogic edited edge metadata.

Reinserted empty line between comment and cvtl function

This fixes this ticket. But is it possible to do better? For example, it would still get (case mb of Just _ -> id; Nothing -> not) b parenthesized wrongly. I imagine that "doing better" would mean tracking precedences and such through another parameter to cvt.

Regardless of the "doing better" bit, this needs a regression test.


bgamari requested changes to this revision.Aug 7 2015, 11:18 AM
bgamari edited edge metadata.

Still blocked on a regression test.

This revision now requires changes to proceed.Aug 7 2015, 11:18 AM

@goldfire Certainly! I will get some free time soon to go back to this.

This revision was automatically updated to reflect the committed changes.

I've added a regression test and pushed this. Thanks @rodlogic!