Remove special casing of singleton strings, split all strings.
ClosedPublic

Authored by xnyhps on Sep 6 2014, 3:24 AM.

Details

Summary

exprIsConApp_maybe now detects string literals and correctly
splits them. This means case-statemnts on string literals can
now push the literal into the cases.

fix trac issue Trac #9400

Test Plan

validate

Diff Detail

Repository
rGHC Glasgow Haskell Compiler
Lint
Lint Skipped
Unit
Unit Tests Skipped
xnyhps updated this revision to Diff 502.Sep 6 2014, 3:24 AM
xnyhps retitled this revision from to Remove special casing of singleton strings, split all strings..
xnyhps updated this object.
xnyhps edited the test plan for this revision. (Show Details)
xnyhps added a reviewer: austin.
xnyhps updated the Trac tickets for this revision.
xnyhps updated this revision to Diff 503.Sep 6 2014, 3:33 AM
xnyhps edited edge metadata.
  • Fix too long lines.
simonpj edited edge metadata.Sep 6 2014, 3:49 AM

Looks good to me, although see my note above.

It seems surprising to me that unpackCString and unpackCStringUtf8 can be treated uniformly, since presumably they do something different. But if so, great; please make sure that the note explains this point.

Simon

compiler/coreSyn/CoreSubst.lhs
1186

Could you share these two cases, so that first you check for unpackCString#/unpackCStringUtf8 applied to a MachStr, and then you call an auxiliary function? The auxiliary function can do the BS.null check.

Also please add a careful Note [exprIsConApp_maybe on literal strings], which explains the issue, references the ticket.

xnyhps updated this revision to Diff 504.Sep 6 2014, 4:52 AM
xnyhps edited edge metadata.
  • Added simonpj's suggestions.
austin requested changes to this revision.Sep 6 2014, 8:54 AM
austin edited edge metadata.

This looks OK, but a test would be nice, based on the feedback in Trac #9400: if an example such as

T9400.hs
main = case "abc" of
    (x:xs) -> putStrLn xs

now produces nice core at -O0 (by pushing the unpack call through the case), then testing it should be easy: test the -ddump-simpl output. You can find a similar test here and follow it as an example: ./testsuite/tests/simplCore/should_compile/T4908.hs.

This revision now requires changes to proceed.Sep 6 2014, 8:54 AM
xnyhps updated this revision to Diff 508.Sep 6 2014, 9:44 AM
xnyhps edited edge metadata.
  • Added a test case for Trac #9400.
  • Use hasKey, which seems more idiomatic.
  • Add built-in rewrite rules for singleton/empty strings.
xnyhps added a comment.Sep 6 2014, 9:49 AM

Whoops, there's an error in all.T, I'll fix it.

xnyhps updated this revision to Diff 509.Sep 6 2014, 9:56 AM
xnyhps edited edge metadata.
  • Fix T9400

More notes from Simon

compiler/coreSyn/CoreSubst.lhs
1232

I would make this top level, with a type signatures
And especially I would give it a comment that explicitly says "See Note [blah]" since you have so carefully written this note (thank you) but you do not otherwise refer to it.

Simon

compiler/prelude/PrelRules.lhs
1114 ↗(On Diff #508)

Argh! You know what I'm going to say! Here's a new chunk of code but no Note [blah] to explain what it is doing. Why did you add this? What does it do?

And, doesn't it re-introduce the very problem that we solved by eliminating the size-1-string case in the desugarer?

xnyhps added a comment.Sep 6 2014, 2:27 PM

Okay. I was assuming that single char string literals would be a waste of space (why store the string with a terminating NUL byte if you know it will be only one character?), but I decided to actually test that. This is all on 64-bit OSX, so it might be slightly different on other platforms.

putStrLn "A" represented as "A"# turns into the assembly:

.const
.align 3
.align 0
c38E_str:
	.byte	65
	.byte	0

Every section is at least 8 byte-aligned, so this effectively takes up 8 bytes (this is surprising me somewhat, the compiled C code I've seen never seems to align literal strings).

However, adding the built-in rule back (so converting "A" into 'A':[]) turns into:

.data
.align 3
.align 0
.globl Main.main7_closure
Main.main7_closure:
	.quad	GHC.Types.C#_static_info
	.quad	65
.data
.align 3
.align 0
.globl Main.main6_closure
Main.main6_closure:
	.quad	:_static_info
	.quad	Main.main7_closure+1
	.quad	GHC.Types.[]_closure+1
	.quad	1

Despite having multiple single-char literals, each of them introduces a new block like Main.main6_closure. A quad is 8 bytes, so this turns into 6 * 8 = 48 bytes!

So I think it would be fine to conclude that a rule that converts singleton strings to 'c':[] would be a bad idea.

compiler/prelude/PrelRules.lhs
1114 ↗(On Diff #508)

You are right, this re-introduces the problem. I was hoping the problematic rule from text would now fire first, but looking at -ddump-rule-rewrites for xmlhtml this is not the case.

I'll revert this and look for a later place where this transformation can be done.

xnyhps updated this revision to Diff 511.Sep 6 2014, 2:43 PM
  • Revert "Add built-in rewrite rules for singleton/empty strings."
  • Fixed the test again.
xnyhps added a comment.Sep 6 2014, 3:56 PM

Okay, for my previous comment I somehow missed the setup section before the unpackCString# call, which inserts quite a lot of code. However, there is already a patch on Trac #7307 to make this code shared.

austin accepted this revision.Sep 6 2014, 8:21 PM
austin edited edge metadata.

LGTM

This revision is now accepted and ready to land.Sep 6 2014, 8:21 PM

I missed a big thing; see comments above.

Simon

compiler/coreSyn/CoreSubst.lhs
1180

Oh, wait! You are ignoring two arguments to 'go'!!! That can't possibly be right.

Moreover, you are deeply pattern-matching on the second argument of 'go, which is utterly different to the way that the other cases work. That can't be right either. Instead just let the existing App equation for 'go' push the argument on to the stack in the 'cont' argument.

Now you can put your new equation AFTER the go (Right sub) (Var v) equation. Now you only have to deal with (Left in_scope) in the first argument. And the App equation will have taken effect. So your new equation will be something like

go (Left in_scope) (Var fun) (CC [Lit (MatchStr str)] co)
   | fun `hasKey` unpackCStringIdKey ... etc..
  = dealWithStringLiteral fun str co

Now you have to deal with that coercion. But that's easy: just call dealWithCoercion from dealWithStringLiteral.

I'm sorry I missed all this the first time around. But really you should NEVER NEVER discard arguments of a function like this without knowing just what they do, and being sure that it is right to drop them on the floor.

Simon

Commented inline.

compiler/coreSyn/CoreSubst.lhs
1180

Sorry, I clearly didn't read this carefully enough. The Coercion made me think the arguments were only related to type variables, and string literals not being polymorphic I could ignore them. Will work on fixing this.

xnyhps updated this revision to Diff 523.Sep 11 2014, 10:30 AM
xnyhps edited edge metadata.
  • Made sure the coercion isn't ignored here, merged with the (Var fun) case.
xnyhps updated this revision to Diff 524.Sep 11 2014, 10:35 AM
  • Whitespace fix, improved comment.
xnyhps updated this revision to Diff 525.Sep 11 2014, 10:38 AM
  • Whitespace fix, improved comment.
Harbormaster failed remote builds in B802: Diff 523!

Much better!

Now I'd just like to see some test cases that actually exercise the exprIsConApp_maybe code.

Apart from that, good to go.

It's surprising how tricky a little change like this can be :-)

compiler/coreSyn/CoreSubst.lhs
1190

I'd put this last, after all the other guards (for data constructors and DFuns), because it's less common.

1241

You need [Type charTy] as the argument here, not [].

In D199#43, @simonpj wrote:

Much better!

Now I'd just like to see some test cases that actually exercise the exprIsConApp_maybe code.

There's already a tiny testcase. I could extend it to try some more different options, is that what you mean?

xnyhps updated this revision to Diff 527.Sep 11 2014, 3:12 PM
  • Added Simon's suggestions:
xnyhps updated this revision to Diff 528.Sep 11 2014, 3:49 PM
  • Fix whitespace in testcase stderr.

I just mean test cases that actually exercise the exprIsConApp_maybe code. Your current test does not, unless I am mistaken.

Simon

But adding trace statements shows that dealWithStringLiteral is called 6 times:

[1 of 1] Compiling T9400            ( T9400.hs, T9400.o )
dealWithStringLiteral
    Just (:,
          [GHC.Types.Char],
          [GHC.Types.C# 'a', GHC.CString.unpackCString# "bc"#])
dealWithStringLiteral
    Just (:,
          [GHC.Types.Char],
          [GHC.Types.C# 'b', GHC.CString.unpackCString# "c"#])
dealWithStringLiteral
    Just (:,
          [GHC.Types.Char],
          [GHC.Types.C# 'a', GHC.CString.unpackCString# "b"#])
dealWithStringLiteral
    Just (:,
          [GHC.Types.Char],
          [GHC.Types.C# 'b', GHC.Types.[] @ GHC.Types.Char])
dealWithStringLiteral
    Just (:,
          [GHC.Types.Char],
          [GHC.Types.C# 'd', GHC.CString.unpackCString# "efg"#])
dealWithStringLiteral
    Just (:,
          [GHC.Types.Char],
          [GHC.Types.C# 'a', GHC.CString.unpackCString# "b"#])

And the Core output of this test on GHC HEAD with this patch eliminates all case statements, while they are all evaluated at runtime with 7.8.3 (going to -O3 doesn't improve that).

simonpj accepted this revision.Sep 12 2014, 2:26 AM
simonpj edited edge metadata.

Oh ok, fine thank you!. Good to go.

austin closed this revision.Sep 16 2014, 7:58 AM
austin updated this revision to Diff 550.

Closed by commit rGHCfe9f7e408448 (authored by @xnyhps, committed by @austin).