Use a response file for linker command line arguments #10777
ClosedPublic

Authored by snoyberg on Aug 20 2015, 11:24 PM.

Details

Summary

On Windows, we're constrained to 32k bytes total for command line arguments.
When building large projects, this limit can be exceeded. This patch changes
GHC to always use response files for linker arguments, a feature first used by
Microsoft compilers and added to GCC (over a decade ago).

Alternatives here include:

  • Only use this method on Windows systems
  • Check the length of the command line arguments and use that to decide whether to use this method

I did not pursue either of these, as I believe it would make the patch more
likely to break in less tested situations.

Test Plan

Confirm that linking still works in general. Ideally: compile a very large
project on Windows with this patch. (I am attempting to do that myself now, but
having trouble getting the Windows build tool chain up and running.)

Diff Detail

Repository
rGHC Glasgow Haskell Compiler
Lint
Automatic diff as part of commit; lint not applicable.
Unit
Automatic diff as part of commit; unit tests not applicable.
snoyberg updated this revision to Diff 3879.Aug 20 2015, 11:24 PM
snoyberg retitled this revision from to Use a response file for linker command line arguments #10777.
snoyberg updated this object.
snoyberg edited the test plan for this revision. (Show Details)
snoyberg updated the Trac tickets for this revision.
Phyx added a reviewer: Phyx.Aug 21 2015, 1:00 AM
Phyx added a subscriber: Restricted Project.
Phyx edited edge metadata.EditedAug 21 2015, 1:32 AM

@snoyberg Thanks for getting started on this patch. A couple of things:

  • I think in general patches should be made against the GHC Head and not the branches. I think the workflow is that they will be back-ported if required. Especially in this case since I think this is a new feature and not really a bug fix.
  • This patch only alleviates part of the problem. Particularly only when GHC makes a linker call. I think runCC and runGCC should also do this.
  • I don't know what the others think @thomie @bgamari but I think think response file support should be added universally. As in, GHC should also itself support accepting response files. This should also fix Trac #9685 and get us (probably) faster builds on windows.

I'm fine with if these are split up in different smaller tasks.

Macro ihavenostrongfeelings:

compiler/main/SysTools.hs
1278

Perhaps use the GHC temp management functions in this file? They should honor --keep-tmp-files as well which is very useful for debugging. Especially since you won't see the command anymore now. getTempDir and addFilesToClean should be used here instead of getTemporaryDirectory and removeFile.

1280

Perhaps use newTempName ? Although if we make a lot of calls to Ld, CC or GCC we'd get a lot of response files.. An alternate approach would be to print the command that will be before running the program (so printing what's in the response file).

I'm fine with sending to master first. However, I disagree with this being a feature and not a bug fix: users are currently unable to build their projects with GHC in many cases, and this will make it possible to build.

I'm happy to add this to runCc as well, but I'm unaware of a runGCC function.

Phyx added a comment.Aug 21 2015, 1:46 AM

Yes sorry, I read runCpp as runGCC for some reason... Yeah only runCc needs it.
The reason I think it's a feature is that it's failing not because of a mistake that GHC is making, but because of the limitations of one platform.
We're essentially providing a workaround. But It's up to the others to decide :)

awson added a subscriber: awson.Aug 21 2015, 2:35 AM

GNU tools expect response files containing forward slashes in paths. Thus the corresponding conversion should be added (as is made in example patch, see normslash function there).

Thanks for pointing that out, it does need to be resolved. However, I'm worried about converting all backslashes to forward slashes, since in theory we could generate a command line argument containing a backslash which is not a filepath. Instead, I'd like to take the route of escaping the backslashes (and for that matter, other problematic characters). Unfortunately, I've yet to find an actual spec from GCC on what the exact rules for a response file are.

My tentative escape function for now is:

escape x = concat
    [ "\""
    , concatMap
        (\c ->
            case c of
                '\\' -> "\\\\"
                '\n' -> "\\n"
                '\"' -> "\\\""
                _    -> [c])
        x
    , "\""
    ]

I'll continue looking for a spec from GCC, but if anyone else has information, please share.

awson added a comment.Aug 21 2015, 3:18 AM

In the context of example patch that conversion was perfectly safe because we put filepaths only into the response files.

Perhaps, it would be cleaner to put only object files paths into the response files? This way we should act on a higher level, e.g. working with o_files in the linkBinary' in DriverPipeline.hs.

snoyberg updated this revision to Diff 3887.Aug 21 2015, 3:19 AM
snoyberg edited edge metadata.
snoyberg removed rGHC Glasgow Haskell Compiler as the repository for this revision.

This diff now:

  • Applies the change to GCC as well as the linker
  • Escapes characters in the response file
snoyberg added inline comments.Aug 21 2015, 3:27 AM
compiler/main/SysTools.hs
1280

Being a novice at this, I can't really judge whether I should use newTempName or openTempFile. I'm fine making either change, I'd just like a little more firm guidance.

austin added inline comments.Aug 21 2015, 4:45 PM
compiler/main/SysTools.hs
1277–1288

I think this should use newTempName. It's kind of ugly inside but basically it will make sure the file is also cleaned up later for you automatically when GHC is run.

snoyberg updated this revision to Diff 3907.Aug 23 2015, 3:14 AM

Now using newTempName

compiler/main/SysTools.hs
1269

OK, updated.

After some testing with users: this patch improves the situation marginally, but does not fix the problem. Now, GHC is capable of calling gcc.exe correctly, but gcc.exe is not capable of calling itself due to the same path limit. This problem was identified back in 2012 and fixed in MinGW 4.7. So now it appears we'll also need to upgrade our GCC toolchain to make this work.

I can start playing with that myself already for testing purposes, but perhaps others can provide some insight into what that process would look like. I know there's been talk of doing this upgrade in the past.

Actually, from reviewing changes to master, it appears that change has already landed.

To summarize status: I believe this is ready to be merged. Together with updating the GCC toolchain, this can in fact fixed user problems (see stack Github issue. That change has also been assigned for backporting to 7.10.3, see: https://ghc.haskell.org/trac/ghc/ticket/10726#comment:11.

If there are any blockers left on this one, please let me know.

Can you try updating this diff with arc diff --update D1158. For some reason the continuous integration didn't kick off when you uploaded it the first time.

As mentioned by Phyx, it would be nice if ghc -v would still show the full linker command, as otherwise debugging linker issues becomes even harder than it already is.

compiler/main/SysTools.hs
1249

Calling addFilesToClean is not needed I think, because newTempName already calls filesToClean.

snoyberg updated this revision to Diff 3940.Aug 27 2015, 6:40 AM
  • Use a response file for linker and C compiler command line arguments Trac #10777

I just kicked off an update, I'm not sure if it did the right thing due to the change in branches (from ghc-7.10 to master, as discussed earlier).

it would be nice if ghc -v would still show the full linker command

I actually didn't notice that comment originally, but that is the current behavior of the patch.

Hm, this doesn't look right. Can you try arc diff HEAD^.

snoyberg updated this revision to Diff 3941.Aug 27 2015, 7:21 AM
snoyberg edited edge metadata.

Trying to move correctly to master branch

Ah, that seems to have done it, thank you for assisting a Phabricator novice out on this.

The call to addFilesToClean is not needed. This is in the output of ghc -v:

*** Deleting temp files:
Deleting: /tmp/ghc1276_0/ghc_10.rsp /tmp/ghc1276_0/ghc_10.rsp /tmp/ghc1276_0/ghc_9.rsp /tmp/ghc1276_0/ghc_9.rsp /tmp/ghc1276_0/ghc_8.o /tmp/ghc1276_0/ghc_7.s /tmp/ghc1276_0/ghc_6.rsp /tmp/ghc1276_0/ghc_6.rsp /tmp/ghc1276_0/ghc_5.o /tmp/ghc1276_0/ghc_4.c
Warning: deleting non-existent /tmp/ghc1276_0/ghc_10.rsp
Warning: deleting non-existent /tmp/ghc1276_0/ghc_9.rsp
Warning: deleting non-existent /tmp/ghc1276_0/ghc_6.rsp

@snoyberg: Could you mention in a Note:

  • why the escape function is needed. It's not obvious.
  • @awson's idea of putting only reslashed filenames in the response files (I guess it would work just the same?)

it would be nice if ghc -v would still show the full linker command

I actually didn't notice that comment originally, but that is the current behavior of the patch.

Ok, you're right.

snoyberg updated this revision to Diff 3944.Aug 27 2015, 11:46 AM
snoyberg edited edge metadata.

Add Note about escape; remove unneeded addFilesToClean

thomie accepted this revision.Aug 27 2015, 11:59 AM
thomie added a reviewer: thomie.

LGTM

bgamari accepted this revision.Aug 29 2015, 4:46 AM
bgamari edited edge metadata.

Looks good to me. I've added a note inline explaining the difference between newTempName and openTempFile.

compiler/main/SysTools.hs
1280

newTempName is correct here. Ideally we'd also expose a openTempFile-like variant in SysTools as well so people aren't tempted to use openTempFile from base.

The difference here is that newTempName will preserve temporary files if GHC is passed -keep-tmp-files whereas openTempFile from base knows nothing about this. This is incredibly important for debugging as it allows you to retrace the steps that the compiler took.

Another option would be to just print the contents of the response file, although this would mean that the command line we print would be inconsistent with what we actually run. Moreover, given that the response files tend to be rather long, I think it using the response file might actually improve debuggability.

Phyx accepted this revision.Aug 30 2015, 8:46 AM
Phyx edited edge metadata.
This revision was automatically updated to reflect the committed changes.

Is there any objection to backporting this to the 7.10 branch?

I have merged this to the stable branch just in case we end up doing a 7.10.3 release. At this point this doesn't look likely, however.