Fix gcc.exe: error: CreateProcess: No such file or directory
ClosedPublic

Authored by angerman on May 31 2018, 11:37 PM.

Details

Summary

When GHC links binaries on windows, we pass a -L and -l flag
to gcc for each dependency in the transitive dependency
closure. As this will usually overflow the command argument
limit on windows, we use response files to pass all arguments
to gcc. gcc however internally passes only the -l flags via
a response file to the collect2 command, but puts the -L flags
on the command line. As such if we pass enough -L flags to
gcc--even via a response file--we will eventually overflow the
command line argument length limit due to gcc passing them
to collect2 without resorting to a response file.

To prevent this from happening we move all lirbaries into a
shared temporary folder, and only need to pass a single -L
flag to gcc. Ideally however this was fixed in gcc.

Diff Detail

Repository
rGHC Glasgow Haskell Compiler
Lint
Automatic diff as part of commit; lint not applicable.
Unit
Automatic diff as part of commit; unit tests not applicable.
angerman created this revision.May 31 2018, 11:37 PM
Phyx added a comment.Jun 1 2018, 3:49 PM

I think this shouldn't be added for a number of reasons, among others:

  • You're now going to do several I/O calls extra for each compilation. This can be very expensive, especially on filesystems like NTFS that are slow for small file copies. Things get even worse for large files, and cross disk copies. E.g. For my my temp directory is on another disk. So I will always incur a large hit. The larger the library the larger the silent new wait time. If you don't think this is a problem, it is. This is why things like thin archives were invented for C compilers. And file copies with Directory always go through the GHC I/O manager. In other words, they are slow and there's no way the OS can ever optimize them.
  • And finally it also doesn't solve the issue where cc1 instead of collect2 overflows due to having too many -I. It doesn't even solve the linking problem completely, it doesn't account for import libraries or the fact that libraries don't always carry the lib prefix on Windows (So a dynamic way Windows build will immediately trigger the issue again.) It also doesn't solve the fact that user specified library directories can still trigger the issue.

Let's please not add any more hacks to the code-base, especially when the fix is literally two lines upstream.
Since people seem reluctant to file bugs for GCC, I have gone ahead and done so https://gcc.gnu.org/bugzilla/show_bug.cgi?id=86030

In D4762#131532, @Phyx wrote:

I think this shouldn't be added for a number of reasons, among others:

  • You're now going to do several I/O calls extra for each compilation. This can be very expensive, especially on filesystems like NTFS that are slow for small file copies. Things get even worse for large files, and cross disk copies. E.g. For my my temp directory is on another disk. So I will always incur a large hit. The larger the library the larger the silent new wait time. If you don't think this is a problem, it is. This is why things like thin archives were invented for C compilers. And file copies with Directory always go through the GHC I/O manager. In other words, they are slow and there's no way the OS can ever optimize them.

Well, there's a reason why it's hidden behind a flag. As this only happens for rather dependency-complex packages, and cabal as well as stack allow to provide flags per package, this allows to only use this approach on the fat end.

  • And finally it also doesn't solve the issue where cc1 instead of collect2 overflows due to having too many -I. It doesn't even solve the linking problem completely, it doesn't account for import libraries or the fact that libraries don't always carry the lib prefix on Windows (So a dynamic way Windows build will immediately trigger the issue again.) It also doesn't solve the fact that user specified library directories can still trigger the issue.

Right, this is a rather poor and quick hack. I'd appreciate suggestions on how to improve this.

Let's please not add any more hacks to the code-base, especially when the fix is literally two lines upstream.
Since people seem reluctant to file bugs for GCC, I have gone ahead and done so https://gcc.gnu.org/bugzilla/show_bug.cgi?id=86030

I'd be happy to fix this upstream as well, and if you can help me out a little where to look for what in gcc, that would be much appreciated. I agree with the sentiment thought; and I *have* patched llvm in the past precisely for this reason. This however still means we need to wait for gcc-9 until this fix is available, and as such compiling complex code bases on windows will be broken up to the first ghc release that ships with gcc-9, or am I missing something?

Another option is to teach stack and cabal, not to put libraries into separate directories but instead put them all in the same library folder. I doubt we'd have conflicts, as the library names already include the abi hash, and name. I know where would be some push back at least on the cabal side, as it would move libraries outside of their store, and as such break the "everything for this package is contained within this folder" approach.

erikd added a subscriber: erikd.Jun 4 2018, 3:43 AM

I would love to hear other suggestions on fixing this issue, because its killing us on the Cardano code base. We have a large number of PRs blocked because we can't build on Windows.

Phyx added a comment.Jun 13 2018, 1:55 PM
In D4762#131532, @Phyx wrote:

I think this shouldn't be added for a number of reasons, among others:

  • You're now going to do several I/O calls extra for each compilation. This can be very expensive, especially on filesystems like NTFS that are slow for small file copies. Things get even worse for large files, and cross disk copies. E.g. For my my temp directory is on another disk. So I will always incur a large hit. The larger the library the larger the silent new wait time. If you don't think this is a problem, it is. This is why things like thin archives were invented for C compilers. And file copies with Directory always go through the GHC I/O manager. In other words, they are slow and there's no way the OS can ever optimize them.

Well, there's a reason why it's hidden behind a flag. As this only happens for rather dependency-complex packages, and cabal as well as stack allow to provide flags per package, this allows to only use this approach on the fat end.

  • And finally it also doesn't solve the issue where cc1 instead of collect2 overflows due to having too many -I. It doesn't even solve the linking problem completely, it doesn't account for import libraries or the fact that libraries don't always carry the lib prefix on Windows (So a dynamic way Windows build will immediately trigger the issue again.) It also doesn't solve the fact that user specified library directories can still trigger the issue.

Right, this is a rather poor and quick hack. I'd appreciate suggestions on how to improve this.

Let's please not add any more hacks to the code-base, especially when the fix is literally two lines upstream.
Since people seem reluctant to file bugs for GCC, I have gone ahead and done so https://gcc.gnu.org/bugzilla/show_bug.cgi?id=86030

I'd be happy to fix this upstream as well, and if you can help me out a little where to look for what in gcc, that would be much appreciated. I agree with the sentiment thought; and I *have* patched llvm in the past precisely for this reason. This however still means we need to wait for gcc-9 until this fix is available, and as such compiling complex code bases on windows will be broken up to the first ghc release that ships with gcc-9, or am I missing something?

It doesn't, once the patch is accepted upstream it can be back-ported to the mingw ports and respun. mingw-w64 builds always have custom patches in them. And this is one I think it'd happily take.
The patch to fix this is actually quite simple.

This method right here https://github.com/gcc-mirror/gcc/blob/39bc186ec4a2109b8572427ab0599973542d9c39/gcc/gcc.c#L5198 do_spec_1 handles the driver code for GCC. If you look at the 'o' case (%o is the spec code for all object files)
it does this https://github.com/gcc-mirror/gcc/blob/39bc186ec4a2109b8572427ab0599973542d9c39/gcc/gcc.c#L5662

so it calls

if (at_file_supplied)
   open_at_file ();

at the start and at the end

if (at_file_supplied)
   close_at_file ();

and that's it. All that needs to be done is a similar case added for %D and %I`. Though these are controlled by spec_path.

Another option is to teach stack and cabal, not to put libraries into separate directories but instead put them all in the same library folder. I doubt we'd have conflicts, as the library names already include the abi hash, and name. I know where would be some push back at least on the cabal side, as it would move libraries outside of their store, and as such break the "everything for this package is contained within this folder" approach.

In any case, I'm not blocking this from going in, I just think it's equally easy to fix upstream, and mingw-w64 has more releases than GHC anyways.

Phyx added a comment.Jun 13 2018, 1:58 PM
In D4762#132039, @erikd wrote:

I would love to hear other suggestions on fixing this issue, because its killing us on the Cardano code base. We have a large number of PRs blocked because we can't build on Windows.

The Correct fix is just fixing the small bug in the GCC driver. This approach won't give you something that works out of the box, and it has caveats that a user needs to understand. I'm OK with it
going in, don't think it should be documented (which it hasn't been).

Phyx added a comment.Jun 13 2018, 4:19 PM

Actually, as of two days ago it's even simpler thanks to https://github.com/gcc-mirror/gcc/commit/39bc186ec4a2109b8572427ab0599973542d9c39

I believe the only thing needed now is to wrap end_outgoing_arg https://github.com/gcc-mirror/gcc/blob/39bc186ec4a2109b8572427ab0599973542d9c39/gcc/gcc.c#L4852 in the two stubs I mentioned above when it's making the call to store_arg. So the patch should be 4 lines.

Howdy @Phyx, I actually did try my luck on GHC 7.3; but this ended up being really messy. And then I have no idea how to build gcc/mingw on windows properly.

In D4762#133575, @Phyx wrote:

Actually, as of two days ago it's even simpler thanks to https://github.com/gcc-mirror/gcc/commit/39bc186ec4a2109b8572427ab0599973542d9c39

I believe the only thing needed now is to wrap end_outgoing_arg https://github.com/gcc-mirror/gcc/blob/39bc186ec4a2109b8572427ab0599973542d9c39/gcc/gcc.c#L4852 in the two stubs I mentioned above when it's making the call to store_arg. So the patch should be 4 lines.

Actually, unless I'm reading https://github.com/gcc-mirror/gcc/commit/39bc186ec4a2109b8572427ab0599973542d9c39 wrong. That thing does in fact fix the issue, specifically for L and I.

Interestingly the approach is somewhat similar to what I tired to hack up.

The %@{L*} spec should as far as I understand put all -L arguments into an @file if gcc was provided with one.

Phyx added a comment.Jun 14 2018, 5:50 AM
In D4762#133575, @Phyx wrote:

Actually, as of two days ago it's even simpler thanks to https://github.com/gcc-mirror/gcc/commit/39bc186ec4a2109b8572427ab0599973542d9c39

I believe the only thing needed now is to wrap end_outgoing_arg https://github.com/gcc-mirror/gcc/blob/39bc186ec4a2109b8572427ab0599973542d9c39/gcc/gcc.c#L4852 in the two stubs I mentioned above when it's making the call to store_arg. So the patch should be 4 lines.

Actually, unless I'm reading https://github.com/gcc-mirror/gcc/commit/39bc186ec4a2109b8572427ab0599973542d9c39 wrong. That thing does in fact fix the issue, specifically for L and I.

Interestingly the approach is somewhat similar to what I tired to hack up.

The %@{L*} spec should as far as I understand put all -L arguments into an @file if gcc was provided with one.

Yup, I was looking into that this morning as well, the patch should fix the issue https://gcc.gnu.org/ml/gcc-patches/2018-04/msg01172.html the patch only adds the user specified include directories though. The changes I mentioned would have also forced the compiler derived ones, such as the multilib directories. But that's not a big issue.

Now it just needs to be backported to the mingw-w64 release and kindly ask them to re-spin the binaries before 8.6 release and we can include it.

bgamari requested changes to this revision.Jun 15 2018, 3:35 PM

It sounds like this shouldn't be needed any longer. Thanks @Phyx!

This revision now requires changes to proceed.Jun 15 2018, 3:35 PM
Phyx added a comment.EditedJun 18 2018, 1:49 PM

@bgamari what's the timeline for updating the GCC toolchain for 8.6? I assume since you've already branched you're close to a release?

In that case I'd accept this patch for that branch and we can revert it for 8.8. the GCC patch has been merged into mingw-w64 however
they're having some trouble building GCC 8 due to some Ada build failures... and the Ada maintainers aren't being very helpful.

I've started looking at it as well but I'm stretched a bit thin here and don't know when it'll be solved.

It's far from ideal but at least @erikd et all would be able to (slowly) build their projects again. The flag is undocumented so we can remove it without issue I think.

bgamari accepted this revision.Jun 18 2018, 2:41 PM
In D4762#134268, @Phyx wrote:

@bgamari what's the timeline for updating the GCC toolchain for 8.6? I assume since you've already branched you're close to a release?

In that case I'd accept this patch for that branch and we can revert it for 8.8. the GCC patch has been merged into mingw-w64 however
they're having some trouble building GCC 8 due to some Ada build failures... and the Ada maintainers aren't being very helpful.

I've started looking at it as well but I'm stretched a bit thin here and don't know when it'll be solved.

It's far from ideal but at least @erikd et all would be able to (slowly) build their projects again. The flag is undocumented so we can remove it without issue I think.

Alright, given the circumstances I'm fine with merging this to ghc-8.6.

This revision is now accepted and ready to land.Jun 18 2018, 2:41 PM
Closed by commit rGHC227ede4aa0b6: Fix gcc.exe: error: CreateProcess: No such file or directory (authored by Moritz Angermann <moritz.angermann@gmail.com>, committed by bgamari). · Explain WhyJun 20 2018, 10:52 AM
This revision was automatically updated to reflect the committed changes.