Remove MAX_PATH restrictions from RTS, I/O manager and various utilities
ClosedPublic

Authored by Phyx on Feb 16 2018, 10:15 PM.

Details

Summary

This shims out fopen and sopen so that they use modern APIs under the hood
along with namespaced paths.

This lifts the MAX_PATH restrictions from Haskell programs and makes the new
limit ~32k.

There are only some slight caveats that have been documented.

Some utilities have not been upgraded such as lndir, since all these things are
different cabal packages I have been forced to copy the source in different places
which is less than ideal. But it's the only way to keep sdist working.

Test Plan

./validate

Diff Detail

Repository
rGHC Glasgow Haskell Compiler
Lint
Automatic diff as part of commit; lint not applicable.
Unit
Automatic diff as part of commit; unit tests not applicable.
Phyx created this revision.Feb 16 2018, 10:15 PM
Phyx added a comment.Feb 16 2018, 10:18 PM

Example:

> .\inplace\bin\ghc-stage2.exe -package-db R:\tFLqpQRCHmWV7mBMKwOh\tFLqpQRCHmWV7mBMKwOh\tFLqpQRCHmWV7mBMKwOh\tFLqpQRCHmWV7mBMKwOh\tFLqpQRCHmWV7mBMKwOh\tFLqpQRCHmWV7mBMKwOh\tFLqpQRCHmWV7mBMKwOh\tFLqpQRCHmWV7mBMKwOh\tFLqpQRCHmWV7mBMKwOh\tFLqpQRCHmWV7mBMKwOh\tFLqpQRCHmWV7mBMKwOh\tFLqpQRCHmWV7mBMKwOh\tFLqpQRCHmWV7mBMKwOh\tFLqpQRCHmWV7mBMKwOh\tFLqpQRCHmWV7mBMKwOh\tFLqpQRCHmWV7mBMKwOh\tFLqpQRCHmWV7mBMKwOh\tFLqpQRCHmWV7mBMKwOh\tFLqpQRCHmWV7mBMKwOh\tFLqpQRCHmWV7mBMKwOh\foo-db -v3
Glasgow Haskell Compiler, Version 8.5.20180202, stage 2 booted by GHC version 8.0.1
Using binary package database: E:\ghc-dev\msys64\home\Tamar\ghc\inplace\lib\package.conf.d\package.cache
Using binary package database: R:\tFLqpQRCHmWV7mBMKwOh\tFLqpQRCHmWV7mBMKwOh\tFLqpQRCHmWV7mBMKwOh\tFLqpQRCHmWV7mBMKwOh\tFLqpQRCHmWV7mBMKwOh\tFLqpQRCHmWV7mBMKwOh\tFLqpQRCHmWV7mBMKwOh\tFLqpQRCHmWV7mBMKwOh\tFLqpQRCHmWV7mBMKwOh\tFLqpQRCHmWV7mBMKwOh\tFLqpQRCHmWV7mBMKwOh\tFLqpQRCHmWV7mBMKwOh\tFLqpQRCHmWV7mBMKwOh\tFLqpQRCHmWV7mBMKwOh\tFLqpQRCHmWV7mBMKwOh\tFLqpQRCHmWV7mBMKwOh\tFLqpQRCHmWV7mBMKwOh\tFLqpQRCHmWV7mBMKwOh\tFLqpQRCHmWV7mBMKwOh\tFLqpQRCHmWV7mBMKwOh\foo-db\package.cache
package flags []
loading package database E:\ghc-dev\msys64\home\Tamar\ghc\inplace\lib\package.conf.d
loading package database R:\tFLqpQRCHmWV7mBMKwOh\tFLqpQRCHmWV7mBMKwOh\tFLqpQRCHmWV7mBMKwOh\tFLqpQRCHmWV7mBMKwOh\tFLqpQRCHmWV7mBMKwOh\tFLqpQRCHmWV7mBMKwOh\tFLqpQRCHmWV7mBMKwOh\tFLqpQRCHmWV7mBMKwOh\tFLqpQRCHmWV7mBMKwOh\tFLqpQRCHmWV7mBMKwOh\tFLqpQRCHmWV7mBMKwOh\tFLqpQRCHmWV7mBMKwOh\tFLqpQRCHmWV7mBMKwOh\tFLqpQRCHmWV7mBMKwOh\tFLqpQRCHmWV7mBMKwOh\tFLqpQRCHmWV7mBMKwOh\tFLqpQRCHmWV7mBMKwOh\tFLqpQRCHmWV7mBMKwOh\tFLqpQRCHmWV7mBMKwOh\tFLqpQRCHmWV7mBMKwOh\foo-db
wired-in package ghc-prim mapped to ghc-prim-0.5.2.0
wired-in package integer-gmp mapped to integer-gmp-1.0.1.0
wired-in package base mapped to base-4.11.0.0
wired-in package rts mapped to rts
wired-in package template-haskell mapped to template-haskell-2.13.0.0
wired-in package ghc mapped to ghc-8.5
wired-in package dph-seq not found.
wired-in package dph-par not found.
*** Deleting temp files:
Deleting:
*** Deleting temp dirs:
Deleting:
ghc-stage2.exe: no input files
Usage: For basic information, try the `--help' option.
Phyx updated this revision to Diff 15475.Feb 16 2018, 10:35 PM
  • Add missing files
  • update notes
Phyx updated this revision to Diff 15476.Feb 16 2018, 10:52 PM
  • fix unix builds
Phyx updated this revision to Diff 15477.Feb 16 2018, 11:13 PM
  • fix non-windows headers

Yay! That being said, I'm really not pleased with the repetition here. I understand we can't use symlinks. Perhaps we could instead teach the build system to copy the sources into place?

docs/users_guide/8.6.1-notes.rst
152

The mark-up here needs to be fixed here (e.g. markdown-style monospace syntax is used. I have fixed this up in D4420.

Phyx added a comment.Feb 18 2018, 5:37 PM

Yay! That being said, I'm really not pleased with the repetition here. I understand we can't use symlinks. Perhaps we could instead teach the build system to copy the sources into place?

Yes I must admit that part of me died having to copy the file. I avoided doing build system changes since I didn't want to touch two build systems. But I'll just bite the bullet :)
That said, it is sub-optimal, since it means in order for you to be able to do an sdist for the cabal packages we put on hackage, you need to have a full build.

Phyx added inline comments.Feb 18 2018, 5:44 PM
docs/users_guide/8.6.1-notes.rst
152

Thanks!

In general it sounds like a good plan, though I'm not familiar with the details of NT paths. I'm amazed and saddened at the lengths upper layers have to go to to work around the legacy backwards compatibility in lower layers though.

libraries/base/include/HsBase.h
527–528

This INLINE stuff isn't buying us anything these days, since we end up calling the static verisons of the functions from the Haskell code anyway. It's a relic from when we regularly compiled via C and got to take advantage of inline C functions. Maybe one day someone should clean this up...

Phyx added a comment.EditedFeb 20 2018, 3:14 AM

In general it sounds like a good plan, though I'm not familiar with the details of NT paths. I'm amazed and saddened at the lengths upper layers have to go to to work around the legacy backwards compatibility in lower layers though.

The newer builds of Windows 10 allow you to "opt in" to a similar trick in the API layer, where opt-in can be as simple as just adding a flag to your exe's manifest. It has the same requirements though that you have to be calling the wide variants of the APIs.
So for the posix compatible functions you have to call the Microsoft extensions, e.g. fwopen instead of fopen, and for the normal Win32 API functions CreateFileW instead of CreateFileA. But the opt-in would have limited the fix to only a small subset of Windows 10 users.

The I/O manager branch of mine treats paths by default as NT paths, so the rewriting of paths are not as extensive (e.g. prevents relative paths from reaching the IO functions to begin with etc)0 and filepath and directory are already namespace aware. So everything should just work :)

Phyx added a comment.Feb 21 2018, 5:28 PM

So we have 4 options here, and they all king of suck..

  1. copy the source files to each component as this patch does. Has code duplication but the code is unlikely to change.
  2. put the files in a top level folder and refer to them. downside: this means cabal files refer to locations outside the package, which breaks sdist.
  3. copy source files to correct folder using boot script. Downside: can't do sdist without running boot. but that's at least fast. copying with make rules also possible but means you need a full build before sdist.
  4. put the files in a package and add a dependency to the package. Downside: need to modify two build systems, convert utilities to cabal packages, and every package gets a new dependency that also needs to be on hackage.

So @bgamari any particular brand of poison you want? :)

In D4416#123201, @Phyx wrote:

So we have 4 options here, and they all king of suck..

  1. copy the source files to each component as this patch does. Has code duplication but the code is unlikely to change.

Famous last words ;)

  1. put the files in a top level folder and refer to them. downside: this means cabal files refer to locations outside the package, which breaks sdist.

Right, I'd like to avoid this given that @hvr has been trying to make GHC components actual working Cabal packages.

  1. copy source files to correct folder using boot script. Downside: can't do sdist without running boot. but that's at least fast. copying with make rules also possible but means you need a full build before sdist.

Well, we could also do this in configure, no? Moreover, I'm not convinced that doing this in make would require a full build. Especially given that make will become shake soon.

  1. put the files in a package and add a dependency to the package. Downside: need to modify two build systems, convert utilities to cabal packages, and every package gets a new dependency that also needs to be on hackage.

Yeah, I'm not sure this one pays well.

So @bgamari any particular brand of poison you want? :)

IMHO the best option among the terrible options before us is (3), implemented in the build system.

Phyx updated this revision to Diff 15787.Mar 19 2018, 4:00 PM
  • rebase
  • implement file copy.
Phyx retitled this revision from Remove MAX_PATH restrictions from RTS and I/O manager to Remove MAX_PATH restrictions from RTS, I/O manager and various utilities.Mar 19 2018, 4:47 PM
bgamari accepted this revision.Mar 19 2018, 6:06 PM

Awesome. Thanks for backing out the copies!

This revision is now accepted and ready to land.Mar 19 2018, 6:06 PM
Phyx planned changes to this revision.Mar 19 2018, 11:31 PM

arggg, of course OSX has a different cp :/ need to fix.

Phyx updated this revision to Diff 15795.Mar 20 2018, 5:44 AM
  • Use ln directly so OSX builds.
This revision is now accepted and ready to land.Mar 20 2018, 5:44 AM
Phyx updated this revision to Diff 15807.Mar 21 2018, 9:29 PM
  • Include stdio.h
Phyx updated this revision to Diff 15808.Mar 21 2018, 9:41 PM
  • Correct linux api
Phyx updated this revision to Diff 15817.Mar 23 2018, 5:27 AM
  • Fix failing nix tests.
Phyx updated this revision to Diff 15822.Mar 23 2018, 11:44 AM
  • declare base symbol as weak as well
Phyx updated this revision to Diff 15825.Mar 24 2018, 7:12 AM
  • Fix include failure.

Arg, this is quite unfortunate: the linkwhole test now fails due to symbol redefinition in libHSrts and libHSbase:

=====> linkwhole(normal) 1 of 1 [0, 0, 0]
cd "./driver/linkwhole/linkwhole.run" && $MAKE -s --no-print-directory linkwhole  
Wrong exit code for linkwhole()(expected 0 , actual 2 )
Stdout ( linkwhole ):
Makefile:14: recipe for target 'linkwhole' failed
Stderr ( linkwhole ):
/usr/bin/ld.gold: error: /mnt/work/ghc/ghc-testing/rts/dist/build/libHSrts.a(fs.o): multiple definition of '__hs_fopen'
/usr/bin/ld.gold: /mnt/work/ghc/ghc-testing/libraries/base/dist-install/build/libHSbase-4.11.0.0.a(fs.o): previous definition here
collect2: error: ld returned 1 exit status
`gcc' failed in phase `Linker'. (Exit code: 1)
make[2]: *** [linkwhole] Error 1
*** unexpected failure for linkwhole(normal)
Phyx planned changes to this revision.Mar 26 2018, 4:00 AM

Arg, this is quite unfortunate: the linkwhole test now fails due to symbol redefinition in libHSrts and libHSbase:

=====> linkwhole(normal) 1 of 1 [0, 0, 0]
cd "./driver/linkwhole/linkwhole.run" && $MAKE -s --no-print-directory linkwhole  
Wrong exit code for linkwhole()(expected 0 , actual 2 )
Stdout ( linkwhole ):
Makefile:14: recipe for target 'linkwhole' failed
Stderr ( linkwhole ):
/usr/bin/ld.gold: error: /mnt/work/ghc/ghc-testing/rts/dist/build/libHSrts.a(fs.o): multiple definition of '__hs_fopen'
/usr/bin/ld.gold: /mnt/work/ghc/ghc-testing/libraries/base/dist-install/build/libHSbase-4.11.0.0.a(fs.o): previous definition here
collect2: error: ld returned 1 exit status
`gcc' failed in phase `Linker'. (Exit code: 1)
make[2]: *** [linkwhole] Error 1
*** unexpected failure for linkwhole(normal)

Yeah, I tried to do a workaround by using weak symbols, while that worked for MacOS it completely broke Linux. I'm setting up a Linux vm so I can iterate a bit faster to fix it.

Phyx updated this revision to Diff 15874.Mar 27 2018, 2:04 PM
  • Fixed linux
  • whitespaces
  • fix changelog

*sigh* It's far from pretty, but it works. This moves the copies of the functions in the rts to a different namespace.

This revision is now accepted and ready to land.Mar 27 2018, 2:04 PM
Phyx updated this revision to Diff 15875.Mar 27 2018, 6:40 PM
  • Fix typo in windows definitions
Phyx updated this revision to Diff 15885.Mar 29 2018, 6:34 AM
  • Correct pathopen for Windows, all builds should be fixed now.
Phyx updated this revision to Diff 15888.Mar 29 2018, 8:27 AM
  • revert hadrian submodule
This revision was automatically updated to reflect the committed changes.