Use the correct return type for Windows' send()/recv() (Fix #12010)
ClosedPublic

Authored by enolan on May 4 2016, 6:10 AM.

Details

Summary

They return signed 32 bit ints on Windows, even on a 64 bit OS, rather than
Linux's 64 bit ssize_t. This means when recv() returned -1 to signal an error we
thought it was 4294967295. It was converted to an int, -1 and the buffer was
memcpy'd which caused a segfault. Other bad stuff happened with send()s.

See also note CSsize in System.Posix.Internals.

Add a test for Trac #12010

Test Plan

Diff Detail

Repository
rGHC Glasgow Haskell Compiler
Lint
Automatic diff as part of commit; lint not applicable.
Unit
Automatic diff as part of commit; unit tests not applicable.
enolan retitled this revision from to Use the correct return type for Windows' send()/recv() (Fix #12010).May 4 2016, 6:10 AM
enolan updated this object.
enolan edited the test plan for this revision. (Show Details)
enolan updated the Trac tickets for this revision.

Nice catch!

However, if I'm not mistaken the len argument is also an int, no (judging from https://msdn.microsoft.com/en-us/library/windows/desktop/ms740121%28v=vs.85%29.aspx)? We should make sure that is correct as well.

thomie added a reviewer: Phyx.May 4 2016, 7:38 AM

@enolan: nice find.

I have some questions below, resulting mostly from my own ignorance on this matter (calling conventions / windows).

  • Commit 7f3e27157d1f812f3c98179458242f3d7c97bee8 changed the return type for safe_recv_off/c_safe_recv from CInt to CSsize. Now you're changing it back. Was the bug introduced in that commit? Could you check if that commit maybe introduced other bugs as well?
  • Commit cf14ed623f97eaf238dd9a9232c01479b177a40b changed WINDOWS_CCONV to capi for read and write, supposedly fixing bugs introduced in the previously mentioned commit. But they forgot to fix it for recv and send. Would changing WINDOWS_CCONV to capi for recv and send also fix the current bug? What is the difference with your fix?

See also note CSsize in System.Posix.Internals.

{-
Note: CSsize

On Win64, ssize_t is 64 bit, but functions like read return 32 bit
ints. The CAPI wrapper means the C compiler takes care of doing all
the necessary casting.

When using ccall instead, when the functions failed with -1, we thought
they were returning with 4294967295, and so didn't throw an exception.
This lead to a segfault in echo001(ghci).
-}
  • What does it mean for a function to be "like read"? If you know, could you maybe update that comment, and make sure there aren't any other bugs lurking where those functions are called (in base).

About the test you added:

  • Is there any way to test for this bug that doesn't rely on network? We don't install network when running validate, so this test would be skipped basically always.
Phyx added a comment.May 5 2016, 12:54 AM

Interesting.. nice catch @enolan

Wouldn't _read and _write also suffer from the same issue?

I don't quite understand why size_t was being used for these calls...

libraries/base/GHC/IO/FD.hs
611–612

I think the type of _read and _write should also be a signed 32bit int. https://msdn.microsoft.com/en-us/library/wyssk1bs.aspx

Phyx added a comment.May 5 2016, 1:19 AM

I Think I can answer some of these:

In D2170#63001, @thomie wrote:
  • Commit cf14ed623f97eaf238dd9a9232c01479b177a40b changed WINDOWS_CCONV to capi for read and write, supposedly fixing bugs introduced in the previously mentioned commit. But they forgot to fix it for recv and send. Would changing WINDOWS_CCONV to capi for recv and send also fix the current bug? What is the difference with your fix?

I think the reason read and write work with capi is because they are C functions. So they are declared with a cdecl calling convention.

_CRTIMP int __cdecl _read(int _FileHandle,void *_DstBuf,unsigned int _MaxCharCount);

If i'm not mistaken capi is translated to cdecl during codegen. https://github.com/ghc/ghc/blob/5d98b8bf249fab9bb0be6c5d4e8ddd4578994abb/compiler/llvmGen/LlvmCodeGen/CodeGen.hs

recv and send however are Windows API calls. So they're declared as __stdcall

WINSOCK_API_LINKAGE int WSAAPI recv(SOCKET s,char *buf,int len,int flags);

So these can't be made capi as the wrong calling convention would be used. According to the note you posted
this also means that no conversion would be done.

See also note CSsize in System.Posix.Internals.

{-
Note: CSsize

On Win64, ssize_t is 64 bit, but functions like read return 32 bit
ints. The CAPI wrapper means the C compiler takes care of doing all
the necessary casting.

When using ccall instead, when the functions failed with -1, we thought
they were returning with 4294967295, and so didn't throw an exception.
This lead to a segfault in echo001(ghci).
-}
  • What does it mean for a function to be "like read"? If you know, could you maybe update that comment, and make sure there aren't any other bugs lurking where those functions are called (in base).

I think what happened here is a difference between the POSIX definition of read and what Windows defines.
For POSIX the return type is an ssize_t http://pubs.opengroup.org/onlinepubs/009695399/functions/read.html
Which explains the CSize usage. I'm guessing capi does a movsx in it's implementation and so corrects the value.

This is rather hidden though... perhaps pointing to this Note on read and write would be useful.

Phyx added a comment.May 5 2016, 1:21 AM
In D2170#63108, @Phyx wrote:

Interesting.. nice catch @enolan

Wouldn't _read and _write also suffer from the same issue?

I don't quite understand why size_t was being used for these calls...

I guess I answered my own question already. So ignore this :)

I do still Wonder why the use of size_t only to convert it to an int in the calling function anyway.

enolan added inline comments.May 6 2016, 11:14 AM
libraries/base/GHC/IO/FD.hs
611–612

Those are defined in System.Posix.Internals and use capi which does the conversion for us.

Phyx added inline comments.May 8 2016, 12:04 PM
libraries/base/GHC/IO/FD.hs
611–612

I guess my question was more of why we use capi and do the conversion still. I am guessing this was done before because we had the same ffi import declaration for both posix and windows.

Since Trac #11223 this is no longer the case since posix imports from read and windows _read. As far as I can see, c_safe_read is only used here.

So I'm wondering why we do int -> ssize_t -> int. Seems like a waste of two casts. Would it be worth while to change this?

In D2170#63001, @thomie wrote:

@enolan: nice find.

I have some questions below, resulting mostly from my own ignorance on this matter (calling conventions / windows).

  • Commit 7f3e27157d1f812f3c98179458242f3d7c97bee8 changed the return type for safe_recv_off/c_safe_recv from CInt to CSsize. Now you're changing it back. Was the bug introduced in that commit? Could you check if that commit maybe introduced other bugs as well?

I tried to checkout that commit and got a tree with only base in it. Not sure what I'm supposed to do.

  • Commit cf14ed623f97eaf238dd9a9232c01479b177a40b changed WINDOWS_CCONV to capi for read and write, supposedly fixing bugs introduced in the previously mentioned commit. But they forgot to fix it for recv and send. Would changing WINDOWS_CCONV to capi for recv and send also fix the current bug? What is the difference with your fix?

I don't understand this very well either. in FD.hs WINDOWS_CCONV is stdcall on x86 and stdcall on x86_64. I'm assuming this is right and Phyx's explanation is correct when applied to 32 bit Windows.

See also note CSsize in System.Posix.Internals.

{-
Note: CSsize

On Win64, ssize_t is 64 bit, but functions like read return 32 bit
ints. The CAPI wrapper means the C compiler takes care of doing all
the necessary casting.

When using ccall instead, when the functions failed with -1, we thought
they were returning with 4294967295, and so didn't throw an exception.
This lead to a segfault in echo001(ghci).
-}
  • What does it mean for a function to be "like read"? If you know, could you maybe update that comment, and make sure there aren't any other bugs lurking where those functions are called (in base).

My guess is it's read, write, recv and send. But that's a guess. Debugging this problem is 90% of my Windows programming experience.

About the test you added:

  • Is there any way to test for this bug that doesn't rely on network? We don't install network when running validate, so this test would be skipped basically always.

I can copy paste the necessary bits from network.

I'll have a revised patch soonish. Late tonight or sometime Monday.

enolan added inline comments.May 8 2016, 12:25 PM
libraries/base/GHC/IO/FD.hs
611–612

Oh, I read it wrong. Didn't realize c_read and c_safe_read were #ifdefd. What is the difference between read and _read?

Phyx added inline comments.May 8 2016, 12:35 PM
libraries/base/GHC/IO/FD.hs
611–612

The posix version read is deprecated. Windows isn't really POSIX compliant so Microsoft decided to move these functions following the C++ spec in order to free up the namespace for user programs. Hence the _ prefixes on POSIX functions on Windows.

Previously the RTS was special casing these calls and we would re-export the symbols from the RTS. This was problematic for a couple of reasons, so we don't do that anymore and require you to handle that at program level. Hence the #ifdef. read happens to have existed before they did the rename, which is why it's still around (for backwards compatibility) but shouldn't be used really.

bgamari requested changes to this revision.May 11 2016, 5:25 AM

@enolan, what is the status of this? It seems like @Phyx might have a point: the casts seem unnecessary.

This revision now requires changes to proceed.May 11 2016, 5:25 AM
enolan added inline comments.May 13 2016, 11:29 PM
libraries/base/GHC/IO/FD.hs
611–612

c_safe_write/c_write/etc would then have different types on different platforms. That's doable of course, but super weird. System.Posix.Internals is visible to client code (but hidden in Haddock) so we'd pass the weirdness down.

Unless you mean there should be c__read on Windows and c_read on Posixes?

Phyx added a comment.May 15 2016, 4:39 AM

Ok, I'm still not entirely convinced that we shouldn't change it. End user programs shouldn't be using internal modules anyway.

That said, since it's only a return value conversion I can live with it, but the input conversions I find potentially harmful.
Could you change those and provide a test that doesn't use network like @thomie suggested?

That should be it then.

libraries/base/GHC/IO/FD.hs
634

The len argument as @bgamari suggested should be a CInt since we can't pass on a 64bit int, so the API if dangerous here.

637

Here as well

I wrote a test that doesn't use network. It uses hsc2hs, and needs to link C code, so I wrote a Makefile. Tests using run_command don't have access to the flags from the way and this only happens in the threaded RTS - it's in blocking*RawBufferPtr. So I modified run_command to pass two new environment variables down: WAY_FLAGS and WAY_RTS_FLAGS. Here's what I have now: https://github.com/enolan/ghc/commit/4184264931fe9676c0450abed757e02ff202beb2. The bash -c is needed because some commands have ;s or &&s in them. It still breaks with commands that have double quotes in them though. I can figure out the shell escaping but I have the feeling there's an easier way to deal with this problem.

Am I doing this the hard way, or is modifying run_command the right solution?

Phyx added a comment.May 15 2016, 11:06 AM
In D2170#64203, @enolan wrote:

I wrote a test that doesn't use network. It uses hsc2hs, and needs to link C code, so I wrote a Makefile. Tests using run_command don't have access to the flags from the way and this only happens in the threaded RTS - it's in blocking*RawBufferPtr. So I modified run_command to pass two new environment variables down: WAY_FLAGS and WAY_RTS_FLAGS. Here's what I have now: https://github.com/enolan/ghc/commit/4184264931fe9676c0450abed757e02ff202beb2. The bash -c is needed because some commands have ;s or &&s in them. It still breaks with commands that have double quotes in them though. I can figure out the shell escaping but I have the feeling there's an easier way to deal with this problem.

Am I doing this the hard way, or is modifying run_command the right solution?

Indeed I don't think you need to modify run_command, you can use only_ways(threaded_ways) to only have the test run with the threaded rts and you can pass along flags from the test file to the makefile. See https://github.com/ghc/ghc/blob/5d98b8bf249fab9bb0be6c5d4e8ddd4578994abb/testsuite/tests/ghci/prog002/prog002.T if it's not already in TEST_HC_OPTS.

enolan updated this revision to Diff 7628.May 17 2016, 2:59 PM

Review fixes

  • Modified the types of _read and _write as well
  • Corrected type of input length in recv/send
  • Added a test that doesn't use network
enolan updated this revision to Diff 7629.May 17 2016, 3:03 PM

Comments

Phyx accepted this revision.May 18 2016, 4:47 PM

Thanks @enolan !

The fix is much appreciated!

This revision was automatically updated to reflect the committed changes.