Fix hWaitForInput not being interruptible (#8684).
Needs ReviewPublic

Authored by nh2 on Jul 6 2014, 6:47 PM.

Details

Reviewers
simonmar
hvr
bgamari
erikd
syd
austin
mpickering
Group Reviewers
GHC
Trac Issues
#8684
Summary
  • Make fdReady a ccall interruptible
  • Move EINTR loop from C into Haskell so that Haskell code can run between interruptions
  • Fix ccall interruptible not being interruptible when calling fdReady() on Windows, by using an Event HANDLE to signal interruption
  • Call interruptible between fdReady() calls so that exceptions have a chance to be delivered.
  • Call yield on the non-threaded RTS between fdReady() calls so that functions like timeout get a chance to produce their exceptions.
  • Make the context switch timer call interruptOSThread() to interrupt blocking IO in the non-threaded RTS on platforms where the timer signal doesn't do that automatically.

This also fixes Ctrl+C not terminating cancelling Haskell
programs that are stuck in hWaitForInput.
Until now, one had to force a "dirty exit" with double-Ctrl+C;
with this commit, this is no longer necessary.

Note that this commit, for the first time, makes context-switching
work in the non-threaded RTS in the presence blocking syscalls
on Windows, OSX and iOS.

I've verified these improvements on Linux, Windows, and OSX,
in both -threaded and non-threaded.

I've also added commentary in handle_tick() that explains the
reasoning behind this, as an outcome of the discussion on
https://phabricator.haskell.org/D42.

To get access to the information whether or not we need to call
pthread_kill() via interruptOSThread() in the context switch timer,
that is, USE_PTHREAD_FOR_ITIMER, I had to refactor its logic into a
new file rts/Itimer.h. I also changed USE_PTHREAD_FOR_ITIMER
from a defined/undefined macro to an always-defined 1/0 macro,
so that failure to inlude the right header would result in an
error (via -Werror,-Wundef) instead of it silently being undefined
and the wrong code path being taken.

With original contribution from Nandor Licker.

Test Plan

Diff Detail

nh2 updated this revision to Diff 91.Jul 6 2014, 6:47 PM
nh2 retitled this revision from to hWaitForInput cannot be interrupted by async exceptions on unix.
nh2 updated this object.
nh2 edited the test plan for this revision. (Show Details)
nh2 added reviewers: simonmar, GHC, austin.
nh2 set the repository for this revision to rGHC Glasgow Haskell Compiler.
nh2 added a project: GHC.
simonmar edited edge metadata.Jul 6 2014, 7:55 PM

I don't think disabling the SIGVTALRM signal for the duration of the call is a good idea, the signal is important for various things.

You could move the EINTR loop into Haskell, which is what we do for other syscalls. That should work, but care will be needed to keep track of the remaining time correctly.

simonmar requested changes to this revision.Jul 6 2014, 7:55 PM
simonmar edited edge metadata.
This revision now requires changes to proceed.Jul 6 2014, 7:55 PM

@nh2, do you suppose you will continue this? If not, could you mark it as abandoned?

nh2 added a comment.Jul 31 2015, 4:25 AM

@bgamari Yes, I would very much like to get this finished, but I haven't found the time to do so yet.

@simonmar, so if I understand you right you'd keep SIGVTALRM on during the syscall, return which signal caused the interrupt from the C bits and do the if (errno != EINTR ) check and loop on the Haskell side?

@simonmar Another question: Why does the existing code work on platforms that are not Linux? In my select man page it says:

On Linux, select() modifies timeout to reflect the amount of  time  not
slept;  most  other implementations do not do this.  (POSIX.1-2001 per‐
mits either behavior.)  This causes problems both when Linux code which
reads  timeout  is  ported to other operating systems, and when code is
ported to Linux that reuses a struct timeval for multiple select()s  in
a  loop  without  reinitializing  it.  Consider timeout to be undefined
after select() returns.

The existing select loop seems to rely on the fact that &tv is updated as described here.

1st question: yes
2nd question: perhaps it is broken on other platforms?

bgamari updated the Trac tickets for this revision.Mar 29 2017, 11:04 AM
nh2 added a comment.Aug 4 2017, 12:59 PM
In D42#30579, @simonmar wrote:

2nd question: perhaps it is broken on other platforms?

It was broken on other platforms; I filed https://ghc.haskell.org/trac/ghc/ticket/13497 for it. By now, select() was replaced by poll() mitigating the issue for platforms that have it, and the linked ticket addresses the remaining problems in that direction.

I'll continue working on this issue.

nh2 added a comment.Aug 4 2017, 5:22 PM

I have an updated patch now but it doesn't work for the nonthreaded runtime -- any ideas why? https://ghc.haskell.org/trac/ghc/ticket/8684#comment:15

austin resigned from this revision.Nov 6 2017, 4:01 PM
nh2 updated this revision to Diff 14820.Nov 25 2017, 5:21 PM

3 years after opening this, D42 is back!

The improved patch that also works for nonthreaded.

I've tested it on Linux and Windows so far.

nh2 added inline comments.Nov 25 2017, 6:24 PM
libraries/base/Control/Concurrent.hs
412

I'd need some feedback on whether this is true, probably from @simonmar

nh2 updated this revision to Diff 14821.Nov 25 2017, 6:55 PM

Update commit message with good news

nh2 updated this revision to Diff 14823.Nov 25 2017, 7:05 PM
nh2 retitled this revision from hWaitForInput cannot be interrupted by async exceptions on unix to Fix hWaitForInput not being interruptible (#8684)..
nh2 edited the summary of this revision. (Show Details)
nh2 edited the test plan for this revision. (Show Details)
nh2 added a reviewer: syd.
nh2 added a subscriber: syd.

Updated description from commit message

nh2 added a comment.Nov 26 2017, 9:33 AM

TODO for myself: I need to update the user's guide section Interruptible foreign calls with the information about the newly added events.

nh2 updated this revision to Diff 14825.Nov 26 2017, 11:53 AM

Also update user's guide

simonmar requested changes to this revision.Nov 27 2017, 3:44 AM

Mostly this looks reasonable to me, thanks for doing all this! One question inline.

libraries/base/Control/Concurrent.hs
412

I think I agree.

rts/Timer.c
47

This isn't necessarily true, if we're implementing timer signals via timerfd() on Linux then the timer doesn't cause EINTR.

61

It's not clear to me why you need this. If we need to interrupt the syscall then the RTS will call interruptOSThread(). Why do we also need to do it in handle_tick()?

This revision now requires changes to proceed.Nov 27 2017, 3:44 AM
nh2 added a comment.Nov 27 2017, 10:58 AM

For easier testing, I have provided binary distributions of this change backported to GHC 8.0.2 here: https://github.com/nh2/ghc/releases/tag/ghc-8.0.2-bug-8684-interruptible-hWaitForInput-iohk-2017-11-26

And I'm currently creating some backported to 8.2.2 as well.

libraries/base/Control/Concurrent.hs
412

@simonmar I have a patch that does this in https://github.com/nh2/ghc/commit/6260b53a78d6953ed3486a69db8e02a6d71e0fc6

Do you think it's best to merge that into this one, or to land it as a separate Diff?

rts/Timer.c
47

I wasn't aware of that, will take a look.

61

@simonmar I think we need this for the non-threaded RTS (thus !definied(THREADED_RTS)) to give the Haskell code some choice to run.

You're right that when the RTS wants to interrupt a ccall interruptible, it will call interruptOSThread() and that one will already call SetEvent(). But it will only want to interrupt when there is already an exception produced that necessitates this interrupt.

But when there is no exception produced yet, and the single thread we have is stuck in a foreign WaitForMultipleObjects(), then our timer signal handler handle_tick() here will just run run through without seeing a need to interrupt anything, without calling interruptOSThread(), and return control back to the foreign call.

In that case, no Haskell code will be run, and thus functions like timeout, or other code based on threadDelay ... >> throwTo will not have the chance to produce an exception. Then we'll stay in WaitForMultipleObjects() forever, only occasionally entering handle_tick() which will run through and return.

At least that is my understanding of it.

On Unix, we don't have this problem: Already the occurrence of the timer signal being a signal will have interrupted syscalls with EINTR and thus yielded control back to Haskell code (perhaps not in the case of timerfd() that you mentioned above, I wasn't aware of that possibility, will take a look and report back).

So one could say that this SetEvent() here makes the timer signal have the same effect on Windows as it has on Unix: to force all syscalls to return to Haskell, even if there's no exception yet.

Does that make sense (modulo the timerfd() thing that I still need to look into)?

If yes, I will update the comment accordingly.

nh2 added inline comments.Nov 28 2017, 11:53 AM
rts/Timer.c
47

OK, I've taken a look now.

I think what I say is true for the non-threaded RTS.

Because timerfd() is used only used in itimer/Pthread.c, which happens only when

#if defined(USE_PTHREAD_FOR_ITIMER)
#include "itimer/Pthread.c"

which happens only if we're in the threaded RTS:

#if defined(linux_HOST_OS) && defined(THREADED_RTS) && HAVE_SYS_TIMERFD_H
#define USE_PTHREAD_FOR_ITIMER

I should update the comment to say in the non-threaded RTS.

simonmar added inline comments.Nov 28 2017, 2:17 PM
libraries/base/Control/Concurrent.hs
412

separate diff is fine.

rts/Timer.c
61

All of this makes me wonder why we aren't implementing hWaitForInput using the RTS's internal IO manager for the non-threaded RTS. After all, that's how we do all the other concurrent I/O support in the non-threaded RTS. We would need a primop waitReadWithDelay#. Maybe that's more complicated than what you have here.

Personally I'm also ok with just saying that you need the threaded RTS. Are you concerned about embedded environments or something where running the threaded RTS is problematic?

Anyway, back to this particular change. It feels like a bit of a hack, but if we really must have it, I could feel better about it if it were part of the context switch logic (just below) rather than happening on every timer signal.

nh2 added inline comments.Nov 29 2017, 11:51 AM
rts/Timer.c
61

@simonmar

All of this makes me wonder why we aren't implementing hWaitForInput using the RTS's internal IO manager for the non-threaded RTS. After all, that's how we do all the other concurrent I/O support in the non-threaded RTS. We would need a primop waitReadWithDelay#. Maybe that's more complicated than what you have here.

That would do, but I'd like the interruption of foreign calls to work in nonthreaded in general for FFI calls; hWaitForInput is the most prominent use case of that infrastructure so that's how I discovered these problems, but if I wanted to use any other FFI function beyond this specific one, it'd like that to work as well, in the same reliable fashion across operating systems and threaded/nonthreaded runtimes.

I'd certainly not be opposed to add primeops to improve special cases within ghc such as hWaitForInput/fdReady, but it sounds a bit like an orthogonal optimisation to me from the perspective that I'd like to have the general FFI-interruptibility work as it does on Unix. It also seems to be of way bigger scope than my little improvement here, and I have heard that some people are looking at bringing IOCP to Windows-GHC and that way making a high-performance IO manager for Windows as we have for Linux, so maybe that would provide a general overhaul of how those things work in the future.

Personally I'm also ok with just saying that you need the threaded RTS. Are you concerned about embedded environments or something where running the threaded RTS is problematic?

Yes, I rely on the nonthreaded RTS a lot, so I would love if all programs that work in the threaded RTS don't have totally different behaviour or a broken in the non-threaded RTS.

Some key reasons for non-threaded for me are:

  • Haskell programs are much easier to debug in the non-threaded RTS (especially with strace)
  • The non-threaded RTS can provide lower latency and (probably as a consequence) better performance with some IO/networking workloads. For some clients' high-performance use cases we used the non-threaded runtime only to get big improvements without big modifications.

Anyway, back to this particular change. It feels like a bit of a hack, but if we really must have it, I could feel better about it if it were part of the context switch logic (just below) rather than happening on every timer signal.

So that move would be equivalent to saying that the rate at which FFI calls are woken up to give Haskell a chance to run is controlled by +RTS -C as opposed to +RTS -V, right?

That sounds OK to me.

I guess it would also be consistent for it to be inside RtsFlags.ConcFlags.ctxtSwitchTicks > 0, since that way, if context switching is disabled, we won't interrupt the FFI IO for the purpose of making some Haskell threads run?

The only thing that might be an argument against this plan is that then things would work differently between Unix and Windows. With the approach of the IO interrupt happening unconditionally in the signal handler, we'd ensure equal nonthreaded behaviour on Unix and Windows (you could rely on every timer tick waking up your FFI code). With the approach of putting it into the context switch logic, IO interrupts would happen more rarely on Windows nonthreaded than they happen on Unix nonthreaded. I guess one could be in favour of that difference by saying "our Windows implementation is simply be more efficient, as it doesn't suffer from the 'caveat' that the timer signal interrupts IO unconditionally" as it does on Unix.

Whatever we choose to do here, I'd like to document our rationale in the commentary, so please let me know what you think about these pros/cons.

It feels like a bit of a hack

Also useful to know would be: What about it feels like a hack to you? If I were to write a single-threaded runtime offering a feature that it can cooperatively schedule its in-language green threads even while user-provided FFI calls are still going (e.g. if I wanted to add such a feature to the nodejs or cpython runtimes), this way (timer + interrupting the IO, like you already do for Unix and how this proposes to do for Windows) is exactly how I would go about it even if doing it from clean-slate.

Another thing that occurred to me yesterday night: To really get the desired give-Haskell-code-a-chance-to-run behaviour I described, I'd have to call not only SetEvent(rts_interruptOSThreadEvent()); but also CancelSynchronousIo(). That is, I should directly call interruptWorkerTask() from here, which does exactly those two things.

It feels like a bit of a hack

Well, if we're thinking about this as a way to context-switch away from (interruptible) FFI calls in the non-threaded RTS, then I think it's OK. We don't need this functionality in the threaded RTS because the scheduler can run other threads concurrently with the FFI call.

On the other hand it felt like something of a hack because there are just things that you fundamentally can't do with the non-threaded RTS, when you need concurrency with an FFI call that can't be made interruptible (because it's in some third-party library, and can't be made to interact with our interruptible support). So this feels like a partial solution.

But OK, I don't object to doing this, and it's fairly easy to document in terms of context-switching.

The only thing that might be an argument against this plan is that then things would work differently between Unix and Windows

I think the fact that the timer signal causes system calls to return EINTR on Unix is an unintentional side-effect, not something we want to rely on. If we could avoid doing that, we would (and we did with the threaded RTS). So arguably we *should* be doing interruptOSThread() on Unix when the context switch timer goes off, but there I'm slightly worried about having lots of EPIPE signals being thrown in case that has any unintended side effects.

The non-threaded RTS can provide lower latency and (probably as a consequence) better performance with some IO/networking workloads

I would consider that a bug in the threaded RTS, or possibly a bug in the program (using both bound and unbound threads is a common cause of accidental slowdown with -threaded).

nh2 added a comment.Jan 6 2018, 6:29 PM

For my reference, https://ghc.haskell.org/trac/ghc/ticket/10840 is what introduced the timerfd usage and also the other uses of non-SIGVTALRM timers.
I wasn't aware that they were introduced as recently as GHC 8.2.

I've done some write-up of all the methods how the timer signal is implemented at https://ghc.haskell.org/trac/ghc/ticket/8684#comment:29.

If you're interested in that, I can drop it into the code tree as official implementation documentation, as things get outdated quickly out-of-tree. Should I?

Well, if we're thinking about this as a way to context-switch away from (interruptible) FFI calls in the non-threaded RTS, then I think it's OK. We don't need this functionality in the threaded RTS because the scheduler can run other threads concurrently with the FFI call.

Agreed.

On the other hand it felt like something of a hack because there are just things that you fundamentally can't do with the non-threaded RTS, when you need concurrency with an FFI call that can't be made interruptible (because it's in some third-party library, and can't be made to interact with our interruptible support). So this feels like a partial solution.

Yes, if there's a library that doesn't have a mechanism to return to us, all bets are off. But I don't consider that bad or special: Such a library could similarly not be integrated correctly with cooperative threading in other languages' runtimes like nodejs or cpython, and would block those as well, so I think offering a way to do it right is as much as we can do from the Haskell side.

But OK, I don't object to doing this, and it's fairly easy to document in terms of context-switching.

Excellent. I've given a shot at writing it up in terms of context-switching here: https://github.com/nh2/ghc/blob/c33615eaddb0d36c0687ee81f17ab9d13b2a1908/rts/Timer.c#L51-L92.

Will push it into this diff once I've solved the problem below:

I think the fact that the timer signal causes system calls to return EINTR on Unix is an unintentional side-effect, not something we want to rely on. If we could avoid doing that, we would (and we did with the threaded RTS). So arguably we *should* be doing interruptOSThread() on Unix when the context switch timer goes off, but there I'm slightly worried about having lots of EPIPE signals being thrown in case that has any unintended side effects.

OK, all of that makes sense to me.

I'm hitting one problem though trying to call interruptOSThread(OSThreadId id) on Windows from within if (ticks_to_ctxt_switch <= 0) { in addition to my SetEvent():

Which OSThreadId id can I pass?

Surely in non-threaded there will only be one (the current one, I suspect that is the right one even though we're in an interrupt handler), but what is the correct way to obtain its number?

The non-threaded RTS can provide lower latency and (probably as a consequence) better performance with some IO/networking workloads

I would consider that a bug in the threaded RTS, or possibly a bug in the program (using both bound and unbound threads is a common cause of accidental slowdown with -threaded).

I would also prefer if the threaded RTS was strictly better than the non-threaded one in all cases, but getting there is a project of a different scope than the couple of fixes I provide here.

There might also be cases where the threaded RTS cannot be used at all, for example I recently learned that HaLVM depends on the non-threaded RTS.

nh2 added a comment.Jan 6 2018, 8:46 PM
In D42#119712, @nh2 wrote:

Which OSThreadId id can I pass? [...] what is the correct way to obtain its number?

Ah, I think I got it. As I wrote up here, CreateTimerQueueTimer() with WT_EXECUTEINTIMERTHREAD will have the timer callback function run in a separate thread. So we can't simply call GetCurrentThreadId() there.

Instead, we have to capture the thread ID of the (single, main) Haskell thread somewhere before outside that callback.

So I'm thinking of, #ifdef'd to Windows,

  • Adding a global OSThreadId mainThreadId = 0; in Task.c and a corresponding extern in Task.h
  • Assigning mainThreadId = GetCurrentThreadId(); in initTaskManager()
  • Then we can interruptOSThread(mainThreadId); in the context switch logic.
nh2 added a comment.Jan 6 2018, 9:02 PM

There's another potential issue in current GHC I just found: I wrote:

// For the non-threaded RTS on Unix (nb: where we don't use `timerfd`
// because we use `timerfd` only in -threaded), that enforcing happens
// automatically as a side effect of the timer signal:
// The timer signal is a POSIX signal here, and POSIX signals interrupt
// blocking syscalls on Unix (they return -1 and set EINTR).

But according to what I found in https://ghc.haskell.org/trac/ghc/ticket/8684#comment:29, this isn't true for all Unixes: It's true for Linux (where we use a pthread only if we're in threaded), but not for Darwin and iOS. Those use a pthread with a usleep() loop even for non-threaded.

So I suspect this means that context switch timers do not currently interrupt blocking foreign calls on Darwin and iOS in non-threaded, and that the same approach as I propose for Windows above (interrupting the main thread, in this case with interruptOSThread(), which here is pthread_kill()) would fix this.

nh2 updated this revision to Diff 15058.Jan 9 2018, 3:42 PM
nh2 edited the summary of this revision. (Show Details)

Implemented context-switching in non-threaded on non-Linux

I definitely support bringing in those docs somewhere (with the fix I just added to https://ghc.haskell.org/trac/ghc/ticket/10840).

I still feel slightly weird about doing kill(pid(),EPIPE) at 50Hz or whatever the context switch interval is set to. Maybe it's OK.

nh2 added a comment.Apr 28 2018, 3:02 PM

I still feel slightly weird about doing kill(pid(),EPIPE) at 50Hz or whatever the context switch interval is set to. Maybe it's OK.

@simonmar I thought about this some more and I think I got convinced by you.

As we know, it's good practice in C, even in libraries, to loop around EINTR

It is normal for libraries to do the loop themselves and not return the EINTR all the way up (e.g. up to Haskell).

So while I can fix code that's directly part of GHC (such as fdReady()) to return all the way up to Haskell, in general that approach won't work for all C libraries that users may want to bind to, and so the problem of foreign calls blocking the timer signal will remain.

Let me suggest a different path then:

The big problem with the non-threaded RTS is that it works _differently_ across the various operating systems. On Linux, due to the timer signal waking up syscalls, some Haskell programs are magically more interruptible than they are on, say, OSX.

As far as I can tell, on OSX, even when using the non-threaded RTS, GHC uses two threads: One for the timer signal, and one for everything else. (I found this a bit surprising given that it's called "non-threaded RTS", but the approach is still sensible and I guess it should just be clearly documented.)

So how about the following:

Instead of making non-threaded on OSX and Windows work the same way as it works on Linux now (with a 50Hz signal), we make the Linux-non-threaded RTS work the same as it does on OSX (1 thread for the timer, 1 for everything else). Then we have consistency across operating systems and a program you write on Linux nonthreaded will have similar interruptibility on OSX.

If somebody wants to deploy to a platform that does not have pthreads (e.g. some embedded devices), then it is reasonable to assume that they have control over the whole stack including all C libraries, so they can change them to return EINTR all the way up to Haskell, and they can also subscribe to being spammed by a signal themselves at program startup.

What do you think about this?

One concern I have is that the pthread implementation of the timer, when it doesn't use timerfd() just does a usleep(), which means that it will be less accurate than the timerfd() version or the setitimer() verison, because it relies on the scheduler not descheduling the thread for long periods.

But as I think I mentioned before, I'm not all that worried about the non-threaded RTS because I think it's only useful in very specialised circumstances these days. (and if there are performance issues with the threaded RTS then we should fix those).

nh2 added a comment.Jun 21 2018, 12:12 PM

Note for myself:

One possible drawback of using a timer thread for non-threaded on Linux like OSX does _might_ be that fork()ing is more complicated with threads. Some specific Haskell programs that use forkProcess directly might rely on the fact that on Linux so far there was really only 1 thread. We may want to offer them a way at runtime to keep the old behaviour.

mpickering resigned from this revision.Thu, Nov 8, 4:26 PM