Allow limiting the number of GC threads (+RTS -qn<n>)

This allows the GC to use fewer threads than the number of capabilities.
At each GC, we choose some of the capabilities to be "idle", which means
that the thread running on that capability (if any) will sleep for the
duration of the GC, and the remaining GC threads will do that
capability's share of the GC work. When choosing which capabilities to
idle, we prefer those that are already idle (if any).
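
The selection policy described above can be sketched roughly as follows.
This is an illustration only, not the code in the RTS: the function name,
its parameters, the treatment of a limit of 0 as "no limit", and the rule
that the capability requesting the GC always takes part are all
assumptions made for the example.

```c
#include <stdbool.h>
#include <stddef.h>

/*
 * Hypothetical sketch of choosing which capabilities sit out a GC.
 *
 * caps_idle[i]    true if capability i had no work when GC was requested
 * gc_idle[i]      output: true if capability i should sleep through the GC
 * n_caps          number of capabilities (the -N value)
 * max_gc_threads  limit on GC threads (the -qn value); 0 is treated here
 *                 as "no limit" (an assumption of this sketch)
 * initiator       capability that requested the GC; assumed to always
 *                 take part, must be < n_caps
 *
 * Returns the number of capabilities that take part in the GC.
 */
size_t choose_gc_idle(const bool *caps_idle, bool *gc_idle,
                      size_t n_caps, size_t max_gc_threads,
                      size_t initiator)
{
    size_t need_idle = 0;
    if (max_gc_threads > 0 && max_gc_threads < n_caps)
        need_idle = n_caps - max_gc_threads;

    for (size_t i = 0; i < n_caps; i++)
        gc_idle[i] = false;

    /* First pass: prefer capabilities that are already idle. */
    for (size_t i = 0; i < n_caps && need_idle > 0; i++) {
        if (i != initiator && caps_idle[i]) {
            gc_idle[i] = true;
            need_idle--;
        }
    }

    /* Second pass: still over the limit, so idle busy capabilities too. */
    for (size_t i = 0; i < n_caps && need_idle > 0; i++) {
        if (i != initiator && !gc_idle[i]) {
            gc_idle[i] = true;
            need_idle--;
        }
    }

    /* Count the capabilities that were not marked idle. */
    size_t active = 0;
    for (size_t i = 0; i < n_caps; i++)
        if (!gc_idle[i])
            active++;
    return active;
}
```

With four capabilities of which two are already idle and a limit of two
GC threads, this sketch idles exactly the two already-idle capabilities
and leaves the running ones to do the collection.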

The idea is that this helps in the following situation:

  • We want to use a large -N value so as to make use of hyperthreaded
    cores.
  • We use a large heap size, so GC is infrequent.
  • But we don't want to use all -N threads in the GC, because that
    many GC threads running at once thrash memory too much.

See docs for usage.

