Mark xmm6 as caller saved in the register allocator for windows.
ClosedPublic

Authored by AndreasK on Jan 28 2018, 3:57 AM.

Details

Diff Detail

Repository
rGHC Glasgow Haskell Compiler
Lint
Automatic diff as part of commit; lint not applicable.
Unit
Automatic diff as part of commit; unit tests not applicable.
AndreasK created this revision.Jan 28 2018, 3:57 AM
Phyx requested changes to this revision.Jan 28 2018, 6:52 AM
Phyx added a subscriber: Phyx.

This just feels like a hack, you're just arbitrarily forcing the spilling of xmm6 because the example just happened to use xmm6.

xmm6 through xmm15 are not caller saved registers at all, they're callee saved. https://docs.microsoft.com/en-us/cpp/build/register-usage for which StgRun already treats xmm6 as callee saved as it should. (though it is missing xmm7 through xmm15)

So this fix is wrong. It's pampering over the problem by arbitrarily violating the calling convention, and I suspect increasing register pressure will just introduce the same issue on another register.

That said, callClobberedRegs is wrong in that xmm4 and xmm5 and need to be removed. But I don't expect that to fix this issue.

This revision now requires changes to proceed.Jan 28 2018, 6:52 AM
AndreasK updated this revision to Diff 15251.Jan 28 2018, 7:10 AM

Mixed up caller/callee in the comment.

Phyx added a comment.Jan 28 2018, 7:13 AM

Right, the comment confused me, so that said xmm6 through xmm15 need to be filtered out.
and in StgRun xmm7 though xmm15 treated as callee saved for x64 windows.

I guess one question is why are we implicitly the list this way. Seems like a tricky way to specify the register ranges. We know what platforms windows is ... right?

AndreasK updated this revision to Diff 15252.Jan 28 2018, 8:00 AM
  • Mark xmm6 AND ABOVE as callee saves
  • Respect Win64 ABI (save xmm6-15) in StgRun.
Phyx added a comment.EditedJan 28 2018, 8:22 AM

Right thanks @AndreasK , I guess something for later is checking allFPArgRegs and allArgRegs and allIntArgRegs look weird to me.

allFPArgRegs would only be correct for _m128 types on x64, but then wrong for x86.
If allFPArgRegs is supposed to be general FP arguments (e.g. 32, 64 bit types) then it's still wrong since xmm0..xmm3 are used for argument passing https://docs.microsoft.com/en-us/cpp/build/parameter-passing.

allIntArgRegs is also wonkers since RCX, RDX, R8, R9 are used to pass integers on x64. I'm not sure about on x86 for that one.

Phyx accepted this revision.Jan 28 2018, 8:35 AM

Right, so it seems allFPArgRegs and allIntArgRegs aren't used on Windows, conversely allArgRegs is used only on Windows.
Taking a look at load_args_win this seems to do the right thing, it enforced that the total number of registers used for argument passing is 4, and the constraints on them by tupling the registers, so it keeps the ordering.

The rest are correct I believe. Thanks again @AndreasK!

This revision is now accepted and ready to land.Jan 28 2018, 8:35 AM
RyanGlScott requested changes to this revision.Jan 28 2018, 11:53 AM
RyanGlScott added a subscriber: RyanGlScott.

Make sure to include the test case from Trac #14619!

compiler/nativeGen/X86/Regs.hs
410–413

Include a link to https://docs.microsoft.com/en-us/cpp/build/register-usage for future reference as well.

This revision now requires changes to proceed.Jan 28 2018, 11:53 AM
AndreasK marked an inline comment as done.Jan 28 2018, 2:17 PM

Make sure to include the test case from Trac #14619!

The given repro test doesn't fail on HEAD so doesn't help at all to prevent regressions.

I don't object to adding it. But not sure if it's actually adding value.

AndreasK updated this revision to Diff 15254.Jan 28 2018, 2:18 PM
  • Expand comment

The given repro test doesn't fail on HEAD so doesn't help at all to prevent regressions.

True, it's not failing on HEAD. But if we were to backport this patch to older GHCs that don't happen to have the necessary sqrt instructions needed to sidestep this issue, then we'd be in murkier waters. Better to add a regression test just to be on the safe side in such a scenario.

AndreasK updated this revision to Diff 15256.Jan 28 2018, 2:52 PM
  • Add Testcase
AndreasK updated this revision to Diff 15257.Jan 28 2018, 2:54 PM
  • Update comment
RyanGlScott accepted this revision.Jan 28 2018, 3:17 PM

LGTM, aside from some minor suggestions.

testsuite/tests/codeGen/should_run/T14619.hs
7

s/The immitate/To imitate/

30

This use of unsafeCoerce seems dodgy... why not just realToFrac?

This revision is now accepted and ready to land.Jan 28 2018, 3:17 PM
simonpj accepted this revision.Jan 29 2018, 11:10 AM
simonpj added a subscriber: simonpj.

Well discovered! I'd love a few more signposts in the code, for those who follow in your footsteps.

includes/rts/Constants.h
125

How is this number determined? Why does it change? (Don't answer me here; put the answer in a comment or Note.)

rts/StgCRun.c
289–298

This has to line up with callClobberedRegisters perhaps? Say so!!

testsuite/tests/codeGen/should_run/T14619.hs
8

What does this test test? Maybe just say in a comment?

AndreasK marked 5 inline comments as done.Jan 29 2018, 4:10 PM
AndreasK updated this revision to Diff 15275.Jan 29 2018, 4:10 PM
  • Test case comment, remove unsafeCoerce
  • Expanded comments.
This revision was automatically updated to reflect the committed changes.