testsuite: Move echoing commands in make invocations to VERBOSE=5
ClosedPublic

Authored by rwbarton on Feb 15 2017, 7:03 PM.

Details

Summary

D2894 added a new verbosity level VERBOSE=4 to strip -s/--silent
flags from make invocations in test commands. This will probably
cause the test to fail of course, but is useful for seeing what
a test that's already failing is doing.

However there was already an undocumented meaning of VERBOSE=4,
added in commit cfeededf, that causes the results of performance
tests to be printed unconditionally (even when they are within the
expected range). nomeata's ghc builder uses these figures to
collect historical data on performance test figures. The new
meaning of VERBOSE=4 added in D2894 means that any test that uses
make now fails on the builder.

This commit moves the new behavior of D2894 to the level VERBOSE=5
so that nomeata's ghc builder again produces useful results on
failing tests. It also adds documentation for both settings.

Test Plan

did some manual testing

Diff Detail

Repository
rGHC Glasgow Haskell Compiler
Lint
Automatic diff as part of commit; lint not applicable.
Unit
Automatic diff as part of commit; unit tests not applicable.
rwbarton created this revision.Feb 15 2017, 7:03 PM
Phyx accepted this revision.Feb 15 2017, 7:24 PM

Lgtm

This revision is now accepted and ready to land.Feb 15 2017, 7:24 PM
nomeata requested changes to this revision.Feb 15 2017, 10:40 PM

I have doubts about the VERBOSE setting. If increasing the verbosity can make tests fail, then that is a harsh violation of the principle of least surprise!

Better introduce a new settings called DEBUG_MAKE or similar, with clear documentation that this might produce spurious failures.

(Or make sure that no extra tests fail with this settings.)

This revision now requires changes to proceed.Feb 15 2017, 10:40 PM
Phyx added a comment.Feb 16 2017, 2:26 AM

Passing --trace already makes a test fail, passing any flag to make that's used recursively can make the tests fail. I also don't see how getting the performance output in the output unconditionally is not making the test fail.

It's practically impossible to not make the tests fail when stripping away the silent flag. If it's a pre_cmd then the test is fine since the output of those are no recorded, the only issues are tests that invoke make in their actual test command.

I rather not have another environment variable to do another thing, as it's already hard to remember what to use for the various things. Personally if it's of concern I would just remove it. It was undocumented so no one knows it's there to begin with.

In D3141#92010, @Phyx wrote:

I rather not have another environment variable to do another thing, as it's already hard to remember what to use for the various things.

I think it is harder to remember what a certain number of verbosity means than learning about a self-explaining name.

Personally if it's of concern I would just remove it. It was undocumented so no one knows it's there to begin with.

Fine with me.

I've never used this echoing feature before (since I didn't know about it), but I have done the same thing manually, so it is useful. The best outcome would be if we could extract the echoed commands separately from the stdout of the test commands, so we could echo them without interfering with checking the test output.

But this Diff is simply about making the "unexpected failures" figure on perf.haskell.org useful again, in a minimally disruptive way. If @nomeata or someone else wants to remove the new VERBOSE=5 setting, feel free to do so in a separate Diff.

bgamari accepted this revision.Mar 2 2017, 3:37 PM

I am fine with this (although I do wish there were a comment describing what these various verbosity levels included).

This revision was automatically updated to reflect the committed changes.