When iconv is unavailable, use an ASCII encoding to encode ASCII
Closed, Public

Authored by rwbarton on Jul 20 2015, 6:01 PM.

Details

Summary

D898 and D1059 implemented a fallback behavior to handle the case
that the end user's iconv installation is broken (typically due to
running inside a chroot in which the necessary locale files and/or
gconv modules have not been installed). In this case, if the
program requests an ASCII locale, GHC's char8 encoding is used
rather than the program failing.

However, char8 silently mangles data, and doing that when the programmer
did not ask for it is poor behavior, for reasons described in D1059.

This commit implements an ASCII encoding and uses it in the fallback
case when iconv is unavailable and the user has requested ASCII.
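
To illustrate the difference in behavior, here is a minimal sketch of a
strict 7-bit ASCII codec next to a char8-style truncating encoder. It
works on plain lists rather than GHC's I/O buffers, and the names
encodeAscii, decodeAscii and encodeChar8 are hypothetical; this is not
the code added by this revision.

```haskell
import Data.Char (chr, ord)
import Data.Word (Word8)

-- Hypothetical helper: strict 7-bit ASCII encoder.
-- Returns the first offending character instead of mangling it.
encodeAscii :: String -> Either Char [Word8]
encodeAscii = traverse step
  where
    step c
      | ord c < 0x80 = Right (fromIntegral (ord c))
      | otherwise    = Left c

-- Hypothetical helper: strict 7-bit ASCII decoder.
-- Returns the first byte that is outside the ASCII range.
decodeAscii :: [Word8] -> Either Word8 String
decodeAscii = traverse step
  where
    step b
      | b < 0x80  = Right (chr (fromIntegral b))
      | otherwise = Left b

-- For contrast, a char8-style encoder: keeps only the low 8 bits of
-- each code point, so non-Latin-1 characters are silently altered.
encodeChar8 :: String -> [Word8]
encodeChar8 = map (fromIntegral . ord)
```

With these definitions, encodeAscii "abc" yields Right [97,98,99] and
encodeAscii "\x141" yields Left '\x141', whereas encodeChar8 "\x141"
quietly yields [0x41], the byte for 'A'.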

Test Plan

Added tests for the encodings defined in Latin1.
Also, manually ran a statically-linked executable of that test
in a chroot and the tests passed (up to the ones that call
mkTextEncoding "LATIN1", since there is no fallback from iconv
for that case yet).

Diff Detail

Repository
rGHC Glasgow Haskell Compiler
Lint
Automatic diff as part of commit; lint not applicable.
Unit
Automatic diff as part of commit; unit tests not applicable.
rwbarton updated this revision to Diff 3606. Jul 20 2015, 6:01 PM
rwbarton retitled this revision to "When iconv is unavailable, use an ASCII encoding to encode ASCII".
rwbarton updated this object.
rwbarton edited the test plan for this revision.
rwbarton added reviewers: bgamari, hsyl20.
rwbarton updated the Trac tickets for this revision.
hsyl20 accepted this revision. Jul 21 2015, 5:24 AM
hsyl20 edited edge metadata.

It looks good to me.

libraries/base/GHC/IO/Encoding.hs, line 265

The comment needs to be slightly modified.

This revision is now accepted and ready to land. Jul 21 2015, 5:24 AM
bgamari accepted this revision. Jul 21 2015, 5:43 AM
bgamari edited edge metadata.

Brilliant! This looks like a great improvement over the existing behavior. Thanks for the test.

However, might it be good to clarify specifically what is meant by "ASCII" (e.g. 7-bit ASCII)?

This revision was automatically updated to reflect the committed changes.