Update Unicode tables to v. 12 of the standard
Closed, Public

Authored by ulysses4ever on Aug 14 2018, 5:10 AM.

Diff Detail

Repository: rGHC Glasgow Haskell Compiler
Lint: Automatic diff as part of commit; lint not applicable.
Unit: Automatic diff as part of commit; unit tests not applicable.
ulysses4ever created this revision. Aug 14 2018, 5:10 AM


Two tests are failing.

  1. T10412 (trac, phab): I'm confused here. That diff updated the script that generates the Unicode-related definitions, but didn't run the new script to actually produce those definitions. It also added a test that passed under the old definitions but fails under the new ones.

  2. unicode002: I still can't wrap my mind around this one. Basically, it generates a huge table with the results of 7 character tests for the first 6554 codepoints. I guess the way this test is maintained is simply to replace the whole reference table with every Unicode standard update (as was done, e.g., in this commit).
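To make the shape of such a reference table concrete, here is a minimal Haskell sketch of a generator in the same spirit. It is not the actual unicode002 source: the seven Data.Char predicates, the T/F row format, and the names `tableLine` and `table` are assumptions chosen for illustration.

```haskell
import Data.Char
  ( chr, isAlpha, isControl, isDigit, isLower, isPrint, isSpace, isUpper )

-- Assumed set of seven Data.Char predicates (hypothetical; the real
-- unicode002 test may check a different set or order).
predicates :: [Char -> Bool]
predicates = [isControl, isPrint, isSpace, isUpper, isLower, isAlpha, isDigit]

-- Render one row: the code point, a space, then one 'T'/'F' flag per
-- predicate. The format is hypothetical, chosen for illustration.
tableLine :: Int -> String
tableLine n = show n ++ " " ++ map flag predicates
  where
    c = chr n
    flag p = if p c then 'T' else 'F'

-- Build the whole table for code points [0 .. limit - 1].
table :: Int -> [String]
table limit = map tableLine [0 .. limit - 1]
```

Printing it is just `mapM_ putStrLn (table n)`. Because every row depends only on the Unicode tables compiled into base, regenerating those tables can change many rows at once, which is why accepting the new output wholesale is the practical way to keep a test like this up to date.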

My current guess is that T10412.hs should be either deleted or updated to match the current definitions, and that unicode002 should simply be updated.

thomie added subscribers: Azel, thomie.

@Azel: as the author of D4593, could you maybe review this and address @ulysses4ever's comments?

bgamari accepted this revision. Aug 17 2018, 12:20 PM

Thanks for doing this! Let's just update unicode002; to do so, simply run make -Ctestsuite accept TEST=unicode002. I'm also a bit perplexed by T10412; it would be nice to hear what @Azel says about this.

This revision is now accepted and ready to land. Aug 17 2018, 12:20 PM
  • Update the unicode002 test output according to Unicode v12
Azel added a comment. Aug 19 2018, 9:10 AM

Isn't libraries/base/cbits/WCsubst.c generated by libraries/base/cbits/ubconfc, though? That would make the latter the one to update, I think. (As I recall, libraries/base/cbits/WCsubst.c is regenerated when you build GHC, but I'd have to check.)

ulysses4ever updated the Trac tickets for this revision. Edited Aug 20 2018, 10:56 AM

@Azel WCsubst.c is indeed generated by ubconfc, but I don't think this happens on rebuild: according to Git history, WCsubst.c was last updated 4 years ago. Regenerating it manually from the current Unicode spec (v.12) fixes several bugs (Trac #5518, Trac #15525), which is the aim of this revision. We just need to decide what to do with T10412. /cc @bgamari
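For readers unfamiliar with how a generator like ubconfc turns the Unicode data into C tables, the core idea can be sketched in Haskell: read each record of UnicodeData.txt (semicolon-separated, with the hex code point in field 0 and the General_Category in field 2) and merge consecutive code points that share a category into ranges, the kind of compact table a C lookup can search. This is a hypothetical sketch, not a port of ubconfc: the names `splitOn`, `parseRecord`, and `toRanges` are made up here, and it ignores details a real generator must handle, such as the `<..., First>`/`<..., Last>` record pairs that denote large ranges, and the case-mapping fields.

```haskell
import Numeric (readHex)

-- Split a string on a separator character (local helper, no external libraries).
splitOn :: Char -> String -> [String]
splitOn sep s = case break (== sep) s of
  (field, [])       -> [field]
  (field, _ : rest) -> field : splitOn sep rest

-- Parse one UnicodeData.txt record into (code point, General_Category).
-- Field 0 is the hex code point, field 2 the two-letter category.
-- Hypothetical helper; returns Nothing for malformed lines.
parseRecord :: String -> Maybe (Int, String)
parseRecord line = case splitOn ';' line of
  (cp : _name : cat : _) -> case readHex cp of
    [(n, "")] -> Just (n, cat)
    _         -> Nothing
  _ -> Nothing

-- A run of consecutive code points sharing one category: (start, length, category).
type Range = (Int, Int, String)

-- Merge (code point, category) pairs, sorted by code point, into maximal ranges.
-- A gap in code points or a change of category starts a new range.
toRanges :: [(Int, String)] -> [Range]
toRanges = foldr step []
  where
    step (n, cat) ((start, len, cat') : rest)
      | n + 1 == start && cat == cat' = (n, len + 1, cat) : rest
    step (n, cat) acc = (n, 1, cat) : acc
```

A full pipeline would read the file with readFile, apply `mapMaybe parseRecord . lines` (mapMaybe is from Data.Maybe), and print the ranges as a C array; those steps are omitted here.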

bgamari requested changes to this revision. Aug 21 2018, 10:17 AM

@ulysses4ever, the new output of T10412 looks correct to me. I suspect the status quo is just the result of an oversight. Feel free to update the test output.

This revision now requires changes to proceed. Aug 21 2018, 10:17 AM

Also, do you think you could add an entry to libraries/base/Changelog.md and docs/users_guide/8.8.1-notes.rst describing the change?

  • Update T10412 reference output to reflect the current Unicode-related definitions.
  • Add notes describing the patch to base's changelog and the 8.8.1 release notes.

@bgamari Done. Please check that the changelog and release notes were updated reasonably.

bgamari accepted this revision. Aug 21 2018, 5:28 PM

Perfect. Thanks!

This revision is now accepted and ready to land. Aug 21 2018, 5:28 PM
Azel accepted this revision. Aug 21 2018, 9:32 PM

(Looking at it, WCsubst.c doesn't seem to be regenerated automatically, so thanks for catching that oversight. I suppose WCsubst.c isn't generated on build to ease building on Windows?)

This revision was automatically updated to reflect the committed changes.