Fix a bug in 'alexInputPrevChar'

Authored by harpocrates on Oct 18 2017, 1:07 AM.



The lexer hacks around unicode by squishing any character into a 'Word8'
and then storing the actual character in its state. This happens at

That is all well and good, but we ought to be careful that the characters we
retrieve via 'alexInputPrevChar' also fit this convention.

In fact, Trac #13986 exposes nicely what can go wrong: the regex in the left
context of the type application rule uses the '$idchar' character set
which relies on the unicode hack. However, a left context corresponds
to a call to 'alexInputPrevChar', and we end up passing full-blown
unicode characters to '$idchar', despite it not being equipped to deal
with these.
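
To make the mismatch concrete, here is a minimal, self-contained sketch of the convention described above. It is not the code from the lexer: the names 'squish', 'Input', 'getByte', 'prevCharRaw', 'prevCharSquished' and 'isIdChar', as well as the stand-in byte values, are hypothetical and chosen only for illustration.

```haskell
import Data.Char (GeneralCategory (..), chr, generalCategory, isAlphaNum, isAscii, ord)
import Data.Word (Word8)

-- Hypothetical stand-in bytes, one per character class (illustrative values).
nonGraphicByte, upperByte, lowerByte, digitByte, otherByte :: Word8
nonGraphicByte = 0x00
upperByte = 0x01
lowerByte = 0x02
digitByte = 0x03
otherByte = 0x04

-- Squish any character into a byte: printable ASCII passes through,
-- non-ASCII characters are replaced by a representative of their class.
squish :: Char -> Word8
squish c
  | c <= '\x07' = nonGraphicByte -- keep real input from colliding with stand-ins
  | c <= '\x7f' = fromIntegral (ord c)
  | otherwise = case generalCategory c of
      UppercaseLetter -> upperByte
      LowercaseLetter -> lowerByte
      DecimalNumber -> digitByte
      _ -> otherByte

-- Simplified lexer input: the previously consumed character and the rest.
data Input = Input {prevCh :: Char, rest :: String}

-- The forward direction always sees squished bytes.
getByte :: Input -> Maybe (Word8, Input)
getByte (Input _ []) = Nothing
getByte (Input _ (c : cs)) = Just (squish c, Input c cs)

-- Inconsistent: hands the raw character to left-context predicates.
prevCharRaw :: Input -> Char
prevCharRaw = prevCh

-- Consistent: applies the same squishing before returning the character.
prevCharSquished :: Input -> Char
prevCharSquished = chr . fromIntegral . squish . prevCh

-- An '$idchar'-like predicate that only knows ASCII and the stand-in bytes.
isIdChar :: Char -> Bool
isIdChar c =
  isAscii c && (isAlphaNum c || c == '_' || c == '\'')
    || ord c `elem` map fromIntegral [upperByte, lowerByte, digitByte]
```

Under these assumptions, with a lowercase lambda ('\x3bb') as the previous character, 'isIdChar (prevCharRaw inp)' is False while 'isIdChar (prevCharSquished inp)' is True, so a left context built on such a predicate only behaves as intended once the previous character goes through the same squishing as the forward input.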

Test Plan

Added a regression test case

Diff Detail

rGHC Glasgow Haskell Compiler
Automatic diff as part of commit; lint not applicable.
Automatic diff as part of commit; unit tests not applicable.
harpocrates created this revision. Oct 18 2017, 1:07 AM

Hmm, tricky. Indeed this looks right, although it took a bit of staring to figure out what was going on.

@harpocrates, do you think you understand this code well enough to write a longer Note explaining what is happening here?

  • Add a Note explaining unicode handling in Alex
This revision is now accepted and ready to land. Oct 25 2017, 1:25 PM
This revision was automatically updated to reflect the committed changes.