Option for LINE pragmas to get lexed into tokens

Authored by harpocrates on Jan 21 2018, 11:50 PM.



This adds a parser-level switch to have 'LINE' and 'COLUMN'
pragmas lexed into actual tokens (as opposed to updating the
position information in the parser).

'lexTokenStream' is the only place where this option is enabled.

Diff Detail

rGHC Glasgow Haskell Compiler
Automatic diff as part of commit; lint not applicable.
Automatic diff as part of commit; unit tests not applicable.
harpocrates created this revision. Jan 21 2018, 11:50 PM

Most third-party uses of GHC's lexer I've seen use getTokenStream/getRichTokenStream. It is known that this is somewhat problematic in the presence of CPP (see https://ghc.haskell.org/trac/ghc/ticket/8265), but there are reliable workarounds (again see the thread). I recently stumbled on something much nastier: Haskell sources that use {-# LINE ... #-} and {-# COLUMN ... #-} pragmas will send getRichTokenStream into an infinite loop.

The problem is that when lexTokenStream (which is behind both getTokenStream and getRichTokenStream) encounters {-# LINE ... #-} or {-# COLUMN ... #-} pragmas, it swallows that source and silently changes the position in the parser state instead of emitting pragma tokens. However, addSourceToTokens (used by getRichTokenStream) crucially relies on the position information of tokens, so it fails rather badly when given a token stream with LINE/COLUMN pragmas. This diff adds an option to the parser to lex these pragmas as actual tokens and enables this option in lexTokenStream.
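To make the mismatch concrete, here is a minimal toy model, not GHC's lexer. Every name in it (Tok, Located, lexLines, attachSource, the asTokens flag) is hypothetical, and the "lexer" just treats each line as one token. It only illustrates how reported positions drift away from the buffer once a LINE pragma is swallowed; it does not reproduce the infinite loop.

```haskell
-- Toy model only: all names here are hypothetical stand-ins, not GHC's API.

-- | A toy token: an ordinary one-line token, or a LINE pragma kept as a token.
data Tok = Word String | LinePrag Int
  deriving (Eq, Show)

-- | A token tagged with the line number the lexer reports for it.
data Located = Located { locLine :: Int, locTok :: Tok }
  deriving (Eq, Show)

-- | Recognise a whole line of the form "{-# LINE n #-}" (toy syntax).
parseLinePragma :: String -> Maybe Int
parseLinePragma s = case words s of
  ["{-#", "LINE", n, "#-}"] | [(k, "")] <- reads n -> Just k
  _ -> Nothing

-- | Lex one token per line. With asTokens = False the pragma is swallowed
-- and resets the reported line; with asTokens = True it is emitted as a
-- token and reported lines stay equal to physical lines.
lexLines :: Bool -> String -> [Located]
lexLines asTokens = go 1 1 . lines
  where
    -- phys: physical line in the buffer; rep: line reported to clients
    go _ _ [] = []
    go phys rep (l : ls) = case parseLinePragma l of
      Just n
        | asTokens  -> Located phys (LinePrag n) : go (phys + 1) (rep + 1) ls
        | otherwise -> go (phys + 1) n ls
      Nothing -> Located rep (Word l) : go (phys + 1) (rep + 1) ls

-- | Pair each token with the buffer line its reported position points at.
attachSource :: String -> [Located] -> [(Located, Maybe String)]
attachSource src = map (\t -> (t, lookupLine (locLine t)))
  where
    ls = lines src
    lookupLine n
      | n >= 1 && n <= length ls = Just (ls !! (n - 1))
      | otherwise = Nothing

-- | A three-line buffer with a LINE pragma in the middle.
example :: String
example = "a\n{-# LINE 100 #-}\nb\n"
```

In this model, attachSource example (lexLines False example) cannot find source for the token after the pragma, because it is reported at line 100 of a three-line buffer. With lexLines True, every token, including the pragma itself, maps back to its own line.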

This would fix https://github.com/haskell/haddock/issues/730 in Haddock.

alanz added a subscriber: alanz. Jan 22 2018, 5:20 AM
mpickering requested changes to this revision. Jan 22 2018, 5:22 AM

There is something going slightly wrong. The comment on use_pos_prags doesn't seem to match the implementation?


These should probably be ITline_prag SourceText like other pragmas to account for {-# line ... #-}.
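A rough sketch of what this suggestion amounts to, using local stand-in types rather than GHC's own definitions: the token carries the pragma keyword exactly as written, so a consumer can reproduce {-# line ... #-} and {-# LINE ... #-} faithfully. The SourceText newtype, the Token type, and classify below are assumptions made for illustration only.

```haskell
-- Stand-in types for illustration; these are not GHC's definitions.
import Data.Char (toLower)

-- | The keyword text exactly as it appeared in the source.
newtype SourceText = SourceText String
  deriving (Eq, Show)

-- | Pragma tokens that remember their original spelling.
data Token
  = ITline_prag SourceText
  | ITcolumn_prag SourceText
  | ITother String
  deriving (Eq, Show)

-- | Match the keyword case-insensitively, but keep the original spelling.
classify :: String -> Token
classify kw = case map toLower kw of
  "line"   -> ITline_prag (SourceText kw)
  "column" -> ITcolumn_prag (SourceText kw)
  _        -> ITother kw
```

Here classify "line" and classify "LINE" both produce an ITline_prag token, and each keeps its own casing in the SourceText payload.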


Shouldn't this be False? According to the comment

-- If this is enabled, '{-# LINE ... #-}' and '{-# COLUMN ... #-}'
-- pragmas are lexed as tokens. Otherwise, they update the 'loc' field.

So it should be disabled by default?


Likewise, this should be True?

This revision now requires changes to proceed. Jan 22 2018, 5:22 AM
  • Update comment...
harpocrates marked 3 inline comments as done. Jan 22 2018, 6:54 AM

There is something going slightly wrong. The comment on use_pos_prags doesn't seem to match the implementation?

@mpickering Thanks for catching that! The comment on use_pos_prags was incorrect (which is very inconvenient given all the other comments referring to it).

mpickering accepted this revision. Jan 22 2018, 6:56 AM

Perhaps using a new datatype would be preferable, but it's fine without.

This revision is now accepted and ready to land. Jan 22 2018, 6:56 AM
  • Carelessly missed definition
alexbiehl added inline comments. Jan 22 2018, 7:03 AM

It seems you are missing something like 'let !src = lexemeToString buf len' somewhere along these lines.


ha, sorry I was too slow


..here too

This revision was automatically updated to reflect the committed changes.