The GHC Commentary - Just Syntax

The lexical and syntactic analysers for Haskell programs are located in fptools/ghc/compiler/parser/.

The Lexer

The lexer is a rather tedious piece of Haskell code contained in the module Lex. Its complexity stems partly from covering, in addition to Haskell 98, the whole range of GHC language extensions, and partly from its ability to analyse interface files as well as normal Haskell source. The lexer defines a parser monad P a, where a is the type of the result expected from a successful parse. More precisely, a result of type

data ParseResult a = POk PState a
                   | PFailed Message

is produced, where Message comes from ErrUtils (and is currently simply a synonym for SDoc).

The record type PState contains information such as the current source location, buffer state, contexts for layout processing, and whether Glasgow extensions are accepted (either due to -fglasgow-exts or due to reading an interface file). Most of the fields of PState store unboxed values; in fact, even the flag indicating whether Glasgow extensions are enabled is represented by an unboxed integer instead of by a Bool. My (= chak's) guess is that this is to avoid having to perform a case on a boxed value in the inner loop of the lexer.
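To make the shape of such a parser monad concrete, here is a minimal sketch in plain Haskell. It is not the code from Lex: the PState fields (input, srcLine, glaExts), the newtype wrapper around P, and the helpers failP and nextChar are all hypothetical simplifications, Message is assumed to be a plain String rather than SDoc, and all fields are boxed for readability.

```haskell
-- Simplified stand-ins; the real PState stores mostly unboxed values.
type Message = String          -- assumption: stands in for ErrUtils.Message

data PState = PState
  { input   :: String          -- remaining input (hypothetical field)
  , srcLine :: Int             -- current source line (hypothetical field)
  , glaExts :: Bool            -- Glasgow extensions enabled? (boxed here)
  }

data ParseResult a
  = POk PState a               -- success: new state plus result
  | PFailed Message            -- failure: error message only

-- Assumption: P is modelled as a state-passing function in a newtype.
newtype P a = P { unP :: PState -> ParseResult a }

instance Functor P where
  fmap f (P m) = P $ \s -> case m s of
    POk s' a    -> POk s' (f a)
    PFailed msg -> PFailed msg

instance Applicative P where
  pure a = P $ \s -> POk s a
  P mf <*> P ma = P $ \s -> case mf s of
    PFailed msg -> PFailed msg
    POk s' f    -> case ma s' of
      PFailed msg -> PFailed msg
      POk s'' a   -> POk s'' (f a)

instance Monad P where
  P m >>= k = P $ \s -> case m s of
    PFailed msg -> PFailed msg        -- failure short-circuits
    POk s' a    -> unP (k a) s'       -- success threads the new state

-- Abort the parse with a message.
failP :: Message -> P a
failP msg = P $ \_ -> PFailed msg

-- Consume one character, bumping the line counter on newlines.
nextChar :: P Char
nextChar = P $ \s -> case input s of
  []     -> PFailed "unexpected end of input"
  (c:cs) -> POk s { input   = cs
                  , srcLine = if c == '\n' then srcLine s + 1 else srcLine s
                  } c
```

Running unP (nextChar >> nextChar) on an initial PState threads the state through both steps; the first PFailed encountered is returned unchanged, which is why ParseResult needs no state in its failure case.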

The same lexer is used by the Haskell source parser, the Haskell interface parser, and the package configuration parser.

The Haskell Source Parser

The parser for Haskell source files is defined in the form of a parser specification for the parser generator Happy in the file Parser.y. The parser exports three entry points, for parsing entire modules (parseModule), individual statements (parseStmt), and individual identifiers (parseIdentifier), respectively. The last two are needed for GHCi. All three require a parser state (of type PState) and are invoked from HscMain.

Parsing of Haskell is a rather involved process. The most challenging features are probably the treatment of layout and of expressions that contain infix operators. The latter may be user-defined and so are not easily captured in a static syntax specification. Infix operators may also appear in patterns, for example on the left-hand sides of value definitions, and so GHC's parser treats patterns in the same way as expressions. In other words, as general expressions are a syntactic superset of patterns (ok, they nearly are), the parser simply attempts to parse a general expression in such positions. Afterwards, the generated parse tree is inspected to ensure that the accepted phrase indeed forms a legal pattern. This and similar checks are performed by the routines from ParseUtil. In some cases, these routines not only check for well-formedness but also transform the parse tree so that it fits the syntactic context in which it was parsed; this happens for patterns, which are transformed from a representation of type RdrNameHsExpr into a representation of type RdrNamePat.
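The idea of parsing an expression first and checking afterwards that it forms a pattern can be illustrated with a toy example. The types Expr and Pat and the functions checkPattern, checkConApp, and isConOp below are hypothetical and far simpler than RdrNameHsExpr, RdrNamePat, and the actual ParseUtil routines; the sketch only shows the check-and-convert step.

```haskell
-- Toy expression tree, standing in for the parser's output.
data Expr
  = EVar String            -- variable, e.g. x
  | ECon String            -- data constructor, e.g. Just
  | ELit Integer           -- integer literal
  | EApp Expr Expr         -- application
  | EOp Expr String Expr   -- infix operator application
  | ELam String Expr       -- lambda: legal as an expression only
  deriving (Eq, Show)

-- Toy pattern tree, the target of the conversion.
data Pat
  = PVar String
  | PLit Integer
  | PCon String [Pat]      -- constructor applied to argument patterns
  deriving (Eq, Show)

-- Constructor operators start with a colon, e.g. (:) or (:+:).
isConOp :: String -> Bool
isConOp (':' : _) = True
isConOp _         = False

-- Check that an expression is a legal pattern and convert it.
checkPattern :: Expr -> Either String Pat
checkPattern e = case e of
  EVar v   -> Right (PVar v)
  ELit n   -> Right (PLit n)
  ELam _ _ -> Left "lambda expression in pattern"
  EOp l op r
    | isConOp op -> do
        pl <- checkPattern l
        pr <- checkPattern r
        Right (PCon op [pl, pr])
    | otherwise  -> Left ("non-constructor operator in pattern: " ++ op)
  _        -> checkConApp e []

-- Unwind a chain of applications, which must be headed by a constructor.
checkConApp :: Expr -> [Pat] -> Either String Pat
checkConApp (ECon c)   args = Right (PCon c args)
checkConApp (EApp f x) args = do
  p <- checkPattern x
  checkConApp f (p : args)
checkConApp _ _ = Left "pattern must be headed by a data constructor"
```

With these definitions, the tree for x : xs converts to a constructor pattern, while the tree for x + y is rejected, even though both would be accepted by an expression grammar.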

The Haskell Interface Parser

The parser for interface files is also generated by Happy, from ParseIface.y. Its main routine parseIface is invoked from RnHiFiles.readIface.

The Package Configuration Parser

The parser for configuration files is by far the smallest of the three and is defined in ParsePkgConf.y. It exports loadPackageConfig, which is used by DriverState.readPackageConf.

Last modified: Wed Jan 16 00:30:14 EST 2002