polyparse

What is polyparse?
How do I use it?
Downloads

What is polyparse?

polyparse is a collection of parser combinator libraries in Haskell. It is distributed as a single package, but you are likely to use only one of the included modules at a time: they have almost identical APIs and are, in general, alternatives to each other.

Polyparsers are both applicative and monadic. The applicative style is most useful with context-free grammars, since it avoids the need to name intermediate structures and stays very close to a BNF specification. For a context-sensitive grammar, the monadic style is needed: preserving some context means binding it to a name.
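To make the two styles concrete, here is a minimal sketch built on a toy parser type P, defined on the spot. P, satisfy, char, digit, pair and counted are hypothetical names for this illustration only, not polyparse's API. The point is that pair reads almost like its BNF rule, while counted has to bind the digit n before it knows how many characters to consume.

```haskell
import Control.Applicative (Alternative (..))
import Control.Monad (ap, liftM, replicateM)
import Data.Char (digitToInt, isDigit)

-- A toy backtracking parser over tokens of type t (hypothetical; it only
-- illustrates the two styles and is not polyparse's Parser type).
newtype P t a = P { runP :: [t] -> Maybe (a, [t]) }

instance Functor (P t) where
  fmap = liftM

instance Applicative (P t) where
  pure x = P $ \ts -> Just (x, ts)
  (<*>) = ap

instance Monad (P t) where
  P p >>= k = P $ \ts -> case p ts of
    Nothing       -> Nothing
    Just (a, ts') -> runP (k a) ts'

instance Alternative (P t) where
  empty = P (const Nothing)
  -- Try p; if it fails, run q on the same (unconsumed) input.
  P p <|> P q = P $ \ts -> p ts <|> q ts

-- Accept one token satisfying a predicate.
satisfy :: (t -> Bool) -> P t t
satisfy ok = P $ \ts -> case ts of
  (t : rest) | ok t -> Just (t, rest)
  _                 -> Nothing

char :: Char -> P Char Char
char c = satisfy (== c)

digit :: P Char Int
digit = digitToInt <$> satisfy isDigit

-- Applicative, context-free:   pair ::= '(' digit ',' digit ')'
pair :: P Char (Int, Int)
pair = (,) <$ char '(' <*> digit <* char ',' <*> digit <* char ')'

-- Monadic, context-sensitive: the digit n decides how many characters
-- follow, so it has to be bound to a name first.
counted :: P Char String
counted = do
  n <- digit
  replicateM n (satisfy (const True))
```

With this sketch, runP pair "(1,2)" gives Just ((1,2), ""), and runP counted "3abcde" gives Just ("abc", "de").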

The name polyparse comes from the polymorphic ability to feed any interesting type as input to the parsers, not just a list of characters. It does not even need to be a list of tokens - a ByteString, or tree structure, will do just as well. For instance, the HaXml package uses polyparsers at three different levels: (1) taking an unlexed string and parsing it to an XPath query; (2) in conjunction with a hand-written lexer, parsing a list of tokens to a generic XML tree structure; and (3) taking the generic XML tree structure as input and further parsing it into typed Haskell values.
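As a small illustration of parsing something other than characters, the sketch below pairs a hand-written lexer with a parser over the resulting tokens. The Tok type, lexer, and the toy parser type P are hypothetical and defined here only for the example; the relevant idea is that P is parameterised by its token type t, so the same satisfy works for Char input and for Tok input alike.

```haskell
import Control.Applicative (Alternative (..))
import Control.Monad (ap, liftM)
import Data.Char (isDigit, isSpace)

-- Hypothetical token type produced by a hand-written lexer.
data Tok = TNum Int | TPlus | TLParen | TRParen
  deriving (Eq, Show)

-- Hand-written lexer: String -> [Tok] (no error recovery, for brevity).
lexer :: String -> [Tok]
lexer [] = []
lexer (c : cs)
  | isSpace c = lexer cs
  | c == '+'  = TPlus : lexer cs
  | c == '('  = TLParen : lexer cs
  | c == ')'  = TRParen : lexer cs
  | isDigit c = let (ds, rest) = span isDigit (c : cs)
                in TNum (read ds) : lexer rest
  | otherwise = error ("lexer: unexpected character " ++ show c)

-- Toy backtracking parser, polymorphic in the token type t.
newtype P t a = P { runP :: [t] -> Maybe (a, [t]) }

instance Functor (P t) where
  fmap = liftM

instance Applicative (P t) where
  pure x = P $ \ts -> Just (x, ts)
  (<*>) = ap

instance Monad (P t) where
  P p >>= k = P $ \ts -> case p ts of
    Nothing       -> Nothing
    Just (a, ts') -> runP (k a) ts'

instance Alternative (P t) where
  empty = P (const Nothing)
  P p <|> P q = P $ \ts -> p ts <|> q ts

satisfy :: (t -> Bool) -> P t t
satisfy ok = P $ \ts -> case ts of
  (t : rest) | ok t -> Just (t, rest)
  _                 -> Nothing

-- Match one specific token.
tok :: Eq t => t -> P t t
tok t = satisfy (== t)

-- Match a number token and return its value.
num :: P Tok Int
num = P $ \ts -> case ts of
  (TNum n : rest) -> Just (n, rest)
  _               -> Nothing

-- expr ::= term ('+' term)*      term ::= NUM | '(' expr ')'
expr :: P Tok Int
expr = foldl (+) <$> term <*> many (tok TPlus *> term)

term :: P Tok Int
term = num <|> (tok TLParen *> expr <* tok TRParen)
```

For example, runP expr (lexer "1 + (2 + 3) + 4") evaluates to Just (10, []). Splitting the work this way keeps whitespace and number syntax out of the grammar rules entirely.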

In addition to its fully polymorphic parsers, polyparse also includes a replacement for the Haskell standard Read class: Text.Parse takes strings as input and parses values of Haskell datatypes, matching the format generated by the standard Show class. This includes, for instance, parsers for different kinds of number (Ints, Doubles, hex), character and string literals (dealing with escapes), and support for constructors, named fields, infix operators, and so on. The benefits of using Text.Parse over Read are that it is much more efficient, and that you get decent error messages when a parse fails.
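The essential difference from Read is the result type. The sketch below is not the Text.Parse API (whose Parse class is built on the Poly combinators); it is a hypothetical, hand-rolled class ParseEither, with an equally hypothetical method parsePrefix and helper parseAll, that only shows the idea: a failure comes back as a Left carrying a message, instead of crashing the program the way read does.

```haskell
import Data.Char (isDigit, isSpace)
import Data.List (isPrefixOf)

-- Hypothetical Parse-style class: parsing reports an explicit error
-- through Either rather than raising a runtime exception like 'read'.
class ParseEither a where
  -- Parse a value from the front of the input, returning the rest.
  parsePrefix :: String -> Either String (a, String)

data Colour = Red | Green | Blue
  deriving (Eq, Show)

instance ParseEither Colour where
  parsePrefix s0 = go [("Red", Red), ("Green", Green), ("Blue", Blue)]
    where
      s = dropWhile isSpace s0
      go [] = Left ("expected a Colour constructor, found "
                    ++ show (takeWhile (not . isSpace) s))
      go ((name, c) : more)
        | name `isPrefixOf` s = Right (c, drop (length name) s)
        | otherwise           = go more

-- Natural numbers only, for brevity.
instance ParseEither Int where
  parsePrefix s0 = case span isDigit (dropWhile isSpace s0) of
    ("", rest) -> Left ("expected an Int, found " ++ show (take 10 rest))
    (ds, rest) -> Right (read ds, rest)

-- Parse the whole input, rejecting trailing junk with a message.
parseAll :: ParseEither a => String -> Either String a
parseAll s = do
  (v, rest) <- parsePrefix s
  if all isSpace rest
    then Right v
    else Left ("unexpected trailing input: " ++ show rest)
```

Here parseAll "Purple" :: Either String Colour yields Left "expected a Colour constructor, found \"Purple\"", whereas read on a bad string would simply abort the program.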

If you have only ever used the parsec combinators before, then you might be pleasantly surprised by polyparse: all of the same functionality is available, but it removes the confusion that all too commonly arises from a failure to use parsec's try combinator correctly. Ambiguous grammars often fail to be compositional in parsec, and it can be a black art guessing where to introduce a try to fix it. In contrast, polyparsers are by default fully compositional. It is possible to improve their efficiency (and the accuracy of error messages) by inserting commits (which are the dual of try), but it is not necessary for writing a correct parser, and furthermore, it is usually obvious where it can be beneficial to add a commit.

Text.Parse: The Read class (module Text.Read) from Haskell'98 is widely recognised to have many problems. It is inefficient. If a read fails, there is no indication of why. Worst of all, a read failure crashes your whole program! Text.Parse is a proposed replacement for the Read class. It defines a new class, Parse, with methods that return an explicit notification of errors through the Either type. It also defines a number of useful helper functions to enable the construction of parsers for textual representations of Haskell data structures, e.g. named fields. Unsurprisingly, Text.Parse is really just a specialisation of the Poly combinators for String input, and the entire Poly API is also re-exported. The DrIFT tool can derive instances of the Parse class for you automatically (use the syntax {-! derive : Parse !-}).
Text.Parse.ByteString is a variant of Text.Parse whose input is a ByteString instead of a String.
Text.ParserCombinators.Poly: currently re-exports Text.ParserCombinators.Poly.Plain. The name Poly comes from the arbitrary (polymorphic) token type: you can write your own lexer if you wish, rather than needing to encode lexical analysis within the parser itself.
Text.ParserCombinators.Poly.Plain: This is a fresh set of combinators, improving on the HuttonMeijer variety by keeping only a single success, not a list of them. This is more space-efficient, whilst still permitting backtracking. Error-handling is also much improved: there are essentially two kinds of failure, soft and hard. Soft failure just means that the current parse did not work out, but another parse might be OK. Hard failure means that no parse will succeed, because we have already passed a point of commitment. Thus you can give far more accurate error messages to the user, including multi-layered locations.
Text.ParserCombinators.Poly.State is just like Poly, except it adds an arbitrary running state parameter.
Text.ParserCombinators.Poly.Lazy is just like Poly, except it does not return explicit failures. Instead, an exception is raised. Thus it can return a partial result before the full parse is complete. The word partial indicates that, having committed to return some outer data constructor, we might later discover some parse error further inside, so the value will be partial, as in incomplete: containing bottom. However, if you are confident that the input is error-free, then you will gain hugely in space-efficiency: essentially, you can stream-process your parsed data structure within very small constant space. This is especially useful for large structures such as XML trees.
Text.ParserCombinators.Poly.StateLazy combines Poly.State and Poly.Lazy.
Text.ParserCombinators.Poly.Base
Text.ParserCombinators.Poly.Result
Text.ParserCombinators.Poly.Parser
Text.ParserCombinators.Poly.StateParser: These modules are internal details. All of the Poly variations (Plain, Lazy, State, etc.) have a lot in common: many of the combinators are indeed implemented identically. To reduce code duplication in the library, we provide a class-based interface here. The datatype implementations for strict, lazy, and so on, are more-or-less identical, but with separate instances of the classes, defined in modules at the same level in the hierarchy (e.g. Text.ParserCombinators.Poly.Lazy). Every individual variation re-exports the base combinators and base types, so there should be no need to import these modules directly.
Text.ParserCombinators.Poly.ByteString
Text.ParserCombinators.Poly.ByteStringChar are specialised versions of the Poly parser for ByteString input only.
Text.ParserCombinators.Poly.Text
Text.ParserCombinators.Poly.StateText are specialised versions of the Poly parser for Data.Text input only.
Text.ParserCombinators.Poly.Lex is a specialised version of the Poly parser for iteratee-style tokeniser input.
Text.ParserCombinators.HuttonMeijer: The most venerable of all monadic parser combinator libraries, this version dates from 1996. It was originally distributed with Gofer, then Hugs, as ParseLib. It uses the "list of successes" technique: failure is an empty list, and backtracking yields multiple possible parses as a longer list. (But in practice, almost nobody wants any parse except the first complete one.)
Text.ParserCombinators.HuttonMeijerWallace: The Hutton/Meijer combinators, extended to take an arbitrary token type as input (not just characters), plus a running state (e.g. to collect a symbol table, or macros), plus some facilities for simple error-reporting.
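A running state of the kind offered by Poly.State and HuttonMeijerWallace can be threaded through a parser alongside the input. The following is a hypothetical, self-contained sketch: the SP type, getState and putState are invented for illustration, and the real modules provide their own state accessors. The state here acts as a symbol table, numbering each distinct identifier the first time it is seen.

```haskell
import Control.Applicative (Alternative (..))
import Control.Monad (ap, liftM)
import Data.Char (isAlpha, isSpace)

-- Hypothetical parser with a running state s alongside the token stream.
newtype SP s t a = SP { runSP :: s -> [t] -> Maybe (a, s, [t]) }

instance Functor (SP s t) where
  fmap = liftM

instance Applicative (SP s t) where
  pure x = SP $ \s ts -> Just (x, s, ts)
  (<*>) = ap

instance Monad (SP s t) where
  SP p >>= k = SP $ \s ts -> case p s ts of
    Nothing           -> Nothing
    Just (a, s', ts') -> runSP (k a) s' ts'

instance Alternative (SP s t) where
  empty = SP $ \_ _ -> Nothing
  -- q starts from the original state and input, so backtracking
  -- also discards any state changes made by a failed p.
  SP p <|> SP q = SP $ \s ts -> p s ts <|> q s ts

getState :: SP s t s
getState = SP $ \s ts -> Just (s, s, ts)

putState :: s -> SP s t ()
putState s = SP $ \_ ts -> Just ((), s, ts)

satisfy :: (t -> Bool) -> SP s t t
satisfy ok = SP $ \s ts -> case ts of
  (t : rest) | ok t -> Just (t, s, rest)
  _                 -> Nothing

-- An identifier followed by optional spaces.
ident :: SP s Char String
ident = some (satisfy isAlpha) <* many (satisfy isSpace)

-- Look the identifier up in the symbol table held in the state,
-- adding it with a fresh number if it has not been seen before.
intern :: SP [(String, Int)] Char Int
intern = do
  name  <- ident
  table <- getState
  case lookup name table of
    Just n  -> pure n
    Nothing -> do
      let n = length table
      putState ((name, n) : table)
      pure n

internAll :: SP [(String, Int)] Char [Int]
internAll = many intern
```

Running runSP internAll [] "a b a c b" gives Just ([0,1,0,2,1], [("c",2),("b",1),("a",0)], ""): the final symbol table comes back alongside the result.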

All the Poly* variations share the same basic API, so when you discover you need an extra facility, you can usually switch from one set to another just by changing a single import.
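The incremental behaviour of Poly.Lazy, described in the module list above, can be illustrated in a few lines. This is a hypothetical sketch, not polyparse's implementation: the parser type L has no failure case at all, so a parse error is raised with error, and it only surfaces when the affected part of the result is demanded. The names L, satisfyL, digitL and digitsUntilSemi are invented for the example.

```haskell
import Data.Char (digitToInt, isDigit)

-- Hypothetical lazy parser: no explicit failure in the result type.
newtype L t a = L { runL :: [t] -> (a, [t]) }

instance Functor (L t) where
  fmap f (L p) = L $ \ts -> let (a, ts') = p ts in (f a, ts')

-- Lazy sequencing: the result pair is built before either part is
-- evaluated, so outer constructors are available early.
instance Applicative (L t) where
  pure x = L $ \ts -> (x, ts)
  L pf <*> L pa = L $ \ts ->
    let (f, ts')  = pf ts
        (a, ts'') = pa ts'
    in (f a, ts'')

-- Consume one token satisfying the predicate, or raise an exception.
satisfyL :: String -> (t -> Bool) -> L t t
satisfyL what ok = L $ \ts -> case ts of
  (t : rest) | ok t -> (t, rest)
  _                 -> error ("parse error: expected " ++ what)

digitL :: L Char Int
digitL = digitToInt <$> satisfyL "a digit" isDigit

-- Digits up to a terminating ';'. Each cons cell of the result is
-- produced before the rest of the input has been examined.
digitsUntilSemi :: L Char [Int]
digitsUntilSemi = L $ \ts -> case ts of
  (';' : rest) -> ([], rest)
  _            -> runL ((:) <$> digitL <*> digitsUntilSemi) ts
```

Because each cons cell is returned before the rest of the input is examined, take 3 (fst (runL digitsUntilSemi (cycle "123"))) works even on that infinite input, and take 2 on the input "12x" succeeds even though the third character is a parse error; demanding the third element would raise the exception.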

How do I use it?

Detailed documentation of the polyparse APIs is generated automatically by Haddock directly from the source code.

In general, you can just add an import of the relevant module to your source code, and everything should compile. However, if the package is not 'exposed' (in ghc-pkg terminology), then you might need to pass the command-line option -package polyparse at compile time.

The original Hutton/Meijer combinators are described in a very nice tutorial tech report: NOTTCS-TR-96-4

I wrote some motivation for Text.Parse (including simple examples) on my blog a while back. Here is the posting.

If you are familiar with the Parsec library, then the key insight for using PolyParse is that the two libraries' approaches to backtracking are duals of each other. In Parsec, you must explicitly add a try combinator at any location where backtracking might be necessary. Users often find this a bit of a black art. In PolyParse, by contrast, all parsers are backtracking unless you explicitly add a commit (or one of its variations). It is usually easy to tell where to add a commit point: it goes wherever you have already parsed enough of a data structure to know that only one outcome is possible. For instance, if you are parsing a Haskell value produced by 'show', then as soon as you have parsed the initial constructor, you know that no other constructor of that datatype is possible, so you can commit to returning it.
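The effect of a commit on error messages can be seen in a small self-contained sketch. The Result and P types, word, digit and commit below are hypothetical re-implementations written only to illustrate the idea of soft versus hard failure; they are not polyparse's code, although commit here plays the same role as polyparse's commit. The example parses a show-style Maybe value and commits as soon as the constructor "Just " has been seen.

```haskell
import Control.Applicative (Alternative (..))
import Control.Monad (ap, liftM)
import Data.Char (digitToInt, isDigit)
import Data.List (isPrefixOf)

-- Soft failure: this alternative did not work, try the next one.
-- Hard failure: a commit point was passed, so stop and report.
data Result t a = Ok a [t] | Soft String | Hard String
  deriving (Eq, Show)

newtype P t a = P { runP :: [t] -> Result t a }

instance Functor (P t) where
  fmap = liftM

instance Applicative (P t) where
  pure x = P (Ok x)
  (<*>) = ap

instance Monad (P t) where
  P p >>= k = P $ \ts -> case p ts of
    Ok a ts' -> runP (k a) ts'
    Soft e   -> Soft e
    Hard e   -> Hard e

instance Alternative (P t) where
  empty = P (const (Soft "no alternative matched"))
  -- Backtrack into q only if p failed softly; a hard failure propagates.
  P p <|> P q = P $ \ts -> case p ts of
    Soft _ -> q ts
    r      -> r

-- Turn any soft failure inside p into a hard one.
commit :: P t a -> P t a
commit (P p) = P $ \ts -> case p ts of
  Soft e -> Hard e
  r      -> r

word :: String -> P Char String
word w = P $ \ts ->
  if w `isPrefixOf` ts
    then Ok w (drop (length w) ts)
    else Soft ("expected " ++ show w)

digit :: P Char Int
digit = P $ \ts -> case ts of
  (c : rest) | isDigit c -> Ok (digitToInt c) rest
  _                      -> Soft "expected a digit"

-- Once "Just " has been seen, no other constructor is possible: commit.
maybeDigit :: P Char (Maybe Int)
maybeDigit = (word "Just " *> commit (Just <$> digit))
         <|> (Nothing <$ word "Nothing")

-- The same grammar without commit, for comparison.
maybeDigitNoCommit :: P Char (Maybe Int)
maybeDigitNoCommit = (word "Just " *> (Just <$> digit))
                 <|> (Nothing <$ word "Nothing")
```

On the input "Just x", maybeDigit reports Hard "expected a digit", which points at the real problem, whereas maybeDigitNoCommit backtracks into the other branch and reports Soft "expected \"Nothing\"". Both parsers accept the same valid inputs; the commit only changes efficiency and the quality of the error.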

User-contributed documentation for polyparse is on the Haskell wiki at: http://haskell.org/haskellwiki/Polyparse
Please edit the wiki if you discover any nice tricks!

Known problems:

No problems currently known.
Report bugs to Malcolm.Wallace@me.com
Even better, fix any bugs you find, and then send a patch with darcs send.

Downloads

Development version:

darcs get http://code.haskell.org/polyparse

Current released version:
polyparse-1.12, release date 2016.04.12
By HTTP: Hackage.

Older versions:
polyparse-1.11, release date 2015.01.01
By HTTP: Hackage.
polyparse-1.10, release date 2014.10.28
By HTTP: Hackage.
polyparse-1.9, release date 2013.05.15
By HTTP: .tar.gz, .zip.
polyparse-1.8, release date 2012.03.09
By HTTP: .tar.gz, .zip.
polyparse-1.7, release date 2011.06.26
By HTTP: .tar.gz, .zip.
polyparse-1.6, release date 2011.05.22
By HTTP: .tar.gz, .zip.
polyparse-1.5, release date 2011.01.07
By HTTP: .tar.gz, .zip.
polyparse-1.4.1, release date 2010.05.29
By HTTP: .tar.gz, .zip.
polyparse-1.4, release date 2009.12.10
By HTTP: .tar.gz, .zip.
polyparse-1.3, release date 2009.03.09
By HTTP: .tar.gz, .zip.
polyparse-1.2, release date 2009.03.04
By HTTP: .tar.gz, .zip.
polyparse-1.1, release date 2007.10.23
By HTTP: .tar.gz, .zip.
polyparse-1.0, release date 2007.01.26
By HTTP: .tar.gz, .zip.
All older versions by FTP: ftp://ftp.cs.york.ac.uk/pub/haskell/polyparse/

Installation

To install polyparse, you must have a Haskell compiler: ghc-6.2 or later, and/or nhc98-1.16/hmake-3.06 or later, and/or Hugs98 (Sept 2003) or later. For more recent compilers, use the standard Cabal method of installation:

    cabal install polyparse

or, from an unpacked source distribution:

    runhaskell Setup.hs configure [--prefix=...] [--with-compiler=...]
    runhaskell Setup.hs build
    runhaskell Setup.hs install

For older compilers, use:

    sh configure [--prefix=...] [--buildwith=...]
    make
    make install

to configure, build, and install polyparse as a package for your compiler(s). If you don't use the --prefix option, you may need write permission on the library installation directories of your compiler(s). Afterwards, to gain access to the polyparse libraries, you only need to add the option -package polyparse to your compiler command line (no option is required for Hugs).

Recent news

Version 1.12 contains a fix for lexing escaped chars inside a literal string in the "word" parser. It also changes the semantics of "parseLitChar" so that it no longer expects surrounding single quotes. A version that does consume surrounding single quotes is now provided under the name parseLitChar'. The new behaviour of parseLitChar matches Haskell's lexLitChar specification, from which it was derived.
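The distinction between the two behaviours can be seen with Haskell's own readLitChar from Data.Char, which, like lexLitChar, reads a single, possibly escaped character with no surrounding quotes. The sketch below uses it as a stand-in and adds readQuotedChar, a hypothetical helper (not part of polyparse or base) that also consumes the surrounding single quotes, analogous to what parseLitChar' is described as doing.

```haskell
import Data.Char (readLitChar)

-- Read one character literal *with* its surrounding single quotes by
-- wrapping the unquoted reader. Hypothetical helper for illustration;
-- it is not polyparse's parseLitChar'.
readQuotedChar :: ReadS Char
readQuotedChar ('\'' : s) =
  -- keep only the parses that are followed by a closing quote
  [ (c, rest) | (c, '\'' : rest) <- readLitChar s ]
readQuotedChar _ = []
```

For instance, readLitChar "\\nabc" yields [('\n', "abc")], while readQuotedChar "'\\n' xs" yields [('\n', " xs")] and readQuotedChar "a" yields [] because the opening quote is missing.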

Version 1.11 contains only a fix for the rearrangement of the Functor/Applicative/Monad classes in ghc-7.10.*.

Version 1.10 adds a new text combinator "literal", as a replacement for "isWord" when the string to be accepted should not be lexed as Haskell, for instance if it contains spaces or mixed alphanumerics and symbols. It also fixes the implementation of "manyFinally", and adds a new combinator "satisfyMsg", which takes a string describing the predicate in order to produce more informative error messages.

Version 1.9 changes the semantics of Text.Parse.bracket, now allowing backtracking if the closing bracket parser fails. If you want the old behaviour, you can add a "commit" to the closing bracket parser at the use-site.

Version 1.8 fixes the ByteString API to return Word8 instead of Char, whilst retaining the old API as ByteStringChar.

Version 1.7 fixes imports for forward compatibility with ghc-7.

Version 1.6 exposes a Data.Text variant of the parser combinators, based loosely on the ByteString variant.

Version 1.5 has a reasonably large internal API change, moving lots of code between modules in an attempt to reduce the number of cut-and-paste copies. It also adopts the standard Applicative and Alternative classes, whilst retaining the previous names (apply, discard, onFail) for those methods. Users should see little change, unless they define their own instances of the PolyParse class.

Version 1.4.1 has two minor bugfixes: for `discard`, especially in the Lazy variant, and for 'optionalParens' in Text.Parse.

Version 1.4 has several bugfixes, improvements, and API additions. See the complete changelog for fuller details. The headline feature is new ByteString functionality.

Version 1.3 has a single bugfix: Text.Parse.parseFloat now accepts ordinary floating point notation, in addition to scientific (exponent) notation.

Version 1.2 improves the Text.Parse implementation significantly. Previously, all the parsers for basic built-in datatypes (Int, Float, Char) were just thin wrappers over the H'98 Read instances; now they are all proper parsers, so they should (a) be faster, and (b) give better error messages.

Version 1.1 much improves the laziness characteristics of the Poly* combinators. There are also a lot of new implementations of the Poly* parser types, all of which attempt to preserve exactly the same combinator interface, so it is easy to switch between them.

Version 1.0 is the first release of polyparse as a separate package. It was previously part of the HaXml suite. HaXml continues to use polyparse, but polyparse should be useful more widely. If you are looking for examples of the usage of polyparse, the implementations of Text.XML.HaXml.Parse, Text.XML.HaXml.ParseLazy, and Text.XML.HaXml.XmlContent are good places to look.
Complete Changelog

Contacts

We are interested in hearing your feedback on these parser combinators - suggestions for improvements, comments, criticisms, bug reports. Please mail

Malcolm.Wallace@me.com

Licence: The library is Free and Open Source Software, i.e., the bits we wrote are copyright to us, but freely licensed for your use, modification, and re-distribution, provided you don't restrict anyone else's use of it. The polyparse library is distributed under the GNU Lesser General Public Licence (LGPL) - see file LICENCE-LGPL for more details. We allow one special exception to the LGPL - see COPYRIGHT. (If you don't like any of these licensing conditions, please contact us to discuss your requirements.)

Related work

Parser combinators have a long history in Haskell. The first(?) monadic combinator tutorial was introduced by Hutton and Meijer in 1996, and the accompanying library was distributed with Gofer (a precursor to Hugs), and known simply as ParseLib. That library lives on here as Text.ParserCombinators.HuttonMeijer.
Niklas Rojemo's combinators. The parser combinators developed and used in the implementation of the nhc98 compiler are designed for space-efficiency. They were the first example of Applicative parsers (although the term was not known then), as opposed to Monadic parsers.
Daan Leijen's parsec. The parsec library is widely used, because it was once distributed with ghc. Its combinators are fairly robust, but you need to place explicit backtracking into your parsers, using the try operator. This can be tricky.
Doaitse Swierstra's UU_Parse. An all-singing, all-dancing parsing library. Deeply sophisticated. Allows on-line results, which is closely related to lazy parsing.
Koen Claessen's ReadP. This is a different proposed replacement for the standard Haskell'98 Read class. It is a whole lot more efficient than Read, but because it is also API-compatible with Read, it unfortunately does not give good error messages. Also, it requires rank-2 types, which are a non-Haskell'98 extension.
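ReadP ships with base as Text.ParserCombinators.ReadP, so the point about error messages is easy to see. In the short example below, the parsers nat and nats are our own; the combinators are ReadP's. A successful parse yields a list of (result, remaining input) pairs, while a failed one yields just the empty list, with nothing to say where or why it went wrong.

```haskell
import Data.Char (isDigit)
import Text.ParserCombinators.ReadP (ReadP, char, eof, munch1, readP_to_S, sepBy1)

-- A natural number: one or more digits (munch1 is greedy).
nat :: ReadP Int
nat = read <$> munch1 isDigit

-- A comma-separated list of naturals that must cover the whole input.
nats :: ReadP [Int]
nats = sepBy1 nat (char ',') <* eof
```

Here readP_to_S nats "1,22,3" gives [([1,22,3], "")], but readP_to_S nats "1,,3" gives [], which is all the diagnosis a caller receives.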