Text.Regex.Lazy

Version 0.70 (2006-08-10)

By Chris Kuklewicz (TextRegexLazy (at) personal (dot) mightyreason (dot) com)

See the LICENSE file for details on copyright. See README for building instructions.

The new API is very close to JRegex and supports 4 backends. For all backends, two types can serve both as the source of a regular expression and as the text to match against: String and ByteString. The ByteString library will be included in the next GHC release and can be obtained from http://www.cse.unsw.edu.au/~dons/fps.html.

For simplest use of the new API: import Text.Regex.Lazy and one of

import Text.Regex.PCRE((=~),(=~~))
import Text.Regex.Parsec((=~),(=~~))
import Text.Regex.DFA((=~),(=~~))
import Text.Regex.PosixRE((=~),(=~~))
import Text.Regex.TRE((=~),(=~~))

The result types you can demand of (=~) and (=~~) are all instances defined in Text.Regex.Impl.Context, and they are also used in Example.hs.
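
To illustrate what "demanding" a result type means, here is a minimal sketch of return-type polymorphism using only the base library. It is not this library's code: `MatchContext`, `Literal`, `Offsets` and the plain-substring matcher are hypothetical stand-ins for `RegexContext` and a real regex backend, and the `(=~)` defined here is a toy with a simplified type.

```haskell
import Data.List (isPrefixOf, tails)

-- Hypothetical stand-in for a compiled regex: just a literal substring.
newtype Literal = Literal String

-- Hypothetical result type: every offset at which the literal occurs.
newtype Offsets = Offsets [Int] deriving (Eq, Show)

-- Hypothetical class playing the role the text assigns to RegexContext:
-- the requested result type selects the instance.
class MatchContext target where
  matchLit :: Literal -> String -> target

-- All (possibly overlapping) offsets where the needle occurs.
findOffsets :: Literal -> String -> [Int]
findOffsets (Literal needle) hay =
  [i | (i, rest) <- zip [0 ..] (tails hay), needle `isPrefixOf` rest]

-- "Did it match at all?"
instance MatchContext Bool where
  matchLit lit hay = not (null (findOffsets lit hay))

-- "How many times did it match?"
instance MatchContext Int where
  matchLit lit hay = length (findOffsets lit hay)

-- "Where did it match?"
instance MatchContext Offsets where
  matchLit lit hay = Offsets (findOffsets lit hay)

-- Toy operator: text on the left, pattern on the right.
(=~) :: MatchContext target => String -> String -> target
hay =~ needle = matchLit (Literal needle) hay
```

With these definitions, `"banana" =~ "an" :: Bool`, `:: Int` and `:: Offsets` all run the same search but return different views of the result; the type annotation (or the surrounding context) picks the instance.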

You can redefine (=~) and (=~~) to use different options by using makeRegexOpts:

(=~) :: (RegexMaker Regex CompOption ExecOption source,RegexContext Regex source1 target) => source1 -> source -> target
(=~) x r = let q :: Regex
               q = makeRegexOpts (some compoption) (some execoption) r
           in match q x

(=~~) :: (RegexMaker Regex CompOption ExecOption source,RegexContext Regex source1 target,Monad m) => source1 -> source -> m target
(=~~) x r = let q :: Regex
                q = makeRegexOpts (some compoption) (some execoption) r
            in matchM q x

There is a medium-level API with the functions compile/execute/regexec in all the Text.Regex.*.(String|ByteString) modules. These allow errors to be reported as Either types when compiling or running.
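
The Either-based style can be sketched with a toy pattern language (literal characters, `.` as a wildcard, backslash as an escape). The names `compileToy`, `executeToy` and `matchToy` are hypothetical and their signatures are simplified assumptions, not the ones exported by the Text.Regex.* modules; in this toy, only compilation can actually fail.

```haskell
import Data.List (tails)
import Data.Maybe (listToMaybe)

-- Toy pattern element: a literal character or a wildcard.
data Item = Lit Char | AnyChar deriving (Eq, Show)

-- Hypothetical "compiled" regex.
newtype ToyRegex = ToyRegex [Item] deriving (Eq, Show)

-- Compile a pattern string, reporting malformed input as Left.
compileToy :: String -> Either String ToyRegex
compileToy = fmap ToyRegex . go
  where
    go [] = Right []
    go "\\" = Left "trailing backslash in pattern"
    go ('\\' : c : rest) = (Lit c :) <$> go rest
    go ('.' : rest) = (AnyChar :) <$> go rest
    go (c : rest) = (Lit c :) <$> go rest

-- Run a compiled pattern. The result is (offset, length) of the first
-- match, or Nothing. The Left case is kept in the type to mirror the
-- "errors as Either" style, although this toy matcher never produces it.
executeToy :: ToyRegex -> String -> Either String (Maybe (Int, Int))
executeToy (ToyRegex items) input =
  Right (listToMaybe
    [(i, length items) | (i, rest) <- zip [0 ..] (tails input), matchesAt rest])
  where
    matchesAt rest =
      length rest >= length items && and (zipWith itemOk items rest)
    itemOk (Lit c) x = c == x
    itemOk AnyChar _ = True

-- Compile and run in one step; a compile error short-circuits.
matchToy :: String -> String -> Either String (Maybe (Int, Int))
matchToy pat input = compileToy pat >>= \regex -> executeToy regex input
```

A caller can then distinguish three outcomes by pattern matching: `Left err` (bad pattern), `Right Nothing` (no match) and `Right (Just (offset, len))`.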

The low-level APIs are in the Text.Regex.*.Wrap modules. For the C library backends these expose most of the C API through wrap* functions that make the types more Haskell-like: CString and CStringLen, plus newtypes to specify compile and execute options. The actual foreign calls, i.e. the raw C API, are not exported.

Also, Text.Regex.PCRE.Wrap lets you query whether it was compiled with UTF8 support: configUTF8 :: Bool. However, I do not provide a way to marshal to or from UTF8. (If you have a UTF8 ByteString, you can probably make it work, assuming the indices PCRE uses are in bytes; otherwise look at the wrap* functions, which are a thin layer over the pcreapi.)

The old Text.Regex API can be replaced. If you need drop-in compatibility with Text.Regex, import Text.Regex.New and report any infidelities as bugs. Some advantages of Text.Regex.Parsec over Text.Regex:

Internally it uses Parsec to turn the string regex into a Pattern data type, simplify the Pattern, then transform the Pattern into a Parsec parser that accepts matching strings and stores the sub-strings of parenthesized groups.
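
The last two stages of that pipeline (simplify the Pattern, then turn it into a matcher that records group substrings) can be sketched as follows. This is an assumption-laden illustration, not the library's code: the `Pattern` constructors, `simplify`, `toParser` and `matchPrefix` are hypothetical names, a hand-rolled list-of-successes matcher is swapped in for Parsec so that only base is needed, and the string-to-Pattern parsing stage is omitted.

```haskell
-- Hypothetical pattern type; the real Pattern type is likely richer.
data Pattern
  = PEmpty
  | PChar Char
  | PAny
  | PSeq [Pattern]
  | POr [Pattern]
  | PStar Pattern
  | PGroup Int Pattern
  deriving (Eq, Show)

-- Simplify: flatten nested sequences, drop empty items, collapse
-- singleton sequences/alternations and nested stars.
simplify :: Pattern -> Pattern
simplify (PSeq ps) =
  case concatMap flatten (map simplify ps) of
    [] -> PEmpty
    [p] -> p
    qs -> PSeq qs
  where
    flatten (PSeq qs) = qs
    flatten PEmpty = []
    flatten q = [q]
simplify (POr ps) =
  case map simplify ps of
    [p] -> p
    qs -> POr qs
simplify (PStar p) =
  case simplify p of
    PStar q -> PStar q
    PEmpty -> PEmpty
    q -> PStar q
simplify (PGroup n p) = PGroup n (simplify p)
simplify p = p

-- Captured group substrings, keyed by group number.
type Captures = [(Int, String)]

-- A matcher returns every way to consume a prefix of the input:
-- (remaining input, captures so far), preferred alternatives first.
type Parser = String -> Captures -> [(String, Captures)]

toParser :: Pattern -> Parser
toParser PEmpty s caps = [(s, caps)]
toParser (PChar c) (x : xs) caps | x == c = [(xs, caps)]
toParser (PChar _) _ _ = []
toParser PAny (_ : xs) caps = [(xs, caps)]
toParser PAny [] _ = []
toParser (PSeq ps) s caps = foldl step [(s, caps)] ps
  where
    step results p = [r | (s', caps') <- results, r <- toParser p s' caps']
toParser (POr ps) s caps = concatMap (\p -> toParser p s caps) ps
toParser (PStar p) s caps =
  -- Greedy: try another iteration first, then zero iterations.
  -- Each iteration must consume input, which prevents infinite loops.
  [ r
  | (s', caps') <- toParser p s caps
  , length s' < length s
  , r <- toParser (PStar p) s' caps'
  ] ++ [(s, caps)]
toParser (PGroup n p) s caps =
  [ (s', (n, take (length s - length s') s) : filter ((/= n) . fst) caps')
  | (s', caps') <- toParser p s caps
  ]

-- Match at the start of the input only; returns the matched prefix
-- and the group captures of the first (preferred) parse.
matchPrefix :: Pattern -> String -> Maybe (String, Captures)
matchPrefix pat input =
  case toParser (simplify pat) input [] of
    [] -> Nothing
    (rest, caps) : _ -> Just (take (length input - length rest) input, caps)
```

For example, the regex `(a*)b(.)` would correspond to `PSeq [PGroup 1 (PStar (PChar 'a')), PChar 'b', PGroup 2 PAny]`, and `lookup 1` on the returned captures retrieves the text of the first group. Note that this sketch anchors the match at the start of the input rather than searching for the first match anywhere.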

All of this was motivated by the inability to use Text.Regex to complete the regex-dna benchmark on The Computer Language Shootout. The current entry there, by Don Stewart and Alson Kemp and Chris Kuklewicz, does not use this Parsec solution, but rather a custom DFA lexer from the CTK library.