Writing a Regular Expression parser in Haskell: Part 4

06/21/2008

With the previous two modules in place, we are now set up to use a DFA to match against a string. My implementation supports either a greedy match or a short match. In a full-featured regular expression engine the choice between greedy and non-greedy would be made per operator, but for simplicity I apply it to the overall match.

To do the matching I have a general function that builds a list of candidate matches, one for each starting position in the input. The only difference between short and greedy matching is which of those candidates gets chosen.

This is the method:

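    -- Requires: import Data.List (tails)
    -- Try a match at every starting position in the input, then let 'func'
    -- pick one of the candidate results.
    doMatch _ machine st [] = doAccept machine st []
    doMatch func machine st string = func $ map (\f -> doMatch' st f []) (tails string)
        where
          -- Input exhausted: test the current state, not the start state.
          doMatch' state [] soFar = doAccept machine state soFar
          doMatch' state (s:str) soFar =
              case findTransition machine s state of
                -- No transition on this character: stop here.
                Nothing -> doAccept machine state soFar
                Just (_, to, _) ->
                    case doMatch' to str (soFar ++ [s]) of
                      -- A match exists further along, so keep it.
                      (True, res) -> (True, res)
                      -- Otherwise fall back to a match ending at this character, if any.
                      failed
                        | canAccept machine to -> (True, soFar ++ [s])
                        | otherwise            -> failed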

This builds the list of candidate matches and uses the passed-in function to pick either the shortest or the longest one.
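The matching loop can be tried on its own against a small hand-built DFA. The sketch below is not the code from the attached project: the `Machine` type and the bodies of `findTransition`, `canAccept`, and `doAccept` are assumptions standing in for the earlier modules, and `candidates` and `hipMachine` are hypothetical names introduced only for this illustration.

```haskell
import Data.List (find, tails)

-- Hypothetical stand-in for the DFA type from the earlier parts:
-- a list of (from, to, symbol) transitions plus the accepting states.
data Machine = Machine
  { transitions :: [(Int, Int, Char)]
  , accepting   :: [Int]
  }

-- Assumed behaviour: look up the transition leaving 'state' on symbol 'c'.
findTransition :: Machine -> Char -> Int -> Maybe (Int, Int, Char)
findTransition m c state =
  find (\(from, _, val) -> from == state && val == c) (transitions m)

canAccept :: Machine -> Int -> Bool
canAccept m state = state `elem` accepting m

-- Assumed behaviour: pair the consumed text with whether 'state' accepts.
doAccept :: Machine -> Int -> String -> (Bool, String)
doAccept m state soFar = (canAccept m state, soFar)

-- One candidate per starting position, each the longest match from there.
candidates :: Machine -> Int -> String -> [(Bool, String)]
candidates m start input = map (\suffix -> go start suffix []) (tails input)
  where
    go state [] soFar = doAccept m state soFar
    go state (s:str) soFar =
      case findTransition m s state of
        Nothing -> doAccept m state soFar
        Just (_, to, _) ->
          case go to str (soFar ++ [s]) of
            (True, res) -> (True, res)
            failed
              | canAccept m to -> (True, soFar ++ [s])
              | otherwise      -> failed

-- Hand-built DFA for hip(hip)*: state 3 accepts and loops back via 'h'.
hipMachine :: Machine
hipMachine = Machine
  { transitions = [(0,1,'h'), (1,2,'i'), (2,3,'p'), (3,4,'h'), (4,5,'i'), (5,3,'p')]
  , accepting   = [3]
  }
```

In GHCi, `candidates hipMachine 0 "hiphop"` gives `(True,"hip")` for the first starting position and only failed entries for the remaining ones.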

For short or long matches I pass in one of these two functions:

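    -- Requires: import Data.List (sort)

    -- Get the shortest match. Pairs sort with False before True and then by
    -- string (lexicographically, not by length), so a prefix sorts first.
    shortest matches = case filter fst (sort matches) of
                         [] -> (False, "")
                         ms -> head ms

    -- Get the longest match: the last entry in the same sort order.
    longest matches = last (sort matches)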
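Because `sort` compares the strings lexicographically, `(True,"z")` sorts after `(True,"aaa")`, so the last sorted entry is not always the longest one. If selection strictly by length is wanted, a variant along the following lines could be passed in instead. This is only a sketch: `successes`, `shortestByLength`, and `longestByLength` are hypothetical names that do not appear in the attached project.

```haskell
import Data.List (maximumBy, minimumBy)
import Data.Ord (comparing)

-- Keep only the successful candidates.
successes :: [(Bool, String)] -> [(Bool, String)]
successes = filter fst

-- Shortest successful match by character count, or (False, "") if none.
shortestByLength :: [(Bool, String)] -> (Bool, String)
shortestByLength matches =
  case successes matches of
    [] -> (False, "")
    ms -> minimumBy (comparing (length . snd)) ms

-- Longest successful match by character count, or (False, "") if none.
longestByLength :: [(Bool, String)] -> (Bool, String)
longestByLength matches =
  case successes matches of
    [] -> (False, "")
    ms -> maximumBy (comparing (length . snd)) ms
```

For the `hip(hip)*` example in this post both approaches pick the same results, since every candidate there is a prefix of the longest one.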

I created operator aliases for the two match functions to make them handier to use:

    (=~) = greedyMatch
    (=~?) = shortMatch

And then the final result:

    *SimpleRegex> "hiphiphiphorray" =~? "hip(hip)*"
    (True,"hip")

    *SimpleRegex> "hiphiphiphorray" =~ "hip(hip)*"
    (True,"hiphiphip")

I attached a zip of all the files for this project.

Enjoy!

SimpleRegex.zip
