Writing a Regular Expression parser in Haskell: Part 4

With the previous two modules in place we are now set up to use a DFA to match against a string.  In my implementation I support either a greedy match or an short match.  In a full featured regular expression engine this ability to choose greedy or not would be per operator but for simplicity I have it for the overall match. 


To do the matching I have a general function which will create a list of all matches.  Then the difference between short and greedy matching is which of the candidate solutions does it choose.

This is the method:

   1: doMatch func machine st [] = doAccept  machine st []
   2: doMatch func machine st string =  func $ map (\f -> doMatch' st f []) (tails string)
   3:     where
   4:       doMatch' state [] soFar = doAccept machine st soFar
   5:       doMatch' state (s:str) soFar = 
   6:           case findTransition machine s state of
   7:             Nothing -> doAccept machine state soFar
   8:             Just (from, to, val) -> case doMatch' to str (soFar ++ [s]) of
   9:                                       (False,_) -> case canAccept machine to of
  10:                                                     True -> (True, soFar ++ [s])
  11:                                                     False -> doMatch' to str (soFar ++ [s])
  12:                                       (True,res) -> (True,res)


This creates the list of matches and uses the passed in function to determine how to filter to either the shortest or longest match.

For short or long matches I pass in one of these two functions:

   1: -- Get the shortest match
   2: shortest matches = case  filter (\s->fst s) (sort matches) of
   3:                      [] -> (False,"")
   4:                      ms -> head ms
   6: -- Get the longest match
   7: longest matches = last.sort $ matches


I created aliases for the functions to make it more handy:

   1: (=~) = greedyMatch
   2: (=~?) = shortMatch


And then the final result:

   1: *SimpleRegex> "hiphiphiphorray" =~? "hip(hip)*"
   2: (True,"hip")
   4: *SimpleRegex> "hiphiphiphorray" =~ "hip(hip)*"
   5: (True,"hiphiphip")



I attached a zip of all the files for this project.



Comments (1)

  1. The third module in the simple regular expression parser is called: NFAtoDFA. Which as you might have

Skip to main content