Writing a Regular Expression parser in Haskell: Part 4


With the previous two modules in place we are now set up to use a DFA to match against a string.  In my implementation I support either a greedy match or an short match.  In a full featured regular expression engine this ability to choose greedy or not would be per operator but for simplicity I have it for the overall match. 

 

To do the matching I have a general function which will create a list of all matches.  Then the difference between short and greedy matching is which of the candidate solutions does it choose.

This is the method:

   1: doMatch func machine st [] = doAccept  machine st []
   2: doMatch func machine st string =  func $ map (\f –> doMatch’ st f []) (tails string)
   3:     where
   4:       doMatch’ state [] soFar = doAccept machine st soFar
   5:       doMatch’ state (s:str) soFar = 
   6:           case findTransition machine s state of
   7:             Nothing –> doAccept machine state soFar
   8:             Just (from, to, val) –> case doMatch’ to str (soFar ++ [s]) of
   9:                                       (False,_) –> case canAccept machine to of
  10:                                                     True –> (True, soFar ++ [s])
  11:                                                     False –> doMatch’ to str (soFar ++ [s])
  12:                                       (True,res) –> (True,res)

 


This creates the list of matches and uses the passed in function to determine how to filter to either the shortest or longest match.


For short or long matches I pass in one of these two functions:



   1: — Get the shortest match
   2: shortest matches = case  filter (\s->fst s) (sort matches) of
   3:                      [] –> (False,””)
   4:                      ms –> head ms
   5:  
   6: — Get the longest match
   7: longest matches = last.sort $ matches

 


I created aliases for the functions to make it more handy:



   1: (=~) = greedyMatch
   2: (=~?) = shortMatch

 


And then the final result:



   1: *SimpleRegex> “hiphiphiphorray” =~? “hip(hip)*”
   2: (True,”hip”)
   3:  
   4: *SimpleRegex> “hiphiphiphorray” =~ “hip(hip)*”
   5: (True,”hiphiphip”)

 


 


I attached a zip of all the files for this project.


Enjoy!

SimpleRegex.zip

Comments (1)

  1. The third module in the simple regular expression parser is called: NFAtoDFA. Which as you might have