The third module in the simple regular expression parser is called: NFAtoDFA. Which as you might have guessed, takes the NFA that resulted from the first module and converts it into a DFA. The structure that the DFA uses is the same that the NFA uses since they are both finite state machines.
Converting an NFA to a DFA requires mapping sets of nodes in the NFA to a single node in the DFA. Many nodes in a NFA will correspond to one node in the DFA. Making this change requires updating transitions to point to and from sets of nodes. To manage this transformation I create a state monad using the following context:
Most of the code in this module is just managing this context and updating it according to two operation:
- Epsilon Closure
- Set Move
These are explained in more detail in this article.
Basically, epsilon closure is the process of taking a set of initial nodes and returning a new set of all nodes you can traverse to purely on epsilon transitions. To help with this I created some smaller methods to build up to an epsilon closure.
First are a couple methods (findToNodes and closure):
findToNodes searches a transition table for all nodes which go from any node in fromNodes on value. It will builds up a set with all the to nodes that match.
closure wraps findToNodes to let us easily union together an initial set and the nodes we can reach from that set.
With this in hand we can write clearly a epsilon closure function:
This function takes full advantage of the lazy nature of Haskell. It repeats the closure on epsilon over and over and streams its results into our function foldUntilRepeat. This method does what it says, it will fold the values that are streamed in until it sees the same value twice.
The set move is just combination of what you have already seen:
With these functions in hand, this module just becomes calling them and updating the context until we have no more nodes in the NFA to process.
In the next installment I will discuss using the output of this modules to match a regex against a string.
Also, once again the code is attached.