Converting a DFA to a table, Regular expressions part 3

So, last time we explored the wonderful world of regular expressions we ended on a nice looking graph representing our deterministic machine that recognized the regular expression that generated it. That’s all fine, but in reality we’d like a table representation for our machine. Tables offer some significant benefits, most prominently they’re really fast with O(1) access times. They are, however, not very compact. That flaw can be remedied with table compression though, which I will not delve into in this post. Mostly because I do not completely understand it yet.

Anyway, here’s where we ended up last time:

First order of business is to assign state numbers to each state. It’s really arbitrary what numbers you give the states as long as they are unique, but keeping with tradition we’re going to assign 0 to the starting state. The graph now looks like this:

Now we can create a table. On the X-axis we will use the character input, on the Y axis we will use the state number. For each state, if there is a transition to another state using a given input character, we fill in the state number to go to in the corresponding column. If the state is an accept state, add a special ACC for accept. The table will look like this:

STATE | a | b | c | d |
  0   | 1 | 2 |   |   |
  1   | 1 | 2 |   |   |
  2   | 1 | 3 |   |   |
  3   | 2 | 3 | 4 |   |
  4   |   |   |   |ACC|

Apply the pseudocode algorithm outlined in the first post to this table, and you’ve got your very own regular expression engine!

Wrapping up, there’s of course more to do. This engine does not handle things such as capture groups, lookahead and all that advanced stuff. But it isn’t that hard to add, and nothing of what’s been said will not hold true for the more full-featured engines. Another option is to never do the table generation stuff and instead emit actual IL code that will execute the DFA straight up, which would be incredibly cool stuff. As I said earlier these tables end to be very sparse, since the input requited normally is at least 256 different characters and only a few are legal for each state. This means that there’s a lot to do on the compression side as well.

If there’s any questions or requests for elaborations on all of this, I’ll happily accept and answer. Otherwise, I think it’s time to continue with regular expressions big brother – parsers.

Tagged , , ,

One thought on “ Converting a DFA to a table, Regular expressions part 3

  1. Very nice. I ran through your tutorial to sharpen my skills and to get a handle on the inner world of the regex. I’m handy with the perl style short-hand, and I use state machines to parse all the time, but NFA to DFA reductions are from aeons ago for me. Oh, State1(b) maps to State3 not State2. It was a *fine* exercise to find that out. Thank you.

Leave a Reply


%d bloggers like this: