# Class 09: From Specification to Optimal DFA (1)

Back to Finite Automata. On to From Specification to Optimal DFA (2).

Held: Monday, 9 February 2004

Summary: Today we start to consider how to move from the readable but declarative regular expression notation to the executable but obtuse finite automaton notation.

Related Pages:

Due

Assignments

Notes:

• Today at 2:15 in 2417, there is a talk entitled Beyond Pizza and Code: The ACM Code of Ethics. Extra credit for attending.
• Today at 4:30, there is a talk entitled GUI: More than just a pretty interface. Extra credit for attending.

Overview:

• Review
• From regular expressions to NFAs
• An example

## Review

• Regular expressions and DFAs both can represent some simple but useful languages.
• They can also be used to tokenize.
• Regular expressions are declarative: They say what, rather than how.
• DFAs are imperative: They also say how.
• Some languages are easier to represent with regular expressions, some with DFAs, some with NFAs.
• Note that regular expressions are also non

## Limitations of Regular Expressions, DFAs, and the Ilk

• Many useful languages cannot be represented by regular expressions.
• For example, strings of a's and b's with equal numbers of a's and b's
• Here's an even simpler one: akbk
• How would you prove that you can't represent that last one? (No, I won't put it here, since many of you read along.)
• Hint: Think about the F in DFA.

## From Regular Expression to DFA

• We've now seen two mechanisms for describing tokens (and, in effect, for tokenizing): regular expressions and finite automata.
• How do we go from one to another?
• We convert regular expressions to nondeterministic finite automata using a surprisingly naive translation.
• We convert NFAs to DFAs.
• We then optimize the DFAs.

## From Regular Expression to NFA

• Our regular expression to NFA converter builds a finite automaton with one final state (and one start state). It uses the recursive definitions of regular expressions to build a finite automaton for a compound expression from the finite automata for the base expressions.
• Each created NFA has a unique start state and a unique final state.
• The NFA for the empty string has two states with an epsilon transition between the states. The first state is the start state. The second state is the final state.
• The NFA for a single symbol is similar, except that the transition is labeled with the symbol.
• The NFA for alternation has a new start state with epsilon edges to the start states of the NFA for the subexpressions and a new final state with epsilon edges from the final states of the NFA for the subexpressions.
• The finite automaton for concatenation uses the start state of the NFA for the first expression and the final state of the NFA for the second expression. There is an epislon transition from the final state of the first NFA to the start state of the second NFA.
• Alternatively, we can merge the final state of the first NFA with the start state of the second NFA.
• The NFA for repetition has a new start state and a new final state. There are also epsilon transitions
• from the start state to the final state,
• from the start state to the start state of the sub-NFA,
• from the final state of the sub-NFA to the final state, and
• from the final state of the sub-NFA to the start state of the sub-NFA.
• If it seems to you that some of these extra states and epsilon transitions are unnecessary, you're probably right. However, you can't get rid of all of them.
• For example, consider what happens if you don't create any extra states for concatenation or repetition.

## Some Examples

• We'll develop some examples on the fly.

Back to Finite Automata. On to From Specification to Optimal DFA (2).

Disclaimer: I usually create these pages on the fly, which means that I rarely proofread them and they may contain bad grammar and incorrect details. It also means that I tend to update them regularly (see the history for more details). Feel free to contact me with any suggestions for changes.

This document was generated by Siteweaver on Wed May 5 11:46:59 2004.
The source to the document was last modified on Tue Jan 20 23:06:45 2004.
This document may be found at `http://www.cs.grinnell.edu/~rebelsky/Courses/CS362/2004S/Outlines/outline.09.html`.

You may wish to validate this document's HTML ; ; Check with Bobby

Samuel A. Rebelsky, rebelsky@grinnell.edu