afl-material: comparison handouts/ho03.tex


-:b79e704acb72
+:5b5a68df6d16
 \documentclass{article}
-\usepackage{hyperref}
+\usepackage{../style}
-\usepackage{amssymb}
+\usepackage{../langs}
-\usepackage{amsmath}
-\usepackage[T1]{fontenc}
-\usepackage{listings}
 \usepackage{xcolor}
 \usepackage{tikz}
 \usetikzlibrary{arrows}
 \usetikzlibrary{automata}
 \usetikzlibrary{shapes}
 \usetikzlibrary{shadows}
 \usetikzlibrary{positioning}
 \usetikzlibrary{calc}
 \usetikzlibrary{fit}
 \usetikzlibrary{backgrounds}
-\usepackage{../langs}
-\newcommand{\dn}{\stackrel{\mbox{\scriptsize def}}{=}}%
 \begin{document}
 \section*{Handout 3}
-Let us have a closer look at automata and their relation to regular expressions. This will help us to understand
+Let us have a closer look at automata and their relation to
-why the regular expression matchers in Python and Ruby are so slow with certain regular expressions.
+regular expressions. This will help us to understand why the
+regular expression matchers in Python and Ruby are so slow
-A \emph{deterministic finite automaton} (DFA), say $A$, is defined by  a four-tuple written $A(Q, q_0, F, \delta)$ where
+with certain regular expressions.
+A \emph{deterministic finite automaton} (DFA), say $A$, is
+defined by a four-tuple written $A(Q, q_0, F, \delta)$ where
 \begin{itemize}
 \item $Q$ is a set of states,
 \item $q_0 \in Q$ is the start state,
 \item $F \subseteq Q$ are the accepting states, and
 \item $\delta$ is the transition function.
 \end{itemize}
-\noindent
+\noindent The transition function determines how to
-The transition function determines how to ``transition'' from one state to the next state with respect to a character.
+``transition'' from one state to the next state with respect
-We have the assumption that these functions do not need to be defined everywhere: so it can be the case that
+to a character. We have the assumption that these functions do
-given a character there is no next state, in which case we need to ``raise an exception''.  A typical
+not need to be defined everywhere: so it can be the case that
+given a character there is no next state, in which case we
+need to ``raise an exception''. A typical
 example of a DFA is
 \begin{center}
 \begin{tikzpicture}[>=stealth',very thick,auto,
 every state/.style={minimum size=0pt,inner sep=2pt,draw=blue!50,very thick,fill=blue!20},]
 \path[->] (q_2) edge [loop left] node  {$b$} ();
 \path[->] (q_3) edge [bend left=95, looseness=1.3] node [below]  {$b$} (q_0);
 \end{tikzpicture}
 \end{center}
-\noindent
+\noindent The accepting state $q_4$ is indicated with double
-The accepting state $q_4$ is indicated with double circles. It is possible that a DFA has no
+circles. It is possible that a DFA has no accepting states at
-accepting states at all, or that the starting state is also an accepting state.
+all, or that the starting state is also an accepting state. In
-In the case above the transition function is defined everywhere and can be given as a table
+the case above the transition function is defined everywhere
-as follows:
+and can be given as a table as follows:
 \[
 \begin{array}{lcl}
 (q_0, a) &\rightarrow& q_1\\
 (q_0, b) &\rightarrow& q_2\\
 (q_4, a) &\rightarrow& q_4\\
 (q_4, b) &\rightarrow& q_4\\
 \end{array}
 \]
-\noindent
+\noindent We need to define the notion of what language is
-We need to define the notion of what language is accepted by an automaton. For this we
+accepted by an automaton. For this we lift the transition
-lift the transition function $\delta$ from characters to strings as follows:
+function $\delta$ from characters to strings as follows:
 \[
 \begin{array}{lcl}
 \hat{\delta}(q, "")        & \dn & q\\
 \hat{\delta}(q, c\!::\!s) & \dn & \hat{\delta}(\delta(q, c), s)\\
 \end{array}
 \]
-\noindent
+\noindent Given a string this means we start in the starting
-Given a string this means we start in the starting state and take the first character of the string,
+state and take the first character of the string, follow to
-follow to the next state, then take the second character and so on. Once the string is exhausted
+the next state, then take the second character and so on. Once
-and we end up in an accepting state, then this string is accepted. Otherwise it is not accepted.
+the string is exhausted and we end up in an accepting state,
-So $s$ is in the \emph{language accepted by the automaton} $A(Q, q_0, F, \delta)$ iff
+then this string is accepted. Otherwise it is not accepted. So
+$s$ is in the \emph{language accepted by the automaton} $A(Q,
+q_0, F, \delta)$ iff
 \[
 \hat{\delta}(q_0, s) \in F
 \]
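The lifted transition function $\hat{\delta}$ and the acceptance condition above can be sketched in Python. This is only an illustration under stated assumptions: the names \texttt{delta\_hat}, \texttt{accepts} and the example automaton (even number of $a$s) are hypothetical and not part of the handout, and an undefined transition is modelled by returning \texttt{None} rather than by raising an exception.

```python
from typing import Dict, Optional, Set, Tuple

State = str
Delta = Dict[Tuple[State, str], State]  # partial transition function


def delta_hat(delta: Delta, q: Optional[State], s: str) -> Optional[State]:
    """Lift delta from characters to strings, following the two defining equations."""
    if q is None or s == "":
        # Empty string: stay where we are (None means we are already stuck).
        return q
    c, rest = s[0], s[1:]
    # delta.get returns None when no next state is defined for (q, c).
    return delta_hat(delta, delta.get((q, c)), rest)


def accepts(delta: Delta, q0: State, final: Set[State], s: str) -> bool:
    """A string is accepted iff delta_hat(q0, s) lands in an accepting state."""
    return delta_hat(delta, q0, s) in final


# Hypothetical example DFA: accepts strings over {a, b} with an even number of a's.
EXAMPLE_DELTA: Delta = {
    ("even", "a"): "odd",
    ("odd", "a"): "even",
    ("even", "b"): "even",
    ("odd", "b"): "odd",
}
```

For instance, \texttt{accepts(EXAMPLE\_DELTA, "even", \{"even"\}, "abab")} follows one transition per character and then checks membership in the accepting states.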
-While with a DFA it is always clear, given a character, what the next state is, it will be useful to relax
+While with a DFA it is always clear, given a character,
-this restriction. The resulting construction is called a \emph{non-deterministic finite automaton} (NFA) given
+what the next state is, it will be useful to relax this
-as a four-tuple $A(Q, q_0, F, \rho)$ where
+restriction. The resulting construction is called a
+\emph{non-deterministic finite automaton} (NFA) given as a
+four-tuple $A(Q, q_0, F, \rho)$ where
 \begin{itemize}
 \item $Q$ is a finite set of states
 \item $q_0$ is a start state
 \item $F$ are some accepting states with $F \subseteq Q$, and
 \path[->] (r_2) edge [bend left] node  [right] {$a$} (r_1);
 \end{tikzpicture}}
 \end{tabular}
 \end{center}
-\noindent
+\noindent There are a number of points you should note. Every
-There are a number of points you should note. Every DFA is a NFA, but not vice versa.
+DFA is a NFA, but not vice versa. The $\rho$ in NFAs is a
-The $\rho$ in NFAs is a transition \emph{relation}
+transition \emph{relation} (DFAs have a transition function).
-(DFAs have a transition function). The difference between a function and a relation is that a function
+The difference between a function and a relation is that a
-always has a single output, while a relation gives, roughly speaking, several outputs. Look
+function always has a single output, while a relation gives,
-at the NFA on the right-hand side above: if you are currently in the state $r_2$ and you read a
+roughly speaking, several outputs. Look at the NFA on the
-character $a$, then you can transition to $r_1$ \emph{or} $r_3$. Which route you take is not
+right-hand side above: if you are currently in the state $r_2$
-determined. This means if we need to decide whether a string is accepted by a NFA, we might have
+and you read a character $a$, then you can transition to $r_1$
-to explore all possibilities. Also there is a special transition in NFAs which is called \emph{epsilon-transition}
+\emph{or} $r_3$. Which route you take is not determined. This
-or \emph{silent transition}. This transition means you do not have to ``consume'' any part of
+means if we need to decide whether a string is accepted by a
-the input string, but ``silently'' change to a different state.
+NFA, we might have to explore all possibilities. Also there is
+a special transition in NFAs which is called
-The reason for introducing NFAs is that there is a relatively simple (recursive) translation of regular expressions into
+\emph{epsilon-transition} or \emph{silent transition}. This
-NFAs. Consider the simple regular expressions $\varnothing$, $\epsilon$ and $c$. They can be translated
+transition means you do not have to ``consume'' any part of the
-as follows:
+input string, but ``silently'' change to a different state.
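One way to ``explore all possibilities'' is to track the whole set of states the NFA could be in, following silent transitions whenever possible. The sketch below is an assumption-laden illustration, not the handout's own algorithm: the transition relation $\rho$ is encoded as a dictionary from (state, character) pairs to sets of states, \texttt{None} stands for an $\epsilon$-transition, and the names \texttt{eps\_closure}, \texttt{nfa\_accepts} and the example automaton are hypothetical.

```python
from typing import Dict, Optional, Set, Tuple

State = str
# rho maps (state, character) to the set of possible next states;
# the character None marks an epsilon-transition (an encoding chosen here).
Rho = Dict[Tuple[State, Optional[str]], Set[State]]


def eps_closure(rho: Rho, states: Set[State]) -> Set[State]:
    """All states reachable from `states` using only epsilon-transitions."""
    seen = set(states)
    todo = list(states)
    while todo:
        q = todo.pop()
        for nxt in rho.get((q, None), set()):
            if nxt not in seen:
                seen.add(nxt)
                todo.append(nxt)
    return seen


def nfa_accepts(rho: Rho, q0: State, final: Set[State], s: str) -> bool:
    """Track every state the NFA could be in after each character."""
    current = eps_closure(rho, {q0})
    for c in s:
        step: Set[State] = set()
        for q in current:
            step |= rho.get((q, c), set())
        current = eps_closure(rho, step)
    # Accepted iff at least one possible run ends in an accepting state.
    return bool(current & final)


# Hypothetical example NFA: on 'a' in state p1 we may stay in p1 or move to p2.
EXAMPLE_RHO: Rho = {
    ("p0", None): {"p1"},
    ("p1", "a"): {"p1", "p2"},
    ("p2", "b"): {"p2"},
}
```

The set \texttt{current} never contains more states than $Q$ has, so the work per character is bounded by the size of the automaton, at the price of tracking several states at once.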
+The reason for introducing NFAs is that there is a relatively
+simple (recursive) translation of regular expressions into
+NFAs. Consider the simple regular expressions $\varnothing$,
+$\epsilon$ and $c$. They can be translated as follows:
 \begin{center}
 \begin{tabular}[t]{l@{\hspace{10mm}}l}
 \raisebox{1mm}{$\varnothing$} &
 \begin{tikzpicture}[scale=0.7,>=stealth',very thick, every state/.style={minimum size=3pt,draw=blue!50,very thick,fill=blue!20},]
 \path[->] (q_0) edge node [below]  {$c$} (q_1);
 \end{tikzpicture}\\\\
 \end{tabular}
 \end{center}
-\noindent
+\noindent The case for the sequence regular expression $r_1
-The case for the sequence regular expression $r_1 \cdot r_2$ is as follows: We are given by recursion
+\cdot r_2$ is as follows: We are given by recursion two
-two automata representing $r_1$ and $r_2$ respectively.
+automata representing $r_1$ and $r_2$ respectively.
 \begin{center}
 \begin{tikzpicture}[node distance=3mm,
 >=stealth',very thick, every state/.style={minimum size=3pt,draw=blue!50,very thick,fill=blue!20},]
 \node[state, initial]  (q_0)  {$\mbox{}$};
 \node [yshift=2mm] at (2.north) {$r_2$};
 \end{pgfonlayer}
 \end{tikzpicture}
 \end{center}
-\noindent
+\noindent The first automaton has some accepting states. We
-The first automaton has some accepting states. We obtain an automaton for $r_1\cdot r_2$ by connecting
+obtain an automaton for $r_1\cdot r_2$ by connecting these
-these accepting states with $\epsilon$-transitions to the starting state of the second automaton. By doing
+accepting states with $\epsilon$-transitions to the starting
-so we make them non-accepting like so:
+state of the second automaton. By doing so we make them
+non-accepting like so:
 \begin{center}
 \begin{tikzpicture}[node distance=3mm,
 >=stealth',very thick, every state/.style={minimum size=3pt,draw=blue!50,very thick,fill=blue!20},]
 \node[state, initial]  (q_0)  {$\mbox{}$};
 \node [yshift=2mm] at (3.north) {$r_1\cdot r_2$};
 \end{pgfonlayer}
 \end{tikzpicture}
 \end{center}
-\noindent
+\noindent The case for the choice regular expression $r_1 +
-The case for the choice regular expression $r_1 + r_2$ is slightly different: We are given by recursion
+r_2$ is slightly different: We are given by recursion two
-two automata representing $r_1$ and $r_2$ respectively.
+automata representing $r_1$ and $r_2$ respectively.
 \begin{center}
 \begin{tikzpicture}[node distance=3mm,
 >=stealth',very thick, every state/.style={minimum size=3pt,draw=blue!50,very thick,fill=blue!20},]
 \node at (0,0)  (1)  {$\mbox{}$};
 \node [yshift=3mm] at (2.north) {$r_2$};
 \end{pgfonlayer}
 \end{tikzpicture}
 \end{center}
-\noindent
+\noindent Each automaton has a single start state and
-Each automaton has a single start state and potentially several accepting states. We obtain a
+potentially several accepting states. We obtain a NFA for the
-NFA for the regular expression $r_1 + r_2$ by introducing a new starting state and connecting it
+regular expression $r_1 + r_2$ by introducing a new starting
-with an $\epsilon$-transition to the two starting states above, like so
+state and connecting it with an $\epsilon$-transition to the
+two starting states above, like so
 \begin{center}
 \hspace{2cm}\begin{tikzpicture}[node distance=3mm,
 >=stealth',very thick, every state/.style={minimum size=3pt,draw=blue!50,very thick,fill=blue!20},]
 \node at (0,0) [state, initial]  (1)  {$\mbox{}$};
 \node [yshift=3mm] at (1.north) {$r$};
 \end{pgfonlayer}
 \end{tikzpicture}
 \end{center}
-\noindent
+\noindent and connect its accepting states to a new starting
-and connect its accepting states to a new starting state via $\epsilon$-transitions. This new
+state via $\epsilon$-transitions. This new starting state is
-starting state is also an accepting state, because $r^*$ can also recognise the empty string.
+also an accepting state, because $r^*$ can also recognise the
-This gives the following automaton for $r^*$:
+empty string. This gives the following automaton for $r^*$:
 \begin{center}
 \begin{tikzpicture}[node distance=3mm,
 >=stealth',very thick, every state/.style={minimum size=3pt,draw=blue!50,very thick,fill=blue!20},]
 \node at (0,0) [state, initial,accepting]  (1)  {$\mbox{}$};
 \node [yshift=3mm] at (2.north) {$r^*$};
 \end{pgfonlayer}
 \end{tikzpicture}
 \end{center}
-\noindent
+\noindent This construction of a NFA from a regular expression
-This construction of a NFA from a regular expression was invented by Ken Thompson in 1968.
+was invented by Ken Thompson in 1968.
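The recursive translation described above can be sketched in Python. Everything in this sketch is an illustrative assumption rather than the handout's code: regular expressions are encoded as tagged tuples, states are fresh integers, \texttt{None} marks an $\epsilon$-transition, and in the star case the old accepting states are made non-accepting. The helper \texttt{matches} is included only to try the resulting NFA.

```python
from collections import defaultdict
from itertools import count

_fresh = count()  # supply of fresh state names (integers)


def _merge(*rhos):
    """Union of several transition relations, copied into new sets."""
    out = defaultdict(set)
    for rho in rhos:
        for key, targets in rho.items():
            out[key] |= targets
    return out


def thompson(r):
    """Return (start, accepting_states, rho) for the regular expression r."""
    tag = r[0]
    if tag == "empty":                      # matches no string at all
        return next(_fresh), set(), {}
    if tag == "eps":                        # matches only the empty string
        q = next(_fresh)
        return q, {q}, {}
    if tag == "chr":                        # matches the single character r[1]
        q0, q1 = next(_fresh), next(_fresh)
        return q0, {q1}, {(q0, r[1]): {q1}}
    if tag == "seq":                        # accepting states of r1 --eps--> start of r2
        s1, f1, rho1 = thompson(r[1])
        s2, f2, rho2 = thompson(r[2])
        rho = _merge(rho1, rho2)
        for q in f1:
            rho[(q, None)].add(s2)
        return s1, f2, rho
    if tag == "alt":                        # new start --eps--> both old starts
        s1, f1, rho1 = thompson(r[1])
        s2, f2, rho2 = thompson(r[2])
        q = next(_fresh)
        rho = _merge(rho1, rho2)
        rho[(q, None)] |= {s1, s2}
        return q, f1 | f2, rho
    if tag == "star":                       # new accepting start, old accepting --eps--> it
        s1, f1, rho1 = thompson(r[1])
        q = next(_fresh)
        rho = _merge(rho1)
        rho[(q, None)].add(s1)
        for f in f1:
            rho[(f, None)].add(q)
        return q, {q}, rho
    raise ValueError(f"unknown regular expression: {r!r}")


def matches(r, s):
    """Run the NFA for r on s by tracking the set of reachable states."""
    start, final, rho = thompson(r)

    def closure(states):
        seen, todo = set(states), list(states)
        while todo:
            q = todo.pop()
            for nxt in rho.get((q, None), ()):
                if nxt not in seen:
                    seen.add(nxt)
                    todo.append(nxt)
        return seen

    current = closure({start})
    for c in s:
        current = closure({t for q in current for t in rho.get((q, c), ())})
    return bool(current & final)
```

For example, \texttt{("seq", ("star", ("alt", ("chr", "a"), ("chr", "b"))), ("chr", "c"))} encodes $(a + b)^* \cdot c$ in this hypothetical representation.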
 \end{document}
 %%% Local Variables:
 %%% mode: latex

changeset 251	5b5a68df6d16
parent 217	cd6066f1056a
child 268	18bef085a7ca