--- a/handouts/ho03.tex Mon Sep 15 07:25:17 2014 +0100
+++ b/handouts/ho03.tex Mon Sep 15 09:36:02 2014 +0100
@@ -1,9 +1,7 @@
\documentclass{article}
-\usepackage{hyperref}
-\usepackage{amssymb}
-\usepackage{amsmath}
-\usepackage[T1]{fontenc}
-\usepackage{listings}
+\usepackage{../style}
+\usepackage{../langs}
+
\usepackage{xcolor}
\usepackage{tikz}
\usetikzlibrary{arrows}
@@ -14,19 +12,19 @@
\usetikzlibrary{calc}
\usetikzlibrary{fit}
\usetikzlibrary{backgrounds}
-\usepackage{../langs}
-
-\newcommand{\dn}{\stackrel{\mbox{\scriptsize def}}{=}}%
\begin{document}
\section*{Handout 3}
-Let us have a closer look at automata and their relation to regular expressions. This will help us to understand
-why the regular expression matchers in Python and Ruby are so slow with certain regular expressions.
+Let us have a closer look at automata and their relation to
+regular expressions. This will help us to understand why the
+regular expression matchers in Python and Ruby are so slow
+with certain regular expressions.
-A \emph{deterministic finite automaton} (DFA), say $A$, is defined by a four-tuple written $A(Q, q_0, F, \delta)$ where
+A \emph{deterministic finite automaton} (DFA), say $A$, is
+defined by a four-tuple written $A(Q, q_0, F, \delta)$ where
\begin{itemize}
\item $Q$ is a set of states,
@@ -35,10 +33,12 @@
\item $\delta$ is the transition function.
\end{itemize}
-\noindent
-The transition function determines how to ``transition'' from one state to the next state with respect to a character.
-We have the assumption that these functions do not need to be defined everywhere: so it can be the case that
-given a character there is no next state, in which case we need to raise a kind of ``raise an exception''. A typical
+\noindent The transition function determines how to
+``transition'' from one state to the next state with respect
+to a character. We have the assumption that these functions do
+not need to be defined everywhere: so it can be the case that
+given a character there is no next state, in which case we
+need to raise a kind of ``raise an exception''. A typical
example of a DFA is
\begin{center}
@@ -61,11 +61,11 @@
\end{tikzpicture}
\end{center}
-\noindent
-The accepting state $q_4$ is indicated with double circles. It is possible that a DFA has no
-accepting states at all, or that the starting state is also an accepting state.
-In the case above the transition function is defined everywhere and can be given as a table
-as follows:
+\noindent The accepting state $q_4$ is indicated with double
+circles. It is possible that a DFA has no accepting states at
+all, or that the starting state is also an accepting state. In
+the case above the transition function is defined everywhere
+and can be given as a table as follows:
\[
\begin{array}{lcl}
@@ -82,9 +82,9 @@
\end{array}
\]
-\noindent
-We need to define the notion of what language is accepted by an automaton. For this we
-lift the transition function $\delta$ from characters to strings as follows:
+\noindent We need to define the notion of what language is
+accepted by an automaton. For this we lift the transition
+function $\delta$ from characters to strings as follows:
\[
\begin{array}{lcl}
@@ -93,20 +93,24 @@
\end{array}
\]
-\noindent
-Given a string this means we start in the starting state and take the first character of the string,
-follow to the next sate, then take the second character and so on. Once the string is exhausted
-and we end up in an accepting state, then this string is accepted. Otherwise it is not accepted.
-So $s$ in the \emph{language accepted by the automaton} $A(Q, q_0, F, \delta)$ iff
+\noindent Given a string this means we start in the starting
+state and take the first character of the string, follow to
+the next sate, then take the second character and so on. Once
+the string is exhausted and we end up in an accepting state,
+then this string is accepted. Otherwise it is not accepted. So
+$s$ in the \emph{language accepted by the automaton} $A(Q,
+q_0, F, \delta)$ iff
\[
\hat{\delta}(q_0, s) \in F
\]
-While with DFA it will always clear that given a character what the next state is, it will be useful to relax
-this restriction. The resulting construction is called a \emph{non-deterministic finite automaton} (NFA) given
-as a four-tuple $A(Q, q_0, F, \rho)$ where
+While with DFA it will always clear that given a character
+what the next state is, it will be useful to relax this
+restriction. The resulting construction is called a
+\emph{non-deterministic finite automaton} (NFA) given as a
+four-tuple $A(Q, q_0, F, \rho)$ where
\begin{itemize}
\item $Q$ is a finite set of states
@@ -145,21 +149,26 @@
\end{tabular}
\end{center}
-\noindent
-There are a number of points you should note. Every DFA is a NFA, but not vice versa.
-The $\rho$ in NFAs is a transition \emph{relation}
-(DFAs have a transition function). The difference between a function and a relation is that a function
-has always a single output, while a relation gives, roughly speaking, several outputs. Look
-at the NFA on the right-hand side above: if you are currently in the state $r_2$ and you read a
-character $a$, then you can transition to $r_1$ \emph{or} $r_3$. Which route you take is not
-determined. This means if we need to decide whether a string is accepted by a NFA, we might have
-to explore all possibilities. Also there is a special transition in NFAs which is called \emph{epsilon-transition}
-or \emph{silent transition}. This transition means you do not have to ``consume'' no part of
-the input string, but ``silently'' change to a different state.
+\noindent There are a number of points you should note. Every
+DFA is a NFA, but not vice versa. The $\rho$ in NFAs is a
+transition \emph{relation} (DFAs have a transition function).
+The difference between a function and a relation is that a
+function has always a single output, while a relation gives,
+roughly speaking, several outputs. Look at the NFA on the
+right-hand side above: if you are currently in the state $r_2$
+and you read a character $a$, then you can transition to $r_1$
+\emph{or} $r_3$. Which route you take is not determined. This
+means if we need to decide whether a string is accepted by a
+NFA, we might have to explore all possibilities. Also there is
+a special transition in NFAs which is called
+\emph{epsilon-transition} or \emph{silent transition}. This
+transition means you do not have to ``consume'' no part of the
+input string, but ``silently'' change to a different state.
-The reason for introducing NFAs is that there is a relatively simple (recursive) translation of regular expressions into
-NFAs. Consider the simple regular expressions $\varnothing$, $\epsilon$ and $c$. They can be translated
-as follows:
+The reason for introducing NFAs is that there is a relatively
+simple (recursive) translation of regular expressions into
+NFAs. Consider the simple regular expressions $\varnothing$,
+$\epsilon$ and $c$. They can be translated as follows:
\begin{center}
\begin{tabular}[t]{l@{\hspace{10mm}}l}
@@ -180,9 +189,9 @@
\end{tabular}
\end{center}
-\noindent
-The case for the sequence regular expression $r_1 \cdot r_2$ is as follows: We are given by recursion
-two automata representing $r_1$ and $r_2$ respectively.
+\noindent The case for the sequence regular expression $r_1
+\cdot r_2$ is as follows: We are given by recursion two
+automata representing $r_1$ and $r_2$ respectively.
\begin{center}
\begin{tikzpicture}[node distance=3mm,
@@ -206,10 +215,11 @@
\end{tikzpicture}
\end{center}
-\noindent
-The first automaton has some accepting states. We obtain an automaton for $r_1\cdot r_2$ by connecting
-these accepting states with $\epsilon$-transitions to the starting state of the second automaton. By doing
-so we make them non-accepting like so:
+\noindent The first automaton has some accepting states. We
+obtain an automaton for $r_1\cdot r_2$ by connecting these
+accepting states with $\epsilon$-transitions to the starting
+state of the second automaton. By doing so we make them
+non-accepting like so:
\begin{center}
\begin{tikzpicture}[node distance=3mm,
@@ -235,9 +245,9 @@
\end{tikzpicture}
\end{center}
-\noindent
-The case for the choice regular expression $r_1 + r_2$ is slightly different: We are given by recursion
-two automata representing $r_1$ and $r_2$ respectively.
+\noindent The case for the choice regular expression $r_1 +
+r_2$ is slightly different: We are given by recursion two
+automata representing $r_1$ and $r_2$ respectively.
\begin{center}
\begin{tikzpicture}[node distance=3mm,
@@ -264,10 +274,11 @@
\end{tikzpicture}
\end{center}
-\noindent
-Each automaton has a single start state and potentially several accepting states. We obtain a
-NFA for the regular expression $r_1 + r_2$ by introducing a new starting state and connecting it
-with an $\epsilon$-transition to the two starting states above, like so
+\noindent Each automaton has a single start state and
+potentially several accepting states. We obtain a NFA for the
+regular expression $r_1 + r_2$ by introducing a new starting
+state and connecting it with an $\epsilon$-transition to the
+two starting states above, like so
\begin{center}
\hspace{2cm}\begin{tikzpicture}[node distance=3mm,
@@ -315,10 +326,10 @@
\end{tikzpicture}
\end{center}
-\noindent
-and connect its accepting states to a new starting state via $\epsilon$-transitions. This new
-starting state is also an accepting state, because $r^*$ can also recognise the empty string.
-This gives the following automaton for $r^*$:
+\noindent and connect its accepting states to a new starting
+state via $\epsilon$-transitions. This new starting state is
+also an accepting state, because $r^*$ can also recognise the
+empty string. This gives the following automaton for $r^*$:
\begin{center}
\begin{tikzpicture}[node distance=3mm,
@@ -340,8 +351,8 @@
\end{tikzpicture}
\end{center}
-\noindent
-This construction of a NFA from a regular expression was invented by Ken Thompson in 1968.
+\noindent This construction of a NFA from a regular expression
+was invented by Ken Thompson in 1968.
\end{document}