afl-material: comparison handouts/ho03.tex

equal deleted inserted replaced

-:d0fc671bcbbf
+:7d9d86dc7aa0
 one state to the next state with respect to a character. We have the
 assumption that these transition functions do not need to be defined
 everywhere: so it can be the case that given a character there is no
 next state, in which case we need to raise a kind of ``failure
 exception''.  That means actually we have \emph{partial} functions as
-transitions---see the Scala implementation of DFAs later on.  A
+transitions---see the Scala implementation for DFAs later on.  A
 typical example of a DFA is
 \begin{center}
 \begin{tikzpicture}[>=stealth',very thick,auto,
 every state/.style={minimum size=0pt,
 (Q_4, a) &\rightarrow& Q_4\\
 (Q_4, b) &\rightarrow& Q_4\\
 \end{array}
 \]
+\noindent
+Please check that this table represents the same transition function
+as the graph above.
 We need to define the notion of what language is accepted by
 an automaton. For this we lift the transition function
 $\delta$ from characters to strings as follows:
 \[
 \end{tikzpicture}
 \end{center}
 \noindent
 This NFA happens to have only one starting state, but in general there
-could be more.  Notice that in state $Q_0$ we might go to state $Q_1$
+could be more than one.  Notice that in state $Q_0$ we might go to
-\emph{or} to state $Q_2$ when receiving an $a$. Similarly in state
+state $Q_1$ \emph{or} to state $Q_2$ when receiving an $a$. Similarly
-$Q_1$ and receiving an $a$, we can stay in state $Q_1$ \emph{or} go to
+in state $Q_1$ and receiving an $a$, we can stay in state $Q_1$
-$Q_2$. This kind of choice is not allowed with DFAs. The downside of
+\emph{or} go to $Q_2$. This kind of choice is not allowed with
-this choice in NFAs is that when it comes to deciding whether a string is
+DFAs. The downside of this choice in NFAs is that when it comes to
-accepted by a NFA we potentially have to explore all possibilities. I
+deciding whether a string is accepted by a NFA we potentially have to
-let you think which strings the above NFA accepts.
+explore all possibilities. I let you think which strings the above NFA
+accepts.
 There are a number of additional points you should note about
 NFAs. Every DFA is a NFA, but not vice versa. The $\rho$ in NFAs is a
 transition \emph{relation} (DFAs have a transition function). The
 \end{equation}
 \noindent
 I let you think whether the NFAs can match exactly those strings the
 regular expressions can match. To do this translation in code we need
-a way to construct states programatically...and as an additional
+a way to construct states ``programatically''...and as an additional
-constrain Scala needs to recognise that these states are being distinct.
+constraint Scala needs to recognise that these states are being distinct.
 For this I implemented in Figure~\ref{thompson1} a class
 \texttt{TState} that includes a counter and a companion object that
 increases this counter whenever a new state is created.\footnote{You might
 have to read up what \emph{companion objects} do in Scala.}
 implement a way of how to create new states that are all
 distinct by virtue of a counter. This counter is
 increased in the companion object of \texttt{TState}
 whenever a new state is created. The code in Lines 24--40
 constructs NFAs for the simple regular expressions $\ZERO$, $\ONE$ and $c$.
-Compare the pictures given in \eqref{simplecases}.
+Compare this code with the pictures given in \eqref{simplecases} on
+Page~\pageref{simplecases}.
 \label{thompson1}}
 \end{figure}
 \begin{figure}[p]
 \small
 \caption{The second part of the Thompson Construction implementing
 the composition of NFAs according to $\cdot$, $+$ and ${}^*$.
 The implicit class about rich partial functions
 implements the infix operation \texttt{+++} which
 combines an $\epsilon$NFA transition with a NFA transition
-(both given as partial functions).\label{thompson2}}
+(both are given as partial functions---but with different type!).\label{thompson2}}
 \end{figure}
 The case for the sequence regular expression $r_1 \cdot r_2$ is a bit more
 complicated: Say, we are given by recursion two NFAs representing the regular
 expressions $r_1$ and $r_2$ respectively.
 The case for the alternative regular expression $r_1 + r_2$ is
 slightly different: We are given by recursion two NFAs representing
 $r_1$ and $r_2$ respectively. Each NFA has some starting states and
 some accepting states. We obtain a NFA for the regular expression $r_1
 + r_2$ by composing the transition functions (this crucially depends
-on knowing that the states of each component NFA are distinct); and
+on knowing that the states of each component NFA are distinct---recall
-also combine the starting states and accepting functions.
+we implemented for this to hold some bespoke code for states). We also
+need to combine the starting states and accepting functions
+appropriately.
 \begin{center}
 \begin{tabular}[t]{ccc}
 \begin{tikzpicture}[node distance=3mm,
 >=stealth',very thick,
 accepting states to a new starting state via
 $\epsilon$-transitions. This new starting state is also an accepting
 state, because $r^*$ can recognise the empty string.
 \begin{center}
-\begin{tabular}[b]{@{\hspace{-4mm}}ccc@{}}
+\begin{tabular}[b]{@{}ccc@{}}
 \begin{tikzpicture}[node distance=3mm,
 >=stealth',very thick,
 every state/.style={minimum size=3pt,draw=blue!50,very thick,fill=blue!20},
 baseline=(current bounding box.north)]
-\node at (0,0)  (1)  {$\mbox{}$};
+\node (2)  {$\mbox{}$};
-\node[state, initial]  (2)  [right=16mm of 1] {$\mbox{}$};
+\node[state, initial]  (4)  [above=1mm of 2] {$\mbox{}$};
+\node[state, initial]  (5)  [below=1mm of 2] {$\mbox{}$};
 \node (a)  [right=of 2] {$\ldots$};
 \node[state, accepting]  (a1)  [right=of a] {$\mbox{}$};
 \node[state, accepting]  (a2)  [above=of a1] {$\mbox{}$};
 \node[state, accepting]  (a3)  [below=of a1] {$\mbox{}$};
 \begin{pgfonlayer}{background}
 \begin{tikzpicture}[node distance=3mm,
 >=stealth',very thick,
 every state/.style={minimum size=3pt,draw=blue!50,very thick,fill=blue!20},
 baseline=(current bounding box.north)]
 \node at (0,0) [state, initial,accepting]  (1)  {$\mbox{}$};
-\node[state]  (2)  [right=16mm of 1] {$\mbox{}$};
+\node (2)  [right=16mm of 1] {$\mbox{}$};
+\node[state]  (4)  [above=1mm of 2] {$\mbox{}$};
+\node[state]  (5)  [below=1mm of 2] {$\mbox{}$};
 \node (a)  [right=of 2] {$\ldots$};
 \node[state]  (a1)  [right=of a] {$\mbox{}$};
 \node[state]  (a2)  [above=of a1] {$\mbox{}$};
 \node[state]  (a3)  [below=of a1] {$\mbox{}$};
-\path[->] (1) edge node [above]  {$\epsilon$} (2);
+\path[->] (1) edge node [below]  {$\epsilon$} (4);
-\path[->] (a1) edge [bend left=45] node [above]  {$\epsilon$} (1);
+\path[->] (1) edge node [below]  {$\epsilon$} (5);
+\path[->] (a1) edge [bend left=45] node [below]  {$\epsilon$} (1);
 \path[->] (a2) edge [bend right] node [below]  {$\epsilon$} (1);
 \path[->] (a3) edge [bend left=45] node [below]  {$\epsilon$} (1);
 \begin{pgfonlayer}{background}
 \node (2) [rounded corners, inner sep=1mm, thick, draw=black!60, fill=black!20, fit= (1) (a2) (a3)] {};
 \node [yshift=3mm] at (2.north) {$r^*$};
 \texttt{nexts} from the NFA class already calculates this for us. The
 accepting-states function for the DFA is true henevner at least one
 state in the subset is accepting (that is true) in the NFA.\medskip
 \noindent
-You might be able to spend some quality tinkering with this code and
+You might be able to spend some quality time tinkering with this code
-time to ponder about it. Then you will probably notice it is actually
+and time to ponder about it. Then you will probably notice that it is
-a bit silly. The whole point of translating the NFA into a DFA via the
+actually a bit silly. The whole point of translating the NFA into a
-subset construction is to make the decision of whether a string is
+DFA via the subset construction is to make the decision of whether a
-accepted or not faster. Given the code above, the generated DFA will
+string is accepted or not faster. Given the code above, the generated
-be exactly as fast, or as slow, as the NFA we started with (actually
+DFA will be exactly as fast, or as slow, as the NFA we started with
-it will even be a tiny bit slower). The reason is that we just re-use
+(actually it will even be a tiny bit slower). The reason is that we
-the \texttt{nexts} function from the NFA. This function implements the
+just re-use the \texttt{nexts} function from the NFA. This function
-non-deterministic breadth-first search.  You might be thinking: This
+implements the non-deterministic breadth-first search.  You might be
-is cheating! \ldots{} Well, not quite as you will see later, but in
+thinking: This is cheating! \ldots{} Well, not quite as you will see
-terms of speed we still need to work a bit in order to get
+later, but in terms of speed we still need to work a bit in order to
-sometimes(!) a faster DFA. Let's do this next.
+get sometimes(!) a faster DFA. Let's do this next.
 \subsection*{DFA Minimisation}
 As seen in \eqref{subsetdfa}, the subset construction from NFA to a
 DFA can result in a rather ``inefficient'' DFA. Meaning there are
 states that are not needed. There are two kinds of such unneeded
-states: \emph{unreachable} states and \emph{nondistinguishable}
+states: \emph{unreachable} states and \emph{non-distinguishable}
 states. The first kind of states can just be removed without affecting
 the language that can be recognised (after all they are
 unreachable). The second kind can also be recognised and thus a DFA
 can be \emph{minimised} by the following algorithm:
 the functions would be thrown away. So the compromise is to not being
 able to minimise (easily) our DFAs.
 \subsection*{Brzozowski's Method}
-I know tyhis is already a long, long rant: but after all it is a topic
+I know this handout is already a long, long rant: but after all it is
-that has been researched for more than 60 years. If you reflect on
+a topic that has been researched for more than 60 years. If you
-what you have read so far, the story is that you can take a regular
+reflect on what you have read so far, the story is that you can take a
-expression, translate it via the Thompson Construction into an
+regular expression, translate it via the Thompson Construction into an
 $\epsilon$NFA, then translate it into a NFA by removing all
 $\epsilon$-transitions, and then via the subset construction obtain a
 DFA. In all steps we made sure the language, or which strings can be
-recognised, stays the same. After the last section, we can even
+recognised, stays the same. Of couse we should have proved this in
-minimise the DFA (maybe not in code). But again we made sure the same
+each step, but let us cut corners here.  After the last section, we
-language is recognised. You might be wondering: Can we go into the
+can even minimise the DFA (maybe not in code). But again we made sure
-other direction?  Can we go from a DFA and obtain a regular expression
+the same language is recognised. You might be wondering: Can we go
-that can recognise the same language as the DFA?\medskip
+into the other direction?  Can we go from a DFA and obtain a regular
+expression that can recognise the same language as the DFA?\medskip
 \noindent
 The answer is yes. Again there are several methods for calculating a
 regular expression for a DFA. I will show you Brzozowski's method
 because it calculates a regular expression using quite familiar
 module.
 \subsection*{Where Have Derivatives Gone?}
-By now you are probably fed up by this text. It is now way too long
+By now you are probably fed up with this text. It is now way too long
 for one lecture, but there is still one aspect of the
-automata-connection I like to highlight for you. Perhaps by now you
+automata-regular-expression-connection I like to describe. Perhaps by
-are asking yourself: Where have the derivatives gone? Did we just
+now you are asking yourself: Where have the derivatives gone? Did we
-forget them?  Well, they have a place in the picture of calculating a
+just forget them?  Well, they have a place in the picture of
-DFA from the regular expression.
+calculating a DFA from the regular expression.
 To be done
+\begin{center}
+\begin{tikzpicture}
+[level distance=25mm,very thick,auto,
+level 1/.style={sibling distance=30mm},
+level 2/.style={sibling distance=15mm},
+every node/.style={minimum size=30pt,
+inner sep=0pt,circle,draw=blue!50,very thick,
+fill=blue!20}]
+\node {$r$} [grow=right]
+child[->] {node (cn) {$d_{c_n}$}
+child { node {$dd_{c_nc_n}$}}
+child { node {$dd_{c_nc_1}$}}
+%edge from parent node[left] {$c_n$}
+}
+child[->] {node (c1) {$d_{c_1}$}
+child { node {$dd_{c_1c_n}$}}
+child { node {$dd_{c_1c_1}$}}
+%edge from parent node[left] {$c_1$}
+};
+%%\draw (cn) -- (c1) node {\vdots};
+\end{tikzpicture}
+\end{center}
 %\section*{Further Reading}
 %Compare what a ``human expert'' would create as an automaton for the
 %regular expression $a\cdot (b + c)^*$ and what the Thomson

changeset 495	7d9d86dc7aa0
parent 492	39b7ff2cf1bc
child 497	c498cb53a9a8