afl-material: comparison handouts/ho02.tex


-:70c307641d05
+:1e4da6d2490c
 \begin{document}
 \section*{Handout 2}
-Having specified what problem our matching algorithm,
+Having specified what problem our matching algorithm, \pcode{match},
-\pcode{match}, is supposed to solve, namely for a given
+is supposed to solve, namely for a given regular expression $r$ and
-regular expression $r$ and string $s$ answer \textit{true} if
+string $s$ answer \textit{true} if and only if
-and only if
 \[
 s \in L(r)
 \]
 \noindent we can look at an algorithm to solve this problem.
 Clearly we cannot use the function $L$ directly for this,
 because in general the set of strings $L$ returns is infinite
 (recall what $L(a^*)$ is). In such cases there is no way we
 can implement an exhaustive test for whether a string is
-member of this set or not.
+a member of this set or not. Before we come to the matching
+algorithm, let us have a closer look at what it means when
-The algorithm we will define below consists of two parts. One is the function $nullable$ which takes a
+two regular expressions are equivalent.
-regular expression as argument and decides whether it can match the empty string (this means it returns a
+\subsection*{Regular Expression Equivalences}
+\subsection*{Matching Algorithm}
+The algorithm we will define below consists of two parts. One is the
+function $nullable$, which takes a regular expression as argument and
+decides whether it can match the empty string (this means it returns a
 boolean). This can be easily defined recursively as follows:
 \begin{center}
 \begin{tabular}{@ {}l@ {\hspace{2mm}}c@ {\hspace{2mm}}l@ {}}
 $nullable(\varnothing)$      & $\dn$ & $f\!\/alse$\\
 \[
 nullable(r) \;\;\text{if and only if}\;\; ""\in L(r)
 \]
 \noindent
-Note on the left-hand side we have a function we can implement; on the right we have its specification.
+Note that on the left-hand side we have a function we can implement;
+on the right-hand side we have its specification.
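The clauses of $nullable$ translate directly into a recursive function. The following minimal Scala sketch is not taken from the handout's own programs: the datatype \pcode{Rexp} and the constructor names \pcode{ZeroR}, \pcode{OneR}, \pcode{CharR}, \pcode{AltR}, \pcode{SeqR} and \pcode{StarR} (for $\varnothing$, $\epsilon$, a character $c$, $+$, $\cdot$ and ${}^*$) are hypothetical, and the clauses assume the usual definition: $\epsilon$ and $r^*$ are nullable, a single character is not, $r_1 + r_2$ is nullable if one of its components is, and $r_1\cdot r_2$ is nullable if both are.

```scala
// Hypothetical datatype for regular expressions; the names are
// illustrative and need not match the handout's Scala files.
sealed trait Rexp
case object ZeroR extends Rexp                   // matches no string
case object OneR extends Rexp                    // matches only the empty string
case class CharR(c: Char) extends Rexp           // matches the single character c
case class AltR(r1: Rexp, r2: Rexp) extends Rexp // r1 + r2
case class SeqR(r1: Rexp, r2: Rexp) extends Rexp // r1 . r2
case class StarR(r: Rexp) extends Rexp           // r*

// nullable(r) is true if and only if "" is in L(r)
def nullable(r: Rexp): Boolean = r match {
  case ZeroR        => false
  case OneR         => true
  case CharR(_)     => false
  case AltR(r1, r2) => nullable(r1) || nullable(r2)
  case SeqR(r1, r2) => nullable(r1) && nullable(r2)
  case StarR(_)     => true
}
```

For instance, $nullable(a + \epsilon)$ holds, whereas $nullable(a\cdot b^*)$ does not, because $a$ cannot match the empty string.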
-The other function of our matching algorithm calculates a \emph{derivative} of a regular expression. This is a function
-which will take a regular expression, say $r$, and a character, say $c$, as argument and return
+The other function of our matching algorithm calculates a
-a new regular expression. Be careful that the intuition behind this function is not so easy to grasp on first
+\emph{derivative} of a regular expression. This is a function which
-reading. Essentially this function solves the following problem: if $r$ can match a string of the form
+will take a regular expression, say $r$, and a character, say $c$, as
-$c\!::\!s$, what does the regular expression look like that can match just $s$. The definition of this
+arguments and return a new regular expression. Be warned that the
+intuition behind this function is not easy to grasp on first
+reading. Essentially this function solves the following problem: if
+$r$ can match a string of the form $c\!::\!s$, what does a regular
+expression look like that can match just $s$? The definition of this
 function is as follows:
 \begin{center}
 \begin{tabular}{@ {}l@ {\hspace{2mm}}c@ {\hspace{2mm}}l@ {\hspace{-10mm}}l@ {}}
 $der\, c\, (\varnothing)$      & $\dn$ & $\varnothing$ & \\
 $der\, c\, (r^*)$          & $\dn$ & $(der\,c\,r) \cdot (r^*)$ &
 \end{tabular}
 \end{center}
 \noindent
-The first two clauses can be rationalised as follows: recall that $der$ should calculate a regular
+The first two clauses can be rationalised as follows: recall that
-expression, if the ``input'' regular expression can match a string of the form $c\!::\!s$. Since neither
+$der$ should calculate a regular expression matching $s$ whenever the ``input'' regular
-$\varnothing$ nor $\epsilon$ can match such a string we return $\varnothing$. In the third case
+expression can match a string of the form $c\!::\!s$. Since neither
-we have to make a case-distinction: In case the regular expression is $c$, then clearly it can recognise
+$\varnothing$ nor $\epsilon$ can match such a string, we return
-a string of the form $c\!::\!s$, just that $s$ is the empty string. Therefore we return the $\epsilon$-regular
+$\varnothing$. In the third case we have to make a case-distinction:
-expression. In the other case we again return $\varnothing$ since no string of the $c\!::\!s$ can be matched.
+If the regular expression is $c$, then clearly it can recognise a
-The $+$-case is relatively straightforward: all strings of the form $c\!::\!s$ are either matched by the
+string of the form $c\!::\!s$, namely the one where $s$ is the empty
-regular expression $r_1$ or $r_2$. So we just have to recursively call $der$ with these two regular
+string. Therefore we return the $\epsilon$-regular expression. In the
-expressions and compose the results again with $+$. The $\cdot$-case is more complicated:
+other case we again return $\varnothing$, since no string of the form
-if $r_1\cdot r_2$ matches a string of the form $c\!::\!s$, then the first part must be matched by $r_1$.
+$c\!::\!s$ can be matched.  The $+$-case is relatively
-Consequently, it makes sense to construct the regular expression for $s$ by calling $der$ with $r_1$ and
+straightforward: every string of the form $c\!::\!s$ matched by $r_1 + r_2$ is matched
-``appending'' $r_2$. There is however one exception to this simple rule: if $r_1$ can match the empty
+by the regular expression $r_1$ or $r_2$. So we just have to
-string, then all of $c\!::\!s$ is matched by $r_2$. So in case $r_1$ is nullable (that is can match the
+recursively call $der$ with these two regular expressions and compose
-empty string) we have to allow the choice $der\,c\,r_2$ for calculating the regular expression that can match
+the results again with $+$. The $\cdot$-case is more complicated: if
-$s$. The $*$-case is again simple: if $r^*$ matches a string of the form $c\!::\!s$, then the first part must be
+$r_1\cdot r_2$ matches a string of the form $c\!::\!s$, then the first
-``matched'' by a single copy of $r$. Therefore we call recursively $der\,c\,r$ and ``append'' $r^*$ in order to
+part, which contains $c$, is normally matched by $r_1$. Consequently, it makes sense to
-match the rest of $s$.
+construct the regular expression for $s$ by calling $der$ with $r_1$
+and ``appending'' $r_2$. There is, however, one exception to this simple
-Another way to rationalise the definition of $der$ is to consider the following operation on sets:
+rule: if $r_1$ can match the empty string, then all of $c\!::\!s$ may be
+matched by $r_2$. So in case $r_1$ is nullable (that is, can match the
+empty string), we also have to allow the choice $der\,c\,r_2$ for
+calculating the regular expression that can match $s$. The $*$-case is
+again simple: if $r^*$ matches a string of the form $c\!::\!s$, then
+the first part must be ``matched'' by a single copy of $r$. Therefore
+we recursively call $der\,c\,r$ and ``append'' $r^*$ in order to match
+the rest of $s$.
+Another way to rationalise the definition of $der$ is to consider the
+following operation on sets:
 \[
 Der\,c\,A\;\dn\;\{s\,|\,c\!::\!s \in A\}
 \]
 \noindent
-which essentially transforms a set of strings $A$ by filtering out all strings that do not start with $c$ and then
+which essentially transforms a set of strings $A$ by filtering out all
-strips off the $c$ from all the remaining strings. For example suppose $A = \{"f\!oo", "bar", "f\!rak"\}$ then
+strings that do not start with $c$ and then strips off the $c$ from
+all the remaining strings. For example, suppose $A = \{"f\!oo", "bar",
+"f\!rak"\}$; then
 \[
 Der\,f\,A = \{"oo", "rak"\}\quad,\quad
 Der\,b\,A = \{"ar"\}  \quad \text{and} \quad
 Der\,a\,A = \varnothing
 \]
 \noindent
-Note that in the last case $Der$ is empty, because no string in $A$ starts with $a$. With this operation we can
+Note that in the last case $Der\,a\,A$ is empty, because no string in $A$
-state the following property about $der$:
+starts with $a$. With this operation we can state the following
+property about $der$:
 \[
 L(der\,c\,r) = Der\,c\,(L(r))
 \]
 \noindent
-This property clarifies what regular expression $der$ calculates, namely take the set of strings
+This property clarifies what regular expression $der$ calculates,
-that $r$ can match (that is $L(r)$), filter out all strings not starting with $c$ and strip off the $c$ from the
+namely: take the set of strings that $r$ can match (that is, $L(r)$),
-remaining strings---this is exactly the language that $der\,c\,r$ can match.
+filter out all strings not starting with $c$, and strip off the $c$
+from the remaining strings; this is exactly the language that
-If we want to find out whether the string $"abc"$ is matched by the regular expression $r$
+$der\,c\,r$ can match.
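The set operation $Der$ can be tried out directly on finite sets such as $A$ above. The following Scala sketch only illustrates the specification and is not part of the matching algorithm (recall that $L(r)$ is in general infinite); the function name \pcode{Der} and the representation of sets as Scala \pcode{Set[String]} values are our own assumptions.

```scala
// Der c A = { s | c::s in A }, for finite sets of strings only.
// Hypothetical helper: it illustrates the specification, not the matcher.
def Der(c: Char, a: Set[String]): Set[String] =
  a.filter(s => s.nonEmpty && s.head == c) // keep the strings starting with c
    .map(_.tail)                           // strip off the leading c

val a = Set("foo", "bar", "frak")
val derF = Der('f', a) // the set {"oo", "rak"}
val derB = Der('b', a) // the set {"ar"}
val derA = Der('a', a) // empty: no string in a starts with 'a'
```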
-then we can iteratively apply $Der$ as follows
+If we want to find out whether the string $"abc"$ is matched by the
+regular expression $r$, then we can iteratively apply $Der$ as follows:
 \begin{enumerate}
 \item $Der\,a\,(L(r))$
 \item $Der\,b\,(Der\,a\,(L(r)))$
 \item $Der\,c\,(Der\,b\,(Der\,a\,(L(r))))$
 \end{enumerate}
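The iteration of $Der$ in the steps above, followed by the test for the empty string, amounts to a left fold over the characters of the string. Again this is a sketch for finite sets only, with the hypothetical names \pcode{Der} and \pcode{matchesFinite}:

```scala
// Der c A = { s | c::s in A }, restricted to finite sets
def Der(c: Char, a: Set[String]): Set[String] =
  a.filter(s => s.nonEmpty && s.head == c).map(_.tail)

// Apply Der once for each character of s, from left to right,
// then check whether the empty string is in the resulting set.
def matchesFinite(s: String, a: Set[String]): Boolean =
  s.foldLeft(a)((acc, c) => Der(c, acc)).contains("")
```

For instance, for $A = \{"abc", "ab", "xbc"\}$ the three steps yield $\{"bc", "b"\}$, $\{"c", ""\}$ and $\{""\}$, so the string $"abc"$ is accepted.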
 \noindent
-In the last step we need to test whether the empty string is in the set. Our matching algorithm will work similarly,
+In the last step we need to test whether the empty string is in the
-just using regular expression instead of sets. For this we need to lift the notion of derivatives from characters to strings. This can be
+set. Our matching algorithm will work similarly, just using regular
-done using the following function, taking a string and regular expression as input and a regular expression
+expressions instead of sets. For this we need to lift the notion of
-as output.
+derivatives from characters to strings. This can be done using the
+following function, which takes a string and a regular expression as
+input and returns a regular expression as output.
 \begin{center}
 \begin{tabular}{@ {}l@ {\hspace{2mm}}c@ {\hspace{2mm}}l@ {\hspace{-10mm}}l@ {}}
 $der\!s\, []\, r$     & $\dn$ & $r$ & \\
 $der\!s\, (c\!::\!s)\, r$ & $\dn$ & $der\!s\,s\,(der\,c\,r)$ & \\
 \[
 match\,s\,r\quad\text{if and only if}\quad s\in L(r)
 \]
 \noindent
-holds, which means our algorithm satisfies the specification. This algorithm was introduced by
+holds, which means our algorithm satisfies the specification. This
-Janus Brzozowski in 1964. Its main attractions are simplicity and being fast, as well as
+algorithm was introduced by Janusz Brzozowski in 1964. Its main
-being easily extendable for other regular expressions such as $r^{\{n\}}$, $r^?$, $\sim{}r$ and so on.
+attractions are its simplicity and speed, as well as the ease with
+which it can be extended to other regular expressions such as
+$r^{\{n\}}$, $r^?$, $\sim{}r$ and so on.
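Putting $nullable$, $der$, $der\!s$ and $match$ together gives a complete matcher. The following self-contained Scala sketch is not the code shown in the figure below: the constructor names are our own hypothetical choices, the function is called \pcode{matcher} because \pcode{match} is a reserved word in Scala, and the character clause ($der\,c\,d$ is $\epsilon$ if $c = d$ and $\varnothing$ otherwise) follows the explanation of the third case above.

```scala
// Hypothetical datatype; the names are illustrative only.
sealed trait Rexp
case object ZeroR extends Rexp                   // matches no string
case object OneR extends Rexp                    // matches only ""
case class CharR(c: Char) extends Rexp           // a single character
case class AltR(r1: Rexp, r2: Rexp) extends Rexp // r1 + r2
case class SeqR(r1: Rexp, r2: Rexp) extends Rexp // r1 . r2
case class StarR(r: Rexp) extends Rexp           // r*

// true if and only if "" is in L(r)
def nullable(r: Rexp): Boolean = r match {
  case ZeroR        => false
  case OneR         => true
  case CharR(_)     => false
  case AltR(r1, r2) => nullable(r1) || nullable(r2)
  case SeqR(r1, r2) => nullable(r1) && nullable(r2)
  case StarR(_)     => true
}

// der c r matches exactly the strings s such that c::s is in L(r)
def der(c: Char, r: Rexp): Rexp = r match {
  case ZeroR        => ZeroR
  case OneR         => ZeroR
  case CharR(d)     => if (c == d) OneR else ZeroR
  case AltR(r1, r2) => AltR(der(c, r1), der(c, r2))
  case SeqR(r1, r2) =>
    // if r1 is nullable, c may also be the first character matched by r2
    if (nullable(r1)) AltR(SeqR(der(c, r1), r2), der(c, r2))
    else SeqR(der(c, r1), r2)
  case StarR(r1)    => SeqR(der(c, r1), StarR(r1))
}

// ders s r: take the derivative character by character
def ders(s: List[Char], r: Rexp): Rexp = s match {
  case Nil     => r
  case c :: cs => ders(cs, der(c, r))
}

// match s r, renamed because 'match' is a Scala keyword
def matcher(s: String, r: Rexp): Boolean = nullable(ders(s.toList, r))
```

For example, with $r = a\cdot b^*$ this sketch accepts $"abbb"$ and rejects $"ba"$. It performs no simplification of the derivatives, so the intermediate regular expressions can grow with the length of the input.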
+\subsection*{The Matching Algorithm in Scala}
 \begin{figure}[p]
 {\lstset{language=Scala}\texttt{\lstinputlisting{../progs/app5.scala}}}
 {\lstset{language=Scala}\texttt{\lstinputlisting{../progs/app6.scala}}}
 \caption{Scala implementation of the $nullable$ and derivative functions.}

changeset 258	1e4da6d2490c
parent 251	5b5a68df6d16
child 259	e5f4b8ff23b8