--- a/cws/cw03.tex Tue Nov 21 16:31:11 2017 +0000
+++ b/cws/cw03.tex Thu Nov 23 10:56:47 2017 +0000
@@ -1,18 +1,60 @@
\documentclass{article}
\usepackage{../style}
\usepackage{../langs}
+\usepackage{tikz}
+\usepackage{pgf}
+\usepackage{pgfplots}
+
+\begin{filecontents}{re-python2.data}
+1 0.033
+5 0.036
+10 0.034
+15 0.036
+18 0.059
+19 0.084
+20 0.141
+21 0.248
+22 0.485
+23 0.878
+24 1.71
+25 3.40
+26 7.08
+27 14.12
+28 26.69
+\end{filecontents}
+
+\begin{filecontents}{re-java.data}
+5 0.00298
+10 0.00418
+15 0.00996
+16 0.01710
+17 0.03492
+18 0.03303
+19 0.05084
+20 0.10177
+21 0.19960
+22 0.41159
+23 0.82234
+24 1.70251
+25 3.36112
+26 6.63998
+27 13.35120
+28 29.81185
+\end{filecontents}
+
\begin{document}
-\section*{Coursework 8 (Scala, Regular Expressions)}
+\section*{Coursework 8 (Scala, Regular Expressions, Brainf***)}
This coursework is worth 10\%. It is about regular expressions,
-pattern matching and polymorphism. The first part is due on 30
+pattern matching and an interpreter. The first part is due on 30
November at 11pm; the second, more advanced part, is due on 21
-December at 11pm. You are asked to implement a regular expression
-matcher based on derivatives of regular expressions. The reason is
-that regular expression matching in Java can be extremely slow
-sometimes.\bigskip
+December at 11pm. In the first part, you are asked to implement a
+regular expression matcher based on derivatives of regular
+expressions. The reason is that regular expression matching in Java
+can sometimes be extremely slow. The advanced part is about an
+interpreter for a very simple programming language.\bigskip
\noindent
\textbf{Important:}
@@ -54,9 +96,11 @@
\subsection*{Part 1 (6 Marks)}
The task is to implement a regular expression matcher that is based on
-derivatives of regular expressions. The implementation can deal
-with the following regular expressions, which have been predefined
-in the file re.scala:
+derivatives of regular expressions. Most of the functions are defined by
+recursion over regular expressions and can be elegantly implemented
+using Scala's pattern-matching. The implementation should deal with the
+following regular expressions, which have been predefined in the file
+\texttt{re.scala}:
\begin{center}
\begin{tabular}{lcll}
@@ -91,7 +135,7 @@
\begin{itemize}
\item[(1a)] Implement a function, called \textit{nullable}, by
recursion over regular expressions. This function tests whether a
- regular expression can match the empty string, that is given a
+ regular expression can match the empty string. This means given a
regular expression it either returns true or false.
\begin{center}
@@ -163,9 +207,9 @@
\item[(1c)] Implement the function \textit{simp}, which recursively
traverses a regular expression from the inside to the outside, and
- simplifies every sub-regular-expression on the left (see below) to
- the regular expression on the right, except it does not simplify inside
- ${}^*$-regular expressions.
+ on the way simplifies every regular expression on the left (see
+ below) to the regular expression on the right, except it does not
+ simplify inside ${}^*$-regular expressions.
\begin{center}
\begin{tabular}{l@{\hspace{4mm}}c@{\hspace{4mm}}ll}
@@ -182,17 +226,18 @@
For example the regular expression
\[(r_1 + \ZERO) \cdot \ONE + ((\ONE + r_2) + r_3) \cdot (r_4 \cdot \ZERO)\]
- simplifies to just $r_1$. \textbf{Hints:} Regular expressions can be
+ simplifies to just $r_1$. \textbf{Hint:} Regular expressions can be
seen as trees and there are several methods for traversing
- trees. One of them corresponds to the inside-out traversal. Also
- remember numerical expressions from school: there you had expressions
+ trees. One of them corresponds to the inside-out traversal, which is
+ sometimes also called post-order traversal. Furthermore,
+ remember numerical expressions from school times: there you had expressions
like $u + \ldots + (1 \cdot x) - \ldots (z + (y \cdot 0)) \ldots$
and simplification rules that looked very similar to rules
above. You would simplify such numerical expressions by replacing
for example the $y \cdot 0$ by $0$, or $1\cdot x$ by $x$, and then
look whether more rules are applicable. If you organise the
simplification in an inside-out fashion, it is always clear which
- rule should applied next.\\\mbox{}\hfill[1 Mark]
+ rule should be applied next.\hfill[2 Marks]
\item[(1d)] Implement two functions: The first, called \textit{ders},
takes a list of characters and a regular expression as arguments, and
@@ -214,81 +259,10 @@
according to \textit{ders} and after that tests whether the resulting
derivative regular expression can match the empty string (using
\textit{nullable}). For example the \textit{matcher} will produce
-true given the regular expression $(a\cdot b)\cdot c$ and the string
-$abc$.\\ \mbox{}\hfill[1 Mark]
-
-\item[(1e)] Implement the function $\textit{replace}\;r\;s_1\;s_2$: it searches
- (from the left to
-right) in the string $s_1$ all the non-empty substrings that match the
-regular expression $r$---these substrings are assumed to be
-the longest substrings matched by the regular expression and
-assumed to be non-overlapping. All these substrings in $s_1$ matched by $r$
-are replaced by $s_2$. For example given the regular expression
-
-\[(a \cdot a)^* + (b \cdot b)\]
-
-\noindent the string $s_1 = aabbbaaaaaaabaaaaabbaaaabb$ and
-replacement the string $s_2 = c$ yields the string
-
-\[
-ccbcabcaccc
-\]
-
-\hfill[2 Marks]
-\end{itemize}
-
-
-
-
-\subsection*{Part 2 (4 Marks)}
-
-You need to copy all the code from \texttt{re.scala} into
-\texttt{re2.scala} in order to complete Part 2. Parts (2a) and (2b)
-give you another method and datapoints for testing the \textit{der}
-and \textit{simp} functions from Part~1.
-
-\subsubsection*{Tasks (file re2.scala)}
+true for the regular expression $(a\cdot b)\cdot c$ and the string
+$abc$, but false if you give it the string $ab$. \hfill[1 Mark]
-\begin{itemize}
-\item[(2a)] Write a \textbf{polymorphic} function, called
- \textit{iterT}, that is \textbf{tail-recursive}(!) and takes an
- integer $n$, a function $f$ and an $x$ as arguments. This function
- should iterate $f$ $n$-times starting with the argument $x$, like
-
- \[\underbrace{f(\ldots (f}_{n\text{-times}}(x)))
- \]
-
- More formally that means \textit{iterT} behaves as follows:
-
- \begin{center}
- \begin{tabular}{lcl}
- $\textit{iterT}(n, f, x)$ & $\dn$ &
- $\begin{cases}
- \;x & \textit{if}\;n = 0\\
- \;f(\textit{iterT}(n - 1, f, x)) & \textit{otherwise}
- \end{cases}$
- \end{tabular}
-\end{center}
-
- Make sure you write a \textbf{tail-recursive} version of
- \textit{iterT}. If you add the annotation \texttt{@tailrec} (see
- below) your code should not produce an error message.
-
- \begin{lstlisting}[language=Scala, numbers=none, xleftmargin=-1mm]
- import scala.annotation.tailrec
-
- @tailrec
- def iterT[A](n: Int, f: A => A, x: A): A = ...
- \end{lstlisting}
-
- You can assume that \textit{iterT} will only be called for positive
- integers $0 \le n$. Given the type variable \texttt{A}, the type of
- $f$ is \texttt{A => A} and the type of $x$ is \texttt{A}. This means
- \textit{iterT} can be used, for example, for functions from integers
- to integers, or strings to strings, or regular expressions to
- regular expressions. \\ \mbox{}\hfill[2 Marks]
-
-\item[(2b)] Implement a function, called \textit{size}, by recursion
+\item[(1e)] Implement a function, called \textit{size}, by recursion
over regular expressions. If a regular expression is seen as a tree,
then \textit{size} should return the number of nodes in such a
tree. Therefore this function is defined as follows:
@@ -304,49 +278,12 @@
\end{tabular}
\end{center}
-You can use \textit{size} and \textit{iterT} in order to test how much
-the 'evil' regular expression $(a^*)^* \cdot b$ grows when taking
-successive derivatives according the letter $a$ and then compare it to
-taking the derivative, but simlifying the derivative after each step.
-For example, the calls
-
- \begin{lstlisting}[language=Scala, numbers=none, xleftmargin=-1mm]
- size(iterT(20, (r: Rexp) => der('a', r), EVIL))
- size(iterT(20, (r: Rexp) => simp(der('a', r)), EVIL))
- \end{lstlisting}
-
- produce without simplification a regular expression of size of
- 7340068 after 20 iterations, while the one with
- simplification gives
- just 8.\\ \mbox{}\hfill[1 Mark]
-
-
-\item[(2c)] Write a \textbf{polymorphic} function, called
- \textit{fixpT}, that takes
- a function $f$ and an $x$ as arguments. The purpose
- of \textit{fixpT} is to calculate a fixpoint of the function $f$
- starting from the argument $x$.
- A fixpoint, say $y$, is when $f(y) = y$ holds.
- That means \textit{fixpT} behaves as follows:
-
- \begin{center}
- \begin{tabular}{lcl}
- $\textit{fixpT}(f, x)$ & $\dn$ &
- $\begin{cases}
- \;x & \textit{if}\;f(x) = x\\
- \;\textit{fixpT}(f, f(x)) & \textit{otherwise}
- \end{cases}$
- \end{tabular}
-\end{center}
-
- Make sure you calculate in the code of $\textit{fixpT}$ the result
- of $f(x)$ only once. Given the type variable \texttt{A} in
- $\textit{fixpT}$, the type of $f$ is \texttt{A => A} and the type of
- $x$ is \texttt{A}. The file \texttt{re2.scala} gives two example
- function where in one the fixpoint is 1 and in the other
- it is the string $a$.\\ \mbox{}\hfill[1 Mark]
-\end{itemize}\bigskip
-
+You can use \textit{size} in order to test how much the `evil' regular
+expression $(a^*)^* \cdot b$ grows when taking successive derivatives
+according the letter $a$ without simplification and then compare it to
+taking the derivative, but simplify the result. The sizes
+are given in \texttt{re.scala}. \hfill[1 Mark]
+\end{itemize}
\subsection*{Background}
@@ -354,36 +291,101 @@
function might not so easy to be seen. To understand its purpose
better, assume a regular expression $r$ can match strings of the form
$c\!::\!cs$ (that means strings which start with a character $c$ and have
-some rest, or tail, $cs$). If you now take the derivative of $r$ with
-respect to the character $c$, then you obtain a regular expressions
+some rest, or tail, $cs$). If you take the derivative of $r$ with
+respect to the character $c$, then you obtain a regular expression
that can match all the strings $cs$. In other words, the regular
-expression $\textit{der}\;c\;r$ can match the same strings $c::cs$
+expression $\textit{der}\;c\;r$ can match the same strings $c\!::\!cs$
that can be matched by $r$, except that the $c$ is chopped off.
Assume now $r$ can match the string $abc$. If you take the derivative
according to $a$ then you obtain a regular expression that can match
$bc$ (it is $abc$ where the $a$ has been chopped off). If you now
-build the derivative $\textit{der}\;b\;(\textit{der}\;a\;r))$ you
+build the derivative $\textit{der}\;b\;(\textit{der}\;a\;r)$ you
obtain a regular expression that can match the string $c$ (it is $bc$
where $b$ is chopped off). If you finally build the derivative of this
according $c$, that is
-$\textit{der}\;c\;(\textit{der}\;b\;(\textit{der}\;a\;r)))$, you
-obtain a regular expression that can match the empty string. You can
-test this using the function nullable, which is what your matcher is
-doing.
+$\textit{der}\;c\;(\textit{der}\;b\;(\textit{der}\;a\;r))$, you obtain
+a regular expression that can match the empty string. You can test
+whether this is indeed the case using the function nullable, which is
+what your matcher is doing.
-The purpose of the simp function is to keep the regular expression
-small. Normally the derivative function makes the regular expression
-bigger (see the SEQ case and the example in (2b)) and the algorithm
-would be slower and slower over time. The simp function counters this
-increase in size and the result is that the algorithm is fast
-throughout. By the way, this algorithm is by Janusz Brzozowski who
-came up with the idea of derivatives in 1964 in his PhD thesis.
+The purpose of the $\textit{simp}$ function is to keep the regular
+expression small. Normally the derivative function makes the regular
+expression bigger (see the SEQ case and the example in (1b)) and the
+algorithm would be slower and slower over time. The $\textit{simp}$
+function counters this increase in size and the result is that the
+algorithm is fast throughout. By the way, this algorithm is by Janusz
+Brzozowski who came up with the idea of derivatives in 1964 in his PhD
+thesis.
\begin{center}\small
\url{https://en.wikipedia.org/wiki/Janusz_Brzozowski_(computer_scientist)}
\end{center}
+
+If you want to see how badly the regular expression matchers do in
+Java and Python with the `evil' regular expression, then have a look
+at the graphs below (you can try it out for yourself: have a look at
+the file \texttt{catastrophic.java} on KEATS). Compare this with the
+matcher you have implemented. How long can the string of $a$'s be
+in your matcher and stay within the 30 seconds time limit?
+
+\begin{center}
+\begin{tikzpicture}
+\begin{axis}[
+ title={Graph: $(a^*)^*\cdot b$ and strings
+ $\underbrace{a\ldots a}_{n}$},
+ xlabel={$n$},
+ x label style={at={(1.05,0.0)}},
+ ylabel={time in secs},
+ enlargelimits=false,
+ xtick={0,5,...,30},
+ xmax=33,
+ ymax=35,
+ ytick={0,5,...,30},
+ scaled ticks=false,
+ axis lines=left,
+ width=6cm,
+ height=5.0cm,
+ legend entries={Python, Java},
+ legend pos=outer north east]
+\addplot[blue,mark=*, mark options={fill=white}] table {re-python2.data};
+\addplot[cyan,mark=*, mark options={fill=white}] table {re-java.data};
+\end{axis}
+\end{tikzpicture}
+\end{center}
+\newpage
+
+\subsection*{Part 2 (4 Marks)}
+
+Comming from Java or C++, you might think Scala is a quite
+esotheric programming language. But remember, some serious companies
+have built their business on Scala. And there are far more esotheric
+languages out there. One is called \emph{brainf***}. Urban M\"uller
+developed this language in 1993. A close relative was already
+introduced in ... by Corado B\"ohm, an Italian computer pionier, who
+unfortunately died a few months ago. One feature of brainf*** is its
+minimalistic set of instructions. It has just 8 instructions, all of
+which are single characters. Despite this minimalism, this language,
+given enough memory, has been shown to be Turing complete. In this
+part you will implement an interpreter for this language.
+
+
+
+\subsubsection*{Tasks (file bf.scala)}
+
+\begin{itemize}
+\item[(2a)]
+
+\item[(2b)]
+
+\item[(2c)]
+
+\end{itemize}\bigskip
+
+
+
+
\end{document}