cws/cw03.tex
changeset 153 4383809c176a
parent 152 114a89518aea
child 154 39c6b93718f0
--- a/cws/cw03.tex	Tue Nov 21 16:31:11 2017 +0000
+++ b/cws/cw03.tex	Thu Nov 23 10:56:47 2017 +0000
@@ -1,18 +1,60 @@
 \documentclass{article}
 \usepackage{../style}
 \usepackage{../langs}
+\usepackage{tikz}
+\usepackage{pgf}
+\usepackage{pgfplots}
+
+\begin{filecontents}{re-python2.data}
+1 0.033
+5 0.036
+10 0.034
+15 0.036
+18 0.059
+19 0.084
+20 0.141
+21 0.248
+22 0.485
+23 0.878
+24 1.71
+25 3.40
+26 7.08
+27 14.12
+28 26.69
+\end{filecontents}
+
+\begin{filecontents}{re-java.data}
+5  0.00298
+10  0.00418
+15  0.00996
+16  0.01710
+17  0.03492
+18  0.03303
+19  0.05084
+20  0.10177
+21  0.19960
+22  0.41159
+23  0.82234
+24  1.70251
+25  3.36112
+26  6.63998
+27  13.35120
+28  29.81185
+\end{filecontents}
+
 
 \begin{document}
 
-\section*{Coursework 8 (Scala, Regular Expressions)}
+\section*{Coursework 8 (Scala, Regular Expressions, Brainf***)}
 
 This coursework is worth 10\%. It is about regular expressions,
-pattern matching and polymorphism. The first part is due on 30
+pattern matching and an interpreter. The first part is due on 30
 November at 11pm; the second, more advanced part, is due on 21
-December at 11pm. You are asked to implement a regular expression
-matcher based on derivatives of regular expressions. The reason is
-that regular expression matching in Java can be extremely slow
-sometimes.\bigskip
+December at 11pm. In the first part, you are asked to implement a
+regular expression matcher based on derivatives of regular
+expressions. The reason is that regular expression matching in Java
+can sometimes be extremely slow. The advanced part is about an
+interpreter for a very simple programming language.\bigskip
 
 \noindent
 \textbf{Important:}
@@ -54,9 +96,11 @@
 \subsection*{Part 1 (6 Marks)}
 
 The task is to implement a regular expression matcher that is based on
-derivatives of regular expressions. The implementation can deal
-with the following regular expressions, which have been predefined
-in the file re.scala:
+derivatives of regular expressions. Most of the functions are defined by
+recursion over regular expressions and can be elegantly implemented
+using Scala's pattern-matching. The implementation should deal with the
+following regular expressions, which have been predefined in the file
+\texttt{re.scala}:
 
 \begin{center}
 \begin{tabular}{lcll}
@@ -91,7 +135,7 @@
 \begin{itemize}
 \item[(1a)] Implement a function, called \textit{nullable}, by
   recursion over regular expressions. This function tests whether a
-  regular expression can match the empty string, that is given a
+  regular expression can match the empty string. This means given a
   regular expression it either returns true or false.
 
 \begin{center}
@@ -163,9 +207,9 @@
 
 \item[(1c)] Implement the function \textit{simp}, which recursively
   traverses a regular expression from the inside to the outside, and
-  simplifies every sub-regular-expression on the left (see below) to
-  the regular expression on the right, except it does not simplify inside
-  ${}^*$-regular expressions.
+  on the way simplifies every regular expression on the left (see
+  below) to the regular expression on the right, except it does not
+  simplify inside ${}^*$-regular expressions.
 
   \begin{center}
 \begin{tabular}{l@{\hspace{4mm}}c@{\hspace{4mm}}ll}
@@ -182,17 +226,18 @@
   For example the regular expression
   \[(r_1 + \ZERO) \cdot \ONE + ((\ONE + r_2) + r_3) \cdot (r_4 \cdot \ZERO)\]
 
-  simplifies to just $r_1$. \textbf{Hints:} Regular expressions can be
+  simplifies to just $r_1$. \textbf{Hint:} Regular expressions can be
   seen as trees and there are several methods for traversing
-  trees. One of them corresponds to the inside-out traversal. Also
-  remember numerical expressions from school: there you had expressions
+  trees. One of them corresponds to the inside-out traversal, which is
+  sometimes also called post-order traversal. Furthermore,
+  remember numerical expressions from school times: there you had expressions
   like $u + \ldots + (1 \cdot x) - \ldots (z + (y \cdot 0)) \ldots$
   and simplification rules that looked very similar to rules
   above. You would simplify such numerical expressions by replacing
   for example the $y \cdot 0$ by $0$, or $1\cdot x$ by $x$, and then
   look whether more rules are applicable. If you organise the
   simplification in an inside-out fashion, it is always clear which
-  rule should applied next.\\\mbox{}\hfill[1 Mark]
+  rule should be applied next.\hfill[2 Marks]
 
 \item[(1d)] Implement two functions: The first, called \textit{ders},
   takes a list of characters and a regular expression as arguments, and
@@ -214,81 +259,10 @@
 according to \textit{ders} and after that tests whether the resulting
 derivative regular expression can match the empty string (using
 \textit{nullable}).  For example the \textit{matcher} will produce
-true given the regular expression $(a\cdot b)\cdot c$ and the string
-$abc$.\\  \mbox{}\hfill[1 Mark]
-
-\item[(1e)] Implement the function $\textit{replace}\;r\;s_1\;s_2$: it searches
-  (from the left to 
-right) in the string $s_1$ all the non-empty substrings that match the 
-regular expression $r$---these substrings are assumed to be
-the longest substrings matched by the regular expression and
-assumed to be non-overlapping. All these substrings in $s_1$ matched by $r$
-are replaced by $s_2$. For example given the regular expression
-
-\[(a \cdot a)^* + (b \cdot b)\]
-
-\noindent the string $s_1 = aabbbaaaaaaabaaaaabbaaaabb$ and
-replacement the string $s_2 = c$ yields the string
-
-\[
-ccbcabcaccc
-\]
-
-\hfill[2 Marks]
-\end{itemize}
-
-
-
-
-\subsection*{Part 2 (4 Marks)}
-
-You need to copy all the code from \texttt{re.scala} into
-\texttt{re2.scala} in order to complete Part 2.  Parts (2a) and (2b)
-give you another method and datapoints for testing the \textit{der}
-and \textit{simp} functions from Part~1.
-
-\subsubsection*{Tasks (file re2.scala)}
+true for the regular expression $(a\cdot b)\cdot c$ and the string
+$abc$, but false if you give it the string $ab$. \hfill[1 Mark]
 
-\begin{itemize}
-\item[(2a)] Write a \textbf{polymorphic} function, called
-  \textit{iterT}, that is \textbf{tail-recursive}(!) and takes an
-  integer $n$, a function $f$ and an $x$ as arguments. This function
-  should iterate $f$ $n$-times starting with the argument $x$, like
-
-  \[\underbrace{f(\ldots (f}_{n\text{-times}}(x)))
-  \]
-
-  More formally that means \textit{iterT} behaves as follows:
-
-  \begin{center}
-    \begin{tabular}{lcl}
-      $\textit{iterT}(n, f, x)$ & $\dn$ &
-      $\begin{cases}
-        \;x & \textit{if}\;n = 0\\
-        \;f(\textit{iterT}(n - 1, f, x)) & \textit{otherwise}
-        \end{cases}$      
-    \end{tabular}
-\end{center}
-
-  Make sure you write a \textbf{tail-recursive} version of
-  \textit{iterT}.  If you add the annotation \texttt{@tailrec} (see
-  below) your code should not produce an error message.
-
-  \begin{lstlisting}[language=Scala, numbers=none, xleftmargin=-1mm]  
-  import scala.annotation.tailrec
-
-  @tailrec
-  def iterT[A](n: Int, f: A => A, x: A): A = ...
-  \end{lstlisting}
-
-  You can assume that \textit{iterT} will only be called for positive
-  integers $0 \le n$. Given the type variable \texttt{A}, the type of
-  $f$ is \texttt{A => A} and the type of $x$ is \texttt{A}. This means
-  \textit{iterT} can be used, for example, for functions from integers
-  to integers, or strings to strings, or regular expressions to
-  regular expressions.  \\ \mbox{}\hfill[2 Marks]
-
-\item[(2b)] Implement a function, called \textit{size}, by recursion
+\item[(1e)] Implement a function, called \textit{size}, by recursion
   over regular expressions. If a regular expression is seen as a tree,
   then \textit{size} should return the number of nodes in such a
   tree. Therefore this function is defined as follows:
@@ -304,49 +278,12 @@
 \end{tabular}
 \end{center}
 
-You can use \textit{size} and \textit{iterT} in order to test how much
-the 'evil' regular expression $(a^*)^* \cdot b$ grows when taking
-successive derivatives according the letter $a$ and then compare it to
-taking the derivative, but simlifying the derivative after each step.
-For example, the calls
-
-  \begin{lstlisting}[language=Scala, numbers=none, xleftmargin=-1mm]  
-  size(iterT(20, (r: Rexp) => der('a', r), EVIL))
-  size(iterT(20, (r: Rexp) => simp(der('a', r)), EVIL))
-  \end{lstlisting}
-
-  produce without simplification a regular expression of size of
-  7340068 after 20 iterations, while the one with
-  simplification gives
-  just 8.\\ \mbox{}\hfill[1 Mark]
-  
-
-\item[(2c)] Write a \textbf{polymorphic} function, called
-  \textit{fixpT}, that takes
-  a function $f$ and an $x$ as arguments. The purpose
-  of \textit{fixpT} is to calculate a fixpoint of the function $f$
-  starting from the argument $x$.
-  A fixpoint, say $y$, is when $f(y) = y$ holds. 
-  That means \textit{fixpT} behaves as follows:
-
-  \begin{center}
-    \begin{tabular}{lcl}
-      $\textit{fixpT}(f, x)$ & $\dn$ &
-      $\begin{cases}
-        \;x & \textit{if}\;f(x) = x\\
-        \;\textit{fixpT}(f, f(x)) & \textit{otherwise}
-        \end{cases}$      
-    \end{tabular}
-\end{center}
-
-  Make sure you calculate in the code of $\textit{fixpT}$ the result
-  of $f(x)$ only once. Given the type variable \texttt{A} in
-  $\textit{fixpT}$, the type of $f$ is \texttt{A => A} and the type of
-  $x$ is \texttt{A}. The file \texttt{re2.scala} gives two example
-  function where in one the fixpoint is 1 and in the other
-  it is the string $a$.\\ \mbox{}\hfill[1 Mark]  
-\end{itemize}\bigskip  
-
+You can use \textit{size} in order to test how much the `evil' regular
+expression $(a^*)^* \cdot b$ grows when taking successive derivatives
+according the letter $a$ without simplification and then compare it to
+taking the derivative, but simplify the result.  The sizes
+are given in \texttt{re.scala}. \hfill[1 Mark]
+\end{itemize}
 
 \subsection*{Background}
 
@@ -354,36 +291,101 @@
 function might not so easy to be seen. To understand its purpose
 better, assume a regular expression $r$ can match strings of the form
 $c\!::\!cs$ (that means strings which start with a character $c$ and have
-some rest, or tail, $cs$). If you now take the derivative of $r$ with
-respect to the character $c$, then you obtain a regular expressions
+some rest, or tail, $cs$). If you take the derivative of $r$ with
+respect to the character $c$, then you obtain a regular expression
 that can match all the strings $cs$.  In other words, the regular
-expression $\textit{der}\;c\;r$ can match the same strings $c::cs$
+expression $\textit{der}\;c\;r$ can match the same strings $c\!::\!cs$
 that can be matched by $r$, except that the $c$ is chopped off.
 
 Assume now $r$ can match the string $abc$. If you take the derivative
 according to $a$ then you obtain a regular expression that can match
 $bc$ (it is $abc$ where the $a$ has been chopped off). If you now
-build the derivative $\textit{der}\;b\;(\textit{der}\;a\;r))$ you
+build the derivative $\textit{der}\;b\;(\textit{der}\;a\;r)$ you
 obtain a regular expression that can match the string $c$ (it is $bc$
 where $b$ is chopped off). If you finally build the derivative of this
 according $c$, that is
-$\textit{der}\;c\;(\textit{der}\;b\;(\textit{der}\;a\;r)))$, you
-obtain a regular expression that can match the empty string. You can
-test this using the function nullable, which is what your matcher is
-doing.
+$\textit{der}\;c\;(\textit{der}\;b\;(\textit{der}\;a\;r))$, you obtain
+a regular expression that can match the empty string. You can test
+whether this is indeed the case using the function nullable, which is
+what your matcher is doing.
 
-The purpose of the simp function is to keep the regular expression
-small. Normally the derivative function makes the regular expression
-bigger (see the SEQ case and the example in (2b)) and the algorithm
-would be slower and slower over time. The simp function counters this
-increase in size and the result is that the algorithm is fast
-throughout.  By the way, this algorithm is by Janusz Brzozowski who
-came up with the idea of derivatives in 1964 in his PhD thesis.
+The purpose of the $\textit{simp}$ function is to keep the regular
+expression small. Normally the derivative function makes the regular
+expression bigger (see the SEQ case and the example in (1b)) and the
+algorithm would be slower and slower over time. The $\textit{simp}$
+function counters this increase in size and the result is that the
+algorithm is fast throughout.  By the way, this algorithm is by Janusz
+Brzozowski who came up with the idea of derivatives in 1964 in his PhD
+thesis.
 
 \begin{center}\small
 \url{https://en.wikipedia.org/wiki/Janusz_Brzozowski_(computer_scientist)}
 \end{center}
 
+
+If you want to see how badly the regular expression matchers do in
+Java and Python with the `evil' regular expression, then have a look
+at the graphs below (you can try it out for yourself: have a look at
+the file \texttt{catastrophic.java} on KEATS). Compare this with the
+matcher you have implemented. How long can the string of $a$'s be
+in your matcher and stay within the 30 seconds time limit?
+
+\begin{center}
+\begin{tikzpicture}
+\begin{axis}[
+    title={Graph: $(a^*)^*\cdot b$ and strings 
+           $\underbrace{a\ldots a}_{n}$},
+    xlabel={$n$},
+    x label style={at={(1.05,0.0)}},
+    ylabel={time in secs},
+    enlargelimits=false,
+    xtick={0,5,...,30},
+    xmax=33,
+    ymax=35,
+    ytick={0,5,...,30},
+    scaled ticks=false,
+    axis lines=left,
+    width=6cm,
+    height=5.0cm, 
+    legend entries={Python, Java},  
+    legend pos=outer north east]
+\addplot[blue,mark=*, mark options={fill=white}] table {re-python2.data};
+\addplot[cyan,mark=*, mark options={fill=white}] table {re-java.data};
+\end{axis}
+\end{tikzpicture}
+\end{center}
+\newpage
+
+\subsection*{Part 2 (4 Marks)}
+
+Comming from Java or C++, you might think Scala is a quite
+esotheric programming language.  But remember, some serious companies
+have built their business on Scala. And there are far more esotheric
+languages out there. One is called \emph{brainf***}. Urban M\"uller
+developed this language in 1993.  A close relative was already
+introduced in ... by Corado B\"ohm, an Italian computer pionier, who
+unfortunately died a few months ago. One feature of brainf*** is its
+minimalistic set of instructions. It has just 8 instructions, all of
+which are single characters. Despite this minimalism, this language,
+given enough memory, has been shown to be Turing complete. In this
+part you will implement an interpreter for this language.
+
+
+
+\subsubsection*{Tasks (file bf.scala)}
+
+\begin{itemize}
+\item[(2a)] 
+
+\item[(2b)]   
+
+\item[(2c)]
+
+\end{itemize}\bigskip  
+
+
+
+
 \end{document}