pep-material: comparison cws/main


-:c5ad0e3f2a6d
+:591b9005157e
 % !TEX program = xelatex
 \documentclass{article}
-\usepackage{chessboard}
-\usepackage[LSBC4,T1]{fontenc}
-\let\clipbox\relax
 \usepackage{../style}
 \usepackage{../langs}
 \usepackage{disclaimer}
+\usepackage{tikz}
+\usepackage{pgf}
+\usepackage{pgfplots}
+\usepackage{stackengine}
+%% \usepackage{accents}
+\newcommand\barbelow[1]{\stackunder[1.2pt]{#1}{\raisebox{-4mm}{\boldmath$\uparrow$}}}
+\begin{filecontents}{re-python2.data}
+1 0.033
+5 0.036
+10 0.034
+15 0.036
+18 0.059
+19 0.084
+20 0.141
+21 0.248
+22 0.485
+23 0.878
+24 1.71
+25 3.40
+26 7.08
+27 14.12
+28 26.69
+\end{filecontents}
+\begin{filecontents}{re-java.data}
+5  0.00298
+10  0.00418
+15  0.00996
+16  0.01710
+17  0.03492
+18  0.03303
+19  0.05084
+20  0.10177
+21  0.19960
+22  0.41159
+23  0.82234
+24  1.70251
+25  3.36112
+26  6.63998
+27  13.35120
+28  29.81185
+\end{filecontents}
+\begin{filecontents}{re-js.data}
+5   0.061
+10  0.061
+15  0.061
+20  0.070
+23  0.131
+25  0.308
+26  0.564
+28  1.994
+30  7.648
+31  15.881
+32  32.190
+\end{filecontents}
+\begin{filecontents}{re-java9.data}
+1000  0.01410
+2000  0.04882
+3000  0.10609
+4000  0.17456
+5000  0.27530
+6000  0.41116
+7000  0.53741
+8000  0.70261
+9000  0.93981
+10000 0.97419
+11000 1.28697
+12000 1.51387
+14000 2.07079
+16000 2.69846
+20000 4.41823
+24000 6.46077
+26000 7.64373
+30000 9.99446
+34000 12.966885
+38000 16.281621
+42000 19.180228
+46000 21.984721
+50000 26.950203
+60000 43.0327746
+\end{filecontents}
+\begin{filecontents}{re-swift.data}
+5   0.001
+10  0.001
+15  0.009
+20  0.178
+23  1.399
+24  2.893
+25  5.671
+26  11.357
+27  22.430
+\end{filecontents}
+\begin{filecontents}{re-dart.data}
+20 0.042
+21 0.084
+22 0.190
+23 0.340
+24 0.678
+25 1.369
+26 2.700
+27 5.462
+28 10.908
+29 21.725
+30 43.492
+\end{filecontents}
 \begin{document}
-\setchessboard{smallboard,
+% BF IDE
-zero,
+% https://www.microsoft.com/en-us/p/brainf-ck/9nblgggzhvq5
-showmover=false,
-boardfontencoding=LSBC4,
+\section*{Part 8 (Scala, 7 Marks)}
-hlabelformat=\arabic{ranklabel},
-vlabelformat=\arabic{filelabel}}
+%\mbox{}\hfill\textit{``[Google’s MapReduce] abstraction is inspired by the}\\
+%\mbox{}\hfill\textit{map and reduce primitives present in Lisp and many}\\
-\mbox{}\\[-18mm]\mbox{}
+%\mbox{}\hfill\textit{other functional language.''}\smallskip\\
+%\mbox{}\hfill\textit{ --- Dean and Ghemawat, who designed this concept at Google}
-\section*{Part 8 (Scala)}
+%\bigskip\medskip
-\mbox{}\hfill\textit{``The problem with object-oriented languages is they’ve got all this implicit,}\\
-\mbox{}\hfill\textit{environment that they carry around with them. You wanted a banana but}\\
-\mbox{}\hfill\textit{what you got was a gorilla holding the banana and the entire jungle.''}\smallskip\\
-\mbox{}\hfill\textit{ --- Joe Armstrong (creator of the Erlang programming language)}\medskip\bigskip
 \noindent
-This part is about searching and backtracking. You are asked to
+This part is about a regular expression matcher described by
-implement Scala programs that solve various versions of the
+Brzozowski in 1964. It is due on \cwEIGHTa{} at 5pm.  The
-\textit{Knight's Tour Problem} on a chessboard. The preliminary part (4\%) is
+background is that ``out-of-the-box'' regular expression matching in
-due on  \cwEIGHT{} at 4pm; the core part is due on \cwEIGHTa{} at 4pm.
+mainstream languages like Java, JavaScript and Python can sometimes be
-Note the core, more advanced, part might include material you have not
+excruciatingly slow.  You are supposed to implement a regular
-yet seen in the first three lectures. \bigskip
+expression matcher that is much, much faster. \bigskip
-\IMPORTANT{}
+\IMPORTANTNONE{}
+\noindent
 Also note that the running time of each part will be restricted to a
-maximum of 30 seconds on my laptop: If you calculate a result once,
+maximum of 30 seconds on my laptop.
-try to avoid to calculate the result again. Feel free to copy any code
-you need from files \texttt{knight1.scala}, \texttt{knight2.scala} and
-\texttt{knight3.scala}.
 \DISCLAIMER{}
+\subsection*{Reference Implementation}
+This Scala assignment comes with a reference implementation in the form of
+a \texttt{jar}-file. This allows you to run any test cases on your own
+computer. For example you can call Scala on the command line with the
+option \texttt{-cp re.jar} and then query any function from the
+\texttt{re.scala} template file. As usual you have to prefix the calls
+with \texttt{CW8c} or import this object.  Since some tasks
+are time sensitive, you can check the reference implementation as
+follows: if you want to know, for example, how long it takes to match
+strings of $a$'s using the regular expression $(a^*)^*\cdot b$ you can
+query as follows:
+\begin{lstlisting}[xleftmargin=1mm,numbers=none,basicstyle=\ttfamily\small]
+$ scala -cp re.jar
+scala> import CW8c._
+scala> for (i <- 0 to 5000000 by 500000) {
+| println(f"$i: ${time_needed(2, matcher(EVIL, "a" * i))}%.5f secs.")
+| }
+0: 0.00002 secs.
+500000: 0.10608 secs.
+1000000: 0.22286 secs.
+1500000: 0.35982 secs.
+2000000: 0.45828 secs.
+2500000: 0.59558 secs.
+3000000: 0.73191 secs.
+3500000: 0.83499 secs.
+4000000: 0.99149 secs.
+4500000: 1.15395 secs.
+5000000: 1.29659 secs.
+\end{lstlisting}%$
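The helper \texttt{time\_needed} used above belongs to the reference implementation, whose source is not shown here. As a rough idea of what such a helper might look like, the following sketch assumes that the first argument is the number of repetitions and that the result is the average time per run in seconds. Both points are assumptions, not the actual definition.

```scala
// Hypothetical timing helper (an assumption, not the code in re.jar):
// evaluates the by-name argument `code` n times and returns the
// average wall-clock time per evaluation in seconds.
def time_needed[T](n: Int, code: => T): Double = {
  val start = System.nanoTime()
  for (_ <- 0 until n) code
  val end = System.nanoTime()
  (end - start) / (n * 1.0e9)
}
```

Because \texttt{code} is passed by name, the expression is re-evaluated on every repetition rather than once before the call.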
+\subsection*{Preliminaries}
+The task is to implement a regular expression matcher that is based on
+derivatives of regular expressions. Most of the functions are defined by
+recursion over regular expressions and can be elegantly implemented
+using Scala's pattern-matching. The implementation should deal with the
+following regular expressions, which have been predefined in the file
+\texttt{re.scala}:
+\begin{center}
+\begin{tabular}{lcll}
+$r$ & $::=$ & $\ZERO$     & cannot match anything\\
+&   $|$ & $\ONE$      & can only match the empty string\\
+&   $|$ & $c$         & can match a single character (in this case $c$)\\
+&   $|$ & $r_1 + r_2$ & can match a string either with $r_1$ or with $r_2$\\
+&   $|$ & $r_1\cdot r_2$ & can match the first part of a string with $r_1$ and\\
+&  & & then the second part with $r_2$\\
+&   $|$ & $r^*$       & can match a string with zero or more copies of $r$\\
+\end{tabular}
+\end{center}
+\noindent
+Why? Regular expressions are
+one of the simplest ways to match patterns in text, and
+are endlessly useful for searching, editing and analysing data in all
+sorts of places (for example analysing network traffic in order to
+detect security breaches). However, you need to be fast, otherwise you
+will stumble over problems such as those recently reported at
+{\small
+\begin{itemize}
+\item[$\bullet$] \url{https://blog.cloudflare.com/details-of-the-cloudflare-outage-on-july-2-2019}
+\item[$\bullet$] \url{https://stackstatus.net/post/147710624694/outage-postmortem-july-20-2016}
+\item[$\bullet$] \url{https://vimeo.com/112065252}
+\item[$\bullet$] \url{https://davidvgalbraith.com/how-i-fixed-atom}
+\end{itemize}}
+% Knowing how to match regular expressions and strings will let you
+% solve a lot of problems that vex other humans.
+\subsubsection*{Tasks (file re.scala)}
+The file \texttt{re.scala} already has a definition for regular
+expressions and also defines some handy shorthand notation for
+regular expressions. The notation in this document matches up
+with the code in the file as follows:
+\begin{center}
+\begin{tabular}{rcl@{\hspace{10mm}}l}
+& & code: & shorthand:\smallskip \\
+$\ZERO$ & $\mapsto$ & \texttt{ZERO}\\
+$\ONE$  & $\mapsto$ & \texttt{ONE}\\
+$c$     & $\mapsto$ & \texttt{CHAR(c)}\\
+$r_1 + r_2$ & $\mapsto$ & \texttt{ALT(r1, r2)} & \texttt{r1 | r2}\\
+$r_1 \cdot r_2$ & $\mapsto$ & \texttt{SEQ(r1, r2)} & \texttt{r1 $\sim$ r2}\\
+$r^*$ & $\mapsto$ &  \texttt{STAR(r)} & \texttt{r.\%}
+\end{tabular}
+\end{center}
+\begin{itemize}
+\item[(1)] Implement a function, called \textit{nullable}, by
+recursion over regular expressions. This function tests whether a
+regular expression can match the empty string. This means given a
+regular expression it either returns true or false. The function
+\textit{nullable}
+is defined as follows:
+\begin{center}
+\begin{tabular}{lcl}
+$\textit{nullable}(\ZERO)$ & $\dn$ & $\textit{false}$\\
+$\textit{nullable}(\ONE)$  & $\dn$ & $\textit{true}$\\
+$\textit{nullable}(c)$     & $\dn$ & $\textit{false}$\\
+$\textit{nullable}(r_1 + r_2)$ & $\dn$ & $\textit{nullable}(r_1) \vee \textit{nullable}(r_2)$\\
+$\textit{nullable}(r_1 \cdot r_2)$ & $\dn$ & $\textit{nullable}(r_1) \wedge \textit{nullable}(r_2)$\\
+$\textit{nullable}(r^*)$ & $\dn$ & $\textit{true}$\\
+\end{tabular}
+\end{center}~\hfill[1 Mark]
+\item[(2)] Implement a function, called \textit{der}, by recursion over
+regular expressions. It takes a character and a regular expression
+as arguments and calculates the derivative of a regular expression according
+to the rules:
+\begin{center}
+\begin{tabular}{lcl}
+$\textit{der}\;c\;(\ZERO)$ & $\dn$ & $\ZERO$\\
+$\textit{der}\;c\;(\ONE)$  & $\dn$ & $\ZERO$\\
+$\textit{der}\;c\;(d)$     & $\dn$ & $\textit{if}\; c = d\;\textit{then} \;\ONE \; \textit{else} \;\ZERO$\\
+$\textit{der}\;c\;(r_1 + r_2)$ & $\dn$ & $(\textit{der}\;c\;r_1) + (\textit{der}\;c\;r_2)$\\
+$\textit{der}\;c\;(r_1 \cdot r_2)$ & $\dn$ & $\textit{if}\;\textit{nullable}(r_1)$\\
+& & $\textit{then}\;((\textit{der}\;c\;r_1)\cdot r_2) + (\textit{der}\;c\;r_2)$\\
+& & $\textit{else}\;(\textit{der}\;c\;r_1)\cdot r_2$\\
+$\textit{der}\;c\;(r^*)$ & $\dn$ & $(\textit{der}\;c\;r)\cdot (r^*)$\\
+\end{tabular}
+\end{center}
+For example given the regular expression $r = (a \cdot b) \cdot c$, the derivatives
+w.r.t.~the characters $a$, $b$ and $c$ are
+\begin{center}
+\begin{tabular}{lcll}
+$\textit{der}\;a\;r$ & $=$ & $(\ONE \cdot b)\cdot c$ & \quad($= r'$)\\
+$\textit{der}\;b\;r$ & $=$ & $(\ZERO \cdot b)\cdot c$\\
+$\textit{der}\;c\;r$ & $=$ & $(\ZERO \cdot b)\cdot c$
+\end{tabular}
+\end{center}
+Let $r'$ stand for the first derivative, then taking the derivatives of $r'$
+w.r.t.~the characters $a$, $b$ and $c$ gives
+\begin{center}
+\begin{tabular}{lcll}
+$\textit{der}\;a\;r'$ & $=$ & $((\ZERO \cdot b) + \ZERO)\cdot c$ \\
+$\textit{der}\;b\;r'$ & $=$ & $((\ZERO \cdot b) + \ONE)\cdot c$ & \quad($= r''$)\\
+$\textit{der}\;c\;r'$ & $=$ & $((\ZERO \cdot b) + \ZERO)\cdot c$
+\end{tabular}
+\end{center}
+One more example: Let $r''$ stand for the second derivative above,
+then taking the derivatives of $r''$ w.r.t.~the characters $a$, $b$
+and $c$ gives
+\begin{center}
+\begin{tabular}{lcll}
+$\textit{der}\;a\;r''$ & $=$ & $((\ZERO \cdot b) + \ZERO) \cdot c + \ZERO$ \\
+$\textit{der}\;b\;r''$ & $=$ & $((\ZERO \cdot b) + \ZERO) \cdot c + \ZERO$\\
+$\textit{der}\;c\;r''$ & $=$ & $((\ZERO \cdot b) + \ZERO) \cdot c + \ONE$ &
+(is $\textit{nullable}$)
+\end{tabular}
+\end{center}
+Note that the last derivative can match the empty string, that is, it is \textit{nullable}.\\
+\mbox{}\hfill\mbox{[1 Mark]}
+\item[(3)] Implement the function \textit{simp}, which recursively
+traverses a regular expression, and on the way up simplifies every
+regular expression on the left (see below) to the regular expression
+on the right, except it does not simplify inside ${}^*$-regular
+expressions.
+\begin{center}
+\begin{tabular}{l@{\hspace{4mm}}c@{\hspace{4mm}}ll}
+$r \cdot \ZERO$ & $\mapsto$ & $\ZERO$\\
+$\ZERO \cdot r$ & $\mapsto$ & $\ZERO$\\
+$r \cdot \ONE$ & $\mapsto$ & $r$\\
+$\ONE \cdot r$ & $\mapsto$ & $r$\\
+$r + \ZERO$ & $\mapsto$ & $r$\\
+$\ZERO + r$ & $\mapsto$ & $r$\\
+$r + r$ & $\mapsto$ & $r$\\
+\end{tabular}
+\end{center}
+For example the regular expression
+\[(r_1 + \ZERO) \cdot \ONE + ((\ONE + r_2) + r_3) \cdot (r_4 \cdot \ZERO)\]
+simplifies to just $r_1$. \textbf{Hint:} Regular expressions can be
+seen as trees and there are several methods for traversing
+trees. One of them corresponds to the inside-out traversal, which is also
+sometimes called post-order tra\-versal: you traverse inside the
+tree and on the way up you apply simplification rules.
+\textbf{Another Hint:}
+Remember numerical expressions from school times---there you had expressions
+like $u + \ldots + (1 \cdot x) - \ldots (z + (y \cdot 0)) \ldots$
+and simplification rules that looked very similar to the rules
+above. You would simplify such numerical expressions by replacing
+for example the $y \cdot 0$ by $0$, or $1\cdot x$ by $x$, and then
+look whether more rules are applicable. If you organise the
+simplification in an inside-out fashion, it is always clear which
+simplification should be applied next.\hfill[1 Mark]
+\item[(4)] Implement two functions: The first, called \textit{ders},
+takes a list of characters and a regular expression as arguments, and
+builds the derivative w.r.t.~the list as follows:
+\begin{center}
+\begin{tabular}{lcl}
+$\textit{ders}\;(Nil)\;r$ & $\dn$ & $r$\\
+$\textit{ders}\;(c::cs)\;r$  & $\dn$ &
+$\textit{ders}\;cs\;(\textit{simp}(\textit{der}\;c\;r))$\\
+\end{tabular}
+\end{center}
+Note that this function is different from \textit{der}, which only
+takes a single character.
+The second function, called \textit{matcher}, takes a string and a
+regular expression as arguments. It first builds the derivatives
+according to \textit{ders} and after that tests whether the resulting
+derivative regular expression can match the empty string (using
+\textit{nullable}).  For example the \textit{matcher} will produce
+true for the regular expression $(a\cdot b)\cdot c$ and the string
+$abc$, but false if you give it the string $ab$. \hfill[1 Mark]
+\item[(5)] Implement a function, called \textit{size}, by recursion
+over regular expressions. If a regular expression is seen as a tree,
+then \textit{size} should return the number of nodes in such a
+tree. Therefore this function is defined as follows:
+\begin{center}
+\begin{tabular}{lcl}
+$\textit{size}(\ZERO)$ & $\dn$ & $1$\\
+$\textit{size}(\ONE)$  & $\dn$ & $1$\\
+$\textit{size}(c)$     & $\dn$ & $1$\\
+$\textit{size}(r_1 + r_2)$ & $\dn$ & $1 + \textit{size}(r_1) + \textit{size}(r_2)$\\
+$\textit{size}(r_1 \cdot r_2)$ & $\dn$ & $1 + \textit{size}(r_1) + \textit{size}(r_2)$\\
+$\textit{size}(r^*)$ & $\dn$ & $1 + \textit{size}(r)$\\
+\end{tabular}
+\end{center}
+You can use \textit{size} in order to test how much the ``evil'' regular
+expression $(a^*)^* \cdot b$ grows when taking successive derivatives
+according to the letter $a$ without simplification, and then compare it to
+taking the derivatives and simplifying each result.  The sizes
+are given in \texttt{re.scala}. \hfill[1 Mark]
+\item[(6)] You do not have to implement anything specific under this
+task.  The purpose here is that you will be marked for some ``power''
+test cases. For example, can your matcher decide within 30 seconds
+whether the regular expression $(a^*)^*\cdot b$ matches strings of the
+form $aaa\ldots{}aaaa$, for, say, one million $a$'s? And does simplification
+simplify the regular expression
+\[
+\texttt{SEQ(SEQ(SEQ(..., ONE | ONE) , ONE | ONE), ONE | ONE)}
+\]
+\noindent correctly to just \texttt{ONE}, where \texttt{SEQ} is nested
+50 or more times?\\
+\mbox{}\hfill[2 Marks]
+\end{itemize}
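The definitions in tasks (1) to (5) translate almost line by line into Scala pattern matching. The following is a minimal sketch, not the reference implementation: the datatype \texttt{Rexp} is a hypothetical stand-in for the one predefined in \texttt{re.scala}, whose actual declarations may differ, and the argument order of \texttt{matcher} is taken from the call shown in the section on the reference implementation.

```scala
import scala.annotation.tailrec

// Hypothetical stand-ins for the definitions in re.scala.
sealed trait Rexp
case object ZERO extends Rexp
case object ONE extends Rexp
case class CHAR(c: Char) extends Rexp
case class ALT(r1: Rexp, r2: Rexp) extends Rexp
case class SEQ(r1: Rexp, r2: Rexp) extends Rexp
case class STAR(r: Rexp) extends Rexp

// (1) Can r match the empty string?
def nullable(r: Rexp): Boolean = r match {
  case ZERO => false
  case ONE => true
  case CHAR(_) => false
  case ALT(r1, r2) => nullable(r1) || nullable(r2)
  case SEQ(r1, r2) => nullable(r1) && nullable(r2)
  case STAR(_) => true
}

// (2) Derivative of r with respect to the character c.
def der(c: Char, r: Rexp): Rexp = r match {
  case ZERO => ZERO
  case ONE => ZERO
  case CHAR(d) => if (c == d) ONE else ZERO
  case ALT(r1, r2) => ALT(der(c, r1), der(c, r2))
  case SEQ(r1, r2) =>
    if (nullable(r1)) ALT(SEQ(der(c, r1), r2), der(c, r2))
    else SEQ(der(c, r1), r2)
  case STAR(r1) => SEQ(der(c, r1), STAR(r1))
}

// (3) Simplify inside-out; anything under a STAR is left untouched.
def simp(r: Rexp): Rexp = r match {
  case ALT(r1, r2) => (simp(r1), simp(r2)) match {
    case (ZERO, s2) => s2
    case (s1, ZERO) => s1
    case (s1, s2) => if (s1 == s2) s1 else ALT(s1, s2)
  }
  case SEQ(r1, r2) => (simp(r1), simp(r2)) match {
    case (ZERO, _) => ZERO
    case (_, ZERO) => ZERO
    case (ONE, s2) => s2
    case (s1, ONE) => s1
    case (s1, s2) => SEQ(s1, s2)
  }
  case _ => r
}

// (4) Derivative with respect to a list of characters, simplifying after each step.
@tailrec
def ders(cs: List[Char], r: Rexp): Rexp = cs match {
  case Nil => r
  case c :: rest => ders(rest, simp(der(c, r)))
}

// r matches s if the derivative with respect to all of s is nullable.
def matcher(r: Rexp, s: String): Boolean = nullable(ders(s.toList, r))

// (5) Number of nodes in the tree of r.
def size(r: Rexp): Int = r match {
  case ZERO | ONE | CHAR(_) => 1
  case ALT(r1, r2) => 1 + size(r1) + size(r2)
  case SEQ(r1, r2) => 1 + size(r1) + size(r2)
  case STAR(r1) => 1 + size(r1)
}

// The "evil" regular expression (a*)* . b from the text.
val EVIL: Rexp = SEQ(STAR(STAR(CHAR('a'))), CHAR('b'))
```

For instance, \texttt{size(der('a', EVIL))} can be compared with \texttt{size(simp(der('a', EVIL)))} to see the effect of a single simplification step. Making \texttt{ders} tail-recursive is a design choice of this sketch so that long input strings do not exhaust the stack.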
 \subsection*{Background}
-The \textit{Knight's Tour Problem} is about finding a tour such that
+Although easily implementable in Scala, the idea behind the derivative
-the knight visits every field on an $n\times n$ chessboard once. For
+function might not be so easy to see. To understand its purpose
-example on a $5\times 5$ chessboard, a knight's tour is:
+better, assume a regular expression $r$ can match strings of the form
+$c\!::\!cs$ (that means strings which start with a character $c$ and have
-\chessboard[maxfield=d4,
+some rest, or tail, $cs$). If you take the derivative of $r$ with
-pgfstyle= {[base,at={\pgfpoint{0pt}{-0.5ex}}]text},
+respect to the character $c$, then you obtain a regular expression
-text = \small 24, markfield=Z4,
+that can match all the strings $cs$.  In other words, the regular
-text = \small 11, markfield=a4,
+expression $\textit{der}\;c\;r$ can match the same strings $c\!::\!cs$
-text = \small  6, markfield=b4,
+that can be matched by $r$, except that the $c$ is chopped off.
-text = \small 17, markfield=c4,
-text = \small  0, markfield=d4,
+Assume now $r$ can match the string $abc$. If you take the derivative
-text = \small 19, markfield=Z3,
+according to $a$ then you obtain a regular expression that can match
-text = \small 16, markfield=a3,
+$bc$ (it is $abc$ where the $a$ has been chopped off). If you now
-text = \small 23, markfield=b3,
+build the derivative $\textit{der}\;b\;(\textit{der}\;a\;r)$ you
-text = \small 12, markfield=c3,
+obtain a regular expression that can match the string $c$ (it is $bc$
-text = \small  7, markfield=d3,
+where $b$ is chopped off). If you finally build the derivative of this
-text = \small 10, markfield=Z2,
+according to $c$, that is
-text = \small  5, markfield=a2,
+$\textit{der}\;c\;(\textit{der}\;b\;(\textit{der}\;a\;r))$, you obtain
-text = \small 18, markfield=b2,
+a regular expression that can match the empty string. You can test
-text = \small  1, markfield=c2,
+whether this is indeed the case using the function \textit{nullable}, which is
-text = \small 22, markfield=d2,
+what your matcher is doing.
-text = \small 15, markfield=Z1,
-text = \small 20, markfield=a1,
+The purpose of the $\textit{simp}$ function is to keep the regular
-text = \small  3, markfield=b1,
+expressions small. Normally the derivative function makes the regular
-text = \small  8, markfield=c1,
+expression bigger (see the SEQ case and the example in (2)) and the
-text = \small 13, markfield=d1,
+algorithm would be slower and slower over time. The $\textit{simp}$
-text = \small  4, markfield=Z0,
+function counters this increase in size and the result is that the
-text = \small  9, markfield=a0,
+algorithm is fast throughout.  By the way, this algorithm is by Janusz
-text = \small 14, markfield=b0,
+Brzozowski who came up with the idea of derivatives in 1964 in his PhD
-text = \small 21, markfield=c0,
+thesis.
-text = \small  2, markfield=d0
-]
+\begin{center}\small
+\url{https://en.wikipedia.org/wiki/Janusz_Brzozowski_(computer_scientist)}
-\noindent
+\end{center}
-This tour starts in the right-upper corner, then moves to field
-$(3,2)$, then $(4,0)$ and so on. There are no knight's tours on
-$2\times 2$, $3\times 3$ and $4\times 4$ chessboards, but for every
+If you want to see how badly the regular expression matchers do in
-bigger board there is.
+Java\footnote{Version 8 and below; versions 9 and above do not seem to be as
+catastrophic, but still much worse than the regular expression
-A knight's tour is called \emph{closed}, if the last step in the tour
+matcher based on derivatives.}, JavaScript and Python with the
-is within a knight's move to the beginning of the tour. So the above
+``evil'' regular expression $(a^*)^*\cdot b$, then have a look at the
-knight's tour is \underline{not} closed because the last
+graphs below (you can try it out for yourself: have a look at the files
-step on field $(0, 4)$ is not within the reach of the first step on
+\texttt{catastrophic9.java}, \texttt{catastrophic.js},
-$(4, 4)$. It turns out there is no closed knight's tour on a $5\times
+\texttt{catastrophic.py} etc.\ on KEATS). Compare this with the matcher you
-5$ board. But there are on a $6\times 6$ board and on bigger ones, for
+have implemented. How long can the string of $a$'s be in your matcher
-example
+and still stay within the 30 seconds time limit?
-\chessboard[maxfield=e5,
+\begin{center}
-pgfstyle={[base,at={\pgfpoint{0pt}{-0.5ex}}]text},
+\begin{tabular}{@{}cc@{}}
-text = \small 10, markfield=Z5,
+\multicolumn{2}{c}{Graph: $(a^*)^*\cdot b$ and strings
-text = \small  5, markfield=a5,
+$\underbrace{a\ldots a}_{n}$}\bigskip\\
-text = \small 18, markfield=b5,
-text = \small 25, markfield=c5,
-text = \small 16, markfield=d5,
-text = \small  7, markfield=e5,
-text = \small 31, markfield=Z4,
-text = \small 26, markfield=a4,
-text = \small  9, markfield=b4,
-text = \small  6, markfield=c4,
-text = \small 19, markfield=d4,
-text = \small 24, markfield=e4,
-% 4  11  30  17   8  15
-text = \small  4, markfield=Z3,
-text = \small 11, markfield=a3,
-text = \small 30, markfield=b3,
-text = \small 17, markfield=c3,
-text = \small  8, markfield=d3,
-text = \small 15, markfield=e3,
-%29  32  27   0  23  20
-text = \small 29, markfield=Z2,
-text = \small 32, markfield=a2,
-text = \small 27, markfield=b2,
-text = \small  0, markfield=c2,
-text = \small 23, markfield=d2,
-text = \small 20, markfield=e2,
-%12   3  34  21  14   1
-text = \small 12, markfield=Z1,
-text = \small  3, markfield=a1,
-text = \small 34, markfield=b1,
-text = \small 21, markfield=c1,
-text = \small 14, markfield=d1,
-text = \small  1, markfield=e1,
-%33  28  13   2  35  22
-text = \small 33, markfield=Z0,
-text = \small 28, markfield=a0,
-text = \small 13, markfield=b0,
-text = \small  2, markfield=c0,
-text = \small 35, markfield=d0,
-text = \small 22, markfield=e0,
-vlabel=false,
-hlabel=false
-]
-\noindent
-where the 35th move can join up again with the 0th move.
-If you cannot remember how a knight moves in chess, or never played
-chess, below are all potential moves indicated for two knights, one on
-field $(2, 2)$ (blue moves) and another on $(7, 7)$ (red moves):
-{\chessboard[maxfield=g7,
-color=blue!50,
-linewidth=0.2em,
-shortenstart=0.5ex,
-shortenend=0.5ex,
-markstyle=cross,
-markfields={a4, c4, Z3, d3, Z1, d1, a0, c0},
-color=red!50,
-markfields={f5, e6},
-setpieces={Ng7, Nb2},
-boardfontsize=12pt,labelfontsize=9pt]}
-\subsection*{Reference Implementation}
-This Scala part comes with three reference implementations in form of
-\texttt{jar}-files. This allows you to run any test cases on your own
-computer. For example you can call Scala on the command line with the
-option \texttt{-cp knight1.jar} and then query any function from the
-\texttt{knight1.scala} template file. As usual you have to
-prefix the calls with \texttt{CW8a}, \texttt{CW8b} and \texttt{CW8c}.
-Since some of the calls are time sensitive, I included some timing
-information. For example
-\begin{lstlisting}[language={},numbers=none,basicstyle=\ttfamily\small]
-$ scala -cp knight1.jar
-scala> CW8a.enum_tours(5, List((0, 0))).length
-Time needed: 1.722 secs.
-res0: Int = 304
-scala> CW8a.print_board(8, CW8a.first_tour(8, List((0, 0))).get)
-Time needed: 15.411 secs.
-51  46  55  44  53   4  21  12
-56  43  52   3  22  13  24   5
-47  50  45  54  25  20  11  14
-42  57   2  49  40  23   6  19
-35  48  41  26  61  10  15  28
-58   1  36  39  32  27  18   7
-37  34  31  60   9  62  29  16
-0  59  38  33  30  17   8  63
-\end{lstlisting}%$
-\subsection*{Hints}
-\noindent
-\textbf{Preliminary Part} useful list functions: \texttt{.contains(..)} checks
-whether an element is in a list, \texttt{.flatten} turns a list of
-lists into just a list, \texttt{\_::\_} puts an element on the head of
-the list, \texttt{.head} gives you the first element of a list (make
-sure the list is not \texttt{Nil}); a useful option function:
-\texttt{.isDefined} returns true, if an option is \texttt{Some(..)};
-anonymous functions can be constructed using \texttt{(x:Int) => ...},
-this function takes an \texttt{Int} as an argument.\medskip
-\noindent
-\textbf{Core Part} a useful list function: \texttt{.sortBy} sorts a list
-according to a component given by the function; a function can be
-tested to be tail-recursive by annotation \texttt{@tailrec}, which is
-made available by importing \texttt{scala.annotation.tailrec}.\medskip
-\subsection*{Preliminary Part (4 Marks)}
-You are asked to implement the knight's tour problem such that the
-dimension of the board can be changed.  Therefore most functions will
-take the dimension of the board as an argument.  The fun with this
-problem is that even for small chessboard dimensions it has already an
-incredibly large search space---finding a tour is like finding a
-needle in a haystack. In the first task we want to see how far we get
-with exhaustively exploring the complete search space for small
-chessboards.\medskip
-\noindent
-Let us first fix the basic datastructures for the implementation.  The
-board dimension is an integer.
-A \emph{position} (or field) on the chessboard is
-a pair of integers, like $(0, 0)$. A \emph{path} is a list of
-positions. The first (or 0th move) in a path is the last element in
-this list; and the last move in the path is the first element. For
-example the path for the $5\times 5$ chessboard above is represented
-by
-\[
-\texttt{List($\underbrace{\texttt{(0, 4)}}_{24}$,
-$\underbrace{\texttt{(2, 3)}}_{23}$, ...,
-$\underbrace{\texttt{(3, 2)}}_1$, $\underbrace{\texttt{(4, 4)}}_0$)}
-\]
-\noindent
-Suppose the dimension of a chessboard is $n$, then a path is a
-\emph{tour} if the length of the path is $n \times n$, each element
-occurs only once in the path, and each move follows the rules of how a
-knight moves (see above for the rules).
-\subsubsection*{Tasks (file knight1.scala)}
-\begin{itemize}
-\item[(1)] Implement an \texttt{is\_legal} function that takes a
-dimension, a path and a position as arguments and tests whether the
-position is inside the board and not yet element in the
-path. \hfill[1 Mark]
-\item[(2)] Implement a \texttt{legal\_moves} function that calculates for a
-position all legal onward moves. If the onward moves are
-placed on a circle, you should produce them starting from
-``12-o'clock'' following in clockwise order.  For example on an
-$8\times 8$ board for a knight at position $(2, 2)$ and otherwise
-empty board, the legal-moves function should produce the onward
-positions in this order:
-\begin{center}
-\texttt{List((3,4), (4,3), (4,1), (3,0), (1,0), (0,1), (0,3), (1,4))}
-\end{center}
-If the board is not empty, then maybe some of the moves need to be
-filtered out from this list.  For a knight on field $(7, 7)$ and an
-empty board, the legal moves are
-\begin{center}
-\texttt{List((6,5), (5,6))}
-\end{center}
-\mbox{}\hfill[1 Mark]
-\item[(3)] Implement two recursive functions (\texttt{count\_tours} and
-\texttt{enum\_tours}). They each take a dimension and a path as
-arguments. They exhaustively search for tours starting
-from the given path. The first function counts all possible
-tours (there can be none for certain board sizes) and the second
-collects all tours in a list of paths. These functions will be
-called with a path containing a single position---the starting field.
-They are expected to extend this path so as to find all tours starting
-from the given position.\\
-\mbox{}\hfill[2 Marks]
-\end{itemize}
-\noindent \textbf{Test data:} For the marking, the functions in (3)
-will be called with board sizes up to $5 \times 5$. If you search
-for tours on a $5 \times 5$ board starting only from field $(0, 0)$,
-there are 304 of tours. If you try out every field of a $5 \times
-5$-board as a starting field and add up all tours, you obtain
-1728. A $6\times 6$ board is already too large to be searched
-exhaustively.\footnote{For your interest, the number of tours on
-$6\times 6$, $7\times 7$ and $8\times 8$ are 6637920, 165575218320,
-19591828170979904, respectively.}\smallskip
-\subsection*{Core Part (6 Marks)}
-\subsubsection*{Tasks (file knight1.scala cont.)}
-\begin{itemize}
-\item[(4)] Implement a \texttt{first}-function. This function takes a list of
-positions and a function $f$ as arguments; $f$ is the name we give to
-this argument). The function $f$ takes a position as argument and
-produces an optional path. So $f$'s type is \texttt{Pos =>
-Option[Path]}. The idea behind the \texttt{first}-function is as follows:
-\[
-\begin{array}{lcl}
-\textit{first}(\texttt{Nil}, f) & \dn & \texttt{None}\\
-\textit{first}(x\!::\!xs, f) & \dn & \begin{cases}
-f(x) & \textit{if}\;f(x) \not=\texttt{None}\\
-\textit{first}(xs, f) & \textit{otherwise}\\
-\end{cases}
-\end{array}
-\]
-\noindent That is, we want to find the first position where the
-result of $f$ is not \texttt{None}, if there is one. Note that
-`inside' \texttt{first}, you do not (need to) know anything about
-the argument $f$ except its type, namely \texttt{Pos =>
-Option[Path]}. If you want to find out what the result of $f$ is
-on a particular argument, say $x$, you can just write $f(x)$.
-There is one additional point however you should
-take into account when implementing \texttt{first}: you will need to
-calculate what the result of $f(x)$ is; your code should do this
-only \textbf{once} and for as \textbf{few} elements in the list as
-possible! Do not calculate $f(x)$ for all elements and then see which
-is the first \texttt{Some}.\\\mbox{}\hfill[1 Mark]
-\item[(5)] Implement a \texttt{first\_tour} function that uses the
+\begin{tikzpicture}
-\texttt{first}-function from (4), and searches recursively for single tour.
+\begin{axis}[
-As there might not be such a tour at all, the \texttt{first\_tour} function
+xlabel={$n$},
-needs to return a value of type
+x label style={at={(1.05,0.0)}},
-\texttt{Option[Path]}.\\\mbox{}\hfill[1 Mark]
+ylabel={time in secs},
-\end{itemize}
+y label style={at={(0.06,0.5)}},
+enlargelimits=false,
-\noindent
+xtick={0,5,...,30},
-\textbf{Testing:} The \texttt{first\_tour} function will be called with board
+xmax=33,
-sizes of up to $8 \times 8$.
+ymax=45,
-\bigskip
+ytick={0,5,...,40},
+scaled ticks=false,
-%%\newpage
+axis lines=left,
+width=6cm,
-\noindent
+height=5.5cm,
-As you should have seen in the earlier parts, a naive search for tours beyond
+legend entries={Python, Java 8, JavaScript, Swift, Dart},
-$8 \times 8$ boards and also searching for closed tours even on small
+legend pos=north west,
-boards takes too much time. There is a heuristics, called \emph{Warnsdorf's
+legend cell align=left]
-Rule} that can speed up finding a tour. This heuristics states that a
+\addplot[blue,mark=*, mark options={fill=white}] table {re-python2.data};
-knight is moved so that it always proceeds to the field from which the
+\addplot[cyan,mark=*, mark options={fill=white}] table {re-java.data};
-knight will have the \underline{fewest} onward moves.  For example for
+\addplot[red,mark=*, mark options={fill=white}] table {re-js.data};
-a knight on field $(1, 3)$, the field $(0, 1)$ has the fewest possible
+\addplot[magenta,mark=*, mark options={fill=white}] table {re-swift.data};
-onward moves, namely 2.
+\addplot[brown,mark=*, mark options={fill=white}] table {re-dart.data};
+\end{axis}
-\chessboard[maxfield=g7,
+\end{tikzpicture}
-pgfstyle= {[base,at={\pgfpoint{0pt}{-0.5ex}}]text},
+&
-text = \small 3, markfield=Z5,
+\begin{tikzpicture}
-text = \small 7, markfield=b5,
+\begin{axis}[
-text = \small 7, markfield=c4,
+xlabel={$n$},
-text = \small 7, markfield=c2,
+x label style={at={(1.05,0.0)}},
-text = \small 5, markfield=b1,
+ylabel={time in secs},
-text = \small 2, markfield=Z1,
+y label style={at={(0.06,0.5)}},
-setpieces={Na3}]
+%enlargelimits=false,
+%xtick={0,5000,...,30000},
-\noindent
+xmax=65000,
-Warnsdorf's Rule states that the moves on the board above should be
+ymax=45,
-tried in the order
+ytick={0,5,...,40},
+scaled ticks=false,
-\[
+axis lines=left,
-(0, 1), (0, 5), (2, 1), (2, 5), (3, 4), (3, 2)
+width=6cm,
-\]
+height=5.5cm,
+legend entries={Java 9},
-\noindent
+legend pos=north west]
-Whenever there are ties, the corresponding onward moves can be in any
+\addplot[cyan,mark=*, mark options={fill=white}] table {re-java9.data};
-order.  When calculating the number of onward moves for each field, we
+\end{axis}
-do not count moves that revisit any field already visited.
+\end{tikzpicture}
+\end{tabular}
-\subsubsection*{Tasks (file knight2.scala)}
+\end{center}
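To try the comparison from within Scala, one can time Java's built-in matcher directly. The sketch below uses \texttt{java.util.regex.Pattern} from the Java standard library; the name \texttt{javaTime} and the way of measuring are illustrative choices and are not taken from the \texttt{catastrophic} files.

```scala
import java.util.regex.Pattern

// The "evil" regular expression (a*)* . b in Java's regex syntax.
val evil: Pattern = Pattern.compile("(a*)*b")

// Hypothetical helper: wall-clock seconds java.util.regex needs to
// reject a string consisting of n copies of the character a.
def javaTime(n: Int): Double = {
  val input = "a" * n
  val start = System.nanoTime()
  val matched = evil.matcher(input).matches()
  val end = System.nanoTime()
  assert(!matched) // the input contains no b, so the match must fail
  (end - start) / 1.0e9
}
```

Calling \texttt{javaTime(n)} for growing $n$, for example 5, 10, 15 and 20, gives numbers that can be set against the graphs above. The outcome depends on the Java version in use, so the timings will differ from machine to machine.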
+\newpage
-\begin{itemize}
-\item[(6)] Write a function \texttt{ordered\_moves} that calculates a list of
-onward moves like in (2) but orders them according to
-Warnsdorf’s Rule. That means moves with the fewest legal onward moves
-should come first (in order to be tried out first). \hfill[1 Mark]
-\item[(7)] Implement a \texttt{first\_closed\_tour\_heuristics}
-function that searches for a single
-\textbf{closed} tour on a $6\times 6$ board. It should try out
-onward moves according to
-the \texttt{ordered\_moves} function from (6). It is more likely to find
-a solution when started in the middle of the board (that is
-position $(dimension / 2, dimension / 2)$). \hfill[1 Mark]
-\item[(8)] Implement a \texttt{first\_tour\_heuristics} function
-for boards up to
-$30\times 30$.  It is the same function as in (7) but searches for
-tours (not just closed tours). It might be called with any field on the
-board as starting field.\\
-%You have to be careful to write a
-%tail-recursive function of the \texttt{first\_tour\_heuristics} function
-%otherwise you will get problems with stack-overflows.\\
-\mbox{}\hfill[1 Mark]
-\end{itemize}
-\subsubsection*{Task (file knight3.scala)}
-\begin{itemize}
-\item[(9)] Implement a function \texttt{tour\_on\_mega\_board} which is
-the same function as in (8), \textbf{but} should be able to
-deal with boards up to
-$70\times 70$ \textbf{within 30 seconds} (on my laptop). This will be tested
-by starting from field $(0, 0)$. You have to be careful to
-write a tail-recursive function otherwise you will get problems
-with stack-overflows. Please observe the requirements about
-the submissions: no tricks involving \textbf{.par}.\medskip
-The timelimit of 30 seconds is with respect to the laptop on which the
-marking will happen. You can roughly estimate how well your
-implementation performs by running \texttt{knight3.jar} on your
-computer. For example the reference implementation shows
-on my laptop:
-\begin{lstlisting}[language={},numbers=none,basicstyle=\ttfamily\small]
-$ scala -cp knight3.jar
-scala> CW8c.tour_on_mega_board(70, List((0, 0)))
-Time needed: 9.484 secs.
-...<<long_list>>...
-\end{lstlisting}%$
-\mbox{}\hfill[1 Mark]
-\end{itemize}
-\bigskip
 \end{document}
 %%% Local Variables:
 %%% mode: latex
 %%% TeX-master: t
 %%% End:

changeset 351	591b9005157e
parent 311	a479ec3ea536
child 356	d1046d9d3213