pep-material: comparison cws/cw02.tex


-:018b9c12ee1f
+:f7bcb27d1940
 \documentclass{article}
-\usepackage{chessboard}
-\usepackage[LSBC4,T1]{fontenc}
-\let\clipbox\relax
 \usepackage{../style}
 \usepackage{disclaimer}
+\usepackage{../langs}
 \begin{document}
-\setchessboard{smallboard,
-zero,
-showmover=false,
-boardfontencoding=LSBC4,
-hlabelformat=\arabic{ranklabel},
-vlabelformat=\arabic{filelabel}}
-\mbox{}\\[-18mm]\mbox{}
+\section*{Coursework 7 (DocDiff and Danube.org)}
-\section*{Coursework 7 (Scala, Knight's Tour)}
+This coursework is worth 10\%. The first and second parts are due
+on 22 November at 11pm; the third, more advanced part, is due on 21
-This coursework is worth 10\%. It is about searching and
+December at 11pm. You are asked to implement Scala programs for
-backtracking. The first part is due on 23 November at 11pm; the
+measuring similarity in texts and for recommending movies
-second, more advanced part, is due on 21 December at 11pm. You are
+according to a ratings list. Note that the second part might include
-asked to implement Scala programs that solve various versions of the
+material you have not yet seen in the first two lectures. \bigskip
-\textit{Knight's Tour Problem} on a chessboard. Note the second part
-might include material you have not yet seen in the first two
-lectures. \bigskip
 \IMPORTANT{}
+\noindent
 Also note that the running time of each part will be restricted to a
-maximum of 360 seconds on my laptop: If you calculate a result once,
+maximum of 30 seconds on my laptop.
-try to avoid to calculate the result again. Feel free to copy any code
-you need from files \texttt{knight1.scala}, \texttt{knight2.scala} and
-\texttt{knight3.scala}.
 \DISCLAIMER{}
-\subsection*{Background}
-The \textit{Knight's Tour Problem} is about finding a tour such that
+\subsection*{Reference Implementation}
-the knight visits every field on an $n\times n$ chessboard once. For
-example on a $5\times 5$ chessboard, a knight's tour is:
-\chessboard[maxfield=d4,
+Like the C++ assignments, the Scala assignments will work like this: you
-pgfstyle= {[base,at={\pgfpoint{0pt}{-0.5ex}}]text},
+push your files to GitHub and receive (sometimes after a long delay) some
-text = \small 24, markfield=Z4,
+automated feedback. In the end we take a snapshot of the submitted files and
-text = \small 11, markfield=a4,
+apply an automated marking script to them.
-text = \small  6, markfield=b4,
-text = \small 17, markfield=c4,
-text = \small  0, markfield=d4,
-text = \small 19, markfield=Z3,
-text = \small 16, markfield=a3,
-text = \small 23, markfield=b3,
-text = \small 12, markfield=c3,
-text = \small  7, markfield=d3,
-text = \small 10, markfield=Z2,
-text = \small  5, markfield=a2,
-text = \small 18, markfield=b2,
-text = \small  1, markfield=c2,
-text = \small 22, markfield=d2,
-text = \small 15, markfield=Z1,
-text = \small 20, markfield=a1,
-text = \small  3, markfield=b1,
-text = \small  8, markfield=c1,
-text = \small 13, markfield=d1,
-text = \small  4, markfield=Z0,
-text = \small  9, markfield=a0,
-text = \small 14, markfield=b0,
-text = \small 21, markfield=c0,
-text = \small  2, markfield=d0
-]
-\noindent
-The tour starts in the right-upper corner, then moves to field
-$(3,2)$, then $(4,0)$ and so on. There are no knight's tours on
-$2\times 2$, $3\times 3$ and $4\times 4$ chessboards, but for every
-bigger board there is.
-A knight's tour is called \emph{closed}, if the last step in the tour
+In addition, the Scala assignments come with a reference
-is within a knight's move to the beginning of the tour. So the above
+implementation in the form of a \texttt{jar}-file. This allows you to run
-knight's tour is \underline{not} closed because the last
+any test cases on your own computer. For example, you can call Scala on
-step on field $(0, 4)$ is not within the reach of the first step on
+the command line with the option \texttt{-cp docdiff.jar} and then
-$(4, 4)$. It turns out there is no closed knight's tour on a $5\times
+query any function from the template file. Say you want to find out
-5$ board. But there are on a $6\times 6$ board and on bigger ones, for
+what the function \texttt{occurences} produces: for this you just need
-example
+to prefix it with the object name \texttt{CW7a} (and \texttt{CW7b}
+respectively for \texttt{danube.jar}). If you want to find out what
+this function produces for the list \texttt{List("a", "b", "b")},
+you would type something like:
-\chessboard[maxfield=e5,
+\begin{lstlisting}[language={},numbers=none,basicstyle=\ttfamily\small]
-pgfstyle={[base,at={\pgfpoint{0pt}{-0.5ex}}]text},
+$ scala -cp docdiff.jar
-text = \small 10, markfield=Z5,
-text = \small  5, markfield=a5,
+scala> CW7a.occurences(List("a", "b", "b"))
-text = \small 18, markfield=b5,
+...
-text = \small 25, markfield=c5,
+\end{lstlisting}%$
-text = \small 16, markfield=d5,
-text = \small  7, markfield=e5,
+\subsection*{Hints}
-text = \small 31, markfield=Z4,
-text = \small 26, markfield=a4,
-text = \small  9, markfield=b4,
-text = \small  6, markfield=c4,
-text = \small 19, markfield=d4,
-text = \small 24, markfield=e4,
-% 4  11  30  17   8  15
-text = \small  4, markfield=Z3,
-text = \small 11, markfield=a3,
-text = \small 30, markfield=b3,
-text = \small 17, markfield=c3,
-text = \small  8, markfield=d3,
-text = \small 15, markfield=e3,
-%29  32  27   0  23  20
-text = \small 29, markfield=Z2,
-text = \small 32, markfield=a2,
-text = \small 27, markfield=b2,
-text = \small  0, markfield=c2,
-text = \small 23, markfield=d2,
-text = \small 20, markfield=e2,
-%12   3  34  21  14   1
-text = \small 12, markfield=Z1,
-text = \small  3, markfield=a1,
-text = \small 34, markfield=b1,
-text = \small 21, markfield=c1,
-text = \small 14, markfield=d1,
-text = \small  1, markfield=e1,
-%33  28  13   2  35  22
-text = \small 33, markfield=Z0,
-text = \small 28, markfield=a0,
-text = \small 13, markfield=b0,
-text = \small  2, markfield=c0,
-text = \small 35, markfield=d0,
-text = \small 22, markfield=e0,
-vlabel=false,
-hlabel=false
-]
-\noindent
-where the 35th move can join up again with the 0th move.
-If you cannot remember how a knight moves in chess, or never played
-chess, below are all potential moves indicated for two knights, one on
-field $(2, 2)$ (blue moves) and another on $(7, 7)$ (red moves):
-\chessboard[maxfield=g7,
+\newpage
-color=blue!50,
+\subsection*{Part 1 (4 Marks, file docdiff.scala)}
-linewidth=0.2em,
-shortenstart=0.5ex,
-shortenend=0.5ex,
-markstyle=cross,
-markfields={a4, c4, Z3, d3, Z1, d1, a0, c0},
-color=red!50,
-markfields={f5, e6},
-setpieces={Ng7, Nb2}]
-\subsection*{Part 1 (7 Marks)}
+It seems source code plagiarism---stealing someone else's code---is a
+serious problem at other universities.\footnote{Surely, King's
-You are asked to implement the knight's tour problem such that the
+students, after all their instructions and warnings, would never
-dimension of the board can be changed.  Therefore most functions will
+commit such an offence.} Detecting such plagiarism is time-consuming
-take the dimension of the board as an argument.  The fun with this
+and disheartening. To aid the poor lecturers at other universities,
-problem is that even for small chessboard dimensions it has already an
+let's implement a program that determines the similarity between two
-incredibly large search space---finding a tour is like finding a
+documents (be they code or English texts). A document will be
-needle in a haystack. In the first task we want to see how far we get
+represented as a list of strings.
-with exhaustively exploring the complete search space for small
-chessboards.\medskip
-\noindent
-Let us first fix the basic datastructures for the implementation.  The
-board dimension is an integer (we will never go beyond board sizes of
-$40 \times 40$).  A \emph{position} (or field) on the chessboard is
-a pair of integers, like $(0, 0)$. A \emph{path} is a list of
-positions. The first (or 0th move) in a path is the last element in
-this list; and the last move in the path is the first element. For
-example the path for the $5\times 5$ chessboard above is represented
-by
-\[
-\texttt{List($\underbrace{\texttt{(0, 4)}}_{24}$,
-$\underbrace{\texttt{(2, 3)}}_{23}$, ...,
-$\underbrace{\texttt{(3, 2)}}_1$, $\underbrace{\texttt{(4, 4)}}_0$)}
-\]
-\noindent
-Suppose the dimension of a chessboard is $n$, then a path is a
-\emph{tour} if the length of the path is $n \times n$, each element
-occurs only once in the path, and each move follows the rules of how a
-knight moves (see above for the rules).
-\subsubsection*{Tasks (file knight1.scala)}
+\subsection*{Tasks}
 \begin{itemize}
-\item[(1a)] Implement an \texttt{is\_legal\_move} function that takes a
+\item[(1)] Implement a function that cleans a string by finding all
-dimension, a path and a position as arguments and tests whether the
+words in this string. For this, use the regular expression
-position is inside the board and not yet element in the
+\texttt{"$\backslash$w+"} and the library function
-path. \hfill[1 Mark]
+\texttt{findAllIn}. The function should return a list of
+strings.\\
+\mbox{}\hfill [1 Mark]
-\item[(1b)] Implement a \texttt{legal\_moves} function that calculates for a
+\item[(2)] In order to compute the similarity between two documents, we
-position all legal onward moves. If the onward moves are
+associate each document with a \texttt{Map}. This Map represents the
-placed on a circle, you should produce them starting from
+strings in a document and how many times these strings occur in a
-``12-o'clock'' following in clockwise order.  For example on an
+document. A simple (though slightly inefficient) method for counting
-$8\times 8$ board for a knight at position $(2, 2)$ and otherwise
+the number of string occurrences in a document is as follows: remove
-empty board, the legal-moves function should produce the onward
+all duplicates from the document; for each of these (unique)
-positions in this order:
+strings, count how many times they occur in the original document.
+Return a Map from strings to occurrences. For example,
 \begin{center}
-\texttt{List((3,4), (4,3), (4,1), (3,0), (1,0), (0,1), (0,3), (1,4))}
+\pcode{occurences(List("a", "b", "b", "c", "d"))}
 \end{center}
-If the board is not empty, then maybe some of the moves need to be
+produces \pcode{Map(a -> 1, b -> 2, c -> 1, d -> 1)} and
-filtered out from this list.  For a knight on field $(7, 7)$ and an
-empty board, the legal moves are
 \begin{center}
-\texttt{List((6,5), (5,6))}
+\pcode{occurences(List("d", "b", "d", "b", "d"))}
 \end{center}
-\mbox{}\hfill[1 Mark]
-\item[(1c)] Implement two recursive functions (\texttt{count\_tours} and
+produces \pcode{Map(d -> 3, b -> 2)}.\hfill[1 Mark]
-\texttt{enum\_tours}). They each take a dimension and a path as
-arguments. They exhaustively search for tours starting
+\item[(3)] You can think of the Maps calculated under (2) as efficient
-from the given path. The first function counts all possible
+representations of sparse ``vectors''. In this subtask you need to
-tours (there can be none for certain board sizes) and the second
+implement the \emph{product} of two vectors, sometimes also called
-collects all tours in a list of paths.\hfill[2 Marks]
+\emph{dot product}.\footnote{\url{https://en.wikipedia.org/wiki/Dot_product}}
+For this, implement a function that takes two documents
+(\texttt{List[String]}) as arguments. The function first calculates
+the (unique) strings in both. For each string, it multiplies the
+occurrences in each document. If a string does not occur in one of the
+documents, then the product is zero. At the end, you
+sum all products. For the two documents in (2), the dot product is 7:
+\[
+\underbrace{1 \cdot 0}_{\texttt{"a"}} \;\;+\;\;
+\underbrace{2 \cdot 2}_{\texttt{"b"}} \;\;+\;\;
+\underbrace{1 \cdot 0}_{\texttt{"c"}} \;\;+\;\;
+\underbrace{1 \cdot 3}_{\texttt{"d"}} \;\;=\;\; 7
+\]
+\hfill\mbox{[1 Mark]}
+\item[(4)] First, implement a function that calculates the overlap
+between two documents, say $d_1$ and $d_2$, according to the formula
+\[
+\texttt{overlap}(d_1, d_2) = \frac{d_1 \cdot d_2}{\max(d_1^2, d_2^2)}
+\]
+where $d^2$ stands for the dot product $d \cdot d$ of a document with
+itself. This function should return a \texttt{Double} between 0 and 1. The
+overlap between the lists in (2) is $0.5384615384615384$.
+Second, implement a function that calculates the similarity of
+two strings by first extracting the words using the function from (1)
+and then calculating the overlap.
+\hfill\mbox{[1 Mark]}
 \end{itemize}
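The following is a minimal sketch of how tasks (1) to (4) could fit together; it is not the reference implementation in \texttt{docdiff.jar}. The object name \texttt{CW7a} and the names \texttt{occurences} and \texttt{overlap} are taken from the text above, while \texttt{clean}, \texttt{prod} and \texttt{similarity}, as well as all signatures, are hypothetical and may differ from the actual template file.

```scala
// Sketch of Part 1; `occurences` and `overlap` follow the text, while
// `clean`, `prod` and `similarity` are hypothetical names for illustration.
object CW7a {

  // (1) Extract all words with the regular expression \w+ and findAllIn.
  def clean(s: String): List[String] =
    """\w+""".r.findAllIn(s).toList

  // (2) Remove duplicates, then count each unique string in the original list.
  def occurences(xs: List[String]): Map[String, Int] =
    xs.distinct.map(x => x -> xs.count(_ == x)).toMap

  // (3) Dot product of the sparse "vectors": a string missing from a Map counts as 0.
  def prod(lst1: List[String], lst2: List[String]): Int = {
    val m1 = occurences(lst1)
    val m2 = occurences(lst2)
    (lst1 ++ lst2).distinct.map(s => m1.getOrElse(s, 0) * m2.getOrElse(s, 0)).sum
  }

  // (4) Overlap: dot product divided by the larger of d1 * d1 and d2 * d2.
  def overlap(lst1: List[String], lst2: List[String]): Double =
    prod(lst1, lst2).toDouble / math.max(prod(lst1, lst1), prod(lst2, lst2))

  // (4) Similarity of two strings: extract the words, then compute the overlap.
  def similarity(s1: String, s2: String): Double =
    overlap(clean(s1), clean(s2))
}
```

For the two lists in (2), \texttt{prod} gives 7, and the two squares are $1 + 4 + 1 + 1 = 7$ and $9 + 4 = 13$, so \texttt{overlap} returns $7/13$, the value quoted in (4). In this sketch two empty documents lead to the division $0/0$, which yields \texttt{NaN}; the text does not say how that case should be handled.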
-\noindent \textbf{Test data:} For the marking, the functions in (1c)
-will be called with board sizes up to $5 \times 5$. If you search
-for tours on a $5 \times 5$ board starting only from field $(0, 0)$,
-there are 304 of tours. If you try out every field of a $5 \times
-5$-board as a starting field and add up all tours, you obtain
-1728. A $6\times 6$ board is already too large to be searched
-exhaustively.\footnote{For your interest, the number of tours on
-$6\times 6$, $7\times 7$ and $8\times 8$ are 6637920, 165575218320,
-19591828170979904, respectively.}\bigskip
-\noindent
-\textbf{Hints:} useful list functions: \texttt{.contains(..)} checks
-whether an element is in a list, \texttt{.flatten} turns a list of
-lists into just a list, \texttt{\_::\_} puts an element on the head of
-the list, \texttt{.head} gives you the first element of a list (make
-sure the list is not \texttt{Nil}).
-\subsubsection*{Tasks (file knight2.scala)}
-\begin{itemize}
-\item[(2a)] Implement a \texttt{first}-function. This function takes a list of
-positions and a function $f$ as arguments; $f$ is the name we give to
-this argument). The function $f$ takes a position as argument and
-produces an optional path. So $f$'s type is \texttt{Pos =>
-Option[Path]}. The idea behind the \texttt{first}-function is as follows:
-\[
-\begin{array}{lcl}
-\textit{first}(\texttt{Nil}, f) & \dn & \texttt{None}\\
-\textit{first}(x\!::\!xs, f) & \dn & \begin{cases}
-f(x) & \textit{if}\;f(x) \not=\texttt{None}\\
-\textit{first}(xs, f) & \textit{otherwise}\\
-\end{cases}
-\end{array}
-\]
-\noindent That is, we want to find the first position where the
-result of $f$ is not \texttt{None}, if there is one. Note that
-`inside' \texttt{first}, you do not (need to) know anything about
-the argument $f$ except its type, namely \texttt{Pos =>
-Option[Path]}. There is one additional point however you should
-take into account when implementing \texttt{first}: you will need to
-calculate what the result of $f(x)$ is; your code should do this
-only \textbf{once} and for as \textbf{few} elements in the list as
-possible! Do not calculate $f(x)$ for all elements and then see which
-is the first \texttt{Some}.\\\mbox{}\hfill[1 Mark]
-\item[(2b)] Implement a \texttt{first\_tour} function that uses the
-\texttt{first}-function from (2a), and searches recursively for a tour.
-As there might not be such a tour at all, the \texttt{first\_tour} function
-needs to return a value of type
-\texttt{Option[Path]}.\\\mbox{}\hfill[2 Marks]
-\end{itemize}
-\noindent
-\textbf{Testing:} The \texttt{first\_tour} function will be called with board
-sizes of up to $8 \times 8$.
-\bigskip
-\noindent
-\textbf{Hints:} a useful list function: \texttt{.filter(..)} filters a
-list according to a boolean function; a useful option function:
-\texttt{.isDefined} returns true, if an option is \texttt{Some(..)};
-anonymous functions can be constructed using \texttt{(x:Int) => ...},
-this functions takes an \texttt{Int} as an argument.
-%%\newpage
+\newpage
-\subsection*{Part 2 (3 Marks)}
+You are creating Danube.org, which you hope will be the next big thing
+among online movie providers. You know that you can save money by
+anticipating what movies people will rent; you will pass these savings
+on to your users by offering a discount if they rent movies that Danube.org
+recommends.  This assignment is meant to calculate
-As you should have seen in Part 1, a naive search for tours beyond
-$8 \times 8$ boards and also searching for closed tours even on small
-boards takes too much time. There is a heuristic, called \emph{Warnsdorf's
-Rule} that can speed up finding a tour. This heuristic states that a
-knight is moved so that it always proceeds to the field from which the
-knight will have the \underline{fewest} onward moves.  For example for
-a knight on field $(1, 3)$, the field $(0, 1)$ has the fewest possible
-onward moves, namely 2.
-\chessboard[maxfield=g7,
+To do this, you offer an incentive for people to upload their lists of
-pgfstyle= {[base,at={\pgfpoint{0pt}{-0.5ex}}]text},
+recommended movies. From their lists, you can establish suggested
-text = \small 7, markfield=b5,
+pairs. A pair of movies is a suggested pair if both movies appear on one
-text = \small 7, markfield=c4,
+person’s recommendation list. Of course, some suggested pairs are more
-text = \small 7, markfield=c2,
+popular than others. Also, any given movie is paired with some movies
-text = \small 7, markfield=c2,
+much more frequently than with others.
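The description above does not fix an interface for suggested pairs, so the following is only a hypothetical sketch of the idea. It assumes each uploaded recommendation list is a \texttt{List[String]} of titles; the object \texttt{DanubeSketch} and the functions \texttt{suggestedPairs}, \texttt{pairCounts} and \texttt{partners} are invented names, not part of the coursework template or of \texttt{danube.jar}. It counts on how many lists each unordered pair occurs and ranks the partners of a given title by that count.

```scala
// Hypothetical sketch: all names here are illustrative, not from the template.
object DanubeSketch {

  type Pair = (String, String)

  // All suggested pairs from one person's list: every unordered pair of
  // distinct titles, normalised so that the smaller title comes first.
  def suggestedPairs(recs: List[String]): List[Pair] =
    recs.distinct.sorted.combinations(2).map(p => (p(0), p(1))).toList

  // Popularity of each suggested pair: on how many lists it occurs.
  def pairCounts(lists: List[List[String]]): Map[Pair, Int] =
    lists.flatMap(suggestedPairs).groupBy(identity).map { case (p, ps) => p -> ps.size }

  // The titles paired with `item`, most frequent partner first
  // (ties broken alphabetically).
  def partners(counts: Map[Pair, Int], item: String): List[(String, Int)] =
    counts.toList.collect {
      case ((a, b), n) if a == item => (b, n)
      case ((a, b), n) if b == item => (a, n)
    }.sortBy { case (other, n) => (-n, other) }
}
```

Sorting each list before taking combinations makes $(A, B)$ and $(B, A)$ the same suggested pair, and calling \texttt{distinct} first prevents a title listed twice by one person from being paired with itself.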
-text = \small 5, markfield=b1,
-text = \small 2, markfield=Z1,
-setpieces={Na3}]
-\noindent
-Warnsdorf's Rule states that the moves on the board above should be
-tried in the order
-\[
-(0, 1), (0, 5), (2, 1), (2, 5), (3, 4), (3, 2)
-\]
-\noindent
-Whenever there are ties, the corresponding onward moves can be in any
-order.  When calculating the number of onward moves for each field, we
-do not count moves that revisit any field already visited.
-\subsubsection*{Tasks (file knight3.scala)}
-\begin{itemize}
-\item[(3a)] Write a function \texttt{ordered\_moves} that calculates a list of
-onward moves like in (1b) but orders them according to the
-Warnsdorf’s Rule. That means moves with the fewest legal onward moves
-should come first (in order to be tried out first). \hfill[1 Mark]
-\item[(3b)] Implement a \texttt{first\_closed-tour\_heuristic}
-function that searches for a
-\textbf{closed} tour on a $6\times 6$ board. It should use the
-\texttt{first}-function from (2a) and tries out onward moves according to
-the \texttt{ordered\_moves} function from (3a). It is more likely to find
-a solution when started in the middle of the board (that is
-position $(dimension / 2, dimension / 2)$). \hfill[1 Mark]
-\item[(3c)] Implement a \texttt{first\_tour\_heuristic} function
-for boards up to
-$40\times 40$.  It is the same function as in (3b) but searches for
-tours (not just closed tours). You have to be careful to write a
-tail-recursive function of the \texttt{first\_tour\_heuristic} function
-otherwise you will get problems with stack-overflows.\\
-\mbox{}\hfill[1 Mark]
-\end{itemize}
-\bigskip
-\noindent
-\textbf{Hints:} a useful list function: \texttt{.sortBy} sorts a list
-according to a component given by the function; a function can be
-tested to be tail recursive by annotation \texttt{@tailrec}, which is
-made available by importing \texttt{scala.annotation.tailrec}.
 \end{document}

changeset 202	f7bcb27d1940
parent 166	780c40aaad27
child 203	eb188f9ac038