cws/cw02.tex
changeset 202 f7bcb27d1940
parent 166 780c40aaad27
child 203 eb188f9ac038
equal deleted inserted replaced
201:018b9c12ee1f 202:f7bcb27d1940
     1 \documentclass{article}
     1 \documentclass{article}
     2 \usepackage{chessboard}
       
     3 \usepackage[LSBC4,T1]{fontenc}
       
     4 \let\clipbox\relax
       
     5 \usepackage{../style}
     2 \usepackage{../style}
     6 \usepackage{disclaimer}
     3 \usepackage{disclaimer}
       
     4 \usepackage{../langs}
     7 
     5 
     8 \begin{document}
     6 \begin{document}
     9 
     7 
    10 \setchessboard{smallboard,
       
    11                zero,
       
    12                showmover=false,
       
    13                boardfontencoding=LSBC4,
       
    14                hlabelformat=\arabic{ranklabel},
       
    15                vlabelformat=\arabic{filelabel}}
       
    16 
     8 
    17 \mbox{}\\[-18mm]\mbox{}
     9 \section*{Coursework 7 (DocDiff and Danube.org)}
    18 
    10 
    19 \section*{Coursework 7 (Scala, Knight's Tour)}
    11 This coursework is worth 10\%. The first part and second part are due
    20 
    12 on 22 November at 11pm; the third, more advanced part, is due on 21
    21 This coursework is worth 10\%. It is about searching and
    13 December at 11pm. You are asked to implement Scala programs for
    22 backtracking. The first part is due on 23 November at 11pm; the
    14 measuring similarity in texts and for recommending movies
    23 second, more advanced part, is due on 21 December at 11pm. You are
    15 according to a ratings list.  Note the second part might include
    24 asked to implement Scala programs that solve various versions of the
    16 material you have not yet seen in the first two lectures. \bigskip
    25 \textit{Knight's Tour Problem} on a chessboard. Note the second part
       
    26 might include material you have not yet seen in the first two
       
    27 lectures. \bigskip
       
    28 
    17 
    29 \IMPORTANT{}
    18 \IMPORTANT{}
       
    19 
       
    20 \noindent
    30 Also note that the running time of each part will be restricted to a
    21 Also note that the running time of each part will be restricted to a
    31 maximum of 360 seconds on my laptop: If you calculate a result once,
    22 maximum of 30 seconds on my laptop.
    32 try to avoid to calculate the result again. Feel free to copy any code
       
    33 you need from files \texttt{knight1.scala}, \texttt{knight2.scala} and
       
    34 \texttt{knight3.scala}.
       
    35 
    23 
    36 \DISCLAIMER{}
    24 \DISCLAIMER{}
    37 
    25 
    38 \subsection*{Background}
       
    39 
    26 
    40 The \textit{Knight's Tour Problem} is about finding a tour such that
    27 \subsection*{Reference Implementation}
    41 the knight visits every field on an $n\times n$ chessboard once. For
       
    42 example on a $5\times 5$ chessboard, a knight's tour is:
       
    43 
    28 
    44 \chessboard[maxfield=d4, 
    29 Like the C++ assignments, the Scala assignments will work like this: you
    45             pgfstyle= {[base,at={\pgfpoint{0pt}{-0.5ex}}]text},
    30 push your files to GitHub and receive (after sometimes a long delay) some
    46             text = \small 24, markfield=Z4,
    31 automated feedback. In the end we take a snapshot of the submitted files and
    47             text = \small 11, markfield=a4,
    32 apply an automated marking script to them.
    48             text = \small  6, markfield=b4,
       
    49             text = \small 17, markfield=c4,
       
    50             text = \small  0, markfield=d4,
       
    51             text = \small 19, markfield=Z3,
       
    52             text = \small 16, markfield=a3,
       
    53             text = \small 23, markfield=b3,
       
    54             text = \small 12, markfield=c3,
       
    55             text = \small  7, markfield=d3,
       
    56             text = \small 10, markfield=Z2,
       
    57             text = \small  5, markfield=a2,
       
    58             text = \small 18, markfield=b2,
       
    59             text = \small  1, markfield=c2,
       
    60             text = \small 22, markfield=d2,
       
    61             text = \small 15, markfield=Z1,
       
    62             text = \small 20, markfield=a1,
       
    63             text = \small  3, markfield=b1,
       
    64             text = \small  8, markfield=c1,
       
    65             text = \small 13, markfield=d1,
       
    66             text = \small  4, markfield=Z0,
       
    67             text = \small  9, markfield=a0,
       
    68             text = \small 14, markfield=b0,
       
    69             text = \small 21, markfield=c0,
       
    70             text = \small  2, markfield=d0
       
    71            ]
       
    72            
       
    73 \noindent
       
    74 The tour starts in the right-upper corner, then moves to field
       
    75 $(3,2)$, then $(4,0)$ and so on. There are no knight's tours on
       
    76 $2\times 2$, $3\times 3$ and $4\times 4$ chessboards, but for every
       
    77 bigger board there is. 
       
    78 
    33 
    79 A knight's tour is called \emph{closed}, if the last step in the tour
    34 In addition, the Scala assignments come with a reference
    80 is within a knight's move to the beginning of the tour. So the above
    35 implementation in form of a \texttt{jar}-file. This allows you to run
    81 knight's tour is \underline{not} closed because the last
    36 any test cases on your own computer. For example you can call Scala on
    82 step on field $(0, 4)$ is not within the reach of the first step on
    37 the command line with the option \texttt{-cp docdiff.jar} and then
    83 $(4, 4)$. It turns out there is no closed knight's tour on a $5\times
    38 query any function from the template file. Say you want to find out
    84 5$ board. But there are on a $6\times 6$ board and on bigger ones, for
    39 what the function \texttt{occurences} produces: for this you just need
    85 example
    40 to prefix it with the object name \texttt{CW7a} (and \texttt{CW7b}
       
    41 respectively for \texttt{danube.jar}).  If you want to find out what
       
    42 these functions produce for the list \texttt{List("a", "b", "b")},
       
    43 you would type something like:
    86 
    44 
    87 \chessboard[maxfield=e5, 
    45 \begin{lstlisting}[language={},numbers=none,basicstyle=\ttfamily\small]
    88             pgfstyle={[base,at={\pgfpoint{0pt}{-0.5ex}}]text},
    46 $ scala -cp docdiff.jar
    89             text = \small 10, markfield=Z5,
    47   
    90             text = \small  5, markfield=a5,
    48 scala> CW7a.occurences(List("a", "b", "b"))
    91             text = \small 18, markfield=b5,
    49 ...
    92             text = \small 25, markfield=c5,
    50 \end{lstlisting}%$
    93             text = \small 16, markfield=d5,
    51 
    94             text = \small  7, markfield=e5,
    52 \subsection*{Hints}
    95             text = \small 31, markfield=Z4,
       
    96             text = \small 26, markfield=a4,
       
    97             text = \small  9, markfield=b4,
       
    98             text = \small  6, markfield=c4,
       
    99             text = \small 19, markfield=d4,
       
   100             text = \small 24, markfield=e4,
       
   101             % 4  11  30  17   8  15 
       
   102             text = \small  4, markfield=Z3,
       
   103             text = \small 11, markfield=a3,
       
   104             text = \small 30, markfield=b3,
       
   105             text = \small 17, markfield=c3,
       
   106             text = \small  8, markfield=d3,
       
   107             text = \small 15, markfield=e3,
       
   108             %29  32  27   0  23  20 
       
   109             text = \small 29, markfield=Z2,
       
   110             text = \small 32, markfield=a2,
       
   111             text = \small 27, markfield=b2,
       
   112             text = \small  0, markfield=c2,
       
   113             text = \small 23, markfield=d2,
       
   114             text = \small 20, markfield=e2,
       
   115             %12   3  34  21  14   1 
       
   116             text = \small 12, markfield=Z1,
       
   117             text = \small  3, markfield=a1,
       
   118             text = \small 34, markfield=b1,
       
   119             text = \small 21, markfield=c1,
       
   120             text = \small 14, markfield=d1,
       
   121             text = \small  1, markfield=e1,
       
   122             %33  28  13   2  35  22 
       
   123             text = \small 33, markfield=Z0,
       
   124             text = \small 28, markfield=a0,
       
   125             text = \small 13, markfield=b0,
       
   126             text = \small  2, markfield=c0,
       
   127             text = \small 35, markfield=d0,
       
   128             text = \small 22, markfield=e0,
       
   129             vlabel=false,
       
   130             hlabel=false
       
   131            ]
       
   132 
    53 
   133 
    54 
   134 \noindent
       
   135 where the 35th move can join up again with the 0th move.
       
   136 
       
   137 If you cannot remember how a knight moves in chess, or never played
       
   138 chess, below are all potential moves indicated for two knights, one on
       
   139 field $(2, 2)$ (blue moves) and another on $(7, 7)$ (red moves):
       
   140 
    55 
   141 
    56 
   142 \chessboard[maxfield=g7,
    57 \newpage
   143             color=blue!50,
    58 \subsection*{Part 1 (4 Marks, file docdiff.scala)}
   144             linewidth=0.2em,
       
   145             shortenstart=0.5ex,
       
   146             shortenend=0.5ex,
       
   147             markstyle=cross,
       
   148             markfields={a4, c4, Z3, d3, Z1, d1, a0, c0},
       
   149             color=red!50,
       
   150             markfields={f5, e6},
       
   151             setpieces={Ng7, Nb2}]
       
   152 
    59 
   153 \subsection*{Part 1 (7 Marks)}
    60 It seems source code plagiarism---stealing someone else's code---is a
   154 
    61 serious problem at other universities.\footnote{Surely, King's
   155 You are asked to implement the knight's tour problem such that the
    62   students, after all their instructions and warnings, would never
   156 dimension of the board can be changed.  Therefore most functions will
    63   commit such an offence.} Dedecting such plagiarism is time-consuming
   157 take the dimension of the board as an argument.  The fun with this
    64 and disheartening. To aid the poor lecturers at other universities,
   158 problem is that even for small chessboard dimensions it has already an
    65 let's implement a program that determines the similarity between two
   159 incredibly large search space---finding a tour is like finding a
    66 documents (be they code or English texts). A document will be
   160 needle in a haystack. In the first task we want to see how far we get
    67 represented as a list of strings.
   161 with exhaustively exploring the complete search space for small
       
   162 chessboards.\medskip
       
   163 
       
   164 \noindent
       
   165 Let us first fix the basic datastructures for the implementation.  The
       
   166 board dimension is an integer (we will never go beyond board sizes of
       
   167 $40 \times 40$).  A \emph{position} (or field) on the chessboard is
       
   168 a pair of integers, like $(0, 0)$. A \emph{path} is a list of
       
   169 positions. The first (or 0th move) in a path is the last element in
       
   170 this list; and the last move in the path is the first element. For
       
   171 example the path for the $5\times 5$ chessboard above is represented
       
   172 by
       
   173 
       
   174 \[
       
   175 \texttt{List($\underbrace{\texttt{(0, 4)}}_{24}$,
       
   176   $\underbrace{\texttt{(2, 3)}}_{23}$, ...,
       
   177   $\underbrace{\texttt{(3, 2)}}_1$, $\underbrace{\texttt{(4, 4)}}_0$)}
       
   178 \]
       
   179 
       
   180 \noindent
       
   181 Suppose the dimension of a chessboard is $n$, then a path is a
       
   182 \emph{tour} if the length of the path is $n \times n$, each element
       
   183 occurs only once in the path, and each move follows the rules of how a
       
   184 knight moves (see above for the rules).
       
   185 
    68 
   186 
    69 
   187 \subsubsection*{Tasks (file knight1.scala)}
    70 \subsection*{Tasks}
   188 
    71 
   189 \begin{itemize}
    72 \begin{itemize}
   190 \item[(1a)] Implement an \texttt{is\_legal\_move} function that takes a
    73 \item[(1)] Implement a function that cleans a string by finding all
   191   dimension, a path and a position as arguments and tests whether the
    74   words in this string. For this use the regular expression
   192   position is inside the board and not yet element in the
    75   \texttt{"$\backslash$w+"} and the library function
   193   path. \hfill[1 Mark]
    76   \texttt{findAllIn}. The function should return a list of
       
    77   strings.\\
       
    78   \mbox{}\hfill [1 Mark]
   194 
    79 
   195 \item[(1b)] Implement a \texttt{legal\_moves} function that calculates for a
    80 \item[(2)] In order to compute the similarity between two documents, we
   196   position all legal onward moves. If the onward moves are
    81   associate each document with a \texttt{Map}. This Map represents the
   197   placed on a circle, you should produce them starting from
    82   strings in a document and how many times these strings occur in a
   198   ``12-o'clock'' following in clockwise order.  For example on an
    83   document. A simple (though slightly inefficient) method for counting
   199   $8\times 8$ board for a knight at position $(2, 2)$ and otherwise
    84   the number of string-occurences in a document is as follows: remove
   200   empty board, the legal-moves function should produce the onward
    85   all duplicates from the document; for each of these (unique)
   201   positions in this order:
    86   strings, count how many times they occur in the original document.
       
    87   Return a Map from strings to occurences. For example
   202 
    88 
   203   \begin{center}
    89   \begin{center}
   204   \texttt{List((3,4), (4,3), (4,1), (3,0), (1,0), (0,1), (0,3), (1,4))}
    90   \pcode{occurences(List("a", "b", "b", "c", "d"))}
   205   \end{center}
    91   \end{center}
   206 
    92 
   207   If the board is not empty, then maybe some of the moves need to be
    93   produces \pcode{Map(a -> 1, b -> 2, c -> 1, d -> 1)} and
   208   filtered out from this list.  For a knight on field $(7, 7)$ and an
       
   209   empty board, the legal moves are
       
   210 
    94 
   211   \begin{center}
    95   \begin{center}
   212   \texttt{List((6,5), (5,6))}
    96   \pcode{occurences(List("d", "b", "d", "b", "d"))}
   213   \end{center}
    97   \end{center}
   214   \mbox{}\hfill[1 Mark]
       
   215 
    98 
   216 \item[(1c)] Implement two recursive functions (\texttt{count\_tours} and
    99   produces \pcode{Map(d -> 3, b -> 2)}.\hfill[1 Mark]
   217   \texttt{enum\_tours}). They each take a dimension and a path as
   100 
   218   arguments. They exhaustively search for tours starting
   101 \item[(3)] You can think of the Maps calculated under (2) as efficient
   219   from the given path. The first function counts all possible 
   102   representations of sparse ``vectors''. In this subtask you need to
   220   tours (there can be none for certain board sizes) and the second
   103   implement the \emph{product} of two vectors, sometimes also called
   221   collects all tours in a list of paths.\hfill[2 Marks]
   104   \emph{dot product}.\footnote{\url{https://en.wikipedia.org/wiki/Dot_product}}
       
   105 
       
   106   For this implement a function that takes two documents
       
   107   (\texttt{List[String]}) as arguments. The function first calculates
       
   108   the (unique) strings in both. For each string, it multiplies the
       
   109   occurences in each document. If a string does not occur in one of the
       
   110   documents, then the product is zero. At the end you
       
   111   sum all products. For the two documents in (2) the dot product is 7:
       
   112 
       
   113   \[
       
   114     \underbrace{1 * 0}_{"a"} \;\;+\;\;
       
   115     \underbrace{2 * 2}_{"b"} \;\;+\;\;
       
   116     \underbrace{1 * 0}_{"c"} \;\;+\;\;
       
   117     \underbrace{1 * 3}_{"d"}
       
   118   \]  
       
   119   
       
   120   \hfill\mbox{[1 Mark]}
       
   121 
       
   122 \item[(4)] Implement first a function that calculates the overlap
       
   123   between two documents, say $d_1$ and $d_2$, according to the formula
       
   124 
       
   125   \[
       
   126   \texttt{overlap}(d_1, d_2) = \frac{d_1 \cdot d_2}{max(d_1^2, d_2^2)}  
       
   127   \]
       
   128 
       
   129   This function should return a \texttt{Double} between 0 and 1. The
       
   130   overlap between the lists in (2) is $0.5384615384615384$.
       
   131 
       
   132   Second implement a function that calculates the similarity of
       
   133   two strings, by first extracting the strings using the function from (1)
       
   134   and then calculating the overlap.
       
   135   \hfill\mbox{[1 Mark]}
   222 \end{itemize}
   136 \end{itemize}
   223 
   137 
   224 \noindent \textbf{Test data:} For the marking, the functions in (1c)
       
   225 will be called with board sizes up to $5 \times 5$. If you search
       
   226 for tours on a $5 \times 5$ board starting only from field $(0, 0)$,
       
   227 there are 304 of tours. If you try out every field of a $5 \times
       
   228 5$-board as a starting field and add up all tours, you obtain
       
   229 1728. A $6\times 6$ board is already too large to be searched
       
   230 exhaustively.\footnote{For your interest, the number of tours on
       
   231   $6\times 6$, $7\times 7$ and $8\times 8$ are 6637920, 165575218320,
       
   232   19591828170979904, respectively.}\bigskip
       
   233 
       
   234 \noindent
       
   235 \textbf{Hints:} useful list functions: \texttt{.contains(..)} checks
       
   236 whether an element is in a list, \texttt{.flatten} turns a list of
       
   237 lists into just a list, \texttt{\_::\_} puts an element on the head of
       
   238 the list, \texttt{.head} gives you the first element of a list (make
       
   239 sure the list is not \texttt{Nil}).
       
   240 
       
   241 \subsubsection*{Tasks (file knight2.scala)}
       
   242 
       
   243 \begin{itemize}
       
   244 \item[(2a)] Implement a \texttt{first}-function. This function takes a list of
       
   245   positions and a function $f$ as arguments; $f$ is the name we give to
       
   246   this argument). The function $f$ takes a position as argument and
       
   247   produces an optional path. So $f$'s type is \texttt{Pos =>
       
   248     Option[Path]}. The idea behind the \texttt{first}-function is as follows:
       
   249 
       
   250   \[
       
   251   \begin{array}{lcl}
       
   252   \textit{first}(\texttt{Nil}, f) & \dn & \texttt{None}\\  
       
   253   \textit{first}(x\!::\!xs, f) & \dn & \begin{cases}
       
   254     f(x) & \textit{if}\;f(x) \not=\texttt{None}\\
       
   255     \textit{first}(xs, f) & \textit{otherwise}\\
       
   256                               \end{cases}
       
   257   \end{array}
       
   258   \]
       
   259 
       
   260   \noindent That is, we want to find the first position where the
       
   261   result of $f$ is not \texttt{None}, if there is one. Note that
       
   262   `inside' \texttt{first}, you do not (need to) know anything about
       
   263   the argument $f$ except its type, namely \texttt{Pos =>
       
   264     Option[Path]}. There is one additional point however you should
       
   265   take into account when implementing \texttt{first}: you will need to
       
   266   calculate what the result of $f(x)$ is; your code should do this
       
   267   only \textbf{once} and for as \textbf{few} elements in the list as
       
   268   possible! Do not calculate $f(x)$ for all elements and then see which 
       
   269   is the first \texttt{Some}.\\\mbox{}\hfill[1 Mark]
       
   270   
       
   271 \item[(2b)] Implement a \texttt{first\_tour} function that uses the
       
   272   \texttt{first}-function from (2a), and searches recursively for a tour.
       
   273   As there might not be such a tour at all, the \texttt{first\_tour} function
       
   274   needs to return a value of type
       
   275   \texttt{Option[Path]}.\\\mbox{}\hfill[2 Marks]
       
   276 \end{itemize}
       
   277 
       
   278 \noindent
       
   279 \textbf{Testing:} The \texttt{first\_tour} function will be called with board
       
   280 sizes of up to $8 \times 8$.
       
   281 \bigskip
       
   282 
       
   283 \noindent
       
   284 \textbf{Hints:} a useful list function: \texttt{.filter(..)} filters a
       
   285 list according to a boolean function; a useful option function:
       
   286 \texttt{.isDefined} returns true, if an option is \texttt{Some(..)};
       
   287 anonymous functions can be constructed using \texttt{(x:Int) => ...},
       
   288 this functions takes an \texttt{Int} as an argument.
       
   289 
   138 
   290 
   139 
   291 %%\newpage
   140 \newpage
   292 \subsection*{Part 2 (3 Marks)}
   141 You are creating Danube.org, which you hope will be the next big thing
       
   142 in online movie provider. You know that you can save money by
       
   143 anticipating what movies people will rent; you will pass these savings
       
   144 on to your users by offering a discount if they rent movies that Danube.org
       
   145 recommends.  This assignment is meant to calculate 
   293 
   146 
   294 As you should have seen in Part 1, a naive search for tours beyond
       
   295 $8 \times 8$ boards and also searching for closed tours even on small
       
   296 boards takes too much time. There is a heuristic, called \emph{Warnsdorf's
       
   297 Rule} that can speed up finding a tour. This heuristic states that a
       
   298 knight is moved so that it always proceeds to the field from which the
       
   299 knight will have the \underline{fewest} onward moves.  For example for
       
   300 a knight on field $(1, 3)$, the field $(0, 1)$ has the fewest possible
       
   301 onward moves, namely 2.
       
   302 
   147 
   303 \chessboard[maxfield=g7,
   148 To do this, you offer an incentive for people to upload their lists of
   304             pgfstyle= {[base,at={\pgfpoint{0pt}{-0.5ex}}]text},
   149 recommended books. From their lists, you can establish suggested
   305             text = \small 3, markfield=Z5,
   150 pairs. A pair of books is a suggested pair if both books appear on one
   306             text = \small 7, markfield=b5,
   151 person’s recommendation list. Of course, some suggested pairs are more
   307             text = \small 7, markfield=c4,
   152 popular than others. Also, any given book is paired with some books
   308             text = \small 7, markfield=c2,
   153 much more frequently than with others.
   309             text = \small 5, markfield=b1,
       
   310             text = \small 2, markfield=Z1,
       
   311             setpieces={Na3}]
       
   312 
   154 
   313 \noindent
       
   314 Warnsdorf's Rule states that the moves on the board above should be
       
   315 tried in the order
       
   316 
       
   317 \[
       
   318 (0, 1), (0, 5), (2, 1), (2, 5), (3, 4), (3, 2)
       
   319 \]
       
   320 
       
   321 \noindent
       
   322 Whenever there are ties, the corresponding onward moves can be in any
       
   323 order.  When calculating the number of onward moves for each field, we
       
   324 do not count moves that revisit any field already visited.
       
   325 
       
   326 \subsubsection*{Tasks (file knight3.scala)}
       
   327 
       
   328 \begin{itemize}
       
   329 \item[(3a)] Write a function \texttt{ordered\_moves} that calculates a list of
       
   330   onward moves like in (1b) but orders them according to the
       
   331   Warnsdorf’s Rule. That means moves with the fewest legal onward moves
       
   332   should come first (in order to be tried out first). \hfill[1 Mark]
       
   333   
       
   334 \item[(3b)] Implement a \texttt{first\_closed-tour\_heuristic}
       
   335   function that searches for a
       
   336   \textbf{closed} tour on a $6\times 6$ board. It should use the
       
   337   \texttt{first}-function from (2a) and tries out onward moves according to
       
   338   the \texttt{ordered\_moves} function from (3a). It is more likely to find
       
   339   a solution when started in the middle of the board (that is
       
   340   position $(dimension / 2, dimension / 2)$). \hfill[1 Mark]
       
   341 
       
   342 \item[(3c)] Implement a \texttt{first\_tour\_heuristic} function
       
   343   for boards up to
       
   344   $40\times 40$.  It is the same function as in (3b) but searches for
       
   345   tours (not just closed tours). You have to be careful to write a
       
   346   tail-recursive function of the \texttt{first\_tour\_heuristic} function
       
   347   otherwise you will get problems with stack-overflows.\\
       
   348   \mbox{}\hfill[1 Mark]
       
   349 \end{itemize}  
       
   350 \bigskip
       
   351 
       
   352 \noindent
       
   353 \textbf{Hints:} a useful list function: \texttt{.sortBy} sorts a list
       
   354 according to a component given by the function; a function can be
       
   355 tested to be tail recursive by annotation \texttt{@tailrec}, which is
       
   356 made available by importing \texttt{scala.annotation.tailrec}.
       
   357 
   155 
   358 
   156 
   359 
   157 
   360 \end{document}
   158 \end{document}
   361 
   159