1 \documentclass{article} |
1 \documentclass{article} |
2 \usepackage{chessboard} |
|
3 \usepackage[LSBC4,T1]{fontenc} |
|
4 \let\clipbox\relax |
|
5 \usepackage{../style} |
2 \usepackage{../style} |
6 \usepackage{disclaimer} |
3 \usepackage{disclaimer} |
|
4 \usepackage{../langs} |
7 |
5 |
8 \begin{document} |
6 \begin{document} |
9 |
7 |
10 \setchessboard{smallboard, |
|
11 zero, |
|
12 showmover=false, |
|
13 boardfontencoding=LSBC4, |
|
14 hlabelformat=\arabic{ranklabel}, |
|
15 vlabelformat=\arabic{filelabel}} |
|
16 |
8 |
17 \mbox{}\\[-18mm]\mbox{} |
9 \section*{Coursework 7 (DocDiff and Danube.org)} |
18 |
10 |
19 \section*{Coursework 7 (Scala, Knight's Tour)} |
11 This coursework is worth 10\%. The first part and second part are due |
20 |
12 on 22 November at 11pm; the third, more advanced part, is due on 21 |
21 This coursework is worth 10\%. It is about searching and |
13 December at 11pm. You are asked to implement Scala programs for |
22 backtracking. The first part is due on 23 November at 11pm; the |
14 measuring similarity in texts and for recommending movies |
23 second, more advanced part, is due on 21 December at 11pm. You are |
15 according to a ratings list. Note the second part might include |
24 asked to implement Scala programs that solve various versions of the |
16 material you have not yet seen in the first two lectures. \bigskip |
25 \textit{Knight's Tour Problem} on a chessboard. Note the second part |
|
26 might include material you have not yet seen in the first two |
|
27 lectures. \bigskip |
|
28 |
17 |
29 \IMPORTANT{} |
18 \IMPORTANT{} |
|
19 |
|
20 \noindent |
30 Also note that the running time of each part will be restricted to a |
21 Also note that the running time of each part will be restricted to a |
31 maximum of 360 seconds on my laptop: If you calculate a result once, |
22 maximum of 30 seconds on my laptop. |
32 try to avoid to calculate the result again. Feel free to copy any code |
|
33 you need from files \texttt{knight1.scala}, \texttt{knight2.scala} and |
|
34 \texttt{knight3.scala}. |
|
35 |
23 |
36 \DISCLAIMER{} |
24 \DISCLAIMER{} |
37 |
25 |
38 \subsection*{Background} |
|
39 |
26 |
40 The \textit{Knight's Tour Problem} is about finding a tour such that |
27 \subsection*{Reference Implementation} |
41 the knight visits every field on an $n\times n$ chessboard once. For |
|
42 example on a $5\times 5$ chessboard, a knight's tour is: |
|
43 |
28 |
44 \chessboard[maxfield=d4, |
29 Like the C++ assignments, the Scala assignments will work like this: you |
45 pgfstyle= {[base,at={\pgfpoint{0pt}{-0.5ex}}]text}, |
30 push your files to GitHub and receive (after sometimes a long delay) some |
46 text = \small 24, markfield=Z4, |
31 automated feedback. In the end we take a snapshot of the submitted files and |
47 text = \small 11, markfield=a4, |
32 apply an automated marking script to them. |
48 text = \small 6, markfield=b4, |
|
49 text = \small 17, markfield=c4, |
|
50 text = \small 0, markfield=d4, |
|
51 text = \small 19, markfield=Z3, |
|
52 text = \small 16, markfield=a3, |
|
53 text = \small 23, markfield=b3, |
|
54 text = \small 12, markfield=c3, |
|
55 text = \small 7, markfield=d3, |
|
56 text = \small 10, markfield=Z2, |
|
57 text = \small 5, markfield=a2, |
|
58 text = \small 18, markfield=b2, |
|
59 text = \small 1, markfield=c2, |
|
60 text = \small 22, markfield=d2, |
|
61 text = \small 15, markfield=Z1, |
|
62 text = \small 20, markfield=a1, |
|
63 text = \small 3, markfield=b1, |
|
64 text = \small 8, markfield=c1, |
|
65 text = \small 13, markfield=d1, |
|
66 text = \small 4, markfield=Z0, |
|
67 text = \small 9, markfield=a0, |
|
68 text = \small 14, markfield=b0, |
|
69 text = \small 21, markfield=c0, |
|
70 text = \small 2, markfield=d0 |
|
71 ] |
|
72 |
|
73 \noindent |
|
74 The tour starts in the right-upper corner, then moves to field |
|
75 $(3,2)$, then $(4,0)$ and so on. There are no knight's tours on |
|
76 $2\times 2$, $3\times 3$ and $4\times 4$ chessboards, but for every |
|
77 bigger board there is. |
|
78 |
33 |
79 A knight's tour is called \emph{closed}, if the last step in the tour |
34 In addition, the Scala assignments come with a reference |
80 is within a knight's move to the beginning of the tour. So the above |
35 implementation in form of a \texttt{jar}-file. This allows you to run |
81 knight's tour is \underline{not} closed because the last |
36 any test cases on your own computer. For example you can call Scala on |
82 step on field $(0, 4)$ is not within the reach of the first step on |
37 the command line with the option \texttt{-cp docdiff.jar} and then |
83 $(4, 4)$. It turns out there is no closed knight's tour on a $5\times |
38 query any function from the template file. Say you want to find out |
84 5$ board. But there are on a $6\times 6$ board and on bigger ones, for |
39 what the function \texttt{occurences} produces: for this you just need |
85 example |
40 to prefix it with the object name \texttt{CW7a} (and \texttt{CW7b} |
|
41 respectively for \texttt{danube.jar}). If you want to find out what |
|
42 these functions produce for the list \texttt{List("a", "b", "b")}, |
|
43 you would type something like: |
86 |
44 |
87 \chessboard[maxfield=e5, |
45 \begin{lstlisting}[language={},numbers=none,basicstyle=\ttfamily\small] |
88 pgfstyle={[base,at={\pgfpoint{0pt}{-0.5ex}}]text}, |
46 $ scala -cp docdiff.jar |
89 text = \small 10, markfield=Z5, |
47 |
90 text = \small 5, markfield=a5, |
48 scala> CW7a.occurences(List("a", "b", "b")) |
91 text = \small 18, markfield=b5, |
49 ... |
92 text = \small 25, markfield=c5, |
50 \end{lstlisting}%$ |
93 text = \small 16, markfield=d5, |
51 |
94 text = \small 7, markfield=e5, |
52 \subsection*{Hints} |
95 text = \small 31, markfield=Z4, |
|
96 text = \small 26, markfield=a4, |
|
97 text = \small 9, markfield=b4, |
|
98 text = \small 6, markfield=c4, |
|
99 text = \small 19, markfield=d4, |
|
100 text = \small 24, markfield=e4, |
|
101 % 4 11 30 17 8 15 |
|
102 text = \small 4, markfield=Z3, |
|
103 text = \small 11, markfield=a3, |
|
104 text = \small 30, markfield=b3, |
|
105 text = \small 17, markfield=c3, |
|
106 text = \small 8, markfield=d3, |
|
107 text = \small 15, markfield=e3, |
|
108 %29 32 27 0 23 20 |
|
109 text = \small 29, markfield=Z2, |
|
110 text = \small 32, markfield=a2, |
|
111 text = \small 27, markfield=b2, |
|
112 text = \small 0, markfield=c2, |
|
113 text = \small 23, markfield=d2, |
|
114 text = \small 20, markfield=e2, |
|
115 %12 3 34 21 14 1 |
|
116 text = \small 12, markfield=Z1, |
|
117 text = \small 3, markfield=a1, |
|
118 text = \small 34, markfield=b1, |
|
119 text = \small 21, markfield=c1, |
|
120 text = \small 14, markfield=d1, |
|
121 text = \small 1, markfield=e1, |
|
122 %33 28 13 2 35 22 |
|
123 text = \small 33, markfield=Z0, |
|
124 text = \small 28, markfield=a0, |
|
125 text = \small 13, markfield=b0, |
|
126 text = \small 2, markfield=c0, |
|
127 text = \small 35, markfield=d0, |
|
128 text = \small 22, markfield=e0, |
|
129 vlabel=false, |
|
130 hlabel=false |
|
131 ] |
|
132 |
53 |
133 |
54 |
134 \noindent |
|
135 where the 35th move can join up again with the 0th move. |
|
136 |
|
137 If you cannot remember how a knight moves in chess, or never played |
|
138 chess, below are all potential moves indicated for two knights, one on |
|
139 field $(2, 2)$ (blue moves) and another on $(7, 7)$ (red moves): |
|
140 |
55 |
141 |
56 |
142 \chessboard[maxfield=g7, |
57 \newpage |
143 color=blue!50, |
58 \subsection*{Part 1 (4 Marks, file docdiff.scala)} |
144 linewidth=0.2em, |
|
145 shortenstart=0.5ex, |
|
146 shortenend=0.5ex, |
|
147 markstyle=cross, |
|
148 markfields={a4, c4, Z3, d3, Z1, d1, a0, c0}, |
|
149 color=red!50, |
|
150 markfields={f5, e6}, |
|
151 setpieces={Ng7, Nb2}] |
|
152 |
59 |
153 \subsection*{Part 1 (7 Marks)} |
60 It seems source code plagiarism---stealing someone else's code---is a |
154 |
61 serious problem at other universities.\footnote{Surely, King's |
155 You are asked to implement the knight's tour problem such that the |
62 students, after all their instructions and warnings, would never |
156 dimension of the board can be changed. Therefore most functions will |
63 commit such an offence.} Dedecting such plagiarism is time-consuming |
157 take the dimension of the board as an argument. The fun with this |
64 and disheartening. To aid the poor lecturers at other universities, |
158 problem is that even for small chessboard dimensions it has already an |
65 let's implement a program that determines the similarity between two |
159 incredibly large search space---finding a tour is like finding a |
66 documents (be they code or English texts). A document will be |
160 needle in a haystack. In the first task we want to see how far we get |
67 represented as a list of strings. |
161 with exhaustively exploring the complete search space for small |
|
162 chessboards.\medskip |
|
163 |
|
164 \noindent |
|
165 Let us first fix the basic datastructures for the implementation. The |
|
166 board dimension is an integer (we will never go beyond board sizes of |
|
167 $40 \times 40$). A \emph{position} (or field) on the chessboard is |
|
168 a pair of integers, like $(0, 0)$. A \emph{path} is a list of |
|
169 positions. The first (or 0th move) in a path is the last element in |
|
170 this list; and the last move in the path is the first element. For |
|
171 example the path for the $5\times 5$ chessboard above is represented |
|
172 by |
|
173 |
|
174 \[ |
|
175 \texttt{List($\underbrace{\texttt{(0, 4)}}_{24}$, |
|
176 $\underbrace{\texttt{(2, 3)}}_{23}$, ..., |
|
177 $\underbrace{\texttt{(3, 2)}}_1$, $\underbrace{\texttt{(4, 4)}}_0$)} |
|
178 \] |
|
179 |
|
180 \noindent |
|
181 Suppose the dimension of a chessboard is $n$, then a path is a |
|
182 \emph{tour} if the length of the path is $n \times n$, each element |
|
183 occurs only once in the path, and each move follows the rules of how a |
|
184 knight moves (see above for the rules). |
|
185 |
68 |
186 |
69 |
187 \subsubsection*{Tasks (file knight1.scala)} |
70 \subsection*{Tasks} |
188 |
71 |
189 \begin{itemize} |
72 \begin{itemize} |
190 \item[(1a)] Implement an \texttt{is\_legal\_move} function that takes a |
73 \item[(1)] Implement a function that cleans a string by finding all |
191 dimension, a path and a position as arguments and tests whether the |
74 words in this string. For this use the regular expression |
192 position is inside the board and not yet element in the |
75 \texttt{"$\backslash$w+"} and the library function |
193 path. \hfill[1 Mark] |
76 \texttt{findAllIn}. The function should return a list of |
|
77 strings.\\ |
|
78 \mbox{}\hfill [1 Mark] |
194 |
79 |
195 \item[(1b)] Implement a \texttt{legal\_moves} function that calculates for a |
80 \item[(2)] In order to compute the similarity between two documents, we |
196 position all legal onward moves. If the onward moves are |
81 associate each document with a \texttt{Map}. This Map represents the |
197 placed on a circle, you should produce them starting from |
82 strings in a document and how many times these strings occur in a |
198 ``12-o'clock'' following in clockwise order. For example on an |
83 document. A simple (though slightly inefficient) method for counting |
199 $8\times 8$ board for a knight at position $(2, 2)$ and otherwise |
84 the number of string-occurences in a document is as follows: remove |
200 empty board, the legal-moves function should produce the onward |
85 all duplicates from the document; for each of these (unique) |
201 positions in this order: |
86 strings, count how many times they occur in the original document. |
|
87 Return a Map from strings to occurences. For example |
202 |
88 |
203 \begin{center} |
89 \begin{center} |
204 \texttt{List((3,4), (4,3), (4,1), (3,0), (1,0), (0,1), (0,3), (1,4))} |
90 \pcode{occurences(List("a", "b", "b", "c", "d"))} |
205 \end{center} |
91 \end{center} |
206 |
92 |
207 If the board is not empty, then maybe some of the moves need to be |
93 produces \pcode{Map(a -> 1, b -> 2, c -> 1, d -> 1)} and |
208 filtered out from this list. For a knight on field $(7, 7)$ and an |
|
209 empty board, the legal moves are |
|
210 |
94 |
211 \begin{center} |
95 \begin{center} |
212 \texttt{List((6,5), (5,6))} |
96 \pcode{occurences(List("d", "b", "d", "b", "d"))} |
213 \end{center} |
97 \end{center} |
214 \mbox{}\hfill[1 Mark] |
|
215 |
98 |
216 \item[(1c)] Implement two recursive functions (\texttt{count\_tours} and |
99 produces \pcode{Map(d -> 3, b -> 2)}.\hfill[1 Mark] |
217 \texttt{enum\_tours}). They each take a dimension and a path as |
100 |
218 arguments. They exhaustively search for tours starting |
101 \item[(3)] You can think of the Maps calculated under (2) as efficient |
219 from the given path. The first function counts all possible |
102 representations of sparse ``vectors''. In this subtask you need to |
220 tours (there can be none for certain board sizes) and the second |
103 implement the \emph{product} of two vectors, sometimes also called |
221 collects all tours in a list of paths.\hfill[2 Marks] |
104 \emph{dot product}.\footnote{\url{https://en.wikipedia.org/wiki/Dot_product}} |
|
105 |
|
106 For this implement a function that takes two documents |
|
107 (\texttt{List[String]}) as arguments. The function first calculates |
|
108 the (unique) strings in both. For each string, it multiplies the |
|
109 occurences in each document. If a string does not occur in one of the |
|
110 documents, then the product is zero. At the end you |
|
111 sum all products. For the two documents in (2) the dot product is 7: |
|
112 |
|
113 \[ |
|
114 \underbrace{1 * 0}_{"a"} \;\;+\;\; |
|
115 \underbrace{2 * 2}_{"b"} \;\;+\;\; |
|
116 \underbrace{1 * 0}_{"c"} \;\;+\;\; |
|
117 \underbrace{1 * 3}_{"d"} |
|
118 \] |
|
119 |
|
120 \hfill\mbox{[1 Mark]} |
|
121 |
|
122 \item[(4)] Implement first a function that calculates the overlap |
|
123 between two documents, say $d_1$ and $d_2$, according to the formula |
|
124 |
|
125 \[ |
|
126 \texttt{overlap}(d_1, d_2) = \frac{d_1 \cdot d_2}{max(d_1^2, d_2^2)} |
|
127 \] |
|
128 |
|
129 This function should return a \texttt{Double} between 0 and 1. The |
|
130 overlap between the lists in (2) is $0.5384615384615384$. |
|
131 |
|
132 Second implement a function that calculates the similarity of |
|
133 two strings, by first extracting the strings using the function from (1) |
|
134 and then calculating the overlap. |
|
135 \hfill\mbox{[1 Mark]} |
222 \end{itemize} |
136 \end{itemize} |
223 |
137 |
224 \noindent \textbf{Test data:} For the marking, the functions in (1c) |
|
225 will be called with board sizes up to $5 \times 5$. If you search |
|
226 for tours on a $5 \times 5$ board starting only from field $(0, 0)$, |
|
227 there are 304 of tours. If you try out every field of a $5 \times |
|
228 5$-board as a starting field and add up all tours, you obtain |
|
229 1728. A $6\times 6$ board is already too large to be searched |
|
230 exhaustively.\footnote{For your interest, the number of tours on |
|
231 $6\times 6$, $7\times 7$ and $8\times 8$ are 6637920, 165575218320, |
|
232 19591828170979904, respectively.}\bigskip |
|
233 |
|
234 \noindent |
|
235 \textbf{Hints:} useful list functions: \texttt{.contains(..)} checks |
|
236 whether an element is in a list, \texttt{.flatten} turns a list of |
|
237 lists into just a list, \texttt{\_::\_} puts an element on the head of |
|
238 the list, \texttt{.head} gives you the first element of a list (make |
|
239 sure the list is not \texttt{Nil}). |
|
240 |
|
241 \subsubsection*{Tasks (file knight2.scala)} |
|
242 |
|
243 \begin{itemize} |
|
244 \item[(2a)] Implement a \texttt{first}-function. This function takes a list of |
|
245 positions and a function $f$ as arguments; $f$ is the name we give to |
|
246 this argument). The function $f$ takes a position as argument and |
|
247 produces an optional path. So $f$'s type is \texttt{Pos => |
|
248 Option[Path]}. The idea behind the \texttt{first}-function is as follows: |
|
249 |
|
250 \[ |
|
251 \begin{array}{lcl} |
|
252 \textit{first}(\texttt{Nil}, f) & \dn & \texttt{None}\\ |
|
253 \textit{first}(x\!::\!xs, f) & \dn & \begin{cases} |
|
254 f(x) & \textit{if}\;f(x) \not=\texttt{None}\\ |
|
255 \textit{first}(xs, f) & \textit{otherwise}\\ |
|
256 \end{cases} |
|
257 \end{array} |
|
258 \] |
|
259 |
|
260 \noindent That is, we want to find the first position where the |
|
261 result of $f$ is not \texttt{None}, if there is one. Note that |
|
262 `inside' \texttt{first}, you do not (need to) know anything about |
|
263 the argument $f$ except its type, namely \texttt{Pos => |
|
264 Option[Path]}. There is one additional point however you should |
|
265 take into account when implementing \texttt{first}: you will need to |
|
266 calculate what the result of $f(x)$ is; your code should do this |
|
267 only \textbf{once} and for as \textbf{few} elements in the list as |
|
268 possible! Do not calculate $f(x)$ for all elements and then see which |
|
269 is the first \texttt{Some}.\\\mbox{}\hfill[1 Mark] |
|
270 |
|
271 \item[(2b)] Implement a \texttt{first\_tour} function that uses the |
|
272 \texttt{first}-function from (2a), and searches recursively for a tour. |
|
273 As there might not be such a tour at all, the \texttt{first\_tour} function |
|
274 needs to return a value of type |
|
275 \texttt{Option[Path]}.\\\mbox{}\hfill[2 Marks] |
|
276 \end{itemize} |
|
277 |
|
278 \noindent |
|
279 \textbf{Testing:} The \texttt{first\_tour} function will be called with board |
|
280 sizes of up to $8 \times 8$. |
|
281 \bigskip |
|
282 |
|
283 \noindent |
|
284 \textbf{Hints:} a useful list function: \texttt{.filter(..)} filters a |
|
285 list according to a boolean function; a useful option function: |
|
286 \texttt{.isDefined} returns true, if an option is \texttt{Some(..)}; |
|
287 anonymous functions can be constructed using \texttt{(x:Int) => ...}, |
|
288 this functions takes an \texttt{Int} as an argument. |
|
289 |
138 |
290 |
139 |
291 %%\newpage |
140 \newpage |
292 \subsection*{Part 2 (3 Marks)} |
141 You are creating Danube.org, which you hope will be the next big thing |
|
142 in online movie provider. You know that you can save money by |
|
143 anticipating what movies people will rent; you will pass these savings |
|
144 on to your users by offering a discount if they rent movies that Danube.org |
|
145 recommends. This assignment is meant to calculate |
293 |
146 |
294 As you should have seen in Part 1, a naive search for tours beyond |
|
295 $8 \times 8$ boards and also searching for closed tours even on small |
|
296 boards takes too much time. There is a heuristic, called \emph{Warnsdorf's |
|
297 Rule} that can speed up finding a tour. This heuristic states that a |
|
298 knight is moved so that it always proceeds to the field from which the |
|
299 knight will have the \underline{fewest} onward moves. For example for |
|
300 a knight on field $(1, 3)$, the field $(0, 1)$ has the fewest possible |
|
301 onward moves, namely 2. |
|
302 |
147 |
303 \chessboard[maxfield=g7, |
148 To do this, you offer an incentive for people to upload their lists of |
304 pgfstyle= {[base,at={\pgfpoint{0pt}{-0.5ex}}]text}, |
149 recommended books. From their lists, you can establish suggested |
305 text = \small 3, markfield=Z5, |
150 pairs. A pair of books is a suggested pair if both books appear on one |
306 text = \small 7, markfield=b5, |
151 person’s recommendation list. Of course, some suggested pairs are more |
307 text = \small 7, markfield=c4, |
152 popular than others. Also, any given book is paired with some books |
308 text = \small 7, markfield=c2, |
153 much more frequently than with others. |
309 text = \small 5, markfield=b1, |
|
310 text = \small 2, markfield=Z1, |
|
311 setpieces={Na3}] |
|
312 |
154 |
313 \noindent |
|
314 Warnsdorf's Rule states that the moves on the board above should be |
|
315 tried in the order |
|
316 |
|
317 \[ |
|
318 (0, 1), (0, 5), (2, 1), (2, 5), (3, 4), (3, 2) |
|
319 \] |
|
320 |
|
321 \noindent |
|
322 Whenever there are ties, the corresponding onward moves can be in any |
|
323 order. When calculating the number of onward moves for each field, we |
|
324 do not count moves that revisit any field already visited. |
|
325 |
|
326 \subsubsection*{Tasks (file knight3.scala)} |
|
327 |
|
328 \begin{itemize} |
|
329 \item[(3a)] Write a function \texttt{ordered\_moves} that calculates a list of |
|
330 onward moves like in (1b) but orders them according to the |
|
331 Warnsdorf’s Rule. That means moves with the fewest legal onward moves |
|
332 should come first (in order to be tried out first). \hfill[1 Mark] |
|
333 |
|
334 \item[(3b)] Implement a \texttt{first\_closed-tour\_heuristic} |
|
335 function that searches for a |
|
336 \textbf{closed} tour on a $6\times 6$ board. It should use the |
|
337 \texttt{first}-function from (2a) and tries out onward moves according to |
|
338 the \texttt{ordered\_moves} function from (3a). It is more likely to find |
|
339 a solution when started in the middle of the board (that is |
|
340 position $(dimension / 2, dimension / 2)$). \hfill[1 Mark] |
|
341 |
|
342 \item[(3c)] Implement a \texttt{first\_tour\_heuristic} function |
|
343 for boards up to |
|
344 $40\times 40$. It is the same function as in (3b) but searches for |
|
345 tours (not just closed tours). You have to be careful to write a |
|
346 tail-recursive function of the \texttt{first\_tour\_heuristic} function |
|
347 otherwise you will get problems with stack-overflows.\\ |
|
348 \mbox{}\hfill[1 Mark] |
|
349 \end{itemize} |
|
350 \bigskip |
|
351 |
|
352 \noindent |
|
353 \textbf{Hints:} a useful list function: \texttt{.sortBy} sorts a list |
|
354 according to a component given by the function; a function can be |
|
355 tested to be tail recursive by annotation \texttt{@tailrec}, which is |
|
356 made available by importing \texttt{scala.annotation.tailrec}. |
|
357 |
155 |
358 |
156 |
359 |
157 |
360 \end{document} |
158 \end{document} |
361 |
159 |