| author | Christian Urban <christian.urban@kcl.ac.uk> | 
| Tue, 24 Nov 2020 09:04:06 +0000 | |
| changeset 366 | d2f895c1dba6 | 
| parent 356 | 35c75b594459 | 
| permissions | -rw-r--r-- | 
| 276 | 1 | % !TEX program = xelatex | 
| 6 | 2 | \documentclass{article}
 | 
| 11 
417869f65585
updated
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
9diff
changeset | 3 | \usepackage{../style}
 | 
| 195 | 4 | \usepackage{disclaimer}
 | 
| 199 | 5 | \usepackage{../langs}
 | 
| 6 | 6 | |
| 335 | 7 | |
| 8 | ||
| 6 | 9 | \begin{document}
 | 
| 10 | ||
| 356 | 11 | \section*{Preliminary Part 1 (Scala, 3 Marks)}
 | 
| 199 | 12 | |
| 13 | \mbox{}\hfill\textit{``The most effective debugging tool is still careful thought,}\\
 | |
| 14 | \mbox{}\hfill\textit{coupled with judiciously placed print statements.''}\smallskip\\
 | |
| 335 | 15 | \mbox{}\hfill\textit{ --- Brian W. Kernighan, in Unix for Beginners (1979)}\bigskip
 | 
| 199 | 16 | |
| 279 | 17 | |
| 355 | 18 | \IMPORTANT{This part is about Scala. It is due on \cwSIX{} at 5pm and worth 3\%.
 | 
| 19 | Any 1\% you achieve in the preliminary part counts as your ``weekly engagement''.} | |
| 195 | 20 | |
| 127 | 21 | \noindent | 
| 195 | 22 | Also note that the running time of each part will be restricted to a | 
| 199 | 23 | maximum of 30 seconds on my laptop. | 
| 192 | 24 | |
| 196 | 25 | \DISCLAIMER{}
 | 
| 6 | 26 | |
| 201 | 27 | \subsection*{Reference Implementation}
 | 
| 6 | 28 | |
| 199 | 29 | Like the C++ assignments, the Scala assignments will work like this: you | 
| 30 | push your files to GitHub and receive (after sometimes a long delay) some | |
| 31 | automated feedback. In the end we take a snapshot of the submitted files and | |
| 266 | 32 | apply an automated marking script to them.\medskip | 
| 199 | 33 | |
| 266 | 34 | \noindent | 
| 306 | 35 | In addition, the Scala coursework comes with a reference implementation | 
| 335 | 36 | in form of \texttt{jar}-files. This allows you to run any test cases on
 | 
| 37 | your own computer. For example you can call Scala on the command line | |
| 38 | with the option \texttt{-cp collatz.jar} and then query any function
 | |
| 39 | from the template file. Say you want to find out what the functions | |
| 40 | \texttt{collatz} and \texttt{collatz\_max} produce: for this you just
 | |
| 41 | need to prefix them with the object name \texttt{CW6a}. If you want to
 | |
| 42 | find out what these functions produce for the argument \texttt{6}, you
 | |
| 43 | would type something like: | |
| 199 | 44 | |
| 45 | \begin{lstlisting}[language={},numbers=none,basicstyle=\ttfamily\small]
 | |
| 46 | $ scala -cp collatz.jar | |
| 47 | ||
| 48 | scala> CW6a.collatz(6) | |
| 49 | ... | |
| 50 | scala> CW6a.collatz_max(6) | |
| 51 | ... | |
| 52 | \end{lstlisting}%$
 | |
| 53 | ||
| 201 | 54 | \subsection*{Hints}
 | 
| 55 | ||
| 56 | \noindent | |
| 345 | 57 | \textbf{For the Preliminary Part:} useful math operators: \texttt{\%} for modulo, \texttt{\&} for bit-wise and; useful
 | 
| 201 | 58 | functions: \mbox{\texttt{(1\,to\,10)}} for ranges, \texttt{.toInt},
 | 
| 335 | 59 | \texttt{.toList} for conversions, you can use \texttt{List(...).max} for the
 | 
| 201 | 60 | maximum of a list, \texttt{List(...).indexOf(...)} for the first index of
 | 
| 61 | a value in a list.\bigskip | |
| 62 | ||
| 63 | ||
| 64 | ||
| 65 | \newpage | |
| 279 | 66 | \subsection*{Preliminary Part (3 Marks, file collatz.scala)}
 | 
| 6 | 67 | |
| 336 | 68 | This part is about function definitions and recursion. You are asked | 
| 69 | to implement a Scala program that tests examples of the | |
| 70 | \emph{$3n + 1$-conjecture}, also called \emph{Collatz
 | |
| 71 |   conjecture}.\video{https://www.youtube.com./watch?v=LqKpkdRRLZw}
 | |
| 72 | This conjecture can be described as follows: Start with any positive | |
| 73 | number $n$ greater than $0$: | |
| 18 | 74 | |
| 75 | \begin{itemize}
 | |
| 76 | \item If $n$ is even, divide it by $2$ to obtain $n / 2$. | |
| 77 | \item If $n$ is odd, multiply it by $3$ and add $1$ to obtain $3n + | |
| 78 | 1$. | |
| 79 | \item Repeat this process and you will always end up with $1$. | |
| 80 | \end{itemize}
 | |
| 81 | ||
| 82 | \noindent | |
| 266 | 83 | For example if you start with, say, $6$ and $9$, you obtain the | 
| 335 | 84 | two \emph{Collatz series}
 | 
| 85 | % | |
| 11 
417869f65585
updated
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
9diff
changeset | 86 | \[ | 
| 
417869f65585
updated
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
9diff
changeset | 87 | \begin{array}{@{}l@{\hspace{5mm}}l@{}}
 | 
| 199 | 88 | 6, 3, 10, 5, 16, 8, 4, 2, 1 & \text{(= 8 steps)}\\
 | 
| 89 | 9, 28, 14, 7, 22, 11, 34, 17, 52, 26, 13, 40, 20, 10, 5, 16, 8, 4, 2, 1  & \text{(= 19 steps)}\\
 | |
| 11 
417869f65585
updated
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
9diff
changeset | 90 | \end{array}
 | 
| 
417869f65585
updated
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
9diff
changeset | 91 | \] | 
| 
417869f65585
updated
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
9diff
changeset | 92 | |
| 
417869f65585
updated
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
9diff
changeset | 93 | \noindent | 
| 
417869f65585
updated
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
9diff
changeset | 94 | As you can see, the numbers go up and down like a roller-coaster, but | 
| 335 | 95 | curiously they seem to always terminate in $1$. Nobody knows why. The | 
| 96 | conjecture is that this will \emph{always} happen for every number
 | |
| 97 | greater than 0.\footnote{While it is relatively easy to test this
 | |
| 98 | conjecture with particular numbers, it is an interesting open problem to | |
| 99 | \emph{prove} that the conjecture is true for \emph{all} numbers ($> 0$).
 | |
| 100 | Paul Erd\"o{}s, a famous mathematician you might have heard about, said
 | |
| 101 | about this conjecture: ``Mathematics may not [yet] be ready for such | |
| 102 | problems.'' and also offered a \$500 cash prize for its solution. | |
| 103 | Jeffrey Lagarias, another mathematician, claimed that based only on | |
| 104 | known information about this problem, ``this is an extraordinarily | |
| 105 | difficult problem, completely out of reach of present day mathematics.'' | |
| 106 | There is also a \href{https://xkcd.com/710/}{xkcd} cartoon about this
 | |
| 107 | conjecture\here{https://xkcd.com/710/}). If you are able to solve this
 | |
| 108 | conjecture, you will definitely get famous.}\bigskip | |
| 18 | 109 | |
| 110 | \noindent | |
| 199 | 111 | \textbf{Tasks}
 | 
| 18 | 112 | |
| 113 | \begin{itemize}
 | |
| 114 | \item[(1)] You are asked to implement a recursive function that | |
| 115 | calculates the number of steps needed until a series ends | |
| 199 | 116 | with $1$. In case of starting with $6$, it takes $8$ steps and in | 
| 335 | 117 | case of starting with $9$, it takes $19$ (see above). We assume it | 
| 118 | takes $0$ steps, if we start with $1$. In order to | |
| 18 | 119 | try out this function with large numbers, you should use | 
| 120 |   \texttt{Long} as argument type, instead of \texttt{Int}.  You can
 | |
| 121 | assume this function will be called with numbers between $1$ and | |
| 335 | 122 | $1$ Million. \hfill[1 Mark] | 
| 18 | 123 | |
| 124 | \item[(2)] Write a second function that takes an upper bound as | |
| 282 | 125 | an argument and calculates the steps for all numbers in the range from | 
| 210 | 126 | 1 up to this bound (the bound including). It returns the maximum number of | 
| 127 | steps and the corresponding number that needs that many steps. More | |
| 128 | precisely it returns a pair where the first component is the number | |
| 129 |   of steps and the second is the corresponding number. \hfill\mbox{[1
 | |
| 130 | Mark]} | |
| 335 | 131 | |
| 132 | \item[(3)] Write a function that calculates \emph{hard
 | |
| 133 |     numbers} \here{https://medium.com/cantors-paradise/the-collatz-conjecture-some-shocking-results-from-180-000-iterations-7fea130d0377}
 | |
| 134 | in the Collatz series---these are the last odd numbers just before a | |
| 135 | power of two is reached. For this, implement an | |
| 136 |   \textit{is-power-of-two} function which tests whether a number is a
 | |
| 137 | power of two. The easiest way to implement this is by using the | |
| 336 | 138 | bit-operator $\&$ of Scala. For a power of two, say $n$ with $n > 0$, it | 
| 335 | 139 | holds that $n \;\&\; (n - 1)$ is equal to zero. I let you think why | 
| 344 | 140 | this is the case. | 
| 141 | ||
| 142 |   The function \textit{is-hard} calculates whether
 | |
| 335 | 143 |   $3n + 1$ is a power of two.  Finally the \textit{last-odd} function
 | 
| 144 | calculates the last odd number before a power of 2 in the Collatz | |
| 344 | 145 | series. This means for example when starting with 9, | 
| 335 | 146 | we receive 5 as the last odd number. Surprisingly a lot of numbers | 
| 147 | have 5 as last-odd number. But for example for 113 we obtain 85, | |
| 148 | because of the series | |
| 149 | % | |
| 150 |   \[113, 340, 170, \,\fbox{85}\,, 256, 128, 64, 32, 16, 8, 4, 2, 1\]
 | |
| 151 | ||
| 152 |   The \textit{last-odd} function will only be called with numbers that are not
 | |
| 153 | powers of 2 themselves. | |
| 18 | 154 | \end{itemize}
 | 
| 11 
417869f65585
updated
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
9diff
changeset | 155 | |
| 
417869f65585
updated
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
9diff
changeset | 156 | \noindent | 
| 336 | 157 | \textbf{Test Data:} Some test ranges and cases are:
 | 
| 11 
417869f65585
updated
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
9diff
changeset | 158 | |
| 
417869f65585
updated
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
9diff
changeset | 159 | \begin{itemize}
 | 
| 199 | 160 | \item 1 to 10 where $9$ takes 19 steps | 
| 161 | \item 1 to 100 where $97$ takes 118 steps, | |
| 162 | \item 1 to 1,000 where $871$ takes 178 steps, | |
| 163 | \item 1 to 10,000 where $6,171$ takes 261 steps, | |
| 164 | \item 1 to 100,000 where $77,031$ takes 350 steps, | |
| 165 | \item 1 to 1 Million where $837,799$ takes 524 steps | |
| 166 | %% runs out of stack space | |
| 167 | %% \item[$\bullet$] $1 - 10$ million where $8,400,511$ takes 685 steps | |
| 336 | 168 | \item 21 is the last odd number for 84 | 
| 169 | \item 341 is the last odd number for 201, 604, 605 and 8600 | |
| 170 | ||
| 196 | 171 | \end{itemize}
 | 
| 18 | 172 | |
| 201 | 173 | |
| 127 | 174 | |
| 175 | ||
| 196 | 176 | |
| 199 | 177 | |
| 178 | \end{document}
 | |
| 179 | ||
| 276 | 180 | |
| 181 | %%%%%%% Historical Stuff | |
| 199 | 182 | \newpage | 
| 192 | 183 | |
| 196 | 184 | This part is about web-scraping and list-processing in Scala. It uses | 
| 185 | online data about the per-capita alcohol consumption for each country | |
| 186 | (per year?), and a file containing the data about the population size of | |
| 187 | each country. From this data you are supposed to estimate how many | |
| 188 | litres of pure alcohol are consumed worldwide.\bigskip | |
| 192 | 189 | |
| 190 | \noindent | |
| 196 | 191 | \textbf{Tasks (file alcohol.scala):}
 | 
| 192 | 192 | |
| 193 | \begin{itemize}
 | |
| 196 | 194 | \item[(1)] Write a function that given an URL requests a | 
| 195 | comma-separated value (CSV) list. We are interested in the list | |
| 196 | from the following URL | |
| 192 | 197 | |
| 198 | \begin{center}
 | |
| 196 | 199 |   \url{https://raw.githubusercontent.com/fivethirtyeight/data/master/alcohol-consumption/drinks.csv}
 | 
| 192 | 200 | \end{center}
 | 
| 127 | 201 | |
| 196 | 202 | \noindent Your function should take a string (the URL) as input, and | 
| 203 | produce a list of strings as output, where each string is one line in | |
| 204 | the corresponding CSV-list. This list from the URL above should | |
| 205 | contain 194 lines.\medskip | |
| 192 | 206 | |
| 207 | \noindent | |
| 196 | 208 | Write another function that can read the file \texttt{population.csv}
 | 
| 201 | 209 | from disk (the file is distributed with the assignment). This | 
| 196 | 210 | function should take a string as argument, the file name, and again | 
| 211 | return a list of strings corresponding to each entry in the | |
| 212 | CSV-list. For \texttt{population.csv}, this list should contain 216
 | |
| 213 | lines.\hfill[1 Mark] | |
| 214 | ||
| 215 | ||
| 216 | \item[(2)] Unfortunately, the CSV-lists contain a lot of ``junk'' and we | |
| 217 | need to extract the data that interests us. From the header of the | |
| 218 | alcohol list, you can see there are 5 columns | |
| 219 | ||
| 220 |   \begin{center}
 | |
| 221 |     \begin{tabular}{l}
 | |
| 222 |       \texttt{country (name),}\\
 | |
| 223 |       \texttt{beer\_servings,}\\
 | |
| 224 |       \texttt{spirit\_servings,}\\
 | |
| 225 |       \texttt{wine\_servings,}\\
 | |
| 226 |       \texttt{total\_litres\_of\_pure\_alcohol}
 | |
| 227 |     \end{tabular}  
 | |
| 228 |   \end{center}
 | |
| 229 | ||
| 230 | \noindent | |
| 231 | Write a function that extracts the data from the first column, | |
| 232 | the country name, and the data from the fifth column (converted into | |
| 233 |   a \texttt{Double}). For this go through each line of the CSV-list
 | |
| 234 |   (except the first line), use the \texttt{split(",")} function to
 | |
| 235 | divide each line into an array of 5 elements. Keep the data from the | |
| 236 | first and fifth element in these arrays.\medskip | |
| 192 | 237 | |
| 196 | 238 | \noindent | 
| 239 | Write another function that processes the population size list. This | |
| 240 |   is already of the form country name and population size.\footnote{Your
 | |
| 241 | friendly lecturer already did the messy processing for you from the | |
| 242 |   Worldbank database, see \url{https://github.com/datasets/population/tree/master/data} for the original.} Again, split the
 | |
| 243 | strings according to the commas. However, this time generate a | |
| 244 |   \texttt{Map} from country names to population sizes.\hfill[1 Mark]
 | |
| 245 | ||
| 246 | \item[(3)] In (2) you generated the data about the alcohol consumption | |
| 343 | 247 | per-capita for each country, and also the population size for each | 
| 196 | 248 | country. From this generate next a sorted(!) list of the overall | 
| 249 | alcohol consumption for each country. The list should be sorted from | |
| 250 | highest alcohol consumption to lowest. The difficulty is that the | |
| 251 | data is scraped off from ``random'' sources on the Internet and | |
| 252 | annoyingly the spelling of some country names does not always agree in both | |
| 253 | lists. For example the alcohol list contains | |
| 254 |   \texttt{Bosnia-Herzegovina}, while the population writes this country as
 | |
| 255 |   \texttt{Bosnia and Herzegovina}. In your sorted
 | |
| 256 | overall list include only countries from the alcohol list, whose | |
| 257 | exact country name is also in the population size list. This means | |
| 258 | you can ignore countries like Bosnia-Herzegovina from the overall | |
| 259 | alcohol consumption. There are 177 countries where the names | |
| 260 | agree. The UK is ranked 10th on this list by | |
| 261 | consuming 671,976,864 Litres of pure alcohol each year.\medskip | |
| 262 | ||
| 263 | \noindent | |
| 264 | Finally, write another function that takes an integer, say | |
| 265 |   \texttt{n}, as argument. You can assume this integer is between 0
 | |
| 266 | and 177 (the number of countries in the sorted list above). The | |
| 267 | function should return a triple, where the first component is the | |
| 268 | sum of the alcohol consumption in all countries (on the list); the | |
| 269 |   second component is the sum of the \texttt{n}-highest alcohol
 | |
| 270 | consumers on the list; and the third component is the percentage the | |
| 271 |   \texttt{n}-highest alcohol consumers drink with respect to the
 | |
| 272 | the world consumption. You will see that according to our data, 164 | |
| 273 | countries (out of 177) gobble up 100\% of the World alcohol | |
| 274 |   consumption.\hfill\mbox{[1 Mark]}
 | |
| 18 | 275 | \end{itemize}
 | 
| 11 
417869f65585
updated
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
9diff
changeset | 276 | |
| 
417869f65585
updated
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
9diff
changeset | 277 | \noindent | 
| 196 | 278 | \textbf{Hints:} useful list functions: \texttt{.drop(n)},
 | 
| 279 | \texttt{.take(n)} for dropping or taking some elements in a list,
 | |
| 280 | \texttt{.getLines} for separating lines in a string;
 | |
| 281 | \texttt{.sortBy(\_.\_2)} sorts a list of pairs according to the second
 | |
| 282 | elements in the pairs---the sorting is done from smallest to highest; | |
| 283 | useful \texttt{Map} functions: \texttt{.toMap} converts a list of
 | |
| 284 | pairs into a \texttt{Map}, \texttt{.isDefinedAt(k)} tests whether the
 | |
| 285 | map is defined at that key, that is would produce a result when | |
| 286 | called with this key; useful data functions: \texttt{Source.fromURL},
 | |
| 287 | \texttt{Source.fromFile} for obtaining a webpage and reading a file.
 | |
| 127 | 288 | |
| 196 | 289 | \newpage | 
| 290 | ||
| 291 | ||
| 18 | 292 | |
| 129 | 293 | |
| 135 | 294 | |
| 6 | 295 | |
| 296 | %%% Local Variables: | |
| 297 | %%% mode: latex | |
| 298 | %%% TeX-master: t | |
| 299 | %%% End: |