| author | Christian Urban <christian.urban@kcl.ac.uk> | 
| Wed, 01 Dec 2021 07:51:21 +0000 | |
| changeset 413 | f4db8336d3d5 | 
| parent 408 | 9e416cdf1755 | 
| child 423 | 554278cd4b70 | 
| permissions | -rw-r--r-- | 
| 276 | 1 | % !TEX program = xelatex | 
| 6 | 2 | \documentclass{article}
 | 
| 11 
417869f65585
updated
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
9diff
changeset | 3 | \usepackage{../style}
 | 
| 195 | 4 | \usepackage{disclaimer}
 | 
| 199 | 5 | \usepackage{../langs}
 | 
| 6 | 6 | |
| 335 | 7 | |
| 8 | ||
| 6 | 9 | \begin{document}
 | 
| 10 | ||
| 396 | 11 | \section*{Core Part 1 (Scala, 3 Marks)}
 | 
| 199 | 12 | |
| 13 | \mbox{}\hfill\textit{``The most effective debugging tool is still careful thought,}\\
 | |
| 14 | \mbox{}\hfill\textit{coupled with judiciously placed print statements.''}\smallskip\\
 | |
| 335 | 15 | \mbox{}\hfill\textit{ --- Brian W. Kernighan, in Unix for Beginners (1979)}\bigskip
 | 
| 199 | 16 | |
| 279 | 17 | |
| 408 | 18 | \IMPORTANTNONE{}
 | 
| 195 | 19 | |
| 127 | 20 | \noindent | 
| 195 | 21 | Also note that the running time of each part will be restricted to a | 
| 199 | 22 | maximum of 30 seconds on my laptop. | 
| 192 | 23 | |
| 196 | 24 | \DISCLAIMER{}
 | 
| 6 | 25 | |
| 201 | 26 | \subsection*{Reference Implementation}
 | 
| 6 | 27 | |
| 199 | 28 | Like the C++ assignments, the Scala assignments will work like this: you | 
| 29 | push your files to GitHub and receive (after sometimes a long delay) some | |
| 30 | automated feedback. In the end we take a snapshot of the submitted files and | |
| 266 | 31 | apply an automated marking script to them.\medskip | 
| 199 | 32 | |
| 266 | 33 | \noindent | 
| 306 | 34 | In addition, the Scala coursework comes with a reference implementation | 
| 335 | 35 | in form of \texttt{jar}-files. This allows you to run any test cases on
 | 
| 36 | your own computer. For example you can call Scala on the command line | |
| 37 | with the option \texttt{-cp collatz.jar} and then query any function
 | |
| 38 | from the template file. Say you want to find out what the functions | |
| 39 | \texttt{collatz} and \texttt{collatz\_max} produce: for this you just
 | |
| 396 | 40 | need to prefix them with the object name \texttt{C1}. If you want to
 | 
| 335 | 41 | find out what these functions produce for the argument \texttt{6}, you
 | 
| 42 | would type something like: | |
| 199 | 43 | |
| 44 | \begin{lstlisting}[language={},numbers=none,basicstyle=\ttfamily\small]
 | |
| 45 | $ scala -cp collatz.jar | |
| 46 | ||
| 396 | 47 | scala> C1.collatz(6) | 
| 199 | 48 | ... | 
| 396 | 49 | scala> C1.collatz_max(6) | 
| 199 | 50 | ... | 
| 51 | \end{lstlisting}%$
 | |
| 52 | ||
| 201 | 53 | \subsection*{Hints}
 | 
| 54 | ||
| 55 | \noindent | |
| 396 | 56 | \textbf{For the Core Part 1:} useful math operators: \texttt{\%} for modulo, \texttt{\&} for bit-wise and; useful
 | 
| 201 | 57 | functions: \mbox{\texttt{(1\,to\,10)}} for ranges, \texttt{.toInt},
 | 
| 335 | 58 | \texttt{.toList} for conversions, you can use \texttt{List(...).max} for the
 | 
| 201 | 59 | maximum of a list, \texttt{List(...).indexOf(...)} for the first index of
 | 
| 60 | a value in a list.\bigskip | |
| 61 | ||
| 62 | ||
| 63 | ||
| 64 | \newpage | |
| 396 | 65 | \subsection*{Core Part 1 (3 Marks, file collatz.scala)}
 | 
| 6 | 66 | |
| 336 | 67 | This part is about function definitions and recursion. You are asked | 
| 68 | to implement a Scala program that tests examples of the | |
| 69 | \emph{$3n + 1$-conjecture}, also called \emph{Collatz
 | |
| 70 |   conjecture}.\video{https://www.youtube.com./watch?v=LqKpkdRRLZw}
 | |
| 71 | This conjecture can be described as follows: Start with any positive | |
| 72 | number $n$ greater than $0$: | |
| 18 | 73 | |
| 74 | \begin{itemize}
 | |
| 75 | \item If $n$ is even, divide it by $2$ to obtain $n / 2$. | |
| 76 | \item If $n$ is odd, multiply it by $3$ and add $1$ to obtain $3n + | |
| 77 | 1$. | |
| 78 | \item Repeat this process and you will always end up with $1$. | |
| 79 | \end{itemize}
 | |
| 80 | ||
| 81 | \noindent | |
| 266 | 82 | For example if you start with, say, $6$ and $9$, you obtain the | 
| 335 | 83 | two \emph{Collatz series}
 | 
| 84 | % | |
| 11 
417869f65585
updated
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
9diff
changeset | 85 | \[ | 
| 
417869f65585
updated
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
9diff
changeset | 86 | \begin{array}{@{}l@{\hspace{5mm}}l@{}}
 | 
| 199 | 87 | 6, 3, 10, 5, 16, 8, 4, 2, 1 & \text{(= 8 steps)}\\
 | 
| 88 | 9, 28, 14, 7, 22, 11, 34, 17, 52, 26, 13, 40, 20, 10, 5, 16, 8, 4, 2, 1  & \text{(= 19 steps)}\\
 | |
| 11 
417869f65585
updated
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
9diff
changeset | 89 | \end{array}
 | 
| 
417869f65585
updated
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
9diff
changeset | 90 | \] | 
| 
417869f65585
updated
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
9diff
changeset | 91 | |
| 
417869f65585
updated
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
9diff
changeset | 92 | \noindent | 
| 
417869f65585
updated
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
9diff
changeset | 93 | As you can see, the numbers go up and down like a roller-coaster, but | 
| 335 | 94 | curiously they seem to always terminate in $1$. Nobody knows why. The | 
| 95 | conjecture is that this will \emph{always} happen for every number
 | |
| 96 | greater than 0.\footnote{While it is relatively easy to test this
 | |
| 97 | conjecture with particular numbers, it is an interesting open problem to | |
| 98 | \emph{prove} that the conjecture is true for \emph{all} numbers ($> 0$).
 | |
| 99 | Paul Erd\"o{}s, a famous mathematician you might have heard about, said
 | |
| 100 | about this conjecture: ``Mathematics may not [yet] be ready for such | |
| 101 | problems.'' and also offered a \$500 cash prize for its solution. | |
| 102 | Jeffrey Lagarias, another mathematician, claimed that based only on | |
| 103 | known information about this problem, ``this is an extraordinarily | |
| 104 | difficult problem, completely out of reach of present day mathematics.'' | |
| 105 | There is also a \href{https://xkcd.com/710/}{xkcd} cartoon about this
 | |
| 106 | conjecture\here{https://xkcd.com/710/}). If you are able to solve this
 | |
| 107 | conjecture, you will definitely get famous.}\bigskip | |
| 18 | 108 | |
| 109 | \noindent | |
| 199 | 110 | \textbf{Tasks}
 | 
| 18 | 111 | |
| 112 | \begin{itemize}
 | |
| 113 | \item[(1)] You are asked to implement a recursive function that | |
| 114 | calculates the number of steps needed until a series ends | |
| 199 | 115 | with $1$. In case of starting with $6$, it takes $8$ steps and in | 
| 335 | 116 | case of starting with $9$, it takes $19$ (see above). We assume it | 
| 117 | takes $0$ steps, if we start with $1$. In order to | |
| 18 | 118 | try out this function with large numbers, you should use | 
| 119 |   \texttt{Long} as argument type, instead of \texttt{Int}.  You can
 | |
| 120 | assume this function will be called with numbers between $1$ and | |
| 335 | 121 | $1$ Million. \hfill[1 Mark] | 
| 18 | 122 | |
| 123 | \item[(2)] Write a second function that takes an upper bound as | |
| 282 | 124 | an argument and calculates the steps for all numbers in the range from | 
| 210 | 125 | 1 up to this bound (the bound including). It returns the maximum number of | 
| 126 | steps and the corresponding number that needs that many steps. More | |
| 127 | precisely it returns a pair where the first component is the number | |
| 128 |   of steps and the second is the corresponding number. \hfill\mbox{[1
 | |
| 129 | Mark]} | |
| 335 | 130 | |
| 131 | \item[(3)] Write a function that calculates \emph{hard
 | |
| 132 |     numbers} \here{https://medium.com/cantors-paradise/the-collatz-conjecture-some-shocking-results-from-180-000-iterations-7fea130d0377}
 | |
| 133 | in the Collatz series---these are the last odd numbers just before a | |
| 134 | power of two is reached. For this, implement an | |
| 135 |   \textit{is-power-of-two} function which tests whether a number is a
 | |
| 136 | power of two. The easiest way to implement this is by using the | |
| 336 | 137 | bit-operator $\&$ of Scala. For a power of two, say $n$ with $n > 0$, it | 
| 335 | 138 | holds that $n \;\&\; (n - 1)$ is equal to zero. I let you think why | 
| 344 | 139 | this is the case. | 
| 140 | ||
| 141 |   The function \textit{is-hard} calculates whether
 | |
| 335 | 142 |   $3n + 1$ is a power of two.  Finally the \textit{last-odd} function
 | 
| 143 | calculates the last odd number before a power of 2 in the Collatz | |
| 344 | 144 | series. This means for example when starting with 9, | 
| 335 | 145 | we receive 5 as the last odd number. Surprisingly a lot of numbers | 
| 146 | have 5 as last-odd number. But for example for 113 we obtain 85, | |
| 147 | because of the series | |
| 148 | % | |
| 149 |   \[113, 340, 170, \,\fbox{85}\,, 256, 128, 64, 32, 16, 8, 4, 2, 1\]
 | |
| 150 | ||
| 151 |   The \textit{last-odd} function will only be called with numbers that are not
 | |
| 396 | 152 |   powers of 2 themselves.\hfill\mbox{[1 Mark]}
 | 
| 18 | 153 | \end{itemize}
 | 
| 11 
417869f65585
updated
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
9diff
changeset | 154 | |
| 
417869f65585
updated
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
9diff
changeset | 155 | \noindent | 
| 336 | 156 | \textbf{Test Data:} Some test ranges and cases are:
 | 
| 11 
417869f65585
updated
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
9diff
changeset | 157 | |
| 
417869f65585
updated
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
9diff
changeset | 158 | \begin{itemize}
 | 
| 199 | 159 | \item 1 to 10 where $9$ takes 19 steps | 
| 160 | \item 1 to 100 where $97$ takes 118 steps, | |
| 161 | \item 1 to 1,000 where $871$ takes 178 steps, | |
| 162 | \item 1 to 10,000 where $6,171$ takes 261 steps, | |
| 163 | \item 1 to 100,000 where $77,031$ takes 350 steps, | |
| 164 | \item 1 to 1 Million where $837,799$ takes 524 steps | |
| 165 | %% runs out of stack space | |
| 166 | %% \item[$\bullet$] $1 - 10$ million where $8,400,511$ takes 685 steps | |
| 336 | 167 | \item 21 is the last odd number for 84 | 
| 168 | \item 341 is the last odd number for 201, 604, 605 and 8600 | |
| 169 | ||
| 196 | 170 | \end{itemize}
 | 
| 18 | 171 | |
| 201 | 172 | |
| 127 | 173 | |
| 174 | ||
| 196 | 175 | |
| 199 | 176 | |
| 177 | \end{document}
 | |
| 178 | ||
| 276 | 179 | |
| 180 | %%%%%%% Historical Stuff | |
| 199 | 181 | \newpage | 
| 192 | 182 | |
| 196 | 183 | This part is about web-scraping and list-processing in Scala. It uses | 
| 184 | online data about the per-capita alcohol consumption for each country | |
| 185 | (per year?), and a file containing the data about the population size of | |
| 186 | each country. From this data you are supposed to estimate how many | |
| 187 | litres of pure alcohol are consumed worldwide.\bigskip | |
| 192 | 188 | |
| 189 | \noindent | |
| 196 | 190 | \textbf{Tasks (file alcohol.scala):}
 | 
| 192 | 191 | |
| 192 | \begin{itemize}
 | |
| 196 | 193 | \item[(1)] Write a function that given an URL requests a | 
| 194 | comma-separated value (CSV) list. We are interested in the list | |
| 195 | from the following URL | |
| 192 | 196 | |
| 197 | \begin{center}
 | |
| 196 | 198 |   \url{https://raw.githubusercontent.com/fivethirtyeight/data/master/alcohol-consumption/drinks.csv}
 | 
| 192 | 199 | \end{center}
 | 
| 127 | 200 | |
| 196 | 201 | \noindent Your function should take a string (the URL) as input, and | 
| 202 | produce a list of strings as output, where each string is one line in | |
| 203 | the corresponding CSV-list. This list from the URL above should | |
| 204 | contain 194 lines.\medskip | |
| 192 | 205 | |
| 206 | \noindent | |
| 196 | 207 | Write another function that can read the file \texttt{population.csv}
 | 
| 201 | 208 | from disk (the file is distributed with the assignment). This | 
| 196 | 209 | function should take a string as argument, the file name, and again | 
| 210 | return a list of strings corresponding to each entry in the | |
| 211 | CSV-list. For \texttt{population.csv}, this list should contain 216
 | |
| 212 | lines.\hfill[1 Mark] | |
| 213 | ||
| 214 | ||
| 215 | \item[(2)] Unfortunately, the CSV-lists contain a lot of ``junk'' and we | |
| 216 | need to extract the data that interests us. From the header of the | |
| 217 | alcohol list, you can see there are 5 columns | |
| 218 | ||
| 219 |   \begin{center}
 | |
| 220 |     \begin{tabular}{l}
 | |
| 221 |       \texttt{country (name),}\\
 | |
| 222 |       \texttt{beer\_servings,}\\
 | |
| 223 |       \texttt{spirit\_servings,}\\
 | |
| 224 |       \texttt{wine\_servings,}\\
 | |
| 225 |       \texttt{total\_litres\_of\_pure\_alcohol}
 | |
| 226 |     \end{tabular}  
 | |
| 227 |   \end{center}
 | |
| 228 | ||
| 229 | \noindent | |
| 230 | Write a function that extracts the data from the first column, | |
| 231 | the country name, and the data from the fifth column (converted into | |
| 232 |   a \texttt{Double}). For this go through each line of the CSV-list
 | |
| 233 |   (except the first line), use the \texttt{split(",")} function to
 | |
| 234 | divide each line into an array of 5 elements. Keep the data from the | |
| 235 | first and fifth element in these arrays.\medskip | |
| 192 | 236 | |
| 196 | 237 | \noindent | 
| 238 | Write another function that processes the population size list. This | |
| 239 |   is already of the form country name and population size.\footnote{Your
 | |
| 240 | friendly lecturer already did the messy processing for you from the | |
| 241 |   Worldbank database, see \url{https://github.com/datasets/population/tree/master/data} for the original.} Again, split the
 | |
| 242 | strings according to the commas. However, this time generate a | |
| 243 |   \texttt{Map} from country names to population sizes.\hfill[1 Mark]
 | |
| 244 | ||
| 245 | \item[(3)] In (2) you generated the data about the alcohol consumption | |
| 343 | 246 | per-capita for each country, and also the population size for each | 
| 196 | 247 | country. From this generate next a sorted(!) list of the overall | 
| 248 | alcohol consumption for each country. The list should be sorted from | |
| 249 | highest alcohol consumption to lowest. The difficulty is that the | |
| 250 | data is scraped off from ``random'' sources on the Internet and | |
| 251 | annoyingly the spelling of some country names does not always agree in both | |
| 252 | lists. For example the alcohol list contains | |
| 253 |   \texttt{Bosnia-Herzegovina}, while the population writes this country as
 | |
| 254 |   \texttt{Bosnia and Herzegovina}. In your sorted
 | |
| 255 | overall list include only countries from the alcohol list, whose | |
| 256 | exact country name is also in the population size list. This means | |
| 257 | you can ignore countries like Bosnia-Herzegovina from the overall | |
| 258 | alcohol consumption. There are 177 countries where the names | |
| 259 | agree. The UK is ranked 10th on this list by | |
| 260 | consuming 671,976,864 Litres of pure alcohol each year.\medskip | |
| 261 | ||
| 262 | \noindent | |
| 263 | Finally, write another function that takes an integer, say | |
| 264 |   \texttt{n}, as argument. You can assume this integer is between 0
 | |
| 265 | and 177 (the number of countries in the sorted list above). The | |
| 266 | function should return a triple, where the first component is the | |
| 267 | sum of the alcohol consumption in all countries (on the list); the | |
| 268 |   second component is the sum of the \texttt{n}-highest alcohol
 | |
| 269 | consumers on the list; and the third component is the percentage the | |
| 270 |   \texttt{n}-highest alcohol consumers drink with respect to the
 | |
| 271 | the world consumption. You will see that according to our data, 164 | |
| 272 | countries (out of 177) gobble up 100\% of the World alcohol | |
| 273 |   consumption.\hfill\mbox{[1 Mark]}
 | |
| 18 | 274 | \end{itemize}
 | 
| 11 
417869f65585
updated
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
9diff
changeset | 275 | |
| 
417869f65585
updated
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
9diff
changeset | 276 | \noindent | 
| 196 | 277 | \textbf{Hints:} useful list functions: \texttt{.drop(n)},
 | 
| 278 | \texttt{.take(n)} for dropping or taking some elements in a list,
 | |
| 279 | \texttt{.getLines} for separating lines in a string;
 | |
| 280 | \texttt{.sortBy(\_.\_2)} sorts a list of pairs according to the second
 | |
| 281 | elements in the pairs---the sorting is done from smallest to highest; | |
| 282 | useful \texttt{Map} functions: \texttt{.toMap} converts a list of
 | |
| 283 | pairs into a \texttt{Map}, \texttt{.isDefinedAt(k)} tests whether the
 | |
| 284 | map is defined at that key, that is would produce a result when | |
| 285 | called with this key; useful data functions: \texttt{Source.fromURL},
 | |
| 286 | \texttt{Source.fromFile} for obtaining a webpage and reading a file.
 | |
| 127 | 287 | |
| 196 | 288 | \newpage | 
| 289 | ||
| 290 | ||
| 18 | 291 | |
| 129 | 292 | |
| 135 | 293 | |
| 6 | 294 | |
| 295 | %%% Local Variables: | |
| 296 | %%% mode: latex | |
| 297 | %%% TeX-master: t | |
| 298 | %%% End: |