| author | Christian Urban <christian.urban@kcl.ac.uk> | 
| Thu, 02 Nov 2023 13:53:37 +0000 | |
| changeset 471 | 31b81f20fd9a | 
| parent 468 | c71ae4477e55 | 
| permissions | -rw-r--r-- | 
| 276 | 1  | 
% !TEX program = xelatex  | 
| 6 | 2  | 
\documentclass{article}
 | 
| 423 | 3  | 
\usepackage{../styles/style}
 | 
| 195 | 4  | 
\usepackage{disclaimer}
 | 
| 423 | 5  | 
\usepackage{../styles/langs}
 | 
| 6 | 6  | 
|
| 335 | 7  | 
|
8  | 
||
| 6 | 9  | 
\begin{document}
 | 
10  | 
||
| 396 | 11  | 
\section*{Core Part 1 (Scala, 3 Marks)}
 | 
| 199 | 12  | 
|
13  | 
\mbox{}\hfill\textit{``The most effective debugging tool is still careful thought,}\\
 | 
|
14  | 
\mbox{}\hfill\textit{coupled with judiciously placed print statements.''}\smallskip\\
 | 
|
| 335 | 15  | 
\mbox{}\hfill\textit{ --- Brian W. Kernighan, in Unix for Beginners (1979)}\bigskip
 | 
| 199 | 16  | 
|
| 279 | 17  | 
|
| 408 | 18  | 
\IMPORTANTNONE{}
 | 
| 195 | 19  | 
|
| 127 | 20  | 
\noindent  | 
| 195 | 21  | 
Also note that the running time of each part will be restricted to a  | 
| 199 | 22  | 
maximum of 30 seconds on my laptop.  | 
| 192 | 23  | 
|
| 196 | 24  | 
\DISCLAIMER{}
 | 
| 6 | 25  | 
|
| 201 | 26  | 
\subsection*{Reference Implementation}
 | 
| 6 | 27  | 
|
| 199 | 28  | 
Like the C++ assignments, the Scala assignments will work like this: you  | 
29  | 
push your files to GitHub and receive (after sometimes a long delay) some  | 
|
30  | 
automated feedback. In the end we take a snapshot of the submitted files and  | 
|
| 266 | 31  | 
apply an automated marking script to them.\medskip  | 
| 199 | 32  | 
|
| 266 | 33  | 
\noindent  | 
| 306 | 34  | 
In addition, the Scala coursework comes with a reference implementation  | 
| 471 | 35  | 
in form of a \texttt{jar}-file. This allows you to run any test cases on
 | 
36  | 
your own computer. For example you can call \texttt{scala-cli} on the command line
 | 
|
| 468 | 37  | 
with the option \texttt{--extra-jars collatz.jar} and then query any function
 | 
| 335 | 38  | 
from the template file. Say you want to find out what the functions  | 
39  | 
\texttt{collatz} and \texttt{collatz\_max} produce: for this you just
 | 
|
| 396 | 40  | 
need to prefix them with the object name \texttt{C1}. If you want to
 | 
| 335 | 41  | 
find out what these functions produce for the argument \texttt{6}, you
 | 
42  | 
would type something like:  | 
|
| 199 | 43  | 
|
44  | 
\begin{lstlisting}[language={},numbers=none,basicstyle=\ttfamily\small]
 | 
|
| 468 | 45  | 
$ scala-cli --extra-jars collatz.jar  | 
| 199 | 46  | 
|
| 396 | 47  | 
scala> C1.collatz(6)  | 
| 199 | 48  | 
...  | 
| 396 | 49  | 
scala> C1.collatz_max(6)  | 
| 199 | 50  | 
...  | 
51  | 
\end{lstlisting}%$
 | 
|
52  | 
||
| 201 | 53  | 
\subsection*{Hints}
 | 
54  | 
||
55  | 
\noindent  | 
|
| 396 | 56  | 
\textbf{For the Core Part 1:} useful math operators: \texttt{\%} for modulo, \texttt{\&} for bit-wise and; useful
 | 
| 201 | 57  | 
functions: \mbox{\texttt{(1\,to\,10)}} for ranges, \texttt{.toInt},
 | 
| 335 | 58  | 
\texttt{.toList} for conversions, you can use \texttt{List(...).max} for the
 | 
| 201 | 59  | 
maximum of a list, \texttt{List(...).indexOf(...)} for the first index of
 | 
60  | 
a value in a list.\bigskip  | 
|
61  | 
||
62  | 
||
63  | 
||
64  | 
\newpage  | 
|
| 396 | 65  | 
\subsection*{Core Part 1 (3 Marks, file collatz.scala)}
 | 
| 6 | 66  | 
|
| 336 | 67  | 
This part is about function definitions and recursion. You are asked  | 
68  | 
to implement a Scala program that tests examples of the  | 
|
69  | 
\emph{$3n + 1$-conjecture}, also called \emph{Collatz
 | 
|
| 471 | 70  | 
  conjecture}.\video{https://www.youtube.com/watch?v=LqKpkdRRLZw}
 | 
| 336 | 71  | 
This conjecture can be described as follows: Start with any positive  | 
72  | 
number $n$ greater than $0$:  | 
|
| 18 | 73  | 
|
74  | 
\begin{itemize}
 | 
|
75  | 
\item If $n$ is even, divide it by $2$ to obtain $n / 2$.  | 
|
76  | 
\item If $n$ is odd, multiply it by $3$ and add $1$ to obtain $3n +  | 
|
77  | 
1$.  | 
|
78  | 
\item Repeat this process and you will always end up with $1$.  | 
|
79  | 
\end{itemize}
 | 
|
80  | 
||
81  | 
\noindent  | 
|
| 266 | 82  | 
For example if you start with, say, $6$ and $9$, you obtain the  | 
| 335 | 83  | 
two \emph{Collatz series}
 | 
84  | 
%  | 
|
| 
11
 
417869f65585
updated
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents: 
9 
diff
changeset
 | 
85  | 
\[  | 
| 
 
417869f65585
updated
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents: 
9 
diff
changeset
 | 
86  | 
\begin{array}{@{}l@{\hspace{5mm}}l@{}}
 | 
| 199 | 87  | 
6, 3, 10, 5, 16, 8, 4, 2, 1 & \text{(= 8 steps)}\\
 | 
88  | 
9, 28, 14, 7, 22, 11, 34, 17, 52, 26, 13, 40, 20, 10, 5, 16, 8, 4, 2, 1  & \text{(= 19 steps)}\\
 | 
|
| 
11
 
417869f65585
updated
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents: 
9 
diff
changeset
 | 
89  | 
\end{array}
 | 
| 
 
417869f65585
updated
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents: 
9 
diff
changeset
 | 
90  | 
\]  | 
| 
 
417869f65585
updated
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents: 
9 
diff
changeset
 | 
91  | 
|
| 
 
417869f65585
updated
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents: 
9 
diff
changeset
 | 
92  | 
\noindent  | 
| 
 
417869f65585
updated
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents: 
9 
diff
changeset
 | 
93  | 
As you can see, the numbers go up and down like a roller-coaster, but  | 
| 335 | 94  | 
curiously they seem to always terminate in $1$. Nobody knows why. The  | 
95  | 
conjecture is that this will \emph{always} happen for every number
 | 
|
96  | 
greater than 0.\footnote{While it is relatively easy to test this
 | 
|
97  | 
conjecture with particular numbers, it is an interesting open problem to  | 
|
98  | 
\emph{prove} that the conjecture is true for \emph{all} numbers ($> 0$).
 | 
|
99  | 
Paul Erd\"o{}s, a famous mathematician you might have heard about, said
 | 
|
100  | 
about this conjecture: ``Mathematics may not [yet] be ready for such  | 
|
101  | 
problems.'' and also offered a \$500 cash prize for its solution.  | 
|
102  | 
Jeffrey Lagarias, another mathematician, claimed that based only on  | 
|
103  | 
known information about this problem, ``this is an extraordinarily  | 
|
| 426 | 104  | 
difficult problem, completely out of reach of present-day mathematics.''  | 
| 335 | 105  | 
There is also a \href{https://xkcd.com/710/}{xkcd} cartoon about this
 | 
106  | 
conjecture\here{https://xkcd.com/710/}). If you are able to solve this
 | 
|
107  | 
conjecture, you will definitely get famous.}\bigskip  | 
|
| 18 | 108  | 
|
109  | 
\noindent  | 
|
| 199 | 110  | 
\textbf{Tasks}
 | 
| 18 | 111  | 
|
112  | 
\begin{itemize}
 | 
|
113  | 
\item[(1)] You are asked to implement a recursive function that  | 
|
114  | 
calculates the number of steps needed until a series ends  | 
|
| 199 | 115  | 
with $1$. In case of starting with $6$, it takes $8$ steps and in  | 
| 335 | 116  | 
case of starting with $9$, it takes $19$ (see above). We assume it  | 
117  | 
takes $0$ steps, if we start with $1$. In order to  | 
|
| 18 | 118  | 
try out this function with large numbers, you should use  | 
119  | 
  \texttt{Long} as argument type, instead of \texttt{Int}.  You can
 | 
|
120  | 
assume this function will be called with numbers between $1$ and  | 
|
| 335 | 121  | 
$1$ Million. \hfill[1 Mark]  | 
| 18 | 122  | 
|
123  | 
\item[(2)] Write a second function that takes an upper bound as  | 
|
| 282 | 124  | 
an argument and calculates the steps for all numbers in the range from  | 
| 210 | 125  | 
1 up to this bound (the bound including). It returns the maximum number of  | 
126  | 
steps and the corresponding number that needs that many steps. More  | 
|
127  | 
precisely it returns a pair where the first component is the number  | 
|
128  | 
  of steps and the second is the corresponding number. \hfill\mbox{[1
 | 
|
129  | 
Mark]}  | 
|
| 335 | 130  | 
|
131  | 
\item[(3)] Write a function that calculates \emph{hard
 | 
|
132  | 
    numbers} \here{https://medium.com/cantors-paradise/the-collatz-conjecture-some-shocking-results-from-180-000-iterations-7fea130d0377}
 | 
|
133  | 
in the Collatz series---these are the last odd numbers just before a  | 
|
134  | 
power of two is reached. For this, implement an  | 
|
135  | 
  \textit{is-power-of-two} function which tests whether a number is a
 | 
|
136  | 
power of two. The easiest way to implement this is by using the  | 
|
| 336 | 137  | 
bit-operator $\&$ of Scala. For a power of two, say $n$ with $n > 0$, it  | 
| 335 | 138  | 
holds that $n \;\&\; (n - 1)$ is equal to zero. I let you think why  | 
| 344 | 139  | 
this is the case.  | 
140  | 
||
141  | 
  The function \textit{is-hard} calculates whether
 | 
|
| 335 | 142  | 
  $3n + 1$ is a power of two.  Finally the \textit{last-odd} function
 | 
143  | 
calculates the last odd number before a power of 2 in the Collatz  | 
|
| 344 | 144  | 
series. This means for example when starting with 9,  | 
| 335 | 145  | 
we receive 5 as the last odd number. Surprisingly a lot of numbers  | 
146  | 
have 5 as last-odd number. But for example for 113 we obtain 85,  | 
|
147  | 
because of the series  | 
|
148  | 
%  | 
|
149  | 
  \[113, 340, 170, \,\fbox{85}\,, 256, 128, 64, 32, 16, 8, 4, 2, 1\]
 | 
|
150  | 
||
151  | 
  The \textit{last-odd} function will only be called with numbers that are not
 | 
|
| 396 | 152  | 
  powers of 2 themselves.\hfill\mbox{[1 Mark]}
 | 
| 18 | 153  | 
\end{itemize}
 | 
| 
11
 
417869f65585
updated
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents: 
9 
diff
changeset
 | 
154  | 
|
| 
 
417869f65585
updated
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents: 
9 
diff
changeset
 | 
155  | 
\noindent  | 
| 426 | 156  | 
\textbf{Test Data:} Some test cases are:
 | 
| 
11
 
417869f65585
updated
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents: 
9 
diff
changeset
 | 
157  | 
|
| 
 
417869f65585
updated
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents: 
9 
diff
changeset
 | 
158  | 
\begin{itemize}
 | 
| 199 | 159  | 
\item 1 to 10 where $9$ takes 19 steps  | 
160  | 
\item 1 to 100 where $97$ takes 118 steps,  | 
|
161  | 
\item 1 to 1,000 where $871$ takes 178 steps,  | 
|
162  | 
\item 1 to 10,000 where $6,171$ takes 261 steps,  | 
|
163  | 
\item 1 to 100,000 where $77,031$ takes 350 steps,  | 
|
164  | 
\item 1 to 1 Million where $837,799$ takes 524 steps  | 
|
165  | 
%% runs out of stack space  | 
|
166  | 
%% \item[$\bullet$] $1 - 10$ million where $8,400,511$ takes 685 steps  | 
|
| 336 | 167  | 
\item 21 is the last odd number for 84  | 
168  | 
\item 341 is the last odd number for 201, 604, 605 and 8600  | 
|
169  | 
||
| 196 | 170  | 
\end{itemize}
 | 
| 18 | 171  | 
|
| 201 | 172  | 
|
| 127 | 173  | 
|
174  | 
||
| 196 | 175  | 
|
| 199 | 176  | 
|
177  | 
\end{document}
 | 
|
178  | 
||
| 276 | 179  | 
|
180  | 
%%%%%%% Historical Stuff  | 
|
| 199 | 181  | 
\newpage  | 
| 192 | 182  | 
|
| 196 | 183  | 
This part is about web-scraping and list-processing in Scala. It uses  | 
184  | 
online data about the per-capita alcohol consumption for each country  | 
|
185  | 
(per year?), and a file containing the data about the population size of  | 
|
186  | 
each country. From this data you are supposed to estimate how many  | 
|
187  | 
litres of pure alcohol are consumed worldwide.\bigskip  | 
|
| 192 | 188  | 
|
189  | 
\noindent  | 
|
| 196 | 190  | 
\textbf{Tasks (file alcohol.scala):}
 | 
| 192 | 191  | 
|
192  | 
\begin{itemize}
 | 
|
| 196 | 193  | 
\item[(1)] Write a function that given an URL requests a  | 
194  | 
comma-separated value (CSV) list. We are interested in the list  | 
|
195  | 
from the following URL  | 
|
| 192 | 196  | 
|
197  | 
\begin{center}
 | 
|
| 196 | 198  | 
  \url{https://raw.githubusercontent.com/fivethirtyeight/data/master/alcohol-consumption/drinks.csv}
 | 
| 192 | 199  | 
\end{center}
 | 
| 127 | 200  | 
|
| 196 | 201  | 
\noindent Your function should take a string (the URL) as input, and  | 
202  | 
produce a list of strings as output, where each string is one line in  | 
|
203  | 
the corresponding CSV-list. This list from the URL above should  | 
|
204  | 
contain 194 lines.\medskip  | 
|
| 192 | 205  | 
|
206  | 
\noindent  | 
|
| 196 | 207  | 
Write another function that can read the file \texttt{population.csv}
 | 
| 201 | 208  | 
from disk (the file is distributed with the assignment). This  | 
| 196 | 209  | 
function should take a string as argument, the file name, and again  | 
210  | 
return a list of strings corresponding to each entry in the  | 
|
211  | 
CSV-list. For \texttt{population.csv}, this list should contain 216
 | 
|
212  | 
lines.\hfill[1 Mark]  | 
|
213  | 
||
214  | 
||
215  | 
\item[(2)] Unfortunately, the CSV-lists contain a lot of ``junk'' and we  | 
|
216  | 
need to extract the data that interests us. From the header of the  | 
|
217  | 
alcohol list, you can see there are 5 columns  | 
|
218  | 
||
219  | 
  \begin{center}
 | 
|
220  | 
    \begin{tabular}{l}
 | 
|
221  | 
      \texttt{country (name),}\\
 | 
|
222  | 
      \texttt{beer\_servings,}\\
 | 
|
223  | 
      \texttt{spirit\_servings,}\\
 | 
|
224  | 
      \texttt{wine\_servings,}\\
 | 
|
225  | 
      \texttt{total\_litres\_of\_pure\_alcohol}
 | 
|
226  | 
    \end{tabular}  
 | 
|
227  | 
  \end{center}
 | 
|
228  | 
||
229  | 
\noindent  | 
|
230  | 
Write a function that extracts the data from the first column,  | 
|
231  | 
the country name, and the data from the fifth column (converted into  | 
|
232  | 
  a \texttt{Double}). For this go through each line of the CSV-list
 | 
|
233  | 
  (except the first line), use the \texttt{split(",")} function to
 | 
|
234  | 
divide each line into an array of 5 elements. Keep the data from the  | 
|
235  | 
first and fifth element in these arrays.\medskip  | 
|
| 192 | 236  | 
|
| 196 | 237  | 
\noindent  | 
238  | 
Write another function that processes the population size list. This  | 
|
239  | 
  is already of the form country name and population size.\footnote{Your
 | 
|
240  | 
friendly lecturer already did the messy processing for you from the  | 
|
241  | 
  Worldbank database, see \url{https://github.com/datasets/population/tree/master/data} for the original.} Again, split the
 | 
|
242  | 
strings according to the commas. However, this time generate a  | 
|
243  | 
  \texttt{Map} from country names to population sizes.\hfill[1 Mark]
 | 
|
244  | 
||
245  | 
\item[(3)] In (2) you generated the data about the alcohol consumption  | 
|
| 343 | 246  | 
per-capita for each country, and also the population size for each  | 
| 196 | 247  | 
country. From this generate next a sorted(!) list of the overall  | 
248  | 
alcohol consumption for each country. The list should be sorted from  | 
|
249  | 
highest alcohol consumption to lowest. The difficulty is that the  | 
|
250  | 
data is scraped off from ``random'' sources on the Internet and  | 
|
251  | 
annoyingly the spelling of some country names does not always agree in both  | 
|
252  | 
lists. For example the alcohol list contains  | 
|
253  | 
  \texttt{Bosnia-Herzegovina}, while the population writes this country as
 | 
|
254  | 
  \texttt{Bosnia and Herzegovina}. In your sorted
 | 
|
255  | 
overall list include only countries from the alcohol list, whose  | 
|
256  | 
exact country name is also in the population size list. This means  | 
|
257  | 
you can ignore countries like Bosnia-Herzegovina from the overall  | 
|
258  | 
alcohol consumption. There are 177 countries where the names  | 
|
259  | 
agree. The UK is ranked 10th on this list by  | 
|
260  | 
consuming 671,976,864 Litres of pure alcohol each year.\medskip  | 
|
261  | 
||
262  | 
\noindent  | 
|
263  | 
Finally, write another function that takes an integer, say  | 
|
264  | 
  \texttt{n}, as argument. You can assume this integer is between 0
 | 
|
265  | 
and 177 (the number of countries in the sorted list above). The  | 
|
266  | 
function should return a triple, where the first component is the  | 
|
267  | 
sum of the alcohol consumption in all countries (on the list); the  | 
|
268  | 
  second component is the sum of the \texttt{n}-highest alcohol
 | 
|
269  | 
consumers on the list; and the third component is the percentage the  | 
|
270  | 
  \texttt{n}-highest alcohol consumers drink with respect to the
 | 
|
271  | 
the world consumption. You will see that according to our data, 164  | 
|
272  | 
countries (out of 177) gobble up 100\% of the World alcohol  | 
|
273  | 
  consumption.\hfill\mbox{[1 Mark]}
 | 
|
| 18 | 274  | 
\end{itemize}
 | 
| 
11
 
417869f65585
updated
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents: 
9 
diff
changeset
 | 
275  | 
|
| 
 
417869f65585
updated
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents: 
9 
diff
changeset
 | 
276  | 
\noindent  | 
| 196 | 277  | 
\textbf{Hints:} useful list functions: \texttt{.drop(n)},
 | 
278  | 
\texttt{.take(n)} for dropping or taking some elements in a list,
 | 
|
279  | 
\texttt{.getLines} for separating lines in a string;
 | 
|
280  | 
\texttt{.sortBy(\_.\_2)} sorts a list of pairs according to the second
 | 
|
281  | 
elements in the pairs---the sorting is done from smallest to highest;  | 
|
282  | 
useful \texttt{Map} functions: \texttt{.toMap} converts a list of
 | 
|
283  | 
pairs into a \texttt{Map}, \texttt{.isDefinedAt(k)} tests whether the
 | 
|
284  | 
map is defined at that key, that is would produce a result when  | 
|
285  | 
called with this key; useful data functions: \texttt{Source.fromURL},
 | 
|
286  | 
\texttt{Source.fromFile} for obtaining a webpage and reading a file.
 | 
|
| 127 | 287  | 
|
| 196 | 288  | 
\newpage  | 
289  | 
||
290  | 
||
| 18 | 291  | 
|
| 129 | 292  | 
|
| 135 | 293  | 
|
| 6 | 294  | 
|
295  | 
%%% Local Variables:  | 
|
296  | 
%%% mode: latex  | 
|
297  | 
%%% TeX-master: t  | 
|
298  | 
%%% End:  |