author | Christian Urban <christian.urban@kcl.ac.uk> |
Mon, 21 Nov 2022 15:57:45 +0000 | |
changeset 447 | f51e593903ac |
parent 429 | 126d0e47ac85 |
child 471 | 135bf034ac30 |
permissions | -rw-r--r-- |
276 | 1 |
% !TEX program = xelatex |
6 | 2 |
\documentclass{article} |
426 | 3 |
\usepackage{../styles/style} |
195 | 4 |
\usepackage{disclaimer} |
426 | 5 |
\usepackage{../styles/langs} |
6 | 6 |
|
335 | 7 |
|
8 |
||
6 | 9 |
\begin{document} |
10 |
||
396 | 11 |
\section*{Core Part 1 (Scala, 3 Marks)} |
199 | 12 |
|
13 |
\mbox{}\hfill\textit{``The most effective debugging tool is still careful thought,}\\ |
|
14 |
\mbox{}\hfill\textit{coupled with judiciously placed print statements.''}\smallskip\\ |
|
335 | 15 |
\mbox{}\hfill\textit{ --- Brian W. Kernighan, in Unix for Beginners (1979)}\bigskip |
199 | 16 |
|
279 | 17 |
|
411 | 18 |
\IMPORTANTNONE{} |
195 | 19 |
|
127 | 20 |
\noindent |
195 | 21 |
Also note that the running time of each part will be restricted to a |
199 | 22 |
maximum of 30 seconds on my laptop. |
192 | 23 |
|
196 | 24 |
\DISCLAIMER{} |
6 | 25 |
|
201 | 26 |
\subsection*{Reference Implementation} |
6 | 27 |
|
199 | 28 |
Like the C++ assignments, the Scala assignments will work like this: you |
29 |
push your files to GitHub and receive (after sometimes a long delay) some |
|
30 |
automated feedback. In the end we take a snapshot of the submitted files and |
|
266 | 31 |
apply an automated marking script to them.\medskip |
199 | 32 |
|
266 | 33 |
\noindent |
306 | 34 |
In addition, the Scala coursework comes with a reference implementation |
335 | 35 |
in form of \texttt{jar}-files. This allows you to run any test cases on |
36 |
your own computer. For example you can call Scala on the command line |
|
37 |
with the option \texttt{-cp collatz.jar} and then query any function |
|
38 |
from the template file. Say you want to find out what the functions |
|
39 |
\texttt{collatz} and \texttt{collatz\_max} produce: for this you just |
|
396 | 40 |
need to prefix them with the object name \texttt{C1}. If you want to |
335 | 41 |
find out what these functions produce for the argument \texttt{6}, you |
42 |
would type something like: |
|
199 | 43 |
|
44 |
\begin{lstlisting}[language={},numbers=none,basicstyle=\ttfamily\small] |
|
45 |
$ scala -cp collatz.jar |
|
46 |
||
396 | 47 |
scala> C1.collatz(6) |
199 | 48 |
... |
396 | 49 |
scala> C1.collatz_max(6) |
199 | 50 |
... |
51 |
\end{lstlisting}%$ |
|
52 |
||
201 | 53 |
\subsection*{Hints} |
54 |
||
55 |
\noindent |
|
396 | 56 |
\textbf{For the Core Part 1:} useful math operators: \texttt{\%} for modulo, \texttt{\&} for bit-wise and; useful |
201 | 57 |
functions: \mbox{\texttt{(1\,to\,10)}} for ranges, \texttt{.toInt}, |
335 | 58 |
\texttt{.toList} for conversions, you can use \texttt{List(...).max} for the |
201 | 59 |
maximum of a list, \texttt{List(...).indexOf(...)} for the first index of |
60 |
a value in a list.\bigskip |
|
61 |
||
62 |
||
63 |
||
64 |
\newpage |
|
396 | 65 |
\subsection*{Core Part 1 (3 Marks, file collatz.scala)} |
6 | 66 |
|
336 | 67 |
This part is about function definitions and recursion. You are asked |
68 |
to implement a Scala program that tests examples of the |
|
69 |
\emph{$3n + 1$-conjecture}, also called \emph{Collatz |
|
70 |
conjecture}.\video{https://www.youtube.com./watch?v=LqKpkdRRLZw} |
|
71 |
This conjecture can be described as follows: Start with any positive |
|
72 |
number $n$ greater than $0$: |
|
18 | 73 |
|
74 |
\begin{itemize} |
|
75 |
\item If $n$ is even, divide it by $2$ to obtain $n / 2$. |
|
76 |
\item If $n$ is odd, multiply it by $3$ and add $1$ to obtain $3n + |
|
77 |
1$. |
|
78 |
\item Repeat this process and you will always end up with $1$. |
|
79 |
\end{itemize} |
|
80 |
||
81 |
\noindent |
|
266 | 82 |
For example if you start with, say, $6$ and $9$, you obtain the |
335 | 83 |
two \emph{Collatz series} |
84 |
% |
|
11
417869f65585
updated
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents:
9
diff
changeset
|
85 |
\[ |
417869f65585
updated
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents:
9
diff
changeset
|
86 |
\begin{array}{@{}l@{\hspace{5mm}}l@{}} |
199 | 87 |
6, 3, 10, 5, 16, 8, 4, 2, 1 & \text{(= 8 steps)}\\ |
88 |
9, 28, 14, 7, 22, 11, 34, 17, 52, 26, 13, 40, 20, 10, 5, 16, 8, 4, 2, 1 & \text{(= 19 steps)}\\ |
|
11
417869f65585
updated
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents:
9
diff
changeset
|
89 |
\end{array} |
417869f65585
updated
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents:
9
diff
changeset
|
90 |
\] |
417869f65585
updated
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents:
9
diff
changeset
|
91 |
|
417869f65585
updated
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents:
9
diff
changeset
|
92 |
\noindent |
417869f65585
updated
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents:
9
diff
changeset
|
93 |
As you can see, the numbers go up and down like a roller-coaster, but |
335 | 94 |
curiously they seem to always terminate in $1$. Nobody knows why. The |
95 |
conjecture is that this will \emph{always} happen for every number |
|
96 |
greater than 0.\footnote{While it is relatively easy to test this |
|
97 |
conjecture with particular numbers, it is an interesting open problem to |
|
98 |
\emph{prove} that the conjecture is true for \emph{all} numbers ($> 0$). |
|
99 |
Paul Erd\"o{}s, a famous mathematician you might have heard about, said |
|
100 |
about this conjecture: ``Mathematics may not [yet] be ready for such |
|
101 |
problems.'' and also offered a \$500 cash prize for its solution. |
|
102 |
Jeffrey Lagarias, another mathematician, claimed that based only on |
|
103 |
known information about this problem, ``this is an extraordinarily |
|
429 | 104 |
difficult problem, completely out of reach of present-day mathematics.'' |
335 | 105 |
There is also a \href{https://xkcd.com/710/}{xkcd} cartoon about this |
106 |
conjecture\here{https://xkcd.com/710/}). If you are able to solve this |
|
107 |
conjecture, you will definitely get famous.}\bigskip |
|
18 | 108 |
|
109 |
\noindent |
|
199 | 110 |
\textbf{Tasks} |
18 | 111 |
|
112 |
\begin{itemize} |
|
113 |
\item[(1)] You are asked to implement a recursive function that |
|
114 |
calculates the number of steps needed until a series ends |
|
199 | 115 |
with $1$. In case of starting with $6$, it takes $8$ steps and in |
335 | 116 |
case of starting with $9$, it takes $19$ (see above). We assume it |
117 |
takes $0$ steps, if we start with $1$. In order to |
|
18 | 118 |
try out this function with large numbers, you should use |
119 |
\texttt{Long} as argument type, instead of \texttt{Int}. You can |
|
120 |
assume this function will be called with numbers between $1$ and |
|
335 | 121 |
$1$ Million. \hfill[1 Mark] |
18 | 122 |
|
123 |
\item[(2)] Write a second function that takes an upper bound as |
|
282 | 124 |
an argument and calculates the steps for all numbers in the range from |
210 | 125 |
1 up to this bound (the bound including). It returns the maximum number of |
126 |
steps and the corresponding number that needs that many steps. More |
|
127 |
precisely it returns a pair where the first component is the number |
|
128 |
of steps and the second is the corresponding number. \hfill\mbox{[1 |
|
129 |
Mark]} |
|
335 | 130 |
|
131 |
\item[(3)] Write a function that calculates \emph{hard |
|
132 |
numbers} \here{https://medium.com/cantors-paradise/the-collatz-conjecture-some-shocking-results-from-180-000-iterations-7fea130d0377} |
|
133 |
in the Collatz series---these are the last odd numbers just before a |
|
134 |
power of two is reached. For this, implement an |
|
135 |
\textit{is-power-of-two} function which tests whether a number is a |
|
136 |
power of two. The easiest way to implement this is by using the |
|
336 | 137 |
bit-operator $\&$ of Scala. For a power of two, say $n$ with $n > 0$, it |
335 | 138 |
holds that $n \;\&\; (n - 1)$ is equal to zero. I let you think why |
344 | 139 |
this is the case. |
140 |
||
141 |
The function \textit{is-hard} calculates whether |
|
335 | 142 |
$3n + 1$ is a power of two. Finally the \textit{last-odd} function |
143 |
calculates the last odd number before a power of 2 in the Collatz |
|
344 | 144 |
series. This means for example when starting with 9, |
335 | 145 |
we receive 5 as the last odd number. Surprisingly a lot of numbers |
146 |
have 5 as last-odd number. But for example for 113 we obtain 85, |
|
147 |
because of the series |
|
148 |
% |
|
149 |
\[113, 340, 170, \,\fbox{85}\,, 256, 128, 64, 32, 16, 8, 4, 2, 1\] |
|
150 |
||
151 |
The \textit{last-odd} function will only be called with numbers that are not |
|
396 | 152 |
powers of 2 themselves.\hfill\mbox{[1 Mark]} |
18 | 153 |
\end{itemize} |
11
417869f65585
updated
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents:
9
diff
changeset
|
154 |
|
417869f65585
updated
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents:
9
diff
changeset
|
155 |
\noindent |
429 | 156 |
\textbf{Test Data:} Some test cases are: |
11
417869f65585
updated
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents:
9
diff
changeset
|
157 |
|
417869f65585
updated
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents:
9
diff
changeset
|
158 |
\begin{itemize} |
199 | 159 |
\item 1 to 10 where $9$ takes 19 steps |
160 |
\item 1 to 100 where $97$ takes 118 steps, |
|
161 |
\item 1 to 1,000 where $871$ takes 178 steps, |
|
162 |
\item 1 to 10,000 where $6,171$ takes 261 steps, |
|
163 |
\item 1 to 100,000 where $77,031$ takes 350 steps, |
|
164 |
\item 1 to 1 Million where $837,799$ takes 524 steps |
|
165 |
%% runs out of stack space |
|
166 |
%% \item[$\bullet$] $1 - 10$ million where $8,400,511$ takes 685 steps |
|
336 | 167 |
\item 21 is the last odd number for 84 |
168 |
\item 341 is the last odd number for 201, 604, 605 and 8600 |
|
169 |
||
196 | 170 |
\end{itemize} |
18 | 171 |
|
201 | 172 |
|
127 | 173 |
|
174 |
||
196 | 175 |
|
199 | 176 |
|
177 |
\end{document} |
|
178 |
||
276 | 179 |
|
180 |
%%%%%%% Historical Stuff |
|
199 | 181 |
\newpage |
192 | 182 |
|
196 | 183 |
This part is about web-scraping and list-processing in Scala. It uses |
184 |
online data about the per-capita alcohol consumption for each country |
|
185 |
(per year?), and a file containing the data about the population size of |
|
186 |
each country. From this data you are supposed to estimate how many |
|
187 |
litres of pure alcohol are consumed worldwide.\bigskip |
|
192 | 188 |
|
189 |
\noindent |
|
196 | 190 |
\textbf{Tasks (file alcohol.scala):} |
192 | 191 |
|
192 |
\begin{itemize} |
|
196 | 193 |
\item[(1)] Write a function that given an URL requests a |
194 |
comma-separated value (CSV) list. We are interested in the list |
|
195 |
from the following URL |
|
192 | 196 |
|
197 |
\begin{center} |
|
196 | 198 |
\url{https://raw.githubusercontent.com/fivethirtyeight/data/master/alcohol-consumption/drinks.csv} |
192 | 199 |
\end{center} |
127 | 200 |
|
196 | 201 |
\noindent Your function should take a string (the URL) as input, and |
202 |
produce a list of strings as output, where each string is one line in |
|
203 |
the corresponding CSV-list. This list from the URL above should |
|
204 |
contain 194 lines.\medskip |
|
192 | 205 |
|
206 |
\noindent |
|
196 | 207 |
Write another function that can read the file \texttt{population.csv} |
201 | 208 |
from disk (the file is distributed with the assignment). This |
196 | 209 |
function should take a string as argument, the file name, and again |
210 |
return a list of strings corresponding to each entry in the |
|
211 |
CSV-list. For \texttt{population.csv}, this list should contain 216 |
|
212 |
lines.\hfill[1 Mark] |
|
213 |
||
214 |
||
215 |
\item[(2)] Unfortunately, the CSV-lists contain a lot of ``junk'' and we |
|
216 |
need to extract the data that interests us. From the header of the |
|
217 |
alcohol list, you can see there are 5 columns |
|
218 |
||
219 |
\begin{center} |
|
220 |
\begin{tabular}{l} |
|
221 |
\texttt{country (name),}\\ |
|
222 |
\texttt{beer\_servings,}\\ |
|
223 |
\texttt{spirit\_servings,}\\ |
|
224 |
\texttt{wine\_servings,}\\ |
|
225 |
\texttt{total\_litres\_of\_pure\_alcohol} |
|
226 |
\end{tabular} |
|
227 |
\end{center} |
|
228 |
||
229 |
\noindent |
|
230 |
Write a function that extracts the data from the first column, |
|
231 |
the country name, and the data from the fifth column (converted into |
|
232 |
a \texttt{Double}). For this go through each line of the CSV-list |
|
233 |
(except the first line), use the \texttt{split(",")} function to |
|
234 |
divide each line into an array of 5 elements. Keep the data from the |
|
235 |
first and fifth element in these arrays.\medskip |
|
192 | 236 |
|
196 | 237 |
\noindent |
238 |
Write another function that processes the population size list. This |
|
239 |
is already of the form country name and population size.\footnote{Your |
|
240 |
friendly lecturer already did the messy processing for you from the |
|
241 |
Worldbank database, see \url{https://github.com/datasets/population/tree/master/data} for the original.} Again, split the |
|
242 |
strings according to the commas. However, this time generate a |
|
243 |
\texttt{Map} from country names to population sizes.\hfill[1 Mark] |
|
244 |
||
245 |
\item[(3)] In (2) you generated the data about the alcohol consumption |
|
343 | 246 |
per-capita for each country, and also the population size for each |
196 | 247 |
country. From this generate next a sorted(!) list of the overall |
248 |
alcohol consumption for each country. The list should be sorted from |
|
249 |
highest alcohol consumption to lowest. The difficulty is that the |
|
250 |
data is scraped off from ``random'' sources on the Internet and |
|
251 |
annoyingly the spelling of some country names does not always agree in both |
|
252 |
lists. For example the alcohol list contains |
|
253 |
\texttt{Bosnia-Herzegovina}, while the population writes this country as |
|
254 |
\texttt{Bosnia and Herzegovina}. In your sorted |
|
255 |
overall list include only countries from the alcohol list, whose |
|
256 |
exact country name is also in the population size list. This means |
|
257 |
you can ignore countries like Bosnia-Herzegovina from the overall |
|
258 |
alcohol consumption. There are 177 countries where the names |
|
259 |
agree. The UK is ranked 10th on this list by |
|
260 |
consuming 671,976,864 Litres of pure alcohol each year.\medskip |
|
261 |
||
262 |
\noindent |
|
263 |
Finally, write another function that takes an integer, say |
|
264 |
\texttt{n}, as argument. You can assume this integer is between 0 |
|
265 |
and 177 (the number of countries in the sorted list above). The |
|
266 |
function should return a triple, where the first component is the |
|
267 |
sum of the alcohol consumption in all countries (on the list); the |
|
268 |
second component is the sum of the \texttt{n}-highest alcohol |
|
269 |
consumers on the list; and the third component is the percentage the |
|
270 |
\texttt{n}-highest alcohol consumers drink with respect to the |
|
271 |
the world consumption. You will see that according to our data, 164 |
|
272 |
countries (out of 177) gobble up 100\% of the World alcohol |
|
273 |
consumption.\hfill\mbox{[1 Mark]} |
|
18 | 274 |
\end{itemize} |
11
417869f65585
updated
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents:
9
diff
changeset
|
275 |
|
417869f65585
updated
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents:
9
diff
changeset
|
276 |
\noindent |
196 | 277 |
\textbf{Hints:} useful list functions: \texttt{.drop(n)}, |
278 |
\texttt{.take(n)} for dropping or taking some elements in a list, |
|
279 |
\texttt{.getLines} for separating lines in a string; |
|
280 |
\texttt{.sortBy(\_.\_2)} sorts a list of pairs according to the second |
|
281 |
elements in the pairs---the sorting is done from smallest to highest; |
|
282 |
useful \texttt{Map} functions: \texttt{.toMap} converts a list of |
|
283 |
pairs into a \texttt{Map}, \texttt{.isDefinedAt(k)} tests whether the |
|
284 |
map is defined at that key, that is would produce a result when |
|
285 |
called with this key; useful data functions: \texttt{Source.fromURL}, |
|
286 |
\texttt{Source.fromFile} for obtaining a webpage and reading a file. |
|
127 | 287 |
|
196 | 288 |
\newpage |
289 |
||
290 |
||
18 | 291 |
|
129 | 292 |
|
135 | 293 |
|
6 | 294 |
|
295 |
%%% Local Variables: |
|
296 |
%%% mode: latex |
|
297 |
%%% TeX-master: t |
|
298 |
%%% End: |