|
1 % !TEX program = xelatex |
|
2 \documentclass{article} |
|
3 \usepackage{../style} |
|
4 \usepackage{disclaimer} |
|
5 \usepackage{../langs} |
|
6 |
|
7 |
|
8 |
|
9 \begin{document} |
|
10 |
|
11 \section*{Preliminary Part 6 (Scala, 3 Marks)} |
|
12 |
|
13 \mbox{}\hfill\textit{``The most effective debugging tool is still careful thought,}\\ |
|
14 \mbox{}\hfill\textit{coupled with judiciously placed print statements.''}\smallskip\\ |
|
15 \mbox{}\hfill\textit{ --- Brian W. Kernighan, in Unix for Beginners (1979)}\bigskip |
|
16 |
|
17 |
|
18 \IMPORTANT{This part is about Scala. It is due on \cwSIX{} at 5pm and worth 3\%.} |
|
19 |
|
20 \noindent |
|
21 Also note that the running time of each part will be restricted to a |
|
22 maximum of 30 seconds on my laptop. |
|
23 |
|
24 \DISCLAIMER{} |
|
25 |
|
26 \subsection*{Reference Implementation} |
|
27 |
|
28 Like the C++ assignments, the Scala assignments will work like this: you |
|
29 push your files to GitHub and receive (after sometimes a long delay) some |
|
30 automated feedback. In the end we take a snapshot of the submitted files and |
|
31 apply an automated marking script to them.\medskip |
|
32 |
|
33 \noindent |
|
34 In addition, the Scala coursework comes with a reference implementation |
|
35 in form of \texttt{jar}-files. This allows you to run any test cases on |
|
36 your own computer. For example you can call Scala on the command line |
|
37 with the option \texttt{-cp collatz.jar} and then query any function |
|
38 from the template file. Say you want to find out what the functions |
|
39 \texttt{collatz} and \texttt{collatz\_max} produce: for this you just |
|
40 need to prefix them with the object name \texttt{CW6a}. If you want to |
|
41 find out what these functions produce for the argument \texttt{6}, you |
|
42 would type something like: |
|
43 |
|
44 \begin{lstlisting}[language={},numbers=none,basicstyle=\ttfamily\small] |
|
45 $ scala -cp collatz.jar |
|
46 |
|
47 scala> CW6a.collatz(6) |
|
48 ... |
|
49 scala> CW6a.collatz_max(6) |
|
50 ... |
|
51 \end{lstlisting}%$ |
|
52 |
|
53 \subsection*{Hints} |
|
54 |
|
55 \noindent |
|
56 \textbf{For Preliminary Part:} useful math operators: \texttt{\%} for modulo, \texttt{\&} for bit-wise and; useful |
|
57 functions: \mbox{\texttt{(1\,to\,10)}} for ranges, \texttt{.toInt}, |
|
58 \texttt{.toList} for conversions, you can use \texttt{List(...).max} for the |
|
59 maximum of a list, \texttt{List(...).indexOf(...)} for the first index of |
|
60 a value in a list.\bigskip |
|
61 |
|
62 |
|
63 |
|
64 \newpage |
|
65 \subsection*{Preliminary Part (3 Marks, file collatz.scala)} |
|
66 |
|
67 This part is about function definitions and recursion. You are asked |
|
68 to implement a Scala program that tests examples of the |
|
69 \emph{$3n + 1$-conjecture}, also called \emph{Collatz |
|
70 conjecture}.\video{https://www.youtube.com./watch?v=LqKpkdRRLZw} |
|
71 This conjecture can be described as follows: Start with any positive |
|
72 number $n$ greater than $0$: |
|
73 |
|
74 \begin{itemize} |
|
75 \item If $n$ is even, divide it by $2$ to obtain $n / 2$. |
|
76 \item If $n$ is odd, multiply it by $3$ and add $1$ to obtain $3n + |
|
77 1$. |
|
78 \item Repeat this process and you will always end up with $1$. |
|
79 \end{itemize} |
|
80 |
|
81 \noindent |
|
82 For example if you start with, say, $6$ and $9$, you obtain the |
|
83 two \emph{Collatz series} |
|
84 % |
|
85 \[ |
|
86 \begin{array}{@{}l@{\hspace{5mm}}l@{}} |
|
87 6, 3, 10, 5, 16, 8, 4, 2, 1 & \text{(= 8 steps)}\\ |
|
88 9, 28, 14, 7, 22, 11, 34, 17, 52, 26, 13, 40, 20, 10, 5, 16, 8, 4, 2, 1 & \text{(= 19 steps)}\\ |
|
89 \end{array} |
|
90 \] |
|
91 |
|
92 \noindent |
|
93 As you can see, the numbers go up and down like a roller-coaster, but |
|
94 curiously they seem to always terminate in $1$. Nobody knows why. The |
|
95 conjecture is that this will \emph{always} happen for every number |
|
96 greater than 0.\footnote{While it is relatively easy to test this |
|
97 conjecture with particular numbers, it is an interesting open problem to |
|
98 \emph{prove} that the conjecture is true for \emph{all} numbers ($> 0$). |
|
99 Paul Erd\"o{}s, a famous mathematician you might have heard about, said |
|
100 about this conjecture: ``Mathematics may not [yet] be ready for such |
|
101 problems.'' and also offered a \$500 cash prize for its solution. |
|
102 Jeffrey Lagarias, another mathematician, claimed that based only on |
|
103 known information about this problem, ``this is an extraordinarily |
|
104 difficult problem, completely out of reach of present day mathematics.'' |
|
105 There is also a \href{https://xkcd.com/710/}{xkcd} cartoon about this |
|
106 conjecture\here{https://xkcd.com/710/}). If you are able to solve this |
|
107 conjecture, you will definitely get famous.}\bigskip |
|
108 |
|
109 \noindent |
|
110 \textbf{Tasks} |
|
111 |
|
112 \begin{itemize} |
|
113 \item[(1)] You are asked to implement a recursive function that |
|
114 calculates the number of steps needed until a series ends |
|
115 with $1$. In case of starting with $6$, it takes $8$ steps and in |
|
116 case of starting with $9$, it takes $19$ (see above). We assume it |
|
117 takes $0$ steps, if we start with $1$. In order to |
|
118 try out this function with large numbers, you should use |
|
119 \texttt{Long} as argument type, instead of \texttt{Int}. You can |
|
120 assume this function will be called with numbers between $1$ and |
|
121 $1$ Million. \hfill[1 Mark] |
|
122 |
|
123 \item[(2)] Write a second function that takes an upper bound as |
|
124 an argument and calculates the steps for all numbers in the range from |
|
125 1 up to this bound (the bound including). It returns the maximum number of |
|
126 steps and the corresponding number that needs that many steps. More |
|
127 precisely it returns a pair where the first component is the number |
|
128 of steps and the second is the corresponding number. \hfill\mbox{[1 |
|
129 Mark]} |
|
130 |
|
131 \item[(3)] Write a function that calculates \emph{hard |
|
132 numbers} \here{https://medium.com/cantors-paradise/the-collatz-conjecture-some-shocking-results-from-180-000-iterations-7fea130d0377} |
|
133 in the Collatz series---these are the last odd numbers just before a |
|
134 power of two is reached. For this, implement an |
|
135 \textit{is-power-of-two} function which tests whether a number is a |
|
136 power of two. The easiest way to implement this is by using the |
|
137 bit-operator $\&$ of Scala. For a power of two, say $n$ with $n > 0$, it |
|
138 holds that $n \;\&\; (n - 1)$ is equal to zero. I let you think why |
|
139 this is the case. |
|
140 |
|
141 The function \textit{is-hard} calculates whether |
|
142 $3n + 1$ is a power of two. Finally the \textit{last-odd} function |
|
143 calculates the last odd number before a power of 2 in the Collatz |
|
144 series. This means for example when starting with 9, |
|
145 we receive 5 as the last odd number. Surprisingly a lot of numbers |
|
146 have 5 as last-odd number. But for example for 113 we obtain 85, |
|
147 because of the series |
|
148 % |
|
149 \[113, 340, 170, \,\fbox{85}\,, 256, 128, 64, 32, 16, 8, 4, 2, 1\] |
|
150 |
|
151 The \textit{last-odd} function will only be called with numbers that are not |
|
152 powers of 2 themselves. |
|
153 \end{itemize} |
|
154 |
|
155 \noindent |
|
156 \textbf{Test Data:} Some test ranges and cases are: |
|
157 |
|
158 \begin{itemize} |
|
159 \item 1 to 10 where $9$ takes 19 steps |
|
160 \item 1 to 100 where $97$ takes 118 steps, |
|
161 \item 1 to 1,000 where $871$ takes 178 steps, |
|
162 \item 1 to 10,000 where $6,171$ takes 261 steps, |
|
163 \item 1 to 100,000 where $77,031$ takes 350 steps, |
|
164 \item 1 to 1 Million where $837,799$ takes 524 steps |
|
165 %% runs out of stack space |
|
166 %% \item[$\bullet$] $1 - 10$ million where $8,400,511$ takes 685 steps |
|
167 \item 21 is the last odd number for 84 |
|
168 \item 341 is the last odd number for 201, 604, 605 and 8600 |
|
169 |
|
170 \end{itemize} |
|
171 |
|
172 |
|
173 |
|
174 |
|
175 |
|
176 |
|
177 \end{document} |
|
178 |
|
179 |
|
180 %%%%%%% Historical Stuff |
|
181 \newpage |
|
182 |
|
183 This part is about web-scraping and list-processing in Scala. It uses |
|
184 online data about the per-capita alcohol consumption for each country |
|
185 (per year?), and a file containing the data about the population size of |
|
186 each country. From this data you are supposed to estimate how many |
|
187 litres of pure alcohol are consumed worldwide.\bigskip |
|
188 |
|
189 \noindent |
|
190 \textbf{Tasks (file alcohol.scala):} |
|
191 |
|
192 \begin{itemize} |
|
193 \item[(1)] Write a function that given an URL requests a |
|
194 comma-separated value (CSV) list. We are interested in the list |
|
195 from the following URL |
|
196 |
|
197 \begin{center} |
|
198 \url{https://raw.githubusercontent.com/fivethirtyeight/data/master/alcohol-consumption/drinks.csv} |
|
199 \end{center} |
|
200 |
|
201 \noindent Your function should take a string (the URL) as input, and |
|
202 produce a list of strings as output, where each string is one line in |
|
203 the corresponding CSV-list. This list from the URL above should |
|
204 contain 194 lines.\medskip |
|
205 |
|
206 \noindent |
|
207 Write another function that can read the file \texttt{population.csv} |
|
208 from disk (the file is distributed with the assignment). This |
|
209 function should take a string as argument, the file name, and again |
|
210 return a list of strings corresponding to each entry in the |
|
211 CSV-list. For \texttt{population.csv}, this list should contain 216 |
|
212 lines.\hfill[1 Mark] |
|
213 |
|
214 |
|
215 \item[(2)] Unfortunately, the CSV-lists contain a lot of ``junk'' and we |
|
216 need to extract the data that interests us. From the header of the |
|
217 alcohol list, you can see there are 5 columns |
|
218 |
|
219 \begin{center} |
|
220 \begin{tabular}{l} |
|
221 \texttt{country (name),}\\ |
|
222 \texttt{beer\_servings,}\\ |
|
223 \texttt{spirit\_servings,}\\ |
|
224 \texttt{wine\_servings,}\\ |
|
225 \texttt{total\_litres\_of\_pure\_alcohol} |
|
226 \end{tabular} |
|
227 \end{center} |
|
228 |
|
229 \noindent |
|
230 Write a function that extracts the data from the first column, |
|
231 the country name, and the data from the fifth column (converted into |
|
232 a \texttt{Double}). For this go through each line of the CSV-list |
|
233 (except the first line), use the \texttt{split(",")} function to |
|
234 divide each line into an array of 5 elements. Keep the data from the |
|
235 first and fifth element in these arrays.\medskip |
|
236 |
|
237 \noindent |
|
238 Write another function that processes the population size list. This |
|
239 is already of the form country name and population size.\footnote{Your |
|
240 friendly lecturer already did the messy processing for you from the |
|
241 Worldbank database, see \url{https://github.com/datasets/population/tree/master/data} for the original.} Again, split the |
|
242 strings according to the commas. However, this time generate a |
|
243 \texttt{Map} from country names to population sizes.\hfill[1 Mark] |
|
244 |
|
245 \item[(3)] In (2) you generated the data about the alcohol consumption |
|
246 per-capita for each country, and also the population size for each |
|
247 country. From this generate next a sorted(!) list of the overall |
|
248 alcohol consumption for each country. The list should be sorted from |
|
249 highest alcohol consumption to lowest. The difficulty is that the |
|
250 data is scraped off from ``random'' sources on the Internet and |
|
251 annoyingly the spelling of some country names does not always agree in both |
|
252 lists. For example the alcohol list contains |
|
253 \texttt{Bosnia-Herzegovina}, while the population writes this country as |
|
254 \texttt{Bosnia and Herzegovina}. In your sorted |
|
255 overall list include only countries from the alcohol list, whose |
|
256 exact country name is also in the population size list. This means |
|
257 you can ignore countries like Bosnia-Herzegovina from the overall |
|
258 alcohol consumption. There are 177 countries where the names |
|
259 agree. The UK is ranked 10th on this list by |
|
260 consuming 671,976,864 Litres of pure alcohol each year.\medskip |
|
261 |
|
262 \noindent |
|
263 Finally, write another function that takes an integer, say |
|
264 \texttt{n}, as argument. You can assume this integer is between 0 |
|
265 and 177 (the number of countries in the sorted list above). The |
|
266 function should return a triple, where the first component is the |
|
267 sum of the alcohol consumption in all countries (on the list); the |
|
268 second component is the sum of the \texttt{n}-highest alcohol |
|
269 consumers on the list; and the third component is the percentage the |
|
270 \texttt{n}-highest alcohol consumers drink with respect to the |
|
271 the world consumption. You will see that according to our data, 164 |
|
272 countries (out of 177) gobble up 100\% of the World alcohol |
|
273 consumption.\hfill\mbox{[1 Mark]} |
|
274 \end{itemize} |
|
275 |
|
276 \noindent |
|
277 \textbf{Hints:} useful list functions: \texttt{.drop(n)}, |
|
278 \texttt{.take(n)} for dropping or taking some elements in a list, |
|
279 \texttt{.getLines} for separating lines in a string; |
|
280 \texttt{.sortBy(\_.\_2)} sorts a list of pairs according to the second |
|
281 elements in the pairs---the sorting is done from smallest to highest; |
|
282 useful \texttt{Map} functions: \texttt{.toMap} converts a list of |
|
283 pairs into a \texttt{Map}, \texttt{.isDefinedAt(k)} tests whether the |
|
284 map is defined at that key, that is would produce a result when |
|
285 called with this key; useful data functions: \texttt{Source.fromURL}, |
|
286 \texttt{Source.fromFile} for obtaining a webpage and reading a file. |
|
287 |
|
288 \newpage |
|
289 |
|
290 |
|
291 |
|
292 |
|
293 |
|
294 |
|
295 %%% Local Variables: |
|
296 %%% mode: latex |
|
297 %%% TeX-master: t |
|
298 %%% End: |