|      1 % !TEX program = xelatex |         | 
|      2 \documentclass{article} |         | 
|      3 \usepackage{../style} |         | 
|      4 \usepackage{disclaimer} |         | 
|      5 \usepackage{../langs} |         | 
|      6  |         | 
|      7  |         | 
|      8  |         | 
|      9 \begin{document} |         | 
|     10  |         | 
|     11 \section*{Preliminary Part 1 (Scala, 3 Marks)} |         | 
|     12  |         | 
|     13 \mbox{}\hfill\textit{``The most effective debugging tool is still careful thought,}\\ |         | 
|     14 \mbox{}\hfill\textit{coupled with judiciously placed print statements.''}\smallskip\\ |         | 
|     15 \mbox{}\hfill\textit{ --- Brian W. Kernighan, in Unix for Beginners (1979)}\bigskip |         | 
|     16  |         | 
|     17   |         | 
|     18 \IMPORTANT{This part is about Scala. It is due on \cwSIX{} at 5pm and worth 3\%. |         | 
|     19 Any 1\% you achieve in the preliminary part counts as your ``weekly engagement''.} |         | 
|     20  |         | 
|     21 \noindent |         | 
|     22 Also note that the running time of each part will be restricted to a |         | 
|     23 maximum of 30 seconds on my laptop. |         | 
|     24  |         | 
|     25 \DISCLAIMER{} |         | 
|     26  |         | 
|     27 \subsection*{Reference Implementation} |         | 
|     28  |         | 
|     29 Like the C++ assignments, the Scala assignments will work like this: you |         | 
|     30 push your files to GitHub and receive (after sometimes a long delay) some |         | 
|     31 automated feedback. In the end we take a snapshot of the submitted files and |         | 
|     32 apply an automated marking script to them.\medskip |         | 
|     33  |         | 
|     34 \noindent |         | 
|     35 In addition, the Scala coursework comes with a reference implementation |         | 
|     36 in form of \texttt{jar}-files. This allows you to run any test cases on |         | 
|     37 your own computer. For example you can call Scala on the command line |         | 
|     38 with the option \texttt{-cp collatz.jar} and then query any function |         | 
|     39 from the template file. Say you want to find out what the functions |         | 
|     40 \texttt{collatz} and \texttt{collatz\_max} produce: for this you just |         | 
|     41 need to prefix them with the object name \texttt{CW6a}. If you want to |         | 
|     42 find out what these functions produce for the argument \texttt{6}, you |         | 
|     43 would type something like: |         | 
|     44  |         | 
|     45 \begin{lstlisting}[language={},numbers=none,basicstyle=\ttfamily\small] |         | 
|     46 $ scala -cp collatz.jar |         | 
|     47    |         | 
|     48 scala> CW6a.collatz(6) |         | 
|     49 ... |         | 
|     50 scala> CW6a.collatz_max(6) |         | 
|     51 ... |         | 
|     52 \end{lstlisting}%$ |         | 
|     53  |         | 
|     54 \subsection*{Hints} |         | 
|     55  |         | 
|     56 \noindent |         | 
|     57 \textbf{For the Preliminary Part:} useful math operators: \texttt{\%} for modulo, \texttt{\&} for bit-wise and; useful |         | 
|     58 functions: \mbox{\texttt{(1\,to\,10)}} for ranges, \texttt{.toInt}, |         | 
|     59 \texttt{.toList} for conversions, you can use \texttt{List(...).max} for the |         | 
|     60 maximum of a list, \texttt{List(...).indexOf(...)} for the first index of |         | 
|     61 a value in a list.\bigskip |         | 
|     62  |         | 
|     63  |         | 
|     64  |         | 
|     65 \newpage |         | 
|     66 \subsection*{Preliminary Part (3 Marks, file collatz.scala)} |         | 
|     67  |         | 
|     68 This part is about function definitions and recursion. You are asked |         | 
|     69 to implement a Scala program that tests examples of the |         | 
|     70 \emph{$3n + 1$-conjecture}, also called \emph{Collatz |         | 
|     71   conjecture}.\video{https://www.youtube.com./watch?v=LqKpkdRRLZw} |         | 
|     72 This conjecture can be described as follows: Start with any positive |         | 
|     73 number $n$ greater than $0$: |         | 
|     74  |         | 
|     75 \begin{itemize} |         | 
|     76 \item If $n$ is even, divide it by $2$ to obtain $n / 2$. |         | 
|     77 \item If $n$ is odd, multiply it by $3$ and add $1$ to obtain $3n + |         | 
|     78   1$. |         | 
|     79 \item Repeat this process and you will always end up with $1$. |         | 
|     80 \end{itemize} |         | 
|     81  |         | 
|     82 \noindent |         | 
|     83 For example if you start with, say, $6$ and $9$, you obtain the |         | 
|     84 two \emph{Collatz series} |         | 
|     85 % |         | 
|     86 \[ |         | 
|     87 \begin{array}{@{}l@{\hspace{5mm}}l@{}} |         | 
|     88 6, 3, 10, 5, 16, 8, 4, 2, 1 & \text{(= 8 steps)}\\ |         | 
|     89 9, 28, 14, 7, 22, 11, 34, 17, 52, 26, 13, 40, 20, 10, 5, 16, 8, 4, 2, 1  & \text{(= 19 steps)}\\ |         | 
|     90 \end{array} |         | 
|     91 \] |         | 
|     92  |         | 
|     93 \noindent |         | 
|     94 As you can see, the numbers go up and down like a roller-coaster, but |         | 
|     95 curiously they seem to always terminate in $1$. Nobody knows why. The |         | 
|     96 conjecture is that this will \emph{always} happen for every number |         | 
|     97 greater than 0.\footnote{While it is relatively easy to test this |         | 
|     98 conjecture with particular numbers, it is an interesting open problem to |         | 
|     99 \emph{prove} that the conjecture is true for \emph{all} numbers ($> 0$). |         | 
|    100 Paul Erd\"o{}s, a famous mathematician you might have heard about, said |         | 
|    101 about this conjecture: ``Mathematics may not [yet] be ready for such |         | 
|    102 problems.'' and also offered a \$500 cash prize for its solution. |         | 
|    103 Jeffrey Lagarias, another mathematician, claimed that based only on |         | 
|    104 known information about this problem, ``this is an extraordinarily |         | 
|    105 difficult problem, completely out of reach of present day mathematics.'' |         | 
|    106 There is also a \href{https://xkcd.com/710/}{xkcd} cartoon about this |         | 
|    107 conjecture\here{https://xkcd.com/710/}). If you are able to solve this |         | 
|    108 conjecture, you will definitely get famous.}\bigskip |         | 
|    109  |         | 
|    110 \noindent |         | 
|    111 \textbf{Tasks} |         | 
|    112  |         | 
|    113 \begin{itemize} |         | 
|    114 \item[(1)] You are asked to implement a recursive function that |         | 
|    115   calculates the number of steps needed until a series ends |         | 
|    116   with $1$. In case of starting with $6$, it takes $8$ steps and in |         | 
|    117   case of starting with $9$, it takes $19$ (see above). We assume it  |         | 
|    118   takes $0$ steps, if we start with $1$. In order to |         | 
|    119   try out this function with large numbers, you should use |         | 
|    120   \texttt{Long} as argument type, instead of \texttt{Int}.  You can |         | 
|    121   assume this function will be called with numbers between $1$ and |         | 
|    122   $1$ Million. \hfill[1 Mark] |         | 
|    123  |         | 
|    124 \item[(2)] Write a second function that takes an upper bound as |         | 
|    125   an argument and calculates the steps for all numbers in the range from |         | 
|    126   1 up to this bound (the bound including). It returns the maximum number of |         | 
|    127   steps and the corresponding number that needs that many steps.  More |         | 
|    128   precisely it returns a pair where the first component is the number |         | 
|    129   of steps and the second is the corresponding number. \hfill\mbox{[1 |         | 
|    130     Mark]} |         | 
|    131  |         | 
|    132 \item[(3)] Write a function that calculates \emph{hard |         | 
|    133     numbers} \here{https://medium.com/cantors-paradise/the-collatz-conjecture-some-shocking-results-from-180-000-iterations-7fea130d0377} |         | 
|    134   in the Collatz series---these are the last odd numbers just before a |         | 
|    135   power of two is reached.  For this, implement an |         | 
|    136   \textit{is-power-of-two} function which tests whether a number is a |         | 
|    137   power of two. The easiest way to implement this is by using the |         | 
|    138   bit-operator $\&$ of Scala. For a power of two, say $n$ with $n > 0$, it |         | 
|    139   holds that $n \;\&\; (n - 1)$ is equal to zero. I let you think why |         | 
|    140   this is the case. |         | 
|    141  |         | 
|    142   The function \textit{is-hard} calculates whether |         | 
|    143   $3n + 1$ is a power of two.  Finally the \textit{last-odd} function |         | 
|    144   calculates the last odd number before a power of 2 in the Collatz |         | 
|    145   series. This means for example when starting with 9, |         | 
|    146   we receive 5 as the last odd number.  Surprisingly a lot of numbers |         | 
|    147   have 5 as last-odd number. But for example for 113 we obtain 85, |         | 
|    148   because of the series |         | 
|    149   % |         | 
|    150   \[113, 340, 170, \,\fbox{85}\,, 256, 128, 64, 32, 16, 8, 4, 2, 1\] |         | 
|    151  |         | 
|    152   The \textit{last-odd} function will only be called with numbers that are not |         | 
|    153   powers of 2 themselves. |         | 
|    154 \end{itemize} |         | 
|    155  |         | 
|    156 \noindent |         | 
|    157 \textbf{Test Data:} Some test ranges and cases are: |         | 
|    158  |         | 
|    159 \begin{itemize} |         | 
|    160 \item 1 to 10 where $9$ takes 19 steps  |         | 
|    161 \item 1 to 100 where $97$ takes 118 steps, |         | 
|    162 \item 1 to 1,000 where $871$ takes 178 steps, |         | 
|    163 \item 1 to 10,000 where $6,171$ takes 261 steps, |         | 
|    164 \item 1 to 100,000 where $77,031$ takes 350 steps,  |         | 
|    165 \item 1 to 1 Million where $837,799$ takes 524 steps |         | 
|    166   %% runs out of stack space |         | 
|    167   %% \item[$\bullet$] $1 - 10$ million where $8,400,511$ takes 685 steps |         | 
|    168 \item 21 is the last odd number for 84 |         | 
|    169 \item 341 is the last odd number for 201, 604, 605 and 8600 |         | 
|    170    |         | 
|    171 \end{itemize} |         | 
|    172    |         | 
|    173  |         | 
|    174  |         | 
|    175  |         | 
|    176  |         | 
|    177  |         | 
|    178 \end{document} |         | 
|    179  |         | 
|    180  |         | 
|    181 %%%%%%% Historical Stuff |         | 
|    182 \newpage |         | 
|    183  |         | 
|    184 This part is about web-scraping and list-processing in Scala. It uses |         | 
|    185 online data about the per-capita alcohol consumption for each country |         | 
|    186 (per year?), and a file containing the data about the population size of |         | 
|    187 each country.  From this data you are supposed to estimate how many |         | 
|    188 litres of pure alcohol are consumed worldwide.\bigskip |         | 
|    189  |         | 
|    190 \noindent |         | 
|    191 \textbf{Tasks (file alcohol.scala):} |         | 
|    192  |         | 
|    193 \begin{itemize} |         | 
|    194 \item[(1)] Write a function that given an URL requests a |         | 
|    195   comma-separated value (CSV) list.  We are interested in the list |         | 
|    196   from the following URL |         | 
|    197  |         | 
|    198 \begin{center} |         | 
|    199   \url{https://raw.githubusercontent.com/fivethirtyeight/data/master/alcohol-consumption/drinks.csv} |         | 
|    200 \end{center} |         | 
|    201  |         | 
|    202 \noindent Your function should take a string (the URL) as input, and |         | 
|    203 produce a list of strings as output, where each string is one line in |         | 
|    204 the corresponding CSV-list.  This list from the URL above should |         | 
|    205 contain 194 lines.\medskip |         | 
|    206  |         | 
|    207 \noindent |         | 
|    208 Write another function that can read the file \texttt{population.csv} |         | 
|    209 from disk (the file is distributed with the assignment). This |         | 
|    210 function should take a string as argument, the file name, and again |         | 
|    211 return a list of strings corresponding to each entry in the |         | 
|    212 CSV-list. For \texttt{population.csv}, this list should contain 216 |         | 
|    213 lines.\hfill[1 Mark] |         | 
|    214  |         | 
|    215  |         | 
|    216 \item[(2)] Unfortunately, the CSV-lists contain a lot of ``junk'' and we |         | 
|    217   need to extract the data that interests us.  From the header of the |         | 
|    218   alcohol list, you can see there are 5 columns |         | 
|    219    |         | 
|    220   \begin{center} |         | 
|    221     \begin{tabular}{l} |         | 
|    222       \texttt{country (name),}\\ |         | 
|    223       \texttt{beer\_servings,}\\ |         | 
|    224       \texttt{spirit\_servings,}\\ |         | 
|    225       \texttt{wine\_servings,}\\ |         | 
|    226       \texttt{total\_litres\_of\_pure\_alcohol} |         | 
|    227     \end{tabular}   |         | 
|    228   \end{center} |         | 
|    229  |         | 
|    230   \noindent |         | 
|    231   Write a function that extracts the data from the first column, |         | 
|    232   the country name, and the data from the fifth column (converted into |         | 
|    233   a \texttt{Double}). For this go through each line of the CSV-list |         | 
|    234   (except the first line), use the \texttt{split(",")} function to |         | 
|    235   divide each line into an array of 5 elements. Keep the data from the |         | 
|    236   first and fifth element in these arrays.\medskip |         | 
|    237  |         | 
|    238   \noindent |         | 
|    239   Write another function that processes the population size list. This |         | 
|    240   is already of the form country name and population size.\footnote{Your |         | 
|    241     friendly lecturer already did the messy processing for you from the |         | 
|    242   Worldbank database, see \url{https://github.com/datasets/population/tree/master/data} for the original.} Again, split the |         | 
|    243   strings according to the commas. However, this time generate a |         | 
|    244   \texttt{Map} from country names to population sizes.\hfill[1 Mark] |         | 
|    245  |         | 
|    246 \item[(3)] In (2) you generated the data about the alcohol consumption |         | 
|    247   per-capita for each country, and also the population size for each |         | 
|    248   country. From this generate next a sorted(!) list of the overall |         | 
|    249   alcohol consumption for each country. The list should be sorted from |         | 
|    250   highest alcohol consumption to lowest. The difficulty is that the |         | 
|    251   data is scraped off from ``random'' sources on the Internet and |         | 
|    252   annoyingly the spelling of some country names does not always agree in both |         | 
|    253   lists. For example the alcohol list contains |         | 
|    254   \texttt{Bosnia-Herzegovina}, while the population writes this country as |         | 
|    255   \texttt{Bosnia and Herzegovina}. In your sorted |         | 
|    256   overall list include only countries from the alcohol list, whose |         | 
|    257   exact country name is also in the population size list. This means |         | 
|    258   you can ignore countries like Bosnia-Herzegovina from the overall |         | 
|    259   alcohol consumption. There are 177 countries where the names |         | 
|    260   agree. The UK is ranked 10th on this list by |         | 
|    261   consuming 671,976,864 Litres of pure alcohol each year.\medskip |         | 
|    262    |         | 
|    263   \noindent |         | 
|    264   Finally, write another function that takes an integer, say |         | 
|    265   \texttt{n}, as argument. You can assume this integer is between 0 |         | 
|    266   and 177 (the number of countries in the sorted list above).  The |         | 
|    267   function should return a triple, where the first component is the |         | 
|    268   sum of the alcohol consumption in all countries (on the list); the |         | 
|    269   second component is the sum of the \texttt{n}-highest alcohol |         | 
|    270   consumers on the list; and the third component is the percentage the |         | 
|    271   \texttt{n}-highest alcohol consumers drink with respect to the |         | 
|    272   the world consumption. You will see that according to our data, 164 |         | 
|    273   countries (out of 177) gobble up 100\% of the World alcohol |         | 
|    274   consumption.\hfill\mbox{[1 Mark]} |         | 
|    275 \end{itemize} |         | 
|    276  |         | 
|    277 \noindent |         | 
|    278 \textbf{Hints:} useful list functions: \texttt{.drop(n)}, |         | 
|    279 \texttt{.take(n)} for dropping or taking some elements in a list, |         | 
|    280 \texttt{.getLines} for separating lines in a string; |         | 
|    281 \texttt{.sortBy(\_.\_2)} sorts a list of pairs according to the second |         | 
|    282 elements in the pairs---the sorting is done from smallest to highest; |         | 
|    283 useful \texttt{Map} functions: \texttt{.toMap} converts a list of |         | 
|    284 pairs into a \texttt{Map}, \texttt{.isDefinedAt(k)} tests whether the |         | 
|    285 map is defined at that key, that is would produce a result when |         | 
|    286 called with this key; useful data functions: \texttt{Source.fromURL}, |         | 
|    287 \texttt{Source.fromFile} for obtaining a webpage and reading a file. |         | 
|    288  |         | 
|    289 \newpage |         | 
|    290  |         | 
|    291  |         | 
|    292  |         | 
|    293  |         | 
|    294  |         | 
|    295  |         | 
|    296 %%% Local Variables:  |         | 
|    297 %%% mode: latex |         | 
|    298 %%% TeX-master: t |         | 
|    299 %%% End:  |         |