cws/cw01.tex
changeset 344 a2ac7e3fa330
parent 343 c8fcc0e0a57f
child 345 40657f9a4e4a
equal deleted inserted replaced
343:c8fcc0e0a57f 344:a2ac7e3fa330
     1 % !TEX program = xelatex
       
     2 \documentclass{article}
       
     3 \usepackage{../style}
       
     4 \usepackage{disclaimer}
       
     5 \usepackage{../langs}
       
     6 
       
     7 
       
     8 
       
     9 \begin{document}
       
    10 
       
    11 \section*{Preliminary Part 6 (Scala, 3 Marks)}
       
    12 
       
    13 \mbox{}\hfill\textit{``The most effective debugging tool is still careful thought,}\\
       
    14 \mbox{}\hfill\textit{coupled with judiciously placed print statements.''}\smallskip\\
       
    15 \mbox{}\hfill\textit{ --- Brian W. Kernighan, in Unix for Beginners (1979)}\bigskip
       
    16 
       
    17  
       
    18 \IMPORTANT{This part is about Scala. It is due on \cwSIX{} at 4pm and worth 3\%.}
       
    19 
       
    20 \noindent
       
    21 Also note that the running time of each part will be restricted to a
       
    22 maximum of 30 seconds on my laptop.
       
    23 
       
    24 \DISCLAIMER{}
       
    25 
       
    26 \subsection*{Reference Implementation}
       
    27 
       
    28 Like the C++ assignments, the Scala assignments will work like this: you
       
    29 push your files to GitHub and receive (after sometimes a long delay) some
       
    30 automated feedback. In the end we take a snapshot of the submitted files and
       
    31 apply an automated marking script to them.\medskip
       
    32 
       
    33 \noindent
       
    34 In addition, the Scala coursework comes with a reference implementation
       
    35 in form of \texttt{jar}-files. This allows you to run any test cases on
       
    36 your own computer. For example you can call Scala on the command line
       
    37 with the option \texttt{-cp collatz.jar} and then query any function
       
    38 from the template file. Say you want to find out what the functions
       
    39 \texttt{collatz} and \texttt{collatz\_max} produce: for this you just
       
    40 need to prefix them with the object name \texttt{CW6a}. If you want to
       
    41 find out what these functions produce for the argument \texttt{6}, you
       
    42 would type something like:
       
    43 
       
    44 \begin{lstlisting}[language={},numbers=none,basicstyle=\ttfamily\small]
       
    45 $ scala -cp collatz.jar
       
    46   
       
    47 scala> CW6a.collatz(6)
       
    48 ...
       
    49 scala> CW6a.collatz_max(6)
       
    50 ...
       
    51 \end{lstlisting}%$
       
    52 
       
    53 \subsection*{Hints}
       
    54 
       
    55 \noindent
       
    56 \textbf{For Preliminary Part:} useful math operators: \texttt{\%} for modulo, \texttt{\&} for bit-wise and; useful
       
    57 functions: \mbox{\texttt{(1\,to\,10)}} for ranges, \texttt{.toInt},
       
    58 \texttt{.toList} for conversions, you can use \texttt{List(...).max} for the
       
    59 maximum of a list, \texttt{List(...).indexOf(...)} for the first index of
       
    60 a value in a list.\bigskip
       
    61 
       
    62 
       
    63 
       
    64 \newpage
       
    65 \subsection*{Preliminary Part (3 Marks, file collatz.scala)}
       
    66 
       
    67 This part is about function definitions and recursion. You are asked
       
    68 to implement a Scala program that tests examples of the
       
    69 \emph{$3n + 1$-conjecture}, also called \emph{Collatz
       
    70   conjecture}.\video{https://www.youtube.com./watch?v=LqKpkdRRLZw}
       
    71 This conjecture can be described as follows: Start with any positive
       
    72 number $n$ greater than $0$:
       
    73 
       
    74 \begin{itemize}
       
    75 \item If $n$ is even, divide it by $2$ to obtain $n / 2$.
       
    76 \item If $n$ is odd, multiply it by $3$ and add $1$ to obtain $3n +
       
    77   1$.
       
    78 \item Repeat this process and you will always end up with $1$.
       
    79 \end{itemize}
       
    80 
       
    81 \noindent
       
    82 For example if you start with, say, $6$ and $9$, you obtain the
       
    83 two \emph{Collatz series}
       
    84 %
       
    85 \[
       
    86 \begin{array}{@{}l@{\hspace{5mm}}l@{}}
       
    87 6, 3, 10, 5, 16, 8, 4, 2, 1 & \text{(= 8 steps)}\\
       
    88 9, 28, 14, 7, 22, 11, 34, 17, 52, 26, 13, 40, 20, 10, 5, 16, 8, 4, 2, 1  & \text{(= 19 steps)}\\
       
    89 \end{array}
       
    90 \]
       
    91 
       
    92 \noindent
       
    93 As you can see, the numbers go up and down like a roller-coaster, but
       
    94 curiously they seem to always terminate in $1$. Nobody knows why. The
       
    95 conjecture is that this will \emph{always} happen for every number
       
    96 greater than 0.\footnote{While it is relatively easy to test this
       
    97 conjecture with particular numbers, it is an interesting open problem to
       
    98 \emph{prove} that the conjecture is true for \emph{all} numbers ($> 0$).
       
    99 Paul Erd\"o{}s, a famous mathematician you might have heard about, said
       
   100 about this conjecture: ``Mathematics may not [yet] be ready for such
       
   101 problems.'' and also offered a \$500 cash prize for its solution.
       
   102 Jeffrey Lagarias, another mathematician, claimed that based only on
       
   103 known information about this problem, ``this is an extraordinarily
       
   104 difficult problem, completely out of reach of present day mathematics.''
       
   105 There is also a \href{https://xkcd.com/710/}{xkcd} cartoon about this
       
   106 conjecture\here{https://xkcd.com/710/}). If you are able to solve this
       
   107 conjecture, you will definitely get famous.}\bigskip
       
   108 
       
   109 \noindent
       
   110 \textbf{Tasks}
       
   111 
       
   112 \begin{itemize}
       
   113 \item[(1)] You are asked to implement a recursive function that
       
   114   calculates the number of steps needed until a series ends
       
   115   with $1$. In case of starting with $6$, it takes $8$ steps and in
       
   116   case of starting with $9$, it takes $19$ (see above). We assume it 
       
   117   takes $0$ steps, if we start with $1$. In order to
       
   118   try out this function with large numbers, you should use
       
   119   \texttt{Long} as argument type, instead of \texttt{Int}.  You can
       
   120   assume this function will be called with numbers between $1$ and
       
   121   $1$ Million. \hfill[1 Mark]
       
   122 
       
   123 \item[(2)] Write a second function that takes an upper bound as
       
   124   an argument and calculates the steps for all numbers in the range from
       
   125   1 up to this bound (the bound including). It returns the maximum number of
       
   126   steps and the corresponding number that needs that many steps.  More
       
   127   precisely it returns a pair where the first component is the number
       
   128   of steps and the second is the corresponding number. \hfill\mbox{[1
       
   129     Mark]}
       
   130 
       
   131 \item[(3)] Write a function that calculates \emph{hard
       
   132     numbers} \here{https://medium.com/cantors-paradise/the-collatz-conjecture-some-shocking-results-from-180-000-iterations-7fea130d0377}
       
   133   in the Collatz series---these are the last odd numbers just before a
       
   134   power of two is reached.  For this, implement an
       
   135   \textit{is-power-of-two} function which tests whether a number is a
       
   136   power of two. The easiest way to implement this is by using the
       
   137   bit-operator $\&$ of Scala. For a power of two, say $n$ with $n > 0$, it
       
   138   holds that $n \;\&\; (n - 1)$ is equal to zero. I let you think why
       
   139   this is the case. The function \textit{is-hard} calculates whether
       
   140   $3n + 1$ is a power of two.  Finally the \textit{last-odd} function
       
   141   calculates the last odd number before a power of 2 in the Collatz
       
   142   series. This means for example when starting with 6 and also with 9,
       
   143   we receive 5 as the last odd number.  Surprisingly a lot of numbers
       
   144   have 5 as last-odd number. But for example for 113 we obtain 85,
       
   145   because of the series
       
   146   %
       
   147   \[113, 340, 170, \,\fbox{85}\,, 256, 128, 64, 32, 16, 8, 4, 2, 1\]
       
   148 
       
   149   The \textit{last-odd} function will only be called with numbers that are not
       
   150   powers of 2 themselves.
       
   151 \end{itemize}
       
   152 
       
   153 \noindent
       
   154 \textbf{Test Data:} Some test ranges and cases are:
       
   155 
       
   156 \begin{itemize}
       
   157 \item 1 to 10 where $9$ takes 19 steps 
       
   158 \item 1 to 100 where $97$ takes 118 steps,
       
   159 \item 1 to 1,000 where $871$ takes 178 steps,
       
   160 \item 1 to 10,000 where $6,171$ takes 261 steps,
       
   161 \item 1 to 100,000 where $77,031$ takes 350 steps, 
       
   162 \item 1 to 1 Million where $837,799$ takes 524 steps
       
   163   %% runs out of stack space
       
   164   %% \item[$\bullet$] $1 - 10$ million where $8,400,511$ takes 685 steps
       
   165 \item 21 is the last odd number for 84
       
   166 \item 341 is the last odd number for 201, 604, 605 and 8600
       
   167   
       
   168 \end{itemize}
       
   169   
       
   170 
       
   171 
       
   172 
       
   173 
       
   174 
       
   175 \end{document}
       
   176 
       
   177 
       
   178 %%%%%%% Historical Stuff
       
   179 \newpage
       
   180 
       
   181 This part is about web-scraping and list-processing in Scala. It uses
       
   182 online data about the per-capita alcohol consumption for each country
       
   183 (per year?), and a file containing the data about the population size of
       
   184 each country.  From this data you are supposed to estimate how many
       
   185 litres of pure alcohol are consumed worldwide.\bigskip
       
   186 
       
   187 \noindent
       
   188 \textbf{Tasks (file alcohol.scala):}
       
   189 
       
   190 \begin{itemize}
       
   191 \item[(1)] Write a function that given an URL requests a
       
   192   comma-separated value (CSV) list.  We are interested in the list
       
   193   from the following URL
       
   194 
       
   195 \begin{center}
       
   196   \url{https://raw.githubusercontent.com/fivethirtyeight/data/master/alcohol-consumption/drinks.csv}
       
   197 \end{center}
       
   198 
       
   199 \noindent Your function should take a string (the URL) as input, and
       
   200 produce a list of strings as output, where each string is one line in
       
   201 the corresponding CSV-list.  This list from the URL above should
       
   202 contain 194 lines.\medskip
       
   203 
       
   204 \noindent
       
   205 Write another function that can read the file \texttt{population.csv}
       
   206 from disk (the file is distributed with the assignment). This
       
   207 function should take a string as argument, the file name, and again
       
   208 return a list of strings corresponding to each entry in the
       
   209 CSV-list. For \texttt{population.csv}, this list should contain 216
       
   210 lines.\hfill[1 Mark]
       
   211 
       
   212 
       
   213 \item[(2)] Unfortunately, the CSV-lists contain a lot of ``junk'' and we
       
   214   need to extract the data that interests us.  From the header of the
       
   215   alcohol list, you can see there are 5 columns
       
   216   
       
   217   \begin{center}
       
   218     \begin{tabular}{l}
       
   219       \texttt{country (name),}\\
       
   220       \texttt{beer\_servings,}\\
       
   221       \texttt{spirit\_servings,}\\
       
   222       \texttt{wine\_servings,}\\
       
   223       \texttt{total\_litres\_of\_pure\_alcohol}
       
   224     \end{tabular}  
       
   225   \end{center}
       
   226 
       
   227   \noindent
       
   228   Write a function that extracts the data from the first column,
       
   229   the country name, and the data from the fifth column (converted into
       
   230   a \texttt{Double}). For this go through each line of the CSV-list
       
   231   (except the first line), use the \texttt{split(",")} function to
       
   232   divide each line into an array of 5 elements. Keep the data from the
       
   233   first and fifth element in these arrays.\medskip
       
   234 
       
   235   \noindent
       
   236   Write another function that processes the population size list. This
       
   237   is already of the form country name and population size.\footnote{Your
       
   238     friendly lecturer already did the messy processing for you from the
       
   239   Worldbank database, see \url{https://github.com/datasets/population/tree/master/data} for the original.} Again, split the
       
   240   strings according to the commas. However, this time generate a
       
   241   \texttt{Map} from country names to population sizes.\hfill[1 Mark]
       
   242 
       
   243 \item[(3)] In (2) you generated the data about the alcohol consumption
       
   244   per-capita for each country, and also the population size for each
       
   245   country. From this generate next a sorted(!) list of the overall
       
   246   alcohol consumption for each country. The list should be sorted from
       
   247   highest alcohol consumption to lowest. The difficulty is that the
       
   248   data is scraped off from ``random'' sources on the Internet and
       
   249   annoyingly the spelling of some country names does not always agree in both
       
   250   lists. For example the alcohol list contains
       
   251   \texttt{Bosnia-Herzegovina}, while the population writes this country as
       
   252   \texttt{Bosnia and Herzegovina}. In your sorted
       
   253   overall list include only countries from the alcohol list, whose
       
   254   exact country name is also in the population size list. This means
       
   255   you can ignore countries like Bosnia-Herzegovina from the overall
       
   256   alcohol consumption. There are 177 countries where the names
       
   257   agree. The UK is ranked 10th on this list by
       
   258   consuming 671,976,864 Litres of pure alcohol each year.\medskip
       
   259   
       
   260   \noindent
       
   261   Finally, write another function that takes an integer, say
       
   262   \texttt{n}, as argument. You can assume this integer is between 0
       
   263   and 177 (the number of countries in the sorted list above).  The
       
   264   function should return a triple, where the first component is the
       
   265   sum of the alcohol consumption in all countries (on the list); the
       
   266   second component is the sum of the \texttt{n}-highest alcohol
       
   267   consumers on the list; and the third component is the percentage the
       
   268   \texttt{n}-highest alcohol consumers drink with respect to the
       
   269   the world consumption. You will see that according to our data, 164
       
   270   countries (out of 177) gobble up 100\% of the World alcohol
       
   271   consumption.\hfill\mbox{[1 Mark]}
       
   272 \end{itemize}
       
   273 
       
   274 \noindent
       
   275 \textbf{Hints:} useful list functions: \texttt{.drop(n)},
       
   276 \texttt{.take(n)} for dropping or taking some elements in a list,
       
   277 \texttt{.getLines} for separating lines in a string;
       
   278 \texttt{.sortBy(\_.\_2)} sorts a list of pairs according to the second
       
   279 elements in the pairs---the sorting is done from smallest to highest;
       
   280 useful \texttt{Map} functions: \texttt{.toMap} converts a list of
       
   281 pairs into a \texttt{Map}, \texttt{.isDefinedAt(k)} tests whether the
       
   282 map is defined at that key, that is would produce a result when
       
   283 called with this key; useful data functions: \texttt{Source.fromURL},
       
   284 \texttt{Source.fromFile} for obtaining a webpage and reading a file.
       
   285 
       
   286 \newpage
       
   287 
       
   288 
       
   289 
       
   290 
       
   291 
       
   292 
       
   293 %%% Local Variables: 
       
   294 %%% mode: latex
       
   295 %%% TeX-master: t
       
   296 %%% End: