cws/pre_cw01.tex
changeset 344 a2ac7e3fa330
parent 343 c8fcc0e0a57f
child 345 40657f9a4e4a
equal deleted inserted replaced
343:c8fcc0e0a57f 344:a2ac7e3fa330
       
     1 % !TEX program = xelatex
       
     2 \documentclass{article}
       
     3 \usepackage{../style}
       
     4 \usepackage{disclaimer}
       
     5 \usepackage{../langs}
       
     6 
       
     7 
       
     8 
       
     9 \begin{document}
       
    10 
       
    11 \section*{Preliminary Part 6 (Scala, 3 Marks)}
       
    12 
       
    13 \mbox{}\hfill\textit{``The most effective debugging tool is still careful thought,}\\
       
    14 \mbox{}\hfill\textit{coupled with judiciously placed print statements.''}\smallskip\\
       
    15 \mbox{}\hfill\textit{ --- Brian W. Kernighan, in Unix for Beginners (1979)}\bigskip
       
    16 
       
    17  
       
    18 \IMPORTANT{This part is about Scala. It is due on \cwSIX{} at 5pm and worth 3\%.}
       
    19 
       
    20 \noindent
       
    21 Also note that the running time of each part will be restricted to a
       
    22 maximum of 30 seconds on my laptop.
       
    23 
       
    24 \DISCLAIMER{}
       
    25 
       
    26 \subsection*{Reference Implementation}
       
    27 
       
    28 Like the C++ assignments, the Scala assignments will work like this: you
       
    29 push your files to GitHub and receive (after sometimes a long delay) some
       
    30 automated feedback. In the end we take a snapshot of the submitted files and
       
    31 apply an automated marking script to them.\medskip
       
    32 
       
    33 \noindent
       
    34 In addition, the Scala coursework comes with a reference implementation
       
    35 in form of \texttt{jar}-files. This allows you to run any test cases on
       
    36 your own computer. For example you can call Scala on the command line
       
    37 with the option \texttt{-cp collatz.jar} and then query any function
       
    38 from the template file. Say you want to find out what the functions
       
    39 \texttt{collatz} and \texttt{collatz\_max} produce: for this you just
       
    40 need to prefix them with the object name \texttt{CW6a}. If you want to
       
    41 find out what these functions produce for the argument \texttt{6}, you
       
    42 would type something like:
       
    43 
       
    44 \begin{lstlisting}[language={},numbers=none,basicstyle=\ttfamily\small]
       
    45 $ scala -cp collatz.jar
       
    46   
       
    47 scala> CW6a.collatz(6)
       
    48 ...
       
    49 scala> CW6a.collatz_max(6)
       
    50 ...
       
    51 \end{lstlisting}%$
       
    52 
       
    53 \subsection*{Hints}
       
    54 
       
    55 \noindent
       
    56 \textbf{For Preliminary Part:} useful math operators: \texttt{\%} for modulo, \texttt{\&} for bit-wise and; useful
       
    57 functions: \mbox{\texttt{(1\,to\,10)}} for ranges, \texttt{.toInt},
       
    58 \texttt{.toList} for conversions, you can use \texttt{List(...).max} for the
       
    59 maximum of a list, \texttt{List(...).indexOf(...)} for the first index of
       
    60 a value in a list.\bigskip
       
    61 
       
    62 
       
    63 
       
    64 \newpage
       
    65 \subsection*{Preliminary Part (3 Marks, file collatz.scala)}
       
    66 
       
    67 This part is about function definitions and recursion. You are asked
       
    68 to implement a Scala program that tests examples of the
       
    69 \emph{$3n + 1$-conjecture}, also called \emph{Collatz
       
    70   conjecture}.\video{https://www.youtube.com./watch?v=LqKpkdRRLZw}
       
    71 This conjecture can be described as follows: Start with any positive
       
    72 number $n$ greater than $0$:
       
    73 
       
    74 \begin{itemize}
       
    75 \item If $n$ is even, divide it by $2$ to obtain $n / 2$.
       
    76 \item If $n$ is odd, multiply it by $3$ and add $1$ to obtain $3n +
       
    77   1$.
       
    78 \item Repeat this process and you will always end up with $1$.
       
    79 \end{itemize}
       
    80 
       
    81 \noindent
       
    82 For example if you start with, say, $6$ and $9$, you obtain the
       
    83 two \emph{Collatz series}
       
    84 %
       
    85 \[
       
    86 \begin{array}{@{}l@{\hspace{5mm}}l@{}}
       
    87 6, 3, 10, 5, 16, 8, 4, 2, 1 & \text{(= 8 steps)}\\
       
    88 9, 28, 14, 7, 22, 11, 34, 17, 52, 26, 13, 40, 20, 10, 5, 16, 8, 4, 2, 1  & \text{(= 19 steps)}\\
       
    89 \end{array}
       
    90 \]
       
    91 
       
    92 \noindent
       
    93 As you can see, the numbers go up and down like a roller-coaster, but
       
    94 curiously they seem to always terminate in $1$. Nobody knows why. The
       
    95 conjecture is that this will \emph{always} happen for every number
       
    96 greater than 0.\footnote{While it is relatively easy to test this
       
    97 conjecture with particular numbers, it is an interesting open problem to
       
    98 \emph{prove} that the conjecture is true for \emph{all} numbers ($> 0$).
       
    99 Paul Erd\"o{}s, a famous mathematician you might have heard about, said
       
   100 about this conjecture: ``Mathematics may not [yet] be ready for such
       
   101 problems.'' and also offered a \$500 cash prize for its solution.
       
   102 Jeffrey Lagarias, another mathematician, claimed that based only on
       
   103 known information about this problem, ``this is an extraordinarily
       
   104 difficult problem, completely out of reach of present day mathematics.''
       
   105 There is also a \href{https://xkcd.com/710/}{xkcd} cartoon about this
       
   106 conjecture\here{https://xkcd.com/710/}). If you are able to solve this
       
   107 conjecture, you will definitely get famous.}\bigskip
       
   108 
       
   109 \noindent
       
   110 \textbf{Tasks}
       
   111 
       
   112 \begin{itemize}
       
   113 \item[(1)] You are asked to implement a recursive function that
       
   114   calculates the number of steps needed until a series ends
       
   115   with $1$. In case of starting with $6$, it takes $8$ steps and in
       
   116   case of starting with $9$, it takes $19$ (see above). We assume it 
       
   117   takes $0$ steps, if we start with $1$. In order to
       
   118   try out this function with large numbers, you should use
       
   119   \texttt{Long} as argument type, instead of \texttt{Int}.  You can
       
   120   assume this function will be called with numbers between $1$ and
       
   121   $1$ Million. \hfill[1 Mark]
       
   122 
       
   123 \item[(2)] Write a second function that takes an upper bound as
       
   124   an argument and calculates the steps for all numbers in the range from
       
   125   1 up to this bound (the bound including). It returns the maximum number of
       
   126   steps and the corresponding number that needs that many steps.  More
       
   127   precisely it returns a pair where the first component is the number
       
   128   of steps and the second is the corresponding number. \hfill\mbox{[1
       
   129     Mark]}
       
   130 
       
   131 \item[(3)] Write a function that calculates \emph{hard
       
   132     numbers} \here{https://medium.com/cantors-paradise/the-collatz-conjecture-some-shocking-results-from-180-000-iterations-7fea130d0377}
       
   133   in the Collatz series---these are the last odd numbers just before a
       
   134   power of two is reached.  For this, implement an
       
   135   \textit{is-power-of-two} function which tests whether a number is a
       
   136   power of two. The easiest way to implement this is by using the
       
   137   bit-operator $\&$ of Scala. For a power of two, say $n$ with $n > 0$, it
       
   138   holds that $n \;\&\; (n - 1)$ is equal to zero. I let you think why
       
   139   this is the case.
       
   140 
       
   141   The function \textit{is-hard} calculates whether
       
   142   $3n + 1$ is a power of two.  Finally the \textit{last-odd} function
       
   143   calculates the last odd number before a power of 2 in the Collatz
       
   144   series. This means for example when starting with 9,
       
   145   we receive 5 as the last odd number.  Surprisingly a lot of numbers
       
   146   have 5 as last-odd number. But for example for 113 we obtain 85,
       
   147   because of the series
       
   148   %
       
   149   \[113, 340, 170, \,\fbox{85}\,, 256, 128, 64, 32, 16, 8, 4, 2, 1\]
       
   150 
       
   151   The \textit{last-odd} function will only be called with numbers that are not
       
   152   powers of 2 themselves.
       
   153 \end{itemize}
       
   154 
       
   155 \noindent
       
   156 \textbf{Test Data:} Some test ranges and cases are:
       
   157 
       
   158 \begin{itemize}
       
   159 \item 1 to 10 where $9$ takes 19 steps 
       
   160 \item 1 to 100 where $97$ takes 118 steps,
       
   161 \item 1 to 1,000 where $871$ takes 178 steps,
       
   162 \item 1 to 10,000 where $6,171$ takes 261 steps,
       
   163 \item 1 to 100,000 where $77,031$ takes 350 steps, 
       
   164 \item 1 to 1 Million where $837,799$ takes 524 steps
       
   165   %% runs out of stack space
       
   166   %% \item[$\bullet$] $1 - 10$ million where $8,400,511$ takes 685 steps
       
   167 \item 21 is the last odd number for 84
       
   168 \item 341 is the last odd number for 201, 604, 605 and 8600
       
   169   
       
   170 \end{itemize}
       
   171   
       
   172 
       
   173 
       
   174 
       
   175 
       
   176 
       
   177 \end{document}
       
   178 
       
   179 
       
   180 %%%%%%% Historical Stuff
       
   181 \newpage
       
   182 
       
   183 This part is about web-scraping and list-processing in Scala. It uses
       
   184 online data about the per-capita alcohol consumption for each country
       
   185 (per year?), and a file containing the data about the population size of
       
   186 each country.  From this data you are supposed to estimate how many
       
   187 litres of pure alcohol are consumed worldwide.\bigskip
       
   188 
       
   189 \noindent
       
   190 \textbf{Tasks (file alcohol.scala):}
       
   191 
       
   192 \begin{itemize}
       
   193 \item[(1)] Write a function that given an URL requests a
       
   194   comma-separated value (CSV) list.  We are interested in the list
       
   195   from the following URL
       
   196 
       
   197 \begin{center}
       
   198   \url{https://raw.githubusercontent.com/fivethirtyeight/data/master/alcohol-consumption/drinks.csv}
       
   199 \end{center}
       
   200 
       
   201 \noindent Your function should take a string (the URL) as input, and
       
   202 produce a list of strings as output, where each string is one line in
       
   203 the corresponding CSV-list.  This list from the URL above should
       
   204 contain 194 lines.\medskip
       
   205 
       
   206 \noindent
       
   207 Write another function that can read the file \texttt{population.csv}
       
   208 from disk (the file is distributed with the assignment). This
       
   209 function should take a string as argument, the file name, and again
       
   210 return a list of strings corresponding to each entry in the
       
   211 CSV-list. For \texttt{population.csv}, this list should contain 216
       
   212 lines.\hfill[1 Mark]
       
   213 
       
   214 
       
   215 \item[(2)] Unfortunately, the CSV-lists contain a lot of ``junk'' and we
       
   216   need to extract the data that interests us.  From the header of the
       
   217   alcohol list, you can see there are 5 columns
       
   218   
       
   219   \begin{center}
       
   220     \begin{tabular}{l}
       
   221       \texttt{country (name),}\\
       
   222       \texttt{beer\_servings,}\\
       
   223       \texttt{spirit\_servings,}\\
       
   224       \texttt{wine\_servings,}\\
       
   225       \texttt{total\_litres\_of\_pure\_alcohol}
       
   226     \end{tabular}  
       
   227   \end{center}
       
   228 
       
   229   \noindent
       
   230   Write a function that extracts the data from the first column,
       
   231   the country name, and the data from the fifth column (converted into
       
   232   a \texttt{Double}). For this go through each line of the CSV-list
       
   233   (except the first line), use the \texttt{split(",")} function to
       
   234   divide each line into an array of 5 elements. Keep the data from the
       
   235   first and fifth element in these arrays.\medskip
       
   236 
       
   237   \noindent
       
   238   Write another function that processes the population size list. This
       
   239   is already of the form country name and population size.\footnote{Your
       
   240     friendly lecturer already did the messy processing for you from the
       
   241   Worldbank database, see \url{https://github.com/datasets/population/tree/master/data} for the original.} Again, split the
       
   242   strings according to the commas. However, this time generate a
       
   243   \texttt{Map} from country names to population sizes.\hfill[1 Mark]
       
   244 
       
   245 \item[(3)] In (2) you generated the data about the alcohol consumption
       
   246   per-capita for each country, and also the population size for each
       
   247   country. From this generate next a sorted(!) list of the overall
       
   248   alcohol consumption for each country. The list should be sorted from
       
   249   highest alcohol consumption to lowest. The difficulty is that the
       
   250   data is scraped off from ``random'' sources on the Internet and
       
   251   annoyingly the spelling of some country names does not always agree in both
       
   252   lists. For example the alcohol list contains
       
   253   \texttt{Bosnia-Herzegovina}, while the population writes this country as
       
   254   \texttt{Bosnia and Herzegovina}. In your sorted
       
   255   overall list include only countries from the alcohol list, whose
       
   256   exact country name is also in the population size list. This means
       
   257   you can ignore countries like Bosnia-Herzegovina from the overall
       
   258   alcohol consumption. There are 177 countries where the names
       
   259   agree. The UK is ranked 10th on this list by
       
   260   consuming 671,976,864 Litres of pure alcohol each year.\medskip
       
   261   
       
   262   \noindent
       
   263   Finally, write another function that takes an integer, say
       
   264   \texttt{n}, as argument. You can assume this integer is between 0
       
   265   and 177 (the number of countries in the sorted list above).  The
       
   266   function should return a triple, where the first component is the
       
   267   sum of the alcohol consumption in all countries (on the list); the
       
   268   second component is the sum of the \texttt{n}-highest alcohol
       
   269   consumers on the list; and the third component is the percentage the
       
   270   \texttt{n}-highest alcohol consumers drink with respect to the
       
   271   the world consumption. You will see that according to our data, 164
       
   272   countries (out of 177) gobble up 100\% of the World alcohol
       
   273   consumption.\hfill\mbox{[1 Mark]}
       
   274 \end{itemize}
       
   275 
       
   276 \noindent
       
   277 \textbf{Hints:} useful list functions: \texttt{.drop(n)},
       
   278 \texttt{.take(n)} for dropping or taking some elements in a list,
       
   279 \texttt{.getLines} for separating lines in a string;
       
   280 \texttt{.sortBy(\_.\_2)} sorts a list of pairs according to the second
       
   281 elements in the pairs---the sorting is done from smallest to highest;
       
   282 useful \texttt{Map} functions: \texttt{.toMap} converts a list of
       
   283 pairs into a \texttt{Map}, \texttt{.isDefinedAt(k)} tests whether the
       
   284 map is defined at that key, that is would produce a result when
       
   285 called with this key; useful data functions: \texttt{Source.fromURL},
       
   286 \texttt{Source.fromFile} for obtaining a webpage and reading a file.
       
   287 
       
   288 \newpage
       
   289 
       
   290 
       
   291 
       
   292 
       
   293 
       
   294 
       
   295 %%% Local Variables: 
       
   296 %%% mode: latex
       
   297 %%% TeX-master: t
       
   298 %%% End: