cws/pre_cw01.tex
changeset 396 3ffe978a5664
parent 395 017f621f5835
child 397 085fefce672e
equal deleted inserted replaced
395:017f621f5835 396:3ffe978a5664
     1 % !TEX program = xelatex
       
     2 \documentclass{article}
       
     3 \usepackage{../style}
       
     4 \usepackage{disclaimer}
       
     5 \usepackage{../langs}
       
     6 
       
     7 
       
     8 
       
     9 \begin{document}
       
    10 
       
    11 \section*{Preliminary Part 1 (Scala, 3 Marks)}
       
    12 
       
    13 \mbox{}\hfill\textit{``The most effective debugging tool is still careful thought,}\\
       
    14 \mbox{}\hfill\textit{coupled with judiciously placed print statements.''}\smallskip\\
       
    15 \mbox{}\hfill\textit{ --- Brian W. Kernighan, in Unix for Beginners (1979)}\bigskip
       
    16 
       
    17  
       
    18 \IMPORTANT{This part is about Scala. It is due on \cwSIX{} at 5pm and worth 3\%.
       
    19 Any 1\% you achieve in the preliminary part counts as your ``weekly engagement''.}
       
    20 
       
    21 \noindent
       
    22 Also note that the running time of each part will be restricted to a
       
    23 maximum of 30 seconds on my laptop.
       
    24 
       
    25 \DISCLAIMER{}
       
    26 
       
    27 \subsection*{Reference Implementation}
       
    28 
       
    29 Like the C++ assignments, the Scala assignments will work like this: you
       
    30 push your files to GitHub and receive (after sometimes a long delay) some
       
    31 automated feedback. In the end we take a snapshot of the submitted files and
       
    32 apply an automated marking script to them.\medskip
       
    33 
       
    34 \noindent
       
    35 In addition, the Scala coursework comes with a reference implementation
       
    36 in form of \texttt{jar}-files. This allows you to run any test cases on
       
    37 your own computer. For example you can call Scala on the command line
       
    38 with the option \texttt{-cp collatz.jar} and then query any function
       
    39 from the template file. Say you want to find out what the functions
       
    40 \texttt{collatz} and \texttt{collatz\_max} produce: for this you just
       
    41 need to prefix them with the object name \texttt{CW6a}. If you want to
       
    42 find out what these functions produce for the argument \texttt{6}, you
       
    43 would type something like:
       
    44 
       
    45 \begin{lstlisting}[language={},numbers=none,basicstyle=\ttfamily\small]
       
    46 $ scala -cp collatz.jar
       
    47   
       
    48 scala> CW6a.collatz(6)
       
    49 ...
       
    50 scala> CW6a.collatz_max(6)
       
    51 ...
       
    52 \end{lstlisting}%$
       
    53 
       
    54 \subsection*{Hints}
       
    55 
       
    56 \noindent
       
    57 \textbf{For the Preliminary Part:} useful math operators: \texttt{\%} for modulo, \texttt{\&} for bit-wise and; useful
       
    58 functions: \mbox{\texttt{(1\,to\,10)}} for ranges, \texttt{.toInt},
       
    59 \texttt{.toList} for conversions, you can use \texttt{List(...).max} for the
       
    60 maximum of a list, \texttt{List(...).indexOf(...)} for the first index of
       
    61 a value in a list.\bigskip
       
    62 
       
    63 
       
    64 
       
    65 \newpage
       
    66 \subsection*{Preliminary Part (3 Marks, file collatz.scala)}
       
    67 
       
    68 This part is about function definitions and recursion. You are asked
       
    69 to implement a Scala program that tests examples of the
       
    70 \emph{$3n + 1$-conjecture}, also called \emph{Collatz
       
    71   conjecture}.\video{https://www.youtube.com./watch?v=LqKpkdRRLZw}
       
    72 This conjecture can be described as follows: Start with any positive
       
    73 number $n$ greater than $0$:
       
    74 
       
    75 \begin{itemize}
       
    76 \item If $n$ is even, divide it by $2$ to obtain $n / 2$.
       
    77 \item If $n$ is odd, multiply it by $3$ and add $1$ to obtain $3n +
       
    78   1$.
       
    79 \item Repeat this process and you will always end up with $1$.
       
    80 \end{itemize}
       
    81 
       
    82 \noindent
       
    83 For example if you start with, say, $6$ and $9$, you obtain the
       
    84 two \emph{Collatz series}
       
    85 %
       
    86 \[
       
    87 \begin{array}{@{}l@{\hspace{5mm}}l@{}}
       
    88 6, 3, 10, 5, 16, 8, 4, 2, 1 & \text{(= 8 steps)}\\
       
    89 9, 28, 14, 7, 22, 11, 34, 17, 52, 26, 13, 40, 20, 10, 5, 16, 8, 4, 2, 1  & \text{(= 19 steps)}\\
       
    90 \end{array}
       
    91 \]
       
    92 
       
    93 \noindent
       
    94 As you can see, the numbers go up and down like a roller-coaster, but
       
    95 curiously they seem to always terminate in $1$. Nobody knows why. The
       
    96 conjecture is that this will \emph{always} happen for every number
       
    97 greater than 0.\footnote{While it is relatively easy to test this
       
    98 conjecture with particular numbers, it is an interesting open problem to
       
    99 \emph{prove} that the conjecture is true for \emph{all} numbers ($> 0$).
       
   100 Paul Erd\"o{}s, a famous mathematician you might have heard about, said
       
   101 about this conjecture: ``Mathematics may not [yet] be ready for such
       
   102 problems.'' and also offered a \$500 cash prize for its solution.
       
   103 Jeffrey Lagarias, another mathematician, claimed that based only on
       
   104 known information about this problem, ``this is an extraordinarily
       
   105 difficult problem, completely out of reach of present day mathematics.''
       
   106 There is also a \href{https://xkcd.com/710/}{xkcd} cartoon about this
       
   107 conjecture\here{https://xkcd.com/710/}). If you are able to solve this
       
   108 conjecture, you will definitely get famous.}\bigskip
       
   109 
       
   110 \noindent
       
   111 \textbf{Tasks}
       
   112 
       
   113 \begin{itemize}
       
   114 \item[(1)] You are asked to implement a recursive function that
       
   115   calculates the number of steps needed until a series ends
       
   116   with $1$. In case of starting with $6$, it takes $8$ steps and in
       
   117   case of starting with $9$, it takes $19$ (see above). We assume it 
       
   118   takes $0$ steps, if we start with $1$. In order to
       
   119   try out this function with large numbers, you should use
       
   120   \texttt{Long} as argument type, instead of \texttt{Int}.  You can
       
   121   assume this function will be called with numbers between $1$ and
       
   122   $1$ Million. \hfill[1 Mark]
       
   123 
       
   124 \item[(2)] Write a second function that takes an upper bound as
       
   125   an argument and calculates the steps for all numbers in the range from
       
   126   1 up to this bound (the bound including). It returns the maximum number of
       
   127   steps and the corresponding number that needs that many steps.  More
       
   128   precisely it returns a pair where the first component is the number
       
   129   of steps and the second is the corresponding number. \hfill\mbox{[1
       
   130     Mark]}
       
   131 
       
   132 \item[(3)] Write a function that calculates \emph{hard
       
   133     numbers} \here{https://medium.com/cantors-paradise/the-collatz-conjecture-some-shocking-results-from-180-000-iterations-7fea130d0377}
       
   134   in the Collatz series---these are the last odd numbers just before a
       
   135   power of two is reached.  For this, implement an
       
   136   \textit{is-power-of-two} function which tests whether a number is a
       
   137   power of two. The easiest way to implement this is by using the
       
   138   bit-operator $\&$ of Scala. For a power of two, say $n$ with $n > 0$, it
       
   139   holds that $n \;\&\; (n - 1)$ is equal to zero. I let you think why
       
   140   this is the case.
       
   141 
       
   142   The function \textit{is-hard} calculates whether
       
   143   $3n + 1$ is a power of two.  Finally the \textit{last-odd} function
       
   144   calculates the last odd number before a power of 2 in the Collatz
       
   145   series. This means for example when starting with 9,
       
   146   we receive 5 as the last odd number.  Surprisingly a lot of numbers
       
   147   have 5 as last-odd number. But for example for 113 we obtain 85,
       
   148   because of the series
       
   149   %
       
   150   \[113, 340, 170, \,\fbox{85}\,, 256, 128, 64, 32, 16, 8, 4, 2, 1\]
       
   151 
       
   152   The \textit{last-odd} function will only be called with numbers that are not
       
   153   powers of 2 themselves.
       
   154 \end{itemize}
       
   155 
       
   156 \noindent
       
   157 \textbf{Test Data:} Some test ranges and cases are:
       
   158 
       
   159 \begin{itemize}
       
   160 \item 1 to 10 where $9$ takes 19 steps 
       
   161 \item 1 to 100 where $97$ takes 118 steps,
       
   162 \item 1 to 1,000 where $871$ takes 178 steps,
       
   163 \item 1 to 10,000 where $6,171$ takes 261 steps,
       
   164 \item 1 to 100,000 where $77,031$ takes 350 steps, 
       
   165 \item 1 to 1 Million where $837,799$ takes 524 steps
       
   166   %% runs out of stack space
       
   167   %% \item[$\bullet$] $1 - 10$ million where $8,400,511$ takes 685 steps
       
   168 \item 21 is the last odd number for 84
       
   169 \item 341 is the last odd number for 201, 604, 605 and 8600
       
   170   
       
   171 \end{itemize}
       
   172   
       
   173 
       
   174 
       
   175 
       
   176 
       
   177 
       
   178 \end{document}
       
   179 
       
   180 
       
   181 %%%%%%% Historical Stuff
       
   182 \newpage
       
   183 
       
   184 This part is about web-scraping and list-processing in Scala. It uses
       
   185 online data about the per-capita alcohol consumption for each country
       
   186 (per year?), and a file containing the data about the population size of
       
   187 each country.  From this data you are supposed to estimate how many
       
   188 litres of pure alcohol are consumed worldwide.\bigskip
       
   189 
       
   190 \noindent
       
   191 \textbf{Tasks (file alcohol.scala):}
       
   192 
       
   193 \begin{itemize}
       
   194 \item[(1)] Write a function that given an URL requests a
       
   195   comma-separated value (CSV) list.  We are interested in the list
       
   196   from the following URL
       
   197 
       
   198 \begin{center}
       
   199   \url{https://raw.githubusercontent.com/fivethirtyeight/data/master/alcohol-consumption/drinks.csv}
       
   200 \end{center}
       
   201 
       
   202 \noindent Your function should take a string (the URL) as input, and
       
   203 produce a list of strings as output, where each string is one line in
       
   204 the corresponding CSV-list.  This list from the URL above should
       
   205 contain 194 lines.\medskip
       
   206 
       
   207 \noindent
       
   208 Write another function that can read the file \texttt{population.csv}
       
   209 from disk (the file is distributed with the assignment). This
       
   210 function should take a string as argument, the file name, and again
       
   211 return a list of strings corresponding to each entry in the
       
   212 CSV-list. For \texttt{population.csv}, this list should contain 216
       
   213 lines.\hfill[1 Mark]
       
   214 
       
   215 
       
   216 \item[(2)] Unfortunately, the CSV-lists contain a lot of ``junk'' and we
       
   217   need to extract the data that interests us.  From the header of the
       
   218   alcohol list, you can see there are 5 columns
       
   219   
       
   220   \begin{center}
       
   221     \begin{tabular}{l}
       
   222       \texttt{country (name),}\\
       
   223       \texttt{beer\_servings,}\\
       
   224       \texttt{spirit\_servings,}\\
       
   225       \texttt{wine\_servings,}\\
       
   226       \texttt{total\_litres\_of\_pure\_alcohol}
       
   227     \end{tabular}  
       
   228   \end{center}
       
   229 
       
   230   \noindent
       
   231   Write a function that extracts the data from the first column,
       
   232   the country name, and the data from the fifth column (converted into
       
   233   a \texttt{Double}). For this go through each line of the CSV-list
       
   234   (except the first line), use the \texttt{split(",")} function to
       
   235   divide each line into an array of 5 elements. Keep the data from the
       
   236   first and fifth element in these arrays.\medskip
       
   237 
       
   238   \noindent
       
   239   Write another function that processes the population size list. This
       
   240   is already of the form country name and population size.\footnote{Your
       
   241     friendly lecturer already did the messy processing for you from the
       
   242   Worldbank database, see \url{https://github.com/datasets/population/tree/master/data} for the original.} Again, split the
       
   243   strings according to the commas. However, this time generate a
       
   244   \texttt{Map} from country names to population sizes.\hfill[1 Mark]
       
   245 
       
   246 \item[(3)] In (2) you generated the data about the alcohol consumption
       
   247   per-capita for each country, and also the population size for each
       
   248   country. From this generate next a sorted(!) list of the overall
       
   249   alcohol consumption for each country. The list should be sorted from
       
   250   highest alcohol consumption to lowest. The difficulty is that the
       
   251   data is scraped off from ``random'' sources on the Internet and
       
   252   annoyingly the spelling of some country names does not always agree in both
       
   253   lists. For example the alcohol list contains
       
   254   \texttt{Bosnia-Herzegovina}, while the population writes this country as
       
   255   \texttt{Bosnia and Herzegovina}. In your sorted
       
   256   overall list include only countries from the alcohol list, whose
       
   257   exact country name is also in the population size list. This means
       
   258   you can ignore countries like Bosnia-Herzegovina from the overall
       
   259   alcohol consumption. There are 177 countries where the names
       
   260   agree. The UK is ranked 10th on this list by
       
   261   consuming 671,976,864 Litres of pure alcohol each year.\medskip
       
   262   
       
   263   \noindent
       
   264   Finally, write another function that takes an integer, say
       
   265   \texttt{n}, as argument. You can assume this integer is between 0
       
   266   and 177 (the number of countries in the sorted list above).  The
       
   267   function should return a triple, where the first component is the
       
   268   sum of the alcohol consumption in all countries (on the list); the
       
   269   second component is the sum of the \texttt{n}-highest alcohol
       
   270   consumers on the list; and the third component is the percentage the
       
   271   \texttt{n}-highest alcohol consumers drink with respect to the
       
   272   the world consumption. You will see that according to our data, 164
       
   273   countries (out of 177) gobble up 100\% of the World alcohol
       
   274   consumption.\hfill\mbox{[1 Mark]}
       
   275 \end{itemize}
       
   276 
       
   277 \noindent
       
   278 \textbf{Hints:} useful list functions: \texttt{.drop(n)},
       
   279 \texttt{.take(n)} for dropping or taking some elements in a list,
       
   280 \texttt{.getLines} for separating lines in a string;
       
   281 \texttt{.sortBy(\_.\_2)} sorts a list of pairs according to the second
       
   282 elements in the pairs---the sorting is done from smallest to highest;
       
   283 useful \texttt{Map} functions: \texttt{.toMap} converts a list of
       
   284 pairs into a \texttt{Map}, \texttt{.isDefinedAt(k)} tests whether the
       
   285 map is defined at that key, that is would produce a result when
       
   286 called with this key; useful data functions: \texttt{Source.fromURL},
       
   287 \texttt{Source.fromFile} for obtaining a webpage and reading a file.
       
   288 
       
   289 \newpage
       
   290 
       
   291 
       
   292 
       
   293 
       
   294 
       
   295 
       
   296 %%% Local Variables: 
       
   297 %%% mode: latex
       
   298 %%% TeX-master: t
       
   299 %%% End: