pep-material: cws/cw01.tex@c8fcc0e0a57f


% !TEX program = xelatex
\documentclass{article}
\usepackage{../style}
\usepackage{disclaimer}
\usepackage{../langs}



\begin{document}

\section*{Preliminary Part 6 (Scala, 3 Marks)}

\mbox{}\hfill\textit{``The most effective debugging tool is still careful thought,}\\
\mbox{}\hfill\textit{coupled with judiciously placed print statements.''}\smallskip\\
\mbox{}\hfill\textit{ --- Brian W. Kernighan, in Unix for Beginners (1979)}\bigskip

 
\IMPORTANT{This part is about Scala. It is due on \cwSIX{} at 4pm and worth 3\%.}

\noindent
Also note that the running time of each part will be restricted to a
maximum of 30 seconds on my laptop.

\DISCLAIMER{}

\subsection*{Reference Implementation}

Like the C++ assignments, the Scala assignments will work like this: you
push your files to GitHub and receive (after sometimes a long delay) some
automated feedback. In the end we take a snapshot of the submitted files and
apply an automated marking script to them.\medskip

\noindent
In addition, the Scala coursework comes with a reference implementation
in form of \texttt{jar}-files. This allows you to run any test cases on
your own computer. For example you can call Scala on the command line
with the option \texttt{-cp collatz.jar} and then query any function
from the template file. Say you want to find out what the functions
\texttt{collatz} and \texttt{collatz\_max} produce: for this you just
need to prefix them with the object name \texttt{CW6a}. If you want to
find out what these functions produce for the argument \texttt{6}, you
would type something like:

\begin{lstlisting}[language={},numbers=none,basicstyle=\ttfamily\small]
$ scala -cp collatz.jar
  
scala> CW6a.collatz(6)
...
scala> CW6a.collatz_max(6)
...
\end{lstlisting}%$

\subsection*{Hints}

\noindent
\textbf{For Preliminary Part:} useful math operators: \texttt{\%} for modulo, \texttt{\&} for bit-wise and; useful
functions: \mbox{\texttt{(1\,to\,10)}} for ranges, \texttt{.toInt},
\texttt{.toList} for conversions, you can use \texttt{List(...).max} for the
maximum of a list, \texttt{List(...).indexOf(...)} for the first index of
a value in a list.\bigskip



\newpage
\subsection*{Preliminary Part (3 Marks, file collatz.scala)}

This part is about function definitions and recursion. You are asked
to implement a Scala program that tests examples of the
\emph{$3n + 1$-conjecture}, also called \emph{Collatz
  conjecture}.\video{https://www.youtube.com./watch?v=LqKpkdRRLZw}
This conjecture can be described as follows: Start with any positive
number $n$ greater than $0$:

\begin{itemize}
\item If $n$ is even, divide it by $2$ to obtain $n / 2$.
\item If $n$ is odd, multiply it by $3$ and add $1$ to obtain $3n +
  1$.
\item Repeat this process and you will always end up with $1$.
\end{itemize}

\noindent
For example if you start with, say, $6$ and $9$, you obtain the
two \emph{Collatz series}
%
\[
\begin{array}{@{}l@{\hspace{5mm}}l@{}}
6, 3, 10, 5, 16, 8, 4, 2, 1 & \text{(= 8 steps)}\\
9, 28, 14, 7, 22, 11, 34, 17, 52, 26, 13, 40, 20, 10, 5, 16, 8, 4, 2, 1  & \text{(= 19 steps)}\\
\end{array}
\]

\noindent
As you can see, the numbers go up and down like a roller-coaster, but
curiously they seem to always terminate in $1$. Nobody knows why. The
conjecture is that this will \emph{always} happen for every number
greater than 0.\footnote{While it is relatively easy to test this
conjecture with particular numbers, it is an interesting open problem to
\emph{prove} that the conjecture is true for \emph{all} numbers ($> 0$).
Paul Erd\"o{}s, a famous mathematician you might have heard about, said
about this conjecture: ``Mathematics may not [yet] be ready for such
problems.'' and also offered a \$500 cash prize for its solution.
Jeffrey Lagarias, another mathematician, claimed that based only on
known information about this problem, ``this is an extraordinarily
difficult problem, completely out of reach of present day mathematics.''
There is also a \href{https://xkcd.com/710/}{xkcd} cartoon about this
conjecture\here{https://xkcd.com/710/}). If you are able to solve this
conjecture, you will definitely get famous.}\bigskip

\noindent
\textbf{Tasks}

\begin{itemize}
\item[(1)] You are asked to implement a recursive function that
  calculates the number of steps needed until a series ends
  with $1$. In case of starting with $6$, it takes $8$ steps and in
  case of starting with $9$, it takes $19$ (see above). We assume it 
  takes $0$ steps, if we start with $1$. In order to
  try out this function with large numbers, you should use
  \texttt{Long} as argument type, instead of \texttt{Int}.  You can
  assume this function will be called with numbers between $1$ and
  $1$ Million. \hfill[1 Mark]

\item[(2)] Write a second function that takes an upper bound as
  an argument and calculates the steps for all numbers in the range from
  1 up to this bound (the bound including). It returns the maximum number of
  steps and the corresponding number that needs that many steps.  More
  precisely it returns a pair where the first component is the number
  of steps and the second is the corresponding number. \hfill\mbox{[1
    Mark]}

\item[(3)] Write a function that calculates \emph{hard
    numbers} \here{https://medium.com/cantors-paradise/the-collatz-conjecture-some-shocking-results-from-180-000-iterations-7fea130d0377}
  in the Collatz series---these are the last odd numbers just before a
  power of two is reached.  For this, implement an
  \textit{is-power-of-two} function which tests whether a number is a
  power of two. The easiest way to implement this is by using the
  bit-operator $\&$ of Scala. For a power of two, say $n$ with $n > 0$, it
  holds that $n \;\&\; (n - 1)$ is equal to zero. I let you think why
  this is the case. The function \textit{is-hard} calculates whether
  $3n + 1$ is a power of two.  Finally the \textit{last-odd} function
  calculates the last odd number before a power of 2 in the Collatz
  series. This means for example when starting with 6 and also with 9,
  we receive 5 as the last odd number.  Surprisingly a lot of numbers
  have 5 as last-odd number. But for example for 113 we obtain 85,
  because of the series
  %
  \[113, 340, 170, \,\fbox{85}\,, 256, 128, 64, 32, 16, 8, 4, 2, 1\]

  The \textit{last-odd} function will only be called with numbers that are not
  powers of 2 themselves.
\end{itemize}

\noindent
\textbf{Test Data:} Some test ranges and cases are:

\begin{itemize}
\item 1 to 10 where $9$ takes 19 steps 
\item 1 to 100 where $97$ takes 118 steps,
\item 1 to 1,000 where $871$ takes 178 steps,
\item 1 to 10,000 where $6,171$ takes 261 steps,
\item 1 to 100,000 where $77,031$ takes 350 steps, 
\item 1 to 1 Million where $837,799$ takes 524 steps
  %% runs out of stack space
  %% \item[$\bullet$] $1 - 10$ million where $8,400,511$ takes 685 steps
\item 21 is the last odd number for 84
\item 341 is the last odd number for 201, 604, 605 and 8600
  
\end{itemize}
  





\end{document}


%%%%%%% Historical Stuff
\newpage

This part is about web-scraping and list-processing in Scala. It uses
online data about the per-capita alcohol consumption for each country
(per year?), and a file containing the data about the population size of
each country.  From this data you are supposed to estimate how many
litres of pure alcohol are consumed worldwide.\bigskip

\noindent
\textbf{Tasks (file alcohol.scala):}

\begin{itemize}
\item[(1)] Write a function that given an URL requests a
  comma-separated value (CSV) list.  We are interested in the list
  from the following URL

\begin{center}
  \url{https://raw.githubusercontent.com/fivethirtyeight/data/master/alcohol-consumption/drinks.csv}
\end{center}

\noindent Your function should take a string (the URL) as input, and
produce a list of strings as output, where each string is one line in
the corresponding CSV-list.  This list from the URL above should
contain 194 lines.\medskip

\noindent
Write another function that can read the file \texttt{population.csv}
from disk (the file is distributed with the assignment). This
function should take a string as argument, the file name, and again
return a list of strings corresponding to each entry in the
CSV-list. For \texttt{population.csv}, this list should contain 216
lines.\hfill[1 Mark]


\item[(2)] Unfortunately, the CSV-lists contain a lot of ``junk'' and we
  need to extract the data that interests us.  From the header of the
  alcohol list, you can see there are 5 columns
  
  \begin{center}
    \begin{tabular}{l}
      \texttt{country (name),}\\
      \texttt{beer\_servings,}\\
      \texttt{spirit\_servings,}\\
      \texttt{wine\_servings,}\\
      \texttt{total\_litres\_of\_pure\_alcohol}
    \end{tabular}  
  \end{center}

  \noindent
  Write a function that extracts the data from the first column,
  the country name, and the data from the fifth column (converted into
  a \texttt{Double}). For this go through each line of the CSV-list
  (except the first line), use the \texttt{split(",")} function to
  divide each line into an array of 5 elements. Keep the data from the
  first and fifth element in these arrays.\medskip

  \noindent
  Write another function that processes the population size list. This
  is already of the form country name and population size.\footnote{Your
    friendly lecturer already did the messy processing for you from the
  Worldbank database, see \url{https://github.com/datasets/population/tree/master/data} for the original.} Again, split the
  strings according to the commas. However, this time generate a
  \texttt{Map} from country names to population sizes.\hfill[1 Mark]

\item[(3)] In (2) you generated the data about the alcohol consumption
  per-capita for each country, and also the population size for each
  country. From this generate next a sorted(!) list of the overall
  alcohol consumption for each country. The list should be sorted from
  highest alcohol consumption to lowest. The difficulty is that the
  data is scraped off from ``random'' sources on the Internet and
  annoyingly the spelling of some country names does not always agree in both
  lists. For example the alcohol list contains
  \texttt{Bosnia-Herzegovina}, while the population writes this country as
  \texttt{Bosnia and Herzegovina}. In your sorted
  overall list include only countries from the alcohol list, whose
  exact country name is also in the population size list. This means
  you can ignore countries like Bosnia-Herzegovina from the overall
  alcohol consumption. There are 177 countries where the names
  agree. The UK is ranked 10th on this list by
  consuming 671,976,864 Litres of pure alcohol each year.\medskip
  
  \noindent
  Finally, write another function that takes an integer, say
  \texttt{n}, as argument. You can assume this integer is between 0
  and 177 (the number of countries in the sorted list above).  The
  function should return a triple, where the first component is the
  sum of the alcohol consumption in all countries (on the list); the
  second component is the sum of the \texttt{n}-highest alcohol
  consumers on the list; and the third component is the percentage the
  \texttt{n}-highest alcohol consumers drink with respect to the
  the world consumption. You will see that according to our data, 164
  countries (out of 177) gobble up 100\% of the World alcohol
  consumption.\hfill\mbox{[1 Mark]}
\end{itemize}

\noindent
\textbf{Hints:} useful list functions: \texttt{.drop(n)},
\texttt{.take(n)} for dropping or taking some elements in a list,
\texttt{.getLines} for separating lines in a string;
\texttt{.sortBy(\_.\_2)} sorts a list of pairs according to the second
elements in the pairs---the sorting is done from smallest to highest;
useful \texttt{Map} functions: \texttt{.toMap} converts a list of
pairs into a \texttt{Map}, \texttt{.isDefinedAt(k)} tests whether the
map is defined at that key, that is would produce a result when
called with this key; useful data functions: \texttt{Source.fromURL},
\texttt{Source.fromFile} for obtaining a webpage and reading a file.

\newpage






%%% Local Variables: 
%%% mode: latex
%%% TeX-master: t
%%% End:
author	Christian Urban <christian.urban@kcl.ac.uk>
	Tue, 13 Oct 2020 10:21:21 +0100
changeset 343	c8fcc0e0a57f
parent 336	25d9c3b2bc99
permissions	-rw-r--r--