diff -r 017f621f5835 -r 3ffe978a5664 cws/core_cw01.tex --- /dev/null Thu Jan 01 00:00:00 1970 +0000 +++ b/cws/core_cw01.tex Fri Nov 05 16:47:55 2021 +0000 @@ -0,0 +1,299 @@ +% !TEX program = xelatex +\documentclass{article} +\usepackage{../style} +\usepackage{disclaimer} +\usepackage{../langs} + + + +\begin{document} + +\section*{Core Part 1 (Scala, 3 Marks)} + +\mbox{}\hfill\textit{``The most effective debugging tool is still careful thought,}\\ +\mbox{}\hfill\textit{coupled with judiciously placed print statements.''}\smallskip\\ +\mbox{}\hfill\textit{ --- Brian W. Kernighan, in Unix for Beginners (1979)}\bigskip + + +\IMPORTANT{This part is about Scala. It is due on \cwSIX{} at 5pm and worth 3\%. +Any 1\% you achieve in the preliminary part counts as your ``weekly engagement''.} + +\noindent +Also note that the running time of each part will be restricted to a +maximum of 30 seconds on my laptop. + +\DISCLAIMER{} + +\subsection*{Reference Implementation} + +Like the C++ assignments, the Scala assignments will work like this: you +push your files to GitHub and receive (after sometimes a long delay) some +automated feedback. In the end we take a snapshot of the submitted files and +apply an automated marking script to them.\medskip + +\noindent +In addition, the Scala coursework comes with a reference implementation +in form of \texttt{jar}-files. This allows you to run any test cases on +your own computer. For example you can call Scala on the command line +with the option \texttt{-cp collatz.jar} and then query any function +from the template file. Say you want to find out what the functions +\texttt{collatz} and \texttt{collatz\_max} produce: for this you just +need to prefix them with the object name \texttt{C1}. If you want to +find out what these functions produce for the argument \texttt{6}, you +would type something like: + +\begin{lstlisting}[language={},numbers=none,basicstyle=\ttfamily\small] +$ scala -cp collatz.jar + +scala> C1.collatz(6) +... +scala> C1.collatz_max(6) +... +\end{lstlisting}%$ + +\subsection*{Hints} + +\noindent +\textbf{For the Core Part 1:} useful math operators: \texttt{\%} for modulo, \texttt{\&} for bit-wise and; useful +functions: \mbox{\texttt{(1\,to\,10)}} for ranges, \texttt{.toInt}, +\texttt{.toList} for conversions, you can use \texttt{List(...).max} for the +maximum of a list, \texttt{List(...).indexOf(...)} for the first index of +a value in a list.\bigskip + + + +\newpage +\subsection*{Core Part 1 (3 Marks, file collatz.scala)} + +This part is about function definitions and recursion. You are asked +to implement a Scala program that tests examples of the +\emph{$3n + 1$-conjecture}, also called \emph{Collatz + conjecture}.\video{https://www.youtube.com./watch?v=LqKpkdRRLZw} +This conjecture can be described as follows: Start with any positive +number $n$ greater than $0$: + +\begin{itemize} +\item If $n$ is even, divide it by $2$ to obtain $n / 2$. +\item If $n$ is odd, multiply it by $3$ and add $1$ to obtain $3n + + 1$. +\item Repeat this process and you will always end up with $1$. +\end{itemize} + +\noindent +For example if you start with, say, $6$ and $9$, you obtain the +two \emph{Collatz series} +% +\[ +\begin{array}{@{}l@{\hspace{5mm}}l@{}} +6, 3, 10, 5, 16, 8, 4, 2, 1 & \text{(= 8 steps)}\\ +9, 28, 14, 7, 22, 11, 34, 17, 52, 26, 13, 40, 20, 10, 5, 16, 8, 4, 2, 1 & \text{(= 19 steps)}\\ +\end{array} +\] + +\noindent +As you can see, the numbers go up and down like a roller-coaster, but +curiously they seem to always terminate in $1$. Nobody knows why. The +conjecture is that this will \emph{always} happen for every number +greater than 0.\footnote{While it is relatively easy to test this +conjecture with particular numbers, it is an interesting open problem to +\emph{prove} that the conjecture is true for \emph{all} numbers ($> 0$). +Paul Erd\"o{}s, a famous mathematician you might have heard about, said +about this conjecture: ``Mathematics may not [yet] be ready for such +problems.'' and also offered a \$500 cash prize for its solution. +Jeffrey Lagarias, another mathematician, claimed that based only on +known information about this problem, ``this is an extraordinarily +difficult problem, completely out of reach of present day mathematics.'' +There is also a \href{https://xkcd.com/710/}{xkcd} cartoon about this +conjecture\here{https://xkcd.com/710/}). If you are able to solve this +conjecture, you will definitely get famous.}\bigskip + +\noindent +\textbf{Tasks} + +\begin{itemize} +\item[(1)] You are asked to implement a recursive function that + calculates the number of steps needed until a series ends + with $1$. In case of starting with $6$, it takes $8$ steps and in + case of starting with $9$, it takes $19$ (see above). We assume it + takes $0$ steps, if we start with $1$. In order to + try out this function with large numbers, you should use + \texttt{Long} as argument type, instead of \texttt{Int}. You can + assume this function will be called with numbers between $1$ and + $1$ Million. \hfill[1 Mark] + +\item[(2)] Write a second function that takes an upper bound as + an argument and calculates the steps for all numbers in the range from + 1 up to this bound (the bound including). It returns the maximum number of + steps and the corresponding number that needs that many steps. More + precisely it returns a pair where the first component is the number + of steps and the second is the corresponding number. \hfill\mbox{[1 + Mark]} + +\item[(3)] Write a function that calculates \emph{hard + numbers} \here{https://medium.com/cantors-paradise/the-collatz-conjecture-some-shocking-results-from-180-000-iterations-7fea130d0377} + in the Collatz series---these are the last odd numbers just before a + power of two is reached. For this, implement an + \textit{is-power-of-two} function which tests whether a number is a + power of two. The easiest way to implement this is by using the + bit-operator $\&$ of Scala. For a power of two, say $n$ with $n > 0$, it + holds that $n \;\&\; (n - 1)$ is equal to zero. I let you think why + this is the case. + + The function \textit{is-hard} calculates whether + $3n + 1$ is a power of two. Finally the \textit{last-odd} function + calculates the last odd number before a power of 2 in the Collatz + series. This means for example when starting with 9, + we receive 5 as the last odd number. Surprisingly a lot of numbers + have 5 as last-odd number. But for example for 113 we obtain 85, + because of the series + % + \[113, 340, 170, \,\fbox{85}\,, 256, 128, 64, 32, 16, 8, 4, 2, 1\] + + The \textit{last-odd} function will only be called with numbers that are not + powers of 2 themselves.\hfill\mbox{[1 Mark]} +\end{itemize} + +\noindent +\textbf{Test Data:} Some test ranges and cases are: + +\begin{itemize} +\item 1 to 10 where $9$ takes 19 steps +\item 1 to 100 where $97$ takes 118 steps, +\item 1 to 1,000 where $871$ takes 178 steps, +\item 1 to 10,000 where $6,171$ takes 261 steps, +\item 1 to 100,000 where $77,031$ takes 350 steps, +\item 1 to 1 Million where $837,799$ takes 524 steps + %% runs out of stack space + %% \item[$\bullet$] $1 - 10$ million where $8,400,511$ takes 685 steps +\item 21 is the last odd number for 84 +\item 341 is the last odd number for 201, 604, 605 and 8600 + +\end{itemize} + + + + + + +\end{document} + + +%%%%%%% Historical Stuff +\newpage + +This part is about web-scraping and list-processing in Scala. It uses +online data about the per-capita alcohol consumption for each country +(per year?), and a file containing the data about the population size of +each country. From this data you are supposed to estimate how many +litres of pure alcohol are consumed worldwide.\bigskip + +\noindent +\textbf{Tasks (file alcohol.scala):} + +\begin{itemize} +\item[(1)] Write a function that given an URL requests a + comma-separated value (CSV) list. We are interested in the list + from the following URL + +\begin{center} + \url{https://raw.githubusercontent.com/fivethirtyeight/data/master/alcohol-consumption/drinks.csv} +\end{center} + +\noindent Your function should take a string (the URL) as input, and +produce a list of strings as output, where each string is one line in +the corresponding CSV-list. This list from the URL above should +contain 194 lines.\medskip + +\noindent +Write another function that can read the file \texttt{population.csv} +from disk (the file is distributed with the assignment). This +function should take a string as argument, the file name, and again +return a list of strings corresponding to each entry in the +CSV-list. For \texttt{population.csv}, this list should contain 216 +lines.\hfill[1 Mark] + + +\item[(2)] Unfortunately, the CSV-lists contain a lot of ``junk'' and we + need to extract the data that interests us. From the header of the + alcohol list, you can see there are 5 columns + + \begin{center} + \begin{tabular}{l} + \texttt{country (name),}\\ + \texttt{beer\_servings,}\\ + \texttt{spirit\_servings,}\\ + \texttt{wine\_servings,}\\ + \texttt{total\_litres\_of\_pure\_alcohol} + \end{tabular} + \end{center} + + \noindent + Write a function that extracts the data from the first column, + the country name, and the data from the fifth column (converted into + a \texttt{Double}). For this go through each line of the CSV-list + (except the first line), use the \texttt{split(",")} function to + divide each line into an array of 5 elements. Keep the data from the + first and fifth element in these arrays.\medskip + + \noindent + Write another function that processes the population size list. This + is already of the form country name and population size.\footnote{Your + friendly lecturer already did the messy processing for you from the + Worldbank database, see \url{https://github.com/datasets/population/tree/master/data} for the original.} Again, split the + strings according to the commas. However, this time generate a + \texttt{Map} from country names to population sizes.\hfill[1 Mark] + +\item[(3)] In (2) you generated the data about the alcohol consumption + per-capita for each country, and also the population size for each + country. From this generate next a sorted(!) list of the overall + alcohol consumption for each country. The list should be sorted from + highest alcohol consumption to lowest. The difficulty is that the + data is scraped off from ``random'' sources on the Internet and + annoyingly the spelling of some country names does not always agree in both + lists. For example the alcohol list contains + \texttt{Bosnia-Herzegovina}, while the population writes this country as + \texttt{Bosnia and Herzegovina}. In your sorted + overall list include only countries from the alcohol list, whose + exact country name is also in the population size list. This means + you can ignore countries like Bosnia-Herzegovina from the overall + alcohol consumption. There are 177 countries where the names + agree. The UK is ranked 10th on this list by + consuming 671,976,864 Litres of pure alcohol each year.\medskip + + \noindent + Finally, write another function that takes an integer, say + \texttt{n}, as argument. You can assume this integer is between 0 + and 177 (the number of countries in the sorted list above). The + function should return a triple, where the first component is the + sum of the alcohol consumption in all countries (on the list); the + second component is the sum of the \texttt{n}-highest alcohol + consumers on the list; and the third component is the percentage the + \texttt{n}-highest alcohol consumers drink with respect to the + the world consumption. You will see that according to our data, 164 + countries (out of 177) gobble up 100\% of the World alcohol + consumption.\hfill\mbox{[1 Mark]} +\end{itemize} + +\noindent +\textbf{Hints:} useful list functions: \texttt{.drop(n)}, +\texttt{.take(n)} for dropping or taking some elements in a list, +\texttt{.getLines} for separating lines in a string; +\texttt{.sortBy(\_.\_2)} sorts a list of pairs according to the second +elements in the pairs---the sorting is done from smallest to highest; +useful \texttt{Map} functions: \texttt{.toMap} converts a list of +pairs into a \texttt{Map}, \texttt{.isDefinedAt(k)} tests whether the +map is defined at that key, that is would produce a result when +called with this key; useful data functions: \texttt{Source.fromURL}, +\texttt{Source.fromFile} for obtaining a webpage and reading a file. + +\newpage + + + + + + +%%% Local Variables: +%%% mode: latex +%%% TeX-master: t +%%% End: