pep-material: comparison cws/pre

equal deleted inserted replaced

-:c8fcc0e0a57f
+:a2ac7e3fa330
+% !TEX program = xelatex
+\documentclass{article}
+\usepackage{../style}
+\usepackage{disclaimer}
+\usepackage{../langs}
+\begin{document}
+\section*{Preliminary Part 6 (Scala, 3 Marks)}
+\mbox{}\hfill\textit{``The most effective debugging tool is still careful thought,}\\
+\mbox{}\hfill\textit{coupled with judiciously placed print statements.''}\smallskip\\
+\mbox{}\hfill\textit{ --- Brian W. Kernighan, in Unix for Beginners (1979)}\bigskip
+\IMPORTANT{This part is about Scala. It is due on \cwSIX{} at 5pm and worth 3\%.}
+\noindent
+Also note that the running time of each part will be restricted to a
+maximum of 30 seconds on my laptop.
+\DISCLAIMER{}
+\subsection*{Reference Implementation}
+Like the C++ assignments, the Scala assignments will work like this: you
+push your files to GitHub and receive (after sometimes a long delay) some
+automated feedback. In the end we take a snapshot of the submitted files and
+apply an automated marking script to them.\medskip
+\noindent
+In addition, the Scala coursework comes with a reference implementation
+in form of \texttt{jar}-files. This allows you to run any test cases on
+your own computer. For example you can call Scala on the command line
+with the option \texttt{-cp collatz.jar} and then query any function
+from the template file. Say you want to find out what the functions
+\texttt{collatz} and \texttt{collatz\_max} produce: for this you just
+need to prefix them with the object name \texttt{CW6a}. If you want to
+find out what these functions produce for the argument \texttt{6}, you
+would type something like:
+\begin{lstlisting}[language={},numbers=none,basicstyle=\ttfamily\small]
+$ scala -cp collatz.jar
+scala> CW6a.collatz(6)
+...
+scala> CW6a.collatz_max(6)
+...
+\end{lstlisting}%$
+\subsection*{Hints}
+\noindent
+\textbf{For Preliminary Part:} useful math operators: \texttt{\%} for modulo, \texttt{\&} for bit-wise and; useful
+functions: \mbox{\texttt{(1\,to\,10)}} for ranges, \texttt{.toInt},
+\texttt{.toList} for conversions, you can use \texttt{List(...).max} for the
+maximum of a list, \texttt{List(...).indexOf(...)} for the first index of
+a value in a list.\bigskip
+\newpage
+\subsection*{Preliminary Part (3 Marks, file collatz.scala)}
+This part is about function definitions and recursion. You are asked
+to implement a Scala program that tests examples of the
+\emph{$3n + 1$-conjecture}, also called \emph{Collatz
+conjecture}.\video{https://www.youtube.com./watch?v=LqKpkdRRLZw}
+This conjecture can be described as follows: Start with any positive
+number $n$ greater than $0$:
+\begin{itemize}
+\item If $n$ is even, divide it by $2$ to obtain $n / 2$.
+\item If $n$ is odd, multiply it by $3$ and add $1$ to obtain $3n +
+1$.
+\item Repeat this process and you will always end up with $1$.
+\end{itemize}
+\noindent
+For example if you start with, say, $6$ and $9$, you obtain the
+two \emph{Collatz series}
+%
+\[
+\begin{array}{@{}l@{\hspace{5mm}}l@{}}
+6, 3, 10, 5, 16, 8, 4, 2, 1 & \text{(= 8 steps)}\\
+9, 28, 14, 7, 22, 11, 34, 17, 52, 26, 13, 40, 20, 10, 5, 16, 8, 4, 2, 1  & \text{(= 19 steps)}\\
+\end{array}
+\]
+\noindent
+As you can see, the numbers go up and down like a roller-coaster, but
+curiously they seem to always terminate in $1$. Nobody knows why. The
+conjecture is that this will \emph{always} happen for every number
+greater than 0.\footnote{While it is relatively easy to test this
+conjecture with particular numbers, it is an interesting open problem to
+\emph{prove} that the conjecture is true for \emph{all} numbers ($> 0$).
+Paul Erd\"o{}s, a famous mathematician you might have heard about, said
+about this conjecture: ``Mathematics may not [yet] be ready for such
+problems.'' and also offered a \$500 cash prize for its solution.
+Jeffrey Lagarias, another mathematician, claimed that based only on
+known information about this problem, ``this is an extraordinarily
+difficult problem, completely out of reach of present day mathematics.''
+There is also a \href{https://xkcd.com/710/}{xkcd} cartoon about this
+conjecture\here{https://xkcd.com/710/}). If you are able to solve this
+conjecture, you will definitely get famous.}\bigskip
+\noindent
+\textbf{Tasks}
+\begin{itemize}
+\item[(1)] You are asked to implement a recursive function that
+calculates the number of steps needed until a series ends
+with $1$. In case of starting with $6$, it takes $8$ steps and in
+case of starting with $9$, it takes $19$ (see above). We assume it
+takes $0$ steps, if we start with $1$. In order to
+try out this function with large numbers, you should use
+\texttt{Long} as argument type, instead of \texttt{Int}.  You can
+assume this function will be called with numbers between $1$ and
+$1$ Million. \hfill[1 Mark]
+\item[(2)] Write a second function that takes an upper bound as
+an argument and calculates the steps for all numbers in the range from
+1 up to this bound (the bound including). It returns the maximum number of
+steps and the corresponding number that needs that many steps.  More
+precisely it returns a pair where the first component is the number
+of steps and the second is the corresponding number. \hfill\mbox{[1
+Mark]}
+\item[(3)] Write a function that calculates \emph{hard
+numbers} \here{https://medium.com/cantors-paradise/the-collatz-conjecture-some-shocking-results-from-180-000-iterations-7fea130d0377}
+in the Collatz series---these are the last odd numbers just before a
+power of two is reached.  For this, implement an
+\textit{is-power-of-two} function which tests whether a number is a
+power of two. The easiest way to implement this is by using the
+bit-operator $\&$ of Scala. For a power of two, say $n$ with $n > 0$, it
+holds that $n \;\&\; (n - 1)$ is equal to zero. I let you think why
+this is the case.
+The function \textit{is-hard} calculates whether
+$3n + 1$ is a power of two.  Finally the \textit{last-odd} function
+calculates the last odd number before a power of 2 in the Collatz
+series. This means for example when starting with 9,
+we receive 5 as the last odd number.  Surprisingly a lot of numbers
+have 5 as last-odd number. But for example for 113 we obtain 85,
+because of the series
+%
+\[113, 340, 170, \,\fbox{85}\,, 256, 128, 64, 32, 16, 8, 4, 2, 1\]
+The \textit{last-odd} function will only be called with numbers that are not
+powers of 2 themselves.
+\end{itemize}
+\noindent
+\textbf{Test Data:} Some test ranges and cases are:
+\begin{itemize}
+\item 1 to 10 where $9$ takes 19 steps
+\item 1 to 100 where $97$ takes 118 steps,
+\item 1 to 1,000 where $871$ takes 178 steps,
+\item 1 to 10,000 where $6,171$ takes 261 steps,
+\item 1 to 100,000 where $77,031$ takes 350 steps,
+\item 1 to 1 Million where $837,799$ takes 524 steps
+%% runs out of stack space
+%% \item[$\bullet$] $1 - 10$ million where $8,400,511$ takes 685 steps
+\item 21 is the last odd number for 84
+\item 341 is the last odd number for 201, 604, 605 and 8600
+\end{itemize}
+\end{document}
+%%%%%%% Historical Stuff
+\newpage
+This part is about web-scraping and list-processing in Scala. It uses
+online data about the per-capita alcohol consumption for each country
+(per year?), and a file containing the data about the population size of
+each country.  From this data you are supposed to estimate how many
+litres of pure alcohol are consumed worldwide.\bigskip
+\noindent
+\textbf{Tasks (file alcohol.scala):}
+\begin{itemize}
+\item[(1)] Write a function that given an URL requests a
+comma-separated value (CSV) list.  We are interested in the list
+from the following URL
+\begin{center}
+\url{https://raw.githubusercontent.com/fivethirtyeight/data/master/alcohol-consumption/drinks.csv}
+\end{center}
+\noindent Your function should take a string (the URL) as input, and
+produce a list of strings as output, where each string is one line in
+the corresponding CSV-list.  This list from the URL above should
+contain 194 lines.\medskip
+\noindent
+Write another function that can read the file \texttt{population.csv}
+from disk (the file is distributed with the assignment). This
+function should take a string as argument, the file name, and again
+return a list of strings corresponding to each entry in the
+CSV-list. For \texttt{population.csv}, this list should contain 216
+lines.\hfill[1 Mark]
+\item[(2)] Unfortunately, the CSV-lists contain a lot of ``junk'' and we
+need to extract the data that interests us.  From the header of the
+alcohol list, you can see there are 5 columns
+\begin{center}
+\begin{tabular}{l}
+\texttt{country (name),}\\
+\texttt{beer\_servings,}\\
+\texttt{spirit\_servings,}\\
+\texttt{wine\_servings,}\\
+\texttt{total\_litres\_of\_pure\_alcohol}
+\end{tabular}
+\end{center}
+\noindent
+Write a function that extracts the data from the first column,
+the country name, and the data from the fifth column (converted into
+a \texttt{Double}). For this go through each line of the CSV-list
+(except the first line), use the \texttt{split(",")} function to
+divide each line into an array of 5 elements. Keep the data from the
+first and fifth element in these arrays.\medskip
+\noindent
+Write another function that processes the population size list. This
+is already of the form country name and population size.\footnote{Your
+friendly lecturer already did the messy processing for you from the
+Worldbank database, see \url{https://github.com/datasets/population/tree/master/data} for the original.} Again, split the
+strings according to the commas. However, this time generate a
+\texttt{Map} from country names to population sizes.\hfill[1 Mark]
+\item[(3)] In (2) you generated the data about the alcohol consumption
+per-capita for each country, and also the population size for each
+country. From this generate next a sorted(!) list of the overall
+alcohol consumption for each country. The list should be sorted from
+highest alcohol consumption to lowest. The difficulty is that the
+data is scraped off from ``random'' sources on the Internet and
+annoyingly the spelling of some country names does not always agree in both
+lists. For example the alcohol list contains
+\texttt{Bosnia-Herzegovina}, while the population writes this country as
+\texttt{Bosnia and Herzegovina}. In your sorted
+overall list include only countries from the alcohol list, whose
+exact country name is also in the population size list. This means
+you can ignore countries like Bosnia-Herzegovina from the overall
+alcohol consumption. There are 177 countries where the names
+agree. The UK is ranked 10th on this list by
+consuming 671,976,864 Litres of pure alcohol each year.\medskip
+\noindent
+Finally, write another function that takes an integer, say
+\texttt{n}, as argument. You can assume this integer is between 0
+and 177 (the number of countries in the sorted list above).  The
+function should return a triple, where the first component is the
+sum of the alcohol consumption in all countries (on the list); the
+second component is the sum of the \texttt{n}-highest alcohol
+consumers on the list; and the third component is the percentage the
+\texttt{n}-highest alcohol consumers drink with respect to the
+the world consumption. You will see that according to our data, 164
+countries (out of 177) gobble up 100\% of the World alcohol
+consumption.\hfill\mbox{[1 Mark]}
+\end{itemize}
+\noindent
+\textbf{Hints:} useful list functions: \texttt{.drop(n)},
+\texttt{.take(n)} for dropping or taking some elements in a list,
+\texttt{.getLines} for separating lines in a string;
+\texttt{.sortBy(\_.\_2)} sorts a list of pairs according to the second
+elements in the pairs---the sorting is done from smallest to highest;
+useful \texttt{Map} functions: \texttt{.toMap} converts a list of
+pairs into a \texttt{Map}, \texttt{.isDefinedAt(k)} tests whether the
+map is defined at that key, that is would produce a result when
+called with this key; useful data functions: \texttt{Source.fromURL},
+\texttt{Source.fromFile} for obtaining a webpage and reading a file.
+\newpage
+%%% Local Variables:
+%%% mode: latex
+%%% TeX-master: t
+%%% End:

changeset 344	a2ac7e3fa330
parent 343	c8fcc0e0a57f
child 345	40657f9a4e4a