\documentclass{article}+ −
\usepackage{../style}+ −
\usepackage{disclaimer}+ −
\usepackage{../langs}+ −
+ −
\begin{document}+ −
+ −
\section*{Assignment 6 (Scala)}+ −
+ −
\mbox{}\hfill\textit{``The most effective debugging tool is still careful thought,}\\+ −
\mbox{}\hfill\textit{coupled with judiciously placed print statements.''}\smallskip\\+ −
\mbox{}\hfill\textit{ --- Brian W. Kernighan, in Unix for Beginners (1979)}\bigskip+ −
+ −
+ −
\noindent+ −
This assignment is about Scala and worth 10\%. The first and second+ −
part are due on \cwSIX{} at 11pm, and the third part on \cwSIXa{}+ −
at 11pm. You are asked to implement two programs about list+ −
processing and recursion. The third part is more advanced and might+ −
include material you have not yet seen in the first lecture.+ −
\bigskip+ −
+ −
\IMPORTANT{}+ −
+ −
\noindent+ −
Also note that the running time of each part will be restricted to a+ −
maximum of 30 seconds on my laptop.+ −
+ −
\DISCLAIMER{}+ −
+ −
\subsection*{Reference Implementation}+ −
+ −
Like the C++ assignments, the Scala assignments will work like this: you+ −
push your files to GitHub and receive (after sometimes a long delay) some+ −
automated feedback. In the end we take a snapshot of the submitted files and+ −
apply an automated marking script to them.+ −
+ −
In addition, the Scala assignments come with a reference implementation+ −
in form of a \texttt{jar}-file. This allows you to run any test cases+ −
on your own computer. For example you can call Scala on the command+ −
line with the option \texttt{-cp collatz.jar} and then query any+ −
function from the template file. Say you want to find out what+ −
the functions \texttt{collatz} and \texttt{collatz\_max}+ −
produce: for this you just need to prefix them with the object name+ −
\texttt{CW6a} (and \texttt{CW6b} respectively for \texttt{drumb.jar}).+ −
If you want to find out what these functions produce for the argument+ −
\texttt{6}, you would type something like:+ −
+ −
\begin{lstlisting}[language={},numbers=none,basicstyle=\ttfamily\small]+ −
$ scala -cp collatz.jar+ −
+ −
scala> CW6a.collatz(6)+ −
...+ −
scala> CW6a.collatz_max(6)+ −
...+ −
\end{lstlisting}%$+ −
+ −
\subsection*{Hints}+ −
+ −
\noindent+ −
\textbf{For Part 1:} useful math operators: \texttt{\%} for modulo; useful+ −
functions: \mbox{\texttt{(1\,to\,10)}} for ranges, \texttt{.toInt},+ −
\texttt{.toList} for conversions, \texttt{List(...).max} for the+ −
maximum of a list, \texttt{List(...).indexOf(...)} for the first index of+ −
a value in a list.\bigskip+ −
+ −
\noindent+ −
\textbf{For Part 2 + 3:} useful string functions: \texttt{.startsWith(...)} for+ −
checking whether a string has a given prefix, \texttt{\_ ++ \_} for+ −
concatenating two strings; useful option functions: \texttt{.flatten}+ −
flattens a list of options such that it filters way all+ −
\texttt{None}'s, \texttt{Try(...).getOrElse ...} runs some code that+ −
might raise an exception---if yes, then a default value can be given;+ −
useful list functions: \texttt{.head} for obtaining the first element+ −
in a non-empty list, \texttt{.length} for the length of a+ −
list; \texttt{.filter(...)} for filtering out elements in a list; \texttt{.getLines.toList} for obtaining a list of lines from+ −
a file; \texttt{.split(",").toList} for splitting strings according to+ −
a comma.\bigskip+ −
+ −
\noindent+ −
Fortunately Scala supports operator overloading. But make sure you understand the difference between \texttt{100 / 3} and+ −
\texttt{100.0 / 3}!+ −
+ −
\newpage+ −
\subsection*{Part 1 (3 Marks, file collatz.scala)}+ −
+ −
This part is about recursion. You are asked to implement a Scala+ −
program that tests examples of the \emph{$3n + 1$-conjecture}, also+ −
called \emph{Collatz conjecture}. This conjecture can be described as+ −
follows: Start with any positive number $n$ greater than $0$:+ −
+ −
\begin{itemize}+ −
\item If $n$ is even, divide it by $2$ to obtain $n / 2$.+ −
\item If $n$ is odd, multiply it by $3$ and add $1$ to obtain $3n ++ −
1$.+ −
\item Repeat this process and you will always end up with $1$.+ −
\end{itemize}+ −
+ −
\noindent+ −
For example if you start with $6$, or $9$, you obtain the+ −
series+ −
+ −
\[+ −
\begin{array}{@{}l@{\hspace{5mm}}l@{}}+ −
6, 3, 10, 5, 16, 8, 4, 2, 1 & \text{(= 8 steps)}\\+ −
9, 28, 14, 7, 22, 11, 34, 17, 52, 26, 13, 40, 20, 10, 5, 16, 8, 4, 2, 1 & \text{(= 19 steps)}\\+ −
\end{array}+ −
\]+ −
+ −
\noindent+ −
As you can see, the numbers go up and down like a roller-coaster, but+ −
curiously they seem to always terminate in $1$. The conjecture is that+ −
this will \emph{always} happen for every number greater than+ −
0.\footnote{While it is relatively easy to test this conjecture with+ −
particular numbers, it is an interesting open problem to+ −
\emph{prove} that the conjecture is true for \emph{all} numbers ($>+ −
0$). Paul Erd\"o{}s, a famous mathematician you might have hard+ −
about, said about this conjecture: ``Mathematics may not be ready+ −
for such problems.'' and also offered a \$500 cash prize for its+ −
solution. Jeffrey Lagarias, another mathematician, claimed that+ −
based only on known information about this problem, ``this is an+ −
extraordinarily difficult problem, completely out of reach of+ −
present day mathematics.'' There is also a+ −
\href{https://xkcd.com/710/}{xkcd} cartoon about this conjecture+ −
(click \href{https://xkcd.com/710/}{here}). If you are able to solve+ −
this conjecture, you will definitely get famous.}\bigskip+ −
+ −
\noindent+ −
\textbf{Tasks}+ −
+ −
\begin{itemize}+ −
\item[(1)] You are asked to implement a recursive function that+ −
calculates the number of steps needed until a series ends+ −
with $1$. In case of starting with $6$, it takes $8$ steps and in+ −
case of starting with $9$, it takes $19$ (see above). In order to+ −
try out this function with large numbers, you should use+ −
\texttt{Long} as argument type, instead of \texttt{Int}. You can+ −
assume this function will be called with numbers between $1$ and+ −
$1$ Million. \hfill[2 Marks]+ −
+ −
\item[(2)] Write a second function that takes an upper bound as+ −
argument and calculates the steps for all numbers in the range from+ −
1 up to this bound (the bound including). It returns the maximum number of+ −
steps and the corresponding number that needs that many steps. More+ −
precisely it returns a pair where the first component is the number+ −
of steps and the second is the corresponding number. \hfill\mbox{[1+ −
Mark]}+ −
\end{itemize}+ −
+ −
\noindent+ −
\textbf{Test Data:} Some test ranges are:+ −
+ −
\begin{itemize}+ −
\item 1 to 10 where $9$ takes 19 steps + −
\item 1 to 100 where $97$ takes 118 steps,+ −
\item 1 to 1,000 where $871$ takes 178 steps,+ −
\item 1 to 10,000 where $6,171$ takes 261 steps,+ −
\item 1 to 100,000 where $77,031$ takes 350 steps, + −
\item 1 to 1 Million where $837,799$ takes 524 steps+ −
%% runs out of stack space+ −
%% \item[$\bullet$] $1 - 10$ million where $8,400,511$ takes 685 steps+ −
\end{itemize}+ −
+ −
+ −
+ −
+ −
+ −
\subsection*{Part 2 (3 Marks, file drumb.scala)}+ −
+ −
A purely fictional character named Mr T.~Drumb inherited in 1978+ −
approximately 200 Million Dollar from his father. Mr Drumb prides+ −
himself to be a brilliant business man because nowadays it is+ −
estimated he is 3 Billion Dollar worth (one is not sure, of course,+ −
because Mr Drumb refuses to make his tax records public).+ −
+ −
Since the question about Mr Drumb's business acumen remains open,+ −
let's do a quick back-of-the-envelope calculation in Scala whether his+ −
claim has any merit. Let's suppose we are given \$100 in 1978 and we+ −
follow a really dumb investment strategy, namely:+ −
+ −
\begin{itemize}+ −
\item We blindly choose a portfolio of stocks, say some Blue-Chip stocks+ −
or some Real Estate stocks.+ −
\item If some of the stocks in our portfolio are traded in January of+ −
a year, we invest our money in equal amounts in each of these+ −
stocks. For example if we have \$100 and there are four stocks that+ −
are traded in our portfolio, we buy \$25 worth of stocks+ −
from each. Be careful to also test cases where you trade with 3 stocks, for example. + −
\item Next year in January, we look at how our stocks did, liquidate+ −
everything, and re-invest our (hopefully) increased money in again+ −
the stocks from our portfolio (there might be more stocks available,+ −
if companies from our portfolio got listed in that year, or less if+ −
some companies went bust or were de-listed).+ −
\item We do this for 40 years until January 2018 and check what would+ −
have become out of our \$100.+ −
\end{itemize}+ −
+ −
\noindent+ −
Until Yahoo was bought by Altaba last year, historical stock market+ −
data for such back-of-the-envelope calculations was freely available+ −
online. Unfortunately nowadays this kind of data is more difficult to+ −
obtain, unless you are prepared to pay extortionate prices or be+ −
severely rate-limited. Therefore this assignment comes with a number+ −
of files containing CSV-lists with the historical stock prices for the+ −
companies in our portfolios. Use these files for the following+ −
tasks.\bigskip+ −
+ −
\newpage+ −
\noindent+ −
\textbf{Tasks}+ −
+ −
\begin{itemize}+ −
\item[(1)] Write a function \texttt{get\_january\_data} that takes a+ −
stock symbol and a year as arguments. The function reads the+ −
corresponding CSV-file and returns the list of strings that start+ −
with the given year (each line in the CSV-list is of the form+ −
\texttt{someyear-01-someday,someprice}).\hfill[1 Mark]+ −
+ −
\item[(2)] Write a function \texttt{get\_first\_price} that takes+ −
again a stock symbol and a year as arguments. It should return the+ −
first January price for the stock symbol in the given year. For this+ −
it uses the list of strings generated by+ −
\texttt{get\_january\_data}. A problem is that normally a stock+ −
exchange is not open on 1st of January, but depending on the day of+ −
the week on a later day (maybe 3rd or 4th). The easiest way to solve+ −
this problem is to obtain the whole January data for a stock symbol+ −
and then select the earliest, or first, entry in this list. The+ −
stock price of this entry should be converted into a double. Such a+ −
price might not exist, in case the company does not exist in the given+ −
year. For example, if you query for Google in January of 1980, then+ −
clearly Google did not exist yet. Therefore you are asked to+ −
return a trade price with type \texttt{Option[Double]}\ldots\texttt{None}+ −
will be the value for when no price exists; \texttt{Some} if there is a+ −
price.\hfill[1 Mark]+ −
+ −
\item[(3)] Write a function \texttt{get\_prices} that takes a+ −
portfolio (a list of stock symbols), a years range and gets all the+ −
first trading prices for each year in the range. You should organise+ −
this as a list of lists of \texttt{Option[Double]}'s. The inner+ −
lists are for all stock symbols from the portfolio and the outer+ −
list for the years. For example for Google and Apple in years 2010+ −
(first line), 2011 (second line) and 2012 (third line) you obtain:+ −
+ −
\begin{verbatim}+ −
List(List(Some(311.349976), Some(20.544939)), + −
List(Some(300.222351), Some(31.638695)),+ −
List(Some(330.555054), Some(39.478039)))+ −
\end{verbatim}\hfill[1 Mark]+ −
\end{itemize}+ −
+ −
\subsection*{Advanced Part 3 (4 Marks, continue in file drumb.scala)}+ −
+ −
\noindent+ −
\textbf{Tasks}+ −
+ −
\begin{itemize} + −
\item[(4)] Write a function that calculates the \emph{change factor} (delta)+ −
for how a stock price has changed from one year to the next. This is+ −
only well-defined, if the corresponding company has been traded in both+ −
years. In this case you can calculate+ −
+ −
\[+ −
\frac{price_{new} - price_{old}}{price_{old}}+ −
\]+ −
+ −
If the change factor is defined, you should return it+ −
as \texttt{Some(change\_factor)}; if not, you should return+ −
\texttt{None}.\mbox{}\hfill\mbox{[1 Mark]}+ −
+ −
\item[(5)] Write a function that calculates all change factors+ −
(deltas) for the prices we obtained under Part 2. For the running+ −
example of Google and Apple for the years 2010 to 2012 you should+ −
obtain 4 change factors:+ −
+ −
\begin{verbatim} + −
List(List(Some(-0.03573992567129673), Some(0.539975124774038))+ −
List(Some(0.10103412653643493), Some(0.24777709700099845)))+ −
\end{verbatim}+ −
+ −
That means Google did a bit badly in 2010, while Apple did very well.+ −
Both did OK in 2011. Make sure you handle the cases where a company is+ −
not listed in a year. In such cases the change factor should be \texttt{None}+ −
(see~(4)).+ −
\mbox{}\hfill\mbox{[1 Mark]}+ −
+ −
\item[(6)] Write a function that calculates the ``yield'', or+ −
balance, for one year for our portfolio. This function takes the+ −
change factors, the starting balance and the year as arguments. If+ −
no company from our portfolio existed in that year, the balance is+ −
unchanged. Otherwise we invest in each existing company an equal+ −
amount of our balance. Using the change factors computed under Task+ −
2, calculate the new balance. Say we had \$100 in 2010, we would have+ −
received in our running example involving Google and Apple:+ −
+ −
\begin{verbatim}+ −
$50 * -0.03573992567129673 + $50 * 0.539975124774038+ −
= $25.21175995513706+ −
\end{verbatim}+ −
+ −
as profit for that year, and our new balance for 2011 is \$125 when+ −
converted to a \texttt{Long}.\mbox{}\hfill\mbox{[1 Mark]}+ −
+ −
\item[(7)] Write a function that calculates the overall balance+ −
for a range of years where each year the yearly profit is compounded to+ −
the new balances and then re-invested into our portfolio.+ −
For this use the function and results generated under (6).\\+ −
\mbox{}\hfill\mbox{[1 Mark]}+ −
\end{itemize}\medskip + −
+ −
+ −
+ −
\noindent+ −
\textbf{Test Data:} File \texttt{drumb.scala} contains two portfolios+ −
collected from the S\&P 500, one for blue-chip companies, including+ −
Facebook, Amazon and Baidu; and another for listed real-estate+ −
companies, whose names I have never heard of. Following the dumb+ −
investment strategy from 1978 until 2018 would have turned a starting+ −
balance of \$100 into roughly \$101,589 for real estate and a whopping+ −
\$1,587,528 for blue chips. Note when comparing these results with your+ −
own calculations: there might be some small rounding errors, which+ −
when compounded lead to moderately different values.\bigskip+ −
+ −
+ −
\noindent+ −
\textbf{Moral:} Reflecting on our assumptions, we are over-estimating+ −
our yield in many ways: first, who can know in 1978 about what will+ −
turn out to be a blue chip company. Also, since the portfolios are+ −
chosen from the current S\&P 500, they do not include the myriad+ −
of companies that went bust or were de-listed over the years.+ −
So where does this leave our fictional character Mr T.~Drumb? Well, given+ −
his inheritance, a really dumb investment strategy would have done+ −
equally well, if not much better.\medskip+ −
+ −
\end{document}+ −
+ −
\newpage+ −
+ −
This part is about web-scraping and list-processing in Scala. It uses+ −
online data about the per-capita alcohol consumption for each country+ −
(per year?), and a file containing the data about the population size of+ −
each country. From this data you are supposed to estimate how many+ −
litres of pure alcohol are consumed worldwide.\bigskip+ −
+ −
\noindent+ −
\textbf{Tasks (file alcohol.scala):}+ −
+ −
\begin{itemize}+ −
\item[(1)] Write a function that given an URL requests a+ −
comma-separated value (CSV) list. We are interested in the list+ −
from the following URL+ −
+ −
\begin{center}+ −
\url{https://raw.githubusercontent.com/fivethirtyeight/data/master/alcohol-consumption/drinks.csv}+ −
\end{center}+ −
+ −
\noindent Your function should take a string (the URL) as input, and+ −
produce a list of strings as output, where each string is one line in+ −
the corresponding CSV-list. This list from the URL above should+ −
contain 194 lines.\medskip+ −
+ −
\noindent+ −
Write another function that can read the file \texttt{population.csv}+ −
from disk (the file is distributed with the assignment). This+ −
function should take a string as argument, the file name, and again+ −
return a list of strings corresponding to each entry in the+ −
CSV-list. For \texttt{population.csv}, this list should contain 216+ −
lines.\hfill[1 Mark]+ −
+ −
+ −
\item[(2)] Unfortunately, the CSV-lists contain a lot of ``junk'' and we+ −
need to extract the data that interests us. From the header of the+ −
alcohol list, you can see there are 5 columns+ −
+ −
\begin{center}+ −
\begin{tabular}{l}+ −
\texttt{country (name),}\\+ −
\texttt{beer\_servings,}\\+ −
\texttt{spirit\_servings,}\\+ −
\texttt{wine\_servings,}\\+ −
\texttt{total\_litres\_of\_pure\_alcohol}+ −
\end{tabular} + −
\end{center}+ −
+ −
\noindent+ −
Write a function that extracts the data from the first column,+ −
the country name, and the data from the fifth column (converted into+ −
a \texttt{Double}). For this go through each line of the CSV-list+ −
(except the first line), use the \texttt{split(",")} function to+ −
divide each line into an array of 5 elements. Keep the data from the+ −
first and fifth element in these arrays.\medskip+ −
+ −
\noindent+ −
Write another function that processes the population size list. This+ −
is already of the form country name and population size.\footnote{Your+ −
friendly lecturer already did the messy processing for you from the+ −
Worldbank database, see \url{https://github.com/datasets/population/tree/master/data} for the original.} Again, split the+ −
strings according to the commas. However, this time generate a+ −
\texttt{Map} from country names to population sizes.\hfill[1 Mark]+ −
+ −
\item[(3)] In (2) you generated the data about the alcohol consumption+ −
per capita for each country, and also the population size for each+ −
country. From this generate next a sorted(!) list of the overall+ −
alcohol consumption for each country. The list should be sorted from+ −
highest alcohol consumption to lowest. The difficulty is that the+ −
data is scraped off from ``random'' sources on the Internet and+ −
annoyingly the spelling of some country names does not always agree in both+ −
lists. For example the alcohol list contains+ −
\texttt{Bosnia-Herzegovina}, while the population writes this country as+ −
\texttt{Bosnia and Herzegovina}. In your sorted+ −
overall list include only countries from the alcohol list, whose+ −
exact country name is also in the population size list. This means+ −
you can ignore countries like Bosnia-Herzegovina from the overall+ −
alcohol consumption. There are 177 countries where the names+ −
agree. The UK is ranked 10th on this list by+ −
consuming 671,976,864 Litres of pure alcohol each year.\medskip+ −
+ −
\noindent+ −
Finally, write another function that takes an integer, say+ −
\texttt{n}, as argument. You can assume this integer is between 0+ −
and 177 (the number of countries in the sorted list above). The+ −
function should return a triple, where the first component is the+ −
sum of the alcohol consumption in all countries (on the list); the+ −
second component is the sum of the \texttt{n}-highest alcohol+ −
consumers on the list; and the third component is the percentage the+ −
\texttt{n}-highest alcohol consumers drink with respect to the+ −
the world consumption. You will see that according to our data, 164+ −
countries (out of 177) gobble up 100\% of the World alcohol+ −
consumption.\hfill\mbox{[1 Mark]}+ −
\end{itemize}+ −
+ −
\noindent+ −
\textbf{Hints:} useful list functions: \texttt{.drop(n)},+ −
\texttt{.take(n)} for dropping or taking some elements in a list,+ −
\texttt{.getLines} for separating lines in a string;+ −
\texttt{.sortBy(\_.\_2)} sorts a list of pairs according to the second+ −
elements in the pairs---the sorting is done from smallest to highest;+ −
useful \texttt{Map} functions: \texttt{.toMap} converts a list of+ −
pairs into a \texttt{Map}, \texttt{.isDefinedAt(k)} tests whether the+ −
map is defined at that key, that is would produce a result when+ −
called with this key; useful data functions: \texttt{Source.fromURL},+ −
\texttt{Source.fromFile} for obtaining a webpage and reading a file.+ −
+ −
\newpage+ −
+ −
+ −
+ −
+ −
+ −
+ −
%%% Local Variables: + −
%%% mode: latex+ −
%%% TeX-master: t+ −
%%% End: + −