\documentclass{article}\usepackage{../style}\usepackage{disclaimer}%%\usepackage{../langs}\begin{document}\section*{Coursework 6 (Scala)}This coursework is about Scala and is worth 10\%. The first and secondpart are due on 16 November at 11pm, and the third part on 21 Decemberat 11pm. You are asked to implement three programs about listprocessing and recursion. The third part is more advanced and mightinclude material you have not yet seen in the first lecture.\bigskip\IMPORTANT{}\noindentAlso note that the running time of each part will be restricted to amaximum of 360 seconds on my laptop.\DISCLAIMER{}\subsection*{Part 1 (3 Marks)}This part is about recursion. You are asked to implement a Scalaprogram that tests examples of the \emph{$3n + 1$-conjecture}, alsocalled \emph{Collatz conjecture}. This conjecture can be described asfollows: Start with any positive number $n$ greater than $0$:\begin{itemize}\item If $n$ is even, divide it by $2$ to obtain $n / 2$.\item If $n$ is odd, multiply it by $3$ and add $1$ to obtain $3n + 1$.\item Repeat this process and you will always end up with $1$.\end{itemize}\noindentFor example if you start with $6$, respectively $9$, you obtain theseries\[\begin{array}{@{}l@{\hspace{5mm}}l@{}}6, 3, 10, 5, 16, 8, 4, 2, 1 & \text{(= 9 steps)}\\9, 28, 14, 7, 22, 11, 34, 17, 52, 26, 13, 40, 20, 10, 5, 16, 8, 4, 2, 1 & \text{(= 20 steps)}\\\end{array}\]\noindentAs you can see, the numbers go up and down like a roller-coaster, butcuriously they seem to always terminate in $1$. The conjecture is thatthis will \emph{always} happen for every number greater than0.\footnote{While it is relatively easy to test this conjecture with particular numbers, it is an interesting open problem to \emph{prove} that the conjecture is true for \emph{all} numbers ($> 0$). Paul Erd\"o{}s, a famous mathematician you might have hard about, said about this conjecture: ``Mathematics may not be ready for such problems.'' and also offered a \$500 cash prize for its solution. Jeffrey Lagarias, another mathematician, claimed that based only on known information about this problem, ``this is an extraordinarily difficult problem, completely out of reach of present day mathematics.'' There is also a \href{https://xkcd.com/710/}{xkcd} cartoon about this conjecture (click \href{https://xkcd.com/710/}{here}). If you are able to solve this conjecture, you will definitely get famous.}\bigskip\noindent\textbf{Tasks (file collatz.scala):}\begin{itemize}\item[(1)] You are asked to implement a recursive function that calculates the number of steps needed until a series ends with $1$. In case of starting with $6$, it takes $9$ steps and in case of starting with $9$, it takes $20$ (see above). In order to try out this function with large numbers, you should use \texttt{Long} as argument type, instead of \texttt{Int}. You can assume this function will be called with numbers between $1$ and $1$ Million. \hfill[2 Marks]\item[(2)] Write a second function that takes an upper bound as argument and calculates the steps for all numbers in the range from 1 up to this bound. It returns the maximum number of steps and the corresponding number that needs that many steps. More precisely it returns a pair where the first component is the number of steps and the second is the corresponding number. \hfill\mbox{[1 Mark]}\end{itemize}\noindent\textbf{Test Data:} Some test ranges are:\begin{itemize}\item 1 to 10 where $9$ takes 20 steps \item 1 to 100 where $97$ takes 119 steps,\item 1 to 1,000 where $871$ takes 179 steps,\item 1 to 10,000 where $6,171$ takes 262 steps,\item 1 to 100,000 where $77,031$ takes 351 steps, \item 1 to 1 Million where $837,799$ takes 525 steps %%\item[$\bullet$] $1 - 10$ million where $8,400,511$ takes 686 steps\end{itemize}\noindent\textbf{Hints:} useful math operators: \texttt{\%} for modulo; usefulfunctions: \mbox{\texttt{(1\,to\,10)}} for ranges, \texttt{.toInt},\texttt{.toList} for conversions, \texttt{List(...).max} for themaximum of a list, \texttt{List(...).indexOf(...)} for the first index ofa value in a list.\subsection*{Part 2 (3 Marks)}This part is about web-scraping and list-processing in Scala. It usesonline data about the per-capita alcohol consumption for each country(per year?), and a file containing the data about the population size ofeach country. From this data you are supposed to estimate how manylitres of pure alcohol are consumed worldwide.\bigskip\noindent\textbf{Tasks (file alcohol.scala):}\begin{itemize}\item[(1)] Write a function that given an URL requests a comma-separated value (CSV) list. We are interested in the list from the following URL\begin{center} \url{https://raw.githubusercontent.com/fivethirtyeight/data/master/alcohol-consumption/drinks.csv}\end{center}\noindent Your function should take a string (the URL) as input, andproduce a list of strings as output, where each string is one line inthe corresponding CSV-list. This list from the URL above shouldcontain 194 lines.\medskip\noindentWrite another function that can read the file \texttt{population.csv}from disk (the file is distributed with the coursework). Thisfunction should take a string as argument, the file name, and againreturn a list of strings corresponding to each entry in theCSV-list. For \texttt{population.csv}, this list should contain 216lines.\hfill[1 Mark]\item[(2)] Unfortunately, the CSV-lists contain a lot of ``junk'' and we need to extract the data that interests us. From the header of the alcohol list, you can see there are 5 columns \begin{center} \begin{tabular}{l} \texttt{country (name),}\\ \texttt{beer\_servings,}\\ \texttt{spirit\_servings,}\\ \texttt{wine\_servings,}\\ \texttt{total\_litres\_of\_pure\_alcohol} \end{tabular} \end{center} \noindent Write a function that extracts the data from the first column, the country name, and the data from the fifth column (converted into a \texttt{Double}). For this go through each line of the CSV-list (except the first line), use the \texttt{split(",")} function to divide each line into an array of 5 elements. Keep the data from the first and fifth element in these arrays.\medskip \noindent Write another function that processes the population size list. This is already of the form country name and population size.\footnote{Your friendly lecturer already did the messy processing for you from the Worldbank database, see \url{https://github.com/datasets/population/tree/master/data} for the original.} Again, split the strings according to the commas. However, this time generate a \texttt{Map} from country names to population sizes.\hfill[1 Mark]\item[(3)] In (2) you generated the data about the alcohol consumption per capita for each country, and also the population size for each country. From this generate next a sorted(!) list of the overall alcohol consumption for each country. The list should be sorted from highest alcohol consumption to lowest. The difficulty is that the data is scraped off from ``random'' sources on the Internet and annoyingly the spelling of some country names does not always agree in both lists. For example the alcohol list contains \texttt{Bosnia-Herzegovina}, while the population writes this country as \texttt{Bosnia and Herzegovina}. In your sorted overall list include only countries from the alcohol list, whose exact country name is also in the population size list. This means you can ignore countries like Bosnia-Herzegovina from the overall alcohol consumption. There are 177 countries where the names agree. The UK is ranked 10th on this list by consuming 671,976,864 Litres of pure alcohol each year.\medskip \noindent Finally, write another function that takes an integer, say \texttt{n}, as argument. You can assume this integer is between 0 and 177 (the number of countries in the sorted list above). The function should return a triple, where the first component is the sum of the alcohol consumption in all countries (on the list); the second component is the sum of the \texttt{n}-highest alcohol consumers on the list; and the third component is the percentage the \texttt{n}-highest alcohol consumers drink with respect to the the world consumption. You will see that according to our data, 164 countries (out of 177) gobble up 100\% of the World alcohol consumption.\hfill\mbox{[1 Mark]}\end{itemize}\noindent\textbf{Hints:} useful list functions: \texttt{.drop(n)},\texttt{.take(n)} for dropping or taking some elements in a list,\texttt{.getLines} for separating lines in a string;\texttt{.sortBy(\_.\_2)} sorts a list of pairs according to the secondelements in the pairs---the sorting is done from smallest to highest;useful \texttt{Map} functions: \texttt{.toMap} converts a list ofpairs into a \texttt{Map}, \texttt{.isDefinedAt(k)} tests whether themap is defined at that key, that is would produce a result whencalled with this key; useful data functions: \texttt{Source.fromURL},\texttt{Source.fromFile} for obtaining a webpage and reading a file.\newpage\subsection*{Advanced Part 3 (4 Marks)}A purely fictional character named Mr T.~Drumb inherited in 1978approximately 200 Million Dollar from his father. Mr Drumb prideshimself to be a brilliant business man because nowadays it isestimated he is 3 Billion Dollar worth (one is not sure, of course,because Mr Drumb refuses to make his tax records public).Since the question about Mr Drumb's business acumen remains open,let's do a quick back-of-the-envelope calculation in Scala whether hisclaim has any merit. Let's suppose we are given \$100 in 1978 and wefollow a really dumb investment strategy, namely:\begin{itemize}\item We blindly choose a portfolio of stocks, say some Blue-Chip stocks or some Real Estate stocks.\item If some of the stocks in our portfolio are traded in January of a year, we invest our money in equal amounts in each of these stocks. For example if we have \$100 and there are four stocks that are traded in our portfolio, we buy \$25 worth of stocks from each.\item Next year in January, we look how our stocks did, liquidate everything, and re-invest our (hopefully) increased money in again the stocks from our portfolio (there might be more stocks available, if companies from our portfolio got listed in that year, or less if some companies went bust or were de-listed).\item We do this for 39 years until January 2017 and check what would have become out of our \$100.\end{itemize}\noindentUntil Yahoo was bought by Altaba this summer, historical stock marketdata for such back-of-the-envelope calculations was freely availableonline. Unfortuantely nowadays this kind of data is difficult toobtain, unless you are prepared to pay extortionate prices or beseverely rate-limited. Therefore this coursework comes with a numberof files containing CSV-lists with the historical stock prices for thecompanies in our portfolios. Use these files for the followingtasks.\bigskip\noindent\textbf{Tasks (file drumb.scala):}\begin{itemize}\item[(1.a)] Write a function \texttt{get\_january\_data} that takes a stock symbol and a year as arguments. The function reads the corresponding CSV-file and returns the list of strings that start with the given year (each line in the CSV-list is of the form \texttt{year-01-someday,someprice}).\item[(1.b)] Write a function \texttt{get\_first\_price} that takes again a stock symbol and a year as arguments. It should return the first January price for the stock symbol in given the year. For this it uses the list of strings generated by \texttt{get\_january\_data}. A problem is that normally a stock exchange is not open on 1st of January, but depending on the day of the week on a later day (maybe 3rd or 4th). The easiest way to solve this problem is to obtain the whole January data for a stock symbol and then select the earliest, or first, entry in this list. The stock price of this entry should be converted into a double. Such a price might not exist, in case the company does not exist in the given year. For example, if you query for Google in January of 1980, then clearly Google did not exist yet. Therefore you are asked to return a trade price as \texttt{Option[Double]}\ldots\texttt{None} will be the value for when no price exists.\item[(1.c)] Write a function \texttt{get\_prices} that takes a portfolio (a list of stock symbols), a years range and gets all the first trading prices for each year in the range. You should organise this as a list of lists of \texttt{Option[Double]}'s. The inner lists are for all stock symbols from the portfolio and the outer list for the years. For example for Google and Apple in years 2010 (first line), 2011 (second line) and 2012 (third line) you obtain:\begin{verbatim} List(List(Some(311.349976), Some(27.505054)), List(Some(300.222351), Some(42.357094)), List(Some(330.555054), Some(52.852215)))\end{verbatim}\hfill[2 Marks]\item[(2.a)] Write a function that calculates the \emph{change factor} (delta) for how a stock price has changed from one year to the next. This is only well-defined, if the corresponding company has been traded in both years. In this case you can calculate \[ \frac{price_{new} - price_{old}}{price_{old}} \] If the change factor is defined, you should return it as \texttt{Some(change factor)}; if not, you should return \texttt{None}.\item[(2.b)] Write a function that calculates all change factors (deltas) for the prices we obtained under Task 1. For the running example of Google and Apple for the years 2010 to 2012 you should obtain 4 change factors:\begin{verbatim} List(List(Some(-0.03573992567129673), Some(0.5399749442411563)) List(Some(0.10103412653643493), Some(0.2477771728154912)))\end{verbatim} That means Google did a bit badly in 2010, while Apple did very well. Both did OK in 2011. Make sure you handle the cases where a company is not listed in a year. In such cases the change factor should be \texttt{None} (see 2.a).\\ \mbox{}\hfill\mbox{[1 Mark]}\item[(3.a)] Write a function that calculates the ``yield'', or balance, for one year for our portfolio. This function takes the change factors, the starting balance and the year as arguments. If no company from our portfolio existed in that year, the balance is unchanged. Otherwise we invest in each existing company an equal amount of our balance. Using the change factors computed under Task 2, calculate the new balance. Say we had \$100 in 2010, we would have received in our running example involving Google and Apple: \begin{verbatim} $50 * -0.03573992567129673 + $50 * 0.5399749442411563 = $25.21175092849298 \end{verbatim} as profit for that year, and our new balance for 2011 is \$125 when converted to a \texttt{Long}.\item[(3.b)] Write a function that calculates the overall balance for a range of years where each year the yearly profit is compounded to the new balances and then re-invested into our portfolio.\\ \mbox{}\hfill\mbox{[1 Mark]}\end{itemize}\medskip \noindent\textbf{Test Data:} File \texttt{drumb.scala} contains two portfolioscollected from the S\&P 500, one for blue-chip companies, includingFacebook, Amazon and Baidu; and another for listed real-estatecompanies, whose names I have never heard of. Following the dumbinvestment strategy from 1978 until 2017 would have turned a startingbalance of \$100 into roughly \$30,895 for real estate and a whopping\$349,597 for blue chips. Note when comparing these results with yourown calculations: there might be some small rounding errors, whichwhen compounded lead to moderately different values.\bigskip\noindent\textbf{Hints:} useful string functions: \texttt{.startsWith(...)} forchecking whether a string has a given prefix, \texttt{\_ ++ \_} forconcatenating two strings; useful option functions: \texttt{.flatten}flattens a list of options such that it filters way all\texttt{None}'s, \texttt{Try(...) getOrElse ...} runs some code thatmight raise an exception---if yes, then a default value can be given;useful list functions: \texttt{.head} for obtaining the first elementin a non-empty list, \texttt{.length} for the length of alist.\bigskip\noindent\textbf{Moral:} Reflecting on our assumptions, we are over-estimatingour yield in many ways: first, who can know in 1978 about what willturn out to be a blue chip company. Also, since the portfolios arechosen from the current S\&P 500, they do not include the myriadof companies that went bust or were de-listed over the years.So where does this leave our fictional character Mr T.~Drumb? Well, givenhis inheritance, a really dumb investment strategy would have doneequally well, if not much better.\medskip\end{document}%%% Local Variables: %%% mode: latex%%% TeX-master: t%%% End: