pep-material: cws/cw01.tex@1877cc717291 (annotated)

276 52faee6d0be2 updated Christian Urban <urbanc@in.tum.de> parents: 266 diff changeset	1	% !TEX program = xelatex
6 aae256985251 updated Christian Urban <urbanc@in.tum.de> parents: diff changeset	2	\documentclass{article}
11 417869f65585 updated Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 9 diff changeset	3	\usepackage{../style}
195 fc3ac7b70a06 updated Christian Urban <urbanc@in.tum.de> parents: 192 diff changeset	4	\usepackage{disclaimer}
199 54befaf23648 updated Christian Urban <urbanc@in.tum.de> parents: 197 diff changeset	5	\usepackage{../langs}
6 aae256985251 updated Christian Urban <urbanc@in.tum.de> parents: diff changeset	6
aae256985251 updated Christian Urban <urbanc@in.tum.de> parents: diff changeset	7	\begin{document}
aae256985251 updated Christian Urban <urbanc@in.tum.de> parents: diff changeset	8
306 1877cc717291 update Christian Urban <urbanc@in.tum.de> parents: 284 diff changeset	9	\section*{Part 6 (Scala)}
199 54befaf23648 updated Christian Urban <urbanc@in.tum.de> parents: 197 diff changeset	10
54befaf23648 updated Christian Urban <urbanc@in.tum.de> parents: 197 diff changeset	11	\mbox{}\hfill\textit{``The most effective debugging tool is still careful thought,}\\
54befaf23648 updated Christian Urban <urbanc@in.tum.de> parents: 197 diff changeset	12	\mbox{}\hfill\textit{coupled with judiciously placed print statements.''}\smallskip\\
276 52faee6d0be2 updated Christian Urban <urbanc@in.tum.de> parents: 266 diff changeset	13	\mbox{}\hfill\textit{ --- Brian W. Kernighan, in Unix for Beginners (1979)}\medskip\bigskip
199 54befaf23648 updated Christian Urban <urbanc@in.tum.de> parents: 197 diff changeset	14
279 14bf4e478534 updated Christian Urban <urbanc@in.tum.de> parents: 276 diff changeset	15
199 54befaf23648 updated Christian Urban <urbanc@in.tum.de> parents: 197 diff changeset	16	\noindent
306 1877cc717291 update Christian Urban <urbanc@in.tum.de> parents: 284 diff changeset	17	This part is about Scala. You are asked to implement two programs
284 9a04eb6a2291 updated Christian Urban <urbanc@in.tum.de> parents: 282 diff changeset	18	about list processing and recursion. The preliminary part (3\%) is due
9a04eb6a2291 updated Christian Urban <urbanc@in.tum.de> parents: 282 diff changeset	19	on \cwSIX{} at 4pm, and the core part on \cwSIXa{} at 4pm. The core
9a04eb6a2291 updated Christian Urban <urbanc@in.tum.de> parents: 282 diff changeset	20	part is more advanced and might include material you have not yet seen
9a04eb6a2291 updated Christian Urban <urbanc@in.tum.de> parents: 282 diff changeset	21	in the first lecture.\bigskip
199 54befaf23648 updated Christian Urban <urbanc@in.tum.de> parents: 197 diff changeset	22
195 fc3ac7b70a06 updated Christian Urban <urbanc@in.tum.de> parents: 192 diff changeset	23	\IMPORTANT{}
fc3ac7b70a06 updated Christian Urban <urbanc@in.tum.de> parents: 192 diff changeset	24
127 b4def82f3f9f updated Christian Urban <urbanc@in.tum.de> parents: 125 diff changeset	25	\noindent
195 fc3ac7b70a06 updated Christian Urban <urbanc@in.tum.de> parents: 192 diff changeset	26	Also note that the running time of each part will be restricted to a
199 54befaf23648 updated Christian Urban <urbanc@in.tum.de> parents: 197 diff changeset	27	maximum of 30 seconds on my laptop.
192 a112e0e2325c updated Christian Urban <urbanc@in.tum.de> parents: 167 diff changeset	28
196 c50b074b3047 updated Christian Urban <urbanc@in.tum.de> parents: 195 diff changeset	29	\DISCLAIMER{}
6 aae256985251 updated Christian Urban <urbanc@in.tum.de> parents: diff changeset	30
201 018b9c12ee1f updated Christian Urban <urbanc@in.tum.de> parents: 199 diff changeset	31	\subsection*{Reference Implementation}
6 aae256985251 updated Christian Urban <urbanc@in.tum.de> parents: diff changeset	32
199 54befaf23648 updated Christian Urban <urbanc@in.tum.de> parents: 197 diff changeset	33	Like the C++ assignments, the Scala assignments will work like this: you
54befaf23648 updated Christian Urban <urbanc@in.tum.de> parents: 197 diff changeset	34	push your files to GitHub and receive (after sometimes a long delay) some
54befaf23648 updated Christian Urban <urbanc@in.tum.de> parents: 197 diff changeset	35	automated feedback. In the end we take a snapshot of the submitted files and
266 ca48ac1d3c3e updated to 2.13 Christian Urban <urbanc@in.tum.de> parents: 252 diff changeset	36	apply an automated marking script to them.\medskip
199 54befaf23648 updated Christian Urban <urbanc@in.tum.de> parents: 197 diff changeset	37
266 ca48ac1d3c3e updated to 2.13 Christian Urban <urbanc@in.tum.de> parents: 252 diff changeset	38	\noindent
306 1877cc717291 update Christian Urban <urbanc@in.tum.de> parents: 284 diff changeset	39	In addition, the Scala coursework comes with a reference implementation
1877cc717291 update Christian Urban <urbanc@in.tum.de> parents: 284 diff changeset	40	in form of \texttt{jar}-files. This allows you to run any test cases
199 54befaf23648 updated Christian Urban <urbanc@in.tum.de> parents: 197 diff changeset	41	on your own computer. For example you can call Scala on the command
54befaf23648 updated Christian Urban <urbanc@in.tum.de> parents: 197 diff changeset	42	line with the option \texttt{-cp collatz.jar} and then query any
54befaf23648 updated Christian Urban <urbanc@in.tum.de> parents: 197 diff changeset	43	function from the template file. Say you want to find out what
54befaf23648 updated Christian Urban <urbanc@in.tum.de> parents: 197 diff changeset	44	the functions \texttt{collatz} and \texttt{collatz\_max}
54befaf23648 updated Christian Urban <urbanc@in.tum.de> parents: 197 diff changeset	45	produce: for this you just need to prefix them with the object name
54befaf23648 updated Christian Urban <urbanc@in.tum.de> parents: 197 diff changeset	46	\texttt{CW6a} (and \texttt{CW6b} respectively for \texttt{drumb.jar}).
54befaf23648 updated Christian Urban <urbanc@in.tum.de> parents: 197 diff changeset	47	If you want to find out what these functions produce for the argument
54befaf23648 updated Christian Urban <urbanc@in.tum.de> parents: 197 diff changeset	48	\texttt{6}, you would type something like:
54befaf23648 updated Christian Urban <urbanc@in.tum.de> parents: 197 diff changeset	49
54befaf23648 updated Christian Urban <urbanc@in.tum.de> parents: 197 diff changeset	50	\begin{lstlisting}[language={},numbers=none,basicstyle=\ttfamily\small]
54befaf23648 updated Christian Urban <urbanc@in.tum.de> parents: 197 diff changeset	51	$ scala -cp collatz.jar
54befaf23648 updated Christian Urban <urbanc@in.tum.de> parents: 197 diff changeset	52
54befaf23648 updated Christian Urban <urbanc@in.tum.de> parents: 197 diff changeset	53	scala> CW6a.collatz(6)
54befaf23648 updated Christian Urban <urbanc@in.tum.de> parents: 197 diff changeset	54	...
54befaf23648 updated Christian Urban <urbanc@in.tum.de> parents: 197 diff changeset	55	scala> CW6a.collatz_max(6)
54befaf23648 updated Christian Urban <urbanc@in.tum.de> parents: 197 diff changeset	56	...
54befaf23648 updated Christian Urban <urbanc@in.tum.de> parents: 197 diff changeset	57	\end{lstlisting}%$
54befaf23648 updated Christian Urban <urbanc@in.tum.de> parents: 197 diff changeset	58
201 018b9c12ee1f updated Christian Urban <urbanc@in.tum.de> parents: 199 diff changeset	59	\subsection*{Hints}
018b9c12ee1f updated Christian Urban <urbanc@in.tum.de> parents: 199 diff changeset	60
018b9c12ee1f updated Christian Urban <urbanc@in.tum.de> parents: 199 diff changeset	61	\noindent
306 1877cc717291 update Christian Urban <urbanc@in.tum.de> parents: 284 diff changeset	62	\textbf{For Preliminary Part:} useful math operators: \texttt{\%} for modulo; useful
201 018b9c12ee1f updated Christian Urban <urbanc@in.tum.de> parents: 199 diff changeset	63	functions: \mbox{\texttt{(1\,to\,10)}} for ranges, \texttt{.toInt},
018b9c12ee1f updated Christian Urban <urbanc@in.tum.de> parents: 199 diff changeset	64	\texttt{.toList} for conversions, \texttt{List(...).max} for the
018b9c12ee1f updated Christian Urban <urbanc@in.tum.de> parents: 199 diff changeset	65	maximum of a list, \texttt{List(...).indexOf(...)} for the first index of
018b9c12ee1f updated Christian Urban <urbanc@in.tum.de> parents: 199 diff changeset	66	a value in a list.\bigskip
018b9c12ee1f updated Christian Urban <urbanc@in.tum.de> parents: 199 diff changeset	67
018b9c12ee1f updated Christian Urban <urbanc@in.tum.de> parents: 199 diff changeset	68	\noindent
306 1877cc717291 update Christian Urban <urbanc@in.tum.de> parents: 284 diff changeset	69	\textbf{For Core Part:} useful string functions:
266 ca48ac1d3c3e updated to 2.13 Christian Urban <urbanc@in.tum.de> parents: 252 diff changeset	70	\texttt{.startsWith(...)} for checking whether a string has a given
ca48ac1d3c3e updated to 2.13 Christian Urban <urbanc@in.tum.de> parents: 252 diff changeset	71	prefix, \texttt{\_ ++ \_} for concatenating two strings; useful option
ca48ac1d3c3e updated to 2.13 Christian Urban <urbanc@in.tum.de> parents: 252 diff changeset	72	functions: \texttt{.flatten} flattens a list of options such that it
ca48ac1d3c3e updated to 2.13 Christian Urban <urbanc@in.tum.de> parents: 252 diff changeset	73	filters way all \texttt{None}'s, \texttt{Try(...).getOrElse ...} runs
ca48ac1d3c3e updated to 2.13 Christian Urban <urbanc@in.tum.de> parents: 252 diff changeset	74	some code that might raise an exception---if yes, then a default value
ca48ac1d3c3e updated to 2.13 Christian Urban <urbanc@in.tum.de> parents: 252 diff changeset	75	can be given; useful list functions: \texttt{.head} for obtaining the
ca48ac1d3c3e updated to 2.13 Christian Urban <urbanc@in.tum.de> parents: 252 diff changeset	76	first element in a non-empty list, \texttt{.length} for the length of
ca48ac1d3c3e updated to 2.13 Christian Urban <urbanc@in.tum.de> parents: 252 diff changeset	77	a list; \texttt{.filter(...)} for filtering out elements in a list;
ca48ac1d3c3e updated to 2.13 Christian Urban <urbanc@in.tum.de> parents: 252 diff changeset	78	\texttt{.getLines.toList} for obtaining a list of lines from a file;
ca48ac1d3c3e updated to 2.13 Christian Urban <urbanc@in.tum.de> parents: 252 diff changeset	79	\texttt{.split(",").toList} for splitting strings according to a
ca48ac1d3c3e updated to 2.13 Christian Urban <urbanc@in.tum.de> parents: 252 diff changeset	80	comma.\bigskip
201 018b9c12ee1f updated Christian Urban <urbanc@in.tum.de> parents: 199 diff changeset	81
018b9c12ee1f updated Christian Urban <urbanc@in.tum.de> parents: 199 diff changeset	82	\noindent
266 ca48ac1d3c3e updated to 2.13 Christian Urban <urbanc@in.tum.de> parents: 252 diff changeset	83	\textbf{Note!} Fortunately Scala supports operator overloading. But
ca48ac1d3c3e updated to 2.13 Christian Urban <urbanc@in.tum.de> parents: 252 diff changeset	84	make sure you understand the difference between \texttt{100 / 3} and
201 018b9c12ee1f updated Christian Urban <urbanc@in.tum.de> parents: 199 diff changeset	85	\texttt{100.0 / 3}!
018b9c12ee1f updated Christian Urban <urbanc@in.tum.de> parents: 199 diff changeset	86
018b9c12ee1f updated Christian Urban <urbanc@in.tum.de> parents: 199 diff changeset	87	\newpage
279 14bf4e478534 updated Christian Urban <urbanc@in.tum.de> parents: 276 diff changeset	88	\subsection*{Preliminary Part (3 Marks, file collatz.scala)}
6 aae256985251 updated Christian Urban <urbanc@in.tum.de> parents: diff changeset	89
11 417869f65585 updated Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 9 diff changeset	90	This part is about recursion. You are asked to implement a Scala
18 87e55eb309ed updated Christian Urban <urbanc@in.tum.de> parents: 11 diff changeset	91	program that tests examples of the \emph{$3n + 1$-conjecture}, also
87e55eb309ed updated Christian Urban <urbanc@in.tum.de> parents: 11 diff changeset	92	called \emph{Collatz conjecture}. This conjecture can be described as
87e55eb309ed updated Christian Urban <urbanc@in.tum.de> parents: 11 diff changeset	93	follows: Start with any positive number $n$ greater than $0$:
87e55eb309ed updated Christian Urban <urbanc@in.tum.de> parents: 11 diff changeset	94
87e55eb309ed updated Christian Urban <urbanc@in.tum.de> parents: 11 diff changeset	95	\begin{itemize}
87e55eb309ed updated Christian Urban <urbanc@in.tum.de> parents: 11 diff changeset	96	\item If $n$ is even, divide it by $2$ to obtain $n / 2$.
87e55eb309ed updated Christian Urban <urbanc@in.tum.de> parents: 11 diff changeset	97	\item If $n$ is odd, multiply it by $3$ and add $1$ to obtain $3n +
87e55eb309ed updated Christian Urban <urbanc@in.tum.de> parents: 11 diff changeset	98	1$.
87e55eb309ed updated Christian Urban <urbanc@in.tum.de> parents: 11 diff changeset	99	\item Repeat this process and you will always end up with $1$.
87e55eb309ed updated Christian Urban <urbanc@in.tum.de> parents: 11 diff changeset	100	\end{itemize}
87e55eb309ed updated Christian Urban <urbanc@in.tum.de> parents: 11 diff changeset	101
87e55eb309ed updated Christian Urban <urbanc@in.tum.de> parents: 11 diff changeset	102	\noindent
266 ca48ac1d3c3e updated to 2.13 Christian Urban <urbanc@in.tum.de> parents: 252 diff changeset	103	For example if you start with, say, $6$ and $9$, you obtain the
ca48ac1d3c3e updated to 2.13 Christian Urban <urbanc@in.tum.de> parents: 252 diff changeset	104	two series
11 417869f65585 updated Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 9 diff changeset	105
417869f65585 updated Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 9 diff changeset	106	\[
417869f65585 updated Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 9 diff changeset	107	\begin{array}{@{}l@{\hspace{5mm}}l@{}}
199 54befaf23648 updated Christian Urban <urbanc@in.tum.de> parents: 197 diff changeset	108	6, 3, 10, 5, 16, 8, 4, 2, 1 & \text{(= 8 steps)}\\
54befaf23648 updated Christian Urban <urbanc@in.tum.de> parents: 197 diff changeset	109	9, 28, 14, 7, 22, 11, 34, 17, 52, 26, 13, 40, 20, 10, 5, 16, 8, 4, 2, 1 & \text{(= 19 steps)}\\
11 417869f65585 updated Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 9 diff changeset	110	\end{array}
417869f65585 updated Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 9 diff changeset	111	\]
417869f65585 updated Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 9 diff changeset	112
417869f65585 updated Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 9 diff changeset	113	\noindent
417869f65585 updated Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 9 diff changeset	114	As you can see, the numbers go up and down like a roller-coaster, but
18 87e55eb309ed updated Christian Urban <urbanc@in.tum.de> parents: 11 diff changeset	115	curiously they seem to always terminate in $1$. The conjecture is that
87e55eb309ed updated Christian Urban <urbanc@in.tum.de> parents: 11 diff changeset	116	this will \emph{always} happen for every number greater than
87e55eb309ed updated Christian Urban <urbanc@in.tum.de> parents: 11 diff changeset	117	0.\footnote{While it is relatively easy to test this conjecture with
87e55eb309ed updated Christian Urban <urbanc@in.tum.de> parents: 11 diff changeset	118	particular numbers, it is an interesting open problem to
196 c50b074b3047 updated Christian Urban <urbanc@in.tum.de> parents: 195 diff changeset	119	\emph{prove} that the conjecture is true for \emph{all} numbers ($>
306 1877cc717291 update Christian Urban <urbanc@in.tum.de> parents: 284 diff changeset	120	0$). Paul Erd\"o{}s, a famous mathematician you might have heard
266 ca48ac1d3c3e updated to 2.13 Christian Urban <urbanc@in.tum.de> parents: 252 diff changeset	121	about, said about this conjecture: ``Mathematics may not [yet] be ready
196 c50b074b3047 updated Christian Urban <urbanc@in.tum.de> parents: 195 diff changeset	122	for such problems.'' and also offered a \$500 cash prize for its
c50b074b3047 updated Christian Urban <urbanc@in.tum.de> parents: 195 diff changeset	123	solution. Jeffrey Lagarias, another mathematician, claimed that
c50b074b3047 updated Christian Urban <urbanc@in.tum.de> parents: 195 diff changeset	124	based only on known information about this problem, ``this is an
c50b074b3047 updated Christian Urban <urbanc@in.tum.de> parents: 195 diff changeset	125	extraordinarily difficult problem, completely out of reach of
c50b074b3047 updated Christian Urban <urbanc@in.tum.de> parents: 195 diff changeset	126	present day mathematics.'' There is also a
c50b074b3047 updated Christian Urban <urbanc@in.tum.de> parents: 195 diff changeset	127	\href{https://xkcd.com/710/}{xkcd} cartoon about this conjecture
18 87e55eb309ed updated Christian Urban <urbanc@in.tum.de> parents: 11 diff changeset	128	(click \href{https://xkcd.com/710/}{here}). If you are able to solve
87e55eb309ed updated Christian Urban <urbanc@in.tum.de> parents: 11 diff changeset	129	this conjecture, you will definitely get famous.}\bigskip
87e55eb309ed updated Christian Urban <urbanc@in.tum.de> parents: 11 diff changeset	130
87e55eb309ed updated Christian Urban <urbanc@in.tum.de> parents: 11 diff changeset	131	\noindent
199 54befaf23648 updated Christian Urban <urbanc@in.tum.de> parents: 197 diff changeset	132	\textbf{Tasks}
18 87e55eb309ed updated Christian Urban <urbanc@in.tum.de> parents: 11 diff changeset	133
87e55eb309ed updated Christian Urban <urbanc@in.tum.de> parents: 11 diff changeset	134	\begin{itemize}
87e55eb309ed updated Christian Urban <urbanc@in.tum.de> parents: 11 diff changeset	135	\item[(1)] You are asked to implement a recursive function that
87e55eb309ed updated Christian Urban <urbanc@in.tum.de> parents: 11 diff changeset	136	calculates the number of steps needed until a series ends
199 54befaf23648 updated Christian Urban <urbanc@in.tum.de> parents: 197 diff changeset	137	with $1$. In case of starting with $6$, it takes $8$ steps and in
54befaf23648 updated Christian Urban <urbanc@in.tum.de> parents: 197 diff changeset	138	case of starting with $9$, it takes $19$ (see above). In order to
18 87e55eb309ed updated Christian Urban <urbanc@in.tum.de> parents: 11 diff changeset	139	try out this function with large numbers, you should use
87e55eb309ed updated Christian Urban <urbanc@in.tum.de> parents: 11 diff changeset	140	\texttt{Long} as argument type, instead of \texttt{Int}. You can
87e55eb309ed updated Christian Urban <urbanc@in.tum.de> parents: 11 diff changeset	141	assume this function will be called with numbers between $1$ and
196 c50b074b3047 updated Christian Urban <urbanc@in.tum.de> parents: 195 diff changeset	142	$1$ Million. \hfill[2 Marks]
18 87e55eb309ed updated Christian Urban <urbanc@in.tum.de> parents: 11 diff changeset	143
87e55eb309ed updated Christian Urban <urbanc@in.tum.de> parents: 11 diff changeset	144	\item[(2)] Write a second function that takes an upper bound as
282 ec9773fe1dc0 updated Christian Urban <urbanc@in.tum.de> parents: 279 diff changeset	145	an argument and calculates the steps for all numbers in the range from
210 63a1376cbebd updated Christian Urban <urbanc@in.tum.de> parents: 202 diff changeset	146	1 up to this bound (the bound including). It returns the maximum number of
63a1376cbebd updated Christian Urban <urbanc@in.tum.de> parents: 202 diff changeset	147	steps and the corresponding number that needs that many steps. More
63a1376cbebd updated Christian Urban <urbanc@in.tum.de> parents: 202 diff changeset	148	precisely it returns a pair where the first component is the number
63a1376cbebd updated Christian Urban <urbanc@in.tum.de> parents: 202 diff changeset	149	of steps and the second is the corresponding number. \hfill\mbox{[1
63a1376cbebd updated Christian Urban <urbanc@in.tum.de> parents: 202 diff changeset	150	Mark]}
18 87e55eb309ed updated Christian Urban <urbanc@in.tum.de> parents: 11 diff changeset	151	\end{itemize}
11 417869f65585 updated Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 9 diff changeset	152
417869f65585 updated Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 9 diff changeset	153	\noindent
18 87e55eb309ed updated Christian Urban <urbanc@in.tum.de> parents: 11 diff changeset	154	\textbf{Test Data:} Some test ranges are:
11 417869f65585 updated Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 9 diff changeset	155
417869f65585 updated Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 9 diff changeset	156	\begin{itemize}
199 54befaf23648 updated Christian Urban <urbanc@in.tum.de> parents: 197 diff changeset	157	\item 1 to 10 where $9$ takes 19 steps
54befaf23648 updated Christian Urban <urbanc@in.tum.de> parents: 197 diff changeset	158	\item 1 to 100 where $97$ takes 118 steps,
54befaf23648 updated Christian Urban <urbanc@in.tum.de> parents: 197 diff changeset	159	\item 1 to 1,000 where $871$ takes 178 steps,
54befaf23648 updated Christian Urban <urbanc@in.tum.de> parents: 197 diff changeset	160	\item 1 to 10,000 where $6,171$ takes 261 steps,
54befaf23648 updated Christian Urban <urbanc@in.tum.de> parents: 197 diff changeset	161	\item 1 to 100,000 where $77,031$ takes 350 steps,
54befaf23648 updated Christian Urban <urbanc@in.tum.de> parents: 197 diff changeset	162	\item 1 to 1 Million where $837,799$ takes 524 steps
54befaf23648 updated Christian Urban <urbanc@in.tum.de> parents: 197 diff changeset	163	%% runs out of stack space
54befaf23648 updated Christian Urban <urbanc@in.tum.de> parents: 197 diff changeset	164	%% \item[$\bullet$] $1 - 10$ million where $8,400,511$ takes 685 steps
196 c50b074b3047 updated Christian Urban <urbanc@in.tum.de> parents: 195 diff changeset	165	\end{itemize}
18 87e55eb309ed updated Christian Urban <urbanc@in.tum.de> parents: 11 diff changeset	166
201 018b9c12ee1f updated Christian Urban <urbanc@in.tum.de> parents: 199 diff changeset	167
127 b4def82f3f9f updated Christian Urban <urbanc@in.tum.de> parents: 125 diff changeset	168
b4def82f3f9f updated Christian Urban <urbanc@in.tum.de> parents: 125 diff changeset	169
196 c50b074b3047 updated Christian Urban <urbanc@in.tum.de> parents: 195 diff changeset	170
279 14bf4e478534 updated Christian Urban <urbanc@in.tum.de> parents: 276 diff changeset	171	\subsection*{Core Part (7 Marks, file drumb.scala)}
199 54befaf23648 updated Christian Urban <urbanc@in.tum.de> parents: 197 diff changeset	172
54befaf23648 updated Christian Urban <urbanc@in.tum.de> parents: 197 diff changeset	173	A purely fictional character named Mr T.~Drumb inherited in 1978
54befaf23648 updated Christian Urban <urbanc@in.tum.de> parents: 197 diff changeset	174	approximately 200 Million Dollar from his father. Mr Drumb prides
54befaf23648 updated Christian Urban <urbanc@in.tum.de> parents: 197 diff changeset	175	himself to be a brilliant business man because nowadays it is
54befaf23648 updated Christian Urban <urbanc@in.tum.de> parents: 197 diff changeset	176	estimated he is 3 Billion Dollar worth (one is not sure, of course,
54befaf23648 updated Christian Urban <urbanc@in.tum.de> parents: 197 diff changeset	177	because Mr Drumb refuses to make his tax records public).
54befaf23648 updated Christian Urban <urbanc@in.tum.de> parents: 197 diff changeset	178
54befaf23648 updated Christian Urban <urbanc@in.tum.de> parents: 197 diff changeset	179	Since the question about Mr Drumb's business acumen remains open,
54befaf23648 updated Christian Urban <urbanc@in.tum.de> parents: 197 diff changeset	180	let's do a quick back-of-the-envelope calculation in Scala whether his
54befaf23648 updated Christian Urban <urbanc@in.tum.de> parents: 197 diff changeset	181	claim has any merit. Let's suppose we are given \$100 in 1978 and we
54befaf23648 updated Christian Urban <urbanc@in.tum.de> parents: 197 diff changeset	182	follow a really dumb investment strategy, namely:
54befaf23648 updated Christian Urban <urbanc@in.tum.de> parents: 197 diff changeset	183
54befaf23648 updated Christian Urban <urbanc@in.tum.de> parents: 197 diff changeset	184	\begin{itemize}
54befaf23648 updated Christian Urban <urbanc@in.tum.de> parents: 197 diff changeset	185	\item We blindly choose a portfolio of stocks, say some Blue-Chip stocks
54befaf23648 updated Christian Urban <urbanc@in.tum.de> parents: 197 diff changeset	186	or some Real Estate stocks.
54befaf23648 updated Christian Urban <urbanc@in.tum.de> parents: 197 diff changeset	187	\item If some of the stocks in our portfolio are traded in January of
54befaf23648 updated Christian Urban <urbanc@in.tum.de> parents: 197 diff changeset	188	a year, we invest our money in equal amounts in each of these
54befaf23648 updated Christian Urban <urbanc@in.tum.de> parents: 197 diff changeset	189	stocks. For example if we have \$100 and there are four stocks that
54befaf23648 updated Christian Urban <urbanc@in.tum.de> parents: 197 diff changeset	190	are traded in our portfolio, we buy \$25 worth of stocks
306 1877cc717291 update Christian Urban <urbanc@in.tum.de> parents: 284 diff changeset	191	from each. (Be careful to also test cases where you trade with 3 stocks.)
199 54befaf23648 updated Christian Urban <urbanc@in.tum.de> parents: 197 diff changeset	192	\item Next year in January, we look at how our stocks did, liquidate
54befaf23648 updated Christian Urban <urbanc@in.tum.de> parents: 197 diff changeset	193	everything, and re-invest our (hopefully) increased money in again
54befaf23648 updated Christian Urban <urbanc@in.tum.de> parents: 197 diff changeset	194	the stocks from our portfolio (there might be more stocks available,
54befaf23648 updated Christian Urban <urbanc@in.tum.de> parents: 197 diff changeset	195	if companies from our portfolio got listed in that year, or less if
54befaf23648 updated Christian Urban <urbanc@in.tum.de> parents: 197 diff changeset	196	some companies went bust or were de-listed).
276 52faee6d0be2 updated Christian Urban <urbanc@in.tum.de> parents: 266 diff changeset	197	\item We do this for 41 years until January 2019 and check what would
199 54befaf23648 updated Christian Urban <urbanc@in.tum.de> parents: 197 diff changeset	198	have become out of our \$100.
54befaf23648 updated Christian Urban <urbanc@in.tum.de> parents: 197 diff changeset	199	\end{itemize}
54befaf23648 updated Christian Urban <urbanc@in.tum.de> parents: 197 diff changeset	200
54befaf23648 updated Christian Urban <urbanc@in.tum.de> parents: 197 diff changeset	201	\noindent
266 ca48ac1d3c3e updated to 2.13 Christian Urban <urbanc@in.tum.de> parents: 252 diff changeset	202	Until Yahoo was bought by Altaba a few years ago, historical stock market
199 54befaf23648 updated Christian Urban <urbanc@in.tum.de> parents: 197 diff changeset	203	data for such back-of-the-envelope calculations was freely available
201 018b9c12ee1f updated Christian Urban <urbanc@in.tum.de> parents: 199 diff changeset	204	online. Unfortunately nowadays this kind of data is more difficult to
199 54befaf23648 updated Christian Urban <urbanc@in.tum.de> parents: 197 diff changeset	205	obtain, unless you are prepared to pay extortionate prices or be
306 1877cc717291 update Christian Urban <urbanc@in.tum.de> parents: 284 diff changeset	206	severely rate-limited. Therefore this part comes with a number
199 54befaf23648 updated Christian Urban <urbanc@in.tum.de> parents: 197 diff changeset	207	of files containing CSV-lists with the historical stock prices for the
54befaf23648 updated Christian Urban <urbanc@in.tum.de> parents: 197 diff changeset	208	companies in our portfolios. Use these files for the following
54befaf23648 updated Christian Urban <urbanc@in.tum.de> parents: 197 diff changeset	209	tasks.\bigskip
54befaf23648 updated Christian Urban <urbanc@in.tum.de> parents: 197 diff changeset	210
201 018b9c12ee1f updated Christian Urban <urbanc@in.tum.de> parents: 199 diff changeset	211	\newpage
199 54befaf23648 updated Christian Urban <urbanc@in.tum.de> parents: 197 diff changeset	212	\noindent
54befaf23648 updated Christian Urban <urbanc@in.tum.de> parents: 197 diff changeset	213	\textbf{Tasks}
54befaf23648 updated Christian Urban <urbanc@in.tum.de> parents: 197 diff changeset	214
54befaf23648 updated Christian Urban <urbanc@in.tum.de> parents: 197 diff changeset	215	\begin{itemize}
54befaf23648 updated Christian Urban <urbanc@in.tum.de> parents: 197 diff changeset	216	\item[(1)] Write a function \texttt{get\_january\_data} that takes a
54befaf23648 updated Christian Urban <urbanc@in.tum.de> parents: 197 diff changeset	217	stock symbol and a year as arguments. The function reads the
54befaf23648 updated Christian Urban <urbanc@in.tum.de> parents: 197 diff changeset	218	corresponding CSV-file and returns the list of strings that start
54befaf23648 updated Christian Urban <urbanc@in.tum.de> parents: 197 diff changeset	219	with the given year (each line in the CSV-list is of the form
54befaf23648 updated Christian Urban <urbanc@in.tum.de> parents: 197 diff changeset	220	\texttt{someyear-01-someday,someprice}).\hfill[1 Mark]
54befaf23648 updated Christian Urban <urbanc@in.tum.de> parents: 197 diff changeset	221
54befaf23648 updated Christian Urban <urbanc@in.tum.de> parents: 197 diff changeset	222	\item[(2)] Write a function \texttt{get\_first\_price} that takes
54befaf23648 updated Christian Urban <urbanc@in.tum.de> parents: 197 diff changeset	223	again a stock symbol and a year as arguments. It should return the
54befaf23648 updated Christian Urban <urbanc@in.tum.de> parents: 197 diff changeset	224	first January price for the stock symbol in the given year. For this
54befaf23648 updated Christian Urban <urbanc@in.tum.de> parents: 197 diff changeset	225	it uses the list of strings generated by
54befaf23648 updated Christian Urban <urbanc@in.tum.de> parents: 197 diff changeset	226	\texttt{get\_january\_data}. A problem is that normally a stock
54befaf23648 updated Christian Urban <urbanc@in.tum.de> parents: 197 diff changeset	227	exchange is not open on 1st of January, but depending on the day of
54befaf23648 updated Christian Urban <urbanc@in.tum.de> parents: 197 diff changeset	228	the week on a later day (maybe 3rd or 4th). The easiest way to solve
54befaf23648 updated Christian Urban <urbanc@in.tum.de> parents: 197 diff changeset	229	this problem is to obtain the whole January data for a stock symbol
54befaf23648 updated Christian Urban <urbanc@in.tum.de> parents: 197 diff changeset	230	and then select the earliest, or first, entry in this list. The
54befaf23648 updated Christian Urban <urbanc@in.tum.de> parents: 197 diff changeset	231	stock price of this entry should be converted into a double. Such a
54befaf23648 updated Christian Urban <urbanc@in.tum.de> parents: 197 diff changeset	232	price might not exist, in case the company does not exist in the given
54befaf23648 updated Christian Urban <urbanc@in.tum.de> parents: 197 diff changeset	233	year. For example, if you query for Google in January of 1980, then
54befaf23648 updated Christian Urban <urbanc@in.tum.de> parents: 197 diff changeset	234	clearly Google did not exist yet. Therefore you are asked to
54befaf23648 updated Christian Urban <urbanc@in.tum.de> parents: 197 diff changeset	235	return a trade price with type \texttt{Option[Double]}\ldots\texttt{None}
54befaf23648 updated Christian Urban <urbanc@in.tum.de> parents: 197 diff changeset	236	will be the value for when no price exists; \texttt{Some} if there is a
54befaf23648 updated Christian Urban <urbanc@in.tum.de> parents: 197 diff changeset	237	price.\hfill[1 Mark]
54befaf23648 updated Christian Urban <urbanc@in.tum.de> parents: 197 diff changeset	238
54befaf23648 updated Christian Urban <urbanc@in.tum.de> parents: 197 diff changeset	239	\item[(3)] Write a function \texttt{get\_prices} that takes a
54befaf23648 updated Christian Urban <urbanc@in.tum.de> parents: 197 diff changeset	240	portfolio (a list of stock symbols), a years range and gets all the
54befaf23648 updated Christian Urban <urbanc@in.tum.de> parents: 197 diff changeset	241	first trading prices for each year in the range. You should organise
54befaf23648 updated Christian Urban <urbanc@in.tum.de> parents: 197 diff changeset	242	this as a list of lists of \texttt{Option[Double]}'s. The inner
54befaf23648 updated Christian Urban <urbanc@in.tum.de> parents: 197 diff changeset	243	lists are for all stock symbols from the portfolio and the outer
54befaf23648 updated Christian Urban <urbanc@in.tum.de> parents: 197 diff changeset	244	list for the years. For example for Google and Apple in years 2010
54befaf23648 updated Christian Urban <urbanc@in.tum.de> parents: 197 diff changeset	245	(first line), 2011 (second line) and 2012 (third line) you obtain:
54befaf23648 updated Christian Urban <urbanc@in.tum.de> parents: 197 diff changeset	246
54befaf23648 updated Christian Urban <urbanc@in.tum.de> parents: 197 diff changeset	247	\begin{verbatim}
282 ec9773fe1dc0 updated Christian Urban <urbanc@in.tum.de> parents: 279 diff changeset	248	List(List(Some(311.349976), Some(20.544939)),
ec9773fe1dc0 updated Christian Urban <urbanc@in.tum.de> parents: 279 diff changeset	249	List(Some(300.222351), Some(31.638695)),
ec9773fe1dc0 updated Christian Urban <urbanc@in.tum.de> parents: 279 diff changeset	250	List(Some(330.555054), Some(39.478039))))
201 018b9c12ee1f updated Christian Urban <urbanc@in.tum.de> parents: 199 diff changeset	251	\end{verbatim}\hfill[1 Mark]
266 ca48ac1d3c3e updated to 2.13 Christian Urban <urbanc@in.tum.de> parents: 252 diff changeset	252
ca48ac1d3c3e updated to 2.13 Christian Urban <urbanc@in.tum.de> parents: 252 diff changeset	253
ca48ac1d3c3e updated to 2.13 Christian Urban <urbanc@in.tum.de> parents: 252 diff changeset	254	%\end{itemize}
199 54befaf23648 updated Christian Urban <urbanc@in.tum.de> parents: 197 diff changeset	255
266 ca48ac1d3c3e updated to 2.13 Christian Urban <urbanc@in.tum.de> parents: 252 diff changeset	256	%\subsection*{Advanced Part 3 (4 Marks, continue in file drumb.scala)}
ca48ac1d3c3e updated to 2.13 Christian Urban <urbanc@in.tum.de> parents: 252 diff changeset	257	%
ca48ac1d3c3e updated to 2.13 Christian Urban <urbanc@in.tum.de> parents: 252 diff changeset	258	%\noindent
ca48ac1d3c3e updated to 2.13 Christian Urban <urbanc@in.tum.de> parents: 252 diff changeset	259	%\textbf{Tasks}
199 54befaf23648 updated Christian Urban <urbanc@in.tum.de> parents: 197 diff changeset	260
266 ca48ac1d3c3e updated to 2.13 Christian Urban <urbanc@in.tum.de> parents: 252 diff changeset	261	%\begin{itemize}
199 54befaf23648 updated Christian Urban <urbanc@in.tum.de> parents: 197 diff changeset	262
54befaf23648 updated Christian Urban <urbanc@in.tum.de> parents: 197 diff changeset	263	\item[(4)] Write a function that calculates the \emph{change factor} (delta)
54befaf23648 updated Christian Urban <urbanc@in.tum.de> parents: 197 diff changeset	264	for how a stock price has changed from one year to the next. This is
54befaf23648 updated Christian Urban <urbanc@in.tum.de> parents: 197 diff changeset	265	only well-defined, if the corresponding company has been traded in both
54befaf23648 updated Christian Urban <urbanc@in.tum.de> parents: 197 diff changeset	266	years. In this case you can calculate
54befaf23648 updated Christian Urban <urbanc@in.tum.de> parents: 197 diff changeset	267
54befaf23648 updated Christian Urban <urbanc@in.tum.de> parents: 197 diff changeset	268	\[
54befaf23648 updated Christian Urban <urbanc@in.tum.de> parents: 197 diff changeset	269	\frac{price_{new} - price_{old}}{price_{old}}
54befaf23648 updated Christian Urban <urbanc@in.tum.de> parents: 197 diff changeset	270	\]
54befaf23648 updated Christian Urban <urbanc@in.tum.de> parents: 197 diff changeset	271
54befaf23648 updated Christian Urban <urbanc@in.tum.de> parents: 197 diff changeset	272	If the change factor is defined, you should return it
54befaf23648 updated Christian Urban <urbanc@in.tum.de> parents: 197 diff changeset	273	as \texttt{Some(change\_factor)}; if not, you should return
54befaf23648 updated Christian Urban <urbanc@in.tum.de> parents: 197 diff changeset	274	\texttt{None}.\mbox{}\hfill\mbox{[1 Mark]}
54befaf23648 updated Christian Urban <urbanc@in.tum.de> parents: 197 diff changeset	275
54befaf23648 updated Christian Urban <urbanc@in.tum.de> parents: 197 diff changeset	276	\item[(5)] Write a function that calculates all change factors
266 ca48ac1d3c3e updated to 2.13 Christian Urban <urbanc@in.tum.de> parents: 252 diff changeset	277	(deltas) for the prices we obtained in Task (2). For the running
199 54befaf23648 updated Christian Urban <urbanc@in.tum.de> parents: 197 diff changeset	278	example of Google and Apple for the years 2010 to 2012 you should
54befaf23648 updated Christian Urban <urbanc@in.tum.de> parents: 197 diff changeset	279	obtain 4 change factors:
54befaf23648 updated Christian Urban <urbanc@in.tum.de> parents: 197 diff changeset	280
266 ca48ac1d3c3e updated to 2.13 Christian Urban <urbanc@in.tum.de> parents: 252 diff changeset	281	\begin{verbatim}
ca48ac1d3c3e updated to 2.13 Christian Urban <urbanc@in.tum.de> parents: 252 diff changeset	282	List(List(Some(-0.03573991804411003), Some(0.539974575389325)),
ca48ac1d3c3e updated to 2.13 Christian Urban <urbanc@in.tum.de> parents: 252 diff changeset	283	List(Some(0.10103414222249969), Some(0.24777764141006836)))
199 54befaf23648 updated Christian Urban <urbanc@in.tum.de> parents: 197 diff changeset	284	\end{verbatim}
54befaf23648 updated Christian Urban <urbanc@in.tum.de> parents: 197 diff changeset	285
54befaf23648 updated Christian Urban <urbanc@in.tum.de> parents: 197 diff changeset	286	That means Google did a bit badly in 2010, while Apple did very well.
54befaf23648 updated Christian Urban <urbanc@in.tum.de> parents: 197 diff changeset	287	Both did OK in 2011. Make sure you handle the cases where a company is
54befaf23648 updated Christian Urban <urbanc@in.tum.de> parents: 197 diff changeset	288	not listed in a year. In such cases the change factor should be \texttt{None}
266 ca48ac1d3c3e updated to 2.13 Christian Urban <urbanc@in.tum.de> parents: 252 diff changeset	289	(recall Task~(4)).
199 54befaf23648 updated Christian Urban <urbanc@in.tum.de> parents: 197 diff changeset	290	\mbox{}\hfill\mbox{[1 Mark]}
54befaf23648 updated Christian Urban <urbanc@in.tum.de> parents: 197 diff changeset	291
54befaf23648 updated Christian Urban <urbanc@in.tum.de> parents: 197 diff changeset	292	\item[(6)] Write a function that calculates the ``yield'', or
54befaf23648 updated Christian Urban <urbanc@in.tum.de> parents: 197 diff changeset	293	balance, for one year for our portfolio. This function takes the
54befaf23648 updated Christian Urban <urbanc@in.tum.de> parents: 197 diff changeset	294	change factors, the starting balance and the year as arguments. If
54befaf23648 updated Christian Urban <urbanc@in.tum.de> parents: 197 diff changeset	295	no company from our portfolio existed in that year, the balance is
54befaf23648 updated Christian Urban <urbanc@in.tum.de> parents: 197 diff changeset	296	unchanged. Otherwise we invest in each existing company an equal
54befaf23648 updated Christian Urban <urbanc@in.tum.de> parents: 197 diff changeset	297	amount of our balance. Using the change factors computed under Task
266 ca48ac1d3c3e updated to 2.13 Christian Urban <urbanc@in.tum.de> parents: 252 diff changeset	298	(2), calculate the new balance. Say we had \$100 in 2010, we would have
199 54befaf23648 updated Christian Urban <urbanc@in.tum.de> parents: 197 diff changeset	299	received in our running example involving Google and Apple:
54befaf23648 updated Christian Urban <urbanc@in.tum.de> parents: 197 diff changeset	300
54befaf23648 updated Christian Urban <urbanc@in.tum.de> parents: 197 diff changeset	301	\begin{verbatim}
266 ca48ac1d3c3e updated to 2.13 Christian Urban <urbanc@in.tum.de> parents: 252 diff changeset	302	$50 * -0.03573991804411003 + $50 * 0.539974575389325
ca48ac1d3c3e updated to 2.13 Christian Urban <urbanc@in.tum.de> parents: 252 diff changeset	303	= $25.21173286726075
199 54befaf23648 updated Christian Urban <urbanc@in.tum.de> parents: 197 diff changeset	304	\end{verbatim}
54befaf23648 updated Christian Urban <urbanc@in.tum.de> parents: 197 diff changeset	305
54befaf23648 updated Christian Urban <urbanc@in.tum.de> parents: 197 diff changeset	306	as profit for that year, and our new balance for 2011 is \$125 when
54befaf23648 updated Christian Urban <urbanc@in.tum.de> parents: 197 diff changeset	307	converted to a \texttt{Long}.\mbox{}\hfill\mbox{[1 Mark]}
54befaf23648 updated Christian Urban <urbanc@in.tum.de> parents: 197 diff changeset	308
54befaf23648 updated Christian Urban <urbanc@in.tum.de> parents: 197 diff changeset	309	\item[(7)] Write a function that calculates the overall balance
54befaf23648 updated Christian Urban <urbanc@in.tum.de> parents: 197 diff changeset	310	for a range of years where each year the yearly profit is compounded to
54befaf23648 updated Christian Urban <urbanc@in.tum.de> parents: 197 diff changeset	311	the new balances and then re-invested into our portfolio.
54befaf23648 updated Christian Urban <urbanc@in.tum.de> parents: 197 diff changeset	312	For this use the function and results generated under (6).\\
54befaf23648 updated Christian Urban <urbanc@in.tum.de> parents: 197 diff changeset	313	\mbox{}\hfill\mbox{[1 Mark]}
54befaf23648 updated Christian Urban <urbanc@in.tum.de> parents: 197 diff changeset	314	\end{itemize}\medskip
54befaf23648 updated Christian Urban <urbanc@in.tum.de> parents: 197 diff changeset	315
54befaf23648 updated Christian Urban <urbanc@in.tum.de> parents: 197 diff changeset	316
54befaf23648 updated Christian Urban <urbanc@in.tum.de> parents: 197 diff changeset	317
54befaf23648 updated Christian Urban <urbanc@in.tum.de> parents: 197 diff changeset	318	\noindent
54befaf23648 updated Christian Urban <urbanc@in.tum.de> parents: 197 diff changeset	319	\textbf{Test Data:} File \texttt{drumb.scala} contains two portfolios
54befaf23648 updated Christian Urban <urbanc@in.tum.de> parents: 197 diff changeset	320	collected from the S\&P 500, one for blue-chip companies, including
54befaf23648 updated Christian Urban <urbanc@in.tum.de> parents: 197 diff changeset	321	Facebook, Amazon and Baidu; and another for listed real-estate
54befaf23648 updated Christian Urban <urbanc@in.tum.de> parents: 197 diff changeset	322	companies, whose names I have never heard of. Following the dumb
266 ca48ac1d3c3e updated to 2.13 Christian Urban <urbanc@in.tum.de> parents: 252 diff changeset	323	investment strategy from 1978 until 2019 would have turned a starting
ca48ac1d3c3e updated to 2.13 Christian Urban <urbanc@in.tum.de> parents: 252 diff changeset	324	balance of \$100 into roughly \$39,162 for real estate and a whopping
ca48ac1d3c3e updated to 2.13 Christian Urban <urbanc@in.tum.de> parents: 252 diff changeset	325	\$462,199 for blue chips. Note when comparing these results with your
199 54befaf23648 updated Christian Urban <urbanc@in.tum.de> parents: 197 diff changeset	326	own calculations: there might be some small rounding errors, which
54befaf23648 updated Christian Urban <urbanc@in.tum.de> parents: 197 diff changeset	327	when compounded lead to moderately different values.\bigskip
54befaf23648 updated Christian Urban <urbanc@in.tum.de> parents: 197 diff changeset	328
54befaf23648 updated Christian Urban <urbanc@in.tum.de> parents: 197 diff changeset	329
54befaf23648 updated Christian Urban <urbanc@in.tum.de> parents: 197 diff changeset	330	\noindent
54befaf23648 updated Christian Urban <urbanc@in.tum.de> parents: 197 diff changeset	331	\textbf{Moral:} Reflecting on our assumptions, we are over-estimating
54befaf23648 updated Christian Urban <urbanc@in.tum.de> parents: 197 diff changeset	332	our yield in many ways: first, who can know in 1978 about what will
54befaf23648 updated Christian Urban <urbanc@in.tum.de> parents: 197 diff changeset	333	turn out to be a blue chip company. Also, since the portfolios are
54befaf23648 updated Christian Urban <urbanc@in.tum.de> parents: 197 diff changeset	334	chosen from the current S\&P 500, they do not include the myriad
54befaf23648 updated Christian Urban <urbanc@in.tum.de> parents: 197 diff changeset	335	of companies that went bust or were de-listed over the years.
54befaf23648 updated Christian Urban <urbanc@in.tum.de> parents: 197 diff changeset	336	So where does this leave our fictional character Mr T.~Drumb? Well, given
54befaf23648 updated Christian Urban <urbanc@in.tum.de> parents: 197 diff changeset	337	his inheritance, a really dumb investment strategy would have done
54befaf23648 updated Christian Urban <urbanc@in.tum.de> parents: 197 diff changeset	338	equally well, if not much better.\medskip
54befaf23648 updated Christian Urban <urbanc@in.tum.de> parents: 197 diff changeset	339
54befaf23648 updated Christian Urban <urbanc@in.tum.de> parents: 197 diff changeset	340	\end{document}
54befaf23648 updated Christian Urban <urbanc@in.tum.de> parents: 197 diff changeset	341
276 52faee6d0be2 updated Christian Urban <urbanc@in.tum.de> parents: 266 diff changeset	342
52faee6d0be2 updated Christian Urban <urbanc@in.tum.de> parents: 266 diff changeset	343	%%%%%%% Historical Stuff
199 54befaf23648 updated Christian Urban <urbanc@in.tum.de> parents: 197 diff changeset	344	\newpage
192 a112e0e2325c updated Christian Urban <urbanc@in.tum.de> parents: 167 diff changeset	345
196 c50b074b3047 updated Christian Urban <urbanc@in.tum.de> parents: 195 diff changeset	346	This part is about web-scraping and list-processing in Scala. It uses
c50b074b3047 updated Christian Urban <urbanc@in.tum.de> parents: 195 diff changeset	347	online data about the per-capita alcohol consumption for each country
c50b074b3047 updated Christian Urban <urbanc@in.tum.de> parents: 195 diff changeset	348	(per year?), and a file containing the data about the population size of
c50b074b3047 updated Christian Urban <urbanc@in.tum.de> parents: 195 diff changeset	349	each country. From this data you are supposed to estimate how many
c50b074b3047 updated Christian Urban <urbanc@in.tum.de> parents: 195 diff changeset	350	litres of pure alcohol are consumed worldwide.\bigskip
192 a112e0e2325c updated Christian Urban <urbanc@in.tum.de> parents: 167 diff changeset	351
a112e0e2325c updated Christian Urban <urbanc@in.tum.de> parents: 167 diff changeset	352	\noindent
196 c50b074b3047 updated Christian Urban <urbanc@in.tum.de> parents: 195 diff changeset	353	\textbf{Tasks (file alcohol.scala):}
192 a112e0e2325c updated Christian Urban <urbanc@in.tum.de> parents: 167 diff changeset	354
a112e0e2325c updated Christian Urban <urbanc@in.tum.de> parents: 167 diff changeset	355	\begin{itemize}
196 c50b074b3047 updated Christian Urban <urbanc@in.tum.de> parents: 195 diff changeset	356	\item[(1)] Write a function that given an URL requests a
c50b074b3047 updated Christian Urban <urbanc@in.tum.de> parents: 195 diff changeset	357	comma-separated value (CSV) list. We are interested in the list
c50b074b3047 updated Christian Urban <urbanc@in.tum.de> parents: 195 diff changeset	358	from the following URL
192 a112e0e2325c updated Christian Urban <urbanc@in.tum.de> parents: 167 diff changeset	359
a112e0e2325c updated Christian Urban <urbanc@in.tum.de> parents: 167 diff changeset	360	\begin{center}
196 c50b074b3047 updated Christian Urban <urbanc@in.tum.de> parents: 195 diff changeset	361	\url{https://raw.githubusercontent.com/fivethirtyeight/data/master/alcohol-consumption/drinks.csv}
192 a112e0e2325c updated Christian Urban <urbanc@in.tum.de> parents: 167 diff changeset	362	\end{center}
127 b4def82f3f9f updated Christian Urban <urbanc@in.tum.de> parents: 125 diff changeset	363
196 c50b074b3047 updated Christian Urban <urbanc@in.tum.de> parents: 195 diff changeset	364	\noindent Your function should take a string (the URL) as input, and
c50b074b3047 updated Christian Urban <urbanc@in.tum.de> parents: 195 diff changeset	365	produce a list of strings as output, where each string is one line in
c50b074b3047 updated Christian Urban <urbanc@in.tum.de> parents: 195 diff changeset	366	the corresponding CSV-list. This list from the URL above should
c50b074b3047 updated Christian Urban <urbanc@in.tum.de> parents: 195 diff changeset	367	contain 194 lines.\medskip
192 a112e0e2325c updated Christian Urban <urbanc@in.tum.de> parents: 167 diff changeset	368
a112e0e2325c updated Christian Urban <urbanc@in.tum.de> parents: 167 diff changeset	369	\noindent
196 c50b074b3047 updated Christian Urban <urbanc@in.tum.de> parents: 195 diff changeset	370	Write another function that can read the file \texttt{population.csv}
201 018b9c12ee1f updated Christian Urban <urbanc@in.tum.de> parents: 199 diff changeset	371	from disk (the file is distributed with the assignment). This
196 c50b074b3047 updated Christian Urban <urbanc@in.tum.de> parents: 195 diff changeset	372	function should take a string as argument, the file name, and again
c50b074b3047 updated Christian Urban <urbanc@in.tum.de> parents: 195 diff changeset	373	return a list of strings corresponding to each entry in the
c50b074b3047 updated Christian Urban <urbanc@in.tum.de> parents: 195 diff changeset	374	CSV-list. For \texttt{population.csv}, this list should contain 216
c50b074b3047 updated Christian Urban <urbanc@in.tum.de> parents: 195 diff changeset	375	lines.\hfill[1 Mark]
c50b074b3047 updated Christian Urban <urbanc@in.tum.de> parents: 195 diff changeset	376
c50b074b3047 updated Christian Urban <urbanc@in.tum.de> parents: 195 diff changeset	377
c50b074b3047 updated Christian Urban <urbanc@in.tum.de> parents: 195 diff changeset	378	\item[(2)] Unfortunately, the CSV-lists contain a lot of ``junk'' and we
c50b074b3047 updated Christian Urban <urbanc@in.tum.de> parents: 195 diff changeset	379	need to extract the data that interests us. From the header of the
c50b074b3047 updated Christian Urban <urbanc@in.tum.de> parents: 195 diff changeset	380	alcohol list, you can see there are 5 columns
c50b074b3047 updated Christian Urban <urbanc@in.tum.de> parents: 195 diff changeset	381
c50b074b3047 updated Christian Urban <urbanc@in.tum.de> parents: 195 diff changeset	382	\begin{center}
c50b074b3047 updated Christian Urban <urbanc@in.tum.de> parents: 195 diff changeset	383	\begin{tabular}{l}
c50b074b3047 updated Christian Urban <urbanc@in.tum.de> parents: 195 diff changeset	384	\texttt{country (name),}\\
c50b074b3047 updated Christian Urban <urbanc@in.tum.de> parents: 195 diff changeset	385	\texttt{beer\_servings,}\\
c50b074b3047 updated Christian Urban <urbanc@in.tum.de> parents: 195 diff changeset	386	\texttt{spirit\_servings,}\\
c50b074b3047 updated Christian Urban <urbanc@in.tum.de> parents: 195 diff changeset	387	\texttt{wine\_servings,}\\
c50b074b3047 updated Christian Urban <urbanc@in.tum.de> parents: 195 diff changeset	388	\texttt{total\_litres\_of\_pure\_alcohol}
c50b074b3047 updated Christian Urban <urbanc@in.tum.de> parents: 195 diff changeset	389	\end{tabular}
c50b074b3047 updated Christian Urban <urbanc@in.tum.de> parents: 195 diff changeset	390	\end{center}
c50b074b3047 updated Christian Urban <urbanc@in.tum.de> parents: 195 diff changeset	391
c50b074b3047 updated Christian Urban <urbanc@in.tum.de> parents: 195 diff changeset	392	\noindent
c50b074b3047 updated Christian Urban <urbanc@in.tum.de> parents: 195 diff changeset	393	Write a function that extracts the data from the first column,
c50b074b3047 updated Christian Urban <urbanc@in.tum.de> parents: 195 diff changeset	394	the country name, and the data from the fifth column (converted into
c50b074b3047 updated Christian Urban <urbanc@in.tum.de> parents: 195 diff changeset	395	a \texttt{Double}). For this go through each line of the CSV-list
c50b074b3047 updated Christian Urban <urbanc@in.tum.de> parents: 195 diff changeset	396	(except the first line), use the \texttt{split(",")} function to
c50b074b3047 updated Christian Urban <urbanc@in.tum.de> parents: 195 diff changeset	397	divide each line into an array of 5 elements. Keep the data from the
c50b074b3047 updated Christian Urban <urbanc@in.tum.de> parents: 195 diff changeset	398	first and fifth element in these arrays.\medskip
192 a112e0e2325c updated Christian Urban <urbanc@in.tum.de> parents: 167 diff changeset	399
196 c50b074b3047 updated Christian Urban <urbanc@in.tum.de> parents: 195 diff changeset	400	\noindent
c50b074b3047 updated Christian Urban <urbanc@in.tum.de> parents: 195 diff changeset	401	Write another function that processes the population size list. This
c50b074b3047 updated Christian Urban <urbanc@in.tum.de> parents: 195 diff changeset	402	is already of the form country name and population size.\footnote{Your
c50b074b3047 updated Christian Urban <urbanc@in.tum.de> parents: 195 diff changeset	403	friendly lecturer already did the messy processing for you from the
c50b074b3047 updated Christian Urban <urbanc@in.tum.de> parents: 195 diff changeset	404	Worldbank database, see \url{https://github.com/datasets/population/tree/master/data} for the original.} Again, split the
c50b074b3047 updated Christian Urban <urbanc@in.tum.de> parents: 195 diff changeset	405	strings according to the commas. However, this time generate a
c50b074b3047 updated Christian Urban <urbanc@in.tum.de> parents: 195 diff changeset	406	\texttt{Map} from country names to population sizes.\hfill[1 Mark]
c50b074b3047 updated Christian Urban <urbanc@in.tum.de> parents: 195 diff changeset	407
c50b074b3047 updated Christian Urban <urbanc@in.tum.de> parents: 195 diff changeset	408	\item[(3)] In (2) you generated the data about the alcohol consumption
c50b074b3047 updated Christian Urban <urbanc@in.tum.de> parents: 195 diff changeset	409	per capita for each country, and also the population size for each
c50b074b3047 updated Christian Urban <urbanc@in.tum.de> parents: 195 diff changeset	410	country. From this generate next a sorted(!) list of the overall
c50b074b3047 updated Christian Urban <urbanc@in.tum.de> parents: 195 diff changeset	411	alcohol consumption for each country. The list should be sorted from
c50b074b3047 updated Christian Urban <urbanc@in.tum.de> parents: 195 diff changeset	412	highest alcohol consumption to lowest. The difficulty is that the
c50b074b3047 updated Christian Urban <urbanc@in.tum.de> parents: 195 diff changeset	413	data is scraped off from ``random'' sources on the Internet and
c50b074b3047 updated Christian Urban <urbanc@in.tum.de> parents: 195 diff changeset	414	annoyingly the spelling of some country names does not always agree in both
c50b074b3047 updated Christian Urban <urbanc@in.tum.de> parents: 195 diff changeset	415	lists. For example the alcohol list contains
c50b074b3047 updated Christian Urban <urbanc@in.tum.de> parents: 195 diff changeset	416	\texttt{Bosnia-Herzegovina}, while the population writes this country as
c50b074b3047 updated Christian Urban <urbanc@in.tum.de> parents: 195 diff changeset	417	\texttt{Bosnia and Herzegovina}. In your sorted
c50b074b3047 updated Christian Urban <urbanc@in.tum.de> parents: 195 diff changeset	418	overall list include only countries from the alcohol list, whose
c50b074b3047 updated Christian Urban <urbanc@in.tum.de> parents: 195 diff changeset	419	exact country name is also in the population size list. This means
c50b074b3047 updated Christian Urban <urbanc@in.tum.de> parents: 195 diff changeset	420	you can ignore countries like Bosnia-Herzegovina from the overall
c50b074b3047 updated Christian Urban <urbanc@in.tum.de> parents: 195 diff changeset	421	alcohol consumption. There are 177 countries where the names
c50b074b3047 updated Christian Urban <urbanc@in.tum.de> parents: 195 diff changeset	422	agree. The UK is ranked 10th on this list by
c50b074b3047 updated Christian Urban <urbanc@in.tum.de> parents: 195 diff changeset	423	consuming 671,976,864 Litres of pure alcohol each year.\medskip
c50b074b3047 updated Christian Urban <urbanc@in.tum.de> parents: 195 diff changeset	424
c50b074b3047 updated Christian Urban <urbanc@in.tum.de> parents: 195 diff changeset	425	\noindent
c50b074b3047 updated Christian Urban <urbanc@in.tum.de> parents: 195 diff changeset	426	Finally, write another function that takes an integer, say
c50b074b3047 updated Christian Urban <urbanc@in.tum.de> parents: 195 diff changeset	427	\texttt{n}, as argument. You can assume this integer is between 0
c50b074b3047 updated Christian Urban <urbanc@in.tum.de> parents: 195 diff changeset	428	and 177 (the number of countries in the sorted list above). The
c50b074b3047 updated Christian Urban <urbanc@in.tum.de> parents: 195 diff changeset	429	function should return a triple, where the first component is the
c50b074b3047 updated Christian Urban <urbanc@in.tum.de> parents: 195 diff changeset	430	sum of the alcohol consumption in all countries (on the list); the
c50b074b3047 updated Christian Urban <urbanc@in.tum.de> parents: 195 diff changeset	431	second component is the sum of the \texttt{n}-highest alcohol
c50b074b3047 updated Christian Urban <urbanc@in.tum.de> parents: 195 diff changeset	432	consumers on the list; and the third component is the percentage the
c50b074b3047 updated Christian Urban <urbanc@in.tum.de> parents: 195 diff changeset	433	\texttt{n}-highest alcohol consumers drink with respect to the
c50b074b3047 updated Christian Urban <urbanc@in.tum.de> parents: 195 diff changeset	434	the world consumption. You will see that according to our data, 164
c50b074b3047 updated Christian Urban <urbanc@in.tum.de> parents: 195 diff changeset	435	countries (out of 177) gobble up 100\% of the World alcohol
c50b074b3047 updated Christian Urban <urbanc@in.tum.de> parents: 195 diff changeset	436	consumption.\hfill\mbox{[1 Mark]}
18 87e55eb309ed updated Christian Urban <urbanc@in.tum.de> parents: 11 diff changeset	437	\end{itemize}
11 417869f65585 updated Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 9 diff changeset	438
417869f65585 updated Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 9 diff changeset	439	\noindent
196 c50b074b3047 updated Christian Urban <urbanc@in.tum.de> parents: 195 diff changeset	440	\textbf{Hints:} useful list functions: \texttt{.drop(n)},
c50b074b3047 updated Christian Urban <urbanc@in.tum.de> parents: 195 diff changeset	441	\texttt{.take(n)} for dropping or taking some elements in a list,
c50b074b3047 updated Christian Urban <urbanc@in.tum.de> parents: 195 diff changeset	442	\texttt{.getLines} for separating lines in a string;
c50b074b3047 updated Christian Urban <urbanc@in.tum.de> parents: 195 diff changeset	443	\texttt{.sortBy(\_.\_2)} sorts a list of pairs according to the second
c50b074b3047 updated Christian Urban <urbanc@in.tum.de> parents: 195 diff changeset	444	elements in the pairs---the sorting is done from smallest to highest;
c50b074b3047 updated Christian Urban <urbanc@in.tum.de> parents: 195 diff changeset	445	useful \texttt{Map} functions: \texttt{.toMap} converts a list of
c50b074b3047 updated Christian Urban <urbanc@in.tum.de> parents: 195 diff changeset	446	pairs into a \texttt{Map}, \texttt{.isDefinedAt(k)} tests whether the
c50b074b3047 updated Christian Urban <urbanc@in.tum.de> parents: 195 diff changeset	447	map is defined at that key, that is would produce a result when
c50b074b3047 updated Christian Urban <urbanc@in.tum.de> parents: 195 diff changeset	448	called with this key; useful data functions: \texttt{Source.fromURL},
c50b074b3047 updated Christian Urban <urbanc@in.tum.de> parents: 195 diff changeset	449	\texttt{Source.fromFile} for obtaining a webpage and reading a file.
127 b4def82f3f9f updated Christian Urban <urbanc@in.tum.de> parents: 125 diff changeset	450
196 c50b074b3047 updated Christian Urban <urbanc@in.tum.de> parents: 195 diff changeset	451	\newpage
c50b074b3047 updated Christian Urban <urbanc@in.tum.de> parents: 195 diff changeset	452
c50b074b3047 updated Christian Urban <urbanc@in.tum.de> parents: 195 diff changeset	453
18 87e55eb309ed updated Christian Urban <urbanc@in.tum.de> parents: 11 diff changeset	454
129 b1a51285de7e updated Christian Urban <urbanc@in.tum.de> parents: 127 diff changeset	455
135 077e63e96287 updated Christian Urban <urbanc@in.tum.de> parents: 129 diff changeset	456
6 aae256985251 updated Christian Urban <urbanc@in.tum.de> parents: diff changeset	457
aae256985251 updated Christian Urban <urbanc@in.tum.de> parents: diff changeset	458	%%% Local Variables:
aae256985251 updated Christian Urban <urbanc@in.tum.de> parents: diff changeset	459	%%% mode: latex
aae256985251 updated Christian Urban <urbanc@in.tum.de> parents: diff changeset	460	%%% TeX-master: t
aae256985251 updated Christian Urban <urbanc@in.tum.de> parents: diff changeset	461	%%% End:

author	Christian Urban <urbanc@in.tum.de>
	Sat, 02 Nov 2019 19:07:19 +0000 (2019-11-02)
changeset 306	1877cc717291
parent 284	9a04eb6a2291
child 314	21b52310bd8b
permissions	-rw-r--r--