276
|
1 |
% !TEX program = xelatex
|
6
|
2 |
\documentclass{article}
|
11
Christian Urban <christian dot urban at kcl dot ac dot uk>
diff
changeset
|
3 |
\usepackage{../style}
|
195
|
4 |
\usepackage{disclaimer}
|
199
|
5 |
\usepackage{../langs}
|
6
|
6 |
|
|
7 |
\begin{document}
|
|
8 |
|
306
|
9 |
\section*{Part 6 (Scala)}
|
199
|
10 |
|
|
11 |
\mbox{}\hfill\textit{``The most effective debugging tool is still careful thought,}\\
|
|
12 |
\mbox{}\hfill\textit{coupled with judiciously placed print statements.''}\smallskip\\
|
276
|
13 |
\mbox{}\hfill\textit{ --- Brian W. Kernighan, in Unix for Beginners (1979)}\medskip\bigskip
|
199
|
14 |
|
279
|
15 |
|
199
|
16 |
\noindent
|
306
|
17 |
This part is about Scala. You are asked to implement two programs
|
284
|
18 |
about list processing and recursion. The preliminary part (3\%) is due
|
|
19 |
on \cwSIX{} at 4pm, and the core part on \cwSIXa{} at 4pm. The core
|
|
20 |
part is more advanced and might include material you have not yet seen
|
|
21 |
in the first lecture.\bigskip
|
199
|
22 |
|
195
|
23 |
\IMPORTANT{}
|
|
24 |
|
127
|
25 |
\noindent
|
195
|
26 |
Also note that the running time of each part will be restricted to a
|
199
|
27 |
maximum of 30 seconds on my laptop.
|
192
|
28 |
|
196
|
29 |
\DISCLAIMER{}
|
6
|
30 |
|
201
|
31 |
\subsection*{Reference Implementation}
|
6
|
32 |
|
199
|
33 |
Like the C++ assignments, the Scala assignments will work like this: you
|
|
34 |
push your files to GitHub and receive (after sometimes a long delay) some
|
|
35 |
automated feedback. In the end we take a snapshot of the submitted files and
|
266
|
36 |
apply an automated marking script to them.\medskip
|
199
|
37 |
|
266
|
38 |
\noindent
|
306
|
39 |
In addition, the Scala coursework comes with a reference implementation
|
|
40 |
in form of \texttt{jar}-files. This allows you to run any test cases
|
199
|
41 |
on your own computer. For example you can call Scala on the command
|
|
42 |
line with the option \texttt{-cp collatz.jar} and then query any
|
|
43 |
function from the template file. Say you want to find out what
|
|
44 |
the functions \texttt{collatz} and \texttt{collatz\_max}
|
|
45 |
produce: for this you just need to prefix them with the object name
|
|
46 |
\texttt{CW6a} (and \texttt{CW6b} respectively for \texttt{drumb.jar}).
|
|
47 |
If you want to find out what these functions produce for the argument
|
|
48 |
\texttt{6}, you would type something like:
|
|
49 |
|
|
50 |
\begin{lstlisting}[language={},numbers=none,basicstyle=\ttfamily\small]
|
|
51 |
$ scala -cp collatz.jar
|
|
52 |
|
|
53 |
scala> CW6a.collatz(6)
|
|
54 |
...
|
|
55 |
scala> CW6a.collatz_max(6)
|
|
56 |
...
|
|
57 |
\end{lstlisting}%$
|
|
58 |
|
201
|
59 |
\subsection*{Hints}
|
|
60 |
|
|
61 |
\noindent
|
306
|
62 |
\textbf{For Preliminary Part:} useful math operators: \texttt{\%} for modulo; useful
|
201
|
63 |
functions: \mbox{\texttt{(1\,to\,10)}} for ranges, \texttt{.toInt},
|
|
64 |
\texttt{.toList} for conversions, \texttt{List(...).max} for the
|
|
65 |
maximum of a list, \texttt{List(...).indexOf(...)} for the first index of
|
|
66 |
a value in a list.\bigskip
|
|
67 |
|
|
68 |
\noindent
|
306
|
69 |
\textbf{For Core Part:} useful string functions:
|
266
|
70 |
\texttt{.startsWith(...)} for checking whether a string has a given
|
|
71 |
prefix, \texttt{\_ ++ \_} for concatenating two strings; useful option
|
|
72 |
functions: \texttt{.flatten} flattens a list of options such that it
|
|
73 |
filters way all \texttt{None}'s, \texttt{Try(...).getOrElse ...} runs
|
|
74 |
some code that might raise an exception---if yes, then a default value
|
|
75 |
can be given; useful list functions: \texttt{.head} for obtaining the
|
|
76 |
first element in a non-empty list, \texttt{.length} for the length of
|
|
77 |
a list; \texttt{.filter(...)} for filtering out elements in a list;
|
|
78 |
\texttt{.getLines.toList} for obtaining a list of lines from a file;
|
|
79 |
\texttt{.split(",").toList} for splitting strings according to a
|
|
80 |
comma.\bigskip
|
201
|
81 |
|
|
82 |
\noindent
|
266
|
83 |
\textbf{Note!} Fortunately Scala supports operator overloading. But
|
|
84 |
make sure you understand the difference between \texttt{100 / 3} and
|
201
|
85 |
\texttt{100.0 / 3}!
|
|
86 |
|
|
87 |
\newpage
|
279
|
88 |
\subsection*{Preliminary Part (3 Marks, file collatz.scala)}
|
6
|
89 |
|
11
Christian Urban <christian dot urban at kcl dot ac dot uk>
diff
changeset
|
90 |
This part is about recursion. You are asked to implement a Scala
|
18
|
91 |
program that tests examples of the \emph{$3n + 1$-conjecture}, also
|
|
92 |
called \emph{Collatz conjecture}. This conjecture can be described as
|
|
93 |
follows: Start with any positive number $n$ greater than $0$:
|
|
94 |
|
|
95 |
\begin{itemize}
|
|
96 |
\item If $n$ is even, divide it by $2$ to obtain $n / 2$.
|
|
97 |
\item If $n$ is odd, multiply it by $3$ and add $1$ to obtain $3n +
|
|
98 |
1$.
|
|
99 |
\item Repeat this process and you will always end up with $1$.
|
|
100 |
\end{itemize}
|
|
101 |
|
|
102 |
\noindent
|
266
|
103 |
For example if you start with, say, $6$ and $9$, you obtain the
|
|
104 |
two series
|
11
Christian Urban <christian dot urban at kcl dot ac dot uk>
diff
changeset
|
105 |
|
Christian Urban <christian dot urban at kcl dot ac dot uk>
diff
changeset
|
106 |
\[
|
Christian Urban <christian dot urban at kcl dot ac dot uk>
diff
changeset
|
107 |
\begin{array}{@{}l@{\hspace{5mm}}l@{}}
|
199
|
108 |
6, 3, 10, 5, 16, 8, 4, 2, 1 & \text{(= 8 steps)}\\
|
|
109 |
9, 28, 14, 7, 22, 11, 34, 17, 52, 26, 13, 40, 20, 10, 5, 16, 8, 4, 2, 1 & \text{(= 19 steps)}\\
|
11
Christian Urban <christian dot urban at kcl dot ac dot uk>
diff
changeset
|
110 |
\end{array}
|
Christian Urban <christian dot urban at kcl dot ac dot uk>
diff
changeset
|
111 |
\]
|
Christian Urban <christian dot urban at kcl dot ac dot uk>
diff
changeset
|
112 |
|
Christian Urban <christian dot urban at kcl dot ac dot uk>
diff
changeset
|
113 |
\noindent
|
Christian Urban <christian dot urban at kcl dot ac dot uk>
diff
changeset
|
114 |
As you can see, the numbers go up and down like a roller-coaster, but
|
18
|
115 |
curiously they seem to always terminate in $1$. The conjecture is that
|
|
116 |
this will \emph{always} happen for every number greater than
|
|
117 |
0.\footnote{While it is relatively easy to test this conjecture with
|
|
118 |
particular numbers, it is an interesting open problem to
|
196
|
119 |
\emph{prove} that the conjecture is true for \emph{all} numbers ($>
|
306
|
120 |
0$). Paul Erd\"o{}s, a famous mathematician you might have heard
|
266
|
121 |
about, said about this conjecture: ``Mathematics may not [yet] be ready
|
196
|
122 |
for such problems.'' and also offered a \$500 cash prize for its
|
|
123 |
solution. Jeffrey Lagarias, another mathematician, claimed that
|
|
124 |
based only on known information about this problem, ``this is an
|
|
125 |
extraordinarily difficult problem, completely out of reach of
|
|
126 |
present day mathematics.'' There is also a
|
|
127 |
\href{https://xkcd.com/710/}{xkcd} cartoon about this conjecture
|
18
|
128 |
(click \href{https://xkcd.com/710/}{here}). If you are able to solve
|
|
129 |
this conjecture, you will definitely get famous.}\bigskip
|
|
130 |
|
|
131 |
\noindent
|
199
|
132 |
\textbf{Tasks}
|
18
|
133 |
|
|
134 |
\begin{itemize}
|
|
135 |
\item[(1)] You are asked to implement a recursive function that
|
|
136 |
calculates the number of steps needed until a series ends
|
199
|
137 |
with $1$. In case of starting with $6$, it takes $8$ steps and in
|
|
138 |
case of starting with $9$, it takes $19$ (see above). In order to
|
18
|
139 |
try out this function with large numbers, you should use
|
|
140 |
\texttt{Long} as argument type, instead of \texttt{Int}. You can
|
|
141 |
assume this function will be called with numbers between $1$ and
|
196
|
142 |
$1$ Million. \hfill[2 Marks]
|
18
|
143 |
|
|
144 |
\item[(2)] Write a second function that takes an upper bound as
|
282
|
145 |
an argument and calculates the steps for all numbers in the range from
|
210
|
146 |
1 up to this bound (the bound including). It returns the maximum number of
|
|
147 |
steps and the corresponding number that needs that many steps. More
|
|
148 |
precisely it returns a pair where the first component is the number
|
|
149 |
of steps and the second is the corresponding number. \hfill\mbox{[1
|
|
150 |
Mark]}
|
18
|
151 |
\end{itemize}
|
11
Christian Urban <christian dot urban at kcl dot ac dot uk>
diff
changeset
|
152 |
|
Christian Urban <christian dot urban at kcl dot ac dot uk>
diff
changeset
|
153 |
\noindent
|
18
|
154 |
\textbf{Test Data:} Some test ranges are:
|
11
Christian Urban <christian dot urban at kcl dot ac dot uk>
diff
changeset
|
155 |
|
Christian Urban <christian dot urban at kcl dot ac dot uk>
diff
changeset
|
156 |
\begin{itemize}
|
199
|
157 |
\item 1 to 10 where $9$ takes 19 steps
|
|
158 |
\item 1 to 100 where $97$ takes 118 steps,
|
|
159 |
\item 1 to 1,000 where $871$ takes 178 steps,
|
|
160 |
\item 1 to 10,000 where $6,171$ takes 261 steps,
|
|
161 |
\item 1 to 100,000 where $77,031$ takes 350 steps,
|
|
162 |
\item 1 to 1 Million where $837,799$ takes 524 steps
|
|
163 |
%% runs out of stack space
|
|
164 |
%% \item[$\bullet$] $1 - 10$ million where $8,400,511$ takes 685 steps
|
196
|
165 |
\end{itemize}
|
18
|
166 |
|
201
|
167 |
|
127
|
168 |
|
|
169 |
|
196
|
170 |
|
279
|
171 |
\subsection*{Core Part (7 Marks, file drumb.scala)}
|
199
|
172 |
|
|
173 |
A purely fictional character named Mr T.~Drumb inherited in 1978
|
|
174 |
approximately 200 Million Dollar from his father. Mr Drumb prides
|
|
175 |
himself to be a brilliant business man because nowadays it is
|
|
176 |
estimated he is 3 Billion Dollar worth (one is not sure, of course,
|
|
177 |
because Mr Drumb refuses to make his tax records public).
|
|
178 |
|
|
179 |
Since the question about Mr Drumb's business acumen remains open,
|
|
180 |
let's do a quick back-of-the-envelope calculation in Scala whether his
|
|
181 |
claim has any merit. Let's suppose we are given \$100 in 1978 and we
|
|
182 |
follow a really dumb investment strategy, namely:
|
|
183 |
|
|
184 |
\begin{itemize}
|
|
185 |
\item We blindly choose a portfolio of stocks, say some Blue-Chip stocks
|
|
186 |
or some Real Estate stocks.
|
|
187 |
\item If some of the stocks in our portfolio are traded in January of
|
|
188 |
a year, we invest our money in equal amounts in each of these
|
|
189 |
stocks. For example if we have \$100 and there are four stocks that
|
|
190 |
are traded in our portfolio, we buy \$25 worth of stocks
|
306
|
191 |
from each. (Be careful to also test cases where you trade with 3 stocks.)
|
199
|
192 |
\item Next year in January, we look at how our stocks did, liquidate
|
|
193 |
everything, and re-invest our (hopefully) increased money in again
|
|
194 |
the stocks from our portfolio (there might be more stocks available,
|
|
195 |
if companies from our portfolio got listed in that year, or less if
|
|
196 |
some companies went bust or were de-listed).
|
276
|
197 |
\item We do this for 41 years until January 2019 and check what would
|
199
|
198 |
have become out of our \$100.
|
|
199 |
\end{itemize}
|
|
200 |
|
|
201 |
\noindent
|
266
|
202 |
Until Yahoo was bought by Altaba a few years ago, historical stock market
|
199
|
203 |
data for such back-of-the-envelope calculations was freely available
|
201
|
204 |
online. Unfortunately nowadays this kind of data is more difficult to
|
199
|
205 |
obtain, unless you are prepared to pay extortionate prices or be
|
306
|
206 |
severely rate-limited. Therefore this part comes with a number
|
199
|
207 |
of files containing CSV-lists with the historical stock prices for the
|
|
208 |
companies in our portfolios. Use these files for the following
|
|
209 |
tasks.\bigskip
|
|
210 |
|
201
|
211 |
\newpage
|
199
|
212 |
\noindent
|
|
213 |
\textbf{Tasks}
|
|
214 |
|
|
215 |
\begin{itemize}
|
|
216 |
\item[(1)] Write a function \texttt{get\_january\_data} that takes a
|
|
217 |
stock symbol and a year as arguments. The function reads the
|
|
218 |
corresponding CSV-file and returns the list of strings that start
|
|
219 |
with the given year (each line in the CSV-list is of the form
|
|
220 |
\texttt{someyear-01-someday,someprice}).\hfill[1 Mark]
|
|
221 |
|
|
222 |
\item[(2)] Write a function \texttt{get\_first\_price} that takes
|
|
223 |
again a stock symbol and a year as arguments. It should return the
|
|
224 |
first January price for the stock symbol in the given year. For this
|
|
225 |
it uses the list of strings generated by
|
|
226 |
\texttt{get\_january\_data}. A problem is that normally a stock
|
|
227 |
exchange is not open on 1st of January, but depending on the day of
|
|
228 |
the week on a later day (maybe 3rd or 4th). The easiest way to solve
|
|
229 |
this problem is to obtain the whole January data for a stock symbol
|
|
230 |
and then select the earliest, or first, entry in this list. The
|
|
231 |
stock price of this entry should be converted into a double. Such a
|
|
232 |
price might not exist, in case the company does not exist in the given
|
|
233 |
year. For example, if you query for Google in January of 1980, then
|
|
234 |
clearly Google did not exist yet. Therefore you are asked to
|
|
235 |
return a trade price with type \texttt{Option[Double]}\ldots\texttt{None}
|
|
236 |
will be the value for when no price exists; \texttt{Some} if there is a
|
|
237 |
price.\hfill[1 Mark]
|
|
238 |
|
|
239 |
\item[(3)] Write a function \texttt{get\_prices} that takes a
|
|
240 |
portfolio (a list of stock symbols), a years range and gets all the
|
|
241 |
first trading prices for each year in the range. You should organise
|
|
242 |
this as a list of lists of \texttt{Option[Double]}'s. The inner
|
|
243 |
lists are for all stock symbols from the portfolio and the outer
|
|
244 |
list for the years. For example for Google and Apple in years 2010
|
|
245 |
(first line), 2011 (second line) and 2012 (third line) you obtain:
|
|
246 |
|
|
247 |
\begin{verbatim}
|
282
|
248 |
List(List(Some(311.349976), Some(20.544939)),
|
|
249 |
List(Some(300.222351), Some(31.638695)),
|
|
250 |
List(Some(330.555054), Some(39.478039))))
|
201
|
251 |
\end{verbatim}\hfill[1 Mark]
|
266
|
252 |
|
|
253 |
|
|
254 |
%\end{itemize}
|
199
|
255 |
|
266
|
256 |
%\subsection*{Advanced Part 3 (4 Marks, continue in file drumb.scala)}
|
|
257 |
%
|
|
258 |
%\noindent
|
|
259 |
%\textbf{Tasks}
|
199
|
260 |
|
266
|
261 |
%\begin{itemize}
|
199
|
262 |
|
|
263 |
\item[(4)] Write a function that calculates the \emph{change factor} (delta)
|
|
264 |
for how a stock price has changed from one year to the next. This is
|
|
265 |
only well-defined, if the corresponding company has been traded in both
|
|
266 |
years. In this case you can calculate
|
|
267 |
|
|
268 |
\[
|
|
269 |
\frac{price_{new} - price_{old}}{price_{old}}
|
|
270 |
\]
|
|
271 |
|
|
272 |
If the change factor is defined, you should return it
|
|
273 |
as \texttt{Some(change\_factor)}; if not, you should return
|
|
274 |
\texttt{None}.\mbox{}\hfill\mbox{[1 Mark]}
|
|
275 |
|
|
276 |
\item[(5)] Write a function that calculates all change factors
|
266
|
277 |
(deltas) for the prices we obtained in Task (2). For the running
|
199
|
278 |
example of Google and Apple for the years 2010 to 2012 you should
|
|
279 |
obtain 4 change factors:
|
|
280 |
|
266
|
281 |
\begin{verbatim}
|
|
282 |
List(List(Some(-0.03573991804411003), Some(0.539974575389325)),
|
|
283 |
List(Some(0.10103414222249969), Some(0.24777764141006836)))
|
199
|
284 |
\end{verbatim}
|
|
285 |
|
|
286 |
That means Google did a bit badly in 2010, while Apple did very well.
|
|
287 |
Both did OK in 2011. Make sure you handle the cases where a company is
|
|
288 |
not listed in a year. In such cases the change factor should be \texttt{None}
|
266
|
289 |
(recall Task~(4)).
|
199
|
290 |
\mbox{}\hfill\mbox{[1 Mark]}
|
|
291 |
|
|
292 |
\item[(6)] Write a function that calculates the ``yield'', or
|
|
293 |
balance, for one year for our portfolio. This function takes the
|
|
294 |
change factors, the starting balance and the year as arguments. If
|
|
295 |
no company from our portfolio existed in that year, the balance is
|
|
296 |
unchanged. Otherwise we invest in each existing company an equal
|
|
297 |
amount of our balance. Using the change factors computed under Task
|
266
|
298 |
(2), calculate the new balance. Say we had \$100 in 2010, we would have
|
199
|
299 |
received in our running example involving Google and Apple:
|
|
300 |
|
|
301 |
\begin{verbatim}
|
266
|
302 |
$50 * -0.03573991804411003 + $50 * 0.539974575389325
|
|
303 |
= $25.21173286726075
|
199
|
304 |
\end{verbatim}
|
|
305 |
|
|
306 |
as profit for that year, and our new balance for 2011 is \$125 when
|
|
307 |
converted to a \texttt{Long}.\mbox{}\hfill\mbox{[1 Mark]}
|
|
308 |
|
|
309 |
\item[(7)] Write a function that calculates the overall balance
|
|
310 |
for a range of years where each year the yearly profit is compounded to
|
|
311 |
the new balances and then re-invested into our portfolio.
|
|
312 |
For this use the function and results generated under (6).\\
|
|
313 |
\mbox{}\hfill\mbox{[1 Mark]}
|
|
314 |
\end{itemize}\medskip
|
|
315 |
|
|
316 |
|
|
317 |
|
|
318 |
\noindent
|
|
319 |
\textbf{Test Data:} File \texttt{drumb.scala} contains two portfolios
|
|
320 |
collected from the S\&P 500, one for blue-chip companies, including
|
|
321 |
Facebook, Amazon and Baidu; and another for listed real-estate
|
|
322 |
companies, whose names I have never heard of. Following the dumb
|
266
|
323 |
investment strategy from 1978 until 2019 would have turned a starting
|
|
324 |
balance of \$100 into roughly \$39,162 for real estate and a whopping
|
|
325 |
\$462,199 for blue chips. Note when comparing these results with your
|
199
|
326 |
own calculations: there might be some small rounding errors, which
|
|
327 |
when compounded lead to moderately different values.\bigskip
|
|
328 |
|
|
329 |
|
|
330 |
\noindent
|
|
331 |
\textbf{Moral:} Reflecting on our assumptions, we are over-estimating
|
|
332 |
our yield in many ways: first, who can know in 1978 about what will
|
|
333 |
turn out to be a blue chip company. Also, since the portfolios are
|
|
334 |
chosen from the current S\&P 500, they do not include the myriad
|
|
335 |
of companies that went bust or were de-listed over the years.
|
|
336 |
So where does this leave our fictional character Mr T.~Drumb? Well, given
|
|
337 |
his inheritance, a really dumb investment strategy would have done
|
|
338 |
equally well, if not much better.\medskip
|
|
339 |
|
|
340 |
\end{document}
|
|
341 |
|
276
|
342 |
|
|
343 |
%%%%%%% Historical Stuff
|
199
|
344 |
\newpage
|
192
|
345 |
|
196
|
346 |
This part is about web-scraping and list-processing in Scala. It uses
|
|
347 |
online data about the per-capita alcohol consumption for each country
|
|
348 |
(per year?), and a file containing the data about the population size of
|
|
349 |
each country. From this data you are supposed to estimate how many
|
|
350 |
litres of pure alcohol are consumed worldwide.\bigskip
|
192
|
351 |
|
|
352 |
\noindent
|
196
|
353 |
\textbf{Tasks (file alcohol.scala):}
|
192
|
354 |
|
|
355 |
\begin{itemize}
|
196
|
356 |
\item[(1)] Write a function that given an URL requests a
|
|
357 |
comma-separated value (CSV) list. We are interested in the list
|
|
358 |
from the following URL
|
192
|
359 |
|
|
360 |
\begin{center}
|
196
|
361 |
\url{https://raw.githubusercontent.com/fivethirtyeight/data/master/alcohol-consumption/drinks.csv}
|
192
|
362 |
\end{center}
|
127
|
363 |
|
196
|
364 |
\noindent Your function should take a string (the URL) as input, and
|
|
365 |
produce a list of strings as output, where each string is one line in
|
|
366 |
the corresponding CSV-list. This list from the URL above should
|
|
367 |
contain 194 lines.\medskip
|
192
|
368 |
|
|
369 |
\noindent
|
196
|
370 |
Write another function that can read the file \texttt{population.csv}
|
201
|
371 |
from disk (the file is distributed with the assignment). This
|
196
|
372 |
function should take a string as argument, the file name, and again
|
|
373 |
return a list of strings corresponding to each entry in the
|
|
374 |
CSV-list. For \texttt{population.csv}, this list should contain 216
|
|
375 |
lines.\hfill[1 Mark]
|
|
376 |
|
|
377 |
|
|
378 |
\item[(2)] Unfortunately, the CSV-lists contain a lot of ``junk'' and we
|
|
379 |
need to extract the data that interests us. From the header of the
|
|
380 |
alcohol list, you can see there are 5 columns
|
|
381 |
|
|
382 |
\begin{center}
|
|
383 |
\begin{tabular}{l}
|
|
384 |
\texttt{country (name),}\\
|
|
385 |
\texttt{beer\_servings,}\\
|
|
386 |
\texttt{spirit\_servings,}\\
|
|
387 |
\texttt{wine\_servings,}\\
|
|
388 |
\texttt{total\_litres\_of\_pure\_alcohol}
|
|
389 |
\end{tabular}
|
|
390 |
\end{center}
|
|
391 |
|
|
392 |
\noindent
|
|
393 |
Write a function that extracts the data from the first column,
|
|
394 |
the country name, and the data from the fifth column (converted into
|
|
395 |
a \texttt{Double}). For this go through each line of the CSV-list
|
|
396 |
(except the first line), use the \texttt{split(",")} function to
|
|
397 |
divide each line into an array of 5 elements. Keep the data from the
|
|
398 |
first and fifth element in these arrays.\medskip
|
192
|
399 |
|
196
|
400 |
\noindent
|
|
401 |
Write another function that processes the population size list. This
|
|
402 |
is already of the form country name and population size.\footnote{Your
|
|
403 |
friendly lecturer already did the messy processing for you from the
|
|
404 |
Worldbank database, see \url{https://github.com/datasets/population/tree/master/data} for the original.} Again, split the
|
|
405 |
strings according to the commas. However, this time generate a
|
|
406 |
\texttt{Map} from country names to population sizes.\hfill[1 Mark]
|
|
407 |
|
|
408 |
\item[(3)] In (2) you generated the data about the alcohol consumption
|
|
409 |
per capita for each country, and also the population size for each
|
|
410 |
country. From this generate next a sorted(!) list of the overall
|
|
411 |
alcohol consumption for each country. The list should be sorted from
|
|
412 |
highest alcohol consumption to lowest. The difficulty is that the
|
|
413 |
data is scraped off from ``random'' sources on the Internet and
|
|
414 |
annoyingly the spelling of some country names does not always agree in both
|
|
415 |
lists. For example the alcohol list contains
|
|
416 |
\texttt{Bosnia-Herzegovina}, while the population writes this country as
|
|
417 |
\texttt{Bosnia and Herzegovina}. In your sorted
|
|
418 |
overall list include only countries from the alcohol list, whose
|
|
419 |
exact country name is also in the population size list. This means
|
|
420 |
you can ignore countries like Bosnia-Herzegovina from the overall
|
|
421 |
alcohol consumption. There are 177 countries where the names
|
|
422 |
agree. The UK is ranked 10th on this list by
|
|
423 |
consuming 671,976,864 Litres of pure alcohol each year.\medskip
|
|
424 |
|
|
425 |
\noindent
|
|
426 |
Finally, write another function that takes an integer, say
|
|
427 |
\texttt{n}, as argument. You can assume this integer is between 0
|
|
428 |
and 177 (the number of countries in the sorted list above). The
|
|
429 |
function should return a triple, where the first component is the
|
|
430 |
sum of the alcohol consumption in all countries (on the list); the
|
|
431 |
second component is the sum of the \texttt{n}-highest alcohol
|
|
432 |
consumers on the list; and the third component is the percentage the
|
|
433 |
\texttt{n}-highest alcohol consumers drink with respect to the
|
|
434 |
the world consumption. You will see that according to our data, 164
|
|
435 |
countries (out of 177) gobble up 100\% of the World alcohol
|
|
436 |
consumption.\hfill\mbox{[1 Mark]}
|
18
|
437 |
\end{itemize}
|
11
Christian Urban <christian dot urban at kcl dot ac dot uk>
diff
changeset
|
438 |
|
Christian Urban <christian dot urban at kcl dot ac dot uk>
diff
changeset
|
439 |
\noindent
|
196
|
440 |
\textbf{Hints:} useful list functions: \texttt{.drop(n)},
|
|
441 |
\texttt{.take(n)} for dropping or taking some elements in a list,
|
|
442 |
\texttt{.getLines} for separating lines in a string;
|
|
443 |
\texttt{.sortBy(\_.\_2)} sorts a list of pairs according to the second
|
|
444 |
elements in the pairs---the sorting is done from smallest to highest;
|
|
445 |
useful \texttt{Map} functions: \texttt{.toMap} converts a list of
|
|
446 |
pairs into a \texttt{Map}, \texttt{.isDefinedAt(k)} tests whether the
|
|
447 |
map is defined at that key, that is would produce a result when
|
|
448 |
called with this key; useful data functions: \texttt{Source.fromURL},
|
|
449 |
\texttt{Source.fromFile} for obtaining a webpage and reading a file.
|
127
|
450 |
|
196
|
451 |
\newpage
|
|
452 |
|
|
453 |
|
18
|
454 |
|
129
|
455 |
|
135
|
456 |
|
6
|
457 |
|
|
458 |
%%% Local Variables:
|
|
459 |
%%% mode: latex
|
|
460 |
%%% TeX-master: t
|
|
461 |
%%% End:
|