191 |
192 |
192 \noindent |
193 \noindent |
193 Write another function that processes the population size list. This |
194 Write another function that processes the population size list. This |
194 is already of the form country name and population size.\footnote{Your |
195 is already of the form country name and population size.\footnote{Your |
195 friendly lecturer already did the messy processing for you from the |
196 friendly lecturer already did the messy processing for you from the |
196 Worldbank database, see \url{https://github.com/datasets/population/tree/master/data}.} Again, split the |
197 Worldbank database, see \url{https://github.com/datasets/population/tree/master/data} for the original.} Again, split the |
197 strings according to the commas. However, this time generate a |
198 strings according to the commas. However, this time generate a |
198 \texttt{Map} from country names to population sizes.\hfill[1 Mark] |
199 \texttt{Map} from country names to population sizes.\hfill[1 Mark] |
199 |
200 |
200 \item[(3)] In (2) you generated the data about the alcohol consumption |
201 \item[(3)] In (2) you generated the data about the alcohol consumption |
201 per capita for each country, and also the population size for each |
202 per capita for each country, and also the population size for each |
202 country. From this generate next a sorted(!) list of the overall |
203 country. From this generate next a sorted(!) list of the overall |
203 alcohol consumption for each country. The list should be sorted from |
204 alcohol consumption for each country. The list should be sorted from |
204 highest alcohol consumption to lowest. The difficulty is that the |
205 highest alcohol consumption to lowest. The difficulty is that the |
205 data is scrapped off from ``random'' sources on the Internet and |
206 data is scraped off from ``random'' sources on the Internet and |
206 annoyingly the spelling of some country names does not always agree in the |
207 annoyingly the spelling of some country names does not always agree in both |
207 lists. For example the alcohol list contains |
208 lists. For example the alcohol list contains |
208 \texttt{Bosnia-Herzegovina}, while the population list writes this country as |
209 \texttt{Bosnia-Herzegovina}, while the population list writes this country as |
209 \texttt{Bosnia and Herzegovina}. In your sorted |
210 \texttt{Bosnia and Herzegovina}. In your sorted |
210 overall list include only countries from the alcohol list, whose |
211 overall list include only countries from the alcohol list, whose |
211 exact country name is also in the population size list. This means |
212 exact country name is also in the population size list. This means |
212 you can ignore countries like Bosnia-Herzegovina from the overall |
213 you can ignore countries like Bosnia-Herzegovina from the overall |
213 alcohol consumption. There are 177 countries where the names |
214 alcohol consumption. There are 177 countries where the names |
214 agree. The UK is ranked 10th on this list with |
215 agree. The UK is ranked 10th on this list by |
215 consuming 671,976,864 Litres of pure alcohol each year.\medskip |
216 consuming 671,976,864 Litres of pure alcohol each year.\medskip |
216 |
217 |
217 \noindent |
218 \noindent |
218 Finally, write another function that takes an integer, say |
219 Finally, write another function that takes an integer, say |
219 \texttt{n}, as argument. You can assume this integer is between 0 |
220 \texttt{n}, as argument. You can assume this integer is between 0 |
220 and 177. The function should use the sorted list from above. It returns |
221 and 177 (the number of countries in the sorted list above). The |
221 a triple, where the first component is the sum of the alcohol |
222 function should return a triple, where the first component is the |
222 consumption in all countries (on the list); the second component is |
223 sum of the alcohol consumption in all countries (on the list); the |
223 the sum of the \texttt{n}-highest alcohol consumers on the list; and |
224 second component is the sum of the \texttt{n}-highest alcohol |
224 the third component is the percentage the \texttt{n}-highest alcohol |
225 consumers on the list; and the third component is the percentage the |
225 consumers feast on with respect to the world consumption. You will |
226 \texttt{n}-highest alcohol consumers drink with respect to the |
226 see that according to our data, 164 countries (out of 177) gobble up 100\% |
227 world consumption. You will see that according to our data, 164 |
227 of the world alcohol consumption.\hfill\mbox{[1 Mark]} |
228 countries (out of 177) gobble up 100\% of the world alcohol |
|
229 consumption.\hfill\mbox{[1 Mark]} |
228 \end{itemize} |
230 \end{itemize} |
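
\noindent
The steps in (2) and (3) could be organised roughly as in the following
Scala sketch. It is only an illustration under assumptions: the function
names \texttt{fields}, \texttt{process\_pops}, \texttt{process\_alcs},
\texttt{sorted\_overall} and \texttt{percentage} are hypothetical, the
sample lines are made up, and population sizes are assumed to be whole
numbers (a malformed number would throw an exception here).

\begin{verbatim}
// Made-up sample lines of the form "country,value" (not real data).
val alcLines = List("Aland,17.5", "Bosnia-Herzegovina,4.5", "Cland,7.5")
val popLines = List("Aland,9500000", "Bosnia and Herzegovina,3800000",
                    "Cland,330000")

// Split at the comma and keep well-formed "name,value" lines only.
def fields(line: String): Option[(String, String)] =
  line.split(",").toList match {
    case name :: value :: Nil => Some((name.trim, value.trim))
    case _ => None
  }

// Map from country name to population size.
def process_pops(lines: List[String]): Map[String, Long] =
  lines.flatMap(l => fields(l)).map { case (n, v) => (n, v.toLong) }.toMap

// List of (country, litres of pure alcohol per capita).
def process_alcs(lines: List[String]): List[(String, Double)] =
  lines.flatMap(l => fields(l)).map { case (n, v) => (n, v.toDouble) }

// Overall consumption for countries whose exact name is in both
// lists, sorted from highest to lowest.
def sorted_overall(alcs: List[(String, Double)],
                   pops: Map[String, Long]): List[(String, Long)] =
  alcs.flatMap { case (n, litres) =>
    pops.get(n).map(p => (n, (litres * p).toLong))
  }.sortBy(-_._2)

// (total, sum of the n highest, percentage of the n highest).
def percentage(n: Int,
               overall: List[(String, Long)]): (Long, Long, Double) = {
  val total = overall.map(_._2).sum
  val top = overall.take(n).map(_._2).sum
  (total, top, top.toDouble / total * 100.0)
}
\end{verbatim}

\noindent
With the made-up sample,
\texttt{sorted\_overall(process\_alcs(alcLines), process\_pops(popLines))}
contains only two entries, because the two spellings of Bosnia and
Herzegovina do not match exactly.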
229 |
231 |
230 \noindent |
232 \noindent |
231 \textbf{Hints:} useful list functions: \texttt{.drop(n)}, |
233 \textbf{Hints:} useful list functions: \texttt{.drop(n)}, |
232 \texttt{.take(n)} for dropping or taking some elements in a list, |
234 \texttt{.take(n)} for dropping or taking some elements in a list, |
263 from each. |
265 from each. |
264 \item Next year in January, we look how our stocks did, liquidate |
266 \item Next year in January, we look how our stocks did, liquidate |
265 everything, and re-invest our (hopefully) increased money in again |
267 everything, and re-invest our (hopefully) increased money in again |
266 the stocks from our portfolio (there might be more stocks available, |
268 the stocks from our portfolio (there might be more stocks available, |
267 if companies from our portfolio got listed in that year, or fewer if |
269 if companies from our portfolio got listed in that year, or fewer if |
268 some companies went bust or de-listed). |
270 some companies went bust or were de-listed). |
269 \item We do this for 38 years until January 2017 and check what would |
271 \item We do this for 39 years until January 2017 and check what would |
270 have become of our \$100. |
272 have become of our \$100. |
271 \end{itemize} |
273 \end{itemize} |
272 |
274 |
273 |
275 \noindent |
274 \medskip |
276 Until Yahoo was bought by Verizon this summer, historical stock market |
|
277 data was available online for free, but nowadays this kind of data is |
|
278 difficult to obtain unless you are prepared to pay extortionate prices |
|
279 or be severely rate-limited. Therefore this coursework comes with a |
|
280 number of files containing CSV-lists about stock prices of |
|
281 various companies. Use these files for the following tasks.\bigskip |
275 |
282 |
276 \noindent |
283 \noindent |
277 \textbf{Tasks (file drumb.scala):} |
284 \textbf{Tasks (file drumb.scala):} |
278 |
285 |
279 \begin{itemize} |
286 \begin{itemize} |
280 \item[(1.a)] Write a function that queries the Yahoo financial data |
287 \item[(1.a)] Write a function \texttt{get\_january\_data} that takes a |
281 service and obtains the first trade (adjusted close price) of a |
288 stock symbol and a year as arguments. The function reads the |
282 stock symbol and a year. A problem is that normally a stock exchange |
289 corresponding CSV-file and returns the list of strings that start |
283 is not open on 1st of January, but depending on the day of the week |
290 with the given year (each line in the CSV-list is of the form |
284 on a later day (maybe 3rd or 4th). The easiest way to solve this |
291 \texttt{year-01-someday,someprice}). |
285 problem is to obtain the whole January data for a stock symbol as |
292 |
286 CSV-list and then select the earliest entry in this list. For this |
293 \item[(1.b)] Write a function \texttt{get\_first\_price} that takes |
287 you can specify a date range with the Yahoo service. For example if |
294 again a stock symbol and a year as arguments. It should return the |
288 you want to obtain all January data for Google in 2000, you can form |
295 first January price for the stock symbol in the year. For this it |
289 the query:\mbox{}\\[-8mm] |
296 obtains the list of strings generated by |
290 |
297 \texttt{get\_january\_data}. A problem is that normally a stock |
291 \begin{center}\small |
298 exchange is not open on 1st of January, but depending on the day of |
292 \mbox{\url{http://ichart.yahoo.com/table.csv?s=GOOG&a=0&b=1&c=2000&d=1&e=1&f=2000}} |
299 the week on a later day (maybe 3rd or 4th). The easiest way to solve |
293 \end{center} |
300 this problem is to obtain the whole January data for a stock symbol |
294 |
301 and then select the earliest entry in this list. This entry should |
295 For other companies and years, you need to change the stock symbol |
302 be converted into a double. Such a price might not exist, if the |
296 (\texttt{GOOG}) and the year \texttt{2000} (in the \texttt{c} and |
303 company does not exist in the given year. For example, if you query |
297 \texttt{f} argument of the query). Such a request might fail, if the |
304 for Google in January of 1980, then clearly Google did not exist |
298 company does not exist during this period. For example, if you query |
305 yet. Therefore you are asked to return a trade price as |
299 for Google in January of 1980, then clearly Google did not exist yet. |
|
300 Therefore you are asked to return a trade price as |
|
301 \texttt{Option[Double]}. |
306 \texttt{Option[Double]}. |
302 |
307 |
303 \item[(1.b)] Write a function that takes a portfolio (a list of stock symbols), |
308 \item[(1.c)] Write a function \texttt{get\_prices} that takes a |
304 a years range and gets all the first trading prices for each year. You should |
309 portfolio (a list of stock symbols), a years range and gets all the |
305 organise this as a list of lists of \texttt{Option[Double]}'s. The inner lists |
310 first trading prices for each year. You should organise this as a |
306 are for all stock symbols from the portfolio and the outer list for the years. |
311 list of lists of \texttt{Option[Double]}'s. The inner lists are for |
307 For example for Google and Apple in years 2010 (first line), 2011 |
312 all stock symbols from the portfolio and the outer list for the |
308 (second line) and 2012 (third line) you obtain: |
313 years. For example for Google and Apple in years 2010 (first line), |
|
314 2011 (second line) and 2012 (third line) you obtain: |
309 |
315 |
310 \begin{verbatim} |
316 \begin{verbatim} |
311 List(List(Some(313.062468), Some(27.847252)), |
317 List(List(Some(311.349976), Some(27.505054)), |
312 List(Some(301.873641), Some(42.884065)), |
318 List(Some(300.222351), Some(42.357094)), |
313 List(Some(332.373186), Some(53.509768))) |
319 List(Some(330.555054), Some(52.852215))) |
314 \end{verbatim}\hfill[1 Mark] |
320 \end{verbatim}\hfill[2 Marks] |
315 |
321 |
316 \item[(2.a)] Write a function that calculates the \emph{change factor} (delta) |
322 \item[(2.a)] Write a function that calculates the \emph{change factor} (delta) |
317 for how a stock price has changed from one year to the next. This is |
323 for how a stock price has changed from one year to the next. This is |
318 only well-defined, if the corresponding company has been traded in both |
324 only well-defined, if the corresponding company has been traded in both |
319 years. In this case you can calculate |
325 years. In this case you can calculate |
359 \end{itemize}\medskip |
365 \end{itemize}\medskip |
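
\noindent
Selecting the earliest January entry in tasks (1.b) and (1.c) could look
roughly like the following Scala sketch. It is an illustration under
assumptions, not the reference solution: \texttt{january\_lines} is a
hypothetical in-memory stand-in for \texttt{get\_january\_data} (no
CSV-file is read), the names \texttt{first\_price} and \texttt{prices}
as well as the symbols \texttt{AAA} and \texttt{BBB} are made up, and the
day in each date is assumed to be zero-padded so that sorting the strings
sorts them by date.

\begin{verbatim}
import scala.util.Try

// Made-up January lines of the form "year-01-someday,someprice",
// keyed by hypothetical stock symbol and year (not real prices).
val data: Map[(String, Int), List[String]] = Map(
  ("AAA", 2010) -> List("2010-01-05,101.5", "2010-01-04,100.0"),
  ("AAA", 2011) -> List("2011-01-03,110.0"),
  ("BBB", 2011) -> List("2011-01-04,20.0", "2011-01-03,21.0"))

// Stand-in for get_january_data: looks lines up in memory
// instead of reading a CSV-file.
def january_lines(symbol: String, year: Int): List[String] =
  data.getOrElse((symbol, year), Nil)

// Price of the earliest entry, if there is one. Sorting the strings
// orders them by date, assuming days are zero-padded.
def first_price(lines: List[String]): Option[Double] =
  lines.sorted.headOption.flatMap { line =>
    line.split(",").toList match {
      case _ :: price :: _ => Try(price.trim.toDouble).toOption
      case _ => None
    }
  }

// Outer list: years; inner lists: symbols of the portfolio.
def prices(portfolio: List[String],
           years: Range): List[List[Option[Double]]] =
  years.toList.map { y =>
    portfolio.map(s => first_price(january_lines(s, y)))
  }
\end{verbatim}

\noindent
A symbol without data in a given year, such as the made-up \texttt{BBB}
in 2010, yields \texttt{None} in the corresponding inner list.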
360 |
366 |
361 \noindent |
367 \noindent |
362 \textbf{Test Data:} File \texttt{drumb.scala} contains two portfolios |
368 \textbf{Test Data:} File \texttt{drumb.scala} contains two portfolios |
363 collected from the S\&P 500, one for blue-chip companies, including |
369 collected from the S\&P 500, one for blue-chip companies, including |
364 Facebook, Amazon and Baidu; and another for listed real-estate companies, whose |
370 Facebook, Amazon and Baidu; and another for listed real-estate |
365 names I have never heard of. Following the dumb investment strategy |
371 companies, whose names I have never heard of. Following the dumb |
366 from 1978 until 2016 would have turned a starting balance of \$100 |
372 investment strategy from 1978 until 2017 would have turned a starting |
367 into \$23,794 for real estate and a whopping \$524,609 for blue chips.\medskip |
373 balance of \$100 into roughly \$30,895 for real estate and a whopping |
|
374 \$188,172 for blue chips. Note when comparing these results with your |
|
375 own results: there might be some small rounding errors, which when |
|
376 compounded, lead to moderately different values.\medskip |
368 |
377 |
369 \noindent |
378 \noindent |
370 \textbf{Moral:} Reflecting on our assumptions, we are over-estimating |
379 \textbf{Moral:} Reflecting on our assumptions, we are over-estimating |
371 our yield in many ways: first, who can know in 1978 about what will |
380 our yield in many ways: first, who can know in 1978 about what will |
372 turn out to be a blue chip company? Also, since the portfolios are |
381 turn out to be a blue chip company? Also, since the portfolios are |
373 chosen from the current S\&P 500, they do not include the myriad |
382 chosen from the current S\&P 500, they do not include the myriad |
374 of companies that went bust or were de-listed over the years. |
383 of companies that went bust or were de-listed over the years. |
375 So where does this leave our fictional character Mr T.~Drumb? Well, given |
384 So where does this leave our fictional character Mr T.~Drumb? Well, given |
376 his inheritance, a really dumb investment strategy would have done |
385 his inheritance, a really dumb investment strategy would have done |
377 equally well, if not much better. |
386 equally well, if not much better.\medskip |
|
387 |
|
388 \noindent |
|
389 \textbf{Hints:} |
378 \end{document} |
390 \end{document} |
379 |
391 |
380 %%% Local Variables: |
392 %%% Local Variables: |
381 %%% mode: latex |
393 %%% mode: latex |
382 %%% TeX-master: t |
394 %%% TeX-master: t |