cws/cw01.tex
changeset 129 b1a51285de7e
parent 127 b4def82f3f9f
child 135 077e63e96287
128:166bb9b6b20a 129:b1a51285de7e
   136 
   136 
   137 \subsection*{Part 2 (3 Marks)}
   137 \subsection*{Part 2 (3 Marks)}
   138 
   138 
   139 This part is about web-scraping and list-processing in Scala. It uses
   139 This part is about web-scraping and list-processing in Scala. It uses
   140 online data about the per-capita alcohol consumption for each country
   140 online data about the per-capita alcohol consumption for each country
   141 (per year?), and a file with the data about the population size of
   141 (per year?), and a file containing the data about the population size of
   142 each country.  From this data you are supposed to estimate how many
   142 each country.  From this data you are supposed to estimate how many
   143 litres of pure alcohol are consumed worldwide.\bigskip
   143 litres of pure alcohol are consumed worldwide.\bigskip
   144 
   144 
   145 \noindent
   145 \noindent
   146 \textbf{Tasks (file alcohol.scala):}
   146 \textbf{Tasks (file alcohol.scala):}
   154   \url{https://raw.githubusercontent.com/fivethirtyeight/data/master/alcohol-consumption/drinks.csv}
   154   \url{https://raw.githubusercontent.com/fivethirtyeight/data/master/alcohol-consumption/drinks.csv}
   155 \end{center}
   155 \end{center}
   156 
   156 
   157 \noindent Your function should take a string (the URL) as input, and
   157 \noindent Your function should take a string (the URL) as input, and
   158 produce a list of strings as output, where each string is one line in
   158 produce a list of strings as output, where each string is one line in
   159 the corresponding CSV-list.  This list should contain 194 lines.\medskip
   159 the corresponding CSV-list.  This list from the URL above should
       
   160 contain 194 lines.\medskip
   160 
   161 
   161 \noindent
   162 \noindent
   162 Write another function that can read the file \texttt{population.csv}
   163 Write another function that can read the file \texttt{population.csv}
   163 from disk (the file is distributed with the coursework). This
   164 from disk (the file is distributed with the coursework). This
   164 function should take a string as argument, the file name, and again
   165 function should take a string as argument, the file name, and again
   191 
   192 
   192   \noindent
   193   \noindent
   193   Write another function that processes the population size list. This
   194   Write another function that processes the population size list. This
   194   is already of the form country name and population size.\footnote{Your
   195   is already of the form country name and population size.\footnote{Your
   195     friendly lecturer already did the messy processing for you from the
   196     friendly lecturer already did the messy processing for you from the
   196   Worldbank database, see \url{https://github.com/datasets/population/tree/master/data}.} Again, split the
   197   Worldbank database, see \url{https://github.com/datasets/population/tree/master/data} for the original.} Again, split the
   197   strings according to the commas. However, this time generate a
   198   strings according to the commas. However, this time generate a
   198   \texttt{Map} from country names to population sizes.\hfill[1 Mark]
   199   \texttt{Map} from country names to population sizes.\hfill[1 Mark]
   199 
   200 
   200 \item[(3)] In (2) you generated the data about the alcohol consumption
   201 \item[(3)] In (2) you generated the data about the alcohol consumption
   201   per capita for each country, and also the population size for each
   202   per capita for each country, and also the population size for each
   202   country. From this generate next a sorted(!) list of the overall
   203   country. From this generate next a sorted(!) list of the overall
   203   alcohol consumption for each country. The list should be sorted from
   204   alcohol consumption for each country. The list should be sorted from
   204   highest alcohol consumption to lowest. The difficulty is that the
   205   highest alcohol consumption to lowest. The difficulty is that the
   205   data is scrapped off from ``random'' sources on the Internet and
   206   data is scraped off from ``random'' sources on the Internet and
   206   annoyingly the spelling of some country names does not always agree in the
   207   annoyingly the spelling of some country names does not always agree in both
   207   lists. For example the alcohol list contains
   208   lists. For example the alcohol list contains
   208   \texttt{Bosnia-Herzegovina}, while the population list writes this country as
   209   \texttt{Bosnia-Herzegovina}, while the population list writes this country as
   209   \texttt{Bosnia and Herzegovina}. In your sorted
   210   \texttt{Bosnia and Herzegovina}. In your sorted
   210   overall list include only countries from the alcohol list, whose
   211   overall list include only countries from the alcohol list, whose
   211   exact country name is also in the population size list. This means
   212   exact country name is also in the population size list. This means
   212   you can ignore countries like Bosnia-Herzegovina from the overall
   213   you can ignore countries like Bosnia-Herzegovina from the overall
   213   alcohol consumption. There are 177 countries where the names
   214   alcohol consumption. There are 177 countries where the names
   214   agree. The UK is ranked 10th on this list with
   215   agree. The UK is ranked 10th on this list by
   215   consuming 671,976,864 litres of pure alcohol each year.\medskip
   216   consuming 671,976,864 litres of pure alcohol each year.\medskip
   216   
   217   
   217   \noindent
   218   \noindent
   218   Finally, write another function that takes an integer, say
   219   Finally, write another function that takes an integer, say
   219   \texttt{n}, as argument. You can assume this integer is between 0
   220   \texttt{n}, as argument. You can assume this integer is between 0
   220   and 177.  The function should use the sorted list from above.  It returns
   221   and 177 (the number of countries in the sorted list above).  The
   221   a triple, where the first component is the sum of the alcohol
   222   function should return a triple, where the first component is the
   222   consumption in all countries (on the list); the second component is
   223   sum of the alcohol consumption in all countries (on the list); the
   223   the sum of the \texttt{n}-highest alcohol consumers on the list; and
   224   second component is the sum of the \texttt{n}-highest alcohol
   224   the third component is the percentage the \texttt{n}-highest alcohol
   225   consumers on the list; and the third component is the percentage the
   225   consumers feast on with respect to the the world consumption. You will
   226   \texttt{n}-highest alcohol consumers drink with respect to the
   226   see that according to our data, 164 countries (out of 177) gobble up 100\%
   227   world consumption. You will see that according to our data, 164
   227   of the world alcohol consumption.\hfill\mbox{[1 Mark]}
   228   countries (out of 177) gobble up 100\% of the world alcohol
       
   229   consumption.\hfill\mbox{[1 Mark]}
   228 \end{itemize}
   230 \end{itemize}
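
\noindent
As a rough illustration of the final task in (3), the following sketch assumes that the sorted list is available as a \texttt{List[(String, Long)]}. The function name \texttt{topShare}, the result types and the sample figures are made up for illustration and are not prescribed by the coursework.

```scala
// Hypothetical sketch: `sorted` stands in for the sorted overall-consumption
// list from task (3); the sample figures below are invented, not real data.
def topShare(sorted: List[(String, Long)], n: Int): (Long, Long, Double) = {
  val total = sorted.map(_._2).sum           // consumption of all countries on the list
  val topN  = sorted.take(n).map(_._2).sum   // consumption of the n highest consumers
  // percentage of the total that the n highest consumers account for
  val share = if (total == 0) 0.0 else topN * 100.0 / total
  (total, topN, share)
}

// made-up list, already sorted from highest to lowest consumption
val sample = List(("A", 600L), ("B", 300L), ("C", 100L))
val result = topShare(sample, 2)
```

With the invented sample above, \texttt{result} holds the overall sum, the sum of the two highest consumers, and their share in percent.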
   229 
   231 
   230 \noindent
   232 \noindent
   231 \textbf{Hints:} useful list functions: \texttt{.drop(n)},
   233 \textbf{Hints:} useful list functions: \texttt{.drop(n)},
   232 \texttt{.take(n)} for dropping or taking some elements in a list,
   234 \texttt{.take(n)} for dropping or taking some elements in a list,
   238 map is defined at that key, that is, it would produce a result when
   240 map is defined at that key, that is, it would produce a result when
   239 called with this key.
   241 called with this key.
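
\noindent
The hinted list functions can be tried out on a small made-up example. The CSV lines and population figures below are invented, and the key test is written with \texttt{isDefinedAt}, which is a standard method on Scala maps but may not be the exact method the hint has in mind.

```scala
// Invented CSV lines: a header followed by three data lines.
val lines = List("country,litres", "A,1.5", "B,2.0", "C,0.5")

val body   = lines.drop(1)   // drop the header line
val first2 = body.take(2)    // take the first two data lines

// Invented population sizes; country "C" is deliberately missing.
val population = Map("A" -> 1000L, "B" -> 2000L)

// Keep only the lines whose country name is a key of the map.
val known = body.filter(line => population.isDefinedAt(line.split(",")(0)))
```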
   240 
   242 
   241 \newpage
   243 \newpage
   242 
   244 
   243 \subsection*{Advanced Part 3 (3 Marks)}
   245 \subsection*{Advanced Part 3 (4 Marks)}
   244 
   246 
   245 A purely fictional character named Mr T.~Drumb inherited in 1978
   247 A purely fictional character named Mr T.~Drumb inherited in 1978
   246 approximately 200 million dollars from his father. Mr Drumb prides
   248 approximately 200 million dollars from his father. Mr Drumb prides
   247 himself on being a brilliant businessman because nowadays it is
   249 himself on being a brilliant businessman because nowadays it is
   248 estimated he is worth 3 billion dollars (one is not sure, of course,
   250 estimated he is worth 3 billion dollars (one is not sure, of course,
   263   from each.
   265   from each.
   264 \item Next year in January, we look at how our stocks did, liquidate
   266 \item Next year in January, we look at how our stocks did, liquidate
   265   everything, and re-invest our (hopefully) increased money again in
   267   everything, and re-invest our (hopefully) increased money again in
   266   the stocks from our portfolio (there might be more stocks available,
   268   the stocks from our portfolio (there might be more stocks available,
   267   if companies from our portfolio got listed in that year, or fewer if
   269   if companies from our portfolio got listed in that year, or fewer if
   268   some companies went bust or de-listed).
   270   some companies went bust or were de-listed).
   269 \item We do this for 38 years until January 2017 and check what would
   271 \item We do this for 39 years until January 2017 and check what would
   270   have become of our \$100.
   272   have become of our \$100.
   271 \end{itemize}
   273 \end{itemize}
   272 
   274 
   273 
   275 \noindent
   274 \medskip  
   276 Until Yahoo was bought by Verizon this summer, historical stock market
       
   277 data was available online for free, but nowadays this kind of data is
       
   278 difficult to obtain unless you are prepared to pay extortionate prices
       
   279 or be severely rate-limited.  Therefore this coursework comes with a
       
   280 number of files containing CSV-lists about stock prices of
       
   281 various companies. Use these files for the following tasks.\bigskip
   275 
   282 
   276 \noindent
   283 \noindent
   277 \textbf{Tasks (file drumb.scala):}
   284 \textbf{Tasks (file drumb.scala):}
   278 
   285 
   279 \begin{itemize}
   286 \begin{itemize}
   280 \item[(1.a)] Write a function that queries the Yahoo financial data
   287 \item[(1.a)] Write a function \texttt{get\_january\_data} that takes a
   281   service and obtains the first trade (adjusted close price) of a
   288   stock symbol and a year as arguments. The function reads the
   282   stock symbol and a year. A problem is that normally a stock exchange
   289   corresponding CSV-file and returns the list of strings that start
   283   is not open on 1st of January, but depending on the day of the week
   290   with the given year (each line in the CSV-list is of the form
   284   on a later day (maybe 3rd or 4th). The easiest way to solve this
   291   \texttt{year-01-someday,someprice}).
   285   problem is to obtain the whole January data for a stock symbol as
   292 
   286   CSV-list and then select the earliest entry in this list. For this
   293 \item[(1.b)] Write a function \texttt{get\_first\_price} that takes
   287   you can specify a date range with the Yahoo service. For example if
   294   again a stock symbol and a year as arguments. It should return the
   288   you want to obtain all January data for Google in 2000, you can form
   295   first January price for the stock symbol in the year. For this it
   289   the query:\mbox{}\\[-8mm]
   296   obtains the list of strings generated by
   290 
   297   \texttt{get\_january\_data}.  A problem is that normally a stock
   291   \begin{center}\small
   298   exchange is not open on 1st of January, but depending on the day of
   292     \mbox{\url{http://ichart.yahoo.com/table.csv?s=GOOG&a=0&b=1&c=2000&d=1&e=1&f=2000}}
   299   the week on a later day (maybe 3rd or 4th). The easiest way to solve
   293   \end{center}
   300   this problem is to obtain the whole January data for a stock symbol
   294 
   301   and then select the earliest entry in this list. This entry should
   295   For other companies and years, you need to change the stock symbol
   302   be converted into a double.  Such a price might not exist, if the
   296   (\texttt{GOOG}) and the year \texttt{2000} (in the \texttt{c} and
   303   company does not exist in the given year. For example, if you query
   297   \texttt{f} argument of the query). Such a request might fail, if the
   304   for Google in January of 1980, then clearly Google did not exist
   298   company does not exist during this period. For example, if you query
   305   yet.  Therefore you are asked to return a trade price as
   299   for Google in January of 1980, then clearly Google did not exists yet.
       
   300   Therefore you are asked to return a trade price as
       
   301   \texttt{Option[Double]}.
   306   \texttt{Option[Double]}.
   302 
   307 
   303 \item[(1.b)] Write a function that takes a portfolio (a list of stock symbols),
   308 \item[(1.c)] Write a function \texttt{get\_prices} that takes a
   304   a years range and gets all the first trading prices for each year. You should
   309   portfolio (a list of stock symbols), a range of years, and gets all the
   305   organise this as a list of lists of \texttt{Option[Double]}'s. The inner lists
   310   first trading prices for each year. You should organise this as a
   306   are for all stock symbols from the portfolio and the outer list for the years.
   311   list of lists of \texttt{Option[Double]}'s. The inner lists are for
   307   For example for Google and Apple in years 2010 (first line), 2011
   312   all stock symbols from the portfolio and the outer list for the
   308   (second line) and 2012 (third line) you obtain:
   313   years.  For example for Google and Apple in years 2010 (first line),
       
   314   2011 (second line) and 2012 (third line) you obtain:
   309 
   315 
   310 \begin{verbatim}
   316 \begin{verbatim}
   311   List(List(Some(313.062468), Some(27.847252)), 
   317   List(List(Some(311.349976), Some(27.505054)), 
   312        List(Some(301.873641), Some(42.884065)),
   318        List(Some(300.222351), Some(42.357094)),
   313        List(Some(332.373186), Some(53.509768)))
   319        List(Some(330.555054), Some(52.852215)))
   314 \end{verbatim}\hfill[1 Mark]
   320 \end{verbatim}\hfill[2 Marks]
   315  
   321  
   316 \item[(2.a)] Write a function that calculates the \emph{change factor} (delta)
   322 \item[(2.a)] Write a function that calculates the \emph{change factor} (delta)
   317   for how a stock price has changed from one year to the next. This is
   323   for how a stock price has changed from one year to the next. This is
   318   only well-defined, if the corresponding company has been traded in both
   324   only well-defined, if the corresponding company has been traded in both
   319   years. In this case you can calculate
   325   years. In this case you can calculate
   327   (deltas) for the prices we obtained under Task 1. For the running
   333   (deltas) for the prices we obtained under Task 1. For the running
   328   example of Google and Apple for the years 2010 to 2012 you should
   334   example of Google and Apple for the years 2010 to 2012 you should
   329   obtain 4 change factors:
   335   obtain 4 change factors:
   330 
   336 
   331 \begin{verbatim}  
   337 \begin{verbatim}  
   332   List(List(Some(-0.03573991820699504), Some(0.5399747522663995))
   338   List(List(Some(-0.03573992567129673), Some(0.5399749442411563)),
   333           List(Some(0.10103414428290529), Some(0.24777742035415723)))
   339        List(Some(0.10103412653643493), Some(0.2477771728154912)))
   334 \end{verbatim}
   340 \end{verbatim}
   335 
   341 
   336   That means Google did a bit badly in 2010, while Apple did very well.
   342   That means Google did a bit badly in 2010, while Apple did very well.
   337   Both did OK in 2011.\hfill\mbox{[1 Mark]}
   343   Both did OK in 2011.\hfill\mbox{[1 Mark]}
   338 
   344 
   344   amount of our balance. Using the change factors computed under Task
   350   amount of our balance. Using the change factors computed under Task
   345   2, calculate the new balance. Say we had \$100 in 2010, we would have
   351   2, calculate the new balance. Say we had \$100 in 2010, we would have
   346   received in our running example
   352   received in our running example
   347 
   353 
   348   \begin{verbatim}
   354   \begin{verbatim}
   349   $50 * -0.03573991820699504 + $50 * 0.5399747522663995
   355   $50 * -0.03573992567129673 + $50 * 0.5399749442411563
   350                                          = $25.211741702970222
   356                                        = $25.21175092849298
   351   \end{verbatim}
   357   \end{verbatim}
   352 
   358 
   353   as profit for that year, and our new balance for 2011 is \$125 when
   359   as profit for that year, and our new balance for 2011 is \$125 when
   354   converted to a \texttt{Long}.
   360   converted to a \texttt{Long}.
   355   
   361   
   359 \end{itemize}\medskip  
   365 \end{itemize}\medskip  
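
\noindent
The steps in the tasks above can be sketched roughly as follows. All helper names here (\texttt{firstPrice}, \texttt{delta}, \texttt{yearlyProfit}) are hypothetical and differ from the names the tasks ask for. The change-factor formula is inferred from the worked numbers in the running example, the equal split of the balance follows the \$50/\$50 example, and days are assumed to be zero-padded so that sorting the lines as strings yields the earliest date. The second line of the sample data is invented.

```scala
// Hypothetical helpers; a sketch under the assumptions stated above,
// not the reference solution.

// Pick the earliest January entry and convert its price into a Double.
// Lines are assumed to look like "2010-01-04,311.349976".
def firstPrice(januaryLines: List[String]): Option[Double] =
  januaryLines.sorted.headOption.map(_.split(",")(1).toDouble)

// Change factor between two consecutive years; it is only defined
// if a price exists in both years.
def delta(priceOld: Option[Double], priceNew: Option[Double]): Option[Double] =
  for (o <- priceOld; n <- priceNew) yield (n - o) / o

// Profit for one year: split the balance equally among the stocks that
// have a change factor and apply each factor to its share.
def yearlyProfit(deltas: List[Option[Double]], balance: Long): Double = {
  val defined = deltas.flatten
  if (defined.isEmpty) 0.0
  else defined.map(d => d * balance.toDouble / defined.length).sum
}

// The first line is invented; the second uses the Google price from the example.
val jan2010 = List("2010-01-05,309.978882", "2010-01-04,311.349976")
val google2010 = firstPrice(jan2010)

// Google prices for 2010 and 2011 from the running example
val googleDelta = delta(Some(311.349976), Some(300.222351))

// Change factors from the running example with a balance of $100
val profit = yearlyProfit(
  List(Some(-0.03573992567129673), Some(0.5399749442411563)), 100L)
```

Returning \texttt{Option[Double]} throughout means a missing price simply propagates as \texttt{None} instead of requiring special cases.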
   360 
   366 
   361 \noindent
   367 \noindent
   362 \textbf{Test Data:} File \texttt{drumb.scala} contains two portfolios
   368 \textbf{Test Data:} File \texttt{drumb.scala} contains two portfolios
   363 collected from the S\&P 500, one for blue-chip companies, including
   369 collected from the S\&P 500, one for blue-chip companies, including
   364 Facebook, Amazon and Baidu; and another for listed real-estate companies, whose
   370 Facebook, Amazon and Baidu; and another for listed real-estate
   365 names I have never heard of. Following the dumb investment strategy
   371 companies, whose names I have never heard of. Following the dumb
   366 from 1978 until 2016 would have turned a starting balance of \$100
   372 investment strategy from 1978 until 2017 would have turned a starting
   367 into \$23,794 for real estate and a whopping \$524,609 for blue chips.\medskip
   373 balance of \$100 into roughly \$30,895 for real estate and a whopping
       
   374 \$188,172 for blue chips.  Note when comparing these results with your
       
   375 own results: there might be some small rounding errors, which, when
       
   376 compounded, lead to moderately different values.\medskip
   368 
   377 
   369 \noindent
   378 \noindent
   370 \textbf{Moral:} Reflecting on our assumptions, we are over-estimating
   379 \textbf{Moral:} Reflecting on our assumptions, we are over-estimating
   371 our yield in many ways: first, who can know in 1978 what will
   380 our yield in many ways: first, who can know in 1978 what will
   372 turn out to be a blue-chip company?  Also, since the portfolios are
   381 turn out to be a blue-chip company?  Also, since the portfolios are
   373 chosen from the current S\&P 500, they do not include the myriad
   382 chosen from the current S\&P 500, they do not include the myriad
   374 of companies that went bust or were de-listed over the years.
   383 of companies that went bust or were de-listed over the years.
   375 So where does this leave our fictional character Mr T.~Drumb? Well, given
   384 So where does this leave our fictional character Mr T.~Drumb? Well, given
   376 his inheritance, a really dumb investment strategy would have done
   385 his inheritance, a really dumb investment strategy would have done
   377 equally well, if not much better.
   386 equally well, if not much better.\medskip
       
   387 
       
   388 \noindent
       
   389 \textbf{Hints:}
   378 \end{document}
   390 \end{document}
   379 
   391 
   380 %%% Local Variables: 
   392 %%% Local Variables: 
   381 %%% mode: latex
   393 %%% mode: latex
   382 %%% TeX-master: t
   394 %%% TeX-master: t