pep-material: comparison cws/cw01.tex


-:166bb9b6b20a
+:b1a51285de7e
 \subsection*{Part 2 (3 Marks)}
 This part is about web-scraping and list-processing in Scala. It uses
 online data about the per-capita alcohol consumption for each country
-(per year?), and a file with the data about the population size of
+(per year?), and a file containing the data about the population size of
 each country.  From this data you are supposed to estimate how many
 litres of pure alcohol are consumed worldwide.\bigskip
 \noindent
 \textbf{Tasks (file alcohol.scala):}
 \url{https://raw.githubusercontent.com/fivethirtyeight/data/master/alcohol-consumption/drinks.csv}
 \end{center}
 \noindent Your function should take a string (the URL) as input, and
 produce a list of strings as output, where each string is one line in
-the corresponding CSV-list.  This list should contain 194 lines.\medskip
+the corresponding CSV-list.  This list from the URL above should
+contain 194 lines.\medskip
 \noindent
 Write another function that can read the file \texttt{population.csv}
 from disk (the file is distributed with the coursework). This
 function should take a string as argument, the file name, and again
 \noindent
 Write another function that processes the population size list. This
 is already of the form country name and population size.\footnote{Your
 friendly lecturer already did the messy processing for you from the
-Worldbank database, see \url{https://github.com/datasets/population/tree/master/data}.} Again, split the
+Worldbank database, see \url{https://github.com/datasets/population/tree/master/data} for the original.} Again, split the
 strings according to the commas. However, this time generate a
 \texttt{Map} from country names to population sizes.\hfill[1 Mark]
 \item[(3)] In (2) you generated the data about the alcohol consumption
 per capita for each country, and also the population size for each
 country. From this generate next a sorted(!) list of the overall
 alcohol consumption for each country. The list should be sorted from
 highest alcohol consumption to lowest. The difficulty is that the
-data is scrapped off from ``random'' sources on the Internet and
+data is scraped off from ``random'' sources on the Internet and
-annoyingly the spelling of some country names does not always agree in the
+annoyingly the spelling of some country names does not always agree in both
 lists. For example the alcohol list contains
 \texttt{Bosnia-Herzegovina}, while the population list writes this country as
 \texttt{Bosnia and Herzegovina}. In your sorted
 overall list include only countries from the alcohol list, whose
 exact country name is also in the population size list. This means
 you can ignore countries like Bosnia-Herzegovina from the overall
 alcohol consumption. There are 177 countries where the names
-agree. The UK is ranked 10th on this list with
+agree. The UK is ranked 10th on this list by
 consuming 671,976,864 Litres of pure alcohol each year.\medskip
 \noindent
 Finally, write another function that takes an integer, say
 \texttt{n}, as argument. You can assume this integer is between 0
-and 177.  The function should use the sorted list from above.  It returns
+and 177 (the number of countries in the sorted list above).  The
-a triple, where the first component is the sum of the alcohol
+function should return a triple, where the first component is the
-consumption in all countries (on the list); the second component is
+sum of the alcohol consumption in all countries (on the list); the
-the sum of the \texttt{n}-highest alcohol consumers on the list; and
+second component is the sum of the \texttt{n}-highest alcohol
-the third component is the percentage the \texttt{n}-highest alcohol
+consumers on the list; and the third component is the percentage the
-consumers feast on with respect to the the world consumption. You will
+\texttt{n}-highest alcohol consumers drink with respect to the
-see that according to our data, 164 countries (out of 177) gobble up 100\%
+world consumption. You will see that according to our data, 164
-of the world alcohol consumption.\hfill\mbox{[1 Mark]}
+countries (out of 177) gobble up 100\% of the world alcohol
+consumption.\hfill\mbox{[1 Mark]}
 \end{itemize}
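\noindent
A minimal sketch of how the processing in (2) and (3) could fit together, run on a few made-up CSV lines instead of the real files. The function names (\texttt{parseAlcohol}, \texttt{parsePopulation}, \texttt{sortedOverall}, \texttt{percentage}) are hypothetical, the column layout is an assumption, and the overall consumption is assumed to be per-capita litres multiplied by population size.

```scala
// Toy stand-ins for the lines of drinks.csv and population.csv (made-up numbers).
// Assumed layout: the first line is a header, the country name is the first
// field, and the per-capita litres are the last field of an alcohol line.
val alcoholLines = List(
  "country,beer,spirit,wine,litres",
  "Freedonia,10,20,30,2.0",
  "Sylvania,5,5,5,1.0",
  "Bosnia-Herzegovina,76,173,8,4.6")

val populationLines = List(
  "country,population",
  "Freedonia,100",
  "Sylvania,1000",
  "Bosnia and Herzegovina,3800000")

// Split each line at the commas; keep country name and per-capita litres.
def parseAlcohol(lines: List[String]): List[(String, Double)] =
  lines.drop(1).map { line =>
    val fields = line.split(",").toList
    (fields.head, fields.last.toDouble)
  }

// Same idea, but build a Map from country names to population sizes.
def parsePopulation(lines: List[String]): Map[String, Long] =
  lines.drop(1).map { line =>
    val fields = line.split(",").toList
    (fields(0), fields(1).trim.toLong)
  }.toMap

// Keep only countries whose exact name is a key in the population map,
// then sort from highest to lowest overall consumption.
def sortedOverall(alc: List[(String, Double)],
                  pop: Map[String, Long]): List[(String, Long)] =
  alc.collect { case (country, litres) if pop.isDefinedAt(country) =>
    (country, (litres * pop(country)).toLong)
  }.sortBy { case (_, total) => -total }

// Triple: total consumption, consumption of the n highest, and their share in percent.
// Assumes the list is non-empty with a positive total.
def percentage(n: Int, sorted: List[(String, Long)]): (Long, Long, Double) = {
  val total = sorted.map(_._2).sum
  val topN = sorted.take(n).map(_._2).sum
  (total, topN, topN.toDouble / total * 100.0)
}

val overall = sortedOverall(parseAlcohol(alcoholLines), parsePopulation(populationLines))
val result = percentage(1, overall)
```

On this toy data the mismatched name \texttt{Bosnia-Herzegovina} is dropped, and \texttt{result} holds the total (1200), the share of the single highest consumer (1000) and the corresponding percentage (about 83.3).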
 \noindent
 \textbf{Hints:} useful list functions: \texttt{.drop(n)},
 \texttt{.take(n)} for dropping or taking some elements in a list,
 map is defined at that key, that is, it would produce a result when
 called with this key.
 \newpage
-\subsection*{Advanced Part 3 (3 Marks)}
+\subsection*{Advanced Part 3 (4 Marks)}
 A purely fictional character named Mr T.~Drumb inherited in 1978
 approximately 200 million dollars from his father. Mr Drumb prides
 himself on being a brilliant businessman because nowadays it is
 estimated that he is worth 3 billion dollars (one is not sure, of course,
 from each.
 \item Next year in January, we look at how our stocks did, liquidate
 everything, and re-invest our (hopefully) increased money again in
 the stocks from our portfolio (there might be more stocks available,
 if companies from our portfolio got listed in that year, or less if
-some companies went bust or de-listed).
+some companies went bust or were de-listed).
-\item We do this for 38 years until January 2017 and check what would
+\item We do this for 39 years until January 2017 and check what would
 have become of our \$100.
 \end{itemize}
+\noindent
-\medskip
+Until Yahoo was bought by Verizon this summer, historical stock market
+data was available online for free, but nowadays this kind of data is
+difficult to obtain unless you are prepared to pay extortionate prices
+or be severely rate-limited.  Therefore this coursework comes with a
+number of files containing CSV-lists about stock prices of
+various companies. Use these files for the following tasks.\bigskip
 \noindent
 \textbf{Tasks (file drumb.scala):}
 \begin{itemize}
-\item[(1.a)] Write a function that queries the Yahoo financial data
+\item[(1.a)] Write a function \texttt{get\_january\_data} that takes a
-service and obtains the first trade (adjusted close price) of a
+stock symbol and a year as argument. The function reads the
-stock symbol and a year. A problem is that normally a stock exchange
+corresponding CSV-file and returns the list of strings that start
-is not open on 1st of January, but depending on the day of the week
+with the given year (each line in the CSV-list is of the form
-on a later day (maybe 3rd or 4th). The easiest way to solve this
+\texttt{year-01-someday,someprice}).
-problem is to obtain the whole January data for a stock symbol as
-CSV-list and then select the earliest entry in this list. For this
+\item[(1.b)] Write a function \texttt{get\_first\_price} that takes
-you can specify a date range with the Yahoo service. For example if
+again a stock symbol and a year as arguments. It should return the
-you want to obtain all January data for Google in 2000, you can form
+first January price for the stock symbol in the year. For this it
-the query:\mbox{}\\[-8mm]
+obtains the list of strings generated by
+\texttt{get\_january\_data}.  A problem is that normally a stock
-\begin{center}\small
+exchange is not open on 1st of January, but depending on the day of
-\mbox{\url{http://ichart.yahoo.com/table.csv?s=GOOG&a=0&b=1&c=2000&d=1&e=1&f=2000}}
+the week on a later day (maybe 3rd or 4th). The easiest way to solve
-\end{center}
+this problem is to obtain the whole January data for a stock symbol
+and then select the earliest entry in this list. This entry should
-For other companies and years, you need to change the stock symbol
+be converted into a double.  Such a price might not exist, if the
-(\texttt{GOOG}) and the year \texttt{2000} (in the \texttt{c} and
+company does not exist in the given year. For example, if you query
-\texttt{f} argument of the query). Such a request might fail, if the
+for Google in January of 1980, then clearly Google did not exist
-company does not exist during this period. For example, if you query
+yet.  Therefore you are asked to return a trade price as
-for Google in January of 1980, then clearly Google did not exists yet.
-Therefore you are asked to return a trade price as
 \texttt{Option[Double]}.
-\item[(1.b)] Write a function that takes a portfolio (a list of stock symbols),
+\item[(1.c)] Write a function \texttt{get\_prices} that takes a
-a years range and gets all the first trading prices for each year. You should
+portfolio (a list of stock symbols) and a range of years, and gets all the
-organise this as a list of lists of \texttt{Option[Double]}'s. The inner lists
+first trading prices for each year. You should organise this as a
-are for all stock symbols from the portfolio and the outer list for the years.
+list of lists of \texttt{Option[Double]}'s. The inner lists are for
-For example for Google and Apple in years 2010 (first line), 2011
+all stock symbols from the portfolio and the outer list for the
-(second line) and 2012 (third line) you obtain:
+years.  For example for Google and Apple in years 2010 (first line),
+2011 (second line) and 2012 (third line) you obtain:
 \begin{verbatim}
-List(List(Some(313.062468), Some(27.847252)),
+List(List(Some(311.349976), Some(27.505054)),
-List(Some(301.873641), Some(42.884065)),
+List(Some(300.222351), Some(42.357094)),
-List(Some(332.373186), Some(53.509768)))
+List(Some(330.555054), Some(52.852215)))
-\end{verbatim}\hfill[1 Mark]
+\end{verbatim}\hfill[2 Marks]
 \item[(2.a)] Write a function that calculates the \emph{change factor} (delta)
 for how a stock price has changed from one year to the next. This is
 only well-defined, if the corresponding company has been traded in both
 years. In this case you can calculate
 (deltas) for the prices we obtained under Task 1. For the running
 example of Google and Apple for the years 2010 to 2012 you should
 obtain 4 change factors:
 \begin{verbatim}
-List(List(Some(-0.03573991820699504), Some(0.5399747522663995))
+List(List(Some(-0.03573992567129673), Some(0.5399749442411563))
-List(Some(0.10103414428290529), Some(0.24777742035415723)))
+List(Some(0.10103412653643493), Some(0.2477771728154912)))
 \end{verbatim}
 That means Google did a bit badly in 2010, while Apple did very well.
 Both did OK in 2011.\hfill\mbox{[1 Mark]}
 amount of our balance. Using the change factors computed under Task
 2, calculate the new balance. Say we had \$100 in 2010, we would have
 received in our running example
 \begin{verbatim}
-$50 * -0.03573991820699504 + $50 * 0.5399747522663995
+$50 * -0.03573992567129673 + $50 * 0.5399749442411563
-= $25.211741702970222
+= $25.21175092849298
 \end{verbatim}
 as profit for that year, and our new balance for 2011 is \$125 when
 converted to a \texttt{Long}.
 \end{itemize}\medskip
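\noindent
A hedged sketch of the calculations in the tasks above, working on in-memory data instead of the CSV files. All names (\texttt{firstPrice}, \texttt{delta}, \texttt{deltas}, \texttt{yearlyYield}, \texttt{investment}) are hypothetical. The change factor is assumed to be (new price $-$ old price) / old price, which agrees with the example numbers above. The balance is assumed to be split evenly among the stocks with a defined change factor, and the earliest January entry is found by sorting the lines, which assumes zero-padded dates.

```scala
// Earliest entry among lines of the form "year-01-day,price".
// Sorting strings finds the earliest date only if days are zero-padded (assumption).
def firstPrice(januaryLines: List[String]): Option[Double] =
  januaryLines.sorted.headOption.map(_.split(",")(1).trim.toDouble)

// Change factor between two years; defined only if both prices exist.
def delta(oldPrice: Option[Double], newPrice: Option[Double]): Option[Double] =
  for (a <- oldPrice; b <- newPrice) yield (b - a) / a

// Pair each year with the following year, then each stock with itself.
def deltas(prices: List[List[Option[Double]]]): List[List[Option[Double]]] =
  prices.zip(prices.drop(1)).map { case (thisYear, nextYear) =>
    thisYear.zip(nextYear).map { case (a, b) => delta(a, b) }
  }

// Invest equal shares of the balance in every stock with a defined change
// factor; the new balance is converted to a Long (truncating the fraction).
def yearlyYield(ds: List[Option[Double]], balance: Long): Long = {
  val defined = ds.flatten
  if (defined.isEmpty) balance
  else {
    val share = balance.toDouble / defined.length
    (balance + defined.map(_ * share).sum).toLong
  }
}

// Compound the yearly yields over all consecutive pairs of years.
def investment(prices: List[List[Option[Double]]], start: Long): Long =
  deltas(prices).foldLeft(start)((balance, ds) => yearlyYield(ds, balance))

// The running example: Google and Apple, 2010 to 2012.
val prices = List(
  List(Some(311.349976), Some(27.505054)),
  List(Some(300.222351), Some(42.357094)),
  List(Some(330.555054), Some(52.852215)))

val changeFactors = deltas(prices)
val balance2011 = yearlyYield(changeFactors.head, 100L) // 125, as in the text
```

Returning \texttt{None} for an empty list of January lines models a company that did not exist in the given year.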
 \noindent
 \textbf{Test Data:} File \texttt{drumb.scala} contains two portfolios
 collected from the S\&P 500, one for blue-chip companies, including
-Facebook, Amazon and Baidu; and another for listed real-estate companies, whose
+Facebook, Amazon and Baidu; and another for listed real-estate
-names I have never heard of. Following the dumb investment strategy
+companies, whose names I have never heard of. Following the dumb
-from 1978 until 2016 would have turned a starting balance of \$100
+investment strategy from 1978 until 2017 would have turned a starting
-into \$23,794 for real estate and a whopping \$524,609 for blue chips.\medskip
+balance of \$100 into roughly \$30,895 for real estate and a whopping
+\$188,172 for blue chips.  Note when comparing these results with your
+own results: there might be some small rounding errors, which when
+compounded, lead to moderately different values.\medskip
 \noindent
 \textbf{Moral:} Reflecting on our assumptions, we are over-estimating
 our yield in many ways: first, who could have known in 1978 which
 companies would turn out to be blue chips?  Also, since the portfolios are
 chosen from the current S\&P 500, they do not include the myriad
 of companies that went bust or were de-listed over the years.
 So where does this leave our fictional character Mr T.~Drumb? Well, given
 his inheritance, a really dumb investment strategy would have done
-equally well, if not much better.
+equally well, if not much better.\medskip
+\noindent
+\textbf{Hints:}
 \end{document}
 %%% Local Variables:
 %%% mode: latex
 %%% TeX-master: t

changeset 129	b1a51285de7e
parent 127	b4def82f3f9f
child 135	077e63e96287