pep-material: comparison cws/cw01.tex

-:c40f364d87eb
+:b4def82f3f9f
 This coursework is about Scala and is worth 10\%. The first and second
 parts are due on 16 November at 11pm, and the third part on 21 December
 at 11pm. You are asked to implement three programs about list
 processing and recursion. The third part is more advanced and might
 include material you have not yet seen in the first lecture.
-Make sure the files you submit can be processed by just calling
+\bigskip
-\texttt{scala <<filename.scala>>}.\bigskip
+\noindent
-\noindent
+\textbf{Important:}
-\textbf{Important:} Do not use any mutable data structures in your
+\begin{itemize}
+\item Make sure the files you submit can be processed by just calling\\
+\mbox{\texttt{scala <<filename.scala>>}} on the command line.
+\item Do not use any mutable data structures in your
 submissions! They are not needed. This means you cannot use
-\texttt{ListBuffer}s, for example. Do not use \texttt{return} in your
+\texttt{ListBuffer}s, for example.
-code! It has a different meaning in Scala, than in Java.
-Do not use \texttt{var}! This declares a mutable variable. ??? Make sure the
+\item Do not use \texttt{return} in your code! It has a different
-functions you submit are defined on the ``top-level'' of Scala, not
+meaning in Scala than in Java.
-inside a class or object. Also note that the running time of
-each part will be restricted to a maximum of 360 seconds on my laptop.
+\item Do not use \texttt{var}! This declares a mutable variable. Only
+use \texttt{val}!
+\item Do not use any parallel collections! No \texttt{.par} therefore!
+Our testing and marking infrastructure is not set up for it.
+\end{itemize}
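\noindent
A minimal sketch of the style these rules ask for, using a made-up function \texttt{sumTo} that is not part of any task. The result is simply the value of the \texttt{if}-expression, so neither \texttt{return} nor \texttt{var} is needed.

```scala
// Hypothetical example (not a coursework task): sum of 1..n by recursion.
// The body is a single expression; its value is the result of the function.
def sumTo(n: Int): Int =
  if (n <= 0) 0 else n + sumTo(n - 1)

// Results are bound once with val and never reassigned.
val total = sumTo(10)
```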
+\noindent
+Also note that the running time of each part will be restricted to a
+maximum of 360 seconds on my laptop.
 \subsection*{Disclaimer}
 It should be understood that the work you submit represents
-your own effort. You have not copied from anyone else. An
+your \textbf{own} effort. You have not copied from anyone else. An
 exception is the Scala code I showed during the lectures or
 uploaded to KEATS, which you can freely use.\bigskip
 \subsection*{Part 1 (3 Marks)}
 present day mathematics.'' There is also a
 \href{https://xkcd.com/710/}{xkcd} cartoon about this conjecture
 (click \href{https://xkcd.com/710/}{here}). If you are able to solve
 this conjecture, you will definitely get famous.}\bigskip
-\newpage
 \noindent
 \textbf{Tasks (file collatz.scala):}
 \begin{itemize}
 \item[(1)] You are asked to implement a recursive function that
 with $1$. In case of starting with $6$, it takes $9$ steps and in
 case of starting with $9$, it takes $20$ (see above). In order to
 try out this function with large numbers, you should use
 \texttt{Long} as argument type, instead of \texttt{Int}.  You can
 assume this function will be called with numbers between $1$ and
-$1$ million. \hfill[2 Marks]
+$1$ Million. \hfill[2 Marks]
 \item[(2)] Write a second function that takes an upper bound as
 argument and calculates the steps for all numbers in the range from
 1 up to this bound. It returns the maximum number of steps and the
 corresponding number that needs that many steps.  More precisely
 \item 1 to 10 where $9$ takes 20 steps
 \item 1 to 100 where $97$ takes 119 steps,
 \item 1 to 1,000 where $871$ takes 179 steps,
 \item 1 to 10,000 where $6,171$ takes 262 steps,
 \item 1 to 100,000 where $77,031$ takes 351 steps,
-\item 1 to 1 million where $837,799$ takes 525 steps
+\item 1 to 1 Million where $837,799$ takes 525 steps
 %%\item[$\bullet$] $1 - 10$ million where $8,400,511$ takes 686 steps
-\end{itemize}\bigskip
+\end{itemize}
+\noindent
+\textbf{Hints:} useful math operators: \texttt{\%} for modulo; useful
-\subsection*{Part 2 (4 Marks)}
+functions: \mbox{\texttt{(1\,to\,10)}} for ranges, \texttt{.toInt},
+\texttt{.toList} for conversions, \texttt{List(...).max} for the
-This part is about list processing---it's a variant of
+maximum of a list, \texttt{List(...).indexOf(...)} for the first index of
-``buy-low-sell-high'' in Scala. It uses the online financial data
+a value in a list.
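\noindent
The operators and functions named in the hints can be tried out on small values first. The following is only a sketch on made-up data; all names in it are illustrative and it does not solve either task.

```scala
// Illustrative values only; none of these names come from the coursework.
val n = 7
val isEven = n % 2 == 0                  // modulo: 7 % 2 is 1, so this is false

// A range converted into a list, then squared element-wise.
val squares = (1 to 10).toList.map(x => x * x)

val largest = squares.max                // maximum of the list
val position = squares.indexOf(largest)  // first (0-based) index of that value

// Converting a Long into an Int.
val big: Long = 119L
val asInt: Int = big.toInt
```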
-service from Yahoo.\bigskip
-\noindent
-\textbf{Tasks (file trade.scala):}
+\subsection*{Part 2 (3 Marks)}
-\begin{itemize}
+This part is about web-scraping and list-processing in Scala. It uses
-\item[(1)] Given a list of prices for a commodity, for example
+online data about the per-capita alcohol consumption for each country
+(per year?), and a file with the data about the population size of
-\[
+each country.  From this data you are supposed to estimate how many
-\texttt{List(28.0, 18.0, 20.0, 26.0, 24.0)}
+litres of pure alcohol are consumed worldwide.\bigskip
-\]
+\noindent
-\noindent
+\textbf{Tasks (file alcohol.scala):}
-you need to write a function that returns a pair of indices for when
-to buy and when to sell this commodity. In the example above it should
+\begin{itemize}
-return the pair $\texttt{(1, 3)}$ because at index $1$ the price is lowest and
+\item[(1)] Write a function that, given a URL, requests a
-then at index $3$ the price is highest. Note the prices are given as
+comma-separated value (CSV) list.  We are interested in the list
-lists of \texttt{Double}s.\newline \mbox{} \hfill[1 Mark]
+from the following URL
-\item[(2)] Write a function that requests a comma-separated value (CSV) list
-from the Yahoo websevice that provides historical data for stock
-indices. For example if you query the URL
 \begin{center}
-\url{http://ichart.yahoo.com/table.csv?s=GOOG}
+\url{https://raw.githubusercontent.com/fivethirtyeight/data/master/alcohol-consumption/drinks.csv}
 \end{center}
-\noindent where \texttt{GOOG} stands for Google's stock market symbol,
+\noindent Your function should take a string (the URL) as input, and
-then you will receive a CSV-list of the daily stock prices since
+produce a list of strings as output, where each string is one line in
-Google was listed. You can also try this with other stock market
+the corresponding CSV-list.  This list should contain 194 lines.\medskip
-symbols, for instance AAPL, MSFT, IBM, FB, YHOO, AMZN, BIDU and so
-on.
+\noindent
+Write another function that can read the file \texttt{population.csv}
-This function should return a List of strings, where each string
+from disk (the file is distributed with the coursework). This
-is one line in this CVS-list (representing one day's
+function should take a string as argument, the file name, and again
-data). Note that Yahoo generates its answer such that the newest data
+return a list of strings corresponding to each entry in the
-is at the front of this list, and the oldest data is at the end.
+CSV-list. For \texttt{population.csv}, this list should contain 216
-\hfill[1 Mark]
+lines.\hfill[1 Mark]
-\item[(3)] As you can see, the financial data from Yahoo is organised in 7 columns,
-for example
+\item[(2)] Unfortunately, the CSV-lists contain a lot of ``junk'' and we
+need to extract the data that interests us.  From the header of the
-{\small\begin{verbatim}
+alcohol list, you can see there are 5 columns
-Date,Open,High,Low,Close,Volume,Adj Close
-2016-11-04,750.659973,770.359985,750.560974,762.02002,2126900,762.02002
+\begin{center}
-2016-11-03,767.25,769.950012,759.030029,762.130005,1914000,762.130005
+\begin{tabular}{l}
-2016-11-02,778.200012,781.650024,763.450012,768.700012,1872400,768.700012
+\texttt{country (name),}\\
-2016-11-01,782.890015,789.48999,775.539978,783.609985,2404500,783.609985
+\texttt{beer\_servings,}\\
-....
+\texttt{spirit\_servings,}\\
-\end{verbatim}}
+\texttt{wine\_servings,}\\
+\texttt{total\_litres\_of\_pure\_alcohol}
-\noindent
+\end{tabular}
-Write a function that ignores the first line (the header) and then
+\end{center}
-extracts from each line the date (first column) and the Adjusted Close
-price (last column). The Adjusted Close price should be converted into
+\noindent
-a \texttt{Double}. So the result of this function is a list of pairs where the
+Write a function that extracts the data from the first column,
-first components are strings (the dates) and the second are doubles
+the country name, and the data from the fifth column (converted into
-(the adjusted close prices).\newline\mbox{}\hfill\mbox{[1 Mark]}
+a \texttt{Double}). For this go through each line of the CSV-list
+(except the first line), use the \texttt{split(",")} function to
-\item[(4)] Write a function that takes a stock market symbol as
+divide each line into an array of 5 elements. Keep the data from the
-argument (you can assume it is a valid one, like GOOG or AAPL). The
+first and fifth element in these arrays.\medskip
-function calculates the \underline{dates} when you should have
-bought the corresponding shares (lowest price) and when you should
+\noindent
-have sold them (highest price).\hfill\mbox{[1 Mark]}
+Write another function that processes the population size list. This
-\end{itemize}
+is already of the form country name and population size.\footnote{Your
+friendly lecturer already did the messy processing for you from the
-\noindent
+Worldbank database, see \url{https://github.com/datasets/population/tree/master/data}.} Again, split the
-\textbf{Test Data:}
+strings according to the commas. However, this time generate a
-In case of Google, the financial data records 3077 entries starting
+\texttt{Map} from country names to population sizes.\hfill[1 Mark]
-from 2004-08-19 until 2016-11-04 (which is the last entry on the day
-when I prepared the course work...namely on 6 November; remember stock
+\item[(3)] In (2) you generated the data about the alcohol consumption
-markets are typically closed on weekends and no financial data is
+per capita for each country, and also the population size for each
-produced then; also I did not count the header line). The lowest
+country. From this, generate next a sorted(!) list of the overall
-shareprice for Google was on 2004-09-03 with \$49.95513 per share and the
+alcohol consumption for each country. The list should be sorted from
-highest on 2016-10-24 with \$813.109985 per share.\bigskip
+highest alcohol consumption to lowest. The difficulty is that the
+data is scraped from ``random'' sources on the Internet and
+annoyingly the spelling of some country names does not always agree between the
+lists. For example the alcohol list contains
+\texttt{Bosnia-Herzegovina}, while the population list writes this country as
+\texttt{Bosnia and Herzegovina}. In your sorted
+overall list include only countries from the alcohol list, whose
+exact country name is also in the population size list. This means
+you can ignore countries like Bosnia-Herzegovina from the overall
+alcohol consumption. There are 177 countries where the names
+agree. The UK is ranked 10th on this list,
+consuming 671,976,864 Litres of pure alcohol each year.\medskip
+\noindent
+Finally, write another function that takes an integer, say
+\texttt{n}, as argument. You can assume this integer is between 0
+and 177.  The function should use the sorted list from above.  It returns
+a triple, where the first component is the sum of the alcohol
+consumption in all countries (on the list); the second component is
+the sum of the \texttt{n}-highest alcohol consumers on the list; and
+the third component is the percentage the \texttt{n}-highest alcohol
+consumers feast on with respect to the world consumption. You will
+see that according to our data, 164 countries (out of 177) gobble up 100\%
+of the world alcohol consumption.\hfill\mbox{[1 Mark]}
+\end{itemize}
+\noindent
+\textbf{Hints:} useful list functions: \texttt{.drop(n)},
+\texttt{.take(n)} for dropping or taking some elements in a list,
+\texttt{.getLines} for separating lines in a string;
+\texttt{.sortBy(\_.\_2)} sorts a list of pairs according to the second
+elements in the pairs---the sorting is done from smallest to largest;
+useful \texttt{Map} functions: \texttt{.toMap} converts a list of
+pairs into a \texttt{Map}, \texttt{.isDefinedAt(k)} tests whether the
+map is defined at that key, that is, whether it would produce a result when
+called with this key.
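\noindent
A short sketch of how these list and \texttt{Map} functions behave, on a made-up three-row CSV. The data and all names are assumptions for illustration only and are unrelated to the coursework files.

```scala
// Made-up CSV lines; the first line plays the role of a header.
val rows = List("name,score", "alice,3.5", "bob,1.0", "carol,2.25")

// drop(1) skips the header; split(",") yields the columns of each line.
val pairs = rows.drop(1).map { line =>
  val cols = line.split(",")
  (cols(0), cols(1).toDouble)
}

// sortBy(_._2) orders by the second component, smallest first;
// reversing the result gives largest first.
val ascending = pairs.sortBy(_._2)
val descending = ascending.reverse

// take(2) keeps the first two elements of the list.
val topTwo = descending.take(2)

// toMap turns the list of pairs into a Map; isDefinedAt tests for a key.
val lookup = pairs.toMap
val hasAlice = lookup.isDefinedAt("alice")
val hasDave = lookup.isDefinedAt("dave")
```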
+\newpage
 \subsection*{Advanced Part 3 (3 Marks)}
 A purely fictional character named Mr T.~Drumb inherited in 1978
 approximately 200 Million Dollars from his father. Mr Drumb prides
 himself on being a brilliant businessman because nowadays it is
 estimated he is worth 3 Billion Dollars (one is not sure, of course,
 because Mr Drumb refuses to make his tax records public).
-Since the question about Mr Drumb's business acumen remains, let's do a
+Since the question about Mr Drumb's business acumen remains open,
-quick back-of-the-envelope calculation in Scala whether his claim has
+let's do a quick back-of-the-envelope calculation in Scala to see whether his
-any merit. Let's suppose we are given \$100 in 1978 and we follow a
+claim has any merit. Let's suppose we are given \$100 in 1978 and we
-really dumb investment strategy, namely:
+follow a really dumb investment strategy, namely:
 \begin{itemize}
 \item We blindly choose a portfolio of stocks, say some Blue-Chip stocks
 or some Real Estate stocks.
 \item If some of the stocks in our portfolio are traded in January of
 \item Next year in January, we look at how our stocks did, liquidate
 everything, and re-invest our (hopefully) increased money again in
 the stocks from our portfolio (there might be more stocks available,
 if companies from our portfolio got listed in that year, or fewer if
 some companies went bust or de-listed).
-\item We do this for 38 years until January 2016 and check what would
+\item We do this for 38 years until January 2017 and check what would
 have become of our \$100.
-\end{itemize}\medskip
+\end{itemize}
+\medskip
 \noindent
 \textbf{Tasks (file drumb.scala):}
 \begin{itemize}

changeset 127	b4def82f3f9f
parent 125	dcaab8068baa
child 129	b1a51285de7e