cws/cw01.tex
changeset 127 b4def82f3f9f
parent 125 dcaab8068baa
child 129 b1a51285de7e
equal deleted inserted replaced
126:c40f364d87eb 127:b4def82f3f9f
     9 This coursework is about Scala and is worth 10\%. The first and second
     9 This coursework is about Scala and is worth 10\%. The first and second
    10 part are due on 16 November at 11pm, and the third part on 21 December
    10 part are due on 16 November at 11pm, and the third part on 21 December
    11 at 11pm. You are asked to implement three programs about list
    11 at 11pm. You are asked to implement three programs about list
    12 processing and recursion. The third part is more advanced and might
    12 processing and recursion. The third part is more advanced and might
    13 include material you have not yet seen in the first lecture.
    13 include material you have not yet seen in the first lecture.
    14 Make sure the files you submit can be processed by just calling
    14 \bigskip
    15 \texttt{scala <<filename.scala>>}.\bigskip
    15 
    16 
    16 \noindent
    17 \noindent
    17 \textbf{Important:}
    18 \textbf{Important:} Do not use any mutable data structures in your
    18 
       
    19 \begin{itemize}
       
    20 \item Make sure the files you submit can be processed by just calling\\
       
    21 \mbox{\texttt{scala <<filename.scala>>}} on the commandline.
       
    22 
       
    23 \item Do not use any mutable data structures in your
    19 submissions! They are not needed. This means you cannot use 
    24 submissions! They are not needed. This means you cannot use 
    20 \texttt{ListBuffer}s, for example. Do not use \texttt{return} in your
    25 \texttt{ListBuffer}s, for example. 
    21 code! It has a different meaning in Scala, than in Java.
    26 
    22 Do not use \texttt{var}! This declares a mutable variable. ??? Make sure the
    27 \item Do not use \texttt{return} in your code! It has a different
    23 functions you submit are defined on the ``top-level'' of Scala, not
    28   meaning in Scala, than in Java.
    24 inside a class or object. Also note that the running time of
    29 
    25 each part will be restricted to a maximum of 360 seconds on my laptop.
    30 \item Do not use \texttt{var}! This declares a mutable variable. Only
       
    31   use \texttt{val}!
       
    32 
       
    33 \item Do not use any parallel collections! No \texttt{.par} therefore!
       
    34   Our testing and marking infrastructure is not set up for it.
       
    35 \end{itemize}
       
    36 
       
    37 \noindent
       
    38 Also note that the running time of each part will be restricted to a
       
    39 maximum of 360 seconds on my laptop.
    26 
    40 
    27 
    41 
    28 \subsection*{Disclaimer}
    42 \subsection*{Disclaimer}
    29 
    43 
    30 It should be understood that the work you submit represents
    44 It should be understood that the work you submit represents
    31 your own effort. You have not copied from anyone else. An
    45 your \textbf{own} effort. You have not copied from anyone else. An
    32 exception is the Scala code I showed during the lectures or
    46 exception is the Scala code I showed during the lectures or
    33 uploaded to KEATS, which you can freely use.\bigskip
    47 uploaded to KEATS, which you can freely use.\bigskip
    34 
    48 
    35 
    49 
    36 \subsection*{Part 1 (3 Marks)}
    50 \subsection*{Part 1 (3 Marks)}
    74   present day mathematics.'' There is also a
    88   present day mathematics.'' There is also a
    75   \href{https://xkcd.com/710/}{xkcd} cartoon about this conjecture
    89   \href{https://xkcd.com/710/}{xkcd} cartoon about this conjecture
    76   (click \href{https://xkcd.com/710/}{here}). If you are able to solve
    90   (click \href{https://xkcd.com/710/}{here}). If you are able to solve
    77   this conjecture, you will definitely get famous.}\bigskip
    91   this conjecture, you will definitely get famous.}\bigskip
    78 
    92 
    79 \newpage
       
    80 \noindent
    93 \noindent
    81 \textbf{Tasks (file collatz.scala):}
    94 \textbf{Tasks (file collatz.scala):}
    82 
    95 
    83 \begin{itemize}
    96 \begin{itemize}
    84 \item[(1)] You are asked to implement a recursive function that
    97 \item[(1)] You are asked to implement a recursive function that
    86   with $1$. In case of starting with $6$, it takes $9$ steps and in
    99   with $1$. In case of starting with $6$, it takes $9$ steps and in
    87   case of starting with $9$, it takes $20$ (see above). In order to
   100   case of starting with $9$, it takes $20$ (see above). In order to
    88   try out this function with large numbers, you should use
   101   try out this function with large numbers, you should use
    89   \texttt{Long} as argument type, instead of \texttt{Int}.  You can
   102   \texttt{Long} as argument type, instead of \texttt{Int}.  You can
    90   assume this function will be called with numbers between $1$ and
   103   assume this function will be called with numbers between $1$ and
    91   $1$ million. \hfill[2 Marks]
   104   $1$ Million. \hfill[2 Marks]
    92 
   105 
    93 \item[(2)] Write a second function that takes an upper bound as
   106 \item[(2)] Write a second function that takes an upper bound as
    94   argument and calculates the steps for all numbers in the range from
   107   argument and calculates the steps for all numbers in the range from
    95   1 up to this bound. It returns the maximum number of steps and the
   108   1 up to this bound. It returns the maximum number of steps and the
    96   corresponding number that needs that many steps.  More precisely
   109   corresponding number that needs that many steps.  More precisely
   106 \item 1 to 10 where $9$ takes 20 steps 
   119 \item 1 to 10 where $9$ takes 20 steps 
   107 \item 1 to 100 where $97$ takes 119 steps,
   120 \item 1 to 100 where $97$ takes 119 steps,
   108 \item 1 to 1,000 where $871$ takes 179 steps,
   121 \item 1 to 1,000 where $871$ takes 179 steps,
   109 \item 1 to 10,000 where $6,171$ takes 262 steps,
   122 \item 1 to 10,000 where $6,171$ takes 262 steps,
   110 \item 1 to 100,000 where $77,031$ takes 351 steps, 
   123 \item 1 to 100,000 where $77,031$ takes 351 steps, 
   111 \item 1 to 1 million where $837,799$ takes 525 steps
   124 \item 1 to 1 Million where $837,799$ takes 525 steps
   112   %%\item[$\bullet$] $1 - 10$ million where $8,400,511$ takes 686 steps
   125   %%\item[$\bullet$] $1 - 10$ million where $8,400,511$ takes 686 steps
   113 \end{itemize}\bigskip
   126 \end{itemize}
   114   
   127   
   115 
   128 \noindent
   116 
   129 \textbf{Hints:} useful math operators: \texttt{\%} for modulo; useful
   117 \subsection*{Part 2 (4 Marks)}
   130 functions: \mbox{\texttt{(1\,to\,10)}} for ranges, \texttt{.toInt},
   118 
   131 \texttt{.toList} for conversions, \texttt{List(...).max} for the
   119 This part is about list processing---it's a variant of
   132 maximum of a list, \texttt{List(...).indexOf(...)} for the first index of
   120 ``buy-low-sell-high'' in Scala. It uses the online financial data
   133 a value in a list.
   121 service from Yahoo.\bigskip 
   134 
   122 
   135 
   123 \noindent
   136 
   124 \textbf{Tasks (file trade.scala):}
   137 \subsection*{Part 2 (3 Marks)}
   125 
   138 
   126 \begin{itemize}
   139 This part is about web-scraping and list-processing in Scala. It uses
   127 \item[(1)] Given a list of prices for a commodity, for example
   140 online data about the per-capita alcohol consumption for each country
   128 
   141 (per year?), and a file with the data about the population size of
   129 \[
   142 each country.  From this data you are supposed to estimate how many
   130 \texttt{List(28.0, 18.0, 20.0, 26.0, 24.0)}
   143 litres of pure alcohol are consumed worldwide.\bigskip
   131 \]
   144 
   132 
   145 \noindent
   133 \noindent
   146 \textbf{Tasks (file alcohol.scala):}
   134 you need to write a function that returns a pair of indices for when
   147 
   135 to buy and when to sell this commodity. In the example above it should
   148 \begin{itemize}
   136 return the pair $\texttt{(1, 3)}$ because at index $1$ the price is lowest and
   149 \item[(1)] Write a function that given an URL requests a
   137 then at index $3$ the price is highest. Note the prices are given as
   150   comma-separated value (CSV) list.  We are interested in the list
   138 lists of \texttt{Double}s.\newline \mbox{} \hfill[1 Mark]
   151   from the following URL
   139 
       
   140 \item[(2)] Write a function that requests a comma-separated value (CSV) list
       
   141   from the Yahoo websevice that provides historical data for stock
       
   142   indices. For example if you query the URL
       
   143 
   152 
   144 \begin{center}
   153 \begin{center}
   145 \url{http://ichart.yahoo.com/table.csv?s=GOOG}
   154   \url{https://raw.githubusercontent.com/fivethirtyeight/data/master/alcohol-consumption/drinks.csv}
   146 \end{center}
   155 \end{center}
   147 
   156 
   148 \noindent where \texttt{GOOG} stands for Google's stock market symbol,
   157 \noindent Your function should take a string (the URL) as input, and
   149 then you will receive a CSV-list of the daily stock prices since
   158 produce a list of strings as output, where each string is one line in
   150 Google was listed. You can also try this with other stock market
   159 the corresponding CSV-list.  This list should contain 194 lines.\medskip
   151 symbols, for instance AAPL, MSFT, IBM, FB, YHOO, AMZN, BIDU and so
   160 
   152 on. 
   161 \noindent
   153 
   162 Write another function that can read the file \texttt{population.csv}
   154 This function should return a List of strings, where each string
   163 from disk (the file is distributed with the coursework). This
   155 is one line in this CVS-list (representing one day's
   164 function should take a string as argument, the file name, and again
   156 data). Note that Yahoo generates its answer such that the newest data
   165 return a list of strings corresponding to each entry in the
   157 is at the front of this list, and the oldest data is at the end.
   166 CSV-list. For \texttt{population.csv}, this list should contain 216
   158 \hfill[1 Mark]
   167 lines.\hfill[1 Mark]
   159 
   168 
   160 \item[(3)] As you can see, the financial data from Yahoo is organised in 7 columns,
   169 
   161 for example
   170 \item[(2)] Unfortunately, the CSV-lists contain a lot of ``junk'' and we
   162 
   171   need to extract the data that interests us.  From the header of the
   163 {\small\begin{verbatim}
   172   alcohol list, you can see there are 5 columns
   164 Date,Open,High,Low,Close,Volume,Adj Close
   173   
   165 2016-11-04,750.659973,770.359985,750.560974,762.02002,2126900,762.02002
   174   \begin{center}
   166 2016-11-03,767.25,769.950012,759.030029,762.130005,1914000,762.130005
   175     \begin{tabular}{l}
   167 2016-11-02,778.200012,781.650024,763.450012,768.700012,1872400,768.700012
   176       \texttt{country (name),}\\
   168 2016-11-01,782.890015,789.48999,775.539978,783.609985,2404500,783.609985
   177       \texttt{beer\_servings,}\\
   169 ....
   178       \texttt{spirit\_servings,}\\
   170 \end{verbatim}}
   179       \texttt{wine\_servings,}\\
   171 
   180       \texttt{total\_litres\_of\_pure\_alcohol}
   172 \noindent
   181     \end{tabular}  
   173 Write a function that ignores the first line (the header) and then
   182   \end{center}
   174 extracts from each line the date (first column) and the Adjusted Close
   183 
   175 price (last column). The Adjusted Close price should be converted into
   184   \noindent
   176 a \texttt{Double}. So the result of this function is a list of pairs where the
   185   Write a function that extracts the data from the first column,
   177 first components are strings (the dates) and the second are doubles
   186   the country name, and the data from the fifth column (converted into
   178 (the adjusted close prices).\newline\mbox{}\hfill\mbox{[1 Mark]}
   187   a \texttt{Double}). For this go through each line of the CSV-list
   179 
   188   (except the first line), use the \texttt{split(",")} function to
   180 \item[(4)] Write a function that takes a stock market symbol as
   189   divide each line into an array of 5 elements. Keep the data from the
   181   argument (you can assume it is a valid one, like GOOG or AAPL). The
   190   first and fifth element in these arrays.\medskip
   182   function calculates the \underline{dates} when you should have
   191 
   183   bought the corresponding shares (lowest price) and when you should
   192   \noindent
   184   have sold them (highest price).\hfill\mbox{[1 Mark]}
   193   Write another function that processes the population size list. This
   185 \end{itemize}
   194   is already of the form country name and population size.\footnote{Your
   186 
   195     friendly lecturer already did the messy processing for you from the
   187 \noindent
   196   Worldbank database, see \url{https://github.com/datasets/population/tree/master/data}.} Again, split the
   188 \textbf{Test Data:}
   197   strings according to the commas. However, this time generate a
   189 In case of Google, the financial data records 3077 entries starting
   198   \texttt{Map} from country names to population sizes.\hfill[1 Mark]
   190 from 2004-08-19 until 2016-11-04 (which is the last entry on the day
   199 
   191 when I prepared the course work...namely on 6 November; remember stock
   200 \item[(3)] In (2) you generated the data about the alcohol consumption
   192 markets are typically closed on weekends and no financial data is
   201   per capita for each country, and also the population size for each
   193 produced then; also I did not count the header line). The lowest
   202   country. From this generate next a sorted(!) list of the overall
   194 shareprice for Google was on 2004-09-03 with \$49.95513 per share and the
   203   alcohol consumption for each country. The list should be sorted from
   195 highest on 2016-10-24 with \$813.109985 per share.\bigskip
   204   highest alcohol consumption to lowest. The difficulty is that the
       
   205   data is scrapped off from ``random'' sources on the Internet and
       
   206   annoyingly the spelling of some country names does not always agree in the
       
   207   lists. For example the alcohol list contains
       
   208   \texttt{Bosnia-Herzegovina}, while the population writes this country as
       
   209   \texttt{Bosnia and Herzegovina}. In your sorted
       
   210   overall list include only countries from the alcohol list, whose
       
   211   exact country name is also in the population size list. This means
       
   212   you can ignore countries like Bosnia-Herzegovina from the overall
       
   213   alcohol consumption. There are 177 countries where the names
       
   214   agree. The UK is ranked 10th on this list with
       
   215   consuming 671,976,864 Litres of pure alcohol each year.\medskip
       
   216   
       
   217   \noindent
       
   218   Finally, write another function that takes an integer, say
       
   219   \texttt{n}, as argument. You can assume this integer is between 0
       
   220   and 177.  The function should use the sorted list from above.  It returns
       
   221   a triple, where the first component is the sum of the alcohol
       
   222   consumption in all countries (on the list); the second component is
       
   223   the sum of the \texttt{n}-highest alcohol consumers on the list; and
       
   224   the third component is the percentage the \texttt{n}-highest alcohol
       
   225   consumers feast on with respect to the the world consumption. You will
       
   226   see that according to our data, 164 countries (out of 177) gobble up 100\%
       
   227   of the world alcohol consumption.\hfill\mbox{[1 Mark]}
       
   228 \end{itemize}
       
   229 
       
   230 \noindent
       
   231 \textbf{Hints:} useful list functions: \texttt{.drop(n)},
       
   232 \texttt{.take(n)} for dropping or taking some elements in a list,
       
   233 \texttt{.getLines} for separating lines in a string;
       
   234 \texttt{.sortBy(\_.\_2)} sorts a list of pairs according to the second
       
   235 elements in the pairs---the sorting is done from smallest to highest;
       
   236 useful \texttt{Map} functions: \texttt{.toMap} converts a list of
       
   237 pairs into a \texttt{Map}, \texttt{.isDefinedAt(k)} tests whether the
       
   238 map is defined at that key, that is would produce a result when
       
   239 called with this key.
       
   240 
       
   241 \newpage
   196 
   242 
   197 \subsection*{Advanced Part 3 (3 Marks)}
   243 \subsection*{Advanced Part 3 (3 Marks)}
   198 
   244 
   199 A purely fictional character named Mr T.~Drumb inherited in 1978
   245 A purely fictional character named Mr T.~Drumb inherited in 1978
   200 approximately 200 Million Dollar from his father. Mr Drumb prides
   246 approximately 200 Million Dollar from his father. Mr Drumb prides
   201 himself to be a brilliant business man because nowadays it is
   247 himself to be a brilliant business man because nowadays it is
   202 estimated he is 3 Billion Dollar worth (one is not sure, of course,
   248 estimated he is 3 Billion Dollar worth (one is not sure, of course,
   203 because Mr Drumb refuses to make his tax records public).
   249 because Mr Drumb refuses to make his tax records public).
   204 
   250 
   205 Since the question about Mr Drumb's business acumen remains, let's do a
   251 Since the question about Mr Drumb's business acumen remains open,
   206 quick back-of-the-envelope calculation in Scala whether his claim has
   252 let's do a quick back-of-the-envelope calculation in Scala whether his
   207 any merit. Let's suppose we are given \$100 in 1978 and we follow a
   253 claim has any merit. Let's suppose we are given \$100 in 1978 and we
   208 really dumb investment strategy, namely:
   254 follow a really dumb investment strategy, namely:
   209 
   255 
   210 \begin{itemize}
   256 \begin{itemize}
   211 \item We blindly choose a portfolio of stocks, say some Blue-Chip stocks
   257 \item We blindly choose a portfolio of stocks, say some Blue-Chip stocks
   212   or some Real Estate stocks.
   258   or some Real Estate stocks.
   213 \item If some of the stocks in our portfolio are traded in January of
   259 \item If some of the stocks in our portfolio are traded in January of
   218 \item Next year in January, we look how our stocks did, liquidate
   264 \item Next year in January, we look how our stocks did, liquidate
   219   everything, and re-invest our (hopefully) increased money in again
   265   everything, and re-invest our (hopefully) increased money in again
   220   the stocks from our portfolio (there might be more stocks available,
   266   the stocks from our portfolio (there might be more stocks available,
   221   if companies from our portfolio got listed in that year, or less if
   267   if companies from our portfolio got listed in that year, or less if
   222   some companies went bust or de-listed).
   268   some companies went bust or de-listed).
   223 \item We do this for 38 years until January 2016 and check what would
   269 \item We do this for 38 years until January 2017 and check what would
   224   have become out of our \$100.
   270   have become out of our \$100.
   225 \end{itemize}\medskip  
   271 \end{itemize}
       
   272 
       
   273 
       
   274 \medskip  
   226 
   275 
   227 \noindent
   276 \noindent
   228 \textbf{Tasks (file drumb.scala):}
   277 \textbf{Tasks (file drumb.scala):}
   229 
   278 
   230 \begin{itemize}
   279 \begin{itemize}