9 This coursework is about Scala and is worth 10\%. The first and second |
9 This coursework is about Scala and is worth 10\%. The first and second |
10 part are due on 16 November at 11pm, and the third part on 21 December |
10 part are due on 16 November at 11pm, and the third part on 21 December |
11 at 11pm. You are asked to implement three programs about list |
11 at 11pm. You are asked to implement three programs about list |
12 processing and recursion. The third part is more advanced and might |
12 processing and recursion. The third part is more advanced and might |
13 include material you have not yet seen in the first lecture. |
13 include material you have not yet seen in the first lecture. |
14 Make sure the files you submit can be processed by just calling |
14 \bigskip |
15 \texttt{scala <<filename.scala>>}.\bigskip |
15 |
16 |
16 \noindent |
17 \noindent |
17 \textbf{Important:} |
18 \textbf{Important:} Do not use any mutable data structures in your |
18 |
|
19 \begin{itemize} |
|
20 \item Make sure the files you submit can be processed by just calling\\ |
|
21 \mbox{\texttt{scala <<filename.scala>>}} on the commandline. |
|
22 |
|
23 \item Do not use any mutable data structures in your |
19 submissions! They are not needed. This means you cannot use |
24 submissions! They are not needed. This means you cannot use |
20 \texttt{ListBuffer}s, for example. Do not use \texttt{return} in your |
25 \texttt{ListBuffer}s, for example. |
21 code! It has a different meaning in Scala, than in Java. |
26 |
22 Do not use \texttt{var}! This declares a mutable variable. ??? Make sure the |
27 \item Do not use \texttt{return} in your code! It has a different |
23 functions you submit are defined on the ``top-level'' of Scala, not |
28 meaning in Scala, than in Java. |
24 inside a class or object. Also note that the running time of |
29 |
25 each part will be restricted to a maximum of 360 seconds on my laptop. |
30 \item Do not use \texttt{var}! This declares a mutable variable. Only |
|
31 use \texttt{val}! |
|
32 |
|
33 \item Do not use any parallel collections! No \texttt{.par} therefore! |
|
34 Our testing and marking infrastructure is not set up for it. |
|
35 \end{itemize} |
|
36 |
|
37 \noindent |
|
38 Also note that the running time of each part will be restricted to a |
|
39 maximum of 360 seconds on my laptop. |
26 |
40 |
27 |
41 |
28 \subsection*{Disclaimer} |
42 \subsection*{Disclaimer} |
29 |
43 |
30 It should be understood that the work you submit represents |
44 It should be understood that the work you submit represents |
31 your own effort. You have not copied from anyone else. An |
45 your \textbf{own} effort. You have not copied from anyone else. An |
32 exception is the Scala code I showed during the lectures or |
46 exception is the Scala code I showed during the lectures or |
33 uploaded to KEATS, which you can freely use.\bigskip |
47 uploaded to KEATS, which you can freely use.\bigskip |
34 |
48 |
35 |
49 |
36 \subsection*{Part 1 (3 Marks)} |
50 \subsection*{Part 1 (3 Marks)} |
106 \item 1 to 10 where $9$ takes 20 steps |
119 \item 1 to 10 where $9$ takes 20 steps |
107 \item 1 to 100 where $97$ takes 119 steps, |
120 \item 1 to 100 where $97$ takes 119 steps, |
108 \item 1 to 1,000 where $871$ takes 179 steps, |
121 \item 1 to 1,000 where $871$ takes 179 steps, |
109 \item 1 to 10,000 where $6,171$ takes 262 steps, |
122 \item 1 to 10,000 where $6,171$ takes 262 steps, |
110 \item 1 to 100,000 where $77,031$ takes 351 steps, |
123 \item 1 to 100,000 where $77,031$ takes 351 steps, |
111 \item 1 to 1 million where $837,799$ takes 525 steps |
124 \item 1 to 1 Million where $837,799$ takes 525 steps |
112 %%\item[$\bullet$] $1 - 10$ million where $8,400,511$ takes 686 steps |
125 %%\item[$\bullet$] $1 - 10$ million where $8,400,511$ takes 686 steps |
113 \end{itemize}\bigskip |
126 \end{itemize} |
114 |
127 |
115 |
128 \noindent |
116 |
129 \textbf{Hints:} useful math operators: \texttt{\%} for modulo; useful |
117 \subsection*{Part 2 (4 Marks)} |
130 functions: \mbox{\texttt{(1\,to\,10)}} for ranges, \texttt{.toInt}, |
118 |
131 \texttt{.toList} for conversions, \texttt{List(...).max} for the |
119 This part is about list processing---it's a variant of |
132 maximum of a list, \texttt{List(...).indexOf(...)} for the first index of |
120 ``buy-low-sell-high'' in Scala. It uses the online financial data |
133 a value in a list. |
121 service from Yahoo.\bigskip |
134 |
122 |
135 |
123 \noindent |
136 |
124 \textbf{Tasks (file trade.scala):} |
137 \subsection*{Part 2 (3 Marks)} |
125 |
138 |
126 \begin{itemize} |
139 This part is about web-scraping and list-processing in Scala. It uses |
127 \item[(1)] Given a list of prices for a commodity, for example |
140 online data about the per-capita alcohol consumption for each country |
128 |
141 (per year?), and a file with the data about the population size of |
129 \[ |
142 each country. From this data you are supposed to estimate how many |
130 \texttt{List(28.0, 18.0, 20.0, 26.0, 24.0)} |
143 litres of pure alcohol are consumed worldwide.\bigskip |
131 \] |
144 |
132 |
145 \noindent |
133 \noindent |
146 \textbf{Tasks (file alcohol.scala):} |
134 you need to write a function that returns a pair of indices for when |
147 |
135 to buy and when to sell this commodity. In the example above it should |
148 \begin{itemize} |
136 return the pair $\texttt{(1, 3)}$ because at index $1$ the price is lowest and |
149 \item[(1)] Write a function that given an URL requests a |
137 then at index $3$ the price is highest. Note the prices are given as |
150 comma-separated value (CSV) list. We are interested in the list |
138 lists of \texttt{Double}s.\newline \mbox{} \hfill[1 Mark] |
151 from the following URL |
139 |
|
140 \item[(2)] Write a function that requests a comma-separated value (CSV) list |
|
141 from the Yahoo websevice that provides historical data for stock |
|
142 indices. For example if you query the URL |
|
143 |
152 |
144 \begin{center} |
153 \begin{center} |
145 \url{http://ichart.yahoo.com/table.csv?s=GOOG} |
154 \url{https://raw.githubusercontent.com/fivethirtyeight/data/master/alcohol-consumption/drinks.csv} |
146 \end{center} |
155 \end{center} |
147 |
156 |
148 \noindent where \texttt{GOOG} stands for Google's stock market symbol, |
157 \noindent Your function should take a string (the URL) as input, and |
149 then you will receive a CSV-list of the daily stock prices since |
158 produce a list of strings as output, where each string is one line in |
150 Google was listed. You can also try this with other stock market |
159 the corresponding CSV-list. This list should contain 194 lines.\medskip |
151 symbols, for instance AAPL, MSFT, IBM, FB, YHOO, AMZN, BIDU and so |
160 |
152 on. |
161 \noindent |
153 |
162 Write another function that can read the file \texttt{population.csv} |
154 This function should return a List of strings, where each string |
163 from disk (the file is distributed with the coursework). This |
155 is one line in this CVS-list (representing one day's |
164 function should take a string as argument, the file name, and again |
156 data). Note that Yahoo generates its answer such that the newest data |
165 return a list of strings corresponding to each entry in the |
157 is at the front of this list, and the oldest data is at the end. |
166 CSV-list. For \texttt{population.csv}, this list should contain 216 |
158 \hfill[1 Mark] |
167 lines.\hfill[1 Mark] |
159 |
168 |
160 \item[(3)] As you can see, the financial data from Yahoo is organised in 7 columns, |
169 |
161 for example |
170 \item[(2)] Unfortunately, the CSV-lists contain a lot of ``junk'' and we |
162 |
171 need to extract the data that interests us. From the header of the |
163 {\small\begin{verbatim} |
172 alcohol list, you can see there are 5 columns |
164 Date,Open,High,Low,Close,Volume,Adj Close |
173 |
165 2016-11-04,750.659973,770.359985,750.560974,762.02002,2126900,762.02002 |
174 \begin{center} |
166 2016-11-03,767.25,769.950012,759.030029,762.130005,1914000,762.130005 |
175 \begin{tabular}{l} |
167 2016-11-02,778.200012,781.650024,763.450012,768.700012,1872400,768.700012 |
176 \texttt{country (name),}\\ |
168 2016-11-01,782.890015,789.48999,775.539978,783.609985,2404500,783.609985 |
177 \texttt{beer\_servings,}\\ |
169 .... |
178 \texttt{spirit\_servings,}\\ |
170 \end{verbatim}} |
179 \texttt{wine\_servings,}\\ |
171 |
180 \texttt{total\_litres\_of\_pure\_alcohol} |
172 \noindent |
181 \end{tabular} |
173 Write a function that ignores the first line (the header) and then |
182 \end{center} |
174 extracts from each line the date (first column) and the Adjusted Close |
183 |
175 price (last column). The Adjusted Close price should be converted into |
184 \noindent |
176 a \texttt{Double}. So the result of this function is a list of pairs where the |
185 Write a function that extracts the data from the first column, |
177 first components are strings (the dates) and the second are doubles |
186 the country name, and the data from the fifth column (converted into |
178 (the adjusted close prices).\newline\mbox{}\hfill\mbox{[1 Mark]} |
187 a \texttt{Double}). For this go through each line of the CSV-list |
179 |
188 (except the first line), use the \texttt{split(",")} function to |
180 \item[(4)] Write a function that takes a stock market symbol as |
189 divide each line into an array of 5 elements. Keep the data from the |
181 argument (you can assume it is a valid one, like GOOG or AAPL). The |
190 first and fifth element in these arrays.\medskip |
182 function calculates the \underline{dates} when you should have |
191 |
183 bought the corresponding shares (lowest price) and when you should |
192 \noindent |
184 have sold them (highest price).\hfill\mbox{[1 Mark]} |
193 Write another function that processes the population size list. This |
185 \end{itemize} |
194 is already of the form country name and population size.\footnote{Your |
186 |
195 friendly lecturer already did the messy processing for you from the |
187 \noindent |
196 Worldbank database, see \url{https://github.com/datasets/population/tree/master/data}.} Again, split the |
188 \textbf{Test Data:} |
197 strings according to the commas. However, this time generate a |
189 In case of Google, the financial data records 3077 entries starting |
198 \texttt{Map} from country names to population sizes.\hfill[1 Mark] |
190 from 2004-08-19 until 2016-11-04 (which is the last entry on the day |
199 |
191 when I prepared the course work...namely on 6 November; remember stock |
200 \item[(3)] In (2) you generated the data about the alcohol consumption |
192 markets are typically closed on weekends and no financial data is |
201 per capita for each country, and also the population size for each |
193 produced then; also I did not count the header line). The lowest |
202 country. From this generate next a sorted(!) list of the overall |
194 shareprice for Google was on 2004-09-03 with \$49.95513 per share and the |
203 alcohol consumption for each country. The list should be sorted from |
195 highest on 2016-10-24 with \$813.109985 per share.\bigskip |
204 highest alcohol consumption to lowest. The difficulty is that the |
|
205 data is scrapped off from ``random'' sources on the Internet and |
|
206 annoyingly the spelling of some country names does not always agree in the |
|
207 lists. For example the alcohol list contains |
|
208 \texttt{Bosnia-Herzegovina}, while the population writes this country as |
|
209 \texttt{Bosnia and Herzegovina}. In your sorted |
|
210 overall list include only countries from the alcohol list, whose |
|
211 exact country name is also in the population size list. This means |
|
212 you can ignore countries like Bosnia-Herzegovina from the overall |
|
213 alcohol consumption. There are 177 countries where the names |
|
214 agree. The UK is ranked 10th on this list with |
|
215 consuming 671,976,864 Litres of pure alcohol each year.\medskip |
|
216 |
|
217 \noindent |
|
218 Finally, write another function that takes an integer, say |
|
219 \texttt{n}, as argument. You can assume this integer is between 0 |
|
220 and 177. The function should use the sorted list from above. It returns |
|
221 a triple, where the first component is the sum of the alcohol |
|
222 consumption in all countries (on the list); the second component is |
|
223 the sum of the \texttt{n}-highest alcohol consumers on the list; and |
|
224 the third component is the percentage the \texttt{n}-highest alcohol |
|
225 consumers feast on with respect to the the world consumption. You will |
|
226 see that according to our data, 164 countries (out of 177) gobble up 100\% |
|
227 of the world alcohol consumption.\hfill\mbox{[1 Mark]} |
|
228 \end{itemize} |
|
229 |
|
230 \noindent |
|
231 \textbf{Hints:} useful list functions: \texttt{.drop(n)}, |
|
232 \texttt{.take(n)} for dropping or taking some elements in a list, |
|
233 \texttt{.getLines} for separating lines in a string; |
|
234 \texttt{.sortBy(\_.\_2)} sorts a list of pairs according to the second |
|
235 elements in the pairs---the sorting is done from smallest to highest; |
|
236 useful \texttt{Map} functions: \texttt{.toMap} converts a list of |
|
237 pairs into a \texttt{Map}, \texttt{.isDefinedAt(k)} tests whether the |
|
238 map is defined at that key, that is would produce a result when |
|
239 called with this key. |
|
240 |
|
241 \newpage |
196 |
242 |
197 \subsection*{Advanced Part 3 (3 Marks)} |
243 \subsection*{Advanced Part 3 (3 Marks)} |
198 |
244 |
199 A purely fictional character named Mr T.~Drumb inherited in 1978 |
245 A purely fictional character named Mr T.~Drumb inherited in 1978 |
200 approximately 200 Million Dollar from his father. Mr Drumb prides |
246 approximately 200 Million Dollar from his father. Mr Drumb prides |
201 himself to be a brilliant business man because nowadays it is |
247 himself to be a brilliant business man because nowadays it is |
202 estimated he is 3 Billion Dollar worth (one is not sure, of course, |
248 estimated he is 3 Billion Dollar worth (one is not sure, of course, |
203 because Mr Drumb refuses to make his tax records public). |
249 because Mr Drumb refuses to make his tax records public). |
204 |
250 |
205 Since the question about Mr Drumb's business acumen remains, let's do a |
251 Since the question about Mr Drumb's business acumen remains open, |
206 quick back-of-the-envelope calculation in Scala whether his claim has |
252 let's do a quick back-of-the-envelope calculation in Scala whether his |
207 any merit. Let's suppose we are given \$100 in 1978 and we follow a |
253 claim has any merit. Let's suppose we are given \$100 in 1978 and we |
208 really dumb investment strategy, namely: |
254 follow a really dumb investment strategy, namely: |
209 |
255 |
210 \begin{itemize} |
256 \begin{itemize} |
211 \item We blindly choose a portfolio of stocks, say some Blue-Chip stocks |
257 \item We blindly choose a portfolio of stocks, say some Blue-Chip stocks |
212 or some Real Estate stocks. |
258 or some Real Estate stocks. |
213 \item If some of the stocks in our portfolio are traded in January of |
259 \item If some of the stocks in our portfolio are traded in January of |