cws/main_cw02.text-old
changeset 428 cdfa6a293453
parent 426 b51467741af2
equal deleted inserted replaced
427:6e93040e3378 428:cdfa6a293453
       
     1 
       
     2 % !TEX program = xelatex
       
     3 \documentclass{article}
       
     4 \usepackage{../styles/style}
       
     5 \usepackage{disclaimer}
       
     6 \usepackage{../styles/langs}
       
     7 
       
     8 \begin{document}
       
     9 
       
    10 
       
    11 %% should ask to lower case the words.
       
    12 
       
    13 \section*{Main Part 2 (Scala, 6 Marks)}
       
    14 
       
    15 \mbox{}\hfill\textit{``C makes it easy to shoot yourself in the foot; C++ makes it harder,}\\
       
    16 \mbox{}\hfill\textit{ but when you do, it blows your whole leg off.''}\smallskip\\
       
    17 \mbox{}\hfill\textit{ --- Bjarne Stroustrup (creator of the C++ language)}\bigskip\bigskip
       
    18 
       
    19 
       
    20 
       
    21 
       
    22 \noindent
       
    23 You are asked to implement a Scala program for recommending movies
       
    24 according to a ratings list.\bigskip
       
    25 
       
    26 \IMPORTANTNONE{}
       
    27 
       
    28 \noindent
       
    29 Also note that the running time of each part will be restricted to a
       
    30 maximum of 30 seconds on my laptop.
       
    31 
       
    32 \DISCLAIMER{}
       
    33 
       
    34 
       
    35 \subsection*{Reference Implementation}
       
    36 
       
    37 Like the C++ part, the Scala part works like this: you push your files
       
    38 to GitHub and receive (after sometimes a long delay) some automated
       
    39 feedback. In the end we will take a snapshot of the submitted files
       
    40 and apply an automated marking script to them.\medskip
       
    41 
       
    42 \noindent
       
    43 In addition, the Scala part comes with reference
       
    44 implementations in form of \texttt{jar}-files. This allows you to run
       
    45 any test cases on your own computer. For example you can call Scala on
       
    46 the command line with the option \texttt{-cp danube.jar} and then
       
    47 query any function from the template file. Say you want to find out
       
    48 what the function \texttt{} produces: for this you just need
       
    49 to prefix it with the object name \texttt{M2}.  If you want to find out what
       
    50 these functions produce for the list \texttt{List("a", "b", "b")},
       
    51 you would type something like:
       
    52 
       
    53 \begin{lstlisting}[language={},numbers=none,basicstyle=\ttfamily\small]
       
    54 $ scala -cp danube.jar
       
    55 scala> val ratings_url =
       
    56      | """https://nms.kcl.ac.uk/christian.urban/ratings.csv"""
       
    57 
       
    58 scala> M2.get_csv_url(ratings_url)
       
    59 val res0: List[String] = List(1,1,4 ...)
       
    60 \end{lstlisting}%$
       
    61 
       
    62 \subsection*{Hints}
       
    63 
       
    64 \noindent
       
    65 Use \texttt{.split(",").toList} for splitting
       
    66 strings according to commas (similarly for the newline character \mbox{$\backslash$\texttt{n}}),
       
    67 \texttt{.getOrElse(..,..)} allows to query a Map, but also gives a
       
    68 default value if the Map is not defined, a Map can be `updated' by
       
    69 using \texttt{+}, \texttt{.contains} and \texttt{.filter} can test whether
       
    70 an element is included in a list, and respectively filter out elements in a list,
       
    71 \texttt{.sortBy(\_.\_2)} sorts a list of pairs according to the second
       
    72 elements in the pairs---the sorting is done from smallest to highest,
       
    73 \texttt{.take(n)} for taking some elements in a list (takes fewer if the list
       
    74 contains less than \texttt{n} elements).
       
    75 
       
    76 
       
    77 \newpage
       
    78 
       
    79 
       
    80 \subsection*{Main Part 2 (6 Marks, file danube.scala)}
       
    81 
       
    82 You are creating Danube.co.uk which you hope will be the next big thing
       
    83 in online movie renting. You know that you can save money by
       
    84 anticipating what movies people will rent; you will pass these savings
       
    85 on to your users by offering a discount if they rent movies that
       
    86 Danube.co.uk recommends.  
       
    87 
       
    88 Your task is to generate \emph{two} movie recommendations for every
       
    89 movie a user rents. To do this, you calculate what other
       
    90 renters, who also watched this movie, suggest by giving positive ratings.
       
    91 Of course, some suggestions are more popular than others. You need to find
       
    92 the two most-frequently suggested movies. Return fewer recommendations,
       
    93 if there are fewer movies suggested.
       
    94 
       
    95 The calculations will be based on the small datasets which the research lab
       
    96 GroupLens provides for education and development purposes.
       
    97 
       
    98 \begin{center}
       
    99 \url{https://grouplens.org/datasets/movielens/}
       
   100 \end{center}
       
   101 
       
   102 \noindent
       
   103 The slightly adapted CSV-files should be downloaded in your Scala
       
   104 file from the URLs:
       
   105 
       
   106 
       
   107 \begin{center}
       
   108 \begin{tabular}{ll}  
       
   109   \url{https://nms.kcl.ac.uk/christian.urban/ratings.csv} & (940 KByte)\\
       
   110   \url{https://nms.kcl.ac.uk/christian.urban/movies.csv}  & (280 KByte)\\
       
   111 \end{tabular}
       
   112 \end{center}
       
   113 
       
   114 \noindent
       
   115 The ratings.csv file is organised as userID, 
       
   116 movieID, and rating (which is between 0 and 5, with \emph{positive} ratings
       
   117 being 4 and 5). The file movie.csv is organised as
       
   118 movieID and full movie name. Both files still contain the usual
       
   119 CSV-file header (first line). In this part you are asked
       
   120 to implement functions that process these files. If bandwidth
       
   121 is an issue for you, download the files locally, but in the submitted
       
   122 version use \texttt{Source.fromURL} instead of \texttt{Source.fromFile}.
       
   123 
       
   124 \subsection*{Tasks}
       
   125 
       
   126 \begin{itemize}
       
   127 \item[(1)] Implement the function \pcode{get_csv_url} which takes an
       
   128   URL-string as argument and requests the corresponding file. The two
       
   129   URLs of interest are \pcode{ratings_url} and \pcode{movies_url},
       
   130   which correspond to CSV-files mentioned above.  The function should
       
   131   return the CSV-file appropriately broken up into lines, and the
       
   132   first line should be dropped (that is omit the header of the CSV-file).
       
   133   The result is a list of strings (the lines in the file). In case
       
   134   the url does not produce a file, return the empty list.\\
       
   135   \mbox{}\hfill [1 Mark]
       
   136 
       
   137 \item[(2)] Implement two functions that process the (broken up)
       
   138   CSV-files from (1). The \pcode{process_ratings} function filters out all
       
   139   ratings below 4 and returns a list of (userID, movieID) pairs. The
       
   140   \pcode{process_movies} function returns a list of (movieID, title) pairs.
       
   141   Note the input to these functions will be the output of the function
       
   142   \pcode{get_csv_url}.\\
       
   143   \mbox{}\hfill [1 Mark]
       
   144 %\end{itemize}  
       
   145 %  
       
   146 %
       
   147 %\subsection*{Part 3 (4 Marks, file danube.scala)}
       
   148 %
       
   149 %\subsection*{Tasks}
       
   150 %
       
   151 %\begin{itemize}
       
   152 \item[(3)] Implement a kind of grouping function that calculates a Map
       
   153   containing the userIDs and all the corresponding recommendations for
       
   154   this user (list of movieIDs). This should be implemented in a
       
   155   tail-recursive fashion using a Map as accumulator. This Map is set to
       
   156   \pcode{Map()} at the beginning of the calculation. For example
       
   157 
       
   158 \begin{lstlisting}[numbers=none]
       
   159 val lst = List(("1", "a"), ("1", "b"),
       
   160                ("2", "x"), ("3", "a"),
       
   161                ("2", "y"), ("3", "c"))
       
   162 groupById(lst, Map())
       
   163 \end{lstlisting}
       
   164 
       
   165 returns the ratings map
       
   166 
       
   167 \begin{center}
       
   168   \pcode{Map(1 -> List(b, a), 2 -> List(y, x), 3 -> List(c, a))}.
       
   169 \end{center}
       
   170 
       
   171 \noindent
       
   172 In which order the elements of the list are given is unimportant.\\
       
   173 \mbox{}\hfill [1 Mark]
       
   174 
       
   175 \item[(4)] Implement a function that takes a ratings map and a movieID
       
   176   as arguments.  The function calculates all suggestions containing the
       
   177   given movie in its recommendations. It returns a list of all these
       
   178   recommendations (each of them is a list and needs to have the given
       
   179   movie deleted, otherwise it might happen we recommend the same movie
       
   180   ``back''). For example for the Map from above and the movie
       
   181   \pcode{"y"} we obtain \pcode{List(List("x"))}, and for the movie
       
   182   \pcode{"a"} we get \pcode{List(List("b"), List("c"))}.\\
       
   183   \mbox{}\hfill [1 Mark]
       
   184 
       
   185 \item[(5)] Implement a suggestions function which takes a ratings map
       
   186   and a movieID as arguments. It calculates all the recommended movies
       
   187   sorted according to the most frequently suggested movie(s) sorted
       
   188   first. This function returns \emph{all} suggested movieIDs as a list of
       
   189   strings.\\
       
   190   \mbox{}\hfill [1 Mark]
       
   191 
       
   192 \item[(6)]  
       
   193   Implement then a recommendation function which generates a maximum
       
   194   of two most-suggested movies (as calculated above). But it returns
       
   195   the actual movie name, not the movieID. If fewer movies are recommended,
       
   196   then return fewer than two movie names.\\
       
   197   \mbox{}\hfill [1 Mark]
       
   198 
       
   199 %\item[(7)] Calculate the recommendations for all movies according to
       
   200 % what the recommendations function in (6) produces (this
       
   201 % can take a few seconds). Put all recommendations into a list 
       
   202 % (of strings) and count how often the strings occur in
       
   203 % this list. This produces a list of string-int pairs,
       
   204 % where the first component is the movie name and the second
       
   205 % is the number of how many times the movie was recommended. 
       
   206 % Sort all the pairs according to the number
       
   207 % of times they were recommended (most recommended movie name 
       
   208 % first).\\
       
   209 % \mbox{}\hfill [1 Mark]
       
   210   
       
   211 \end{itemize}
       
   212 
       
   213 \end{document} 
       
   214 
       
   215 %%% Local Variables: 
       
   216 %%% mode: latex
       
   217 %%% TeX-master: t
       
   218 %%% End: 
       
   219 
       
   220 
       
   221