cws/main_cw02.tex
changeset 349 682611a0fb89
parent 333 24bc76d97db2
child 350 c5ad0e3f2a6d
equal deleted inserted replaced
348:b5b6ed38c2f2 349:682611a0fb89
       
     1 % !TEX program = xelatex
       
     2 \documentclass{article}
       
     3 \usepackage{../style}
       
     4 \usepackage{disclaimer}
       
     5 \usepackage{../langs}
       
     6 
       
     7 \begin{document}
       
     8 
       
     9 
       
    10 %% should ask to lower case the words.
       
    11 
       
    12 \section*{Part 7 (Scala, 7 Marks)}
       
    13 
       
    14 
       
    15 \noindent
       
    16 You are asked to implement a Scala program for recommending movies
       
    17 according to a ratings list. This part is due on \cwSEVENa{} at 5pm.\bigskip
       
    18 
       
    19 \IMPORTANTNONE{}
       
    20 
       
    21 \noindent
       
    22 Also note that the running time of each part will be restricted to a
       
    23 maximum of 30 seconds on my laptop.
       
    24 
       
    25 \DISCLAIMER{}
       
    26 
       
    27 
       
    28 \subsection*{Reference Implementation}
       
    29 
       
    30 Like the C++ part, the Scala part works like this: you push your files
       
    31 to GitHub and receive (after sometimes a long delay) some automated
       
    32 feedback. In the end we will take a snapshot of the submitted files
       
    33 and apply an automated marking script to them.\medskip
       
    34 
       
    35 \noindent
       
    36 In addition, the Scala part comes with reference
       
    37 implementations in form of \texttt{jar}-files. This allows you to run
       
    38 any test cases on your own computer. For example you can call Scala on
       
    39 the command line with the option \texttt{-cp danube.jar} and then
       
    40 query any function from the template file. Say you want to find out
       
    41 what the function \texttt{} produces: for this you just need
       
    42 to prefix it with the object name \texttt{CW7b}.  If you want to find out what
       
    43 these functions produce for the list \texttt{List("a", "b", "b")},
       
    44 you would type something like:
       
    45 
       
    46 \begin{lstlisting}[language={},numbers=none,basicstyle=\ttfamily\small]
       
    47 $ scala -cp danube.jar
       
    48 scala> val ratings_url =
       
    49      | """https://nms.kcl.ac.uk/christian.urban/ratings.csv"""
       
    50 
       
    51 scala> CW7b.get_csv_url(ratings_url)
       
    52 val res0: List[String] = List(1,1,4 ...)
       
    53 \end{lstlisting}%$
       
    54 
       
    55 \subsection*{Hints}
       
    56 
       
    57 \noindent
       
    58 Use \texttt{.split(",").toList} for splitting
       
    59 strings according to commas (similarly for the newline character \mbox{$\backslash$\texttt{n}}),
       
    60 \texttt{.getOrElse(..,..)} allows to query a Map, but also gives a
       
    61 default value if the Map is not defined, a Map can be `updated' by
       
    62 using \texttt{+}, \texttt{.contains} and \texttt{.filter} can test whether
       
    63 an element is included in a list, and respectively filter out elements in a list,
       
    64 \texttt{.sortBy(\_.\_2)} sorts a list of pairs according to the second
       
    65 elements in the pairs---the sorting is done from smallest to highest,
       
    66 \texttt{.take(n)} for taking some elements in a list (takes fewer if the list
       
    67 contains less than \texttt{n} elements).
       
    68 
       
    69 
       
    70 \newpage
       
    71 
       
    72 
       
    73 \subsection*{Part (7 Marks, file danube.scala)}
       
    74 
       
    75 You are creating Danube.co.uk which you hope will be the next big thing
       
    76 in online movie renting. You know that you can save money by
       
    77 anticipating what movies people will rent; you will pass these savings
       
    78 on to your users by offering a discount if they rent movies that
       
    79 Danube.co.uk recommends.  
       
    80 
       
    81 Your task is to generate \emph{two} movie recommendations for every
       
    82 movie a user rents. To do this, you calculate what other
       
    83 renters, who also watched this movie, suggest by giving positive ratings.
       
    84 Of course, some suggestions are more popular than others. You need to find
       
    85 the two most-frequently suggested movies. Return fewer recommendations,
       
    86 if there are fewer movies suggested.
       
    87 
       
    88 The calculations will be based on the small datasets which the research lab
       
    89 GroupLens provides for education and development purposes.
       
    90 
       
    91 \begin{center}
       
    92 \url{https://grouplens.org/datasets/movielens/}
       
    93 \end{center}
       
    94 
       
    95 \noindent
       
    96 The slightly adapted CSV-files should be downloaded in your Scala
       
    97 file from the URLs:
       
    98 
       
    99 
       
   100 \begin{center}
       
   101 \begin{tabular}{ll}  
       
   102   \url{https://nms.kcl.ac.uk/christian.urban/ratings.csv} & (940 KByte)\\
       
   103   \url{https://nms.kcl.ac.uk/christian.urban/movies.csv}  & (280 KByte)\\
       
   104 \end{tabular}
       
   105 \end{center}
       
   106 
       
   107 \noindent
       
   108 The ratings.csv file is organised as userID, 
       
   109 movieID, and rating (which is between 0 and 5, with \emph{positive} ratings
       
   110 being 4 and 5). The file movie.csv is organised as
       
   111 movieID and full movie name. Both files still contain the usual
       
   112 CSV-file header (first line). In this part you are asked
       
   113 to implement functions that process these files. If bandwidth
       
   114 is an issue for you, download the files locally, but in the submitted
       
   115 version use \texttt{Source.fromURL} instead of \texttt{Source.fromFile}.
       
   116 
       
   117 \subsection*{Tasks}
       
   118 
       
   119 \begin{itemize}
       
   120 \item[(1)] Implement the function \pcode{get_csv_url} which takes an
       
   121   URL-string as argument and requests the corresponding file. The two
       
   122   URLs of interest are \pcode{ratings_url} and \pcode{movies_url},
       
   123   which correspond to CSV-files mentioned above.  The function should
       
   124   return the CSV-file appropriately broken up into lines, and the
       
   125   first line should be dropped (that is omit the header of the CSV-file).
       
   126   The result is a list of strings (the lines in the file). In case
       
   127   the url does not produce a file, return the empty list.\\
       
   128   \mbox{}\hfill [1 Mark]
       
   129 
       
   130 \item[(2)] Implement two functions that process the (broken up)
       
   131   CSV-files from (1). The \pcode{process_ratings} function filters out all
       
   132   ratings below 4 and returns a list of (userID, movieID) pairs. The
       
   133   \pcode{process_movies} function returns a list of (movieID, title) pairs.
       
   134   Note the input to these functions will be the output of the function
       
   135   \pcode{get_csv_url}.\\
       
   136   \mbox{}\hfill [1 Mark]
       
   137 %\end{itemize}  
       
   138 %  
       
   139 %
       
   140 %\subsection*{Part 3 (4 Marks, file danube.scala)}
       
   141 %
       
   142 %\subsection*{Tasks}
       
   143 %
       
   144 %\begin{itemize}
       
   145 \item[(3)] Implement a kind of grouping function that calculates a Map
       
   146   containing the userIDs and all the corresponding recommendations for
       
   147   this user (list of movieIDs). This should be implemented in a
       
   148   tail-recursive fashion using a Map as accumulator. This Map is set to
       
   149   \pcode{Map()} at the beginning of the calculation. For example
       
   150 
       
   151 \begin{lstlisting}[numbers=none]
       
   152 val lst = List(("1", "a"), ("1", "b"),
       
   153                ("2", "x"), ("3", "a"),
       
   154                ("2", "y"), ("3", "c"))
       
   155 groupById(lst, Map())
       
   156 \end{lstlisting}
       
   157 
       
   158 returns the ratings map
       
   159 
       
   160 \begin{center}
       
   161   \pcode{Map(1 -> List(b, a), 2 -> List(y, x), 3 -> List(c, a))}.
       
   162 \end{center}
       
   163 
       
   164 \noindent
       
   165 In which order the elements of the list are given is unimportant.\\
       
   166 \mbox{}\hfill [1 Mark]
       
   167 
       
   168 \item[(4)] Implement a function that takes a ratings map and a movieID
       
   169   as arguments.  The function calculates all suggestions containing the
       
   170   given movie in its recommendations. It returns a list of all these
       
   171   recommendations (each of them is a list and needs to have the given
       
   172   movie deleted, otherwise it might happen we recommend the same movie
       
   173   ``back''). For example for the Map from above and the movie
       
   174   \pcode{"y"} we obtain \pcode{List(List("x"))}, and for the movie
       
   175   \pcode{"a"} we get \pcode{List(List("b"), List("c"))}.\\
       
   176   \mbox{}\hfill [1 Mark]
       
   177 
       
   178 \item[(5)] Implement a suggestions function which takes a ratings map
       
   179   and a movieID as arguments. It calculates all the recommended movies
       
   180   sorted according to the most frequently suggested movie(s) sorted
       
   181   first. This function returns \emph{all} suggested movieIDs as a list of
       
   182   strings.\\
       
   183   \mbox{}\hfill [1 Mark]
       
   184 
       
   185 \item[(6)]  
       
   186   Implement then a recommendation function which generates a maximum
       
   187   of two most-suggested movies (as calculated above). But it returns
       
   188   the actual movie name, not the movieID. If fewer movies are recommended,
       
   189   then return fewer than two movie names.\\
       
   190   \mbox{}\hfill [1 Mark]
       
   191 
       
   192 \item[(7)] Calculate the recommendations for all movies according to
       
   193  what the recommendations function in (6) produces (this
       
   194  can take a few seconds). Put all recommendations into a list 
       
   195  (of strings) and count how often the strings occur in
       
   196  this list. This produces a list of string-int pairs,
       
   197  where the first component is the movie name and the second
       
   198  is the number of how many times they were recommended. 
       
   199  Sort all the pairs according to the number
       
   200  of times they were recommended (most recommended movie name 
       
   201  first).\mbox{}\hfill [1 Mark]
       
   202   
       
   203 \end{itemize}
       
   204 
       
   205 \end{document} 
       
   206 
       
   207 %%% Local Variables: 
       
   208 %%% mode: latex
       
   209 %%% TeX-master: t
       
   210 %%% End: 
       
   211 
       
   212 
       
   213