% !TEX program = xelatex
\documentclass{article}
\usepackage{../styles/style}
\usepackage{disclaimer}
\usepackage{../styles/langs}

\begin{document}

%% should ask to lower case the words.

\section*{Main Part 2 (Scala, 6 Marks)}

\mbox{}\hfill\textit{``C makes it easy to shoot yourself in the foot; C++ makes it harder,}\\
\mbox{}\hfill\textit{ but when you do, it blows your whole leg off.''}\smallskip\\
\mbox{}\hfill\textit{ --- Bjarne Stroustrup (creator of the C++ language)}\bigskip\bigskip

\noindent
You are asked to implement a Scala program for recommending movies
according to a ratings list.\bigskip

\IMPORTANTNONE{}

\noindent
Also note that the running time of each part will be restricted to a
maximum of 30 seconds on my laptop.

\DISCLAIMER{}

\subsection*{Reference Implementation}

Like the C++ part, the Scala part works like this: you push your files
to GitHub and receive (sometimes after a long delay) some automated
feedback. In the end we will take a snapshot of the submitted files
and apply an automated marking script to them.\medskip

\noindent
In addition, the Scala part comes with reference
implementations in the form of \texttt{jar}-files. This allows you to run
any test cases on your own computer. For example, you can call Scala on
the command line with the option \texttt{-cp danube.jar} and then
query any function from the template file. Say you want to find out
what the function \texttt{} produces: for this you just need
to prefix it with the object name \texttt{M2}.
If you want to find out what this function produces for the ratings URL,
you would type something like:

\begin{lstlisting}[language={},numbers=none,basicstyle=\ttfamily\small]
$ scala -cp danube.jar
scala> val ratings_url =
     | """https://nms.kcl.ac.uk/christian.urban/ratings.csv"""

scala> M2.get_csv_url(ratings_url)
val res0: List[String] = List(1,1,4 ...)
\end{lstlisting}
%$

\subsection*{Hints}

\begin{itemize}
\item Use \texttt{.split(",").toList} for splitting strings according to
  commas (similarly for the newline character \mbox{$\backslash$\texttt{n}}).
\item \texttt{.getOrElse(..,..)} allows you to query a Map, but also gives a
  default value if the key is not defined in the Map; a Map can be `updated'
  by using \texttt{+}.
\item \texttt{.contains} tests whether an element is included in a list, and
  \texttt{.filter} filters out elements of a list.
\item \texttt{.sortBy(\_.\_2)} sorts a list of pairs according to the second
  elements in the pairs (the sorting is done from smallest to largest).
\item \texttt{.take(n)} takes the first \texttt{n} elements of a list (it takes
  fewer if the list contains fewer than \texttt{n} elements).
\end{itemize}

\newpage

\subsection*{Main Part 2 (6 Marks, file danube.scala)}

You are creating Danube.co.uk, which you hope will be the next big thing
in online movie renting. You know that you can save money by
anticipating what movies people will rent; you will pass these savings
on to your users by offering a discount if they rent movies that
Danube.co.uk recommends.

Your task is to generate \emph{two} movie recommendations for every
movie a user rents. To do this, you calculate what other
renters, who also watched this movie, suggest by giving positive ratings.
Of course, some suggestions are more popular than others. You need to find
the two most-frequently suggested movies.
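
\medskip\noindent
To illustrate how the most-frequently suggested movies can be picked using
the operations from the Hints, here is a minimal sketch, not the reference
implementation. The object \texttt{FrequencySketch}, its functions
\texttt{countOccurrences} and \texttt{mostFrequent}, and the input data are
hypothetical and serve only as an illustration; in the actual task the
suggestions would come from the ratings map.

\begin{lstlisting}[numbers=none]
// Hypothetical sketch (not the reference implementation): count how
// often each movieID occurs in a list of suggestions and keep the
// n most frequent ones. The data in main is made up.
object FrequencySketch {

  // Build a Map from movieID to count; getOrElse supplies 0 for IDs
  // not seen yet, and + returns an "updated" copy of the Map.
  def countOccurrences(ids: List[String]): Map[String, Int] =
    ids.foldLeft(Map[String, Int]()) { (acc, id) =>
      acc + (id -> (acc.getOrElse(id, 0) + 1))
    }

  // sortBy sorts from smallest to largest, so sort by the negated
  // count to put the most frequent IDs first; take(n) returns fewer
  // than n IDs if fewer distinct IDs exist.
  def mostFrequent(ids: List[String], n: Int): List[String] =
    countOccurrences(ids).toList.sortBy(p => -p._2).map(_._1).take(n)

  def main(args: Array[String]): Unit = {
    // Hypothetical suggestions, given as one comma-separated string.
    val suggestions = "b,c,b,d,c,b".split(",").toList
    println(mostFrequent(suggestions, 2))   // List(b, c)
    println(mostFrequent(List("x"), 2))     // List(x)
  }
}
\end{lstlisting}

\noindent
Negating the counts lets the ascending \texttt{.sortBy} place the most
frequent movie first, and \texttt{.take(2)} needs no special case when
fewer than two movies were suggested.
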
Return fewer recommendations if fewer movies are suggested.

The calculations will be based on the small datasets which the research lab
GroupLens provides for education and development purposes.

\begin{center}
\url{https://grouplens.org/datasets/movielens/}
\end{center}

\noindent
The slightly adapted CSV-files should be downloaded by your Scala
file from the URLs:

\begin{center}
\begin{tabular}{ll}
\url{https://nms.kcl.ac.uk/christian.urban/ratings.csv} & (940 KByte)\\
\url{https://nms.kcl.ac.uk/christian.urban/movies.csv}  & (280 KByte)\\
\end{tabular}
\end{center}

\noindent
The ratings.csv file is organised as userID,
movieID, and rating (which is between 0 and 5, with \emph{positive} ratings
being 4 and 5). The file movies.csv is organised as
movieID and full movie name. Both files still contain the usual
CSV-file header (first line). In this part you are asked
to implement functions that process these files. If bandwidth
is an issue for you, download the files locally, but in the submitted
version use \texttt{Source.fromURL} instead of \texttt{Source.fromFile}.

\subsection*{Tasks}

\begin{itemize}
\item[(1)] Implement the function \pcode{get_csv_url} which takes a
  URL string as argument and requests the corresponding file. The two URLs
  of interest are \pcode{ratings_url} and \pcode{movies_url}, which
  correspond to the CSV-files mentioned above. The function should
  return the CSV-file appropriately broken up into lines, and the
  first line should be dropped (that is, omit the header of the CSV-file).
  The result is a list of strings (the lines in the file). In case
  the URL does not produce a file, return the empty list.\\
  \mbox{}\hfill [1 Mark]

\item[(2)] Implement two functions that process the (broken up)
  CSV-files from (1). The \pcode{process_ratings} function filters out all
  ratings below 4 and returns a list of (userID, movieID) pairs. The
  \pcode{process_movies} function returns a list of (movieID, title) pairs.
  Note that the input to these functions will be the output of the function
  \pcode{get_csv_url}.\\
  \mbox{}\hfill [1 Mark]

%\end{itemize}
%
%%\subsection*{Part 3 (4 Marks, file danube.scala)}
%%\subsection*{Tasks}
%%\begin{itemize}

\item[(3)] Implement a kind of grouping function that calculates a Map
  containing the userIDs and all the corresponding recommendations for
  this user (list of movieIDs). This should be implemented in a
  tail-recursive fashion using a Map as accumulator. This Map is set to
  \pcode{Map()} at the beginning of the calculation. For example

\begin{lstlisting}[numbers=none]
val lst = List(("1", "a"), ("1", "b"),
               ("2", "x"), ("3", "a"),
               ("2", "y"), ("3", "c"))
groupById(lst, Map())
\end{lstlisting}

returns the ratings map

\begin{center}
  \pcode{Map(1 -> List(b, a), 2 -> List(y, x), 3 -> List(c, a))}.
\end{center}

\noindent
The order of the elements in each list is unimportant.\\
\mbox{}\hfill [1 Mark]

\item[(4)] Implement a function that takes a ratings map and a movieID
  as arguments. The function collects the recommendations of every user
  whose recommendations contain the given movie. It returns a list of
  all these recommendations (each of them is a list and needs to have
  the given movie deleted; otherwise we might recommend the same movie
  ``back''). For example, for the Map from above and the movie
  \pcode{"y"} we obtain \pcode{List(List("x"))}, and for the movie
  \pcode{"a"} we get \pcode{List(List("b"), List("c"))}.\\
  \mbox{}\hfill [1 Mark]

\item[(5)] Implement a suggestions function which takes a ratings map
  and a movieID as arguments. It calculates all the recommended movies,
  sorted so that the most frequently suggested movie(s) come first. This
  function returns \emph{all} suggested movieIDs as a list of strings.\\
  \mbox{}\hfill [1 Mark]

\item[(6)] Then implement a recommendation function which generates at most
  the two most-suggested movies (as calculated above). It returns the
  actual movie names, not the movieIDs.
  If fewer movies are recommended, then return fewer than two movie names.\\
  \mbox{}\hfill [1 Mark]

%\item[(7)] Calculate the recommendations for all movies according to
% what the recommendations function in (6) produces (this
% can take a few seconds). Put all recommendations into a list
% (of strings) and count how often the strings occur in
% this list. This produces a list of string-int pairs,
% where the first component is the movie name and the second
% is the number of how many times the movie was recommended.
% Sort all the pairs according to the number
% of times they were recommended (most recommended movie name
% first).\\
% \mbox{}\hfill [1 Mark]

\end{itemize}

\end{document}

%%% Local Variables:
%%% mode: latex
%%% TeX-master: t
%%% End: