336
Christian Urban <christian.urban@kcl.ac.uk> 
parents:  
diff
changeset
 
 | 
     1  | 
\documentclass{article}
 | 
Christian Urban <christian.urban@kcl.ac.uk> 
parents:  
diff
changeset
 
 | 
     2  | 
\usepackage{../style}
 | 
Christian Urban <christian.urban@kcl.ac.uk> 
parents:  
diff
changeset
 
 | 
     3  | 
\usepackage{../langs}
 | 
Christian Urban <christian.urban@kcl.ac.uk> 
parents:  
diff
changeset
 
 | 
     4  | 
  | 
Christian Urban <christian.urban@kcl.ac.uk> 
parents:  
diff
changeset
 
 | 
     5  | 
\begin{document}
 | 
Christian Urban <christian.urban@kcl.ac.uk> 
parents:  
diff
changeset
 
 | 
     6  | 
  | 
Christian Urban <christian.urban@kcl.ac.uk> 
parents:  
diff
changeset
 
 | 
     7  | 
\section*{August Exam (Scala):  Chat Log Mining}
 | 
Christian Urban <christian.urban@kcl.ac.uk> 
parents:  
diff
changeset
 
 | 
     8  | 
  | 
Christian Urban <christian.urban@kcl.ac.uk> 
parents:  
diff
changeset
 
 | 
     9  | 
This coursework is worth 50\%. It is about mining a log of an online
  | 
Christian Urban <christian.urban@kcl.ac.uk> 
parents:  
diff
changeset
 
 | 
    10  | 
chat between 85 participants. The log is given as a csv-list in the file
  | 
Christian Urban <christian.urban@kcl.ac.uk> 
parents:  
diff
changeset
 
 | 
    11  | 
\texttt{log.csv}. The log is an unordered list containing information which
 | 
Christian Urban <christian.urban@kcl.ac.uk> 
parents:  
diff
changeset
 
 | 
    12  | 
message has been sent, by whom, when and in response to which other
  | 
Christian Urban <christian.urban@kcl.ac.uk> 
parents:  
diff
changeset
 
 | 
    13  | 
message. Each message has also a number and a unique hash code.\bigskip
  | 
Christian Urban <christian.urban@kcl.ac.uk> 
parents:  
diff
changeset
 
 | 
    14  | 
  | 
Christian Urban <christian.urban@kcl.ac.uk> 
parents:  
diff
changeset
 
 | 
    15  | 
\noindent 
  | 
Christian Urban <christian.urban@kcl.ac.uk> 
parents:  
diff
changeset
 
 | 
    16  | 
\textbf{Important:} Make sure the file you submit can be processed 
 | 
Christian Urban <christian.urban@kcl.ac.uk> 
parents:  
diff
changeset
 
 | 
    17  | 
by just calling
  | 
Christian Urban <christian.urban@kcl.ac.uk> 
parents:  
diff
changeset
 
 | 
    18  | 
  | 
Christian Urban <christian.urban@kcl.ac.uk> 
parents:  
diff
changeset
 
 | 
    19  | 
\begin{center}
 | 
Christian Urban <christian.urban@kcl.ac.uk> 
parents:  
diff
changeset
 
 | 
    20  | 
  \texttt{scala <<filename.scala>>}
 | 
Christian Urban <christian.urban@kcl.ac.uk> 
parents:  
diff
changeset
 
 | 
    21  | 
\end{center}
 | 
Christian Urban <christian.urban@kcl.ac.uk> 
parents:  
diff
changeset
 
 | 
    22  | 
  | 
Christian Urban <christian.urban@kcl.ac.uk> 
parents:  
diff
changeset
 
 | 
    23  | 
\noindent
  | 
Christian Urban <christian.urban@kcl.ac.uk> 
parents:  
diff
changeset
 
 | 
    24  | 
Do not use any mutable data structures in your
  | 
Christian Urban <christian.urban@kcl.ac.uk> 
parents:  
diff
changeset
 
 | 
    25  | 
submission! They are not needed. This means you cannot use
  | 
Christian Urban <christian.urban@kcl.ac.uk> 
parents:  
diff
changeset
 
 | 
    26  | 
\texttt{ListBuffer}s, \texttt{Array}s, for example. Do not use
 | 
Christian Urban <christian.urban@kcl.ac.uk> 
parents:  
diff
changeset
 
 | 
    27  | 
\texttt{return} in your code! It has a different meaning in Scala,
 | 
Christian Urban <christian.urban@kcl.ac.uk> 
parents:  
diff
changeset
 
 | 
    28  | 
than in Java.  Do not use \texttt{var}! This declares a mutable
 | 
Christian Urban <christian.urban@kcl.ac.uk> 
parents:  
diff
changeset
 
 | 
    29  | 
variable.  
  | 
Christian Urban <christian.urban@kcl.ac.uk> 
parents:  
diff
changeset
 
 | 
    30  | 
  | 
Christian Urban <christian.urban@kcl.ac.uk> 
parents:  
diff
changeset
 
 | 
    31  | 
\subsection*{Disclaimer}
 | 
Christian Urban <christian.urban@kcl.ac.uk> 
parents:  
diff
changeset
 
 | 
    32  | 
  | 
Christian Urban <christian.urban@kcl.ac.uk> 
parents:  
diff
changeset
 
 | 
    33  | 
It should be understood that the work you submit represents your own
  | 
Christian Urban <christian.urban@kcl.ac.uk> 
parents:  
diff
changeset
 
 | 
    34  | 
effort! You have not copied from anyone or anywhere else. An exception
  | 
Christian Urban <christian.urban@kcl.ac.uk> 
parents:  
diff
changeset
 
 | 
    35  | 
is the Scala code I showed during the lectures or uploaded to KEATS,
  | 
Christian Urban <christian.urban@kcl.ac.uk> 
parents:  
diff
changeset
 
 | 
    36  | 
which you can freely use.\bigskip
  | 
Christian Urban <christian.urban@kcl.ac.uk> 
parents:  
diff
changeset
 
 | 
    37  | 
  | 
Christian Urban <christian.urban@kcl.ac.uk> 
parents:  
diff
changeset
 
 | 
    38  | 
  | 
Christian Urban <christian.urban@kcl.ac.uk> 
parents:  
diff
changeset
 
 | 
    39  | 
\subsection*{Background}
 | 
Christian Urban <christian.urban@kcl.ac.uk> 
parents:  
diff
changeset
 
 | 
    40  | 
  | 
Christian Urban <christian.urban@kcl.ac.uk> 
parents:  
diff
changeset
 
 | 
    41  | 
\noindent
  | 
Christian Urban <christian.urban@kcl.ac.uk> 
parents:  
diff
changeset
 
 | 
    42  | 
The fields in the file \texttt{log.csv} are organised 
 | 
Christian Urban <christian.urban@kcl.ac.uk> 
parents:  
diff
changeset
 
 | 
    43  | 
as follows:
  | 
Christian Urban <christian.urban@kcl.ac.uk> 
parents:  
diff
changeset
 
 | 
    44  | 
  | 
Christian Urban <christian.urban@kcl.ac.uk> 
parents:  
diff
changeset
 
 | 
    45  | 
\begin{center}
 | 
Christian Urban <christian.urban@kcl.ac.uk> 
parents:  
diff
changeset
 
 | 
    46  | 
\texttt{counter, id, time\_date, name, country, parent\_id, msg}
 | 
Christian Urban <christian.urban@kcl.ac.uk> 
parents:  
diff
changeset
 
 | 
    47  | 
\end{center}  
 | 
Christian Urban <christian.urban@kcl.ac.uk> 
parents:  
diff
changeset
 
 | 
    48  | 
  | 
Christian Urban <christian.urban@kcl.ac.uk> 
parents:  
diff
changeset
 
 | 
    49  | 
\noindent
  | 
Christian Urban <christian.urban@kcl.ac.uk> 
parents:  
diff
changeset
 
 | 
    50  | 
Each line in this file contains the data for a single message.  The field
  | 
Christian Urban <christian.urban@kcl.ac.uk> 
parents:  
diff
changeset
 
 | 
    51  | 
\texttt{counter} is an integer number given to each message; \texttt{id} is a
 | 
Christian Urban <christian.urban@kcl.ac.uk> 
parents:  
diff
changeset
 
 | 
    52  | 
unique hash string for a message; \texttt{time\_date} is the time when the message
 | 
Christian Urban <christian.urban@kcl.ac.uk> 
parents:  
diff
changeset
 
 | 
    53  | 
was sent; \texttt{name} and \texttt{country} is data about the author
 | 
Christian Urban <christian.urban@kcl.ac.uk> 
parents:  
diff
changeset
 
 | 
    54  | 
of the message, whereby sometimes the authors left the country information
  | 
Christian Urban <christian.urban@kcl.ac.uk> 
parents:  
diff
changeset
 
 | 
    55  | 
empty; \texttt{parent\_id} is a hash specifying which other message the
 | 
Christian Urban <christian.urban@kcl.ac.uk> 
parents:  
diff
changeset
 
 | 
    56  | 
message answers (this can also be empty). \texttt{Msg} is the actual
 | 
Christian Urban <christian.urban@kcl.ac.uk> 
parents:  
diff
changeset
 
 | 
    57  | 
message text. \textbf{Be careful} for the tasks below that this text can contain
 | 
Christian Urban <christian.urban@kcl.ac.uk> 
parents:  
diff
changeset
 
 | 
    58  | 
commas and needs to be treated special when the line is split up
  | 
Christian Urban <christian.urban@kcl.ac.uk> 
parents:  
diff
changeset
 
 | 
    59  | 
by using \texttt{line.split(",").toList}. Tasks (2) and (3) are about
 | 
Christian Urban <christian.urban@kcl.ac.uk> 
parents:  
diff
changeset
 
 | 
    60  | 
processing this data and storing it into the \texttt{Rec}-data-structure, which
 | 
Christian Urban <christian.urban@kcl.ac.uk> 
parents:  
diff
changeset
 
 | 
    61  | 
is pre-defined in the file \texttt{resit.scala}:
 | 
Christian Urban <christian.urban@kcl.ac.uk> 
parents:  
diff
changeset
 
 | 
    62  | 
  | 
Christian Urban <christian.urban@kcl.ac.uk> 
parents:  
diff
changeset
 
 | 
    63  | 
  | 
Christian Urban <christian.urban@kcl.ac.uk> 
parents:  
diff
changeset
 
 | 
    64  | 
\begin{center}
 | 
Christian Urban <christian.urban@kcl.ac.uk> 
parents:  
diff
changeset
 
 | 
    65  | 
\begin{verbatim}  
 | 
Christian Urban <christian.urban@kcl.ac.uk> 
parents:  
diff
changeset
 
 | 
    66  | 
  Rec(num: Int, 
  | 
Christian Urban <christian.urban@kcl.ac.uk> 
parents:  
diff
changeset
 
 | 
    67  | 
      msg_id: String,
  | 
Christian Urban <christian.urban@kcl.ac.uk> 
parents:  
diff
changeset
 
 | 
    68  | 
      date: String,
  | 
Christian Urban <christian.urban@kcl.ac.uk> 
parents:  
diff
changeset
 
 | 
    69  | 
      msg: String,
  | 
Christian Urban <christian.urban@kcl.ac.uk> 
parents:  
diff
changeset
 
 | 
    70  | 
      author: String,
  | 
Christian Urban <christian.urban@kcl.ac.uk> 
parents:  
diff
changeset
 
 | 
    71  | 
      country: Option[String],
  | 
Christian Urban <christian.urban@kcl.ac.uk> 
parents:  
diff
changeset
 
 | 
    72  | 
      reply_id : Option[String],
  | 
Christian Urban <christian.urban@kcl.ac.uk> 
parents:  
diff
changeset
 
 | 
    73  | 
      parent: Option[Int] = None,
  | 
Christian Urban <christian.urban@kcl.ac.uk> 
parents:  
diff
changeset
 
 | 
    74  | 
      children: List[Int] = Nil)  
  | 
Christian Urban <christian.urban@kcl.ac.uk> 
parents:  
diff
changeset
 
 | 
    75  | 
\end{verbatim}
 | 
Christian Urban <christian.urban@kcl.ac.uk> 
parents:  
diff
changeset
 
 | 
    76  | 
\end{center}
 | 
Christian Urban <christian.urban@kcl.ac.uk> 
parents:  
diff
changeset
 
 | 
    77  | 
  | 
Christian Urban <christian.urban@kcl.ac.uk> 
parents:  
diff
changeset
 
 | 
    78  | 
\noindent
  | 
Christian Urban <christian.urban@kcl.ac.uk> 
parents:  
diff
changeset
 
 | 
    79  | 
The transformation into a Rec-data-structure is a two-step process
  | 
Christian Urban <christian.urban@kcl.ac.uk> 
parents:  
diff
changeset
 
 | 
    80  | 
where first the fields for parents and children are given default
  | 
Christian Urban <christian.urban@kcl.ac.uk> 
parents:  
diff
changeset
 
 | 
    81  | 
values. This information is then filled in in a second step.
  | 
Christian Urban <christian.urban@kcl.ac.uk> 
parents:  
diff
changeset
 
 | 
    82  | 
  | 
Christian Urban <christian.urban@kcl.ac.uk> 
parents:  
diff
changeset
 
 | 
    83  | 
The main information that will be computed in the tasks below is from
  | 
Christian Urban <christian.urban@kcl.ac.uk> 
parents:  
diff
changeset
 
 | 
    84  | 
which country authors are and how many authors are from each
  | 
Christian Urban <christian.urban@kcl.ac.uk> 
parents:  
diff
changeset
 
 | 
    85  | 
country. The last task will also rank which messages have been the most
  | 
Christian Urban <christian.urban@kcl.ac.uk> 
parents:  
diff
changeset
 
 | 
    86  | 
popular in terms of how many replies they received (this will computed
  | 
Christian Urban <christian.urban@kcl.ac.uk> 
parents:  
diff
changeset
 
 | 
    87  | 
according to be the number children, grand-children and so on of a
  | 
Christian Urban <christian.urban@kcl.ac.uk> 
parents:  
diff
changeset
 
 | 
    88  | 
message).
  | 
Christian Urban <christian.urban@kcl.ac.uk> 
parents:  
diff
changeset
 
 | 
    89  | 
  | 
Christian Urban <christian.urban@kcl.ac.uk> 
parents:  
diff
changeset
 
 | 
    90  | 
\subsection*{Tasks}
 | 
Christian Urban <christian.urban@kcl.ac.uk> 
parents:  
diff
changeset
 
 | 
    91  | 
  | 
Christian Urban <christian.urban@kcl.ac.uk> 
parents:  
diff
changeset
 
 | 
    92  | 
  | 
Christian Urban <christian.urban@kcl.ac.uk> 
parents:  
diff
changeset
 
 | 
    93  | 
\begin{itemize}
 | 
Christian Urban <christian.urban@kcl.ac.uk> 
parents:  
diff
changeset
 
 | 
    94  | 
\item[(1)] The function \texttt{get\_csv} takes a file name as
 | 
Christian Urban <christian.urban@kcl.ac.uk> 
parents:  
diff
changeset
 
 | 
    95  | 
  argument. It should read the corresponding file and return its
  | 
Christian Urban <christian.urban@kcl.ac.uk> 
parents:  
diff
changeset
 
 | 
    96  | 
  content. The content should be returned as a list of strings, namely a
  | 
Christian Urban <christian.urban@kcl.ac.uk> 
parents:  
diff
changeset
 
 | 
    97  | 
  string for each line in the file. Since the file is a csv-file, the
  | 
Christian Urban <christian.urban@kcl.ac.uk> 
parents:  
diff
changeset
 
 | 
    98  | 
  first line (the header) should be dropped in the result. Lines are
  | 
Christian Urban <christian.urban@kcl.ac.uk> 
parents:  
diff
changeset
 
 | 
    99  | 
  separated by \verb!"\n"!. For the file \texttt{log.csv} there should
 | 
Christian Urban <christian.urban@kcl.ac.uk> 
parents:  
diff
changeset
 
 | 
   100  | 
  be a list of 680 separate strings.
  | 
Christian Urban <christian.urban@kcl.ac.uk> 
parents:  
diff
changeset
 
 | 
   101  | 
  | 
Christian Urban <christian.urban@kcl.ac.uk> 
parents:  
diff
changeset
 
 | 
   102  | 
  \mbox{}\hfill[5\% Marks]
 | 
Christian Urban <christian.urban@kcl.ac.uk> 
parents:  
diff
changeset
 
 | 
   103  | 
  | 
Christian Urban <christian.urban@kcl.ac.uk> 
parents:  
diff
changeset
 
 | 
   104  | 
 
  | 
Christian Urban <christian.urban@kcl.ac.uk> 
parents:  
diff
changeset
 
 | 
   105  | 
\item[(2)] The function \texttt{process\_line} takes a single line
 | 
Christian Urban <christian.urban@kcl.ac.uk> 
parents:  
diff
changeset
 
 | 
   106  | 
  from the csv-file (as generated by \texttt{get\_csv}) and creates a
 | 
Christian Urban <christian.urban@kcl.ac.uk> 
parents:  
diff
changeset
 
 | 
   107  | 
  Rec(ord) data structure. This data structure is pre-defined in the
  | 
Christian Urban <christian.urban@kcl.ac.uk> 
parents:  
diff
changeset
 
 | 
   108  | 
  Scala file.
  | 
Christian Urban <christian.urban@kcl.ac.uk> 
parents:  
diff
changeset
 
 | 
   109  | 
  | 
Christian Urban <christian.urban@kcl.ac.uk> 
parents:  
diff
changeset
 
 | 
   110  | 
  For processing a line, you should use the function
  | 
Christian Urban <christian.urban@kcl.ac.uk> 
parents:  
diff
changeset
 
 | 
   111  | 
  | 
Christian Urban <christian.urban@kcl.ac.uk> 
parents:  
diff
changeset
 
 | 
   112  | 
  \begin{center}
 | 
Christian Urban <christian.urban@kcl.ac.uk> 
parents:  
diff
changeset
 
 | 
   113  | 
    \verb!<<some_line>>.split(",").toList!
 | 
Christian Urban <christian.urban@kcl.ac.uk> 
parents:  
diff
changeset
 
 | 
   114  | 
  \end{center}
 | 
Christian Urban <christian.urban@kcl.ac.uk> 
parents:  
diff
changeset
 
 | 
   115  | 
  | 
Christian Urban <christian.urban@kcl.ac.uk> 
parents:  
diff
changeset
 
 | 
   116  | 
  \noindent
  | 
Christian Urban <christian.urban@kcl.ac.uk> 
parents:  
diff
changeset
 
 | 
   117  | 
  in order to separate the fields. HOWEVER BE CAREFUL that the message
  | 
Christian Urban <christian.urban@kcl.ac.uk> 
parents:  
diff
changeset
 
 | 
   118  | 
  text in the last field of \texttt{log.cvs} can contain commas and
 | 
Christian Urban <christian.urban@kcl.ac.uk> 
parents:  
diff
changeset
 
 | 
   119  | 
  therefore the split will not always result in a list of only 7
  | 
Christian Urban <christian.urban@kcl.ac.uk> 
parents:  
diff
changeset
 
 | 
   120  | 
  elements. You need to concatenate anything beyond the 7th field into
  | 
Christian Urban <christian.urban@kcl.ac.uk> 
parents:  
diff
changeset
 
 | 
   121  | 
  a single string before assigning the field \texttt{msg}.
 | 
Christian Urban <christian.urban@kcl.ac.uk> 
parents:  
diff
changeset
 
 | 
   122  | 
  | 
Christian Urban <christian.urban@kcl.ac.uk> 
parents:  
diff
changeset
 
 | 
   123  | 
  \mbox{}\hfill[10\% Marks]
 | 
Christian Urban <christian.urban@kcl.ac.uk> 
parents:  
diff
changeset
 
 | 
   124  | 
  | 
Christian Urban <christian.urban@kcl.ac.uk> 
parents:  
diff
changeset
 
 | 
   125  | 
\item[(3)] Each record in the log contains a unique hash code
  | 
Christian Urban <christian.urban@kcl.ac.uk> 
parents:  
diff
changeset
 
 | 
   126  | 
  identifying each message. For example
  | 
Christian Urban <christian.urban@kcl.ac.uk> 
parents:  
diff
changeset
 
 | 
   127  | 
  | 
Christian Urban <christian.urban@kcl.ac.uk> 
parents:  
diff
changeset
 
 | 
   128  | 
  \begin{center}
 | 
Christian Urban <christian.urban@kcl.ac.uk> 
parents:  
diff
changeset
 
 | 
   129  | 
  \verb!"5ebeb459ac278d01301f1497"!
  | 
Christian Urban <christian.urban@kcl.ac.uk> 
parents:  
diff
changeset
 
 | 
   130  | 
  \end{center}  
 | 
Christian Urban <christian.urban@kcl.ac.uk> 
parents:  
diff
changeset
 
 | 
   131  | 
  | 
Christian Urban <christian.urban@kcl.ac.uk> 
parents:  
diff
changeset
 
 | 
   132  | 
  \noindent
  | 
Christian Urban <christian.urban@kcl.ac.uk> 
parents:  
diff
changeset
 
 | 
   133  | 
  Some messages also contain a hash code identifying the parent
  | 
Christian Urban <christian.urban@kcl.ac.uk> 
parents:  
diff
changeset
 
 | 
   134  | 
  message (that is to which question they reply).  The function
  | 
Christian Urban <christian.urban@kcl.ac.uk> 
parents:  
diff
changeset
 
 | 
   135  | 
  \texttt{post\_process} fills in the information about potential
 | 
Christian Urban <christian.urban@kcl.ac.uk> 
parents:  
diff
changeset
 
 | 
   136  | 
  children and a potential parent message.
  | 
Christian Urban <christian.urban@kcl.ac.uk> 
parents:  
diff
changeset
 
 | 
   137  | 
  
  | 
Christian Urban <christian.urban@kcl.ac.uk> 
parents:  
diff
changeset
 
 | 
   138  | 
  The auxiliary function \texttt{get\_children} takes a record
 | 
Christian Urban <christian.urban@kcl.ac.uk> 
parents:  
diff
changeset
 
 | 
   139  | 
  \texttt{e} and a record list \texttt{rs} as arguments, and returns
 | 
Christian Urban <christian.urban@kcl.ac.uk> 
parents:  
diff
changeset
 
 | 
   140  | 
  the list of all direct children (children have the hash code of
  | 
Christian Urban <christian.urban@kcl.ac.uk> 
parents:  
diff
changeset
 
 | 
   141  | 
  \texttt{e} as \texttt{reply\_id}). The list of children is returned
 | 
Christian Urban <christian.urban@kcl.ac.uk> 
parents:  
diff
changeset
 
 | 
   142  | 
  as a list of \texttt{num}s. The \texttt{num}s can be used later
 | 
Christian Urban <christian.urban@kcl.ac.uk> 
parents:  
diff
changeset
 
 | 
   143  | 
  as indexes in a Rec-list.
  | 
Christian Urban <christian.urban@kcl.ac.uk> 
parents:  
diff
changeset
 
 | 
   144  | 
      
  | 
Christian Urban <christian.urban@kcl.ac.uk> 
parents:  
diff
changeset
 
 | 
   145  | 
  The auxiliary function \texttt{get\_parent} returns the number of
 | 
Christian Urban <christian.urban@kcl.ac.uk> 
parents:  
diff
changeset
 
 | 
   146  | 
  the record corresponding to the \texttt{reply\_id} (encoded as
 | 
Christian Urban <christian.urban@kcl.ac.uk> 
parents:  
diff
changeset
 
 | 
   147  | 
  \texttt{Some} if there exists one, otherwise it returns \texttt{None}).
 | 
Christian Urban <christian.urban@kcl.ac.uk> 
parents:  
diff
changeset
 
 | 
   148  | 
  | 
Christian Urban <christian.urban@kcl.ac.uk> 
parents:  
diff
changeset
 
 | 
   149  | 
  In order to update a record, say \texttt{r}, with some additional
 | 
Christian Urban <christian.urban@kcl.ac.uk> 
parents:  
diff
changeset
 
 | 
   150  | 
  information, you can use the Scala code
  | 
Christian Urban <christian.urban@kcl.ac.uk> 
parents:  
diff
changeset
 
 | 
   151  | 
  \begin{verbatim}
 | 
Christian Urban <christian.urban@kcl.ac.uk> 
parents:  
diff
changeset
 
 | 
   152  | 
      r.copy(parent = ....,
  | 
Christian Urban <christian.urban@kcl.ac.uk> 
parents:  
diff
changeset
 
 | 
   153  | 
             children = ....)
  | 
Christian Urban <christian.urban@kcl.ac.uk> 
parents:  
diff
changeset
 
 | 
   154  | 
  \end{verbatim}
 | 
Christian Urban <christian.urban@kcl.ac.uk> 
parents:  
diff
changeset
 
 | 
   155  | 
  | 
Christian Urban <christian.urban@kcl.ac.uk> 
parents:  
diff
changeset
 
 | 
   156  | 
  \mbox{}\hfill[10\% Marks]
 | 
Christian Urban <christian.urban@kcl.ac.uk> 
parents:  
diff
changeset
 
 | 
   157  | 
  
  | 
Christian Urban <christian.urban@kcl.ac.uk> 
parents:  
diff
changeset
 
 | 
   158  | 
\item[(4)] The functions \texttt{get\_countries} and
 | 
Christian Urban <christian.urban@kcl.ac.uk> 
parents:  
diff
changeset
 
 | 
   159  | 
  \texttt{get\_countries\_numbers} calculate the countries where
 | 
Christian Urban <christian.urban@kcl.ac.uk> 
parents:  
diff
changeset
 
 | 
   160  | 
  message authors are coming from and how many authors come from each
  | 
Christian Urban <christian.urban@kcl.ac.uk> 
parents:  
diff
changeset
 
 | 
   161  | 
  country (returned as a \texttt{Map} from countries to Integers). In
 | 
Christian Urban <christian.urban@kcl.ac.uk> 
parents:  
diff
changeset
 
 | 
   162  | 
  case an author did not specify a country, the empty string should
  | 
Christian Urban <christian.urban@kcl.ac.uk> 
parents:  
diff
changeset
 
 | 
   163  | 
  be returned.
  | 
Christian Urban <christian.urban@kcl.ac.uk> 
parents:  
diff
changeset
 
 | 
   164  | 
  | 
Christian Urban <christian.urban@kcl.ac.uk> 
parents:  
diff
changeset
 
 | 
   165  | 
  \mbox{}\hfill[10\% Mark]
 | 
Christian Urban <christian.urban@kcl.ac.uk> 
parents:  
diff
changeset
 
 | 
   166  | 
  | 
Christian Urban <christian.urban@kcl.ac.uk> 
parents:  
diff
changeset
 
 | 
   167  | 
  | 
Christian Urban <christian.urban@kcl.ac.uk> 
parents:  
diff
changeset
 
 | 
   168  | 
\item[(5)] This task identifies the most popular questions in the log,
  | 
Christian Urban <christian.urban@kcl.ac.uk> 
parents:  
diff
changeset
 
 | 
   169  | 
  whereby popularity is measured in terms of how many follow-up
  | 
Christian Urban <christian.urban@kcl.ac.uk> 
parents:  
diff
changeset
 
 | 
   170  | 
  questions were asked. We call such questions as belonging to a
  | 
Christian Urban <christian.urban@kcl.ac.uk> 
parents:  
diff
changeset
 
 | 
   171  | 
  \emph{thread}. It can be assumed that in \texttt{log.csv} there are
 | 
Christian Urban <christian.urban@kcl.ac.uk> 
parents:  
diff
changeset
 
 | 
   172  | 
  no circular references, that is no question refers to a
  | 
Christian Urban <christian.urban@kcl.ac.uk> 
parents:  
diff
changeset
 
 | 
   173  | 
  follow-up question as parent.
  | 
Christian Urban <christian.urban@kcl.ac.uk> 
parents:  
diff
changeset
 
 | 
   174  | 
  | 
Christian Urban <christian.urban@kcl.ac.uk> 
parents:  
diff
changeset
 
 | 
   175  | 
  The function \texttt{ordered\_thread\_sizes} orders the
 | 
Christian Urban <christian.urban@kcl.ac.uk> 
parents:  
diff
changeset
 
 | 
   176  | 
  message threads according to how many answers were given for one
  | 
Christian Urban <christian.urban@kcl.ac.uk> 
parents:  
diff
changeset
 
 | 
   177  | 
  message (that is how many children, grand-children and so on one
  | 
Christian Urban <christian.urban@kcl.ac.uk> 
parents:  
diff
changeset
 
 | 
   178  | 
  message has).
  | 
Christian Urban <christian.urban@kcl.ac.uk> 
parents:  
diff
changeset
 
 | 
   179  | 
  | 
Christian Urban <christian.urban@kcl.ac.uk> 
parents:  
diff
changeset
 
 | 
   180  | 
  The auxiliary function \texttt{search} enumerates all children,
 | 
Christian Urban <christian.urban@kcl.ac.uk> 
parents:  
diff
changeset
 
 | 
   181  | 
  grand-children and so on for a given record \texttt{r} (including
 | 
Christian Urban <christian.urban@kcl.ac.uk> 
parents:  
diff
changeset
 
 | 
   182  | 
  the record \texttt{r} itself). \texttt{Search} returns these children
 | 
Christian Urban <christian.urban@kcl.ac.uk> 
parents:  
diff
changeset
 
 | 
   183  | 
  as a list of \texttt{Rec}s.
 | 
Christian Urban <christian.urban@kcl.ac.uk> 
parents:  
diff
changeset
 
 | 
   184  | 
  | 
Christian Urban <christian.urban@kcl.ac.uk> 
parents:  
diff
changeset
 
 | 
   185  | 
  The function \texttt{thread\_size} generates for a record, say
 | 
Christian Urban <christian.urban@kcl.ac.uk> 
parents:  
diff
changeset
 
 | 
   186  | 
  \texttt{r}, a pair consisting of the number of \texttt{r} and the
 | 
Christian Urban <christian.urban@kcl.ac.uk> 
parents:  
diff
changeset
 
 | 
   187  | 
  number of all children as produced by search. The numbers are the
  | 
Christian Urban <christian.urban@kcl.ac.uk> 
parents:  
diff
changeset
 
 | 
   188  | 
  integers given for each message---for \texttt{log.cvs} a number
 | 
Christian Urban <christian.urban@kcl.ac.uk> 
parents:  
diff
changeset
 
 | 
   189  | 
  is between 0 and 679.
  | 
Christian Urban <christian.urban@kcl.ac.uk> 
parents:  
diff
changeset
 
 | 
   190  | 
  | 
Christian Urban <christian.urban@kcl.ac.uk> 
parents:  
diff
changeset
 
 | 
   191  | 
  The function \texttt{ordered\_thread\_sizes} orders the list of
 | 
Christian Urban <christian.urban@kcl.ac.uk> 
parents:  
diff
changeset
 
 | 
   192  | 
  pairs according to which thread in the chat is the longest (the
  | 
Christian Urban <christian.urban@kcl.ac.uk> 
parents:  
diff
changeset
 
 | 
   193  | 
  longest should be first).
  | 
Christian Urban <christian.urban@kcl.ac.uk> 
parents:  
diff
changeset
 
 | 
   194  | 
  | 
Christian Urban <christian.urban@kcl.ac.uk> 
parents:  
diff
changeset
 
 | 
   195  | 
\mbox{}\hfill[15\% Mark]
 | 
Christian Urban <christian.urban@kcl.ac.uk> 
parents:  
diff
changeset
 
 | 
   196  | 
\end{itemize}
 | 
Christian Urban <christian.urban@kcl.ac.uk> 
parents:  
diff
changeset
 
 | 
   197  | 
  | 
Christian Urban <christian.urban@kcl.ac.uk> 
parents:  
diff
changeset
 
 | 
   198  | 
\end{document}
 | 
Christian Urban <christian.urban@kcl.ac.uk> 
parents:  
diff
changeset
 
 | 
   199  | 
  | 
Christian Urban <christian.urban@kcl.ac.uk> 
parents:  
diff
changeset
 
 | 
   200  | 
  
  | 
Christian Urban <christian.urban@kcl.ac.uk> 
parents:  
diff
changeset
 
 | 
   201  | 
  | 
Christian Urban <christian.urban@kcl.ac.uk> 
parents:  
diff
changeset
 
 | 
   202  | 
\end{document}
 | 
Christian Urban <christian.urban@kcl.ac.uk> 
parents:  
diff
changeset
 
 | 
   203  | 
  | 
Christian Urban <christian.urban@kcl.ac.uk> 
parents:  
diff
changeset
 
 | 
   204  | 
  | 
Christian Urban <christian.urban@kcl.ac.uk> 
parents:  
diff
changeset
 
 | 
   205  | 
%%% Local Variables: 
  | 
Christian Urban <christian.urban@kcl.ac.uk> 
parents:  
diff
changeset
 
 | 
   206  | 
%%% mode: latex
  | 
Christian Urban <christian.urban@kcl.ac.uk> 
parents:  
diff
changeset
 
 | 
   207  | 
%%% TeX-master: t
  | 
Christian Urban <christian.urban@kcl.ac.uk> 
parents:  
diff
changeset
 
 | 
   208  | 
%%% End: 
  |