% File: solutions-resit/cw-resit.tex
% Author: Christian Urban <christian.urban@kcl.ac.uk>
% Last changed: Sat, 11 Mar 2023 23:24:15 +0000 (changeset 469, 48de09728447)
\documentclass{article}
\usepackage{../style}
\usepackage{../langs}

\begin{document}

\section*{August Exam (Scala): Chat Log Mining}
This coursework is worth 50\%. It is about mining the log of an online
chat between 85 participants. The log is given as a csv-list in the file
\texttt{log.csv}. It is an unordered list recording which
message has been sent, by whom, when, and in response to which other
message. Each message also has a number and a unique hash code.\bigskip

\noindent
\textbf{Important:} Make sure the file you submit can be processed
by just calling

\begin{center}
  \texttt{scala <<filename.scala>>}
\end{center}
\noindent
Do not use any mutable data structures in your
submission! They are not needed. This means you cannot use, for example,
\texttt{ListBuffer}s or \texttt{Array}s. Do not use
\texttt{return} in your code! It has a different meaning in Scala
than in Java. Do not use \texttt{var}! This declares a mutable
variable.
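\noindent
As an illustration of these rules, the following sketch processes a list of
words without \texttt{var}, \texttt{return} or mutable collections. The
function names \texttt{total\_length}, \texttt{long\_words} and
\texttt{first\_long} are made up for this example and are not part of the
coursework.

```scala
// Hypothetical helper functions, only meant to illustrate the coding rules.

// Instead of a var that is updated in a loop, fold over an immutable List.
def total_length(ws: List[String]): Int =
  ws.foldLeft(0)((acc, w) => acc + w.length)

// Instead of appending to a ListBuffer, build a new List with filter and map.
def long_words(ws: List[String], n: Int): List[String] =
  ws.filter(_.length > n).map(_.toUpperCase)

// Instead of return, the last expression of a function is its result.
def first_long(ws: List[String], n: Int): Option[String] =
  ws.find(_.length > n)
```

\noindent
Each function is a single expression, so no intermediate state needs to be
updated.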
\subsection*{Disclaimer}

It should be understood that the work you submit represents your own
effort! You have not copied from anyone or anywhere else. An exception
is the Scala code I showed during the lectures or uploaded to KEATS,
which you can freely use.\bigskip
\subsection*{Background}

\noindent
The fields in the file \texttt{log.csv} are organised
as follows:

\begin{center}
\texttt{counter, id, time\_date, name, country, parent\_id, msg}
\end{center}
\noindent
Each line in this file contains the data for a single message. The field
\texttt{counter} is an integer number given to each message; \texttt{id} is a
unique hash string for a message; \texttt{time\_date} is the time when the message
was sent; \texttt{name} and \texttt{country} are data about the author
of the message, whereby sometimes the authors left the country information
empty; \texttt{parent\_id} is a hash specifying which other message the
message answers (this can also be empty). The field \texttt{msg} is the actual
message text. \textbf{Be careful} in the tasks below: this text can contain
commas and needs to be treated specially when the line is split up
by using \texttt{line.split(",").toList}. Tasks (2) and (3) are about
processing this data and storing it into the \texttt{Rec}-data-structure, which
is pre-defined in the file \texttt{resit.scala}:
\begin{center}
\begin{verbatim}
  Rec(num: Int,
      msg_id: String,
      date: String,
      msg: String,
      author: String,
      country: Option[String],
      reply_id : Option[String],
      parent: Option[Int] = None,
      children: List[Int] = Nil)
\end{verbatim}
\end{center}
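\noindent
The caveat about commas can be illustrated with a minimal sketch. It assumes
that \texttt{Rec} is a case class with exactly the fields shown above and
that only the message text may contain commas; the helper \texttt{opt} is a
made-up name and not part of \texttt{resit.scala}. Note that \texttt{split}
drops trailing empty strings, so a line ending in empty fields produces a
shorter list, which this sketch simply rejects.

```scala
// Assumption: Rec is a case class with the fields shown in the handout.
case class Rec(num: Int,
               msg_id: String,
               date: String,
               msg: String,
               author: String,
               country: Option[String],
               reply_id: Option[String],
               parent: Option[Int] = None,
               children: List[Int] = Nil)

// Hypothetical helper: an empty csv-field becomes None, anything else Some.
def opt(s: String): Option[String] =
  if (s.isEmpty) None else Some(s)

// Sketch of process_line: the first six fields are taken as they are;
// everything from the seventh field onwards is glued back together
// with commas, because the message text may itself contain commas.
def process_line(line: String): Rec = {
  val fields = line.split(",").toList
  val (front, rest) = fields.splitAt(6)
  front match {
    case counter :: id :: time :: name :: country :: parent_id :: Nil =>
      Rec(counter.toInt, id, time, rest.mkString(","), name,
          opt(country), opt(parent_id))
    case _ =>
      throw new IllegalArgumentException("too few fields in: " + line)
  }
}
```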
\noindent
The transformation into a \texttt{Rec}-data-structure is a two-step process
where first the fields for parents and children are given default
values. This information is then filled in during a second step.

The main information that will be computed in the tasks below is which
countries the authors are from and how many authors are from each
country. The last task will also rank which messages have been the most
popular in terms of how many replies they received (this will be computed
according to the number of children, grand-children and so on of a
message).
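\noindent
The second step of the transformation, filling in parents and children, can
be sketched as follows. The sketch assumes that \texttt{Rec} is a case class
with the fields shown above and that \texttt{get\_parent} takes the same two
arguments as \texttt{get\_children}; the exact signatures are fixed by
\texttt{resit.scala}, so treat the ones below as assumptions.

```scala
// Assumption: Rec is a case class with the fields shown in the handout.
case class Rec(num: Int,
               msg_id: String,
               date: String,
               msg: String,
               author: String,
               country: Option[String],
               reply_id: Option[String],
               parent: Option[Int] = None,
               children: List[Int] = Nil)

// Direct children of e: all records that name e's hash as their reply_id.
def get_children(e: Rec, rs: List[Rec]): List[Int] =
  rs.filter(_.reply_id.contains(e.msg_id)).map(_.num)

// Number of the record that e replies to, if such a record exists in rs.
def get_parent(e: Rec, rs: List[Rec]): Option[Int] =
  e.reply_id.flatMap(id => rs.find(_.msg_id == id).map(_.num))

// Second step: copy every record with parent and children filled in.
def post_process(rs: List[Rec]): List[Rec] =
  rs.map(r => r.copy(parent = get_parent(r, rs),
                     children = get_children(r, rs)))
```

\noindent
Because \texttt{copy} creates a new record, the original list stays
unchanged and no mutable data structure is needed.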
\subsection*{Tasks}

\begin{itemize}
\item[(1)] The function \texttt{get\_csv} takes a file name as its
  argument. It should read the corresponding file and return its
  content. The content should be returned as a list of strings, namely one
  string for each line in the file. Since the file is a csv-file, the
  first line (the header) should be dropped in the result. Lines are
  separated by \verb!"\n"!. For the file \texttt{log.csv} there should
  be a list of 680 separate strings.

  \mbox{}\hfill[5\% Marks]
\item[(2)] The function \texttt{process\_line} takes a single line
  from the csv-file (as generated by \texttt{get\_csv}) and creates a
  Rec(ord) data structure. This data structure is pre-defined in the
  Scala file.

  For processing a line, you should use the function

  \begin{center}
    \verb!<<some_line>>.split(",").toList!
  \end{center}

  \noindent
  in order to separate the fields. However, \textbf{be careful}: the message
  text in the last field of \texttt{log.csv} can contain commas and
  therefore the split will not always result in a list of only 7
  elements. You need to concatenate the 7th element and anything beyond
  it into a single string before assigning the field \texttt{msg}.

  \mbox{}\hfill[10\% Marks]
\item[(3)] Each record in the log contains a unique hash code
  identifying the message. For example

  \begin{center}
  \verb!"5ebeb459ac278d01301f1497"!
  \end{center}

  \noindent
  Some messages also contain a hash code identifying the parent
  message (that is, the question to which they reply). The function
  \texttt{post\_process} fills in the information about potential
  children and a potential parent message.

  The auxiliary function \texttt{get\_children} takes a record
  \texttt{e} and a record list \texttt{rs} as arguments, and returns
  the list of all direct children (children have the hash code of
  \texttt{e} as \texttt{reply\_id}). The list of children is returned
  as a list of \texttt{num}s. The \texttt{num}s can be used later
  as indexes in a Rec-list.

  The auxiliary function \texttt{get\_parent} returns the number of
  the record corresponding to the \texttt{reply\_id} (encoded as
  \texttt{Some} if there exists one, otherwise it returns \texttt{None}).

  In order to update a record, say \texttt{r}, with some additional
  information, you can use the Scala code
  \begin{verbatim}
      r.copy(parent = ....,
             children = ....)
  \end{verbatim}

  \mbox{}\hfill[10\% Marks]
\item[(4)] The functions \texttt{get\_countries} and
  \texttt{get\_countries\_numbers} calculate the countries the
  message authors come from and how many authors come from each
  country (returned as a \texttt{Map} from countries to Integers). In
  case an author did not specify a country, the empty string should
  be returned as the country.

  \mbox{}\hfill[10\% Marks]
\item[(5)] This task identifies the most popular questions in the log,
  whereby popularity is measured in terms of how many follow-up
  questions were asked. We say that a question together with its
  follow-up questions forms a \emph{thread}. It can be assumed that in
  \texttt{log.csv} there are no circular references, that is, no
  question refers to a follow-up question as parent.

  The function \texttt{ordered\_thread\_sizes} orders the
  message threads according to how many answers were given for one
  message (that is, how many children, grand-children and so on one
  message has).

  The auxiliary function \texttt{search} enumerates all children,
  grand-children and so on for a given record \texttt{r} (including
  the record \texttt{r} itself). The function \texttt{search} returns
  these records as a list of \texttt{Rec}s.

  The function \texttt{thread\_size} generates for a record, say
  \texttt{r}, a pair consisting of the number of \texttt{r} and the
  number of all children as produced by \texttt{search}. The numbers are the
  integers given for each message; for \texttt{log.csv} a number
  is between 0 and 679.

  The function \texttt{ordered\_thread\_sizes} orders the list of
  pairs according to which thread in the chat is the longest (the
  longest should be first).

  \mbox{}\hfill[15\% Marks]
\end{itemize}
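\noindent
Tasks (4) and (5) could be approached along the following lines. This is
only a sketch under assumptions: \texttt{Rec} is taken to be a case class
with the fields shown in the Background section, all function signatures are
guesses, the record list is assumed to be sorted so that the record with
number $n$ sits at index $n$, and the \texttt{children} fields are assumed to
be filled in already. Authors are counted once per country by removing
duplicate (author, country) pairs.

```scala
// Assumption: Rec is a case class with the fields shown in the handout.
case class Rec(num: Int, msg_id: String, date: String, msg: String,
               author: String, country: Option[String],
               reply_id: Option[String],
               parent: Option[Int] = None, children: List[Int] = Nil)

// All countries occurring in the log; a missing country becomes "".
def get_countries(rs: List[Rec]): Set[String] =
  rs.map(_.country.getOrElse("")).toSet

// Number of distinct authors per country.
def get_countries_numbers(rs: List[Rec]): Map[String, Int] = {
  val pairs = rs.map(r => (r.author, r.country.getOrElse(""))).distinct
  pairs.groupBy(_._2).map { case (country, authors) => (country, authors.length) }
}

// The record r followed by all its children, grand-children and so on.
// Assumes rs(n).num == n and that there are no circular references.
def search(r: Rec, rs: List[Rec]): List[Rec] =
  r :: r.children.flatMap(c => search(rs(c), rs))

// Pair of the record number and the length of the list produced by search
// (this length includes the record r itself).
def thread_size(r: Rec, rs: List[Rec]): (Int, Int) =
  (r.num, search(r, rs).length)

// All pairs, longest thread first; sortBy is stable, so ties keep log order.
def ordered_thread_sizes(rs: List[Rec]): List[(Int, Int)] =
  rs.map(r => thread_size(r, rs)).sortBy(p => -p._2)
```

\noindent
The recursion in \texttt{search} terminates only because the log contains no
circular references, which is the assumption stated in task (5).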
\end{document}
%%% Local Variables:
%%% mode: latex
%%% TeX-master: t
%%% End: