\documentclass{article}
\usepackage{../style}
\usepackage{../langs}

\begin{document}

\section*{August Exam (Scala):  Chat Log Mining}

This coursework is worth 50\%. It is about mining a log of an online
chat between 85 participants. The log is given as a csv-list in the file
\texttt{log.csv}. The log is an unordered list containing information about which
message was sent, by whom, when, and in response to which other
message. Each message also has a number and a unique hash code.\bigskip

\noindent
\textbf{Important:} Make sure the file you submit can be processed
by just calling

\begin{center}
  \texttt{scala <<filename.scala>>}
\end{center}

\noindent
Do not use any mutable data structures in your
submission! They are not needed. This means you cannot use
\texttt{ListBuffer}s or \texttt{Array}s, for example. Do not use
\texttt{return} in your code! It has a different meaning in Scala
than in Java.  Do not use \texttt{var}! This declares a mutable
variable.

\subsection*{Disclaimer}

It should be understood that the work you submit represents your own
effort! You have not copied from anyone or anywhere else. An exception
is the Scala code I showed during the lectures or uploaded to KEATS,
which you can freely use.\bigskip


\subsection*{Background}

\noindent
The fields in the file \texttt{log.csv} are organised
as follows:

\begin{center}
\texttt{counter, id, time\_date, name, country, parent\_id, msg}
\end{center}

\noindent
Each line in this file contains the data for a single message.  The field
\texttt{counter} is an integer number given to each message; \texttt{id} is a
unique hash string for a message; \texttt{time\_date} is the time when the message
was sent; \texttt{name} and \texttt{country} are data about the author
of the message, whereby sometimes the authors left the country information
empty; \texttt{parent\_id} is a hash specifying which other message the
message answers (this can also be empty). \texttt{msg} is the actual
message text. \textbf{Be careful} for the tasks below that this text can contain
commas and needs to be treated specially when the line is split up
by using \texttt{line.split(",").toList}. Tasks (2) and (3) are about
processing this data and storing it into the \texttt{Rec}-data-structure, which
is pre-defined in the file \texttt{resit.scala}:


\begin{center}
\begin{verbatim}
  Rec(num: Int,
      msg_id: String,
      date: String,
      msg: String,
      author: String,
      country: Option[String],
      reply_id : Option[String],
      parent: Option[Int] = None,
      children: List[Int] = Nil)
\end{verbatim}
\end{center}
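
\noindent
To illustrate the comma problem described above, the following is a minimal sketch of how a line could be turned into a \texttt{Rec}. It is not the reference solution. The \texttt{case class} form of \texttt{Rec}, the helper \texttt{opt}, the sample line and its date format are all assumptions made for illustration, as is the choice to re-join the message parts with commas.

```scala
// Record type from the exam text, written here as a case class (assumption).
case class Rec(num: Int,
               msg_id: String,
               date: String,
               msg: String,
               author: String,
               country: Option[String],
               reply_id: Option[String],
               parent: Option[Int] = None,
               children: List[Int] = Nil)

// Hypothetical helper: an empty csv field is represented as None.
def opt(s: String): Option[String] = if (s.isEmpty) None else Some(s)

// The first six fields are taken as they are; everything from the
// seventh field onwards belongs to the message and is re-joined with commas.
def process_line(line: String): Rec = {
  val (front, rest) = line.split(",").toList.splitAt(6)
  front match {
    case num :: id :: date :: name :: country :: parent_id :: Nil =>
      Rec(num.toInt, id, date, rest.mkString(","), name,
          opt(country), opt(parent_id))
    case _ =>
      throw new IllegalArgumentException(s"malformed line: $line")
  }
}

// Made-up sample line: empty country, empty parent_id, message with two commas.
val sample = "3,5ebeb459ac278d01301f1497,2020-05-15,Alice,,,Hi, all, welcome"
val rec = process_line(sample)
```

Note that \texttt{split(",")} drops trailing empty strings, so a line that ends in empty fields yields fewer than 7 elements; this sketch rejects such lines instead of guessing the missing fields.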

\noindent
The transformation into a \texttt{Rec}-data-structure is a two-step process
in which the fields for parents and children are first given default
values. This information is then filled in in a second step.
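
\noindent
The second step can be sketched roughly as follows, using \texttt{copy} to produce updated records without mutation. This is only one possible shape: the signatures of \texttt{get\_parent} and \texttt{post\_process}, the \texttt{case class} form of \texttt{Rec} and the three sample records are assumptions.

```scala
// Record type from the exam text, written here as a case class (assumption).
case class Rec(num: Int,
               msg_id: String,
               date: String,
               msg: String,
               author: String,
               country: Option[String],
               reply_id: Option[String],
               parent: Option[Int] = None,
               children: List[Int] = Nil)

// Direct children of e: all records whose reply_id is the hash of e.
def get_children(e: Rec, rs: List[Rec]): List[Int] =
  rs.filter(_.reply_id.contains(e.msg_id)).map(_.num)

// Number of the record that e replies to, if there is such a record.
def get_parent(e: Rec, rs: List[Rec]): Option[Int] =
  e.reply_id.flatMap(id => rs.find(_.msg_id == id)).map(_.num)

// Second step: fill in parent and children for every record.
def post_process(rs: List[Rec]): List[Rec] =
  rs.map(r => r.copy(parent = get_parent(r, rs),
                     children = get_children(r, rs)))

// Made-up records after the first step (parent and children still default).
val step1 = List(
  Rec(0, "h0", "d0", "question", "Alice", Some("UK"), None),
  Rec(1, "h1", "d1", "answer", "Bob", None, Some("h0")),
  Rec(2, "h2", "d2", "follow-up", "Carol", Some("DE"), Some("h1")))

val step2 = post_process(step1)
```

Each record is compared against the whole list, which is quadratic in the number of records; for a log of 680 messages this should be unproblematic.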

The main information that will be computed in the tasks below is
which countries the authors are from and how many authors are from each
country. The last task will also rank which messages have been the most
popular in terms of how many replies they received (this will be computed
according to the number of children, grand-children and so on of a
message).
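
\noindent
As a rough illustration of the country statistics, the sketch below groups authors by country. The return type \texttt{Set[String]} for \texttt{get\_countries}, the decision to count each author only once even if they sent several messages, and the sample records are assumptions; the exam text itself only fixes the \texttt{Map} result and the empty string for a missing country.

```scala
// Record type from the exam text, written here as a case class (assumption).
case class Rec(num: Int,
               msg_id: String,
               date: String,
               msg: String,
               author: String,
               country: Option[String],
               reply_id: Option[String],
               parent: Option[Int] = None,
               children: List[Int] = Nil)

// All countries occurring in the log; a missing country becomes "".
def get_countries(rs: List[Rec]): Set[String] =
  rs.map(_.country.getOrElse("")).toSet

// Number of authors per country. Assumption: an author who wrote
// several messages is counted once (hence the distinct).
def get_countries_numbers(rs: List[Rec]): Map[String, Int] =
  rs.map(r => (r.author, r.country.getOrElse("")))
    .distinct
    .groupBy(_._2)
    .map { case (country, authors) => (country, authors.size) }

// Made-up records: Alice wrote two messages, Carol gave no country.
val recs = List(
  Rec(0, "h0", "d0", "m0", "Alice", Some("UK"), None),
  Rec(1, "h1", "d1", "m1", "Bob", Some("UK"), None),
  Rec(2, "h2", "d2", "m2", "Alice", Some("UK"), None),
  Rec(3, "h3", "d3", "m3", "Carol", None, None))

val countries = get_countries(recs)
val numbers = get_countries_numbers(recs)
```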

\subsection*{Tasks}


\begin{itemize}
\item[(1)] The function \texttt{get\_csv} takes a file name as
  argument. It should read the corresponding file and return its
  content. The content should be returned as a list of strings, namely a
  string for each line in the file. Since the file is a csv-file, the
  first line (the header) should be dropped from the result. Lines are
  separated by \verb!"\n"!. For the file \texttt{log.csv} there should
  be a list of 680 separate strings.

  \mbox{}\hfill[5\% Marks]


\item[(2)] The function \texttt{process\_line} takes a single line
  from the csv-file (as generated by \texttt{get\_csv}) and creates a
  Rec(ord) data structure. This data structure is pre-defined in the
  Scala file.

  For processing a line, you should use the function

  \begin{center}
    \verb!<<some_line>>.split(",").toList!
  \end{center}

  \noindent
  in order to separate the fields. However, \textbf{be careful}: the message
  text in the last field of \texttt{log.csv} can contain commas and
  therefore the split will not always result in a list of only 7
  elements. You need to concatenate the 7th element and anything beyond it into
  a single string before assigning the field \texttt{msg}.

  \mbox{}\hfill[10\% Marks]

\item[(3)] Each record in the log contains a unique hash code
  identifying the message. For example

  \begin{center}
  \verb!"5ebeb459ac278d01301f1497"!
  \end{center}

  \noindent
  Some messages also contain a hash code identifying the parent
  message (that is, the question to which they reply).  The function
  \texttt{post\_process} fills in the information about potential
  children and a potential parent message.

  The auxiliary function \texttt{get\_children} takes a record
  \texttt{e} and a record list \texttt{rs} as arguments, and returns
  the list of all direct children (children have the hash code of
  \texttt{e} as \texttt{reply\_id}). The list of children is returned
  as a list of \texttt{num}s. The \texttt{num}s can be used later
  as indexes in a Rec-list.

  The auxiliary function \texttt{get\_parent} returns the number of
  the record corresponding to the \texttt{reply\_id} (encoded as
  \texttt{Some} if there exists one, otherwise it returns \texttt{None}).

  In order to update a record, say \texttt{r}, with some additional
  information, you can use the Scala code
  \begin{verbatim}
      r.copy(parent = ....,
             children = ....)
  \end{verbatim}

  \mbox{}\hfill[10\% Marks]

\item[(4)] The functions \texttt{get\_countries} and
  \texttt{get\_countries\_numbers} calculate the countries the
  message authors come from and how many authors come from each
  country (returned as a \texttt{Map} from countries to Integers). In
  case an author did not specify a country, the empty string should
  be returned.

  \mbox{}\hfill[10\% Marks]


\item[(5)] This task identifies the most popular questions in the log,
  whereby popularity is measured in terms of how many follow-up
  questions were asked. We say that such questions belong to a
  \emph{thread}. It can be assumed that in \texttt{log.csv} there are
  no circular references, that is, no question refers to one of its
  follow-up questions as its parent.

  The function \texttt{ordered\_thread\_sizes} orders the
  message threads according to how many answers were given for one
  message (that is how many children, grand-children and so on one
  message has).

  The auxiliary function \texttt{search} enumerates all children,
  grand-children and so on for a given record \texttt{r} (including
  the record \texttt{r} itself). \texttt{search} returns these children
  as a list of \texttt{Rec}s.

  The function \texttt{thread\_size} generates for a record, say
  \texttt{r}, a pair consisting of the number of \texttt{r} and the
  number of all children as produced by \texttt{search}. The numbers are the
  integers given for each message; for \texttt{log.csv} each number
  is between 0 and 679.

  The function \texttt{ordered\_thread\_sizes} orders the list of
  pairs according to which thread in the chat is the longest (the
  longest should be first).

\mbox{}\hfill[15\% Marks]
\end{itemize}
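
\noindent
The thread computation in Task (5) can be sketched along the following lines. This is a hedged illustration, not the marking solution: the function signatures, the \texttt{case class} form of \texttt{Rec} and the sample data are assumptions. The sketch also assumes that the record list is sorted by \texttt{num}, so that \texttt{rs(n)} is the record with number \texttt{n}, and it counts the record \texttt{r} itself as part of its thread because \texttt{search} includes it.

```scala
// Record type from the exam text, written here as a case class (assumption).
case class Rec(num: Int,
               msg_id: String,
               date: String,
               msg: String,
               author: String,
               country: Option[String],
               reply_id: Option[String],
               parent: Option[Int] = None,
               children: List[Int] = Nil)

// r followed by all its children, grand-children and so on.
// Assumption: rs is sorted by num, so rs(c) is the record numbered c.
// The recursion terminates because the log has no circular references.
def search(r: Rec, rs: List[Rec]): List[Rec] =
  r :: r.children.flatMap(c => search(rs(c), rs))

// Pair of the record number and the size of its thread (r included).
def thread_size(r: Rec, rs: List[Rec]): (Int, Int) =
  (r.num, search(r, rs).size)

// Largest threads first; sortBy is stable, so ties keep their list order.
def ordered_thread_sizes(rs: List[Rec]): List[(Int, Int)] =
  rs.map(r => thread_size(r, rs)).sortBy(p => -p._2)

// Made-up, already post-processed records:
// 0 has children 1 and 3, 1 has child 2, 4 stands alone.
val rs = List(
  Rec(0, "h0", "d", "m", "A", None, None, None, List(1, 3)),
  Rec(1, "h1", "d", "m", "B", None, Some("h0"), Some(0), List(2)),
  Rec(2, "h2", "d", "m", "C", None, Some("h1"), Some(1), Nil),
  Rec(3, "h3", "d", "m", "D", None, Some("h0"), Some(0), Nil),
  Rec(4, "h4", "d", "m", "E", None, None, None, Nil))

val sizes = ordered_thread_sizes(rs)
```

Whether the thread size should include the starting message or only its descendants is a design choice left open here; subtracting one from the size gives the number of replies.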

\end{document}

%%% Local Variables: 
%%% mode: latex
%%% TeX-master: t
%%% End: 