# HG changeset patch
# User Christian Urban <christian dot urban at kcl dot ac dot uk>
# Date 1382720135 -3600
# Node ID 51d6b8b828c4bd4d31e5443fbf6e4dd01c39dde7
# Parent  70ab41cb610e1696d0b1e8b3e0e6c6a456eb86c8
added

diff -r 70ab41cb610e -r 51d6b8b828c4 handouts/ho05.pdf
Binary file handouts/ho05.pdf has changed
diff -r 70ab41cb610e -r 51d6b8b828c4 handouts/ho05.tex
--- a/handouts/ho05.tex	Fri Oct 25 17:06:19 2013 +0100
+++ b/handouts/ho05.tex	Fri Oct 25 17:55:35 2013 +0100
@@ -164,13 +164,26 @@
 \end{center}
 
 \noindent
-Since \texttt{if} matches the \textit{KEYWORD} regular expression, \VS{}  is a whitespace and so on. This process 
-of separating an input string into components is often called \emph{lexing} or \emph{scanning}.
+Since \texttt{if} matches the \textit{KEYWORD} regular expression, \VS{}  is a whitespace and so on. This process of separating an input string into components is often called \emph{lexing} or \emph{scanning}.
 It is usually the first phase of a compiler. Note that the separation into words cannot, in general, 
 be done by looking at whitespaces: while \texttt{if} and \texttt{true} are separated by a whitespace,
 the components in \texttt{x+2} are not. Another reason for recognising whitespaces explicitly is
 that in some languages, for example Python, whitespace matters. However in our small language we will eventually filter out all whitespaces and also comments.
 
+Lexing will not just separate the input into its components, but also classify the components, that
+is, explicitly record that \texttt{if} is a keyword, \VS{} a whitespace, \texttt{true} an identifier and so on.
+But for the moment we will only focus on the simpler problem of separating a string into components.
+There are a few subtleties we need to consider first. For example, if the input string is
+
+\begin{center}
+\texttt{\Grid{iffoo\VS\ldots}}
+\end{center}
+
+\noindent
+then there are two possibilities: either we regard the input as the keyword \texttt{if} followed
+by the identifier \texttt{foo} (both regular expressions match), or we regard \texttt{iffoo} as a
+single identifier. The choice often made in lexers is to look for the longest possible match,
+that is, to regard the input as the single identifier \texttt{iffoo} (since it is longer than \texttt{if}).
 
 \end{document}