added
author Christian Urban <christian dot urban at kcl dot ac dot uk>
Fri, 25 Oct 2013 17:55:35 +0100
changeset 154 51d6b8b828c4
parent 153 70ab41cb610e
child 155 9b2d128765e1
handouts/ho05.pdf
handouts/ho05.tex
Binary file handouts/ho05.pdf has changed
--- a/handouts/ho05.tex	Fri Oct 25 17:06:19 2013 +0100
+++ b/handouts/ho05.tex	Fri Oct 25 17:55:35 2013 +0100
@@ -164,13 +164,26 @@
 \end{center}
 
 \noindent
-Since \texttt{if} matches the \textit{KEYWORD} regular expression, \VS{}  is a whitespace and so on. This process 
-of separating an input string into components is often called \emph{lexing} or \emph{scanning}.
+Since \texttt{if} matches the \textit{KEYWORD} regular expression, \VS{}  is a whitespace and so on. This process of separating an input string into components is often called \emph{lexing} or \emph{scanning}.
 It is usually the first phase of a compiler. Note that the separation into words cannot, in general, 
 be done by looking at whitespaces: while \texttt{if} and \texttt{true} are separated by a whitespace,
 the components in \texttt{x+2} are not. Another reason for recognising whitespaces explicitly is
 that in some languages, for example Python, whitespace matters. However in our small language we will eventually filter out all whitespaces and also comments.
 
+Lexing will not just separate the input into its components, but also classify the components, that
+is, explicitly record that \texttt{if} is a keyword, \VS{} a whitespace, \texttt{true} an identifier and so on.
+But for the moment we will only focus on the simpler problem of separating a string into components.
+There are a few subtleties we need to consider first. For example, if the input string is
+
+\begin{center}
+\texttt{\Grid{iffoo\VS\ldots}}
+\end{center}
+
+\noindent
+then there are two possibilities: either we regard the input as the keyword \texttt{if} followed
+by the identifier \texttt{foo} (both regular expressions match), or we regard \texttt{iffoo} as a 
+single identifier. The choice that is often made in lexers is to look for the longest possible match,
+that is, to regard the input as the single identifier \texttt{iffoo} (since it is longer than \texttt{if}).
 
 \end{document}
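
The longest-match rule added in this changeset can be illustrated with a minimal Python sketch. It is not part of the handout or the changeset. The token classes, their regular expressions, and the name lex are assumptions made up for this illustration. At each position the sketch tries every rule and keeps the longest match; on a tie the rule listed first wins, which is one common convention and again an assumption here.

```python
import re

# Hypothetical token classes, invented for this sketch (not from the handout).
# Order matters only for ties: the earlier rule wins when lengths are equal.
RULES = [
    ("KEYWORD", re.compile(r"if|then|else")),
    ("IDENT", re.compile(r"[a-z][a-z0-9]*")),
    ("NUM", re.compile(r"[0-9]+")),
    ("OP", re.compile(r"[+\-*/]")),
    ("WHITESPACE", re.compile(r"[ \t\n]+")),
]


def lex(s):
    """Split s into (class, text) pairs using the longest-match rule."""
    tokens = []
    pos = 0
    while pos < len(s):
        best = None  # (class name, end position) of the longest match so far
        for name, rx in RULES:
            m = rx.match(s, pos)
            # Strictly longer matches replace the current best,
            # so the first rule wins when two matches have equal length.
            if m and m.end() > pos and (best is None or m.end() > best[1]):
                best = (name, m.end())
        if best is None:
            raise ValueError(f"no rule matches at position {pos}")
        name, end = best
        tokens.append((name, s[pos:end]))
        pos = end
    return tokens


if __name__ == "__main__":
    print(lex("iffoo "))
    print(lex("if true then x+2"))
```

With these assumed rules, the input iffoo followed by a space is split into the single identifier iffoo and a whitespace, whereas in "if true" the string if is a keyword, because the keyword and identifier matches have the same length and the keyword rule is listed first.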