| author | Christian Urban <christian.urban@kcl.ac.uk> | 
| Mon, 20 Oct 2025 22:18:21 +0200 | |
| changeset 1013 | 1a23d87d1700 | 
| parent 871 | 94b84d880c2b | 
| permissions | -rw-r--r-- | 
| 151 
df229ec49b22
added ho
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: diff
changeset | 1 | \documentclass{article}
 | 
| 292 
7ed2a25dd115
updated
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
217diff
changeset | 2 | \usepackage{../style}
 | 
| 217 
cd6066f1056a
updated handouts
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
173diff
changeset | 3 | \usepackage{../langs}
 | 
| 292 
7ed2a25dd115
updated
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
217diff
changeset | 4 | \usepackage{../graphics}
 | 
| 151 
df229ec49b22
added ho
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: diff
changeset | 5 | |
| 
df229ec49b22
added ho
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: diff
changeset | 6 | \begin{document}
 | 
| 
df229ec49b22
added ho
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: diff
changeset | 7 | |
| 292 
7ed2a25dd115
updated
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
217diff
changeset | 8 | \section*{Handout 5 (Lexing)}
 | 
| 151 
df229ec49b22
added ho
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: diff
changeset | 9 | |
| 292 
7ed2a25dd115
updated
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
217diff
changeset | 10 | Whenever you want to design a new programming language or | 
| 
7ed2a25dd115
updated
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
217diff
changeset | 11 | implement a compiler for an existing language, the first task | 
| 
7ed2a25dd115
updated
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
217diff
changeset | 12 | is to fix the basic ``words'' of the language. For example | 
| 
7ed2a25dd115
updated
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
217diff
changeset | 13 | what are the keywords, or reserved words, of the language, | 
| 
7ed2a25dd115
updated
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
217diff
changeset | 14 | what are permitted identifiers, numbers, expressions and so | 
| 
7ed2a25dd115
updated
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
217diff
changeset | 15 | on. One convenient way to do this is, of course, by using | 
| 
7ed2a25dd115
updated
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
217diff
changeset | 16 | regular expressions. | 
| 155 
9b2d128765e1
added
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
154diff
changeset | 17 | |
| 292 
7ed2a25dd115
updated
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
217diff
changeset | 18 | In this course we want to take a closer look at the WHILE | 
| 
7ed2a25dd115
updated
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
217diff
changeset | 19 | programming language. This is a simple imperative programming | 
| 
7ed2a25dd115
updated
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
217diff
changeset | 20 | language consisting of arithmetic expressions, assignments, | 
| 
7ed2a25dd115
updated
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
217diff
changeset | 21 | if-statements and loops. For example the Fibonacci program can | 
| 
7ed2a25dd115
updated
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
217diff
changeset | 22 | be written in this language as follows: | 
| 151 
df229ec49b22
added ho
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: diff
changeset | 23 | |
| 153 
70ab41cb610e
added
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
152diff
changeset | 24 | \begin{center}
 | 
| 871 | 25 | \mbox{\lstinputlisting[language=while]{../progs/while-tests/fib.while}}
 | 
| 153 
70ab41cb610e
added
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
152diff
changeset | 26 | \end{center}
 | 
| 
70ab41cb610e
added
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
152diff
changeset | 27 | |
| 
70ab41cb610e
added
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
152diff
changeset | 28 | \noindent | 
| 
70ab41cb610e
added
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
152diff
changeset | 29 | The keywords in this language will be | 
| 
70ab41cb610e
added
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
152diff
changeset | 30 | |
| 
70ab41cb610e
added
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
152diff
changeset | 31 | \begin{center}
 | 
| 
70ab41cb610e
added
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
152diff
changeset | 32 | \texttt{while}, \texttt{if}, \texttt{then}, \texttt{else}, \texttt{write}, \texttt{read}
 | 
| 
70ab41cb610e
added
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
152diff
changeset | 33 | \end{center}
 | 
| 
70ab41cb610e
added
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
152diff
changeset | 34 | |
| 
70ab41cb610e
added
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
152diff
changeset | 35 | \noindent | 
| 155 
9b2d128765e1
added
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
154diff
changeset | 36 | In addition we will have some common operators, such as \texttt{<}, \texttt{>}, \texttt{:=} and so on, as well as numbers
 | 
| 159 
ae5ceef5355e
added
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
158diff
changeset | 37 | and strings (which we however ignore for the moment). We also need to specify what the ``whitespace'' | 
| 155 
9b2d128765e1
added
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
154diff
changeset | 38 | is in our programming language and what comments should look like. | 
| 159 
ae5ceef5355e
added
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
158diff
changeset | 39 | As a first try, we might specify the regular expressions for our language roughly as follows | 
| 153 
70ab41cb610e
added
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
152diff
changeset | 40 | |
| 
70ab41cb610e
added
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
152diff
changeset | 41 | \begin{center}
 | 
| 
70ab41cb610e
added
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
152diff
changeset | 42 | \begin{tabular}{lcl}
 | 
| 159 
ae5ceef5355e
added
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
158diff
changeset | 43 | $\textit{LETTER}$ & $:=$ & $\texttt{a} + \texttt{A} + \texttt{b} + \texttt{B} + \ldots$\\
 | 
| 
ae5ceef5355e
added
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
158diff
changeset | 44 | $\textit{DIGIT}$ & $:=$ & $\texttt{0} + \texttt{1} + \texttt{2} + \ldots$\\
 | 
| 153 
70ab41cb610e
added
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
152diff
changeset | 45 | $\textit{KEYWORD}$ & $:=$ & $\texttt{while}  + \texttt{if} + \texttt{then} + \texttt{else} + \dots$\\
 | 
| 
70ab41cb610e
added
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
152diff
changeset | 46 | $\textit{IDENT}$ & $:=$ & $\textit{LETTER} \cdot (\textit{LETTER} + \textit{DIGIT} + {\_})^*$\\ 
 | 
| 
70ab41cb610e
added
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
152diff
changeset | 47 | $\textit{OP}$      &  $:=$ & $\texttt{:=} + \texttt{<} + \ldots$\\
 | 
| 
70ab41cb610e
added
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
152diff
changeset | 48 | $\textit{NUM}$   &  $:=$ & $\textit{DIGIT}^+$\\
 | 
| 159 
ae5ceef5355e
added
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
158diff
changeset | 49 | $\textit{WHITESPACE}$ & $:=$ & $("\hspace{2mm}" + \backslash\texttt{n})^+$
 | 
| 153 
70ab41cb610e
added
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
152diff
changeset | 50 | \end{tabular}
 | 
| 
70ab41cb610e
added
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
152diff
changeset | 51 | \end{center}
 | 
| 
70ab41cb610e
added
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
152diff
changeset | 52 | |
| 155 
9b2d128765e1
added
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
154diff
changeset | 53 | Having the regular expressions in place, the problem we have to solve is: | 
| 
9b2d128765e1
added
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
154diff
changeset | 54 | given a string of our programming language, which regular expression | 
| 159 
ae5ceef5355e
added
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
158diff
changeset | 55 | matches which part of the string. By solving this problem, we can split up a string | 
| 
ae5ceef5355e
added
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
158diff
changeset | 56 | of our language into components. | 
| 
ae5ceef5355e
added
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
158diff
changeset | 57 | For example given the input string | 
| 153 
70ab41cb610e
added
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
152diff
changeset | 58 | |
| 
70ab41cb610e
added
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
152diff
changeset | 59 | \begin{center}
 | 
| 
70ab41cb610e
added
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
152diff
changeset | 60 | \texttt{\Grid{if\VS true\VS then\VS x+2\VS else\VS x+3}}
 | 
| 
70ab41cb610e
added
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
152diff
changeset | 61 | \end{center}
 | 
| 
70ab41cb610e
added
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
152diff
changeset | 62 | |
| 
70ab41cb610e
added
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
152diff
changeset | 63 | \noindent | 
| 161 
bfcf838dda4d
added
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
160diff
changeset | 64 | we expect it is split up as follows | 
| 153 
70ab41cb610e
added
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
152diff
changeset | 65 | |
| 
70ab41cb610e
added
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
152diff
changeset | 66 | \begin{center}
 | 
| 
70ab41cb610e
added
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
152diff
changeset | 67 | \tt | 
| 155 
9b2d128765e1
added
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
154diff
changeset | 68 | \Grid{if}\,
 | 
| 
9b2d128765e1
added
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
154diff
changeset | 69 | \Grid{\VS}\, 
 | 
| 
9b2d128765e1
added
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
154diff
changeset | 70 | \Grid{true}\, 
 | 
| 
9b2d128765e1
added
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
154diff
changeset | 71 | \Grid{\VS}\, 
 | 
| 
9b2d128765e1
added
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
154diff
changeset | 72 | \Grid{then}\, 
 | 
| 
9b2d128765e1
added
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
154diff
changeset | 73 | \Grid{\VS}\, 
 | 
| 
9b2d128765e1
added
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
154diff
changeset | 74 | \Grid{x}\, 
 | 
| 
9b2d128765e1
added
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
154diff
changeset | 75 | \Grid{+}\, 
 | 
| 
9b2d128765e1
added
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
154diff
changeset | 76 | \Grid{2}\,
 | 
| 
9b2d128765e1
added
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
154diff
changeset | 77 | \Grid{\VS}\, 
 | 
| 
9b2d128765e1
added
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
154diff
changeset | 78 | \Grid{else}\,
 | 
| 
9b2d128765e1
added
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
154diff
changeset | 79 | \Grid{\VS}\,
 | 
| 
9b2d128765e1
added
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
154diff
changeset | 80 | \Grid{x}\,
 | 
| 
9b2d128765e1
added
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
154diff
changeset | 81 | \Grid{+}\,
 | 
| 153 
70ab41cb610e
added
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
152diff
changeset | 82 | \Grid{3}
 | 
| 
70ab41cb610e
added
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
152diff
changeset | 83 | \end{center}
 | 
| 
70ab41cb610e
added
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
152diff
changeset | 84 | |
| 
70ab41cb610e
added
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
152diff
changeset | 85 | \noindent | 
| 161 
bfcf838dda4d
added
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
160diff
changeset | 86 | This process of splitting up an input string into components is often called \emph{lexing} or \emph{scanning}.
 | 
| 153 
70ab41cb610e
added
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
152diff
changeset | 87 | It is usually the first phase of a compiler. Note that the separation into words cannot, in general, | 
| 161 
bfcf838dda4d
added
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
160diff
changeset | 88 | be done by just looking at whitespaces: while \texttt{if} and \texttt{true} are separated by a whitespace
 | 
| 
bfcf838dda4d
added
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
160diff
changeset | 89 | in the example above, this is not always the case. As can be seen the three components in \texttt{x+2} are 
 | 
| 
bfcf838dda4d
added
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
160diff
changeset | 90 | not separated by any whitespace. Another reason for recognising whitespaces explicitly is | 
| 
bfcf838dda4d
added
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
160diff
changeset | 91 | that in some languages, for example Python, whitespaces matters, that is carry meaning. However in | 
| 
bfcf838dda4d
added
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
160diff
changeset | 92 | our small language we will eventually just filter out all whitespaces and also all comments. | 
| 153 
70ab41cb610e
added
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
152diff
changeset | 93 | |
| 162 
edcd84c7b491
added
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
161diff
changeset | 94 | Lexing not just separates a string into its components, but also classifies the components, meaning it explicitly records that \texttt{if} is a keyword,  \VS{} a whitespace, \texttt{true} an identifier and so on.
 | 
| 161 
bfcf838dda4d
added
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
160diff
changeset | 95 | For the moment, though, we will only focus on the simpler problem of just splitting up | 
| 
bfcf838dda4d
added
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
160diff
changeset | 96 | a string into components. | 
| 155 
9b2d128765e1
added
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
154diff
changeset | 97 | |
| 159 
ae5ceef5355e
added
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
158diff
changeset | 98 | There are a few subtleties we need to consider first. For example, say the string is | 
| 154 
51d6b8b828c4
added
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
153diff
changeset | 99 | |
| 
51d6b8b828c4
added
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
153diff
changeset | 100 | \begin{center}
 | 
| 161 
bfcf838dda4d
added
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
160diff
changeset | 101 | \texttt{\Grid{iffoo\VS}}\;\ldots
 | 
| 154 
51d6b8b828c4
added
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
153diff
changeset | 102 | \end{center}
 | 
| 
51d6b8b828c4
added
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
153diff
changeset | 103 | |
| 
51d6b8b828c4
added
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
153diff
changeset | 104 | \noindent | 
| 159 
ae5ceef5355e
added
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
158diff
changeset | 105 | then there are two possibilities for how it can be split up: either we regard the input as the keyword \texttt{if} followed
 | 
| 154 
51d6b8b828c4
added
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
153diff
changeset | 106 | by the identifier \texttt{foo} (both regular expressions match) or we regard \texttt{iffoo} as a 
 | 
| 155 
9b2d128765e1
added
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
154diff
changeset | 107 | single identifier. The choice that is often made in lexers is to look for the longest possible match. | 
| 162 
edcd84c7b491
added
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
161diff
changeset | 108 | This leaves  \texttt{iffoo} as the only match in this case (since it is longer than \texttt{if}).
 | 
| 155 
9b2d128765e1
added
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
154diff
changeset | 109 | |
| 161 
bfcf838dda4d
added
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
160diff
changeset | 110 | Unfortunately, the convention about the longest match does not yet make the process | 
| 159 
ae5ceef5355e
added
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
158diff
changeset | 111 | of lexing completely deterministic. Consider the string | 
| 155 
9b2d128765e1
added
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
154diff
changeset | 112 | |
| 
9b2d128765e1
added
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
154diff
changeset | 113 | \begin{center}
 | 
| 161 
bfcf838dda4d
added
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
160diff
changeset | 114 | \texttt{\Grid{then\VS}}\;\dots
 | 
| 155 
9b2d128765e1
added
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
154diff
changeset | 115 | \end{center}
 | 
| 
9b2d128765e1
added
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
154diff
changeset | 116 | |
| 
9b2d128765e1
added
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
154diff
changeset | 117 | \noindent | 
| 159 
ae5ceef5355e
added
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
158diff
changeset | 118 | Clearly, this string should be identified as a keyword. The problem is that also the regular expression \textit{IDENT} for identifiers matches this string. To overcome this ambiguity we need to rank our 
 | 
| 156 
95eaee695636
added
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
155diff
changeset | 119 | regular expressions. In our running example we just use the ranking | 
| 
95eaee695636
added
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
155diff
changeset | 120 | |
| 
95eaee695636
added
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
155diff
changeset | 121 | \[ | 
| 
95eaee695636
added
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
155diff
changeset | 122 | \textit{KEYWORD} < \textit{IDENT} < \textit{OP} < \ldots
 | 
| 
95eaee695636
added
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
155diff
changeset | 123 | \] | 
| 155 
9b2d128765e1
added
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
154diff
changeset | 124 | |
| 156 
95eaee695636
added
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
155diff
changeset | 125 | \noindent | 
| 160 
8134c3b981e0
added
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
159diff
changeset | 126 | So even if both regular expressions match in the example above, | 
| 
8134c3b981e0
added
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
159diff
changeset | 127 | we give preference to the regular expression for keywords. | 
| 155 
9b2d128765e1
added
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
154diff
changeset | 128 | |
| 161 
bfcf838dda4d
added
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
160diff
changeset | 129 | Let us see how our algorithm for lexing works in detail. In addition to the functions $nullable$ | 
| 
bfcf838dda4d
added
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
160diff
changeset | 130 | and $der$, it will  use the function \emph{zeroable} defined as follows:
 | 
| 157 
b6eee9571a63
added
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
156diff
changeset | 131 | |
| 
b6eee9571a63
added
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
156diff
changeset | 132 | \begin{center}
 | 
| 
b6eee9571a63
added
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
156diff
changeset | 133 | \begin{tabular}{@ {}l@ {\hspace{2mm}}c@ {\hspace{2mm}}l@ {}}
 | 
| 
b6eee9571a63
added
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
156diff
changeset | 134 | $zeroable(\varnothing)$ & $\dn$ & $true$\\ | 
| 
b6eee9571a63
added
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
156diff
changeset | 135 | $zeroable(\epsilon)$ & $\dn$ & $f\!alse$\\ | 
| 
b6eee9571a63
added
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
156diff
changeset | 136 | $zeroable (c)$ & $\dn$ & $f\!alse$\\ | 
| 
b6eee9571a63
added
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
156diff
changeset | 137 | $zeroable (r_1 + r_2)$ & $\dn$ & $zeroable(r_1) \wedge zeroable(r_2)$ \\ | 
| 
b6eee9571a63
added
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
156diff
changeset | 138 | $zeroable (r_1 \cdot r_2)$ & $\dn$ & $zeroable(r_1) \vee zeroable(r_2)$ \\ | 
| 
b6eee9571a63
added
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
156diff
changeset | 139 | $zeroable (r^*)$ & $\dn$ & $f\!alse$\\ | 
| 
b6eee9571a63
added
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
156diff
changeset | 140 | \end{tabular}
 | 
| 
b6eee9571a63
added
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
156diff
changeset | 141 | \end{center}
 | 
| 
b6eee9571a63
added
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
156diff
changeset | 142 | |
| 
b6eee9571a63
added
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
156diff
changeset | 143 | \noindent | 
| 161 
bfcf838dda4d
added
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
160diff
changeset | 144 | Recall that the function $nullable(r)$ tests whether a regular expression | 
| 
bfcf838dda4d
added
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
160diff
changeset | 145 | can match the empty string. The function $zeroable$, on the other hand, tests whether a regular | 
| 158 
77e8397ec2ec
added
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
157diff
changeset | 146 | expression cannot match anything at all. The mathematical way of stating this | 
| 160 
8134c3b981e0
added
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
159diff
changeset | 147 | property is | 
| 157 
b6eee9571a63
added
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
156diff
changeset | 148 | |
| 
b6eee9571a63
added
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
156diff
changeset | 149 | \begin{center}
 | 
| 160 
8134c3b981e0
added
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
159diff
changeset | 150 | $zeroable(r)$ if and only if $L(r) = \varnothing$ | 
| 157 
b6eee9571a63
added
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
156diff
changeset | 151 | \end{center}
 | 
| 
b6eee9571a63
added
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
156diff
changeset | 152 | |
| 
b6eee9571a63
added
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
156diff
changeset | 153 | \noindent | 
| 161 
bfcf838dda4d
added
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
160diff
changeset | 154 | For what follows let us fix a set of regular expressions $rs$ as being | 
| 
bfcf838dda4d
added
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
160diff
changeset | 155 | |
| 
bfcf838dda4d
added
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
160diff
changeset | 156 | \begin{center}
 | 
| 
bfcf838dda4d
added
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
160diff
changeset | 157 | \textit{KEYWORD}, \textit{IDENT}, \textit{WHITESPACE}
 | 
| 
bfcf838dda4d
added
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
160diff
changeset | 158 | \end{center}
 | 
| 
bfcf838dda4d
added
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
160diff
changeset | 159 | |
| 
bfcf838dda4d
added
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
160diff
changeset | 160 | \noindent | 
| 
bfcf838dda4d
added
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
160diff
changeset | 161 | specifying the ``words'' | 
| 
bfcf838dda4d
added
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
160diff
changeset | 162 | of our programming language. The algorithm takes as input the $rs$ and a string, say | 
| 158 
77e8397ec2ec
added
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
157diff
changeset | 163 | |
| 
77e8397ec2ec
added
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
157diff
changeset | 164 | \begin{center}
 | 
| 161 
bfcf838dda4d
added
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
160diff
changeset | 165 | \texttt{\grid{$c_1$}\grid{$c_2$}\grid{$c_3$}\grid{$c_4$}}\;\ldots
 | 
| 
bfcf838dda4d
added
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
160diff
changeset | 166 | \end{center}
 | 
| 
bfcf838dda4d
added
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
160diff
changeset | 167 | |
| 
bfcf838dda4d
added
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
160diff
changeset | 168 | \noindent | 
| 
bfcf838dda4d
added
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
160diff
changeset | 169 | and tries to chop off one word from the beginning of the string. If none of the | 
| 
bfcf838dda4d
added
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
160diff
changeset | 170 | regular expression in $rs$ matches, we will just return | 
| 
bfcf838dda4d
added
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
160diff
changeset | 171 | the empty string. | 
| 
bfcf838dda4d
added
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
160diff
changeset | 172 | |
| 
bfcf838dda4d
added
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
160diff
changeset | 173 | The crucial idea in the algorithm is to build the derivatives of all regular expressions in $rs$ with respect | 
| 
bfcf838dda4d
added
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
160diff
changeset | 174 | to the first character $c_1$. Then we take the results and continue with | 
| 
bfcf838dda4d
added
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
160diff
changeset | 175 | building the derivatives with respect to $c_2$ until we have either exhausted our | 
| 162 
edcd84c7b491
added
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
161diff
changeset | 176 | input string or all of the regular expressions are ``zeroable''. Suppose the input string is | 
| 161 
bfcf838dda4d
added
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
160diff
changeset | 177 | |
| 
bfcf838dda4d
added
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
160diff
changeset | 178 | \begin{center}
 | 
| 
bfcf838dda4d
added
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
160diff
changeset | 179 | \texttt{\Grid{if2\VS}}\;\dots
 | 
| 
bfcf838dda4d
added
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
160diff
changeset | 180 | \end{center}
 | 
| 
bfcf838dda4d
added
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
160diff
changeset | 181 | |
| 
bfcf838dda4d
added
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
160diff
changeset | 182 | \noindent | 
| 
bfcf838dda4d
added
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
160diff
changeset | 183 | then building the derivatives with respect to \texttt{i} gives
 | 
| 
bfcf838dda4d
added
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
160diff
changeset | 184 | |
| 
bfcf838dda4d
added
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
160diff
changeset | 185 | \begin{center}
 | 
| 
bfcf838dda4d
added
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
160diff
changeset | 186 | \begin{tabular}{l|c}
 | 
| 
bfcf838dda4d
added
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
160diff
changeset | 187 | & $zeroable$\\\hline | 
| 
bfcf838dda4d
added
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
160diff
changeset | 188 |  $der\;\texttt{i}\;(\textit{KEYWORD})$      & no\\
 | 
| 
bfcf838dda4d
added
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
160diff
changeset | 189 |  $der\;\texttt{i}\;(\textit{IDENT})$              & no\\
 | 
| 
bfcf838dda4d
added
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
160diff
changeset | 190 |  $der\;\texttt{i}\;(\textit{WHITESPACE})$ & yes\\
 | 
| 
bfcf838dda4d
added
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
160diff
changeset | 191 | \end{tabular}
 | 
| 158 
77e8397ec2ec
added
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
157diff
changeset | 192 | \end{center}
 | 
| 
77e8397ec2ec
added
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
157diff
changeset | 193 | |
| 
77e8397ec2ec
added
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
157diff
changeset | 194 | \noindent | 
| 162 
edcd84c7b491
added
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
161diff
changeset | 195 | We can eliminate \textit{WHITESPACE} as a potential candidate, because
 | 
| 161 
bfcf838dda4d
added
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
160diff
changeset | 196 | no derivative can go from $zeroable = \text{yes}$ to no. That leaves the other
 | 
| 
bfcf838dda4d
added
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
160diff
changeset | 197 | two regular expressions as potential candidate and we have to consider the | 
| 162 
edcd84c7b491
added
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
161diff
changeset | 198 | next character, \texttt{f}, from the input string
 | 
| 161 
bfcf838dda4d
added
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
160diff
changeset | 199 | |
| 
bfcf838dda4d
added
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
160diff
changeset | 200 | \begin{center}
 | 
| 
bfcf838dda4d
added
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
160diff
changeset | 201 | \begin{tabular}{l|c}
 | 
| 
bfcf838dda4d
added
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
160diff
changeset | 202 | & $zeroable$\\\hline | 
| 
bfcf838dda4d
added
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
160diff
changeset | 203 |  $der\;\texttt{f}\;(der\;\texttt{i}\;(\textit{KEYWORD}))$      & no\\
 | 
| 
bfcf838dda4d
added
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
160diff
changeset | 204 |  $der\;\texttt{f}\;(der\;\texttt{i}\;(\textit{IDENT}))$              & no\\
 | 
| 
bfcf838dda4d
added
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
160diff
changeset | 205 | \end{tabular}
 | 
| 
bfcf838dda4d
added
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
160diff
changeset | 206 | \end{center}
 | 
| 
bfcf838dda4d
added
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
160diff
changeset | 207 | |
| 
bfcf838dda4d
added
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
160diff
changeset | 208 | \noindent | 
| 162 
edcd84c7b491
added
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
161diff
changeset | 209 | Since both are `no', we have to continue with \texttt{2} from the input string
 | 
| 161 
bfcf838dda4d
added
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
160diff
changeset | 210 | |
| 
bfcf838dda4d
added
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
160diff
changeset | 211 | \begin{center}
 | 
| 
bfcf838dda4d
added
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
160diff
changeset | 212 | \begin{tabular}{l|c}
 | 
| 
bfcf838dda4d
added
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
160diff
changeset | 213 | & $zeroable$\\\hline | 
| 
bfcf838dda4d
added
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
160diff
changeset | 214 |  $der\;\texttt{2}\;(der\;\texttt{f}\;(der\;\texttt{i}\;(\textit{KEYWORD})))$      & yes\\
 | 
| 
bfcf838dda4d
added
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
160diff
changeset | 215 |  $der\;\texttt{2}\;(der\;\texttt{f}\;(der\;\texttt{i}\;(\textit{IDENT})))$              & no\\
 | 
| 
bfcf838dda4d
added
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
160diff
changeset | 216 | \end{tabular}
 | 
| 
bfcf838dda4d
added
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
160diff
changeset | 217 | \end{center}
 | 
| 
bfcf838dda4d
added
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
160diff
changeset | 218 | |
| 
bfcf838dda4d
added
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
160diff
changeset | 219 | \noindent | 
| 
bfcf838dda4d
added
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
160diff
changeset | 220 | Although we now know that the beginning is definitely an \textit{IDENT}, we do not yet
 | 
| 
bfcf838dda4d
added
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
160diff
changeset | 221 | know how much of the input string should be considered as an \textit{IDENT}. So we
 | 
| 162 
edcd84c7b491
added
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
161diff
changeset | 222 | still have to continue and consider the next derivative. | 
| 161 
bfcf838dda4d
added
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
160diff
changeset | 223 | |
| 
bfcf838dda4d
added
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
160diff
changeset | 224 | \begin{center}
 | 
| 
bfcf838dda4d
added
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
160diff
changeset | 225 | \begin{tabular}{l|c}
 | 
| 
bfcf838dda4d
added
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
160diff
changeset | 226 | & $zeroable$\\\hline | 
| 
bfcf838dda4d
added
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
160diff
changeset | 227 |  $der\;\texttt{\VS}\;(der\;\texttt{2}\;(der\;\texttt{f}\;(der\;\texttt{i}\;(\textit{IDENT}))))$              & yes\\
 | 
| 
bfcf838dda4d
added
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
160diff
changeset | 228 | \end{tabular}
 | 
| 
bfcf838dda4d
added
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
160diff
changeset | 229 | \end{center}
 | 
| 
bfcf838dda4d
added
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
160diff
changeset | 230 | |
| 
bfcf838dda4d
added
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
160diff
changeset | 231 | \noindent | 
| 162 
edcd84c7b491
added
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
161diff
changeset | 232 | Since the answer is now `yes' also in this case, we can stop: once all derivatives are | 
| 161 
bfcf838dda4d
added
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
160diff
changeset | 233 | zeroable, we know the regular expressions cannot match any more letters from | 
| 
bfcf838dda4d
added
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
160diff
changeset | 234 | the input. In this case we only have to go back to the derivative that is nullable. In this case it | 
| 
bfcf838dda4d
added
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
160diff
changeset | 235 | is | 
| 
bfcf838dda4d
added
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
160diff
changeset | 236 | |
| 
bfcf838dda4d
added
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
160diff
changeset | 237 | \begin{center}
 | 
| 
bfcf838dda4d
added
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
160diff
changeset | 238 | $der\;\texttt{2}\;(der\;\texttt{f}\;(der\;\texttt{i}\;(\textit{IDENT})))$ 
 | 
| 
bfcf838dda4d
added
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
160diff
changeset | 239 | \end{center} 
 | 
| 
bfcf838dda4d
added
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
160diff
changeset | 240 | |
| 
bfcf838dda4d
added
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
160diff
changeset | 241 | \noindent | 
| 162 
edcd84c7b491
added
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
161diff
changeset | 242 | which means we recognised an identifier. In case where there is a choice of more than one | 
| 161 
bfcf838dda4d
added
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
160diff
changeset | 243 | regular expressions that are nullable, then we choose the one with the highest precedence. | 
| 162 
edcd84c7b491
added
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
161diff
changeset | 244 | You can try out such a case with the input string | 
| 
edcd84c7b491
added
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
161diff
changeset | 245 | |
| 
edcd84c7b491
added
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
161diff
changeset | 246 | \begin{center}
 | 
| 
edcd84c7b491
added
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
161diff
changeset | 247 | \texttt{\Grid{then\VS}}\;\dots
 | 
| 
edcd84c7b491
added
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
161diff
changeset | 248 | \end{center}
 | 
| 
edcd84c7b491
added
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
161diff
changeset | 249 | |
| 
edcd84c7b491
added
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
161diff
changeset | 250 | \noindent | 
| 
edcd84c7b491
added
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
161diff
changeset | 251 | which can both be recognised as a keyword, but also an identifier. | 
| 
edcd84c7b491
added
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
161diff
changeset | 252 | |
| 
edcd84c7b491
added
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
161diff
changeset | 253 | While in the example above the last nullable derivative is the one directly before | 
| 
edcd84c7b491
added
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
161diff
changeset | 254 | the derivative turns zeroable, this is not always the case. Imagine, identifiers can be | 
| 
edcd84c7b491
added
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
161diff
changeset | 255 | letters, as permuted by the regular expression \textit{IDENT}, but must end with an 
 | 
| 
edcd84c7b491
added
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
161diff
changeset | 256 | undercore. | 
| 
edcd84c7b491
added
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
161diff
changeset | 257 | |
| 
edcd84c7b491
added
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
161diff
changeset | 258 | \begin{center}
 | 
| 
edcd84c7b491
added
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
161diff
changeset | 259 | \begin{tabular}{lcl}
 | 
| 
edcd84c7b491
added
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
161diff
changeset | 260 | $\textit{NEWIDENT}$ & $:=$ & $\textit{LETTER} \cdot (\textit{LETTER} + \textit{DIGIT} + {\_})^* \cdot \_$\\ 
 | 
| 
edcd84c7b491
added
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
161diff
changeset | 261 | \end{tabular}
 | 
| 
edcd84c7b491
added
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
161diff
changeset | 262 | \end{center}
 | 
| 
edcd84c7b491
added
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
161diff
changeset | 263 | |
| 
edcd84c7b491
added
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
161diff
changeset | 264 | \noindent | 
| 
edcd84c7b491
added
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
161diff
changeset | 265 | If we use \textit{NEWIDENT} with the input string
 | 
| 
edcd84c7b491
added
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
161diff
changeset | 266 | |
| 
edcd84c7b491
added
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
161diff
changeset | 267 | \begin{center}
 | 
| 
edcd84c7b491
added
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
161diff
changeset | 268 | \texttt{\Grid{iffoo\VS}}\;\ldots
 | 
| 
edcd84c7b491
added
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
161diff
changeset | 269 | \end{center}
 | 
| 
edcd84c7b491
added
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
161diff
changeset | 270 | |
| 
edcd84c7b491
added
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
161diff
changeset | 271 | \noindent | 
| 163 
89d6d89d9844
added
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
162diff
changeset | 272 | then it will only become $zeroable$ after the $\VS$ has been analysed. In this case we have to go back to the | 
| 162 
edcd84c7b491
added
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
161diff
changeset | 273 | first \texttt{f} because only 
 | 
| 
edcd84c7b491
added
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
161diff
changeset | 274 | |
| 
edcd84c7b491
added
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
161diff
changeset | 275 | \begin{center}
 | 
| 
edcd84c7b491
added
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
161diff
changeset | 276 |  $der\;\texttt{f}\;(der\;\texttt{i}\;(\textit{KEYWORD}))$  
 | 
| 
edcd84c7b491
added
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
161diff
changeset | 277 | \end{center}
 | 
| 
edcd84c7b491
added
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
161diff
changeset | 278 | |
| 
edcd84c7b491
added
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
161diff
changeset | 279 | \noindent | 
| 
edcd84c7b491
added
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
161diff
changeset | 280 | is nullable. As a result we recognise successfully the keyword \texttt{if} and the remaining
 | 
| 
edcd84c7b491
added
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
161diff
changeset | 281 | string needs to be consumed by other regular expressions or lead to a lexing error. | 
| 161 
bfcf838dda4d
added
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
160diff
changeset | 282 | |
| 
bfcf838dda4d
added
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
160diff
changeset | 283 | |
| 157 
b6eee9571a63
added
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
156diff
changeset | 284 | |
| 151 
df229ec49b22
added ho
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: diff
changeset | 285 | \end{document}
 | 
| 
df229ec49b22
added ho
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: diff
changeset | 286 | |
| 
df229ec49b22
added ho
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: diff
changeset | 287 | %%% Local Variables: | 
| 
df229ec49b22
added ho
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: diff
changeset | 288 | %%% mode: latex | 
| 
df229ec49b22
added ho
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: diff
changeset | 289 | %%% TeX-master: t | 
| 
df229ec49b22
added ho
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: diff
changeset | 290 | %%% End: |