diff -r a259eec25156 -r d9c784c71305 handouts/ho04.tex --- a/handouts/ho04.tex Sat Oct 17 11:24:41 2015 +0100 +++ b/handouts/ho04.tex Mon Oct 19 14:45:50 2015 +0100 @@ -29,10 +29,10 @@ If the string does not match the string, an error will be raised. Since the first phase of the algorithm by Sulzmann \& Lu is identical to the derivative based matcher from the first -coursework, the function $nullable$ will be used to decide -whether as string is matched by a regular expression. If -$nullable$ says yes, then values are constructed that reflect -how the regular expression matched the string. +coursework, the function \textit{nullable} will be used to +decide whether as string is matched by a regular expression. +If \textit{nullable} says yes, then values are constructed +that reflect how the regular expression matched the string. The definitions for values is given below. They are shown together with the regular expressions $r$ to which @@ -83,11 +83,11 @@ regular expressions are written entirely with upper-case letters, while values just start with a single upper-case character and the rest are lower-case letters. My definition -of regular expressions and values in Scala is shown below. I use -this in the REPL of Scala; when I use the Scala compiler I -need to rename some constructors, because Scala on Macs does -not like classes that are called \pcode{EMPTY} and -\pcode{Empty}. +of regular expressions and values in Scala is shown below. I +use this in the REPL of Scala; when I use the Scala compiler I +unfortunately need to rename some constructors, because Scala +on Macs does not like classes that are called \pcode{EMPTY} +and \pcode{Empty}. {\small\lstinputlisting[language=Scala,numbers=none] {../progs/app01.scala}} @@ -223,7 +223,15 @@ $Left(\ldots)$. The point is that from this value we can directly read off which part of $r_4$ matched the empty string: take the right-alternative first, and then the -right-alternative again. +right-alternative again. Remember $r_4$ is of the form + +\begin{center} +$r_4$:\;$(\varnothing \cdot (b \cdot c)) + + ((\varnothing \cdot c) + \underline{\epsilon})$\\ +\end{center} + +\noindent the value tells us that the underlined $\epsilon$ +is responsible for matching the empty string. Next we have to ``inject'' the last character, that is $c$ in the running example, into this value $v_4$ in order to @@ -341,19 +349,20 @@ manner in order to calculate this simplified regular expression. -The rectification we can implement by letting simp return -not just a (simplified) regular expression, but also a +The rectification we can implement by letting simp return not +just a (simplified) regular expression, but also a rectification function. Let us consider the alternative case, $r_1 + r_2$, first. By going depth-first, we first simplify the component regular expressions $r_1$ and $r_2.$ This will -return simplified versions (if they can be simplified), say -$r_{1s}$ and $r_{2s}$, but also two rectification functions -$f_{1s}$ and $f_{2s}$. We need to assemble them in order to -obtain a rectified value for $r_1 + r_2$. In case $r_{1s}$ -simplified to $\varnothing$, we continue the derivative -calculation with $r_{2s}$. The Sulzmann \& Lu algorithm would -return a corresponding value, say $v_{2s}$. But now this value -needs to be ``rectified'' to the value +return simplified versions, say $r_{1s}$ and $r_{2s}$, of the +component regular expressions (if they can be simplified) but +also two rectification functions $f_{1s}$ and $f_{2s}$. We +need to assemble them in order to obtain a rectified value for +$r_1 + r_2$. In case $r_{1s}$ simplified to $\varnothing$, we +continue the derivative calculation with $r_{2s}$. The +Sulzmann \& Lu algorithm would return a corresponding value, +say $v_{2s}$. But now this value needs to be ``rectified'' to +the value \begin{center} $Right(v_{2s})$ @@ -720,8 +729,29 @@ (top\_level:\texttt{ac.uk})]$ \end{center} -\noindent As you will see in the next lecture, this is now all -we need to tokenise an input string and classify each token. +Recall that we want to lex a little programming language, +called the \emph{While}-language. The main keywords in this +language are \pcode{while}, \pcode{if}, \pcode{then} and +\pcode{else}. As usual we have identifiers, operators, numbers +and so on. For this we would need to design the corresponding +regular expressions to recognise these syntactic categories. I +let you do this design task. Having these regular expressions +at our disposal we can form the regular expression + +\begin{center} +\begin{tabular}{rcl} +\textit{WhileRegs} & $\dn$ & (($k$, KEYWORD) +\\ + & & ($i$, ID) +\\ + & & ($o$, OP) + \\ + & & ($n$, NUM) + \\ + & & ($s$, SEMI) + \\ + & & ($p$, (LPAREN + RPAREN)) +\\ + & & ($b$, (BEGIN + END)) + \\ + & & ($w$, WHITESPACE))$^*$ +\end{tabular} +\end{center} + + \end{document} %%% Local Variables: