diff -r a4646557016d -r 0451b8b67f62 handouts/ho06.tex --- a/handouts/ho06.tex Thu Oct 25 00:50:58 2018 +0100 +++ b/handouts/ho06.tex Thu Oct 25 06:53:16 2018 +0100 @@ -32,7 +32,7 @@ \end{center} \noindent -Given the extended effort we have spent in order to implement a lexer +Given the extended effort we have spent implementing a lexer in order to generate list of tokens, it might be surprising that in what follows we shall often use strings as input. This is for making the explanation more lucid. It does not make our previous work on @@ -148,7 +148,7 @@ \end{lstlisting} \end{center} -\noindent You can see the \texttt{parse} tests whether the +\noindent You can see \texttt{parse} tests whether the first character of the input string \texttt{in} is equal to \texttt{c}. If yes, then it splits the string into the recognised part \texttt{c} and the unprocessed part \texttt{in.tail}. In case @@ -174,10 +174,10 @@ \noindent In this parser the input is of type \texttt{List[Token]}. The function parse looks at the input \texttt{ts} and checks whether the first -token is a \texttt{Num\_token}. Let us assume our lexer generated -these tokens for numbers. But this parser does not just return this +token is a \texttt{Num\_token} (let us assume our lexer generated +these tokens for numbers). But this parser does not just return this token (and the rest of the list), like the \texttt{CharParser} above, -rather extracts the string \texttt{s} from the token and converts it +rather it extracts the string \texttt{s} from the token and converts it into an integer. The hope is that the lexer did its work well and this conversion always succeeds. The consequence of this is that the output type for this parser is \texttt{Int}, not \texttt{Token}. Such a @@ -238,7 +238,7 @@ \end{center} \noindent Later on we will use Scala mechanism for introducing some -more readable shorthand notation for this, like \texttt{"a" || +more readable shorthand notation for this, like \texttt{"a" | "b"}. Let us look in detail at what this parser combinator produces with some sample strings @@ -275,7 +275,7 @@ \noindent Notice that the $p$ wil first be run on the input, producing pairs of the form $(\textit{output}_1, u_1)$ where the $u_1$ stands -for the unprocessed, or left-over, parts. We want that $q$ runs on all +for the unprocessed, or leftover, parts. We want that $q$ runs on all these unprocessed parts $u_1$. This again will produce some processed part , $p$ and $q$, we apply both parsers to the input (remember parsers are @@ -325,7 +325,7 @@ \end{center} \noindent Later on we will use again Scala mechanism for introducing some -more readable shorthand notation for this, like \texttt{"a" || "b"}. Let us look in detail at what this parser combinator produces with some sample strings +more readable shorthand notation for this, like \texttt{"a" | "b"}. Let us look in detail at what this parser combinator produces with some sample strings \begin{center} \begin{tabular}{rcl} @@ -363,7 +363,7 @@ overall result of the sequence parser combinator is pairs of the form $((\textit{output}_1, \textit{output}_2), u_2)$. This means the unprocessed part of both parsers is the unprocessed part the second -parser $q$ produces leaves as left-over. The processed parts of both +parser $q$ leaves as leftover. The processed parts of both parsers is a pair consisting of the outputs of $p$ and $q$, namely $(\textit{output}_1, \textit{output}_2)$. This behaviour can be implemented in Scala as follows: @@ -431,7 +431,7 @@ because they do not fit with what the parser is supposed to parse. -A slightly more complicated parser is \pcode{("a" || "b") ~ "c"} which +A slightly more complicated parser is \pcode{("a" | "b") ~ "c"} which parses as first character either an \texttt{a} or \texttt{b}, followed by a \texttt{c}. This parser produces the following outputs. @@ -462,7 +462,19 @@ \noindent The second and third example fail, because something is ``missing'' in the sequence we are looking for. Also notice how the results nest with sequences: the parsed part is a nested pair of the -form \pcode{((a, b), c)}. Two more examples: first consider the parser +form \pcode{((a, b), c)}. If we nest the sequence parser differently, +for example \pcode{"a" ~ ("b" ~ "c")}, then also our output pairs nest +differently + +\begin{center} +\begin{tabular}{rcl} +input strings & & output\medskip\\ +\texttt{\Grid{abcde}} & $\rightarrow$ & $\left\{((\texttt{\Grid{a}},(\texttt{\Grid{b}}, \texttt{\Grid{c}})), \texttt{\Grid{de}})\right\}$\\ +\end{tabular} +\end{center} + +\noindent +Two more examples: first consider the parser \pcode{("a" ~ "a") ~ "a"} and the input \pcode{aaaa}: \begin{center} @@ -501,7 +513,7 @@ ultimately-unsuccessful-parses. -Note carefully that constructing a parser such \pcode{"a" || ("a" ~ +Note carefully that constructing a parser such \pcode{"a" | ("a" ~ "b")} will result in a typing error. The intention is that we want to parse an \texttt{a}, or an \texttt{a} followed by a \texttt{b}. However, the first parser has as output type a single @@ -509,7 +521,8 @@ parser produces a pair of characters as output. The alternative parser is required to have both component parsers to have the same type---we need to be able to build the union of two sets, which means in Scala -they need to be of the same type. We will see later how we can build +they need to be of the same type. Therefore the typing error. +We will see later how we can build this parser without the typing error. The next parser combinator, called \emph{semantic action}, does not @@ -532,18 +545,19 @@ with output type \texttt{T} as one argument as well as a function \texttt{f} with type \texttt{T => S}. The parser \texttt{p} produces sets of type \texttt{(T, I)}. The -semantic action combinastor then applies the function +semantic action combinator then applies the function \texttt{f} to all the parser outputs. Since this function is of type \texttt{T => S}, we obtain a parser with output type \texttt{S}. Again Scala lets us introduce some shorthand notation for this parser combinator. Therefore we will write \texttt{p ==> f} for it. -What are semantic actions good for? Well, they allow is to transform -the parsed input into a datastructure we can use for further +What are semantic actions good for? Well, they allow you to transform +the parsed input into a datastructure you can use for further processing. A simple example would be to transform parsed characters into ASCII numbers. Suppose we define a function \texttt{f} (from -characters to ints) and use a \texttt{CharParser} for the character \texttt{c}. +characters to ints) and use a \texttt{CharParser} for parsing +the character \texttt{c}. \begin{center} \begin{lstlisting}[language=Scala,basicstyle=\small\ttfamily, numbers=none] @@ -553,7 +567,7 @@ \end{center} \noindent -Then we can run the following parsers on the input \texttt{cbd}: +We then can run the following two parsers on the input \texttt{cbd}: \begin{center} \begin{lstlisting}[language=Scala,basicstyle=\small\ttfamily, numbers=none] @@ -563,9 +577,9 @@ \end{center} \noindent -The first line we obtain the result \texttt{Set(('c', "bd"))}, whereas the second -produces \texttt{Set((99, "bd"))}---the character has been transformed into -an ASCII number. +In the first line we obtain the expected result \texttt{Set(('c', + "bd"))}, whereas the second produces \texttt{Set((99, "bd"))}---the +character has been transformed into an ASCII number. A slightly less contrived example is about parsing numbers (recall \texttt{NumParser} above). However, we want to do this here for strings. @@ -590,7 +604,7 @@ prefix and the rest according to this regex (\texttt{reg.findPrefixMatchOf} generates a match---in the successful case---and the corresponding strings can be extracted with -\texttt{matched} and \texttt{after}). We can now define a +\texttt{matched} and \texttt{after}). Using this parser we can define a \texttt{NumParser} for strings as follows: \begin{center} @@ -611,8 +625,9 @@ \noindent produces \texttt{Set((123,abc))}. The problem is that \texttt{123} is -still a string. We need to convert it into the corresponding -\texttt{Int}. We can do this as follows +still a string (the double-quotes are not printed by Scala). We need +to convert it into the corresponding \texttt{Int}. We can do this as +follows using a semantic action \begin{center} \begin{lstlisting}[language=Scala,basicstyle=\small\ttfamily, numbers=none] @@ -621,9 +636,9 @@ \end{center} \noindent -The semantic action in form of a function converts a string into an +The function in the semantic action converts a string into an \texttt{Int}. Let us come back to semantic actions when we are going -to implement a simple calculator. +to implement actual context-free gammars. \subsubsection*{Shorthand notation for parser combinators} @@ -673,6 +688,13 @@ \end{lstlisting} \end{center} +\noindent +The semantic action + + +\noindent +Important to note is that we must define \texttt{Pal}-parser as a +\emph{lazy} value. \subsubsection*{Implementing an Interpreter}