update
authorChristian Urban <christian dot urban at kcl dot ac dot uk>
Thu, 26 Nov 2015 00:11:10 +0000
changeset 386 31295bb945c6
parent 385 7f8516ff408d
child 387 8aa406adfde0
update
handouts/ho06.pdf
handouts/ho06.tex
Binary file handouts/ho06.pdf has changed
--- a/handouts/ho06.tex	Wed Nov 25 15:59:32 2015 +0000
+++ b/handouts/ho06.tex	Thu Nov 26 00:11:10 2015 +0000
@@ -7,17 +7,17 @@
 
 \section*{Handout 6 (Parser Combinators)}
 
-In what follows we explain \emph{parser combinators}, because
-they are very easy to implement. However, they only work when
-the grammar to be parsed is \emph{not} left-recursive and they
-are efficient only when the grammar is unambiguous. It is the
-responsibility of the grammar designer to ensure these two
-properties.
+In what follows we explain \emph{parser combinators}. Their
+distinguishing feature is that they are very easy to
+implement. However, they only work when the grammar to be
+parsed is \emph{not} left-recursive and they are efficient
+only when the grammar is unambiguous. It is the responsibility
+of the grammar designer to ensure these two properties.
 
 Parser combinators can deal with any kind of input as long as
 this input is a kind of sequence, for example a string or a
-list of tokens. If the input are lists of tokens, then parser
-combinators transform them into sets of pairs, like so
+list of tokens. The general idea behind parser combinators is
+to transform the input into sets of pairs, like so
 
 \begin{center}
 $\underbrace{\text{list of tokens}}_{\text{input}}$ 
@@ -25,8 +25,7 @@
 $\underbrace{\text{set of (parsed input, unparsed input)}}_{\text{output}}$
 \end{center} 
 
-\noindent In Scala parser combinators will therefore be
-functions of type
+\noindent In Scala parser combinators are functions of type
 
 \begin{center}
 \texttt{I $\Rightarrow$ Set[(T, I)]}
@@ -37,24 +36,23 @@
 these pairs corresponds to what the parser combinator was able
 to process from the input and the second is the unprocessed
 part of the input. As we shall see shortly, a parser
-combinator might return more than one such pair, with the idea
-that there are potentially several ways how to interpret the
-input. To simplify matters we will first look at the case
+combinator might return more than one such pair, the idea
+being that there are potentially several ways how to interpret
+the input. To simplify matters we will first look at the case
 where the input to the parser combinators is just strings. As
-a concrete example, consider the case where the input is the
-string
+a concrete example, consider the string
 
 \begin{center}
 \tt\Grid{iffoo\VS testbar}
 \end{center}
 
 \noindent We might have a parser combinator which tries to
-interpret this string as a keyword (\texttt{if}) or an
+interpret this string as a keyword (\texttt{if}) or as an
 identifier (\texttt{iffoo}). Then the output will be the set
 
 \begin{center}
-$\left\{ \left(\texttt{\Grid{if}}\,,\, \texttt{\Grid{foo\VS testbar}}\right), 
-           \left(\texttt{\Grid{iffoo}}\,,\, \texttt{\Grid{\VS testbar}}\right) \right\}$
+$\left\{ \left(\texttt{\Grid{if}}\;,\; \texttt{\Grid{foo\VS testbar}}\right), 
+           \left(\texttt{\Grid{iffoo}}\;,\; \texttt{\Grid{\VS testbar}}\right) \right\}$
 \end{center}
 
 \noindent where the first pair means the parser could
@@ -62,7 +60,7 @@
 `unprocessed' as the second component of the pair; in the
 other case it could recognise \texttt{iffoo} and leaves
 \texttt{\VS testbar} as unprocessed. If the parser cannot
-recognise anything from the input then parser combinators just
+recognise anything from the input, then parser combinators just
 return the empty set $\{\}$. This will indicate something
 ``went wrong''\ldots or more precisely, nothing could be
 parsed.
@@ -93,19 +91,19 @@
 ``centrally'' derive the function \texttt{parse\_all}, which
 just filters out all pairs whose second component is not empty
 (that is has still some unprocessed part). The reason is that
-at the end of parsing we are only interested in the results
+at the end of the parsing we are only interested in the results
 where all the input has been consumed and no unprocessed part
-is left.
+is left over.
 
 One of the simplest parser combinators recognises just a
 character, say $c$, from the beginning of strings. Its
 behaviour can be described as follows:
 
 \begin{itemize}
-\item if the head of the input string starts with a $c$, it returns 
+\item if the head of the input string starts with a $c$, then returns 
 	the set $\{(c, \textit{tail of}\; s)\}$, where \textit{tail of} 
 	$s$ is the unprocessed part of the input string
-\item otherwise it returns the empty set $\{\}$	
+\item otherwise return the empty set $\{\}$	
 \end{itemize}
 
 \noindent
@@ -135,14 +133,15 @@
 parsers out of smaller component parsers. For example the
 alternative parser combinator is as follows: given two
 parsers, say, $p$ and $q$, then we apply both parsers to the
-input (remember parsers are functions) and combine the input
+input (remember parsers are functions) and combine the output
+(remember they are sets)
  
 \begin{center}
 $p(\text{input}) \cup q(\text{input})$
 \end{center}
 
-\noindent
-In Scala we would implement alternative parsers as follows
+\noindent In Scala we would implement alternative parser
+combinator as follows
 
 \begin{center}
 \begin{lstlisting}[language=Scala, numbers=none]
@@ -157,21 +156,22 @@
 \noindent The types of this parser combinator are polymorphic
 (we just have \texttt{I} for the input type, and \texttt{T}
 for the output type). The alternative parser builds a new
-parser out of two existing parser combinator \texttt{p} and
-\texttt{q}. Both need to be able to process input of type
-\texttt{I} and return the same output type \texttt{Set[(T,
-I)]}. (There is an interesting detail of Scala, namely the
-\texttt{=>} in front of the types of \texttt{p} and
+parser out of two existing parsers \texttt{p} and \texttt{q}.
+Both need to be able to process input of type \texttt{I} and
+return the same output type \texttt{Set[(T,
+I)]}.\footnote{There is an interesting detail of Scala, namely
+the \texttt{=>} in front of the types of \texttt{p} and
 \texttt{q}. They will prevent the evaluation of the arguments
 before they are used. This is often called \emph{lazy
-evaluation} of the arguments.) The alternative parser should
+evaluation} of the arguments. We will explain this later.} The alternative parser should
 run the input with the first parser \texttt{p} (producing a
 set of outputs) and then run the same input with \texttt{q}.
 The result should be then just the union of both sets, which
 is the operation \texttt{++} in Scala.
 
-This parser combinator already allows us to construct a parser
-that either a character \texttt{a} or \texttt{b}, as
+The alternative parser combinator already allows us to
+construct a parser that parses either a character \texttt{a}
+or \texttt{b}, as
 
 \begin{center}
 \begin{lstlisting}[language=Scala, numbers=none]
@@ -194,22 +194,22 @@
 
 \noindent We receive in the first two cases a successful
 output (that is a non-empty set). In each case, either
-\pcode{a} or \pcode{b} are the processed parts, and \pcode{c}
-is the unprocessed part. Clearly this parser cannot parse
-anything with the string \pcode{cc}.
+\pcode{a} or \pcode{b} are in the processed parts, and
+\pcode{c} in the unprocessed part. Clearly this parser cannot
+parse anything in the string \pcode{cc}, therefore the empty
+set.
 
 A bit more interesting is the \emph{sequence parser
-combinator}. Given two parsers, say, $p$ and $q$,
-apply first the input to $p$ producing a set of pairs;
-then apply $q$ to the unparsed parts; then combine the 
-results like
+combinator}. Given two parsers, say, $p$ and $q$, apply first
+the input to $p$ producing a set of pairs; then apply $q$ to
+all the unparsed parts; then combine the results like
 
 \begin{center}
 \begin{tabular}{lcl}
 $\{((\textit{output}_1, \textit{output}_2), u_2)$ & $\;|\;$ & 
 $(\textit{output}_1, u_1) \in p(\text{input}) 
 \;\wedge\;$\\
-&& $(\textit{output}_2, u_2) \in q(u_1)\}$
+&& $\;(\textit{output}_2, u_2) \in q(u_1)\}$
 \end{tabular}
 \end{center}
 
@@ -222,9 +222,9 @@
        (p: => Parser[I, T], 
         q: => Parser[I, S]) extends Parser[I, (T, S)] {
   def parse(sb: I) = 
-    for ((head1, tail1) <- p.parse(sb); 
-         (head2, tail2) <- q.parse(tail1)) 
-            yield ((head1, head2), tail2)
+    for ((output1, u1) <- p.parse(sb); 
+         (output2, u2) <- q.parse(u1)) 
+            yield ((output1, output2), u2)
 }
 \end{lstlisting}
 \end{center}
@@ -232,7 +232,7 @@
 \noindent This parser takes as input two parsers, \texttt{p}
 and \texttt{q}. It implements \texttt{parse} as follows: let
 first run the parser \texttt{p} on the input producing a set
-of pairs (\texttt{head1}, \texttt{tail1}). The \texttt{tail1}
+of pairs (\texttt{output1}, \texttt{u1}). The \texttt{u1}
 stands for the unprocessed parts left over by \texttt{p}. Let
 \texttt{q} run on these unprocessed parts producing again a
 set of pairs. The output of the sequence parser combinator is
@@ -249,8 +249,8 @@
 \end{center}
 
 Scala allows us to provide some shorthand notation for the
-sequence parser combinator. So we can write for example
-\texttt{'a' $\sim$ 'b'}, which is the parser combinator that
+sequence parser combinator. We can write for example
+\pcode{'a' ~ 'b'}, which is the parser combinator that
 first consumes the character \texttt{a} from a string and then
 \texttt{b}. Three examples of this parser combinator are as
 follows:
@@ -264,10 +264,10 @@
 \end{tabular}
 \end{center}
 
-\noindent A slightly more complicated parser is \texttt{('a'
-|| 'b') $\sim$ 'b'} which parses as first character either an
+\noindent A slightly more complicated parser is \pcode{('a'
+|| 'b') ~ 'b'} which parses as first character either an
 \texttt{a} or \texttt{b} followed by a \texttt{b}. This parser
-produces the following results.
+produces the following outputs.
 
 \begin{center}
 \begin{tabular}{rcl}
@@ -289,10 +289,11 @@
 \end{tabular}
 \end{center}
 
-\noindent Notice how the results nest deeper as pairs (the
-last \pcode{a} is in the unprocessed part). To consume
-everything we can use the parser \pcode{(('a' ~'a') ~ 'a') ~
-'a'}. Then the output is as follows:
+\noindent Notice how the results nest deeper and deeper as
+pairs (the last \pcode{a} is in the unprocessed part). To
+consume everything of this string we can use the parser
+\pcode{(('a' ~'a') ~ 'a') ~ 'a'}. Then the output is as
+follows:
 
 \begin{center}
 \begin{tabular}{rcl}
@@ -307,8 +308,8 @@
 empty string.
 
 
-Note carefully that constructing a parser such \texttt{'a' ||
-('a' $\sim$ 'b')} will result in a tying error. The first
+Note carefully that constructing a parser such \pcode{'a' ||
+('a' ~ 'b')} will result in a typing error. The first
 parser has as output type a single character (recall the type
 of \texttt{CharParser}), but the second parser produces a pair
 of characters as output. The alternative parser is however
@@ -317,7 +318,7 @@
 typing error.
 
 The next parser combinator does not actually combine smaller
-parsers, but applies a function to the result of the parser.
+parsers, but applies a function to the result of a parser.
 It is implemented in Scala as follows
 
 \begin{center}
@@ -332,14 +333,20 @@
 \end{center}
 
 
-\noindent
-This parser combinator takes a parser \texttt{p} with output type \texttt{T} as
-input as well as a function \texttt{f} with type \texttt{T => S}. The parser \texttt{p}
-produces sets of type \texttt{(T, I)}. The \texttt{FunParser} combinator then
-applies the function \texttt{f} to all the parer outputs. Since this function
-is of type \texttt{T => S}, we obtain a parser with output type \texttt{S}.
-Again Scala lets us introduce some shorthand notation for this parser combinator. 
-Therefore we will write \texttt{p ==> f} for it.
+\noindent This parser combinator takes a parser \texttt{p}
+with output type \texttt{T} as one argument as well as a
+function \texttt{f} with type \texttt{T => S}. The parser
+\texttt{p} produces sets of type \texttt{(T, I)}. The
+\texttt{FunParser} combinator then applies the function
+\texttt{f} to all the parser outputs. Since this function is of
+type \texttt{T => S}, we obtain a parser with output type
+\texttt{S}. Again Scala lets us introduce some shorthand
+notation for this parser combinator. Therefore we will write
+\texttt{p ==> f} for it.
+
+\subsubsection*{How to build parsers using parser combinators?}
+
+\subsubsection*{Implementing an Interpreter}
 
 %\bigskip
 %takes advantage of the full generality---have a look