afl-material: comparison handouts/ho06.tex

equal deleted inserted replaced

-:451a95e1bc25
+:5ddedcd92d84
 \begin{document}
 \section*{Handout 6 (Parser Combinators)}
 This handout explains how \emph{parser combinators} work and how they
-can be implemented in Scala. Their distinguishing feature is that they
+can be implemented in Scala. Their most distinguishing feature is that
-are very easy to implement (admittedly it is only easy in a functional
+they are very easy to implement (admittedly it is only easy in a
-programming language). However, parser combinators require that the
+functional programming language).  Another good point of parser
-grammar to be parsed is \emph{not} left-recursive and they are
+combinators is that they can deal with any kind of input as long as
-efficient only when the grammar is unambiguous. It is the
+this input is of ``sequence-kind'', for example a string or a list of
-responsibility of the grammar designer to ensure these two properties.
+tokens. The only two properties of the input we need is to be able to
+test when it is empty and ``sequentially'' take it apart. Strings and
-Another good point of parser combinators is that they can deal with any kind
+lists fit this bill. However, parser combinators also have their
-of input as long as this input of ``sequence-kind'', for example a
+drawbacks. For example they require that the grammar to be parsed is
-string or a list of tokens. The general idea behind parser combinators
+\emph{not} left-recursive and they are efficient only when the grammar
-is to transform the input into sets of pairs, like so
+is unambiguous. It is the responsibility of the grammar designer to
+ensure these two properties.
+The general idea behind parser combinators is to transform the input
+into sets of pairs, like so
 \begin{center}
 $\underbrace{\text{list of tokens}}_{\text{input}}$
 $\Rightarrow$
 $\underbrace{\text{set of (parsed input, unparsed input)}}_{\text{output}}$
 \end{center}
-\noindent As said, the input can be anything as long as it is a
+\noindent
-``sequence''. The only property of the input we need is to be able to
+Given the extended effort we have spent in order to implement a lexer
-test when it is empty. Obviously we can do this for strings and lists.
+in order to generate list of tokens, it might be surprising that in
-For more lucidity we shall below often use strings as input in order
+what follows we shall often use strings as input. This is for making
-to illustrate matters. However, this does not make our previous work
+the explanation more lucid. It does not make our previous work on
-on lexers obsolete (remember they transform a string into a list of
+lexers obsolete (remember they transform a string into a list of
-tokens). Lexers will still be needed to build a somewhat realistic
+tokens). Lexers will still be needed for building a somewhat realistic
 compiler.
+But as said, parser combinators are relatively agnostic about what
-In my Scala code I use the following polymorphic types for parser combinators:
+kind of input they process. In my Scala code I use the following
+polymorphic types for parser combinators:
 \begin{center}
 input:\;\; \texttt{I}  \qquad output:\;\; \texttt{T}
 \end{center}
-\noindent that is they take as input something of type \texttt{I} and
+\noindent That is they take as input something of type \texttt{I} and
-return a set of pairs of \texttt{Set[(T, I)]}. Since the input needs
+return a set of pairs of type \texttt{Set[(T, I)]}. Since the input needs
-to be of ``sequence-kind'' I actually have to often write \texttt{I
+to be of ``sequence-kind'', I actually have to often write \texttt{I
-<\% Seq[\_]} for the input type in order to indicate it is a
+<\% Seq[\_]} for the input type. This ensures the input is a
 subtype of Scala sequences. The first component of the generated pairs
 corresponds to what the parser combinator was able to process from the
 input and the second is the unprocessed part of the input (therefore
 the type of this unprocessed part is the same as the input). As we
 shall see shortly, a parser combinator might return more than one such
-pair; the idea being that there are potentially several ways of how to
+pair; the idea is that there are potentially several ways of how to
 parse the input.  As a concrete example, consider the string
 \begin{center}
 \tt\Grid{iffoo\VS testbar}
 \end{center}
 \begin{center}
 $\left\{ \left(\texttt{\Grid{if}}\;,\; \texttt{\Grid{foo\VS testbar}}\right),
 \left(\texttt{\Grid{iffoo}}\;,\; \texttt{\Grid{\VS testbar}}\right) \right\}$
 \end{center}
-\noindent where the first pair means the parser could
+\noindent where the first pair means the parser could recognise
-recognise \texttt{if} from the input and leaves the rest as
+\texttt{if} from the input and leaves the rest as `unprocessed' as the
-`unprocessed' as the second component of the pair; in the
+second component of the pair; in the other case it could recognise
-other case it could recognise \texttt{iffoo} and leaves
+\texttt{iffoo} and leaves \texttt{\VS testbar} as unprocessed. If the
-\texttt{\VS testbar} as unprocessed. If the parser cannot
+parser cannot recognise anything from the input at all, then parser
-recognise anything from the input, then parser combinators just
+combinators just return the empty set $\{\}$. This will indicate
-return the empty set $\{\}$. This will indicate something
+something ``went wrong''\ldots or more precisely, nothing could be
-``went wrong''\ldots or more precisely, nothing could be
 parsed.
 Also important to note is that the type \texttt{T} for the processed
-part is different from the input type. The reason is that in general
+part is different from the input type. In the example above is just
-we are interested in transform our input into something
+happens to be the same. The reason for the difference is that in
-``different''\ldots for example into a tree, or if we implement the
+general we are interested in transforming our input into something
-grammar for arithmetic expressions we might be interested in the
+``different''\ldots for example into a tree; or if we implement the
+grammar for arithmetic expressions, we might be interested in the
 actual integer number the arithmetic expression, say \texttt{1 + 2 *
 3}, stands for. In this way we can use parser combinators to
-implement relatively easily a calculator.
+implement relatively easily a calculator, for instance.
 The main idea of parser combinators is that we can easily build parser
 combinators out of smaller components following very closely the
 structure of a grammar. In order to implement this in an
 object-oriented programming language, like Scala, we need to specify
 \item Otherwise return the empty set $\{\}$.
 \end{itemize}
 \noindent
 The input type of this simple parser combinator for characters is
-\texttt{String} and the output type \mbox{\texttt{Set[(Char, String)]}}.
+\texttt{String} and the output type is \texttt{Char}. This means
+\texttt{parse} returns  \mbox{\texttt{Set[(Char, String)]}}.
 The code in Scala is as follows:
 \begin{center}
 \begin{lstlisting}[language=Scala,basicstyle=\small\ttfamily, numbers=none]
 case class CharParser(c: Char) extends Parser[String, Char] {
-def parse(sb: String) =
+def parse(in: String) =
-if (sb.head == c) Set((c, sb.tail)) else Set()
+if (in.head == c) Set((c, in.tail)) else Set()
 }
 \end{lstlisting}
 \end{center}
-\noindent You can see the \texttt{parse} function tests whether the
+\noindent You can see the \texttt{parse} tests whether the
-first character of the input string \texttt{sb} is equal to
+first character of the input string \texttt{in} is equal to
 \texttt{c}. If yes, then it splits the string into the recognised part
-\texttt{c} and the unprocessed part \texttt{sb.tail}. In case
+\texttt{c} and the unprocessed part \texttt{in.tail}. In case
-\texttt{sb} does not start with \texttt{c} then the parser returns the
+\texttt{in} does not start with \texttt{c} then the parser returns the
 empty set (in Scala \texttt{Set()}). Since this parser recognises
 characters and just returns characters as the processed part, the
 output type of the parser is \texttt{Char}.
 If we want to parse a list of tokens and interested in recognising a
 these tokens for numbers. But this parser does not just return this
 token (and the rest of the list), like the \texttt{CharParser} above,
 rather extracts the string \texttt{s} from the token and converts it
 into an integer. The hope is that the lexer did its work well and this
 conversion always succeeds. The consequence of this is that the output
-type for this parser is \texttt{Int}. Such a conversion would be
+type for this parser is \texttt{Int}, not \texttt{Token}. Such a
-needed if we want to implement a simple calculator program.
+conversion would be needed if we want to implement a simple calculator
+program.
 These simple parsers that just look at the input and do a simple
 transformation are often called \emph{atomic} parser combinators.
 More interesting are the parser combinators that build larger parsers
-out of smaller component parsers. For example the \emph{alternative
+out of smaller component parsers. There are three such parser
+combinators that can be implemented generically. The \emph{alternative
 parser combinator} is as follows: given two parsers, say, $p$ and
 $q$, we apply both parsers to the input (remember parsers are
-functions) and combine the output (remember they are sets of pairs)
+functions) and combine the output (remember they are sets of pairs):
 \begin{center}
 $p(\text{input}) \cup q(\text{input})$
 \end{center}
-\noindent In Scala we would implement alternative parser
+\noindent In Scala we can implement alternative parser
 combinator as follows
 \begin{center}
 \begin{lstlisting}[language=Scala, numbers=none]
 class AltParser[I, T]
 (p: => Parser[I, T],
 q: => Parser[I, T]) extends Parser[I, T] {
-def parse(sb: I) = p.parse(sb) ++ q.parse(sb)
+def parse(in: I) = p.parse(in) ++ q.parse(in)
+}
+\end{lstlisting}
+\end{center}
+\noindent The types of this parser combinator are again generic (we
+have \texttt{I} for the input type, and \texttt{T} for the output
+type). The alternative parser builds a new parser out of two existing
+parsers \texttt{p} and \texttt{q} given as arguments.  Both parsers
+need to be able to process input of type \texttt{I} and return in
+\texttt{parse} the same output type \texttt{Set[(T,
+I)]}.\footnote{There is an interesting detail of Scala, namely the
+\texttt{=>} in front of the types of \texttt{p} and \texttt{q}. They
+will prevent the evaluation of the arguments before they are
+used. This is often called \emph{lazy evaluation} of the
+arguments. We will explain this later.}  The alternative parser
+should run the input with the first parser \texttt{p} (producing a set
+of pairs) and then run the same input with \texttt{q} (producing
+another set of pairs).  The result should be then just the union of
+both sets, which is the operation \texttt{++} in Scala.
+The alternative parser combinator allows us to construct a parser that
+parses either a character \texttt{a} or \texttt{b} using the
+\texttt{CharParser} shown above. For this we can write
+\begin{center}
+\begin{lstlisting}[language=Scala, numbers=none]
+new AltParser(CharParser('a'), CharParser('b'))
+\end{lstlisting}
+\end{center}
+\noindent Later on we will use Scala mechanism for introducing some
+more readable shorthand notation for this, like \texttt{"a" ||
+"b"}. Let us look in detail at what this parser combinator produces
+with some sample strings
+\begin{center}
+\begin{tabular}{rcl}
+input strings & & output\medskip\\
+\texttt{\Grid{acde}} & $\rightarrow$ & $\left\{(\texttt{\Grid{a}}, \texttt{\Grid{cde}})\right\}$\\
+\texttt{\Grid{bcde}} & $\rightarrow$ & $\left\{(\texttt{\Grid{b}}, \texttt{\Grid{cde}})\right\}$\\
+\texttt{\Grid{ccde}} & $\rightarrow$ & $\{\}$
+\end{tabular}
+\end{center}
+\noindent We receive in the first two cases a successful
+output (that is a non-empty set). In each case, either
+\pcode{a} or \pcode{b} is in the processed part, and
+\pcode{cde} in the unprocessed part. Clearly this parser cannot
+parse anything with \pcode{ccde}, therefore the empty
+set is returned.
+A bit more interesting is the \emph{sequence parser combinator}. Given
+two parsers, say again, $p$ and $q$, we want to apply first the input
+to $p$ producing a set of pairs; then apply $q$ to all the unparsed
+parts in the pairs; and then combine the results. Mathematically we would
+write something like this for the expected set of pairs:
+\begin{center}
+\begin{tabular}{lcl}
+$\{((\textit{output}_1, \textit{output}_2), u_2)$ & $\,|\,$ &
+$(\textit{output}_1, u_1) \in p(\text{input})
+\;\wedge\;$\\
+&& $(\textit{output}_2, u_2) \in q(u_1)\}$
+\end{tabular}
+\end{center}
+\noindent Notice that the $p$ wil first be run on the input, producing
+pairs of the form $(\textit{output}_1, u_1)$ where the $u_1$ stands
+for the unprocessed, or left-over, parts. We want that $q$ runs on all
+these unprocessed parts $u_1$. This again will produce some
+processed part , $p$ and
+$q$, we apply both parsers to the input (remember parsers are
+functions) and combine the output (remember they are sets of pairs)
+\begin{center}
+$p(\text{input}) \cup q(\text{input})$
+\end{center}
+\noindent In Scala we would implement alternative parser
+combinator as follows
+\begin{center}
+\begin{lstlisting}[language=Scala, numbers=none]
+class AltParser[I, T]
+(p: => Parser[I, T],
+q: => Parser[I, T]) extends Parser[I, T] {
+def parse(in: I) = p.parse(in) ++ q.parse(in)
 }
 \end{lstlisting}
 \end{center}
 \noindent The types of this parser combinator are again generic (we
 \;\wedge\;$\\
 && $(\textit{output}_2, u_2) \in q(u_1)\}$
 \end{tabular}
 \end{center}
-\noindent Notice that the $p$ wil first be run on the input, producing
+\noindent Notice that the $p$ will first be run on the input,
-pairs of the form $(\textit{output}_1, u_1)$ where the $u_1$ stands
+producing pairs of the form $\textit{output}_1$ and unprocessed part
-for the unprocessed, or left-over, parts. We want that $q$ runs on all
+$u_1$. This unprocessed part is fed into the second parser $q$. The
-these unprocessed parts $u_1$. This again will produce some
-processed part , $p$ and
-$q$, we apply both parsers to the input (remember parsers are
-functions) and combine the output (remember they are sets of pairs)
-\begin{center}
-$p(\text{input}) \cup q(\text{input})$
-\end{center}
-\noindent In Scala we would implement alternative parser
-combinator as follows
-\begin{center}
-\begin{lstlisting}[language=Scala, numbers=none]
-class AltParser[I, T]
-(p: => Parser[I, T],
-q: => Parser[I, T]) extends Parser[I, T] {
-def parse(sb: I) = p.parse(sb) ++ q.parse(sb)
-}
-\end{lstlisting}
-\end{center}
-\noindent The types of this parser combinator are again generic (we
-just have \texttt{I} for the input type, and \texttt{T} for the output
-type). The alternative parser builds a new parser out of two existing
-parsers \texttt{p} and \texttt{q}.  Both need to be able to process
-input of type \texttt{I} and return the same output type
-\texttt{Set[(T, I)]}.\footnote{There is an interesting detail of
-Scala, namely the \texttt{=>} in front of the types of \texttt{p}
-and \texttt{q}. They will prevent the evaluation of the arguments
-before they are used. This is often called \emph{lazy evaluation} of
-the arguments. We will explain this later.}  Therefore the output
-type of this parser is \texttt{T}. The alternative parser should run
-the input with the first parser \texttt{p} (producing a set of pairs)
-and then run the same input with \texttt{q} (producing another set of
-pairs).  The result should be then just the union of both sets, which
-is the operation \texttt{++} in Scala.
-The alternative parser combinator already allows us to
-construct a parser that parses either a character \texttt{a}
-or \texttt{b}, as
-\begin{center}
-\begin{lstlisting}[language=Scala, numbers=none]
-new AltParser(CharParser('a'), CharParser('b'))
-\end{lstlisting}
-\end{center}
-\noindent Later on we will use again Scala mechanism for introducing some
-more readable shorthand notation for this, like \texttt{"a" || "b"}. Let us look in detail at what this parser combinator produces with some sample strings
-\begin{center}
-\begin{tabular}{rcl}
-input strings & & output\medskip\\
-\texttt{\Grid{acde}} & $\rightarrow$ & $\left\{(\texttt{\Grid{a}}, \texttt{\Grid{cde}})\right\}$\\
-\texttt{\Grid{bcde}} & $\rightarrow$ & $\left\{(\texttt{\Grid{b}}, \texttt{\Grid{cde}})\right\}$\\
-\texttt{\Grid{ccde}} & $\rightarrow$ & $\{\}$
-\end{tabular}
-\end{center}
-\noindent We receive in the first two cases a successful
-output (that is a non-empty set). In each case, either
-\pcode{a} or \pcode{b} is in the processed part, and
-\pcode{cde} in the unprocessed part. Clearly this parser cannot
-parse anything in the string \pcode{ccde}, therefore the empty
-set.
-A bit more interesting is the \emph{sequence parser combinator}. Given
-two parsers, say again, $p$ and $q$, we want to apply first the input
-to $p$ producing a set of pairs; then apply $q$ to all the unparsed
-parts in the pairs; and then combine the results like
-\begin{center}
-\begin{tabular}{lcl}
-$\{((\textit{output}_1, \textit{output}_2), u_2)$ & $\;|\;$ &
-$(\textit{output}_1, u_1) \in p(\text{input})
-\;\wedge\;$\\
-&& $(\textit{output}_2, u_2) \in q(u_1)\}$
-\end{tabular}
-\end{center}
-\noindent Notice that the $p$ wil first be run on the input, producing
-pairs of the form $\textit{output}_1$ and unprocessed part $u_1$. The
 overall result of the sequence parser combinator is pairs of the form
 $((\textit{output}_1, \textit{output}_2), u_2)$. This means the
-unprocessed parts of both parsers is the unprocessed parts the second
+unprocessed part of both parsers is the unprocessed part the second
-parser $q$ produces as left-over. The processed parts of both parsers
+parser $q$ produces leaves as left-over. The processed parts of both
-is just the pair of the outputs
+parsers is a pair consisting of the outputs of $p$ and $q$, namely
 $(\textit{output}_1, \textit{output}_2)$. This behaviour can be
 implemented in Scala as follows:
 \begin{center}
 \begin{lstlisting}[language=Scala,numbers=none]
 class SeqParser[I, T, S]
 (p: => Parser[I, T],
 q: => Parser[I, S]) extends Parser[I, (T, S)] {
-def parse(sb: I) =
+def parse(in: I) =
-for ((output1, u1) <- p.parse(sb);
+for ((output1, u1) <- p.parse(in);
 (output2, u2) <- q.parse(u1))
 yield ((output1, output2), u2)
 }
 \end{lstlisting}
 \end{center}
-\noindent This parser takes again as input two parsers, \texttt{p}
+\noindent This parser takes again as arguments two parsers, \texttt{p}
-and \texttt{q}. It implements \texttt{parse} as follows: let
+and \texttt{q}. It implements \texttt{parse} as follows: let first run
-first run the parser \texttt{p} on the input producing a set
+the parser \texttt{p} on the input producing a set of pairs
-of pairs (\texttt{output1}, \texttt{u1}). The \texttt{u1}
+(\texttt{output1}, \texttt{u1}). The \texttt{u1} stands for the
-stands for the unprocessed parts left over by \texttt{p}. Let
+unprocessed parts left over by \texttt{p}. Let then \texttt{q} run on these
-\texttt{q} run on these unprocessed parts producing again a
+unprocessed parts producing again a set of pairs. The output of the
-set of pairs. The output of the sequence parser combinator is
+sequence parser combinator is then a set containing pairs where the
-then a set containing pairs where the first components are
+first components are again pairs, namely what the first parser could
-again pairs, namely what the first parser could parse together
+parse together with what the second parser could parse; the second
-with what the second parser could parse; the second component
+component is the unprocessed part left over after running the second
-is the unprocessed part left over after running the second
+parser \texttt{q}. Therefore the input type of the sequence parser
-parser \texttt{q}. Therefore the input type of the sequence
+combinator is as usual \texttt{I}, but the output type is
-parser combinator is as usual \texttt{I}, but the output type
-is
 \begin{center}
 \texttt{Set[((T, S), I)]}
 \end{center}
 Notice that we have now two output types for the sequence parser
 combinator, because in general \textit{p} and \textit{q} might produce
 different things (for example first we recognise a number and then a
 string corresponding to an operator).
+With the shorthand notation we shall introduce later for the sequence
-We have not yet looked at this in detail, but Scala allows us to
+parser combinator, we can write for example \pcode{"a" ~ "b"}, which
-provide some shorthand notation for the sequence parser combinator. We
+is the parser combinator that first recognises the character
-can write for example \pcode{"a" ~ "b"}, which is the parser
+\texttt{a} from a string and then \texttt{b}. Let us look again at
-combinator that first recognises the character \texttt{a} from a
+three examples of how this parser combinator processes some strings:
-string and then \texttt{b}. Let us look again at three examples of
-how this parser combinator processes strings:
 \begin{center}
 \begin{tabular}{rcl}
 input strings & & output\medskip\\
 \texttt{\Grid{abcde}} & $\rightarrow$ & $\left\{((\texttt{\Grid{a}}, \texttt{\Grid{b}}), \texttt{\Grid{cde}})\right\}$\\
 \texttt{\Grid{cccde}} & $\rightarrow$ & $\{\}$
 \end{tabular}
 \end{center}
 \noindent In the first line we have a successful parse, because the
-string started with \texttt{ab}, which is the prefix we are looking
+string starts with \texttt{ab}, which is the prefix we are looking
 for. But since the parsing combinator is constructed as sequence of
 the two simple (atomic) parsers for \texttt{a} and \texttt{b}, the
 result is a nested pair of the form \texttt{((a, b), cde)}. It is
 \emph{not} a simple pair \texttt{(ab, cde)} as one might erroneously
-expects.  The parser returns the empty set in the other examples,
+expect.  The parser returns the empty set in the other examples,
 because they do not fit with what the parser is supposed to parse.
-A slightly more complicated parser is \pcode{("a" || "b") ~ "c"}
+A slightly more complicated parser is \pcode{("a" || "b") ~ "c"} which
-which parses as first character either an \texttt{a} or \texttt{b}
+parses as first character either an \texttt{a} or \texttt{b}, followed
-followed by a \texttt{c}. This parser produces the following outputs.
+by a \texttt{c}. This parser produces the following outputs.
 \begin{center}
 \begin{tabular}{rcl}
 input strings & & output\medskip\\
 \texttt{\Grid{acde}} & $\rightarrow$ & $\left\{((\texttt{\Grid{a}}, \texttt{\Grid{c}}), \texttt{\Grid{de}})\right\}$\\
 \texttt{\Grid{aaaa}} & $\rightarrow$ &
 $\left\{(((\texttt{\Grid{a}}, \texttt{\Grid{a}}), \texttt{\Grid{a}}), \texttt{\Grid{a}})\right\}$\\
 \end{tabular}
 \end{center}
-\noindent Notice how the results nests deeper and deeper as pairs (the
+\noindent Notice how again the results nest deeper and deeper as pairs (the
 last \pcode{a} is in the unprocessed part). To consume everything of
 this string we can use the parser \pcode{(("a" ~ "a") ~ "a") ~
 "a"}. Then the output is as follows:
 \begin{center}
 \end{tabular}
 \end{center}
 \noindent This is an instance where the parser consumed
 completely the input, meaning the unprocessed part is just the
-empty string. So if we called \pcode{parse_all} instead of \pcode{parse}
+empty string. So if we called \pcode{parse_all}, instead of \pcode{parse},
 we would get back the result
 \[
 \left\{(((\texttt{\Grid{a}}, \texttt{\Grid{a}}), \texttt{\Grid{a}}), \texttt{\Grid{a}})\right\}
 \]
 \noindent where the unprocessed (empty) parts have been stripped away
 from the pairs; everything where the second part was not empty has
-been thrown away as ultimately-unsuccessful-parses.
+been thrown away as well, because they represent
+ultimately-unsuccessful-parses.
-Note carefully that constructing a parser such \pcode{'a' ||
-('a' ~ 'b')} will result in a typing error. The first
+Note carefully that constructing a parser such \pcode{"a" || ("a" ~
-parser has as output type a single character (recall the type
+"b")} will result in a typing error. The intention is that we want
-of \texttt{CharParser}), but the second parser produces a pair
+to parse an \texttt{a}, or an \texttt{a} followed by a
-of characters as output. The alternative parser is however
+\texttt{b}. However, the first parser has as output type a single
-required to have both component parsers to have the same type.
+character (recall the type of \texttt{CharParser}), but the second
-We will see later how we can build this parser without the
+parser produces a pair of characters as output. The alternative parser
-typing error.
+is required to have both component parsers to have the same type---we
+need to be able to build the union of two sets, which means in Scala
-The next parser combinator does not actually combine smaller
+they need to be of the same type.  We will see later how we can build
-parsers, but applies a function to the result of a parser.
+this parser without the typing error.
-It is implemented in Scala as follows
+The next parser combinator, called \emph{semantic action}, does not
+actually combine smaller parsers, but applies a function to the result
+of a parser.  It is implemented in Scala as follows
 \begin{center}
 \begin{lstlisting}[language=Scala,basicstyle=\small\ttfamily, numbers=none]
 class FunParser[I, T, S]
 (p: => Parser[I, T],
 f: T => S) extends Parser[I, S] {
-def parse(sb: I) =
+def parse(in: I) =
-for ((head, tail) <- p.parse(sb)) yield (f(head), tail)
+for ((head, tail) <- p.parse(in)) yield (f(head), tail)
 }
 \end{lstlisting}
 \end{center}
 \noindent This parser combinator takes a parser \texttt{p}
 with output type \texttt{T} as one argument as well as a
 function \texttt{f} with type \texttt{T => S}. The parser
 \texttt{p} produces sets of type \texttt{(T, I)}. The
-\texttt{FunParser} combinator then applies the function
+semantic action combinastor then applies the function
 \texttt{f} to all the parser outputs. Since this function is of
 type \texttt{T => S}, we obtain a parser with output type
 \texttt{S}. Again Scala lets us introduce some shorthand
 notation for this parser combinator. Therefore we will write
 \texttt{p ==> f} for it.
+What are semantic actions good for? Ultimately, they allow is to
+transform the parsed input into a datastructure we can use for further
+processing. A simple example would be to transform parsed characters
+into ASCII numbers. Suppose we define a function \texttt{f} (from
+characters to ints) and a \texttt{CharParser}.
+\begin{center}
+\begin{lstlisting}[language=Scala,basicstyle=\small\ttfamily, numbers=none]
+val f = (c: Char) => c.toInt
+val c = new CharParser('c')
+\end{lstlisting}
+\end{center}
+\noindent
+Then
+\begin{center}
+\begin{lstlisting}[language=Scala,basicstyle=\small\ttfamily, numbers=none]
+c.parse("cbd")
+(c ==> f).parse("cbd")
+\end{lstlisting}
+\end{center}
+\noindent
+the first line produces \texttt{Set(('c', "bd"))}, whereas the second
+produces \texttt{Set((99, "bd"))}.
+\subsubsection*{Shorthand notation for parser combinators}
+Before we proceed, let us just explain the shorthand notation for
+parser combinators. Like for regular expressions, the shorthand notation
+will make our life much easier when writing actual parsers.
 \subsubsection*{How to build parsers using parser combinators?}
 \subsubsection*{Implementing an Interpreter}

changeset 587	5ddedcd92d84
parent 586	451a95e1bc25
child 588	a4646557016d