afl-material: handouts/ho06.tex@c6a1e19e9801



\documentclass{article}
\usepackage{../style}
\usepackage{../langs}
\usepackage{../grammar}

\begin{document}

\section*{Handout 6 (Parser Combinators)}

This handout explains how \emph{parser combinators} work and how they
can be implemented in Scala. Their most distinguishing feature is that
they are very easy to implement (admittedly it is only easy in a
functional programming language).  Another good point of parser
combinators is that they can deal with any kind of input as long as
this input is of ``sequence-kind'', for example a string or a list of
tokens. The only two properties of the input we need is to be able to
test when it is empty and ``sequentially'' take it apart. Strings and
lists fit this bill. However, parser combinators also have their
drawbacks. For example they require that the grammar to be parsed is
\emph{not} left-recursive and they are efficient only when the grammar
is unambiguous. It is the responsibility of the grammar designer to
ensure these two properties.

The general idea behind parser combinators is to transform the input
into sets of pairs, like so

\begin{center}
$\underbrace{\text{list of tokens}}_{\text{input}}$ 
$\Rightarrow$
$\underbrace{\text{set of (parsed input, unparsed input)}}_{\text{output}}$
\end{center} 

\noindent
Given the extended effort we have spent implementing a lexer in order
to generate list of tokens, it might be surprising that in what
follows we shall often use strings as input, rather than list of
tokens. This is for making the explanation more lucid. It does not
make our previous work on lexers obsolete (remember they transform a
string into a list of tokens). Lexers will still be needed for
building a somewhat realistic compiler.

As mentioned above, parser combinators are relatively agnostic about what
kind of input they process. In my Scala code I use the following
polymorphic types for parser combinators:

\begin{center}
input:\;\; \texttt{I}  \qquad output:\;\; \texttt{T}
\end{center}

\noindent That is they take as input something of type \texttt{I} and
return a set of pairs of type \texttt{Set[(T, I)]}. Since the input
needs to be of ``sequence-kind'', I actually have to often write
\texttt{I <\% Seq[\_]} for the input type. This ensures the input is a
subtype of Scala sequences. The first component of the generated pairs
corresponds to what the parser combinator was able to process from the
input and the second is the unprocessed part, or leftover, of the
input (therefore the type of this unprocessed part is the same as the
input). A parser combinator might return more than one such pair; the
idea is that there are potentially several ways of how to parse the
input.  As a concrete example, consider the string

\begin{center}
\tt\Grid{iffoo\VS testbar}
\end{center}

\noindent We might have a parser combinator which tries to
interpret this string as a keyword (\texttt{if}) or as an
identifier (\texttt{iffoo}). Then the output will be the set

\begin{center}
$\left\{ \left(\texttt{\Grid{if}}\;,\; \texttt{\Grid{foo\VS testbar}}\right), 
           \left(\texttt{\Grid{iffoo}}\;,\; \texttt{\Grid{\VS testbar}}\right) \right\}$
\end{center}

\noindent where the first pair means the parser could recognise
\texttt{if} from the input and leaves the \texttt{foo\VS testbar} as
`unprocessed' part; in the other case it could recognise
\texttt{iffoo} and leaves \texttt{\VS testbar} as unprocessed. If the
parser cannot recognise anything from the input at all, then parser
combinators just return the empty set $\{\}$. This will indicate
something ``went wrong''\ldots or more precisely, nothing could be
parsed.

Also important to note is that the type \texttt{T} for the processed
part is different from the input type \texttt{I} in the parse. In the
example above is just happens to be the same. The reason for the
potential difference is that in general we are interested in
transforming our input into something ``different''\ldots for example
into a tree; or if we implement the grammar for arithmetic
expressions, we might be interested in the actual integer number the
arithmetic expression, say \texttt{1 + 2 * 3}, stands for. In this way
we can use parser combinators to implement relatively easily a
calculator, for instance.

The main idea of parser combinators is that we can easily build parser
combinators out of smaller components following very closely the
structure of a grammar. In order to implement this in an
object-oriented programming language, like Scala, we need to specify
an abstract class for parser combinators. This abstract class states
that the function \texttt{parse} takes an argument of type \texttt{I}
and returns a set of type \mbox{\texttt{Set[(T, I)]}}.

\begin{center}
\begin{lstlisting}[language=Scala]
abstract class Parser[I, T] {
  def parse(in: I) : Set[(T, I)]

  def parse_all(in: I) : Set[T] =
    for ((head, tail) <- parse(in); if (tail.isEmpty)) 
      yield head
}
\end{lstlisting}
\end{center}

\noindent It is the obligation in each instance (parser combinator) to
supply an implementation for \texttt{parse}.  From this function we
can then ``centrally'' derive the function \texttt{parse\_all}, which
just filters out all pairs whose second component is not empty (that
is has still some unprocessed part). The reason is that at the end of
the parsing we are only interested in the results where all the input
has been consumed and no unprocessed part is left over.

One of the simplest parser combinators recognises just a
single character, say $c$, from the beginning of strings. Its
behaviour can be described as follows:

\begin{itemize}
\item If the head of the input string starts with a $c$, then return 
  the set
  \[\{(c, \textit{tail of}\; s)\}\]
  where \textit{tail of} 
  $s$ is the unprocessed part of the input string.
\item Otherwise return the empty set $\{\}$.	
\end{itemize}

\noindent
The input type of this simple parser combinator is \texttt{String} and
the output type is \texttt{Char}. This means \texttt{parse} returns
\mbox{\texttt{Set[(Char, String)]}}.  The code in Scala is as follows:

\begin{center}
\begin{lstlisting}[language=Scala,basicstyle=\small\ttfamily, numbers=none]
case class CharParser(c: Char) extends Parser[String, Char] {
  def parse(in: String) = 
    if (in.head == c) Set((c, in.tail)) else Set()
}
\end{lstlisting}
\end{center}

\noindent You can see \texttt{parse} tests whether the
first character of the input string \texttt{in} is equal to
\texttt{c}. If yes, then it splits the string into the recognised part
\texttt{c} and the unprocessed part \texttt{in.tail}. In case
\texttt{in} does not start with \texttt{c} then the parser returns the
empty set (in Scala \texttt{Set()}). Since this parser recognises
characters and just returns characters as the processed part, the
output type of the parser is \texttt{Char}.

If we want to parse a list of tokens and interested in recognising a
number token, for example, we could write something like this

\begin{center}
\begin{lstlisting}[language=Scala,basicstyle=\small\ttfamily,numbers=none]
case object NumParser extends Parser[List[Token], Int] {
  def parse(ts: List[Token]) = ts match {
    case Num_token(s)::ts => Set((s.toInt, ts)) 
    case _ => Set ()
  }
}
\end{lstlisting}
\end{center}

\noindent
In this parser the input is of type \texttt{List[Token]}. The function
parse looks at the input \texttt{ts} and checks whether the first
token is a \texttt{Num\_token} (let us assume our lexer generated
these tokens for numbers). But this parser does not just return this
token (and the rest of the list), like the \texttt{CharParser} above,
rather it extracts also the string \texttt{s} from the token and
converts it into an integer. The hope is that the lexer did its work
well and this conversion always succeeds. The consequence of this is
that the output type for this parser is \texttt{Int}, not
\texttt{Token}. Such a conversion would be needed if we want to
implement a simple calculator program, because string-numbers need to
be transformed into \texttt{Int}-numbers in order to do the
calculations.


These simple parsers that just look at the input and do a simple
transformation are often called \emph{atomic} parser combinators.
More interesting are the parser combinators that build larger parsers
out of smaller component parsers. There are three such parser
combinators that can be implemented generically. The \emph{alternative
  parser combinator} is as follows: given two parsers, say, $p$ and
$q$, we apply both parsers to the input (remember parsers are
functions) and combine the output (remember they are sets of pairs):
 
\begin{center}
$p(\text{input}) \cup q(\text{input})$
\end{center}

\noindent In Scala we can implement alternative parser
combinator as follows

\begin{center}
\begin{lstlisting}[language=Scala, numbers=none]
class AltParser[I, T]
       (p: => Parser[I, T], 
        q: => Parser[I, T]) extends Parser[I, T] {
  def parse(in: I) = p.parse(in) ++ q.parse(in)
}
\end{lstlisting}
\end{center}

\noindent The types of this parser combinator are again generic (we
have \texttt{I} for the input type, and \texttt{T} for the output
type). The alternative parser builds a new parser out of two existing
parsers \texttt{p} and \texttt{q} which are given as arguments.  Both
parsers need to be able to process input of type \texttt{I} and return
in \texttt{parse} the same output type \texttt{Set[(T,
  I)]}.\footnote{There is an interesting detail of Scala, namely the
  \texttt{=>} in front of the types of \texttt{p} and \texttt{q}. They
  will prevent the evaluation of the arguments before they are
  used. This is often called \emph{lazy evaluation} of the
  arguments. We will explain this later.}  The alternative parser runs
the input with the first parser \texttt{p} (producing a set of pairs)
and then runs the same input with \texttt{q} (producing another set of
pairs).  The result should be then just the union of both sets, which
is the operation \texttt{++} in Scala.

The alternative parser combinator allows us to construct a parser that
parses either a character \texttt{a} or \texttt{b} using the
\texttt{CharParser} shown above. For this we can write

\begin{center}
\begin{lstlisting}[language=Scala, numbers=none]
new AltParser(CharParser('a'), CharParser('b'))
\end{lstlisting}
\end{center}

\noindent Later on we will use Scala mechanism for introducing some
more readable shorthand notation for this, like \texttt{"a" |
  "b"}. Let us look in detail at what this parser combinator produces
with some sample strings.

\begin{center}
\begin{tabular}{rcl}
input strings & & output\medskip\\
\texttt{\Grid{acde}} & $\rightarrow$ & $\left\{(\texttt{\Grid{a}}, \texttt{\Grid{cde}})\right\}$\\
\texttt{\Grid{bcde}} & $\rightarrow$ & $\left\{(\texttt{\Grid{b}}, \texttt{\Grid{cde}})\right\}$\\
\texttt{\Grid{ccde}} & $\rightarrow$ & $\{\}$
\end{tabular}
\end{center}

\noindent We receive in the first two cases a successful
output (that is a non-empty set). In each case, either
\pcode{a} or \pcode{b} is in the processed part, and
\pcode{cde} in the unprocessed part. Clearly this parser cannot
parse anything with \pcode{ccde}, therefore the empty
set is returned.

A bit more interesting is the \emph{sequence parser combinator}. Given
two parsers, say again, $p$ and $q$, we want to apply first the input
to $p$ producing a set of pairs; then apply $q$ to all the unparsed  
parts in the pairs; and then combine the results. Mathematically we would
write something like this for the result set of pairs:

\begin{center}
\begin{tabular}{lcl}
$\{((\textit{output}_1, \textit{output}_2), u_2)$ & $\,|\,$ & 
$(\textit{output}_1, u_1) \in p(\text{input}) 
\;\wedge\;$\\
&& $(\textit{output}_2, u_2) \in q(u_1)\}$
\end{tabular}
\end{center}

\noindent Notice that the $p$ will first be run on the input,
producing pairs of the form $(\textit{output}_1, u_1)$ where the $u_1$
stands for the unprocessed, or leftover, parts pf $p$. We want that
$q$ runs on all these unprocessed parts $u_1$. Therefore these
unprocessed parts are fed into the second parser $q$. The overall
result of the sequence parser combinator is pairs of the form
$((\textit{output}_1, \textit{output}_2), u_2)$. This means the
unprocessed part of the sequqnce p[arser combinator is the unprocessed
part the second parser $q$ leaves as leftover. The processed parts of
of the component parsers is a pair consisting of the outputs of $p$
and $q$, namely $(\textit{output}_1, \textit{output}_2)$. This
behaviour can be implemented in Scala as follows:

\begin{center}
\begin{lstlisting}[language=Scala,numbers=none]
class SeqParser[I, T, S]
       (p: => Parser[I, T], 
        q: => Parser[I, S]) extends Parser[I, (T, S)] {
  def parse(in: I) = 
    for ((output1, u1) <- p.parse(in); 
         (output2, u2) <- q.parse(u1)) 
            yield ((output1, output2), u2)
}
\end{lstlisting}
\end{center}

\noindent This parser takes again as arguments two parsers, \texttt{p}
and \texttt{q}. It implements \texttt{parse} as follows: let first run
the parser \texttt{p} on the input producing a set of pairs
(\texttt{output1}, \texttt{u1}). The \texttt{u1} stands for the
unprocessed parts left over by \texttt{p}. Let then \texttt{q} run on these
unprocessed parts producing again a set of pairs. The output of the
sequence parser combinator is then a set containing pairs where the
first components are again pairs, namely what the first parser could
parse together with what the second parser could parse; the second
component is the unprocessed part left over after running the second
parser \texttt{q}. Therefore the input type of the sequence parser
combinator is as usual \texttt{I}, but the output type is

\begin{center}
\texttt{(T, S)}
\end{center}

\noindent
This means \texttt{parse} in the sequence parser combinator returns
sets of type \texttt{Set[((T, S), I)]}.  Notice that we have
essentially two output types for the sequence parser combinator
(packaged in a pair), because in general \textit{p} and \textit{q}
might produce different things (for example first we recognise a
number and then a string corresponding to an operator).  If any of the
runs of \textit{p} and \textit{q} fail, that is produce the empty set,
then \texttt{parse} will also produce the empty set.

With the shorthand notation we shall introduce later for the sequence
parser combinator, we can write for example \pcode{"a" ~ "b"}, which
is the parser combinator that first recognises the character
\texttt{a} from a string and then \texttt{b}. Let us look again at
three examples of how this parser combinator processes some strings:

\begin{center}
\begin{tabular}{rcl}
input strings & & output\medskip\\
\texttt{\Grid{abcde}} & $\rightarrow$ & $\left\{((\texttt{\Grid{a}}, \texttt{\Grid{b}}), \texttt{\Grid{cde}})\right\}$\\
\texttt{\Grid{bacde}} & $\rightarrow$ & $\{\}$\\
\texttt{\Grid{cccde}} & $\rightarrow$ & $\{\}$
\end{tabular}
\end{center}

\noindent In the first line we have a successful parse, because the
string starts with \texttt{ab}, which is the prefix we are looking
for. But since the parsing combinator is constructed as sequence of
the two simple (atomic) parsers for \texttt{a} and \texttt{b}, the
result is a nested pair of the form \texttt{((a, b), cde)}. It is
\emph{not} a simple pair \texttt{(ab, cde)} as one might erroneously
expect.  The parser returns the empty set in the other examples,
because they do not fit with what the parser is supposed to parse.


A slightly more complicated parser is \pcode{("a" | "b") ~ "c"} which
parses as first character either an \texttt{a} or \texttt{b}, followed
by a \texttt{c}. This parser produces the following outputs.

\begin{center}
\begin{tabular}{rcl}
input strings & & output\medskip\\
\texttt{\Grid{acde}} & $\rightarrow$ & $\left\{((\texttt{\Grid{a}}, \texttt{\Grid{c}}), \texttt{\Grid{de}})\right\}$\\
\texttt{\Grid{bcde}} & $\rightarrow$ & $\left\{((\texttt{\Grid{b}}, \texttt{\Grid{c}}), \texttt{\Grid{de}})\right\}$\\
\texttt{\Grid{abde}} & $\rightarrow$ & $\{\}$
\end{tabular}
\end{center}

\noindent
Now consider the parser \pcode{("a" ~ "b") ~ "c"} which parses
\texttt{a}, \texttt{b}, \texttt{c} in sequence. This parser produces
the following outputs.

\begin{center}
\begin{tabular}{rcl}
input strings & & output\medskip\\
\texttt{\Grid{abcde}} & $\rightarrow$ & $\left\{(((\texttt{\Grid{a}},\texttt{\Grid{b}}), \texttt{\Grid{c}}), \texttt{\Grid{de}})\right\}$\\
\texttt{\Grid{abde}} & $\rightarrow$ & $\{\}$\\
\texttt{\Grid{bcde}} & $\rightarrow$ & $\{\}$
\end{tabular}
\end{center}


\noindent The second and third example fail, because something is
``missing'' in the sequence we are looking for. The first succeeds but
notice how the results nest with sequences: the parsed part is a
nested pair of the form \pcode{((a, b), c)}. If we nest the sequence
parser differently, for example \pcode{"a" ~ ("b" ~ "c")}, then also
our output pairs nest differently

\begin{center}
\begin{tabular}{rcl}
input strings & & output\medskip\\
\texttt{\Grid{abcde}} & $\rightarrow$ & $\left\{((\texttt{\Grid{a}},(\texttt{\Grid{b}}, \texttt{\Grid{c}})), \texttt{\Grid{de}})\right\}$\\
\end{tabular}
\end{center}

\noindent
Two more examples: first consider the parser
\pcode{("a" ~ "a") ~ "a"} and the input \pcode{aaaa}:

\begin{center}
\begin{tabular}{rcl}
input string & & output\medskip\\
\texttt{\Grid{aaaa}} & $\rightarrow$ & 
$\left\{(((\texttt{\Grid{a}}, \texttt{\Grid{a}}), \texttt{\Grid{a}}), \texttt{\Grid{a}})\right\}$\\
\end{tabular}
\end{center}

\noindent Notice how again the results nest deeper and deeper as pairs (the
last \pcode{a} is in the unprocessed part). To consume everything of
this string we can use the parser \pcode{(("a" ~ "a") ~ "a") ~
  "a"}. Then the output is as follows:

\begin{center}
\begin{tabular}{rcl}
input string & & output\medskip\\
\texttt{\Grid{aaaa}} & $\rightarrow$ & 
$\left\{((((\texttt{\Grid{a}}, \texttt{\Grid{a}}), \texttt{\Grid{a}}), \texttt{\Grid{a}}), \texttt{""})\right\}$\\
\end{tabular}
\end{center}

\noindent This is an instance where the parser consumed
completely the input, meaning the unprocessed part is just the
empty string. So if we called \pcode{parse_all}, instead of \pcode{parse},
we would get back the result

\[
\left\{(((\texttt{\Grid{a}}, \texttt{\Grid{a}}), \texttt{\Grid{a}}), \texttt{\Grid{a}})\right\}
\]

\noindent where the unprocessed (empty) parts have been stripped away
from the pairs; everything where the second part was not empty has
been thrown away as well, because they represent
ultimately-unsuccessful-parses. The main point is that the sequence
parser combinator returns pairs that can nest according to the
nesting of the component parsers.


Consider also carefully that constructing a parser such \pcode{"a" |
  ("a" ~ "b")} will result in a typing error. The intention with this
parser is that we want to parse an \texttt{a}, or an \texttt{a}
followed by a \texttt{b}. However, the first parser has as output type
a single character (recall the type of \texttt{CharParser}), but the
second parser produces a pair of characters as output. The alternative
parser is required to have both component parsers to have the same
type---we need to be able to build the union of two sets, which means
in Scala they need to be of the same type.  Since ther are not, there
is a typing error in this example.  We will see later how we can build
this parser without the typing error.

The next parser combinator, called \emph{semantic action}, does not
actually combine smaller parsers, but applies a function to the result
of a parser.  It is implemented in Scala as follows

\begin{center}
\begin{lstlisting}[language=Scala,basicstyle=\small\ttfamily, numbers=none]
class FunParser[I, T, S]
         (p: => Parser[I, T], 
          f: T => S) extends Parser[I, S] {
  def parse(in: I) = 
    for ((head, tail) <- p.parse(in)) yield (f(head), tail)
}
\end{lstlisting}
\end{center}


\noindent This parser combinator takes a parser \texttt{p} (with input
type \texttt{I} and output type \texttt{T}) as one argument but also a
function \texttt{f} (with type \texttt{T => S}). The parser \texttt{p}
produces sets of type \texttt{Set[(T, I)]}. The semantic action
combinator then applies the function \texttt{f} to all the `processed'
parser outputs. Since this function is of type \texttt{T => S}, we
obtain a parser with output type \texttt{S}. Again Scala lets us
introduce some shorthand notation for this parser
combinator. Therefore we will write \texttt{p ==> f} for it.

What are semantic actions good for? Well, they allow you to transform
the parsed input into datastructures you can use for further
processing. A simple example would be to transform parsed characters
into ASCII numbers. Suppose we define a function \texttt{f} (from
characters to ints) and use a \texttt{CharParser} for parsing
the character \texttt{c}.

\begin{center}
\begin{lstlisting}[language=Scala,basicstyle=\small\ttfamily, numbers=none]
val f = (c: Char) => c.toInt
val c = new CharParser('c')
\end{lstlisting}
\end{center}

\noindent
We then can run the following two parsers on the input \texttt{cbd}:

\begin{center}
\begin{lstlisting}[language=Scala,basicstyle=\small\ttfamily, numbers=none]
c.parse("cbd")
(c ==> f).parse("cbd")
\end{lstlisting}
\end{center}

\noindent
In the first line we obtain the expected result \texttt{Set(('c',
  "bd"))}, whereas the second produces \texttt{Set((99, "bd"))}---the
character has been transformed into an ASCII number.

A slightly less contrived example is about parsing numbers (recall
\texttt{NumParser} above). However, we want to do this here for strings.
For this assume we have the following \texttt{RegexParser}.

\begin{center}
  \begin{lstlisting}[language=Scala,xleftmargin=0mm,
    basicstyle=\small\ttfamily, numbers=none]
import scala.util.matching.Regex
    
case class RegexParser(reg: Regex) extends Parser[String, String] {
  def parse(in: String) = reg.findPrefixMatchOf(in) match {
    case None => Set()
    case Some(m) => Set((m.matched, m.after.toString))  
  }
}
\end{lstlisting}
\end{center}

\noindent
This parser takes a regex as argument and splits up a string into a
prefix and the rest according to this regex
(\texttt{reg.findPrefixMatchOf} generates a match---in the successful
case---and the corresponding strings can be extracted with
\texttt{matched} and \texttt{after}). Using this parser we can define a
\texttt{NumParser} for strings as follows:

\begin{center}
\begin{lstlisting}[language=Scala,basicstyle=\small\ttfamily, numbers=none]
val NumParser = RegexParser("[0-9]+".r)
\end{lstlisting}
\end{center}

\noindent
This parser will recognise a number at the beginning of a string, for
example

\begin{center}
\begin{lstlisting}[language=Scala,basicstyle=\small\ttfamily, numbers=none]
NumParser.parse("123abc")
\end{lstlisting}
\end{center}  

\noindent
produces \texttt{Set((123,abc))}. The problem is that \texttt{123} is
still a string (the required double-quotes are not printed by
Scala). We want to convert this string into the corresponding
\texttt{Int}. We can do this as follows using a semantic action

\begin{center}
\begin{lstlisting}[language=Scala,basicstyle=\small\ttfamily, numbers=none]
(NumParser ==> (s => s.toInt)).parse("123abc")
\end{lstlisting}
\end{center}  

\noindent
The function in the semantic action converts a string into an
\texttt{Int}. Let us come back to semantic actions when we are going
to implement actual context-free gammars.

\subsubsection*{Shorthand notation for parser combinators}

Before we proceed, let us just explain the shorthand notation for
parser combinators. Like for regular expressions, the shorthand notation
will make our life much easier when writing actual parsers. We can define
some implicits which allow us to write \pcode{p | q}, \pcode{p ~ q} and
\pcode{p ==> f} as well as to use plain strings for specifying simple
string parsers.

The idea is that this shorthand notation allows us to easily translate
context-free grammars into code. For example recall our context-free
grammar for palindromes:

\begin{plstx}[margin=3cm]
: \meta{P} ::=  a\cdot \meta{P}\cdot a | b\cdot \meta{P}\cdot b | a | b | \epsilon\\
\end{plstx}

\noindent
Each alternative in this grammar translates into an alternative parser
combinator.  The $\cdot$ can be translated to a sequence parser
combinator. The parsers for $a$, $b$ and $\epsilon$ can be simply
written as \texttt{"a"}, \texttt{"b"} and \texttt{""}.


\subsubsection*{How to build parsers using parser combinators?}

The beauty of parser combinators is the ease with which they can be
implemented and how easy it is to translate context-free grammars into
code (though the grammars need to be non-left-recursive). To
demonstrate this recall the grammar for palindromes from above.
The first idea would be to translate it into the following code

\begin{center}
\begin{lstlisting}[language=Scala,basicstyle=\small\ttfamily, numbers=none]
lazy val Pal : Parser[String, String] = 
  (("a" ~ Pal ~ "a") | ("b" ~ Pal ~ "b") | "a" | "b" | "")
\end{lstlisting}
\end{center}

\noindent
Unfortunately, this does not quite work yet as it produces a typing
error. The reason is that the parsers \texttt{"a"}, \texttt{"b"} and
\texttt{""} all produce strings as output type and therefore can be
put into an alternative \texttt{...| "a" | "b" | ""}. But both
\pcode{"a" ~ Pal ~ "a"} and \pcode{"b" ~ Pal ~ "b"} produce pairs of
the form $(((\_, \_), \_), \_)$---that is how the sequence parser
combinator nests results when \pcode{\~} is used between two
components. The solution is to use a semantic action that ``flattens''
these pairs and appends the corresponding strings, like

\begin{center}
\begin{lstlisting}[language=Scala,basicstyle=\small\ttfamily, numbers=none]
lazy val Pal : Parser[String, String] =  
  (("a" ~ Pal ~ "a") ==> { case ((x, y), z) => x + y + z } |
   ("b" ~ Pal ~ "b") ==> { case ((x, y), z) => x + y + z } |
    "a" | "b" | "")
\end{lstlisting}
\end{center}

\noindent
Now in all cases we have strings as output type for the parser variants.
The semantic action


\noindent
Important to note is that we must define \texttt{Pal}-parser as a
\emph{lazy} value.

\subsubsection*{Implementing an Interpreter}

%\bigskip
%takes advantage of the full generality---have a look
%what it produces if we call it with the string \texttt{abc}
%
%\begin{center}
%\begin{tabular}{rcl}
%input string & & output\medskip\\
%\texttt{\Grid{abc}} & $\rightarrow$ & $\left\{((\texttt{\Grid{a}}, \texttt{\Grid{b}}), \texttt{\Grid{c}})\right\}$\\
%\texttt{\Grid{bbc}} & $\rightarrow$ & $\left\{((\texttt{\Grid{b}}, \texttt{\Grid{b}}), \texttt{\Grid{c}})\right\}$\\
%\texttt{\Grid{aac}} & $\rightarrow$ & $\varnothing$
%\end{tabular}
%\end{center}



\end{document}

%%% Local Variables: 
%%% mode: latex  
%%% TeX-master: t
%%% End:
author	Christian Urban <urbanc@in.tum.de>
	Thu, 25 Oct 2018 18:36:04 +0100
changeset 590	c6a1e19e9801
parent 589	0451b8b67f62
child 591	863e502f6a5c
permissions	-rw-r--r--