handouts/ho06.tex
author Christian Urban <christian dot urban at kcl dot ac dot uk>
Wed, 25 Nov 2015 15:59:32 +0000
changeset 385 7f8516ff408d
parent 297 5c51839c88fd
child 386 31295bb945c6
permissions -rw-r--r--
updated
Ignore whitespace changes - Everywhere: Within whitespace: At end of lines:
173
7cfb7a6f7c99 added slides
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents:
diff changeset
     1
\documentclass{article}
297
5c51839c88fd updated
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 292
diff changeset
     2
\usepackage{../style}
217
cd6066f1056a updated handouts
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 183
diff changeset
     3
\usepackage{../langs}
cd6066f1056a updated handouts
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 183
diff changeset
     4
173
7cfb7a6f7c99 added slides
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents:
diff changeset
     5
7cfb7a6f7c99 added slides
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents:
diff changeset
     6
\begin{document}
7cfb7a6f7c99 added slides
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents:
diff changeset
     7
292
7ed2a25dd115 updated
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 217
diff changeset
     8
\section*{Handout 6 (Parser Combinators)}
173
7cfb7a6f7c99 added slides
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents:
diff changeset
     9
385
7f8516ff408d updated
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 297
diff changeset
    10
In what follows we explain \emph{parser combinators}, because
7f8516ff408d updated
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 297
diff changeset
    11
they are very easy to implement. However, they only work when
7f8516ff408d updated
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 297
diff changeset
    12
the grammar to be parsed is \emph{not} left-recursive and they
7f8516ff408d updated
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 297
diff changeset
    13
are efficient only when the grammar is unambiguous. It is the
7f8516ff408d updated
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 297
diff changeset
    14
responsibility of the grammar designer to ensure these two
7f8516ff408d updated
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 297
diff changeset
    15
properties.
173
7cfb7a6f7c99 added slides
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents:
diff changeset
    16
385
7f8516ff408d updated
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 297
diff changeset
    17
Parser combinators can deal with any kind of input as long as
7f8516ff408d updated
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 297
diff changeset
    18
this input is a kind of sequence, for example a string or a
7f8516ff408d updated
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 297
diff changeset
    19
list of tokens. If the input are lists of tokens, then parser
7f8516ff408d updated
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 297
diff changeset
    20
combinators transform them into sets of pairs, like so
175
5801e8c0e528 updated
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 173
diff changeset
    21
5801e8c0e528 updated
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 173
diff changeset
    22
\begin{center}
385
7f8516ff408d updated
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 297
diff changeset
    23
$\underbrace{\text{list of tokens}}_{\text{input}}$ 
7f8516ff408d updated
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 297
diff changeset
    24
$\Rightarrow$
7f8516ff408d updated
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 297
diff changeset
    25
$\underbrace{\text{set of (parsed input, unparsed input)}}_{\text{output}}$
7f8516ff408d updated
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 297
diff changeset
    26
\end{center} 
175
5801e8c0e528 updated
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 173
diff changeset
    27
385
7f8516ff408d updated
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 297
diff changeset
    28
\noindent In Scala parser combinators will therefore be
7f8516ff408d updated
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 297
diff changeset
    29
functions of type
176
3c2653fc8b5a updated
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 175
diff changeset
    30
3c2653fc8b5a updated
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 175
diff changeset
    31
\begin{center}
177
53def1fbf472 updated
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 176
diff changeset
    32
\texttt{I $\Rightarrow$ Set[(T, I)]}
176
3c2653fc8b5a updated
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 175
diff changeset
    33
\end{center}
3c2653fc8b5a updated
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 175
diff changeset
    34
385
7f8516ff408d updated
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 297
diff changeset
    35
\noindent that is they take as input something of type
7f8516ff408d updated
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 297
diff changeset
    36
\texttt{I} and return a set of pairs. The first component of
7f8516ff408d updated
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 297
diff changeset
    37
these pairs corresponds to what the parser combinator was able
7f8516ff408d updated
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 297
diff changeset
    38
to process from the input and the second is the unprocessed
7f8516ff408d updated
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 297
diff changeset
    39
part of the input. As we shall see shortly, a parser
7f8516ff408d updated
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 297
diff changeset
    40
combinator might return more than one such pair, with the idea
7f8516ff408d updated
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 297
diff changeset
    41
that there are potentially several ways how to interpret the
7f8516ff408d updated
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 297
diff changeset
    42
input. To simplify matters we will first look at the case
7f8516ff408d updated
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 297
diff changeset
    43
where the input to the parser combinators is just strings. As
7f8516ff408d updated
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 297
diff changeset
    44
a concrete example, consider the case where the input is the
7f8516ff408d updated
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 297
diff changeset
    45
string
183
b17eff695c7f added new stuff
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 177
diff changeset
    46
b17eff695c7f added new stuff
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 177
diff changeset
    47
\begin{center}
b17eff695c7f added new stuff
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 177
diff changeset
    48
\tt\Grid{iffoo\VS testbar}
b17eff695c7f added new stuff
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 177
diff changeset
    49
\end{center}
b17eff695c7f added new stuff
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 177
diff changeset
    50
385
7f8516ff408d updated
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 297
diff changeset
    51
\noindent We might have a parser combinator which tries to
7f8516ff408d updated
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 297
diff changeset
    52
interpret this string as a keyword (\texttt{if}) or an
7f8516ff408d updated
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 297
diff changeset
    53
identifier (\texttt{iffoo}). Then the output will be the set
177
53def1fbf472 updated
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 176
diff changeset
    54
183
b17eff695c7f added new stuff
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 177
diff changeset
    55
\begin{center}
b17eff695c7f added new stuff
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 177
diff changeset
    56
$\left\{ \left(\texttt{\Grid{if}}\,,\, \texttt{\Grid{foo\VS testbar}}\right), 
b17eff695c7f added new stuff
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 177
diff changeset
    57
           \left(\texttt{\Grid{iffoo}}\,,\, \texttt{\Grid{\VS testbar}}\right) \right\}$
b17eff695c7f added new stuff
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 177
diff changeset
    58
\end{center}
b17eff695c7f added new stuff
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 177
diff changeset
    59
385
7f8516ff408d updated
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 297
diff changeset
    60
\noindent where the first pair means the parser could
7f8516ff408d updated
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 297
diff changeset
    61
recognise \texttt{if} from the input and leaves the rest as
7f8516ff408d updated
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 297
diff changeset
    62
`unprocessed' as the second component of the pair; in the
7f8516ff408d updated
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 297
diff changeset
    63
other case it could recognise \texttt{iffoo} and leaves
7f8516ff408d updated
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 297
diff changeset
    64
\texttt{\VS testbar} as unprocessed. If the parser cannot
7f8516ff408d updated
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 297
diff changeset
    65
recognise anything from the input then parser combinators just
7f8516ff408d updated
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 297
diff changeset
    66
return the empty set $\{\}$. This will indicate something
7f8516ff408d updated
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 297
diff changeset
    67
``went wrong''\ldots or more precisely, nothing could be
7f8516ff408d updated
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 297
diff changeset
    68
parsed.
183
b17eff695c7f added new stuff
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 177
diff changeset
    69
385
7f8516ff408d updated
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 297
diff changeset
    70
The main attraction of parser combinators is that we can
7f8516ff408d updated
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 297
diff changeset
    71
easily build parser combinators out of smaller components
7f8516ff408d updated
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 297
diff changeset
    72
following very closely the structure of a grammar. In order to
7f8516ff408d updated
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 297
diff changeset
    73
implement this in an object-oriented programming language,
7f8516ff408d updated
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 297
diff changeset
    74
like Scala, we need to specify an abstract class for parser
7f8516ff408d updated
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 297
diff changeset
    75
combinators. This abstract class requires the implementation
7f8516ff408d updated
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 297
diff changeset
    76
of the function \texttt{parse} taking an argument of type
7f8516ff408d updated
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 297
diff changeset
    77
\texttt{I} and returns a set of type \mbox{\texttt{Set[(T,
7f8516ff408d updated
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 297
diff changeset
    78
I)]}}.
177
53def1fbf472 updated
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 176
diff changeset
    79
53def1fbf472 updated
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 176
diff changeset
    80
\begin{center}
385
7f8516ff408d updated
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 297
diff changeset
    81
\begin{lstlisting}[language=Scala]
177
53def1fbf472 updated
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 176
diff changeset
    82
abstract class Parser[I, T] {
53def1fbf472 updated
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 176
diff changeset
    83
  def parse(ts: I): Set[(T, I)]
53def1fbf472 updated
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 176
diff changeset
    84
53def1fbf472 updated
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 176
diff changeset
    85
  def parse_all(ts: I): Set[T] =
53def1fbf472 updated
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 176
diff changeset
    86
    for ((head, tail) <- parse(ts); if (tail.isEmpty)) 
53def1fbf472 updated
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 176
diff changeset
    87
      yield head
53def1fbf472 updated
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 176
diff changeset
    88
}
53def1fbf472 updated
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 176
diff changeset
    89
\end{lstlisting}
53def1fbf472 updated
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 176
diff changeset
    90
\end{center}
176
3c2653fc8b5a updated
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 175
diff changeset
    91
385
7f8516ff408d updated
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 297
diff changeset
    92
\noindent From the function \texttt{parse} we can then
7f8516ff408d updated
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 297
diff changeset
    93
``centrally'' derive the function \texttt{parse\_all}, which
7f8516ff408d updated
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 297
diff changeset
    94
just filters out all pairs whose second component is not empty
7f8516ff408d updated
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 297
diff changeset
    95
(that is has still some unprocessed part). The reason is that
7f8516ff408d updated
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 297
diff changeset
    96
at the end of parsing we are only interested in the results
7f8516ff408d updated
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 297
diff changeset
    97
where all the input has been consumed and no unprocessed part
7f8516ff408d updated
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 297
diff changeset
    98
is left.
183
b17eff695c7f added new stuff
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 177
diff changeset
    99
385
7f8516ff408d updated
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 297
diff changeset
   100
One of the simplest parser combinators recognises just a
7f8516ff408d updated
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 297
diff changeset
   101
character, say $c$, from the beginning of strings. Its
7f8516ff408d updated
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 297
diff changeset
   102
behaviour can be described as follows:
176
3c2653fc8b5a updated
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 175
diff changeset
   103
177
53def1fbf472 updated
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 176
diff changeset
   104
\begin{itemize}
53def1fbf472 updated
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 176
diff changeset
   105
\item if the head of the input string starts with a $c$, it returns 
385
7f8516ff408d updated
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 297
diff changeset
   106
	the set $\{(c, \textit{tail of}\; s)\}$, where \textit{tail of} 
7f8516ff408d updated
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 297
diff changeset
   107
	$s$ is the unprocessed part of the input string
7f8516ff408d updated
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 297
diff changeset
   108
\item otherwise it returns the empty set $\{\}$	
177
53def1fbf472 updated
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 176
diff changeset
   109
\end{itemize}
53def1fbf472 updated
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 176
diff changeset
   110
53def1fbf472 updated
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 176
diff changeset
   111
\noindent
53def1fbf472 updated
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 176
diff changeset
   112
The input type of this simple parser combinator for characters is
53def1fbf472 updated
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 176
diff changeset
   113
\texttt{String} and the output type \mbox{\texttt{Set[(Char, String)]}}. 
53def1fbf472 updated
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 176
diff changeset
   114
The code in Scala is as follows:
53def1fbf472 updated
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 176
diff changeset
   115
53def1fbf472 updated
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 176
diff changeset
   116
\begin{center}
53def1fbf472 updated
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 176
diff changeset
   117
\begin{lstlisting}[language=Scala,basicstyle=\small\ttfamily, numbers=none]
53def1fbf472 updated
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 176
diff changeset
   118
case class CharParser(c: Char) extends Parser[String, Char] {
53def1fbf472 updated
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 176
diff changeset
   119
  def parse(sb: String) = 
53def1fbf472 updated
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 176
diff changeset
   120
    if (sb.head == c) Set((c, sb.tail)) else Set()
53def1fbf472 updated
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 176
diff changeset
   121
}
53def1fbf472 updated
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 176
diff changeset
   122
\end{lstlisting}
53def1fbf472 updated
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 176
diff changeset
   123
\end{center}
176
3c2653fc8b5a updated
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 175
diff changeset
   124
385
7f8516ff408d updated
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 297
diff changeset
   125
\noindent The \texttt{parse} function tests whether the first
7f8516ff408d updated
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 297
diff changeset
   126
character of the input string \texttt{sb} is equal to
7f8516ff408d updated
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 297
diff changeset
   127
\texttt{c}. If yes, then it splits the string into the
7f8516ff408d updated
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 297
diff changeset
   128
recognised part \texttt{c} and the unprocessed part
7f8516ff408d updated
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 297
diff changeset
   129
\texttt{sb.tail}. In case \texttt{sb} does not start with
7f8516ff408d updated
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 297
diff changeset
   130
\texttt{c} then the parser returns the empty set (in Scala
7f8516ff408d updated
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 297
diff changeset
   131
\texttt{Set()}).
7f8516ff408d updated
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 297
diff changeset
   132
7f8516ff408d updated
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 297
diff changeset
   133
7f8516ff408d updated
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 297
diff changeset
   134
More interesting are the parser combinators that build larger
7f8516ff408d updated
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 297
diff changeset
   135
parsers out of smaller component parsers. For example the
7f8516ff408d updated
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 297
diff changeset
   136
alternative parser combinator is as follows: given two
7f8516ff408d updated
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 297
diff changeset
   137
parsers, say, $p$ and $q$, then we apply both parsers to the
7f8516ff408d updated
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 297
diff changeset
   138
input (remember parsers are functions) and combine the input
7f8516ff408d updated
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 297
diff changeset
   139
 
7f8516ff408d updated
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 297
diff changeset
   140
\begin{center}
7f8516ff408d updated
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 297
diff changeset
   141
$p(\text{input}) \cup q(\text{input})$
7f8516ff408d updated
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 297
diff changeset
   142
\end{center}
7f8516ff408d updated
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 297
diff changeset
   143
183
b17eff695c7f added new stuff
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 177
diff changeset
   144
\noindent
385
7f8516ff408d updated
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 297
diff changeset
   145
In Scala we would implement alternative parsers as follows
183
b17eff695c7f added new stuff
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 177
diff changeset
   146
b17eff695c7f added new stuff
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 177
diff changeset
   147
\begin{center}
385
7f8516ff408d updated
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 297
diff changeset
   148
\begin{lstlisting}[language=Scala, numbers=none]
183
b17eff695c7f added new stuff
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 177
diff changeset
   149
class AltParser[I, T]
b17eff695c7f added new stuff
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 177
diff changeset
   150
       (p: => Parser[I, T], 
b17eff695c7f added new stuff
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 177
diff changeset
   151
        q: => Parser[I, T]) extends Parser[I, T] {
b17eff695c7f added new stuff
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 177
diff changeset
   152
  def parse(sb: I) = p.parse(sb) ++ q.parse(sb)
b17eff695c7f added new stuff
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 177
diff changeset
   153
}
b17eff695c7f added new stuff
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 177
diff changeset
   154
\end{lstlisting}
b17eff695c7f added new stuff
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 177
diff changeset
   155
\end{center}
b17eff695c7f added new stuff
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 177
diff changeset
   156
385
7f8516ff408d updated
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 297
diff changeset
   157
\noindent The types of this parser combinator are polymorphic
7f8516ff408d updated
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 297
diff changeset
   158
(we just have \texttt{I} for the input type, and \texttt{T}
7f8516ff408d updated
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 297
diff changeset
   159
for the output type). The alternative parser builds a new
7f8516ff408d updated
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 297
diff changeset
   160
parser out of two existing parser combinator \texttt{p} and
7f8516ff408d updated
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 297
diff changeset
   161
\texttt{q}. Both need to be able to process input of type
7f8516ff408d updated
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 297
diff changeset
   162
\texttt{I} and return the same output type \texttt{Set[(T,
7f8516ff408d updated
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 297
diff changeset
   163
I)]}. (There is an interesting detail of Scala, namely the
7f8516ff408d updated
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 297
diff changeset
   164
\texttt{=>} in front of the types of \texttt{p} and
7f8516ff408d updated
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 297
diff changeset
   165
\texttt{q}. They will prevent the evaluation of the arguments
7f8516ff408d updated
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 297
diff changeset
   166
before they are used. This is often called \emph{lazy
7f8516ff408d updated
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 297
diff changeset
   167
evaluation} of the arguments.) The alternative parser should
7f8516ff408d updated
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 297
diff changeset
   168
run the input with the first parser \texttt{p} (producing a
7f8516ff408d updated
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 297
diff changeset
   169
set of outputs) and then run the same input with \texttt{q}.
7f8516ff408d updated
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 297
diff changeset
   170
The result should be then just the union of both sets, which
7f8516ff408d updated
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 297
diff changeset
   171
is the operation \texttt{++} in Scala.
183
b17eff695c7f added new stuff
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 177
diff changeset
   172
385
7f8516ff408d updated
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 297
diff changeset
   173
This parser combinator already allows us to construct a parser
7f8516ff408d updated
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 297
diff changeset
   174
that either a character \texttt{a} or \texttt{b}, as
183
b17eff695c7f added new stuff
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 177
diff changeset
   175
b17eff695c7f added new stuff
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 177
diff changeset
   176
\begin{center}
385
7f8516ff408d updated
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 297
diff changeset
   177
\begin{lstlisting}[language=Scala, numbers=none]
183
b17eff695c7f added new stuff
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 177
diff changeset
   178
new AltParser(CharParser('a'), CharParser('b'))
b17eff695c7f added new stuff
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 177
diff changeset
   179
\end{lstlisting}
b17eff695c7f added new stuff
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 177
diff changeset
   180
\end{center}
b17eff695c7f added new stuff
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 177
diff changeset
   181
385
7f8516ff408d updated
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 297
diff changeset
   182
\noindent Scala allows us to introduce some more readable
7f8516ff408d updated
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 297
diff changeset
   183
shorthand notation for this, like \texttt{'a' || 'b'}. We can
7f8516ff408d updated
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 297
diff changeset
   184
call this parser combinator with the strings
183
b17eff695c7f added new stuff
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 177
diff changeset
   185
b17eff695c7f added new stuff
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 177
diff changeset
   186
\begin{center}
b17eff695c7f added new stuff
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 177
diff changeset
   187
\begin{tabular}{rcl}
385
7f8516ff408d updated
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 297
diff changeset
   188
input strings & & output\medskip\\
183
b17eff695c7f added new stuff
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 177
diff changeset
   189
\texttt{\Grid{ac}} & $\rightarrow$ & $\left\{(\texttt{\Grid{a}}, \texttt{\Grid{c}})\right\}$\\
b17eff695c7f added new stuff
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 177
diff changeset
   190
\texttt{\Grid{bc}} & $\rightarrow$ & $\left\{(\texttt{\Grid{b}}, \texttt{\Grid{c}})\right\}$\\
385
7f8516ff408d updated
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 297
diff changeset
   191
\texttt{\Grid{cc}} & $\rightarrow$ & $\{\}$
7f8516ff408d updated
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 297
diff changeset
   192
\end{tabular}
7f8516ff408d updated
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 297
diff changeset
   193
\end{center}
7f8516ff408d updated
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 297
diff changeset
   194
7f8516ff408d updated
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 297
diff changeset
   195
\noindent We receive in the first two cases a successful
7f8516ff408d updated
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 297
diff changeset
   196
output (that is a non-empty set). In each case, either
7f8516ff408d updated
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 297
diff changeset
   197
\pcode{a} or \pcode{b} are the processed parts, and \pcode{c}
7f8516ff408d updated
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 297
diff changeset
   198
is the unprocessed part. Clearly this parser cannot parse
7f8516ff408d updated
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 297
diff changeset
   199
anything with the string \pcode{cc}.
7f8516ff408d updated
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 297
diff changeset
   200
7f8516ff408d updated
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 297
diff changeset
   201
A bit more interesting is the \emph{sequence parser
7f8516ff408d updated
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 297
diff changeset
   202
combinator}. Given two parsers, say, $p$ and $q$,
7f8516ff408d updated
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 297
diff changeset
   203
apply first the input to $p$ producing a set of pairs;
7f8516ff408d updated
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 297
diff changeset
   204
then apply $q$ to the unparsed parts; then combine the 
7f8516ff408d updated
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 297
diff changeset
   205
results like
7f8516ff408d updated
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 297
diff changeset
   206
7f8516ff408d updated
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 297
diff changeset
   207
\begin{center}
7f8516ff408d updated
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 297
diff changeset
   208
\begin{tabular}{lcl}
7f8516ff408d updated
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 297
diff changeset
   209
$\{((\textit{output}_1, \textit{output}_2), u_2)$ & $\;|\;$ & 
7f8516ff408d updated
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 297
diff changeset
   210
$(\textit{output}_1, u_1) \in p(\text{input}) 
7f8516ff408d updated
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 297
diff changeset
   211
\;\wedge\;$\\
7f8516ff408d updated
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 297
diff changeset
   212
&& $(\textit{output}_2, u_2) \in q(u_1)\}$
183
b17eff695c7f added new stuff
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 177
diff changeset
   213
\end{tabular}
b17eff695c7f added new stuff
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 177
diff changeset
   214
\end{center}
b17eff695c7f added new stuff
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 177
diff changeset
   215
b17eff695c7f added new stuff
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 177
diff changeset
   216
\noindent
385
7f8516ff408d updated
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 297
diff changeset
   217
This can be implemented in Scala as follows:
183
b17eff695c7f added new stuff
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 177
diff changeset
   218
b17eff695c7f added new stuff
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 177
diff changeset
   219
\begin{center}
385
7f8516ff408d updated
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 297
diff changeset
   220
\begin{lstlisting}[language=Scala,numbers=none]
183
b17eff695c7f added new stuff
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 177
diff changeset
   221
class SeqParser[I, T, S]
b17eff695c7f added new stuff
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 177
diff changeset
   222
       (p: => Parser[I, T], 
b17eff695c7f added new stuff
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 177
diff changeset
   223
        q: => Parser[I, S]) extends Parser[I, (T, S)] {
b17eff695c7f added new stuff
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 177
diff changeset
   224
  def parse(sb: I) = 
b17eff695c7f added new stuff
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 177
diff changeset
   225
    for ((head1, tail1) <- p.parse(sb); 
b17eff695c7f added new stuff
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 177
diff changeset
   226
         (head2, tail2) <- q.parse(tail1)) 
b17eff695c7f added new stuff
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 177
diff changeset
   227
            yield ((head1, head2), tail2)
b17eff695c7f added new stuff
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 177
diff changeset
   228
}
b17eff695c7f added new stuff
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 177
diff changeset
   229
\end{lstlisting}
b17eff695c7f added new stuff
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 177
diff changeset
   230
\end{center}
b17eff695c7f added new stuff
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 177
diff changeset
   231
385
7f8516ff408d updated
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 297
diff changeset
   232
\noindent This parser takes as input two parsers, \texttt{p}
7f8516ff408d updated
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 297
diff changeset
   233
and \texttt{q}. It implements \texttt{parse} as follows: let
7f8516ff408d updated
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 297
diff changeset
   234
first run the parser \texttt{p} on the input producing a set
7f8516ff408d updated
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 297
diff changeset
   235
of pairs (\texttt{head1}, \texttt{tail1}). The \texttt{tail1}
7f8516ff408d updated
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 297
diff changeset
   236
stands for the unprocessed parts left over by \texttt{p}. Let
7f8516ff408d updated
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 297
diff changeset
   237
\texttt{q} run on these unprocessed parts producing again a
7f8516ff408d updated
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 297
diff changeset
   238
set of pairs. The output of the sequence parser combinator is
7f8516ff408d updated
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 297
diff changeset
   239
then a set containing pairs where the first components are
7f8516ff408d updated
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 297
diff changeset
   240
again pairs, namely what the first parser could parse together
7f8516ff408d updated
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 297
diff changeset
   241
with what the second parser could parse; the second component
7f8516ff408d updated
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 297
diff changeset
   242
is the unprocessed part left over after running the second
7f8516ff408d updated
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 297
diff changeset
   243
parser \texttt{q}. Therefore the input type of the sequence
7f8516ff408d updated
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 297
diff changeset
   244
parser combinator is as usual \texttt{I}, but the output type
7f8516ff408d updated
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 297
diff changeset
   245
is
183
b17eff695c7f added new stuff
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 177
diff changeset
   246
b17eff695c7f added new stuff
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 177
diff changeset
   247
\begin{center}
b17eff695c7f added new stuff
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 177
diff changeset
   248
\texttt{Set[((T, S), I)]}
b17eff695c7f added new stuff
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 177
diff changeset
   249
\end{center}
b17eff695c7f added new stuff
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 177
diff changeset
   250
385
7f8516ff408d updated
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 297
diff changeset
   251
Scala allows us to provide some shorthand notation for the
7f8516ff408d updated
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 297
diff changeset
   252
sequence parser combinator. So we can write for example
7f8516ff408d updated
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 297
diff changeset
   253
\texttt{'a' $\sim$ 'b'}, which is the parser combinator that
7f8516ff408d updated
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 297
diff changeset
   254
first consumes the character \texttt{a} from a string and then
7f8516ff408d updated
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 297
diff changeset
   255
\texttt{b}. Three examples of this parser combinator are as
7f8516ff408d updated
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 297
diff changeset
   256
follows:
7f8516ff408d updated
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 297
diff changeset
   257
7f8516ff408d updated
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 297
diff changeset
   258
\begin{center}
7f8516ff408d updated
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 297
diff changeset
   259
\begin{tabular}{rcl}
7f8516ff408d updated
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 297
diff changeset
   260
input strings & & output\medskip\\
7f8516ff408d updated
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 297
diff changeset
   261
\texttt{\Grid{abc}} & $\rightarrow$ & $\left\{((\texttt{\Grid{a}}, \texttt{\Grid{b}}), \texttt{\Grid{c}})\right\}$\\
7f8516ff408d updated
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 297
diff changeset
   262
\texttt{\Grid{bac}} & $\rightarrow$ & $\{\}$\\
7f8516ff408d updated
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 297
diff changeset
   263
\texttt{\Grid{ccc}} & $\rightarrow$ & $\{\}$
7f8516ff408d updated
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 297
diff changeset
   264
\end{tabular}
7f8516ff408d updated
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 297
diff changeset
   265
\end{center}
7f8516ff408d updated
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 297
diff changeset
   266
7f8516ff408d updated
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 297
diff changeset
   267
\noindent A slightly more complicated parser is \texttt{('a'
7f8516ff408d updated
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 297
diff changeset
   268
|| 'b') $\sim$ 'b'} which parses as first character either an
7f8516ff408d updated
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 297
diff changeset
   269
\texttt{a} or \texttt{b} followed by a \texttt{b}. This parser
7f8516ff408d updated
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 297
diff changeset
   270
produces the following results.
7f8516ff408d updated
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 297
diff changeset
   271
7f8516ff408d updated
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 297
diff changeset
   272
\begin{center}
7f8516ff408d updated
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 297
diff changeset
   273
\begin{tabular}{rcl}
7f8516ff408d updated
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 297
diff changeset
   274
input strings & & output\medskip\\
7f8516ff408d updated
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 297
diff changeset
   275
\texttt{\Grid{abc}} & $\rightarrow$ & $\left\{((\texttt{\Grid{a}}, \texttt{\Grid{b}}), \texttt{\Grid{c}})\right\}$\\
7f8516ff408d updated
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 297
diff changeset
   276
\texttt{\Grid{bbc}} & $\rightarrow$ & $\left\{((\texttt{\Grid{b}}, \texttt{\Grid{b}}), \texttt{\Grid{c}})\right\}$\\
7f8516ff408d updated
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 297
diff changeset
   277
\texttt{\Grid{aac}} & $\rightarrow$ & $\{\}$
7f8516ff408d updated
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 297
diff changeset
   278
\end{tabular}
7f8516ff408d updated
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 297
diff changeset
   279
\end{center}
7f8516ff408d updated
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 297
diff changeset
   280
7f8516ff408d updated
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 297
diff changeset
   281
\noindent Two more examples: first consider the parser \pcode{('a' ~
7f8516ff408d updated
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 297
diff changeset
   282
'a') ~ 'a'} and the input \pcode{aaaa}:
183
b17eff695c7f added new stuff
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 177
diff changeset
   283
b17eff695c7f added new stuff
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 177
diff changeset
   284
\begin{center}
b17eff695c7f added new stuff
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 177
diff changeset
   285
\begin{tabular}{rcl}
b17eff695c7f added new stuff
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 177
diff changeset
   286
input string & & output\medskip\\
385
7f8516ff408d updated
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 297
diff changeset
   287
\texttt{\Grid{aaaa}} & $\rightarrow$ & 
7f8516ff408d updated
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 297
diff changeset
   288
$\left\{(((\texttt{\Grid{a}}, \texttt{\Grid{a}}), \texttt{\Grid{a}}), \texttt{\Grid{a}})\right\}$\\
183
b17eff695c7f added new stuff
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 177
diff changeset
   289
\end{tabular}
b17eff695c7f added new stuff
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 177
diff changeset
   290
\end{center}
b17eff695c7f added new stuff
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 177
diff changeset
   291
385
7f8516ff408d updated
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 297
diff changeset
   292
\noindent Notice how the results nest deeper as pairs (the
7f8516ff408d updated
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 297
diff changeset
   293
last \pcode{a} is in the unprocessed part). To consume
7f8516ff408d updated
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 297
diff changeset
   294
everything we can use the parser \pcode{(('a' ~'a') ~ 'a') ~
7f8516ff408d updated
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 297
diff changeset
   295
'a'}. Then the output is as follows:
183
b17eff695c7f added new stuff
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 177
diff changeset
   296
b17eff695c7f added new stuff
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 177
diff changeset
   297
\begin{center}
b17eff695c7f added new stuff
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 177
diff changeset
   298
\begin{tabular}{rcl}
b17eff695c7f added new stuff
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 177
diff changeset
   299
input string & & output\medskip\\
385
7f8516ff408d updated
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 297
diff changeset
   300
\texttt{\Grid{aaaa}} & $\rightarrow$ & 
7f8516ff408d updated
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 297
diff changeset
   301
$\left\{((((\texttt{\Grid{a}}, \texttt{\Grid{a}}), \texttt{\Grid{a}}), \texttt{\Grid{a}}), \texttt{""})\right\}$\\
183
b17eff695c7f added new stuff
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 177
diff changeset
   302
\end{tabular}
b17eff695c7f added new stuff
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 177
diff changeset
   303
\end{center}
b17eff695c7f added new stuff
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 177
diff changeset
   304
385
7f8516ff408d updated
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 297
diff changeset
   305
\noindent This is an instance where the parser consumed
7f8516ff408d updated
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 297
diff changeset
   306
completely the input, meaning the unprocessed part is just the
7f8516ff408d updated
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 297
diff changeset
   307
empty string.
7f8516ff408d updated
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 297
diff changeset
   308
183
b17eff695c7f added new stuff
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 177
diff changeset
   309
385
7f8516ff408d updated
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 297
diff changeset
   310
Note carefully that constructing a parser such \texttt{'a' ||
7f8516ff408d updated
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 297
diff changeset
   311
('a' $\sim$ 'b')} will result in a tying error. The first
7f8516ff408d updated
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 297
diff changeset
   312
parser has as output type a single character (recall the type
7f8516ff408d updated
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 297
diff changeset
   313
of \texttt{CharParser}), but the second parser produces a pair
7f8516ff408d updated
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 297
diff changeset
   314
of characters as output. The alternative parser is however
7f8516ff408d updated
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 297
diff changeset
   315
required to have both component parsers to have the same type.
7f8516ff408d updated
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 297
diff changeset
   316
We will see later how we can build this parser without the
7f8516ff408d updated
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 297
diff changeset
   317
typing error.
7f8516ff408d updated
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 297
diff changeset
   318
7f8516ff408d updated
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 297
diff changeset
   319
The next parser combinator does not actually combine smaller
7f8516ff408d updated
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 297
diff changeset
   320
parsers, but applies a function to the result of the parser.
7f8516ff408d updated
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 297
diff changeset
   321
It is implemented in Scala as follows
183
b17eff695c7f added new stuff
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 177
diff changeset
   322
b17eff695c7f added new stuff
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 177
diff changeset
   323
\begin{center}
b17eff695c7f added new stuff
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 177
diff changeset
   324
\begin{lstlisting}[language=Scala,basicstyle=\small\ttfamily, numbers=none]
b17eff695c7f added new stuff
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 177
diff changeset
   325
class FunParser[I, T, S]
b17eff695c7f added new stuff
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 177
diff changeset
   326
         (p: => Parser[I, T], 
b17eff695c7f added new stuff
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 177
diff changeset
   327
          f: T => S) extends Parser[I, S] {
b17eff695c7f added new stuff
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 177
diff changeset
   328
  def parse(sb: I) = 
b17eff695c7f added new stuff
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 177
diff changeset
   329
    for ((head, tail) <- p.parse(sb)) yield (f(head), tail)
b17eff695c7f added new stuff
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 177
diff changeset
   330
}
b17eff695c7f added new stuff
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 177
diff changeset
   331
\end{lstlisting}
b17eff695c7f added new stuff
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 177
diff changeset
   332
\end{center}
b17eff695c7f added new stuff
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 177
diff changeset
   333
b17eff695c7f added new stuff
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 177
diff changeset
   334
b17eff695c7f added new stuff
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 177
diff changeset
   335
\noindent
b17eff695c7f added new stuff
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 177
diff changeset
   336
This parser combinator takes a parser \texttt{p} with output type \texttt{T} as
b17eff695c7f added new stuff
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 177
diff changeset
   337
input as well as a function \texttt{f} with type \texttt{T => S}. The parser \texttt{p}
b17eff695c7f added new stuff
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 177
diff changeset
   338
produces sets of type \texttt{(T, I)}. The \texttt{FunParser} combinator then
b17eff695c7f added new stuff
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 177
diff changeset
   339
applies the function \texttt{f} to all the parer outputs. Since this function
b17eff695c7f added new stuff
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 177
diff changeset
   340
is of type \texttt{T => S}, we obtain a parser with output type \texttt{S}.
b17eff695c7f added new stuff
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 177
diff changeset
   341
Again Scala lets us introduce some shorthand notation for this parser combinator. 
b17eff695c7f added new stuff
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 177
diff changeset
   342
Therefore we will write \texttt{p ==> f} for it.
b17eff695c7f added new stuff
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 177
diff changeset
   343
b17eff695c7f added new stuff
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 177
diff changeset
   344
%\bigskip
b17eff695c7f added new stuff
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 177
diff changeset
   345
%takes advantage of the full generality---have a look
b17eff695c7f added new stuff
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 177
diff changeset
   346
%what it produces if we call it with the string \texttt{abc}
b17eff695c7f added new stuff
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 177
diff changeset
   347
%
b17eff695c7f added new stuff
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 177
diff changeset
   348
%\begin{center}
b17eff695c7f added new stuff
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 177
diff changeset
   349
%\begin{tabular}{rcl}
b17eff695c7f added new stuff
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 177
diff changeset
   350
%input string & & output\medskip\\
b17eff695c7f added new stuff
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 177
diff changeset
   351
%\texttt{\Grid{abc}} & $\rightarrow$ & $\left\{((\texttt{\Grid{a}}, \texttt{\Grid{b}}), \texttt{\Grid{c}})\right\}$\\
b17eff695c7f added new stuff
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 177
diff changeset
   352
%\texttt{\Grid{bbc}} & $\rightarrow$ & $\left\{((\texttt{\Grid{b}}, \texttt{\Grid{b}}), \texttt{\Grid{c}})\right\}$\\
b17eff695c7f added new stuff
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 177
diff changeset
   353
%\texttt{\Grid{aac}} & $\rightarrow$ & $\varnothing$
b17eff695c7f added new stuff
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 177
diff changeset
   354
%\end{tabular}
b17eff695c7f added new stuff
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 177
diff changeset
   355
%\end{center}
b17eff695c7f added new stuff
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 177
diff changeset
   356
b17eff695c7f added new stuff
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 177
diff changeset
   357
173
7cfb7a6f7c99 added slides
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents:
diff changeset
   358
7cfb7a6f7c99 added slides
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents:
diff changeset
   359
\end{document}
7cfb7a6f7c99 added slides
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents:
diff changeset
   360
7cfb7a6f7c99 added slides
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents:
diff changeset
   361
%%% Local Variables: 
7cfb7a6f7c99 added slides
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents:
diff changeset
   362
%%% mode: latex  
7cfb7a6f7c99 added slides
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents:
diff changeset
   363
%%% TeX-master: t
7cfb7a6f7c99 added slides
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents:
diff changeset
   364
%%% End: