handouts/ho06.tex
author Christian Urban <christian dot urban at kcl dot ac dot uk>
Thu, 26 Nov 2015 00:11:10 +0000
changeset 386 31295bb945c6
parent 385 7f8516ff408d
child 392 2d0a59127694
permissions -rw-r--r--
update
Ignore whitespace changes - Everywhere: Within whitespace: At end of lines:
173
7cfb7a6f7c99 added slides
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents:
diff changeset
     1
\documentclass{article}
297
5c51839c88fd updated
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 292
diff changeset
     2
\usepackage{../style}
217
cd6066f1056a updated handouts
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 183
diff changeset
     3
\usepackage{../langs}
cd6066f1056a updated handouts
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 183
diff changeset
     4
173
7cfb7a6f7c99 added slides
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents:
diff changeset
     5
7cfb7a6f7c99 added slides
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents:
diff changeset
     6
\begin{document}
7cfb7a6f7c99 added slides
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents:
diff changeset
     7
292
7ed2a25dd115 updated
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 217
diff changeset
     8
\section*{Handout 6 (Parser Combinators)}
173
7cfb7a6f7c99 added slides
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents:
diff changeset
     9
386
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 385
diff changeset
    10
In what follows we explain \emph{parser combinators}. Their
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 385
diff changeset
    11
distinguishing feature is that they are very easy to
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 385
diff changeset
    12
implement. However, they only work when the grammar to be
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 385
diff changeset
    13
parsed is \emph{not} left-recursive and they are efficient
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 385
diff changeset
    14
only when the grammar is unambiguous. It is the responsibility
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 385
diff changeset
    15
of the grammar designer to ensure these two properties.
173
7cfb7a6f7c99 added slides
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents:
diff changeset
    16
385
7f8516ff408d updated
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 297
diff changeset
    17
Parser combinators can deal with any kind of input as long as
7f8516ff408d updated
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 297
diff changeset
    18
this input is a kind of sequence, for example a string or a
386
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 385
diff changeset
    19
list of tokens. The general idea behind parser combinators is
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 385
diff changeset
    20
to transform the input into sets of pairs, like so
175
5801e8c0e528 updated
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 173
diff changeset
    21
5801e8c0e528 updated
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 173
diff changeset
    22
\begin{center}
385
7f8516ff408d updated
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 297
diff changeset
    23
$\underbrace{\text{list of tokens}}_{\text{input}}$ 
7f8516ff408d updated
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 297
diff changeset
    24
$\Rightarrow$
7f8516ff408d updated
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 297
diff changeset
    25
$\underbrace{\text{set of (parsed input, unparsed input)}}_{\text{output}}$
7f8516ff408d updated
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 297
diff changeset
    26
\end{center} 
175
5801e8c0e528 updated
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 173
diff changeset
    27
386
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 385
diff changeset
    28
\noindent In Scala parser combinators are functions of type
176
3c2653fc8b5a updated
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 175
diff changeset
    29
3c2653fc8b5a updated
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 175
diff changeset
    30
\begin{center}
177
53def1fbf472 updated
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 176
diff changeset
    31
\texttt{I $\Rightarrow$ Set[(T, I)]}
176
3c2653fc8b5a updated
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 175
diff changeset
    32
\end{center}
3c2653fc8b5a updated
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 175
diff changeset
    33
385
7f8516ff408d updated
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 297
diff changeset
    34
\noindent that is they take as input something of type
7f8516ff408d updated
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 297
diff changeset
    35
\texttt{I} and return a set of pairs. The first component of
7f8516ff408d updated
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 297
diff changeset
    36
these pairs corresponds to what the parser combinator was able
7f8516ff408d updated
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 297
diff changeset
    37
to process from the input and the second is the unprocessed
7f8516ff408d updated
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 297
diff changeset
    38
part of the input. As we shall see shortly, a parser
386
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 385
diff changeset
    39
combinator might return more than one such pair, the idea
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 385
diff changeset
    40
being that there are potentially several ways how to interpret
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 385
diff changeset
    41
the input. To simplify matters we will first look at the case
385
7f8516ff408d updated
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 297
diff changeset
    42
where the input to the parser combinators is just strings. As
386
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 385
diff changeset
    43
a concrete example, consider the string
183
b17eff695c7f added new stuff
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 177
diff changeset
    44
b17eff695c7f added new stuff
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 177
diff changeset
    45
\begin{center}
b17eff695c7f added new stuff
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 177
diff changeset
    46
\tt\Grid{iffoo\VS testbar}
b17eff695c7f added new stuff
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 177
diff changeset
    47
\end{center}
b17eff695c7f added new stuff
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 177
diff changeset
    48
385
7f8516ff408d updated
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 297
diff changeset
    49
\noindent We might have a parser combinator which tries to
386
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 385
diff changeset
    50
interpret this string as a keyword (\texttt{if}) or as an
385
7f8516ff408d updated
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 297
diff changeset
    51
identifier (\texttt{iffoo}). Then the output will be the set
177
53def1fbf472 updated
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 176
diff changeset
    52
183
b17eff695c7f added new stuff
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 177
diff changeset
    53
\begin{center}
386
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 385
diff changeset
    54
$\left\{ \left(\texttt{\Grid{if}}\;,\; \texttt{\Grid{foo\VS testbar}}\right), 
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 385
diff changeset
    55
           \left(\texttt{\Grid{iffoo}}\;,\; \texttt{\Grid{\VS testbar}}\right) \right\}$
183
b17eff695c7f added new stuff
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 177
diff changeset
    56
\end{center}
b17eff695c7f added new stuff
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 177
diff changeset
    57
385
7f8516ff408d updated
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 297
diff changeset
    58
\noindent where the first pair means the parser could
7f8516ff408d updated
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 297
diff changeset
    59
recognise \texttt{if} from the input and leaves the rest as
7f8516ff408d updated
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 297
diff changeset
    60
`unprocessed' as the second component of the pair; in the
7f8516ff408d updated
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 297
diff changeset
    61
other case it could recognise \texttt{iffoo} and leaves
7f8516ff408d updated
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 297
diff changeset
    62
\texttt{\VS testbar} as unprocessed. If the parser cannot
386
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 385
diff changeset
    63
recognise anything from the input, then parser combinators just
385
7f8516ff408d updated
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 297
diff changeset
    64
return the empty set $\{\}$. This will indicate something
7f8516ff408d updated
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 297
diff changeset
    65
``went wrong''\ldots or more precisely, nothing could be
7f8516ff408d updated
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 297
diff changeset
    66
parsed.
183
b17eff695c7f added new stuff
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 177
diff changeset
    67
385
7f8516ff408d updated
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 297
diff changeset
    68
The main attraction of parser combinators is that we can
7f8516ff408d updated
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 297
diff changeset
    69
easily build parser combinators out of smaller components
7f8516ff408d updated
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 297
diff changeset
    70
following very closely the structure of a grammar. In order to
7f8516ff408d updated
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 297
diff changeset
    71
implement this in an object-oriented programming language,
7f8516ff408d updated
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 297
diff changeset
    72
like Scala, we need to specify an abstract class for parser
7f8516ff408d updated
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 297
diff changeset
    73
combinators. This abstract class requires the implementation
7f8516ff408d updated
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 297
diff changeset
    74
of the function \texttt{parse} taking an argument of type
7f8516ff408d updated
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 297
diff changeset
    75
\texttt{I} and returns a set of type \mbox{\texttt{Set[(T,
7f8516ff408d updated
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 297
diff changeset
    76
I)]}}.
177
53def1fbf472 updated
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 176
diff changeset
    77
53def1fbf472 updated
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 176
diff changeset
    78
\begin{center}
385
7f8516ff408d updated
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 297
diff changeset
    79
\begin{lstlisting}[language=Scala]
177
53def1fbf472 updated
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 176
diff changeset
    80
abstract class Parser[I, T] {
53def1fbf472 updated
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 176
diff changeset
    81
  def parse(ts: I): Set[(T, I)]
53def1fbf472 updated
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 176
diff changeset
    82
53def1fbf472 updated
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 176
diff changeset
    83
  def parse_all(ts: I): Set[T] =
53def1fbf472 updated
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 176
diff changeset
    84
    for ((head, tail) <- parse(ts); if (tail.isEmpty)) 
53def1fbf472 updated
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 176
diff changeset
    85
      yield head
53def1fbf472 updated
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 176
diff changeset
    86
}
53def1fbf472 updated
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 176
diff changeset
    87
\end{lstlisting}
53def1fbf472 updated
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 176
diff changeset
    88
\end{center}
176
3c2653fc8b5a updated
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 175
diff changeset
    89
385
7f8516ff408d updated
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 297
diff changeset
    90
\noindent From the function \texttt{parse} we can then
7f8516ff408d updated
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 297
diff changeset
    91
``centrally'' derive the function \texttt{parse\_all}, which
7f8516ff408d updated
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 297
diff changeset
    92
just filters out all pairs whose second component is not empty
7f8516ff408d updated
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 297
diff changeset
    93
(that is has still some unprocessed part). The reason is that
386
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 385
diff changeset
    94
at the end of the parsing we are only interested in the results
385
7f8516ff408d updated
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 297
diff changeset
    95
where all the input has been consumed and no unprocessed part
386
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 385
diff changeset
    96
is left over.
183
b17eff695c7f added new stuff
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 177
diff changeset
    97
385
7f8516ff408d updated
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 297
diff changeset
    98
One of the simplest parser combinators recognises just a
7f8516ff408d updated
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 297
diff changeset
    99
character, say $c$, from the beginning of strings. Its
7f8516ff408d updated
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 297
diff changeset
   100
behaviour can be described as follows:
176
3c2653fc8b5a updated
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 175
diff changeset
   101
177
53def1fbf472 updated
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 176
diff changeset
   102
\begin{itemize}
386
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 385
diff changeset
   103
\item if the head of the input string starts with a $c$, then returns 
385
7f8516ff408d updated
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 297
diff changeset
   104
	the set $\{(c, \textit{tail of}\; s)\}$, where \textit{tail of} 
7f8516ff408d updated
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 297
diff changeset
   105
	$s$ is the unprocessed part of the input string
386
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 385
diff changeset
   106
\item otherwise return the empty set $\{\}$	
177
53def1fbf472 updated
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 176
diff changeset
   107
\end{itemize}
53def1fbf472 updated
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 176
diff changeset
   108
53def1fbf472 updated
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 176
diff changeset
   109
\noindent
53def1fbf472 updated
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 176
diff changeset
   110
The input type of this simple parser combinator for characters is
53def1fbf472 updated
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 176
diff changeset
   111
\texttt{String} and the output type \mbox{\texttt{Set[(Char, String)]}}. 
53def1fbf472 updated
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 176
diff changeset
   112
The code in Scala is as follows:
53def1fbf472 updated
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 176
diff changeset
   113
53def1fbf472 updated
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 176
diff changeset
   114
\begin{center}
53def1fbf472 updated
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 176
diff changeset
   115
\begin{lstlisting}[language=Scala,basicstyle=\small\ttfamily, numbers=none]
53def1fbf472 updated
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 176
diff changeset
   116
case class CharParser(c: Char) extends Parser[String, Char] {
53def1fbf472 updated
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 176
diff changeset
   117
  def parse(sb: String) = 
53def1fbf472 updated
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 176
diff changeset
   118
    if (sb.head == c) Set((c, sb.tail)) else Set()
53def1fbf472 updated
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 176
diff changeset
   119
}
53def1fbf472 updated
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 176
diff changeset
   120
\end{lstlisting}
53def1fbf472 updated
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 176
diff changeset
   121
\end{center}
176
3c2653fc8b5a updated
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 175
diff changeset
   122
385
7f8516ff408d updated
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 297
diff changeset
   123
\noindent The \texttt{parse} function tests whether the first
7f8516ff408d updated
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 297
diff changeset
   124
character of the input string \texttt{sb} is equal to
7f8516ff408d updated
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 297
diff changeset
   125
\texttt{c}. If yes, then it splits the string into the
7f8516ff408d updated
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 297
diff changeset
   126
recognised part \texttt{c} and the unprocessed part
7f8516ff408d updated
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 297
diff changeset
   127
\texttt{sb.tail}. In case \texttt{sb} does not start with
7f8516ff408d updated
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 297
diff changeset
   128
\texttt{c} then the parser returns the empty set (in Scala
7f8516ff408d updated
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 297
diff changeset
   129
\texttt{Set()}).
7f8516ff408d updated
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 297
diff changeset
   130
7f8516ff408d updated
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 297
diff changeset
   131
7f8516ff408d updated
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 297
diff changeset
   132
More interesting are the parser combinators that build larger
7f8516ff408d updated
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 297
diff changeset
   133
parsers out of smaller component parsers. For example the
7f8516ff408d updated
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 297
diff changeset
   134
alternative parser combinator is as follows: given two
7f8516ff408d updated
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 297
diff changeset
   135
parsers, say, $p$ and $q$, then we apply both parsers to the
386
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 385
diff changeset
   136
input (remember parsers are functions) and combine the output
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 385
diff changeset
   137
(remember they are sets)
385
7f8516ff408d updated
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 297
diff changeset
   138
 
7f8516ff408d updated
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 297
diff changeset
   139
\begin{center}
7f8516ff408d updated
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 297
diff changeset
   140
$p(\text{input}) \cup q(\text{input})$
7f8516ff408d updated
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 297
diff changeset
   141
\end{center}
7f8516ff408d updated
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 297
diff changeset
   142
386
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 385
diff changeset
   143
\noindent In Scala we would implement alternative parser
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 385
diff changeset
   144
combinator as follows
183
b17eff695c7f added new stuff
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 177
diff changeset
   145
b17eff695c7f added new stuff
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 177
diff changeset
   146
\begin{center}
385
7f8516ff408d updated
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 297
diff changeset
   147
\begin{lstlisting}[language=Scala, numbers=none]
183
b17eff695c7f added new stuff
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 177
diff changeset
   148
class AltParser[I, T]
b17eff695c7f added new stuff
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 177
diff changeset
   149
       (p: => Parser[I, T], 
b17eff695c7f added new stuff
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 177
diff changeset
   150
        q: => Parser[I, T]) extends Parser[I, T] {
b17eff695c7f added new stuff
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 177
diff changeset
   151
  def parse(sb: I) = p.parse(sb) ++ q.parse(sb)
b17eff695c7f added new stuff
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 177
diff changeset
   152
}
b17eff695c7f added new stuff
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 177
diff changeset
   153
\end{lstlisting}
b17eff695c7f added new stuff
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 177
diff changeset
   154
\end{center}
b17eff695c7f added new stuff
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 177
diff changeset
   155
385
7f8516ff408d updated
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 297
diff changeset
   156
\noindent The types of this parser combinator are polymorphic
7f8516ff408d updated
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 297
diff changeset
   157
(we just have \texttt{I} for the input type, and \texttt{T}
7f8516ff408d updated
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 297
diff changeset
   158
for the output type). The alternative parser builds a new
386
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 385
diff changeset
   159
parser out of two existing parsers \texttt{p} and \texttt{q}.
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 385
diff changeset
   160
Both need to be able to process input of type \texttt{I} and
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 385
diff changeset
   161
return the same output type \texttt{Set[(T,
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 385
diff changeset
   162
I)]}.\footnote{There is an interesting detail of Scala, namely
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 385
diff changeset
   163
the \texttt{=>} in front of the types of \texttt{p} and
385
7f8516ff408d updated
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 297
diff changeset
   164
\texttt{q}. They will prevent the evaluation of the arguments
7f8516ff408d updated
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 297
diff changeset
   165
before they are used. This is often called \emph{lazy
386
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 385
diff changeset
   166
evaluation} of the arguments. We will explain this later.} The alternative parser should
385
7f8516ff408d updated
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 297
diff changeset
   167
run the input with the first parser \texttt{p} (producing a
7f8516ff408d updated
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 297
diff changeset
   168
set of outputs) and then run the same input with \texttt{q}.
7f8516ff408d updated
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 297
diff changeset
   169
The result should be then just the union of both sets, which
7f8516ff408d updated
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 297
diff changeset
   170
is the operation \texttt{++} in Scala.
183
b17eff695c7f added new stuff
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 177
diff changeset
   171
386
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 385
diff changeset
   172
The alternative parser combinator already allows us to
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 385
diff changeset
   173
construct a parser that parses either a character \texttt{a}
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 385
diff changeset
   174
or \texttt{b}, as
183
b17eff695c7f added new stuff
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 177
diff changeset
   175
b17eff695c7f added new stuff
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 177
diff changeset
   176
\begin{center}
385
7f8516ff408d updated
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 297
diff changeset
   177
\begin{lstlisting}[language=Scala, numbers=none]
183
b17eff695c7f added new stuff
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 177
diff changeset
   178
new AltParser(CharParser('a'), CharParser('b'))
b17eff695c7f added new stuff
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 177
diff changeset
   179
\end{lstlisting}
b17eff695c7f added new stuff
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 177
diff changeset
   180
\end{center}
b17eff695c7f added new stuff
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 177
diff changeset
   181
385
7f8516ff408d updated
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 297
diff changeset
   182
\noindent Scala allows us to introduce some more readable
7f8516ff408d updated
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 297
diff changeset
   183
shorthand notation for this, like \texttt{'a' || 'b'}. We can
7f8516ff408d updated
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 297
diff changeset
   184
call this parser combinator with the strings
183
b17eff695c7f added new stuff
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 177
diff changeset
   185
b17eff695c7f added new stuff
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 177
diff changeset
   186
\begin{center}
b17eff695c7f added new stuff
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 177
diff changeset
   187
\begin{tabular}{rcl}
385
7f8516ff408d updated
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 297
diff changeset
   188
input strings & & output\medskip\\
183
b17eff695c7f added new stuff
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 177
diff changeset
   189
\texttt{\Grid{ac}} & $\rightarrow$ & $\left\{(\texttt{\Grid{a}}, \texttt{\Grid{c}})\right\}$\\
b17eff695c7f added new stuff
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 177
diff changeset
   190
\texttt{\Grid{bc}} & $\rightarrow$ & $\left\{(\texttt{\Grid{b}}, \texttt{\Grid{c}})\right\}$\\
385
7f8516ff408d updated
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 297
diff changeset
   191
\texttt{\Grid{cc}} & $\rightarrow$ & $\{\}$
7f8516ff408d updated
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 297
diff changeset
   192
\end{tabular}
7f8516ff408d updated
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 297
diff changeset
   193
\end{center}
7f8516ff408d updated
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 297
diff changeset
   194
7f8516ff408d updated
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 297
diff changeset
   195
\noindent We receive in the first two cases a successful
7f8516ff408d updated
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 297
diff changeset
   196
output (that is a non-empty set). In each case, either
386
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 385
diff changeset
   197
\pcode{a} or \pcode{b} are in the processed parts, and
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 385
diff changeset
   198
\pcode{c} in the unprocessed part. Clearly this parser cannot
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 385
diff changeset
   199
parse anything in the string \pcode{cc}, therefore the empty
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 385
diff changeset
   200
set.
385
7f8516ff408d updated
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 297
diff changeset
   201
7f8516ff408d updated
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 297
diff changeset
   202
A bit more interesting is the \emph{sequence parser
386
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 385
diff changeset
   203
combinator}. Given two parsers, say, $p$ and $q$, apply first
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 385
diff changeset
   204
the input to $p$ producing a set of pairs; then apply $q$ to
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 385
diff changeset
   205
all the unparsed parts; then combine the results like
385
7f8516ff408d updated
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 297
diff changeset
   206
7f8516ff408d updated
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 297
diff changeset
   207
\begin{center}
7f8516ff408d updated
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 297
diff changeset
   208
\begin{tabular}{lcl}
7f8516ff408d updated
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 297
diff changeset
   209
$\{((\textit{output}_1, \textit{output}_2), u_2)$ & $\;|\;$ & 
7f8516ff408d updated
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 297
diff changeset
   210
$(\textit{output}_1, u_1) \in p(\text{input}) 
7f8516ff408d updated
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 297
diff changeset
   211
\;\wedge\;$\\
386
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 385
diff changeset
   212
&& $\;(\textit{output}_2, u_2) \in q(u_1)\}$
183
b17eff695c7f added new stuff
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 177
diff changeset
   213
\end{tabular}
b17eff695c7f added new stuff
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 177
diff changeset
   214
\end{center}
b17eff695c7f added new stuff
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 177
diff changeset
   215
b17eff695c7f added new stuff
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 177
diff changeset
   216
\noindent
385
7f8516ff408d updated
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 297
diff changeset
   217
This can be implemented in Scala as follows:
183
b17eff695c7f added new stuff
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 177
diff changeset
   218
b17eff695c7f added new stuff
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 177
diff changeset
   219
\begin{center}
385
7f8516ff408d updated
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 297
diff changeset
   220
\begin{lstlisting}[language=Scala,numbers=none]
183
b17eff695c7f added new stuff
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 177
diff changeset
   221
class SeqParser[I, T, S]
b17eff695c7f added new stuff
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 177
diff changeset
   222
       (p: => Parser[I, T], 
b17eff695c7f added new stuff
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 177
diff changeset
   223
        q: => Parser[I, S]) extends Parser[I, (T, S)] {
b17eff695c7f added new stuff
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 177
diff changeset
   224
  def parse(sb: I) = 
386
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 385
diff changeset
   225
    for ((output1, u1) <- p.parse(sb); 
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 385
diff changeset
   226
         (output2, u2) <- q.parse(u1)) 
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 385
diff changeset
   227
            yield ((output1, output2), u2)
183
b17eff695c7f added new stuff
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 177
diff changeset
   228
}
b17eff695c7f added new stuff
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 177
diff changeset
   229
\end{lstlisting}
b17eff695c7f added new stuff
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 177
diff changeset
   230
\end{center}
b17eff695c7f added new stuff
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 177
diff changeset
   231
385
7f8516ff408d updated
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 297
diff changeset
   232
\noindent This parser takes as input two parsers, \texttt{p}
7f8516ff408d updated
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 297
diff changeset
   233
and \texttt{q}. It implements \texttt{parse} as follows: let
7f8516ff408d updated
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 297
diff changeset
   234
first run the parser \texttt{p} on the input producing a set
386
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 385
diff changeset
   235
of pairs (\texttt{output1}, \texttt{u1}). The \texttt{u1}
385
7f8516ff408d updated
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 297
diff changeset
   236
stands for the unprocessed parts left over by \texttt{p}. Let
7f8516ff408d updated
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 297
diff changeset
   237
\texttt{q} run on these unprocessed parts producing again a
7f8516ff408d updated
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 297
diff changeset
   238
set of pairs. The output of the sequence parser combinator is
7f8516ff408d updated
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 297
diff changeset
   239
then a set containing pairs where the first components are
7f8516ff408d updated
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 297
diff changeset
   240
again pairs, namely what the first parser could parse together
7f8516ff408d updated
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 297
diff changeset
   241
with what the second parser could parse; the second component
7f8516ff408d updated
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 297
diff changeset
   242
is the unprocessed part left over after running the second
7f8516ff408d updated
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 297
diff changeset
   243
parser \texttt{q}. Therefore the input type of the sequence
7f8516ff408d updated
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 297
diff changeset
   244
parser combinator is as usual \texttt{I}, but the output type
7f8516ff408d updated
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 297
diff changeset
   245
is
183
b17eff695c7f added new stuff
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 177
diff changeset
   246
b17eff695c7f added new stuff
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 177
diff changeset
   247
\begin{center}
b17eff695c7f added new stuff
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 177
diff changeset
   248
\texttt{Set[((T, S), I)]}
b17eff695c7f added new stuff
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 177
diff changeset
   249
\end{center}
b17eff695c7f added new stuff
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 177
diff changeset
   250
385
7f8516ff408d updated
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 297
diff changeset
   251
Scala allows us to provide some shorthand notation for the
386
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 385
diff changeset
   252
sequence parser combinator. We can write for example
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 385
diff changeset
   253
\pcode{'a' ~ 'b'}, which is the parser combinator that
385
7f8516ff408d updated
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 297
diff changeset
   254
first consumes the character \texttt{a} from a string and then
7f8516ff408d updated
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 297
diff changeset
   255
\texttt{b}. Three examples of this parser combinator are as
7f8516ff408d updated
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 297
diff changeset
   256
follows:
7f8516ff408d updated
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 297
diff changeset
   257
7f8516ff408d updated
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 297
diff changeset
   258
\begin{center}
7f8516ff408d updated
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 297
diff changeset
   259
\begin{tabular}{rcl}
7f8516ff408d updated
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 297
diff changeset
   260
input strings & & output\medskip\\
7f8516ff408d updated
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 297
diff changeset
   261
\texttt{\Grid{abc}} & $\rightarrow$ & $\left\{((\texttt{\Grid{a}}, \texttt{\Grid{b}}), \texttt{\Grid{c}})\right\}$\\
7f8516ff408d updated
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 297
diff changeset
   262
\texttt{\Grid{bac}} & $\rightarrow$ & $\{\}$\\
7f8516ff408d updated
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 297
diff changeset
   263
\texttt{\Grid{ccc}} & $\rightarrow$ & $\{\}$
7f8516ff408d updated
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 297
diff changeset
   264
\end{tabular}
7f8516ff408d updated
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 297
diff changeset
   265
\end{center}
7f8516ff408d updated
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 297
diff changeset
   266
386
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 385
diff changeset
   267
\noindent A slightly more complicated parser is \pcode{('a'
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 385
diff changeset
   268
|| 'b') ~ 'b'} which parses as first character either an
385
7f8516ff408d updated
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 297
diff changeset
   269
\texttt{a} or \texttt{b} followed by a \texttt{b}. This parser
386
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 385
diff changeset
   270
produces the following outputs.
385
7f8516ff408d updated
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 297
diff changeset
   271
7f8516ff408d updated
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 297
diff changeset
   272
\begin{center}
7f8516ff408d updated
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 297
diff changeset
   273
\begin{tabular}{rcl}
7f8516ff408d updated
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 297
diff changeset
   274
input strings & & output\medskip\\
7f8516ff408d updated
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 297
diff changeset
   275
\texttt{\Grid{abc}} & $\rightarrow$ & $\left\{((\texttt{\Grid{a}}, \texttt{\Grid{b}}), \texttt{\Grid{c}})\right\}$\\
7f8516ff408d updated
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 297
diff changeset
   276
\texttt{\Grid{bbc}} & $\rightarrow$ & $\left\{((\texttt{\Grid{b}}, \texttt{\Grid{b}}), \texttt{\Grid{c}})\right\}$\\
7f8516ff408d updated
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 297
diff changeset
   277
\texttt{\Grid{aac}} & $\rightarrow$ & $\{\}$
7f8516ff408d updated
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 297
diff changeset
   278
\end{tabular}
7f8516ff408d updated
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 297
diff changeset
   279
\end{center}
7f8516ff408d updated
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 297
diff changeset
   280
7f8516ff408d updated
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 297
diff changeset
   281
\noindent Two more examples: first consider the parser \pcode{('a' ~
7f8516ff408d updated
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 297
diff changeset
   282
'a') ~ 'a'} and the input \pcode{aaaa}:
183
b17eff695c7f added new stuff
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 177
diff changeset
   283
b17eff695c7f added new stuff
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 177
diff changeset
   284
\begin{center}
b17eff695c7f added new stuff
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 177
diff changeset
   285
\begin{tabular}{rcl}
b17eff695c7f added new stuff
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 177
diff changeset
   286
input string & & output\medskip\\
385
7f8516ff408d updated
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 297
diff changeset
   287
\texttt{\Grid{aaaa}} & $\rightarrow$ & 
7f8516ff408d updated
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 297
diff changeset
   288
$\left\{(((\texttt{\Grid{a}}, \texttt{\Grid{a}}), \texttt{\Grid{a}}), \texttt{\Grid{a}})\right\}$\\
183
b17eff695c7f added new stuff
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 177
diff changeset
   289
\end{tabular}
b17eff695c7f added new stuff
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 177
diff changeset
   290
\end{center}
b17eff695c7f added new stuff
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 177
diff changeset
   291
386
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 385
diff changeset
   292
\noindent Notice how the results nest deeper and deeper as
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 385
diff changeset
   293
pairs (the last \pcode{a} is in the unprocessed part). To
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 385
diff changeset
   294
consume everything of this string we can use the parser
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 385
diff changeset
   295
\pcode{(('a' ~'a') ~ 'a') ~ 'a'}. Then the output is as
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 385
diff changeset
   296
follows:
183
b17eff695c7f added new stuff
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 177
diff changeset
   297
b17eff695c7f added new stuff
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 177
diff changeset
   298
\begin{center}
b17eff695c7f added new stuff
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 177
diff changeset
   299
\begin{tabular}{rcl}
b17eff695c7f added new stuff
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 177
diff changeset
   300
input string & & output\medskip\\
385
7f8516ff408d updated
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 297
diff changeset
   301
\texttt{\Grid{aaaa}} & $\rightarrow$ & 
7f8516ff408d updated
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 297
diff changeset
   302
$\left\{((((\texttt{\Grid{a}}, \texttt{\Grid{a}}), \texttt{\Grid{a}}), \texttt{\Grid{a}}), \texttt{""})\right\}$\\
183
b17eff695c7f added new stuff
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 177
diff changeset
   303
\end{tabular}
b17eff695c7f added new stuff
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 177
diff changeset
   304
\end{center}
b17eff695c7f added new stuff
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 177
diff changeset
   305
385
7f8516ff408d updated
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 297
diff changeset
   306
\noindent This is an instance where the parser consumed
7f8516ff408d updated
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 297
diff changeset
   307
completely the input, meaning the unprocessed part is just the
7f8516ff408d updated
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 297
diff changeset
   308
empty string.
7f8516ff408d updated
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 297
diff changeset
   309
183
b17eff695c7f added new stuff
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 177
diff changeset
   310
386
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 385
diff changeset
   311
Note carefully that constructing a parser such \pcode{'a' ||
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 385
diff changeset
   312
('a' ~ 'b')} will result in a typing error. The first
385
7f8516ff408d updated
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 297
diff changeset
   313
parser has as output type a single character (recall the type
7f8516ff408d updated
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 297
diff changeset
   314
of \texttt{CharParser}), but the second parser produces a pair
7f8516ff408d updated
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 297
diff changeset
   315
of characters as output. The alternative parser is however
7f8516ff408d updated
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 297
diff changeset
   316
required to have both component parsers to have the same type.
7f8516ff408d updated
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 297
diff changeset
   317
We will see later how we can build this parser without the
7f8516ff408d updated
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 297
diff changeset
   318
typing error.
7f8516ff408d updated
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 297
diff changeset
   319
7f8516ff408d updated
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 297
diff changeset
   320
The next parser combinator does not actually combine smaller
386
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 385
diff changeset
   321
parsers, but applies a function to the result of a parser.
385
7f8516ff408d updated
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 297
diff changeset
   322
It is implemented in Scala as follows
183
b17eff695c7f added new stuff
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 177
diff changeset
   323
b17eff695c7f added new stuff
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 177
diff changeset
   324
\begin{center}
b17eff695c7f added new stuff
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 177
diff changeset
   325
\begin{lstlisting}[language=Scala,basicstyle=\small\ttfamily, numbers=none]
b17eff695c7f added new stuff
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 177
diff changeset
   326
class FunParser[I, T, S]
b17eff695c7f added new stuff
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 177
diff changeset
   327
         (p: => Parser[I, T], 
b17eff695c7f added new stuff
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 177
diff changeset
   328
          f: T => S) extends Parser[I, S] {
b17eff695c7f added new stuff
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 177
diff changeset
   329
  def parse(sb: I) = 
b17eff695c7f added new stuff
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 177
diff changeset
   330
    for ((head, tail) <- p.parse(sb)) yield (f(head), tail)
b17eff695c7f added new stuff
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 177
diff changeset
   331
}
b17eff695c7f added new stuff
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 177
diff changeset
   332
\end{lstlisting}
b17eff695c7f added new stuff
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 177
diff changeset
   333
\end{center}
b17eff695c7f added new stuff
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 177
diff changeset
   334
b17eff695c7f added new stuff
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 177
diff changeset
   335
386
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 385
diff changeset
   336
\noindent This parser combinator takes a parser \texttt{p}
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 385
diff changeset
   337
with output type \texttt{T} as one argument as well as a
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 385
diff changeset
   338
function \texttt{f} with type \texttt{T => S}. The parser
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 385
diff changeset
   339
\texttt{p} produces sets of type \texttt{(T, I)}. The
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 385
diff changeset
   340
\texttt{FunParser} combinator then applies the function
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 385
diff changeset
   341
\texttt{f} to all the parser outputs. Since this function is of
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 385
diff changeset
   342
type \texttt{T => S}, we obtain a parser with output type
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 385
diff changeset
   343
\texttt{S}. Again Scala lets us introduce some shorthand
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 385
diff changeset
   344
notation for this parser combinator. Therefore we will write
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 385
diff changeset
   345
\texttt{p ==> f} for it.
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 385
diff changeset
   346
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 385
diff changeset
   347
\subsubsection*{How to build parsers using parser combinators?}
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 385
diff changeset
   348
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 385
diff changeset
   349
\subsubsection*{Implementing an Interpreter}
183
b17eff695c7f added new stuff
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 177
diff changeset
   350
b17eff695c7f added new stuff
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 177
diff changeset
   351
%\bigskip
b17eff695c7f added new stuff
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 177
diff changeset
   352
%takes advantage of the full generality---have a look
b17eff695c7f added new stuff
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 177
diff changeset
   353
%what it produces if we call it with the string \texttt{abc}
b17eff695c7f added new stuff
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 177
diff changeset
   354
%
b17eff695c7f added new stuff
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 177
diff changeset
   355
%\begin{center}
b17eff695c7f added new stuff
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 177
diff changeset
   356
%\begin{tabular}{rcl}
b17eff695c7f added new stuff
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 177
diff changeset
   357
%input string & & output\medskip\\
b17eff695c7f added new stuff
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 177
diff changeset
   358
%\texttt{\Grid{abc}} & $\rightarrow$ & $\left\{((\texttt{\Grid{a}}, \texttt{\Grid{b}}), \texttt{\Grid{c}})\right\}$\\
b17eff695c7f added new stuff
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 177
diff changeset
   359
%\texttt{\Grid{bbc}} & $\rightarrow$ & $\left\{((\texttt{\Grid{b}}, \texttt{\Grid{b}}), \texttt{\Grid{c}})\right\}$\\
b17eff695c7f added new stuff
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 177
diff changeset
   360
%\texttt{\Grid{aac}} & $\rightarrow$ & $\varnothing$
b17eff695c7f added new stuff
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 177
diff changeset
   361
%\end{tabular}
b17eff695c7f added new stuff
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 177
diff changeset
   362
%\end{center}
b17eff695c7f added new stuff
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 177
diff changeset
   363
b17eff695c7f added new stuff
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 177
diff changeset
   364
173
7cfb7a6f7c99 added slides
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents:
diff changeset
   365
7cfb7a6f7c99 added slides
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents:
diff changeset
   366
\end{document}
7cfb7a6f7c99 added slides
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents:
diff changeset
   367
7cfb7a6f7c99 added slides
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents:
diff changeset
   368
%%% Local Variables: 
7cfb7a6f7c99 added slides
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents:
diff changeset
   369
%%% mode: latex  
7cfb7a6f7c99 added slides
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents:
diff changeset
   370
%%% TeX-master: t
7cfb7a6f7c99 added slides
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents:
diff changeset
   371
%%% End: