handouts/ho06.tex
author Christian Urban <christian dot urban at kcl dot ac dot uk>
Sat, 29 Oct 2016 21:45:44 +0100
changeset 467 b5ec11e89768
parent 392 2d0a59127694
child 584 529460073b24
permissions -rw-r--r--
fixed bug
Ignore whitespace changes - Everywhere: Within whitespace: At end of lines:
173
7cfb7a6f7c99 added slides
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents:
diff changeset
     1
\documentclass{article}
297
5c51839c88fd updated
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 292
diff changeset
     2
\usepackage{../style}
217
cd6066f1056a updated handouts
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 183
diff changeset
     3
\usepackage{../langs}
cd6066f1056a updated handouts
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 183
diff changeset
     4
173
7cfb7a6f7c99 added slides
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents:
diff changeset
     5
7cfb7a6f7c99 added slides
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents:
diff changeset
     6
\begin{document}
7cfb7a6f7c99 added slides
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents:
diff changeset
     7
292
7ed2a25dd115 updated
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 217
diff changeset
     8
\section*{Handout 6 (Parser Combinators)}
173
7cfb7a6f7c99 added slides
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents:
diff changeset
     9
386
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 385
diff changeset
    10
In what follows we explain \emph{parser combinators}. Their
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 385
diff changeset
    11
distinguishing feature is that they are very easy to
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 385
diff changeset
    12
implement. However, they only work when the grammar to be
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 385
diff changeset
    13
parsed is \emph{not} left-recursive and they are efficient
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 385
diff changeset
    14
only when the grammar is unambiguous. It is the responsibility
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 385
diff changeset
    15
of the grammar designer to ensure these two properties.
173
7cfb7a6f7c99 added slides
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents:
diff changeset
    16
385
7f8516ff408d updated
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 297
diff changeset
    17
Parser combinators can deal with any kind of input as long as
7f8516ff408d updated
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 297
diff changeset
    18
this input is a kind of sequence, for example a string or a
386
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 385
diff changeset
    19
list of tokens. The general idea behind parser combinators is
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 385
diff changeset
    20
to transform the input into sets of pairs, like so
175
5801e8c0e528 updated
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 173
diff changeset
    21
5801e8c0e528 updated
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 173
diff changeset
    22
\begin{center}
385
7f8516ff408d updated
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 297
diff changeset
    23
$\underbrace{\text{list of tokens}}_{\text{input}}$ 
7f8516ff408d updated
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 297
diff changeset
    24
$\Rightarrow$
7f8516ff408d updated
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 297
diff changeset
    25
$\underbrace{\text{set of (parsed input, unparsed input)}}_{\text{output}}$
7f8516ff408d updated
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 297
diff changeset
    26
\end{center} 
175
5801e8c0e528 updated
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 173
diff changeset
    27
386
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 385
diff changeset
    28
\noindent In Scala parser combinators are functions of type
176
3c2653fc8b5a updated
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 175
diff changeset
    29
3c2653fc8b5a updated
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 175
diff changeset
    30
\begin{center}
177
53def1fbf472 updated
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 176
diff changeset
    31
\texttt{I $\Rightarrow$ Set[(T, I)]}
176
3c2653fc8b5a updated
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 175
diff changeset
    32
\end{center}
3c2653fc8b5a updated
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 175
diff changeset
    33
385
7f8516ff408d updated
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 297
diff changeset
    34
\noindent that is they take as input something of type
7f8516ff408d updated
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 297
diff changeset
    35
\texttt{I} and return a set of pairs. The first component of
7f8516ff408d updated
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 297
diff changeset
    36
these pairs corresponds to what the parser combinator was able
7f8516ff408d updated
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 297
diff changeset
    37
to process from the input and the second is the unprocessed
7f8516ff408d updated
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 297
diff changeset
    38
part of the input. As we shall see shortly, a parser
386
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 385
diff changeset
    39
combinator might return more than one such pair, the idea
392
2d0a59127694 updated
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 386
diff changeset
    40
being that there are potentially several ways of how to
2d0a59127694 updated
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 386
diff changeset
    41
interpret the input. To simplify matters we will first look at
2d0a59127694 updated
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 386
diff changeset
    42
the case where the input to the parser combinators is just
2d0a59127694 updated
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 386
diff changeset
    43
strings. As a concrete example, consider the string
183
b17eff695c7f added new stuff
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 177
diff changeset
    44
b17eff695c7f added new stuff
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 177
diff changeset
    45
\begin{center}
b17eff695c7f added new stuff
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 177
diff changeset
    46
\tt\Grid{iffoo\VS testbar}
b17eff695c7f added new stuff
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 177
diff changeset
    47
\end{center}
b17eff695c7f added new stuff
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 177
diff changeset
    48
385
7f8516ff408d updated
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 297
diff changeset
    49
\noindent We might have a parser combinator which tries to
386
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 385
diff changeset
    50
interpret this string as a keyword (\texttt{if}) or as an
385
7f8516ff408d updated
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 297
diff changeset
    51
identifier (\texttt{iffoo}). Then the output will be the set
177
53def1fbf472 updated
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 176
diff changeset
    52
183
b17eff695c7f added new stuff
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 177
diff changeset
    53
\begin{center}
386
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 385
diff changeset
    54
$\left\{ \left(\texttt{\Grid{if}}\;,\; \texttt{\Grid{foo\VS testbar}}\right), 
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 385
diff changeset
    55
           \left(\texttt{\Grid{iffoo}}\;,\; \texttt{\Grid{\VS testbar}}\right) \right\}$
183
b17eff695c7f added new stuff
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 177
diff changeset
    56
\end{center}
b17eff695c7f added new stuff
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 177
diff changeset
    57
385
7f8516ff408d updated
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 297
diff changeset
    58
\noindent where the first pair means the parser could
7f8516ff408d updated
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 297
diff changeset
    59
recognise \texttt{if} from the input and leaves the rest as
7f8516ff408d updated
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 297
diff changeset
    60
`unprocessed' as the second component of the pair; in the
7f8516ff408d updated
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 297
diff changeset
    61
other case it could recognise \texttt{iffoo} and leaves
7f8516ff408d updated
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 297
diff changeset
    62
\texttt{\VS testbar} as unprocessed. If the parser cannot
386
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 385
diff changeset
    63
recognise anything from the input, then parser combinators just
385
7f8516ff408d updated
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 297
diff changeset
    64
return the empty set $\{\}$. This will indicate something
7f8516ff408d updated
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 297
diff changeset
    65
``went wrong''\ldots or more precisely, nothing could be
7f8516ff408d updated
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 297
diff changeset
    66
parsed.
183
b17eff695c7f added new stuff
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 177
diff changeset
    67
392
2d0a59127694 updated
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 386
diff changeset
    68
The idea of parser combinators is that we can easily build
2d0a59127694 updated
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 386
diff changeset
    69
parser combinators out of smaller components following very
2d0a59127694 updated
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 386
diff changeset
    70
closely the structure of a grammar. In order to implement this
2d0a59127694 updated
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 386
diff changeset
    71
in an object-oriented programming language, like Scala, we
2d0a59127694 updated
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 386
diff changeset
    72
need to specify an abstract class for parser combinators. This
2d0a59127694 updated
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 386
diff changeset
    73
abstract class requires the implementation of the function
2d0a59127694 updated
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 386
diff changeset
    74
\texttt{parse} taking an argument of type \texttt{I} and
2d0a59127694 updated
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 386
diff changeset
    75
returns a set of type \mbox{\texttt{Set[(T, I)]}}.
177
53def1fbf472 updated
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 176
diff changeset
    76
53def1fbf472 updated
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 176
diff changeset
    77
\begin{center}
385
7f8516ff408d updated
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 297
diff changeset
    78
\begin{lstlisting}[language=Scala]
177
53def1fbf472 updated
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 176
diff changeset
    79
abstract class Parser[I, T] {
53def1fbf472 updated
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 176
diff changeset
    80
  def parse(ts: I): Set[(T, I)]
53def1fbf472 updated
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 176
diff changeset
    81
53def1fbf472 updated
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 176
diff changeset
    82
  def parse_all(ts: I): Set[T] =
53def1fbf472 updated
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 176
diff changeset
    83
    for ((head, tail) <- parse(ts); if (tail.isEmpty)) 
53def1fbf472 updated
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 176
diff changeset
    84
      yield head
53def1fbf472 updated
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 176
diff changeset
    85
}
53def1fbf472 updated
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 176
diff changeset
    86
\end{lstlisting}
53def1fbf472 updated
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 176
diff changeset
    87
\end{center}
176
3c2653fc8b5a updated
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 175
diff changeset
    88
385
7f8516ff408d updated
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 297
diff changeset
    89
\noindent From the function \texttt{parse} we can then
7f8516ff408d updated
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 297
diff changeset
    90
``centrally'' derive the function \texttt{parse\_all}, which
7f8516ff408d updated
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 297
diff changeset
    91
just filters out all pairs whose second component is not empty
7f8516ff408d updated
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 297
diff changeset
    92
(that is has still some unprocessed part). The reason is that
386
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 385
diff changeset
    93
at the end of the parsing we are only interested in the results
385
7f8516ff408d updated
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 297
diff changeset
    94
where all the input has been consumed and no unprocessed part
386
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 385
diff changeset
    95
is left over.
183
b17eff695c7f added new stuff
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 177
diff changeset
    96
385
7f8516ff408d updated
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 297
diff changeset
    97
One of the simplest parser combinators recognises just a
7f8516ff408d updated
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 297
diff changeset
    98
character, say $c$, from the beginning of strings. Its
7f8516ff408d updated
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 297
diff changeset
    99
behaviour can be described as follows:
176
3c2653fc8b5a updated
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 175
diff changeset
   100
177
53def1fbf472 updated
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 176
diff changeset
   101
\begin{itemize}
386
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 385
diff changeset
   102
\item if the head of the input string starts with a $c$, then returns 
385
7f8516ff408d updated
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 297
diff changeset
   103
	the set $\{(c, \textit{tail of}\; s)\}$, where \textit{tail of} 
7f8516ff408d updated
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 297
diff changeset
   104
	$s$ is the unprocessed part of the input string
386
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 385
diff changeset
   105
\item otherwise return the empty set $\{\}$	
177
53def1fbf472 updated
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 176
diff changeset
   106
\end{itemize}
53def1fbf472 updated
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 176
diff changeset
   107
53def1fbf472 updated
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 176
diff changeset
   108
\noindent
53def1fbf472 updated
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 176
diff changeset
   109
The input type of this simple parser combinator for characters is
53def1fbf472 updated
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 176
diff changeset
   110
\texttt{String} and the output type \mbox{\texttt{Set[(Char, String)]}}. 
53def1fbf472 updated
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 176
diff changeset
   111
The code in Scala is as follows:
53def1fbf472 updated
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 176
diff changeset
   112
53def1fbf472 updated
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 176
diff changeset
   113
\begin{center}
53def1fbf472 updated
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 176
diff changeset
   114
\begin{lstlisting}[language=Scala,basicstyle=\small\ttfamily, numbers=none]
53def1fbf472 updated
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 176
diff changeset
   115
case class CharParser(c: Char) extends Parser[String, Char] {
53def1fbf472 updated
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 176
diff changeset
   116
  def parse(sb: String) = 
53def1fbf472 updated
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 176
diff changeset
   117
    if (sb.head == c) Set((c, sb.tail)) else Set()
53def1fbf472 updated
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 176
diff changeset
   118
}
53def1fbf472 updated
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 176
diff changeset
   119
\end{lstlisting}
53def1fbf472 updated
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 176
diff changeset
   120
\end{center}
176
3c2653fc8b5a updated
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 175
diff changeset
   121
385
7f8516ff408d updated
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 297
diff changeset
   122
\noindent The \texttt{parse} function tests whether the first
7f8516ff408d updated
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 297
diff changeset
   123
character of the input string \texttt{sb} is equal to
7f8516ff408d updated
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 297
diff changeset
   124
\texttt{c}. If yes, then it splits the string into the
7f8516ff408d updated
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 297
diff changeset
   125
recognised part \texttt{c} and the unprocessed part
7f8516ff408d updated
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 297
diff changeset
   126
\texttt{sb.tail}. In case \texttt{sb} does not start with
7f8516ff408d updated
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 297
diff changeset
   127
\texttt{c} then the parser returns the empty set (in Scala
7f8516ff408d updated
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 297
diff changeset
   128
\texttt{Set()}).
7f8516ff408d updated
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 297
diff changeset
   129
7f8516ff408d updated
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 297
diff changeset
   130
7f8516ff408d updated
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 297
diff changeset
   131
More interesting are the parser combinators that build larger
7f8516ff408d updated
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 297
diff changeset
   132
parsers out of smaller component parsers. For example the
7f8516ff408d updated
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 297
diff changeset
   133
alternative parser combinator is as follows: given two
392
2d0a59127694 updated
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 386
diff changeset
   134
parsers, say, $p$ and $q$, we apply both parsers to the
386
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 385
diff changeset
   135
input (remember parsers are functions) and combine the output
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 385
diff changeset
   136
(remember they are sets)
385
7f8516ff408d updated
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 297
diff changeset
   137
 
7f8516ff408d updated
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 297
diff changeset
   138
\begin{center}
7f8516ff408d updated
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 297
diff changeset
   139
$p(\text{input}) \cup q(\text{input})$
7f8516ff408d updated
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 297
diff changeset
   140
\end{center}
7f8516ff408d updated
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 297
diff changeset
   141
386
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 385
diff changeset
   142
\noindent In Scala we would implement alternative parser
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 385
diff changeset
   143
combinator as follows
183
b17eff695c7f added new stuff
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 177
diff changeset
   144
b17eff695c7f added new stuff
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 177
diff changeset
   145
\begin{center}
385
7f8516ff408d updated
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 297
diff changeset
   146
\begin{lstlisting}[language=Scala, numbers=none]
183
b17eff695c7f added new stuff
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 177
diff changeset
   147
class AltParser[I, T]
b17eff695c7f added new stuff
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 177
diff changeset
   148
       (p: => Parser[I, T], 
b17eff695c7f added new stuff
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 177
diff changeset
   149
        q: => Parser[I, T]) extends Parser[I, T] {
b17eff695c7f added new stuff
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 177
diff changeset
   150
  def parse(sb: I) = p.parse(sb) ++ q.parse(sb)
b17eff695c7f added new stuff
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 177
diff changeset
   151
}
b17eff695c7f added new stuff
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 177
diff changeset
   152
\end{lstlisting}
b17eff695c7f added new stuff
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 177
diff changeset
   153
\end{center}
b17eff695c7f added new stuff
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 177
diff changeset
   154
385
7f8516ff408d updated
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 297
diff changeset
   155
\noindent The types of this parser combinator are polymorphic
7f8516ff408d updated
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 297
diff changeset
   156
(we just have \texttt{I} for the input type, and \texttt{T}
7f8516ff408d updated
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 297
diff changeset
   157
for the output type). The alternative parser builds a new
386
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 385
diff changeset
   158
parser out of two existing parsers \texttt{p} and \texttt{q}.
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 385
diff changeset
   159
Both need to be able to process input of type \texttt{I} and
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 385
diff changeset
   160
return the same output type \texttt{Set[(T,
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 385
diff changeset
   161
I)]}.\footnote{There is an interesting detail of Scala, namely
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 385
diff changeset
   162
the \texttt{=>} in front of the types of \texttt{p} and
385
7f8516ff408d updated
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 297
diff changeset
   163
\texttt{q}. They will prevent the evaluation of the arguments
7f8516ff408d updated
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 297
diff changeset
   164
before they are used. This is often called \emph{lazy
386
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 385
diff changeset
   165
evaluation} of the arguments. We will explain this later.} The alternative parser should
385
7f8516ff408d updated
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 297
diff changeset
   166
run the input with the first parser \texttt{p} (producing a
7f8516ff408d updated
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 297
diff changeset
   167
set of outputs) and then run the same input with \texttt{q}.
7f8516ff408d updated
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 297
diff changeset
   168
The result should be then just the union of both sets, which
7f8516ff408d updated
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 297
diff changeset
   169
is the operation \texttt{++} in Scala.
183
b17eff695c7f added new stuff
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 177
diff changeset
   170
386
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 385
diff changeset
   171
The alternative parser combinator already allows us to
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 385
diff changeset
   172
construct a parser that parses either a character \texttt{a}
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 385
diff changeset
   173
or \texttt{b}, as
183
b17eff695c7f added new stuff
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 177
diff changeset
   174
b17eff695c7f added new stuff
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 177
diff changeset
   175
\begin{center}
385
7f8516ff408d updated
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 297
diff changeset
   176
\begin{lstlisting}[language=Scala, numbers=none]
183
b17eff695c7f added new stuff
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 177
diff changeset
   177
new AltParser(CharParser('a'), CharParser('b'))
b17eff695c7f added new stuff
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 177
diff changeset
   178
\end{lstlisting}
b17eff695c7f added new stuff
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 177
diff changeset
   179
\end{center}
b17eff695c7f added new stuff
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 177
diff changeset
   180
385
7f8516ff408d updated
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 297
diff changeset
   181
\noindent Scala allows us to introduce some more readable
7f8516ff408d updated
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 297
diff changeset
   182
shorthand notation for this, like \texttt{'a' || 'b'}. We can
7f8516ff408d updated
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 297
diff changeset
   183
call this parser combinator with the strings
183
b17eff695c7f added new stuff
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 177
diff changeset
   184
b17eff695c7f added new stuff
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 177
diff changeset
   185
\begin{center}
b17eff695c7f added new stuff
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 177
diff changeset
   186
\begin{tabular}{rcl}
385
7f8516ff408d updated
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 297
diff changeset
   187
input strings & & output\medskip\\
183
b17eff695c7f added new stuff
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 177
diff changeset
   188
\texttt{\Grid{ac}} & $\rightarrow$ & $\left\{(\texttt{\Grid{a}}, \texttt{\Grid{c}})\right\}$\\
b17eff695c7f added new stuff
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 177
diff changeset
   189
\texttt{\Grid{bc}} & $\rightarrow$ & $\left\{(\texttt{\Grid{b}}, \texttt{\Grid{c}})\right\}$\\
385
7f8516ff408d updated
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 297
diff changeset
   190
\texttt{\Grid{cc}} & $\rightarrow$ & $\{\}$
7f8516ff408d updated
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 297
diff changeset
   191
\end{tabular}
7f8516ff408d updated
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 297
diff changeset
   192
\end{center}
7f8516ff408d updated
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 297
diff changeset
   193
7f8516ff408d updated
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 297
diff changeset
   194
\noindent We receive in the first two cases a successful
7f8516ff408d updated
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 297
diff changeset
   195
output (that is a non-empty set). In each case, either
392
2d0a59127694 updated
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 386
diff changeset
   196
\pcode{a} or \pcode{b} is in the processed part, and
386
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 385
diff changeset
   197
\pcode{c} in the unprocessed part. Clearly this parser cannot
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 385
diff changeset
   198
parse anything in the string \pcode{cc}, therefore the empty
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 385
diff changeset
   199
set.
385
7f8516ff408d updated
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 297
diff changeset
   200
7f8516ff408d updated
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 297
diff changeset
   201
A bit more interesting is the \emph{sequence parser
386
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 385
diff changeset
   202
combinator}. Given two parsers, say, $p$ and $q$, apply first
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 385
diff changeset
   203
the input to $p$ producing a set of pairs; then apply $q$ to
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 385
diff changeset
   204
all the unparsed parts; then combine the results like
385
7f8516ff408d updated
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 297
diff changeset
   205
7f8516ff408d updated
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 297
diff changeset
   206
\begin{center}
7f8516ff408d updated
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 297
diff changeset
   207
\begin{tabular}{lcl}
7f8516ff408d updated
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 297
diff changeset
   208
$\{((\textit{output}_1, \textit{output}_2), u_2)$ & $\;|\;$ & 
7f8516ff408d updated
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 297
diff changeset
   209
$(\textit{output}_1, u_1) \in p(\text{input}) 
7f8516ff408d updated
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 297
diff changeset
   210
\;\wedge\;$\\
386
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 385
diff changeset
   211
&& $\;(\textit{output}_2, u_2) \in q(u_1)\}$
183
b17eff695c7f added new stuff
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 177
diff changeset
   212
\end{tabular}
b17eff695c7f added new stuff
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 177
diff changeset
   213
\end{center}
b17eff695c7f added new stuff
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 177
diff changeset
   214
b17eff695c7f added new stuff
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 177
diff changeset
   215
\noindent
385
7f8516ff408d updated
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 297
diff changeset
   216
This can be implemented in Scala as follows:
183
b17eff695c7f added new stuff
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 177
diff changeset
   217
b17eff695c7f added new stuff
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 177
diff changeset
   218
\begin{center}
385
7f8516ff408d updated
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 297
diff changeset
   219
\begin{lstlisting}[language=Scala,numbers=none]
183
b17eff695c7f added new stuff
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 177
diff changeset
   220
class SeqParser[I, T, S]
b17eff695c7f added new stuff
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 177
diff changeset
   221
       (p: => Parser[I, T], 
b17eff695c7f added new stuff
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 177
diff changeset
   222
        q: => Parser[I, S]) extends Parser[I, (T, S)] {
b17eff695c7f added new stuff
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 177
diff changeset
   223
  def parse(sb: I) = 
386
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 385
diff changeset
   224
    for ((output1, u1) <- p.parse(sb); 
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 385
diff changeset
   225
         (output2, u2) <- q.parse(u1)) 
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 385
diff changeset
   226
            yield ((output1, output2), u2)
183
b17eff695c7f added new stuff
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 177
diff changeset
   227
}
b17eff695c7f added new stuff
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 177
diff changeset
   228
\end{lstlisting}
b17eff695c7f added new stuff
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 177
diff changeset
   229
\end{center}
b17eff695c7f added new stuff
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 177
diff changeset
   230
385
7f8516ff408d updated
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 297
diff changeset
   231
\noindent This parser takes as input two parsers, \texttt{p}
7f8516ff408d updated
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 297
diff changeset
   232
and \texttt{q}. It implements \texttt{parse} as follows: let
7f8516ff408d updated
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 297
diff changeset
   233
first run the parser \texttt{p} on the input producing a set
386
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 385
diff changeset
   234
of pairs (\texttt{output1}, \texttt{u1}). The \texttt{u1}
385
7f8516ff408d updated
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 297
diff changeset
   235
stands for the unprocessed parts left over by \texttt{p}. Let
7f8516ff408d updated
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 297
diff changeset
   236
\texttt{q} run on these unprocessed parts producing again a
7f8516ff408d updated
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 297
diff changeset
   237
set of pairs. The output of the sequence parser combinator is
7f8516ff408d updated
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 297
diff changeset
   238
then a set containing pairs where the first components are
7f8516ff408d updated
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 297
diff changeset
   239
again pairs, namely what the first parser could parse together
7f8516ff408d updated
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 297
diff changeset
   240
with what the second parser could parse; the second component
7f8516ff408d updated
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 297
diff changeset
   241
is the unprocessed part left over after running the second
7f8516ff408d updated
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 297
diff changeset
   242
parser \texttt{q}. Therefore the input type of the sequence
7f8516ff408d updated
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 297
diff changeset
   243
parser combinator is as usual \texttt{I}, but the output type
7f8516ff408d updated
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 297
diff changeset
   244
is
183
b17eff695c7f added new stuff
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 177
diff changeset
   245
b17eff695c7f added new stuff
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 177
diff changeset
   246
\begin{center}
b17eff695c7f added new stuff
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 177
diff changeset
   247
\texttt{Set[((T, S), I)]}
b17eff695c7f added new stuff
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 177
diff changeset
   248
\end{center}
b17eff695c7f added new stuff
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 177
diff changeset
   249
385
7f8516ff408d updated
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 297
diff changeset
   250
Scala allows us to provide some shorthand notation for the
386
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 385
diff changeset
   251
sequence parser combinator. We can write for example
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 385
diff changeset
   252
\pcode{'a' ~ 'b'}, which is the parser combinator that
385
7f8516ff408d updated
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 297
diff changeset
   253
first consumes the character \texttt{a} from a string and then
7f8516ff408d updated
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 297
diff changeset
   254
\texttt{b}. Three examples of this parser combinator are as
7f8516ff408d updated
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 297
diff changeset
   255
follows:
7f8516ff408d updated
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 297
diff changeset
   256
7f8516ff408d updated
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 297
diff changeset
   257
\begin{center}
7f8516ff408d updated
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 297
diff changeset
   258
\begin{tabular}{rcl}
7f8516ff408d updated
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 297
diff changeset
   259
input strings & & output\medskip\\
7f8516ff408d updated
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 297
diff changeset
   260
\texttt{\Grid{abc}} & $\rightarrow$ & $\left\{((\texttt{\Grid{a}}, \texttt{\Grid{b}}), \texttt{\Grid{c}})\right\}$\\
7f8516ff408d updated
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 297
diff changeset
   261
\texttt{\Grid{bac}} & $\rightarrow$ & $\{\}$\\
7f8516ff408d updated
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 297
diff changeset
   262
\texttt{\Grid{ccc}} & $\rightarrow$ & $\{\}$
7f8516ff408d updated
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 297
diff changeset
   263
\end{tabular}
7f8516ff408d updated
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 297
diff changeset
   264
\end{center}
7f8516ff408d updated
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 297
diff changeset
   265
386
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 385
diff changeset
   266
\noindent A slightly more complicated parser is \pcode{('a'
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 385
diff changeset
   267
|| 'b') ~ 'b'} which parses as first character either an
385
7f8516ff408d updated
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 297
diff changeset
   268
\texttt{a} or \texttt{b} followed by a \texttt{b}. This parser
386
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 385
diff changeset
   269
produces the following outputs.
385
7f8516ff408d updated
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 297
diff changeset
   270
7f8516ff408d updated
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 297
diff changeset
   271
\begin{center}
7f8516ff408d updated
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 297
diff changeset
   272
\begin{tabular}{rcl}
7f8516ff408d updated
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 297
diff changeset
   273
input strings & & output\medskip\\
7f8516ff408d updated
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 297
diff changeset
   274
\texttt{\Grid{abc}} & $\rightarrow$ & $\left\{((\texttt{\Grid{a}}, \texttt{\Grid{b}}), \texttt{\Grid{c}})\right\}$\\
7f8516ff408d updated
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 297
diff changeset
   275
\texttt{\Grid{bbc}} & $\rightarrow$ & $\left\{((\texttt{\Grid{b}}, \texttt{\Grid{b}}), \texttt{\Grid{c}})\right\}$\\
7f8516ff408d updated
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 297
diff changeset
   276
\texttt{\Grid{aac}} & $\rightarrow$ & $\{\}$
7f8516ff408d updated
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 297
diff changeset
   277
\end{tabular}
7f8516ff408d updated
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 297
diff changeset
   278
\end{center}
7f8516ff408d updated
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 297
diff changeset
   279
7f8516ff408d updated
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 297
diff changeset
   280
\noindent Two more examples: first consider the parser \pcode{('a' ~
7f8516ff408d updated
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 297
diff changeset
   281
'a') ~ 'a'} and the input \pcode{aaaa}:
183
b17eff695c7f added new stuff
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 177
diff changeset
   282
b17eff695c7f added new stuff
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 177
diff changeset
   283
\begin{center}
b17eff695c7f added new stuff
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 177
diff changeset
   284
\begin{tabular}{rcl}
b17eff695c7f added new stuff
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 177
diff changeset
   285
input string & & output\medskip\\
385
7f8516ff408d updated
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 297
diff changeset
   286
\texttt{\Grid{aaaa}} & $\rightarrow$ & 
7f8516ff408d updated
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 297
diff changeset
   287
$\left\{(((\texttt{\Grid{a}}, \texttt{\Grid{a}}), \texttt{\Grid{a}}), \texttt{\Grid{a}})\right\}$\\
183
b17eff695c7f added new stuff
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 177
diff changeset
   288
\end{tabular}
b17eff695c7f added new stuff
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 177
diff changeset
   289
\end{center}
b17eff695c7f added new stuff
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 177
diff changeset
   290
386
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 385
diff changeset
   291
\noindent Notice how the results nest deeper and deeper as
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 385
diff changeset
   292
pairs (the last \pcode{a} is in the unprocessed part). To
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 385
diff changeset
   293
consume everything of this string we can use the parser
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 385
diff changeset
   294
\pcode{(('a' ~'a') ~ 'a') ~ 'a'}. Then the output is as
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 385
diff changeset
   295
follows:
183
b17eff695c7f added new stuff
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 177
diff changeset
   296
b17eff695c7f added new stuff
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 177
diff changeset
   297
\begin{center}
b17eff695c7f added new stuff
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 177
diff changeset
   298
\begin{tabular}{rcl}
b17eff695c7f added new stuff
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 177
diff changeset
   299
input string & & output\medskip\\
385
7f8516ff408d updated
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 297
diff changeset
   300
\texttt{\Grid{aaaa}} & $\rightarrow$ & 
7f8516ff408d updated
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 297
diff changeset
   301
$\left\{((((\texttt{\Grid{a}}, \texttt{\Grid{a}}), \texttt{\Grid{a}}), \texttt{\Grid{a}}), \texttt{""})\right\}$\\
183
b17eff695c7f added new stuff
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 177
diff changeset
   302
\end{tabular}
b17eff695c7f added new stuff
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 177
diff changeset
   303
\end{center}
b17eff695c7f added new stuff
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 177
diff changeset
   304
385
7f8516ff408d updated
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 297
diff changeset
   305
\noindent This is an instance where the parser consumed
7f8516ff408d updated
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 297
diff changeset
   306
completely the input, meaning the unprocessed part is just the
7f8516ff408d updated
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 297
diff changeset
   307
empty string.
7f8516ff408d updated
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 297
diff changeset
   308
183
b17eff695c7f added new stuff
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 177
diff changeset
   309
386
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 385
diff changeset
   310
Note carefully that constructing a parser such \pcode{'a' ||
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 385
diff changeset
   311
('a' ~ 'b')} will result in a typing error. The first
385
7f8516ff408d updated
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 297
diff changeset
   312
parser has as output type a single character (recall the type
7f8516ff408d updated
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 297
diff changeset
   313
of \texttt{CharParser}), but the second parser produces a pair
7f8516ff408d updated
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 297
diff changeset
   314
of characters as output. The alternative parser is however
7f8516ff408d updated
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 297
diff changeset
   315
required to have both component parsers to have the same type.
7f8516ff408d updated
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 297
diff changeset
   316
We will see later how we can build this parser without the
7f8516ff408d updated
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 297
diff changeset
   317
typing error.
7f8516ff408d updated
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 297
diff changeset
   318
7f8516ff408d updated
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 297
diff changeset
   319
The next parser combinator does not actually combine smaller
386
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 385
diff changeset
   320
parsers, but applies a function to the result of a parser.
385
7f8516ff408d updated
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 297
diff changeset
   321
It is implemented in Scala as follows
183
b17eff695c7f added new stuff
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 177
diff changeset
   322
b17eff695c7f added new stuff
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 177
diff changeset
   323
\begin{center}
b17eff695c7f added new stuff
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 177
diff changeset
   324
\begin{lstlisting}[language=Scala,basicstyle=\small\ttfamily, numbers=none]
b17eff695c7f added new stuff
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 177
diff changeset
   325
class FunParser[I, T, S]
b17eff695c7f added new stuff
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 177
diff changeset
   326
         (p: => Parser[I, T], 
b17eff695c7f added new stuff
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 177
diff changeset
   327
          f: T => S) extends Parser[I, S] {
b17eff695c7f added new stuff
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 177
diff changeset
   328
  def parse(sb: I) = 
b17eff695c7f added new stuff
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 177
diff changeset
   329
    for ((head, tail) <- p.parse(sb)) yield (f(head), tail)
b17eff695c7f added new stuff
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 177
diff changeset
   330
}
b17eff695c7f added new stuff
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 177
diff changeset
   331
\end{lstlisting}
b17eff695c7f added new stuff
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 177
diff changeset
   332
\end{center}
b17eff695c7f added new stuff
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 177
diff changeset
   333
b17eff695c7f added new stuff
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 177
diff changeset
   334
386
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 385
diff changeset
   335
\noindent This parser combinator takes a parser \texttt{p}
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 385
diff changeset
   336
with output type \texttt{T} as one argument as well as a
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 385
diff changeset
   337
function \texttt{f} with type \texttt{T => S}. The parser
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 385
diff changeset
   338
\texttt{p} produces sets of type \texttt{(T, I)}. The
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 385
diff changeset
   339
\texttt{FunParser} combinator then applies the function
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 385
diff changeset
   340
\texttt{f} to all the parser outputs. Since this function is of
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 385
diff changeset
   341
type \texttt{T => S}, we obtain a parser with output type
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 385
diff changeset
   342
\texttt{S}. Again Scala lets us introduce some shorthand
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 385
diff changeset
   343
notation for this parser combinator. Therefore we will write
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 385
diff changeset
   344
\texttt{p ==> f} for it.
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 385
diff changeset
   345
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 385
diff changeset
   346
\subsubsection*{How to build parsers using parser combinators?}
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 385
diff changeset
   347
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 385
diff changeset
   348
\subsubsection*{Implementing an Interpreter}
183
b17eff695c7f added new stuff
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 177
diff changeset
   349
b17eff695c7f added new stuff
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 177
diff changeset
   350
%\bigskip
b17eff695c7f added new stuff
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 177
diff changeset
   351
%takes advantage of the full generality---have a look
b17eff695c7f added new stuff
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 177
diff changeset
   352
%what it produces if we call it with the string \texttt{abc}
b17eff695c7f added new stuff
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 177
diff changeset
   353
%
b17eff695c7f added new stuff
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 177
diff changeset
   354
%\begin{center}
b17eff695c7f added new stuff
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 177
diff changeset
   355
%\begin{tabular}{rcl}
b17eff695c7f added new stuff
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 177
diff changeset
   356
%input string & & output\medskip\\
b17eff695c7f added new stuff
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 177
diff changeset
   357
%\texttt{\Grid{abc}} & $\rightarrow$ & $\left\{((\texttt{\Grid{a}}, \texttt{\Grid{b}}), \texttt{\Grid{c}})\right\}$\\
b17eff695c7f added new stuff
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 177
diff changeset
   358
%\texttt{\Grid{bbc}} & $\rightarrow$ & $\left\{((\texttt{\Grid{b}}, \texttt{\Grid{b}}), \texttt{\Grid{c}})\right\}$\\
b17eff695c7f added new stuff
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 177
diff changeset
   359
%\texttt{\Grid{aac}} & $\rightarrow$ & $\varnothing$
b17eff695c7f added new stuff
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 177
diff changeset
   360
%\end{tabular}
b17eff695c7f added new stuff
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 177
diff changeset
   361
%\end{center}
b17eff695c7f added new stuff
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 177
diff changeset
   362
b17eff695c7f added new stuff
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 177
diff changeset
   363
173
7cfb7a6f7c99 added slides
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents:
diff changeset
   364
7cfb7a6f7c99 added slides
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents:
diff changeset
   365
\end{document}
7cfb7a6f7c99 added slides
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents:
diff changeset
   366
7cfb7a6f7c99 added slides
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents:
diff changeset
   367
%%% Local Variables: 
7cfb7a6f7c99 added slides
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents:
diff changeset
   368
%%% mode: latex  
7cfb7a6f7c99 added slides
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents:
diff changeset
   369
%%% TeX-master: t
7cfb7a6f7c99 added slides
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents:
diff changeset
   370
%%% End: