author | Christian Urban <urbanc@in.tum.de> |
Wed, 22 Mar 2017 14:10:01 +0000 | |
changeset 480 | 9e42ccbbd1e6 |
parent 392 | 2d0a59127694 |
child 584 | 529460073b24 |
permissions | -rw-r--r-- |
173
7cfb7a6f7c99
added slides
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents:
diff
changeset
|
1 |
\documentclass{article} |
297
5c51839c88fd
updated
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents:
292
diff
changeset
|
2 |
\usepackage{../style} |
217
cd6066f1056a
updated handouts
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents:
183
diff
changeset
|
3 |
\usepackage{../langs} |
cd6066f1056a
updated handouts
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents:
183
diff
changeset
|
4 |
|
173
7cfb7a6f7c99
added slides
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents:
diff
changeset
|
5 |
|
7cfb7a6f7c99
added slides
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents:
diff
changeset
|
6 |
\begin{document} |
7cfb7a6f7c99
added slides
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents:
diff
changeset
|
7 |
|
292
7ed2a25dd115
updated
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents:
217
diff
changeset
|
8 |
\section*{Handout 6 (Parser Combinators)} |
173
7cfb7a6f7c99
added slides
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents:
diff
changeset
|
9 |
|
386
31295bb945c6
update
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents:
385
diff
changeset
|
10 |
In what follows we explain \emph{parser combinators}. Their |
31295bb945c6
update
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents:
385
diff
changeset
|
11 |
distinguishing feature is that they are very easy to |
31295bb945c6
update
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents:
385
diff
changeset
|
12 |
implement. However, they only work when the grammar to be |
31295bb945c6
update
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents:
385
diff
changeset
|
13 |
parsed is \emph{not} left-recursive and they are efficient |
31295bb945c6
update
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents:
385
diff
changeset
|
14 |
only when the grammar is unambiguous. It is the responsibility |
31295bb945c6
update
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents:
385
diff
changeset
|
15 |
of the grammar designer to ensure these two properties. |
173
7cfb7a6f7c99
added slides
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents:
diff
changeset
|
16 |
|
385
7f8516ff408d
updated
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents:
297
diff
changeset
|
17 |
Parser combinators can deal with any kind of input as long as |
7f8516ff408d
updated
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents:
297
diff
changeset
|
18 |
this input is a kind of sequence, for example a string or a |
386
31295bb945c6
update
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents:
385
diff
changeset
|
19 |
list of tokens. The general idea behind parser combinators is |
31295bb945c6
update
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents:
385
diff
changeset
|
20 |
to transform the input into sets of pairs, like so |
175
5801e8c0e528
updated
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents:
173
diff
changeset
|
21 |
|
5801e8c0e528
updated
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents:
173
diff
changeset
|
22 |
\begin{center} |
385
7f8516ff408d
updated
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents:
297
diff
changeset
|
23 |
$\underbrace{\text{list of tokens}}_{\text{input}}$ |
7f8516ff408d
updated
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents:
297
diff
changeset
|
24 |
$\Rightarrow$ |
7f8516ff408d
updated
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents:
297
diff
changeset
|
25 |
$\underbrace{\text{set of (parsed input, unparsed input)}}_{\text{output}}$ |
7f8516ff408d
updated
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents:
297
diff
changeset
|
26 |
\end{center} |
175
5801e8c0e528
updated
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents:
173
diff
changeset
|
27 |
|
386
31295bb945c6
update
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents:
385
diff
changeset
|
28 |
\noindent In Scala parser combinators are functions of type |
176
3c2653fc8b5a
updated
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents:
175
diff
changeset
|
29 |
|
3c2653fc8b5a
updated
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents:
175
diff
changeset
|
30 |
\begin{center} |
177
53def1fbf472
updated
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents:
176
diff
changeset
|
31 |
\texttt{I $\Rightarrow$ Set[(T, I)]} |
176
3c2653fc8b5a
updated
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents:
175
diff
changeset
|
32 |
\end{center} |
3c2653fc8b5a
updated
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents:
175
diff
changeset
|
33 |
|
385
7f8516ff408d
updated
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents:
297
diff
changeset
|
34 |
\noindent that is they take as input something of type |
7f8516ff408d
updated
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents:
297
diff
changeset
|
35 |
\texttt{I} and return a set of pairs. The first component of |
7f8516ff408d
updated
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents:
297
diff
changeset
|
36 |
these pairs corresponds to what the parser combinator was able |
7f8516ff408d
updated
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents:
297
diff
changeset
|
37 |
to process from the input and the second is the unprocessed |
7f8516ff408d
updated
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents:
297
diff
changeset
|
38 |
part of the input. As we shall see shortly, a parser |
386
31295bb945c6
update
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents:
385
diff
changeset
|
39 |
combinator might return more than one such pair, the idea |
392
2d0a59127694
updated
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents:
386
diff
changeset
|
40 |
being that there are potentially several ways of how to |
2d0a59127694
updated
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents:
386
diff
changeset
|
41 |
interpret the input. To simplify matters we will first look at |
2d0a59127694
updated
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents:
386
diff
changeset
|
42 |
the case where the input to the parser combinators is just |
2d0a59127694
updated
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents:
386
diff
changeset
|
43 |
strings. As a concrete example, consider the string |
183
b17eff695c7f
added new stuff
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents:
177
diff
changeset
|
44 |
|
b17eff695c7f
added new stuff
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents:
177
diff
changeset
|
45 |
\begin{center} |
b17eff695c7f
added new stuff
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents:
177
diff
changeset
|
46 |
\tt\Grid{iffoo\VS testbar} |
b17eff695c7f
added new stuff
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents:
177
diff
changeset
|
47 |
\end{center} |
b17eff695c7f
added new stuff
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents:
177
diff
changeset
|
48 |
|
385
7f8516ff408d
updated
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents:
297
diff
changeset
|
49 |
\noindent We might have a parser combinator which tries to |
386
31295bb945c6
update
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents:
385
diff
changeset
|
50 |
interpret this string as a keyword (\texttt{if}) or as an |
385
7f8516ff408d
updated
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents:
297
diff
changeset
|
51 |
identifier (\texttt{iffoo}). Then the output will be the set |
177
53def1fbf472
updated
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents:
176
diff
changeset
|
52 |
|
183
b17eff695c7f
added new stuff
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents:
177
diff
changeset
|
53 |
\begin{center} |
386
31295bb945c6
update
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents:
385
diff
changeset
|
54 |
$\left\{ \left(\texttt{\Grid{if}}\;,\; \texttt{\Grid{foo\VS testbar}}\right), |
31295bb945c6
update
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents:
385
diff
changeset
|
55 |
\left(\texttt{\Grid{iffoo}}\;,\; \texttt{\Grid{\VS testbar}}\right) \right\}$ |
183
b17eff695c7f
added new stuff
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents:
177
diff
changeset
|
56 |
\end{center} |
b17eff695c7f
added new stuff
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents:
177
diff
changeset
|
57 |
|
385
7f8516ff408d
updated
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents:
297
diff
changeset
|
58 |
\noindent where the first pair means the parser could |
7f8516ff408d
updated
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents:
297
diff
changeset
|
59 |
recognise \texttt{if} from the input and leaves the rest as |
7f8516ff408d
updated
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents:
297
diff
changeset
|
60 |
`unprocessed' as the second component of the pair; in the |
7f8516ff408d
updated
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents:
297
diff
changeset
|
61 |
other case it could recognise \texttt{iffoo} and leaves |
7f8516ff408d
updated
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents:
297
diff
changeset
|
62 |
\texttt{\VS testbar} as unprocessed. If the parser cannot |
386
31295bb945c6
update
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents:
385
diff
changeset
|
63 |
recognise anything from the input, then parser combinators just |
385
7f8516ff408d
updated
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents:
297
diff
changeset
|
64 |
return the empty set $\{\}$. This will indicate something |
7f8516ff408d
updated
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents:
297
diff
changeset
|
65 |
``went wrong''\ldots or more precisely, nothing could be |
7f8516ff408d
updated
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents:
297
diff
changeset
|
66 |
parsed. |
183
b17eff695c7f
added new stuff
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents:
177
diff
changeset
|
67 |
|
392
2d0a59127694
updated
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents:
386
diff
changeset
|
68 |
The idea of parser combinators is that we can easily build |
2d0a59127694
updated
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents:
386
diff
changeset
|
69 |
parser combinators out of smaller components following very |
2d0a59127694
updated
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents:
386
diff
changeset
|
70 |
closely the structure of a grammar. In order to implement this |
2d0a59127694
updated
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents:
386
diff
changeset
|
71 |
in an object-oriented programming language, like Scala, we |
2d0a59127694
updated
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents:
386
diff
changeset
|
72 |
need to specify an abstract class for parser combinators. This |
2d0a59127694
updated
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents:
386
diff
changeset
|
73 |
abstract class requires the implementation of the function |
2d0a59127694
updated
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents:
386
diff
changeset
|
74 |
\texttt{parse} taking an argument of type \texttt{I} and |
2d0a59127694
updated
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents:
386
diff
changeset
|
75 |
returns a set of type \mbox{\texttt{Set[(T, I)]}}. |
177
53def1fbf472
updated
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents:
176
diff
changeset
|
76 |
|
53def1fbf472
updated
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents:
176
diff
changeset
|
77 |
\begin{center} |
385
7f8516ff408d
updated
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents:
297
diff
changeset
|
78 |
\begin{lstlisting}[language=Scala] |
177
53def1fbf472
updated
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents:
176
diff
changeset
|
79 |
abstract class Parser[I, T] { |
53def1fbf472
updated
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents:
176
diff
changeset
|
80 |
def parse(ts: I): Set[(T, I)] |
53def1fbf472
updated
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents:
176
diff
changeset
|
81 |
|
53def1fbf472
updated
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents:
176
diff
changeset
|
82 |
def parse_all(ts: I): Set[T] = |
53def1fbf472
updated
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents:
176
diff
changeset
|
83 |
for ((head, tail) <- parse(ts); if (tail.isEmpty)) |
53def1fbf472
updated
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents:
176
diff
changeset
|
84 |
yield head |
53def1fbf472
updated
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents:
176
diff
changeset
|
85 |
} |
53def1fbf472
updated
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents:
176
diff
changeset
|
86 |
\end{lstlisting} |
53def1fbf472
updated
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents:
176
diff
changeset
|
87 |
\end{center} |
176
3c2653fc8b5a
updated
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents:
175
diff
changeset
|
88 |
|
385
7f8516ff408d
updated
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents:
297
diff
changeset
|
89 |
\noindent From the function \texttt{parse} we can then |
7f8516ff408d
updated
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents:
297
diff
changeset
|
90 |
``centrally'' derive the function \texttt{parse\_all}, which |
7f8516ff408d
updated
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents:
297
diff
changeset
|
91 |
just filters out all pairs whose second component is not empty |
7f8516ff408d
updated
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents:
297
diff
changeset
|
92 |
(that is has still some unprocessed part). The reason is that |
386
31295bb945c6
update
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents:
385
diff
changeset
|
93 |
at the end of the parsing we are only interested in the results |
385
7f8516ff408d
updated
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents:
297
diff
changeset
|
94 |
where all the input has been consumed and no unprocessed part |
386
31295bb945c6
update
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents:
385
diff
changeset
|
95 |
is left over. |
183
b17eff695c7f
added new stuff
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents:
177
diff
changeset
|
96 |
|
385
7f8516ff408d
updated
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents:
297
diff
changeset
|
97 |
One of the simplest parser combinators recognises just a |
7f8516ff408d
updated
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents:
297
diff
changeset
|
98 |
character, say $c$, from the beginning of strings. Its |
7f8516ff408d
updated
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents:
297
diff
changeset
|
99 |
behaviour can be described as follows: |
176
3c2653fc8b5a
updated
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents:
175
diff
changeset
|
100 |
|
177
53def1fbf472
updated
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents:
176
diff
changeset
|
101 |
\begin{itemize} |
386
31295bb945c6
update
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents:
385
diff
changeset
|
102 |
\item if the head of the input string starts with a $c$, then returns |
385
7f8516ff408d
updated
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents:
297
diff
changeset
|
103 |
the set $\{(c, \textit{tail of}\; s)\}$, where \textit{tail of} |
7f8516ff408d
updated
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents:
297
diff
changeset
|
104 |
$s$ is the unprocessed part of the input string |
386
31295bb945c6
update
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents:
385
diff
changeset
|
105 |
\item otherwise return the empty set $\{\}$ |
177
53def1fbf472
updated
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents:
176
diff
changeset
|
106 |
\end{itemize} |
53def1fbf472
updated
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents:
176
diff
changeset
|
107 |
|
53def1fbf472
updated
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents:
176
diff
changeset
|
108 |
\noindent |
53def1fbf472
updated
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents:
176
diff
changeset
|
109 |
The input type of this simple parser combinator for characters is |
53def1fbf472
updated
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents:
176
diff
changeset
|
110 |
\texttt{String} and the output type \mbox{\texttt{Set[(Char, String)]}}. |
53def1fbf472
updated
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents:
176
diff
changeset
|
111 |
The code in Scala is as follows: |
53def1fbf472
updated
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents:
176
diff
changeset
|
112 |
|
53def1fbf472
updated
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents:
176
diff
changeset
|
113 |
\begin{center} |
53def1fbf472
updated
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents:
176
diff
changeset
|
114 |
\begin{lstlisting}[language=Scala,basicstyle=\small\ttfamily, numbers=none] |
53def1fbf472
updated
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents:
176
diff
changeset
|
115 |
case class CharParser(c: Char) extends Parser[String, Char] { |
53def1fbf472
updated
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents:
176
diff
changeset
|
116 |
def parse(sb: String) = |
53def1fbf472
updated
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents:
176
diff
changeset
|
117 |
if (sb.head == c) Set((c, sb.tail)) else Set() |
53def1fbf472
updated
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents:
176
diff
changeset
|
118 |
} |
53def1fbf472
updated
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents:
176
diff
changeset
|
119 |
\end{lstlisting} |
53def1fbf472
updated
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents:
176
diff
changeset
|
120 |
\end{center} |
176
3c2653fc8b5a
updated
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents:
175
diff
changeset
|
121 |
|
385
7f8516ff408d
updated
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents:
297
diff
changeset
|
122 |
\noindent The \texttt{parse} function tests whether the first |
7f8516ff408d
updated
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents:
297
diff
changeset
|
123 |
character of the input string \texttt{sb} is equal to |
7f8516ff408d
updated
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents:
297
diff
changeset
|
124 |
\texttt{c}. If yes, then it splits the string into the |
7f8516ff408d
updated
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents:
297
diff
changeset
|
125 |
recognised part \texttt{c} and the unprocessed part |
7f8516ff408d
updated
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents:
297
diff
changeset
|
126 |
\texttt{sb.tail}. In case \texttt{sb} does not start with |
7f8516ff408d
updated
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents:
297
diff
changeset
|
127 |
\texttt{c} then the parser returns the empty set (in Scala |
7f8516ff408d
updated
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents:
297
diff
changeset
|
128 |
\texttt{Set()}). |
7f8516ff408d
updated
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents:
297
diff
changeset
|
129 |
|
7f8516ff408d
updated
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents:
297
diff
changeset
|
130 |
|
7f8516ff408d
updated
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents:
297
diff
changeset
|
131 |
More interesting are the parser combinators that build larger |
7f8516ff408d
updated
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents:
297
diff
changeset
|
132 |
parsers out of smaller component parsers. For example the |
7f8516ff408d
updated
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents:
297
diff
changeset
|
133 |
alternative parser combinator is as follows: given two |
392
2d0a59127694
updated
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents:
386
diff
changeset
|
134 |
parsers, say, $p$ and $q$, we apply both parsers to the |
386
31295bb945c6
update
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents:
385
diff
changeset
|
135 |
input (remember parsers are functions) and combine the output |
31295bb945c6
update
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents:
385
diff
changeset
|
136 |
(remember they are sets) |
385
7f8516ff408d
updated
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents:
297
diff
changeset
|
137 |
|
7f8516ff408d
updated
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents:
297
diff
changeset
|
138 |
\begin{center} |
7f8516ff408d
updated
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents:
297
diff
changeset
|
139 |
$p(\text{input}) \cup q(\text{input})$ |
7f8516ff408d
updated
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents:
297
diff
changeset
|
140 |
\end{center} |
7f8516ff408d
updated
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents:
297
diff
changeset
|
141 |
|
386
31295bb945c6
update
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents:
385
diff
changeset
|
142 |
\noindent In Scala we would implement alternative parser |
31295bb945c6
update
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents:
385
diff
changeset
|
143 |
combinator as follows |
183
b17eff695c7f
added new stuff
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents:
177
diff
changeset
|
144 |
|
b17eff695c7f
added new stuff
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents:
177
diff
changeset
|
145 |
\begin{center} |
385
7f8516ff408d
updated
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents:
297
diff
changeset
|
146 |
\begin{lstlisting}[language=Scala, numbers=none] |
183
b17eff695c7f
added new stuff
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents:
177
diff
changeset
|
147 |
class AltParser[I, T] |
b17eff695c7f
added new stuff
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents:
177
diff
changeset
|
148 |
(p: => Parser[I, T], |
b17eff695c7f
added new stuff
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents:
177
diff
changeset
|
149 |
q: => Parser[I, T]) extends Parser[I, T] { |
b17eff695c7f
added new stuff
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents:
177
diff
changeset
|
150 |
def parse(sb: I) = p.parse(sb) ++ q.parse(sb) |
b17eff695c7f
added new stuff
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents:
177
diff
changeset
|
151 |
} |
b17eff695c7f
added new stuff
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents:
177
diff
changeset
|
152 |
\end{lstlisting} |
b17eff695c7f
added new stuff
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents:
177
diff
changeset
|
153 |
\end{center} |
b17eff695c7f
added new stuff
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents:
177
diff
changeset
|
154 |
|
385
7f8516ff408d
updated
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents:
297
diff
changeset
|
155 |
\noindent The types of this parser combinator are polymorphic |
7f8516ff408d
updated
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents:
297
diff
changeset
|
156 |
(we just have \texttt{I} for the input type, and \texttt{T} |
7f8516ff408d
updated
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents:
297
diff
changeset
|
157 |
for the output type). The alternative parser builds a new |
386
31295bb945c6
update
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents:
385
diff
changeset
|
158 |
parser out of two existing parsers \texttt{p} and \texttt{q}. |
31295bb945c6
update
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents:
385
diff
changeset
|
159 |
Both need to be able to process input of type \texttt{I} and |
31295bb945c6
update
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents:
385
diff
changeset
|
160 |
return the same output type \texttt{Set[(T, |
31295bb945c6
update
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents:
385
diff
changeset
|
161 |
I)]}.\footnote{There is an interesting detail of Scala, namely |
31295bb945c6
update
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents:
385
diff
changeset
|
162 |
the \texttt{=>} in front of the types of \texttt{p} and |
385
7f8516ff408d
updated
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents:
297
diff
changeset
|
163 |
\texttt{q}. They will prevent the evaluation of the arguments |
7f8516ff408d
updated
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents:
297
diff
changeset
|
164 |
before they are used. This is often called \emph{lazy |
386
31295bb945c6
update
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents:
385
diff
changeset
|
165 |
evaluation} of the arguments. We will explain this later.} The alternative parser should |
385
7f8516ff408d
updated
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents:
297
diff
changeset
|
166 |
run the input with the first parser \texttt{p} (producing a |
7f8516ff408d
updated
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents:
297
diff
changeset
|
167 |
set of outputs) and then run the same input with \texttt{q}. |
7f8516ff408d
updated
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents:
297
diff
changeset
|
168 |
The result should be then just the union of both sets, which |
7f8516ff408d
updated
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents:
297
diff
changeset
|
169 |
is the operation \texttt{++} in Scala. |
183
b17eff695c7f
added new stuff
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents:
177
diff
changeset
|
170 |
|
386
31295bb945c6
update
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents:
385
diff
changeset
|
171 |
The alternative parser combinator already allows us to |
31295bb945c6
update
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents:
385
diff
changeset
|
172 |
construct a parser that parses either a character \texttt{a} |
31295bb945c6
update
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents:
385
diff
changeset
|
173 |
or \texttt{b}, as |
183
b17eff695c7f
added new stuff
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents:
177
diff
changeset
|
174 |
|
b17eff695c7f
added new stuff
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents:
177
diff
changeset
|
175 |
\begin{center} |
385
7f8516ff408d
updated
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents:
297
diff
changeset
|
176 |
\begin{lstlisting}[language=Scala, numbers=none] |
183
b17eff695c7f
added new stuff
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents:
177
diff
changeset
|
177 |
new AltParser(CharParser('a'), CharParser('b')) |
b17eff695c7f
added new stuff
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents:
177
diff
changeset
|
178 |
\end{lstlisting} |
b17eff695c7f
added new stuff
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents:
177
diff
changeset
|
179 |
\end{center} |
b17eff695c7f
added new stuff
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents:
177
diff
changeset
|
180 |
|
385
7f8516ff408d
updated
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents:
297
diff
changeset
|
181 |
\noindent Scala allows us to introduce some more readable |
7f8516ff408d
updated
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents:
297
diff
changeset
|
182 |
shorthand notation for this, like \texttt{'a' || 'b'}. We can |
7f8516ff408d
updated
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents:
297
diff
changeset
|
183 |
call this parser combinator with the strings |
183
b17eff695c7f
added new stuff
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents:
177
diff
changeset
|
184 |
|
b17eff695c7f
added new stuff
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents:
177
diff
changeset
|
185 |
\begin{center} |
b17eff695c7f
added new stuff
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents:
177
diff
changeset
|
186 |
\begin{tabular}{rcl} |
385
7f8516ff408d
updated
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents:
297
diff
changeset
|
187 |
input strings & & output\medskip\\ |
183
b17eff695c7f
added new stuff
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents:
177
diff
changeset
|
188 |
\texttt{\Grid{ac}} & $\rightarrow$ & $\left\{(\texttt{\Grid{a}}, \texttt{\Grid{c}})\right\}$\\ |
b17eff695c7f
added new stuff
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents:
177
diff
changeset
|
189 |
\texttt{\Grid{bc}} & $\rightarrow$ & $\left\{(\texttt{\Grid{b}}, \texttt{\Grid{c}})\right\}$\\ |
385
7f8516ff408d
updated
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents:
297
diff
changeset
|
190 |
\texttt{\Grid{cc}} & $\rightarrow$ & $\{\}$ |
7f8516ff408d
updated
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents:
297
diff
changeset
|
191 |
\end{tabular} |
7f8516ff408d
updated
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents:
297
diff
changeset
|
192 |
\end{center} |
7f8516ff408d
updated
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents:
297
diff
changeset
|
193 |
|
7f8516ff408d
updated
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents:
297
diff
changeset
|
194 |
\noindent We receive in the first two cases a successful |
7f8516ff408d
updated
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents:
297
diff
changeset
|
195 |
output (that is a non-empty set). In each case, either |
392
2d0a59127694
updated
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents:
386
diff
changeset
|
196 |
\pcode{a} or \pcode{b} is in the processed part, and |
386
31295bb945c6
update
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents:
385
diff
changeset
|
197 |
\pcode{c} in the unprocessed part. Clearly this parser cannot |
31295bb945c6
update
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents:
385
diff
changeset
|
198 |
parse anything in the string \pcode{cc}, therefore the empty |
31295bb945c6
update
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents:
385
diff
changeset
|
199 |
set. |
385
7f8516ff408d
updated
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents:
297
diff
changeset
|
200 |
|
7f8516ff408d
updated
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents:
297
diff
changeset
|
201 |
A bit more interesting is the \emph{sequence parser |
386
31295bb945c6
update
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents:
385
diff
changeset
|
202 |
combinator}. Given two parsers, say, $p$ and $q$, apply first |
31295bb945c6
update
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents:
385
diff
changeset
|
203 |
the input to $p$ producing a set of pairs; then apply $q$ to |
31295bb945c6
update
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents:
385
diff
changeset
|
204 |
all the unparsed parts; then combine the results like |
385
7f8516ff408d
updated
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents:
297
diff
changeset
|
205 |
|
7f8516ff408d
updated
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents:
297
diff
changeset
|
206 |
\begin{center} |
7f8516ff408d
updated
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents:
297
diff
changeset
|
207 |
\begin{tabular}{lcl} |
7f8516ff408d
updated
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents:
297
diff
changeset
|
208 |
$\{((\textit{output}_1, \textit{output}_2), u_2)$ & $\;|\;$ & |
7f8516ff408d
updated
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents:
297
diff
changeset
|
209 |
$(\textit{output}_1, u_1) \in p(\text{input}) |
7f8516ff408d
updated
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents:
297
diff
changeset
|
210 |
\;\wedge\;$\\ |
386
31295bb945c6
update
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents:
385
diff
changeset
|
211 |
&& $\;(\textit{output}_2, u_2) \in q(u_1)\}$ |
183
b17eff695c7f
added new stuff
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents:
177
diff
changeset
|
212 |
\end{tabular} |
b17eff695c7f
added new stuff
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents:
177
diff
changeset
|
213 |
\end{center} |
b17eff695c7f
added new stuff
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents:
177
diff
changeset
|
214 |
|
b17eff695c7f
added new stuff
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents:
177
diff
changeset
|
215 |
\noindent |
385
7f8516ff408d
updated
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents:
297
diff
changeset
|
216 |
This can be implemented in Scala as follows: |
183
b17eff695c7f
added new stuff
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents:
177
diff
changeset
|
217 |
|
b17eff695c7f
added new stuff
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents:
177
diff
changeset
|
218 |
\begin{center} |
385
7f8516ff408d
updated
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents:
297
diff
changeset
|
219 |
\begin{lstlisting}[language=Scala,numbers=none] |
183
b17eff695c7f
added new stuff
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents:
177
diff
changeset
|
220 |
class SeqParser[I, T, S] |
b17eff695c7f
added new stuff
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents:
177
diff
changeset
|
221 |
(p: => Parser[I, T], |
b17eff695c7f
added new stuff
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents:
177
diff
changeset
|
222 |
q: => Parser[I, S]) extends Parser[I, (T, S)] { |
b17eff695c7f
added new stuff
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents:
177
diff
changeset
|
223 |
def parse(sb: I) = |
386
31295bb945c6
update
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents:
385
diff
changeset
|
224 |
for ((output1, u1) <- p.parse(sb); |
31295bb945c6
update
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents:
385
diff
changeset
|
225 |
(output2, u2) <- q.parse(u1)) |
31295bb945c6
update
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents:
385
diff
changeset
|
226 |
yield ((output1, output2), u2) |
183
b17eff695c7f
added new stuff
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents:
177
diff
changeset
|
227 |
} |
b17eff695c7f
added new stuff
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents:
177
diff
changeset
|
228 |
\end{lstlisting} |
b17eff695c7f
added new stuff
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents:
177
diff
changeset
|
229 |
\end{center} |
b17eff695c7f
added new stuff
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents:
177
diff
changeset
|
230 |
|
385
7f8516ff408d
updated
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents:
297
diff
changeset
|
231 |
\noindent This parser takes as input two parsers, \texttt{p} |
7f8516ff408d
updated
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents:
297
diff
changeset
|
232 |
and \texttt{q}. It implements \texttt{parse} as follows: let |
7f8516ff408d
updated
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents:
297
diff
changeset
|
233 |
first run the parser \texttt{p} on the input producing a set |
386
31295bb945c6
update
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents:
385
diff
changeset
|
234 |
of pairs (\texttt{output1}, \texttt{u1}). The \texttt{u1} |
385
7f8516ff408d
updated
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents:
297
diff
changeset
|
235 |
stands for the unprocessed parts left over by \texttt{p}. Let |
7f8516ff408d
updated
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents:
297
diff
changeset
|
236 |
\texttt{q} run on these unprocessed parts producing again a |
7f8516ff408d
updated
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents:
297
diff
changeset
|
237 |
set of pairs. The output of the sequence parser combinator is |
7f8516ff408d
updated
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents:
297
diff
changeset
|
238 |
then a set containing pairs where the first components are |
7f8516ff408d
updated
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents:
297
diff
changeset
|
239 |
again pairs, namely what the first parser could parse together |
7f8516ff408d
updated
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents:
297
diff
changeset
|
240 |
with what the second parser could parse; the second component |
7f8516ff408d
updated
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents:
297
diff
changeset
|
241 |
is the unprocessed part left over after running the second |
7f8516ff408d
updated
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents:
297
diff
changeset
|
242 |
parser \texttt{q}. Therefore the input type of the sequence |
7f8516ff408d
updated
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents:
297
diff
changeset
|
243 |
parser combinator is as usual \texttt{I}, but the output type |
7f8516ff408d
updated
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents:
297
diff
changeset
|
244 |
is |
183
b17eff695c7f
added new stuff
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents:
177
diff
changeset
|
245 |
|
b17eff695c7f
added new stuff
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents:
177
diff
changeset
|
246 |
\begin{center} |
b17eff695c7f
added new stuff
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents:
177
diff
changeset
|
247 |
\texttt{Set[((T, S), I)]} |
b17eff695c7f
added new stuff
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents:
177
diff
changeset
|
248 |
\end{center} |
b17eff695c7f
added new stuff
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents:
177
diff
changeset
|
249 |
|
385
7f8516ff408d
updated
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents:
297
diff
changeset
|
250 |
Scala allows us to provide some shorthand notation for the |
386
31295bb945c6
update
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents:
385
diff
changeset
|
251 |
sequence parser combinator. We can write for example |
31295bb945c6
update
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents:
385
diff
changeset
|
252 |
\pcode{'a' ~ 'b'}, which is the parser combinator that |
385
7f8516ff408d
updated
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents:
297
diff
changeset
|
253 |
first consumes the character \texttt{a} from a string and then |
7f8516ff408d
updated
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents:
297
diff
changeset
|
254 |
\texttt{b}. Three examples of this parser combinator are as |
7f8516ff408d
updated
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents:
297
diff
changeset
|
255 |
follows: |
7f8516ff408d
updated
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents:
297
diff
changeset
|
256 |
|
7f8516ff408d
updated
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents:
297
diff
changeset
|
257 |
\begin{center} |
7f8516ff408d
updated
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents:
297
diff
changeset
|
258 |
\begin{tabular}{rcl} |
7f8516ff408d
updated
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents:
297
diff
changeset
|
259 |
input strings & & output\medskip\\ |
7f8516ff408d
updated
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents:
297
diff
changeset
|
260 |
\texttt{\Grid{abc}} & $\rightarrow$ & $\left\{((\texttt{\Grid{a}}, \texttt{\Grid{b}}), \texttt{\Grid{c}})\right\}$\\ |
7f8516ff408d
updated
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents:
297
diff
changeset
|
261 |
\texttt{\Grid{bac}} & $\rightarrow$ & $\{\}$\\ |
7f8516ff408d
updated
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents:
297
diff
changeset
|
262 |
\texttt{\Grid{ccc}} & $\rightarrow$ & $\{\}$ |
7f8516ff408d
updated
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents:
297
diff
changeset
|
263 |
\end{tabular} |
7f8516ff408d
updated
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents:
297
diff
changeset
|
264 |
\end{center} |
7f8516ff408d
updated
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents:
297
diff
changeset
|
265 |
|
386
31295bb945c6
update
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents:
385
diff
changeset
|
266 |
\noindent A slightly more complicated parser is \pcode{('a' |
31295bb945c6
update
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents:
385
diff
changeset
|
267 |
|| 'b') ~ 'b'} which parses as first character either an |
385
7f8516ff408d
updated
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents:
297
diff
changeset
|
268 |
\texttt{a} or \texttt{b} followed by a \texttt{b}. This parser |
386
31295bb945c6
update
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents:
385
diff
changeset
|
269 |
produces the following outputs. |
385
7f8516ff408d
updated
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents:
297
diff
changeset
|
270 |
|
7f8516ff408d
updated
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents:
297
diff
changeset
|
271 |
\begin{center} |
7f8516ff408d
updated
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents:
297
diff
changeset
|
272 |
\begin{tabular}{rcl} |
7f8516ff408d
updated
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents:
297
diff
changeset
|
273 |
input strings & & output\medskip\\ |
7f8516ff408d
updated
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents:
297
diff
changeset
|
274 |
\texttt{\Grid{abc}} & $\rightarrow$ & $\left\{((\texttt{\Grid{a}}, \texttt{\Grid{b}}), \texttt{\Grid{c}})\right\}$\\ |
7f8516ff408d
updated
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents:
297
diff
changeset
|
275 |
\texttt{\Grid{bbc}} & $\rightarrow$ & $\left\{((\texttt{\Grid{b}}, \texttt{\Grid{b}}), \texttt{\Grid{c}})\right\}$\\ |
7f8516ff408d
updated
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents:
297
diff
changeset
|
276 |
\texttt{\Grid{aac}} & $\rightarrow$ & $\{\}$ |
7f8516ff408d
updated
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents:
297
diff
changeset
|
277 |
\end{tabular} |
7f8516ff408d
updated
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents:
297
diff
changeset
|
278 |
\end{center} |
7f8516ff408d
updated
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents:
297
diff
changeset
|
279 |
|
7f8516ff408d
updated
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents:
297
diff
changeset
|
280 |
\noindent Two more examples: first consider the parser \pcode{('a' ~ |
7f8516ff408d
updated
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents:
297
diff
changeset
|
281 |
'a') ~ 'a'} and the input \pcode{aaaa}: |
183
b17eff695c7f
added new stuff
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents:
177
diff
changeset
|
282 |
|
b17eff695c7f
added new stuff
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents:
177
diff
changeset
|
283 |
\begin{center} |
b17eff695c7f
added new stuff
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents:
177
diff
changeset
|
284 |
\begin{tabular}{rcl} |
b17eff695c7f
added new stuff
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents:
177
diff
changeset
|
285 |
input string & & output\medskip\\ |
385
7f8516ff408d
updated
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents:
297
diff
changeset
|
286 |
\texttt{\Grid{aaaa}} & $\rightarrow$ & |
7f8516ff408d
updated
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents:
297
diff
changeset
|
287 |
$\left\{(((\texttt{\Grid{a}}, \texttt{\Grid{a}}), \texttt{\Grid{a}}), \texttt{\Grid{a}})\right\}$\\ |
183
b17eff695c7f
added new stuff
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents:
177
diff
changeset
|
288 |
\end{tabular} |
b17eff695c7f
added new stuff
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents:
177
diff
changeset
|
289 |
\end{center} |
b17eff695c7f
added new stuff
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents:
177
diff
changeset
|
290 |
|
386
31295bb945c6
update
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents:
385
diff
changeset
|
291 |
\noindent Notice how the results nest deeper and deeper as |
31295bb945c6
update
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents:
385
diff
changeset
|
292 |
pairs (the last \pcode{a} is in the unprocessed part). To |
31295bb945c6
update
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents:
385
diff
changeset
|
293 |
consume everything of this string we can use the parser |
31295bb945c6
update
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents:
385
diff
changeset
|
294 |
\pcode{(('a' ~'a') ~ 'a') ~ 'a'}. Then the output is as |
31295bb945c6
update
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents:
385
diff
changeset
|
295 |
follows: |
183
b17eff695c7f
added new stuff
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents:
177
diff
changeset
|
296 |
|
b17eff695c7f
added new stuff
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents:
177
diff
changeset
|
297 |
\begin{center} |
b17eff695c7f
added new stuff
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents:
177
diff
changeset
|
298 |
\begin{tabular}{rcl} |
b17eff695c7f
added new stuff
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents:
177
diff
changeset
|
299 |
input string & & output\medskip\\ |
385
7f8516ff408d
updated
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents:
297
diff
changeset
|
300 |
\texttt{\Grid{aaaa}} & $\rightarrow$ & |
7f8516ff408d
updated
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents:
297
diff
changeset
|
301 |
$\left\{((((\texttt{\Grid{a}}, \texttt{\Grid{a}}), \texttt{\Grid{a}}), \texttt{\Grid{a}}), \texttt{""})\right\}$\\ |
183
b17eff695c7f
added new stuff
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents:
177
diff
changeset
|
302 |
\end{tabular} |
b17eff695c7f
added new stuff
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents:
177
diff
changeset
|
303 |
\end{center} |
b17eff695c7f
added new stuff
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents:
177
diff
changeset
|
304 |
|
385
7f8516ff408d
updated
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents:
297
diff
changeset
|
305 |
\noindent This is an instance where the parser consumed |
7f8516ff408d
updated
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents:
297
diff
changeset
|
306 |
completely the input, meaning the unprocessed part is just the |
7f8516ff408d
updated
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents:
297
diff
changeset
|
307 |
empty string. |
7f8516ff408d
updated
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents:
297
diff
changeset
|
308 |
|
183
b17eff695c7f
added new stuff
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents:
177
diff
changeset
|
309 |
|
386
31295bb945c6
update
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents:
385
diff
changeset
|
310 |
Note carefully that constructing a parser such \pcode{'a' || |
31295bb945c6
update
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents:
385
diff
changeset
|
311 |
('a' ~ 'b')} will result in a typing error. The first |
385
7f8516ff408d
updated
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents:
297
diff
changeset
|
312 |
parser has as output type a single character (recall the type |
7f8516ff408d
updated
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents:
297
diff
changeset
|
313 |
of \texttt{CharParser}), but the second parser produces a pair |
7f8516ff408d
updated
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents:
297
diff
changeset
|
314 |
of characters as output. The alternative parser is however |
7f8516ff408d
updated
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents:
297
diff
changeset
|
315 |
required to have both component parsers to have the same type. |
7f8516ff408d
updated
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents:
297
diff
changeset
|
316 |
We will see later how we can build this parser without the |
7f8516ff408d
updated
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents:
297
diff
changeset
|
317 |
typing error. |
7f8516ff408d
updated
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents:
297
diff
changeset
|
318 |
|
7f8516ff408d
updated
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents:
297
diff
changeset
|
319 |
The next parser combinator does not actually combine smaller |
386
31295bb945c6
update
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents:
385
diff
changeset
|
320 |
parsers, but applies a function to the result of a parser. |
385
7f8516ff408d
updated
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents:
297
diff
changeset
|
321 |
It is implemented in Scala as follows |
183
b17eff695c7f
added new stuff
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents:
177
diff
changeset
|
322 |
|
b17eff695c7f
added new stuff
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents:
177
diff
changeset
|
323 |
\begin{center} |
b17eff695c7f
added new stuff
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents:
177
diff
changeset
|
324 |
\begin{lstlisting}[language=Scala,basicstyle=\small\ttfamily, numbers=none] |
b17eff695c7f
added new stuff
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents:
177
diff
changeset
|
325 |
class FunParser[I, T, S] |
b17eff695c7f
added new stuff
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents:
177
diff
changeset
|
326 |
(p: => Parser[I, T], |
b17eff695c7f
added new stuff
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents:
177
diff
changeset
|
327 |
f: T => S) extends Parser[I, S] { |
b17eff695c7f
added new stuff
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents:
177
diff
changeset
|
328 |
def parse(sb: I) = |
b17eff695c7f
added new stuff
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents:
177
diff
changeset
|
329 |
for ((head, tail) <- p.parse(sb)) yield (f(head), tail) |
b17eff695c7f
added new stuff
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents:
177
diff
changeset
|
330 |
} |
b17eff695c7f
added new stuff
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents:
177
diff
changeset
|
331 |
\end{lstlisting} |
b17eff695c7f
added new stuff
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents:
177
diff
changeset
|
332 |
\end{center} |
b17eff695c7f
added new stuff
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents:
177
diff
changeset
|
333 |
|
b17eff695c7f
added new stuff
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents:
177
diff
changeset
|
334 |
|
386
31295bb945c6
update
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents:
385
diff
changeset
|
335 |
\noindent This parser combinator takes a parser \texttt{p} |
31295bb945c6
update
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents:
385
diff
changeset
|
336 |
with output type \texttt{T} as one argument as well as a |
31295bb945c6
update
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents:
385
diff
changeset
|
337 |
function \texttt{f} with type \texttt{T => S}. The parser |
31295bb945c6
update
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents:
385
diff
changeset
|
338 |
\texttt{p} produces sets of type \texttt{(T, I)}. The |
31295bb945c6
update
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents:
385
diff
changeset
|
339 |
\texttt{FunParser} combinator then applies the function |
31295bb945c6
update
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents:
385
diff
changeset
|
340 |
\texttt{f} to all the parser outputs. Since this function is of |
31295bb945c6
update
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents:
385
diff
changeset
|
341 |
type \texttt{T => S}, we obtain a parser with output type |
31295bb945c6
update
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents:
385
diff
changeset
|
342 |
\texttt{S}. Again Scala lets us introduce some shorthand |
31295bb945c6
update
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents:
385
diff
changeset
|
343 |
notation for this parser combinator. Therefore we will write |
31295bb945c6
update
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents:
385
diff
changeset
|
344 |
\texttt{p ==> f} for it. |
31295bb945c6
update
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents:
385
diff
changeset
|
345 |
|
31295bb945c6
update
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents:
385
diff
changeset
|
346 |
\subsubsection*{How to build parsers using parser combinators?} |
31295bb945c6
update
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents:
385
diff
changeset
|
347 |
|
31295bb945c6
update
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents:
385
diff
changeset
|
348 |
\subsubsection*{Implementing an Interpreter} |
183
b17eff695c7f
added new stuff
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents:
177
diff
changeset
|
349 |
|
b17eff695c7f
added new stuff
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents:
177
diff
changeset
|
350 |
%\bigskip |
b17eff695c7f
added new stuff
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents:
177
diff
changeset
|
351 |
%takes advantage of the full generality---have a look |
b17eff695c7f
added new stuff
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents:
177
diff
changeset
|
352 |
%what it produces if we call it with the string \texttt{abc} |
b17eff695c7f
added new stuff
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents:
177
diff
changeset
|
353 |
% |
b17eff695c7f
added new stuff
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents:
177
diff
changeset
|
354 |
%\begin{center} |
b17eff695c7f
added new stuff
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents:
177
diff
changeset
|
355 |
%\begin{tabular}{rcl} |
b17eff695c7f
added new stuff
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents:
177
diff
changeset
|
356 |
%input string & & output\medskip\\ |
b17eff695c7f
added new stuff
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents:
177
diff
changeset
|
357 |
%\texttt{\Grid{abc}} & $\rightarrow$ & $\left\{((\texttt{\Grid{a}}, \texttt{\Grid{b}}), \texttt{\Grid{c}})\right\}$\\ |
b17eff695c7f
added new stuff
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents:
177
diff
changeset
|
358 |
%\texttt{\Grid{bbc}} & $\rightarrow$ & $\left\{((\texttt{\Grid{b}}, \texttt{\Grid{b}}), \texttt{\Grid{c}})\right\}$\\ |
b17eff695c7f
added new stuff
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents:
177
diff
changeset
|
359 |
%\texttt{\Grid{aac}} & $\rightarrow$ & $\varnothing$ |
b17eff695c7f
added new stuff
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents:
177
diff
changeset
|
360 |
%\end{tabular} |
b17eff695c7f
added new stuff
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents:
177
diff
changeset
|
361 |
%\end{center} |
b17eff695c7f
added new stuff
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents:
177
diff
changeset
|
362 |
|
b17eff695c7f
added new stuff
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents:
177
diff
changeset
|
363 |
|
173
7cfb7a6f7c99
added slides
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents:
diff
changeset
|
364 |
|
7cfb7a6f7c99
added slides
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents:
diff
changeset
|
365 |
\end{document} |
7cfb7a6f7c99
added slides
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents:
diff
changeset
|
366 |
|
7cfb7a6f7c99
added slides
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents:
diff
changeset
|
367 |
%%% Local Variables: |
7cfb7a6f7c99
added slides
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents:
diff
changeset
|
368 |
%%% mode: latex |
7cfb7a6f7c99
added slides
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents:
diff
changeset
|
369 |
%%% TeX-master: t |
7cfb7a6f7c99
added slides
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents:
diff
changeset
|
370 |
%%% End: |