|
1 |
1 \documentclass{article} |
2 \documentclass{article} |
2 \usepackage{../style} |
3 \usepackage{../style} |
3 \usepackage{../langs} |
4 \usepackage{../langs} |
4 |
5 |
5 |
6 |
6 \begin{document} |
7 \begin{document} |
7 |
8 |
8 \section*{Handout 6 (Parser Combinators)} |
9 \section*{Handout 6 (Parser Combinators)} |
9 |
10 |
10 In what follows we explain \emph{parser combinators}. Their |
11 This handout explains how \emph{parser combinators} work and how they |
11 distinguishing feature is that they are very easy to |
12 can be implemented in Scala. Their distinguishing feature is that they |
12 implement. However, they only work when the grammar to be |
13 are very easy to implement (admittedly it is only easy in a functional |
13 parsed is \emph{not} left-recursive and they are efficient |
14 programming language). However, parser combinators require that the |
14 only when the grammar is unambiguous. It is the responsibility |
15 grammar to be parsed is \emph{not} left-recursive and they are |
15 of the grammar designer to ensure these two properties. |
16 efficient only when the grammar is unambiguous. It is the |
16 |
17 responsibility of the grammar designer to ensure these two properties. |
17 Parser combinators can deal with any kind of input as long as |
18 |
18 this input is a kind of sequence, for example a string or a |
19 Another good point of parser combinators is that they can deal with any kind |
19 list of tokens. The general idea behind parser combinators is |
20 of input as long as this input of ``sequence-kind'', for example a |
20 to transform the input into sets of pairs, like so |
21 string or a list of tokens. The general idea behind parser combinators |
|
22 is to transform the input into sets of pairs, like so |
21 |
23 |
22 \begin{center} |
24 \begin{center} |
23 $\underbrace{\text{list of tokens}}_{\text{input}}$ |
25 $\underbrace{\text{list of tokens}}_{\text{input}}$ |
24 $\Rightarrow$ |
26 $\Rightarrow$ |
25 $\underbrace{\text{set of (parsed input, unparsed input)}}_{\text{output}}$ |
27 $\underbrace{\text{set of (parsed input, unparsed input)}}_{\text{output}}$ |
26 \end{center} |
28 \end{center} |
27 |
29 |
28 \noindent In Scala parser combinators are functions of type |
30 \noindent As said, the input can be anything as long as it is a |
29 |
31 ``sequence''. The only property of the input we need is to be able to |
30 \begin{center} |
32 test when it is empty. Obviously we can do this for strings and lists. |
31 \texttt{I $\Rightarrow$ Set[(T, I)]} |
33 For more lucidity we shall below often use strings as input in order |
32 \end{center} |
34 to illustrate matters. However, this does not make our previous work |
33 |
35 on lexers obsolete (remember they transform a string into a list of |
34 \noindent that is they take as input something of type |
36 tokens). Lexers will still be needed to build a somewhat realistic |
35 \texttt{I} and return a set of pairs. The first component of |
37 compiler. |
36 these pairs corresponds to what the parser combinator was able |
38 |
37 to process from the input and the second is the unprocessed |
39 |
38 part of the input. As we shall see shortly, a parser |
40 In my Scala code I use the following polymorphic types for parser combinators: |
39 combinator might return more than one such pair, the idea |
41 |
40 being that there are potentially several ways of how to |
42 \begin{center} |
41 interpret the input. To simplify matters we will first look at |
43 input:\;\; \texttt{I} \qquad output:\;\; \texttt{T} |
42 the case where the input to the parser combinators is just |
44 \end{center} |
43 strings. As a concrete example, consider the string |
45 |
|
46 \noindent that is they take as input something of type \texttt{I} and |
|
47 return a set of pairs of \texttt{Set[(T, I)]}. Since the input needs |
|
48 to be of ``sequence-kind'' I actually have to often write \texttt{I |
|
49 <\% Seq[\_]} for the input type in order to indicate it is a |
|
50 subtype of Scala sequences. The first component of the generated pairs |
|
51 corresponds to what the parser combinator was able to process from the |
|
52 input and the second is the unprocessed part of the input (therefore |
|
53 the type of this unprocessed part is the same as the input). As we |
|
54 shall see shortly, a parser combinator might return more than one such |
|
55 pair; the idea being that there are potentially several ways of how to |
|
56 parse the input. As a concrete example, consider the string |
44 |
57 |
45 \begin{center} |
58 \begin{center} |
46 \tt\Grid{iffoo\VS testbar} |
59 \tt\Grid{iffoo\VS testbar} |
47 \end{center} |
60 \end{center} |
48 |
61 |
63 recognise anything from the input, then parser combinators just |
76 recognise anything from the input, then parser combinators just |
64 return the empty set $\{\}$. This will indicate something |
77 return the empty set $\{\}$. This will indicate something |
65 ``went wrong''\ldots or more precisely, nothing could be |
78 ``went wrong''\ldots or more precisely, nothing could be |
66 parsed. |
79 parsed. |
67 |
80 |
68 The idea of parser combinators is that we can easily build |
81 Also important to note is that the type \texttt{T} for the processed |
69 parser combinators out of smaller components following very |
82 part is different from the input type. The reason is that in general |
70 closely the structure of a grammar. In order to implement this |
83 we are interested in transform our input into something |
71 in an object-oriented programming language, like Scala, we |
84 ``different''\ldots for example into a tree, or if we implement the |
72 need to specify an abstract class for parser combinators. This |
85 grammar for arithmetic expressions we might be interested in the |
73 abstract class requires the implementation of the function |
86 actual integer number the arithmetic expression, say \texttt{1 + 2 * |
74 \texttt{parse} taking an argument of type \texttt{I} and |
87 3}, stands for. In this way we can use parser combinators to |
75 returns a set of type \mbox{\texttt{Set[(T, I)]}}. |
88 implement relativaley easily a calculator. |
|
89 |
|
90 The main idea of parser combinators is that we can easily build parser |
|
91 combinators out of smaller components following very closely the |
|
92 structure of a grammar. In order to implement this in an |
|
93 object-oriented programming language, like Scala, we need to specify |
|
94 an abstract class for parser combinators. This abstract class states |
|
95 that the function \texttt{parse} takes an argument of type \texttt{I} |
|
96 and returns a set of type \mbox{\texttt{Set[(T, I)]}}. |
76 |
97 |
77 \begin{center} |
98 \begin{center} |
78 \begin{lstlisting}[language=Scala] |
99 \begin{lstlisting}[language=Scala] |
79 abstract class Parser[I, T] { |
100 abstract class Parser[I, T] { |
80 def parse(ts: I): Set[(T, I)] |
101 def parse(ts: I): Set[(T, I)] |
84 yield head |
105 yield head |
85 } |
106 } |
86 \end{lstlisting} |
107 \end{lstlisting} |
87 \end{center} |
108 \end{center} |
88 |
109 |
89 \noindent From the function \texttt{parse} we can then |
110 \noindent It is the obligation in each instance (parser combinator) to |
90 ``centrally'' derive the function \texttt{parse\_all}, which |
111 supply an implementation for \texttt{parse}. From this function we |
91 just filters out all pairs whose second component is not empty |
112 can then ``centrally'' derive the function \texttt{parse\_all}, which |
92 (that is has still some unprocessed part). The reason is that |
113 just filters out all pairs whose second component is not empty (that |
93 at the end of the parsing we are only interested in the results |
114 is has still some unprocessed part). The reason is that at the end of |
94 where all the input has been consumed and no unprocessed part |
115 the parsing we are only interested in the results where all the input |
95 is left over. |
116 has been consumed and no unprocessed part is left over. |
96 |
117 |
97 One of the simplest parser combinators recognises just a |
118 One of the simplest parser combinators recognises just a |
98 character, say $c$, from the beginning of strings. Its |
119 single character, say $c$, from the beginning of strings. Its |
99 behaviour can be described as follows: |
120 behaviour can be described as follows: |
100 |
121 |
101 \begin{itemize} |
122 \begin{itemize} |
102 \item if the head of the input string starts with a $c$, then returns |
123 \item If the head of the input string starts with a $c$, then return |
103 the set $\{(c, \textit{tail of}\; s)\}$, where \textit{tail of} |
124 the set |
104 $s$ is the unprocessed part of the input string |
125 \[\{(c, \textit{tail of}\; s)\}\] |
105 \item otherwise return the empty set $\{\}$ |
126 where \textit{tail of} |
|
127 $s$ is the unprocessed part of the input string. |
|
128 \item Otherwise return the empty set $\{\}$. |
106 \end{itemize} |
129 \end{itemize} |
107 |
130 |
108 \noindent |
131 \noindent |
109 The input type of this simple parser combinator for characters is |
132 The input type of this simple parser combinator for characters is |
110 \texttt{String} and the output type \mbox{\texttt{Set[(Char, String)]}}. |
133 \texttt{String} and the output type \mbox{\texttt{Set[(Char, String)]}}. |
117 if (sb.head == c) Set((c, sb.tail)) else Set() |
140 if (sb.head == c) Set((c, sb.tail)) else Set() |
118 } |
141 } |
119 \end{lstlisting} |
142 \end{lstlisting} |
120 \end{center} |
143 \end{center} |
121 |
144 |
122 \noindent The \texttt{parse} function tests whether the first |
145 \noindent You can see the \texttt{parse} function tests whether the |
123 character of the input string \texttt{sb} is equal to |
146 first character of the input string \texttt{sb} is equal to |
124 \texttt{c}. If yes, then it splits the string into the |
147 \texttt{c}. If yes, then it splits the string into the recognised part |
125 recognised part \texttt{c} and the unprocessed part |
148 \texttt{c} and the unprocessed part \texttt{sb.tail}. In case |
126 \texttt{sb.tail}. In case \texttt{sb} does not start with |
149 \texttt{sb} does not start with \texttt{c} then the parser returns the |
127 \texttt{c} then the parser returns the empty set (in Scala |
150 empty set (in Scala \texttt{Set()}). Since this parser recognises |
128 \texttt{Set()}). |
151 characters and just returns characters as the processed part, the |
129 |
152 output type of the parser is \texttt{Char}. |
130 |
153 |
131 More interesting are the parser combinators that build larger |
154 If we want to parse a list of tokens and interested in recognising a |
132 parsers out of smaller component parsers. For example the |
155 number token, we could write something like this |
133 alternative parser combinator is as follows: given two |
156 |
134 parsers, say, $p$ and $q$, we apply both parsers to the |
157 \begin{center} |
135 input (remember parsers are functions) and combine the output |
158 \begin{lstlisting}[language=Scala,basicstyle=\small\ttfamily,numbers=none] |
136 (remember they are sets) |
159 case object NumParser extends Parser[List[Token], Int] { |
|
160 def parse(ts: List[Token]) = ts match { |
|
161 case Num_token(s)::ts => Set((s.toInt, ts)) |
|
162 case _ => Set () |
|
163 } |
|
164 } |
|
165 \end{lstlisting} |
|
166 \end{center} |
|
167 |
|
168 \noindent |
|
169 In this parser the input is of type \texttt{List[Token]}. The function |
|
170 parse looks at the input \texttt{ts} and checks whether the first |
|
171 token is a \texttt{Num\_token}. Let us assume our lexer generated |
|
172 these tokens for numbers. But this parser does not just return this |
|
173 token (and the rest of the list), like the \texttt{CharParser} above, |
|
174 rather extracts the string \texttt{s} from the token and converts it |
|
175 into an integer. The hope is that the lexer did its work well and this |
|
176 conversion always succeeds. The consequence of this is that the output |
|
177 type for this parser is \texttt{Int}. Such a conversion would be |
|
178 needed if we want to implement a simple calculator program. |
|
179 |
|
180 |
|
181 These simple parsers that just look at the input and do a simple |
|
182 transformation are often called \emph{atomic} parser combinators. |
|
183 More interesting are the parser combinators that build larger parsers |
|
184 out of smaller component parsers. For example the \emph{alternative |
|
185 parser combinator} is as follows: given two parsers, say, $p$ and |
|
186 $q$, we apply both parsers to the input (remember parsers are |
|
187 functions) and combine the output (remember they are sets of pairs) |
137 |
188 |
138 \begin{center} |
189 \begin{center} |
139 $p(\text{input}) \cup q(\text{input})$ |
190 $p(\text{input}) \cup q(\text{input})$ |
140 \end{center} |
191 \end{center} |
141 |
192 |
150 def parse(sb: I) = p.parse(sb) ++ q.parse(sb) |
201 def parse(sb: I) = p.parse(sb) ++ q.parse(sb) |
151 } |
202 } |
152 \end{lstlisting} |
203 \end{lstlisting} |
153 \end{center} |
204 \end{center} |
154 |
205 |
155 \noindent The types of this parser combinator are polymorphic |
206 \noindent The types of this parser combinator are again generic (we |
156 (we just have \texttt{I} for the input type, and \texttt{T} |
207 just have \texttt{I} for the input type, and \texttt{T} for the output |
157 for the output type). The alternative parser builds a new |
208 type). The alternative parser builds a new parser out of two existing |
158 parser out of two existing parsers \texttt{p} and \texttt{q}. |
209 parsers \texttt{p} and \texttt{q}. Both need to be able to process |
159 Both need to be able to process input of type \texttt{I} and |
210 input of type \texttt{I} and return the same output type |
160 return the same output type \texttt{Set[(T, |
211 \texttt{Set[(T, I)]}.\footnote{There is an interesting detail of |
161 I)]}.\footnote{There is an interesting detail of Scala, namely |
212 Scala, namely the \texttt{=>} in front of the types of \texttt{p} |
162 the \texttt{=>} in front of the types of \texttt{p} and |
213 and \texttt{q}. They will prevent the evaluation of the arguments |
163 \texttt{q}. They will prevent the evaluation of the arguments |
214 before they are used. This is often called \emph{lazy evaluation} of |
164 before they are used. This is often called \emph{lazy |
215 the arguments. We will explain this later.} Therefore the output |
165 evaluation} of the arguments. We will explain this later.} The alternative parser should |
216 type of this parser is \texttt{T}. The alternative parser should run |
166 run the input with the first parser \texttt{p} (producing a |
217 the input with the first parser \texttt{p} (producing a set of pairs) |
167 set of outputs) and then run the same input with \texttt{q}. |
218 and then run the same input with \texttt{q} (producing another set of |
168 The result should be then just the union of both sets, which |
219 pairs). The result should be then just the union of both sets, which |
169 is the operation \texttt{++} in Scala. |
220 is the operation \texttt{++} in Scala. |
170 |
221 |
171 The alternative parser combinator already allows us to |
222 The alternative parser combinator already allows us to |
172 construct a parser that parses either a character \texttt{a} |
223 construct a parser that parses either a character \texttt{a} |
173 or \texttt{b}, as |
224 or \texttt{b}, as |
176 \begin{lstlisting}[language=Scala, numbers=none] |
227 \begin{lstlisting}[language=Scala, numbers=none] |
177 new AltParser(CharParser('a'), CharParser('b')) |
228 new AltParser(CharParser('a'), CharParser('b')) |
178 \end{lstlisting} |
229 \end{lstlisting} |
179 \end{center} |
230 \end{center} |
180 |
231 |
181 \noindent Scala allows us to introduce some more readable |
232 \noindent Later on we will use again Scala mechanism for introducing some |
182 shorthand notation for this, like \texttt{'a' || 'b'}. We can |
233 more readable shorthand notation for this, like \texttt{"a" || "b"}. Let us look in detail at what this parser combinator produces with some somple strings |
183 call this parser combinator with the strings |
|
184 |
234 |
185 \begin{center} |
235 \begin{center} |
186 \begin{tabular}{rcl} |
236 \begin{tabular}{rcl} |
187 input strings & & output\medskip\\ |
237 input strings & & output\medskip\\ |
188 \texttt{\Grid{ac}} & $\rightarrow$ & $\left\{(\texttt{\Grid{a}}, \texttt{\Grid{c}})\right\}$\\ |
238 \texttt{\Grid{acde}} & $\rightarrow$ & $\left\{(\texttt{\Grid{a}}, \texttt{\Grid{cde}})\right\}$\\ |
189 \texttt{\Grid{bc}} & $\rightarrow$ & $\left\{(\texttt{\Grid{b}}, \texttt{\Grid{c}})\right\}$\\ |
239 \texttt{\Grid{bcde}} & $\rightarrow$ & $\left\{(\texttt{\Grid{b}}, \texttt{\Grid{cde}})\right\}$\\ |
190 \texttt{\Grid{cc}} & $\rightarrow$ & $\{\}$ |
240 \texttt{\Grid{ccde}} & $\rightarrow$ & $\{\}$ |
191 \end{tabular} |
241 \end{tabular} |
192 \end{center} |
242 \end{center} |
193 |
243 |
194 \noindent We receive in the first two cases a successful |
244 \noindent We receive in the first two cases a successful |
195 output (that is a non-empty set). In each case, either |
245 output (that is a non-empty set). In each case, either |
196 \pcode{a} or \pcode{b} is in the processed part, and |
246 \pcode{a} or \pcode{b} is in the processed part, and |
197 \pcode{c} in the unprocessed part. Clearly this parser cannot |
247 \pcode{cde} in the unprocessed part. Clearly this parser cannot |
198 parse anything in the string \pcode{cc}, therefore the empty |
248 parse anything in the string \pcode{ccde}, therefore the empty |
199 set. |
249 set. |
200 |
250 |
201 A bit more interesting is the \emph{sequence parser |
251 A bit more interesting is the \emph{sequence parser combinator}. Given |
202 combinator}. Given two parsers, say, $p$ and $q$, apply first |
252 two parsers, say again, $p$ and $q$, we want to apply first the input |
203 the input to $p$ producing a set of pairs; then apply $q$ to |
253 to $p$ producing a set of pairs; then apply $q$ to all the unparsed |
204 all the unparsed parts; then combine the results like |
254 parts in the pairs; and then combine the results like |
205 |
255 |
206 \begin{center} |
256 \begin{center} |
207 \begin{tabular}{lcl} |
257 \begin{tabular}{lcl} |
208 $\{((\textit{output}_1, \textit{output}_2), u_2)$ & $\;|\;$ & |
258 $\{((\textit{output}_1, \textit{output}_2), u_2)$ & $\;|\;$ & |
209 $(\textit{output}_1, u_1) \in p(\text{input}) |
259 $(\textit{output}_1, u_1) \in p(\text{input}) |
210 \;\wedge\;$\\ |
260 \;\wedge\;$\\ |
211 && $\;(\textit{output}_2, u_2) \in q(u_1)\}$ |
261 && $(\textit{output}_2, u_2) \in q(u_1)\}$ |
212 \end{tabular} |
262 \end{tabular} |
213 \end{center} |
263 \end{center} |
214 |
264 |
215 \noindent |
265 \noindent Notice that the $p$ wil first be run on the input, producing |
216 This can be implemented in Scala as follows: |
266 pairs of the form $(\textit{output}_1, u_1)$ where the $u_1$ stands |
|
267 for the unprocessed, or left-over, parts. We want that $q$ runs on all |
|
268 these unprocessed parts $u_1$. This again will produce some |
|
269 processed part , $p$ and |
|
270 $q$, we apply both parsers to the input (remember parsers are |
|
271 functions) and combine the output (remember they are sets of pairs) |
|
272 |
|
273 \begin{center} |
|
274 $p(\text{input}) \cup q(\text{input})$ |
|
275 \end{center} |
|
276 |
|
277 \noindent In Scala we would implement alternative parser |
|
278 combinator as follows |
|
279 |
|
280 \begin{center} |
|
281 \begin{lstlisting}[language=Scala, numbers=none] |
|
282 class AltParser[I, T] |
|
283 (p: => Parser[I, T], |
|
284 q: => Parser[I, T]) extends Parser[I, T] { |
|
285 def parse(sb: I) = p.parse(sb) ++ q.parse(sb) |
|
286 } |
|
287 \end{lstlisting} |
|
288 \end{center} |
|
289 |
|
290 \noindent The types of this parser combinator are again generic (we |
|
291 just have \texttt{I} for the input type, and \texttt{T} for the output |
|
292 type). The alternative parser builds a new parser out of two existing |
|
293 parsers \texttt{p} and \texttt{q}. Both need to be able to process |
|
294 input of type \texttt{I} and return the same output type |
|
295 \texttt{Set[(T, I)]}.\footnote{There is an interesting detail of |
|
296 Scala, namely the \texttt{=>} in front of the types of \texttt{p} |
|
297 and \texttt{q}. They will prevent the evaluation of the arguments |
|
298 before they are used. This is often called \emph{lazy evaluation} of |
|
299 the arguments. We will explain this later.} Therefore the output |
|
300 type of this parser is \texttt{T}. The alternative parser should run |
|
301 the input with the first parser \texttt{p} (producing a set of pairs) |
|
302 and then run the same input with \texttt{q} (producing another set of |
|
303 pairs). The result should be then just the union of both sets, which |
|
304 is the operation \texttt{++} in Scala. |
|
305 |
|
306 The alternative parser combinator already allows us to |
|
307 construct a parser that parses either a character \texttt{a} |
|
308 or \texttt{b}, as |
|
309 |
|
310 \begin{center} |
|
311 \begin{lstlisting}[language=Scala, numbers=none] |
|
312 new AltParser(CharParser('a'), CharParser('b')) |
|
313 \end{lstlisting} |
|
314 \end{center} |
|
315 |
|
316 \noindent Later on we will use again Scala mechanism for introducing some |
|
317 more readable shorthand notation for this, like \texttt{"a" || "b"}. Let us look in detail at what this parser combinator produces with some somple strings |
|
318 |
|
319 \begin{center} |
|
320 \begin{tabular}{rcl} |
|
321 input strings & & output\medskip\\ |
|
322 \texttt{\Grid{acde}} & $\rightarrow$ & $\left\{(\texttt{\Grid{a}}, \texttt{\Grid{cde}})\right\}$\\ |
|
323 \texttt{\Grid{bcde}} & $\rightarrow$ & $\left\{(\texttt{\Grid{b}}, \texttt{\Grid{cde}})\right\}$\\ |
|
324 \texttt{\Grid{ccde}} & $\rightarrow$ & $\{\}$ |
|
325 \end{tabular} |
|
326 \end{center} |
|
327 |
|
328 \noindent We receive in the first two cases a successful |
|
329 output (that is a non-empty set). In each case, either |
|
330 \pcode{a} or \pcode{b} is in the processed part, and |
|
331 \pcode{cde} in the unprocessed part. Clearly this parser cannot |
|
332 parse anything in the string \pcode{ccde}, therefore the empty |
|
333 set. |
|
334 |
|
335 A bit more interesting is the \emph{sequence parser combinator}. Given |
|
336 two parsers, say again, $p$ and $q$, we want to apply first the input |
|
337 to $p$ producing a set of pairs; then apply $q$ to all the unparsed |
|
338 parts in the pairs; and then combine the results like |
|
339 |
|
340 \begin{center} |
|
341 \begin{tabular}{lcl} |
|
342 $\{((\textit{output}_1, \textit{output}_2), u_2)$ & $\;|\;$ & |
|
343 $(\textit{output}_1, u_1) \in p(\text{input}) |
|
344 \;\wedge\;$\\ |
|
345 && $(\textit{output}_2, u_2) \in q(u_1)\}$ |
|
346 \end{tabular} |
|
347 \end{center} |
|
348 |
|
349 \noindent Notice that the $p$ wil first be run on the input, producing |
|
350 pairs of the form $\textit{output}_1$ and unprocessed part $u_1$. The |
|
351 overall result of the sequence parser combinator is pairs of the form |
|
352 $((\textit{output}_1, \textit{output}_2), u_2)$. This means the |
|
353 unprocessed parts of both parsers is the unprocessed parts the second |
|
354 parser $q$ produces as left-over. The processed parts of both parsers |
|
355 is just the pair of the outputs |
|
356 $(\textit{output}_1, \textit{output}_2)$. This behavious can be |
|
357 implemented in Scala as follows: |
217 |
358 |
218 \begin{center} |
359 \begin{center} |
219 \begin{lstlisting}[language=Scala,numbers=none] |
360 \begin{lstlisting}[language=Scala,numbers=none] |
220 class SeqParser[I, T, S] |
361 class SeqParser[I, T, S] |
221 (p: => Parser[I, T], |
362 (p: => Parser[I, T], |
245 |
386 |
246 \begin{center} |
387 \begin{center} |
247 \texttt{Set[((T, S), I)]} |
388 \texttt{Set[((T, S), I)]} |
248 \end{center} |
389 \end{center} |
249 |
390 |
250 Scala allows us to provide some shorthand notation for the |
391 \noindent |
251 sequence parser combinator. We can write for example |
392 If any of the runs of \textit{p} and \textit{q} fail, that is produce |
252 \pcode{'a' ~ 'b'}, which is the parser combinator that |
393 the empty set, then \texttt{parse} will also produce the empty set. |
253 first consumes the character \texttt{a} from a string and then |
394 Notice that we have now two output types for the sequence parser |
254 \texttt{b}. Three examples of this parser combinator are as |
395 combinator, because in general \textit{p} and \textit{q} might produce |
255 follows: |
396 differetn things (for example first we recognise a number and then a |
|
397 string corresponding to an operator). |
|
398 |
|
399 |
|
400 We have not yet looked at this in detail, but Scala allows us to |
|
401 provide some shorthand notation for the sequence parser combinator. We |
|
402 can write for example \pcode{"a" ~ "b"}, which is the parser |
|
403 combinator that first recognises the character \texttt{a} from a |
|
404 string and then \texttt{b}. Let us look again at three examples of |
|
405 how this parser combinator processes strings: |
256 |
406 |
257 \begin{center} |
407 \begin{center} |
258 \begin{tabular}{rcl} |
408 \begin{tabular}{rcl} |
259 input strings & & output\medskip\\ |
409 input strings & & output\medskip\\ |
260 \texttt{\Grid{abc}} & $\rightarrow$ & $\left\{((\texttt{\Grid{a}}, \texttt{\Grid{b}}), \texttt{\Grid{c}})\right\}$\\ |
410 \texttt{\Grid{abcde}} & $\rightarrow$ & $\left\{((\texttt{\Grid{a}}, \texttt{\Grid{b}}), \texttt{\Grid{cde}})\right\}$\\ |
261 \texttt{\Grid{bac}} & $\rightarrow$ & $\{\}$\\ |
411 \texttt{\Grid{bacde}} & $\rightarrow$ & $\{\}$\\ |
262 \texttt{\Grid{ccc}} & $\rightarrow$ & $\{\}$ |
412 \texttt{\Grid{cccde}} & $\rightarrow$ & $\{\}$ |
263 \end{tabular} |
413 \end{tabular} |
264 \end{center} |
414 \end{center} |
265 |
415 |
266 \noindent A slightly more complicated parser is \pcode{('a' |
416 \noindent In the first line we have a sucessful parse, because the |
267 || 'b') ~ 'b'} which parses as first character either an |
417 string started with \texttt{ab}, which is the prefix we are looking |
268 \texttt{a} or \texttt{b} followed by a \texttt{b}. This parser |
418 for. But since the parsing combinator is constructed as sequence of |
269 produces the following outputs. |
419 the two simple (atomic) parsers for \texttt{a} and \texttt{b}, the |
|
420 result is a nested pair of the form \texttt{((a, b), cde)}. It is |
|
421 \emph{not} a simple pair \texttt{(ab, cde)} as one might errorneously |
|
422 expects. The parser returns the ampty set in the other examples, |
|
423 because they do not fit with what the parser is supposed to parse. |
|
424 |
|
425 |
|
426 A slightly more complicated parser is \pcode{("a" || "b") ~ "c"}. |
|
427 which parses as first character either an \texttt{a} or \texttt{b} |
|
428 followed by a \texttt{c}. This parser produces the following outputs. |
270 |
429 |
271 \begin{center} |
430 \begin{center} |
272 \begin{tabular}{rcl} |
431 \begin{tabular}{rcl} |
273 input strings & & output\medskip\\ |
432 input strings & & output\medskip\\ |
274 \texttt{\Grid{abc}} & $\rightarrow$ & $\left\{((\texttt{\Grid{a}}, \texttt{\Grid{b}}), \texttt{\Grid{c}})\right\}$\\ |
433 \texttt{\Grid{abc}} & $\rightarrow$ & $\left\{((\texttt{\Grid{a}}, \texttt{\Grid{b}}), \texttt{\Grid{c}})\right\}$\\ |