83 we are interested in transform our input into something |
83 we are interested in transform our input into something |
84 ``different''\ldots for example into a tree, or if we implement the |
84 ``different''\ldots for example into a tree, or if we implement the |
85 grammar for arithmetic expressions we might be interested in the |
85 grammar for arithmetic expressions we might be interested in the |
86 actual integer number the arithmetic expression, say \texttt{1 + 2 * |
86 actual integer number the arithmetic expression, say \texttt{1 + 2 * |
87 3}, stands for. In this way we can use parser combinators to |
87 3}, stands for. In this way we can use parser combinators to |
88 implement relativaley easily a calculator. |
88 implement relatively easily a calculator. |
89 |
89 |
90 The main idea of parser combinators is that we can easily build parser |
90 The main idea of parser combinators is that we can easily build parser |
91 combinators out of smaller components following very closely the |
91 combinators out of smaller components following very closely the |
92 structure of a grammar. In order to implement this in an |
92 structure of a grammar. In order to implement this in an |
93 object-oriented programming language, like Scala, we need to specify |
93 object-oriented programming language, like Scala, we need to specify |
228 new AltParser(CharParser('a'), CharParser('b')) |
228 new AltParser(CharParser('a'), CharParser('b')) |
229 \end{lstlisting} |
229 \end{lstlisting} |
230 \end{center} |
230 \end{center} |
231 |
231 |
232 \noindent Later on we will use again Scala mechanism for introducing some |
232 \noindent Later on we will use again Scala mechanism for introducing some |
233 more readable shorthand notation for this, like \texttt{"a" || "b"}. Let us look in detail at what this parser combinator produces with some somple strings |
233 more readable shorthand notation for this, like \texttt{"a" || "b"}. Let us look in detail at what this parser combinator produces with some sample strings |
234 |
234 |
235 \begin{center} |
235 \begin{center} |
236 \begin{tabular}{rcl} |
236 \begin{tabular}{rcl} |
237 input strings & & output\medskip\\ |
237 input strings & & output\medskip\\ |
238 \texttt{\Grid{acde}} & $\rightarrow$ & $\left\{(\texttt{\Grid{a}}, \texttt{\Grid{cde}})\right\}$\\ |
238 \texttt{\Grid{acde}} & $\rightarrow$ & $\left\{(\texttt{\Grid{a}}, \texttt{\Grid{cde}})\right\}$\\ |
312 new AltParser(CharParser('a'), CharParser('b')) |
312 new AltParser(CharParser('a'), CharParser('b')) |
313 \end{lstlisting} |
313 \end{lstlisting} |
314 \end{center} |
314 \end{center} |
315 |
315 |
316 \noindent Later on we will use again Scala mechanism for introducing some |
316 \noindent Later on we will use again Scala mechanism for introducing some |
317 more readable shorthand notation for this, like \texttt{"a" || "b"}. Let us look in detail at what this parser combinator produces with some somple strings |
317 more readable shorthand notation for this, like \texttt{"a" || "b"}. Let us look in detail at what this parser combinator produces with some sample strings |
318 |
318 |
319 \begin{center} |
319 \begin{center} |
320 \begin{tabular}{rcl} |
320 \begin{tabular}{rcl} |
321 input strings & & output\medskip\\ |
321 input strings & & output\medskip\\ |
322 \texttt{\Grid{acde}} & $\rightarrow$ & $\left\{(\texttt{\Grid{a}}, \texttt{\Grid{cde}})\right\}$\\ |
322 \texttt{\Grid{acde}} & $\rightarrow$ & $\left\{(\texttt{\Grid{a}}, \texttt{\Grid{cde}})\right\}$\\ |
351 overall result of the sequence parser combinator is pairs of the form |
351 overall result of the sequence parser combinator is pairs of the form |
352 $((\textit{output}_1, \textit{output}_2), u_2)$. This means the |
352 $((\textit{output}_1, \textit{output}_2), u_2)$. This means the |
353 unprocessed parts of both parsers is the unprocessed parts the second |
353 unprocessed parts of both parsers is the unprocessed parts the second |
354 parser $q$ produces as left-over. The processed parts of both parsers |
354 parser $q$ produces as left-over. The processed parts of both parsers |
355 is just the pair of the outputs |
355 is just the pair of the outputs |
356 $(\textit{output}_1, \textit{output}_2)$. This behavious can be |
356 $(\textit{output}_1, \textit{output}_2)$. This behaviour can be |
357 implemented in Scala as follows: |
357 implemented in Scala as follows: |
358 |
358 |
359 \begin{center} |
359 \begin{center} |
360 \begin{lstlisting}[language=Scala,numbers=none] |
360 \begin{lstlisting}[language=Scala,numbers=none] |
361 class SeqParser[I, T, S] |
361 class SeqParser[I, T, S] |
391 \noindent |
391 \noindent |
392 If any of the runs of \textit{p} and \textit{q} fail, that is produce |
392 If any of the runs of \textit{p} and \textit{q} fail, that is produce |
393 the empty set, then \texttt{parse} will also produce the empty set. |
393 the empty set, then \texttt{parse} will also produce the empty set. |
394 Notice that we have now two output types for the sequence parser |
394 Notice that we have now two output types for the sequence parser |
395 combinator, because in general \textit{p} and \textit{q} might produce |
395 combinator, because in general \textit{p} and \textit{q} might produce |
396 differetn things (for example first we recognise a number and then a |
396 different things (for example first we recognise a number and then a |
397 string corresponding to an operator). |
397 string corresponding to an operator). |
398 |
398 |
399 |
399 |
400 We have not yet looked at this in detail, but Scala allows us to |
400 We have not yet looked at this in detail, but Scala allows us to |
401 provide some shorthand notation for the sequence parser combinator. We |
401 provide some shorthand notation for the sequence parser combinator. We |
411 \texttt{\Grid{bacde}} & $\rightarrow$ & $\{\}$\\ |
411 \texttt{\Grid{bacde}} & $\rightarrow$ & $\{\}$\\ |
412 \texttt{\Grid{cccde}} & $\rightarrow$ & $\{\}$ |
412 \texttt{\Grid{cccde}} & $\rightarrow$ & $\{\}$ |
413 \end{tabular} |
413 \end{tabular} |
414 \end{center} |
414 \end{center} |
415 |
415 |
416 \noindent In the first line we have a sucessful parse, because the |
416 \noindent In the first line we have a successful parse, because the |
417 string started with \texttt{ab}, which is the prefix we are looking |
417 string started with \texttt{ab}, which is the prefix we are looking |
418 for. But since the parsing combinator is constructed as sequence of |
418 for. But since the parsing combinator is constructed as sequence of |
419 the two simple (atomic) parsers for \texttt{a} and \texttt{b}, the |
419 the two simple (atomic) parsers for \texttt{a} and \texttt{b}, the |
420 result is a nested pair of the form \texttt{((a, b), cde)}. It is |
420 result is a nested pair of the form \texttt{((a, b), cde)}. It is |
421 \emph{not} a simple pair \texttt{(ab, cde)} as one might errorneously |
421 \emph{not} a simple pair \texttt{(ab, cde)} as one might erroneously |
422 expects. The parser returns the ampty set in the other examples, |
422 expects. The parser returns the empty set in the other examples, |
423 because they do not fit with what the parser is supposed to parse. |
423 because they do not fit with what the parser is supposed to parse. |
424 |
424 |
425 |
425 |
426 A slightly more complicated parser is \pcode{("a" || "b") ~ "c"} |
426 A slightly more complicated parser is \pcode{("a" || "b") ~ "c"} |
427 which parses as first character either an \texttt{a} or \texttt{b} |
427 which parses as first character either an \texttt{a} or \texttt{b} |