| author | Christian Urban <urbanc@in.tum.de> | 
| Wed, 15 Nov 2017 00:28:25 +0000 | |
| changeset 534 | 755ad7e57c7e | 
| parent 392 | 2d0a59127694 | 
| child 584 | 7b879f2a8f6a | 
| permissions | -rw-r--r-- | 
| 173 
7cfb7a6f7c99
added slides
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: diff
changeset | 1 | \documentclass{article}
 | 
| 297 
5c51839c88fd
updated
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
292diff
changeset | 2 | \usepackage{../style}
 | 
| 217 
cd6066f1056a
updated handouts
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
183diff
changeset | 3 | \usepackage{../langs}
 | 
| 
cd6066f1056a
updated handouts
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
183diff
changeset | 4 | |
| 173 
7cfb7a6f7c99
added slides
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: diff
changeset | 5 | |
| 
7cfb7a6f7c99
added slides
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: diff
changeset | 6 | \begin{document}
 | 
| 
7cfb7a6f7c99
added slides
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: diff
changeset | 7 | |
| 292 
7ed2a25dd115
updated
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
217diff
changeset | 8 | \section*{Handout 6 (Parser Combinators)}
 | 
| 173 
7cfb7a6f7c99
added slides
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: diff
changeset | 9 | |
| 386 
31295bb945c6
update
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
385diff
changeset | 10 | In what follows we explain \emph{parser combinators}. Their
 | 
| 
31295bb945c6
update
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
385diff
changeset | 11 | distinguishing feature is that they are very easy to | 
| 
31295bb945c6
update
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
385diff
changeset | 12 | implement. However, they only work when the grammar to be | 
| 
31295bb945c6
update
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
385diff
changeset | 13 | parsed is \emph{not} left-recursive and they are efficient
 | 
| 
31295bb945c6
update
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
385diff
changeset | 14 | only when the grammar is unambiguous. It is the responsibility | 
| 
31295bb945c6
update
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
385diff
changeset | 15 | of the grammar designer to ensure these two properties. | 
| 173 
7cfb7a6f7c99
added slides
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: diff
changeset | 16 | |
| 385 
7f8516ff408d
updated
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
297diff
changeset | 17 | Parser combinators can deal with any kind of input as long as | 
| 
7f8516ff408d
updated
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
297diff
changeset | 18 | this input is a kind of sequence, for example a string or a | 
| 386 
31295bb945c6
update
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
385diff
changeset | 19 | list of tokens. The general idea behind parser combinators is | 
| 
31295bb945c6
update
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
385diff
changeset | 20 | to transform the input into sets of pairs, like so | 
| 175 
5801e8c0e528
updated
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
173diff
changeset | 21 | |
| 
5801e8c0e528
updated
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
173diff
changeset | 22 | \begin{center}
 | 
| 385 
7f8516ff408d
updated
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
297diff
changeset | 23 | $\underbrace{\text{list of tokens}}_{\text{input}}$ 
 | 
| 
7f8516ff408d
updated
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
297diff
changeset | 24 | $\Rightarrow$ | 
| 
7f8516ff408d
updated
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
297diff
changeset | 25 | $\underbrace{\text{set of (parsed input, unparsed input)}}_{\text{output}}$
 | 
| 
7f8516ff408d
updated
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
297diff
changeset | 26 | \end{center} 
 | 
| 175 
5801e8c0e528
updated
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
173diff
changeset | 27 | |
| 386 
31295bb945c6
update
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
385diff
changeset | 28 | \noindent In Scala parser combinators are functions of type | 
| 176 
3c2653fc8b5a
updated
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
175diff
changeset | 29 | |
| 
3c2653fc8b5a
updated
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
175diff
changeset | 30 | \begin{center}
 | 
| 177 
53def1fbf472
updated
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
176diff
changeset | 31 | \texttt{I $\Rightarrow$ Set[(T, I)]}
 | 
| 176 
3c2653fc8b5a
updated
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
175diff
changeset | 32 | \end{center}
 | 
| 
3c2653fc8b5a
updated
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
175diff
changeset | 33 | |
| 385 
7f8516ff408d
updated
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
297diff
changeset | 34 | \noindent that is they take as input something of type | 
| 
7f8516ff408d
updated
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
297diff
changeset | 35 | \texttt{I} and return a set of pairs. The first component of
 | 
| 
7f8516ff408d
updated
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
297diff
changeset | 36 | these pairs corresponds to what the parser combinator was able | 
| 
7f8516ff408d
updated
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
297diff
changeset | 37 | to process from the input and the second is the unprocessed | 
| 
7f8516ff408d
updated
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
297diff
changeset | 38 | part of the input. As we shall see shortly, a parser | 
| 386 
31295bb945c6
update
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
385diff
changeset | 39 | combinator might return more than one such pair, the idea | 
| 392 
2d0a59127694
updated
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
386diff
changeset | 40 | being that there are potentially several ways of how to | 
| 
2d0a59127694
updated
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
386diff
changeset | 41 | interpret the input. To simplify matters we will first look at | 
| 
2d0a59127694
updated
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
386diff
changeset | 42 | the case where the input to the parser combinators is just | 
| 
2d0a59127694
updated
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
386diff
changeset | 43 | strings. As a concrete example, consider the string | 
| 183 
b17eff695c7f
added new stuff
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
177diff
changeset | 44 | |
| 
b17eff695c7f
added new stuff
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
177diff
changeset | 45 | \begin{center}
 | 
| 
b17eff695c7f
added new stuff
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
177diff
changeset | 46 | \tt\Grid{iffoo\VS testbar}
 | 
| 
b17eff695c7f
added new stuff
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
177diff
changeset | 47 | \end{center}
 | 
| 
b17eff695c7f
added new stuff
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
177diff
changeset | 48 | |
| 385 
7f8516ff408d
updated
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
297diff
changeset | 49 | \noindent We might have a parser combinator which tries to | 
| 386 
31295bb945c6
update
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
385diff
changeset | 50 | interpret this string as a keyword (\texttt{if}) or as an
 | 
| 385 
7f8516ff408d
updated
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
297diff
changeset | 51 | identifier (\texttt{iffoo}). Then the output will be the set
 | 
| 177 
53def1fbf472
updated
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
176diff
changeset | 52 | |
| 183 
b17eff695c7f
added new stuff
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
177diff
changeset | 53 | \begin{center}
 | 
| 386 
31295bb945c6
update
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
385diff
changeset | 54 | $\left\{ \left(\texttt{\Grid{if}}\;,\; \texttt{\Grid{foo\VS testbar}}\right), 
 | 
| 
31295bb945c6
update
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
385diff
changeset | 55 |            \left(\texttt{\Grid{iffoo}}\;,\; \texttt{\Grid{\VS testbar}}\right) \right\}$
 | 
| 183 
b17eff695c7f
added new stuff
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
177diff
changeset | 56 | \end{center}
 | 
| 
b17eff695c7f
added new stuff
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
177diff
changeset | 57 | |
| 385 
7f8516ff408d
updated
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
297diff
changeset | 58 | \noindent where the first pair means the parser could | 
| 
7f8516ff408d
updated
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
297diff
changeset | 59 | recognise \texttt{if} from the input and leaves the rest as
 | 
| 
7f8516ff408d
updated
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
297diff
changeset | 60 | `unprocessed' as the second component of the pair; in the | 
| 
7f8516ff408d
updated
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
297diff
changeset | 61 | other case it could recognise \texttt{iffoo} and leaves
 | 
| 
7f8516ff408d
updated
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
297diff
changeset | 62 | \texttt{\VS testbar} as unprocessed. If the parser cannot
 | 
| 386 
31295bb945c6
update
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
385diff
changeset | 63 | recognise anything from the input, then parser combinators just | 
| 385 
7f8516ff408d
updated
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
297diff
changeset | 64 | return the empty set $\{\}$. This will indicate something
 | 
| 
7f8516ff408d
updated
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
297diff
changeset | 65 | ``went wrong''\ldots or more precisely, nothing could be | 
| 
7f8516ff408d
updated
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
297diff
changeset | 66 | parsed. | 
| 183 
b17eff695c7f
added new stuff
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
177diff
changeset | 67 | |
| 392 
2d0a59127694
updated
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
386diff
changeset | 68 | The idea of parser combinators is that we can easily build | 
| 
2d0a59127694
updated
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
386diff
changeset | 69 | parser combinators out of smaller components following very | 
| 
2d0a59127694
updated
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
386diff
changeset | 70 | closely the structure of a grammar. In order to implement this | 
| 
2d0a59127694
updated
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
386diff
changeset | 71 | in an object-oriented programming language, like Scala, we | 
| 
2d0a59127694
updated
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
386diff
changeset | 72 | need to specify an abstract class for parser combinators. This | 
| 
2d0a59127694
updated
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
386diff
changeset | 73 | abstract class requires the implementation of the function | 
| 
2d0a59127694
updated
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
386diff
changeset | 74 | \texttt{parse} taking an argument of type \texttt{I} and
 | 
| 
2d0a59127694
updated
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
386diff
changeset | 75 | returns a set of type \mbox{\texttt{Set[(T, I)]}}.
 | 
| 177 
53def1fbf472
updated
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
176diff
changeset | 76 | |
| 
53def1fbf472
updated
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
176diff
changeset | 77 | \begin{center}
 | 
| 385 
7f8516ff408d
updated
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
297diff
changeset | 78 | \begin{lstlisting}[language=Scala]
 | 
| 177 
53def1fbf472
updated
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
176diff
changeset | 79 | abstract class Parser[I, T] {
 | 
| 
53def1fbf472
updated
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
176diff
changeset | 80 | def parse(ts: I): Set[(T, I)] | 
| 
53def1fbf472
updated
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
176diff
changeset | 81 | |
| 
53def1fbf472
updated
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
176diff
changeset | 82 | def parse_all(ts: I): Set[T] = | 
| 
53def1fbf472
updated
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
176diff
changeset | 83 | for ((head, tail) <- parse(ts); if (tail.isEmpty)) | 
| 
53def1fbf472
updated
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
176diff
changeset | 84 | yield head | 
| 
53def1fbf472
updated
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
176diff
changeset | 85 | } | 
| 
53def1fbf472
updated
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
176diff
changeset | 86 | \end{lstlisting}
 | 
| 
53def1fbf472
updated
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
176diff
changeset | 87 | \end{center}
 | 
| 176 
3c2653fc8b5a
updated
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
175diff
changeset | 88 | |
| 385 
7f8516ff408d
updated
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
297diff
changeset | 89 | \noindent From the function \texttt{parse} we can then
 | 
| 
7f8516ff408d
updated
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
297diff
changeset | 90 | ``centrally'' derive the function \texttt{parse\_all}, which
 | 
| 
7f8516ff408d
updated
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
297diff
changeset | 91 | just filters out all pairs whose second component is not empty | 
| 
7f8516ff408d
updated
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
297diff
changeset | 92 | (that is has still some unprocessed part). The reason is that | 
| 386 
31295bb945c6
update
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
385diff
changeset | 93 | at the end of the parsing we are only interested in the results | 
| 385 
7f8516ff408d
updated
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
297diff
changeset | 94 | where all the input has been consumed and no unprocessed part | 
| 386 
31295bb945c6
update
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
385diff
changeset | 95 | is left over. | 
| 183 
b17eff695c7f
added new stuff
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
177diff
changeset | 96 | |
| 385 
7f8516ff408d
updated
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
297diff
changeset | 97 | One of the simplest parser combinators recognises just a | 
| 
7f8516ff408d
updated
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
297diff
changeset | 98 | character, say $c$, from the beginning of strings. Its | 
| 
7f8516ff408d
updated
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
297diff
changeset | 99 | behaviour can be described as follows: | 
| 176 
3c2653fc8b5a
updated
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
175diff
changeset | 100 | |
| 177 
53def1fbf472
updated
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
176diff
changeset | 101 | \begin{itemize}
 | 
| 386 
31295bb945c6
update
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
385diff
changeset | 102 | \item if the head of the input string starts with a $c$, then returns | 
| 385 
7f8516ff408d
updated
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
297diff
changeset | 103 | 	the set $\{(c, \textit{tail of}\; s)\}$, where \textit{tail of} 
 | 
| 
7f8516ff408d
updated
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
297diff
changeset | 104 | $s$ is the unprocessed part of the input string | 
| 386 
31295bb945c6
update
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
385diff
changeset | 105 | \item otherwise return the empty set $\{\}$	
 | 
| 177 
53def1fbf472
updated
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
176diff
changeset | 106 | \end{itemize}
 | 
| 
53def1fbf472
updated
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
176diff
changeset | 107 | |
| 
53def1fbf472
updated
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
176diff
changeset | 108 | \noindent | 
| 
53def1fbf472
updated
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
176diff
changeset | 109 | The input type of this simple parser combinator for characters is | 
| 
53def1fbf472
updated
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
176diff
changeset | 110 | \texttt{String} and the output type \mbox{\texttt{Set[(Char, String)]}}. 
 | 
| 
53def1fbf472
updated
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
176diff
changeset | 111 | The code in Scala is as follows: | 
| 
53def1fbf472
updated
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
176diff
changeset | 112 | |
| 
53def1fbf472
updated
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
176diff
changeset | 113 | \begin{center}
 | 
| 
53def1fbf472
updated
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
176diff
changeset | 114 | \begin{lstlisting}[language=Scala,basicstyle=\small\ttfamily, numbers=none]
 | 
| 
53def1fbf472
updated
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
176diff
changeset | 115 | case class CharParser(c: Char) extends Parser[String, Char] {
 | 
| 
53def1fbf472
updated
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
176diff
changeset | 116 | def parse(sb: String) = | 
| 
53def1fbf472
updated
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
176diff
changeset | 117 | if (sb.head == c) Set((c, sb.tail)) else Set() | 
| 
53def1fbf472
updated
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
176diff
changeset | 118 | } | 
| 
53def1fbf472
updated
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
176diff
changeset | 119 | \end{lstlisting}
 | 
| 
53def1fbf472
updated
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
176diff
changeset | 120 | \end{center}
 | 
| 176 
3c2653fc8b5a
updated
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
175diff
changeset | 121 | |
| 385 
7f8516ff408d
updated
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
297diff
changeset | 122 | \noindent The \texttt{parse} function tests whether the first
 | 
| 
7f8516ff408d
updated
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
297diff
changeset | 123 | character of the input string \texttt{sb} is equal to
 | 
| 
7f8516ff408d
updated
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
297diff
changeset | 124 | \texttt{c}. If yes, then it splits the string into the
 | 
| 
7f8516ff408d
updated
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
297diff
changeset | 125 | recognised part \texttt{c} and the unprocessed part
 | 
| 
7f8516ff408d
updated
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
297diff
changeset | 126 | \texttt{sb.tail}. In case \texttt{sb} does not start with
 | 
| 
7f8516ff408d
updated
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
297diff
changeset | 127 | \texttt{c} then the parser returns the empty set (in Scala
 | 
| 
7f8516ff408d
updated
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
297diff
changeset | 128 | \texttt{Set()}).
 | 
| 
7f8516ff408d
updated
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
297diff
changeset | 129 | |
| 
7f8516ff408d
updated
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
297diff
changeset | 130 | |
| 
7f8516ff408d
updated
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
297diff
changeset | 131 | More interesting are the parser combinators that build larger | 
| 
7f8516ff408d
updated
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
297diff
changeset | 132 | parsers out of smaller component parsers. For example the | 
| 
7f8516ff408d
updated
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
297diff
changeset | 133 | alternative parser combinator is as follows: given two | 
| 392 
2d0a59127694
updated
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
386diff
changeset | 134 | parsers, say, $p$ and $q$, we apply both parsers to the | 
| 386 
31295bb945c6
update
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
385diff
changeset | 135 | input (remember parsers are functions) and combine the output | 
| 
31295bb945c6
update
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
385diff
changeset | 136 | (remember they are sets) | 
| 385 
7f8516ff408d
updated
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
297diff
changeset | 137 | |
| 
7f8516ff408d
updated
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
297diff
changeset | 138 | \begin{center}
 | 
| 
7f8516ff408d
updated
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
297diff
changeset | 139 | $p(\text{input}) \cup q(\text{input})$
 | 
| 
7f8516ff408d
updated
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
297diff
changeset | 140 | \end{center}
 | 
| 
7f8516ff408d
updated
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
297diff
changeset | 141 | |
| 386 
31295bb945c6
update
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
385diff
changeset | 142 | \noindent In Scala we would implement alternative parser | 
| 
31295bb945c6
update
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
385diff
changeset | 143 | combinator as follows | 
| 183 
b17eff695c7f
added new stuff
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
177diff
changeset | 144 | |
| 
b17eff695c7f
added new stuff
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
177diff
changeset | 145 | \begin{center}
 | 
| 385 
7f8516ff408d
updated
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
297diff
changeset | 146 | \begin{lstlisting}[language=Scala, numbers=none]
 | 
| 183 
b17eff695c7f
added new stuff
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
177diff
changeset | 147 | class AltParser[I, T] | 
| 
b17eff695c7f
added new stuff
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
177diff
changeset | 148 | (p: => Parser[I, T], | 
| 
b17eff695c7f
added new stuff
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
177diff
changeset | 149 |         q: => Parser[I, T]) extends Parser[I, T] {
 | 
| 
b17eff695c7f
added new stuff
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
177diff
changeset | 150 | def parse(sb: I) = p.parse(sb) ++ q.parse(sb) | 
| 
b17eff695c7f
added new stuff
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
177diff
changeset | 151 | } | 
| 
b17eff695c7f
added new stuff
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
177diff
changeset | 152 | \end{lstlisting}
 | 
| 
b17eff695c7f
added new stuff
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
177diff
changeset | 153 | \end{center}
 | 
| 
b17eff695c7f
added new stuff
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
177diff
changeset | 154 | |
| 385 
7f8516ff408d
updated
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
297diff
changeset | 155 | \noindent The types of this parser combinator are polymorphic | 
| 
7f8516ff408d
updated
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
297diff
changeset | 156 | (we just have \texttt{I} for the input type, and \texttt{T}
 | 
| 
7f8516ff408d
updated
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
297diff
changeset | 157 | for the output type). The alternative parser builds a new | 
| 386 
31295bb945c6
update
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
385diff
changeset | 158 | parser out of two existing parsers \texttt{p} and \texttt{q}.
 | 
| 
31295bb945c6
update
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
385diff
changeset | 159 | Both need to be able to process input of type \texttt{I} and
 | 
| 
31295bb945c6
update
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
385diff
changeset | 160 | return the same output type \texttt{Set[(T,
 | 
| 
31295bb945c6
update
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
385diff
changeset | 161 | I)]}.\footnote{There is an interesting detail of Scala, namely
 | 
| 
31295bb945c6
update
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
385diff
changeset | 162 | the \texttt{=>} in front of the types of \texttt{p} and
 | 
| 385 
7f8516ff408d
updated
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
297diff
changeset | 163 | \texttt{q}. They will prevent the evaluation of the arguments
 | 
| 
7f8516ff408d
updated
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
297diff
changeset | 164 | before they are used. This is often called \emph{lazy
 | 
| 386 
31295bb945c6
update
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
385diff
changeset | 165 | evaluation} of the arguments. We will explain this later.} The alternative parser should | 
| 385 
7f8516ff408d
updated
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
297diff
changeset | 166 | run the input with the first parser \texttt{p} (producing a
 | 
| 
7f8516ff408d
updated
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
297diff
changeset | 167 | set of outputs) and then run the same input with \texttt{q}.
 | 
| 
7f8516ff408d
updated
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
297diff
changeset | 168 | The result should be then just the union of both sets, which | 
| 
7f8516ff408d
updated
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
297diff
changeset | 169 | is the operation \texttt{++} in Scala.
 | 
| 183 
b17eff695c7f
added new stuff
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
177diff
changeset | 170 | |
| 386 
31295bb945c6
update
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
385diff
changeset | 171 | The alternative parser combinator already allows us to | 
| 
31295bb945c6
update
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
385diff
changeset | 172 | construct a parser that parses either a character \texttt{a}
 | 
| 
31295bb945c6
update
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
385diff
changeset | 173 | or \texttt{b}, as
 | 
| 183 
b17eff695c7f
added new stuff
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
177diff
changeset | 174 | |
| 
b17eff695c7f
added new stuff
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
177diff
changeset | 175 | \begin{center}
 | 
| 385 
7f8516ff408d
updated
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
297diff
changeset | 176 | \begin{lstlisting}[language=Scala, numbers=none]
 | 
| 183 
b17eff695c7f
added new stuff
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
177diff
changeset | 177 | new AltParser(CharParser('a'), CharParser('b'))
 | 
| 
b17eff695c7f
added new stuff
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
177diff
changeset | 178 | \end{lstlisting}
 | 
| 
b17eff695c7f
added new stuff
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
177diff
changeset | 179 | \end{center}
 | 
| 
b17eff695c7f
added new stuff
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
177diff
changeset | 180 | |
| 385 
7f8516ff408d
updated
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
297diff
changeset | 181 | \noindent Scala allows us to introduce some more readable | 
| 
7f8516ff408d
updated
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
297diff
changeset | 182 | shorthand notation for this, like \texttt{'a' || 'b'}. We can
 | 
| 
7f8516ff408d
updated
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
297diff
changeset | 183 | call this parser combinator with the strings | 
| 183 
b17eff695c7f
added new stuff
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
177diff
changeset | 184 | |
| 
b17eff695c7f
added new stuff
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
177diff
changeset | 185 | \begin{center}
 | 
| 
b17eff695c7f
added new stuff
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
177diff
changeset | 186 | \begin{tabular}{rcl}
 | 
| 385 
7f8516ff408d
updated
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
297diff
changeset | 187 | input strings & & output\medskip\\ | 
| 183 
b17eff695c7f
added new stuff
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
177diff
changeset | 188 | \texttt{\Grid{ac}} & $\rightarrow$ & $\left\{(\texttt{\Grid{a}}, \texttt{\Grid{c}})\right\}$\\
 | 
| 
b17eff695c7f
added new stuff
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
177diff
changeset | 189 | \texttt{\Grid{bc}} & $\rightarrow$ & $\left\{(\texttt{\Grid{b}}, \texttt{\Grid{c}})\right\}$\\
 | 
| 385 
7f8516ff408d
updated
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
297diff
changeset | 190 | \texttt{\Grid{cc}} & $\rightarrow$ & $\{\}$
 | 
| 
7f8516ff408d
updated
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
297diff
changeset | 191 | \end{tabular}
 | 
| 
7f8516ff408d
updated
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
297diff
changeset | 192 | \end{center}
 | 
| 
7f8516ff408d
updated
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
297diff
changeset | 193 | |
| 
7f8516ff408d
updated
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
297diff
changeset | 194 | \noindent We receive in the first two cases a successful | 
| 
7f8516ff408d
updated
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
297diff
changeset | 195 | output (that is a non-empty set). In each case, either | 
| 392 
2d0a59127694
updated
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
386diff
changeset | 196 | \pcode{a} or \pcode{b} is in the processed part, and
 | 
| 386 
31295bb945c6
update
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
385diff
changeset | 197 | \pcode{c} in the unprocessed part. Clearly this parser cannot
 | 
| 
31295bb945c6
update
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
385diff
changeset | 198 | parse anything in the string \pcode{cc}, therefore the empty
 | 
| 
31295bb945c6
update
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
385diff
changeset | 199 | set. | 
| 385 
7f8516ff408d
updated
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
297diff
changeset | 200 | |
| 
7f8516ff408d
updated
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
297diff
changeset | 201 | A bit more interesting is the \emph{sequence parser
 | 
| 386 
31295bb945c6
update
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
385diff
changeset | 202 | combinator}. Given two parsers, say, $p$ and $q$, apply first | 
| 
31295bb945c6
update
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
385diff
changeset | 203 | the input to $p$ producing a set of pairs; then apply $q$ to | 
| 
31295bb945c6
update
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
385diff
changeset | 204 | all the unparsed parts; then combine the results like | 
| 385 
7f8516ff408d
updated
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
297diff
changeset | 205 | |
| 
7f8516ff408d
updated
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
297diff
changeset | 206 | \begin{center}
 | 
| 
7f8516ff408d
updated
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
297diff
changeset | 207 | \begin{tabular}{lcl}
 | 
| 
7f8516ff408d
updated
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
297diff
changeset | 208 | $\{((\textit{output}_1, \textit{output}_2), u_2)$ & $\;|\;$ & 
 | 
| 
7f8516ff408d
updated
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
297diff
changeset | 209 | $(\textit{output}_1, u_1) \in p(\text{input}) 
 | 
| 
7f8516ff408d
updated
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
297diff
changeset | 210 | \;\wedge\;$\\ | 
| 386 
31295bb945c6
update
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
385diff
changeset | 211 | && $\;(\textit{output}_2, u_2) \in q(u_1)\}$
 | 
| 183 
b17eff695c7f
added new stuff
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
177diff
changeset | 212 | \end{tabular}
 | 
| 
b17eff695c7f
added new stuff
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
177diff
changeset | 213 | \end{center}
 | 
| 
b17eff695c7f
added new stuff
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
177diff
changeset | 214 | |
| 
b17eff695c7f
added new stuff
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
177diff
changeset | 215 | \noindent | 
| 385 
7f8516ff408d
updated
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
297diff
changeset | 216 | This can be implemented in Scala as follows: | 
| 183 
b17eff695c7f
added new stuff
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
177diff
changeset | 217 | |
| 
b17eff695c7f
added new stuff
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
177diff
changeset | 218 | \begin{center}
 | 
| 385 
7f8516ff408d
updated
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
297diff
changeset | 219 | \begin{lstlisting}[language=Scala,numbers=none]
 | 
| 183 
b17eff695c7f
added new stuff
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
177diff
changeset | 220 | class SeqParser[I, T, S] | 
| 
b17eff695c7f
added new stuff
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
177diff
changeset | 221 | (p: => Parser[I, T], | 
| 
b17eff695c7f
added new stuff
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
177diff
changeset | 222 |         q: => Parser[I, S]) extends Parser[I, (T, S)] {
 | 
| 
b17eff695c7f
added new stuff
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
177diff
changeset | 223 | def parse(sb: I) = | 
| 386 
31295bb945c6
update
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
385diff
changeset | 224 | for ((output1, u1) <- p.parse(sb); | 
| 
31295bb945c6
update
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
385diff
changeset | 225 | (output2, u2) <- q.parse(u1)) | 
| 
31295bb945c6
update
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
385diff
changeset | 226 | yield ((output1, output2), u2) | 
| 183 
b17eff695c7f
added new stuff
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
177diff
changeset | 227 | } | 
| 
b17eff695c7f
added new stuff
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
177diff
changeset | 228 | \end{lstlisting}
 | 
| 
b17eff695c7f
added new stuff
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
177diff
changeset | 229 | \end{center}
 | 
| 
b17eff695c7f
added new stuff
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
177diff
changeset | 230 | |
| 385 
7f8516ff408d
updated
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
297diff
changeset | 231 | \noindent This parser takes as input two parsers, \texttt{p}
 | 
| 
7f8516ff408d
updated
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
297diff
changeset | 232 | and \texttt{q}. It implements \texttt{parse} as follows: let
 | 
| 
7f8516ff408d
updated
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
297diff
changeset | 233 | first run the parser \texttt{p} on the input producing a set
 | 
| 386 
31295bb945c6
update
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
385diff
changeset | 234 | of pairs (\texttt{output1}, \texttt{u1}). The \texttt{u1}
 | 
| 385 
7f8516ff408d
updated
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
297diff
changeset | 235 | stands for the unprocessed parts left over by \texttt{p}. Let
 | 
| 
7f8516ff408d
updated
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
297diff
changeset | 236 | \texttt{q} run on these unprocessed parts producing again a
 | 
| 
7f8516ff408d
updated
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
297diff
changeset | 237 | set of pairs. The output of the sequence parser combinator is | 
| 
7f8516ff408d
updated
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
297diff
changeset | 238 | then a set containing pairs where the first components are | 
| 
7f8516ff408d
updated
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
297diff
changeset | 239 | again pairs, namely what the first parser could parse together | 
| 
7f8516ff408d
updated
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
297diff
changeset | 240 | with what the second parser could parse; the second component | 
| 
7f8516ff408d
updated
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
297diff
changeset | 241 | is the unprocessed part left over after running the second | 
| 
7f8516ff408d
updated
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
297diff
changeset | 242 | parser \texttt{q}. Therefore the input type of the sequence
 | 
| 
7f8516ff408d
updated
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
297diff
changeset | 243 | parser combinator is as usual \texttt{I}, but the output type
 | 
| 
7f8516ff408d
updated
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
297diff
changeset | 244 | is | 
| 183 
b17eff695c7f
added new stuff
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
177diff
changeset | 245 | |
| 
b17eff695c7f
added new stuff
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
177diff
changeset | 246 | \begin{center}
 | 
| 
b17eff695c7f
added new stuff
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
177diff
changeset | 247 | \texttt{Set[((T, S), I)]}
 | 
| 
b17eff695c7f
added new stuff
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
177diff
changeset | 248 | \end{center}
 | 
| 
b17eff695c7f
added new stuff
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
177diff
changeset | 249 | |
| 385 
7f8516ff408d
updated
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
297diff
changeset | 250 | Scala allows us to provide some shorthand notation for the | 
| 386 
31295bb945c6
update
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
385diff
changeset | 251 | sequence parser combinator. We can write for example | 
| 
31295bb945c6
update
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
385diff
changeset | 252 | \pcode{'a' ~ 'b'}, which is the parser combinator that
 | 
| 385 
7f8516ff408d
updated
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
297diff
changeset | 253 | first consumes the character \texttt{a} from a string and then
 | 
| 
7f8516ff408d
updated
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
297diff
changeset | 254 | \texttt{b}. Three examples of this parser combinator are as
 | 
| 
7f8516ff408d
updated
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
297diff
changeset | 255 | follows: | 
| 
7f8516ff408d
updated
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
297diff
changeset | 256 | |
| 
7f8516ff408d
updated
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
297diff
changeset | 257 | \begin{center}
 | 
| 
7f8516ff408d
updated
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
297diff
changeset | 258 | \begin{tabular}{rcl}
 | 
| 
7f8516ff408d
updated
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
297diff
changeset | 259 | input strings & & output\medskip\\ | 
| 
7f8516ff408d
updated
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
297diff
changeset | 260 | \texttt{\Grid{abc}} & $\rightarrow$ & $\left\{((\texttt{\Grid{a}}, \texttt{\Grid{b}}), \texttt{\Grid{c}})\right\}$\\
 | 
| 
7f8516ff408d
updated
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
297diff
changeset | 261 | \texttt{\Grid{bac}} & $\rightarrow$ & $\{\}$\\
 | 
| 
7f8516ff408d
updated
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
297diff
changeset | 262 | \texttt{\Grid{ccc}} & $\rightarrow$ & $\{\}$
 | 
| 
7f8516ff408d
updated
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
297diff
changeset | 263 | \end{tabular}
 | 
| 
7f8516ff408d
updated
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
297diff
changeset | 264 | \end{center}
 | 
| 
7f8516ff408d
updated
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
297diff
changeset | 265 | |
| 386 
31295bb945c6
update
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
385diff
changeset | 266 | \noindent A slightly more complicated parser is \pcode{('a'
 | 
| 
31295bb945c6
update
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
385diff
changeset | 267 | || 'b') ~ 'b'} which parses as first character either an | 
| 385 
7f8516ff408d
updated
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
297diff
changeset | 268 | \texttt{a} or \texttt{b} followed by a \texttt{b}. This parser
 | 
| 386 
31295bb945c6
update
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
385diff
changeset | 269 | produces the following outputs. | 
| 385 
7f8516ff408d
updated
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
297diff
changeset | 270 | |
| 
7f8516ff408d
updated
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
297diff
changeset | 271 | \begin{center}
 | 
| 
7f8516ff408d
updated
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
297diff
changeset | 272 | \begin{tabular}{rcl}
 | 
| 
7f8516ff408d
updated
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
297diff
changeset | 273 | input strings & & output\medskip\\ | 
| 
7f8516ff408d
updated
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
297diff
changeset | 274 | \texttt{\Grid{abc}} & $\rightarrow$ & $\left\{((\texttt{\Grid{a}}, \texttt{\Grid{b}}), \texttt{\Grid{c}})\right\}$\\
 | 
| 
7f8516ff408d
updated
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
297diff
changeset | 275 | \texttt{\Grid{bbc}} & $\rightarrow$ & $\left\{((\texttt{\Grid{b}}, \texttt{\Grid{b}}), \texttt{\Grid{c}})\right\}$\\
 | 
| 
7f8516ff408d
updated
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
297diff
changeset | 276 | \texttt{\Grid{aac}} & $\rightarrow$ & $\{\}$
 | 
| 
7f8516ff408d
updated
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
297diff
changeset | 277 | \end{tabular}
 | 
| 
7f8516ff408d
updated
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
297diff
changeset | 278 | \end{center}
 | 
| 
7f8516ff408d
updated
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
297diff
changeset | 279 | |
| 
7f8516ff408d
updated
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
297diff
changeset | 280 | \noindent Two more examples: first consider the parser \pcode{('a' ~
 | 
| 
7f8516ff408d
updated
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
297diff
changeset | 281 | 'a') ~ 'a'} and the input \pcode{aaaa}:
 | 
| 183 
b17eff695c7f
added new stuff
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
177diff
changeset | 282 | |
| 
b17eff695c7f
added new stuff
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
177diff
changeset | 283 | \begin{center}
 | 
| 
b17eff695c7f
added new stuff
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
177diff
changeset | 284 | \begin{tabular}{rcl}
 | 
| 
b17eff695c7f
added new stuff
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
177diff
changeset | 285 | input string & & output\medskip\\ | 
| 385 
7f8516ff408d
updated
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
297diff
changeset | 286 | \texttt{\Grid{aaaa}} & $\rightarrow$ & 
 | 
| 
7f8516ff408d
updated
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
297diff
changeset | 287 | $\left\{(((\texttt{\Grid{a}}, \texttt{\Grid{a}}), \texttt{\Grid{a}}), \texttt{\Grid{a}})\right\}$\\
 | 
| 183 
b17eff695c7f
added new stuff
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
177diff
changeset | 288 | \end{tabular}
 | 
| 
b17eff695c7f
added new stuff
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
177diff
changeset | 289 | \end{center}
 | 
| 
b17eff695c7f
added new stuff
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
177diff
changeset | 290 | |
| 386 
31295bb945c6
update
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
385diff
changeset | 291 | \noindent Notice how the results nest deeper and deeper as | 
| 
31295bb945c6
update
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
385diff
changeset | 292 | pairs (the last \pcode{a} is in the unprocessed part). To
 | 
| 
31295bb945c6
update
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
385diff
changeset | 293 | consume everything of this string we can use the parser | 
| 
31295bb945c6
update
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
385diff
changeset | 294 | \pcode{(('a' ~'a') ~ 'a') ~ 'a'}. Then the output is as
 | 
| 
31295bb945c6
update
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
385diff
changeset | 295 | follows: | 
| 183 
b17eff695c7f
added new stuff
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
177diff
changeset | 296 | |
| 
b17eff695c7f
added new stuff
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
177diff
changeset | 297 | \begin{center}
 | 
| 
b17eff695c7f
added new stuff
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
177diff
changeset | 298 | \begin{tabular}{rcl}
 | 
| 
b17eff695c7f
added new stuff
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
177diff
changeset | 299 | input string & & output\medskip\\ | 
| 385 
7f8516ff408d
updated
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
297diff
changeset | 300 | \texttt{\Grid{aaaa}} & $\rightarrow$ & 
 | 
| 
7f8516ff408d
updated
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
297diff
changeset | 301 | $\left\{((((\texttt{\Grid{a}}, \texttt{\Grid{a}}), \texttt{\Grid{a}}), \texttt{\Grid{a}}), \texttt{""})\right\}$\\
 | 
| 183 
b17eff695c7f
added new stuff
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
177diff
changeset | 302 | \end{tabular}
 | 
| 
b17eff695c7f
added new stuff
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
177diff
changeset | 303 | \end{center}
 | 
| 
b17eff695c7f
added new stuff
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
177diff
changeset | 304 | |
| 385 
7f8516ff408d
updated
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
297diff
changeset | 305 | \noindent This is an instance where the parser consumed | 
| 
7f8516ff408d
updated
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
297diff
changeset | 306 | completely the input, meaning the unprocessed part is just the | 
| 
7f8516ff408d
updated
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
297diff
changeset | 307 | empty string. | 
| 
7f8516ff408d
updated
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
297diff
changeset | 308 | |
| 183 
b17eff695c7f
added new stuff
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
177diff
changeset | 309 | |
| 386 
31295bb945c6
update
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
385diff
changeset | 310 | Note carefully that constructing a parser such \pcode{'a' ||
 | 
| 
31295bb945c6
update
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
385diff
changeset | 311 | ('a' ~ 'b')} will result in a typing error. The first
 | 
| 385 
7f8516ff408d
updated
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
297diff
changeset | 312 | parser has as output type a single character (recall the type | 
| 
7f8516ff408d
updated
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
297diff
changeset | 313 | of \texttt{CharParser}), but the second parser produces a pair
 | 
| 
7f8516ff408d
updated
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
297diff
changeset | 314 | of characters as output. The alternative parser is however | 
| 
7f8516ff408d
updated
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
297diff
changeset | 315 | required to have both component parsers to have the same type. | 
| 
7f8516ff408d
updated
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
297diff
changeset | 316 | We will see later how we can build this parser without the | 
| 
7f8516ff408d
updated
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
297diff
changeset | 317 | typing error. | 
| 
7f8516ff408d
updated
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
297diff
changeset | 318 | |
| 
7f8516ff408d
updated
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
297diff
changeset | 319 | The next parser combinator does not actually combine smaller | 
| 386 
31295bb945c6
update
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
385diff
changeset | 320 | parsers, but applies a function to the result of a parser. | 
| 385 
7f8516ff408d
updated
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
297diff
changeset | 321 | It is implemented in Scala as follows | 
| 183 
b17eff695c7f
added new stuff
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
177diff
changeset | 322 | |
| 
b17eff695c7f
added new stuff
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
177diff
changeset | 323 | \begin{center}
 | 
| 
b17eff695c7f
added new stuff
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
177diff
changeset | 324 | \begin{lstlisting}[language=Scala,basicstyle=\small\ttfamily, numbers=none]
 | 
| 
b17eff695c7f
added new stuff
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
177diff
changeset | 325 | class FunParser[I, T, S] | 
| 
b17eff695c7f
added new stuff
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
177diff
changeset | 326 | (p: => Parser[I, T], | 
| 
b17eff695c7f
added new stuff
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
177diff
changeset | 327 |           f: T => S) extends Parser[I, S] {
 | 
| 
b17eff695c7f
added new stuff
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
177diff
changeset | 328 | def parse(sb: I) = | 
| 
b17eff695c7f
added new stuff
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
177diff
changeset | 329 | for ((head, tail) <- p.parse(sb)) yield (f(head), tail) | 
| 
b17eff695c7f
added new stuff
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
177diff
changeset | 330 | } | 
| 
b17eff695c7f
added new stuff
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
177diff
changeset | 331 | \end{lstlisting}
 | 
| 
b17eff695c7f
added new stuff
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
177diff
changeset | 332 | \end{center}
 | 
| 
b17eff695c7f
added new stuff
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
177diff
changeset | 333 | |
| 
b17eff695c7f
added new stuff
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
177diff
changeset | 334 | |
| 386 
31295bb945c6
update
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
385diff
changeset | 335 | \noindent This parser combinator takes a parser \texttt{p}
 | 
| 
31295bb945c6
update
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
385diff
changeset | 336 | with output type \texttt{T} as one argument as well as a
 | 
| 
31295bb945c6
update
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
385diff
changeset | 337 | function \texttt{f} with type \texttt{T => S}. The parser
 | 
| 
31295bb945c6
update
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
385diff
changeset | 338 | \texttt{p} produces sets of type \texttt{(T, I)}. The
 | 
| 
31295bb945c6
update
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
385diff
changeset | 339 | \texttt{FunParser} combinator then applies the function
 | 
| 
31295bb945c6
update
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
385diff
changeset | 340 | \texttt{f} to all the parser outputs. Since this function is of
 | 
| 
31295bb945c6
update
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
385diff
changeset | 341 | type \texttt{T => S}, we obtain a parser with output type
 | 
| 
31295bb945c6
update
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
385diff
changeset | 342 | \texttt{S}. Again Scala lets us introduce some shorthand
 | 
| 
31295bb945c6
update
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
385diff
changeset | 343 | notation for this parser combinator. Therefore we will write | 
| 
31295bb945c6
update
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
385diff
changeset | 344 | \texttt{p ==> f} for it.
 | 
| 
31295bb945c6
update
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
385diff
changeset | 345 | |
| 
31295bb945c6
update
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
385diff
changeset | 346 | \subsubsection*{How to build parsers using parser combinators?}
 | 
| 
31295bb945c6
update
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
385diff
changeset | 347 | |
| 
31295bb945c6
update
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
385diff
changeset | 348 | \subsubsection*{Implementing an Interpreter}
 | 
| 183 
b17eff695c7f
added new stuff
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
177diff
changeset | 349 | |
| 
b17eff695c7f
added new stuff
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
177diff
changeset | 350 | %\bigskip | 
| 
b17eff695c7f
added new stuff
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
177diff
changeset | 351 | %takes advantage of the full generality---have a look | 
| 
b17eff695c7f
added new stuff
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
177diff
changeset | 352 | %what it produces if we call it with the string \texttt{abc}
 | 
| 
b17eff695c7f
added new stuff
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
177diff
changeset | 353 | % | 
| 
b17eff695c7f
added new stuff
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
177diff
changeset | 354 | %\begin{center}
 | 
| 
b17eff695c7f
added new stuff
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
177diff
changeset | 355 | %\begin{tabular}{rcl}
 | 
| 
b17eff695c7f
added new stuff
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
177diff
changeset | 356 | %input string & & output\medskip\\ | 
| 
b17eff695c7f
added new stuff
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
177diff
changeset | 357 | %\texttt{\Grid{abc}} & $\rightarrow$ & $\left\{((\texttt{\Grid{a}}, \texttt{\Grid{b}}), \texttt{\Grid{c}})\right\}$\\
 | 
| 
b17eff695c7f
added new stuff
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
177diff
changeset | 358 | %\texttt{\Grid{bbc}} & $\rightarrow$ & $\left\{((\texttt{\Grid{b}}, \texttt{\Grid{b}}), \texttt{\Grid{c}})\right\}$\\
 | 
| 
b17eff695c7f
added new stuff
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
177diff
changeset | 359 | %\texttt{\Grid{aac}} & $\rightarrow$ & $\varnothing$
 | 
| 
b17eff695c7f
added new stuff
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
177diff
changeset | 360 | %\end{tabular}
 | 
| 
b17eff695c7f
added new stuff
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
177diff
changeset | 361 | %\end{center}
 | 
| 
b17eff695c7f
added new stuff
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
177diff
changeset | 362 | |
| 
b17eff695c7f
added new stuff
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
177diff
changeset | 363 | |
| 173 
7cfb7a6f7c99
added slides
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: diff
changeset | 364 | |
| 
7cfb7a6f7c99
added slides
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: diff
changeset | 365 | \end{document}
 | 
| 
7cfb7a6f7c99
added slides
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: diff
changeset | 366 | |
| 
7cfb7a6f7c99
added slides
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: diff
changeset | 367 | %%% Local Variables: | 
| 
7cfb7a6f7c99
added slides
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: diff
changeset | 368 | %%% mode: latex | 
| 
7cfb7a6f7c99
added slides
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: diff
changeset | 369 | %%% TeX-master: t | 
| 
7cfb7a6f7c99
added slides
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: diff
changeset | 370 | %%% End: |