afl-material: handouts/ho06.tex@529460073b24 (annotated)

584 529460073b24 updated Christian Urban <urbanc@in.tum.de> parents: 392 diff changeset	1
173 7cfb7a6f7c99 added slides Christian Urban <christian dot urban at kcl dot ac dot uk> parents: diff changeset	2	\documentclass{article}
297 5c51839c88fd updated Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 292 diff changeset	3	\usepackage{../style}
217 cd6066f1056a updated handouts Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 183 diff changeset	4	\usepackage{../langs}
cd6066f1056a updated handouts Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 183 diff changeset	5
173 7cfb7a6f7c99 added slides Christian Urban <christian dot urban at kcl dot ac dot uk> parents: diff changeset	6
7cfb7a6f7c99 added slides Christian Urban <christian dot urban at kcl dot ac dot uk> parents: diff changeset	7	\begin{document}
7cfb7a6f7c99 added slides Christian Urban <christian dot urban at kcl dot ac dot uk> parents: diff changeset	8
292 7ed2a25dd115 updated Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 217 diff changeset	9	\section*{Handout 6 (Parser Combinators)}
173 7cfb7a6f7c99 added slides Christian Urban <christian dot urban at kcl dot ac dot uk> parents: diff changeset	10
584 529460073b24 updated Christian Urban <urbanc@in.tum.de> parents: 392 diff changeset	11	This handout explains how \emph{parser combinators} work and how they
529460073b24 updated Christian Urban <urbanc@in.tum.de> parents: 392 diff changeset	12	can be implemented in Scala. Their distinguishing feature is that they
529460073b24 updated Christian Urban <urbanc@in.tum.de> parents: 392 diff changeset	13	are very easy to implement (admittedly it is only easy in a functional
529460073b24 updated Christian Urban <urbanc@in.tum.de> parents: 392 diff changeset	14	programming language). However, parser combinators require that the
529460073b24 updated Christian Urban <urbanc@in.tum.de> parents: 392 diff changeset	15	grammar to be parsed is \emph{not} left-recursive and they are
529460073b24 updated Christian Urban <urbanc@in.tum.de> parents: 392 diff changeset	16	efficient only when the grammar is unambiguous. It is the
529460073b24 updated Christian Urban <urbanc@in.tum.de> parents: 392 diff changeset	17	responsibility of the grammar designer to ensure these two properties.
173 7cfb7a6f7c99 added slides Christian Urban <christian dot urban at kcl dot ac dot uk> parents: diff changeset	18
584 529460073b24 updated Christian Urban <urbanc@in.tum.de> parents: 392 diff changeset	19	Another good point of parser combinators is that they can deal with any kind
529460073b24 updated Christian Urban <urbanc@in.tum.de> parents: 392 diff changeset	20	of input as long as this input of ``sequence-kind'', for example a
529460073b24 updated Christian Urban <urbanc@in.tum.de> parents: 392 diff changeset	21	string or a list of tokens. The general idea behind parser combinators
529460073b24 updated Christian Urban <urbanc@in.tum.de> parents: 392 diff changeset	22	is to transform the input into sets of pairs, like so
175 5801e8c0e528 updated Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 173 diff changeset	23
5801e8c0e528 updated Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 173 diff changeset	24	\begin{center}
385 7f8516ff408d updated Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 297 diff changeset	25	$\underbrace{\text{list of tokens}}_{\text{input}}$
7f8516ff408d updated Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 297 diff changeset	26	$\Rightarrow$
7f8516ff408d updated Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 297 diff changeset	27	$\underbrace{\text{set of (parsed input, unparsed input)}}_{\text{output}}$
7f8516ff408d updated Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 297 diff changeset	28	\end{center}
175 5801e8c0e528 updated Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 173 diff changeset	29
584 529460073b24 updated Christian Urban <urbanc@in.tum.de> parents: 392 diff changeset	30	\noindent As said, the input can be anything as long as it is a
529460073b24 updated Christian Urban <urbanc@in.tum.de> parents: 392 diff changeset	31	``sequence''. The only property of the input we need is to be able to
529460073b24 updated Christian Urban <urbanc@in.tum.de> parents: 392 diff changeset	32	test when it is empty. Obviously we can do this for strings and lists.
529460073b24 updated Christian Urban <urbanc@in.tum.de> parents: 392 diff changeset	33	For more lucidity we shall below often use strings as input in order
529460073b24 updated Christian Urban <urbanc@in.tum.de> parents: 392 diff changeset	34	to illustrate matters. However, this does not make our previous work
529460073b24 updated Christian Urban <urbanc@in.tum.de> parents: 392 diff changeset	35	on lexers obsolete (remember they transform a string into a list of
529460073b24 updated Christian Urban <urbanc@in.tum.de> parents: 392 diff changeset	36	tokens). Lexers will still be needed to build a somewhat realistic
529460073b24 updated Christian Urban <urbanc@in.tum.de> parents: 392 diff changeset	37	compiler.
529460073b24 updated Christian Urban <urbanc@in.tum.de> parents: 392 diff changeset	38
529460073b24 updated Christian Urban <urbanc@in.tum.de> parents: 392 diff changeset	39
529460073b24 updated Christian Urban <urbanc@in.tum.de> parents: 392 diff changeset	40	In my Scala code I use the following polymorphic types for parser combinators:
176 3c2653fc8b5a updated Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 175 diff changeset	41
3c2653fc8b5a updated Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 175 diff changeset	42	\begin{center}
584 529460073b24 updated Christian Urban <urbanc@in.tum.de> parents: 392 diff changeset	43	input:\;\; \texttt{I} \qquad output:\;\; \texttt{T}
176 3c2653fc8b5a updated Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 175 diff changeset	44	\end{center}
3c2653fc8b5a updated Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 175 diff changeset	45
584 529460073b24 updated Christian Urban <urbanc@in.tum.de> parents: 392 diff changeset	46	\noindent that is they take as input something of type \texttt{I} and
529460073b24 updated Christian Urban <urbanc@in.tum.de> parents: 392 diff changeset	47	return a set of pairs of \texttt{Set[(T, I)]}. Since the input needs
529460073b24 updated Christian Urban <urbanc@in.tum.de> parents: 392 diff changeset	48	to be of ``sequence-kind'' I actually have to often write \texttt{I
529460073b24 updated Christian Urban <urbanc@in.tum.de> parents: 392 diff changeset	49	<\% Seq[\_]} for the input type in order to indicate it is a
529460073b24 updated Christian Urban <urbanc@in.tum.de> parents: 392 diff changeset	50	subtype of Scala sequences. The first component of the generated pairs
529460073b24 updated Christian Urban <urbanc@in.tum.de> parents: 392 diff changeset	51	corresponds to what the parser combinator was able to process from the
529460073b24 updated Christian Urban <urbanc@in.tum.de> parents: 392 diff changeset	52	input and the second is the unprocessed part of the input (therefore
529460073b24 updated Christian Urban <urbanc@in.tum.de> parents: 392 diff changeset	53	the type of this unprocessed part is the same as the input). As we
529460073b24 updated Christian Urban <urbanc@in.tum.de> parents: 392 diff changeset	54	shall see shortly, a parser combinator might return more than one such
529460073b24 updated Christian Urban <urbanc@in.tum.de> parents: 392 diff changeset	55	pair; the idea being that there are potentially several ways of how to
529460073b24 updated Christian Urban <urbanc@in.tum.de> parents: 392 diff changeset	56	parse the input. As a concrete example, consider the string
183 b17eff695c7f added new stuff Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 177 diff changeset	57
b17eff695c7f added new stuff Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 177 diff changeset	58	\begin{center}
b17eff695c7f added new stuff Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 177 diff changeset	59	\tt\Grid{iffoo\VS testbar}
b17eff695c7f added new stuff Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 177 diff changeset	60	\end{center}
b17eff695c7f added new stuff Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 177 diff changeset	61
385 7f8516ff408d updated Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 297 diff changeset	62	\noindent We might have a parser combinator which tries to
386 31295bb945c6 update Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 385 diff changeset	63	interpret this string as a keyword (\texttt{if}) or as an
385 7f8516ff408d updated Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 297 diff changeset	64	identifier (\texttt{iffoo}). Then the output will be the set
177 53def1fbf472 updated Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 176 diff changeset	65
183 b17eff695c7f added new stuff Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 177 diff changeset	66	\begin{center}
386 31295bb945c6 update Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 385 diff changeset	67	$\left\{ \left(\texttt{\Grid{if}}\;,\; \texttt{\Grid{foo\VS testbar}}\right),
31295bb945c6 update Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 385 diff changeset	68	\left(\texttt{\Grid{iffoo}}\;,\; \texttt{\Grid{\VS testbar}}\right) \right\}$
183 b17eff695c7f added new stuff Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 177 diff changeset	69	\end{center}
b17eff695c7f added new stuff Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 177 diff changeset	70
385 7f8516ff408d updated Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 297 diff changeset	71	\noindent where the first pair means the parser could
7f8516ff408d updated Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 297 diff changeset	72	recognise \texttt{if} from the input and leaves the rest as
7f8516ff408d updated Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 297 diff changeset	73	`unprocessed' as the second component of the pair; in the
7f8516ff408d updated Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 297 diff changeset	74	other case it could recognise \texttt{iffoo} and leaves
7f8516ff408d updated Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 297 diff changeset	75	\texttt{\VS testbar} as unprocessed. If the parser cannot
386 31295bb945c6 update Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 385 diff changeset	76	recognise anything from the input, then parser combinators just
385 7f8516ff408d updated Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 297 diff changeset	77	return the empty set $\{\}$. This will indicate something
7f8516ff408d updated Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 297 diff changeset	78	``went wrong''\ldots or more precisely, nothing could be
7f8516ff408d updated Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 297 diff changeset	79	parsed.
183 b17eff695c7f added new stuff Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 177 diff changeset	80
584 529460073b24 updated Christian Urban <urbanc@in.tum.de> parents: 392 diff changeset	81	Also important to note is that the type \texttt{T} for the processed
529460073b24 updated Christian Urban <urbanc@in.tum.de> parents: 392 diff changeset	82	part is different from the input type. The reason is that in general
529460073b24 updated Christian Urban <urbanc@in.tum.de> parents: 392 diff changeset	83	we are interested in transform our input into something
529460073b24 updated Christian Urban <urbanc@in.tum.de> parents: 392 diff changeset	84	``different''\ldots for example into a tree, or if we implement the
529460073b24 updated Christian Urban <urbanc@in.tum.de> parents: 392 diff changeset	85	grammar for arithmetic expressions we might be interested in the
529460073b24 updated Christian Urban <urbanc@in.tum.de> parents: 392 diff changeset	86	actual integer number the arithmetic expression, say \texttt{1 + 2 *
529460073b24 updated Christian Urban <urbanc@in.tum.de> parents: 392 diff changeset	87	3}, stands for. In this way we can use parser combinators to
529460073b24 updated Christian Urban <urbanc@in.tum.de> parents: 392 diff changeset	88	implement relativaley easily a calculator.
529460073b24 updated Christian Urban <urbanc@in.tum.de> parents: 392 diff changeset	89
529460073b24 updated Christian Urban <urbanc@in.tum.de> parents: 392 diff changeset	90	The main idea of parser combinators is that we can easily build parser
529460073b24 updated Christian Urban <urbanc@in.tum.de> parents: 392 diff changeset	91	combinators out of smaller components following very closely the
529460073b24 updated Christian Urban <urbanc@in.tum.de> parents: 392 diff changeset	92	structure of a grammar. In order to implement this in an
529460073b24 updated Christian Urban <urbanc@in.tum.de> parents: 392 diff changeset	93	object-oriented programming language, like Scala, we need to specify
529460073b24 updated Christian Urban <urbanc@in.tum.de> parents: 392 diff changeset	94	an abstract class for parser combinators. This abstract class states
529460073b24 updated Christian Urban <urbanc@in.tum.de> parents: 392 diff changeset	95	that the function \texttt{parse} takes an argument of type \texttt{I}
529460073b24 updated Christian Urban <urbanc@in.tum.de> parents: 392 diff changeset	96	and returns a set of type \mbox{\texttt{Set[(T, I)]}}.
177 53def1fbf472 updated Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 176 diff changeset	97
53def1fbf472 updated Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 176 diff changeset	98	\begin{center}
385 7f8516ff408d updated Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 297 diff changeset	99	\begin{lstlisting}[language=Scala]
177 53def1fbf472 updated Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 176 diff changeset	100	abstract class Parser[I, T] {
53def1fbf472 updated Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 176 diff changeset	101	def parse(ts: I): Set[(T, I)]
53def1fbf472 updated Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 176 diff changeset	102
53def1fbf472 updated Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 176 diff changeset	103	def parse_all(ts: I): Set[T] =
53def1fbf472 updated Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 176 diff changeset	104	for ((head, tail) <- parse(ts); if (tail.isEmpty))
53def1fbf472 updated Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 176 diff changeset	105	yield head
53def1fbf472 updated Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 176 diff changeset	106	}
53def1fbf472 updated Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 176 diff changeset	107	\end{lstlisting}
53def1fbf472 updated Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 176 diff changeset	108	\end{center}
176 3c2653fc8b5a updated Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 175 diff changeset	109
584 529460073b24 updated Christian Urban <urbanc@in.tum.de> parents: 392 diff changeset	110	\noindent It is the obligation in each instance (parser combinator) to
529460073b24 updated Christian Urban <urbanc@in.tum.de> parents: 392 diff changeset	111	supply an implementation for \texttt{parse}. From this function we
529460073b24 updated Christian Urban <urbanc@in.tum.de> parents: 392 diff changeset	112	can then ``centrally'' derive the function \texttt{parse\_all}, which
529460073b24 updated Christian Urban <urbanc@in.tum.de> parents: 392 diff changeset	113	just filters out all pairs whose second component is not empty (that
529460073b24 updated Christian Urban <urbanc@in.tum.de> parents: 392 diff changeset	114	is has still some unprocessed part). The reason is that at the end of
529460073b24 updated Christian Urban <urbanc@in.tum.de> parents: 392 diff changeset	115	the parsing we are only interested in the results where all the input
529460073b24 updated Christian Urban <urbanc@in.tum.de> parents: 392 diff changeset	116	has been consumed and no unprocessed part is left over.
183 b17eff695c7f added new stuff Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 177 diff changeset	117
385 7f8516ff408d updated Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 297 diff changeset	118	One of the simplest parser combinators recognises just a
584 529460073b24 updated Christian Urban <urbanc@in.tum.de> parents: 392 diff changeset	119	single character, say $c$, from the beginning of strings. Its
385 7f8516ff408d updated Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 297 diff changeset	120	behaviour can be described as follows:
176 3c2653fc8b5a updated Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 175 diff changeset	121
177 53def1fbf472 updated Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 176 diff changeset	122	\begin{itemize}
584 529460073b24 updated Christian Urban <urbanc@in.tum.de> parents: 392 diff changeset	123	\item If the head of the input string starts with a $c$, then return
529460073b24 updated Christian Urban <urbanc@in.tum.de> parents: 392 diff changeset	124	the set
529460073b24 updated Christian Urban <urbanc@in.tum.de> parents: 392 diff changeset	125	\[\{(c, \textit{tail of}\; s)\}\]
529460073b24 updated Christian Urban <urbanc@in.tum.de> parents: 392 diff changeset	126	where \textit{tail of}
529460073b24 updated Christian Urban <urbanc@in.tum.de> parents: 392 diff changeset	127	$s$ is the unprocessed part of the input string.
529460073b24 updated Christian Urban <urbanc@in.tum.de> parents: 392 diff changeset	128	\item Otherwise return the empty set $\{\}$.
177 53def1fbf472 updated Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 176 diff changeset	129	\end{itemize}
53def1fbf472 updated Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 176 diff changeset	130
53def1fbf472 updated Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 176 diff changeset	131	\noindent
53def1fbf472 updated Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 176 diff changeset	132	The input type of this simple parser combinator for characters is
53def1fbf472 updated Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 176 diff changeset	133	\texttt{String} and the output type \mbox{\texttt{Set[(Char, String)]}}.
53def1fbf472 updated Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 176 diff changeset	134	The code in Scala is as follows:
53def1fbf472 updated Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 176 diff changeset	135
53def1fbf472 updated Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 176 diff changeset	136	\begin{center}
53def1fbf472 updated Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 176 diff changeset	137	\begin{lstlisting}[language=Scala,basicstyle=\small\ttfamily, numbers=none]
53def1fbf472 updated Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 176 diff changeset	138	case class CharParser(c: Char) extends Parser[String, Char] {
53def1fbf472 updated Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 176 diff changeset	139	def parse(sb: String) =
53def1fbf472 updated Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 176 diff changeset	140	if (sb.head == c) Set((c, sb.tail)) else Set()
53def1fbf472 updated Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 176 diff changeset	141	}
53def1fbf472 updated Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 176 diff changeset	142	\end{lstlisting}
53def1fbf472 updated Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 176 diff changeset	143	\end{center}
176 3c2653fc8b5a updated Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 175 diff changeset	144
584 529460073b24 updated Christian Urban <urbanc@in.tum.de> parents: 392 diff changeset	145	\noindent You can see the \texttt{parse} function tests whether the
529460073b24 updated Christian Urban <urbanc@in.tum.de> parents: 392 diff changeset	146	first character of the input string \texttt{sb} is equal to
529460073b24 updated Christian Urban <urbanc@in.tum.de> parents: 392 diff changeset	147	\texttt{c}. If yes, then it splits the string into the recognised part
529460073b24 updated Christian Urban <urbanc@in.tum.de> parents: 392 diff changeset	148	\texttt{c} and the unprocessed part \texttt{sb.tail}. In case
529460073b24 updated Christian Urban <urbanc@in.tum.de> parents: 392 diff changeset	149	\texttt{sb} does not start with \texttt{c} then the parser returns the
529460073b24 updated Christian Urban <urbanc@in.tum.de> parents: 392 diff changeset	150	empty set (in Scala \texttt{Set()}). Since this parser recognises
529460073b24 updated Christian Urban <urbanc@in.tum.de> parents: 392 diff changeset	151	characters and just returns characters as the processed part, the
529460073b24 updated Christian Urban <urbanc@in.tum.de> parents: 392 diff changeset	152	output type of the parser is \texttt{Char}.
529460073b24 updated Christian Urban <urbanc@in.tum.de> parents: 392 diff changeset	153
529460073b24 updated Christian Urban <urbanc@in.tum.de> parents: 392 diff changeset	154	If we want to parse a list of tokens and interested in recognising a
529460073b24 updated Christian Urban <urbanc@in.tum.de> parents: 392 diff changeset	155	number token, we could write something like this
529460073b24 updated Christian Urban <urbanc@in.tum.de> parents: 392 diff changeset	156
529460073b24 updated Christian Urban <urbanc@in.tum.de> parents: 392 diff changeset	157	\begin{center}
529460073b24 updated Christian Urban <urbanc@in.tum.de> parents: 392 diff changeset	158	\begin{lstlisting}[language=Scala,basicstyle=\small\ttfamily,numbers=none]
529460073b24 updated Christian Urban <urbanc@in.tum.de> parents: 392 diff changeset	159	case object NumParser extends Parser[List[Token], Int] {
529460073b24 updated Christian Urban <urbanc@in.tum.de> parents: 392 diff changeset	160	def parse(ts: List[Token]) = ts match {
529460073b24 updated Christian Urban <urbanc@in.tum.de> parents: 392 diff changeset	161	case Num_token(s)::ts => Set((s.toInt, ts))
529460073b24 updated Christian Urban <urbanc@in.tum.de> parents: 392 diff changeset	162	case _ => Set ()
529460073b24 updated Christian Urban <urbanc@in.tum.de> parents: 392 diff changeset	163	}
529460073b24 updated Christian Urban <urbanc@in.tum.de> parents: 392 diff changeset	164	}
529460073b24 updated Christian Urban <urbanc@in.tum.de> parents: 392 diff changeset	165	\end{lstlisting}
529460073b24 updated Christian Urban <urbanc@in.tum.de> parents: 392 diff changeset	166	\end{center}
529460073b24 updated Christian Urban <urbanc@in.tum.de> parents: 392 diff changeset	167
529460073b24 updated Christian Urban <urbanc@in.tum.de> parents: 392 diff changeset	168	\noindent
529460073b24 updated Christian Urban <urbanc@in.tum.de> parents: 392 diff changeset	169	In this parser the input is of type \texttt{List[Token]}. The function
529460073b24 updated Christian Urban <urbanc@in.tum.de> parents: 392 diff changeset	170	parse looks at the input \texttt{ts} and checks whether the first
529460073b24 updated Christian Urban <urbanc@in.tum.de> parents: 392 diff changeset	171	token is a \texttt{Num\_token}. Let us assume our lexer generated
529460073b24 updated Christian Urban <urbanc@in.tum.de> parents: 392 diff changeset	172	these tokens for numbers. But this parser does not just return this
529460073b24 updated Christian Urban <urbanc@in.tum.de> parents: 392 diff changeset	173	token (and the rest of the list), like the \texttt{CharParser} above,
529460073b24 updated Christian Urban <urbanc@in.tum.de> parents: 392 diff changeset	174	rather extracts the string \texttt{s} from the token and converts it
529460073b24 updated Christian Urban <urbanc@in.tum.de> parents: 392 diff changeset	175	into an integer. The hope is that the lexer did its work well and this
529460073b24 updated Christian Urban <urbanc@in.tum.de> parents: 392 diff changeset	176	conversion always succeeds. The consequence of this is that the output
529460073b24 updated Christian Urban <urbanc@in.tum.de> parents: 392 diff changeset	177	type for this parser is \texttt{Int}. Such a conversion would be
529460073b24 updated Christian Urban <urbanc@in.tum.de> parents: 392 diff changeset	178	needed if we want to implement a simple calculator program.
385 7f8516ff408d updated Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 297 diff changeset	179
7f8516ff408d updated Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 297 diff changeset	180
584 529460073b24 updated Christian Urban <urbanc@in.tum.de> parents: 392 diff changeset	181	These simple parsers that just look at the input and do a simple
529460073b24 updated Christian Urban <urbanc@in.tum.de> parents: 392 diff changeset	182	transformation are often called \emph{atomic} parser combinators.
529460073b24 updated Christian Urban <urbanc@in.tum.de> parents: 392 diff changeset	183	More interesting are the parser combinators that build larger parsers
529460073b24 updated Christian Urban <urbanc@in.tum.de> parents: 392 diff changeset	184	out of smaller component parsers. For example the \emph{alternative
529460073b24 updated Christian Urban <urbanc@in.tum.de> parents: 392 diff changeset	185	parser combinator} is as follows: given two parsers, say, $p$ and
529460073b24 updated Christian Urban <urbanc@in.tum.de> parents: 392 diff changeset	186	$q$, we apply both parsers to the input (remember parsers are
529460073b24 updated Christian Urban <urbanc@in.tum.de> parents: 392 diff changeset	187	functions) and combine the output (remember they are sets of pairs)
385 7f8516ff408d updated Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 297 diff changeset	188
7f8516ff408d updated Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 297 diff changeset	189	\begin{center}
7f8516ff408d updated Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 297 diff changeset	190	$p(\text{input}) \cup q(\text{input})$
7f8516ff408d updated Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 297 diff changeset	191	\end{center}
7f8516ff408d updated Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 297 diff changeset	192
386 31295bb945c6 update Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 385 diff changeset	193	\noindent In Scala we would implement alternative parser
31295bb945c6 update Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 385 diff changeset	194	combinator as follows
183 b17eff695c7f added new stuff Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 177 diff changeset	195
b17eff695c7f added new stuff Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 177 diff changeset	196	\begin{center}
385 7f8516ff408d updated Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 297 diff changeset	197	\begin{lstlisting}[language=Scala, numbers=none]
183 b17eff695c7f added new stuff Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 177 diff changeset	198	class AltParser[I, T]
b17eff695c7f added new stuff Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 177 diff changeset	199	(p: => Parser[I, T],
b17eff695c7f added new stuff Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 177 diff changeset	200	q: => Parser[I, T]) extends Parser[I, T] {
b17eff695c7f added new stuff Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 177 diff changeset	201	def parse(sb: I) = p.parse(sb) ++ q.parse(sb)
b17eff695c7f added new stuff Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 177 diff changeset	202	}
b17eff695c7f added new stuff Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 177 diff changeset	203	\end{lstlisting}
b17eff695c7f added new stuff Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 177 diff changeset	204	\end{center}
b17eff695c7f added new stuff Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 177 diff changeset	205
584 529460073b24 updated Christian Urban <urbanc@in.tum.de> parents: 392 diff changeset	206	\noindent The types of this parser combinator are again generic (we
529460073b24 updated Christian Urban <urbanc@in.tum.de> parents: 392 diff changeset	207	just have \texttt{I} for the input type, and \texttt{T} for the output
529460073b24 updated Christian Urban <urbanc@in.tum.de> parents: 392 diff changeset	208	type). The alternative parser builds a new parser out of two existing
529460073b24 updated Christian Urban <urbanc@in.tum.de> parents: 392 diff changeset	209	parsers \texttt{p} and \texttt{q}. Both need to be able to process
529460073b24 updated Christian Urban <urbanc@in.tum.de> parents: 392 diff changeset	210	input of type \texttt{I} and return the same output type
529460073b24 updated Christian Urban <urbanc@in.tum.de> parents: 392 diff changeset	211	\texttt{Set[(T, I)]}.\footnote{There is an interesting detail of
529460073b24 updated Christian Urban <urbanc@in.tum.de> parents: 392 diff changeset	212	Scala, namely the \texttt{=>} in front of the types of \texttt{p}
529460073b24 updated Christian Urban <urbanc@in.tum.de> parents: 392 diff changeset	213	and \texttt{q}. They will prevent the evaluation of the arguments
529460073b24 updated Christian Urban <urbanc@in.tum.de> parents: 392 diff changeset	214	before they are used. This is often called \emph{lazy evaluation} of
529460073b24 updated Christian Urban <urbanc@in.tum.de> parents: 392 diff changeset	215	the arguments. We will explain this later.} Therefore the output
529460073b24 updated Christian Urban <urbanc@in.tum.de> parents: 392 diff changeset	216	type of this parser is \texttt{T}. The alternative parser should run
529460073b24 updated Christian Urban <urbanc@in.tum.de> parents: 392 diff changeset	217	the input with the first parser \texttt{p} (producing a set of pairs)
529460073b24 updated Christian Urban <urbanc@in.tum.de> parents: 392 diff changeset	218	and then run the same input with \texttt{q} (producing another set of
529460073b24 updated Christian Urban <urbanc@in.tum.de> parents: 392 diff changeset	219	pairs). The result should be then just the union of both sets, which
385 7f8516ff408d updated Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 297 diff changeset	220	is the operation \texttt{++} in Scala.
183 b17eff695c7f added new stuff Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 177 diff changeset	221
386 31295bb945c6 update Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 385 diff changeset	222	The alternative parser combinator already allows us to
31295bb945c6 update Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 385 diff changeset	223	construct a parser that parses either a character \texttt{a}
31295bb945c6 update Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 385 diff changeset	224	or \texttt{b}, as
183 b17eff695c7f added new stuff Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 177 diff changeset	225
b17eff695c7f added new stuff Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 177 diff changeset	226	\begin{center}
385 7f8516ff408d updated Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 297 diff changeset	227	\begin{lstlisting}[language=Scala, numbers=none]
183 b17eff695c7f added new stuff Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 177 diff changeset	228	new AltParser(CharParser('a'), CharParser('b'))
b17eff695c7f added new stuff Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 177 diff changeset	229	\end{lstlisting}
b17eff695c7f added new stuff Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 177 diff changeset	230	\end{center}
b17eff695c7f added new stuff Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 177 diff changeset	231
584 529460073b24 updated Christian Urban <urbanc@in.tum.de> parents: 392 diff changeset	232	\noindent Later on we will use again Scala mechanism for introducing some
529460073b24 updated Christian Urban <urbanc@in.tum.de> parents: 392 diff changeset	233	more readable shorthand notation for this, like \texttt{"a" \|\| "b"}. Let us look in detail at what this parser combinator produces with some somple strings
183 b17eff695c7f added new stuff Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 177 diff changeset	234
b17eff695c7f added new stuff Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 177 diff changeset	235	\begin{center}
b17eff695c7f added new stuff Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 177 diff changeset	236	\begin{tabular}{rcl}
385 7f8516ff408d updated Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 297 diff changeset	237	input strings & & output\medskip\\
584 529460073b24 updated Christian Urban <urbanc@in.tum.de> parents: 392 diff changeset	238	\texttt{\Grid{acde}} & $\rightarrow$ & $\left\{(\texttt{\Grid{a}}, \texttt{\Grid{cde}})\right\}$\\
529460073b24 updated Christian Urban <urbanc@in.tum.de> parents: 392 diff changeset	239	\texttt{\Grid{bcde}} & $\rightarrow$ & $\left\{(\texttt{\Grid{b}}, \texttt{\Grid{cde}})\right\}$\\
529460073b24 updated Christian Urban <urbanc@in.tum.de> parents: 392 diff changeset	240	\texttt{\Grid{ccde}} & $\rightarrow$ & $\{\}$
385 7f8516ff408d updated Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 297 diff changeset	241	\end{tabular}
7f8516ff408d updated Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 297 diff changeset	242	\end{center}
7f8516ff408d updated Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 297 diff changeset	243
7f8516ff408d updated Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 297 diff changeset	244	\noindent We receive in the first two cases a successful
7f8516ff408d updated Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 297 diff changeset	245	output (that is a non-empty set). In each case, either
392 2d0a59127694 updated Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 386 diff changeset	246	\pcode{a} or \pcode{b} is in the processed part, and
584 529460073b24 updated Christian Urban <urbanc@in.tum.de> parents: 392 diff changeset	247	\pcode{cde} in the unprocessed part. Clearly this parser cannot
529460073b24 updated Christian Urban <urbanc@in.tum.de> parents: 392 diff changeset	248	parse anything in the string \pcode{ccde}, therefore the empty
386 31295bb945c6 update Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 385 diff changeset	249	set.
385 7f8516ff408d updated Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 297 diff changeset	250
584 529460073b24 updated Christian Urban <urbanc@in.tum.de> parents: 392 diff changeset	251	A bit more interesting is the \emph{sequence parser combinator}. Given
529460073b24 updated Christian Urban <urbanc@in.tum.de> parents: 392 diff changeset	252	two parsers, say again, $p$ and $q$, we want to apply first the input
529460073b24 updated Christian Urban <urbanc@in.tum.de> parents: 392 diff changeset	253	to $p$ producing a set of pairs; then apply $q$ to all the unparsed
529460073b24 updated Christian Urban <urbanc@in.tum.de> parents: 392 diff changeset	254	parts in the pairs; and then combine the results like
385 7f8516ff408d updated Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 297 diff changeset	255
7f8516ff408d updated Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 297 diff changeset	256	\begin{center}
7f8516ff408d updated Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 297 diff changeset	257	\begin{tabular}{lcl}
7f8516ff408d updated Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 297 diff changeset	258	$\{((\textit{output}_1, \textit{output}_2), u_2)$ & $\;\|\;$ &
7f8516ff408d updated Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 297 diff changeset	259	$(\textit{output}_1, u_1) \in p(\text{input})
7f8516ff408d updated Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 297 diff changeset	260	\;\wedge\;$\\
584 529460073b24 updated Christian Urban <urbanc@in.tum.de> parents: 392 diff changeset	261	&& $(\textit{output}_2, u_2) \in q(u_1)\}$
183 b17eff695c7f added new stuff Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 177 diff changeset	262	\end{tabular}
b17eff695c7f added new stuff Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 177 diff changeset	263	\end{center}
b17eff695c7f added new stuff Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 177 diff changeset	264
584 529460073b24 updated Christian Urban <urbanc@in.tum.de> parents: 392 diff changeset	265	\noindent Notice that the $p$ wil first be run on the input, producing
529460073b24 updated Christian Urban <urbanc@in.tum.de> parents: 392 diff changeset	266	pairs of the form $(\textit{output}_1, u_1)$ where the $u_1$ stands
529460073b24 updated Christian Urban <urbanc@in.tum.de> parents: 392 diff changeset	267	for the unprocessed, or left-over, parts. We want that $q$ runs on all
529460073b24 updated Christian Urban <urbanc@in.tum.de> parents: 392 diff changeset	268	these unprocessed parts $u_1$. This again will produce some
529460073b24 updated Christian Urban <urbanc@in.tum.de> parents: 392 diff changeset	269	processed part , $p$ and
529460073b24 updated Christian Urban <urbanc@in.tum.de> parents: 392 diff changeset	270	$q$, we apply both parsers to the input (remember parsers are
529460073b24 updated Christian Urban <urbanc@in.tum.de> parents: 392 diff changeset	271	functions) and combine the output (remember they are sets of pairs)
529460073b24 updated Christian Urban <urbanc@in.tum.de> parents: 392 diff changeset	272
529460073b24 updated Christian Urban <urbanc@in.tum.de> parents: 392 diff changeset	273	\begin{center}
529460073b24 updated Christian Urban <urbanc@in.tum.de> parents: 392 diff changeset	274	$p(\text{input}) \cup q(\text{input})$
529460073b24 updated Christian Urban <urbanc@in.tum.de> parents: 392 diff changeset	275	\end{center}
529460073b24 updated Christian Urban <urbanc@in.tum.de> parents: 392 diff changeset	276
529460073b24 updated Christian Urban <urbanc@in.tum.de> parents: 392 diff changeset	277	\noindent In Scala we would implement alternative parser
529460073b24 updated Christian Urban <urbanc@in.tum.de> parents: 392 diff changeset	278	combinator as follows
529460073b24 updated Christian Urban <urbanc@in.tum.de> parents: 392 diff changeset	279
529460073b24 updated Christian Urban <urbanc@in.tum.de> parents: 392 diff changeset	280	\begin{center}
529460073b24 updated Christian Urban <urbanc@in.tum.de> parents: 392 diff changeset	281	\begin{lstlisting}[language=Scala, numbers=none]
529460073b24 updated Christian Urban <urbanc@in.tum.de> parents: 392 diff changeset	282	class AltParser[I, T]
529460073b24 updated Christian Urban <urbanc@in.tum.de> parents: 392 diff changeset	283	(p: => Parser[I, T],
529460073b24 updated Christian Urban <urbanc@in.tum.de> parents: 392 diff changeset	284	q: => Parser[I, T]) extends Parser[I, T] {
529460073b24 updated Christian Urban <urbanc@in.tum.de> parents: 392 diff changeset	285	def parse(sb: I) = p.parse(sb) ++ q.parse(sb)
529460073b24 updated Christian Urban <urbanc@in.tum.de> parents: 392 diff changeset	286	}
529460073b24 updated Christian Urban <urbanc@in.tum.de> parents: 392 diff changeset	287	\end{lstlisting}
529460073b24 updated Christian Urban <urbanc@in.tum.de> parents: 392 diff changeset	288	\end{center}
529460073b24 updated Christian Urban <urbanc@in.tum.de> parents: 392 diff changeset	289
529460073b24 updated Christian Urban <urbanc@in.tum.de> parents: 392 diff changeset	290	\noindent The types of this parser combinator are again generic (we
529460073b24 updated Christian Urban <urbanc@in.tum.de> parents: 392 diff changeset	291	just have \texttt{I} for the input type, and \texttt{T} for the output
529460073b24 updated Christian Urban <urbanc@in.tum.de> parents: 392 diff changeset	292	type). The alternative parser builds a new parser out of two existing
529460073b24 updated Christian Urban <urbanc@in.tum.de> parents: 392 diff changeset	293	parsers \texttt{p} and \texttt{q}. Both need to be able to process
529460073b24 updated Christian Urban <urbanc@in.tum.de> parents: 392 diff changeset	294	input of type \texttt{I} and return the same output type
529460073b24 updated Christian Urban <urbanc@in.tum.de> parents: 392 diff changeset	295	\texttt{Set[(T, I)]}.\footnote{There is an interesting detail of
529460073b24 updated Christian Urban <urbanc@in.tum.de> parents: 392 diff changeset	296	Scala, namely the \texttt{=>} in front of the types of \texttt{p}
529460073b24 updated Christian Urban <urbanc@in.tum.de> parents: 392 diff changeset	297	and \texttt{q}. They will prevent the evaluation of the arguments
529460073b24 updated Christian Urban <urbanc@in.tum.de> parents: 392 diff changeset	298	before they are used. This is often called \emph{lazy evaluation} of
529460073b24 updated Christian Urban <urbanc@in.tum.de> parents: 392 diff changeset	299	the arguments. We will explain this later.} Therefore the output
529460073b24 updated Christian Urban <urbanc@in.tum.de> parents: 392 diff changeset	300	type of this parser is \texttt{T}. The alternative parser should run
529460073b24 updated Christian Urban <urbanc@in.tum.de> parents: 392 diff changeset	301	the input with the first parser \texttt{p} (producing a set of pairs)
529460073b24 updated Christian Urban <urbanc@in.tum.de> parents: 392 diff changeset	302	and then run the same input with \texttt{q} (producing another set of
529460073b24 updated Christian Urban <urbanc@in.tum.de> parents: 392 diff changeset	303	pairs). The result should be then just the union of both sets, which
529460073b24 updated Christian Urban <urbanc@in.tum.de> parents: 392 diff changeset	304	is the operation \texttt{++} in Scala.
529460073b24 updated Christian Urban <urbanc@in.tum.de> parents: 392 diff changeset	305
529460073b24 updated Christian Urban <urbanc@in.tum.de> parents: 392 diff changeset	306	The alternative parser combinator already allows us to
529460073b24 updated Christian Urban <urbanc@in.tum.de> parents: 392 diff changeset	307	construct a parser that parses either a character \texttt{a}
529460073b24 updated Christian Urban <urbanc@in.tum.de> parents: 392 diff changeset	308	or \texttt{b}, as
529460073b24 updated Christian Urban <urbanc@in.tum.de> parents: 392 diff changeset	309
529460073b24 updated Christian Urban <urbanc@in.tum.de> parents: 392 diff changeset	310	\begin{center}
529460073b24 updated Christian Urban <urbanc@in.tum.de> parents: 392 diff changeset	311	\begin{lstlisting}[language=Scala, numbers=none]
529460073b24 updated Christian Urban <urbanc@in.tum.de> parents: 392 diff changeset	312	new AltParser(CharParser('a'), CharParser('b'))
529460073b24 updated Christian Urban <urbanc@in.tum.de> parents: 392 diff changeset	313	\end{lstlisting}
529460073b24 updated Christian Urban <urbanc@in.tum.de> parents: 392 diff changeset	314	\end{center}
529460073b24 updated Christian Urban <urbanc@in.tum.de> parents: 392 diff changeset	315
529460073b24 updated Christian Urban <urbanc@in.tum.de> parents: 392 diff changeset	316	\noindent Later on we will use again Scala mechanism for introducing some
529460073b24 updated Christian Urban <urbanc@in.tum.de> parents: 392 diff changeset	317	more readable shorthand notation for this, like \texttt{"a" \|\| "b"}. Let us look in detail at what this parser combinator produces with some somple strings
529460073b24 updated Christian Urban <urbanc@in.tum.de> parents: 392 diff changeset	318
529460073b24 updated Christian Urban <urbanc@in.tum.de> parents: 392 diff changeset	319	\begin{center}
529460073b24 updated Christian Urban <urbanc@in.tum.de> parents: 392 diff changeset	320	\begin{tabular}{rcl}
529460073b24 updated Christian Urban <urbanc@in.tum.de> parents: 392 diff changeset	321	input strings & & output\medskip\\
529460073b24 updated Christian Urban <urbanc@in.tum.de> parents: 392 diff changeset	322	\texttt{\Grid{acde}} & $\rightarrow$ & $\left\{(\texttt{\Grid{a}}, \texttt{\Grid{cde}})\right\}$\\
529460073b24 updated Christian Urban <urbanc@in.tum.de> parents: 392 diff changeset	323	\texttt{\Grid{bcde}} & $\rightarrow$ & $\left\{(\texttt{\Grid{b}}, \texttt{\Grid{cde}})\right\}$\\
529460073b24 updated Christian Urban <urbanc@in.tum.de> parents: 392 diff changeset	324	\texttt{\Grid{ccde}} & $\rightarrow$ & $\{\}$
529460073b24 updated Christian Urban <urbanc@in.tum.de> parents: 392 diff changeset	325	\end{tabular}
529460073b24 updated Christian Urban <urbanc@in.tum.de> parents: 392 diff changeset	326	\end{center}
529460073b24 updated Christian Urban <urbanc@in.tum.de> parents: 392 diff changeset	327
529460073b24 updated Christian Urban <urbanc@in.tum.de> parents: 392 diff changeset	328	\noindent We receive in the first two cases a successful
529460073b24 updated Christian Urban <urbanc@in.tum.de> parents: 392 diff changeset	329	output (that is a non-empty set). In each case, either
529460073b24 updated Christian Urban <urbanc@in.tum.de> parents: 392 diff changeset	330	\pcode{a} or \pcode{b} is in the processed part, and
529460073b24 updated Christian Urban <urbanc@in.tum.de> parents: 392 diff changeset	331	\pcode{cde} in the unprocessed part. Clearly this parser cannot
529460073b24 updated Christian Urban <urbanc@in.tum.de> parents: 392 diff changeset	332	parse anything in the string \pcode{ccde}, therefore the empty
529460073b24 updated Christian Urban <urbanc@in.tum.de> parents: 392 diff changeset	333	set.
529460073b24 updated Christian Urban <urbanc@in.tum.de> parents: 392 diff changeset	334
529460073b24 updated Christian Urban <urbanc@in.tum.de> parents: 392 diff changeset	335	A bit more interesting is the \emph{sequence parser combinator}. Given
529460073b24 updated Christian Urban <urbanc@in.tum.de> parents: 392 diff changeset	336	two parsers, say again, $p$ and $q$, we want to apply first the input
529460073b24 updated Christian Urban <urbanc@in.tum.de> parents: 392 diff changeset	337	to $p$ producing a set of pairs; then apply $q$ to all the unparsed
529460073b24 updated Christian Urban <urbanc@in.tum.de> parents: 392 diff changeset	338	parts in the pairs; and then combine the results like
529460073b24 updated Christian Urban <urbanc@in.tum.de> parents: 392 diff changeset	339
529460073b24 updated Christian Urban <urbanc@in.tum.de> parents: 392 diff changeset	340	\begin{center}
529460073b24 updated Christian Urban <urbanc@in.tum.de> parents: 392 diff changeset	341	\begin{tabular}{lcl}
529460073b24 updated Christian Urban <urbanc@in.tum.de> parents: 392 diff changeset	342	$\{((\textit{output}_1, \textit{output}_2), u_2)$ & $\;\|\;$ &
529460073b24 updated Christian Urban <urbanc@in.tum.de> parents: 392 diff changeset	343	$(\textit{output}_1, u_1) \in p(\text{input})
529460073b24 updated Christian Urban <urbanc@in.tum.de> parents: 392 diff changeset	344	\;\wedge\;$\\
529460073b24 updated Christian Urban <urbanc@in.tum.de> parents: 392 diff changeset	345	&& $(\textit{output}_2, u_2) \in q(u_1)\}$
529460073b24 updated Christian Urban <urbanc@in.tum.de> parents: 392 diff changeset	346	\end{tabular}
529460073b24 updated Christian Urban <urbanc@in.tum.de> parents: 392 diff changeset	347	\end{center}
529460073b24 updated Christian Urban <urbanc@in.tum.de> parents: 392 diff changeset	348
529460073b24 updated Christian Urban <urbanc@in.tum.de> parents: 392 diff changeset	349	\noindent Notice that the $p$ wil first be run on the input, producing
529460073b24 updated Christian Urban <urbanc@in.tum.de> parents: 392 diff changeset	350	pairs of the form $\textit{output}_1$ and unprocessed part $u_1$. The
529460073b24 updated Christian Urban <urbanc@in.tum.de> parents: 392 diff changeset	351	overall result of the sequence parser combinator is pairs of the form
529460073b24 updated Christian Urban <urbanc@in.tum.de> parents: 392 diff changeset	352	$((\textit{output}_1, \textit{output}_2), u_2)$. This means the
529460073b24 updated Christian Urban <urbanc@in.tum.de> parents: 392 diff changeset	353	unprocessed parts of both parsers is the unprocessed parts the second
529460073b24 updated Christian Urban <urbanc@in.tum.de> parents: 392 diff changeset	354	parser $q$ produces as left-over. The processed parts of both parsers
529460073b24 updated Christian Urban <urbanc@in.tum.de> parents: 392 diff changeset	355	is just the pair of the outputs
529460073b24 updated Christian Urban <urbanc@in.tum.de> parents: 392 diff changeset	356	$(\textit{output}_1, \textit{output}_2)$. This behavious can be
529460073b24 updated Christian Urban <urbanc@in.tum.de> parents: 392 diff changeset	357	implemented in Scala as follows:
183 b17eff695c7f added new stuff Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 177 diff changeset	358
b17eff695c7f added new stuff Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 177 diff changeset	359	\begin{center}
385 7f8516ff408d updated Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 297 diff changeset	360	\begin{lstlisting}[language=Scala,numbers=none]
183 b17eff695c7f added new stuff Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 177 diff changeset	361	class SeqParser[I, T, S]
b17eff695c7f added new stuff Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 177 diff changeset	362	(p: => Parser[I, T],
b17eff695c7f added new stuff Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 177 diff changeset	363	q: => Parser[I, S]) extends Parser[I, (T, S)] {
b17eff695c7f added new stuff Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 177 diff changeset	364	def parse(sb: I) =
386 31295bb945c6 update Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 385 diff changeset	365	for ((output1, u1) <- p.parse(sb);
31295bb945c6 update Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 385 diff changeset	366	(output2, u2) <- q.parse(u1))
31295bb945c6 update Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 385 diff changeset	367	yield ((output1, output2), u2)
183 b17eff695c7f added new stuff Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 177 diff changeset	368	}
b17eff695c7f added new stuff Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 177 diff changeset	369	\end{lstlisting}
b17eff695c7f added new stuff Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 177 diff changeset	370	\end{center}
b17eff695c7f added new stuff Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 177 diff changeset	371
584 529460073b24 updated Christian Urban <urbanc@in.tum.de> parents: 392 diff changeset	372	\noindent This parser takes again as input two parsers, \texttt{p}
385 7f8516ff408d updated Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 297 diff changeset	373	and \texttt{q}. It implements \texttt{parse} as follows: let
7f8516ff408d updated Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 297 diff changeset	374	first run the parser \texttt{p} on the input producing a set
386 31295bb945c6 update Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 385 diff changeset	375	of pairs (\texttt{output1}, \texttt{u1}). The \texttt{u1}
385 7f8516ff408d updated Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 297 diff changeset	376	stands for the unprocessed parts left over by \texttt{p}. Let
7f8516ff408d updated Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 297 diff changeset	377	\texttt{q} run on these unprocessed parts producing again a
7f8516ff408d updated Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 297 diff changeset	378	set of pairs. The output of the sequence parser combinator is
7f8516ff408d updated Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 297 diff changeset	379	then a set containing pairs where the first components are
7f8516ff408d updated Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 297 diff changeset	380	again pairs, namely what the first parser could parse together
7f8516ff408d updated Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 297 diff changeset	381	with what the second parser could parse; the second component
7f8516ff408d updated Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 297 diff changeset	382	is the unprocessed part left over after running the second
7f8516ff408d updated Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 297 diff changeset	383	parser \texttt{q}. Therefore the input type of the sequence
7f8516ff408d updated Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 297 diff changeset	384	parser combinator is as usual \texttt{I}, but the output type
7f8516ff408d updated Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 297 diff changeset	385	is
183 b17eff695c7f added new stuff Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 177 diff changeset	386
b17eff695c7f added new stuff Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 177 diff changeset	387	\begin{center}
b17eff695c7f added new stuff Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 177 diff changeset	388	\texttt{Set[((T, S), I)]}
b17eff695c7f added new stuff Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 177 diff changeset	389	\end{center}
b17eff695c7f added new stuff Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 177 diff changeset	390
584 529460073b24 updated Christian Urban <urbanc@in.tum.de> parents: 392 diff changeset	391	\noindent
529460073b24 updated Christian Urban <urbanc@in.tum.de> parents: 392 diff changeset	392	If any of the runs of \textit{p} and \textit{q} fail, that is produce
529460073b24 updated Christian Urban <urbanc@in.tum.de> parents: 392 diff changeset	393	the empty set, then \texttt{parse} will also produce the empty set.
529460073b24 updated Christian Urban <urbanc@in.tum.de> parents: 392 diff changeset	394	Notice that we have now two output types for the sequence parser
529460073b24 updated Christian Urban <urbanc@in.tum.de> parents: 392 diff changeset	395	combinator, because in general \textit{p} and \textit{q} might produce
529460073b24 updated Christian Urban <urbanc@in.tum.de> parents: 392 diff changeset	396	differetn things (for example first we recognise a number and then a
529460073b24 updated Christian Urban <urbanc@in.tum.de> parents: 392 diff changeset	397	string corresponding to an operator).
529460073b24 updated Christian Urban <urbanc@in.tum.de> parents: 392 diff changeset	398
529460073b24 updated Christian Urban <urbanc@in.tum.de> parents: 392 diff changeset	399
529460073b24 updated Christian Urban <urbanc@in.tum.de> parents: 392 diff changeset	400	We have not yet looked at this in detail, but Scala allows us to
529460073b24 updated Christian Urban <urbanc@in.tum.de> parents: 392 diff changeset	401	provide some shorthand notation for the sequence parser combinator. We
529460073b24 updated Christian Urban <urbanc@in.tum.de> parents: 392 diff changeset	402	can write for example \pcode{"a" ~ "b"}, which is the parser
529460073b24 updated Christian Urban <urbanc@in.tum.de> parents: 392 diff changeset	403	combinator that first recognises the character \texttt{a} from a
529460073b24 updated Christian Urban <urbanc@in.tum.de> parents: 392 diff changeset	404	string and then \texttt{b}. Let us look again at three examples of
529460073b24 updated Christian Urban <urbanc@in.tum.de> parents: 392 diff changeset	405	how this parser combinator processes strings:
385 7f8516ff408d updated Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 297 diff changeset	406
7f8516ff408d updated Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 297 diff changeset	407	\begin{center}
7f8516ff408d updated Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 297 diff changeset	408	\begin{tabular}{rcl}
7f8516ff408d updated Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 297 diff changeset	409	input strings & & output\medskip\\
584 529460073b24 updated Christian Urban <urbanc@in.tum.de> parents: 392 diff changeset	410	\texttt{\Grid{abcde}} & $\rightarrow$ & $\left\{((\texttt{\Grid{a}}, \texttt{\Grid{b}}), \texttt{\Grid{cde}})\right\}$\\
529460073b24 updated Christian Urban <urbanc@in.tum.de> parents: 392 diff changeset	411	\texttt{\Grid{bacde}} & $\rightarrow$ & $\{\}$\\
529460073b24 updated Christian Urban <urbanc@in.tum.de> parents: 392 diff changeset	412	\texttt{\Grid{cccde}} & $\rightarrow$ & $\{\}$
385 7f8516ff408d updated Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 297 diff changeset	413	\end{tabular}
7f8516ff408d updated Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 297 diff changeset	414	\end{center}
7f8516ff408d updated Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 297 diff changeset	415
584 529460073b24 updated Christian Urban <urbanc@in.tum.de> parents: 392 diff changeset	416	\noindent In the first line we have a sucessful parse, because the
529460073b24 updated Christian Urban <urbanc@in.tum.de> parents: 392 diff changeset	417	string started with \texttt{ab}, which is the prefix we are looking
529460073b24 updated Christian Urban <urbanc@in.tum.de> parents: 392 diff changeset	418	for. But since the parsing combinator is constructed as sequence of
529460073b24 updated Christian Urban <urbanc@in.tum.de> parents: 392 diff changeset	419	the two simple (atomic) parsers for \texttt{a} and \texttt{b}, the
529460073b24 updated Christian Urban <urbanc@in.tum.de> parents: 392 diff changeset	420	result is a nested pair of the form \texttt{((a, b), cde)}. It is
529460073b24 updated Christian Urban <urbanc@in.tum.de> parents: 392 diff changeset	421	\emph{not} a simple pair \texttt{(ab, cde)} as one might errorneously
529460073b24 updated Christian Urban <urbanc@in.tum.de> parents: 392 diff changeset	422	expects. The parser returns the ampty set in the other examples,
529460073b24 updated Christian Urban <urbanc@in.tum.de> parents: 392 diff changeset	423	because they do not fit with what the parser is supposed to parse.
529460073b24 updated Christian Urban <urbanc@in.tum.de> parents: 392 diff changeset	424
529460073b24 updated Christian Urban <urbanc@in.tum.de> parents: 392 diff changeset	425
529460073b24 updated Christian Urban <urbanc@in.tum.de> parents: 392 diff changeset	426	A slightly more complicated parser is \pcode{("a" \|\| "b") ~ "c"}.
529460073b24 updated Christian Urban <urbanc@in.tum.de> parents: 392 diff changeset	427	which parses as first character either an \texttt{a} or \texttt{b}
529460073b24 updated Christian Urban <urbanc@in.tum.de> parents: 392 diff changeset	428	followed by a \texttt{c}. This parser produces the following outputs.
385 7f8516ff408d updated Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 297 diff changeset	429
7f8516ff408d updated Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 297 diff changeset	430	\begin{center}
7f8516ff408d updated Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 297 diff changeset	431	\begin{tabular}{rcl}
7f8516ff408d updated Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 297 diff changeset	432	input strings & & output\medskip\\
7f8516ff408d updated Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 297 diff changeset	433	\texttt{\Grid{abc}} & $\rightarrow$ & $\left\{((\texttt{\Grid{a}}, \texttt{\Grid{b}}), \texttt{\Grid{c}})\right\}$\\
7f8516ff408d updated Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 297 diff changeset	434	\texttt{\Grid{bbc}} & $\rightarrow$ & $\left\{((\texttt{\Grid{b}}, \texttt{\Grid{b}}), \texttt{\Grid{c}})\right\}$\\
7f8516ff408d updated Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 297 diff changeset	435	\texttt{\Grid{aac}} & $\rightarrow$ & $\{\}$
7f8516ff408d updated Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 297 diff changeset	436	\end{tabular}
7f8516ff408d updated Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 297 diff changeset	437	\end{center}
7f8516ff408d updated Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 297 diff changeset	438
7f8516ff408d updated Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 297 diff changeset	439	\noindent Two more examples: first consider the parser \pcode{('a' ~
7f8516ff408d updated Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 297 diff changeset	440	'a') ~ 'a'} and the input \pcode{aaaa}:
183 b17eff695c7f added new stuff Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 177 diff changeset	441
b17eff695c7f added new stuff Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 177 diff changeset	442	\begin{center}
b17eff695c7f added new stuff Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 177 diff changeset	443	\begin{tabular}{rcl}
b17eff695c7f added new stuff Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 177 diff changeset	444	input string & & output\medskip\\
385 7f8516ff408d updated Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 297 diff changeset	445	\texttt{\Grid{aaaa}} & $\rightarrow$ &
7f8516ff408d updated Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 297 diff changeset	446	$\left\{(((\texttt{\Grid{a}}, \texttt{\Grid{a}}), \texttt{\Grid{a}}), \texttt{\Grid{a}})\right\}$\\
183 b17eff695c7f added new stuff Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 177 diff changeset	447	\end{tabular}
b17eff695c7f added new stuff Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 177 diff changeset	448	\end{center}
b17eff695c7f added new stuff Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 177 diff changeset	449
386 31295bb945c6 update Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 385 diff changeset	450	\noindent Notice how the results nest deeper and deeper as
31295bb945c6 update Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 385 diff changeset	451	pairs (the last \pcode{a} is in the unprocessed part). To
31295bb945c6 update Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 385 diff changeset	452	consume everything of this string we can use the parser
31295bb945c6 update Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 385 diff changeset	453	\pcode{(('a' ~'a') ~ 'a') ~ 'a'}. Then the output is as
31295bb945c6 update Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 385 diff changeset	454	follows:
183 b17eff695c7f added new stuff Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 177 diff changeset	455
b17eff695c7f added new stuff Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 177 diff changeset	456	\begin{center}
b17eff695c7f added new stuff Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 177 diff changeset	457	\begin{tabular}{rcl}
b17eff695c7f added new stuff Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 177 diff changeset	458	input string & & output\medskip\\
385 7f8516ff408d updated Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 297 diff changeset	459	\texttt{\Grid{aaaa}} & $\rightarrow$ &
7f8516ff408d updated Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 297 diff changeset	460	$\left\{((((\texttt{\Grid{a}}, \texttt{\Grid{a}}), \texttt{\Grid{a}}), \texttt{\Grid{a}}), \texttt{""})\right\}$\\
183 b17eff695c7f added new stuff Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 177 diff changeset	461	\end{tabular}
b17eff695c7f added new stuff Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 177 diff changeset	462	\end{center}
b17eff695c7f added new stuff Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 177 diff changeset	463
385 7f8516ff408d updated Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 297 diff changeset	464	\noindent This is an instance where the parser consumed
7f8516ff408d updated Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 297 diff changeset	465	completely the input, meaning the unprocessed part is just the
7f8516ff408d updated Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 297 diff changeset	466	empty string.
7f8516ff408d updated Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 297 diff changeset	467
183 b17eff695c7f added new stuff Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 177 diff changeset	468
386 31295bb945c6 update Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 385 diff changeset	469	Note carefully that constructing a parser such \pcode{'a' \|\|
31295bb945c6 update Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 385 diff changeset	470	('a' ~ 'b')} will result in a typing error. The first
385 7f8516ff408d updated Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 297 diff changeset	471	parser has as output type a single character (recall the type
7f8516ff408d updated Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 297 diff changeset	472	of \texttt{CharParser}), but the second parser produces a pair
7f8516ff408d updated Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 297 diff changeset	473	of characters as output. The alternative parser is however
7f8516ff408d updated Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 297 diff changeset	474	required to have both component parsers to have the same type.
7f8516ff408d updated Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 297 diff changeset	475	We will see later how we can build this parser without the
7f8516ff408d updated Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 297 diff changeset	476	typing error.
7f8516ff408d updated Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 297 diff changeset	477
7f8516ff408d updated Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 297 diff changeset	478	The next parser combinator does not actually combine smaller
386 31295bb945c6 update Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 385 diff changeset	479	parsers, but applies a function to the result of a parser.
385 7f8516ff408d updated Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 297 diff changeset	480	It is implemented in Scala as follows
183 b17eff695c7f added new stuff Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 177 diff changeset	481
b17eff695c7f added new stuff Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 177 diff changeset	482	\begin{center}
b17eff695c7f added new stuff Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 177 diff changeset	483	\begin{lstlisting}[language=Scala,basicstyle=\small\ttfamily, numbers=none]
b17eff695c7f added new stuff Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 177 diff changeset	484	class FunParser[I, T, S]
b17eff695c7f added new stuff Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 177 diff changeset	485	(p: => Parser[I, T],
b17eff695c7f added new stuff Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 177 diff changeset	486	f: T => S) extends Parser[I, S] {
b17eff695c7f added new stuff Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 177 diff changeset	487	def parse(sb: I) =
b17eff695c7f added new stuff Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 177 diff changeset	488	for ((head, tail) <- p.parse(sb)) yield (f(head), tail)
b17eff695c7f added new stuff Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 177 diff changeset	489	}
b17eff695c7f added new stuff Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 177 diff changeset	490	\end{lstlisting}
b17eff695c7f added new stuff Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 177 diff changeset	491	\end{center}
b17eff695c7f added new stuff Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 177 diff changeset	492
b17eff695c7f added new stuff Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 177 diff changeset	493
386 31295bb945c6 update Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 385 diff changeset	494	\noindent This parser combinator takes a parser \texttt{p}
31295bb945c6 update Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 385 diff changeset	495	with output type \texttt{T} as one argument as well as a
31295bb945c6 update Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 385 diff changeset	496	function \texttt{f} with type \texttt{T => S}. The parser
31295bb945c6 update Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 385 diff changeset	497	\texttt{p} produces sets of type \texttt{(T, I)}. The
31295bb945c6 update Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 385 diff changeset	498	\texttt{FunParser} combinator then applies the function
31295bb945c6 update Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 385 diff changeset	499	\texttt{f} to all the parser outputs. Since this function is of
31295bb945c6 update Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 385 diff changeset	500	type \texttt{T => S}, we obtain a parser with output type
31295bb945c6 update Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 385 diff changeset	501	\texttt{S}. Again Scala lets us introduce some shorthand
31295bb945c6 update Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 385 diff changeset	502	notation for this parser combinator. Therefore we will write
31295bb945c6 update Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 385 diff changeset	503	\texttt{p ==> f} for it.
31295bb945c6 update Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 385 diff changeset	504
31295bb945c6 update Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 385 diff changeset	505	\subsubsection*{How to build parsers using parser combinators?}
31295bb945c6 update Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 385 diff changeset	506
31295bb945c6 update Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 385 diff changeset	507	\subsubsection*{Implementing an Interpreter}
183 b17eff695c7f added new stuff Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 177 diff changeset	508
b17eff695c7f added new stuff Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 177 diff changeset	509	%\bigskip
b17eff695c7f added new stuff Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 177 diff changeset	510	%takes advantage of the full generality---have a look
b17eff695c7f added new stuff Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 177 diff changeset	511	%what it produces if we call it with the string \texttt{abc}
b17eff695c7f added new stuff Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 177 diff changeset	512	%
b17eff695c7f added new stuff Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 177 diff changeset	513	%\begin{center}
b17eff695c7f added new stuff Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 177 diff changeset	514	%\begin{tabular}{rcl}
b17eff695c7f added new stuff Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 177 diff changeset	515	%input string & & output\medskip\\
b17eff695c7f added new stuff Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 177 diff changeset	516	%\texttt{\Grid{abc}} & $\rightarrow$ & $\left\{((\texttt{\Grid{a}}, \texttt{\Grid{b}}), \texttt{\Grid{c}})\right\}$\\
b17eff695c7f added new stuff Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 177 diff changeset	517	%\texttt{\Grid{bbc}} & $\rightarrow$ & $\left\{((\texttt{\Grid{b}}, \texttt{\Grid{b}}), \texttt{\Grid{c}})\right\}$\\
b17eff695c7f added new stuff Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 177 diff changeset	518	%\texttt{\Grid{aac}} & $\rightarrow$ & $\varnothing$
b17eff695c7f added new stuff Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 177 diff changeset	519	%\end{tabular}
b17eff695c7f added new stuff Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 177 diff changeset	520	%\end{center}
b17eff695c7f added new stuff Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 177 diff changeset	521
b17eff695c7f added new stuff Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 177 diff changeset	522
173 7cfb7a6f7c99 added slides Christian Urban <christian dot urban at kcl dot ac dot uk> parents: diff changeset	523
7cfb7a6f7c99 added slides Christian Urban <christian dot urban at kcl dot ac dot uk> parents: diff changeset	524	\end{document}
7cfb7a6f7c99 added slides Christian Urban <christian dot urban at kcl dot ac dot uk> parents: diff changeset	525
7cfb7a6f7c99 added slides Christian Urban <christian dot urban at kcl dot ac dot uk> parents: diff changeset	526	%%% Local Variables:
7cfb7a6f7c99 added slides Christian Urban <christian dot urban at kcl dot ac dot uk> parents: diff changeset	527	%%% mode: latex
7cfb7a6f7c99 added slides Christian Urban <christian dot urban at kcl dot ac dot uk> parents: diff changeset	528	%%% TeX-master: t
7cfb7a6f7c99 added slides Christian Urban <christian dot urban at kcl dot ac dot uk> parents: diff changeset	529	%%% End:

author	Christian Urban <urbanc@in.tum.de>
	Wed, 24 Oct 2018 12:49:23 +0100
changeset 584	529460073b24
parent 392	2d0a59127694
child 585	6ee22f196884
permissions	-rw-r--r--