handouts/ho06.tex
changeset 584 529460073b24
parent 392 2d0a59127694
child 585 6ee22f196884
equal deleted inserted replaced
583:200d2a3eb1b1 584:529460073b24
       
     1 
     1 \documentclass{article}
     2 \documentclass{article}
     2 \usepackage{../style}
     3 \usepackage{../style}
     3 \usepackage{../langs}
     4 \usepackage{../langs}
     4 
     5 
     5 
     6 
     6 \begin{document}
     7 \begin{document}
     7 
     8 
     8 \section*{Handout 6 (Parser Combinators)}
     9 \section*{Handout 6 (Parser Combinators)}
     9 
    10 
    10 In what follows we explain \emph{parser combinators}. Their
    11 This handout explains how \emph{parser combinators} work and how they
    11 distinguishing feature is that they are very easy to
    12 can be implemented in Scala. Their distinguishing feature is that they
    12 implement. However, they only work when the grammar to be
    13 are very easy to implement (admittedly it is only easy in a functional
    13 parsed is \emph{not} left-recursive and they are efficient
    14 programming language). However, parser combinators require that the
    14 only when the grammar is unambiguous. It is the responsibility
    15 grammar to be parsed is \emph{not} left-recursive and they are
    15 of the grammar designer to ensure these two properties.
    16 efficient only when the grammar is unambiguous. It is the
    16 
    17 responsibility of the grammar designer to ensure these two properties.
    17 Parser combinators can deal with any kind of input as long as
    18 
    18 this input is a kind of sequence, for example a string or a
    19 Another good point of parser combinators is that they can deal with any kind
    19 list of tokens. The general idea behind parser combinators is
    20 of input as long as this input of ``sequence-kind'', for example a
    20 to transform the input into sets of pairs, like so
    21 string or a list of tokens. The general idea behind parser combinators
       
    22 is to transform the input into sets of pairs, like so
    21 
    23 
    22 \begin{center}
    24 \begin{center}
    23 $\underbrace{\text{list of tokens}}_{\text{input}}$ 
    25 $\underbrace{\text{list of tokens}}_{\text{input}}$ 
    24 $\Rightarrow$
    26 $\Rightarrow$
    25 $\underbrace{\text{set of (parsed input, unparsed input)}}_{\text{output}}$
    27 $\underbrace{\text{set of (parsed input, unparsed input)}}_{\text{output}}$
    26 \end{center} 
    28 \end{center} 
    27 
    29 
    28 \noindent In Scala parser combinators are functions of type
    30 \noindent As said, the input can be anything as long as it is a
    29 
    31 ``sequence''. The only property of the input we need is to be able to
    30 \begin{center}
    32 test when it is empty. Obviously we can do this for strings and lists.
    31 \texttt{I $\Rightarrow$ Set[(T, I)]}
    33 For more lucidity we shall below often use strings as input in order
    32 \end{center}
    34 to illustrate matters. However, this does not make our previous work
    33 
    35 on lexers obsolete (remember they transform a string into a list of
    34 \noindent that is they take as input something of type
    36 tokens). Lexers will still be needed to build a somewhat realistic
    35 \texttt{I} and return a set of pairs. The first component of
    37 compiler.
    36 these pairs corresponds to what the parser combinator was able
    38 
    37 to process from the input and the second is the unprocessed
    39 
    38 part of the input. As we shall see shortly, a parser
    40 In my Scala code I use the following polymorphic types for parser combinators:
    39 combinator might return more than one such pair, the idea
    41 
    40 being that there are potentially several ways of how to
    42 \begin{center}
    41 interpret the input. To simplify matters we will first look at
    43 input:\;\; \texttt{I}  \qquad output:\;\; \texttt{T}
    42 the case where the input to the parser combinators is just
    44 \end{center}
    43 strings. As a concrete example, consider the string
    45 
       
    46 \noindent that is they take as input something of type \texttt{I} and
       
    47 return a set of pairs of \texttt{Set[(T, I)]}. Since the input needs
       
    48 to be of ``sequence-kind'' I actually have to often write \texttt{I
       
    49   <\% Seq[\_]} for the input type in order to indicate it is a
       
    50 subtype of Scala sequences. The first component of the generated pairs
       
    51 corresponds to what the parser combinator was able to process from the
       
    52 input and the second is the unprocessed part of the input (therefore
       
    53 the type of this unprocessed part is the same as the input). As we
       
    54 shall see shortly, a parser combinator might return more than one such
       
    55 pair; the idea being that there are potentially several ways of how to
       
    56 parse the input.  As a concrete example, consider the string
    44 
    57 
    45 \begin{center}
    58 \begin{center}
    46 \tt\Grid{iffoo\VS testbar}
    59 \tt\Grid{iffoo\VS testbar}
    47 \end{center}
    60 \end{center}
    48 
    61 
    63 recognise anything from the input, then parser combinators just
    76 recognise anything from the input, then parser combinators just
    64 return the empty set $\{\}$. This will indicate something
    77 return the empty set $\{\}$. This will indicate something
    65 ``went wrong''\ldots or more precisely, nothing could be
    78 ``went wrong''\ldots or more precisely, nothing could be
    66 parsed.
    79 parsed.
    67 
    80 
    68 The idea of parser combinators is that we can easily build
    81 Also important to note is that the type \texttt{T} for the processed
    69 parser combinators out of smaller components following very
    82 part is different from the input type. The reason is that in general
    70 closely the structure of a grammar. In order to implement this
    83 we are interested in transform our input into something
    71 in an object-oriented programming language, like Scala, we
    84 ``different''\ldots for example into a tree, or if we implement the
    72 need to specify an abstract class for parser combinators. This
    85 grammar for arithmetic expressions we might be interested in the
    73 abstract class requires the implementation of the function
    86 actual integer number the arithmetic expression, say \texttt{1 + 2 *
    74 \texttt{parse} taking an argument of type \texttt{I} and
    87   3}, stands for. In this way we can use parser combinators to
    75 returns a set of type \mbox{\texttt{Set[(T, I)]}}.
    88 implement relativaley easily a calculator.
       
    89 
       
    90 The main idea of parser combinators is that we can easily build parser
       
    91 combinators out of smaller components following very closely the
       
    92 structure of a grammar. In order to implement this in an
       
    93 object-oriented programming language, like Scala, we need to specify
       
    94 an abstract class for parser combinators. This abstract class states
       
    95 that the function \texttt{parse} takes an argument of type \texttt{I}
       
    96 and returns a set of type \mbox{\texttt{Set[(T, I)]}}.
    76 
    97 
    77 \begin{center}
    98 \begin{center}
    78 \begin{lstlisting}[language=Scala]
    99 \begin{lstlisting}[language=Scala]
    79 abstract class Parser[I, T] {
   100 abstract class Parser[I, T] {
    80   def parse(ts: I): Set[(T, I)]
   101   def parse(ts: I): Set[(T, I)]
    84       yield head
   105       yield head
    85 }
   106 }
    86 \end{lstlisting}
   107 \end{lstlisting}
    87 \end{center}
   108 \end{center}
    88 
   109 
    89 \noindent From the function \texttt{parse} we can then
   110 \noindent It is the obligation in each instance (parser combinator) to
    90 ``centrally'' derive the function \texttt{parse\_all}, which
   111 supply an implementation for \texttt{parse}.  From this function we
    91 just filters out all pairs whose second component is not empty
   112 can then ``centrally'' derive the function \texttt{parse\_all}, which
    92 (that is has still some unprocessed part). The reason is that
   113 just filters out all pairs whose second component is not empty (that
    93 at the end of the parsing we are only interested in the results
   114 is has still some unprocessed part). The reason is that at the end of
    94 where all the input has been consumed and no unprocessed part
   115 the parsing we are only interested in the results where all the input
    95 is left over.
   116 has been consumed and no unprocessed part is left over.
    96 
   117 
    97 One of the simplest parser combinators recognises just a
   118 One of the simplest parser combinators recognises just a
    98 character, say $c$, from the beginning of strings. Its
   119 single character, say $c$, from the beginning of strings. Its
    99 behaviour can be described as follows:
   120 behaviour can be described as follows:
   100 
   121 
   101 \begin{itemize}
   122 \begin{itemize}
   102 \item if the head of the input string starts with a $c$, then returns 
   123 \item If the head of the input string starts with a $c$, then return 
   103 	the set $\{(c, \textit{tail of}\; s)\}$, where \textit{tail of} 
   124   the set
   104 	$s$ is the unprocessed part of the input string
   125   \[\{(c, \textit{tail of}\; s)\}\]
   105 \item otherwise return the empty set $\{\}$	
   126   where \textit{tail of} 
       
   127   $s$ is the unprocessed part of the input string.
       
   128 \item Otherwise return the empty set $\{\}$.	
   106 \end{itemize}
   129 \end{itemize}
   107 
   130 
   108 \noindent
   131 \noindent
   109 The input type of this simple parser combinator for characters is
   132 The input type of this simple parser combinator for characters is
   110 \texttt{String} and the output type \mbox{\texttt{Set[(Char, String)]}}. 
   133 \texttt{String} and the output type \mbox{\texttt{Set[(Char, String)]}}. 
   117     if (sb.head == c) Set((c, sb.tail)) else Set()
   140     if (sb.head == c) Set((c, sb.tail)) else Set()
   118 }
   141 }
   119 \end{lstlisting}
   142 \end{lstlisting}
   120 \end{center}
   143 \end{center}
   121 
   144 
   122 \noindent The \texttt{parse} function tests whether the first
   145 \noindent You can see the \texttt{parse} function tests whether the
   123 character of the input string \texttt{sb} is equal to
   146 first character of the input string \texttt{sb} is equal to
   124 \texttt{c}. If yes, then it splits the string into the
   147 \texttt{c}. If yes, then it splits the string into the recognised part
   125 recognised part \texttt{c} and the unprocessed part
   148 \texttt{c} and the unprocessed part \texttt{sb.tail}. In case
   126 \texttt{sb.tail}. In case \texttt{sb} does not start with
   149 \texttt{sb} does not start with \texttt{c} then the parser returns the
   127 \texttt{c} then the parser returns the empty set (in Scala
   150 empty set (in Scala \texttt{Set()}). Since this parser recognises
   128 \texttt{Set()}).
   151 characters and just returns characters as the processed part, the
   129 
   152 output type of the parser is \texttt{Char}.
   130 
   153 
   131 More interesting are the parser combinators that build larger
   154 If we want to parse a list of tokens and interested in recognising a
   132 parsers out of smaller component parsers. For example the
   155 number token, we could write something like this
   133 alternative parser combinator is as follows: given two
   156 
   134 parsers, say, $p$ and $q$, we apply both parsers to the
   157 \begin{center}
   135 input (remember parsers are functions) and combine the output
   158 \begin{lstlisting}[language=Scala,basicstyle=\small\ttfamily,numbers=none]
   136 (remember they are sets)
   159 case object NumParser extends Parser[List[Token], Int] {
       
   160   def parse(ts: List[Token]) = ts match {
       
   161     case Num_token(s)::ts => Set((s.toInt, ts)) 
       
   162     case _ => Set ()
       
   163   }
       
   164 }
       
   165 \end{lstlisting}
       
   166 \end{center}
       
   167 
       
   168 \noindent
       
   169 In this parser the input is of type \texttt{List[Token]}. The function
       
   170 parse looks at the input \texttt{ts} and checks whether the first
       
   171 token is a \texttt{Num\_token}. Let us assume our lexer generated
       
   172 these tokens for numbers. But this parser does not just return this
       
   173 token (and the rest of the list), like the \texttt{CharParser} above,
       
   174 rather extracts the string \texttt{s} from the token and converts it
       
   175 into an integer. The hope is that the lexer did its work well and this
       
   176 conversion always succeeds. The consequence of this is that the output
       
   177 type for this parser is \texttt{Int}. Such a conversion would be
       
   178 needed if we want to implement a simple calculator program.
       
   179 
       
   180 
       
   181 These simple parsers that just look at the input and do a simple
       
   182 transformation are often called \emph{atomic} parser combinators.
       
   183 More interesting are the parser combinators that build larger parsers
       
   184 out of smaller component parsers. For example the \emph{alternative
       
   185   parser combinator} is as follows: given two parsers, say, $p$ and
       
   186 $q$, we apply both parsers to the input (remember parsers are
       
   187 functions) and combine the output (remember they are sets of pairs)
   137  
   188  
   138 \begin{center}
   189 \begin{center}
   139 $p(\text{input}) \cup q(\text{input})$
   190 $p(\text{input}) \cup q(\text{input})$
   140 \end{center}
   191 \end{center}
   141 
   192 
   150   def parse(sb: I) = p.parse(sb) ++ q.parse(sb)
   201   def parse(sb: I) = p.parse(sb) ++ q.parse(sb)
   151 }
   202 }
   152 \end{lstlisting}
   203 \end{lstlisting}
   153 \end{center}
   204 \end{center}
   154 
   205 
   155 \noindent The types of this parser combinator are polymorphic
   206 \noindent The types of this parser combinator are again generic (we
   156 (we just have \texttt{I} for the input type, and \texttt{T}
   207 just have \texttt{I} for the input type, and \texttt{T} for the output
   157 for the output type). The alternative parser builds a new
   208 type). The alternative parser builds a new parser out of two existing
   158 parser out of two existing parsers \texttt{p} and \texttt{q}.
   209 parsers \texttt{p} and \texttt{q}.  Both need to be able to process
   159 Both need to be able to process input of type \texttt{I} and
   210 input of type \texttt{I} and return the same output type
   160 return the same output type \texttt{Set[(T,
   211 \texttt{Set[(T, I)]}.\footnote{There is an interesting detail of
   161 I)]}.\footnote{There is an interesting detail of Scala, namely
   212   Scala, namely the \texttt{=>} in front of the types of \texttt{p}
   162 the \texttt{=>} in front of the types of \texttt{p} and
   213   and \texttt{q}. They will prevent the evaluation of the arguments
   163 \texttt{q}. They will prevent the evaluation of the arguments
   214   before they are used. This is often called \emph{lazy evaluation} of
   164 before they are used. This is often called \emph{lazy
   215   the arguments. We will explain this later.}  Therefore the output
   165 evaluation} of the arguments. We will explain this later.} The alternative parser should
   216 type of this parser is \texttt{T}. The alternative parser should run
   166 run the input with the first parser \texttt{p} (producing a
   217 the input with the first parser \texttt{p} (producing a set of pairs)
   167 set of outputs) and then run the same input with \texttt{q}.
   218 and then run the same input with \texttt{q} (producing another set of
   168 The result should be then just the union of both sets, which
   219 pairs).  The result should be then just the union of both sets, which
   169 is the operation \texttt{++} in Scala.
   220 is the operation \texttt{++} in Scala.
   170 
   221 
   171 The alternative parser combinator already allows us to
   222 The alternative parser combinator already allows us to
   172 construct a parser that parses either a character \texttt{a}
   223 construct a parser that parses either a character \texttt{a}
   173 or \texttt{b}, as
   224 or \texttt{b}, as
   176 \begin{lstlisting}[language=Scala, numbers=none]
   227 \begin{lstlisting}[language=Scala, numbers=none]
   177 new AltParser(CharParser('a'), CharParser('b'))
   228 new AltParser(CharParser('a'), CharParser('b'))
   178 \end{lstlisting}
   229 \end{lstlisting}
   179 \end{center}
   230 \end{center}
   180 
   231 
   181 \noindent Scala allows us to introduce some more readable
   232 \noindent Later on we will use again Scala mechanism for introducing some
   182 shorthand notation for this, like \texttt{'a' || 'b'}. We can
   233 more readable shorthand notation for this, like \texttt{"a" || "b"}. Let us look in detail at what this parser combinator produces with some somple strings 
   183 call this parser combinator with the strings
       
   184 
   234 
   185 \begin{center}
   235 \begin{center}
   186 \begin{tabular}{rcl}
   236 \begin{tabular}{rcl}
   187 input strings & & output\medskip\\
   237 input strings & & output\medskip\\
   188 \texttt{\Grid{ac}} & $\rightarrow$ & $\left\{(\texttt{\Grid{a}}, \texttt{\Grid{c}})\right\}$\\
   238 \texttt{\Grid{acde}} & $\rightarrow$ & $\left\{(\texttt{\Grid{a}}, \texttt{\Grid{cde}})\right\}$\\
   189 \texttt{\Grid{bc}} & $\rightarrow$ & $\left\{(\texttt{\Grid{b}}, \texttt{\Grid{c}})\right\}$\\
   239 \texttt{\Grid{bcde}} & $\rightarrow$ & $\left\{(\texttt{\Grid{b}}, \texttt{\Grid{cde}})\right\}$\\
   190 \texttt{\Grid{cc}} & $\rightarrow$ & $\{\}$
   240 \texttt{\Grid{ccde}} & $\rightarrow$ & $\{\}$
   191 \end{tabular}
   241 \end{tabular}
   192 \end{center}
   242 \end{center}
   193 
   243 
   194 \noindent We receive in the first two cases a successful
   244 \noindent We receive in the first two cases a successful
   195 output (that is a non-empty set). In each case, either
   245 output (that is a non-empty set). In each case, either
   196 \pcode{a} or \pcode{b} is in the processed part, and
   246 \pcode{a} or \pcode{b} is in the processed part, and
   197 \pcode{c} in the unprocessed part. Clearly this parser cannot
   247 \pcode{cde} in the unprocessed part. Clearly this parser cannot
   198 parse anything in the string \pcode{cc}, therefore the empty
   248 parse anything in the string \pcode{ccde}, therefore the empty
   199 set.
   249 set.
   200 
   250 
   201 A bit more interesting is the \emph{sequence parser
   251 A bit more interesting is the \emph{sequence parser combinator}. Given
   202 combinator}. Given two parsers, say, $p$ and $q$, apply first
   252 two parsers, say again, $p$ and $q$, we want to apply first the input
   203 the input to $p$ producing a set of pairs; then apply $q$ to
   253 to $p$ producing a set of pairs; then apply $q$ to all the unparsed
   204 all the unparsed parts; then combine the results like
   254 parts in the pairs; and then combine the results like
   205 
   255 
   206 \begin{center}
   256 \begin{center}
   207 \begin{tabular}{lcl}
   257 \begin{tabular}{lcl}
   208 $\{((\textit{output}_1, \textit{output}_2), u_2)$ & $\;|\;$ & 
   258 $\{((\textit{output}_1, \textit{output}_2), u_2)$ & $\;|\;$ & 
   209 $(\textit{output}_1, u_1) \in p(\text{input}) 
   259 $(\textit{output}_1, u_1) \in p(\text{input}) 
   210 \;\wedge\;$\\
   260 \;\wedge\;$\\
   211 && $\;(\textit{output}_2, u_2) \in q(u_1)\}$
   261 && $(\textit{output}_2, u_2) \in q(u_1)\}$
   212 \end{tabular}
   262 \end{tabular}
   213 \end{center}
   263 \end{center}
   214 
   264 
   215 \noindent
   265 \noindent Notice that the $p$ wil first be run on the input, producing
   216 This can be implemented in Scala as follows:
   266 pairs of the form $(\textit{output}_1, u_1)$ where the $u_1$ stands
       
   267 for the unprocessed, or left-over, parts. We want that $q$ runs on all
       
   268 these unprocessed parts $u_1$. This again will produce some
       
   269 processed part , $p$ and
       
   270 $q$, we apply both parsers to the input (remember parsers are
       
   271 functions) and combine the output (remember they are sets of pairs)
       
   272  
       
   273 \begin{center}
       
   274 $p(\text{input}) \cup q(\text{input})$
       
   275 \end{center}
       
   276 
       
   277 \noindent In Scala we would implement alternative parser
       
   278 combinator as follows
       
   279 
       
   280 \begin{center}
       
   281 \begin{lstlisting}[language=Scala, numbers=none]
       
   282 class AltParser[I, T]
       
   283        (p: => Parser[I, T], 
       
   284         q: => Parser[I, T]) extends Parser[I, T] {
       
   285   def parse(sb: I) = p.parse(sb) ++ q.parse(sb)
       
   286 }
       
   287 \end{lstlisting}
       
   288 \end{center}
       
   289 
       
   290 \noindent The types of this parser combinator are again generic (we
       
   291 just have \texttt{I} for the input type, and \texttt{T} for the output
       
   292 type). The alternative parser builds a new parser out of two existing
       
   293 parsers \texttt{p} and \texttt{q}.  Both need to be able to process
       
   294 input of type \texttt{I} and return the same output type
       
   295 \texttt{Set[(T, I)]}.\footnote{There is an interesting detail of
       
   296   Scala, namely the \texttt{=>} in front of the types of \texttt{p}
       
   297   and \texttt{q}. They will prevent the evaluation of the arguments
       
   298   before they are used. This is often called \emph{lazy evaluation} of
       
   299   the arguments. We will explain this later.}  Therefore the output
       
   300 type of this parser is \texttt{T}. The alternative parser should run
       
   301 the input with the first parser \texttt{p} (producing a set of pairs)
       
   302 and then run the same input with \texttt{q} (producing another set of
       
   303 pairs).  The result should be then just the union of both sets, which
       
   304 is the operation \texttt{++} in Scala.
       
   305 
       
   306 The alternative parser combinator already allows us to
       
   307 construct a parser that parses either a character \texttt{a}
       
   308 or \texttt{b}, as
       
   309 
       
   310 \begin{center}
       
   311 \begin{lstlisting}[language=Scala, numbers=none]
       
   312 new AltParser(CharParser('a'), CharParser('b'))
       
   313 \end{lstlisting}
       
   314 \end{center}
       
   315 
       
   316 \noindent Later on we will use again Scala mechanism for introducing some
       
   317 more readable shorthand notation for this, like \texttt{"a" || "b"}. Let us look in detail at what this parser combinator produces with some somple strings 
       
   318 
       
   319 \begin{center}
       
   320 \begin{tabular}{rcl}
       
   321 input strings & & output\medskip\\
       
   322 \texttt{\Grid{acde}} & $\rightarrow$ & $\left\{(\texttt{\Grid{a}}, \texttt{\Grid{cde}})\right\}$\\
       
   323 \texttt{\Grid{bcde}} & $\rightarrow$ & $\left\{(\texttt{\Grid{b}}, \texttt{\Grid{cde}})\right\}$\\
       
   324 \texttt{\Grid{ccde}} & $\rightarrow$ & $\{\}$
       
   325 \end{tabular}
       
   326 \end{center}
       
   327 
       
   328 \noindent We receive in the first two cases a successful
       
   329 output (that is a non-empty set). In each case, either
       
   330 \pcode{a} or \pcode{b} is in the processed part, and
       
   331 \pcode{cde} in the unprocessed part. Clearly this parser cannot
       
   332 parse anything in the string \pcode{ccde}, therefore the empty
       
   333 set.
       
   334 
       
   335 A bit more interesting is the \emph{sequence parser combinator}. Given
       
   336 two parsers, say again, $p$ and $q$, we want to apply first the input
       
   337 to $p$ producing a set of pairs; then apply $q$ to all the unparsed
       
   338 parts in the pairs; and then combine the results like
       
   339 
       
   340 \begin{center}
       
   341 \begin{tabular}{lcl}
       
   342 $\{((\textit{output}_1, \textit{output}_2), u_2)$ & $\;|\;$ & 
       
   343 $(\textit{output}_1, u_1) \in p(\text{input}) 
       
   344 \;\wedge\;$\\
       
   345 && $(\textit{output}_2, u_2) \in q(u_1)\}$
       
   346 \end{tabular}
       
   347 \end{center}
       
   348 
       
   349 \noindent Notice that the $p$ wil first be run on the input, producing
       
   350 pairs of the form $\textit{output}_1$ and unprocessed part $u_1$. The
       
   351 overall result of the sequence parser combinator is pairs of the form
       
   352 $((\textit{output}_1, \textit{output}_2), u_2)$. This means the
       
   353 unprocessed parts of both parsers is the unprocessed parts the second
       
   354 parser $q$ produces as left-over. The processed parts of both parsers
       
   355 is just the pair of the outputs
       
   356 $(\textit{output}_1, \textit{output}_2)$. This behavious can be
       
   357 implemented in Scala as follows:
   217 
   358 
   218 \begin{center}
   359 \begin{center}
   219 \begin{lstlisting}[language=Scala,numbers=none]
   360 \begin{lstlisting}[language=Scala,numbers=none]
   220 class SeqParser[I, T, S]
   361 class SeqParser[I, T, S]
   221        (p: => Parser[I, T], 
   362        (p: => Parser[I, T], 
   226             yield ((output1, output2), u2)
   367             yield ((output1, output2), u2)
   227 }
   368 }
   228 \end{lstlisting}
   369 \end{lstlisting}
   229 \end{center}
   370 \end{center}
   230 
   371 
   231 \noindent This parser takes as input two parsers, \texttt{p}
   372 \noindent This parser takes again as input two parsers, \texttt{p}
   232 and \texttt{q}. It implements \texttt{parse} as follows: let
   373 and \texttt{q}. It implements \texttt{parse} as follows: let
   233 first run the parser \texttt{p} on the input producing a set
   374 first run the parser \texttt{p} on the input producing a set
   234 of pairs (\texttt{output1}, \texttt{u1}). The \texttt{u1}
   375 of pairs (\texttt{output1}, \texttt{u1}). The \texttt{u1}
   235 stands for the unprocessed parts left over by \texttt{p}. Let
   376 stands for the unprocessed parts left over by \texttt{p}. Let
   236 \texttt{q} run on these unprocessed parts producing again a
   377 \texttt{q} run on these unprocessed parts producing again a
   245 
   386 
   246 \begin{center}
   387 \begin{center}
   247 \texttt{Set[((T, S), I)]}
   388 \texttt{Set[((T, S), I)]}
   248 \end{center}
   389 \end{center}
   249 
   390 
   250 Scala allows us to provide some shorthand notation for the
   391 \noindent
   251 sequence parser combinator. We can write for example
   392 If any of the runs of \textit{p} and \textit{q} fail, that is produce
   252 \pcode{'a' ~ 'b'}, which is the parser combinator that
   393 the empty set, then \texttt{parse} will also produce the empty set.
   253 first consumes the character \texttt{a} from a string and then
   394 Notice that we have now two output types for the sequence parser
   254 \texttt{b}. Three examples of this parser combinator are as
   395 combinator, because in general \textit{p} and \textit{q} might produce
   255 follows:
   396 differetn things (for example first we recognise a number and then a
       
   397 string corresponding to an operator).
       
   398 
       
   399 
       
   400 We have not yet looked at this in detail, but Scala allows us to
       
   401 provide some shorthand notation for the sequence parser combinator. We
       
   402 can write for example \pcode{"a" ~ "b"}, which is the parser
       
   403 combinator that first recognises the character \texttt{a} from a
       
   404 string and then \texttt{b}. Let us look again at three examples of
       
   405 how this parser combinator processes strings:
   256 
   406 
   257 \begin{center}
   407 \begin{center}
   258 \begin{tabular}{rcl}
   408 \begin{tabular}{rcl}
   259 input strings & & output\medskip\\
   409 input strings & & output\medskip\\
   260 \texttt{\Grid{abc}} & $\rightarrow$ & $\left\{((\texttt{\Grid{a}}, \texttt{\Grid{b}}), \texttt{\Grid{c}})\right\}$\\
   410 \texttt{\Grid{abcde}} & $\rightarrow$ & $\left\{((\texttt{\Grid{a}}, \texttt{\Grid{b}}), \texttt{\Grid{cde}})\right\}$\\
   261 \texttt{\Grid{bac}} & $\rightarrow$ & $\{\}$\\
   411 \texttt{\Grid{bacde}} & $\rightarrow$ & $\{\}$\\
   262 \texttt{\Grid{ccc}} & $\rightarrow$ & $\{\}$
   412 \texttt{\Grid{cccde}} & $\rightarrow$ & $\{\}$
   263 \end{tabular}
   413 \end{tabular}
   264 \end{center}
   414 \end{center}
   265 
   415 
   266 \noindent A slightly more complicated parser is \pcode{('a'
   416 \noindent In the first line we have a sucessful parse, because the
   267 || 'b') ~ 'b'} which parses as first character either an
   417 string started with \texttt{ab}, which is the prefix we are looking
   268 \texttt{a} or \texttt{b} followed by a \texttt{b}. This parser
   418 for. But since the parsing combinator is constructed as sequence of
   269 produces the following outputs.
   419 the two simple (atomic) parsers for \texttt{a} and \texttt{b}, the
       
   420 result is a nested pair of the form \texttt{((a, b), cde)}. It is
       
   421 \emph{not} a simple pair \texttt{(ab, cde)} as one might errorneously
       
   422 expects.  The parser returns the ampty set in the other examples,
       
   423 because they do not fit with what the parser is supposed to parse.
       
   424 
       
   425 
       
   426 A slightly more complicated parser is \pcode{("a" || "b") ~ "c"}.
       
   427 which parses as first character either an \texttt{a} or \texttt{b}
       
   428 followed by a \texttt{c}. This parser produces the following outputs.
   270 
   429 
   271 \begin{center}
   430 \begin{center}
   272 \begin{tabular}{rcl}
   431 \begin{tabular}{rcl}
   273 input strings & & output\medskip\\
   432 input strings & & output\medskip\\
   274 \texttt{\Grid{abc}} & $\rightarrow$ & $\left\{((\texttt{\Grid{a}}, \texttt{\Grid{b}}), \texttt{\Grid{c}})\right\}$\\
   433 \texttt{\Grid{abc}} & $\rightarrow$ & $\left\{((\texttt{\Grid{a}}, \texttt{\Grid{b}}), \texttt{\Grid{c}})\right\}$\\