handouts/ho06.tex
changeset 586 451a95e1bc25
parent 585 6ee22f196884
child 587 5ddedcd92d84
equal deleted inserted replaced
585:6ee22f196884 586:451a95e1bc25
    83 we are interested in transform our input into something
    83 we are interested in transform our input into something
    84 ``different''\ldots for example into a tree, or if we implement the
    84 ``different''\ldots for example into a tree, or if we implement the
    85 grammar for arithmetic expressions we might be interested in the
    85 grammar for arithmetic expressions we might be interested in the
    86 actual integer number the arithmetic expression, say \texttt{1 + 2 *
    86 actual integer number the arithmetic expression, say \texttt{1 + 2 *
    87   3}, stands for. In this way we can use parser combinators to
    87   3}, stands for. In this way we can use parser combinators to
    88 implement relativaley easily a calculator.
    88 implement relatively easily a calculator.
    89 
    89 
    90 The main idea of parser combinators is that we can easily build parser
    90 The main idea of parser combinators is that we can easily build parser
    91 combinators out of smaller components following very closely the
    91 combinators out of smaller components following very closely the
    92 structure of a grammar. In order to implement this in an
    92 structure of a grammar. In order to implement this in an
    93 object-oriented programming language, like Scala, we need to specify
    93 object-oriented programming language, like Scala, we need to specify
   228 new AltParser(CharParser('a'), CharParser('b'))
   228 new AltParser(CharParser('a'), CharParser('b'))
   229 \end{lstlisting}
   229 \end{lstlisting}
   230 \end{center}
   230 \end{center}
   231 
   231 
   232 \noindent Later on we will use again Scala mechanism for introducing some
   232 \noindent Later on we will use again Scala mechanism for introducing some
   233 more readable shorthand notation for this, like \texttt{"a" || "b"}. Let us look in detail at what this parser combinator produces with some somple strings 
   233 more readable shorthand notation for this, like \texttt{"a" || "b"}. Let us look in detail at what this parser combinator produces with some sample strings 
   234 
   234 
   235 \begin{center}
   235 \begin{center}
   236 \begin{tabular}{rcl}
   236 \begin{tabular}{rcl}
   237 input strings & & output\medskip\\
   237 input strings & & output\medskip\\
   238 \texttt{\Grid{acde}} & $\rightarrow$ & $\left\{(\texttt{\Grid{a}}, \texttt{\Grid{cde}})\right\}$\\
   238 \texttt{\Grid{acde}} & $\rightarrow$ & $\left\{(\texttt{\Grid{a}}, \texttt{\Grid{cde}})\right\}$\\
   312 new AltParser(CharParser('a'), CharParser('b'))
   312 new AltParser(CharParser('a'), CharParser('b'))
   313 \end{lstlisting}
   313 \end{lstlisting}
   314 \end{center}
   314 \end{center}
   315 
   315 
   316 \noindent Later on we will use again Scala mechanism for introducing some
   316 \noindent Later on we will use again Scala mechanism for introducing some
   317 more readable shorthand notation for this, like \texttt{"a" || "b"}. Let us look in detail at what this parser combinator produces with some somple strings 
   317 more readable shorthand notation for this, like \texttt{"a" || "b"}. Let us look in detail at what this parser combinator produces with some sample strings 
   318 
   318 
   319 \begin{center}
   319 \begin{center}
   320 \begin{tabular}{rcl}
   320 \begin{tabular}{rcl}
   321 input strings & & output\medskip\\
   321 input strings & & output\medskip\\
   322 \texttt{\Grid{acde}} & $\rightarrow$ & $\left\{(\texttt{\Grid{a}}, \texttt{\Grid{cde}})\right\}$\\
   322 \texttt{\Grid{acde}} & $\rightarrow$ & $\left\{(\texttt{\Grid{a}}, \texttt{\Grid{cde}})\right\}$\\
   351 overall result of the sequence parser combinator is pairs of the form
   351 overall result of the sequence parser combinator is pairs of the form
   352 $((\textit{output}_1, \textit{output}_2), u_2)$. This means the
   352 $((\textit{output}_1, \textit{output}_2), u_2)$. This means the
   353 unprocessed parts of both parsers is the unprocessed parts the second
   353 unprocessed parts of both parsers is the unprocessed parts the second
   354 parser $q$ produces as left-over. The processed parts of both parsers
   354 parser $q$ produces as left-over. The processed parts of both parsers
   355 is just the pair of the outputs
   355 is just the pair of the outputs
   356 $(\textit{output}_1, \textit{output}_2)$. This behavious can be
   356 $(\textit{output}_1, \textit{output}_2)$. This behaviour can be
   357 implemented in Scala as follows:
   357 implemented in Scala as follows:
   358 
   358 
   359 \begin{center}
   359 \begin{center}
   360 \begin{lstlisting}[language=Scala,numbers=none]
   360 \begin{lstlisting}[language=Scala,numbers=none]
   361 class SeqParser[I, T, S]
   361 class SeqParser[I, T, S]
   391 \noindent
   391 \noindent
   392 If any of the runs of \textit{p} and \textit{q} fail, that is produce
   392 If any of the runs of \textit{p} and \textit{q} fail, that is produce
   393 the empty set, then \texttt{parse} will also produce the empty set.
   393 the empty set, then \texttt{parse} will also produce the empty set.
   394 Notice that we have now two output types for the sequence parser
   394 Notice that we have now two output types for the sequence parser
   395 combinator, because in general \textit{p} and \textit{q} might produce
   395 combinator, because in general \textit{p} and \textit{q} might produce
   396 differetn things (for example first we recognise a number and then a
   396 different things (for example first we recognise a number and then a
   397 string corresponding to an operator).
   397 string corresponding to an operator).
   398 
   398 
   399 
   399 
   400 We have not yet looked at this in detail, but Scala allows us to
   400 We have not yet looked at this in detail, but Scala allows us to
   401 provide some shorthand notation for the sequence parser combinator. We
   401 provide some shorthand notation for the sequence parser combinator. We
   411 \texttt{\Grid{bacde}} & $\rightarrow$ & $\{\}$\\
   411 \texttt{\Grid{bacde}} & $\rightarrow$ & $\{\}$\\
   412 \texttt{\Grid{cccde}} & $\rightarrow$ & $\{\}$
   412 \texttt{\Grid{cccde}} & $\rightarrow$ & $\{\}$
   413 \end{tabular}
   413 \end{tabular}
   414 \end{center}
   414 \end{center}
   415 
   415 
   416 \noindent In the first line we have a sucessful parse, because the
   416 \noindent In the first line we have a successful parse, because the
   417 string started with \texttt{ab}, which is the prefix we are looking
   417 string started with \texttt{ab}, which is the prefix we are looking
   418 for. But since the parsing combinator is constructed as sequence of
   418 for. But since the parsing combinator is constructed as sequence of
   419 the two simple (atomic) parsers for \texttt{a} and \texttt{b}, the
   419 the two simple (atomic) parsers for \texttt{a} and \texttt{b}, the
   420 result is a nested pair of the form \texttt{((a, b), cde)}. It is
   420 result is a nested pair of the form \texttt{((a, b), cde)}. It is
   421 \emph{not} a simple pair \texttt{(ab, cde)} as one might errorneously
   421 \emph{not} a simple pair \texttt{(ab, cde)} as one might erroneously
   422 expects.  The parser returns the ampty set in the other examples,
   422 expects.  The parser returns the empty set in the other examples,
   423 because they do not fit with what the parser is supposed to parse.
   423 because they do not fit with what the parser is supposed to parse.
   424 
   424 
   425 
   425 
   426 A slightly more complicated parser is \pcode{("a" || "b") ~ "c"}
   426 A slightly more complicated parser is \pcode{("a" || "b") ~ "c"}
   427 which parses as first character either an \texttt{a} or \texttt{b}
   427 which parses as first character either an \texttt{a} or \texttt{b}