handouts/ho06.tex
changeset 937 dc5ab66b11cc
parent 799 85267be9a5ed
child 941 66adcae6c762
equal deleted inserted replaced
936:0b5f06539a84 937:dc5ab66b11cc
    35 
    35 
    36 \noindent
    36 \noindent
    37 Given the extended effort we have spent implementing a lexer in order
    37 Given the extended effort we have spent implementing a lexer in order
    38 to generate lists of tokens, it might be surprising that in what
    38 to generate lists of tokens, it might be surprising that in what
    39 follows we shall often use strings as input, rather than lists of
    39 follows we shall often use strings as input, rather than lists of
    40 tokens. This is for making the explanation more lucid and for quick
    40 tokens. This is for making the explanation more lucid and ensure the
    41 examples. It does not make our previous work on lexers obsolete
    41 examples are simple. It does not make our previous work on lexers obsolete
    42 (remember they transform a string into a list of tokens). Lexers will
    42 (remember they transform a string into a list of tokens). Lexers will
    43 still be needed for building a somewhat realistic compiler.
    43 still be needed for building a somewhat realistic compiler. See also
       
    44 a question in the homework regarding this issue.
    44 
    45 
    45 As mentioned above, parser combinators are relatively agnostic about what
    46 As mentioned above, parser combinators are relatively agnostic about what
    46 kind of input they process. In my Scala code I use the following
    47 kind of input they process. In my Scala code I use the following
    47 polymorphic types for parser combinators:
    48 polymorphic types for parser combinators:
    48 
    49 
    51 \end{center}
    52 \end{center}
    52 
    53 
    53 \noindent That is they take as input something of type \texttt{I} and
    54 \noindent That is they take as input something of type \texttt{I} and
    54 return a set of pairs of type \texttt{Set[(T, I)]}. Since the input
    55 return a set of pairs of type \texttt{Set[(T, I)]}. Since the input
    55 needs to be of ``sequence-kind'', I actually have to often write
    56 needs to be of ``sequence-kind'', I actually have to often write
    56 \texttt{I <\% Seq[\_]} for the input type. This ensures the
    57 \code{(using is: I => Seq[_])} for the input type. This ensures the
    57 input is a subtype of Scala sequences. The first component of the
    58 input is a subtype of Scala sequences.\footnote{This is a new feature
    58 generated pairs corresponds to what the parser combinator was able to
    59   in Scala 3 and is about type-classes, meaning if you use Scala 2 you will have difficulties
    59 parse from the input and the second is the unprocessed, or
    60   with running my code.} The first component of the generated pairs
    60 leftover, part of the input (therefore the type of this unprocessed part is
    61 corresponds to what the parser combinator was able to parse from the
    61 the same as the input). A parser combinator might return more than one
    62 input and the second is the unprocessed, or leftover, part of the
    62 such pair; the idea is that there are potentially several ways of how
    63 input (therefore the type of this unprocessed part is the same as the
    63 to parse the input.  As a concrete example, consider the string
    64 input). A parser combinator might return more than one such pair; the
       
    65 idea is that there are potentially several ways of how to parse the
       
    66 input.  As a concrete example, consider the string
    64 
    67 
    65 \begin{center}
    68 \begin{center}
    66 \tt\Grid{iffoo\VS testbar}
    69 \tt\Grid{iffoo\VS testbar}
    67 \end{center}
    70 \end{center}
    68 
    71 
   106 implies that the function \texttt{parse} takes an argument of type
   109 implies that the function \texttt{parse} takes an argument of type
   107 \texttt{I} and returns a set of type \mbox{\texttt{Set[(T, I)]}}.
   110 \texttt{I} and returns a set of type \mbox{\texttt{Set[(T, I)]}}.
   108 
   111 
   109 \begin{center}
   112 \begin{center}
   110 \begin{lstlisting}[language=Scala]
   113 \begin{lstlisting}[language=Scala]
   111 abstract class Parser[I, T] {
   114 abstract class Parser[I, T](using is: I => Seq[_])  {
   112   def parse(in: I) : Set[(T, I)]
   115   def parse(in: I): Set[(T, I)]  
   113 
   116 
   114   def parse_all(in: I) : Set[T] =
   117   def parse_all(in: I) : Set[T] =
   115     for ((head, tail) <- parse(in); if (tail.isEmpty)) 
   118     for ((hd, tl) <- parse(in); 
   116       yield head
   119         if is(tl).isEmpty) yield hd
   117 }
   120 }
   118 \end{lstlisting}
   121 \end{lstlisting}
   119 \end{center}
   122 \end{center}
   120 
   123 
   121 \noindent It is the obligation in each instance of this class to
   124 \noindent It is the obligation in each instance of this class to
   129 One of the simplest parser combinators recognises just a
   132 One of the simplest parser combinators recognises just a
   130 single character, say $c$, from the beginning of strings. Its
   133 single character, say $c$, from the beginning of strings. Its
   131 behaviour can be described as follows:
   134 behaviour can be described as follows:
   132 
   135 
   133 \begin{itemize}
   136 \begin{itemize}
   134 \item If the head of the input string starts with a $c$, then return 
   137 \item If the head of the input string $s$ starts with a $c$, then return 
   135   the set
   138   the set
   136   \[\{(c, \textit{tail of}\; s)\}\]
   139   \[\{(c, \textit{tail-of-}s)\}\]
   137   where \textit{tail of} 
   140   where \textit{tail of} 
   138   $s$ is the unprocessed part of the input string.
   141   $s$ is the unprocessed part of the input string.
   139 \item Otherwise return the empty set $\{\}$.	
   142 \item Otherwise return the empty set $\{\}$.	
   140 \end{itemize}
   143 \end{itemize}
   141 
   144 
   145 \mbox{\texttt{Set[(Char, String)]}}.  The code in Scala is as follows:
   148 \mbox{\texttt{Set[(Char, String)]}}.  The code in Scala is as follows:
   146 
   149 
   147 \begin{center}
   150 \begin{center}
   148 \begin{lstlisting}[language=Scala,basicstyle=\small\ttfamily, numbers=none]
   151 \begin{lstlisting}[language=Scala,basicstyle=\small\ttfamily, numbers=none]
   149 case class CharParser(c: Char) extends Parser[String, Char] {
   152 case class CharParser(c: Char) extends Parser[String, Char] {
   150   def parse(in: String) = 
   153   def parse(s: String) = 
   151     if (in.head == c) Set((c, in.tail)) else Set()
   154     if (s != "" && s.head == c) Set((c, s.tail)) else Set()
   152 }
   155 }
   153 \end{lstlisting}
   156 \end{lstlisting}
   154 \end{center}
   157 \end{center}
   155 
   158 
   156 \noindent You can see \texttt{parse} tests whether the
   159 \noindent You can see \texttt{parse} tests here whether the
   157 first character of the input string \texttt{in} is equal to
   160 first character of the input string \texttt{s} is equal to
   158 \texttt{c}. If yes, then it splits the string into the recognised part
   161 \texttt{c}. If yes, then it splits the string into the recognised part
   159 \texttt{c} and the unprocessed part \texttt{in.tail}. In case
   162 \texttt{c} and the unprocessed part \texttt{s.tail}. In case
   160 \texttt{in} does not start with \texttt{c} then the parser returns the
   163 \texttt{s} does not start with \texttt{c} then the parser returns the
   161 empty set (in Scala \texttt{Set()}). Since this parser recognises
   164 empty set (in Scala \texttt{Set()}). Since this parser recognises
   162 characters and just returns characters as the processed part, the
   165 characters and just returns characters as the processed part, the
   163 output type of the parser is \texttt{Char}.
   166 output type of the parser is \texttt{Char}.
   164 
   167 
   165 If we want to parse a list of tokens and interested in recognising a
   168 If we want to parse a list of tokens and interested in recognising a
   167 
   170 
   168 \begin{center}
   171 \begin{center}
   169 \begin{lstlisting}[language=Scala,basicstyle=\small\ttfamily,numbers=none]
   172 \begin{lstlisting}[language=Scala,basicstyle=\small\ttfamily,numbers=none]
   170 case object NumParser extends Parser[List[Token], Int] {
   173 case object NumParser extends Parser[List[Token], Int] {
   171   def parse(ts: List[Token]) = ts match {
   174   def parse(ts: List[Token]) = ts match {
   172     case Num_token(s)::ts => Set((s.toInt, ts)) 
   175     case Num_token(s)::rest => Set((s.toInt, rest)) 
   173     case _ => Set ()
   176     case _ => Set ()
   174   }
   177   }
   175 }
   178 }
   176 \end{lstlisting}
   179 \end{lstlisting}
   177 \end{center}
   180 \end{center}
   184 token (and the rest of the list), like the \texttt{CharParser} above,
   187 token (and the rest of the list), like the \texttt{CharParser} above,
   185 rather it extracts also the string \texttt{s} from the token and
   188 rather it extracts also the string \texttt{s} from the token and
   186 converts it into an integer. The hope is that the lexer did its work
   189 converts it into an integer. The hope is that the lexer did its work
   187 well and this conversion always succeeds. The consequence of this is
   190 well and this conversion always succeeds. The consequence of this is
   188 that the output type for this parser is \texttt{Int}, not
   191 that the output type for this parser is \texttt{Int}, not
   189 \texttt{Token}. Such a conversion would be needed if we want to
   192 \texttt{Token}. Such a conversion would be needed in our parser,
   190 implement a simple calculator program, because string-numbers need to
   193 because when we encounter a number in our program, we want to do
   191 be transformed into \texttt{Int}-numbers in order to do the
   194 some calculations based on integers, not strings (or tokens). 
   192 calculations.
       
   193 
   195 
   194 
   196 
   195 These simple parsers that just look at the input and do a simple
   197 These simple parsers that just look at the input and do a simple
   196 transformation are often called \emph{atomic} parser combinators.
   198 transformation are often called \emph{atomic} parser combinators.
   197 More interesting are the parser combinators that build larger parsers
   199 More interesting are the parser combinators that build larger parsers
   209 combinator as follows
   211 combinator as follows
   210 
   212 
   211 \begin{center}
   213 \begin{center}
   212 \begin{lstlisting}[language=Scala, numbers=none]
   214 \begin{lstlisting}[language=Scala, numbers=none]
   213 class AltParser[I, T]
   215 class AltParser[I, T]
   214        (p: => Parser[I, T], 
   216          (p: => Parser[I, T], 
   215         q: => Parser[I, T]) extends Parser[I, T] {
   217           q: => Parser[I, T])(using I => Seq[_])
   216   def parse(in: I) = p.parse(in) ++ q.parse(in)
   218                                  extends Parser[I, T] {
   217 }
   219   def parse(in: I) = p.parse(in) ++ q.parse(in)   
       
   220 }    
   218 \end{lstlisting}
   221 \end{lstlisting}
   219 \end{center}
   222 \end{center}
   220 
   223 
   221 \noindent The types of this parser combinator are again generic (we
   224 \noindent The types of this parser combinator are again generic (we
   222 have \texttt{I} for the input type, and \texttt{T} for the output
   225 have \texttt{I} for the input type, and \texttt{T} for the output
   223 type). The alternative parser builds a new parser out of two existing
   226 type). The alternative parser builds a new parser out of two existing
   224 parsers \texttt{p} and \texttt{q} which are given as arguments.  Both
   227 parsers \texttt{p} and \texttt{q} which are given as arguments.  Both
   225 parsers need to be able to process input of type \texttt{I} and return
   228 parsers need to be able to process input of type \texttt{I} and return
   226 in \texttt{parse} the same output type \texttt{Set[(T,
   229 in \texttt{parse} the same output type \texttt{Set[(T,
   227   I)]}.\footnote{There is an interesting detail of Scala, namely the
   230   I)]}.\footnote{There is an interesting detail of Scala, namely the
   228   \texttt{=>} in front of the types of \texttt{p} and \texttt{q}. They
   231   \texttt{=>} in front of the types of \texttt{p} and \texttt{q}. These arrows
   229   will prevent the evaluation of the arguments before they are
   232   will prevent the evaluation of the arguments before they are
   230   used. This is often called \emph{lazy evaluation} of the
   233   used. This is often called \emph{lazy evaluation} of the
   231   arguments. We will explain this later.}  The alternative parser runs
   234   arguments. We will explain this later.}  The alternative parser runs
   232 the input with the first parser \texttt{p} (producing a set of pairs)
   235 the input with the first parser \texttt{p} (producing a set of pairs)
   233 and then runs the same input with \texttt{q} (producing another set of
   236 and then runs the same input with \texttt{q} (producing another set of
   234 pairs).  The result should be then just the union of both sets, which
   237 pairs).  The result should be then just the union of both sets, which
   235 is the operation \texttt{++} in Scala.
   238 is the operation \texttt{++} in Scala.
   236 
   239 
   237 The alternative parser combinator allows us to construct a parser that
   240 The alternative parser combinator allows us to construct a parser that
   238 parses either a character \texttt{a} or \texttt{b} using the
   241 parses either a character \texttt{a} or \texttt{b} using the
   239 \texttt{CharParser} shown above. For this we can write
   242 \texttt{CharParser} shown above. For this we can write\footnote{Note
       
   243   that we cannot use a \texttt{case}-class for \texttt{AltParser}s
       
   244   because of the problem with laziness and Scala quirks. Hating
       
   245   \texttt{new} like the plague, we will work around this later with
       
   246   some syntax tricks. ;o)}
   240 
   247 
   241 \begin{center}
   248 \begin{center}
   242 \begin{lstlisting}[language=Scala, numbers=none]
   249 \begin{lstlisting}[language=Scala, numbers=none]
   243 new AltParser(CharParser('a'), CharParser('b'))
   250 new AltParser(CharParser('a'), CharParser('b'))
   244 \end{lstlisting}
   251 \end{lstlisting}
   245 \end{center}
   252 \end{center}
   246 
   253 
   247 \noindent Later on we will use Scala mechanism for introducing some
   254 \noindent Later on we will use Scala mechanism for introducing some
   248 more readable shorthand notation for this, like \texttt{p"a" ||
   255 more readable shorthand notation for this, like \texttt{p"a" ||
   249   p"b"}. Let us look in detail at what this parser combinator produces
   256   p"b"}. But first let us look in detail at what this parser combinator produces
   250 with some sample strings.
   257 with some sample strings.
   251 
   258 
   252 \begin{center}
   259 \begin{center}
   253 \begin{tabular}{rcl}
   260 \begin{tabular}{rcl}
   254 input strings & & output\medskip\\
   261 input strings & & output\medskip\\
   255 \texttt{\Grid{acde}} & $\rightarrow$ & $\left\{(\texttt{\Grid{a}}, \texttt{\Grid{cde}})\right\}$\\
   262 \texttt{\Grid{acde}} & $\rightarrow$ & $\left\{(\texttt{\Grid{a}},\; \texttt{\Grid{cde}})\right\}$\\
   256 \texttt{\Grid{bcde}} & $\rightarrow$ & $\left\{(\texttt{\Grid{b}}, \texttt{\Grid{cde}})\right\}$\\
   263 \texttt{\Grid{bcde}} & $\rightarrow$ & $\left\{(\texttt{\Grid{b}},\; \texttt{\Grid{cde}})\right\}$\\
   257 \texttt{\Grid{ccde}} & $\rightarrow$ & $\{\}$
   264 \texttt{\Grid{ccde}} & $\rightarrow$ & $\{\}$
   258 \end{tabular}
   265 \end{tabular}
   259 \end{center}
   266 \end{center}
   260 
   267 
   261 \noindent We receive in the first two cases a successful
   268 \noindent We receive in the first two cases a successful output (that
   262 output (that is a non-empty set). In each case, either
   269 is a non-empty set). In each case, either \pcode{a} or \pcode{b} is in
   263 \pcode{a} or \pcode{b} is in the parsed part, and
   270 the parsed part, and \pcode{cde} in the unprocessed part. Clearly this
   264 \pcode{cde} in the unprocessed part. Clearly this parser cannot
   271 parser cannot parse anything of the form \pcode{ccde}, therefore the
   265 parse anything with \pcode{ccde}, therefore the empty
   272 empty set is returned in the last case. Observe that parser
   266 set is returned.
   273 combinators only look at the beginning of the given input: they do not
       
   274 fish out something in the ``middle'' of the input.
   267 
   275 
   268 A bit more interesting is the \emph{sequence parser combinator}. Given
   276 A bit more interesting is the \emph{sequence parser combinator}. Given
   269 two parsers, say again, $p$ and $q$, we want to apply first the input
   277 two parsers, say again, $p$ and $q$, we want to apply first the input
   270 to $p$ producing a set of pairs; then apply $q$ to all the unparsed  
   278 to $p$ producing a set of pairs; then apply $q$ to all the unparsed  
   271 parts in the pairs; and then combine the results. Mathematically we would
   279 parts in the pairs; and then combine the results. Mathematically we would
   296 
   304 
   297 \begin{center}
   305 \begin{center}
   298 \begin{lstlisting}[language=Scala,numbers=none]
   306 \begin{lstlisting}[language=Scala,numbers=none]
   299 class SeqParser[I, T, S]
   307 class SeqParser[I, T, S]
   300        (p: => Parser[I, T], 
   308        (p: => Parser[I, T], 
   301         q: => Parser[I, S]) extends Parser[I, (T, S)] {
   309         q: => Parser[I, S])(using I => Seq[_])
       
   310                                extends Parser[I, (T, S)] {
   302   def parse(in: I) = 
   311   def parse(in: I) = 
   303     for ((output1, u1) <- p.parse(in); 
   312     for ((output1, u1) <- p.parse(in); 
   304          (output2, u2) <- q.parse(u1)) 
   313          (output2, u2) <- q.parse(u1)) 
   305             yield ((output1, output2), u2)
   314             yield ((output1, output2), u2)
   306 }
   315 }
   310 \noindent This parser takes again as arguments two parsers, \texttt{p}
   319 \noindent This parser takes again as arguments two parsers, \texttt{p}
   311 and \texttt{q}. It implements \texttt{parse} as follows: first run the
   320 and \texttt{q}. It implements \texttt{parse} as follows: first run the
   312 parser \texttt{p} on the input producing a set of pairs
   321 parser \texttt{p} on the input producing a set of pairs
   313 (\texttt{output1}, \texttt{u1}). The \texttt{u1} stands for the
   322 (\texttt{output1}, \texttt{u1}). The \texttt{u1} stands for the
   314 unprocessed parts left over by \texttt{p} (recall that there can be
   323 unprocessed parts left over by \texttt{p} (recall that there can be
   315 several such pairs). Let then \texttt{q} run on these unprocessed
   324 several such pairs).  Let then \texttt{q} run on these unprocessed
   316 parts producing again a set of pairs. The output of the sequence
   325 parts producing again a set of pairs. The output of the sequence
   317 parser combinator is then a set containing pairs where the first
   326 parser combinator is then a set containing pairs where the first
   318 components are again pairs, namely what the first parser could parse
   327 components are again pairs, namely what the first parser could parse
   319 together with what the second parser could parse; the second component
   328 together with what the second parser could parse; the second component
   320 is the unprocessed part left over after running the second parser
   329 is the unprocessed part left over after running the second parser
   337 produce the empty set.
   346 produce the empty set.
   338 
   347 
   339 With the shorthand notation we shall introduce later for the sequence
   348 With the shorthand notation we shall introduce later for the sequence
   340 parser combinator, we can write for example \pcode{p"a" ~ p"b"}, which
   349 parser combinator, we can write for example \pcode{p"a" ~ p"b"}, which
   341 is the parser combinator that first recognises the character
   350 is the parser combinator that first recognises the character
   342 \texttt{a} from a string and then \texttt{b}. Let us look again at
   351 \texttt{a} from a string and then \texttt{b}. (Actually, we will be
   343 some examples of how this parser combinator processes some strings:
   352 able to write just \pcode{p"ab"} for such parsers, but it is good to
       
   353   understand first what happens behind the scenes.) Let us look again
       
   354   at some examples of how the sequence parser combinator processes some
       
   355   strings:
   344 
   356 
   345 \begin{center}
   357 \begin{center}
   346 \begin{tabular}{rcl}
   358 \begin{tabular}{rcl}
   347 input strings & & output\medskip\\
   359 input strings & & output\medskip\\
   348 \texttt{\Grid{abcde}} & $\rightarrow$ & $\left\{((\texttt{\Grid{a}}, \texttt{\Grid{b}}), \texttt{\Grid{cde}})\right\}$\\
   360 \texttt{\Grid{abcde}} & $\rightarrow$ & $\left\{((\texttt{\Grid{a}}, \texttt{\Grid{b}}),\; \texttt{\Grid{cde}})\right\}$\\
   349 \texttt{\Grid{bacde}} & $\rightarrow$ & $\{\}$\\
   361 \texttt{\Grid{bacde}} & $\rightarrow$ & $\{\}$\\
   350 \texttt{\Grid{cccde}} & $\rightarrow$ & $\{\}$
   362 \texttt{\Grid{cccde}} & $\rightarrow$ & $\{\}$
   351 \end{tabular}
   363 \end{tabular}
   352 \end{center}
   364 \end{center}
   353 
   365 
   366 by a \texttt{c}. This parser produces the following outputs.
   378 by a \texttt{c}. This parser produces the following outputs.
   367 
   379 
   368 \begin{center}
   380 \begin{center}
   369 \begin{tabular}{rcl}
   381 \begin{tabular}{rcl}
   370 input strings & & output\medskip\\
   382 input strings & & output\medskip\\
   371 \texttt{\Grid{acde}} & $\rightarrow$ & $\left\{((\texttt{\Grid{a}}, \texttt{\Grid{c}}), \texttt{\Grid{de}})\right\}$\\
   383 \texttt{\Grid{acde}} & $\rightarrow$ & $\left\{((\texttt{\Grid{a}}, \texttt{\Grid{c}}),\; \texttt{\Grid{de}})\right\}$\\
   372 \texttt{\Grid{bcde}} & $\rightarrow$ & $\left\{((\texttt{\Grid{b}}, \texttt{\Grid{c}}), \texttt{\Grid{de}})\right\}$\\
   384 \texttt{\Grid{bcde}} & $\rightarrow$ & $\left\{((\texttt{\Grid{b}}, \texttt{\Grid{c}}),\; \texttt{\Grid{de}})\right\}$\\
   373 \texttt{\Grid{abde}} & $\rightarrow$ & $\{\}$
   385 \texttt{\Grid{abde}} & $\rightarrow$ & $\{\}$
   374 \end{tabular}
   386 \end{tabular}
   375 \end{center}
   387 \end{center}
   376 
   388 
   377 \noindent
   389 \noindent
   380 the following outputs.
   392 the following outputs.
   381 
   393 
   382 \begin{center}
   394 \begin{center}
   383 \begin{tabular}{rcl}
   395 \begin{tabular}{rcl}
   384 input strings & & output\medskip\\
   396 input strings & & output\medskip\\
   385 \texttt{\Grid{abcde}} & $\rightarrow$ & $\left\{(((\texttt{\Grid{a}},\texttt{\Grid{b}}), \texttt{\Grid{c}}), \texttt{\Grid{de}})\right\}$\\
   397 \texttt{\Grid{abcde}} & $\rightarrow$ & $\left\{(((\texttt{\Grid{a}},\texttt{\Grid{b}}), \texttt{\Grid{c}}),\; \texttt{\Grid{de}})\right\}$\\
   386 \texttt{\Grid{abde}} & $\rightarrow$ & $\{\}$\\
   398 \texttt{\Grid{abde}} & $\rightarrow$ & $\{\}$\\
   387 \texttt{\Grid{bcde}} & $\rightarrow$ & $\{\}$
   399 \texttt{\Grid{bcde}} & $\rightarrow$ & $\{\}$
   388 \end{tabular}
   400 \end{tabular}
   389 \end{center}
   401 \end{center}
   390 
   402 
   397 our output pairs nest differently
   409 our output pairs nest differently
   398 
   410 
   399 \begin{center}
   411 \begin{center}
   400 \begin{tabular}{rcl}
   412 \begin{tabular}{rcl}
   401 input strings & & output\medskip\\
   413 input strings & & output\medskip\\
   402 \texttt{\Grid{abcde}} & $\rightarrow$ & $\left\{((\texttt{\Grid{a}},(\texttt{\Grid{b}}, \texttt{\Grid{c}})), \texttt{\Grid{de}})\right\}$\\
   414 \texttt{\Grid{abcde}} & $\rightarrow$ & $\left\{((\texttt{\Grid{a}},(\texttt{\Grid{b}}, \texttt{\Grid{c}})),\; \texttt{\Grid{de}})\right\}$\\
   403 \end{tabular}
   415 \end{tabular}
   404 \end{center}
   416 \end{center}
   405 
   417 
   406 \noindent
   418 \noindent
   407 Two more examples: first consider the parser
   419 Two more examples: first consider the parser
   443 ultimately-unsuccessful-parses. The main point is that the sequence
   455 ultimately-unsuccessful-parses. The main point is that the sequence
   444 parser combinator returns pairs that can nest according to the
   456 parser combinator returns pairs that can nest according to the
   445 nesting of the component parsers.
   457 nesting of the component parsers.
   446 
   458 
   447 
   459 
   448 Consider also carefully that constructing a parser such \pcode{p"a" ||
   460 Consider also carefully that constructing a parser such
   449   (p"a" ~ p"b")} will result in a typing error. The intention with this
   461 
       
   462 \begin{center}
       
   463 \pcode{p"a" || (p"a" ~ p"b")}
       
   464 \end{center}
       
   465 
       
   466 \noindent
       
   467 will result in a typing error. The intention with this
   450 parser is that we want to parse either an \texttt{a}, or an \texttt{a}
   468 parser is that we want to parse either an \texttt{a}, or an \texttt{a}
   451 followed by a \texttt{b}. However, the first parser has as output type
   469 followed by a \texttt{b}. However, the first parser has as output type
   452 a single character (recall the type of \texttt{CharParser}), but the
   470 a single character (recall the type of \texttt{CharParser}), but the
   453 second parser produces a pair of characters as output. The alternative
   471 second parser produces a pair of characters as output. The alternative
   454 parser is required to have both component parsers to have the same
   472 parser is required to have both component parsers to have the same
   455 type---the reason is that we need to be able to build the union of two
   473 type---the reason is that we need to be able to build the union of two
   456 sets, which requires in Scala that the sets have the same type.  Since
   474 sets, which requires in Scala that the sets have the same type.  Since
   457 they are not in this case, there is a typing error.  We will see later
   475 they are not in this case, there is a typing error.  We will see later
   458 how we can build this parser without the typing error.
   476 how we can build this parser without the typing error.
   459 
   477 
   460 The next parser combinator, called \emph{semantic action}, does not
   478 The next parser combinator, called \emph{semantic action} or \emph{map-parser}, does not
   461 actually combine two smaller parsers, but applies a function to the result
   479 actually combine two smaller parsers, but applies a function to the result
   462 of a parser.  It is implemented in Scala as follows
   480 of a parser.  It is implemented in Scala as follows
   463 
   481 
   464 \begin{center}
   482 \begin{center}
   465 \begin{lstlisting}[language=Scala,basicstyle=\small\ttfamily, numbers=none]
   483 \begin{lstlisting}[language=Scala,basicstyle=\small\ttfamily, numbers=none]
   466 class MapParser[I, T, S]
   484 class MapParser[I, T, S]
   467          (p: => Parser[I, T], 
   485         (p: => Parser[I, T], 
   468           f: T => S) extends Parser[I, S] {
   486          f: T => S)(using I => Seq[_]) extends Parser[I, S] {
   469   def parse(in: I) = 
   487   def parse(in: I) =
   470     for ((head, tail) <- p.parse(in)) yield (f(head), tail)
   488        for ((hd, tl) <- p.parse(in)) yield (f(hd), tl)
   471 }
   489 }
   472 \end{lstlisting}
   490 \end{lstlisting}
   473 \end{center}
   491 \end{center}
   474 
   492 
   475 
   493 
   476 \noindent This parser combinator takes a parser \texttt{p} (with input
   494 \noindent This parser combinator takes a parser \texttt{p} (with input
   477 type \texttt{I} and output type \texttt{T}) as one argument but also a
   495 type \texttt{I} and output type \texttt{T}) as one argument but also a
   478 function \texttt{f} (with type \texttt{T => S}). The parser \texttt{p}
   496 function \texttt{f} (with type \texttt{T => S}). The parser \texttt{p}
   479 produces sets of type \texttt{Set[(T, I)]}. The semantic action
   497 produces sets of type \texttt{Set[(S, I)]}. The semantic action
   480 combinator then applies the function \texttt{f} to all the `processed'
   498 combinator then applies the function \texttt{f} to all the `processed'
   481 parser outputs. Since this function is of type \texttt{T => S}, we
   499 parser outputs. Since this function is of type \texttt{T => S}, we
   482 obtain a parser with output type \texttt{S}. Again Scala lets us
   500 obtain a parser with output type \texttt{S}. Again Scala lets us
   483 introduce some shorthand notation for this parser
   501 introduce some shorthand notation for this parser
   484 combinator. Therefore we will write short \texttt{p.map(f)} for it.
   502 combinator. Therefore we will write short \texttt{p.map(f)} for it.
   558 \end{lstlisting}
   576 \end{lstlisting}
   559 \end{center}  
   577 \end{center}  
   560 
   578 
   561 \noindent
   579 \noindent
   562 produces \texttt{Set((123,abc))}. The problem is that \texttt{123} is
   580 produces \texttt{Set((123,abc))}. The problem is that \texttt{123} is
   563 still a string (the required double-quotes are not printed by
   581 still a string (the expected double-quotes are not printed by
   564 Scala). We want to convert this string into the corresponding
   582 Scala). We want to convert this string into the corresponding
   565 \texttt{Int}. We can do this as follows using a semantic action
   583 \texttt{Int}. We can do this as follows using a semantic action
   566 
   584 
   567 \begin{center}
   585 \begin{center}
   568 \begin{lstlisting}[language=Scala,basicstyle=\small\ttfamily, numbers=none]
   586 \begin{lstlisting}[language=Scala,basicstyle=\small\ttfamily, numbers=none]
   571 \end{center}  
   589 \end{center}  
   572 
   590 
   573 \noindent
   591 \noindent
   574 The function in the semantic action converts a string into an
   592 The function in the semantic action converts a string into an
   575 \texttt{Int}. Now \texttt{parse} generates \texttt{Set((123,abc))},
   593 \texttt{Int}. Now \texttt{parse} generates \texttt{Set((123,abc))},
   576 but this time \texttt{123} is an \texttt{Int}. Let us come back to
   594 but this time \texttt{123} is an \texttt{Int}. Think carefully what
   577 semantic actions when we are going to implement actual context-free
   595 the input and output type of the parser is without the semantic action
   578 grammars.
   596 adn what with the semantic action (the type of the function can
       
   597 already tell you). Let us come back to semantic actions when we are
       
   598 going to implement actual context-free grammars.
   579 
   599 
   580 \subsubsection*{Shorthand notation for parser combinators}
   600 \subsubsection*{Shorthand notation for parser combinators}
   581 
   601 
   582 Before we proceed, let us just explain the shorthand notation for
   602 Before we proceed, let us just explain the shorthand notation for
   583 parser combinators. Like for regular expressions, the shorthand notation
   603 parser combinators. Like for regular expressions, the shorthand
   584 will make our life much easier when writing actual parsers. We can define
   604 notation will make our life much easier when writing actual
   585 some implicits which allow us to write
   605 parsers. We can define some extensions\footnote{In Scala 2 this was
       
   606   generically called as ``implicits''.} which allow us to write
   586 
   607 
   587 \begin{center}
   608 \begin{center}
   588 \begin{tabular}{ll}  
   609 \begin{tabular}{ll}  
   589   \pcode{p || q} & alternative parser\\
   610   \pcode{p || q} & alternative parser\\
   590   \pcode{p ~ q} & sequence parser\\ 
   611   \pcode{p ~ q} & sequence parser\\ 
   591   \pcode{p.map(f)} & semantic action parser
   612   \pcode{p.map(f)} & semantic action parser
   592 \end{tabular}
   613 \end{tabular}
   593 \end{center}
   614 \end{center}
   594 
   615 
   595 \noindent
   616 \noindent
   596 as well as to use string interpolations for specifying simple string parsers.
   617 We will also use the \texttt{p}-string-interpolation for specifying simple string parsers.
   597 
   618 
   598 The idea is that this shorthand notation allows us to easily translate
   619 The idea is that this shorthand notation allows us to easily translate
   599 context-free grammars into code. For example recall our context-free
   620 context-free grammars into code. For example recall our context-free
   600 grammar for palindromes:
   621 grammar for palindromes:
   601 
   622 
   608 combinator.  The $\cdot$ can be translated to a sequence parser
   629 combinator.  The $\cdot$ can be translated to a sequence parser
   609 combinator. The parsers for $a$, $b$ and $\epsilon$ can be simply
   630 combinator. The parsers for $a$, $b$ and $\epsilon$ can be simply
   610 written as \texttt{p"a"}, \texttt{p"b"} and \texttt{p""}.
   631 written as \texttt{p"a"}, \texttt{p"b"} and \texttt{p""}.
   611 
   632 
   612 
   633 
   613 \subsubsection*{How to build parsers using parser combinators?}
   634 \subsubsection*{How to build more interesting parsers using parser combinators?}
   614 
   635 
   615 The beauty of parser combinators is the ease with which they can be
   636 The beauty of parser combinators is the ease with which they can be
   616 implemented and how easy it is to translate context-free grammars into
   637 implemented and how easy it is to translate context-free grammars into
   617 code (though the grammars need to be non-left-recursive). To
   638 code (though the grammars need to be non-left-recursive). To
   618 demonstrate this consider again the grammar for palindromes from above.
   639 demonstrate this consider again the grammar for palindromes from above.
   738 
   759 
   739 \begin{center}
   760 \begin{center}
   740 \begin{lstlisting}[language=Scala,basicstyle=\small\ttfamily, numbers=none]
   761 \begin{lstlisting}[language=Scala,basicstyle=\small\ttfamily, numbers=none]
   741 class AltParser[I, T]
   762 class AltParser[I, T]
   742        (p: => Parser[I, T], 
   763        (p: => Parser[I, T], 
   743         q: => Parser[I, T]) extends Parser[I, T] {...}
   764         q: => Parser[I, T]) ... extends Parser[I, T] {...}
   744 
   765 
   745 class SeqParser[I, T, S]
   766 class SeqParser[I, T, S]
   746        (p: => Parser[I, T], 
   767        (p: => Parser[I, T], 
   747         q: => Parser[I, S]) extends Parser[I, (T, S)] {...}
   768         q: => Parser[I, S]) ... extends Parser[I, (T, S)] {...}
   748 \end{lstlisting}
   769 \end{lstlisting}
   749 \end{center}
   770 \end{center}
   750 
   771 
   751 \noindent where the \texttt{\textbf{\textcolor{codepurple}{=>}}} in front of
   772 \noindent where the \texttt{\textbf{\textcolor{codepurple}{=>}}} in front of
   752 the argument types for \texttt{p} and \texttt{q} prevent Scala from
   773 the argument types for \texttt{p} and \texttt{q} prevent Scala from
   860 \end{plstx}
   881 \end{plstx}
   861 
   882 
   862 \noindent
   883 \noindent
   863 Recall what left-recursive means from Handout 5 and make sure you see
   884 Recall what left-recursive means from Handout 5 and make sure you see
   864 why this grammar is \emph{non} left-recursive. This version of the grammar
   885 why this grammar is \emph{non} left-recursive. This version of the grammar
   865 also deals with the fact that $*$ should have a higher precedence. This does not
   886 also deals with the fact that $*$ should have a higher precedence than $+$ and $-$. This does not
   866 affect which strings this grammar can recognise, but in which order we are going
   887 affect which strings this grammar can recognise, but in which order we are going
   867 to evaluate any arithmetic expression. We can translate this grammar into
   888 to evaluate any arithmetic expression. We can translate this grammar into
   868 parsing combinators as follows:
   889 parsing combinators as follows:
   869 
   890 
   870 
   891