afl-material: comparison handouts/ho06.tex

equal deleted inserted replaced

-:5ddedcd92d84
+:a4646557016d
 \documentclass{article}
 \usepackage{../style}
 \usepackage{../langs}
+\usepackage{../grammar}
 \begin{document}
 \section*{Handout 6 (Parser Combinators)}
 type \texttt{T => S}, we obtain a parser with output type
 \texttt{S}. Again Scala lets us introduce some shorthand
 notation for this parser combinator. Therefore we will write
 \texttt{p ==> f} for it.
-What are semantic actions good for? Ultimately, they allow is to
+What are semantic actions good for? Well, they allow is to transform
-transform the parsed input into a datastructure we can use for further
+the parsed input into a datastructure we can use for further
 processing. A simple example would be to transform parsed characters
 into ASCII numbers. Suppose we define a function \texttt{f} (from
-characters to ints) and a \texttt{CharParser}.
+characters to ints) and use a \texttt{CharParser} for the character \texttt{c}.
 \begin{center}
 \begin{lstlisting}[language=Scala,basicstyle=\small\ttfamily, numbers=none]
 val f = (c: Char) => c.toInt
 val c = new CharParser('c')
 \end{lstlisting}
 \end{center}
 \noindent
-Then
+Then we can run the following parsers on the input \texttt{cbd}:
 \begin{center}
 \begin{lstlisting}[language=Scala,basicstyle=\small\ttfamily, numbers=none]
 c.parse("cbd")
 (c ==> f).parse("cbd")
 \end{lstlisting}
 \end{center}
 \noindent
-the first line produces \texttt{Set(('c', "bd"))}, whereas the second
+The first line we obtain the result \texttt{Set(('c', "bd"))}, whereas the second
-produces \texttt{Set((99, "bd"))}.
+produces \texttt{Set((99, "bd"))}---the character has been transformed into
+an ASCII number.
+A slightly less contrived example is about parsing numbers (recall
+\texttt{NumParser} above). However, we want to do this here for strings.
+For this assume we have the following \texttt{RegexParser}.
+\begin{center}
+\begin{lstlisting}[language=Scala,xleftmargin=0mm,
+basicstyle=\small\ttfamily, numbers=none]
+import scala.util.matching.Regex
+case class RegexParser(reg: Regex) extends Parser[String, String] {
+def parse(in: String) = reg.findPrefixMatchOf(in) match {
+case None => Set()
+case Some(m) => Set((m.matched, m.after.toString))
+}
+}
+\end{lstlisting}
+\end{center}
+\noindent
+This parser takes a regex as argument and splits up a string into a
+prefix and the rest according to this regex
+(\texttt{reg.findPrefixMatchOf} generates a match---in the successful
+case---and the corresponding strings can be extracted with
+\texttt{matched} and \texttt{after}). We can now define a
+\texttt{NumParser} for strings as follows:
+\begin{center}
+\begin{lstlisting}[language=Scala,basicstyle=\small\ttfamily, numbers=none]
+val NumParser = RegexParser("[0-9]+".r)
+\end{lstlisting}
+\end{center}
+\noindent
+This parser will recognise a number at the beginning of a string, for
+example
+\begin{center}
+\begin{lstlisting}[language=Scala,basicstyle=\small\ttfamily, numbers=none]
+NumParser.parse("123abc")
+\end{lstlisting}
+\end{center}
+\noindent
+produces \texttt{Set((123,abc))}. The problem is that \texttt{123} is
+still a string. We need to convert it into the corresponding
+\texttt{Int}. We can do this as follows
+\begin{center}
+\begin{lstlisting}[language=Scala,basicstyle=\small\ttfamily, numbers=none]
+(NumParser ==> (s => s.toInt)).parse("123abc")
+\end{lstlisting}
+\end{center}
+\noindent
+The semantic action in form of a function converts a string into an
+\texttt{Int}. Let us come back to semantic actions when we are going
+to implement a simple calculator.
 \subsubsection*{Shorthand notation for parser combinators}
 Before we proceed, let us just explain the shorthand notation for
 parser combinators. Like for regular expressions, the shorthand notation
 will make our life much easier when writing actual parsers.
 \subsubsection*{How to build parsers using parser combinators?}
+The beauty of parser combinators is the ease with which they can be
+implemented and how easy it is to translate context-free grammars into
+code (though the grammars need to be non-left-recursive). To
+demonstrate this recall our context-free grammar for palindromes:
+\begin{plstx}[margin=3cm]
+: \meta{P} ::=  a\cdot \meta{P}\cdot a | b\cdot \meta{P}\cdot b | a | b | \epsilon\\
+\end{plstx}
+\noindent
+Given the parser combinators for alternatives and sequeneces, this grammar should be
+straightforward to implement. The first idea would be
+\begin{center}
+\begin{lstlisting}[language=Scala,basicstyle=\small\ttfamily, numbers=none]
+lazy val Pal : Parser[String, String] =
+(("a" ~ Pal ~ "a") | ("b" ~ Pal ~ "b") | "a" | "b" | "")
+\end{lstlisting}
+\end{center}
+\noindent
+Unfortunately, this does not work as it produces a typing error. The
+reason is that the parsers \texttt{"a"}, \texttt{"b"} and \texttt{""}
+all produce strings and therefore can be put into an alternative
+\texttt{...| "a" | "b" | ""}. But both \pcode{"a" ~ Pal ~ "a"} and
+\pcode{"b" ~ Pal ~ "b"} produce pairs of the form
+$(((\_, \_), \_), \_)$---that is how the sequence parser combinator
+nests results when \pcode{\~} is used between two components. The
+solution is to use a semantic action that ``flattens'' these pairs and
+appends the corresponding strings, like
+\begin{center}
+\begin{lstlisting}[language=Scala,basicstyle=\small\ttfamily, numbers=none]
+lazy val Pal : Parser[String, String] =
+(("a" ~ Pal ~ "a") ==> { case ((x, y), z) => x + y + z } |
+("b" ~ Pal ~ "b") ==> { case ((x, y), z) => x + y + z } |
+"a" | "b" | "")
+\end{lstlisting}
+\end{center}
 \subsubsection*{Implementing an Interpreter}
 %\bigskip
 %takes advantage of the full generality---have a look

changeset 588	a4646557016d
parent 587	5ddedcd92d84
child 589	0451b8b67f62