diff -r 5ddedcd92d84 -r a4646557016d handouts/ho06.tex --- a/handouts/ho06.tex Wed Oct 24 20:37:37 2018 +0100 +++ b/handouts/ho06.tex Thu Oct 25 00:50:58 2018 +0100 @@ -2,7 +2,7 @@ \documentclass{article} \usepackage{../style} \usepackage{../langs} - +\usepackage{../grammar} \begin{document} @@ -539,11 +539,11 @@ notation for this parser combinator. Therefore we will write \texttt{p ==> f} for it. -What are semantic actions good for? Ultimately, they allow is to -transform the parsed input into a datastructure we can use for further +What are semantic actions good for? Well, they allow is to transform +the parsed input into a datastructure we can use for further processing. A simple example would be to transform parsed characters into ASCII numbers. Suppose we define a function \texttt{f} (from -characters to ints) and a \texttt{CharParser}. +characters to ints) and use a \texttt{CharParser} for the character \texttt{c}. \begin{center} \begin{lstlisting}[language=Scala,basicstyle=\small\ttfamily, numbers=none] @@ -553,7 +553,7 @@ \end{center} \noindent -Then +Then we can run the following parsers on the input \texttt{cbd}: \begin{center} \begin{lstlisting}[language=Scala,basicstyle=\small\ttfamily, numbers=none] @@ -563,8 +563,67 @@ \end{center} \noindent -the first line produces \texttt{Set(('c', "bd"))}, whereas the second -produces \texttt{Set((99, "bd"))}. +The first line we obtain the result \texttt{Set(('c', "bd"))}, whereas the second +produces \texttt{Set((99, "bd"))}---the character has been transformed into +an ASCII number. + +A slightly less contrived example is about parsing numbers (recall +\texttt{NumParser} above). However, we want to do this here for strings. +For this assume we have the following \texttt{RegexParser}. + +\begin{center} + \begin{lstlisting}[language=Scala,xleftmargin=0mm, + basicstyle=\small\ttfamily, numbers=none] +import scala.util.matching.Regex + +case class RegexParser(reg: Regex) extends Parser[String, String] { + def parse(in: String) = reg.findPrefixMatchOf(in) match { + case None => Set() + case Some(m) => Set((m.matched, m.after.toString)) + } +} +\end{lstlisting} +\end{center} + +\noindent +This parser takes a regex as argument and splits up a string into a +prefix and the rest according to this regex +(\texttt{reg.findPrefixMatchOf} generates a match---in the successful +case---and the corresponding strings can be extracted with +\texttt{matched} and \texttt{after}). We can now define a +\texttt{NumParser} for strings as follows: + +\begin{center} +\begin{lstlisting}[language=Scala,basicstyle=\small\ttfamily, numbers=none] +val NumParser = RegexParser("[0-9]+".r) +\end{lstlisting} +\end{center} + +\noindent +This parser will recognise a number at the beginning of a string, for +example + +\begin{center} +\begin{lstlisting}[language=Scala,basicstyle=\small\ttfamily, numbers=none] +NumParser.parse("123abc") +\end{lstlisting} +\end{center} + +\noindent +produces \texttt{Set((123,abc))}. The problem is that \texttt{123} is +still a string. We need to convert it into the corresponding +\texttt{Int}. We can do this as follows + +\begin{center} +\begin{lstlisting}[language=Scala,basicstyle=\small\ttfamily, numbers=none] +(NumParser ==> (s => s.toInt)).parse("123abc") +\end{lstlisting} +\end{center} + +\noindent +The semantic action in form of a function converts a string into an +\texttt{Int}. Let us come back to semantic actions when we are going +to implement a simple calculator. \subsubsection*{Shorthand notation for parser combinators} @@ -574,6 +633,47 @@ \subsubsection*{How to build parsers using parser combinators?} +The beauty of parser combinators is the ease with which they can be +implemented and how easy it is to translate context-free grammars into +code (though the grammars need to be non-left-recursive). To +demonstrate this recall our context-free grammar for palindromes: + +\begin{plstx}[margin=3cm] +: \meta{P} ::= a\cdot \meta{P}\cdot a | b\cdot \meta{P}\cdot b | a | b | \epsilon\\ +\end{plstx} + +\noindent +Given the parser combinators for alternatives and sequeneces, this grammar should be +straightforward to implement. The first idea would be + +\begin{center} +\begin{lstlisting}[language=Scala,basicstyle=\small\ttfamily, numbers=none] +lazy val Pal : Parser[String, String] = + (("a" ~ Pal ~ "a") | ("b" ~ Pal ~ "b") | "a" | "b" | "") +\end{lstlisting} +\end{center} + +\noindent +Unfortunately, this does not work as it produces a typing error. The +reason is that the parsers \texttt{"a"}, \texttt{"b"} and \texttt{""} +all produce strings and therefore can be put into an alternative +\texttt{...| "a" | "b" | ""}. But both \pcode{"a" ~ Pal ~ "a"} and +\pcode{"b" ~ Pal ~ "b"} produce pairs of the form +$(((\_, \_), \_), \_)$---that is how the sequence parser combinator +nests results when \pcode{\~} is used between two components. The +solution is to use a semantic action that ``flattens'' these pairs and +appends the corresponding strings, like + +\begin{center} +\begin{lstlisting}[language=Scala,basicstyle=\small\ttfamily, numbers=none] +lazy val Pal : Parser[String, String] = + (("a" ~ Pal ~ "a") ==> { case ((x, y), z) => x + y + z } | + ("b" ~ Pal ~ "b") ==> { case ((x, y), z) => x + y + z } | + "a" | "b" | "") +\end{lstlisting} +\end{center} + + \subsubsection*{Implementing an Interpreter} %\bigskip