handouts/ho06.tex
changeset 588 a4646557016d
parent 587 5ddedcd92d84
child 589 0451b8b67f62
--- a/handouts/ho06.tex	Wed Oct 24 20:37:37 2018 +0100
+++ b/handouts/ho06.tex	Thu Oct 25 00:50:58 2018 +0100
@@ -2,7 +2,7 @@
 \documentclass{article}
 \usepackage{../style}
 \usepackage{../langs}
-
+\usepackage{../grammar}
 
 \begin{document}
 
@@ -539,11 +539,11 @@
 notation for this parser combinator. Therefore we will write
 \texttt{p ==> f} for it.
 
-What are semantic actions good for? Ultimately, they allow is to
-transform the parsed input into a datastructure we can use for further
+What are semantic actions good for? Well, they allow is to transform
+the parsed input into a datastructure we can use for further
 processing. A simple example would be to transform parsed characters
 into ASCII numbers. Suppose we define a function \texttt{f} (from
-characters to ints) and a \texttt{CharParser}.
+characters to ints) and use a \texttt{CharParser} for the character \texttt{c}.
 
 \begin{center}
 \begin{lstlisting}[language=Scala,basicstyle=\small\ttfamily, numbers=none]
@@ -553,7 +553,7 @@
 \end{center}
 
 \noindent
-Then
+Then we can run the following parsers on the input \texttt{cbd}:
 
 \begin{center}
 \begin{lstlisting}[language=Scala,basicstyle=\small\ttfamily, numbers=none]
@@ -563,8 +563,67 @@
 \end{center}
 
 \noindent
-the first line produces \texttt{Set(('c', "bd"))}, whereas the second
-produces \texttt{Set((99, "bd"))}.
+The first line we obtain the result \texttt{Set(('c', "bd"))}, whereas the second
+produces \texttt{Set((99, "bd"))}---the character has been transformed into
+an ASCII number.
+
+A slightly less contrived example is about parsing numbers (recall
+\texttt{NumParser} above). However, we want to do this here for strings.
+For this assume we have the following \texttt{RegexParser}.
+
+\begin{center}
+  \begin{lstlisting}[language=Scala,xleftmargin=0mm,
+    basicstyle=\small\ttfamily, numbers=none]
+import scala.util.matching.Regex
+    
+case class RegexParser(reg: Regex) extends Parser[String, String] {
+  def parse(in: String) = reg.findPrefixMatchOf(in) match {
+    case None => Set()
+    case Some(m) => Set((m.matched, m.after.toString))  
+  }
+}
+\end{lstlisting}
+\end{center}
+
+\noindent
+This parser takes a regex as argument and splits up a string into a
+prefix and the rest according to this regex
+(\texttt{reg.findPrefixMatchOf} generates a match---in the successful
+case---and the corresponding strings can be extracted with
+\texttt{matched} and \texttt{after}). We can now define a
+\texttt{NumParser} for strings as follows:
+
+\begin{center}
+\begin{lstlisting}[language=Scala,basicstyle=\small\ttfamily, numbers=none]
+val NumParser = RegexParser("[0-9]+".r)
+\end{lstlisting}
+\end{center}
+
+\noindent
+This parser will recognise a number at the beginning of a string, for
+example
+
+\begin{center}
+\begin{lstlisting}[language=Scala,basicstyle=\small\ttfamily, numbers=none]
+NumParser.parse("123abc")
+\end{lstlisting}
+\end{center}  
+
+\noindent
+produces \texttt{Set((123,abc))}. The problem is that \texttt{123} is
+still a string. We need to convert it into the corresponding
+\texttt{Int}. We can do this as follows
+
+\begin{center}
+\begin{lstlisting}[language=Scala,basicstyle=\small\ttfamily, numbers=none]
+(NumParser ==> (s => s.toInt)).parse("123abc")
+\end{lstlisting}
+\end{center}  
+
+\noindent
+The semantic action in form of a function converts a string into an
+\texttt{Int}. Let us come back to semantic actions when we are going
+to implement a simple calculator.
 
 \subsubsection*{Shorthand notation for parser combinators}
 
@@ -574,6 +633,47 @@
 
 \subsubsection*{How to build parsers using parser combinators?}
 
+The beauty of parser combinators is the ease with which they can be
+implemented and how easy it is to translate context-free grammars into
+code (though the grammars need to be non-left-recursive). To
+demonstrate this recall our context-free grammar for palindromes:
+
+\begin{plstx}[margin=3cm]
+: \meta{P} ::=  a\cdot \meta{P}\cdot a | b\cdot \meta{P}\cdot b | a | b | \epsilon\\
+\end{plstx}
+
+\noindent
+Given the parser combinators for alternatives and sequeneces, this grammar should be
+straightforward to implement. The first idea would be
+
+\begin{center}
+\begin{lstlisting}[language=Scala,basicstyle=\small\ttfamily, numbers=none]
+lazy val Pal : Parser[String, String] = 
+  (("a" ~ Pal ~ "a") | ("b" ~ Pal ~ "b") | "a" | "b" | "")
+\end{lstlisting}
+\end{center}
+
+\noindent
+Unfortunately, this does not work as it produces a typing error. The
+reason is that the parsers \texttt{"a"}, \texttt{"b"} and \texttt{""}
+all produce strings and therefore can be put into an alternative
+\texttt{...| "a" | "b" | ""}. But both \pcode{"a" ~ Pal ~ "a"} and
+\pcode{"b" ~ Pal ~ "b"} produce pairs of the form
+$(((\_, \_), \_), \_)$---that is how the sequence parser combinator
+nests results when \pcode{\~} is used between two components. The
+solution is to use a semantic action that ``flattens'' these pairs and
+appends the corresponding strings, like
+
+\begin{center}
+\begin{lstlisting}[language=Scala,basicstyle=\small\ttfamily, numbers=none]
+lazy val Pal : Parser[String, String] =  
+  (("a" ~ Pal ~ "a") ==> { case ((x, y), z) => x + y + z } |
+   ("b" ~ Pal ~ "b") ==> { case ((x, y), z) => x + y + z } |
+    "a" | "b" | "")
+\end{lstlisting}
+\end{center}
+
+
 \subsubsection*{Implementing an Interpreter}
 
 %\bigskip