updated
authorChristian Urban <urbanc@in.tum.de>
Thu, 25 Oct 2018 00:50:58 +0100
changeset 588 a4646557016d
parent 587 5ddedcd92d84
child 589 0451b8b67f62
updated
handouts/ho06.pdf
handouts/ho06.tex
langs.sty
progs/comb1.scala
slides/slides05.tex
Binary file handouts/ho06.pdf has changed
--- a/handouts/ho06.tex	Wed Oct 24 20:37:37 2018 +0100
+++ b/handouts/ho06.tex	Thu Oct 25 00:50:58 2018 +0100
@@ -2,7 +2,7 @@
 \documentclass{article}
 \usepackage{../style}
 \usepackage{../langs}
-
+\usepackage{../grammar}
 
 \begin{document}
 
@@ -539,11 +539,11 @@
 notation for this parser combinator. Therefore we will write
 \texttt{p ==> f} for it.
 
-What are semantic actions good for? Ultimately, they allow is to
-transform the parsed input into a datastructure we can use for further
+What are semantic actions good for? Well, they allow is to transform
+the parsed input into a datastructure we can use for further
 processing. A simple example would be to transform parsed characters
 into ASCII numbers. Suppose we define a function \texttt{f} (from
-characters to ints) and a \texttt{CharParser}.
+characters to ints) and use a \texttt{CharParser} for the character \texttt{c}.
 
 \begin{center}
 \begin{lstlisting}[language=Scala,basicstyle=\small\ttfamily, numbers=none]
@@ -553,7 +553,7 @@
 \end{center}
 
 \noindent
-Then
+Then we can run the following parsers on the input \texttt{cbd}:
 
 \begin{center}
 \begin{lstlisting}[language=Scala,basicstyle=\small\ttfamily, numbers=none]
@@ -563,8 +563,67 @@
 \end{center}
 
 \noindent
-the first line produces \texttt{Set(('c', "bd"))}, whereas the second
-produces \texttt{Set((99, "bd"))}.
+The first line we obtain the result \texttt{Set(('c', "bd"))}, whereas the second
+produces \texttt{Set((99, "bd"))}---the character has been transformed into
+an ASCII number.
+
+A slightly less contrived example is about parsing numbers (recall
+\texttt{NumParser} above). However, we want to do this here for strings.
+For this assume we have the following \texttt{RegexParser}.
+
+\begin{center}
+  \begin{lstlisting}[language=Scala,xleftmargin=0mm,
+    basicstyle=\small\ttfamily, numbers=none]
+import scala.util.matching.Regex
+    
+case class RegexParser(reg: Regex) extends Parser[String, String] {
+  def parse(in: String) = reg.findPrefixMatchOf(in) match {
+    case None => Set()
+    case Some(m) => Set((m.matched, m.after.toString))  
+  }
+}
+\end{lstlisting}
+\end{center}
+
+\noindent
+This parser takes a regex as argument and splits up a string into a
+prefix and the rest according to this regex
+(\texttt{reg.findPrefixMatchOf} generates a match---in the successful
+case---and the corresponding strings can be extracted with
+\texttt{matched} and \texttt{after}). We can now define a
+\texttt{NumParser} for strings as follows:
+
+\begin{center}
+\begin{lstlisting}[language=Scala,basicstyle=\small\ttfamily, numbers=none]
+val NumParser = RegexParser("[0-9]+".r)
+\end{lstlisting}
+\end{center}
+
+\noindent
+This parser will recognise a number at the beginning of a string, for
+example
+
+\begin{center}
+\begin{lstlisting}[language=Scala,basicstyle=\small\ttfamily, numbers=none]
+NumParser.parse("123abc")
+\end{lstlisting}
+\end{center}  
+
+\noindent
+produces \texttt{Set((123,abc))}. The problem is that \texttt{123} is
+still a string. We need to convert it into the corresponding
+\texttt{Int}. We can do this as follows
+
+\begin{center}
+\begin{lstlisting}[language=Scala,basicstyle=\small\ttfamily, numbers=none]
+(NumParser ==> (s => s.toInt)).parse("123abc")
+\end{lstlisting}
+\end{center}  
+
+\noindent
+The semantic action in form of a function converts a string into an
+\texttt{Int}. Let us come back to semantic actions when we are going
+to implement a simple calculator.
 
 \subsubsection*{Shorthand notation for parser combinators}
 
@@ -574,6 +633,47 @@
 
 \subsubsection*{How to build parsers using parser combinators?}
 
+The beauty of parser combinators is the ease with which they can be
+implemented and how easy it is to translate context-free grammars into
+code (though the grammars need to be non-left-recursive). To
+demonstrate this recall our context-free grammar for palindromes:
+
+\begin{plstx}[margin=3cm]
+: \meta{P} ::=  a\cdot \meta{P}\cdot a | b\cdot \meta{P}\cdot b | a | b | \epsilon\\
+\end{plstx}
+
+\noindent
+Given the parser combinators for alternatives and sequeneces, this grammar should be
+straightforward to implement. The first idea would be
+
+\begin{center}
+\begin{lstlisting}[language=Scala,basicstyle=\small\ttfamily, numbers=none]
+lazy val Pal : Parser[String, String] = 
+  (("a" ~ Pal ~ "a") | ("b" ~ Pal ~ "b") | "a" | "b" | "")
+\end{lstlisting}
+\end{center}
+
+\noindent
+Unfortunately, this does not work as it produces a typing error. The
+reason is that the parsers \texttt{"a"}, \texttt{"b"} and \texttt{""}
+all produce strings and therefore can be put into an alternative
+\texttt{...| "a" | "b" | ""}. But both \pcode{"a" ~ Pal ~ "a"} and
+\pcode{"b" ~ Pal ~ "b"} produce pairs of the form
+$(((\_, \_), \_), \_)$---that is how the sequence parser combinator
+nests results when \pcode{\~} is used between two components. The
+solution is to use a semantic action that ``flattens'' these pairs and
+appends the corresponding strings, like
+
+\begin{center}
+\begin{lstlisting}[language=Scala,basicstyle=\small\ttfamily, numbers=none]
+lazy val Pal : Parser[String, String] =  
+  (("a" ~ Pal ~ "a") ==> { case ((x, y), z) => x + y + z } |
+   ("b" ~ Pal ~ "b") ==> { case ((x, y), z) => x + y + z } |
+    "a" | "b" | "")
+\end{lstlisting}
+\end{center}
+
+
 \subsubsection*{Implementing an Interpreter}
 
 %\bigskip
--- a/langs.sty	Wed Oct 24 20:37:37 2018 +0100
+++ b/langs.sty	Thu Oct 25 00:50:58 2018 +0100
@@ -19,8 +19,8 @@
     new,null,object,override,package,%
     private,protected,requires,return,sealed,%
     super,this,throw,trait,true,try,%
-    type,val,var,while,with,yield,write,read},%
-  otherkeywords={=>,<-,<\%,<:,>:,\#},%
+    type,val,var,while,with,yield,write,read,lazy},%
+  otherkeywords={=>,<-,<\%,<:,>:,\#,==>},%
   sensitive=true,%
   %directives={Int,Char,Rexp,String,Boolean,BigInt,Unit,List,Set},%
   %moredelim=*[directive]:,%
--- a/progs/comb1.scala	Wed Oct 24 20:37:37 2018 +0100
+++ b/progs/comb1.scala	Thu Oct 25 00:50:58 2018 +0100
@@ -37,22 +37,17 @@
     if (sb != "" && sb.head == c) Set((c, sb.tail)) else Set()
 }
 
-case class StringParser(s: String) extends Parser[String, String] {
-  def parse(sb: String) = {
-    val (prefix, suffix) = sb.splitAt(s.length)
-    if (prefix == s) Set((prefix, suffix)) else Set()
+import scala.util.matching.Regex
+case class RegexParser(reg: Regex) extends Parser[String, String] {
+  def parse(sb: String) = reg.findPrefixMatchOf(sb) match {
+    case None => Set()
+    case Some(m) => Set((m.matched, m.after.toString))  
   }
 }
 
-case object NumParser extends Parser[String, Int] {
-  val reg = "[0-9]+".r
-  def parse(sb: String) = reg.findPrefixOf(sb) match {
-    case None => Set()
-    case Some(s) => Set(sb.splitAt(s.length) match {
-      case (x, y) => (x.toInt, y)
-    }) 
-  }
-}
+val NumParser = RegexParser("[0-9]+".r)
+def StringParser(s: String) = RegexParser(s.r)
+
 
 // convenience
 implicit def string2parser(s: String) = StringParser(s)
@@ -74,6 +69,13 @@
     new SeqParser[String, String, String](s, r)
 }
 
+val c = new CharParser('c')
+lazy val cn = c ==> (c => c.toInt)
+val f = (c: Char) => c.toInt
+
+c.parse("cb")
+(c ==> f).parse("cb")
+
 // a parse palindromes
 lazy val Pal : Parser[String, String] = 
   (("a" ~ Pal ~ "a") ==> { case ((x, y), z) => x + y + z } ||
--- a/slides/slides05.tex	Wed Oct 24 20:37:37 2018 +0100
+++ b/slides/slides05.tex	Thu Oct 25 00:50:58 2018 +0100
@@ -598,15 +598,17 @@
 A grammar for palindromes over the alphabet~\bl{$\{a,b\}$}:
 
 \bl{\begin{plstx}[margin=3cm]
-: \meta{S} ::= \epsilon\\
 : \meta{S} ::= a\cdot\meta{S}\cdot a\\
 : \meta{S} ::= b\cdot\meta{S}\cdot b\\
+: \meta{S} ::= a\\
+: \meta{S} ::= b\\
+: \meta{S} ::= \epsilon\\
 \end{plstx}}\pause
 
 or
 
 \bl{\begin{plstx}[margin=2cm]
-: \meta{S} ::=  \epsilon | a\cdot \meta{S}\cdot a | b\cdot \meta{S}\cdot b \\
+: \meta{S} ::=  a\cdot \meta{S}\cdot a | b\cdot \meta{S}\cdot b | a | b | \epsilon\\
 \end{plstx}}\pause\bigskip
 
 \small