handouts/ho06.tex
changeset 588 a4646557016d
parent 587 5ddedcd92d84
child 589 0451b8b67f62
equal deleted inserted replaced
587:5ddedcd92d84 588:a4646557016d
     1 
     1 
     2 \documentclass{article}
     2 \documentclass{article}
     3 \usepackage{../style}
     3 \usepackage{../style}
     4 \usepackage{../langs}
     4 \usepackage{../langs}
     5 
     5 \usepackage{../grammar}
     6 
     6 
     7 \begin{document}
     7 \begin{document}
     8 
     8 
     9 \section*{Handout 6 (Parser Combinators)}
     9 \section*{Handout 6 (Parser Combinators)}
    10 
    10 
   537 type \texttt{T => S}, we obtain a parser with output type
   537 type \texttt{T => S}, we obtain a parser with output type
   538 \texttt{S}. Again Scala lets us introduce some shorthand
   538 \texttt{S}. Again Scala lets us introduce some shorthand
   539 notation for this parser combinator. Therefore we will write
   539 notation for this parser combinator. Therefore we will write
   540 \texttt{p ==> f} for it.
   540 \texttt{p ==> f} for it.
   541 
   541 
   542 What are semantic actions good for? Ultimately, they allow is to
   542 What are semantic actions good for? Well, they allow is to transform
   543 transform the parsed input into a datastructure we can use for further
   543 the parsed input into a datastructure we can use for further
   544 processing. A simple example would be to transform parsed characters
   544 processing. A simple example would be to transform parsed characters
   545 into ASCII numbers. Suppose we define a function \texttt{f} (from
   545 into ASCII numbers. Suppose we define a function \texttt{f} (from
   546 characters to ints) and a \texttt{CharParser}.
   546 characters to ints) and use a \texttt{CharParser} for the character \texttt{c}.
   547 
   547 
   548 \begin{center}
   548 \begin{center}
   549 \begin{lstlisting}[language=Scala,basicstyle=\small\ttfamily, numbers=none]
   549 \begin{lstlisting}[language=Scala,basicstyle=\small\ttfamily, numbers=none]
   550 val f = (c: Char) => c.toInt
   550 val f = (c: Char) => c.toInt
   551 val c = new CharParser('c')
   551 val c = new CharParser('c')
   552 \end{lstlisting}
   552 \end{lstlisting}
   553 \end{center}
   553 \end{center}
   554 
   554 
   555 \noindent
   555 \noindent
   556 Then
   556 Then we can run the following parsers on the input \texttt{cbd}:
   557 
   557 
   558 \begin{center}
   558 \begin{center}
   559 \begin{lstlisting}[language=Scala,basicstyle=\small\ttfamily, numbers=none]
   559 \begin{lstlisting}[language=Scala,basicstyle=\small\ttfamily, numbers=none]
   560 c.parse("cbd")
   560 c.parse("cbd")
   561 (c ==> f).parse("cbd")
   561 (c ==> f).parse("cbd")
   562 \end{lstlisting}
   562 \end{lstlisting}
   563 \end{center}
   563 \end{center}
   564 
   564 
   565 \noindent
   565 \noindent
   566 the first line produces \texttt{Set(('c', "bd"))}, whereas the second
   566 The first line we obtain the result \texttt{Set(('c', "bd"))}, whereas the second
   567 produces \texttt{Set((99, "bd"))}.
   567 produces \texttt{Set((99, "bd"))}---the character has been transformed into
       
   568 an ASCII number.
       
   569 
       
   570 A slightly less contrived example is about parsing numbers (recall
       
   571 \texttt{NumParser} above). However, we want to do this here for strings.
       
   572 For this assume we have the following \texttt{RegexParser}.
       
   573 
       
   574 \begin{center}
       
   575   \begin{lstlisting}[language=Scala,xleftmargin=0mm,
       
   576     basicstyle=\small\ttfamily, numbers=none]
       
   577 import scala.util.matching.Regex
       
   578     
       
   579 case class RegexParser(reg: Regex) extends Parser[String, String] {
       
   580   def parse(in: String) = reg.findPrefixMatchOf(in) match {
       
   581     case None => Set()
       
   582     case Some(m) => Set((m.matched, m.after.toString))  
       
   583   }
       
   584 }
       
   585 \end{lstlisting}
       
   586 \end{center}
       
   587 
       
   588 \noindent
       
   589 This parser takes a regex as argument and splits up a string into a
       
   590 prefix and the rest according to this regex
       
   591 (\texttt{reg.findPrefixMatchOf} generates a match---in the successful
       
   592 case---and the corresponding strings can be extracted with
       
   593 \texttt{matched} and \texttt{after}). We can now define a
       
   594 \texttt{NumParser} for strings as follows:
       
   595 
       
   596 \begin{center}
       
   597 \begin{lstlisting}[language=Scala,basicstyle=\small\ttfamily, numbers=none]
       
   598 val NumParser = RegexParser("[0-9]+".r)
       
   599 \end{lstlisting}
       
   600 \end{center}
       
   601 
       
   602 \noindent
       
   603 This parser will recognise a number at the beginning of a string, for
       
   604 example
       
   605 
       
   606 \begin{center}
       
   607 \begin{lstlisting}[language=Scala,basicstyle=\small\ttfamily, numbers=none]
       
   608 NumParser.parse("123abc")
       
   609 \end{lstlisting}
       
   610 \end{center}  
       
   611 
       
   612 \noindent
       
   613 produces \texttt{Set((123,abc))}. The problem is that \texttt{123} is
       
   614 still a string. We need to convert it into the corresponding
       
   615 \texttt{Int}. We can do this as follows
       
   616 
       
   617 \begin{center}
       
   618 \begin{lstlisting}[language=Scala,basicstyle=\small\ttfamily, numbers=none]
       
   619 (NumParser ==> (s => s.toInt)).parse("123abc")
       
   620 \end{lstlisting}
       
   621 \end{center}  
       
   622 
       
   623 \noindent
       
   624 The semantic action in form of a function converts a string into an
       
   625 \texttt{Int}. Let us come back to semantic actions when we are going
       
   626 to implement a simple calculator.
   568 
   627 
   569 \subsubsection*{Shorthand notation for parser combinators}
   628 \subsubsection*{Shorthand notation for parser combinators}
   570 
   629 
   571 Before we proceed, let us just explain the shorthand notation for
   630 Before we proceed, let us just explain the shorthand notation for
   572 parser combinators. Like for regular expressions, the shorthand notation
   631 parser combinators. Like for regular expressions, the shorthand notation
   573 will make our life much easier when writing actual parsers.
   632 will make our life much easier when writing actual parsers.
   574 
   633 
   575 \subsubsection*{How to build parsers using parser combinators?}
   634 \subsubsection*{How to build parsers using parser combinators?}
       
   635 
       
   636 The beauty of parser combinators is the ease with which they can be
       
   637 implemented and how easy it is to translate context-free grammars into
       
   638 code (though the grammars need to be non-left-recursive). To
       
   639 demonstrate this recall our context-free grammar for palindromes:
       
   640 
       
   641 \begin{plstx}[margin=3cm]
       
   642 : \meta{P} ::=  a\cdot \meta{P}\cdot a | b\cdot \meta{P}\cdot b | a | b | \epsilon\\
       
   643 \end{plstx}
       
   644 
       
   645 \noindent
       
   646 Given the parser combinators for alternatives and sequeneces, this grammar should be
       
   647 straightforward to implement. The first idea would be
       
   648 
       
   649 \begin{center}
       
   650 \begin{lstlisting}[language=Scala,basicstyle=\small\ttfamily, numbers=none]
       
   651 lazy val Pal : Parser[String, String] = 
       
   652   (("a" ~ Pal ~ "a") | ("b" ~ Pal ~ "b") | "a" | "b" | "")
       
   653 \end{lstlisting}
       
   654 \end{center}
       
   655 
       
   656 \noindent
       
   657 Unfortunately, this does not work as it produces a typing error. The
       
   658 reason is that the parsers \texttt{"a"}, \texttt{"b"} and \texttt{""}
       
   659 all produce strings and therefore can be put into an alternative
       
   660 \texttt{...| "a" | "b" | ""}. But both \pcode{"a" ~ Pal ~ "a"} and
       
   661 \pcode{"b" ~ Pal ~ "b"} produce pairs of the form
       
   662 $(((\_, \_), \_), \_)$---that is how the sequence parser combinator
       
   663 nests results when \pcode{\~} is used between two components. The
       
   664 solution is to use a semantic action that ``flattens'' these pairs and
       
   665 appends the corresponding strings, like
       
   666 
       
   667 \begin{center}
       
   668 \begin{lstlisting}[language=Scala,basicstyle=\small\ttfamily, numbers=none]
       
   669 lazy val Pal : Parser[String, String] =  
       
   670   (("a" ~ Pal ~ "a") ==> { case ((x, y), z) => x + y + z } |
       
   671    ("b" ~ Pal ~ "b") ==> { case ((x, y), z) => x + y + z } |
       
   672     "a" | "b" | "")
       
   673 \end{lstlisting}
       
   674 \end{center}
       
   675 
   576 
   676 
   577 \subsubsection*{Implementing an Interpreter}
   677 \subsubsection*{Implementing an Interpreter}
   578 
   678 
   579 %\bigskip
   679 %\bigskip
   580 %takes advantage of the full generality---have a look
   680 %takes advantage of the full generality---have a look