537 type \texttt{T => S}, we obtain a parser with output type |
537 type \texttt{T => S}, we obtain a parser with output type |
538 \texttt{S}. Again Scala lets us introduce some shorthand |
538 \texttt{S}. Again Scala lets us introduce some shorthand |
539 notation for this parser combinator. Therefore we will write |
539 notation for this parser combinator. Therefore we will write |
540 \texttt{p ==> f} for it. |
540 \texttt{p ==> f} for it. |
541 |
541 |
542 What are semantic actions good for? Ultimately, they allow is to |
542 What are semantic actions good for? Well, they allow is to transform |
543 transform the parsed input into a datastructure we can use for further |
543 the parsed input into a datastructure we can use for further |
544 processing. A simple example would be to transform parsed characters |
544 processing. A simple example would be to transform parsed characters |
545 into ASCII numbers. Suppose we define a function \texttt{f} (from |
545 into ASCII numbers. Suppose we define a function \texttt{f} (from |
546 characters to ints) and a \texttt{CharParser}. |
546 characters to ints) and use a \texttt{CharParser} for the character \texttt{c}. |
547 |
547 |
548 \begin{center} |
548 \begin{center} |
549 \begin{lstlisting}[language=Scala,basicstyle=\small\ttfamily, numbers=none] |
549 \begin{lstlisting}[language=Scala,basicstyle=\small\ttfamily, numbers=none] |
550 val f = (c: Char) => c.toInt |
550 val f = (c: Char) => c.toInt |
551 val c = new CharParser('c') |
551 val c = new CharParser('c') |
552 \end{lstlisting} |
552 \end{lstlisting} |
553 \end{center} |
553 \end{center} |
554 |
554 |
555 \noindent |
555 \noindent |
556 Then |
556 Then we can run the following parsers on the input \texttt{cbd}: |
557 |
557 |
558 \begin{center} |
558 \begin{center} |
559 \begin{lstlisting}[language=Scala,basicstyle=\small\ttfamily, numbers=none] |
559 \begin{lstlisting}[language=Scala,basicstyle=\small\ttfamily, numbers=none] |
560 c.parse("cbd") |
560 c.parse("cbd") |
561 (c ==> f).parse("cbd") |
561 (c ==> f).parse("cbd") |
562 \end{lstlisting} |
562 \end{lstlisting} |
563 \end{center} |
563 \end{center} |
564 |
564 |
565 \noindent |
565 \noindent |
566 the first line produces \texttt{Set(('c', "bd"))}, whereas the second |
566 The first line we obtain the result \texttt{Set(('c', "bd"))}, whereas the second |
567 produces \texttt{Set((99, "bd"))}. |
567 produces \texttt{Set((99, "bd"))}---the character has been transformed into |
|
568 an ASCII number. |
|
569 |
|
570 A slightly less contrived example is about parsing numbers (recall |
|
571 \texttt{NumParser} above). However, we want to do this here for strings. |
|
572 For this assume we have the following \texttt{RegexParser}. |
|
573 |
|
574 \begin{center} |
|
575 \begin{lstlisting}[language=Scala,xleftmargin=0mm, |
|
576 basicstyle=\small\ttfamily, numbers=none] |
|
577 import scala.util.matching.Regex |
|
578 |
|
579 case class RegexParser(reg: Regex) extends Parser[String, String] { |
|
580 def parse(in: String) = reg.findPrefixMatchOf(in) match { |
|
581 case None => Set() |
|
582 case Some(m) => Set((m.matched, m.after.toString)) |
|
583 } |
|
584 } |
|
585 \end{lstlisting} |
|
586 \end{center} |
|
587 |
|
588 \noindent |
|
589 This parser takes a regex as argument and splits up a string into a |
|
590 prefix and the rest according to this regex |
|
591 (\texttt{reg.findPrefixMatchOf} generates a match---in the successful |
|
592 case---and the corresponding strings can be extracted with |
|
593 \texttt{matched} and \texttt{after}). We can now define a |
|
594 \texttt{NumParser} for strings as follows: |
|
595 |
|
596 \begin{center} |
|
597 \begin{lstlisting}[language=Scala,basicstyle=\small\ttfamily, numbers=none] |
|
598 val NumParser = RegexParser("[0-9]+".r) |
|
599 \end{lstlisting} |
|
600 \end{center} |
|
601 |
|
602 \noindent |
|
603 This parser will recognise a number at the beginning of a string, for |
|
604 example |
|
605 |
|
606 \begin{center} |
|
607 \begin{lstlisting}[language=Scala,basicstyle=\small\ttfamily, numbers=none] |
|
608 NumParser.parse("123abc") |
|
609 \end{lstlisting} |
|
610 \end{center} |
|
611 |
|
612 \noindent |
|
613 produces \texttt{Set((123,abc))}. The problem is that \texttt{123} is |
|
614 still a string. We need to convert it into the corresponding |
|
615 \texttt{Int}. We can do this as follows |
|
616 |
|
617 \begin{center} |
|
618 \begin{lstlisting}[language=Scala,basicstyle=\small\ttfamily, numbers=none] |
|
619 (NumParser ==> (s => s.toInt)).parse("123abc") |
|
620 \end{lstlisting} |
|
621 \end{center} |
|
622 |
|
623 \noindent |
|
624 The semantic action in form of a function converts a string into an |
|
625 \texttt{Int}. Let us come back to semantic actions when we are going |
|
626 to implement a simple calculator. |
568 |
627 |
569 \subsubsection*{Shorthand notation for parser combinators} |
628 \subsubsection*{Shorthand notation for parser combinators} |
570 |
629 |
571 Before we proceed, let us just explain the shorthand notation for |
630 Before we proceed, let us just explain the shorthand notation for |
572 parser combinators. Like for regular expressions, the shorthand notation |
631 parser combinators. Like for regular expressions, the shorthand notation |
573 will make our life much easier when writing actual parsers. |
632 will make our life much easier when writing actual parsers. |
574 |
633 |
575 \subsubsection*{How to build parsers using parser combinators?} |
634 \subsubsection*{How to build parsers using parser combinators?} |
|
635 |
|
636 The beauty of parser combinators is the ease with which they can be |
|
637 implemented and how easy it is to translate context-free grammars into |
|
638 code (though the grammars need to be non-left-recursive). To |
|
639 demonstrate this recall our context-free grammar for palindromes: |
|
640 |
|
641 \begin{plstx}[margin=3cm] |
|
642 : \meta{P} ::= a\cdot \meta{P}\cdot a | b\cdot \meta{P}\cdot b | a | b | \epsilon\\ |
|
643 \end{plstx} |
|
644 |
|
645 \noindent |
|
646 Given the parser combinators for alternatives and sequeneces, this grammar should be |
|
647 straightforward to implement. The first idea would be |
|
648 |
|
649 \begin{center} |
|
650 \begin{lstlisting}[language=Scala,basicstyle=\small\ttfamily, numbers=none] |
|
651 lazy val Pal : Parser[String, String] = |
|
652 (("a" ~ Pal ~ "a") | ("b" ~ Pal ~ "b") | "a" | "b" | "") |
|
653 \end{lstlisting} |
|
654 \end{center} |
|
655 |
|
656 \noindent |
|
657 Unfortunately, this does not work as it produces a typing error. The |
|
658 reason is that the parsers \texttt{"a"}, \texttt{"b"} and \texttt{""} |
|
659 all produce strings and therefore can be put into an alternative |
|
660 \texttt{...| "a" | "b" | ""}. But both \pcode{"a" ~ Pal ~ "a"} and |
|
661 \pcode{"b" ~ Pal ~ "b"} produce pairs of the form |
|
662 $(((\_, \_), \_), \_)$---that is how the sequence parser combinator |
|
663 nests results when \pcode{\~} is used between two components. The |
|
664 solution is to use a semantic action that ``flattens'' these pairs and |
|
665 appends the corresponding strings, like |
|
666 |
|
667 \begin{center} |
|
668 \begin{lstlisting}[language=Scala,basicstyle=\small\ttfamily, numbers=none] |
|
669 lazy val Pal : Parser[String, String] = |
|
670 (("a" ~ Pal ~ "a") ==> { case ((x, y), z) => x + y + z } | |
|
671 ("b" ~ Pal ~ "b") ==> { case ((x, y), z) => x + y + z } | |
|
672 "a" | "b" | "") |
|
673 \end{lstlisting} |
|
674 \end{center} |
|
675 |
576 |
676 |
577 \subsubsection*{Implementing an Interpreter} |
677 \subsubsection*{Implementing an Interpreter} |
578 |
678 |
579 %\bigskip |
679 %\bigskip |
580 %takes advantage of the full generality---have a look |
680 %takes advantage of the full generality---have a look |