783 |
783 |
784 |
784 |
785 |
785 |
786 \subsubsection*{Implementing an Interpreter} |
786 \subsubsection*{Implementing an Interpreter} |
787 |
787 |
788 %\bigskip |
788 The first step before implementing an interpreter for fullblown |
789 %takes advantage of the full generality---have a look |
789 language is to implement a simple calculator for arithmetic |
790 %what it produces if we call it with the string \texttt{abc} |
790 expressions. Suppose our arithmetic expressions are given by the |
791 % |
791 grammar: |
792 %\begin{center} |
792 |
793 %\begin{tabular}{rcl} |
793 \begin{plstx}[margin=3cm,one per line] |
794 %input string & & output\medskip\\ |
794 : \meta{E} ::= |
795 %\texttt{\Grid{abc}} & $\rightarrow$ & $\left\{((\texttt{\Grid{a}}, \texttt{\Grid{b}}), \texttt{\Grid{c}})\right\}$\\ |
795 | \meta{E} \cdot + \cdot \meta{E} |
796 %\texttt{\Grid{bbc}} & $\rightarrow$ & $\left\{((\texttt{\Grid{b}}, \texttt{\Grid{b}}), \texttt{\Grid{c}})\right\}$\\ |
796 | \meta{E} \cdot - \cdot \meta{E} |
797 %\texttt{\Grid{aac}} & $\rightarrow$ & $\varnothing$ |
797 | \meta{E} \cdot * \cdot \meta{E} |
798 %\end{tabular} |
798 | ( \cdot \meta{E} \cdot ) |
799 %\end{center} |
799 | Number \\ |
|
800 \end{plstx} |
|
801 |
|
802 \noindent |
|
803 Naturally we want to implement the grammar in such a way that we can |
|
804 calculate what the result of \texttt{4*2+3} is---we are interested in |
|
805 an \texttt{Int} rather than a string. This means every component |
|
806 parser needs to have as output type \texttt{Int} and when we assemble |
|
807 the intermediate results, strings like \texttt{"+"}, \texttt{"*"} and |
|
808 so on, need to be translated into the appropriate Scala operation. |
|
809 Being inspired by the parser for well-nested parentheses and ignoring |
|
810 the fact that we want $*$ to take precedence, we might write something |
|
811 like |
|
812 |
|
813 \begin{center} |
|
814 \begin{lstlisting}[language=Scala,basicstyle=\small\ttfamily, numbers=none] |
|
815 lazy val E: Parser[String, Int] = |
|
816 (E ~ "+" ~ E ==> { case ((x, y), z) => x + z} | |
|
817 E ~ "-" ~ E ==> { case ((x, y), z) => x - z} | |
|
818 E ~ "*" ~ E ==> { case ((x, y), z) => x * z} | |
|
819 "(" ~ E ~ ")" ==> { case ((x, y), z) => y} | |
|
820 NumParserInt) |
|
821 \end{lstlisting} |
|
822 \end{center} |
|
823 |
|
824 \noindent |
|
825 Consider again carfully how the semantic actions pick out the correct |
|
826 arguments. In case of plus, we need \texttt{x} and \texttt{z}, because |
|
827 they correspond to the results of the parser \texttt{E}. We can just |
|
828 add \texttt{x + z} in order to obtain \texttt{Int} because the output |
|
829 type of \texttt{E} is \texttt{Int}. Similarly with subtraction and |
|
830 multiplication. In contrast in the fourth clause we need to return |
|
831 \texttt{y}, because it is the result enclosed inside the parentheses. |
|
832 |
|
833 So far so good. The problem arises when we try to call \pcode{parse_all} with the |
|
834 expression \texttt{"1+2+3"}. Lets try it |
|
835 |
|
836 \begin{center} |
|
837 \begin{lstlisting}[language=Scala,basicstyle=\small\ttfamily, numbers=none] |
|
838 E.parse_all("1+2+3") |
|
839 \end{lstlisting} |
|
840 \end{center} |
|
841 |
|
842 \noindent |
|
843 \ldots and we wait and wait and \ldots wait. What is the problem? Actually, |
|
844 the parser just fell into an infinite loop. The reason is that the above |
|
845 grammar is left-recursive and recall that parser combinator cannot deal with |
|
846 such grammars. Luckily every left-recursive context-free grammar can be |
|
847 transformed into a non-left-recursive grammars that still recognise the |
|
848 same strings. This allows us to design the following grammar |
|
849 |
|
850 |
|
851 |
|
852 |
|
853 |
800 |
854 |
801 |
855 |
802 |
856 |
803 \end{document} |
857 \end{document} |
804 |
858 |