handouts/ho06.tex
changeset 592 0d309fafa9f0
parent 591 863e502f6a5c
child 593 bb24d4e207b6
equal deleted inserted replaced
591:863e502f6a5c 592:0d309fafa9f0
   783 
   783 
   784 
   784 
   785 
   785 
   786 \subsubsection*{Implementing an Interpreter}
   786 \subsubsection*{Implementing an Interpreter}
   787 
   787 
   788 %\bigskip
   788 The first step before implementing an interpreter for fullblown
   789 %takes advantage of the full generality---have a look
   789 language is to implement a simple calculator for arithmetic
   790 %what it produces if we call it with the string \texttt{abc}
   790 expressions. Suppose our arithmetic expressions are given by the
   791 %
   791 grammar:
   792 %\begin{center}
   792 
   793 %\begin{tabular}{rcl}
   793 \begin{plstx}[margin=3cm,one per line]
   794 %input string & & output\medskip\\
   794 : \meta{E} ::= 
   795 %\texttt{\Grid{abc}} & $\rightarrow$ & $\left\{((\texttt{\Grid{a}}, \texttt{\Grid{b}}), \texttt{\Grid{c}})\right\}$\\
   795    | \meta{E} \cdot + \cdot \meta{E} 
   796 %\texttt{\Grid{bbc}} & $\rightarrow$ & $\left\{((\texttt{\Grid{b}}, \texttt{\Grid{b}}), \texttt{\Grid{c}})\right\}$\\
   796    | \meta{E} \cdot - \cdot \meta{E} 
   797 %\texttt{\Grid{aac}} & $\rightarrow$ & $\varnothing$
   797    | \meta{E} \cdot * \cdot \meta{E} 
   798 %\end{tabular}
   798    | ( \cdot \meta{E} \cdot )
   799 %\end{center}
   799    | Number \\
       
   800 \end{plstx}
       
   801 
       
   802 \noindent
       
   803 Naturally we want to implement the grammar in such a way that we can
       
   804 calculate what the result of \texttt{4*2+3} is---we are interested in
       
   805 an \texttt{Int} rather than a string. This means every component
       
   806 parser needs to have as output type \texttt{Int} and when we assemble
       
   807 the intermediate results, strings like \texttt{"+"}, \texttt{"*"} and
       
   808 so on, need to be translated into the appropriate Scala operation.
       
   809 Being inspired by the parser for well-nested parentheses and ignoring
       
   810 the fact that we want $*$ to take precedence, we might write something
       
   811 like
       
   812 
       
   813 \begin{center}
       
   814 \begin{lstlisting}[language=Scala,basicstyle=\small\ttfamily, numbers=none]
       
   815 lazy val E: Parser[String, Int] = 
       
   816   (E ~ "+" ~ E ==> { case ((x, y), z) => x + z} |
       
   817    E ~ "-" ~ E ==> { case ((x, y), z) => x - z} |
       
   818    E ~ "*" ~ E ==> { case ((x, y), z) => x * z} |
       
   819    "(" ~ E ~ ")" ==> { case ((x, y), z) => y} |
       
   820    NumParserInt)
       
   821 \end{lstlisting}
       
   822 \end{center}
       
   823 
       
   824 \noindent
       
   825 Consider again carfully how the semantic actions pick out the correct
       
   826 arguments. In case of plus, we need \texttt{x} and \texttt{z}, because
       
   827 they correspond to the results of the parser \texttt{E}. We can just
       
   828 add \texttt{x + z} in order to obtain \texttt{Int} because the output
       
   829 type of \texttt{E} is \texttt{Int}.  Similarly with subtraction and
       
   830 multiplication. In contrast in the fourth clause we need to return
       
   831 \texttt{y}, because it is the result enclosed inside the parentheses.
       
   832 
       
   833 So far so good. The problem arises when we try to call \pcode{parse_all} with the
       
   834 expression \texttt{"1+2+3"}. Lets try it
       
   835 
       
   836 \begin{center}
       
   837 \begin{lstlisting}[language=Scala,basicstyle=\small\ttfamily, numbers=none]
       
   838 E.parse_all("1+2+3")
       
   839 \end{lstlisting}
       
   840 \end{center}
       
   841 
       
   842 \noindent
       
   843 \ldots and we wait and wait and \ldots wait. What is the problem? Actually,
       
   844 the parser just fell into an infinite loop. The reason is that the above
       
   845 grammar is left-recursive and recall that parser combinator cannot deal with
       
   846 such grammars. Luckily every left-recursive context-free grammar can be
       
   847 transformed into a non-left-recursive grammars that still recognise the
       
   848 same strings. This allows us to design the following grammar
       
   849 
       
   850 
       
   851 
       
   852 
       
   853 
   800 
   854 
   801 
   855 
   802 
   856 
   803 \end{document}
   857 \end{document}
   804 
   858