handouts/ho06.tex
changeset 593 bb24d4e207b6
parent 592 0d309fafa9f0
child 594 d40d7d7b85bc
--- a/handouts/ho06.tex	Fri Oct 26 16:14:10 2018 +0100
+++ b/handouts/ho06.tex	Fri Oct 26 17:13:41 2018 +0100
@@ -99,7 +99,7 @@
 functional/object-oriented programming language, like Scala, we need
 to specify an abstract class for parser combinators. In the abstract
 class we specify that \texttt{I} is the \emph{input type} of the
-parser combinator and that \texttt{T} is the \emph{ouput type}.  This
+parser combinator and that \texttt{T} is the \emph{output type}.  This
 implies that the function \texttt{parse} takes an argument of type
 \texttt{I} and returns a set of type \mbox{\texttt{Set[(T, I)]}}.
 
@@ -284,7 +284,7 @@
 unprocessed parts are fed into the second parser $q$. The overall
 result of the sequence parser combinator is pairs of the form
 $((\textit{output}_1, \textit{output}_2), u_2)$. This means the
-unprocessed part of the sequqnce parser combinator is the unprocessed
+unprocessed part of the sequence parser combinator is the unprocessed
 part the second parser $q$ leaves as leftover. The parsed parts of the
 component parsers are combined in a pair, namely
 $(\textit{output}_1, \textit{output}_2)$. The reason is we want to
@@ -572,7 +572,7 @@
 \texttt{Int}. Now \texttt{parse} generates \texttt{Set((123,abc))},
 but this time \texttt{123} is an \texttt{Int}. Let us come back to
 semantic actions when we are going to implement actual context-free
-gammars.
+grammars.
 
 \subsubsection*{Shorthand notation for parser combinators}
 
@@ -671,7 +671,7 @@
 \texttt{...| "a" | "b" | ""}.
 
 If we run the parser above with \pcode{Pal.parse_all("abaaaba")} we obtain
-as result the \pcode{Set(abaaaba)}, which indicates that the string is a palindrom
+as result the \pcode{Set(abaaaba)}, which indicates that the string is a palindrome
 (an empty set would mean something is wrong). But also notice what the
 intermediate results are generated by \pcode{Pal.parse("abaaaba")}
 
@@ -685,14 +685,14 @@
 That there are more than one output might be slightly unexpected, but
 can be explained as follows: the pairs represent all possible
 (partial) parses of the string \pcode{"abaaaba"}. The first pair above
-correesponds to a complete parse (all output is consumed) and this is
+corresponds to a complete parse (all output is consumed) and this is
 what \pcode{Pal.parse_all} returns. The second pair is a small
 ``sub-palindrome'' that can also be parsed, but the parse fails with
 the rest \pcode{aaba}, which is therefore left as unprocessed. The
 third one is an attempt to parse the whole string with the
 single-character parser \pcode{a}. That of course only partially
 succeeds, by leaving \pcode{"baaaba"} as the unprocessed
-part. Finally, since we allow the empty string to be a palindrom we
+part. Finally, since we allow the empty string to be a palindrome we
 also obtain the last pair, where actually nothing is consumed from the
 input string. While all this works as intended, we need to be careful
 with this (especially with including the \pcode{""} parser in our
@@ -713,7 +713,7 @@
 \end{center}
 
 \noindent
-then Scala before making this assignemnt to \texttt{Pal} attempts to
+then Scala before making this assignment to \texttt{Pal} attempts to
 find out what the expression on the right-hand side evaluates to. This
 is straightforward in case of simple expressions \texttt{2 + 3}, but
 the expression above contains \texttt{Pal} in the right-hand
@@ -749,8 +749,8 @@
 the argument types for \texttt{p} and \texttt{q} prevent Scala from
 evaluating the arguments. Normally, Scala would first evaluate what
 kind of parsers \texttt{p} and \texttt{q} are, and only then generate
-the alternative parser combinator, repsectively sequence parser
-combinator. Since the argumants can be recursive parsers, such as
+the alternative parser combinator, respectively sequence parser
+combinator. Since the arguments can be recursive parsers, such as
 \texttt{Pal}, this would lead again to an infinite loop.
 
 As a final example in this section, let us consider the grammar for
@@ -762,7 +762,7 @@
 
 \noindent
 Let us assume we want to not just recognise strings of
-well-nested parentheses but also transfrom round parentheses
+well-nested parentheses but also transform round parentheses
 into curly braces. We can do this by using a semantic
 action:
 
@@ -785,14 +785,13 @@
 
 \subsubsection*{Implementing an Interpreter}
 
-The first step before implementing an interpreter for fullblown
+The first step before implementing an interpreter for a full-blown
 language is to implement a simple calculator for arithmetic
 expressions. Suppose our arithmetic expressions are given by the
 grammar:
 
 \begin{plstx}[margin=3cm,one per line]
-: \meta{E} ::= 
-   | \meta{E} \cdot + \cdot \meta{E} 
+: \meta{E} ::= \meta{E} \cdot + \cdot \meta{E} 
    | \meta{E} \cdot - \cdot \meta{E} 
    | \meta{E} \cdot * \cdot \meta{E} 
    | ( \cdot \meta{E} \cdot )
@@ -801,14 +800,15 @@
 
 \noindent
 Naturally we want to implement the grammar in such a way that we can
-calculate what the result of \texttt{4*2+3} is---we are interested in
-an \texttt{Int} rather than a string. This means every component
-parser needs to have as output type \texttt{Int} and when we assemble
-the intermediate results, strings like \texttt{"+"}, \texttt{"*"} and
-so on, need to be translated into the appropriate Scala operation.
-Being inspired by the parser for well-nested parentheses and ignoring
-the fact that we want $*$ to take precedence, we might write something
-like
+calculate what the result of, for example, \texttt{4*2+3} is---we are
+interested in an \texttt{Int} rather than a string. This means every
+component parser needs to have as output type \texttt{Int} and when we
+assemble the intermediate results, strings like \texttt{"+"},
+\texttt{"*"} and so on, need to be translated into the appropriate
+Scala operation of adding, multiplying and so on.  Being inspired by
+the parser for well-nested parentheses above and ignoring the fact
+that we want $*$ to take precedence over $+$ and $-$, we might want to
+write something like
 
 \begin{center}
 \begin{lstlisting}[language=Scala,basicstyle=\small\ttfamily, numbers=none]
@@ -822,13 +822,15 @@
 \end{center}
 
 \noindent
-Consider again carfully how the semantic actions pick out the correct
-arguments. In case of plus, we need \texttt{x} and \texttt{z}, because
-they correspond to the results of the parser \texttt{E}. We can just
-add \texttt{x + z} in order to obtain \texttt{Int} because the output
-type of \texttt{E} is \texttt{Int}.  Similarly with subtraction and
-multiplication. In contrast in the fourth clause we need to return
-\texttt{y}, because it is the result enclosed inside the parentheses.
+Consider again carefully how the semantic actions pick out the correct
+arguments for the calculation. In case of plus, we need \texttt{x} and
+\texttt{z}, because they correspond to the results of the component
+parser \texttt{E}. We can just add \texttt{x + z} in order to obtain
+an \texttt{Int} because the output type of \texttt{E} is
+\texttt{Int}.  Similarly with subtraction and multiplication. In
+contrast in the fourth clause we need to return \texttt{y}, because it
+is the result enclosed inside the parentheses. The information about
+parentheses, roughly speaking, we just throw away.
 
 So far so good. The problem arises when we try to call \pcode{parse_all} with the
 expression \texttt{"1+2+3"}. Lets try it
@@ -840,19 +842,62 @@
 \end{center}
 
 \noindent
-\ldots and we wait and wait and \ldots wait. What is the problem? Actually,
-the parser just fell into an infinite loop. The reason is that the above
-grammar is left-recursive and recall that parser combinator cannot deal with
-such grammars. Luckily every left-recursive context-free grammar can be
-transformed into a non-left-recursive grammars that still recognise the
-same strings. This allows us to design the following grammar
+\ldots and we wait and wait and \ldots still wait. What is the
+problem? Actually, the parser just fell into an infinite loop! The
+reason is that the above grammar is left-recursive and recall that our
+parser combinators cannot deal with such left-recursive
+grammars. Fortunately, every left-recursive context-free grammar can be
+transformed into a non-left-recursive grammars that still recognises
+the same strings. This allows us to design the following grammar
+
+\begin{plstx}[margin=3cm]
+  : \meta{E} ::=  \meta{T} \cdot + \cdot \meta{E} |  \meta{T} \cdot - \cdot \meta{E} | \meta{T}\\
+: \meta{T} ::=  \meta{F} \cdot * \cdot \meta{T} | \meta{F}\\
+: \meta{F} ::= ( \cdot \meta{E} \cdot ) | Number\\
+\end{plstx}
+
+\noindent
+Recall what left-recursive means from Handout 5 and make sure you see
+why this grammar is \emph{non} left-recursive. This version of the grammar
+also deals with the fact that $*$ should have a higher precedence. This does not
+affect which strings this grammar can recognise, but in which order we are going
+to evaluate any arithmetic expression. We can translate this grammar into
+parsing combinators as follows:
 
 
+\begin{center}
+\begin{lstlisting}[language=Scala,basicstyle=\small\ttfamily, numbers=none]
+lazy val E: Parser[String, Int] = 
+  (T ~ "+" ~ E) ==> { case ((x, y), z) => x + z } |
+  (T ~ "-" ~ E) ==> { case ((x, y), z) => x - z } | T 
+lazy val T: Parser[String, Int] = 
+  (F ~ "*" ~ T) ==> { case ((x, y), z) => x * z } | F
+lazy val F: Parser[String, Int] = 
+  ("(" ~ E ~ ")") ==> { case ((x, y), z) => y } | NumParserInt
+\end{lstlisting}
+\end{center}
 
-
+\noindent
+Let us try out on some examples:
 
+\begin{center}
+\begin{tabular}{rcl}
+  input strings & & output of \pcode{parse_all}\medskip\\
+  \texttt{\Grid{1+2+3}} & $\rightarrow$ & \texttt{Set(6)}\\
+  \texttt{\Grid{4*2+3}} & $\rightarrow$ & \texttt{Set(11)}\\
+  \texttt{\Grid{4*(2+3)}} & $\rightarrow$ & \texttt{Set(20)}\\
+  \texttt{\Grid{4/2+3}} & $\rightarrow$ & \texttt{Set()}\\
+  \texttt{\Grid{1\VS +\VS 2\VS +\VS 3}} & $\rightarrow$ & \texttt{Set()}\\                      
+\end{tabular}
+\end{center}
 
-
+\noindent
+All examples should be quite self-explanatory: the last two do not
+produce any result because our parser did not define what to do in
+case of division (could be easily added) but also has no idea what to
+do with whitescpaces. To deal with them is the task of the lexer. We
+can deal with them inside the grammar, but that would render many
+grammars becoming unintelligible.
 
 \end{document}