--- a/handouts/ho06.tex Fri Oct 26 16:14:10 2018 +0100
+++ b/handouts/ho06.tex Fri Oct 26 17:13:41 2018 +0100
@@ -99,7 +99,7 @@
functional/object-oriented programming language, like Scala, we need
to specify an abstract class for parser combinators. In the abstract
class we specify that \texttt{I} is the \emph{input type} of the
-parser combinator and that \texttt{T} is the \emph{ouput type}. This
+parser combinator and that \texttt{T} is the \emph{output type}. This
implies that the function \texttt{parse} takes an argument of type
\texttt{I} and returns a set of type \mbox{\texttt{Set[(T, I)]}}.
@@ -284,7 +284,7 @@
unprocessed parts are fed into the second parser $q$. The overall
result of the sequence parser combinator is pairs of the form
$((\textit{output}_1, \textit{output}_2), u_2)$. This means the
-unprocessed part of the sequqnce parser combinator is the unprocessed
+unprocessed part of the sequence parser combinator is the unprocessed
part the second parser $q$ leaves as leftover. The parsed parts of the
component parsers are combined in a pair, namely
$(\textit{output}_1, \textit{output}_2)$. The reason is we want to
@@ -572,7 +572,7 @@
\texttt{Int}. Now \texttt{parse} generates \texttt{Set((123,abc))},
but this time \texttt{123} is an \texttt{Int}. Let us come back to
semantic actions when we are going to implement actual context-free
-gammars.
+grammars.
\subsubsection*{Shorthand notation for parser combinators}
@@ -671,7 +671,7 @@
\texttt{...| "a" | "b" | ""}.
If we run the parser above with \pcode{Pal.parse_all("abaaaba")} we obtain
-as result the \pcode{Set(abaaaba)}, which indicates that the string is a palindrom
+as result the \pcode{Set(abaaaba)}, which indicates that the string is a palindrome
(an empty set would mean something is wrong). But also notice what the
intermediate results are generated by \pcode{Pal.parse("abaaaba")}
@@ -685,14 +685,14 @@
That there are more than one output might be slightly unexpected, but
can be explained as follows: the pairs represent all possible
(partial) parses of the string \pcode{"abaaaba"}. The first pair above
-correesponds to a complete parse (all output is consumed) and this is
+corresponds to a complete parse (all output is consumed) and this is
what \pcode{Pal.parse_all} returns. The second pair is a small
``sub-palindrome'' that can also be parsed, but the parse fails with
the rest \pcode{aaba}, which is therefore left as unprocessed. The
third one is an attempt to parse the whole string with the
single-character parser \pcode{a}. That of course only partially
succeeds, by leaving \pcode{"baaaba"} as the unprocessed
-part. Finally, since we allow the empty string to be a palindrom we
+part. Finally, since we allow the empty string to be a palindrome we
also obtain the last pair, where actually nothing is consumed from the
input string. While all this works as intended, we need to be careful
with this (especially with including the \pcode{""} parser in our
@@ -713,7 +713,7 @@
\end{center}
\noindent
-then Scala before making this assignemnt to \texttt{Pal} attempts to
+then Scala before making this assignment to \texttt{Pal} attempts to
find out what the expression on the right-hand side evaluates to. This
is straightforward in case of simple expressions \texttt{2 + 3}, but
the expression above contains \texttt{Pal} in the right-hand
@@ -749,8 +749,8 @@
the argument types for \texttt{p} and \texttt{q} prevent Scala from
evaluating the arguments. Normally, Scala would first evaluate what
kind of parsers \texttt{p} and \texttt{q} are, and only then generate
-the alternative parser combinator, repsectively sequence parser
-combinator. Since the argumants can be recursive parsers, such as
+the alternative parser combinator, respectively sequence parser
+combinator. Since the arguments can be recursive parsers, such as
\texttt{Pal}, this would lead again to an infinite loop.
As a final example in this section, let us consider the grammar for
@@ -762,7 +762,7 @@
\noindent
Let us assume we want to not just recognise strings of
-well-nested parentheses but also transfrom round parentheses
+well-nested parentheses but also transform round parentheses
into curly braces. We can do this by using a semantic
action:
@@ -785,14 +785,13 @@
\subsubsection*{Implementing an Interpreter}
-The first step before implementing an interpreter for fullblown
+The first step before implementing an interpreter for a full-blown
language is to implement a simple calculator for arithmetic
expressions. Suppose our arithmetic expressions are given by the
grammar:
\begin{plstx}[margin=3cm,one per line]
-: \meta{E} ::=
- | \meta{E} \cdot + \cdot \meta{E}
+: \meta{E} ::= \meta{E} \cdot + \cdot \meta{E}
| \meta{E} \cdot - \cdot \meta{E}
| \meta{E} \cdot * \cdot \meta{E}
| ( \cdot \meta{E} \cdot )
@@ -801,14 +800,15 @@
\noindent
Naturally we want to implement the grammar in such a way that we can
-calculate what the result of \texttt{4*2+3} is---we are interested in
-an \texttt{Int} rather than a string. This means every component
-parser needs to have as output type \texttt{Int} and when we assemble
-the intermediate results, strings like \texttt{"+"}, \texttt{"*"} and
-so on, need to be translated into the appropriate Scala operation.
-Being inspired by the parser for well-nested parentheses and ignoring
-the fact that we want $*$ to take precedence, we might write something
-like
+calculate what the result of, for example, \texttt{4*2+3} is---we are
+interested in an \texttt{Int} rather than a string. This means every
+component parser needs to have as output type \texttt{Int} and when we
+assemble the intermediate results, strings like \texttt{"+"},
+\texttt{"*"} and so on, need to be translated into the appropriate
+Scala operation of adding, multiplying and so on. Being inspired by
+the parser for well-nested parentheses above and ignoring the fact
+that we want $*$ to take precedence over $+$ and $-$, we might want to
+write something like
\begin{center}
\begin{lstlisting}[language=Scala,basicstyle=\small\ttfamily, numbers=none]
@@ -822,13 +822,15 @@
\end{center}
\noindent
-Consider again carfully how the semantic actions pick out the correct
-arguments. In case of plus, we need \texttt{x} and \texttt{z}, because
-they correspond to the results of the parser \texttt{E}. We can just
-add \texttt{x + z} in order to obtain \texttt{Int} because the output
-type of \texttt{E} is \texttt{Int}. Similarly with subtraction and
-multiplication. In contrast in the fourth clause we need to return
-\texttt{y}, because it is the result enclosed inside the parentheses.
+Consider again carefully how the semantic actions pick out the correct
+arguments for the calculation. In case of plus, we need \texttt{x} and
+\texttt{z}, because they correspond to the results of the component
+parser \texttt{E}. We can just add \texttt{x + z} in order to obtain
+an \texttt{Int} because the output type of \texttt{E} is
+\texttt{Int}. Similarly with subtraction and multiplication. In
+contrast in the fourth clause we need to return \texttt{y}, because it
+is the result enclosed inside the parentheses. The information about
+parentheses, roughly speaking, we just throw away.
So far so good. The problem arises when we try to call \pcode{parse_all} with the
expression \texttt{"1+2+3"}. Lets try it
@@ -840,19 +842,62 @@
\end{center}
\noindent
-\ldots and we wait and wait and \ldots wait. What is the problem? Actually,
-the parser just fell into an infinite loop. The reason is that the above
-grammar is left-recursive and recall that parser combinator cannot deal with
-such grammars. Luckily every left-recursive context-free grammar can be
-transformed into a non-left-recursive grammars that still recognise the
-same strings. This allows us to design the following grammar
+\ldots and we wait and wait and \ldots still wait. What is the
+problem? Actually, the parser just fell into an infinite loop! The
+reason is that the above grammar is left-recursive and recall that our
+parser combinators cannot deal with such left-recursive
+grammars. Fortunately, every left-recursive context-free grammar can be
+transformed into a non-left-recursive grammars that still recognises
+the same strings. This allows us to design the following grammar
+
+\begin{plstx}[margin=3cm]
+ : \meta{E} ::= \meta{T} \cdot + \cdot \meta{E} | \meta{T} \cdot - \cdot \meta{E} | \meta{T}\\
+: \meta{T} ::= \meta{F} \cdot * \cdot \meta{T} | \meta{F}\\
+: \meta{F} ::= ( \cdot \meta{E} \cdot ) | Number\\
+\end{plstx}
+
+\noindent
+Recall what left-recursive means from Handout 5 and make sure you see
+why this grammar is \emph{non} left-recursive. This version of the grammar
+also deals with the fact that $*$ should have a higher precedence. This does not
+affect which strings this grammar can recognise, but in which order we are going
+to evaluate any arithmetic expression. We can translate this grammar into
+parsing combinators as follows:
+\begin{center}
+\begin{lstlisting}[language=Scala,basicstyle=\small\ttfamily, numbers=none]
+lazy val E: Parser[String, Int] =
+ (T ~ "+" ~ E) ==> { case ((x, y), z) => x + z } |
+ (T ~ "-" ~ E) ==> { case ((x, y), z) => x - z } | T
+lazy val T: Parser[String, Int] =
+ (F ~ "*" ~ T) ==> { case ((x, y), z) => x * z } | F
+lazy val F: Parser[String, Int] =
+ ("(" ~ E ~ ")") ==> { case ((x, y), z) => y } | NumParserInt
+\end{lstlisting}
+\end{center}
-
+\noindent
+Let us try out on some examples:
+\begin{center}
+\begin{tabular}{rcl}
+ input strings & & output of \pcode{parse_all}\medskip\\
+ \texttt{\Grid{1+2+3}} & $\rightarrow$ & \texttt{Set(6)}\\
+ \texttt{\Grid{4*2+3}} & $\rightarrow$ & \texttt{Set(11)}\\
+ \texttt{\Grid{4*(2+3)}} & $\rightarrow$ & \texttt{Set(20)}\\
+ \texttt{\Grid{4/2+3}} & $\rightarrow$ & \texttt{Set()}\\
+ \texttt{\Grid{1\VS +\VS 2\VS +\VS 3}} & $\rightarrow$ & \texttt{Set()}\\
+\end{tabular}
+\end{center}
-
+\noindent
+All examples should be quite self-explanatory: the last two do not
+produce any result because our parser did not define what to do in
+case of division (could be easily added) but also has no idea what to
+do with whitescpaces. To deal with them is the task of the lexer. We
+can deal with them inside the grammar, but that would render many
+grammars becoming unintelligible.
\end{document}