afl-material: comparison handouts/ho06.tex

equal deleted inserted replaced

-:0d309fafa9f0
+:bb24d4e207b6
 combinators out of smaller components following very closely the
 structure of a grammar. In order to implement this in a
 functional/object-oriented programming language, like Scala, we need
 to specify an abstract class for parser combinators. In the abstract
 class we specify that \texttt{I} is the \emph{input type} of the
-parser combinator and that \texttt{T} is the \emph{ouput type}.  This
+parser combinator and that \texttt{T} is the \emph{output type}.  This
 implies that the function \texttt{parse} takes an argument of type
 \texttt{I} and returns a set of type \mbox{\texttt{Set[(T, I)]}}.
 \begin{center}
 \begin{lstlisting}[language=Scala]
 stands for the unprocessed, or leftover, parts of $p$. We want that
 $q$ runs on all these unprocessed parts $u_1$. Therefore these
 unprocessed parts are fed into the second parser $q$. The overall
 result of the sequence parser combinator is pairs of the form
 $((\textit{output}_1, \textit{output}_2), u_2)$. This means the
-unprocessed part of the sequqnce parser combinator is the unprocessed
+unprocessed part of the sequence parser combinator is the unprocessed
 part the second parser $q$ leaves as leftover. The parsed parts of the
 component parsers are combined in a pair, namely
 $(\textit{output}_1, \textit{output}_2)$. The reason is we want to
 know what $p$ and $q$ were able to parse. This behaviour can be
 implemented in Scala as follows:
 \noindent
 The function in the semantic action converts a string into an
 \texttt{Int}. Now \texttt{parse} generates \texttt{Set((123,abc))},
 but this time \texttt{123} is an \texttt{Int}. Let us come back to
 semantic actions when we are going to implement actual context-free
-gammars.
+grammars.
 \subsubsection*{Shorthand notation for parser combinators}
 Before we proceed, let us just explain the shorthand notation for
 parser combinators. Like for regular expressions, the shorthand notation
 means after the semantic action is applied the output type of the parser
 is \texttt{String}, which means it fits with the alternative parsers
 \texttt{...| "a" | "b" | ""}.
 If we run the parser above with \pcode{Pal.parse_all("abaaaba")} we obtain
-as result the \pcode{Set(abaaaba)}, which indicates that the string is a palindrom
+as result the \pcode{Set(abaaaba)}, which indicates that the string is a palindrome
 (an empty set would mean something is wrong). But also notice what the
 intermediate results are generated by \pcode{Pal.parse("abaaaba")}
 \begin{center}
 \begin{lstlisting}[language=Scala,basicstyle=\small\ttfamily, numbers=none]
 \noindent
 That there are more than one output might be slightly unexpected, but
 can be explained as follows: the pairs represent all possible
 (partial) parses of the string \pcode{"abaaaba"}. The first pair above
-correesponds to a complete parse (all output is consumed) and this is
+corresponds to a complete parse (all output is consumed) and this is
 what \pcode{Pal.parse_all} returns. The second pair is a small
 ``sub-palindrome'' that can also be parsed, but the parse fails with
 the rest \pcode{aaba}, which is therefore left as unprocessed. The
 third one is an attempt to parse the whole string with the
 single-character parser \pcode{a}. That of course only partially
 succeeds, by leaving \pcode{"baaaba"} as the unprocessed
-part. Finally, since we allow the empty string to be a palindrom we
+part. Finally, since we allow the empty string to be a palindrome we
 also obtain the last pair, where actually nothing is consumed from the
 input string. While all this works as intended, we need to be careful
 with this (especially with including the \pcode{""} parser in our
 grammar): if during parsing the set of parsing attempts gets too big,
 then the parsing process can become very slow as the potential
 val Pal : Parser[String, String] =  ...rhs...
 \end{lstlisting}
 \end{center}
 \noindent
-then Scala before making this assignemnt to \texttt{Pal} attempts to
+then Scala before making this assignment to \texttt{Pal} attempts to
 find out what the expression on the right-hand side evaluates to. This
 is straightforward in case of simple expressions \texttt{2 + 3}, but
 the expression above contains \texttt{Pal} in the right-hand
 side. Without \pcode{lazy} it would try to evaluate what \texttt{Pal}
 evaluates to and start a new recursion, which means it falls into an
 \noindent where the \texttt{\textbf{\textcolor{codepurple}{=>}}} in front of
 the argument types for \texttt{p} and \texttt{q} prevent Scala from
 evaluating the arguments. Normally, Scala would first evaluate what
 kind of parsers \texttt{p} and \texttt{q} are, and only then generate
-the alternative parser combinator, repsectively sequence parser
+the alternative parser combinator, respectively sequence parser
-combinator. Since the argumants can be recursive parsers, such as
+combinator. Since the arguments can be recursive parsers, such as
 \texttt{Pal}, this would lead again to an infinite loop.
 As a final example in this section, let us consider the grammar for
 well-nested parentheses:
 : \meta{P} ::=  (\cdot \meta{P}\cdot ) \cdot \meta{P} | \epsilon\\
 \end{plstx}
 \noindent
 Let us assume we want to not just recognise strings of
-well-nested parentheses but also transfrom round parentheses
+well-nested parentheses but also transform round parentheses
 into curly braces. We can do this by using a semantic
 action:
 \begin{center}
 \begin{lstlisting}[language=Scala,basicstyle=\small\ttfamily,
 \subsubsection*{Implementing an Interpreter}
-The first step before implementing an interpreter for fullblown
+The first step before implementing an interpreter for a full-blown
 language is to implement a simple calculator for arithmetic
 expressions. Suppose our arithmetic expressions are given by the
 grammar:
 \begin{plstx}[margin=3cm,one per line]
-: \meta{E} ::=
+: \meta{E} ::= \meta{E} \cdot + \cdot \meta{E}
-| \meta{E} \cdot + \cdot \meta{E}
 | \meta{E} \cdot - \cdot \meta{E}
 | \meta{E} \cdot * \cdot \meta{E}
 | ( \cdot \meta{E} \cdot )
 | Number \\
 \end{plstx}
 \noindent
 Naturally we want to implement the grammar in such a way that we can
-calculate what the result of \texttt{4*2+3} is---we are interested in
+calculate what the result of, for example, \texttt{4*2+3} is---we are
-an \texttt{Int} rather than a string. This means every component
+interested in an \texttt{Int} rather than a string. This means every
-parser needs to have as output type \texttt{Int} and when we assemble
+component parser needs to have as output type \texttt{Int} and when we
-the intermediate results, strings like \texttt{"+"}, \texttt{"*"} and
+assemble the intermediate results, strings like \texttt{"+"},
-so on, need to be translated into the appropriate Scala operation.
+\texttt{"*"} and so on, need to be translated into the appropriate
-Being inspired by the parser for well-nested parentheses and ignoring
+Scala operation of adding, multiplying and so on.  Being inspired by
-the fact that we want $*$ to take precedence, we might write something
+the parser for well-nested parentheses above and ignoring the fact
-like
+that we want $*$ to take precedence over $+$ and $-$, we might want to
+write something like
 \begin{center}
 \begin{lstlisting}[language=Scala,basicstyle=\small\ttfamily, numbers=none]
 lazy val E: Parser[String, Int] =
 (E ~ "+" ~ E ==> { case ((x, y), z) => x + z} |
 NumParserInt)
 \end{lstlisting}
 \end{center}
 \noindent
-Consider again carfully how the semantic actions pick out the correct
+Consider again carefully how the semantic actions pick out the correct
-arguments. In case of plus, we need \texttt{x} and \texttt{z}, because
+arguments for the calculation. In case of plus, we need \texttt{x} and
-they correspond to the results of the parser \texttt{E}. We can just
+\texttt{z}, because they correspond to the results of the component
-add \texttt{x + z} in order to obtain \texttt{Int} because the output
+parser \texttt{E}. We can just add \texttt{x + z} in order to obtain
-type of \texttt{E} is \texttt{Int}.  Similarly with subtraction and
+an \texttt{Int} because the output type of \texttt{E} is
-multiplication. In contrast in the fourth clause we need to return
+\texttt{Int}.  Similarly with subtraction and multiplication. In
-\texttt{y}, because it is the result enclosed inside the parentheses.
+contrast in the fourth clause we need to return \texttt{y}, because it
+is the result enclosed inside the parentheses. The information about
+parentheses, roughly speaking, we just throw away.
 So far so good. The problem arises when we try to call \pcode{parse_all} with the
 expression \texttt{"1+2+3"}. Lets try it
 \begin{center}
 E.parse_all("1+2+3")
 \end{lstlisting}
 \end{center}
 \noindent
-\ldots and we wait and wait and \ldots wait. What is the problem? Actually,
+\ldots and we wait and wait and \ldots still wait. What is the
-the parser just fell into an infinite loop. The reason is that the above
+problem? Actually, the parser just fell into an infinite loop! The
-grammar is left-recursive and recall that parser combinator cannot deal with
+reason is that the above grammar is left-recursive and recall that our
-such grammars. Luckily every left-recursive context-free grammar can be
+parser combinators cannot deal with such left-recursive
-transformed into a non-left-recursive grammars that still recognise the
+grammars. Fortunately, every left-recursive context-free grammar can be
-same strings. This allows us to design the following grammar
+transformed into a non-left-recursive grammars that still recognises
+the same strings. This allows us to design the following grammar
+\begin{plstx}[margin=3cm]
+: \meta{E} ::=  \meta{T} \cdot + \cdot \meta{E} |  \meta{T} \cdot - \cdot \meta{E} | \meta{T}\\
+: \meta{T} ::=  \meta{F} \cdot * \cdot \meta{T} | \meta{F}\\
+: \meta{F} ::= ( \cdot \meta{E} \cdot ) | Number\\
+\end{plstx}
+\noindent
+Recall what left-recursive means from Handout 5 and make sure you see
+why this grammar is \emph{non} left-recursive. This version of the grammar
+also deals with the fact that $*$ should have a higher precedence. This does not
+affect which strings this grammar can recognise, but in which order we are going
+to evaluate any arithmetic expression. We can translate this grammar into
+parsing combinators as follows:
+\begin{center}
+\begin{lstlisting}[language=Scala,basicstyle=\small\ttfamily, numbers=none]
+lazy val E: Parser[String, Int] =
+(T ~ "+" ~ E) ==> { case ((x, y), z) => x + z } |
+(T ~ "-" ~ E) ==> { case ((x, y), z) => x - z } | T
+lazy val T: Parser[String, Int] =
+(F ~ "*" ~ T) ==> { case ((x, y), z) => x * z } | F
+lazy val F: Parser[String, Int] =
+("(" ~ E ~ ")") ==> { case ((x, y), z) => y } | NumParserInt
+\end{lstlisting}
+\end{center}
+\noindent
+Let us try out on some examples:
+\begin{center}
+\begin{tabular}{rcl}
+input strings & & output of \pcode{parse_all}\medskip\\
+\texttt{\Grid{1+2+3}} & $\rightarrow$ & \texttt{Set(6)}\\
+\texttt{\Grid{4*2+3}} & $\rightarrow$ & \texttt{Set(11)}\\
+\texttt{\Grid{4*(2+3)}} & $\rightarrow$ & \texttt{Set(20)}\\
+\texttt{\Grid{4/2+3}} & $\rightarrow$ & \texttt{Set()}\\
+\texttt{\Grid{1\VS +\VS 2\VS +\VS 3}} & $\rightarrow$ & \texttt{Set()}\\
+\end{tabular}
+\end{center}
+\noindent
+All examples should be quite self-explanatory: the last two do not
+produce any result because our parser did not define what to do in
+case of division (could be easily added) but also has no idea what to
+do with whitescpaces. To deal with them is the task of the lexer. We
+can deal with them inside the grammar, but that would render many
+grammars becoming unintelligible.
 \end{document}
 %%% Local Variables:
 %%% mode: latex

changeset 593	bb24d4e207b6
parent 592	0d309fafa9f0
child 594	d40d7d7b85bc