afl-material: comparison handouts/ho06.tex

equal deleted inserted replaced

-:a4646557016d
+:0451b8b67f62
 $\Rightarrow$
 $\underbrace{\text{set of (parsed input, unparsed input)}}_{\text{output}}$
 \end{center}
 \noindent
-Given the extended effort we have spent in order to implement a lexer
+Given the extended effort we have spent implementing a lexer
 in order to generate list of tokens, it might be surprising that in
 what follows we shall often use strings as input. This is for making
 the explanation more lucid. It does not make our previous work on
 lexers obsolete (remember they transform a string into a list of
 tokens). Lexers will still be needed for building a somewhat realistic
 if (in.head == c) Set((c, in.tail)) else Set()
 }
 \end{lstlisting}
 \end{center}
-\noindent You can see the \texttt{parse} tests whether the
+\noindent You can see \texttt{parse} tests whether the
 first character of the input string \texttt{in} is equal to
 \texttt{c}. If yes, then it splits the string into the recognised part
 \texttt{c} and the unprocessed part \texttt{in.tail}. In case
 \texttt{in} does not start with \texttt{c} then the parser returns the
 empty set (in Scala \texttt{Set()}). Since this parser recognises
 \end{center}
 \noindent
 In this parser the input is of type \texttt{List[Token]}. The function
 parse looks at the input \texttt{ts} and checks whether the first
-token is a \texttt{Num\_token}. Let us assume our lexer generated
+token is a \texttt{Num\_token} (let us assume our lexer generated
-these tokens for numbers. But this parser does not just return this
+these tokens for numbers). But this parser does not just return this
 token (and the rest of the list), like the \texttt{CharParser} above,
-rather extracts the string \texttt{s} from the token and converts it
+rather it extracts the string \texttt{s} from the token and converts it
 into an integer. The hope is that the lexer did its work well and this
 conversion always succeeds. The consequence of this is that the output
 type for this parser is \texttt{Int}, not \texttt{Token}. Such a
 conversion would be needed if we want to implement a simple calculator
 program.
 new AltParser(CharParser('a'), CharParser('b'))
 \end{lstlisting}
 \end{center}
 \noindent Later on we will use Scala mechanism for introducing some
-more readable shorthand notation for this, like \texttt{"a" ||
+more readable shorthand notation for this, like \texttt{"a" |
 "b"}. Let us look in detail at what this parser combinator produces
 with some sample strings
 \begin{center}
 \begin{tabular}{rcl}
 \end{tabular}
 \end{center}
 \noindent Notice that the $p$ wil first be run on the input, producing
 pairs of the form $(\textit{output}_1, u_1)$ where the $u_1$ stands
-for the unprocessed, or left-over, parts. We want that $q$ runs on all
+for the unprocessed, or leftover, parts. We want that $q$ runs on all
 these unprocessed parts $u_1$. This again will produce some
 processed part , $p$ and
 $q$, we apply both parsers to the input (remember parsers are
 functions) and combine the output (remember they are sets of pairs)
 new AltParser(CharParser('a'), CharParser('b'))
 \end{lstlisting}
 \end{center}
 \noindent Later on we will use again Scala mechanism for introducing some
-more readable shorthand notation for this, like \texttt{"a" || "b"}. Let us look in detail at what this parser combinator produces with some sample strings
+more readable shorthand notation for this, like \texttt{"a" | "b"}. Let us look in detail at what this parser combinator produces with some sample strings
 \begin{center}
 \begin{tabular}{rcl}
 input strings & & output\medskip\\
 \texttt{\Grid{acde}} & $\rightarrow$ & $\left\{(\texttt{\Grid{a}}, \texttt{\Grid{cde}})\right\}$\\
 producing pairs of the form $\textit{output}_1$ and unprocessed part
 $u_1$. This unprocessed part is fed into the second parser $q$. The
 overall result of the sequence parser combinator is pairs of the form
 $((\textit{output}_1, \textit{output}_2), u_2)$. This means the
 unprocessed part of both parsers is the unprocessed part the second
-parser $q$ produces leaves as left-over. The processed parts of both
+parser $q$ leaves as leftover. The processed parts of both
 parsers is a pair consisting of the outputs of $p$ and $q$, namely
 $(\textit{output}_1, \textit{output}_2)$. This behaviour can be
 implemented in Scala as follows:
 \begin{center}
 \emph{not} a simple pair \texttt{(ab, cde)} as one might erroneously
 expect.  The parser returns the empty set in the other examples,
 because they do not fit with what the parser is supposed to parse.
-A slightly more complicated parser is \pcode{("a" || "b") ~ "c"} which
+A slightly more complicated parser is \pcode{("a" | "b") ~ "c"} which
 parses as first character either an \texttt{a} or \texttt{b}, followed
 by a \texttt{c}. This parser produces the following outputs.
 \begin{center}
 \begin{tabular}{rcl}
 \noindent The second and third example fail, because something is
 ``missing'' in the sequence we are looking for. Also notice how the
 results nest with sequences: the parsed part is a nested pair of the
-form \pcode{((a, b), c)}. Two more examples: first consider the parser
+form \pcode{((a, b), c)}. If we nest the sequence parser differently,
+for example \pcode{"a" ~ ("b" ~ "c")}, then also our output pairs nest
+differently
+\begin{center}
+\begin{tabular}{rcl}
+input strings & & output\medskip\\
+\texttt{\Grid{abcde}} & $\rightarrow$ & $\left\{((\texttt{\Grid{a}},(\texttt{\Grid{b}}, \texttt{\Grid{c}})), \texttt{\Grid{de}})\right\}$\\
+\end{tabular}
+\end{center}
+\noindent
+Two more examples: first consider the parser
 \pcode{("a" ~ "a") ~ "a"} and the input \pcode{aaaa}:
 \begin{center}
 \begin{tabular}{rcl}
 input string & & output\medskip\\
 from the pairs; everything where the second part was not empty has
 been thrown away as well, because they represent
 ultimately-unsuccessful-parses.
-Note carefully that constructing a parser such \pcode{"a" || ("a" ~
+Note carefully that constructing a parser such \pcode{"a" | ("a" ~
 "b")} will result in a typing error. The intention is that we want
 to parse an \texttt{a}, or an \texttt{a} followed by a
 \texttt{b}. However, the first parser has as output type a single
 character (recall the type of \texttt{CharParser}), but the second
 parser produces a pair of characters as output. The alternative parser
 is required to have both component parsers to have the same type---we
 need to be able to build the union of two sets, which means in Scala
-they need to be of the same type.  We will see later how we can build
+they need to be of the same type.  Therefore the typing error.
+We will see later how we can build
 this parser without the typing error.
 The next parser combinator, called \emph{semantic action}, does not
 actually combine smaller parsers, but applies a function to the result
 of a parser.  It is implemented in Scala as follows
 \noindent This parser combinator takes a parser \texttt{p}
 with output type \texttt{T} as one argument as well as a
 function \texttt{f} with type \texttt{T => S}. The parser
 \texttt{p} produces sets of type \texttt{(T, I)}. The
-semantic action combinastor then applies the function
+semantic action combinator then applies the function
 \texttt{f} to all the parser outputs. Since this function is of
 type \texttt{T => S}, we obtain a parser with output type
 \texttt{S}. Again Scala lets us introduce some shorthand
 notation for this parser combinator. Therefore we will write
 \texttt{p ==> f} for it.
-What are semantic actions good for? Well, they allow is to transform
+What are semantic actions good for? Well, they allow you to transform
-the parsed input into a datastructure we can use for further
+the parsed input into a datastructure you can use for further
 processing. A simple example would be to transform parsed characters
 into ASCII numbers. Suppose we define a function \texttt{f} (from
-characters to ints) and use a \texttt{CharParser} for the character \texttt{c}.
+characters to ints) and use a \texttt{CharParser} for parsing
+the character \texttt{c}.
 \begin{center}
 \begin{lstlisting}[language=Scala,basicstyle=\small\ttfamily, numbers=none]
 val f = (c: Char) => c.toInt
 val c = new CharParser('c')
 \end{lstlisting}
 \end{center}
 \noindent
-Then we can run the following parsers on the input \texttt{cbd}:
+We then can run the following two parsers on the input \texttt{cbd}:
 \begin{center}
 \begin{lstlisting}[language=Scala,basicstyle=\small\ttfamily, numbers=none]
 c.parse("cbd")
 (c ==> f).parse("cbd")
 \end{lstlisting}
 \end{center}
 \noindent
-The first line we obtain the result \texttt{Set(('c', "bd"))}, whereas the second
+In the first line we obtain the expected result \texttt{Set(('c',
-produces \texttt{Set((99, "bd"))}---the character has been transformed into
+"bd"))}, whereas the second produces \texttt{Set((99, "bd"))}---the
-an ASCII number.
+character has been transformed into an ASCII number.
 A slightly less contrived example is about parsing numbers (recall
 \texttt{NumParser} above). However, we want to do this here for strings.
 For this assume we have the following \texttt{RegexParser}.
 \noindent
 This parser takes a regex as argument and splits up a string into a
 prefix and the rest according to this regex
 (\texttt{reg.findPrefixMatchOf} generates a match---in the successful
 case---and the corresponding strings can be extracted with
-\texttt{matched} and \texttt{after}). We can now define a
+\texttt{matched} and \texttt{after}). Using this parser we can define a
 \texttt{NumParser} for strings as follows:
 \begin{center}
 \begin{lstlisting}[language=Scala,basicstyle=\small\ttfamily, numbers=none]
 val NumParser = RegexParser("[0-9]+".r)
 \end{lstlisting}
 \end{center}
 \noindent
 produces \texttt{Set((123,abc))}. The problem is that \texttt{123} is
-still a string. We need to convert it into the corresponding
+still a string (the double-quotes are not printed by Scala). We need
-\texttt{Int}. We can do this as follows
+to convert it into the corresponding \texttt{Int}. We can do this as
+follows using a semantic action
 \begin{center}
 \begin{lstlisting}[language=Scala,basicstyle=\small\ttfamily, numbers=none]
 (NumParser ==> (s => s.toInt)).parse("123abc")
 \end{lstlisting}
 \end{center}
 \noindent
-The semantic action in form of a function converts a string into an
+The function in the semantic action converts a string into an
 \texttt{Int}. Let us come back to semantic actions when we are going
-to implement a simple calculator.
+to implement actual context-free gammars.
 \subsubsection*{Shorthand notation for parser combinators}
 Before we proceed, let us just explain the shorthand notation for
 parser combinators. Like for regular expressions, the shorthand notation
 ("b" ~ Pal ~ "b") ==> { case ((x, y), z) => x + y + z } |
 "a" | "b" | "")
 \end{lstlisting}
 \end{center}
+\noindent
+The semantic action
+\noindent
+Important to note is that we must define \texttt{Pal}-parser as a
+\emph{lazy} value.
 \subsubsection*{Implementing an Interpreter}
 %\bigskip
 %takes advantage of the full generality---have a look

changeset 589	0451b8b67f62
parent 588	a4646557016d
child 590	c6a1e19e9801