afl-material: comparison handouts/ho04.tex

equal deleted inserted replaced

-:f71399fe3fdc
+:36799f7b9702
 shown below. I use the convention that
 regular expressions are written entirely with upper-case
 letters, whereas values start with a single upper-case
 character and the rest are lower-case letters.
-{\small\lstinputlisting[language=Scala,numbers=none,linebackgroundcolor=
+{\small%
-{\ifodd\value{lstnumber}\color{capri!3}\fi}]
+\begin{lstlisting}[language=Scala,numbers=none]
-{../progs/app01.scala}}
+enum Rexp {
+case ZERO
+case ONE
-{\small\lstinputlisting[language=Scala,numbers=none,linebackgroundcolor=
+case CHAR(c: Char)
-{\ifodd\value{lstnumber}\color{capri!3}\fi}]
+case ALT(r1: Rexp, r2: Rexp)
-{../progs/app02.scala}}
+case SEQ(r1: Rexp, r2: Rexp)
+case STAR(r: Rexp)
+}
+\end{lstlisting}
+\begin{lstlisting}[language=Scala,numbers=none]
+enum Val {
+case Empty
+case Chr(c: Char)
+case Sequ(v1: Val, v2: Val)
+case Left(v: Val)
+case Right(v: Val)
+case Stars(vs: List[Val])
+}
+\end{lstlisting}}
+%{\small\lstinputlisting[language=Scala,numbers=none,linebackgroundcolor=
+%                  {\ifodd\value{lstnumber}\color{capri!3}\fi}]
+%{../progs/app01.scala}}
+%
+%
+%{\small\lstinputlisting[language=Scala,numbers=none,linebackgroundcolor=
+%                  {\ifodd\value{lstnumber}\color{capri!3}\fi}]
+%{../progs/app02.scala}}
 Graphically the algorithm by Sulzmann \& Lu can be illustrated
 by the picture in Figure~\ref{Sulz} where the path from the
 left to the right involving $\der/\nullable$ is the first phase
 the second phase. This picture shows the steps required when a
 regular expression, say $r_1$, matches the string $abc$. We
 first build the three derivatives (according to $a$, $b$ and
 $c$). We then use $\nullable$ to find out whether the resulting
 regular expression can match the empty string. If yes, we call
-the function $\textit{mkeps}$. The $\textit{mkeps}$ function calculates a value for how a regular
+the function $\textit{mkeps}$. This function calculates a value for how a regular
 expression has matched the empty string. Its definition
 is as follows:
 \begin{figure}[t]
 \begin{center}
 \noindent If there had been a $\ONE$ on the left, then
 $\textit{mkeps}$ would have returned something of the form
 $\Left(\ldots)$. The point is that from this value we can
 directly read off which part of $r_4$ matched the empty
 string: take the right-alternative first, and then the
-right-alternative again, then you got to the $\ONE$.
+right-alternative again, then you got to the $\ONE$ (indicated by
+\textit{Empty}).
 Next we have to ``inject'' the last character, that is $c$ in
 the running example, into this value $v_4$ in order to
 calculate how $r_3$ could have matched the string $c$.
 For this we call injection with $\inj(r_3, c, v_4)$.
 \begin{center}
 $v_1:\;Seq(Char(a), Seq(Char(b), Char(c)))$
 \end{center}
 \noindent This value corresponds to how the regular expression $r_1$,
-namely $a \cdot (b \cdot c)$, matched the string $abc$.
+namely $a \cdot (b \cdot c)$, matched the string $abc$. I let you
+think whether it is the expected value.
 There are a few auxiliary functions that are of interest
 when analysing this algorithm. One is called \emph{flatten},
 written $|\_|$, which extracts the string ``underlying'' a
 adding, back a character into the value.
 The definition of $\inj$ might still be very puzzling and each clause
 might look arbitrary, but there is in fact a relatively simple idea
 behind it. Ultimately the $\inj$-functions is determined by the
-derivative functions. For this consider one of the ``squares'' from
+derivative function. For this consider one of the ``squares'' from
 Figure~\ref{Sulz}:
 \begin{center}
 \begin{tikzpicture}[scale=2,node distance=1.2cm,
 every node/.style={minimum size=7mm}]
 $(x:r)$, where $x$ is just an identifier (in my implementation
 a plain string) and $r$ is a regular expression. A record will
 be regarded as a regular expression. The extended definition
 in Scala therefore looks as follows:
-{\small\lstinputlisting[language=Scala, numbers=none,linebackgroundcolor=
+{\small%
-{\ifodd\value{lstnumber}\color{capri!3}\fi}]
+\begin{lstlisting}[language=Scala,numbers=none]
-{../progs/app03.scala}}
+enum Rexp {
+case ZERO
+case ONE
+case CHAR(c: Char)
+case ALT(r1: Rexp, r2: Rexp)
+case SEQ(r1: Rexp, r2: Rexp)
+case STAR(r: Rexp)
+case RECD(label: String, r: Rexp)
+}
+\end{lstlisting}}
 \noindent Since we regard records as regular expressions we
 need to extend the functions $\nullable$ and $\der$. Similarly
 $\textit{mkeps}$ and $\inj$ need to be extended. This means we also need
 to extend the definition of values, which in Scala looks as
 follows:
-{\small\lstinputlisting[language=Scala, numbers=none,linebackgroundcolor=
+{\small%
-{\ifodd\value{lstnumber}\color{capri!3}\fi}]
+\begin{lstlisting}[language=Scala,numbers=none]
-{../progs/app04.scala}}
+enum Val {
+case Empty
+case Chr(c: Char)
+case Sequ(v1: Val, v2: Val)
+case Left(v: Val)
+case Right(v: Val)
+case Stars(vs: List[Val])
+case Rec(label: String, v: Val)
+}
+\end{lstlisting}}
 \noindent Let us now look at the purpose of records more
 closely and let us return to our question whether the string
 terminated in a $b$ or $c$. We can do this as follows: we
 annotate the regular expression $ab + ac$ with a record
 \noindent This regular expression can still only recognise
 the strings $ab$ and $ac$, but we can now use a function
 that takes a value and returns all records. I call this
 function \emph{env} for environment\ldots it builds a list
-of identifiers associated with a string. This function
+of labels associated with a string. This function
 can be defined as follows:
 \begin{center}
 \begin{tabular}{lcl}
 $env(Empty)$     & $\dn$ & $[]$\\
 out all underlying strings where a record is given. Since
 there can be more than one, the environment will potentially
 contain many ``records''. If we now postprocess the value
 calculated by $lex$ extracting all records using $env$, we can
 answer the question whether the last element in the string was
-an $b$ or a $c$. Lets see this in action: if we use $a\cdot b
+an $b$ or a $c$. Lets see this in action: if we use the regular expression
-+ a\cdot c$ and $ac$ the calculated value will be
+$a\cdot b + a\cdot c$ and the string$ac$, then the calculated value of \textit{lex}
+will be
 \begin{center}
 $Right(Seq(Char(a), Char(c)))$
 \end{center}
 \noindent If we use instead $a\cdot (x:b) + a\cdot (x:c)$ and
-use the $env$ function to extract the recording for
+use the $env$ function to extract what is recorded for
 $x$ we obtain
 \begin{center}
 $[(x:c)]$
 \end{center}
 \noindent If we had given the string $ab$ instead, then the
-record would have been $[(x:b)]$. The fun starts if we
+record would have been $[(x:b)]$. Note that $x$ is just an arbitrary
+label we use to identify a place in the regular expression
+where we are interested in what substring has been matched. The fun starts if we
 iterate this. Consider the regular expression
 \begin{center}
 $(a\cdot (x:b) + a\cdot (y:c))^*$
 \end{center}
 (b, (Begin + End))\;+ \\
 (w, WhiteSpaces)
 \end{array}\right)^{\mbox{\LARGE{}*}}$
 \end{center}
-\noindent and ask the algorithm by Sulzmann \& Lu to lex, say
+\noindent The string $k$ stands for keywords, $i$ for identifiers, $o$ for
-the following string
+operations and so on. These labels essentially allow us to classify what
+strings we match with each alternative.  Then we can ask the algorithm
+by Sulzmann \& Lu to lex the following string
 \begin{center}\large
 \code{"if true then then 42 else +"}
 \end{center}

changeset 1020	36799f7b9702
parent 940	1c1fbf45a03c
child 1021	5990e03b2ba8