updated
authorChristian Urban <christian.urban@kcl.ac.uk>
Fri, 31 Oct 2025 19:26:11 +0000
changeset 1020 36799f7b9702
parent 1019 f71399fe3fdc
child 1021 5990e03b2ba8
updated
handouts/ho04.pdf
handouts/ho04.tex
progs/lexer/lex.sc
progs/lexer/lexer.sc
slides/slides05.pdf
slides/slides05.tex
Binary file handouts/ho04.pdf has changed
--- a/handouts/ho04.tex	Fri Oct 31 11:25:14 2025 +0000
+++ b/handouts/ho04.tex	Fri Oct 31 19:26:11 2025 +0000
@@ -91,14 +91,37 @@
 letters, whereas values start with a single upper-case
 character and the rest are lower-case letters.
  
-{\small\lstinputlisting[language=Scala,numbers=none,linebackgroundcolor=
-                  {\ifodd\value{lstnumber}\color{capri!3}\fi}]
-{../progs/app01.scala}}
+{\small%
+\begin{lstlisting}[language=Scala,numbers=none]
+ enum Rexp {
+   case ZERO
+   case ONE
+   case CHAR(c: Char)
+   case ALT(r1: Rexp, r2: Rexp)
+   case SEQ(r1: Rexp, r2: Rexp)
+   case STAR(r: Rexp)
+}
+\end{lstlisting}
 
- 
-{\small\lstinputlisting[language=Scala,numbers=none,linebackgroundcolor=
-                  {\ifodd\value{lstnumber}\color{capri!3}\fi}]
-{../progs/app02.scala}}
+\begin{lstlisting}[language=Scala,numbers=none]
+enum Val {
+   case Empty
+   case Chr(c: Char)
+   case Sequ(v1: Val, v2: Val)
+   case Left(v: Val)
+   case Right(v: Val)
+   case Stars(vs: List[Val])
+}
+\end{lstlisting}}
+
+%{\small\lstinputlisting[language=Scala,numbers=none,linebackgroundcolor=
+%                  {\ifodd\value{lstnumber}\color{capri!3}\fi}]
+%{../progs/app01.scala}}
+%
+% 
+%{\small\lstinputlisting[language=Scala,numbers=none,linebackgroundcolor=
+%                  {\ifodd\value{lstnumber}\color{capri!3}\fi}]
+%{../progs/app02.scala}}
 
 
 Graphically the algorithm by Sulzmann \& Lu can be illustrated
@@ -110,7 +133,7 @@
 first build the three derivatives (according to $a$, $b$ and
 $c$). We then use $\nullable$ to find out whether the resulting
 regular expression can match the empty string. If yes, we call
-the function $\textit{mkeps}$. The $\textit{mkeps}$ function calculates a value for how a regular
+the function $\textit{mkeps}$. This function calculates a value for how a regular
 expression has matched the empty string. Its definition
 is as follows:
 
@@ -235,7 +258,8 @@
 $\Left(\ldots)$. The point is that from this value we can
 directly read off which part of $r_4$ matched the empty
 string: take the right-alternative first, and then the
-right-alternative again, then you got to the $\ONE$.
+right-alternative again, then you got to the $\ONE$ (indicated by
+\textit{Empty}).
 
 Next we have to ``inject'' the last character, that is $c$ in
 the running example, into this value $v_4$ in order to
@@ -269,7 +293,8 @@
 \end{center}
 
 \noindent This value corresponds to how the regular expression $r_1$,
-namely $a \cdot (b \cdot c)$, matched the string $abc$.
+namely $a \cdot (b \cdot c)$, matched the string $abc$. I let you
+think whether it is the expected value.
 
 
 There are a few auxiliary functions that are of interest
@@ -307,7 +332,7 @@
 The definition of $\inj$ might still be very puzzling and each clause
 might look arbitrary, but there is in fact a relatively simple idea
 behind it. Ultimately the $\inj$-functions is determined by the
-derivative functions. For this consider one of the ``squares'' from
+derivative function. For this consider one of the ``squares'' from
 Figure~\ref{Sulz}:
 
 \begin{center}
@@ -775,9 +800,19 @@
 be regarded as a regular expression. The extended definition
 in Scala therefore looks as follows:
 
-{\small\lstinputlisting[language=Scala, numbers=none,linebackgroundcolor=
-                  {\ifodd\value{lstnumber}\color{capri!3}\fi}]
-{../progs/app03.scala}}
+{\small%
+\begin{lstlisting}[language=Scala,numbers=none]
+ enum Rexp {
+   case ZERO
+   case ONE
+   case CHAR(c: Char)
+   case ALT(r1: Rexp, r2: Rexp)
+   case SEQ(r1: Rexp, r2: Rexp)
+   case STAR(r: Rexp)
+   case RECD(label: String, r: Rexp)
+}
+\end{lstlisting}}
+
 
 \noindent Since we regard records as regular expressions we
 need to extend the functions $\nullable$ and $\der$. Similarly
@@ -785,9 +820,19 @@
 to extend the definition of values, which in Scala looks as
 follows:
 
-{\small\lstinputlisting[language=Scala, numbers=none,linebackgroundcolor=
-                  {\ifodd\value{lstnumber}\color{capri!3}\fi}]
-{../progs/app04.scala}}
+{\small%
+\begin{lstlisting}[language=Scala,numbers=none]
+enum Val {
+   case Empty
+   case Chr(c: Char)
+   case Sequ(v1: Val, v2: Val)
+   case Left(v: Val)
+   case Right(v: Val)
+   case Stars(vs: List[Val])
+   case Rec(label: String, v: Val)
+}
+\end{lstlisting}}
+
 
 \noindent Let us now look at the purpose of records more
 closely and let us return to our question whether the string
@@ -803,7 +848,7 @@
 the strings $ab$ and $ac$, but we can now use a function
 that takes a value and returns all records. I call this
 function \emph{env} for environment\ldots it builds a list
-of identifiers associated with a string. This function
+of labels associated with a string. This function
 can be defined as follows:
 
 \begin{center}
@@ -826,15 +871,16 @@
 contain many ``records''. If we now postprocess the value
 calculated by $lex$ extracting all records using $env$, we can
 answer the question whether the last element in the string was
-an $b$ or a $c$. Lets see this in action: if we use $a\cdot b
-+ a\cdot c$ and $ac$ the calculated value will be
+an $b$ or a $c$. Lets see this in action: if we use the regular expression 
+$a\cdot b + a\cdot c$ and the string$ac$, then the calculated value of \textit{lex} 
+will be
 
 \begin{center}
 $Right(Seq(Char(a), Char(c)))$
 \end{center}
 
 \noindent If we use instead $a\cdot (x:b) + a\cdot (x:c)$ and
-use the $env$ function to extract the recording for 
+use the $env$ function to extract what is recorded for 
 $x$ we obtain
 
 \begin{center}
@@ -842,7 +888,9 @@
 \end{center}
 
 \noindent If we had given the string $ab$ instead, then the
-record would have been $[(x:b)]$. The fun starts if we 
+record would have been $[(x:b)]$. Note that $x$ is just an arbitrary 
+label we use to identify a place in the regular expression 
+where we are interested in what substring has been matched. The fun starts if we 
 iterate this. Consider the regular expression 
 
 \begin{center}
@@ -926,8 +974,10 @@
       \end{array}\right)^{\mbox{\LARGE{}*}}$
 \end{center}
 
-\noindent and ask the algorithm by Sulzmann \& Lu to lex, say
-the following string
+\noindent The string $k$ stands for keywords, $i$ for identifiers, $o$ for 
+operations and so on. These labels essentially allow us to classify what 
+strings we match with each alternative.  Then we can ask the algorithm 
+by Sulzmann \& Lu to lex the following string
 
 \begin{center}\large
 \code{"if true then then 42 else +"}
--- a/progs/lexer/lex.sc	Fri Oct 31 11:25:14 2025 +0000
+++ b/progs/lexer/lex.sc	Fri Oct 31 19:26:11 2025 +0000
@@ -139,7 +139,7 @@
 
 
 
-println(lex(("ab" | "a") ~ (ONE | "b"), "ab".toList))
+println(lex(("ab" | "ab") ~ (ONE | "b"), "ab".toList))
 
 println(lex(STAR("aa" | "a"), "aaa".toList))
 
@@ -207,7 +207,7 @@
   println(s"test: $prog1")
   println(lexing(WHILE_REGS, prog1))
 
-  val prog2 = """read  n; write n"""  
+  val prog2 = """read  n; write n; write n"""  
   println(s"test: $prog2")
   println(lexing(WHILE_REGS, prog2))
 }
--- a/progs/lexer/lexer.sc	Fri Oct 31 11:25:14 2025 +0000
+++ b/progs/lexer/lexer.sc	Fri Oct 31 19:26:11 2025 +0000
@@ -43,9 +43,7 @@
 
 given Conversion[String, Rexp] = (s => charlist2rexp(s.toList))
 
-//extension (s: String) {
-//  def $ (r: Rexp) = RECD(s, r)
-//}
+
 extension (s: String) {
   def $ (r: Rexp) = RECD(s, r)
   def | (r: Rexp) = ALT(s, r)
@@ -81,7 +79,7 @@
     if (nullable(r1)) ALT(SEQ(der(c, r1), r2), der(c, r2))
     else SEQ(der(c, r1), r2)
   case STAR(r) => SEQ(der(c, r), STAR(r))
-  case RECD(_, r1) => der(c, r1)
+  case RECD(label, r1) => der(c, r1)
 }
 
 // extracts a string from a value
Binary file slides/slides05.pdf has changed
--- a/slides/slides05.tex	Fri Oct 31 11:25:14 2025 +0000
+++ b/slides/slides05.tex	Fri Oct 31 19:26:11 2025 +0000
@@ -188,7 +188,7 @@
 \normalsize
 parser input: a sequence of tokens\smallskip\\
 
-{\small\hspace{5mm}\code{key(read) lpar id(n) rpar semi}}\smallskip\\
+{\small\hspace{5mm}\code{key(read) lpar id(n) rpar}}\smallskip\\
 
 parser output: an abstract syntax tree\smallskip\\
 \footnotesize
@@ -677,7 +677,7 @@
   \end{mybox3}\bigskip
 
 
-  \hfill from the \href{http://www.computing.surrey.ac.uk/research/dsrg/fog/FogThesis.pdf}{PhD thesis} by Willink (2001)
+  \hfill from the \href{https://urbanchr.github.io/cfl/willing-cpp-thesis.pdf}{PhD thesis} by Willink (2001)
   
   \small
   \begin{center}