updated
authorChristian Urban <christian.urban@kcl.ac.uk>
Tue, 03 Oct 2023 14:29:12 +0100
changeset 937 dc5ab66b11cc
parent 936 0b5f06539a84
child 938 91c20364402b
updated
handouts/ho05.pdf
handouts/ho05.tex
handouts/ho06.pdf
handouts/ho06.tex
hws/build.sc
hws/hw01.pdf
hws/hw02.pdf
hws/hw03.pdf
hws/hw03.tex
hws/hw04.pdf
hws/hw05.pdf
hws/hw05.tex
hws/hw06.pdf
hws/hw07.pdf
hws/hw08.pdf
hws/hw09.pdf
langs.sty
progs/parser-combinators/comb1.sc
Binary file handouts/ho05.pdf has changed
--- a/handouts/ho05.tex	Mon Oct 02 23:10:56 2023 +0100
+++ b/handouts/ho05.tex	Tue Oct 03 14:29:12 2023 +0100
@@ -84,7 +84,7 @@
 \noindent
 from natural languages were the meaning of \emph{flies} depends on the
 surrounding \emph{context} are avoided as much as possible. Here is
-an interesting video about C++ being not a context-free language
+an interesting video about C++ not being a context-free language
 
 \begin{center}
 \url{https://www.youtube.com/watch?v=OzK8pUu4UfM}
@@ -99,7 +99,7 @@
 \end{plstx}
 
 \noindent 
-or a grammar for recognising strings consisting of ones is
+or a grammar for recognising strings consisting of ones (at least one) is
 
 \begin{plstx}[margin=3cm]
 : \meta{O} ::= 1 \cdot  \meta{O} 
@@ -378,10 +378,11 @@
 \end{plstx}
 
 \noindent
-In this grammar all $\meta{E}$xpressions, $\meta{T}$erms and $\meta{F}$actors
-are in some way protected from being left-recusive. For example if you
-start $\meta{E}$ you can derive another one by going through $\meta{T}$, then 
-$\meta{F}$, but then $\meta{E}$ is protected by the open-parenthesis.
+In this grammar all $\meta{E}$xpressions, $\meta{T}$erms and
+$\meta{F}$actors are in some way protected from being
+left-recusive. For example if you start $\meta{E}$ you can derive
+another one by going through $\meta{T}$, then $\meta{F}$, but then
+$\meta{E}$ is protected by the open-parenthesis in the last rule.
 
 \subsection*{Removing $\epsilon$-Rules and CYK-Algorithm}
 
@@ -395,7 +396,7 @@
 \noindent
 The transformation made the original grammar non-left-recursive, but at
 the expense of introducing an $\epsilon$ in the second rule. Having an
-explicit $\epsilon$-rule is annoying to, not in terms of looping, but in
+explicit $\epsilon$-rule is annoying, not in terms of looping, but in
 terms of efficiency. The reason is that the $\epsilon$-rule always
 applies but since it recognises the empty string, it does not make any
 progress with recognising a string. Better are rules like $( \cdot
@@ -455,14 +456,15 @@
 \noindent
 I let you think about whether this grammar can still recognise all
 binary numbers and whether this grammar is non-left-recursive. The
-precise statement for the transformation of removing $\epsilon$-rules is
-that if the original grammar was able to recognise only non-empty
+precise statement for the transformation of removing $\epsilon$-rules
+is that if the original grammar was able to recognise only non-empty
 strings, then the transformed grammar will be equivalent (matching the
 same set of strings); if the original grammar was able to match the
 empty string, then the transformed grammar will be able to match the
-same strings, \emph{except} the empty string. So the  $\epsilon$-removal
-does not preserve equivalence of grammars, but the small defect with the
-empty string is not important for practical purposes.
+same strings, \emph{except} the empty string. So the
+$\epsilon$-removal does not preserve equivalence of grammars in
+general, but the small defect with the empty string is not important
+for practical purposes.
 
 So why are these transformations all useful? Well apart from making the 
 parser combinators work (remember they cannot deal with left-recursion and
@@ -563,10 +565,10 @@
 \noindent
 The last row contains the information about all words and their
 corresponding non-terminals. For example the field for \texttt{trains}
-contains the information $\meta{N}$ and $\meta{V}$ because it can be a
+contains the information $\meta{N}$ and $\meta{V}$ because \texttt{trains} can be a
 ``verb'' and a ``noun'' according to the grammar.  The row above,
 let's call the corresponding fields 5a to 5e, contains information
-about 2-word parts of the sentence, namely
+about \underline{2-word} parts of the sentence, namely
 
 \begin{center}
 \begin{tabular}{llll}
Binary file handouts/ho06.pdf has changed
--- a/handouts/ho06.tex	Mon Oct 02 23:10:56 2023 +0100
+++ b/handouts/ho06.tex	Tue Oct 03 14:29:12 2023 +0100
@@ -37,10 +37,11 @@
 Given the extended effort we have spent implementing a lexer in order
 to generate lists of tokens, it might be surprising that in what
 follows we shall often use strings as input, rather than lists of
-tokens. This is for making the explanation more lucid and for quick
-examples. It does not make our previous work on lexers obsolete
+tokens. This is for making the explanation more lucid and ensure the
+examples are simple. It does not make our previous work on lexers obsolete
 (remember they transform a string into a list of tokens). Lexers will
-still be needed for building a somewhat realistic compiler.
+still be needed for building a somewhat realistic compiler. See also
+a question in the homework regarding this issue.
 
 As mentioned above, parser combinators are relatively agnostic about what
 kind of input they process. In my Scala code I use the following
@@ -53,14 +54,16 @@
 \noindent That is they take as input something of type \texttt{I} and
 return a set of pairs of type \texttt{Set[(T, I)]}. Since the input
 needs to be of ``sequence-kind'', I actually have to often write
-\texttt{I <\% Seq[\_]} for the input type. This ensures the
-input is a subtype of Scala sequences. The first component of the
-generated pairs corresponds to what the parser combinator was able to
-parse from the input and the second is the unprocessed, or
-leftover, part of the input (therefore the type of this unprocessed part is
-the same as the input). A parser combinator might return more than one
-such pair; the idea is that there are potentially several ways of how
-to parse the input.  As a concrete example, consider the string
+\code{(using is: I => Seq[_])} for the input type. This ensures the
+input is a subtype of Scala sequences.\footnote{This is a new feature
+  in Scala 3 and is about type-classes, meaning if you use Scala 2 you will have difficulties
+  with running my code.} The first component of the generated pairs
+corresponds to what the parser combinator was able to parse from the
+input and the second is the unprocessed, or leftover, part of the
+input (therefore the type of this unprocessed part is the same as the
+input). A parser combinator might return more than one such pair; the
+idea is that there are potentially several ways of how to parse the
+input.  As a concrete example, consider the string
 
 \begin{center}
 \tt\Grid{iffoo\VS testbar}
@@ -108,12 +111,12 @@
 
 \begin{center}
 \begin{lstlisting}[language=Scala]
-abstract class Parser[I, T] {
-  def parse(in: I) : Set[(T, I)]
+abstract class Parser[I, T](using is: I => Seq[_])  {
+  def parse(in: I): Set[(T, I)]  
 
   def parse_all(in: I) : Set[T] =
-    for ((head, tail) <- parse(in); if (tail.isEmpty)) 
-      yield head
+    for ((hd, tl) <- parse(in); 
+        if is(tl).isEmpty) yield hd
 }
 \end{lstlisting}
 \end{center}
@@ -131,9 +134,9 @@
 behaviour can be described as follows:
 
 \begin{itemize}
-\item If the head of the input string starts with a $c$, then return 
+\item If the head of the input string $s$ starts with a $c$, then return 
   the set
-  \[\{(c, \textit{tail of}\; s)\}\]
+  \[\{(c, \textit{tail-of-}s)\}\]
   where \textit{tail of} 
   $s$ is the unprocessed part of the input string.
 \item Otherwise return the empty set $\{\}$.	
@@ -147,17 +150,17 @@
 \begin{center}
 \begin{lstlisting}[language=Scala,basicstyle=\small\ttfamily, numbers=none]
 case class CharParser(c: Char) extends Parser[String, Char] {
-  def parse(in: String) = 
-    if (in.head == c) Set((c, in.tail)) else Set()
+  def parse(s: String) = 
+    if (s != "" && s.head == c) Set((c, s.tail)) else Set()
 }
 \end{lstlisting}
 \end{center}
 
-\noindent You can see \texttt{parse} tests whether the
-first character of the input string \texttt{in} is equal to
+\noindent You can see \texttt{parse} tests here whether the
+first character of the input string \texttt{s} is equal to
 \texttt{c}. If yes, then it splits the string into the recognised part
-\texttt{c} and the unprocessed part \texttt{in.tail}. In case
-\texttt{in} does not start with \texttt{c} then the parser returns the
+\texttt{c} and the unprocessed part \texttt{s.tail}. In case
+\texttt{s} does not start with \texttt{c} then the parser returns the
 empty set (in Scala \texttt{Set()}). Since this parser recognises
 characters and just returns characters as the processed part, the
 output type of the parser is \texttt{Char}.
@@ -169,7 +172,7 @@
 \begin{lstlisting}[language=Scala,basicstyle=\small\ttfamily,numbers=none]
 case object NumParser extends Parser[List[Token], Int] {
   def parse(ts: List[Token]) = ts match {
-    case Num_token(s)::ts => Set((s.toInt, ts)) 
+    case Num_token(s)::rest => Set((s.toInt, rest)) 
     case _ => Set ()
   }
 }
@@ -186,10 +189,9 @@
 converts it into an integer. The hope is that the lexer did its work
 well and this conversion always succeeds. The consequence of this is
 that the output type for this parser is \texttt{Int}, not
-\texttt{Token}. Such a conversion would be needed if we want to
-implement a simple calculator program, because string-numbers need to
-be transformed into \texttt{Int}-numbers in order to do the
-calculations.
+\texttt{Token}. Such a conversion would be needed in our parser,
+because when we encounter a number in our program, we want to do
+some calculations based on integers, not strings (or tokens). 
 
 
 These simple parsers that just look at the input and do a simple
@@ -211,10 +213,11 @@
 \begin{center}
 \begin{lstlisting}[language=Scala, numbers=none]
 class AltParser[I, T]
-       (p: => Parser[I, T], 
-        q: => Parser[I, T]) extends Parser[I, T] {
-  def parse(in: I) = p.parse(in) ++ q.parse(in)
-}
+         (p: => Parser[I, T], 
+          q: => Parser[I, T])(using I => Seq[_])
+                                 extends Parser[I, T] {
+  def parse(in: I) = p.parse(in) ++ q.parse(in)   
+}    
 \end{lstlisting}
 \end{center}
 
@@ -225,7 +228,7 @@
 parsers need to be able to process input of type \texttt{I} and return
 in \texttt{parse} the same output type \texttt{Set[(T,
   I)]}.\footnote{There is an interesting detail of Scala, namely the
-  \texttt{=>} in front of the types of \texttt{p} and \texttt{q}. They
+  \texttt{=>} in front of the types of \texttt{p} and \texttt{q}. These arrows
   will prevent the evaluation of the arguments before they are
   used. This is often called \emph{lazy evaluation} of the
   arguments. We will explain this later.}  The alternative parser runs
@@ -236,7 +239,11 @@
 
 The alternative parser combinator allows us to construct a parser that
 parses either a character \texttt{a} or \texttt{b} using the
-\texttt{CharParser} shown above. For this we can write
+\texttt{CharParser} shown above. For this we can write\footnote{Note
+  that we cannot use a \texttt{case}-class for \texttt{AltParser}s
+  because of the problem with laziness and Scala quirks. Hating
+  \texttt{new} like the plague, we will work around this later with
+  some syntax tricks. ;o)}
 
 \begin{center}
 \begin{lstlisting}[language=Scala, numbers=none]
@@ -246,24 +253,25 @@
 
 \noindent Later on we will use Scala mechanism for introducing some
 more readable shorthand notation for this, like \texttt{p"a" ||
-  p"b"}. Let us look in detail at what this parser combinator produces
+  p"b"}. But first let us look in detail at what this parser combinator produces
 with some sample strings.
 
 \begin{center}
 \begin{tabular}{rcl}
 input strings & & output\medskip\\
-\texttt{\Grid{acde}} & $\rightarrow$ & $\left\{(\texttt{\Grid{a}}, \texttt{\Grid{cde}})\right\}$\\
-\texttt{\Grid{bcde}} & $\rightarrow$ & $\left\{(\texttt{\Grid{b}}, \texttt{\Grid{cde}})\right\}$\\
+\texttt{\Grid{acde}} & $\rightarrow$ & $\left\{(\texttt{\Grid{a}},\; \texttt{\Grid{cde}})\right\}$\\
+\texttt{\Grid{bcde}} & $\rightarrow$ & $\left\{(\texttt{\Grid{b}},\; \texttt{\Grid{cde}})\right\}$\\
 \texttt{\Grid{ccde}} & $\rightarrow$ & $\{\}$
 \end{tabular}
 \end{center}
 
-\noindent We receive in the first two cases a successful
-output (that is a non-empty set). In each case, either
-\pcode{a} or \pcode{b} is in the parsed part, and
-\pcode{cde} in the unprocessed part. Clearly this parser cannot
-parse anything with \pcode{ccde}, therefore the empty
-set is returned.
+\noindent We receive in the first two cases a successful output (that
+is a non-empty set). In each case, either \pcode{a} or \pcode{b} is in
+the parsed part, and \pcode{cde} in the unprocessed part. Clearly this
+parser cannot parse anything of the form \pcode{ccde}, therefore the
+empty set is returned in the last case. Observe that parser
+combinators only look at the beginning of the given input: they do not
+fish out something in the ``middle'' of the input.
 
 A bit more interesting is the \emph{sequence parser combinator}. Given
 two parsers, say again, $p$ and $q$, we want to apply first the input
@@ -298,7 +306,8 @@
 \begin{lstlisting}[language=Scala,numbers=none]
 class SeqParser[I, T, S]
        (p: => Parser[I, T], 
-        q: => Parser[I, S]) extends Parser[I, (T, S)] {
+        q: => Parser[I, S])(using I => Seq[_])
+                               extends Parser[I, (T, S)] {
   def parse(in: I) = 
     for ((output1, u1) <- p.parse(in); 
          (output2, u2) <- q.parse(u1)) 
@@ -312,7 +321,7 @@
 parser \texttt{p} on the input producing a set of pairs
 (\texttt{output1}, \texttt{u1}). The \texttt{u1} stands for the
 unprocessed parts left over by \texttt{p} (recall that there can be
-several such pairs). Let then \texttt{q} run on these unprocessed
+several such pairs).  Let then \texttt{q} run on these unprocessed
 parts producing again a set of pairs. The output of the sequence
 parser combinator is then a set containing pairs where the first
 components are again pairs, namely what the first parser could parse
@@ -339,13 +348,16 @@
 With the shorthand notation we shall introduce later for the sequence
 parser combinator, we can write for example \pcode{p"a" ~ p"b"}, which
 is the parser combinator that first recognises the character
-\texttt{a} from a string and then \texttt{b}. Let us look again at
-some examples of how this parser combinator processes some strings:
+\texttt{a} from a string and then \texttt{b}. (Actually, we will be
+able to write just \pcode{p"ab"} for such parsers, but it is good to
+  understand first what happens behind the scenes.) Let us look again
+  at some examples of how the sequence parser combinator processes some
+  strings:
 
 \begin{center}
 \begin{tabular}{rcl}
 input strings & & output\medskip\\
-\texttt{\Grid{abcde}} & $\rightarrow$ & $\left\{((\texttt{\Grid{a}}, \texttt{\Grid{b}}), \texttt{\Grid{cde}})\right\}$\\
+\texttt{\Grid{abcde}} & $\rightarrow$ & $\left\{((\texttt{\Grid{a}}, \texttt{\Grid{b}}),\; \texttt{\Grid{cde}})\right\}$\\
 \texttt{\Grid{bacde}} & $\rightarrow$ & $\{\}$\\
 \texttt{\Grid{cccde}} & $\rightarrow$ & $\{\}$
 \end{tabular}
@@ -368,8 +380,8 @@
 \begin{center}
 \begin{tabular}{rcl}
 input strings & & output\medskip\\
-\texttt{\Grid{acde}} & $\rightarrow$ & $\left\{((\texttt{\Grid{a}}, \texttt{\Grid{c}}), \texttt{\Grid{de}})\right\}$\\
-\texttt{\Grid{bcde}} & $\rightarrow$ & $\left\{((\texttt{\Grid{b}}, \texttt{\Grid{c}}), \texttt{\Grid{de}})\right\}$\\
+\texttt{\Grid{acde}} & $\rightarrow$ & $\left\{((\texttt{\Grid{a}}, \texttt{\Grid{c}}),\; \texttt{\Grid{de}})\right\}$\\
+\texttt{\Grid{bcde}} & $\rightarrow$ & $\left\{((\texttt{\Grid{b}}, \texttt{\Grid{c}}),\; \texttt{\Grid{de}})\right\}$\\
 \texttt{\Grid{abde}} & $\rightarrow$ & $\{\}$
 \end{tabular}
 \end{center}
@@ -382,7 +394,7 @@
 \begin{center}
 \begin{tabular}{rcl}
 input strings & & output\medskip\\
-\texttt{\Grid{abcde}} & $\rightarrow$ & $\left\{(((\texttt{\Grid{a}},\texttt{\Grid{b}}), \texttt{\Grid{c}}), \texttt{\Grid{de}})\right\}$\\
+\texttt{\Grid{abcde}} & $\rightarrow$ & $\left\{(((\texttt{\Grid{a}},\texttt{\Grid{b}}), \texttt{\Grid{c}}),\; \texttt{\Grid{de}})\right\}$\\
 \texttt{\Grid{abde}} & $\rightarrow$ & $\{\}$\\
 \texttt{\Grid{bcde}} & $\rightarrow$ & $\{\}$
 \end{tabular}
@@ -399,7 +411,7 @@
 \begin{center}
 \begin{tabular}{rcl}
 input strings & & output\medskip\\
-\texttt{\Grid{abcde}} & $\rightarrow$ & $\left\{((\texttt{\Grid{a}},(\texttt{\Grid{b}}, \texttt{\Grid{c}})), \texttt{\Grid{de}})\right\}$\\
+\texttt{\Grid{abcde}} & $\rightarrow$ & $\left\{((\texttt{\Grid{a}},(\texttt{\Grid{b}}, \texttt{\Grid{c}})),\; \texttt{\Grid{de}})\right\}$\\
 \end{tabular}
 \end{center}
 
@@ -445,8 +457,14 @@
 nesting of the component parsers.
 
 
-Consider also carefully that constructing a parser such \pcode{p"a" ||
-  (p"a" ~ p"b")} will result in a typing error. The intention with this
+Consider also carefully that constructing a parser such
+
+\begin{center}
+\pcode{p"a" || (p"a" ~ p"b")}
+\end{center}
+
+\noindent
+will result in a typing error. The intention with this
 parser is that we want to parse either an \texttt{a}, or an \texttt{a}
 followed by a \texttt{b}. However, the first parser has as output type
 a single character (recall the type of \texttt{CharParser}), but the
@@ -457,17 +475,17 @@
 they are not in this case, there is a typing error.  We will see later
 how we can build this parser without the typing error.
 
-The next parser combinator, called \emph{semantic action}, does not
+The next parser combinator, called \emph{semantic action} or \emph{map-parser}, does not
 actually combine two smaller parsers, but applies a function to the result
 of a parser.  It is implemented in Scala as follows
 
 \begin{center}
 \begin{lstlisting}[language=Scala,basicstyle=\small\ttfamily, numbers=none]
 class MapParser[I, T, S]
-         (p: => Parser[I, T], 
-          f: T => S) extends Parser[I, S] {
-  def parse(in: I) = 
-    for ((head, tail) <- p.parse(in)) yield (f(head), tail)
+        (p: => Parser[I, T], 
+         f: T => S)(using I => Seq[_]) extends Parser[I, S] {
+  def parse(in: I) =
+       for ((hd, tl) <- p.parse(in)) yield (f(hd), tl)
 }
 \end{lstlisting}
 \end{center}
@@ -476,7 +494,7 @@
 \noindent This parser combinator takes a parser \texttt{p} (with input
 type \texttt{I} and output type \texttt{T}) as one argument but also a
 function \texttt{f} (with type \texttt{T => S}). The parser \texttt{p}
-produces sets of type \texttt{Set[(T, I)]}. The semantic action
+produces sets of type \texttt{Set[(S, I)]}. The semantic action
 combinator then applies the function \texttt{f} to all the `processed'
 parser outputs. Since this function is of type \texttt{T => S}, we
 obtain a parser with output type \texttt{S}. Again Scala lets us
@@ -560,7 +578,7 @@
 
 \noindent
 produces \texttt{Set((123,abc))}. The problem is that \texttt{123} is
-still a string (the required double-quotes are not printed by
+still a string (the expected double-quotes are not printed by
 Scala). We want to convert this string into the corresponding
 \texttt{Int}. We can do this as follows using a semantic action
 
@@ -573,16 +591,19 @@
 \noindent
 The function in the semantic action converts a string into an
 \texttt{Int}. Now \texttt{parse} generates \texttt{Set((123,abc))},
-but this time \texttt{123} is an \texttt{Int}. Let us come back to
-semantic actions when we are going to implement actual context-free
-grammars.
+but this time \texttt{123} is an \texttt{Int}. Think carefully what
+the input and output type of the parser is without the semantic action
+adn what with the semantic action (the type of the function can
+already tell you). Let us come back to semantic actions when we are
+going to implement actual context-free grammars.
 
 \subsubsection*{Shorthand notation for parser combinators}
 
 Before we proceed, let us just explain the shorthand notation for
-parser combinators. Like for regular expressions, the shorthand notation
-will make our life much easier when writing actual parsers. We can define
-some implicits which allow us to write
+parser combinators. Like for regular expressions, the shorthand
+notation will make our life much easier when writing actual
+parsers. We can define some extensions\footnote{In Scala 2 this was
+  generically called as ``implicits''.} which allow us to write
 
 \begin{center}
 \begin{tabular}{ll}  
@@ -593,7 +614,7 @@
 \end{center}
 
 \noindent
-as well as to use string interpolations for specifying simple string parsers.
+We will also use the \texttt{p}-string-interpolation for specifying simple string parsers.
 
 The idea is that this shorthand notation allows us to easily translate
 context-free grammars into code. For example recall our context-free
@@ -610,7 +631,7 @@
 written as \texttt{p"a"}, \texttt{p"b"} and \texttt{p""}.
 
 
-\subsubsection*{How to build parsers using parser combinators?}
+\subsubsection*{How to build more interesting parsers using parser combinators?}
 
 The beauty of parser combinators is the ease with which they can be
 implemented and how easy it is to translate context-free grammars into
@@ -740,11 +761,11 @@
 \begin{lstlisting}[language=Scala,basicstyle=\small\ttfamily, numbers=none]
 class AltParser[I, T]
        (p: => Parser[I, T], 
-        q: => Parser[I, T]) extends Parser[I, T] {...}
+        q: => Parser[I, T]) ... extends Parser[I, T] {...}
 
 class SeqParser[I, T, S]
        (p: => Parser[I, T], 
-        q: => Parser[I, S]) extends Parser[I, (T, S)] {...}
+        q: => Parser[I, S]) ... extends Parser[I, (T, S)] {...}
 \end{lstlisting}
 \end{center}
 
@@ -862,7 +883,7 @@
 \noindent
 Recall what left-recursive means from Handout 5 and make sure you see
 why this grammar is \emph{non} left-recursive. This version of the grammar
-also deals with the fact that $*$ should have a higher precedence. This does not
+also deals with the fact that $*$ should have a higher precedence than $+$ and $-$. This does not
 affect which strings this grammar can recognise, but in which order we are going
 to evaluate any arithmetic expression. We can translate this grammar into
 parsing combinators as follows:
--- a/hws/build.sc	Mon Oct 02 23:10:56 2023 +0100
+++ b/hws/build.sc	Tue Oct 03 14:29:12 2023 +0100
@@ -33,6 +33,12 @@
 }
 
 @main
+def all() = {
+  make_sols()
+  make()
+}
+
+@main
 def hg() = {
   println(os.proc("hg", "commit", "-m texupdate", files ++ pdf_files).call())
   println(os.proc("hg", "push").call())
Binary file hws/hw01.pdf has changed
Binary file hws/hw02.pdf has changed
Binary file hws/hw03.pdf has changed
--- a/hws/hw03.tex	Mon Oct 02 23:10:56 2023 +0100
+++ b/hws/hw03.tex	Tue Oct 03 14:29:12 2023 +0100
@@ -69,8 +69,13 @@
 
   What is the language recognised by this automaton?
 
+  
+
 \item Give a non-deterministic finite automaton that can
-      recognise the language $L(a\cdot (a + b)^* \cdot c)$.
+  recognise the language $L(a\cdot (a + b)^* \cdot c)$.
+
+  \solution{It is already possible to just read off the automaton without
+  going through Thompson.}
 
 \item Given a deterministic finite automaton $A(\varSigma, Q, Q_0, F,
       \delta)$, define which language is recognised by this
@@ -232,7 +237,7 @@
   automaton (DFA) that can recognise the same language
   as the NFA maximal need?
 
-  \solution{ $2^n$ in the worst-case and for some regexes the worst case
+  \solution{$2^n$ in the worst-case and for some regexes the worst case
     cannot be avoided. 
 
     Other comments: $r^{\{n\}}$ can only be represented as $n$
@@ -241,6 +246,16 @@
     represented as automaton.
   }
 
+\item Rust implements a non-backtracking regular expression matcher
+  based on the classic idea of DFAs. Still, some regular expressions
+  take a surprising amount of time for matching problems. Explain the
+  problem?
+
+  \solution{The problem has to do with bounded regular expressions,
+    such as $r^{\{n\}}$. They are represented as $n$-copies of some
+    automaton for $r$. If $n$ is large, then this can result in a
+    large memory-footprint and slow runtime.}
+
 \item Prove that for all regular expressions $r$ we have
       
 \begin{center} 
Binary file hws/hw04.pdf has changed
Binary file hws/hw05.pdf has changed
--- a/hws/hw05.tex	Mon Oct 02 23:10:56 2023 +0100
+++ b/hws/hw05.tex	Tue Oct 03 14:29:12 2023 +0100
@@ -118,7 +118,30 @@
 
 Describe which language is generated by this grammar.
   
+\item Remember we have specified identifiers with regular expressions as
+  strings that start with a letter followed by letters, digits and
+  underscores. This can also be specified by a grammar rule or rules.
+  What would the rule(s) look like for identifiers? 
 
+  \solution{
+  \begin{plstx}[margin=1cm]
+  : \meta{Id\/} ::= \meta{Let\/}\cdot \meta{R}\\
+  : \meta{Let\/} ::= a \;\mid\; \dots \;\mid\; z\\
+  : \meta{Dig\/} ::= 0 \;\mid\; \dots \;\mid\; 9\\
+  : \meta{R\/} ::= \meta{Let\/} \cdot \meta{R\/} \;\mid\;
+                  \meta{Dig\/} \cdot \meta{R\/} \;\mid\;
+                  $\_$ \cdot \meta{R\/} \;\mid\; \epsilon\\
+  \end{plstx}  
+}
+
+\item If we specify keywords, identifiers (see above) and programs
+  by grammar rules, are there any problems you need to be careful
+  about when using a parser for identifying tokens?
+
+  \solution{Parsers do not have the POSIX rules (e.g.~longest munch
+    rule) built in. I am not aware that any parser does this out of
+    the box and you would need to build in such constraints into the
+    grammar rules or parsing mechanism.}
 
 \item {\bf(Optional)} Recall the definitions for $Der$ and $der$
       from the lectures. Prove by induction on $r$ the
Binary file hws/hw06.pdf has changed
Binary file hws/hw07.pdf has changed
Binary file hws/hw08.pdf has changed
Binary file hws/hw09.pdf has changed
--- a/langs.sty	Mon Oct 02 23:10:56 2023 +0100
+++ b/langs.sty	Tue Oct 03 14:29:12 2023 +0100
@@ -42,7 +42,7 @@
     new,null,object,override,package,%
     private,protected,requires,return,sealed,%
     super,this,then,throw,trait,true,try,%
-    type,val,var,while,with,yield,write,read},%
+    type,using,val,var,while,with,yield,write,read},%
   literate={==>}{{\mbox{\color{codepurple}{\textbf{\texttt{==>}}}}}}2,%
   otherkeywords={=>,<-,<\%,<:,>:,\#},%
   sensitive=true,%
--- a/progs/parser-combinators/comb1.sc	Mon Oct 02 23:10:56 2023 +0100
+++ b/progs/parser-combinators/comb1.sc	Tue Oct 03 14:29:12 2023 +0100
@@ -252,4 +252,4 @@
 println(E2.parse("((((1))))"))
 println(E2.parse("((((((1))))))"))
 println(E2.parse("(((((((1)))))))"))
-println(E2.parse("((((((((1))))))))"))
\ No newline at end of file
+println(E2.parse("((((((((1))))))))"))