afl-material: changeset 960:c7009356ddd8

--- a/XX	Wed Feb 21 09:14:12 2024 +0000
+++ b/XX	Wed May 29 13:25:30 2024 +0100
@@ -3,3 +3,10 @@
 
 Pygments
 
+
+======================
+video about disassembly and the backdoor in xz
+
+https://www.youtube.com/watch?v=Q6ovtLdSbEA&t=4705s
+
+Deep Dive into XZ Utils Backdoor - Columbia Engineering, Advanced Systems Programming Guest Lecture
\ No newline at end of file

Binary file handouts/ho07.pdf has changed

--- a/handouts/ho07.tex	Wed Feb 21 09:14:12 2024 +0000
+++ b/handouts/ho07.tex	Wed May 29 13:25:30 2024 +0100
@@ -24,44 +24,44 @@
 that is often pretty fast. This way of producing code has also the
 advantage that the virtual machine takes care of things a compiler would
 normally need to take care of (hairy things like explicit memory
-management). 
+management).
 
 As a first example in this module we will implement a compiler for the
 very simple WHILE-language that we parsed in the last lecture. The
 compiler will target the Java Virtual Machine (JVM), but not directly.
 Pictorially the compiler will work as follows:
- 
+
 \begin{center}
   \begin{tikzpicture}[scale=1,font=\bf,
                       node/.style={
                       rectangle,rounded corners=3mm,
-                      ultra thick,draw=black!50,minimum height=18mm, 
+                      ultra thick,draw=black!50,minimum height=18mm,
                       minimum width=20mm,
                       top color=white,bottom color=black!20}]
 
-  \node (0) at (-3,0) {};  
+  \node (0) at (-3,0) {};
   \node (A) at (0,0) [node,text width=1.6cm,text centered] {our compiler};
   \node (B) at (3.5,0) [node,text width=1.6cm,text centered] {Jasmin / Krakatau};
   \node (C) at (7.5,0) [node] {JVM};
- 
-  \draw [->,line width=2.5mm] (0) -- node [above,pos=0.35] {*.while} (A); 
-  \draw [->,line width=2.5mm] (A) -- node [above,pos=0.35] {*.j} (B); 
-  \draw [->,line width=2.5mm] (B) -- node [above,pos=0.35] {*.class} (C); 
+
+  \draw [->,line width=2.5mm] (0) -- node [above,pos=0.35] {*.while} (A);
+  \draw [->,line width=2.5mm] (A) -- node [above,pos=0.35] {*.j} (B);
+  \draw [->,line width=2.5mm] (B) -- node [above,pos=0.35] {*.class} (C);
   \end{tikzpicture}
   \end{center}
 
 \noindent
 The input will be WHILE-programs; the output will be assembly files
-(with the file extension .j). Assembly files essentially contain
+(with the file extension *.j). Assembly files essentially contain
 human-readable low-level code, meaning they are not just bits and
 bytes, but rather something you can read and understand---with a bit
 of practice of course. An \emph{assembler} will then translate the
 assembly files into unreadable class- or binary-files the JVM or CPU
-can run.  Unfortunately, the Java ecosystem does not come with an
+can run, i.e.~bits and bytes.  Unfortunately, the Java ecosystem does not come with an
 assembler which would be handy for our compiler-endeavour (unlike
 Microsoft's Common Language Infrastructure for the .Net platform which
 has an assembler out-of-the-box). As a substitute we shall use the
-3rd-party programs Jasmin or Krakatau (Jasmin is the preferred
+3rd-party program Jasmin, or alternatively Krakatau (Jasmin is the preferred
 option---a \texttt{jasmin.jar}-file is available on KEATS):
 
 \begin{itemize}
@@ -75,7 +75,7 @@
 readable by humans, as opposed to class-files which are pretty much just
 (horrible) zeros and ones. Jasmin (respectively Krakatau) will then take
 our assembly files as input and generate the corresponding class-files for
-us. 
+us.
 
 What is good about the JVM is that it is a stack-based virtual machine,
 a fact which will make it easy to generate code for arithmetic
@@ -85,7 +85,7 @@
 \begin{lstlisting}[language=JVMIS,numbers=none]
 ldc 1
 ldc 2
-iadd 
+iadd
 \end{lstlisting}
 
 \noindent The first instruction loads the constant $1$ onto the stack,
@@ -134,14 +134,14 @@
 generates the instructions
 
 \begin{lstlisting}[language=JVMIS,numbers=none]
-ldc 1 
-ldc 2 
-ldc 3 
-imul 
-ldc 4 
-ldc 3 
-isub 
-iadd 
+ldc 1
+ldc 2
+ldc 3
+imul
+ldc 4
+ldc 3
+isub
+iadd
 iadd
 \end{lstlisting}
 
@@ -151,7 +151,7 @@
 will be an important convention we always observe in our compiler. Note,
 that a different bracketing of the expression, for example $(1 + (2 *
 3)) + (4 - 3)$, produces a different abstract syntax tree and thus also
-a different list of instructions. 
+a different list of instructions.
 
 Generating code in this post-order-traversal fashion is rather easy to
 implement: it can be done with the following recursive
@@ -163,11 +163,11 @@
 $\textit{compile}(n)$ & $\dn$ & $\instr{ldc}\; n$\\
 $\textit{compile}(a_1 + a_2)$ & $\dn$ &
 $\textit{compile}(a_1) \;@\;\textit{compile}(a_2)\;@\; \instr{iadd}$\\
-$\textit{compile}(a_1 - a_2)$ & $\dn$ & 
+$\textit{compile}(a_1 - a_2)$ & $\dn$ &
 $\textit{compile}(a_1) \;@\; \textit{compile}(a_2)\;@\; \instr{isub}$\\
-$\textit{compile}(a_1 * a_2)$ & $\dn$ & 
+$\textit{compile}(a_1 * a_2)$ & $\dn$ &
 $\textit{compile}(a_1) \;@\; \textit{compile}(a_2)\;@\; \instr{imul}$\\
-$\textit{compile}(a_1 \backslash a_2)$ & $\dn$ & 
+$\textit{compile}(a_1 \backslash a_2)$ & $\dn$ &
 $\textit{compile}(a_1) \;@\; \textit{compile}(a_2)\;@\; \instr{idiv}$\\
 \end{tabular}
 \end{center}
@@ -184,9 +184,9 @@
 iload $index$
 \end{lstlisting}
 
-\noindent 
-which places the content of the local variable $index$ onto 
-the stack. Storing the top of the stack into a local variable 
+\noindent
+which places the content of the local variable $index$ onto
+the stack. Storing the top of the stack into a local variable
 can be done by the instruction
 
 \begin{lstlisting}[language=JVMIS,mathescape,numbers=none]
@@ -207,13 +207,13 @@
 \begin{center}
 \begin{tabular}{lcl}
 $\textit{compile}(n, E)$ & $\dn$ & $\instr{ldc}\;n$\\
-$\textit{compile}(a_1 + a_2, E)$ & $\dn$ & 
+$\textit{compile}(a_1 + a_2, E)$ & $\dn$ &
 $\textit{compile}(a_1, E) \;@\;\textit{compile}(a_2, E)\;@\; \instr{iadd}$\\
 $\textit{compile}(a_1 - a_2, E)$ & $\dn$ &
 $\textit{compile}(a_1, E) \;@\; \textit{compile}(a_2, E)\;@\; \instr{isub}$\\
 $\textit{compile}(a_1 * a_2, E)$ & $\dn$ &
 $\textit{compile}(a_1, E) \;@\; \textit{compile}(a_2, E)\;@\; \instr{imul}$\\
-$\textit{compile}(a_1 \backslash a_2, E)$ & $\dn$ & 
+$\textit{compile}(a_1 \backslash a_2, E)$ & $\dn$ &
 $\textit{compile}(a_1, E) \;@\; \textit{compile}(a_2, E)\;@\; \instr{idiv}$\\
 $\textit{compile}(x, E)$ & $\dn$ & $\instr{iload}\;E(x)$\\
 \end{tabular}
@@ -254,7 +254,7 @@
 
 \begin{center}
 \begin{tabular}{lcl}
-$\textit{compile}(x := a, E)$ & $\dn$ & 
+$\textit{compile}(x := a, E)$ & $\dn$ &
 $(\textit{compile}(a, E) \;@\;\instr{istore}\;index, E')$
 \end{tabular}
 \end{center}
@@ -280,7 +280,7 @@
 istore $n_x$
 \end{lstlisting}
 
-\noindent 
+\noindent
 where $n_x$ is the index (or pointer to the memory) for the variable
 $x$. The Scala code for looking-up the index for the variable is as follow:
 
@@ -367,8 +367,8 @@
 \end{center}
 
 \noindent where we now need a conditional jump (if the
-if-condition is false) from the end of the code for the 
-boolean to the beginning of the instructions $cs_2$. Once we 
+if-condition is false) from the end of the code for the
+boolean to the beginning of the instructions $cs_2$. Once we
 are finished with running $cs_2$ we can continue with whatever
 code comes after the if-statement.
 
@@ -376,33 +376,35 @@
 where the jump should go. Since we are generating assembly
 code for the JVM, we do not actually have to give (numeric)
 addresses, but can just attach (symbolic) labels to our code.
-These labels specify a target for a jump. Therefore the labels
+These labels specify a target for a jump---essentially they mark
+a location in our code. Therefore the labels
 need to be unique, as otherwise it would be ambiguous where a
-jump should go to. A label, say \pcode{L}, is attached to assembly 
+jump should go to. A label, say \pcode{L}, is attached to assembly
 code like
 
 \begin{lstlisting}[mathescape,numbers=none]
+  $\textit{instr\_1}$
 L:
-  $\textit{instr\_1}$
   $\textit{instr\_2}$
+  $\textit{instr\_3}$
     $\vdots$
 \end{lstlisting}
- 
+
 \noindent where the label needs to be followed by a colon. The task of
 the assembler (in our case Jasmin or Krakatau) is to resolve the labels
 to actual (numeric) addresses, for example jump 10 instructions forward,
-or 20 instructions backwards.
- 
-Recall the ``trick'' with compiling boolean expressions: the 
+or 20 instructions backwards, or jump to this particular address.
+
+Recall the ``trick'' with compiling boolean expressions: the
 \textit{compile}-function for boolean expressions takes three
-arguments: an abstract syntax tree, an environment for 
-variable indices and also the label, $lab$, to where an conditional 
-jump needs to go. The clause for the expression $a_1 = a_2$, 
+arguments: an abstract syntax tree, an environment for
+variable indices and also the label, which I called $lab$, to where an conditional
+jump needs to go. The clause for the expression $a_1 = a_2$,
 for example, is as follows:
 
 \begin{center}
 \begin{tabular}{lcl}
-$\textit{compile}(a_1 = a_2, E, lab)$ & $\dn$\\ 
+$\textit{compile}(a_1 = a_2, E, lab)$ & $\dn$\\
 \multicolumn{3}{l}{$\qquad\textit{compile}(a_1, E) \;@\;\textit{compile}(a_2, E)\;@\; \instr{if_icmpne}\;lab$}
 \end{tabular}
 \end{center}
@@ -426,7 +428,7 @@
 leave it to you to extend the \textit{compile}-function for the other
 boolean expressions. Note that we need to jump whenever the boolean is
 \emph{not} true, which means we have to ``negate'' the jump
-condition---equals becomes not-equal, less becomes greater-or-equal. 
+condition---equals becomes not-equal, less becomes greater-or-equal.
 Other jump instructions for boolean operators are
 
 \begin{center}
@@ -444,13 +446,13 @@
 ``upside-down-inside-out'' way of generating code still seems the most
 convenient.
 
-We are now ready to give the compile function for 
-if-statements---remember this function returns for statements a 
+We are now ready to give the compile function for
+if-statements---remember this function returns for statements a
 pair consisting of the code and an environment:
 
 \begin{center}
 \begin{tabular}{lcl}
-$\textit{compile}(\pcode{if}\;b\;\pcode{then}\; cs_1\;\pcode{else}\; cs_2, E)$ & $\dn$\\ 
+$\textit{compile}(\pcode{if}\;b\;\pcode{then}\; cs_1\;\pcode{else}\; cs_2, E)$ & $\dn$\\
 \multicolumn{3}{l}{$\qquad L_\textit{ifelse}\;$ (fresh label)}\\
 \multicolumn{3}{l}{$\qquad L_\textit{ifend}\;$ (fresh label)}\\
 \multicolumn{3}{l}{$\qquad (is_1, E') = \textit{compile}(cs_1, E)$}\\
@@ -468,18 +470,18 @@
 for the jump addresses (just before the else-branch and just
 after). In the next two lines we generate the instructions for
 the two branches, $is_1$ and $is_2$. The final code will
-be first the code for $b$ (including the label 
+be first the code for $b$ (including the label
 just-before-the-else-branch), then the \pcode{goto} for after
-the else-branch, the label $L_\textit{ifelse}$, followed by
-the instructions for the else-branch, followed by the 
-after-the-else-branch label. Consider for example the 
+the else-branch, the label $L_\textit{--ifelse}$, followed by
+the instructions for the else-branch, followed by the
+after-the-else-branch label. Consider for example the
 if-statement:
 
 \begin{lstlisting}[mathescape,numbers=none,language=While]
 if 1 == 1 then x := 2 else y := 3
 \end{lstlisting}
 
-\noindent 
+\noindent
 The generated code is as follows:
 
 \begin{lstlisting}[language=JVMIS,mathescape,numbers=left]
@@ -496,16 +498,16 @@
 \end{lstlisting}
 
 \begin{tikzpicture}[remember picture,overlay]
-  \draw[->,very thick] (A) edge [->,to path={-- ++(10mm,0mm) 
+  \draw[->,very thick] (A) edge [->,to path={-- ++(10mm,0mm)
            -- ++(0mm,-17.3mm) |- (\tikztotarget)},line width=1mm] (B.east);
-  \draw[->,very thick] (C) edge [->,to path={-- ++(10mm,0mm) 
+  \draw[->,very thick] (C) edge [->,to path={-- ++(10mm,0mm)
            -- ++(0mm,-17.3mm) |- (\tikztotarget)},line width=1mm] (D.east);
 \end{tikzpicture}
 
 \noindent The first three lines correspond to the the boolean
 expression \texttt{1 == 1}. The jump for when this boolean expression
 is false is in Line~3. Lines 4-6 corresponds to the if-branch;
-the else-branch is in Lines 8 and 9. 
+the else-branch is in Lines 8 and 9.
 
 Note carefully how the environment $E$ is threaded through the recursive
 calls of \textit{compile}. The function receives an environment $E$, but
@@ -515,9 +517,9 @@
 for the second call to \textit{compile}. $E''$ is also the environment
 that needs to be returned as part of the answer.
 
-The compilation of the while-loops, say 
-\pcode{while} $b$ \pcode{do} $cs$, is very similar. In case
-the condition is true and we need to do another iteration, 
+The compilation of the while-loops of the form
+\pcode{while} $b$ \pcode{do} $cs$ is very similar. In case
+the condition is true and we need to do another iteration,
 and the control-flow needs to be as follows
 
 \begin{center}
@@ -568,19 +570,19 @@
 
 \noindent Again we can use the \textit{compile}-function for
 boolean expressions to insert the appropriate jump to the
-end of the loop (label $L_{wend}$ below).
+end of the loop (label $L_\textit{--wend}$ below).
 
 \begin{center}
 \begin{tabular}{lcl}
-$\textit{compile}(\pcode{while}\; b\; \pcode{do} \;cs, E)$ & $\dn$\\ 
-\multicolumn{3}{l}{$\qquad L_{wbegin}\;$ (fresh label)}\\
-\multicolumn{3}{l}{$\qquad L_{wend}\;$ (fresh label)}\\
+$\textit{compile}(\pcode{while}\; b\; \pcode{do} \;cs, E)$ & $\dn$\\
+\multicolumn{3}{l}{$\qquad L_\textit{--wbegin}\;$ (fresh label)}\\
+\multicolumn{3}{l}{$\qquad L_\textit{--wend}\;$ (fresh label)}\\
 \multicolumn{3}{l}{$\qquad (is, E') = \textit{compile}(cs_1, E)$}\\
-\multicolumn{3}{l}{$\qquad(L_{wbegin}:$}\\
-\multicolumn{3}{l}{$\qquad\phantom{(}@\;\textit{compile}(b, E, L_{wend})$}\\
+\multicolumn{3}{l}{$\qquad(L_\textit{--wbegin}$}\\
+\multicolumn{3}{l}{$\qquad\phantom{(}@\;\textit{compile}(b, E, L_\textit{--wend})$}\\
 \multicolumn{3}{l}{$\qquad\phantom{(}@\;is$}\\
-\multicolumn{3}{l}{$\qquad\phantom{(}@\; \text{goto}\;L_{wbegin}$}\\
-\multicolumn{3}{l}{$\qquad\phantom{(}@\;L_{wend}:, E')$}\\
+\multicolumn{3}{l}{$\qquad\phantom{(}@\; \text{goto}\;L_\textit{--wbegin}$}\\
+\multicolumn{3}{l}{$\qquad\phantom{(}@\;L_\textit{--wend}, E')$}\\
 \end{tabular}
 \end{center}
 
@@ -605,16 +607,16 @@
    goto L_wbegin $\quad\tikz[remember picture] \node (LA) {\mbox{}};$
 L_wend: $\quad\tikz[remember picture] \node[] (LD) {\mbox{}};$
 \end{lstlisting}
- 
+
 \begin{tikzpicture}[remember picture,overlay]
-  \draw[->,very thick] (LA) edge [->,to path={-- ++(10mm,0mm) 
+  \draw[->,very thick] (LA) edge [->,to path={-- ++(10mm,0mm)
            -- ++(0mm,17.3mm) |- (\tikztotarget)},line width=1mm] (LB.east);
-  \draw[->,very thick] (LC) edge [->,to path={-- ++(10mm,0mm) 
+  \draw[->,very thick] (LC) edge [->,to path={-- ++(10mm,0mm)
            -- ++(0mm,-17.3mm) |- (\tikztotarget)},line width=1mm] (LD.east);
 \end{tikzpicture}
 
 \noindent
-As said, I leave it to you to decide whether the code implements
+As said, I leave it to you to check that the code really implements
 the usual controlflow of while-loops.
 
 Next we need to consider the WHILE-statement \pcode{write x}, which can
@@ -626,13 +628,13 @@
 follows.
 
 \begin{lstlisting}[language=JVMIS,numbers=left,basicstyle=\ttfamily\small]
-.method public static write(I)V 
-    .limit locals 1 
-    .limit stack 2 
-    getstatic java/lang/System/out Ljava/io/PrintStream; 
+.method public static write(I)V
+    .limit locals 1
+    .limit stack 2
+    getstatic java/lang/System/out Ljava/io/PrintStream;
     iload 0
-    invokevirtual java/io/PrintStream/println(I)V 
-    return 
+    invokevirtual java/io/PrintStream/println(I)V
+    return
 .end method
 \end{lstlisting}
 
@@ -662,7 +664,7 @@
 \pcode{write} can be invoked with the two instructions
 
 \begin{lstlisting}[mathescape,language=JVMIS]
-iload $E(x)$ 
+iload $E(x)$
 invokestatic XXX/XXX/write(I)V
 \end{lstlisting}
 
@@ -727,8 +729,8 @@
 \begin{framed}
 \lstinputlisting[language=JVMIS,mathescape,basicstyle=\ttfamily\small]{../progs/test-small.j}
 \begin{tikzpicture}[remember picture,overlay]
-  \draw[|<->|,very thick] (LA.north) -- (LB.south) 
-     node[left=-0.5mm,midway] {\footnotesize\texttt{x\,:=\,1\,+\,2}}; 
+  \draw[|<->|,very thick] (LA.north) -- (LB.south)
+     node[left=-0.5mm,midway] {\footnotesize\texttt{x\,:=\,1\,+\,2}};
   \draw[|<->|,very thick] (LC.north) -- (LD.south)
      node[left=-0.5mm,midway] {\footnotesize\texttt{write x}};
 \end{tikzpicture}
@@ -742,7 +744,7 @@
 
 Maybe a useful addition to the WHILE-language would be arrays. This
 would allow us to generate more interesting WHILE-programs by
-translating BF*** programs into equivalent WHILE-code. Therefore in this
+translating BF*** programs into equivalent WHILE-code. Yeah! Therefore in this
 section let us have a look at how we can support the following three
 constructions
 
@@ -750,7 +752,7 @@
 \item \lstinline|new(arr[15000])|
 \item \lstinline|x := 3 + arr[3 + y]|
 \item \lstinline|arr[42 * n] := ...|
-\end{itemize}  
+\end{itemize}
 
 \noindent
 The first construct is for creating new arrays. In this instance the
@@ -760,14 +762,14 @@
 inside an arithmetic expression---we need to be able to look up the
 contents of an array at an index determined by an arithmetic expression.
 Similarly in the line below, we need to be able to update the content of
-an array at a calculated index.  
+an array at a calculated index.
 
 For creating a new array we can generate the following three JVM
 instructions:
 
 \begin{lstlisting}[mathescape,language=JVMIS]
-ldc number 
-newarray int 
+ldc number
+newarray int
 astore loc_var
 \end{lstlisting}
 
@@ -781,8 +783,8 @@
 array we can use the following JVM code
 
 \begin{lstlisting}[mathescape,language=JVMIS]
-aload loc_var 
-$\textit{index\_aexp}$ 
+aload loc_var
+$\textit{index\_aexp}$
 iaload
 \end{lstlisting}
 
@@ -796,9 +798,9 @@
 at an index with a value is as follows.
 
 \begin{lstlisting}[mathescape,language=JVMIS]
-aload loc_var 
-$\textit{index\_aexp}$ 
-$\textit{value\_aexp}$ 
+aload loc_var
+$\textit{index\_aexp}$
+$\textit{value\_aexp}$
 iastore
 \end{lstlisting}
 
@@ -833,7 +835,7 @@
 
 \begin{plstx}[rhs style=, margin=2cm, one per line]
 : \meta{Stmt} ::=  \ldots
-              | \texttt{new}(\meta{Id}\,[\,\meta{Num}\,]) 
+              | \texttt{new}(\meta{Id}\,[\,\meta{Num}\,])
               | \meta{Id}\,[\,\meta{E}\,]\,:=\,\meta{E}\\
 \end{plstx}
 
@@ -858,7 +860,7 @@
 }
 \end{lstlisting}
 
-\noindent 
+\noindent
 The idea behind the translation is that BF-programs operate on an array,
 called here \texttt{mem}. The BF-memory pointer into this array is
 represented as the variable \texttt{ptr}. As usual the BF-instructions
@@ -871,7 +873,7 @@
 generate a \code{skip} instruction just before finishing with the
 closing \code{"\}"}. The reason is that we are rather pedantic about
 semicolons in our WHILE-grammar: the last command cannot have a
-semicolon---adding a \code{skip} works around this snag. 
+semicolon---adding a \code{skip} works around this snag.
 
 Putting this all together and we can generate WHILE-programs with more
 than 15K JVM-instructions; run the compiled JVM code for such
@@ -884,8 +886,8 @@
 around 32K---not too bad). The generation of the picture completes
 within 20 or so seconds. Try replicating this with an interpreter! The
 good point is that we now have a sufficiently complicated program in our
-WHILE-language in order to do some benchmarking. Which means we now face
-the question about what to do next\ldots
+WHILE-language in order to do some benchmarking (your task!). Having done
+this, we now face the question about what to do next in our compiler\ldots?
 
 \subsection*{Optimisations \& Co}
 
@@ -907,7 +909,7 @@
 (which many integers will be when compiling a BF-program). To counter
 this ``waste'', the JVM has specific instructions for small integers,
 for example
- 
+
 \begin{itemize}
 \item \instr{iconst_0},\ldots, \instr{iconst_5}
 \item \instr{bipush n}
@@ -933,13 +935,13 @@
 following Scala helper-function
 
 \begin{lstlisting}[language=Scala]
-def compile_num(i: Int) = 
-  if (0 <= i && i <= 5) i"iconst_$i" else 
-  if (-128 <= i && i <= 127) i"bipush $i" 
+def compile_num(i: Int) =
+  if (0 <= i && i <= 5) i"iconst_$i" else
+  if (-128 <= i && i <= 127) i"bipush $i"
   else i"ldc $i"
 \end{lstlisting}
 
-\noindent 
+\noindent
 It generates the more efficient instructions when pushing a small integer
 constant onto the stack. The default is \instr{ldc} for any other constants.
 
@@ -970,10 +972,10 @@
 \end{tabular}
 \end{center}
 
-\noindent 
+\noindent
 Quite good! Such optimisations are called \emph{peephole optimisations},
 because they involve changing one or a small set of instructions into an
-equivalent set that has better performance. 
+equivalent set that has better performance.
 
 If you look careful at our generated code you will quickly find another
 source of inefficiency in programs like
@@ -989,7 +991,7 @@
 loads the local variable back onto the stack for writing out a number.
 
 \begin{lstlisting}[mathescape,language=JVMIS]
-... 
+...
 istore 0
 iload 0
 ...
@@ -1019,7 +1021,7 @@
 arr[14] := 3 + arr[13]
 \end{lstlisting}
 
-\noindent 
+\noindent
 Admittedly this is a contrived program, and probably not meant to be
 like this by any sane programmer, but it is supposed to make the
 following point: The program generates an array of size 10, and then
@@ -1048,8 +1050,8 @@
 is to modify the code we generate.
 
 \begin{lstlisting}[mathescape,language=JVMIS2]
-  $\textit{index\_aexp}$ 
-  aload loc_var 
+  $\textit{index\_aexp}$
+  aload loc_var
   dup2
   arraylength
   if_icmple L1
@@ -1062,9 +1064,11 @@
 L2:
 \end{lstlisting}
 
+\textbf{TBD}
+
  \begin{lstlisting}[mathescape,language=JVMIS2]
-  $\textit{index\_aexp}$ 
-  aload loc_var 
+  $\textit{index\_aexp}$
+  aload loc_var
   dup2
   arraylength
   if_icmple L1
@@ -1084,55 +1088,55 @@
                                   fill=black!20,draw,text width=1.6cm,line width=0.5mm}]
 \node (A)  {};
 \node[stack,right = 80pt] (0) at (A.east) {$\textit{index}$\nodepart{two} \ldots\phantom{l}};
-\node[stack,right = 60pt] (1) at (0.east)  
+\node[stack,right = 60pt] (1) at (0.east)
    {array\nodepart{two}
     $\textit{index}$\nodepart{three} \ldots\phantom{l}};
-\node[stack,below = 40pt] (2) at (1.south)  
+\node[stack,below = 40pt] (2) at (1.south)
    {array\nodepart{two}
     $\textit{index}$ \nodepart{three}
     array \nodepart{four}
-    $\textit{index}$\nodepart{five} \ldots\phantom{l}}; 
-\node[stack,left = 90pt] (3) at (2.west)  
+    $\textit{index}$\nodepart{five} \ldots\phantom{l}};
+\node[stack,left = 90pt] (3) at (2.west)
    {array\_len\nodepart{two}
     $\textit{index}$ \nodepart{three}
     array \nodepart{four}
-    $\textit{index}$\nodepart{five} \ldots\phantom{l}};    
+    $\textit{index}$\nodepart{five} \ldots\phantom{l}};
 \node[stack,below right of = 3,node distance = 130pt,rectangle split parts = 3] (4b) at (3.south)
    {array\nodepart{two}
     $\textit{index}$\nodepart{three} \ldots\phantom{l}};
 \node[stack,below left of = 3,node distance = 130pt,rectangle split parts = 3] (4a) at (3.south)
    {array\nodepart{two}
-    $\textit{index}$\nodepart{three} \ldots\phantom{l}};  
+    $\textit{index}$\nodepart{three} \ldots\phantom{l}};
 \node[stack,below of = 4a,node distance = 70pt,rectangle split parts = 3] (5a) at (4a.south)
    {$\textit{index}$\nodepart{two}
-    array\nodepart{three} \ldots\phantom{l}};                
+    array\nodepart{three} \ldots\phantom{l}};
 \node[stack,below of = 5a,node distance = 60pt,rectangle split parts = 2] (6a) at (5a.south)
    {$\textit{array\_elem}$\nodepart{two} \ldots\phantom{l}};
 \node[stack,below of = 4b,node distance = 65pt,rectangle split parts = 2] (5b) at (4b.south)
-   {\ldots\phantom{l}};       
+   {\ldots\phantom{l}};
 \node[stack,below of = 5b,node distance = 60pt,rectangle split parts = 2] (6b) at (5b.south)
-   {0\nodepart{two} \ldots\phantom{l}}; 
+   {0\nodepart{two} \ldots\phantom{l}};
 
-\draw [|->,line width=2.5mm] (A) -- node [above,pos=0.45] {$\textit{index\_aexp}$} (0); 
+\draw [|->,line width=2.5mm] (A) -- node [above,pos=0.45] {$\textit{index\_aexp}$} (0);
 \draw [->,line width=2.5mm] (0) -- node [above,pos=0.35] {\instr{aload}} (1);
-\draw [->,line width=2.5mm] (1) -- node [right,pos=0.35] {\instr{dup2}} (2);  
+\draw [->,line width=2.5mm] (1) -- node [right,pos=0.35] {\instr{dup2}} (2);
 \draw [->,line width=2.5mm] (2) -- node [above,pos=0.40] {\instr{arraylength}} (3);
-\path[->,draw,line width=2.5mm] 
-  let \p1=(3.south), \p2=(4a.north) in (3.south) -- +(0,0.5*\y2-0.5*\y1) node [right,pos=0.50] {\instr{if_icmple}} -| (4a.north);  
-\path[->,draw,line width=2.5mm] 
+\path[->,draw,line width=2.5mm]
+  let \p1=(3.south), \p2=(4a.north) in (3.south) -- +(0,0.5*\y2-0.5*\y1) node [right,pos=0.50] {\instr{if_icmple}} -| (4a.north);
+\path[->,draw,line width=2.5mm]
   let \p1=(3.south), \p2=(4b.north) in (3.south) -- +(0,0.5*\y2-0.5*\y1) -| (4b.north);
 \draw [->,line width=2.5mm] (4a) -- node [right,pos=0.35] {\instr{swap}} (5a);
-\draw [->,line width=2.5mm] (4b) -- node [right,pos=0.35] {\instr{pop2}} (5b);  
+\draw [->,line width=2.5mm] (4b) -- node [right,pos=0.35] {\instr{pop2}} (5b);
 \draw [->,line width=2.5mm] (5a) -- node [right,pos=0.35] {\instr{iaload}} (6a);
 \draw [->,line width=2.5mm] (5b) -- node [right,pos=0.35] {\instr{iconst_0}} (6b);
-\end{tikzpicture}                      
+\end{tikzpicture}
 \end{center}
 \end{figure}
 
 goto\_w problem solved for too large jumps
 \end{document}
 
-%%% Local Variables: 
-%%% mode: latex  
+%%% Local Variables:
+%%% mode: latex
 %%% TeX-master: t
-%%% End: 
+%%% End:

--- a/progs/lexer/lexer.sc	Wed Feb 21 09:14:12 2024 +0000
+++ b/progs/lexer/lexer.sc	Wed May 29 13:25:30 2024 +0100
@@ -59,7 +59,7 @@
   case ONE => true
   case CHAR(_) => false
   case ALT(r1, r2) => nullable(r1) || nullable(r2)
-  case SEQ(r1, r2) => nullable(r1) $$ nullable(r2)
+  case SEQ(r1, r2) => nullable(r1) && nullable(r2)
   case STAR(_) => true
   case RECD(_, r1) => nullable(r1)
 }

--- a/progs/parser-combinators/comb1.sc	Wed Feb 21 09:14:12 2024 +0000
+++ b/progs/parser-combinators/comb1.sc	Wed May 29 13:25:30 2024 +0100
@@ -5,18 +5,18 @@
 //
 //  amm comb1.sc
 
- 
+
 //  Note, in the lectures I did not show the type bound
-//  using is: I => Seq[_], which means that the input 
-//  type 'I' needs to be a sequence. 
+//  using is: I => Seq[_], which means that the input
+//  type 'I' needs to be a sequence.
 
-type IsSeq[I] = I => Seq[_]
+type IsSeq[I] = I => Seq[?]
 
-abstract class Parser[I, T](using is: IsSeq[I])  {
-  def parse(in: I): Set[(T, I)]  
+abstract class Parser[I: IsSeq, T](using is: IsSeq[I])  {
+  def parse(in: I): Set[(T, I)]
 
   def parse_all(in: I) : Set[T] =
-    for ((hd, tl) <- parse(in); 
+    for ((hd, tl) <- parse(in);
         if is(tl).isEmpty) yield hd
 }
 
@@ -25,21 +25,21 @@
 
 
 // alternative parser
-class AltParser[I : IsSeq, T](p: => Parser[I, T], 
+class AltParser[I : IsSeq, T](p: => Parser[I, T],
                               q: => Parser[I, T]) extends Parser[I, T] {
-  def parse(in: I) = p.parse(in) ++ q.parse(in)   
+  def parse(in: I) = p.parse(in) ++ q.parse(in)
 }
 
 // sequence parser
-class SeqParser[I: IsSeq, T, S](p: => Parser[I, T], 
+class SeqParser[I: IsSeq, T, S](p: => Parser[I, T],
                                 q: => Parser[I, S]) extends Parser[I, (T, S)] {
-  def parse(in: I) = 
-    for ((hd1, tl1) <- p.parse(in); 
+  def parse(in: I) =
+    for ((hd1, tl1) <- p.parse(in);
          (hd2, tl2) <- q.parse(tl1)) yield ((hd1, hd2), tl2)
 }
 
 // map parser
-class MapParser[I : IsSeq, T, S](p: => Parser[I, T], 
+class MapParser[I : IsSeq, T, S](p: => Parser[I, T],
                                  f: T => S) extends Parser[I, S] {
   def parse(in: I) = for ((hd, tl) <- p.parse(in)) yield (f(hd), tl)
 }
@@ -48,7 +48,7 @@
 
 // an example of an atomic parser for characters
 case class CharParser(c: Char) extends Parser[String, Char] {
-  def parse(in: String) = 
+  def parse(in: String) =
     if (in != "" && in.head == c) Set((c, in.tail)) else Set()
 }
 
@@ -64,15 +64,15 @@
 case class RegexParser(reg: Regex) extends Parser[String, String] {
   def parse(in: String) = reg.findPrefixMatchOf(in) match {
     case None => Set()
-    case Some(m) => Set((m.matched, m.after.toString))  
+    case Some(m) => Set((m.matched, m.after.toString))
   }
 }
 
-// atomic parsers for numbers and "verbatim" strings 
+// atomic parsers for numbers and "verbatim" strings
 val NumParser = RegexParser("[0-9]+".r)
 def StrParser(s: String) = RegexParser(Regex.quote(s).r)
 
-NumParser.parse("123a123bc") 
+NumParser.parse("123a123bc")
 StrParser("else").parse("elsethen")
 
 
@@ -82,16 +82,16 @@
 val NumParserInt = MapParser(NumParser, (s: String) => s.toInt)
 NumParserInt.parse("123abc")
 
-// the following string interpolation allows us to write 
-// StrParser(_some_string_) more conveniently as 
+// the following string interpolation allows us to write
+// StrParser(_some_string_) more conveniently as
 //
-// p"<_some_string_>" 
+// p"<_some_string_>"
 
-extension (sc: StringContext) 
-  def p(args: Any*) = StrParser(sc.s(args:_*))
+extension (sc: StringContext)
+  def p(args: Any*) = StrParser(sc.s(args*))
 
 
-(p"else").parse("elsethen")           
+(p"else").parse("elsethen")
 
 // more convenient syntax for parser combinators
 extension [I: IsSeq, T](p: Parser[I, T]) {
@@ -100,11 +100,11 @@
   def map[S](f: => T => S) = new MapParser[I, T, S](p, f)
 }
 
-// simple example of transforming the 
+// simple example of transforming the
 // result into capital letters
 def toU(s: String) = s.map(_.toUpper)
 
-(p"else").map(toU(_)).parse("elseifthen")  
+(p"else").map(toU(_)).parse("elseifthen")
 
 // these implicits allow us to use an infix notation for
 // sequences and alternatives; we also can write the usual
@@ -121,10 +121,10 @@
 // A parser for palindromes (just returns them as string)
 //  since the parser is recursive it needs to be lazy
 lazy val Pal : Parser[String, String] = {
-   (p"a" ~ Pal ~ p"a").map{ case ((x, y), z) => s"$x$y$z" } || 
-   (p"b" ~ Pal ~ p"b").map{ case ((x, y), z) => s"$x$y$z" } || 
+   (p"a" ~ Pal ~ p"a").map{ case ((x, y), z) => s"$x$y$z" } ||
+   (p"b" ~ Pal ~ p"b").map{ case ((x, y), z) => s"$x$y$z" } ||
     p"a" || p"b" || p""
-}  
+}
 
 // examples
 Pal.parse_all("abacaba")
@@ -132,7 +132,7 @@
 
 println("Palindrome: " + Pal.parse_all("abaaaba"))
 
-// A parser for wellnested parentheses 
+// A parser for wellnested parentheses
 //
 //   P ::= ( P ) P | epsilon
 //
@@ -140,7 +140,7 @@
 lazy val P : Parser[String, String] = {
   (p"(" ~ P ~ p")" ~ P).map{ case (((_, x), _), y) => "{" + x + "}" + y } ||
   p""
-}  
+}
 
 println(P.parse_all("(((()()))())"))
 println(P.parse_all("(((()()))()))"))
@@ -172,8 +172,8 @@
 // with parser combinators (and other parsing algorithms)
 // no left-recursion is allowed, otherwise the will loop
 
-lazy val EL: Parser[String, Int] = 
-  ((EL ~ p"+" ~ EL).map{ case ((x, y), z) => x + z} || 
+lazy val EL: Parser[String, Int] =
+  ((EL ~ p"+" ~ EL).map{ case ((x, y), z) => x + z} ||
    (EL ~ p"*" ~ EL).map{ case ((x, y), z) => x * z} ||
    (p"(" ~ EL ~ p")").map{ case ((x, y), z) => y} ||
    NumParserInt)
@@ -234,7 +234,7 @@
 
 
 
-// a problem with the arithmetic expression parser: it 
+// a problem with the arithmetic expression parser: it
 // gets very slow with deeply nested parentheses
 
 println("A runtime problem")

--- a/slides/ssy.tex	Wed Feb 21 09:14:12 2024 +0000
+++ b/slides/ssy.tex	Wed May 29 13:25:30 2024 +0100
@@ -13,8 +13,8 @@
 \newtcolorbox{mybox3}[1]{colback=Cyan!5!white,colframe=Cyan!75!black,fonttitle=\bfseries,title=#1}
 
 
-
-\hfuzz=220pt 
+\setbeamersize{text margin left=2mm} % <- like this
+\hfuzz=220pt
 
 \lstset{language=Scala,
         style=mystyle,
@@ -22,20 +22,74 @@
         numbers=none,
         xleftmargin=0mm}
 
-\pgfplotsset{compat=1.17}      
+\pgfplotsset{compat=1.17}
 
-\newcommand{\bl}[1]{\textcolor{blue}{#1}}     
- 
-% beamer stuff 
-\renewcommand{\slidecaption}{SSY, King's College London}
+\newcommand{\bl}[1]{\textcolor{blue}{#1}}
+
+% beamer stuff
+\renewcommand{\slidecaption}{Christian Urban, SSY, King's College London}
 
 
 
 \begin{document}
 
+%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
+\begin{frame}[t]
+\frametitle{\begin{tabular}{@ {\hspace{8mm}}c@ {}}Fast Regular Expression Matching\end{tabular}}
+\mbox{}\\[1mm]
+
+\begin{columns}[t,onlytextwidth]
+\begin{column}{.2\textwidth}
+\raisebox{-10mm}{
+\begin{tikzpicture}
+  \begin{axis}[
+    xlabel={size of strings},
+    x label style={at={(0.45,-0.16)}},
+    ylabel={time in secs},
+    enlargelimits=false,
+    xtick={0,10,...,30},
+    xmax=35,
+    ymax=35,
+    ytick={0,10,...,30},
+    scaled ticks=false,
+    axis lines=left,
+    width=5cm,
+    height=4.5cm,
+    legend entries={Python,JavaScript,Swift,Dart},
+    legend style={font=\footnotesize,at={(0.45,-0.48)},anchor=north},
+    legend cell align=left]
+\addplot[blue,mark=*, mark options={fill=white}] table {re-python2.data};
+\addplot[red,mark=*, mark options={fill=white}] table {re-js.data};
+\addplot[magenta,mark=*, mark options={fill=white}] table {re-swift.data};
+\addplot[brown,mark=*, mark options={fill=white}] table {re-dart.data};
+\end{axis}
+\end{tikzpicture}}
+\end{column}
+
+\end{columns}
+
+\begin{textblock}{4}(3.5,3.2)
+{\bl{\texttt{(a*)*$\cdot$b}}}
+\end{textblock}
+
+\begin{textblock}{7.5}(8,3)
+\begin{itemize}
+\item use \textbf{Brzozowski derivatives} for regex matching rather\\ than NFAs/DFAs% and lexing
+\item based on work by \textbf{Christian Urban} and \textbf{Roy Dyckhoff}\bigskip
+\item applications in network security (traffic filtering)
+\item formal verification of correctness and speed (\textbf{Isabelle} theorem prover)
+\end{itemize}
+\end{textblock}
+
+\end{frame}
+%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
+
+
+\end{document}
+
 %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
 \begin{frame}[t]
-\frametitle{%  
+\frametitle{%
   \begin{tabular}{@ {}c@ {}}
     \\[8mm]
     \LARGE Derivative-Based\\
@@ -46,7 +100,7 @@
 \end{frame}
 %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
 
-%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%      
+%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
 \begin{frame}[c]
 \frametitle{A Bit of Background}
 
@@ -57,7 +111,7 @@
 \begin{textblock}{11}(1.7,10)
 \begin{bubble}[9.6cm]
   PhD thesis on a particular term-rewriting system:\medskip\\
-  Proved that all rewriting sequences are strongly normalising (terminate). 
+  Proved that all rewriting sequences are strongly normalising (terminate).
 \end{bubble}
 \end{textblock}
 
@@ -69,8 +123,8 @@
          & \bl{|} & \bl{$\lambda x.\,t$}\\
          & \bl{|} & \bl{$t_1\;t_2$}\\
          & \bl{|} & \bl{...}
-\end{tabular}  
-\end{bubble}  
+\end{tabular}
+\end{bubble}
 \end{textblock}}
 
 \end{frame}
@@ -78,14 +132,14 @@
 
 
 
-%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%  
+%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
 
-%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%      
+%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
 \begin{frame}[t]
 \frametitle{Quiz 1}
 \begin{bubble}[10cm]\it
   There are many, many regular expression libraries.\bigskip\\
-  
+
   \textbf{Given a regular expression \bl{r} and a string \bl{s}, what is the
   difficulty / complexity of the problem deciding whether \bl{r} matches \bl{s}?}
 \end{bubble}
@@ -102,9 +156,9 @@
 \end{textblock}
 
 \end{frame}
-%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%  
+%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
 
-%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%      
+%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
 \begin{frame}[t]
 \frametitle{Quiz 2}
 
@@ -114,13 +168,13 @@
   ($\underbrace{\bl{[a\mbox{-}z0\mbox{-}9\_\!\!\_\,.-]^+}}_{\textrm{name}}$)\bl{$\,@\,$}
   ($\underbrace{\bl{[a\mbox{-}z0\mbox{-}9\,-]^+}}_{\textrm{domain}}$) \bl{$\,.\,$}
   ($\underbrace{\bl{[a\mbox{-}z\,.]^{\{2..5\}}}}_{\textrm{top-level domain}}$)
-\end{center}  
+\end{center}
 \end{textblock}
 
 \only<2>{
 \begin{textblock}{10}(4,8)
 bounded regular expressions:\medskip\\
-\begin{tabular}{ll}    
+\begin{tabular}{ll}
   \bl{$r^{\{n\}}$}     & exactly n-times \bl{$r$}\\
   \bl{$r^{\{n..m\}}$}  & between n and m-times\\
   \bl{$r^{\{n..\}}$}   & from n-times\\
@@ -128,12 +182,12 @@
 \end{tabular}\smallskip\\
 \footnotesize\hfill ${}^*$ in some applications the counters can be in the millions
 \end{textblock}}
- 
+
 \end{frame}
-%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% 
+%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
 
 
-%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%      
+%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
 \begin{frame}[t]
 \frametitle{Quiz 2}
 
@@ -149,8 +203,8 @@
     >=stealth',very thick,
     every state/.style={minimum size=3pt,draw=blue!50,very thick,fill=blue!20}]
 \node[state, initial]  (Q_0)  {$\mbox{}$};
-\node[state, initial]  (Q_01) [below=1mm of Q_0] {$\mbox{}$};  
-\node[state, initial]  (Q_02) [above=1mm of Q_0] {$\mbox{}$};  
+\node[state, initial]  (Q_01) [below=1mm of Q_0] {$\mbox{}$};
+\node[state, initial]  (Q_02) [above=1mm of Q_0] {$\mbox{}$};
 \node (R_1)  [right=of Q_0] {$\ldots$};
 \node[state, accepting]  (T_1)  [right=of R_1] {$\mbox{}$};
 \node[state, accepting]  (T_2)  [above=of T_1] {$\mbox{}$};
@@ -181,8 +235,8 @@
     >=stealth',very thick,
     every state/.style={minimum size=3pt,draw=blue!50,very thick,fill=blue!20},]
 \node[state, initial]  (Q_0)  {$\mbox{}$};
-\node[state, initial]  (Q_01) [below=1mm of Q_0] {$\mbox{}$};  
-\node[state, initial]  (Q_02) [above=1mm of Q_0] {$\mbox{}$};  
+\node[state, initial]  (Q_01) [below=1mm of Q_0] {$\mbox{}$};
+\node[state, initial]  (Q_02) [above=1mm of Q_0] {$\mbox{}$};
 \node (r_1)  [right=of Q_0] {$\ldots$};
 \node[state]  (t_1)  [right=of r_1] {$\mbox{}$};
 \node[state]  (t_2)  [above=of t_1] {$\mbox{}$};
@@ -202,7 +256,7 @@
 \path[->] (t_1) edge (A_02);
 \path[->] (t_2) edge (A_02);
 \path[->] (t_3) edge node [below]  {$\varepsilon$\footnotesize{}s} (A_02);
- 
+
 \begin{pgfonlayer}{background}
   \node (3) [rounded corners, inner sep=1mm, thick,
     draw=black!60, fill=black!20, fit= (Q_0) (c_1) (c_2) (c_3)] {};
@@ -213,9 +267,9 @@
 
 
 \end{frame}
-%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%   
+%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
 
-%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%      
+%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
 \begin{frame}[t]
 \frametitle{Quiz 2}
 
@@ -225,14 +279,14 @@
 \end{textblock}
 
 \begin{textblock}{12}(4,6)
-\onslide<1>{  
+\onslide<1>{
 \begin{tikzpicture}[node distance=3mm,
     >=stealth',very thick,
     every state/.style={minimum size=3pt,draw=blue!50,very thick,fill=blue!20},
     baseline=(current bounding box.north)]
 \node (2)  {$\mbox{}$};
 \node[state, initial]  (4)  [above=1mm of 2] {$\mbox{}$};
-\node[state, initial]  (5)  [below=1mm of 2] {$\mbox{}$};  
+\node[state, initial]  (5)  [below=1mm of 2] {$\mbox{}$};
 \node (a)  [right=of 2] {$\ldots$};
 \node[state, accepting]  (a1)  [right=of a] {$\mbox{}$};
 \node[state, accepting]  (a2)  [above=of a1] {$\mbox{}$};
@@ -245,7 +299,7 @@
 \end{textblock}
 
 \begin{textblock}{12}(2,6)
-\onslide<2->{  
+\onslide<2->{
 \begin{tikzpicture}[node distance=3mm,
     >=stealth',very thick,
     every state/.style={minimum size=3pt,draw=blue!50,very thick,fill=blue!20},
@@ -253,7 +307,7 @@
 \node at (0,0) [state, initial,accepting]  (1)  {$\mbox{}$};
 \node (2)  [right=16mm of 1] {$\mbox{}$};
 \node[state]  (4)  [above=1mm of 2] {$\mbox{}$};
-\node[state]  (5)  [below=1mm of 2] {$\mbox{}$};  
+\node[state]  (5)  [below=1mm of 2] {$\mbox{}$};
 \node (a)  [right=of 2] {$\ldots$};
 \node[state]  (a1)  [right=of a] {$\mbox{}$};
 \node[state]  (a2)  [above=of a1] {$\mbox{}$};
@@ -271,30 +325,30 @@
 \end{textblock}
 
 \begin{textblock}{12}(2,12)
-\onslide<3->{ 
+\onslide<3->{
   \begin{bubble}[9.5cm]\it
-  Quiz:\smallskip\\  
+  Quiz:\smallskip\\
   \textbf{What is the Thompson Construction for \bl{$r^{\{n\}}$} ?}
 \end{bubble}}
 \end{textblock}
 
 \end{frame}
 
-%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%  
+%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
 
-%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%      
+%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
 \begin{frame}[t]
   \frametitle{Quiz 3}
 
-Hierarchy of Languages:  
+Hierarchy of Languages:
 
 \begin{center}
 \begin{tikzpicture}
-[rect/.style={draw=black!50, 
+[rect/.style={draw=black!50,
               top color=white,
-              bottom color=black!20, 
-              rectangle, 
-              very thick, 
+              bottom color=black!20,
+              rectangle,
+              very thick,
               rounded corners}, scale=1.2]
 
 \draw (0,0) node [rect, text depth=39mm, text width=68mm] {all languages};
@@ -303,36 +357,36 @@
 \draw (0,-1.14) node [rect, text depth=9mm, text width=50mm] {context-free languages};
 \draw (0,-1.4) node [rect] {regular languages};
 \end{tikzpicture}
-\end{center}  
+\end{center}
 
 \begin{textblock}{12}(2,11.5)
-\onslide<2->{ 
+\onslide<2->{
   \begin{bubble}[9.5cm]\it
     Quiz:\smallskip\\
-    
+
     \textbf{Can we use standard parsing algorithms for matching / lexing\,?}
     Say CYK, LL, LR(k), PEG, \ldots
 \end{bubble}}
 \end{textblock}
 
 \end{frame}
-%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% 
+%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
 
 \defverbatim{\foo}{\footnotesize%
 \begin{verbatim}
 (?:(?:\"|'|\]|\}|\\|\d|(?:nan|infinity|true|false|
 null|undefined|symbol|math)|\`|\-|\+)+[)]*;?((?:\s
-| -|~|!|{}|\|\||\+)*.*(?:.*=.*)))  
+| -|~|!|{}|\|\||\+)*.*(?:.*=.*)))
 \end{verbatim}
 }
 
-%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%      
+%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
 \begin{frame}[t,fragile]
 \frametitle{Quiz 4}
 
 \begin{textblock}{12}(2,2.5)
 \begin{bubble}[9.5cm]\it
-  Quiz:\smallskip\\    
+  Quiz:\smallskip\\
   Do regular expressions have any security relevance\,?
 \end{bubble}
 \end{textblock}
@@ -361,18 +415,18 @@
    \end{textblock}
 }
 
-    
+
 \end{frame}
 %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
 
-%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%      
+%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
 \begin{frame}[t]
   \frametitle{Quiz 3}
 
-\begin{textblock}{12}(2,2) 
+\begin{textblock}{12}(2,2)
   \begin{bubble}[9.5cm]\it
     Quiz:\smallskip\\
-    
+
     \textbf{Can we use standard parsing algorithms for matching / lexing\,?}
     Say CYK, LL, LR(k), PEG, \ldots
 \end{bubble}
@@ -390,11 +444,11 @@
 
   What should be & \\
   the result for:  & \bl{\texttt{"iffoo"}} \;\;\; \bl{\texttt{"if"}}\\
-\end{tabular}  
+\end{tabular}
 \end{textblock}
 
 \only<2->{
-\begin{textblock}{13.5}(1.5,4) 
+\begin{textblock}{13.5}(1.5,4)
   \begin{mybox3}{POSIX rules}
     \begin{itemize}
     \item \textbf{The Longest Match Rule:} The longest initial
@@ -402,23 +456,23 @@
     \item \textbf{Priority Rule:}  For a particular longest initial
       substring, the first (leftmost) regular expression that can match
       determines the token.
-    \item \ldots  
+    \item \ldots
     \end{itemize}
     \begin{center}
     \bl{$(a + ab) \cdot (bc + c)$} \;\;and\;\; \bl{\texttt{"}$abc$\texttt{"}}
-    \end{center}  
+    \end{center}
   \end{mybox3}
 \end{textblock}}
 
 \end{frame}
 %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
 
-%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%      
+%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
 \begin{frame}[t]
 \frametitle{Quiz 1}
 \begin{bubble}[10cm]\it
   There are many, many regular expression libraries.\bigskip\\
-  
+
   \textbf{Given a regular expression \bl{r} and a string \bl{s}, what is the
   difficulty / complexity of the problem deciding whether \bl{r} matches \bl{s}?}
 \end{bubble}
@@ -431,16 +485,16 @@
 \end{textblock}
 
 \end{frame}
-%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%  
+%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
 
-%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%      
+%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
 \begin{frame}[t]
 
-\mbox{}\\[10mm]  
+\mbox{}\\[10mm]
 
-\begin{columns}[t,onlytextwidth]  
+\begin{columns}[t,onlytextwidth]
 \begin{column}{.2\textwidth}
-\raisebox{-10mm}{    
+\raisebox{-10mm}{
 \begin{tikzpicture}
   \begin{axis}[
     xlabel={\bl{$n$} \pcode{a}s},
@@ -454,7 +508,7 @@
     scaled ticks=false,
     axis lines=left,
     width=5cm,
-    height=5cm, 
+    height=5cm,
     legend entries={Python,JavaScript,Swift,Dart},
     legend style={font=\small,at={(0.5,-0.39)},anchor=north},
     legend cell align=left]
@@ -479,34 +533,34 @@
     scaled ticks=false,
     axis lines=left,
     width=5cm,
-    height=5cm, 
+    height=5cm,
     legend entries={Python,Ruby},
     legend style={font=\small,at={(0.5,-0.39)},anchor=north},
     legend cell align=left]
 \addplot[blue,mark=*, mark options={fill=white}] table {re-python.data};
-\addplot[brown,mark=pentagon*, mark options={fill=white}] table {re-ruby.data};  
+\addplot[brown,mark=pentagon*, mark options={fill=white}] table {re-ruby.data};
 \end{axis}
-\end{tikzpicture}  
-\end{column}  
+\end{tikzpicture}
+\end{column}
 \end{columns}
 
 \begin{textblock}{4}(9,1.7)
 \textbf{\bl{\texttt{(a?)\boldmath$^{\{n\}}\cdot$(a)\boldmath$^{\{n\}}$}}}
 \end{textblock}
 
-\begin{textblock}{4}(4,1.7)  
+\begin{textblock}{4}(4,1.7)
 \textbf{\bl{\texttt{(a*)*$\cdot$b}}}
 \end{textblock}
 
 \begin{textblock}{3.4}(0.3,10)
 \small{}\textbf{matching with strings}
-\textbf{\bl{$\underbrace{\texttt{a}...\texttt{a}}_n$}}  
+\textbf{\bl{$\underbrace{\texttt{a}...\texttt{a}}_n$}}
 \end{textblock}
 
 \end{frame}
 %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
 
-%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%      
+%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
 \begin{frame}[t]
 \frametitle{Quiz 1}
 \begin{bubble}[10cm]\it
@@ -519,25 +573,25 @@
 answer is:
 \only<2->{\begin{center}
 \fontspec{Hoefler Text Black}\textcolor{ProcessBlue}{\huge{}NP}
-\end{center}}  
+\end{center}}
 \end{textblock}
 
 \begin{textblock}{12}(1.7,12)
 \only<3->{Backreferences: \bl{(\ldots)$\,\ldots{}\backslash{}n$}}
-\end{textblock}  
+\end{textblock}
 \end{frame}
-%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%  
+%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
 
 
 
-%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%  
+%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
 \begin{frame}
 \frametitle{Quiz 2}
 
 \begin{textblock}{12}(2,2)
-\onslide<1->{ 
+\onslide<1->{
   \begin{bubble}[9.5cm]\it
-  Quiz:\smallskip\\  
+  Quiz:\smallskip\\
   \textbf{What is the Thompson Construction for \bl{$r^{\{n\}}$} ?}
 \end{bubble}}
 \end{textblock}
@@ -552,8 +606,8 @@
 \end{textblock}}
 
 \only<4>{
-\begin{textblock}{13.5}(1.5,4) 
-  \begin{mybox3}{From Rust's Regex Description}\it 
+\begin{textblock}{13.5}(1.5,4)
+  \begin{mybox3}{From Rust's Regex Description}\it
     ``\ldots [the] syntax is similar to Perl-style regular expressions, but lacks a few features like look
     around and backreferences. In exchange, all searches execute in linear time with respect
     to the size of the regular expression and search text. \ldots''
@@ -561,7 +615,7 @@
 \end{textblock}}
 
 
-\end{frame}  
+\end{frame}
 
 
 %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
@@ -578,16 +632,16 @@
          & \bl{$\mid$} & \bl{$r^*$}            & star (zero or more)\\
   \end{tabular}
   \end{textblock}
-  
+
 \end{frame}
-%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%   
+%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
 
 %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
 \begin{frame}[c]
 \frametitle{\begin{tabular}{l}The Derivative of a Rexp\end{tabular}}
 
 \large
-If \bl{$r$} matches the string \bl{$c\!::\!s$}, what is a regular 
+If \bl{$r$} matches the string \bl{$c\!::\!s$}, what is a regular
 expression that matches just \bl{$s$}?\bigskip\bigskip\bigskip\bigskip
 
 \small
@@ -607,14 +661,14 @@
   \bl{$\der\, c\, (d)$}                     & \bl{$\dn$} & \bl{if $c = d$ then $\ONE$ else $\ZERO$} & \\
   \bl{$\der\, c\, (r_1 + r_2)$}        & \bl{$\dn$} & \bl{$\der\, c\, r_1 + \der\, c\, r_2$} & \\
   \bl{$\der\, c\, (r_1 \cdot r_2)$}  & \bl{$\dn$}  & \bl{if $nullable (r_1)$}\\
-  & & \bl{then $(\der\,c\,r_1) \cdot r_2 + \der\, c\, r_2$}\\ 
+  & & \bl{then $(\der\,c\,r_1) \cdot r_2 + \der\, c\, r_2$}\\
   & & \bl{else $(\der\, c\, r_1) \cdot r_2$}\\
   \bl{$\der\, c\, (r^*)$}  & \bl{$\dn$} & \bl{$(\der\,c\,r) \cdot (r^*)$} &\medskip\bigskip\\
   \end{tabular}
 \end{center}
 
 \end{frame}
-%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%   
+%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
 
 %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
 \begin{frame}[t]
@@ -630,14 +684,14 @@
         & test whether \bl{$r_4$} can recognise\\
         & the empty string\medskip\\
 Output: & result of the test\\
-        & $\Rightarrow \bl{\textit{true}} \,\text{or}\, 
-                       \bl{\textit{false}}$\\        
+        & $\Rightarrow \bl{\textit{true}} \,\text{or}\,
+                       \bl{\textit{false}}$\\
 \end{tabular}
 \end{center}
 
 
 \end{frame}
-%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% 
+%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
 
 %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
 \begin{frame}[c]
@@ -650,14 +704,14 @@
 \bl{$nullable(\ZERO)$}    & \bl{$\dn$} & \bl{\textit{false}}\\
 \bl{$nullable(\ONE)$}       & \bl{$\dn$} & \bl{\textit{true}}\\
 \bl{$nullable (c)$}             & \bl{$\dn$} & \bl{\textit{false}}\\
-\bl{$nullable (r_1 + r_2)$}     & \bl{$\dn$} & \bl{$nullable(r_1) \vee nullable(r_2)$} \\ 
+\bl{$nullable (r_1 + r_2)$}     & \bl{$\dn$} & \bl{$nullable(r_1) \vee nullable(r_2)$} \\
 \bl{$nullable (r_1 \cdot r_2)$} & \bl{$\dn$} & \bl{$nullable(r_1) \wedge nullable(r_2)$} \\
 \bl{$nullable (r^*)$}           & \bl{$\dn$} & \bl{\textit{true}}\\
 \end{tabular}
 \end{center}
 
 \end{frame}
-%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%   
+%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
 
 %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
 \begin{frame}[t]
@@ -679,14 +733,14 @@
 also works for lookarounds, various anchors, etc, but \textbf{not} for backreferences
 
 \begin{textblock}{3}(11,13)
-\onslide<2>{ 
+\onslide<2>{
   \begin{bubble}[3cm]
   \bl{$a^{\{0\}\{4294967295\}}$}
 \end{bubble}}
 \end{textblock}
 
 \end{frame}
-%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%   
+%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
 
 %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
 \begin{frame}[t]
@@ -705,38 +759,37 @@
 \end{tabular}
 \end{center}}
 
-\begin{textblock}{13.5}(1.5,12) 
+\begin{textblock}{13.5}(1.5,12)
 (regular expressions of sizes 98, 169, 283, 468, 767, \ldots)
 \end{textblock}
 
 \end{frame}
-%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% 
+%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
 
 %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
 \begin{frame}[c]
 \frametitle{\mbox{Conclusion}}
 
 \begin{itemize}
-\item The beauty is that this only involves functional programs that can be conveniently reasoned about in theorem provers 
+\item The beauty is that this only involves functional programs that can be conveniently reasoned about in theorem provers
 \item POSIX lexing can be done via an extension by Sulzmann and Lu, 2014 (our current work)\medskip\pause
-\item How surprising that one can still do new work on regular expressions.  
-\end{itemize} 
+\item How surprising that one can still do new work on regular expressions.
+\end{itemize}
 
 \end{frame}
-%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%   
+%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
 
-%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%  
+%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
 
 \begin{frame}<1-10>
-\end{frame}  
+\end{frame}
 
 
 
-%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%   
+%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
 \end{document}
 
-%%% Local Variables:  
+%%% Local Variables:
 %%% mode: latex
 %%% TeX-master: t
-%%% End: 
-
+%%% End:

author	Christian Urban <christian.urban@kcl.ac.uk>
	Wed, 29 May 2024 13:25:30 +0100
changeset 960	c7009356ddd8
parent 959	64ec1884d860
child 961	c0600f8b6427

XX		file \| annotate \| diff \| comparison \| revisions
handouts/ho07.pdf		file \| annotate \| diff \| comparison \| revisions
handouts/ho07.tex		file \| annotate \| diff \| comparison \| revisions
progs/lexer/lexer.sc		file \| annotate \| diff \| comparison \| revisions
progs/parser-combinators/comb1.sc		file \| annotate \| diff \| comparison \| revisions
slides/ssy.tex		file \| annotate \| diff \| comparison \| revisions