--- a/XX Wed Feb 21 09:14:12 2024 +0000
+++ b/XX Wed May 29 13:25:30 2024 +0100
@@ -3,3 +3,10 @@
Pygments
+
+======================
+video about disassembly and the backdoor in xz
+
+https://www.youtube.com/watch?v=Q6ovtLdSbEA&t=4705s
+
+Deep Dive into XZ Utils Backdoor - Columbia Engineering, Advanced Systems Programming Guest Lecture
\ No newline at end of file
Binary file handouts/ho07.pdf has changed
--- a/handouts/ho07.tex Wed Feb 21 09:14:12 2024 +0000
+++ b/handouts/ho07.tex Wed May 29 13:25:30 2024 +0100
@@ -24,44 +24,44 @@
that is often pretty fast. This way of producing code has also the
advantage that the virtual machine takes care of things a compiler would
normally need to take care of (hairy things like explicit memory
-management).
+management).
As a first example in this module we will implement a compiler for the
very simple WHILE-language that we parsed in the last lecture. The
compiler will target the Java Virtual Machine (JVM), but not directly.
Pictorially the compiler will work as follows:
-
+
\begin{center}
\begin{tikzpicture}[scale=1,font=\bf,
node/.style={
rectangle,rounded corners=3mm,
- ultra thick,draw=black!50,minimum height=18mm,
+ ultra thick,draw=black!50,minimum height=18mm,
minimum width=20mm,
top color=white,bottom color=black!20}]
- \node (0) at (-3,0) {};
+ \node (0) at (-3,0) {};
\node (A) at (0,0) [node,text width=1.6cm,text centered] {our compiler};
\node (B) at (3.5,0) [node,text width=1.6cm,text centered] {Jasmin / Krakatau};
\node (C) at (7.5,0) [node] {JVM};
-
- \draw [->,line width=2.5mm] (0) -- node [above,pos=0.35] {*.while} (A);
- \draw [->,line width=2.5mm] (A) -- node [above,pos=0.35] {*.j} (B);
- \draw [->,line width=2.5mm] (B) -- node [above,pos=0.35] {*.class} (C);
+
+ \draw [->,line width=2.5mm] (0) -- node [above,pos=0.35] {*.while} (A);
+ \draw [->,line width=2.5mm] (A) -- node [above,pos=0.35] {*.j} (B);
+ \draw [->,line width=2.5mm] (B) -- node [above,pos=0.35] {*.class} (C);
\end{tikzpicture}
\end{center}
\noindent
The input will be WHILE-programs; the output will be assembly files
-(with the file extension .j). Assembly files essentially contain
+(with the file extension *.j). Assembly files essentially contain
human-readable low-level code, meaning they are not just bits and
bytes, but rather something you can read and understand---with a bit
of practice of course. An \emph{assembler} will then translate the
assembly files into unreadable class- or binary-files the JVM or CPU
-can run. Unfortunately, the Java ecosystem does not come with an
+can run, i.e.~bits and bytes. Unfortunately, the Java ecosystem does not come with an
assembler which would be handy for our compiler-endeavour (unlike
Microsoft's Common Language Infrastructure for the .Net platform which
has an assembler out-of-the-box). As a substitute we shall use the
-3rd-party programs Jasmin or Krakatau (Jasmin is the preferred
+3rd-party program Jasmin, or alternatively Krakatau (Jasmin is the preferred
option---a \texttt{jasmin.jar}-file is available on KEATS):
\begin{itemize}
@@ -75,7 +75,7 @@
readable by humans, as opposed to class-files which are pretty much just
(horrible) zeros and ones. Jasmin (respectively Krakatau) will then take
our assembly files as input and generate the corresponding class-files for
-us.
+us.
What is good about the JVM is that it is a stack-based virtual machine,
a fact which will make it easy to generate code for arithmetic
@@ -85,7 +85,7 @@
\begin{lstlisting}[language=JVMIS,numbers=none]
ldc 1
ldc 2
-iadd
+iadd
\end{lstlisting}
\noindent The first instruction loads the constant $1$ onto the stack,
@@ -134,14 +134,14 @@
generates the instructions
\begin{lstlisting}[language=JVMIS,numbers=none]
-ldc 1
-ldc 2
-ldc 3
-imul
-ldc 4
-ldc 3
-isub
-iadd
+ldc 1
+ldc 2
+ldc 3
+imul
+ldc 4
+ldc 3
+isub
+iadd
iadd
\end{lstlisting}
@@ -151,7 +151,7 @@
will be an important convention we always observe in our compiler. Note,
that a different bracketing of the expression, for example $(1 + (2 *
3)) + (4 - 3)$, produces a different abstract syntax tree and thus also
-a different list of instructions.
+a different list of instructions.
Generating code in this post-order-traversal fashion is rather easy to
implement: it can be done with the following recursive
@@ -163,11 +163,11 @@
$\textit{compile}(n)$ & $\dn$ & $\instr{ldc}\; n$\\
$\textit{compile}(a_1 + a_2)$ & $\dn$ &
$\textit{compile}(a_1) \;@\;\textit{compile}(a_2)\;@\; \instr{iadd}$\\
-$\textit{compile}(a_1 - a_2)$ & $\dn$ &
+$\textit{compile}(a_1 - a_2)$ & $\dn$ &
$\textit{compile}(a_1) \;@\; \textit{compile}(a_2)\;@\; \instr{isub}$\\
-$\textit{compile}(a_1 * a_2)$ & $\dn$ &
+$\textit{compile}(a_1 * a_2)$ & $\dn$ &
$\textit{compile}(a_1) \;@\; \textit{compile}(a_2)\;@\; \instr{imul}$\\
-$\textit{compile}(a_1 \backslash a_2)$ & $\dn$ &
+$\textit{compile}(a_1 \backslash a_2)$ & $\dn$ &
$\textit{compile}(a_1) \;@\; \textit{compile}(a_2)\;@\; \instr{idiv}$\\
\end{tabular}
\end{center}
@@ -184,9 +184,9 @@
iload $index$
\end{lstlisting}
-\noindent
-which places the content of the local variable $index$ onto
-the stack. Storing the top of the stack into a local variable
+\noindent
+which places the content of the local variable $index$ onto
+the stack. Storing the top of the stack into a local variable
can be done by the instruction
\begin{lstlisting}[language=JVMIS,mathescape,numbers=none]
@@ -207,13 +207,13 @@
\begin{center}
\begin{tabular}{lcl}
$\textit{compile}(n, E)$ & $\dn$ & $\instr{ldc}\;n$\\
-$\textit{compile}(a_1 + a_2, E)$ & $\dn$ &
+$\textit{compile}(a_1 + a_2, E)$ & $\dn$ &
$\textit{compile}(a_1, E) \;@\;\textit{compile}(a_2, E)\;@\; \instr{iadd}$\\
$\textit{compile}(a_1 - a_2, E)$ & $\dn$ &
$\textit{compile}(a_1, E) \;@\; \textit{compile}(a_2, E)\;@\; \instr{isub}$\\
$\textit{compile}(a_1 * a_2, E)$ & $\dn$ &
$\textit{compile}(a_1, E) \;@\; \textit{compile}(a_2, E)\;@\; \instr{imul}$\\
-$\textit{compile}(a_1 \backslash a_2, E)$ & $\dn$ &
+$\textit{compile}(a_1 \backslash a_2, E)$ & $\dn$ &
$\textit{compile}(a_1, E) \;@\; \textit{compile}(a_2, E)\;@\; \instr{idiv}$\\
$\textit{compile}(x, E)$ & $\dn$ & $\instr{iload}\;E(x)$\\
\end{tabular}
@@ -254,7 +254,7 @@
\begin{center}
\begin{tabular}{lcl}
-$\textit{compile}(x := a, E)$ & $\dn$ &
+$\textit{compile}(x := a, E)$ & $\dn$ &
$(\textit{compile}(a, E) \;@\;\instr{istore}\;index, E')$
\end{tabular}
\end{center}
@@ -280,7 +280,7 @@
istore $n_x$
\end{lstlisting}
-\noindent
+\noindent
where $n_x$ is the index (or pointer to the memory) for the variable
$x$. The Scala code for looking-up the index for the variable is as follow:
@@ -367,8 +367,8 @@
\end{center}
\noindent where we now need a conditional jump (if the
-if-condition is false) from the end of the code for the
-boolean to the beginning of the instructions $cs_2$. Once we
+if-condition is false) from the end of the code for the
+boolean to the beginning of the instructions $cs_2$. Once we
are finished with running $cs_2$ we can continue with whatever
code comes after the if-statement.
@@ -376,33 +376,35 @@
where the jump should go. Since we are generating assembly
code for the JVM, we do not actually have to give (numeric)
addresses, but can just attach (symbolic) labels to our code.
-These labels specify a target for a jump. Therefore the labels
+These labels specify a target for a jump---essentially they mark
+a location in our code. Therefore the labels
need to be unique, as otherwise it would be ambiguous where a
-jump should go to. A label, say \pcode{L}, is attached to assembly
+jump should go to. A label, say \pcode{L}, is attached to assembly
code like
\begin{lstlisting}[mathescape,numbers=none]
+ $\textit{instr\_1}$
L:
- $\textit{instr\_1}$
$\textit{instr\_2}$
+ $\textit{instr\_3}$
$\vdots$
\end{lstlisting}
-
+
\noindent where the label needs to be followed by a colon. The task of
the assembler (in our case Jasmin or Krakatau) is to resolve the labels
to actual (numeric) addresses, for example jump 10 instructions forward,
-or 20 instructions backwards.
-
-Recall the ``trick'' with compiling boolean expressions: the
+or 20 instructions backwards, or jump to this particular address.
+
+Recall the ``trick'' with compiling boolean expressions: the
\textit{compile}-function for boolean expressions takes three
-arguments: an abstract syntax tree, an environment for
-variable indices and also the label, $lab$, to where an conditional
-jump needs to go. The clause for the expression $a_1 = a_2$,
+arguments: an abstract syntax tree, an environment for
+variable indices and also the label, which I called $lab$, to where an conditional
+jump needs to go. The clause for the expression $a_1 = a_2$,
for example, is as follows:
\begin{center}
\begin{tabular}{lcl}
-$\textit{compile}(a_1 = a_2, E, lab)$ & $\dn$\\
+$\textit{compile}(a_1 = a_2, E, lab)$ & $\dn$\\
\multicolumn{3}{l}{$\qquad\textit{compile}(a_1, E) \;@\;\textit{compile}(a_2, E)\;@\; \instr{if_icmpne}\;lab$}
\end{tabular}
\end{center}
@@ -426,7 +428,7 @@
leave it to you to extend the \textit{compile}-function for the other
boolean expressions. Note that we need to jump whenever the boolean is
\emph{not} true, which means we have to ``negate'' the jump
-condition---equals becomes not-equal, less becomes greater-or-equal.
+condition---equals becomes not-equal, less becomes greater-or-equal.
Other jump instructions for boolean operators are
\begin{center}
@@ -444,13 +446,13 @@
``upside-down-inside-out'' way of generating code still seems the most
convenient.
-We are now ready to give the compile function for
-if-statements---remember this function returns for statements a
+We are now ready to give the compile function for
+if-statements---remember this function returns for statements a
pair consisting of the code and an environment:
\begin{center}
\begin{tabular}{lcl}
-$\textit{compile}(\pcode{if}\;b\;\pcode{then}\; cs_1\;\pcode{else}\; cs_2, E)$ & $\dn$\\
+$\textit{compile}(\pcode{if}\;b\;\pcode{then}\; cs_1\;\pcode{else}\; cs_2, E)$ & $\dn$\\
\multicolumn{3}{l}{$\qquad L_\textit{ifelse}\;$ (fresh label)}\\
\multicolumn{3}{l}{$\qquad L_\textit{ifend}\;$ (fresh label)}\\
\multicolumn{3}{l}{$\qquad (is_1, E') = \textit{compile}(cs_1, E)$}\\
@@ -468,18 +470,18 @@
for the jump addresses (just before the else-branch and just
after). In the next two lines we generate the instructions for
the two branches, $is_1$ and $is_2$. The final code will
-be first the code for $b$ (including the label
+be first the code for $b$ (including the label
just-before-the-else-branch), then the \pcode{goto} for after
-the else-branch, the label $L_\textit{ifelse}$, followed by
-the instructions for the else-branch, followed by the
-after-the-else-branch label. Consider for example the
+the else-branch, the label $L_\textit{--ifelse}$, followed by
+the instructions for the else-branch, followed by the
+after-the-else-branch label. Consider for example the
if-statement:
\begin{lstlisting}[mathescape,numbers=none,language=While]
if 1 == 1 then x := 2 else y := 3
\end{lstlisting}
-\noindent
+\noindent
The generated code is as follows:
\begin{lstlisting}[language=JVMIS,mathescape,numbers=left]
@@ -496,16 +498,16 @@
\end{lstlisting}
\begin{tikzpicture}[remember picture,overlay]
- \draw[->,very thick] (A) edge [->,to path={-- ++(10mm,0mm)
+ \draw[->,very thick] (A) edge [->,to path={-- ++(10mm,0mm)
-- ++(0mm,-17.3mm) |- (\tikztotarget)},line width=1mm] (B.east);
- \draw[->,very thick] (C) edge [->,to path={-- ++(10mm,0mm)
+ \draw[->,very thick] (C) edge [->,to path={-- ++(10mm,0mm)
-- ++(0mm,-17.3mm) |- (\tikztotarget)},line width=1mm] (D.east);
\end{tikzpicture}
\noindent The first three lines correspond to the the boolean
expression \texttt{1 == 1}. The jump for when this boolean expression
is false is in Line~3. Lines 4-6 corresponds to the if-branch;
-the else-branch is in Lines 8 and 9.
+the else-branch is in Lines 8 and 9.
Note carefully how the environment $E$ is threaded through the recursive
calls of \textit{compile}. The function receives an environment $E$, but
@@ -515,9 +517,9 @@
for the second call to \textit{compile}. $E''$ is also the environment
that needs to be returned as part of the answer.
-The compilation of the while-loops, say
-\pcode{while} $b$ \pcode{do} $cs$, is very similar. In case
-the condition is true and we need to do another iteration,
+The compilation of the while-loops of the form
+\pcode{while} $b$ \pcode{do} $cs$ is very similar. In case
+the condition is true and we need to do another iteration,
and the control-flow needs to be as follows
\begin{center}
@@ -568,19 +570,19 @@
\noindent Again we can use the \textit{compile}-function for
boolean expressions to insert the appropriate jump to the
-end of the loop (label $L_{wend}$ below).
+end of the loop (label $L_\textit{--wend}$ below).
\begin{center}
\begin{tabular}{lcl}
-$\textit{compile}(\pcode{while}\; b\; \pcode{do} \;cs, E)$ & $\dn$\\
-\multicolumn{3}{l}{$\qquad L_{wbegin}\;$ (fresh label)}\\
-\multicolumn{3}{l}{$\qquad L_{wend}\;$ (fresh label)}\\
+$\textit{compile}(\pcode{while}\; b\; \pcode{do} \;cs, E)$ & $\dn$\\
+\multicolumn{3}{l}{$\qquad L_\textit{--wbegin}\;$ (fresh label)}\\
+\multicolumn{3}{l}{$\qquad L_\textit{--wend}\;$ (fresh label)}\\
\multicolumn{3}{l}{$\qquad (is, E') = \textit{compile}(cs_1, E)$}\\
-\multicolumn{3}{l}{$\qquad(L_{wbegin}:$}\\
-\multicolumn{3}{l}{$\qquad\phantom{(}@\;\textit{compile}(b, E, L_{wend})$}\\
+\multicolumn{3}{l}{$\qquad(L_\textit{--wbegin}$}\\
+\multicolumn{3}{l}{$\qquad\phantom{(}@\;\textit{compile}(b, E, L_\textit{--wend})$}\\
\multicolumn{3}{l}{$\qquad\phantom{(}@\;is$}\\
-\multicolumn{3}{l}{$\qquad\phantom{(}@\; \text{goto}\;L_{wbegin}$}\\
-\multicolumn{3}{l}{$\qquad\phantom{(}@\;L_{wend}:, E')$}\\
+\multicolumn{3}{l}{$\qquad\phantom{(}@\; \text{goto}\;L_\textit{--wbegin}$}\\
+\multicolumn{3}{l}{$\qquad\phantom{(}@\;L_\textit{--wend}, E')$}\\
\end{tabular}
\end{center}
@@ -605,16 +607,16 @@
goto L_wbegin $\quad\tikz[remember picture] \node (LA) {\mbox{}};$
L_wend: $\quad\tikz[remember picture] \node[] (LD) {\mbox{}};$
\end{lstlisting}
-
+
\begin{tikzpicture}[remember picture,overlay]
- \draw[->,very thick] (LA) edge [->,to path={-- ++(10mm,0mm)
+ \draw[->,very thick] (LA) edge [->,to path={-- ++(10mm,0mm)
-- ++(0mm,17.3mm) |- (\tikztotarget)},line width=1mm] (LB.east);
- \draw[->,very thick] (LC) edge [->,to path={-- ++(10mm,0mm)
+ \draw[->,very thick] (LC) edge [->,to path={-- ++(10mm,0mm)
-- ++(0mm,-17.3mm) |- (\tikztotarget)},line width=1mm] (LD.east);
\end{tikzpicture}
\noindent
-As said, I leave it to you to decide whether the code implements
+As said, I leave it to you to check that the code really implements
the usual controlflow of while-loops.
Next we need to consider the WHILE-statement \pcode{write x}, which can
@@ -626,13 +628,13 @@
follows.
\begin{lstlisting}[language=JVMIS,numbers=left,basicstyle=\ttfamily\small]
-.method public static write(I)V
- .limit locals 1
- .limit stack 2
- getstatic java/lang/System/out Ljava/io/PrintStream;
+.method public static write(I)V
+ .limit locals 1
+ .limit stack 2
+ getstatic java/lang/System/out Ljava/io/PrintStream;
iload 0
- invokevirtual java/io/PrintStream/println(I)V
- return
+ invokevirtual java/io/PrintStream/println(I)V
+ return
.end method
\end{lstlisting}
@@ -662,7 +664,7 @@
\pcode{write} can be invoked with the two instructions
\begin{lstlisting}[mathescape,language=JVMIS]
-iload $E(x)$
+iload $E(x)$
invokestatic XXX/XXX/write(I)V
\end{lstlisting}
@@ -727,8 +729,8 @@
\begin{framed}
\lstinputlisting[language=JVMIS,mathescape,basicstyle=\ttfamily\small]{../progs/test-small.j}
\begin{tikzpicture}[remember picture,overlay]
- \draw[|<->|,very thick] (LA.north) -- (LB.south)
- node[left=-0.5mm,midway] {\footnotesize\texttt{x\,:=\,1\,+\,2}};
+ \draw[|<->|,very thick] (LA.north) -- (LB.south)
+ node[left=-0.5mm,midway] {\footnotesize\texttt{x\,:=\,1\,+\,2}};
\draw[|<->|,very thick] (LC.north) -- (LD.south)
node[left=-0.5mm,midway] {\footnotesize\texttt{write x}};
\end{tikzpicture}
@@ -742,7 +744,7 @@
Maybe a useful addition to the WHILE-language would be arrays. This
would allow us to generate more interesting WHILE-programs by
-translating BF*** programs into equivalent WHILE-code. Therefore in this
+translating BF*** programs into equivalent WHILE-code. Yeah! Therefore in this
section let us have a look at how we can support the following three
constructions
@@ -750,7 +752,7 @@
\item \lstinline|new(arr[15000])|
\item \lstinline|x := 3 + arr[3 + y]|
\item \lstinline|arr[42 * n] := ...|
-\end{itemize}
+\end{itemize}
\noindent
The first construct is for creating new arrays. In this instance the
@@ -760,14 +762,14 @@
inside an arithmetic expression---we need to be able to look up the
contents of an array at an index determined by an arithmetic expression.
Similarly in the line below, we need to be able to update the content of
-an array at a calculated index.
+an array at a calculated index.
For creating a new array we can generate the following three JVM
instructions:
\begin{lstlisting}[mathescape,language=JVMIS]
-ldc number
-newarray int
+ldc number
+newarray int
astore loc_var
\end{lstlisting}
@@ -781,8 +783,8 @@
array we can use the following JVM code
\begin{lstlisting}[mathescape,language=JVMIS]
-aload loc_var
-$\textit{index\_aexp}$
+aload loc_var
+$\textit{index\_aexp}$
iaload
\end{lstlisting}
@@ -796,9 +798,9 @@
at an index with a value is as follows.
\begin{lstlisting}[mathescape,language=JVMIS]
-aload loc_var
-$\textit{index\_aexp}$
-$\textit{value\_aexp}$
+aload loc_var
+$\textit{index\_aexp}$
+$\textit{value\_aexp}$
iastore
\end{lstlisting}
@@ -833,7 +835,7 @@
\begin{plstx}[rhs style=, margin=2cm, one per line]
: \meta{Stmt} ::= \ldots
- | \texttt{new}(\meta{Id}\,[\,\meta{Num}\,])
+ | \texttt{new}(\meta{Id}\,[\,\meta{Num}\,])
| \meta{Id}\,[\,\meta{E}\,]\,:=\,\meta{E}\\
\end{plstx}
@@ -858,7 +860,7 @@
}
\end{lstlisting}
-\noindent
+\noindent
The idea behind the translation is that BF-programs operate on an array,
called here \texttt{mem}. The BF-memory pointer into this array is
represented as the variable \texttt{ptr}. As usual the BF-instructions
@@ -871,7 +873,7 @@
generate a \code{skip} instruction just before finishing with the
closing \code{"\}"}. The reason is that we are rather pedantic about
semicolons in our WHILE-grammar: the last command cannot have a
-semicolon---adding a \code{skip} works around this snag.
+semicolon---adding a \code{skip} works around this snag.
Putting this all together and we can generate WHILE-programs with more
than 15K JVM-instructions; run the compiled JVM code for such
@@ -884,8 +886,8 @@
around 32K---not too bad). The generation of the picture completes
within 20 or so seconds. Try replicating this with an interpreter! The
good point is that we now have a sufficiently complicated program in our
-WHILE-language in order to do some benchmarking. Which means we now face
-the question about what to do next\ldots
+WHILE-language in order to do some benchmarking (your task!). Having done
+this, we now face the question about what to do next in our compiler\ldots?
\subsection*{Optimisations \& Co}
@@ -907,7 +909,7 @@
(which many integers will be when compiling a BF-program). To counter
this ``waste'', the JVM has specific instructions for small integers,
for example
-
+
\begin{itemize}
\item \instr{iconst_0},\ldots, \instr{iconst_5}
\item \instr{bipush n}
@@ -933,13 +935,13 @@
following Scala helper-function
\begin{lstlisting}[language=Scala]
-def compile_num(i: Int) =
- if (0 <= i && i <= 5) i"iconst_$i" else
- if (-128 <= i && i <= 127) i"bipush $i"
+def compile_num(i: Int) =
+ if (0 <= i && i <= 5) i"iconst_$i" else
+ if (-128 <= i && i <= 127) i"bipush $i"
else i"ldc $i"
\end{lstlisting}
-\noindent
+\noindent
It generates the more efficient instructions when pushing a small integer
constant onto the stack. The default is \instr{ldc} for any other constants.
@@ -970,10 +972,10 @@
\end{tabular}
\end{center}
-\noindent
+\noindent
Quite good! Such optimisations are called \emph{peephole optimisations},
because they involve changing one or a small set of instructions into an
-equivalent set that has better performance.
+equivalent set that has better performance.
If you look careful at our generated code you will quickly find another
source of inefficiency in programs like
@@ -989,7 +991,7 @@
loads the local variable back onto the stack for writing out a number.
\begin{lstlisting}[mathescape,language=JVMIS]
-...
+...
istore 0
iload 0
...
@@ -1019,7 +1021,7 @@
arr[14] := 3 + arr[13]
\end{lstlisting}
-\noindent
+\noindent
Admittedly this is a contrived program, and probably not meant to be
like this by any sane programmer, but it is supposed to make the
following point: The program generates an array of size 10, and then
@@ -1048,8 +1050,8 @@
is to modify the code we generate.
\begin{lstlisting}[mathescape,language=JVMIS2]
- $\textit{index\_aexp}$
- aload loc_var
+ $\textit{index\_aexp}$
+ aload loc_var
dup2
arraylength
if_icmple L1
@@ -1062,9 +1064,11 @@
L2:
\end{lstlisting}
+\textbf{TBD}
+
\begin{lstlisting}[mathescape,language=JVMIS2]
- $\textit{index\_aexp}$
- aload loc_var
+ $\textit{index\_aexp}$
+ aload loc_var
dup2
arraylength
if_icmple L1
@@ -1084,55 +1088,55 @@
fill=black!20,draw,text width=1.6cm,line width=0.5mm}]
\node (A) {};
\node[stack,right = 80pt] (0) at (A.east) {$\textit{index}$\nodepart{two} \ldots\phantom{l}};
-\node[stack,right = 60pt] (1) at (0.east)
+\node[stack,right = 60pt] (1) at (0.east)
{array\nodepart{two}
$\textit{index}$\nodepart{three} \ldots\phantom{l}};
-\node[stack,below = 40pt] (2) at (1.south)
+\node[stack,below = 40pt] (2) at (1.south)
{array\nodepart{two}
$\textit{index}$ \nodepart{three}
array \nodepart{four}
- $\textit{index}$\nodepart{five} \ldots\phantom{l}};
-\node[stack,left = 90pt] (3) at (2.west)
+ $\textit{index}$\nodepart{five} \ldots\phantom{l}};
+\node[stack,left = 90pt] (3) at (2.west)
{array\_len\nodepart{two}
$\textit{index}$ \nodepart{three}
array \nodepart{four}
- $\textit{index}$\nodepart{five} \ldots\phantom{l}};
+ $\textit{index}$\nodepart{five} \ldots\phantom{l}};
\node[stack,below right of = 3,node distance = 130pt,rectangle split parts = 3] (4b) at (3.south)
{array\nodepart{two}
$\textit{index}$\nodepart{three} \ldots\phantom{l}};
\node[stack,below left of = 3,node distance = 130pt,rectangle split parts = 3] (4a) at (3.south)
{array\nodepart{two}
- $\textit{index}$\nodepart{three} \ldots\phantom{l}};
+ $\textit{index}$\nodepart{three} \ldots\phantom{l}};
\node[stack,below of = 4a,node distance = 70pt,rectangle split parts = 3] (5a) at (4a.south)
{$\textit{index}$\nodepart{two}
- array\nodepart{three} \ldots\phantom{l}};
+ array\nodepart{three} \ldots\phantom{l}};
\node[stack,below of = 5a,node distance = 60pt,rectangle split parts = 2] (6a) at (5a.south)
{$\textit{array\_elem}$\nodepart{two} \ldots\phantom{l}};
\node[stack,below of = 4b,node distance = 65pt,rectangle split parts = 2] (5b) at (4b.south)
- {\ldots\phantom{l}};
+ {\ldots\phantom{l}};
\node[stack,below of = 5b,node distance = 60pt,rectangle split parts = 2] (6b) at (5b.south)
- {0\nodepart{two} \ldots\phantom{l}};
+ {0\nodepart{two} \ldots\phantom{l}};
-\draw [|->,line width=2.5mm] (A) -- node [above,pos=0.45] {$\textit{index\_aexp}$} (0);
+\draw [|->,line width=2.5mm] (A) -- node [above,pos=0.45] {$\textit{index\_aexp}$} (0);
\draw [->,line width=2.5mm] (0) -- node [above,pos=0.35] {\instr{aload}} (1);
-\draw [->,line width=2.5mm] (1) -- node [right,pos=0.35] {\instr{dup2}} (2);
+\draw [->,line width=2.5mm] (1) -- node [right,pos=0.35] {\instr{dup2}} (2);
\draw [->,line width=2.5mm] (2) -- node [above,pos=0.40] {\instr{arraylength}} (3);
-\path[->,draw,line width=2.5mm]
- let \p1=(3.south), \p2=(4a.north) in (3.south) -- +(0,0.5*\y2-0.5*\y1) node [right,pos=0.50] {\instr{if_icmple}} -| (4a.north);
-\path[->,draw,line width=2.5mm]
+\path[->,draw,line width=2.5mm]
+ let \p1=(3.south), \p2=(4a.north) in (3.south) -- +(0,0.5*\y2-0.5*\y1) node [right,pos=0.50] {\instr{if_icmple}} -| (4a.north);
+\path[->,draw,line width=2.5mm]
let \p1=(3.south), \p2=(4b.north) in (3.south) -- +(0,0.5*\y2-0.5*\y1) -| (4b.north);
\draw [->,line width=2.5mm] (4a) -- node [right,pos=0.35] {\instr{swap}} (5a);
-\draw [->,line width=2.5mm] (4b) -- node [right,pos=0.35] {\instr{pop2}} (5b);
+\draw [->,line width=2.5mm] (4b) -- node [right,pos=0.35] {\instr{pop2}} (5b);
\draw [->,line width=2.5mm] (5a) -- node [right,pos=0.35] {\instr{iaload}} (6a);
\draw [->,line width=2.5mm] (5b) -- node [right,pos=0.35] {\instr{iconst_0}} (6b);
-\end{tikzpicture}
+\end{tikzpicture}
\end{center}
\end{figure}
goto\_w problem solved for too large jumps
\end{document}
-%%% Local Variables:
-%%% mode: latex
+%%% Local Variables:
+%%% mode: latex
%%% TeX-master: t
-%%% End:
+%%% End:
--- a/progs/lexer/lexer.sc Wed Feb 21 09:14:12 2024 +0000
+++ b/progs/lexer/lexer.sc Wed May 29 13:25:30 2024 +0100
@@ -59,7 +59,7 @@
case ONE => true
case CHAR(_) => false
case ALT(r1, r2) => nullable(r1) || nullable(r2)
- case SEQ(r1, r2) => nullable(r1) $$ nullable(r2)
+ case SEQ(r1, r2) => nullable(r1) && nullable(r2)
case STAR(_) => true
case RECD(_, r1) => nullable(r1)
}
--- a/progs/parser-combinators/comb1.sc Wed Feb 21 09:14:12 2024 +0000
+++ b/progs/parser-combinators/comb1.sc Wed May 29 13:25:30 2024 +0100
@@ -5,18 +5,18 @@
//
// amm comb1.sc
-
+
// Note, in the lectures I did not show the type bound
-// using is: I => Seq[_], which means that the input
-// type 'I' needs to be a sequence.
+// using is: I => Seq[_], which means that the input
+// type 'I' needs to be a sequence.
-type IsSeq[I] = I => Seq[_]
+type IsSeq[I] = I => Seq[?]
-abstract class Parser[I, T](using is: IsSeq[I]) {
- def parse(in: I): Set[(T, I)]
+abstract class Parser[I: IsSeq, T](using is: IsSeq[I]) {
+ def parse(in: I): Set[(T, I)]
def parse_all(in: I) : Set[T] =
- for ((hd, tl) <- parse(in);
+ for ((hd, tl) <- parse(in);
if is(tl).isEmpty) yield hd
}
@@ -25,21 +25,21 @@
// alternative parser
-class AltParser[I : IsSeq, T](p: => Parser[I, T],
+class AltParser[I : IsSeq, T](p: => Parser[I, T],
q: => Parser[I, T]) extends Parser[I, T] {
- def parse(in: I) = p.parse(in) ++ q.parse(in)
+ def parse(in: I) = p.parse(in) ++ q.parse(in)
}
// sequence parser
-class SeqParser[I: IsSeq, T, S](p: => Parser[I, T],
+class SeqParser[I: IsSeq, T, S](p: => Parser[I, T],
q: => Parser[I, S]) extends Parser[I, (T, S)] {
- def parse(in: I) =
- for ((hd1, tl1) <- p.parse(in);
+ def parse(in: I) =
+ for ((hd1, tl1) <- p.parse(in);
(hd2, tl2) <- q.parse(tl1)) yield ((hd1, hd2), tl2)
}
// map parser
-class MapParser[I : IsSeq, T, S](p: => Parser[I, T],
+class MapParser[I : IsSeq, T, S](p: => Parser[I, T],
f: T => S) extends Parser[I, S] {
def parse(in: I) = for ((hd, tl) <- p.parse(in)) yield (f(hd), tl)
}
@@ -48,7 +48,7 @@
// an example of an atomic parser for characters
case class CharParser(c: Char) extends Parser[String, Char] {
- def parse(in: String) =
+ def parse(in: String) =
if (in != "" && in.head == c) Set((c, in.tail)) else Set()
}
@@ -64,15 +64,15 @@
case class RegexParser(reg: Regex) extends Parser[String, String] {
def parse(in: String) = reg.findPrefixMatchOf(in) match {
case None => Set()
- case Some(m) => Set((m.matched, m.after.toString))
+ case Some(m) => Set((m.matched, m.after.toString))
}
}
-// atomic parsers for numbers and "verbatim" strings
+// atomic parsers for numbers and "verbatim" strings
val NumParser = RegexParser("[0-9]+".r)
def StrParser(s: String) = RegexParser(Regex.quote(s).r)
-NumParser.parse("123a123bc")
+NumParser.parse("123a123bc")
StrParser("else").parse("elsethen")
@@ -82,16 +82,16 @@
val NumParserInt = MapParser(NumParser, (s: String) => s.toInt)
NumParserInt.parse("123abc")
-// the following string interpolation allows us to write
-// StrParser(_some_string_) more conveniently as
+// the following string interpolation allows us to write
+// StrParser(_some_string_) more conveniently as
//
-// p"<_some_string_>"
+// p"<_some_string_>"
-extension (sc: StringContext)
- def p(args: Any*) = StrParser(sc.s(args:_*))
+extension (sc: StringContext)
+ def p(args: Any*) = StrParser(sc.s(args*))
-(p"else").parse("elsethen")
+(p"else").parse("elsethen")
// more convenient syntax for parser combinators
extension [I: IsSeq, T](p: Parser[I, T]) {
@@ -100,11 +100,11 @@
def map[S](f: => T => S) = new MapParser[I, T, S](p, f)
}
-// simple example of transforming the
+// simple example of transforming the
// result into capital letters
def toU(s: String) = s.map(_.toUpper)
-(p"else").map(toU(_)).parse("elseifthen")
+(p"else").map(toU(_)).parse("elseifthen")
// these implicits allow us to use an infix notation for
// sequences and alternatives; we also can write the usual
@@ -121,10 +121,10 @@
// A parser for palindromes (just returns them as string)
// since the parser is recursive it needs to be lazy
lazy val Pal : Parser[String, String] = {
- (p"a" ~ Pal ~ p"a").map{ case ((x, y), z) => s"$x$y$z" } ||
- (p"b" ~ Pal ~ p"b").map{ case ((x, y), z) => s"$x$y$z" } ||
+ (p"a" ~ Pal ~ p"a").map{ case ((x, y), z) => s"$x$y$z" } ||
+ (p"b" ~ Pal ~ p"b").map{ case ((x, y), z) => s"$x$y$z" } ||
p"a" || p"b" || p""
-}
+}
// examples
Pal.parse_all("abacaba")
@@ -132,7 +132,7 @@
println("Palindrome: " + Pal.parse_all("abaaaba"))
-// A parser for wellnested parentheses
+// A parser for wellnested parentheses
//
// P ::= ( P ) P | epsilon
//
@@ -140,7 +140,7 @@
lazy val P : Parser[String, String] = {
(p"(" ~ P ~ p")" ~ P).map{ case (((_, x), _), y) => "{" + x + "}" + y } ||
p""
-}
+}
println(P.parse_all("(((()()))())"))
println(P.parse_all("(((()()))()))"))
@@ -172,8 +172,8 @@
// with parser combinators (and other parsing algorithms)
// no left-recursion is allowed, otherwise the will loop
-lazy val EL: Parser[String, Int] =
- ((EL ~ p"+" ~ EL).map{ case ((x, y), z) => x + z} ||
+lazy val EL: Parser[String, Int] =
+ ((EL ~ p"+" ~ EL).map{ case ((x, y), z) => x + z} ||
(EL ~ p"*" ~ EL).map{ case ((x, y), z) => x * z} ||
(p"(" ~ EL ~ p")").map{ case ((x, y), z) => y} ||
NumParserInt)
@@ -234,7 +234,7 @@
-// a problem with the arithmetic expression parser: it
+// a problem with the arithmetic expression parser: it
// gets very slow with deeply nested parentheses
println("A runtime problem")
--- a/slides/ssy.tex Wed Feb 21 09:14:12 2024 +0000
+++ b/slides/ssy.tex Wed May 29 13:25:30 2024 +0100
@@ -13,8 +13,8 @@
\newtcolorbox{mybox3}[1]{colback=Cyan!5!white,colframe=Cyan!75!black,fonttitle=\bfseries,title=#1}
-
-\hfuzz=220pt
+\setbeamersize{text margin left=2mm} % <- like this
+\hfuzz=220pt
\lstset{language=Scala,
style=mystyle,
@@ -22,20 +22,74 @@
numbers=none,
xleftmargin=0mm}
-\pgfplotsset{compat=1.17}
+\pgfplotsset{compat=1.17}
-\newcommand{\bl}[1]{\textcolor{blue}{#1}}
-
-% beamer stuff
-\renewcommand{\slidecaption}{SSY, King's College London}
+\newcommand{\bl}[1]{\textcolor{blue}{#1}}
+
+% beamer stuff
+\renewcommand{\slidecaption}{Christian Urban, SSY, King's College London}
\begin{document}
+%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
+\begin{frame}[t]
+\frametitle{\begin{tabular}{@ {\hspace{8mm}}c@ {}}Fast Regular Expression Matching\end{tabular}}
+\mbox{}\\[1mm]
+
+\begin{columns}[t,onlytextwidth]
+\begin{column}{.2\textwidth}
+\raisebox{-10mm}{
+\begin{tikzpicture}
+ \begin{axis}[
+ xlabel={size of strings},
+ x label style={at={(0.45,-0.16)}},
+ ylabel={time in secs},
+ enlargelimits=false,
+ xtick={0,10,...,30},
+ xmax=35,
+ ymax=35,
+ ytick={0,10,...,30},
+ scaled ticks=false,
+ axis lines=left,
+ width=5cm,
+ height=4.5cm,
+ legend entries={Python,JavaScript,Swift,Dart},
+ legend style={font=\footnotesize,at={(0.45,-0.48)},anchor=north},
+ legend cell align=left]
+\addplot[blue,mark=*, mark options={fill=white}] table {re-python2.data};
+\addplot[red,mark=*, mark options={fill=white}] table {re-js.data};
+\addplot[magenta,mark=*, mark options={fill=white}] table {re-swift.data};
+\addplot[brown,mark=*, mark options={fill=white}] table {re-dart.data};
+\end{axis}
+\end{tikzpicture}}
+\end{column}
+
+\end{columns}
+
+\begin{textblock}{4}(3.5,3.2)
+{\bl{\texttt{(a*)*$\cdot$b}}}
+\end{textblock}
+
+\begin{textblock}{7.5}(8,3)
+\begin{itemize}
+\item use \textbf{Brzozowski derivatives} for regex matching rather\\ than NFAs/DFAs% and lexing
+\item based on work by \textbf{Christian Urban} and \textbf{Roy Dyckhoff}\bigskip
+\item applications in network security (traffic filtering)
+\item formal verification of correctness and speed (\textbf{Isabelle} theorem prover)
+\end{itemize}
+\end{textblock}
+
+\end{frame}
+%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
+
+
+\end{document}
+
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\begin{frame}[t]
-\frametitle{%
+\frametitle{%
\begin{tabular}{@ {}c@ {}}
\\[8mm]
\LARGE Derivative-Based\\
@@ -46,7 +100,7 @@
\end{frame}
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
-%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
+%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\begin{frame}[c]
\frametitle{A Bit of Background}
@@ -57,7 +111,7 @@
\begin{textblock}{11}(1.7,10)
\begin{bubble}[9.6cm]
PhD thesis on a particular term-rewriting system:\medskip\\
- Proved that all rewriting sequences are strongly normalising (terminate).
+ Proved that all rewriting sequences are strongly normalising (terminate).
\end{bubble}
\end{textblock}
@@ -69,8 +123,8 @@
& \bl{|} & \bl{$\lambda x.\,t$}\\
& \bl{|} & \bl{$t_1\;t_2$}\\
& \bl{|} & \bl{...}
-\end{tabular}
-\end{bubble}
+\end{tabular}
+\end{bubble}
\end{textblock}}
\end{frame}
@@ -78,14 +132,14 @@
-%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
+%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
-%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
+%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\begin{frame}[t]
\frametitle{Quiz 1}
\begin{bubble}[10cm]\it
There are many, many regular expression libraries.\bigskip\\
-
+
\textbf{Given a regular expression \bl{r} and a string \bl{s}, what is the
difficulty / complexity of the problem deciding whether \bl{r} matches \bl{s}?}
\end{bubble}
@@ -102,9 +156,9 @@
\end{textblock}
\end{frame}
-%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
+%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
-%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
+%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\begin{frame}[t]
\frametitle{Quiz 2}
@@ -114,13 +168,13 @@
($\underbrace{\bl{[a\mbox{-}z0\mbox{-}9\_\!\!\_\,.-]^+}}_{\textrm{name}}$)\bl{$\,@\,$}
($\underbrace{\bl{[a\mbox{-}z0\mbox{-}9\,-]^+}}_{\textrm{domain}}$) \bl{$\,.\,$}
($\underbrace{\bl{[a\mbox{-}z\,.]^{\{2..5\}}}}_{\textrm{top-level domain}}$)
-\end{center}
+\end{center}
\end{textblock}
\only<2>{
\begin{textblock}{10}(4,8)
bounded regular expressions:\medskip\\
-\begin{tabular}{ll}
+\begin{tabular}{ll}
\bl{$r^{\{n\}}$} & exactly n-times \bl{$r$}\\
\bl{$r^{\{n..m\}}$} & between n and m-times\\
\bl{$r^{\{n..\}}$} & from n-times\\
@@ -128,12 +182,12 @@
\end{tabular}\smallskip\\
\footnotesize\hfill ${}^*$ in some applications the counters can be in the millions
\end{textblock}}
-
+
\end{frame}
-%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
+%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
-%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
+%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\begin{frame}[t]
\frametitle{Quiz 2}
@@ -149,8 +203,8 @@
>=stealth',very thick,
every state/.style={minimum size=3pt,draw=blue!50,very thick,fill=blue!20}]
\node[state, initial] (Q_0) {$\mbox{}$};
-\node[state, initial] (Q_01) [below=1mm of Q_0] {$\mbox{}$};
-\node[state, initial] (Q_02) [above=1mm of Q_0] {$\mbox{}$};
+\node[state, initial] (Q_01) [below=1mm of Q_0] {$\mbox{}$};
+\node[state, initial] (Q_02) [above=1mm of Q_0] {$\mbox{}$};
\node (R_1) [right=of Q_0] {$\ldots$};
\node[state, accepting] (T_1) [right=of R_1] {$\mbox{}$};
\node[state, accepting] (T_2) [above=of T_1] {$\mbox{}$};
@@ -181,8 +235,8 @@
>=stealth',very thick,
every state/.style={minimum size=3pt,draw=blue!50,very thick,fill=blue!20},]
\node[state, initial] (Q_0) {$\mbox{}$};
-\node[state, initial] (Q_01) [below=1mm of Q_0] {$\mbox{}$};
-\node[state, initial] (Q_02) [above=1mm of Q_0] {$\mbox{}$};
+\node[state, initial] (Q_01) [below=1mm of Q_0] {$\mbox{}$};
+\node[state, initial] (Q_02) [above=1mm of Q_0] {$\mbox{}$};
\node (r_1) [right=of Q_0] {$\ldots$};
\node[state] (t_1) [right=of r_1] {$\mbox{}$};
\node[state] (t_2) [above=of t_1] {$\mbox{}$};
@@ -202,7 +256,7 @@
\path[->] (t_1) edge (A_02);
\path[->] (t_2) edge (A_02);
\path[->] (t_3) edge node [below] {$\varepsilon$\footnotesize{}s} (A_02);
-
+
\begin{pgfonlayer}{background}
\node (3) [rounded corners, inner sep=1mm, thick,
draw=black!60, fill=black!20, fit= (Q_0) (c_1) (c_2) (c_3)] {};
@@ -213,9 +267,9 @@
\end{frame}
-%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
+%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
-%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
+%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\begin{frame}[t]
\frametitle{Quiz 2}
@@ -225,14 +279,14 @@
\end{textblock}
\begin{textblock}{12}(4,6)
-\onslide<1>{
+\onslide<1>{
\begin{tikzpicture}[node distance=3mm,
>=stealth',very thick,
every state/.style={minimum size=3pt,draw=blue!50,very thick,fill=blue!20},
baseline=(current bounding box.north)]
\node (2) {$\mbox{}$};
\node[state, initial] (4) [above=1mm of 2] {$\mbox{}$};
-\node[state, initial] (5) [below=1mm of 2] {$\mbox{}$};
+\node[state, initial] (5) [below=1mm of 2] {$\mbox{}$};
\node (a) [right=of 2] {$\ldots$};
\node[state, accepting] (a1) [right=of a] {$\mbox{}$};
\node[state, accepting] (a2) [above=of a1] {$\mbox{}$};
@@ -245,7 +299,7 @@
\end{textblock}
\begin{textblock}{12}(2,6)
-\onslide<2->{
+\onslide<2->{
\begin{tikzpicture}[node distance=3mm,
>=stealth',very thick,
every state/.style={minimum size=3pt,draw=blue!50,very thick,fill=blue!20},
@@ -253,7 +307,7 @@
\node at (0,0) [state, initial,accepting] (1) {$\mbox{}$};
\node (2) [right=16mm of 1] {$\mbox{}$};
\node[state] (4) [above=1mm of 2] {$\mbox{}$};
-\node[state] (5) [below=1mm of 2] {$\mbox{}$};
+\node[state] (5) [below=1mm of 2] {$\mbox{}$};
\node (a) [right=of 2] {$\ldots$};
\node[state] (a1) [right=of a] {$\mbox{}$};
\node[state] (a2) [above=of a1] {$\mbox{}$};
@@ -271,30 +325,30 @@
\end{textblock}
\begin{textblock}{12}(2,12)
-\onslide<3->{
+\onslide<3->{
\begin{bubble}[9.5cm]\it
- Quiz:\smallskip\\
+ Quiz:\smallskip\\
\textbf{What is the Thompson Construction for \bl{$r^{\{n\}}$} ?}
\end{bubble}}
\end{textblock}
\end{frame}
-%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
+%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
-%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
+%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\begin{frame}[t]
\frametitle{Quiz 3}
-Hierarchy of Languages:
+Hierarchy of Languages:
\begin{center}
\begin{tikzpicture}
-[rect/.style={draw=black!50,
+[rect/.style={draw=black!50,
top color=white,
- bottom color=black!20,
- rectangle,
- very thick,
+ bottom color=black!20,
+ rectangle,
+ very thick,
rounded corners}, scale=1.2]
\draw (0,0) node [rect, text depth=39mm, text width=68mm] {all languages};
@@ -303,36 +357,36 @@
\draw (0,-1.14) node [rect, text depth=9mm, text width=50mm] {context-free languages};
\draw (0,-1.4) node [rect] {regular languages};
\end{tikzpicture}
-\end{center}
+\end{center}
\begin{textblock}{12}(2,11.5)
-\onslide<2->{
+\onslide<2->{
\begin{bubble}[9.5cm]\it
Quiz:\smallskip\\
-
+
\textbf{Can we use standard parsing algorithms for matching / lexing\,?}
Say CYK, LL, LR(k), PEG, \ldots
\end{bubble}}
\end{textblock}
\end{frame}
-%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
+%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\defverbatim{\foo}{\footnotesize%
\begin{verbatim}
(?:(?:\"|'|\]|\}|\\|\d|(?:nan|infinity|true|false|
null|undefined|symbol|math)|\`|\-|\+)+[)]*;?((?:\s
-| -|~|!|{}|\|\||\+)*.*(?:.*=.*)))
+| -|~|!|{}|\|\||\+)*.*(?:.*=.*)))
\end{verbatim}
}
-%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
+%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\begin{frame}[t,fragile]
\frametitle{Quiz 4}
\begin{textblock}{12}(2,2.5)
\begin{bubble}[9.5cm]\it
- Quiz:\smallskip\\
+ Quiz:\smallskip\\
Do regular expressions have any security relevance\,?
\end{bubble}
\end{textblock}
@@ -361,18 +415,18 @@
\end{textblock}
}
-
+
\end{frame}
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
-%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
+%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\begin{frame}[t]
\frametitle{Quiz 3}
-\begin{textblock}{12}(2,2)
+\begin{textblock}{12}(2,2)
\begin{bubble}[9.5cm]\it
Quiz:\smallskip\\
-
+
\textbf{Can we use standard parsing algorithms for matching / lexing\,?}
Say CYK, LL, LR(k), PEG, \ldots
\end{bubble}
@@ -390,11 +444,11 @@
What should be & \\
the result for: & \bl{\texttt{"iffoo"}} \;\;\; \bl{\texttt{"if"}}\\
-\end{tabular}
+\end{tabular}
\end{textblock}
\only<2->{
-\begin{textblock}{13.5}(1.5,4)
+\begin{textblock}{13.5}(1.5,4)
\begin{mybox3}{POSIX rules}
\begin{itemize}
\item \textbf{The Longest Match Rule:} The longest initial
@@ -402,23 +456,23 @@
\item \textbf{Priority Rule:} For a particular longest initial
substring, the first (leftmost) regular expression that can match
determines the token.
- \item \ldots
+ \item \ldots
\end{itemize}
\begin{center}
\bl{$(a + ab) \cdot (bc + c)$} \;\;and\;\; \bl{\texttt{"}$abc$\texttt{"}}
- \end{center}
+ \end{center}
\end{mybox3}
\end{textblock}}
\end{frame}
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
-%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
+%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\begin{frame}[t]
\frametitle{Quiz 1}
\begin{bubble}[10cm]\it
There are many, many regular expression libraries.\bigskip\\
-
+
\textbf{Given a regular expression \bl{r} and a string \bl{s}, what is the
difficulty / complexity of the problem deciding whether \bl{r} matches \bl{s}?}
\end{bubble}
@@ -431,16 +485,16 @@
\end{textblock}
\end{frame}
-%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
+%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
-%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
+%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\begin{frame}[t]
-\mbox{}\\[10mm]
+\mbox{}\\[10mm]
-\begin{columns}[t,onlytextwidth]
+\begin{columns}[t,onlytextwidth]
\begin{column}{.2\textwidth}
-\raisebox{-10mm}{
+\raisebox{-10mm}{
\begin{tikzpicture}
\begin{axis}[
xlabel={\bl{$n$} \pcode{a}s},
@@ -454,7 +508,7 @@
scaled ticks=false,
axis lines=left,
width=5cm,
- height=5cm,
+ height=5cm,
legend entries={Python,JavaScript,Swift,Dart},
legend style={font=\small,at={(0.5,-0.39)},anchor=north},
legend cell align=left]
@@ -479,34 +533,34 @@
scaled ticks=false,
axis lines=left,
width=5cm,
- height=5cm,
+ height=5cm,
legend entries={Python,Ruby},
legend style={font=\small,at={(0.5,-0.39)},anchor=north},
legend cell align=left]
\addplot[blue,mark=*, mark options={fill=white}] table {re-python.data};
-\addplot[brown,mark=pentagon*, mark options={fill=white}] table {re-ruby.data};
+\addplot[brown,mark=pentagon*, mark options={fill=white}] table {re-ruby.data};
\end{axis}
-\end{tikzpicture}
-\end{column}
+\end{tikzpicture}
+\end{column}
\end{columns}
\begin{textblock}{4}(9,1.7)
\textbf{\bl{\texttt{(a?)\boldmath$^{\{n\}}\cdot$(a)\boldmath$^{\{n\}}$}}}
\end{textblock}
-\begin{textblock}{4}(4,1.7)
+\begin{textblock}{4}(4,1.7)
\textbf{\bl{\texttt{(a*)*$\cdot$b}}}
\end{textblock}
\begin{textblock}{3.4}(0.3,10)
\small{}\textbf{matching with strings}
-\textbf{\bl{$\underbrace{\texttt{a}...\texttt{a}}_n$}}
+\textbf{\bl{$\underbrace{\texttt{a}...\texttt{a}}_n$}}
\end{textblock}
\end{frame}
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
-%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
+%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\begin{frame}[t]
\frametitle{Quiz 1}
\begin{bubble}[10cm]\it
@@ -519,25 +573,25 @@
answer is:
\only<2->{\begin{center}
\fontspec{Hoefler Text Black}\textcolor{ProcessBlue}{\huge{}NP}
-\end{center}}
+\end{center}}
\end{textblock}
\begin{textblock}{12}(1.7,12)
\only<3->{Backreferences: \bl{(\ldots)$\,\ldots{}\backslash{}n$}}
-\end{textblock}
+\end{textblock}
\end{frame}
-%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
+%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
-%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
+%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\begin{frame}
\frametitle{Quiz 2}
\begin{textblock}{12}(2,2)
-\onslide<1->{
+\onslide<1->{
\begin{bubble}[9.5cm]\it
- Quiz:\smallskip\\
+ Quiz:\smallskip\\
\textbf{What is the Thompson Construction for \bl{$r^{\{n\}}$} ?}
\end{bubble}}
\end{textblock}
@@ -552,8 +606,8 @@
\end{textblock}}
\only<4>{
-\begin{textblock}{13.5}(1.5,4)
- \begin{mybox3}{From Rust's Regex Description}\it
+\begin{textblock}{13.5}(1.5,4)
+ \begin{mybox3}{From Rust's Regex Description}\it
``\ldots [the] syntax is similar to Perl-style regular expressions, but lacks a few features like look
around and backreferences. In exchange, all searches execute in linear time with respect
to the size of the regular expression and search text. \ldots''
@@ -561,7 +615,7 @@
\end{textblock}}
-\end{frame}
+\end{frame}
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
@@ -578,16 +632,16 @@
& \bl{$\mid$} & \bl{$r^*$} & star (zero or more)\\
\end{tabular}
\end{textblock}
-
+
\end{frame}
-%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
+%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\begin{frame}[c]
\frametitle{\begin{tabular}{l}The Derivative of a Rexp\end{tabular}}
\large
-If \bl{$r$} matches the string \bl{$c\!::\!s$}, what is a regular
+If \bl{$r$} matches the string \bl{$c\!::\!s$}, what is a regular
expression that matches just \bl{$s$}?\bigskip\bigskip\bigskip\bigskip
\small
@@ -607,14 +661,14 @@
\bl{$\der\, c\, (d)$} & \bl{$\dn$} & \bl{if $c = d$ then $\ONE$ else $\ZERO$} & \\
\bl{$\der\, c\, (r_1 + r_2)$} & \bl{$\dn$} & \bl{$\der\, c\, r_1 + \der\, c\, r_2$} & \\
\bl{$\der\, c\, (r_1 \cdot r_2)$} & \bl{$\dn$} & \bl{if $nullable (r_1)$}\\
- & & \bl{then $(\der\,c\,r_1) \cdot r_2 + \der\, c\, r_2$}\\
+ & & \bl{then $(\der\,c\,r_1) \cdot r_2 + \der\, c\, r_2$}\\
& & \bl{else $(\der\, c\, r_1) \cdot r_2$}\\
\bl{$\der\, c\, (r^*)$} & \bl{$\dn$} & \bl{$(\der\,c\,r) \cdot (r^*)$} &\medskip\bigskip\\
\end{tabular}
\end{center}
\end{frame}
-%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
+%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\begin{frame}[t]
@@ -630,14 +684,14 @@
& test whether \bl{$r_4$} can recognise\\
& the empty string\medskip\\
Output: & result of the test\\
- & $\Rightarrow \bl{\textit{true}} \,\text{or}\,
- \bl{\textit{false}}$\\
+ & $\Rightarrow \bl{\textit{true}} \,\text{or}\,
+ \bl{\textit{false}}$\\
\end{tabular}
\end{center}
\end{frame}
-%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
+%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\begin{frame}[c]
@@ -650,14 +704,14 @@
\bl{$nullable(\ZERO)$} & \bl{$\dn$} & \bl{\textit{false}}\\
\bl{$nullable(\ONE)$} & \bl{$\dn$} & \bl{\textit{true}}\\
\bl{$nullable (c)$} & \bl{$\dn$} & \bl{\textit{false}}\\
-\bl{$nullable (r_1 + r_2)$} & \bl{$\dn$} & \bl{$nullable(r_1) \vee nullable(r_2)$} \\
+\bl{$nullable (r_1 + r_2)$} & \bl{$\dn$} & \bl{$nullable(r_1) \vee nullable(r_2)$} \\
\bl{$nullable (r_1 \cdot r_2)$} & \bl{$\dn$} & \bl{$nullable(r_1) \wedge nullable(r_2)$} \\
\bl{$nullable (r^*)$} & \bl{$\dn$} & \bl{\textit{true}}\\
\end{tabular}
\end{center}
\end{frame}
-%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
+%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\begin{frame}[t]
@@ -679,14 +733,14 @@
also works for lookarounds, various anchors, etc, but \textbf{not} for backreferences
\begin{textblock}{3}(11,13)
-\onslide<2>{
+\onslide<2>{
\begin{bubble}[3cm]
\bl{$a^{\{0\}\{4294967295\}}$}
\end{bubble}}
\end{textblock}
\end{frame}
-%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
+%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\begin{frame}[t]
@@ -705,38 +759,37 @@
\end{tabular}
\end{center}}
-\begin{textblock}{13.5}(1.5,12)
+\begin{textblock}{13.5}(1.5,12)
(regular expressions of sizes 98, 169, 283, 468, 767, \ldots)
\end{textblock}
\end{frame}
-%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
+%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\begin{frame}[c]
\frametitle{\mbox{Conclusion}}
\begin{itemize}
-\item The beauty is that this only involves functional programs that can be conveniently reasoned about in theorem provers
+\item The beauty is that this only involves functional programs that can be conveniently reasoned about in theorem provers
\item POSIX lexing can be done via an extension by Sulzmann and Lu, 2014 (our current work)\medskip\pause
-\item How surprising that one can still do new work on regular expressions.
-\end{itemize}
+\item How surprising that one can still do new work on regular expressions.
+\end{itemize}
\end{frame}
-%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
+%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
-%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
+%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\begin{frame}<1-10>
-\end{frame}
+\end{frame}
-%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
+%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\end{document}
-%%% Local Variables:
+%%% Local Variables:
%%% mode: latex
%%% TeX-master: t
-%%% End:
-
+%%% End: