--- a/text/proposal.tex Wed Sep 18 16:35:57 2019 +0100
+++ b/text/proposal.tex Sat Oct 24 12:13:39 2020 +0100
@@ -1,9 +1,10 @@
+% !TEX program = pfdlatex
\documentclass{article}
\usepackage{url}
\usepackage{color}\definecolor{darkblue}{rgb}{0,0,0.5}
\usepackage[a4paper,colorlinks=true,linkcolor=darkblue,citecolor=darkblue,filecolor=darkblue,pagecolor=darkblue,urlcolor=darkblue]{hyperref}
\usepackage{style}
-\usepackage{langs}
+%\usepackage{langs}
\usepackage{tikz}
\usepackage{pgf}
\usepackage{pgfplots}
@@ -11,12 +12,12 @@
\addtolength{\oddsidemargin}{-6mm}
\addtolength{\evensidemargin}{-6mm}
-\addtolength{\textwidth}{12mm}
+\addtolength{\textwidth}{12mm}
%% \usepackage{accents}
-\newcommand\barbelow[1]{\stackunder[1.2pt]{#1}{\raisebox{-4mm}{\boldmath$\uparrow$}}}
+\newcommand\barbelow[1]{\stackunbder[1.2pt]{#1}{\raisebox{-4mm}{\boldmath$\uparrow$}}}
\begin{filecontents}{re-python2.data}
1 0.033
@@ -88,7 +89,7 @@
\section*{PPProposal: \\
Fast Regular Expression Matching\\
- with Brzozowski Derivatives}
+ with Brzozowski bderivatives}
\subsubsection*{Summary}
@@ -102,19 +103,19 @@
network intrusion prevention systems that use regular expressions for
traffic analysis are Snort and Bro.
-Given the large
-volume of Internet traffic even the smallest servers can handle
-nowadays, the regular expressions for traffic analysis have become a
-real bottleneck. This is not just a nuisance, but actually a security
-vulnerability in itself: it can lead to denial-of-service attacks.
-The proposed project aims to remove this bottleneck by using a little-known
-technique of building derivatives of regular expressions. These
-derivatives have not yet been used in the area of network traffic
-analysis, but have the potential to solve some tenacious problems with
-existing approaches. The project will require theoretical work, but
-also a practical implementation (a proof-of-concept at least). The
-work will build on earlier work by Ausaf and Urban from King's College
-London \cite{AusafDyckhoffUrban2016}.
+Given the large volume of Internet traffic even the smallest servers
+can handle nowadays, the regular expressions for traffic analysis have
+become a real bottleneck. This is not just a nuisance, but actually a
+security vulnerability in itself: it can lead to denial-of-service
+attacks. The proposed project aims to remove this bottleneck by using
+a little-known technique of building bderivatives of regular
+expressions. These derivatives have not yet been used in the area of
+network traffic analysis, but have the potential to solve some
+tenacious problems with existing approaches. The project will require
+theoretical work, but also a practical implementation (a
+proof-of-concept at least). The work will build on earlier work by
+Ausaf and Urban from King's College London
+\cite{AusafDyckhoffUrban2016}.
\subsubsection*{Background}
@@ -168,10 +169,10 @@
mistakes that can be abused to gain access to a computer. This means
server administrators have a suite of sometimes thousands of regular
expressions, all prescribing some suspicious pattern for when a
-computer is under attack.
+computer is unbder attack.
-The underlying algorithmic problem is to decide whether a string in
+The unbderlying algorithmic problem is to decide whether a string in
the logs matches the patterns determined by the regular
expressions. Such decisions need to be done as fast as possible,
otherwise the scanning programs would not be useful as a hardening
@@ -207,7 +208,7 @@
\begin{center}
\begin{tabular}{@{}cc@{}}
%\multicolumn{2}{c}{Graph: $(a^*)^*\cdot b$ and strings
-% $\underbrace{a\ldots a}_{n}$}\smallskip\\
+% $\unbderbrace{a\ldots a}_{n}$}\smallskip\\
\begin{tikzpicture}
\begin{axis}[
xlabel={$n$},
@@ -259,7 +260,7 @@
matchers in the popular programming languages Python and Java (Version
8 and Version 9). There are many more regular expressions that show
the same behaviour. On the left-hand side the graphs show that for a
-string consisting of just 28 $a\,$'s, Python and the older Java (which
+string consisting of just 28 $a\,$'s, Python and the olbder Java (which
was the latest version of Java until September 2017) already need 30
seconds to decide whether this string is matched or not. This is an
abysmal result given that Python and Java are very popular programming
@@ -272,7 +273,7 @@
$a\,$'s in 30 seconds---however this would still not be good enough
for being a useful tool for network traffic analysis. What is
interesting is that even a relatively simple-minded regular expression
-matcher based on Brzozowski derivatives can out-compete the regular
+matcher based on Brzozowski bderivatives can out-compete the regular
expressions matchers in Java and Python on such catastrophic
backtracking examples. This gain in speed is the motivation and
starting point for the proposed project. \smallskip
@@ -302,21 +303,21 @@
\noindent
For practical purposes, however, regular expressions have been
-extended in various ways in order to make them even more powerful and
+extended in various ways in orbder to make them even more powerful and
even more useful for applications. Some of the problems to do with
catastrophic backtracking stem, unfortunately, from these extensions.
The crux in this project is to not use automata for regular expression
-matching (the traditional technique), but to use derivatives of
-regular expressions instead. These derivatives were introduced by
+matching (the traditional technique), but to use bderivatives of
+regular expressions instead. These bderivatives were introduced by
Brzozowski in 1964 \cite{Brzozowski1964}, but somehow had been lost
``in the sands of time'' \cite{Owens2009}. However, they recently have
become again a ``hot'' research topic with numerous research papers
appearing in the last five years. One reason for this interest is that
-Brzozowski derivatives straightforwardly extend to more general
+Brzozowski bderivatives straightforwardly extend to more general
regular expression constructors, which cannot be easily treated with
standard techniques. They can also be implemented with ease in any
-functional programming language: the definition for derivatives
+functional programming language: the definition for bderivatives
consists of just two simple recursive functions shown in
Figure~\ref{f}. Moreover, it can be easily shown (by a mathematical
proof) that the resulting regular matcher is in fact always producing
@@ -336,17 +337,17 @@
$\textit{nullable}(r_1 \cdot r_2)$ & $\dn$ & $\textit{nullable}(r_1) \wedge \textit{nullable}(r_2)$\\
$\textit{nullable}(r^*)$ & $\dn$ & $\textit{true}$\medskip\\
-$\textit{der}\;c\;(\ZERO)$ & $\dn$ & $\ZERO$\\
-$\textit{der}\;c\;(\ONE)$ & $\dn$ & $\ZERO$\\
-$\textit{der}\;c\;(d)$ & $\dn$ & $\textit{if}\; c = d\;\textit{then} \;\ONE \; \textit{else} \;\ZERO$\\
-$\textit{der}\;c\;(r_1 + r_2)$ & $\dn$ & $(\textit{der}\;c\;r_1) + (\textit{der}\;c\;r_2)$\\
-$\textit{der}\;c\;(r_1 \cdot r_2)$ & $\dn$ & $\textit{if}\;\textit{nullable}(r_1)$\\
- & & $\textit{then}\;((\textit{der}\;c\;r_1)\cdot r_2) + (\textit{der}\;c\;r_2)$\\
- & & $\textit{else}\;(\textit{der}\;c\;r_1)\cdot r_2$\\
-$\textit{der}\;c\;(r^*)$ & $\dn$ & $(\textit{der}\;c\;r)\cdot (r^*)$\\[-5mm]
+$\textit{bder}\;c\;(\ZERO)$ & $\dn$ & $\ZERO$\\
+$\textit{bder}\;c\;(\ONE)$ & $\dn$ & $\ZERO$\\
+$\textit{bder}\;c\;(d)$ & $\dn$ & $\textit{if}\; c = d\;\textit{then} \;\ONE \; \textit{else} \;\ZERO$\\
+$\textit{bder}\;c\;(r_1 + r_2)$ & $\dn$ & $(\textit{bder}\;c\;r_1) + (\textit{bder}\;c\;r_2)$\\
+$\textit{bder}\;c\;(r_1 \cdot r_2)$ & $\dn$ & $\textit{if}\;\textit{nullable}(r_1)$\\
+ & & $\textit{then}\;((\textit{bder}\;c\;r_1)\cdot r_2) + (\textit{bder}\;c\;r_2)$\\
+ & & $\textit{else}\;(\textit{bder}\;c\;r_1)\cdot r_2$\\
+$\textit{bder}\;c\;(r^*)$ & $\dn$ & $(\textit{bder}\;c\;r)\cdot (r^*)$\\[-5mm]
\end{tabular}
\end{center}
-\caption{The complete definition of derivatives of regular expressions
+\caption{The complete definition of bderivatives of regular expressions
\cite{Brzozowski1964}.
This definition can be implemented with ease in functional programming languages and can be easily extended to regular expressions used in practice.
The are more flexible for applications and easier to manipulate in
@@ -354,17 +355,17 @@
matching.\medskip\label{f}}
\end{figure}
-There are a number of avenues for research on Brzozowski derivatives.
+There are a number of avenues for research on Brzozowski bderivatives.
One problem I like to immediately tackle in this project is how to
handle \emph{back-references} in regular expressions. It is known that
-the inclusion of back-references causes the underlying algorithmic
+the inclusion of back-references causes the unbderlying algorithmic
problem to become NP-hard \cite{Aho}, and is the main reason why
regular expression matchers suffer from the catastrophic backtracking
problem. However, the full generality of back-references are not
needed in practical applications. The goal therefore is to sufficiently
restrict the problem so that an efficient algorithm can be
designed. There seem to be good indications that Brzozowski
-derivatives are useful for that.
+bderivatives are useful for that.
Another problem is \emph{how} regular expressions match a string
\cite{Sulzmann2014}. In this case the algorithm does not just give a
@@ -390,8 +391,8 @@
from the truth. In fact regular expressions have become a ``hot''
research topic in recent years because of problems with the
traditional methods (in applications such as network traffic
-analysis), but also because the technique of derivatives of regular
-expression has been re-discovered. These derivatives provide an
+analysis), but also because the technique of bderivatives of regular
+expression has been re-discovered. These bderivatives provide an
alternative approach to regular expression matching. Their potential
lies in the fact that they are more flexible in implementations and
much easier to manipulate in mathematical proofs. Therefore I like to
@@ -420,6 +421,24 @@
\renewcommand{\refname}{References\\[-7mm]\mbox{}}
\bibliography{proposal}
+
+\newpage
+\begin{center}\small
+\begin{tabular}{lcl}
+$\textit{bder}\;c\;_{[]}(\ZERO)$ & $\dn$ & $_{[]}\ZERO$\\
+$\textit{bder}\;c\;_{bs}(\ONE)$ & $\dn$ & $_{[]}\ZERO$\\
+$\textit{bder}\;c\;_{bs}(d)$ & $\dn$ & $\textit{if}\; c = d\;\textit{then} \;_{bs}\ONE \; \textit{else} \;_{[]}\ZERO$\\
+$\textit{bder}\;c\;_{bs}(r_1 + r_2)$ & $\dn$ & $_{bs}(\textit{bder}\;c\;r_1 + \textit{bder}\;c\;r_2)$\\
+$\textit{bder}\;c\;_{bs}(r_1 \cdot r_2)$ & $\dn$ & $\textit{if}\;\textit{nullable}(r_1)$\\
+ & & $\textit{then}\;_{bs}(_{[]}(\textit{bder}\;c\;r_1)\cdot r_2) + (_{\textit{bmkeps}\,r_1\rightarrow}\textit{bder}\;c\;r_2)$\\
+ & & $\textit{else}\;_{bs}(\textit{bder}\;c\;r_1)\cdot r_2$\\
+$\textit{bder}\;c\;_{bs}(r^*)$ & $\dn$ & $_{bs}(_{0\rightarrow}\textit{bder}\;c\;r)\cdot (r^*)$\\[-5mm]
+\end{tabular}
+\end{center}
+
+
+
+
\end{document}