text/proposal.tex
changeset 359 fedc16924b76
parent 305 6e2cef17a9b3
--- a/text/proposal.tex	Wed Sep 18 16:35:57 2019 +0100
+++ b/text/proposal.tex	Sat Oct 24 12:13:39 2020 +0100
@@ -1,9 +1,10 @@
+% !TEX program = pfdlatex
 \documentclass{article}
 \usepackage{url}
 \usepackage{color}\definecolor{darkblue}{rgb}{0,0,0.5}
 \usepackage[a4paper,colorlinks=true,linkcolor=darkblue,citecolor=darkblue,filecolor=darkblue,pagecolor=darkblue,urlcolor=darkblue]{hyperref}
 \usepackage{style}
-\usepackage{langs}
+%\usepackage{langs}
 \usepackage{tikz}
 \usepackage{pgf}
 \usepackage{pgfplots}
@@ -11,12 +12,12 @@
 
 \addtolength{\oddsidemargin}{-6mm}
 \addtolength{\evensidemargin}{-6mm}
-\addtolength{\textwidth}{12mm}
+\addtolength{\textwidth}{12mm} 
 
 
 
 %% \usepackage{accents}
-\newcommand\barbelow[1]{\stackunder[1.2pt]{#1}{\raisebox{-4mm}{\boldmath$\uparrow$}}}
+\newcommand\barbelow[1]{\stackunbder[1.2pt]{#1}{\raisebox{-4mm}{\boldmath$\uparrow$}}}
 
 \begin{filecontents}{re-python2.data}
 1 0.033
@@ -88,7 +89,7 @@
   
 \section*{PPProposal: \\
   Fast Regular Expression Matching\\
-  with Brzozowski Derivatives}
+  with Brzozowski bderivatives}
 
 \subsubsection*{Summary}
 
@@ -102,19 +103,19 @@
 network intrusion prevention systems that use regular expressions for
 traffic analysis are Snort and Bro.
 
-Given the large
-volume of Internet traffic even the smallest servers can handle
-nowadays, the regular expressions for traffic analysis have become a
-real bottleneck.  This is not just a nuisance, but actually a security
-vulnerability in itself: it can lead to denial-of-service attacks.
-The proposed project aims to remove this bottleneck by using a little-known
-technique of building derivatives of regular expressions. These
-derivatives have not yet been used in the area of network traffic
-analysis, but have the potential to solve some tenacious problems with
-existing approaches. The project will require theoretical work, but
-also a practical implementation (a proof-of-concept at least).  The
-work will build on earlier work by Ausaf and Urban from King's College
-London \cite{AusafDyckhoffUrban2016}.
+Given the large volume of Internet traffic even the smallest servers
+can handle nowadays, the regular expressions for traffic analysis have
+become a real bottleneck.  This is not just a nuisance, but actually a
+security vulnerability in itself: it can lead to denial-of-service
+attacks.  The proposed project aims to remove this bottleneck by using
+a little-known technique of building bderivatives of regular
+expressions. These derivatives have not yet been used in the area of
+network traffic analysis, but have the potential to solve some
+tenacious problems with existing approaches. The project will require
+theoretical work, but also a practical implementation (a
+proof-of-concept at least).  The work will build on earlier work by
+Ausaf and Urban from King's College London
+\cite{AusafDyckhoffUrban2016}.
 
 
 \subsubsection*{Background}
@@ -168,10 +169,10 @@
 mistakes that can be abused to gain access to a computer. This means
 server administrators have a suite of sometimes thousands of regular
 expressions, all prescribing some suspicious pattern for when a
-computer is under attack.
+computer is unbder attack.
 
 
-The underlying algorithmic problem is to decide whether a string in
+The unbderlying algorithmic problem is to decide whether a string in
 the logs matches the patterns determined by the regular
 expressions. Such decisions need to be done as fast as possible,
 otherwise the scanning programs would not be useful as a hardening
@@ -207,7 +208,7 @@
 \begin{center}
 \begin{tabular}{@{}cc@{}}
 %\multicolumn{2}{c}{Graph: $(a^*)^*\cdot b$ and strings 
-%           $\underbrace{a\ldots a}_{n}$}\smallskip\\  
+%           $\unbderbrace{a\ldots a}_{n}$}\smallskip\\  
 \begin{tikzpicture}
 \begin{axis}[
     xlabel={$n$},
@@ -259,7 +260,7 @@
 matchers in the popular programming languages Python and Java (Version
 8 and Version 9). There are many more regular expressions that show
 the same behaviour.  On the left-hand side the graphs show that for a
-string consisting of just 28 $a\,$'s, Python and the older Java (which
+string consisting of just 28 $a\,$'s, Python and the olbder Java (which
 was the latest version of Java until September 2017) already need 30
 seconds to decide whether this string is matched or not. This is an
 abysmal result given that Python and Java are very popular programming
@@ -272,7 +273,7 @@
 $a\,$'s in 30 seconds---however this would still not be good enough
 for being a useful tool for network traffic analysis. What is
 interesting is that even a relatively simple-minded regular expression
-matcher based on Brzozowski derivatives can out-compete the regular
+matcher based on Brzozowski bderivatives can out-compete the regular
 expressions matchers in Java and Python on such catastrophic
 backtracking examples. This gain in speed is the motivation and
 starting point for the proposed project. \smallskip
@@ -302,21 +303,21 @@
 
 \noindent
 For practical purposes, however, regular expressions have been
-extended in various ways in order to make them even more powerful and
+extended in various ways in orbder to make them even more powerful and
 even more useful for applications. Some of the problems to do with
 catastrophic backtracking stem, unfortunately, from these extensions.
 
 The crux in this project is to not use automata for regular expression
-matching (the traditional technique), but to use derivatives of
-regular expressions instead.  These derivatives were introduced by
+matching (the traditional technique), but to use bderivatives of
+regular expressions instead.  These bderivatives were introduced by
 Brzozowski in 1964 \cite{Brzozowski1964}, but somehow had been lost
 ``in the sands of time'' \cite{Owens2009}. However, they recently have
 become again a ``hot'' research topic with numerous research papers
 appearing in the last five years. One reason for this interest is that
-Brzozowski derivatives straightforwardly extend to more general
+Brzozowski bderivatives straightforwardly extend to more general
 regular expression constructors, which cannot be easily treated with
 standard techniques. They can also be implemented with ease in any
-functional programming language: the definition for derivatives
+functional programming language: the definition for bderivatives
 consists of just two simple recursive functions shown in
 Figure~\ref{f}. Moreover, it can be easily shown (by a mathematical
 proof) that the resulting regular matcher is in fact always producing
@@ -336,17 +337,17 @@
 $\textit{nullable}(r_1 \cdot r_2)$ & $\dn$ & $\textit{nullable}(r_1) \wedge \textit{nullable}(r_2)$\\
 $\textit{nullable}(r^*)$ & $\dn$ & $\textit{true}$\medskip\\
 
-$\textit{der}\;c\;(\ZERO)$ & $\dn$ & $\ZERO$\\
-$\textit{der}\;c\;(\ONE)$  & $\dn$ & $\ZERO$\\
-$\textit{der}\;c\;(d)$     & $\dn$ & $\textit{if}\; c = d\;\textit{then} \;\ONE \; \textit{else} \;\ZERO$\\
-$\textit{der}\;c\;(r_1 + r_2)$ & $\dn$ & $(\textit{der}\;c\;r_1) + (\textit{der}\;c\;r_2)$\\
-$\textit{der}\;c\;(r_1 \cdot r_2)$ & $\dn$ & $\textit{if}\;\textit{nullable}(r_1)$\\
-      & & $\textit{then}\;((\textit{der}\;c\;r_1)\cdot r_2) + (\textit{der}\;c\;r_2)$\\
-      & & $\textit{else}\;(\textit{der}\;c\;r_1)\cdot r_2$\\
-$\textit{der}\;c\;(r^*)$ & $\dn$ & $(\textit{der}\;c\;r)\cdot (r^*)$\\[-5mm]
+$\textit{bder}\;c\;(\ZERO)$ & $\dn$ & $\ZERO$\\
+$\textit{bder}\;c\;(\ONE)$  & $\dn$ & $\ZERO$\\
+$\textit{bder}\;c\;(d)$     & $\dn$ & $\textit{if}\; c = d\;\textit{then} \;\ONE \; \textit{else} \;\ZERO$\\
+$\textit{bder}\;c\;(r_1 + r_2)$ & $\dn$ & $(\textit{bder}\;c\;r_1) + (\textit{bder}\;c\;r_2)$\\
+$\textit{bder}\;c\;(r_1 \cdot r_2)$ & $\dn$ & $\textit{if}\;\textit{nullable}(r_1)$\\
+      & & $\textit{then}\;((\textit{bder}\;c\;r_1)\cdot r_2) + (\textit{bder}\;c\;r_2)$\\
+      & & $\textit{else}\;(\textit{bder}\;c\;r_1)\cdot r_2$\\
+$\textit{bder}\;c\;(r^*)$ & $\dn$ & $(\textit{bder}\;c\;r)\cdot (r^*)$\\[-5mm]
 \end{tabular}
 \end{center}
-\caption{The complete definition of derivatives of regular expressions
+\caption{The complete definition of bderivatives of regular expressions
   \cite{Brzozowski1964}.
   This definition can be implemented with ease in functional programming languages and can be easily extended to regular expressions used in practice.
   The are more flexible for applications and easier to manipulate in
@@ -354,17 +355,17 @@
   matching.\medskip\label{f}}
 \end{figure}
 
-There are a number of avenues for research on Brzozowski derivatives.
+There are a number of avenues for research on Brzozowski bderivatives.
 One problem I like to immediately tackle in this project is how to
 handle \emph{back-references} in regular expressions. It is known that
-the inclusion of back-references causes the underlying algorithmic
+the inclusion of back-references causes the unbderlying algorithmic
 problem to become NP-hard \cite{Aho}, and is the main reason why
 regular expression matchers suffer from the catastrophic backtracking
 problem. However, the full generality of back-references are not
 needed in practical applications. The goal therefore is to sufficiently
 restrict the problem so that an efficient algorithm can be
 designed. There seem to be good indications that Brzozowski
-derivatives are useful for that.
+bderivatives are useful for that.
 
 Another problem is \emph{how} regular expressions match a string
 \cite{Sulzmann2014}. In this case the algorithm does not just give a
@@ -390,8 +391,8 @@
 from the truth. In fact regular expressions have become a ``hot''
 research topic in recent years because of problems with the
 traditional methods (in applications such as network traffic
-analysis), but also because the technique of derivatives of regular
-expression has been re-discovered. These derivatives provide an
+analysis), but also because the technique of bderivatives of regular
+expression has been re-discovered. These bderivatives provide an
 alternative approach to regular expression matching. Their potential
 lies in the fact that they are more flexible in implementations and
 much easier to manipulate in mathematical proofs. Therefore I like to
@@ -420,6 +421,24 @@
 \renewcommand{\refname}{References\\[-7mm]\mbox{}}
 \bibliography{proposal}
 
+
+\newpage
+\begin{center}\small 
+\begin{tabular}{lcl}
+$\textit{bder}\;c\;_{[]}(\ZERO)$ & $\dn$ & $_{[]}\ZERO$\\
+$\textit{bder}\;c\;_{bs}(\ONE)$  & $\dn$ & $_{[]}\ZERO$\\
+$\textit{bder}\;c\;_{bs}(d)$     & $\dn$ & $\textit{if}\; c = d\;\textit{then} \;_{bs}\ONE \; \textit{else} \;_{[]}\ZERO$\\
+$\textit{bder}\;c\;_{bs}(r_1 + r_2)$ & $\dn$ & $_{bs}(\textit{bder}\;c\;r_1 + \textit{bder}\;c\;r_2)$\\
+$\textit{bder}\;c\;_{bs}(r_1 \cdot r_2)$ & $\dn$ & $\textit{if}\;\textit{nullable}(r_1)$\\
+      & & $\textit{then}\;_{bs}(_{[]}(\textit{bder}\;c\;r_1)\cdot r_2) + (_{\textit{bmkeps}\,r_1\rightarrow}\textit{bder}\;c\;r_2)$\\
+      & & $\textit{else}\;_{bs}(\textit{bder}\;c\;r_1)\cdot r_2$\\
+$\textit{bder}\;c\;_{bs}(r^*)$ & $\dn$ & $_{bs}(_{0\rightarrow}\textit{bder}\;c\;r)\cdot (r^*)$\\[-5mm]
+\end{tabular}
+\end{center}
+
+
+
+
 \end{document}