--- a/ChengsongTanPhdThesis/Chapters/Introduction.tex Thu Dec 01 08:49:19 2022 +0000
+++ b/ChengsongTanPhdThesis/Chapters/Introduction.tex Mon Dec 05 17:39:10 2022 +0000
@@ -265,6 +265,7 @@
requests. This made the whole site unavailable.
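The blow-up is straightforward to reproduce. Below is a small
sketch using Python's backtracking \texttt{re} engine (Python is
not among the libraries measured in the figure; timings are
illustrative and machine-dependent):
\begin{verbatim}
import re
import time

# (a*)*b cannot match a string consisting only of a's, but a
# backtracking engine tries exponentially many ways of splitting
# the a's between the two stars before reporting failure.
for n in range(18, 28, 2):
    start = time.time()
    re.match(r"(a*)*b", "a" * n)
    print(n, round(time.time() - start, 2))
\end{verbatim}
Each additional $a$ roughly doubles the running time, which is the
same superlinear behaviour plotted in the figure below.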
\begin{figure}[p]
+\begin{center}
\begin{tabular}{@{}c@{\hspace{0mm}}c@{}}
\begin{tikzpicture}
\begin{axis}[
@@ -393,6 +394,7 @@
\end{tikzpicture}\\
\multicolumn{2}{c}{Graphs}
\end{tabular}
+\end{center}
\caption{Graphs showing runtime for matching $(a^*)^*\,b$ with strings
of the form $\protect\underbrace{aa\ldots a}_{n}$ in various existing regular expression libraries.
The reason for their superlinear behaviour is that they do a depth-first search
@@ -444,7 +446,7 @@
w.r.t.\ the input.
This assumes that the regular expression
$r$ was pre-processed and turned into a
-deterministic finite automaton (DFA) before matching\cite{Sakarovitch2009}.
+deterministic finite automaton (DFA) before matching~\cite{Sakarovitch2009}.
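In outline, this is the textbook pipeline: $r$ is first translated
into an NFA (for instance via Thompson's construction, using $O(|r|)$
states), the NFA is determinised by the subset construction, and the
resulting DFA then processes the input in a single pass, giving the
linear bound in the length of the input. The price is paid up front:
determinisation can increase the number of states exponentially in
the size of $r$.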
By basic we mean textbook definitions such as the one
below, involving only regular expressions for characters, alternatives,
sequences, and Kleene stars:
@@ -533,7 +535,7 @@
algorithms or algorithms that consume large amounts of memory.
Implementations using $\DFA$s will
either become excruciatingly slow
-(for example Verbatim++\cite{Verbatimpp}) or get
+(for example Verbatim++~\cite{Verbatimpp}) or get
out-of-memory errors (for example $\mathit{LEX}$ and
$\mathit{JFLEX}$\footnote{which are lexer generators
in C and Java that generate $\mathit{DFA}$-based
@@ -633,7 +635,7 @@
can be above 1000, and in the Rust regular expression library
expressions such as $a^{\{1000\}\{100\}\{5\}}$ are rejected
with an error message saying they are too big.
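Other engines reject such nested counters outright. For example, a
quick sketch with Python's \texttt{re} module (the exact error
wording may differ between versions):
\begin{verbatim}
import re

# CPython's re engine rejects a counter applied directly to
# another counted repetition (without grouping in between).
try:
    re.compile(r"a{1000}{100}{5}")
except re.error as err:
    print("rejected:", err)  # e.g. "multiple repeat at position 7"
\end{verbatim}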
-As Becchi and Crawley\cite{Becchi08} have pointed out,
+As Becchi and Crawley~\cite{Becchi08} have pointed out,
the reason for these restrictions
is that these libraries simulate a non-deterministic finite
automaton (NFA) with a breadth-first search.
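A back-of-the-envelope count makes the problem concrete: if the
counters are compiled away by unfolding, then $a^{\{1000\}\{100\}\{5\}}$
expands into $1000 \cdot 100 \cdot 5 = 500\,000$ consecutive copies of
$a$, so the simulated NFA has about half a million states before any
input is read.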
@@ -815,7 +817,7 @@
Back-references are a regex construct
that programmers find quite useful.
-According to Becchi and Crawley\cite{Becchi08},
+According to Becchi and Crawley~\cite{Becchi08},
6\% of Snort rules (up until 2008) use them.
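Back-references are also what lifts these patterns beyond the regular
languages. As a small illustration (a sketch in Python, not tied to
Snort), the pattern \texttt{(.+)\textbackslash 1} matches exactly the
strings of the form $ww$, a language that is provably not regular:
\begin{verbatim}
import re

# \1 must repeat exactly what group 1 matched, so this pattern
# accepts precisely the strings of the form ww.
print(bool(re.fullmatch(r"(.+)\1", "abcabc")))  # True
print(bool(re.fullmatch(r"(.+)\1", "abcabd")))  # False
\end{verbatim}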
The most common use of back-references
is to express well-formed HTML files,
@@ -1202,11 +1204,11 @@
One of the most recent works in the context of lexing
%with this issue
-is the Verbatim lexer by Egolf, Lasser and Fisher\cite{Verbatim}.
+is the Verbatim lexer by Egolf, Lasser and Fisher~\cite{Verbatim}.
This work is relevant for us and we will compare it later with
the derivative-based matcher we are going to present.
There is also some newer work called
-Verbatim++\cite{Verbatimpp}, which does not use derivatives,
+Verbatim++~\cite{Verbatimpp}, which does not use derivatives,
but deterministic finite automata instead.
%An example that gives problem to automaton approaches would be
%the regular expression $(a|b)^*a(a|b)^{\{n\}}$.