diff -r 96e009a446d5 -r d50a309a0645 ChengsongTanPhdThesis/Chapters/Introduction.tex
--- a/ChengsongTanPhdThesis/Chapters/Introduction.tex	Thu Dec 01 08:49:19 2022 +0000
+++ b/ChengsongTanPhdThesis/Chapters/Introduction.tex	Mon Dec 05 17:39:10 2022 +0000
@@ -265,6 +265,7 @@
 requests.
 This made the whole site become unavailable.
 \begin{figure}[p]
+\begin{center}
 \begin{tabular}{@{}c@{\hspace{0mm}}c@{}}
 \begin{tikzpicture}
 \begin{axis}[
@@ -393,6 +394,7 @@
 \end{tikzpicture}\\
 \multicolumn{2}{c}{Graphs}
 \end{tabular}
+\end{center}
 \caption{Graphs showing runtime for matching $(a^*)^*\,b$ with strings
 of the form $\protect\underbrace{aa..a}_{n}$ in various existing regular expression libraries.
 The reason for their superlinear behaviour is that they do a depth-first-search
@@ -444,7 +446,7 @@
 w.r.t the input.
 This assumes that the regular expression $r$
 was pre-processed and turned into a
-deterministic finite automaton (DFA) before matching\cite{Sakarovitch2009}.
+deterministic finite automaton (DFA) before matching~\cite{Sakarovitch2009}.
 By basic we mean textbook definitions such as the one below,
 involving only regular expressions for characters, alternatives,
 sequences, and Kleene stars:
@@ -533,7 +535,7 @@
 algorithms or algorithms that consume large amounts of memory.
 Implementations using $\DFA$s
 will either become excruciatingly slow
-(for example Verbatim++\cite{Verbatimpp}) or get
+(for example Verbatim++~\cite{Verbatimpp}) or get
 out of memory errors (for example $\mathit{LEX}$ and
 $\mathit{JFLEX}$\footnote{which are lexer generators in C and JAVA that
 generate $\mathit{DFA}$-based
@@ -633,7 +635,7 @@
 can be above 1000, and in the built-in Rust regular expression library
 expressions such as $a^{\{1000\}\{100\}\{5\}}$
 give an error message for being too big.
-As Becchi and Crawley\cite{Becchi08} have pointed out,
+As Becchi and Crawley~\cite{Becchi08} have pointed out,
 the reason for these restrictions is
 that they simulate a non-deterministic
 finite automata (NFA) with a breadth-first search.
@@ -815,7 +817,7 @@
 Back-references
 is a regex construct
 that programmers find quite useful.
-According to Becchi and Crawley\cite{Becchi08},
+According to Becchi and Crawley~\cite{Becchi08},
 6\% of Snort rules (up until 2008)
 use them.
 The most common use of back-references is to express well-formed html files,
@@ -1202,11 +1204,11 @@
 One of the most recent work in the context of lexing
 %with this issue
-is the Verbatim lexer by Egolf, Lasser and Fisher\cite{Verbatim}.
+is the Verbatim lexer by Egolf, Lasser and Fisher~\cite{Verbatim}.
 This is relevant work for us and we will
 compare it later with
 our derivative-based matcher we are going to present.
 There is also some newer work called
-Verbatim++\cite{Verbatimpp}, which does not use derivatives,
+Verbatim++~\cite{Verbatimpp}, which does not use derivatives,
 but deterministic finite automaton instead.
 %An example that gives problem to automaton approaches would be
 %the regular expression $(a|b)^*a(a|b)^{\{n\}}$.
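
The figure caption touched by this patch refers to the superlinear runtime of matching $(a^*)^*\,b$ against strings of the form $aa..a$. As a minimal sketch (not part of the thesis; Python's backtracking `re` module stands in for the libraries benchmarked in the figure), the effect can be reproduced like this:

```python
import re
import time

# The caption's pathological pattern: (a*)* b. A backtracking matcher
# tries exponentially many ways to split the a's among the a* repetitions
# before concluding there is no 'b' to match.
PATTERN = re.compile(r"(a*)*b")

for n in (16, 18, 20, 22):
    s = "a" * n                      # no 'b', so the match must fail
    start = time.perf_counter()
    result = PATTERN.match(s)        # result is None for every n
    elapsed = time.perf_counter() - start
    print(f"n={n:2d}  match={result}  time={elapsed:.4f}s")
```

The printed timings roughly double with each extra `a`, which is the depth-first-search behaviour the caption attributes to the plotted libraries; a derivative- or DFA-based matcher answers the same queries in time linear in $n$.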