--- a/ChengsongTanPhdThesis/Chapters/Introduction.tex Wed Oct 12 14:08:06 2022 +0100
+++ b/ChengsongTanPhdThesis/Chapters/Introduction.tex Sun Nov 06 23:05:47 2022 +0000
@@ -780,13 +780,13 @@
% \end{tabular}
%\end{center}
A concrete example
-for back-references would be
+for back-references is
\begin{center}
$(.^*)\backslash 1$,
\end{center}
-which would match
+which matches
strings that can be split into two identical halves,
-for example $\mathit{foofoo}$, $\mathit{ww}$ and etc.
+for example $\mathit{foofoo}$, $\mathit{ww}$ and so on.
Note that this is different from
repeating the sub-expression verbatim like
\begin{center}
@@ -795,32 +795,32 @@
which does not impose any restrictions on what strings the second
sub-expression $.^*$
might match.
-Another example of back-references would be
+Another example of back-references is
\begin{center}
$(.)(.)\backslash 2\backslash 1$
\end{center}
-which expresses four-character palindromes
-like $abba$, $x??x$ etc.
+which matches four-character palindromes
+like $abba$, $x??x$ and so on.
Back-references is a regex construct
-that programmers found quite useful.
+that programmers find quite useful.
According to Becchi and Crawley\cite{Becchi08},
-6\% of Snort rules (up until 2008) include the use of them.
+6\% of Snort rules (up until 2008) use them.
The most common use of back-references
-would be expressing well-formed html files,
-where back-references would be handy in expressing
-a pair of opening and closing tags like
+is to express well-formed html files,
+where back-references are convenient for matching
+opening and closing tags like
\begin{center}
$\langle html \rangle \ldots \langle / html \rangle$
\end{center}
A regex describing such a format
-could be
+is
\begin{center}
$\langle (.^+) \rangle \ldots \langle / \backslash 1 \rangle$
\end{center}
-Despite being useful, the syntax and expressive power of regexes
+Despite being useful, the expressive power of regexes
go beyond the regular language hierarchy
-with back-references.
+once back-references are included.
In fact, they allow the regex construct to express
languages that cannot be contained in context-free
languages either.
@@ -829,17 +829,15 @@
which cannot be expressed by context-free grammars\parencite{campeanu2003formal}.
Such a language is contained in the context-sensitive hierarchy
of formal languages.
-Solving the back-reference expressions matching problem
+Also solving the matching problem involving back-references
is known to be NP-complete \parencite{alfred2014algorithms}.
-A non-bactracking,
-efficient solution is not known to exist.
Regex libraries supporting back-references such as
PCRE \cite{pcre} therefore have to
revert to a depth-first search algorithm which backtracks.
What is unexpected is that even in the cases
not involving back-references, there is still
a (non-negligible) chance they might backtrack super-linearly,
-as shown in the graphs in \ref{fig:aStarStarb}.
+as shown in the graphs in figure\ref{fig:aStarStarb}.
\subsection{Summary of the Catastrophic Backtracking Problem}
Summing these up, we can categorise existing