ChengsongTanPhdThesis/Chapters/Introduction.tex
changeset 618 233cf2b97d1a
parent 612 8c234a1bc7e0
child 620 ae6010c14e49
--- a/ChengsongTanPhdThesis/Chapters/Introduction.tex	Wed Oct 12 14:08:06 2022 +0100
+++ b/ChengsongTanPhdThesis/Chapters/Introduction.tex	Sun Nov 06 23:05:47 2022 +0000
@@ -780,13 +780,13 @@
 %	\end{tabular}
 %\end{center}
 A concrete example
-for back-references would be
+for back-references is
 \begin{center}
 $(.^*)\backslash 1$,
 \end{center}
-which would match 
+which matches
 strings that can be split into two identical halves,
-for example $\mathit{foofoo}$, $\mathit{ww}$ and etc.
+for example $\mathit{foofoo}$, $\mathit{ww}$ and so on.
 Note that this is different from 
 repeating the  sub-expression verbatim like
 \begin{center}
@@ -795,32 +795,32 @@
 which does not impose any restrictions on what strings the second 
 sub-expression $.^*$
 might match.
-Another example of back-references would be
+Another example of back-references is
 \begin{center}
 $(.)(.)\backslash 2\backslash 1$
 \end{center}
-which expresses four-character palindromes
-like $abba$, $x??x$ etc.
+which matches four-character palindromes
+like $abba$, $x??x$ and so on.
 
 Back-references is a regex construct 
-that programmers found quite useful.
+that programmers find quite useful.
 According to Becchi and Crawley\cite{Becchi08},
-6\% of Snort rules (up until 2008) include the use of them.
+6\% of Snort rules (up until 2008) use them.
 The most common use of back-references
-would be expressing well-formed html files,
-where back-references would be handy in expressing
-a pair of opening and closing tags like 
+is to express well-formed html files,
+where back-references are convenient for matching
+opening and closing tags like 
 \begin{center}
 	$\langle html \rangle \ldots \langle / html \rangle$
 \end{center}
 A regex describing such a format
-could be
+is
 \begin{center}
 	$\langle (.^+) \rangle \ldots \langle / \backslash 1 \rangle$
 \end{center}
-Despite being useful, the syntax and expressive power of regexes 
+Despite being useful, the expressive power of regexes 
 go beyond the regular language hierarchy
-with back-references.
+once back-references are included.
 In fact, they allow the regex construct to express 
 languages that cannot be contained in context-free
 languages either.
@@ -829,17 +829,15 @@
 which cannot be expressed by context-free grammars\parencite{campeanu2003formal}.
 Such a language is contained in the context-sensitive hierarchy
 of formal languages. 
-Solving the back-reference expressions matching problem
+Also solving the matching problem involving back-references
 is known to be NP-complete \parencite{alfred2014algorithms}.
-A non-bactracking,
-efficient solution is not known to exist.
 Regex libraries supporting back-references such as 
 PCRE \cite{pcre} therefore have to
 revert to a depth-first search algorithm which backtracks.
 What is unexpected is that even in the cases 
 not involving back-references, there is still
 a (non-negligible) chance they might backtrack super-linearly,
-as shown in the graphs in \ref{fig:aStarStarb}.
+as shown in the graphs in figure\ref{fig:aStarStarb}.
 
 \subsection{Summary of the Catastrophic Backtracking Problem}
 Summing these up, we can categorise existing