ChengsongTanPhdThesis/Chapters/Introduction.tex
changeset 634 b079aaee5e10
parent 633 f1edd5ee1a12
child 635 7ce2389dff4b
--- a/ChengsongTanPhdThesis/Chapters/Introduction.tex	Sun Dec 18 16:53:53 2022 +0000
+++ b/ChengsongTanPhdThesis/Chapters/Introduction.tex	Sun Dec 18 18:17:52 2022 +0000
@@ -211,21 +211,21 @@
 Regular expressions are widely used in computer science: 
 be it in text-editors \parencite{atomEditor} with syntax highlighting and auto-completion;
 command-line tools like $\mathit{grep}$ that facilitate easy 
-text-processing; network intrusion
+text-processing \cite{grep}; network intrusion
 detection systems that inspect suspicious traffic; or compiler
 front ends.
 Given their usefulness and ubiquity, one would assume that
 modern regular expression matching implementations
 are mature and fully studied.
-Indeed, in a popular programming language's regex engine, 
+Indeed, in a popular programming language's regular expression engine, 
 supplying it with regular expressions and strings,
 in most cases one can
 get the matching information in a very short time.
 Those engines can be blindingly fast--some 
 network intrusion detection systems
-use regex engines that are able to process 
-megabytes or even gigabytes of data per second \parencite{Turo_ov__2020}.
-However, those engines can exhibit a surprising security vulnerability
+use regular expression engines that are able to process 
+hundreds of megabytes or even gigabytes of data per second \parencite{Turo_ov__2020}.
+However, those engines can sometimes exhibit a surprising security vulnerability
 under a certain class of inputs.
 %However, , this is not the case for $\mathbf{all}$ inputs.
 %TODO: get source for SNORT/BRO's regex matching engine/speed
@@ -238,17 +238,17 @@
 matching engines can find this out very quickly. Surprisingly, if one tries
 this example in JavaScript, Python or Java 8, even with small strings, 
 say of lenght of around 30 $a$'s,
-the decision takes an absurd time to finish (see graphs in figure \ref{fig:aStarStarb}).
-This is clearly exponential behaviour, and 
+the decision takes an absurd amount of time to finish (see graphs in figure \ref{fig:aStarStarb}).
+This is clearly exponential behaviour, and as can be seen
 is triggered by some relatively simple regular expressions.
 Java 9 and newer
-versions improve this behaviour somewhat, but is still slow compared 
+versions improve this behaviour somewhat, but are still slow compared 
 with the approach we are going to use in this thesis.
 
 
 
 This superlinear blowup in regular expression engines
-had repeatedly caused grief in ``real life'' where it is 
+has caused grief in ``real life'' in the past where it is 
 given the name ``catastrophic backtracking'' or ``evil'' regular expressions.
 For example, on 20 July 2016 one evil
 regular expression brought the webpage
@@ -256,9 +256,10 @@
 knees.\footnote{\url{https://stackstatus.net/post/147710624694/outage-postmortem-july-20-2016}(Last accessed in 2019)}
 In this instance, a regular expression intended to just trim white
 spaces from the beginning and the end of a line actually consumed
-massive amounts of CPU resources---causing web servers to grind to a
+massive amounts of CPU resources---causing the web servers to grind to a
 halt. In this example, the time needed to process
-the string was $O(n^2)$ with respect to the string length. This
+the string was 
+$O(n^2)$ with respect to the string length $n$. This
 quadratic overhead was enough for the homepage of Stack Exchange to
 respond so slowly that the load balancer assumed a $\mathit{DoS}$ 
 attack and therefore stopped the servers from responding to any
@@ -412,22 +413,24 @@
 traffic.\footnote{\url{https://blog.cloudflare.com/details-of-the-cloudflare-outage-on-july-2-2019/}(Last accessed in 2022)}
 These problems with regular expressions 
 are not isolated events that happen
-very occasionally, but actually widespread.
-They occur so often that they have a 
+very rarely, 
+%but actually widespread.
+%They occur so often that they have a 
+but they occur actually often enough that they have a
 name: Regular-Expression-Denial-Of-Service (ReDoS)
-attack.
-\citeauthor{Davis18} detected more
+attacks.
+Davis et al. \cite{Davis18} detected more
 than 1000 evil regular expressions
 in Node.js, Python core libraries, npm and in pypi. 
 They therefore concluded that evil regular expressions
-are real problems rather than "a parlour trick".
+are real problems rather than just "a parlour trick".
 
 This work aims to address this issue
 with the help of formal proofs.
 We describe a lexing algorithm based
 on Brzozowski derivatives with verified correctness (in 
 Isabelle/HOL)
-and a finiteness property.
+and a finiteness property for the size of derivatives.
 Such properties %guarantee the absence of 
 are an important step in preventing
 catastrophic backtracking once and for all.