--- a/ChengsongTanPhdThesis/main.tex Sat May 28 01:04:56 2022 +0100
+++ b/ChengsongTanPhdThesis/main.tex Sat May 28 16:29:32 2022 +0100
@@ -211,24 +211,23 @@
Theoretical results say that regular expression matching
should be linear with respect to the input.
Under a certain class of regular expressions and inputs though,
-practical implementations suffer from non-linear or even
+practical implementations often suffer from non-linear or even
exponential running time,
-allowing a ReDoS (regular expression denial-of-service ) attack.
-
+allowing ReDoS (regular expression denial-of-service ) attacks.
+The reason behind this is that regex libraries in popular programming
+languages often want to support richer constructs
+than the usual basic regular expressions. Unfortunately, this means that even simple cases
+are "infected" with atrocious running time.
+
-The reason behind is that regex libraries in popular language engines
- often want to support richer constructs
-than the most basic regular expressions, and lexing rather than matching
-is needed for sub-match extraction.
-
-This work aims to address the above vulnerability by the combination
+This work aims to address the above vulnerability by a combination
of Brzozowski's derivatives and interactive theorem proving. We give an
improved version of Sulzmann and Lu's bit-coded algorithm using
-derivatives, which come with a formal guarantee in terms of correctness and
-running time as an Isabelle/HOL proof.
-Then we improve the algorithm with an even stronger version of
+derivatives, and give a formal guarantee in terms of correctness and
+size of derivatives, which we formalised in Isabelle/HOL.
+Then we improve the algorithm with a stronger version of
simplification, and prove a time bound linear to input and
-cubic to regular expression size using a technique by
+cubic to regular expression size using a technique introduced by
Antimirov.
The result is an algorithm with provable guarantees on
correctness and running time. We believe this is the first