diff -r d8740017324c -r cb702fb4227f ChengsongTanPhdThesis/main.tex --- a/ChengsongTanPhdThesis/main.tex Sat May 28 01:04:56 2022 +0100 +++ b/ChengsongTanPhdThesis/main.tex Sat May 28 16:29:32 2022 +0100 @@ -211,24 +211,23 @@ Theoretical results say that regular expression matching should be linear with respect to the input. Under a certain class of regular expressions and inputs though, -practical implementations suffer from non-linear or even +practical implementations often suffer from non-linear or even exponential running time, -allowing a ReDoS (regular expression denial-of-service ) attack. - +allowing ReDoS (regular expression denial-of-service ) attacks. +The reason behind this is that regex libraries in popular programming +languages often want to support richer constructs +than the usual basic regular expressions. Unfortunately, this means that even simple cases +are "infected" with atrocious running time. + -The reason behind is that regex libraries in popular language engines - often want to support richer constructs -than the most basic regular expressions, and lexing rather than matching -is needed for sub-match extraction. - -This work aims to address the above vulnerability by the combination +This work aims to address the above vulnerability by a combination of Brzozowski's derivatives and interactive theorem proving. We give an improved version of Sulzmann and Lu's bit-coded algorithm using -derivatives, which come with a formal guarantee in terms of correctness and -running time as an Isabelle/HOL proof. -Then we improve the algorithm with an even stronger version of +derivatives, and give a formal guarantee in terms of correctness and +size of derivatives, which we formalised in Isabelle/HOL. +Then we improve the algorithm with a stronger version of simplification, and prove a time bound linear to input and -cubic to regular expression size using a technique by +cubic to regular expression size using a technique introduced by Antimirov. The result is an algorithm with provable guarantees on correctness and running time. We believe this is the first