diff -r a5f666410101 -r fd068f39ac23 ChengsongTanPhdThesis/main.tex
--- a/ChengsongTanPhdThesis/main.tex	Sat Sep 10 22:30:22 2022 +0100
+++ b/ChengsongTanPhdThesis/main.tex	Mon Sep 12 23:32:18 2022 +0200
@@ -290,8 +290,14 @@
 \addchaptertocentry{\abstractname} % Add the abstract to the table of contents
 %\addchap{Abstract}
 This thesis is about regular expressions and derivatives. It combines functional algorithms and their formal verification in the Isabelle/HOL theorem prover.
-
-Theoretical results say that regular expression matching should be linear with respect to the input. Under a certain class of regular expressions and inputs though, practical implementations often suffer from non-linear or even exponential running time, allowing ReDoS (regular expression denial-of-service ) attacks. This makes levers with formalised properties such as correctness and time complexity appealing.
+Theoretical results say that regular expression matching should be
+linear with respect to the input.
+However with some regular expressions and inputs, existing implementations
+often suffer from non-linear or even exponential running time,
+allowing for example ReDoS (regular expression denial-of-service ) attacks.
+To avoid these attacks, lexers with formalised correctness and running time related
+properties become appealing because the guarantee applies to all inputs, not just
+a few empirical test cases.
 Sulzmann and Lu describe a lexing algorithm that calculates Brzozowski derivatives using bitcodes annotated to regular expressions. Their algorithm generates POSIX values which encode the information of how a regular expression matches a string—that is, which part of the string is matched by which part of the regular expression. This information is needed in the context of lexing in order to extract and to classify tokens. The purpose of the bitcodes is to generate POSIX values incrementally while derivatives are calculated.
 They also help with designing an “aggressive” simplification function that keeps the size of derivatives finitely bounded. Without simplification the size of some derivatives can grow arbitrarily big resulting in an extremely slow lexing algorithm.