lexing: comparison ChengsongTanPhdThesis/main.tex

equal deleted inserted replaced

-:28cb8089ec36
+:7a579f5533f8
 \begin{abstract}
 \addchaptertocentry{\abstractname} % Add the abstract to the table of contents
 %\addchap{Abstract}
-This work is about regular expressions and derivatives. It combines functional algorithms and their formal verification in the Isabelle/HOL theorem prover.
+This thesis is about regular expressions and derivatives. It combines functional algorithms and their formal verification in the Isabelle/HOL theorem prover.
-Theoretical results say that regular expression matching
+Theoretical results say that regular expression matching should be linear with respect to the input. Under a certain class of regular expressions and inputs though, practical implementations often suffer from non-linear or even exponential running time, allowing ReDoS (regular expression denial-of-service ) attacks. This makes levers with formalised properties such as correctness and time complexity appealing.
-should be linear with respect to the input.
-Under a certain class of regular expressions and inputs though,
+Sulzmann and Lu describe a lexing algorithm that calculates Brzozowski derivatives using bitcodes annotated to regular expressions. Their algorithm generates POSIX values which encode the information of how a regular expression matches a string—that is, which part of the string is matched by which part of the regular expression. This information is needed in the context of lexing in order to extract and to classify tokens. The purpose of the bitcodes is to generate POSIX values incrementally while derivatives are calculated. They also help with designing an “aggressive” simplification function that keeps the size of derivatives finitely bounded. Without simplification the size of some derivatives can grow arbitrarily big resulting in an extremely slow lexing algorithm.
-practical implementations often suffer from non-linear or even
-exponential running time,
+In this thesis we describe a variant of Sulzmann and Lu’s algorithm: Our variant is a recursive functional program, whereas Sulzmann and Lu’s version involves a fixpoint construction. We (i) prove in Isabelle/HOL that our algorithm is correct and generates unique POSIX values; we also (ii) establish a finite bound for the size of the derivatives.
-allowing ReDoS (regular expression denial-of-service ) attacks.
-The reason behind this is that regex libraries in popular programming
-languages often want to support richer constructs
-than the usual basic regular expressions. Unfortunately, this means that even simple cases
-are "infected" with atrocious running time.
-This work aims to address the above vulnerability by a combination
-of Brzozowski's derivatives and interactive theorem proving. We give an
-improved version of  Sulzmann and Lu's bit-coded algorithm using
-derivatives, and give a formal guarantee in terms of correctness and
-size of derivatives, which we formalised in Isabelle/HOL.
-Then we improve the algorithm with a stronger version of
-simplification, and prove a time bound linear to input and
-cubic to regular expression size using a technique introduced by
-Antimirov.
-The result is an algorithm with provable guarantees on
-correctness and finiteness. We believe this is the first
-work with these two guarantees together.
 \end{abstract}
 %----------------------------------------------------------------------------------------

changeset 568	7a579f5533f8
parent 564	3cbcd7cda0a9
child 577	f47fc4840579