ChengsongTanPhdThesis/main.tex
changeset 612 8c234a1bc7e0
parent 608 37b6fd310a16
child 626 1c8525061545
equal deleted inserted replaced
611:bc1df466150a 612:8c234a1bc7e0
   293 This thesis is about regular expressions and derivatives. It combines functional algorithms and their formal verification in the Isabelle/HOL theorem prover. 
   293 This thesis is about regular expressions and derivatives. It combines functional algorithms and their formal verification in the Isabelle/HOL theorem prover. 
   294 Theoretical results say that regular expression matching should be 
   294 Theoretical results say that regular expression matching should be 
   295 linear with respect to the input. 
   295 linear with respect to the input. 
   296 However with some regular expressions and inputs, existing implementations 
   296 However with some regular expressions and inputs, existing implementations 
   297 often suffer from non-linear or even exponential running time, 
   297 often suffer from non-linear or even exponential running time, 
   298 allowing for example ReDoS (regular expression denial-of-service ) attacks. 
   298 giving to for example ReDoS (regular expression denial-of-service ) attacks. 
   299 To avoid these attacks, lexers with formalised correctness and running time related
   299 To avoid these attacks, lexers with formalised correctness and running time related
   300 properties become appealing because the guarantee applies to all inputs, not just 
   300 properties become appealing because the guarantee applies to all inputs, not a finite
   301 a few empirical test cases.
   301 number of empirical test cases.
   302 
   302 
   303 Sulzmann and Lu describe a lexing algorithm that calculates Brzozowski derivatives using bitcodes annotated to regular expressions. Their algorithm generates POSIX values which encode the information of how a regular expression matches a string—that is, which part of the string is matched by which part of the regular expression. This information is needed in the context of lexing in order to extract and to classify tokens. The purpose of the bitcodes is to generate POSIX values incrementally while derivatives are calculated. They also help with designing an “aggressive” simplification function that keeps the size of derivatives finitely bounded. Without simplification the size of some derivatives can grow arbitrarily big resulting in an extremely slow lexing algorithm. 
   303 Sulzmann and Lu describe a lexing algorithm that calculates Brzozowski derivatives using bitcodes annotated to regular expressions. Their algorithm generates POSIX values which encode the information of how a regular expression matches a string—that is, which part of the string is matched by which part of the regular expression. This information is needed in the context of lexing in order to extract and to classify tokens. The purpose of the bitcodes is to generate POSIX values incrementally while derivatives are calculated. They also help with designing an “aggressive” simplification function that keeps the size of derivatives finitely bounded. Without simplification the size of some derivatives can grow arbitrarily big resulting in an extremely slow lexing algorithm. 
   304 
   304 
   305 In this thesis we describe a variant of Sulzmann and Lu’s algorithm: Our variant is a recursive functional program, whereas Sulzmann and Lu’s version involves a fixpoint construction. We (i) prove in Isabelle/HOL that our algorithm is correct and generates unique POSIX values; we also (ii) establish a finite bound for the size of the derivatives. 
   305 In this thesis we describe a variant of Sulzmann and Lu’s algorithm: Our variant is a recursive functional program, whereas Sulzmann and Lu’s version involves a fixpoint construction. We (i) prove in Isabelle/HOL that our algorithm is correct and generates unique POSIX values; we also (ii) establish a finite bound for the size of the derivatives. 
   306 
   306