ChengsongTanPhdThesis/main.tex
changeset 568 7a579f5533f8
parent 564 3cbcd7cda0a9
child 577 f47fc4840579
equal deleted inserted replaced
567:28cb8089ec36 568:7a579f5533f8
   218 
   218 
   219 
   219 
   220 \begin{abstract}
   220 \begin{abstract}
   221 \addchaptertocentry{\abstractname} % Add the abstract to the table of contents
   221 \addchaptertocentry{\abstractname} % Add the abstract to the table of contents
   222 %\addchap{Abstract} 
   222 %\addchap{Abstract} 
   223 This work is about regular expressions and derivatives. It combines functional algorithms and their formal verification in the Isabelle/HOL theorem prover.
   223 This thesis is about regular expressions and derivatives. It combines functional algorithms and their formal verification in the Isabelle/HOL theorem prover. 
   224 
   224 
   225 Theoretical results say that regular expression matching
   225 Theoretical results say that regular expression matching should be linear with respect to the input. Under a certain class of regular expressions and inputs though, practical implementations often suffer from non-linear or even exponential running time, allowing ReDoS (regular expression denial-of-service ) attacks. This makes levers with formalised properties such as correctness and time complexity appealing.
   226 should be linear with respect to the input.
   226 
   227 Under a certain class of regular expressions and inputs though,
   227 Sulzmann and Lu describe a lexing algorithm that calculates Brzozowski derivatives using bitcodes annotated to regular expressions. Their algorithm generates POSIX values which encode the information of how a regular expression matches a string—that is, which part of the string is matched by which part of the regular expression. This information is needed in the context of lexing in order to extract and to classify tokens. The purpose of the bitcodes is to generate POSIX values incrementally while derivatives are calculated. They also help with designing an “aggressive” simplification function that keeps the size of derivatives finitely bounded. Without simplification the size of some derivatives can grow arbitrarily big resulting in an extremely slow lexing algorithm. 
   228 practical implementations often suffer from non-linear or even 
   228 
   229 exponential running time,
   229 In this thesis we describe a variant of Sulzmann and Lu’s algorithm: Our variant is a recursive functional program, whereas Sulzmann and Lu’s version involves a fixpoint construction. We (i) prove in Isabelle/HOL that our algorithm is correct and generates unique POSIX values; we also (ii) establish a finite bound for the size of the derivatives. 
   230 allowing ReDoS (regular expression denial-of-service ) attacks.
   230 
   231 The reason behind this is that regex libraries in popular programming
   231 
   232 languages often want to support richer constructs
   232 
   233 than the usual basic regular expressions. Unfortunately, this means that even simple cases
   233 
   234 are "infected" with atrocious running time.
       
   235  
       
   236 
       
   237 This work aims to address the above vulnerability by a combination
       
   238 of Brzozowski's derivatives and interactive theorem proving. We give an 
       
   239 improved version of  Sulzmann and Lu's bit-coded algorithm using 
       
   240 derivatives, and give a formal guarantee in terms of correctness and 
       
   241 size of derivatives, which we formalised in Isabelle/HOL.
       
   242 Then we improve the algorithm with a stronger version of 
       
   243 simplification, and prove a time bound linear to input and
       
   244 cubic to regular expression size using a technique introduced by
       
   245 Antimirov.
       
   246 The result is an algorithm with provable guarantees on 
       
   247 correctness and finiteness. We believe this is the first 
       
   248 work with these two guarantees together.
       
   249 
   234 
   250 \end{abstract}
   235 \end{abstract}
   251 
   236 
   252 
   237 
   253 %----------------------------------------------------------------------------------------
   238 %----------------------------------------------------------------------------------------