ChengsongTanPhdThesis/main.tex
changeset 526 cb702fb4227f
parent 519 856d025dbc15
child 527 2c907b118f78
equal deleted inserted replaced
525:d8740017324c 526:cb702fb4227f
   209 This work is about regular expressions and derivatives. It combines functional algorithms and their formal verification in the Isabelle/HOL theorem prover.
   209 This work is about regular expressions and derivatives. It combines functional algorithms and their formal verification in the Isabelle/HOL theorem prover.
   210 
   210 
   211 Theoretical results say that regular expression matching
   211 Theoretical results say that regular expression matching
   212 should be linear with respect to the input.
   212 should be linear with respect to the input.
   213 Under a certain class of regular expressions and inputs though,
   213 Under a certain class of regular expressions and inputs though,
   214 practical implementations  suffer from non-linear or even 
   214 practical implementations often suffer from non-linear or even 
   215 exponential running time,
   215 exponential running time,
   216 allowing a ReDoS (regular expression denial-of-service ) attack.
   216 allowing ReDoS (regular expression denial-of-service ) attacks.
   217 
   217 The reason behind this is that regex libraries in popular programming
   218 
   218 languages often want to support richer constructs
   219 The reason behind is that regex libraries in popular language engines
   219 than the usual basic regular expressions. Unfortunately, this means that even simple cases
   220  often want to support richer constructs
   220 are "infected" with atrocious running time.
   221 than  the most basic regular expressions, and lexing rather than matching
   221  
   222 is needed for sub-match extraction.
   222 
   223 
   223 This work aims to address the above vulnerability by a combination
   224 This work aims to address the above vulnerability by the combination
       
   225 of Brzozowski's derivatives and interactive theorem proving. We give an 
   224 of Brzozowski's derivatives and interactive theorem proving. We give an 
   226 improved version of  Sulzmann and Lu's bit-coded algorithm using 
   225 improved version of  Sulzmann and Lu's bit-coded algorithm using 
   227 derivatives, which come with a formal guarantee in terms of correctness and 
   226 derivatives, and give a formal guarantee in terms of correctness and 
   228 running time as an Isabelle/HOL proof.
   227 size of derivatives, which we formalised in Isabelle/HOL.
   229 Then we improve the algorithm with an even stronger version of 
   228 Then we improve the algorithm with a stronger version of 
   230 simplification, and prove a time bound linear to input and
   229 simplification, and prove a time bound linear to input and
   231 cubic to regular expression size using a technique by
   230 cubic to regular expression size using a technique introduced by
   232 Antimirov.
   231 Antimirov.
   233 The result is an algorithm with provable guarantees on 
   232 The result is an algorithm with provable guarantees on 
   234 correctness and running time. We believe this is the first 
   233 correctness and running time. We believe this is the first 
   235 work with these two guarantees together.
   234 work with these two guarantees together.
   236 
   235