ChengsongTanPhdThesis/main.tex
changeset 629 96e009a446d5
parent 628 7af4e2420a8c
child 638 dd9dde2d902b
equal deleted inserted replaced
628:7af4e2420a8c 629:96e009a446d5
   248 \begin{declaration}
   248 \begin{declaration}
   249 \addchaptertocentry{\authorshipname} % Add the declaration to the table of contents
   249 \addchaptertocentry{\authorshipname} % Add the declaration to the table of contents
   250 \noindent I, \authorname, declare that this thesis titled, \enquote{\ttitle} and the work presented in it are my own. I confirm that:
   250 \noindent I, \authorname, declare that this thesis titled, \enquote{\ttitle} and the work presented in it are my own. I confirm that:
   251 
   251 
   252 \begin{itemize} 
   252 \begin{itemize} 
   253 \item This work was done wholly or mainly while in candidature for a research degree at this University.
   253 \item This work was done wholly while in candidature for a research degree at this University.
   254 \item Where any part of this thesis has previously been submitted for a degree or any other qualification at this University or any other institution, this has been clearly stated.
   254 \item Where any part of this thesis has previously been submitted for a degree or any other qualification at this University or any other institution, this has been clearly stated.
   255 \item Where I have consulted the published work of others, this is always clearly attributed.
   255 \item Where I have consulted the published work of others, this is always clearly attributed.
   256 \item Where I have quoted from the work of others, the source is always given. With the exception of such quotations, this thesis is entirely my own work.
   256 \item Where I have quoted from the work of others, the source is always given. With the exception of such quotations, this thesis is entirely my own work.
   257 \item I have acknowledged all main sources of help.
   257 \item I have acknowledged all main sources of help.
   258 \item Where the thesis is based on work done by myself jointly with others, I have made clear exactly what was done by others and what I have contributed myself.\\
   258 \item Where the thesis is based on work done by myself jointly with others, I have made clear exactly what was done by others and what I have contributed myself.\\
   289 
   289 
   290 \begin{abstract}
   290 \begin{abstract}
   291 \addchaptertocentry{\abstractname} % Add the abstract to the table of contents
   291 \addchaptertocentry{\abstractname} % Add the abstract to the table of contents
   292 %\addchap{Abstract} 
   292 %\addchap{Abstract} 
   293 This thesis is about regular expressions and derivatives. It combines functional algorithms and their formal verification in the Isabelle/HOL theorem prover. 
   293 This thesis is about regular expressions and derivatives. It combines functional algorithms and their formal verification in the Isabelle/HOL theorem prover. 
   294 Theoretical results say that regular expression matching should be 
   294 Classic results say that regular expression matching should be 
   295 linear with respect to the input. 
   295 linear with respect to the input. The size of the regular expressions
       
   296 are often treated as a constant factor.
   296 However with some regular expressions and inputs, existing implementations 
   297 However with some regular expressions and inputs, existing implementations 
   297 often suffer from non-linear or even exponential running time, 
   298 often suffer from non-linear or even exponential running time, 
   298 giving to for example ReDoS (regular expression denial-of-service ) attacks. 
   299 giving to for example ReDoS (regular expression denial-of-service ) attacks. 
   299 To avoid these attacks, lexers with formalised correctness and running time related
   300 To avoid these attacks, lexers with formalised correctness and running time related
   300 properties become appealing because the guarantee applies to all inputs, not a finite
   301 properties are of interest because the guarantees apply to all inputs, not just a finite
   301 number of empirical test cases.
   302 number of empirical test cases.
   302 
   303 
   303 Sulzmann and Lu describe a lexing algorithm that calculates Brzozowski derivatives using bitcodes annotated to regular expressions. Their algorithm generates POSIX values which encode the information of how a regular expression matches a string—that is, which part of the string is matched by which part of the regular expression. This information is needed in the context of lexing in order to extract and to classify tokens. The purpose of the bitcodes is to generate POSIX values incrementally while derivatives are calculated. They also help with designing an “aggressive” simplification function that keeps the size of derivatives finitely bounded. Without simplification the size of some derivatives can grow arbitrarily big resulting in an extremely slow lexing algorithm. 
   304 Sulzmann and Lu describe a lexing algorithm that calculates Brzozowski derivatives using bitcodes annotated to regular expressions. Their algorithm generates POSIX values which encode the information of how a regular expression matches a string—that is, which part of the string is matched by which part of the regular expression. This information is needed in the context of lexing in order to extract and to classify tokens. The purpose of the bitcodes is to generate POSIX values incrementally while derivatives are calculated. They also help with designing an “aggressive” simplification function that keeps the size of derivatives finitely bounded. Without simplification the size of some derivatives can grow arbitrarily big resulting in an extremely slow lexing algorithm.
   304 
   305 In this thesis we describe a variant of Sulzmann and Lu’s algorithm: Our variant is a recursive functional program, whereas Sulzmann and Lu’s version involves a fixpoint construction. We (i) prove in Isabelle/HOL that our algorithm is correct and generates unique POSIX values; we also (ii) establish a finite bound for the size of the derivatives for every input string; we also
   305 In this thesis we describe a variant of Sulzmann and Lu’s algorithm: Our variant is a recursive functional program, whereas Sulzmann and Lu’s version involves a fixpoint construction. We (i) prove in Isabelle/HOL that our algorithm is correct and generates unique POSIX values; we also (ii) establish a finite bound for the size of the derivatives. 
   306 (iii) give a program and a conjecture that the finite 
       
   307 bound can be improved to be cubic if stronger simplification rules are applied. 
   306 
   308 
   307 
   309 
   308 
   310 
   309 
   311 
   310 
   312