ChengsongTanPhdThesis/main.tex
changeset 646 56057198e4f5
parent 638 dd9dde2d902b
child 653 bc5571c38d1f
equal deleted inserted replaced
640:bd1354127574 646:56057198e4f5
    39 ]{MastersDoctoralThesis} % The class file specifying the document structure
    39 ]{MastersDoctoralThesis} % The class file specifying the document structure
    40 
    40 
    41 \usepackage[utf8]{inputenc} % Required for inputting international characters
    41 \usepackage[utf8]{inputenc} % Required for inputting international characters
    42 \usepackage[T1]{fontenc} % Output font encoding for international characters
    42 \usepackage[T1]{fontenc} % Output font encoding for international characters
    43 %\usepackage{fdsymbol} % Loads unicode-math
    43 %\usepackage{fdsymbol} % Loads unicode-math
    44 
       
    45 \usepackage{cancel}
    44 \usepackage{cancel}
       
    45 \usepackage{fontawesome5}
       
    46 \usepackage{bbding,pifont,dingbat}
    46 
    47 
    47 \usepackage{listings}
    48 \usepackage{listings}
    48 \usepackage{xcolor}
    49 \usepackage{xcolor}
    49 \usepackage{beramono}
    50 \usepackage{beramono}
    50 
    51 
   288 
   289 
   289 
   290 
   290 \begin{abstract}
   291 \begin{abstract}
   291 \addchaptertocentry{\abstractname} % Add the abstract to the table of contents
   292 \addchaptertocentry{\abstractname} % Add the abstract to the table of contents
   292 %\addchap{Abstract} 
   293 %\addchap{Abstract} 
       
   294 \textbf{Problem: not like an abstract, more like a summary}
       
   295 \textbf{Goal for new abstract: more high-level and abstract, tell the problem and solution in a concise way.}
       
   296 
       
   297 New abstract:\\
       
   298 %Modern computer systems rely on lexing for essential applications such as
       
   299 %compilers, IDEs, file systems, and network intrusion detection systems.
       
   300 %These applications require correctness with respect to
       
   301 %the POSIX standard and high performance.
       
   302 %%While existing implementations of lexers often achieve high performance,
       
   303 %Existing implementations had drawbacks such as bugs and catastrophic backtracking,
       
   304 %preventing them from solving the problem once
       
   305 %and for all.
       
   306 %To address these drawbacks,
       
   307 %this thesis offers an algorithm with formally proven correctness and internal data structures' size bound. 
       
   308 %These mechanised proofs ensure that our algorithm is fast and correct in \textbf{all} cases.
       
   309 %Our proofs use term-rewriting relations to establish invariants during derivatives and simplifications,
       
   310 %which is extensible and friendly to theorem provers.
       
   311 
       
   312 POSIX is the most widely used disambiguation strategy for regular expression matching.  There are some difficulties associated with the POSIX strategy and according to tests conducted by Kulkewitz, many regular expression matchers implementing this strategy produce incorrect results. This thesis is concerned with an POSIX regular expression matching algorithm introduced by Sulzmann and Lu. This algorithm uses bitcoded regular expressions and is based on the idea of Brzozowski derivatives. The algorithm generates POSIX values which encode the information of how a regular expression matches a string - that is, which part of the string is matched by which part of the regular expression. This information is needed in the context of lexing in order to extract and to classify tokens.
       
   313 
       
   314 While a formalised correctness proof for Sulzmann and Lu's algorithm already exists, this proof does not include any of the crucial simplification rules. These simplification rules are however necessary in order to have an acceptable runtime for this algorithm. Our version of the simplification rules includes a number of fixes and improvements: one problem we fix has to do with their use of the nub function that does not remove non-trivial duplicates. We improve the simplification rules by formulating them as simple recursive function and also by simplifying more instances of regular expressions.  As a result we can establish a bound on the size of derivatives. Our proofs are formalised in Isabelle/HOL.
       
   315 
       
   316 
       
   317 
       
   318 Old abstract:
   293 This thesis is about regular expressions and derivatives. It combines functional algorithms and their formal verification in the Isabelle/HOL theorem prover. 
   319 This thesis is about regular expressions and derivatives. It combines functional algorithms and their formal verification in the Isabelle/HOL theorem prover. 
   294 Classic results say that regular expression matching should be 
   320 Classic results say that regular expression matching should be 
   295 linear with respect to the input. The size of the regular expressions
   321 linear with respect to the input. The size of the regular expressions
   296 are often treated as a constant factor.
   322 are often treated as a constant factor.
   297 However with some regular expressions and inputs, existing implementations 
   323 However with some regular expressions and inputs, existing implementations