ChengsongTanPhdThesis/Chapters/Introduction.tex
changeset 664 ba44144875b1
parent 654 2ad20ba5b178
child 665 3bedbdce3a3b
%TODO: get source for SNORT/BRO's regex matching engine/speed


%----------------------------------------------------------------------------------------
\section{Contribution}
%{\color{red} \rule{\linewidth}{0.5mm}}
%\textbf{issue with this part: way too short, not enough details of what I have done.}
%{\color{red} \rule{\linewidth}{0.5mm}}
\marginpar{\em Georg: not sufficient context on related work, making the contribution hard to digest.}

%In this thesis,
%we propose a solution to catastrophic
%backtracking and error-prone matchers: a formally verified
%regular expression lexing algorithm
%that is both fast
%and correct. 
%%{\color{red} \rule{\linewidth}{0.5mm}}
%\HandRight Added content:
%Package \verb`pifont`: \ding{43}
%Package \verb`utfsym`: \usym{261E} 
%Package \verb`dingbat`: \leftpointright 
%Package \verb`dingbat`: \rightpointright 
       
We have made two main contributions in this thesis: %the
%lexer we developed based on Brzozowski derivatives and 
%Sulzmann and Lu's developments called 
proving that the lexer $\blexersimp$ is both correct and fast.
It is correct w.r.t.\ the formalisation of POSIX lexing by Ausaf et al.~\cite{AusafDyckhoffUrban2016}.
It is fast compared with unoptimised implementations such as Sulzmann and Lu's original
development, by our metric of the internal data structures not growing unboundedly.
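
Schematically, and only to convey the shape of the statement (the names
$\textit{ders\_simp}$, $\llbracket \cdot \rrbracket$ and $N_r$ stand in for the
precise definitions given later in this thesis), this boundedness property reads
\[
  \forall r.\;\exists N_r.\;\forall s.\quad
  \llbracket \textit{ders\_simp}\;r\;s \rrbracket \;\leq\; N_r
\]
where $\llbracket \cdot \rrbracket$ measures the size of the internal data
structure: for each regular expression $r$ there is a bound $N_r$, independent
of the string $s$, on the size of its simplified derivatives.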
       
Our formalisation of complexity is unique among similar works in the sense that,
%is about the size of internal data structures.
to our knowledge, %we don't know of a 
there is no other lexing/parsing algorithm with similarly formalised complexity theorems.
Common practice is to carry out an empirical analysis of the complexity of the algorithm
in question \cite{Verbatim,Verbatimpp}, or to rely 
on prior (unformalised) complexity analyses of well-known algorithms \cite{ValiantParsing},
which makes such work prone to complexity bugs.
%TODO: add citation
%, for example in the Verbatim work \cite{Verbatim}
       
%While formalised proofs have not included 
%Issues known as "performance bugs" can
Whilst formalised complexity theorems 
have not yet appeared in other certified lexers/parsers,
%while this work is done,
they do appear in the broader theorem-proving literature:
\emph{time credits} have been formalised for separation logic in Coq 
\cite{atkey2010amortised} %not used in 
to characterise the runtime complexity of an algorithm:
an integer number of credits is recorded %from the beginning of an execution
as part of the program state
and decremented at each basic step. 
The total number of decrements of the time credits 
during execution then represents the complexity of the algorithm.
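
As a schematic illustration (our rendering; the exact notation of
\cite{atkey2010amortised} differs), a specification in such a logic pairs the
usual pre- and postconditions with a credit assertion $\$\,n$:
\[
  \{\, P \ast \$\,n \,\}\;\;\textit{prog}\;\;\{\, Q \,\}
\]
meaning that $\textit{prog}$, started in a state satisfying $P$ and endowed with
$n$ time credits, terminates in a state satisfying $Q$ while consuming at most
$n$ basic steps.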
       
%each time a basic step is executed. 
%The way to use a time credit Time credit is an integer
%is assigned to a program prior to execution, and each step in the program consumes
%one credit.
This idea has been further developed to incorporate big-O notation,
so that one can prove both the functional correctness and the asymptotic complexity
of higher-order imperative algorithms \cite{bigOImperative}.
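
Schematically (again our rendering rather than the notation of
\cite{bigOImperative}), such a specification hides the concrete cost function
behind an existential that is constrained only asymptotically:
\[
  \exists c.\;\; c \in O(f) \;\wedge\;
  \forall n.\;\{\, P(n) \ast \$\,c(n) \,\}\;\;\textit{prog}(n)\;\;\{\, Q(n) \,\}
\]
which separates the functional-correctness obligations ($P$ and $Q$) from the
asymptotic cost bound $f$.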
       
%for example the complexity of
%the Quicksort algorithm 
%is $\Theta(n\ln n)$ 

\marginpar{more work on formalising complexity}
\marginpar{new changes up to this point.}

In particular, the main problem we solved on top of previous work was
coming up with a formally verified algorithm called $\blexersimp$ based
on Brzozowski derivatives. It calculates a POSIX
lexical value from a string and a regular expression. This algorithm was originally