%TODO: get source for SNORT/BRO's regex matching engine/speed


%----------------------------------------------------------------------------------------
\section{Contribution}
%{\color{red} \rule{\linewidth}{0.5mm}}
%\textbf{issue with this part: way too short, not enough details of what I have done.}
%{\color{red} \rule{\linewidth}{0.5mm}}
\marginpar{\em Gerog: No sufficient context on related work making contribution hard to digest.}


%In this thesis,
%we propose a solution to catastrophic
%backtracking and error-prone matchers: a formally verified
%regular expression lexing algorithm
%that is both fast
%and correct.
%%{\color{red} \rule{\linewidth}{0.5mm}}
%\HandRight Added content:
%Package \verb`pifont`: \ding{43}
%Package \verb`utfsym`: \usym{261E}
%Package \verb`dingbat`: \leftpointright
%Package \verb`dingbat`: \rightpointright
|
We have made two main contributions in this thesis:
%the lexer we developed based on Brzozowski derivatives and
%Sulzmann and Lu's developments called
proving that the lexer $\blexersimp$ is both correct and fast.
It is correct w.r.t.\ a formalisation of POSIX lexing by Ausaf et al.~\cite{AusafDyckhoffUrban2016}.
It is fast compared with un-optimised implementations like Sulzmann and Lu's original
development, by our metric of the internal data structures not growing unboundedly.

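To illustrate the metric informally, the sketch below (an illustrative Python toy, not the verified Isabelle development; all names and the representation are our own) shows Brzozowski-derivative matching with a simplification step. Without simplification, repeated derivatives can accumulate ever-larger alternatives; with it, the internal regular expression stays small.

```python
# Toy Brzozowski-derivative matcher with a simplification step.
# Regular expressions are tuples: ZERO, ONE, char, seq, alt, star.

ZERO, ONE = ("0",), ("1",)           # matches nothing / the empty string

def char(c):   return ("c", c)
def seq(r, s): return ("seq", r, s)
def alt(r, s): return ("alt", r, s)
def star(r):   return ("star", r)

def nullable(r):
    """Can r match the empty string?"""
    tag = r[0]
    if tag == "0": return False
    if tag == "1": return True
    if tag == "c": return False
    if tag == "alt": return nullable(r[1]) or nullable(r[2])
    if tag == "seq": return nullable(r[1]) and nullable(r[2])
    return True                       # star

def simp(r):
    """Remove ZERO/ONE units and duplicate alternatives, keeping r small."""
    tag = r[0]
    if tag == "alt":
        l, s = simp(r[1]), simp(r[2])
        if l == ZERO: return s
        if s == ZERO: return l
        if l == s:    return l        # deduplicate alternatives
        return alt(l, s)
    if tag == "seq":
        l, s = simp(r[1]), simp(r[2])
        if ZERO in (l, s): return ZERO
        if l == ONE: return s
        if s == ONE: return l
        return seq(l, s)
    return r

def der(c, r):
    """Brzozowski derivative of r with respect to character c."""
    tag = r[0]
    if tag in ("0", "1"): return ZERO
    if tag == "c": return ONE if r[1] == c else ZERO
    if tag == "alt": return simp(alt(der(c, r[1]), der(c, r[2])))
    if tag == "seq":
        d = seq(der(c, r[1]), r[2])
        if nullable(r[1]):
            d = alt(d, der(c, r[2]))
        return simp(d)
    return simp(seq(der(c, r[1]), r))  # star

def matches(r, string):
    for c in string:
        r = der(c, r)
    return nullable(r)
```

For example, \verb|matches(star(alt(char("a"), char("b"))), "abba")| returns \verb|True|, and the derivative stays at constant size throughout because each simplification step collapses the duplicated alternatives that would otherwise pile up.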
|
Our formalisation of complexity is unique among similar works in the sense that,
%is about the size of internal data structures.
to our knowledge, %we don't know of a
no other lexing/parsing algorithm comes with similarly formalised complexity theorems.
Common practice is to analyse the complexity of the algorithm
in question empirically (\cite{Verbatim}, \cite{Verbatimpp}), or to rely
on prior (unformalised) complexity analyses of well-known algorithms (\cite{ValiantParsing}),
making such developments prone to complexity bugs.
%TODO: add citation
%, for example in the Verbatim work \cite{Verbatim}
|

%While formalised proofs have not included
%Issues known as "performance bugs" can
Whilst formalised complexity theorems
have not yet appeared in other certified lexers/parsers,
%while this work is done,
they can be found in the broader theorem-proving literature:
\emph{time credits} have been formalised in separation logic in Coq
\cite{atkey2010amortised} %not used in
to characterise the runtime complexity of an algorithm:
an integer value is recorded %from the beginning of an execution
as part of the program state
and decremented at each basic step.
The total number of decrements
performed during execution then represents the complexity of the algorithm.
%each time a basic step is executed.
This approach has been further developed to include big-O notation,
so that one can prove both the functional correctness and the asymptotic complexity
of higher-order imperative algorithms \cite{bigOImperative}.
%for example the complexity of
%the Quicksort algorithm
%is $\Theta(n \ln n)$
|

\marginpar{more work on formalising complexity}
\marginpar{new changes up to this point.}
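The time-credit idea can also be sketched outside a proof assistant. The hypothetical Python illustration below (our own toy, not taken from the cited Coq developments) threads a credit counter through the program state and decrements it at each basic step, so the credits consumed measure the running time.

```python
# Hypothetical sketch of time credits: the program state carries an integer
# budget assigned before execution; each basic step consumes one credit, and
# the total number of decrements measures the algorithm's running time.

class OutOfCredits(Exception):
    """Raised when execution exceeds its pre-assigned credit budget."""

class Credits:
    def __init__(self, budget):
        self.budget = budget          # credits assigned prior to execution
        self.spent = 0                # total decrements = measured step count

    def tick(self):
        if self.spent >= self.budget:
            raise OutOfCredits("budget exhausted")
        self.spent += 1

def linear_search(xs, target, credits):
    # one credit per loop iteration: a run that completes without raising
    # OutOfCredits certifies the search took at most `credits.budget` steps
    for x in xs:
        credits.tick()
        if x == target:
            return True
    return False

c = Credits(budget=100)
linear_search(list(range(50)), 49, c)
print(c.spent)   # 50 steps for a list of length 50
```

In the formalised setting the counter is not executed but reasoned about in the program logic; the sketch only conveys the accounting discipline, namely that the budget bounds the step count from above.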
|


In particular, the main problem we solved on top of previous work was
coming up with a formally verified algorithm called $\blexersimp$ based
on Brzozowski derivatives. It calculates a POSIX
lexical value from a string and a regular expression. This algorithm was originally