\noindent where @{const ZERO} stands for the regular expression that does
not match any string, @{const ONE} for the regular expression that matches
only the empty string and @{term c} for matching a character literal.
The constructors $+$ and $\cdot$ represent alternatives and sequences, respectively.
We sometimes omit the $\cdot$ in a sequence regular expression for brevity.
The \emph{language} of a regular expression, written $L$, is defined as usual
and we omit giving the definition here (see for example \cite{AusafDyckhoffUrban2016}).

Central to Brzozowski's regular expression matcher are two functions: a test
whether a regular expression can match the empty string, and the derivative of
a regular expression with respect to a character.
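The following Scala sketch (ours and purely illustrative, not the formalised
Isabelle definitions; we include the star constructor because it is used in
later examples) spells out the regular expressions above together with these
two functions:

\begin{verbatim}
// illustrative sketch, not the formalised Isabelle definitions
abstract class Rexp
case object ZERO extends Rexp                    // matches no string
case object ONE extends Rexp                     // matches only ""
case class CH(c: Char) extends Rexp              // matches the literal c
case class ALT(r1: Rexp, r2: Rexp) extends Rexp  // r1 + r2
case class SEQ(r1: Rexp, r2: Rexp) extends Rexp  // r1 . r2
case class STAR(r: Rexp) extends Rexp            // r*

// does r match the empty string?
def nullable(r: Rexp): Boolean = r match {
  case ZERO        => false
  case ONE         => true
  case CH(_)       => false
  case ALT(r1, r2) => nullable(r1) || nullable(r2)
  case SEQ(r1, r2) => nullable(r1) && nullable(r2)
  case STAR(_)     => true
}

// Brzozowski derivative of r with respect to the character c
def der(c: Char, r: Rexp): Rexp = r match {
  case ZERO        => ZERO
  case ONE         => ZERO
  case CH(d)       => if (c == d) ONE else ZERO
  case ALT(r1, r2) => ALT(der(c, r1), der(c, r2))
  case SEQ(r1, r2) =>
    if (nullable(r1)) ALT(SEQ(der(c, r1), r2), der(c, r2))
    else SEQ(der(c, r1), r2)
  case STAR(r1)    => SEQ(der(c, r1), STAR(r1))
}

// a string is matched by deriving with respect to each character
// and testing the result for nullability
def matches(r: Rexp, s: String): Boolean =
  nullable(s.foldLeft(r)((d, c) => der(c, d)))
\end{verbatim}

\noindent For instance, applying @{text matches} to the regular expression
$a + aa$ and the string $aa$ yields @{text true}.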
from the list provided @{text "f x"} is already in the accumulator;
otherwise we keep @{text x} and scan the rest of the list but
add @{text "f x"} as another ``seen'' element to @{text acc}. We will use
@{term distinctBy} where @{text f} is the erase function, @{term "erase (DUMMY)"},
that deletes bitsequences from bitcoded regular expressions.
This is clearly a computationally more expensive operation than @{text nub},
but is needed in order to make the removal of unnecessary copies work properly.
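The behaviour just described can be rendered as the following short Scala
sketch (again ours and only illustrative, not the Isabelle definition of
@{term distinctBy}); the accumulator @{text acc} records the images under
@{text f} of the elements kept so far:

\begin{verbatim}
// keep x only if f(x) has not been "seen" before; otherwise drop it
def distinctBy[A, B](xs: List[A], f: A => B,
                     acc: Set[B] = Set[B]()): List[A] = xs match {
  case Nil => Nil
  case x :: rest =>
    if (acc.contains(f(x))) distinctBy(rest, f, acc)
    else x :: distinctBy(rest, f, acc + f(x))
}

// with the identity function it behaves like the usual nub/distinct:
// distinctBy(List(1, 2, 1, 3, 2), (x: Int) => x) == List(1, 2, 3)
\end{verbatim}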

Our simplification function depends on three helper functions; one is called
@{text flts} and analyses lists of regular expressions coming from alternatives.
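The definition of @{text flts} is not reproduced here. As a rough indication
of the kind of analysis involved, the following Scala sketch (our guess at the
general shape, stated for the plain, unbitcoded @{text Rexp} datatype from the
earlier sketch and not to be read as the formalised function) drops
@{const ZERO} elements, which contribute nothing to an alternative, and spills
the children of nested alternatives into the surrounding list:

\begin{verbatim}
// rough sketch only: flatten the children of an alternative
def flts(rs: List[Rexp]): List[Rexp] = rs match {
  case Nil                 => Nil
  case ZERO :: rest        => flts(rest)               // 0 + r = r
  case ALT(r1, r2) :: rest => r1 :: r2 :: flts(rest)   // flatten nested +
  case r :: rest           => r :: flts(rest)
}
\end{verbatim}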
@{thm[mode=Rule] ss6[of "r\<^sub>2" "r\<^sub>1" "rs\<^sub>1" "rs\<^sub>2" "rs\<^sub>3"]}$LD$\\
\end{tabular}
\end{center}
\caption{The rewrite rules that generate simplified regular expressions
in small steps: @{term "rrewrite r\<^sub>1 r\<^sub>2"} is for bitcoded regular
expressions and @{term "srewrite rs\<^sub>1 rs\<^sub>2"} for \emph{lists} of bitcoded
regular expressions. The interesting rule is $LD$, which allows copies of regular
expressions to be removed provided a regular expression earlier in the list can
match the same strings.}\label{SimpRewrites}
\end{figure}
*}

section {* Finiteness of Derivatives *}
\cite[Page 14]{Sulzmann2014}.
Given the growth of the derivatives in some cases even after aggressive
simplification, this claim is hard to believe. A similar claim about a
theoretical runtime of @{text "O(n\<^sup>2)"} is made for the Verbatim lexer,
which calculates tokens according to POSIX rules~\cite{verbatim}. For this it
uses Brzozowski's derivatives.
They write: ``The results of our empirical tests [..] confirm that Verbatim has
@{text "O(n\<^sup>2)"} time complexity.'' \cite[Section~VII]{verbatim}.
While their correctness proof for Verbatim is formalised in Coq, the claim about
the runtime complexity is only supported by some empirical evidence obtained
by using the code extraction facilities of Coq.
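The growth we allude to is easy to observe. Reusing the plain Scala sketch from
above (our own illustration of unsimplified derivatives, entirely unrelated to
Verbatim's code), one can print the syntactic size of iterated derivatives of
\mbox{@{text "(a + aa)\<^sup>*"}}:

\begin{verbatim}
// count the nodes of a regular expression
def size(r: Rexp): Int = r match {
  case ZERO | ONE | CH(_) => 1
  case ALT(r1, r2)        => 1 + size(r1) + size(r2)
  case SEQ(r1, r2)        => 1 + size(r1) + size(r2)
  case STAR(r1)           => 1 + size(r1)
}

val r = STAR(ALT(CH('a'), SEQ(CH('a'), CH('a'))))   // (a + aa)*
var d: Rexp = r
for (n <- 1 to 12) {
  d = der('a', d)                                   // no simplification
  println(s"after $n derivatives: size ${size(d)}")
}
\end{verbatim}

\noindent Without interleaved simplification the sizes grow rapidly with every
character read; as noted above, in some cases growth persists even after
aggressive simplification.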
Prompted by our observation of the ``growth problem'' of derivatives, we
tried out their extracted OCaml code with the example
\mbox{@{text "(a + aa)\<^sup>*"}} as a single lexing rule, and it took us around 5 minutes to tokenise a
string of 40 $a$'s; this increased to approximately 19 minutes when the