lexing: comparison thys2/Journal/Paper.thy

equal deleted inserted replaced

-:5499ba68188c
+:f65444d29e74
 *)
 (*>*)
+section\<open>Core of the proof\<close>
-section \<open>Introduction\<close>
 text \<open>
+This paper builds on previous work by Ausaf and Urban using
-This works builds on previous work by Ausaf and Urban using
 regular expression'd bit-coded derivatives to do lexing that
-is both fast and satisfied the POSIX specification.
+is both fast and satisfies the POSIX specification.
 In their work, a bit-coded algorithm introduced by Sulzmann and Lu
 was formally verified in Isabelle, by a very clever use of
 flex function and retrieve to carefully mimic the way a value is
 built up by the injection funciton.
 In the previous work, Ausaf and Urban established the below equality:
 \begin{lemma}
-@{thm [mode=IfThen] bder_retrieve}
+@{thm [mode=IfThen] MAIN_decode}
 \end{lemma}
-This lemma links the derivative of a bit-coded regular expression with
+This lemma establishes a link with the lexer without bit-codes.
-the regular expression itself before the derivative.
+With it we get the correctness of bit-coded algorithm.
+\begin{lemma}
+@{thm [mode=IfThen] blexer_correctness}
+\end{lemma}
+However what is not certain is whether we can add simplification
+to the bit-coded algorithm, without breaking the correct lexing output.
+This might sound trivial in the case of producing a YES/NO answer,
+but once we require a lexing output to be produced (which is required
+in applications like compiler front-end, malicious attack domain extraction,
+etc.), it is not straightforward if we still extract what is needed according
+to the POSIX standard.
+By simplification, we mean specifically the following rules:
+\begin{center}
+\begin{tabular}{lcl}
+@{thm[mode=Axiom] rrewrite.intros(1)[of "bs" "r\<^sub>2"]}\\
+@{thm[mode=Axiom] rrewrite.intros(2)[of "bs" "r\<^sub>1"]}\\
+@{thm[mode=Axiom] rrewrite.intros(3)[of  "bs" "bs\<^sub>1" "r\<^sub>1"]}\\
+@{thm[mode=Rule] rrewrite.intros(4)[of  "r\<^sub>1" "r\<^sub>2" "bs" "r\<^sub>3"]}\\
+@{thm[mode=Rule] rrewrite.intros(5)[of "bs" "r\<^sub>2"]}\\
+@{thm[mode=Rule] rrewrite.intros(6)[of "bs" "r\<^sub>1"]}\\
+@{thm[mode=Axiom] rrewrite.intros(7)[of "bs" "r\<^sub>2"]}\\
+@{thm[mode=Axiom] rrewrite.intros(8)[of "bs" "r\<^sub>1"]}\\
+@{thm[mode=Axiom] rrewrite.intros(9)[of "bs" "r\<^sub>2"]}\\
+@{thm[mode=Axiom] rrewrite.intros(10)[of "bs" "r\<^sub>1"]}\\
+@{thm[mode=Axiom] rrewrite.intros(11)[of "bs" ]}\\
+@{thm[mode=Axiom] rrewrite.intros(12)[of "bs" "r\<^sub>1"]}\\
+@{thm[mode=Rule] rrewrite.intros(13)[of "a\<^sub>1" "a\<^sub>2" "bs" "rs\<^sub>a" "rs\<^sub>b" "rs\<^sub>c"]}\\
+\end{tabular}
+\end{center}
+And these can be made compact by the following simplification function:
+\begin{center}
+\begin{tabular}{lcl}
+@{thm (lhs) bsimp.simps(1)[of  "bs" "r\<^sub>1" "r\<^sub>2"]} & $\dn$ & @{thm (rhs) bsimp.simps(1)[of  "bs" "r\<^sub>1" "r\<^sub>2"]}\\
+@{thm (lhs) bsimp.simps(2)} & $\dn$ & @{thm (rhs) bsimp.simps(2)}\\
+@{thm (lhs) bsimp.simps(3)} & $\dn$ & @{thm (rhs) bsimp.simps(3)}\\
+@{thm (lhs) intern.simps(4)[of "r\<^sub>1" "r\<^sub>2"]} & $\dn$ & @{thm (rhs) intern.simps(4)[of "r\<^sub>1" "r\<^sub>2"]}\\
+\end{tabular}
+\end{center}
+where the function $\mathit{bsimp_AALTs}$
+The core idea of the proof is that two regular expressions,
+if "isomorphic" up to a finite number of rewrite steps, will
+remain "isomorphic" when we take the same sequence of
+derivatives on both of them.
+This can be expressed by the following rewrite relation lemma:
+\begin{lemma}
+@{thm [mode=IfThen] central}
+\end{lemma}
+This isomorphic relation implies a property that leads to the
+correctness result:
+if two (nullable) regular expressions are "rewritable" in many steps
+from one another,
+then a call to function $textit{bmkeps}$ gives the same
+bit-sequence :
+\begin{lemma}
+@{thm [mode=IfThen] rewrites_bmkeps}
+\end{lemma}
+Given the same bit-sequence, the decode function
+will give out the same value, which is the output
+of both lexers:
+\begin{lemma}
+@{thm blexer_def}
+\end{lemma}
+\begin{lemma}
+@{thm blexer_simp_def}
+\end{lemma}
+And that yields the correctness result:
+\begin{lemma}
+@{thm blexersimp_correctness}
+\end{lemma}
+\<close>
+section \<open>Introduction\<close>
+text \<open>
 Brzozowski \cite{Brzozowski1964} introduced the notion of the {\em
 derivative} @{term "der c r"} of a regular expression \<open>r\<close> w.r.t.\
 a character~\<open>c\<close>, and showed that it gave a simple solution to the
 problem of matching a string @{term s} with a regular expression @{term
 %@{term r} are applied.
 We extend our results to ??? Bitcoded version??
 \<close>
 section \<open>Preliminaries\<close>
 text \<open>\noindent Strings in Isabelle/HOL are lists of characters with
 the empty string being represented by the empty list, written @{term

changeset 371	f65444d29e74
parent 367	5de2d217b682
child 372	78cc255e286f