--- a/thys2/Journal/Paper.thy Mon Nov 01 10:52:44 2021 +0000
+++ b/thys2/Journal/Paper.thy Tue Nov 02 13:57:59 2021 +0000
@@ -131,16 +131,11 @@
(*>*)
-
-
-section \<open>Introduction\<close>
-
-
+section\<open>Core of the proof\<close>
text \<open>
-
-This works builds on previous work by Ausaf and Urban using
+This paper builds on previous work by Ausaf and Urban using
regular expression'd bit-coded derivatives to do lexing that
-is both fast and satisfied the POSIX specification.
+is both fast and satisfies the POSIX specification.
In their work, a bit-coded algorithm introduced by Sulzmann and Lu
was formally verified in Isabelle, by a very clever use of
flex function and retrieve to carefully mimic the way a value is
@@ -148,11 +143,102 @@
In the previous work, Ausaf and Urban established the below equality:
\begin{lemma}
-@{thm [mode=IfThen] bder_retrieve}
+@{thm [mode=IfThen] MAIN_decode}
+\end{lemma}
+
+This lemma establishes a link with the lexer without bit-codes.
+
+With it we get the correctness of bit-coded algorithm.
+\begin{lemma}
+@{thm [mode=IfThen] blexer_correctness}
\end{lemma}
-This lemma links the derivative of a bit-coded regular expression with
-the regular expression itself before the derivative.
+However what is not certain is whether we can add simplification
+to the bit-coded algorithm, without breaking the correct lexing output.
+This might sound trivial in the case of producing a YES/NO answer,
+but once we require a lexing output to be produced (which is required
+in applications like compiler front-end, malicious attack domain extraction,
+etc.), it is not straightforward if we still extract what is needed according
+to the POSIX standard.
+
+By simplification, we mean specifically the following rules:
+
+\begin{center}
+ \begin{tabular}{lcl}
+ @{thm[mode=Axiom] rrewrite.intros(1)[of "bs" "r\<^sub>2"]}\\
+ @{thm[mode=Axiom] rrewrite.intros(2)[of "bs" "r\<^sub>1"]}\\
+ @{thm[mode=Axiom] rrewrite.intros(3)[of "bs" "bs\<^sub>1" "r\<^sub>1"]}\\
+ @{thm[mode=Rule] rrewrite.intros(4)[of "r\<^sub>1" "r\<^sub>2" "bs" "r\<^sub>3"]}\\
+ @{thm[mode=Rule] rrewrite.intros(5)[of "bs" "r\<^sub>2"]}\\
+ @{thm[mode=Rule] rrewrite.intros(6)[of "bs" "r\<^sub>1"]}\\
+ @{thm[mode=Axiom] rrewrite.intros(7)[of "bs" "r\<^sub>2"]}\\
+ @{thm[mode=Axiom] rrewrite.intros(8)[of "bs" "r\<^sub>1"]}\\
+ @{thm[mode=Axiom] rrewrite.intros(9)[of "bs" "r\<^sub>2"]}\\
+ @{thm[mode=Axiom] rrewrite.intros(10)[of "bs" "r\<^sub>1"]}\\
+ @{thm[mode=Axiom] rrewrite.intros(11)[of "bs" ]}\\
+ @{thm[mode=Axiom] rrewrite.intros(12)[of "bs" "r\<^sub>1"]}\\
+ @{thm[mode=Rule] rrewrite.intros(13)[of "a\<^sub>1" "a\<^sub>2" "bs" "rs\<^sub>a" "rs\<^sub>b" "rs\<^sub>c"]}\\
+
+ \end{tabular}
+\end{center}
+
+
+And these can be made compact by the following simplification function:
+
+\begin{center}
+ \begin{tabular}{lcl}
+ @{thm (lhs) bsimp.simps(1)[of "bs" "r\<^sub>1" "r\<^sub>2"]} & $\dn$ & @{thm (rhs) bsimp.simps(1)[of "bs" "r\<^sub>1" "r\<^sub>2"]}\\
+ @{thm (lhs) bsimp.simps(2)} & $\dn$ & @{thm (rhs) bsimp.simps(2)}\\
+ @{thm (lhs) bsimp.simps(3)} & $\dn$ & @{thm (rhs) bsimp.simps(3)}\\
+ @{thm (lhs) intern.simps(4)[of "r\<^sub>1" "r\<^sub>2"]} & $\dn$ & @{thm (rhs) intern.simps(4)[of "r\<^sub>1" "r\<^sub>2"]}\\
+\end{tabular}
+\end{center}
+
+where the function $\mathit{bsimp_AALTs}$
+
+The core idea of the proof is that two regular expressions,
+if "isomorphic" up to a finite number of rewrite steps, will
+remain "isomorphic" when we take the same sequence of
+derivatives on both of them.
+This can be expressed by the following rewrite relation lemma:
+\begin{lemma}
+@{thm [mode=IfThen] central}
+\end{lemma}
+
+This isomorphic relation implies a property that leads to the
+correctness result:
+if two (nullable) regular expressions are "rewritable" in many steps
+from one another,
+then a call to function $textit{bmkeps}$ gives the same
+bit-sequence :
+\begin{lemma}
+@{thm [mode=IfThen] rewrites_bmkeps}
+\end{lemma}
+
+Given the same bit-sequence, the decode function
+will give out the same value, which is the output
+of both lexers:
+\begin{lemma}
+@{thm blexer_def}
+\end{lemma}
+
+\begin{lemma}
+@{thm blexer_simp_def}
+\end{lemma}
+
+And that yields the correctness result:
+\begin{lemma}
+@{thm blexersimp_correctness}
+\end{lemma}
+
+
+\<close>
+
+section \<open>Introduction\<close>
+
+
+text \<open>
+
Brzozowski \cite{Brzozowski1964} introduced the notion of the {\em
derivative} @{term "der c r"} of a regular expression \<open>r\<close> w.r.t.\
@@ -332,6 +418,9 @@
\<close>
+
+
+
section \<open>Preliminaries\<close>
text \<open>\noindent Strings in Isabelle/HOL are lists of characters with