thys2/Journal/Paper.thy
changeset 371 f65444d29e74
parent 367 5de2d217b682
child 372 78cc255e286f
equal deleted inserted replaced
370:5499ba68188c 371:f65444d29e74
   129 *)
   129 *)
   130 
   130 
   131 
   131 
   132 (*>*)
   132 (*>*)
   133 
   133 
   134 
   134 section\<open>Core of the proof\<close>
   135 
       
   136 section \<open>Introduction\<close>
       
   137 
       
   138 
       
   139 text \<open>
   135 text \<open>
   140 
   136 This paper builds on previous work by Ausaf and Urban using 
   141 This works builds on previous work by Ausaf and Urban using 
       
   142 regular expression'd bit-coded derivatives to do lexing that 
   137 regular expression'd bit-coded derivatives to do lexing that 
   143 is both fast and satisfied the POSIX specification.
   138 is both fast and satisfies the POSIX specification.
   144 In their work, a bit-coded algorithm introduced by Sulzmann and Lu
   139 In their work, a bit-coded algorithm introduced by Sulzmann and Lu
   145 was formally verified in Isabelle, by a very clever use of
   140 was formally verified in Isabelle, by a very clever use of
   146 flex function and retrieve to carefully mimic the way a value is 
   141 flex function and retrieve to carefully mimic the way a value is 
   147 built up by the injection funciton.
   142 built up by the injection funciton.
   148 
   143 
   149 In the previous work, Ausaf and Urban established the below equality:
   144 In the previous work, Ausaf and Urban established the below equality:
   150 \begin{lemma}
   145 \begin{lemma}
   151 @{thm [mode=IfThen] bder_retrieve}
   146 @{thm [mode=IfThen] MAIN_decode}
   152 \end{lemma}
   147 \end{lemma}
   153 
   148 
   154 This lemma links the derivative of a bit-coded regular expression with
   149 This lemma establishes a link with the lexer without bit-codes.
   155 the regular expression itself before the derivative. 
   150 
       
   151 With it we get the correctness of bit-coded algorithm.
       
   152 \begin{lemma}
       
   153 @{thm [mode=IfThen] blexer_correctness}
       
   154 \end{lemma}
       
   155 
       
   156 However what is not certain is whether we can add simplification
       
   157 to the bit-coded algorithm, without breaking the correct lexing output.
       
   158 This might sound trivial in the case of producing a YES/NO answer,
       
   159 but once we require a lexing output to be produced (which is required
       
   160 in applications like compiler front-end, malicious attack domain extraction, 
       
   161 etc.), it is not straightforward if we still extract what is needed according
       
   162 to the POSIX standard.
       
   163 
       
   164 By simplification, we mean specifically the following rules:
       
   165 
       
   166 \begin{center}
       
   167   \begin{tabular}{lcl}
       
   168   @{thm[mode=Axiom] rrewrite.intros(1)[of "bs" "r\<^sub>2"]}\\
       
   169   @{thm[mode=Axiom] rrewrite.intros(2)[of "bs" "r\<^sub>1"]}\\
       
   170   @{thm[mode=Axiom] rrewrite.intros(3)[of  "bs" "bs\<^sub>1" "r\<^sub>1"]}\\
       
   171   @{thm[mode=Rule] rrewrite.intros(4)[of  "r\<^sub>1" "r\<^sub>2" "bs" "r\<^sub>3"]}\\
       
   172   @{thm[mode=Rule] rrewrite.intros(5)[of "bs" "r\<^sub>2"]}\\
       
   173   @{thm[mode=Rule] rrewrite.intros(6)[of "bs" "r\<^sub>1"]}\\
       
   174   @{thm[mode=Axiom] rrewrite.intros(7)[of "bs" "r\<^sub>2"]}\\
       
   175   @{thm[mode=Axiom] rrewrite.intros(8)[of "bs" "r\<^sub>1"]}\\
       
   176   @{thm[mode=Axiom] rrewrite.intros(9)[of "bs" "r\<^sub>2"]}\\
       
   177   @{thm[mode=Axiom] rrewrite.intros(10)[of "bs" "r\<^sub>1"]}\\
       
   178   @{thm[mode=Axiom] rrewrite.intros(11)[of "bs" ]}\\
       
   179   @{thm[mode=Axiom] rrewrite.intros(12)[of "bs" "r\<^sub>1"]}\\
       
   180   @{thm[mode=Rule] rrewrite.intros(13)[of "a\<^sub>1" "a\<^sub>2" "bs" "rs\<^sub>a" "rs\<^sub>b" "rs\<^sub>c"]}\\
       
   181 
       
   182   \end{tabular}
       
   183 \end{center}
       
   184 
       
   185 
       
   186 And these can be made compact by the following simplification function:
       
   187 
       
   188 \begin{center}
       
   189   \begin{tabular}{lcl}
       
   190   @{thm (lhs) bsimp.simps(1)[of  "bs" "r\<^sub>1" "r\<^sub>2"]} & $\dn$ & @{thm (rhs) bsimp.simps(1)[of  "bs" "r\<^sub>1" "r\<^sub>2"]}\\
       
   191   @{thm (lhs) bsimp.simps(2)} & $\dn$ & @{thm (rhs) bsimp.simps(2)}\\
       
   192   @{thm (lhs) bsimp.simps(3)} & $\dn$ & @{thm (rhs) bsimp.simps(3)}\\
       
   193   @{thm (lhs) intern.simps(4)[of "r\<^sub>1" "r\<^sub>2"]} & $\dn$ & @{thm (rhs) intern.simps(4)[of "r\<^sub>1" "r\<^sub>2"]}\\
       
   194 \end{tabular}
       
   195 \end{center}
       
   196 
       
   197 where the function $\mathit{bsimp_AALTs}$
       
   198 
       
   199 The core idea of the proof is that two regular expressions,
       
   200 if "isomorphic" up to a finite number of rewrite steps, will
       
   201 remain "isomorphic" when we take the same sequence of
       
   202 derivatives on both of them.
       
   203 This can be expressed by the following rewrite relation lemma:
       
   204 \begin{lemma}
       
   205 @{thm [mode=IfThen] central}
       
   206 \end{lemma}
       
   207 
       
   208 This isomorphic relation implies a property that leads to the 
       
   209 correctness result: 
       
   210 if two (nullable) regular expressions are "rewritable" in many steps
       
   211 from one another, 
       
   212 then a call to function $textit{bmkeps}$ gives the same
       
   213 bit-sequence :
       
   214 \begin{lemma}
       
   215 @{thm [mode=IfThen] rewrites_bmkeps}
       
   216 \end{lemma}
       
   217 
       
   218 Given the same bit-sequence, the decode function
       
   219 will give out the same value, which is the output
       
   220 of both lexers:
       
   221 \begin{lemma}
       
   222 @{thm blexer_def}
       
   223 \end{lemma}
       
   224 
       
   225 \begin{lemma}
       
   226 @{thm blexer_simp_def}
       
   227 \end{lemma}
       
   228 
       
   229 And that yields the correctness result:
       
   230 \begin{lemma}
       
   231 @{thm blexersimp_correctness}
       
   232 \end{lemma}
       
   233 
       
   234 
       
   235 \<close>
       
   236 
       
   237 section \<open>Introduction\<close>
       
   238 
       
   239 
       
   240 text \<open>
       
   241 
   156 
   242 
   157 Brzozowski \cite{Brzozowski1964} introduced the notion of the {\em
   243 Brzozowski \cite{Brzozowski1964} introduced the notion of the {\em
   158 derivative} @{term "der c r"} of a regular expression \<open>r\<close> w.r.t.\
   244 derivative} @{term "der c r"} of a regular expression \<open>r\<close> w.r.t.\
   159 a character~\<open>c\<close>, and showed that it gave a simple solution to the
   245 a character~\<open>c\<close>, and showed that it gave a simple solution to the
   160 problem of matching a string @{term s} with a regular expression @{term
   246 problem of matching a string @{term s} with a regular expression @{term
   329 %@{term r} are applied. 
   415 %@{term r} are applied. 
   330 
   416 
   331 We extend our results to ??? Bitcoded version??
   417 We extend our results to ??? Bitcoded version??
   332 
   418 
   333 \<close>
   419 \<close>
       
   420 
       
   421 
       
   422 
   334 
   423 
   335 section \<open>Preliminaries\<close>
   424 section \<open>Preliminaries\<close>
   336 
   425 
   337 text \<open>\noindent Strings in Isabelle/HOL are lists of characters with
   426 text \<open>\noindent Strings in Isabelle/HOL are lists of characters with
   338 the empty string being represented by the empty list, written @{term
   427 the empty string being represented by the empty list, written @{term