thys/Paper/Paper.thy
changeset 111 289728193164
parent 110 267afb7fb700
child 112 698967eceaf1
equal deleted inserted replaced
110:267afb7fb700 111:289728193164
     7 
     7 
     8 abbreviation 
     8 abbreviation 
     9  "der_syn r c \<equiv> der c r"
     9  "der_syn r c \<equiv> der c r"
    10 
    10 
    11 notation (latex output)
    11 notation (latex output)
    12    If  ("(\<^raw:\textrm{>if\<^raw:}> (_)/ \<^raw:\textrm{>then\<^raw:}> (_)/ \<^raw:\textrm{>else\<^raw:}> (_))" 10) and
    12   If  ("(\<^raw:\textrm{>if\<^raw:}> (_)/ \<^raw:\textrm{>then\<^raw:}> (_)/ \<^raw:\textrm{>else\<^raw:}> (_))" 10) and
    13   Cons ("_\<^raw:\mbox{$\,$}>::\<^raw:\mbox{$\,$}>_" [78,77] 73) and
    13   Cons ("_\<^raw:\mbox{$\,$}>::\<^raw:\mbox{$\,$}>_" [78,77] 73) and
       
    14   
    14   ZERO ("\<^bold>0" 80) and 
    15   ZERO ("\<^bold>0" 80) and 
    15   ONE ("\<^bold>1" 80) and 
    16   ONE ("\<^bold>1" 80) and 
    16   CHAR ("_" [1000] 80) and
    17   CHAR ("_" [1000] 80) and
    17   ALT ("_ + _" [77,77] 78) and
    18   ALT ("_ + _" [77,77] 78) and
    18   SEQ ("_ \<cdot> _" [77,77] 78) and
    19   SEQ ("_ \<cdot> _" [77,77] 78) and
    19   STAR ("_\<^sup>\<star>" [1000] 78) and
    20   STAR ("_\<^sup>\<star>" [1000] 78) and
       
    21   
       
    22   val.Void ("'(')" 78) and
    20   val.Char ("Char _" [1000] 78) and
    23   val.Char ("Char _" [1000] 78) and
    21   val.Left ("Left _" [1000] 78) and
    24   val.Left ("Left _" [1000] 78) and
    22   val.Right ("Right _" [1000] 78) and
    25   val.Right ("Right _" [1000] 78) and
       
    26   
    23   L ("L'(_')" [10] 78) and
    27   L ("L'(_')" [10] 78) and
    24   der_syn ("_\\_" [79, 1000] 76) and
    28   der_syn ("_\\_" [79, 1000] 76) and
    25   flat ("|_|" [70] 73) and
    29   flat ("|_|" [70] 73) and
    26   Sequ ("_ @ _" [78,77] 63) and
    30   Sequ ("_ @ _" [78,77] 63) and
    27   injval ("inj _ _ _" [1000,77,1000] 77) and 
    31   injval ("inj _ _ _" [1000,77,1000] 77) and 
   151 
   155 
   152 *}
   156 *}
   153 
   157 
   154 section {* Preliminaries *}
   158 section {* Preliminaries *}
   155 
   159 
   156 text {* \noindent Strings in Isabelle/HOL are lists of characters with
   160 text {* \noindent Strings in Isabelle/HOL are lists of characters with the
   157   the empty string being represented by the empty list, written @{term
   161 empty string being represented by the empty list, written @{term "[]"}, and
   158   "[]"}, and list-cons being written as @{term "DUMMY # DUMMY"}.
   162 list-cons being written as @{term "DUMMY # DUMMY"}. Often we use the usual
   159   Often we use the usual bracket notation for strings; for example a
   163 bracket notation for strings; for example a string consisting of just a
   160   string consisting of a single character is written @{term "[c]"}.  By
   164 single character is written @{term "[c]"}. By using the type @{type char}
   161   using the type @{type char} for characters we have a supply of
   165 for characters we have a supply of finitely many characters roughly
   162   finitely many characters roughly corresponding to the ASCII
   166 corresponding to the ASCII character set. Regular expressions are defined as
   163   character set.  Regular expressions are defined as usual as the
   167 usual as the following inductive datatype:
   164   following inductive datatype:
       
   165 
   168 
   166   \begin{center}
   169   \begin{center}
   167   @{text "r :="}
   170   @{text "r :="}
   168   @{const "ZERO"} $\mid$
   171   @{const "ZERO"} $\mid$
   169   @{const "ONE"} $\mid$
   172   @{const "ONE"} $\mid$
   246   \end{tabular}
   249   \end{tabular}
   247   \end{center}
   250   \end{center}
   248 *}
   251 *}
   249 
   252 
   250 section {* POSIX Regular Expression Matching *}
   253 section {* POSIX Regular Expression Matching *}
       
   254 
       
   255 text {* 
       
   256 
       
   257 The clever idea in \cite{Sulzmann2014} is to define a function on values that mirrors
       
   258 (but inverts) the construction of the derivative on regular expressions. We
       
   259 begin with the case of a nullable regular expression: from the nullability
       
   260 we need to construct a value that witnesses the nullability. This is as
       
   261 follows. The @{const mkeps} function (from \cite{Sulzmann2014}) is a partial (but
       
   262 unambiguous) function from regular expressions to values, total on exactly
       
   263 the set of nullable regular expressions. 
       
   264 
       
   265  \begin{center}
       
   266   \begin{tabular}{lcl}
       
   267   @{thm (lhs) mkeps.simps(1)} & $\dn$ & @{thm (rhs) mkeps.simps(1)}\\
       
   268   @{thm (lhs) mkeps.simps(2)[of "r\<^sub>1" "r\<^sub>2"]} & $\dn$ & @{thm (rhs) mkeps.simps(2)[of "r\<^sub>1" "r\<^sub>2"]}\\
       
   269   @{thm (lhs) mkeps.simps(3)[of "r\<^sub>1" "r\<^sub>2"]} & $\dn$ & @{thm (rhs) mkeps.simps(3)[of "r\<^sub>1" "r\<^sub>2"]}\\
       
   270   @{thm (lhs) mkeps.simps(4)} & $\dn$ & @{thm (rhs) mkeps.simps(4)}\\
       
   271   \end{tabular}
       
   272   \end{center}
       
   273 
       
   274 The well-known idea of POSIX lexing is informally defined in (for example)
       
   275 \cite{posix}; as correctly argued in \cite{Sulzmann2014}, this needs formal
       
   276 specification. The rough idea is that, in contrast to the so-called GREEDY
       
   277 algorithm, POSIX lexing chooses to match more deeply and using left choices
       
   278 rather than a right choices. For example, note that to match the string 
       
   279 @{term "[a, b]"} with the regular expression $(a + \mts)\circ (b+ab)$ the matching
       
   280 will return $( Void, Right(ab))$ rather than $(Left\ a, Left\ b)$. [The
       
   281 regular expression $ab$ is short for $(Lit\ a) \circ (Lit\ b)$.] Similarly,
       
   282 to match {\em ``a''} with $(a+a)$ the leftmost $a$ will be chosen.
       
   283 
       
   284 We use a simple inductive definition to specify this notion, incorporating
       
   285 the POSIX-specific choices into the side-conditions for the rules $R tl
       
   286 +_2$, $R tl\circ$ and $R tl*$ (as they are now called). By contrast,
       
   287 \cite{Sulzmann2014} defines a relation between values and argues that there is a
       
   288 maximum value, as given by the derivative-based algorithm yet to be spelt
       
   289 out. The relation we define is ternary, relating strings, values and regular
       
   290 expressions.
       
   291 
       
   292 *}
   251 
   293 
   252 section {* The Argument by Sulzmmann and Lu *}
   294 section {* The Argument by Sulzmmann and Lu *}
   253 
   295 
   254 section {* Conclusion *}
   296 section {* Conclusion *}
   255 
   297 
   276   @{term "Right v"} $\mid$
   318   @{term "Right v"} $\mid$
   277   @{term "Seq v\<^sub>1 v\<^sub>2"} $\mid$ 
   319   @{term "Seq v\<^sub>1 v\<^sub>2"} $\mid$ 
   278   @{term "Stars vs"} 
   320   @{term "Stars vs"} 
   279   \end{center}  
   321   \end{center}  
   280 
   322 
   281   \noindent
       
   282   The language of a regular expression
       
   283 
       
   284   \begin{center}
       
   285   \begin{tabular}{lcl}
       
   286   @{thm (lhs) L.simps(1)} & $\dn$ & @{thm (rhs) L.simps(1)}\\
       
   287   @{thm (lhs) L.simps(2)} & $\dn$ & @{thm (rhs) L.simps(2)}\\
       
   288   @{thm (lhs) L.simps(3)} & $\dn$ & @{thm (rhs) L.simps(3)}\\
       
   289   @{thm (lhs) L.simps(4)[of "r\<^sub>1" "r\<^sub>2"]} & $\dn$ & @{thm (rhs) L.simps(4)[of "r\<^sub>1" "r\<^sub>2"]}\\
       
   290   @{thm (lhs) L.simps(5)[of "r\<^sub>1" "r\<^sub>2"]} & $\dn$ & @{thm (rhs) L.simps(5)[of "r\<^sub>1" "r\<^sub>2"]}\\
       
   291   @{thm (lhs) L.simps(6)} & $\dn$ & @{thm (rhs) L.simps(6)}\\
       
   292   \end{tabular}
       
   293   \end{center}
       
   294 
       
   295  
   323  
   296 
   324 
   297   \noindent
   325   \noindent
   298   The @{const flat} function for values
   326   The @{const flat} function for values
   299 
   327 
   310   \end{center}
   338   \end{center}
   311 
   339 
   312   \noindent
   340   \noindent
   313   The @{const mkeps} function
   341   The @{const mkeps} function
   314 
   342 
   315   \begin{center}
   343  
   316   \begin{tabular}{lcl}
       
   317   @{thm (lhs) mkeps.simps(1)} & $\dn$ & @{thm (rhs) mkeps.simps(1)}\\
       
   318   @{thm (lhs) mkeps.simps(2)[of "r\<^sub>1" "r\<^sub>2"]} & $\dn$ & @{thm (rhs) mkeps.simps(2)[of "r\<^sub>1" "r\<^sub>2"]}\\
       
   319   @{thm (lhs) mkeps.simps(3)[of "r\<^sub>1" "r\<^sub>2"]} & $\dn$ & @{thm (rhs) mkeps.simps(3)[of "r\<^sub>1" "r\<^sub>2"]}\\
       
   320   @{thm (lhs) mkeps.simps(4)} & $\dn$ & @{thm (rhs) mkeps.simps(4)}\\
       
   321   \end{tabular}
       
   322   \end{center}
       
   323 
   344 
   324   \noindent
   345   \noindent
   325   The @{text inj} function
   346   The @{text inj} function
   326 
   347 
   327   \begin{center}
   348   \begin{center}