thys3/Paper.thy
changeset 563 c92a41d9c4da
parent 502 1ab693d6342f
child 569 5af61c89f51e
equal deleted inserted replaced
562:57e33978e55d 563:c92a41d9c4da
    30   ONE ("\<^bold>1" 81) and 
    30   ONE ("\<^bold>1" 81) and 
    31   CH ("_" [1000] 80) and
    31   CH ("_" [1000] 80) and
    32   ALT ("_ + _" [77,77] 78) and
    32   ALT ("_ + _" [77,77] 78) and
    33   SEQ ("_ \<cdot> _" [77,77] 78) and
    33   SEQ ("_ \<cdot> _" [77,77] 78) and
    34   STAR ("_\<^sup>*" [79] 78) and
    34   STAR ("_\<^sup>*" [79] 78) and
       
    35   NTIMES ("_\<^sup>{\<^sup>_\<^sup>}" [79] 78) and
    35 
    36 
    36   val.Void ("Empty" 78) and
    37   val.Void ("Empty" 78) and
    37   val.Char ("Char _" [1000] 78) and
    38   val.Char ("Char _" [1000] 78) and
    38   val.Left ("Left _" [79] 78) and
    39   val.Left ("Left _" [79] 78) and
    39   val.Right ("Right _" [1000] 78) and
    40   val.Right ("Right _" [1000] 78) and
    92 expressions have sparked quite a bit of interest in the functional
    93 expressions have sparked quite a bit of interest in the functional
    93 programming and theorem prover communities.  The beauty of
    94 programming and theorem prover communities.  The beauty of
    94 Brzozowski's derivatives \cite{Brzozowski1964} is that they are neatly
    95 Brzozowski's derivatives \cite{Brzozowski1964} is that they are neatly
    95 expressible in any functional language, and easily definable and
    96 expressible in any functional language, and easily definable and
    96 reasoned about in theorem provers---the definitions just consist of
    97 reasoned about in theorem provers---the definitions just consist of
    97 inductive datatypes and simple recursive functions.  Derivatives of a
    98 inductive datatypes and simple recursive functions. Another neat
       
    99 feature of derivatives is that they can be easily extended to bounded
       
   100 regular expressions, such as @{term "NTIMES r n"}, where numbers or
       
   101 intervals specify how many times a regular expression should be used
       
   102 during matching.
       
   103 
       
   104 Derivatives of a
    98 regular expression, written @{term "der c r"}, give a simple solution
   105 regular expression, written @{term "der c r"}, give a simple solution
    99 to the problem of matching a string @{term s} with a regular
   106 to the problem of matching a string @{term s} with a regular
   100 expression @{term r}: if the derivative of @{term r} w.r.t.\ (in
   107 expression @{term r}: if the derivative of @{term r} w.r.t.\ (in
   101 succession) all the characters of the string matches the empty string,
   108 succession) all the characters of the string matches the empty string,
   102 then @{term r} matches @{term s} (and {\em vice versa}).  We are aware
   109 then @{term r} matches @{term s} (and {\em vice versa}).  We are aware
   229   @{const "ZERO"} $\mid$
   236   @{const "ZERO"} $\mid$
   230   @{const "ONE"} $\mid$
   237   @{const "ONE"} $\mid$
   231   @{term "CH c"} $\mid$
   238   @{term "CH c"} $\mid$
   232   @{term "ALT r\<^sub>1 r\<^sub>2"} $\mid$
   239   @{term "ALT r\<^sub>1 r\<^sub>2"} $\mid$
   233   @{term "SEQ r\<^sub>1 r\<^sub>2"} $\mid$
   240   @{term "SEQ r\<^sub>1 r\<^sub>2"} $\mid$
   234   @{term "STAR r"} 
   241   @{term "STAR r"} $\mid$
       
   242   @{term "NTIMES r n"}
   235   \end{center}
   243   \end{center}
   236 
   244 
   237   \noindent where @{const ZERO} stands for the regular expression that does
   245   \noindent where @{const ZERO} stands for the regular expression that does
   238   not match any string, @{const ONE} for the regular expression that matches
   246   not match any string, @{const ONE} for the regular expression that matches
   239   only the empty string and @{term c} for matching a character literal.
   247   only the empty string and @{term c} for matching a character literal.
   240   The constructors $+$ and $\cdot$ represent alternatives and sequences, respectively.
   248   The constructors $+$ and $\cdot$ represent alternatives and sequences, respectively.
   241   We sometimes omit the $\cdot$ in a sequence regular expression for brevity. 
   249   We sometimes omit the $\cdot$ in a sequence regular expression for brevity. 
   242   The
   250   In our work here we also add to the usual ``basic'' regular expressions 
       
   251   the bounded regular expression @{term "NTIMES r n"} where the @{term n}
       
   252   specifies that @{term r} should match exactly @{term n}-times. For
       
   253   brevity we omit the other bounded regular expressions
       
   254   @{text "r"}$^{\{..n\}}$, @{text "r"}$^{\{n..\}}$ and @{text "r"}$^{\{n..m\}}$
       
   255   which specify an interval for how many times @{text r} should match. Our
       
   256   results extend straightforwardly also to them. The
   243   \emph{language} of a regular expression, written $L(r)$, is defined as usual
   257   \emph{language} of a regular expression, written $L(r)$, is defined as usual
   244   and we omit giving the definition here (see for example \cite{AusafDyckhoffUrban2016}).
   258   and we omit giving the definition here (see for example \cite{AusafDyckhoffUrban2016}).
   245 
   259 
   246   Central to Brzozowski's regular expression matcher are two functions
   260   Central to Brzozowski's regular expression matcher are two functions
   247   called @{text nullable} and \emph{derivative}. The latter is written
   261   called @{text nullable} and \emph{derivative}. The latter is written