diff -r 57e33978e55d -r c92a41d9c4da thys3/Paper.thy --- a/thys3/Paper.thy Tue Jul 05 00:42:06 2022 +0100 +++ b/thys3/Paper.thy Sat Jul 09 14:11:07 2022 +0100 @@ -32,6 +32,7 @@ ALT ("_ + _" [77,77] 78) and SEQ ("_ \ _" [77,77] 78) and STAR ("_\<^sup>*" [79] 78) and + NTIMES ("_\<^sup>{\<^sup>_\<^sup>}" [79] 78) and val.Void ("Empty" 78) and val.Char ("Char _" [1000] 78) and @@ -94,7 +95,13 @@ Brzozowski's derivatives \cite{Brzozowski1964} is that they are neatly expressible in any functional language, and easily definable and reasoned about in theorem provers---the definitions just consist of -inductive datatypes and simple recursive functions. Derivatives of a +inductive datatypes and simple recursive functions. Another neat +feature of derivatives is that they can be easily extended to bounded +regular expressions, such as @{term "NTIMES r n"}, where numbers or +intervals specify how many times a regular expression should be used +during matching. + +Derivatives of a regular expression, written @{term "der c r"}, give a simple solution to the problem of matching a string @{term s} with a regular expression @{term r}: if the derivative of @{term r} w.r.t.\ (in @@ -231,7 +238,8 @@ @{term "CH c"} $\mid$ @{term "ALT r\<^sub>1 r\<^sub>2"} $\mid$ @{term "SEQ r\<^sub>1 r\<^sub>2"} $\mid$ - @{term "STAR r"} + @{term "STAR r"} $\mid$ + @{term "NTIMES r n"} \end{center} \noindent where @{const ZERO} stands for the regular expression that does @@ -239,7 +247,13 @@ only the empty string and @{term c} for matching a character literal. The constructors $+$ and $\cdot$ represent alternatives and sequences, respectively. We sometimes omit the $\cdot$ in a sequence regular expression for brevity. - The + In our work here we also add to the usual ``basic'' regular expressions + the bounded regular expression @{term "NTIMES r n"} where the @{term n} + specifies that @{term r} should match exactly @{term n}-times. For + brevity we omit the other bounded regular expressions + @{text "r"}$^{\{..n\}}$, @{text "r"}$^{\{n..\}}$ and @{text "r"}$^{\{n..m\}}$ + which specify an interval for how many times @{text r} should match. Our + results extend straightforwardly also to them. The \emph{language} of a regular expression, written $L(r)$, is defined as usual and we omit giving the definition here (see for example \cite{AusafDyckhoffUrban2016}).