30 ONE ("\<^bold>1" 81) and |
30 ONE ("\<^bold>1" 81) and |
31 CH ("_" [1000] 80) and |
31 CH ("_" [1000] 80) and |
32 ALT ("_ + _" [77,77] 78) and |
32 ALT ("_ + _" [77,77] 78) and |
33 SEQ ("_ \<cdot> _" [77,77] 78) and |
33 SEQ ("_ \<cdot> _" [77,77] 78) and |
34 STAR ("_\<^sup>*" [79] 78) and |
34 STAR ("_\<^sup>*" [79] 78) and |
|
35 NTIMES ("_\<^sup>{\<^sup>_\<^sup>}" [79] 78) and |
35 |
36 |
36 val.Void ("Empty" 78) and |
37 val.Void ("Empty" 78) and |
37 val.Char ("Char _" [1000] 78) and |
38 val.Char ("Char _" [1000] 78) and |
38 val.Left ("Left _" [79] 78) and |
39 val.Left ("Left _" [79] 78) and |
39 val.Right ("Right _" [1000] 78) and |
40 val.Right ("Right _" [1000] 78) and |
92 expressions have sparked quite a bit of interest in the functional |
93 expressions have sparked quite a bit of interest in the functional |
93 programming and theorem prover communities. The beauty of |
94 programming and theorem prover communities. The beauty of |
94 Brzozowski's derivatives \cite{Brzozowski1964} is that they are neatly |
95 Brzozowski's derivatives \cite{Brzozowski1964} is that they are neatly |
95 expressible in any functional language, and easily definable and |
96 expressible in any functional language, and easily definable and |
96 reasoned about in theorem provers---the definitions just consist of |
97 reasoned about in theorem provers---the definitions just consist of |
97 inductive datatypes and simple recursive functions. Derivatives of a |
98 inductive datatypes and simple recursive functions. Another neat |
|
99 feature of derivatives is that they can be easily extended to bounded |
|
100 regular expressions, such as @{term "NTIMES r n"}, where numbers or |
|
101 intervals specify how many times a regular expression should be used |
|
102 during matching. |
|
103 |
|
104 Derivatives of a |
98 regular expression, written @{term "der c r"}, give a simple solution |
105 regular expression, written @{term "der c r"}, give a simple solution |
99 to the problem of matching a string @{term s} with a regular |
106 to the problem of matching a string @{term s} with a regular |
100 expression @{term r}: if the derivative of @{term r} w.r.t.\ (in |
107 expression @{term r}: if the derivative of @{term r} w.r.t.\ (in |
101 succession) all the characters of the string matches the empty string, |
108 succession) all the characters of the string matches the empty string, |
102 then @{term r} matches @{term s} (and {\em vice versa}). We are aware |
109 then @{term r} matches @{term s} (and {\em vice versa}). We are aware |
229 @{const "ZERO"} $\mid$ |
236 @{const "ZERO"} $\mid$ |
230 @{const "ONE"} $\mid$ |
237 @{const "ONE"} $\mid$ |
231 @{term "CH c"} $\mid$ |
238 @{term "CH c"} $\mid$ |
232 @{term "ALT r\<^sub>1 r\<^sub>2"} $\mid$ |
239 @{term "ALT r\<^sub>1 r\<^sub>2"} $\mid$ |
233 @{term "SEQ r\<^sub>1 r\<^sub>2"} $\mid$ |
240 @{term "SEQ r\<^sub>1 r\<^sub>2"} $\mid$ |
234 @{term "STAR r"} |
241 @{term "STAR r"} $\mid$ |
|
242 @{term "NTIMES r n"} |
235 \end{center} |
243 \end{center} |
236 |
244 |
237 \noindent where @{const ZERO} stands for the regular expression that does |
245 \noindent where @{const ZERO} stands for the regular expression that does |
238 not match any string, @{const ONE} for the regular expression that matches |
246 not match any string, @{const ONE} for the regular expression that matches |
239 only the empty string and @{term c} for matching a character literal. |
247 only the empty string and @{term c} for matching a character literal. |
240 The constructors $+$ and $\cdot$ represent alternatives and sequences, respectively. |
248 The constructors $+$ and $\cdot$ represent alternatives and sequences, respectively. |
241 We sometimes omit the $\cdot$ in a sequence regular expression for brevity. |
249 We sometimes omit the $\cdot$ in a sequence regular expression for brevity. |
242 The |
250 In our work here we also add to the usual ``basic'' regular expressions |
|
251 the bounded regular expression @{term "NTIMES r n"} where the @{term n} |
|
252 specifies that @{term r} should match exactly @{term n}-times. For |
|
253 brevity we omit the other bounded regular expressions |
|
254 @{text "r"}$^{\{..n\}}$, @{text "r"}$^{\{n..\}}$ and @{text "r"}$^{\{n..m\}}$ |
|
255 which specify an interval for how many times @{text r} should match. Our |
|
256 results extend straightforwardly also to them. The |
243 \emph{language} of a regular expression, written $L(r)$, is defined as usual |
257 \emph{language} of a regular expression, written $L(r)$, is defined as usual |
244 and we omit giving the definition here (see for example \cite{AusafDyckhoffUrban2016}). |
258 and we omit giving the definition here (see for example \cite{AusafDyckhoffUrban2016}). |
245 |
259 |
246 Central to Brzozowski's regular expression matcher are two functions |
260 Central to Brzozowski's regular expression matcher are two functions |
247 called @{text nullable} and \emph{derivative}. The latter is written |
261 called @{text nullable} and \emph{derivative}. The latter is written |