regexp: comparison Paper/Paper.thy


-:6a0efaabde19
+:1436fc451bb9
 UPLUS ("_ \<^raw:\ensuremath{\uplus}> _" [90, 90] 90) and
 L ("\<^raw:\ensuremath{\cal{L}}>'(_')" [0] 101) and
 Lam ("\<lambda>'(_')" [100] 100) and
 Trn ("_, _" [100, 100] 100) and
 EClass ("\<lbrakk>_\<rbrakk>\<^bsub>_\<^esub>" [100, 100] 100) and
-transition ("_ \<^raw:\ensuremath{\stackrel{\text{>_\<^raw:}}{\Longmapsto}}> _" [100, 100, 100] 100)
+transition ("_ \<^raw:\ensuremath{\stackrel{\text{>_\<^raw:}}{\Longmapsto}}> _" [100, 100, 100] 100) and
+Setalt ("\<^raw:\ensuremath{\bigplus}>_" [1000] 999)
 (*>*)
 section {* Introduction *}
 \end{tabular}
 \end{center}
 \noindent
 On ``paper'' we can define the corresponding graph in terms of the disjoint
-union of the state nodes. Unfortunately in HOL, the definition for disjoint
+union of the state nodes. Unfortunately in HOL, the standard definition for disjoint
 union, namely
 %
 \begin{equation}\label{disjointunion}
 @{term "UPLUS A\<^isub>1 A\<^isub>2 \<equiv> {(1, x) | x. x \<in> A\<^isub>1} \<union> {(2, y) | y. y \<in> A\<^isub>2}"}
 \end{equation}
 establishing that properties are invariant under renaming. Similarly,
 connecting two automata represented as matrices results in very ad hoc
 constructions, which are not pleasant to reason about.
 Functions are much better supported in Isabelle/HOL, but they still lead to similar
-problems as with graphs.  Composing two non-deterministic automata in parallel
+problems as with graphs.  Composing, for example, two non-deterministic automata in parallel
-poses still the problem of how to implement disjoint unions. Nipkow \cite{Nipkow98}
+again poses the problem of how to implement disjoint unions. Nipkow \cite{Nipkow98}
-dismisses the option using identities, because it leads to messy proofs. He
+dismisses the option of using identities, because it leads to ``messy proofs''. He
 opts for a variant of \eqref{disjointunion}, but writes
 \begin{quote}
 \it ``If the reader finds the above treatment in terms of bit lists revoltingly
 concrete, I cannot disagree.''
 A language @{text A} is \emph{regular}, provided there is a regular expression that matches all
 strings of @{text "A"}.
 \end{definition}
 \noindent
-The reason is that regular expressions, unlike graphs and matrices, can
+The reason is that regular expressions, unlike graphs, matrices and functions, can
 be easily defined as an inductive datatype. Consequently a corresponding reasoning
 infrastructure comes for free. This has recently been exploited in HOL4 with a formalisation
 of regular expression matching based on derivatives \cite{OwensSlind08}.  The purpose of this paper is to
 show that a central result about regular languages---the Myhill-Nerode theorem---can
 be recreated by only using regular expressions. This theorem gives necessary
 and sufficient conditions for when a language is regular. As a corollary of this
 theorem we can easily establish the usual closure properties, including
 complementation, for regular languages.\smallskip
 \noindent
-{\bf Contributions:} To our knowledge, our proof of the Myhill-Nerode theorem is the
+{\bf Contributions:}
+There is an extensive literature on regular languages.
+To our knowledge, our proof of the Myhill-Nerode theorem is the
 first that is based only on regular expressions. We prove the part of this theorem
 stating that a regular expression has only finitely many partitions using certain
 tagging functions. Again, to the best of our knowledge, these tagging functions have
 not been used before to establish the Myhill-Nerode theorem.
 *}
 \end{center}
 \noindent
 where @{text "@"} is the usual list-append operation. The Kleene-star of a language @{text A}
 is defined as the union over all powers, namely @{thm Star_def}. In the paper
-we will often make use of the following properties.
+we will make use of the following properties of these constructions.
 \begin{proposition}\label{langprops}\mbox{}\\
 \begin{tabular}{@ {}ll@ {\hspace{10mm}}ll}
 (i)   & @{thm star_cases}      & (ii)  & @{thm[mode=IfThen] pow_length}\\
 (iii) & @{thm seq_Union_left}  &
 \end{tabular}
 \end{proposition}
 \noindent
-We omit the proofs of these properties, but invite the reader to consult
+We omit the proofs, but invite the reader to consult
 our formalisation.\footnote{Available at ???}
 The notation for the quotient of a language @{text A} according to an
-equivalence relation @{term REL} is @{term "A // REL"}. We will write
+equivalence relation @{term REL} is written @{term "A // REL"} in Isabelle/HOL. We will write
 @{text "\<lbrakk>x\<rbrakk>\<^isub>\<approx>"} for the equivalence class defined
 as @{text "{y | y \<approx> x}"}.
 Central to our proof will be the solution of equational systems
 implies that @{term s} is in @{term "(\<Union>n. B ;; (A \<up> n))"}. Using Prop.~\ref{langprops}@{text "(iii)"}
 this is equal to @{term "B ;; A\<star>"}, as we needed to show.\qed
 \end{proof}
 \noindent
-Regular expressions are defined as the following inductive datatype
+Regular expressions are defined as the inductive datatype
 \begin{center}
 @{text r} @{text "::="}
 @{term NULL}\hspace{1.5mm}@{text"|"}\hspace{1.5mm}
 @{term EMPTY}\hspace{1.5mm}@{text"|"}\hspace{1.5mm}
 @{term "ALT r r"}\hspace{1.5mm}@{text"|"}\hspace{1.5mm}
 @{term "STAR r"}
 \end{center}
 \noindent
-and the language matched by a regular expression is defined as:
+and the language matched by a regular expression is defined as
 \begin{center}
 \begin{tabular}{c@ {\hspace{10mm}}c}
 \begin{tabular}{rcl}
 @{thm (lhs) L_rexp.simps(1)} & @{text "\<equiv>"} & @{thm (rhs) L_rexp.simps(1)}\\
 @{thm (rhs) L_rexp.simps(6)[where r="r"]}\\
 \end{tabular}
 \end{tabular}
 \end{center}
+\noindent
+Given a set of regular expressions @{text rs}, we will need the operation of generating
+a regular expression that matches the union of the languages of all expressions in @{text rs}. We only need the existence
+of such a regular expression; therefore we use Isabelle's @{const "fold_graph"} and Hilbert's
+@{text "\<epsilon>"} to define @{term "\<Uplus>rs"} which, roughly speaking, folds @{const ALT} over the
+set @{text rs} with @{const NULL} for the empty set. We can prove that for finite sets @{text rs}
+\begin{center}
+@{thm (lhs) folds_alt_simp}@{text "= \<Union> (\<calL> ` rs)"}
+\end{center}
+\noindent
+holds (whereby @{text "\<calL> ` rs"} stands for the
+image of the set @{text rs} under the function @{text "\<calL>"}).
 *}
 section {* Finite Partitions Imply Regularity of a Language *}

changeset 88	1436fc451bb9
parent 86	6457e668dee5
child 89	42af13d194c9