regexp: comparison Paper/Paper.thy

-:dc879cb59c9c
+:14b12b5de6d3
 pow ("_\<^bsup>_\<^esup>" [100, 100] 100) and
 Suc ("_+1" [100] 100) and
 quotient ("_ \<^raw:\ensuremath{\!\sslash\!}> _" [90, 90] 90) and
 REL ("\<approx>") and
 UPLUS ("_ \<^raw:\ensuremath{\uplus}> _" [90, 90] 90) and
-L ("L'(_')" [0] 101) and
+L ("\<^raw:\ensuremath{\cal{L}}>'(_')" [0] 101) and
 Lam ("\<lambda>'(_')" [100] 100) and
 Trn ("_, _" [100, 100] 100) and
 EClass ("\<lbrakk>_\<rbrakk>\<^bsub>_\<^esub>" [100, 100] 100) and
 transition ("_ \<^raw:\ensuremath{\stackrel{\text{>_\<^raw:}}{\Longmapsto}}> _" [100, 100, 100] 100)
 (*>*)
 regular languages are closed under complementation: one just has to exchange
 the accepting and non-accepting states in the corresponding automaton to
 obtain an automaton for the complement language.  The problem, however, lies with
 formalising such reasoning in a HOL-based theorem prover, in our case
 Isabelle/HOL. Automata are built up from states and transitions that
-need to be represented as graphs or matrices, neither
+need to be represented as graphs, matrices or functions, none
-of which can be defined as inductive datatype.\footnote{In some works
+of which can be defined as an inductive datatype.
-functions are used to represent state transitions, but also they are not
-inductive datatypes.} This means we have to build our own reasoning
+In the case of graphs and matrices, this means we have to build our own
-infrastructure for them, as neither Isabelle/HOL nor HOL4 nor HOLlight support
+reasoning infrastructure for them, as neither Isabelle/HOL nor HOL4 nor
-them with libraries.
+HOLlight support them with libraries. Even worse, reasoning about graphs and
+matrices can be a real hassle in HOL-based theorem provers.  Consider for
-Even worse, reasoning about graphs and matrices can be a real hassle in HOL-based
+example the operation of sequencing two automata, say $A_1$ and $A_2$, by
-theorem provers.  Consider for example the operation of sequencing
+connecting the accepting states of $A_1$ to the initial state of $A_2$:
-two automata, say $A_1$ and $A_2$, by connecting the
-accepting states of $A_1$ to the initial state of $A_2$:
 \begin{center}
 \begin{tabular}{ccc}
 \begin{tikzpicture}[scale=0.8]
 %\draw[step=2mm] (-1,-1) grid (1,1);
 \noindent
 On ``paper'' we can define the corresponding graph in terms of the disjoint
 union of the state nodes. Unfortunately in HOL, the definition for disjoint
 union, namely
+%
-\begin{center}
+\begin{equation}\label{disjointunion}
 @{term "UPLUS A\<^isub>1 A\<^isub>2 \<equiv> {(1, x) | x. x \<in> A\<^isub>1} \<union> {(2, y) | y. y \<in> A\<^isub>2}"}
-\end{center}
+\end{equation}
 \noindent
 changes the type---the disjoint union is not a set, but a set of pairs.
 Using this definition for disjoint unions means we do not have a single type for automata
 and hence will not be able to state properties about \emph{all}
 connecting two automata. This results in clunky proofs
 establishing that properties are invariant under renaming. Similarly,
 connecting two automata represented as matrices results in very ad hoc
 constructions, which are not pleasant to reason about.
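 \noindent
 To illustrate the type change caused by the disjoint union with a small
 example (the sets are chosen purely for illustration), take
 $A_1 = \{a, b\}$ and $A_2 = \{b, c\}$. Unfolding the definition gives
 \[
 A_1 \uplus A_2 \;=\; \{(1, a),\, (1, b),\, (2, b),\, (2, c)\}
 \]
 \noindent
 The shared element $b$ is kept apart by its tag, but the result is a set of
 pairs and therefore has a different type than $A_1$ and $A_2$.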
+Functions are much better supported in Isabelle/HOL, but they still lead to
+problems similar to those with graphs.  Composing two non-deterministic automata in parallel
+still poses the problem of how to implement disjoint unions. Nipkow \cite{Nipkow98}
+dismisses the option of using identities, because it leads to messy proofs. He
+opts for a variant of \eqref{disjointunion}, but writes
+\begin{quote}
+\it ``If the reader finds the above treatment in terms of bit lists revoltingly
+concrete, I cannot disagree.''
+\end{quote}
+\noindent
+Moreover, it is not so clear how to conveniently impose a finiteness condition
+upon functions in order to represent \emph{finite} automata. The best is
+probably to resort to more advanced reasoning frameworks, such as \emph{locales}.
 Because of these problems to do with representing automata, there seems
 to be no substantial formalisation of automata theory and regular languages
-carried out in a HOL-based theorem prover. We are only aware of the
+carried out in a HOL-based theorem prover. Nipkow establishes in
-large formalisation of automata theory in Nuprl \cite{Constable00} and
+\cite{Nipkow98} the link between regular expressions and automata in
-some smaller formalisations in Coq (for example \cite{Filliatre97}).
+the context of lexing. The only larger formalisations of automata theory
+are carried out in Nuprl \cite{Constable00} and in Coq (for example
-In this paper, we will not attempt to formalise automata theory, but take a completely
+\cite{Filliatre97}).
-different approach to regular languages. Instead of defining a regular language as one
-where there exists an automaton that recognises all strings of the language, we define
+In this paper, we will not attempt to formalise automata theory in
-a regular language as:
+Isabelle/HOL, but take a completely different approach to regular
+languages. Instead of defining a regular language as one where there exists
-\begin{definition}[A Regular Language]
+an automaton that recognises all strings of the language, we define a
+regular language as:
+\begin{definition}
 A language @{text A} is \emph{regular}, provided there is a regular expression that matches all
 strings of @{text "A"}.
 \end{definition}
 \noindent
 @{term "ALT r r"}\hspace{1.5mm}@{text"|"}\hspace{1.5mm}
 @{term "STAR r"}
 \end{center}
 \noindent
-The language matched by a regular expression is defined as usual:
+and the language matched by a regular expression is defined as:
 \begin{center}
 \begin{tabular}{c@ {\hspace{10mm}}c}
 \begin{tabular}{rcl}
 @{thm (lhs) L_rexp.simps(1)} & @{text "\<equiv>"} & @{thm (rhs) L_rexp.simps(1)}\\
 @{thm (rhs) L_rexp.simps(6)[where r="r"]}\\
 \end{tabular}
 \end{tabular}
 \end{center}
 *}
 section {* Finite Partitions Imply Regularity of a Language *}
 text {*
 @{text "X\<^isub>n"} & @{text "="} & @{text "(Y\<^isub>n\<^isub>1, CHAR c\<^isub>n\<^isub>1) + \<dots> + (Y\<^isub>n\<^isub>q, CHAR c\<^isub>n\<^isub>q)"}\\
 \end{tabular}
 \end{center}
 \noindent
-where the pairs @{text "(Y\<^isub>i\<^isub>j, r\<^isub>i\<^isub>j)"} stand for all transitions
+where the pairs @{text "(Y\<^isub>i\<^isub>j, CHAR c\<^isub>i\<^isub>j)"} stand for all transitions
-@{term "Y\<^isub>i\<^isub>j \<Turnstile>r\<^isub>i\<^isub>j\<Rightarrow> X\<^isub>i"}.  The term @{text "\<lambda>(EMPTY)"} acts as a marker for the equivalence
+@{term "Y\<^isub>i\<^isub>j \<Turnstile>(CHAR c\<^isub>i\<^isub>j)\<Rightarrow> X\<^isub>i"}.  The term @{text "\<lambda>(EMPTY)"} acts as a marker for the equivalence
 class containing @{text "[]"}. (Note that we mark, roughly speaking, the
 single ``initial'' state in the equational system, which is different from
 the method by Brzozowski \cite{Brzozowski64}, since for his purposes he needs to mark
 the ``terminal'' states.) Overloading the function @{text L} for the two kinds of terms in the
 equational system as follows

changeset 82	14b12b5de6d3
parent 79	bba9c80735f9
child 83	f438f4dbaada