--- a/Paper/Paper.thy Fri Feb 18 14:26:23 2011 +0000
+++ b/Paper/Paper.thy Fri Feb 18 15:06:06 2011 +0000
@@ -45,7 +45,7 @@
Regular languages are an important and well-understood subject in Computer
Science, with many beautiful theorems and many useful algorithms. There is a
wide range of textbooks on this subject, many of which are aimed at students
- and contain very detailed ``pencil-and-paper'' proofs
+ and contain very detailed `pencil-and-paper' proofs
(e.g.~\cite{Kozen97}). It seems natural to exercise theorem provers by
formalising the theorems and by verifying formally the algorithms.
@@ -58,7 +58,7 @@
the accepting and non-accepting states in the corresponding automaton to
obtain an automaton for the complement language. The problem, however, lies with
formalising such reasoning in a HOL-based theorem prover, in our case
- Isabelle/HOL. Automata are build up from states and transitions that
+ Isabelle/HOL. Automata are built up from states and transitions that
need to be represented as graphs, matrices or functions, none
of which can be defined as inductive datatype.
@@ -124,7 +124,7 @@
\end{center}
\noindent
- On ``paper'' we can define the corresponding graph in terms of the disjoint
+ On `paper' we can define the corresponding graph in terms of the disjoint
union of the state nodes. Unfortunately in HOL, the standard definition for disjoint
union, namely
%
@@ -181,13 +181,13 @@
Because of these problems to do with representing automata, there seems
to be no substantial formalisation of automata theory and regular languages
- carried out in HOL-based theorem provers. Nipkow establishes in
- \cite{Nipkow98} the link between regular expressions and automata in
- the context of lexing. Berghofer and Reiter formalise automata working over
- bit strings in the context of Presburger arithmetic \cite{BerghoferReiter09}.
+ carried out in HOL-based theorem provers. Nipkow \cite{Nipkow98} establishes
+ the link between regular expressions and automata in
+ the context of lexing. Berghofer and Reiter \cite{BerghoferReiter09}
+ formalise automata working over
+ bit strings in the context of Presburger arithmetic.
The only larger formalisations of automata theory
- are carried out in Nuprl \cite{Constable00} and in Coq (for example
- \cite{Filliatre97}).
+ are carried out in Nuprl \cite{Constable00} and in Coq \cite{Filliatre97}.
In this paper, we will not attempt to formalise automata theory in
Isabelle/HOL, but take a completely different approach to regular
@@ -268,14 +268,14 @@
Central to our proof will be the solution of equational systems
- involving equivalence classes of languages. For this we will use Arden's lemma \cite{Brzozowski64}
+ involving equivalence classes of languages. For this we will use Arden's lemma \cite{Brzozowski64},
which solves equations of the form @{term "X = A ;; X \<union> B"} provided
- @{term "[] \<notin> A"}. However we will need the following ``reverse''
+ @{term "[] \<notin> A"}. However we will need the following `reverse'
version of Arden's lemma.
\begin{lemma}[Reverse Arden's Lemma]\label{arden}\mbox{}\\
If @{thm (prem 1) arden} then
- @{thm (lhs) arden} has the unique solution
+ @{thm (lhs) arden} if and only if
@{thm (rhs) arden}.
\end{lemma}
@@ -365,7 +365,7 @@
The key definition in the Myhill-Nerode theorem is the
\emph{Myhill-Nerode relation}, which states that w.r.t.~a language two
strings are related, provided there is no distinguishing extension in this
- language. This can be defined as:
+ language. This can be defined as tertiary relation:
\begin{definition}[Myhill-Nerode Relation]\mbox{}\\
@{thm str_eq_def[simplified str_eq_rel_def Pair_Collect]}
@@ -453,11 +453,11 @@
equivalence classes and only finitely many characters. The term @{text
"\<lambda>(EMPTY)"} in the first equation acts as a marker for the equivalence class
containing @{text "[]"}.\footnote{Note that we mark, roughly speaking, the
- single ``initial'' state in the equational system, which is different from
+ single `initial' state in the equational system, which is different from
the method by Brzozowski \cite{Brzozowski64}, where he marks the
- ``terminal'' states. We are forced to set up the equational system in our
- way, because the Myhill-Nerode relation determines the ``direction'' of the
- transitions. The successor ``state'' of an equivalence class @{text Y} can
+ `terminal' states. We are forced to set up the equational system in our
+ way, because the Myhill-Nerode relation determines the `direction' of the
+ transitions. The successor `state' of an equivalence class @{text Y} can
be reached by adding characters to the end of @{text Y}. This is also the
reason why we have to use our reverse version of Arden's lemma.}
Overloading the function @{text \<calL>} for the two kinds of terms in the
@@ -569,7 +569,7 @@
\begin{lemma}\label{ardenable}
Given an equation @{text "X = rhs"}.
If @{text "X = \<Union>\<calL> ` rhs"},
- @{thm (prem 2) Arden_keeps_eq} and
+ @{thm (prem 2) Arden_keeps_eq}, and
@{thm (prem 3) Arden_keeps_eq}, then
@{text "X = \<Union>\<calL> ` (Arden X rhs)"}
\end{lemma}
@@ -640,11 +640,11 @@
\noindent
We are not concerned here with the definition of this operator
- (see \cite{BerghoferNipkow00}), but note that we eliminate
+ (see Berghofer and Nipkow \cite{BerghoferNipkow00}), but note that we eliminate
in each @{const Iter}-step a single equation, and therefore
have a well-founded termination order by taking the cardinality
of the equational system @{text ES}. This enables us to prove
- properties about our definition of @{const Solve} when we ``call'' it with
+ properties about our definition of @{const Solve} when we `call' it with
the equivalence class @{text X} and the initial equational system
@{term "Init (UNIV // \<approx>A)"} from
\eqref{initcs} using the principle:
@@ -690,7 +690,7 @@
\noindent
The first two ensure that the equational system is always finite (number of equations
- and number of terms in each equation); the second makes sure the ``meaning'' of the
+ and number of terms in each equation); the second makes sure the `meaning' of the
equations is preserved under our transformations. The other properties are a bit more
technical, but are needed to get our proof through. Distinctness states that every
equation in the system is distinct. Ardenable ensures that we can always
@@ -880,8 +880,8 @@
text {*
In this paper we took the view that a regular language is one where there
- exists a regular expression that matches all its strings. Regular
- expressions can be conveniently defined as a datatype in a HOL-based theorem
+ exists a regular expression that matches all of its strings. Regular
+ expressions can conveniently be defined as a datatype in a HOL-based theorem
prover. For us it was therefore interesting to find out how far we can push
this point of view.
@@ -940,7 +940,7 @@
state leading up to that state. Usually, however, the states are characterised as the
ones starting from that state leading to the terminal states. The first
choice has consequences how the initial equational system is set up. We have
- the $\lambda$-term on our ``initial state'', while Brzozowski has it on the
+ the $\lambda$-term on our `initial state', while Brzozowski has it on the
terminal states. This means we also need to reverse the direction of Arden's
lemma.
@@ -964,7 +964,7 @@
limitations. One is that there seems to be no notion of a minimal regular
expression, like there is for automata. For an implementation of a simple
regular expression matcher, whose correctness has been formally
- established, we refer the reader to \cite{OwensSlind08}.
+ established, we refer the reader to Owens and Slind \cite{OwensSlind08}.
*}
--- a/Paper/document/root.tex Fri Feb 18 14:26:23 2011 +0000
+++ b/Paper/document/root.tex Fri Feb 18 15:06:06 2011 +0000
@@ -38,8 +38,8 @@
\begin{abstract}
There are numerous textbooks on regular languages. Nearly all of them
introduce the subject by describing finite automata and only mentioning on the
-side a connection with regular expressions. Unfortunately, automata are a
-hassle for formalisations in HOL-based theorem provers. The reason is that
+side a connection with regular expressions. Unfortunately, automata are difficult
+to formalise in HOL-based theorem provers. The reason is that
they need to be represented as graphs, matrices or functions, none of which
are inductive datatypes. Also convenient operations for disjoint unions of
graphs and functions are not easily formalisiable in HOL. In contrast, regular