regexp: changeset 115:c5f138b5fc88

--- a/Paper/Paper.thy	Fri Feb 18 14:26:23 2011 +0000
+++ b/Paper/Paper.thy	Fri Feb 18 15:06:06 2011 +0000
@@ -45,7 +45,7 @@
   Regular languages are an important and well-understood subject in Computer
   Science, with many beautiful theorems and many useful algorithms. There is a
   wide range of textbooks on this subject, many of which are aimed at students
-  and contain very detailed ``pencil-and-paper'' proofs
+  and contain very detailed `pencil-and-paper' proofs
   (e.g.~\cite{Kozen97}). It seems natural to exercise theorem provers by
   formalising the theorems and by verifying formally the algorithms.
 
@@ -58,7 +58,7 @@
   the accepting and non-accepting states in the corresponding automaton to
   obtain an automaton for the complement language.  The problem, however, lies with
   formalising such reasoning in a HOL-based theorem prover, in our case
-  Isabelle/HOL. Automata are build up from states and transitions that 
+  Isabelle/HOL. Automata are built up from states and transitions that 
   need to be represented as graphs, matrices or functions, none
   of which can be defined as inductive datatype. 
 
@@ -124,7 +124,7 @@
   \end{center}
 
   \noindent
-  On ``paper'' we can define the corresponding graph in terms of the disjoint 
+  On `paper' we can define the corresponding graph in terms of the disjoint 
   union of the state nodes. Unfortunately in HOL, the standard definition for disjoint 
   union, namely 
   %
@@ -181,13 +181,13 @@
 
   Because of these problems to do with representing automata, there seems
   to be no substantial formalisation of automata theory and regular languages 
-  carried out in HOL-based theorem provers. Nipkow establishes in 
-  \cite{Nipkow98} the link between regular expressions and automata in
-  the context of lexing. Berghofer and Reiter formalise automata working over 
-  bit strings in the context of Presburger arithmetic \cite{BerghoferReiter09}.
+  carried out in HOL-based theorem provers. Nipkow  \cite{Nipkow98} establishes
+  the link between regular expressions and automata in
+  the context of lexing. Berghofer and Reiter \cite{BerghoferReiter09} 
+  formalise automata working over 
+  bit strings in the context of Presburger arithmetic.
   The only larger formalisations of automata theory 
-  are carried out in Nuprl \cite{Constable00} and in Coq (for example 
-  \cite{Filliatre97}).
+  are carried out in Nuprl \cite{Constable00} and in Coq \cite{Filliatre97}.
   
   In this paper, we will not attempt to formalise automata theory in
   Isabelle/HOL, but take a completely different approach to regular
@@ -268,14 +268,14 @@
 
 
   Central to our proof will be the solution of equational systems
-  involving equivalence classes of languages. For this we will use Arden's lemma \cite{Brzozowski64}
+  involving equivalence classes of languages. For this we will use Arden's lemma \cite{Brzozowski64},
   which solves equations of the form @{term "X = A ;; X \<union> B"} provided
-  @{term "[] \<notin> A"}. However we will need the following ``reverse'' 
+  @{term "[] \<notin> A"}. However we will need the following `reverse' 
   version of Arden's lemma.
 
   \begin{lemma}[Reverse Arden's Lemma]\label{arden}\mbox{}\\
   If @{thm (prem 1) arden} then
-  @{thm (lhs) arden} has the unique solution
+  @{thm (lhs) arden} if and only if
   @{thm (rhs) arden}.
   \end{lemma}
 
@@ -365,7 +365,7 @@
   The key definition in the Myhill-Nerode theorem is the
   \emph{Myhill-Nerode relation}, which states that w.r.t.~a language two 
   strings are related, provided there is no distinguishing extension in this
-  language. This can be defined as:
+  language. This can be defined as tertiary relation:
 
   \begin{definition}[Myhill-Nerode Relation]\mbox{}\\
   @{thm str_eq_def[simplified str_eq_rel_def Pair_Collect]}
@@ -453,11 +453,11 @@
   equivalence classes and only finitely many characters.  The term @{text
   "\<lambda>(EMPTY)"} in the first equation acts as a marker for the equivalence class
   containing @{text "[]"}.\footnote{Note that we mark, roughly speaking, the
-  single ``initial'' state in the equational system, which is different from
+  single `initial' state in the equational system, which is different from
   the method by Brzozowski \cite{Brzozowski64}, where he marks the
-  ``terminal'' states. We are forced to set up the equational system in our
-  way, because the Myhill-Nerode relation determines the ``direction'' of the
-  transitions. The successor ``state'' of an equivalence class @{text Y} can
+  `terminal' states. We are forced to set up the equational system in our
+  way, because the Myhill-Nerode relation determines the `direction' of the
+  transitions. The successor `state' of an equivalence class @{text Y} can
   be reached by adding characters to the end of @{text Y}. This is also the
   reason why we have to use our reverse version of Arden's lemma.}
   Overloading the function @{text \<calL>} for the two kinds of terms in the
@@ -569,7 +569,7 @@
   \begin{lemma}\label{ardenable}
   Given an equation @{text "X = rhs"}.
   If @{text "X = \<Union>\<calL> ` rhs"},
-  @{thm (prem 2) Arden_keeps_eq} and
+  @{thm (prem 2) Arden_keeps_eq}, and
   @{thm (prem 3) Arden_keeps_eq}, then
   @{text "X = \<Union>\<calL> ` (Arden X rhs)"}
   \end{lemma}
@@ -640,11 +640,11 @@
 
   \noindent
   We are not concerned here with the definition of this operator
-  (see \cite{BerghoferNipkow00}), but note that we eliminate
+  (see Berghofer and Nipkow \cite{BerghoferNipkow00}), but note that we eliminate
   in each @{const Iter}-step a single equation, and therefore 
   have a well-founded termination order by taking the cardinality 
   of the equational system @{text ES}. This enables us to prove
-  properties about our definition of @{const Solve} when we ``call'' it with
+  properties about our definition of @{const Solve} when we `call' it with
   the equivalence class @{text X} and the initial equational system 
   @{term "Init (UNIV // \<approx>A)"} from
   \eqref{initcs} using the principle:
@@ -690,7 +690,7 @@
  
   \noindent
   The first two ensure that the equational system is always finite (number of equations
-  and number of terms in each equation); the second makes sure the ``meaning'' of the 
+  and number of terms in each equation); the second makes sure the `meaning' of the 
   equations is preserved under our transformations. The other properties are a bit more
   technical, but are needed to get our proof through. Distinctness states that every
   equation in the system is distinct. Ardenable ensures that we can always
@@ -880,8 +880,8 @@
 
 text {*
   In this paper we took the view that a regular language is one where there
-  exists a regular expression that matches all its strings. Regular
-  expressions can be conveniently defined as a datatype in a HOL-based theorem
+  exists a regular expression that matches all of its strings. Regular
+  expressions can conveniently be defined as a datatype in a HOL-based theorem
   prover. For us it was therefore interesting to find out how far we can push
   this point of view. 
 
@@ -940,7 +940,7 @@
   state leading up to that state. Usually, however, the states are characterised as the
   ones starting from that state leading to the terminal states.  The first
   choice has consequences how the initial equational system is set up. We have
-  the $\lambda$-term on our ``initial state'', while Brzozowski has it on the
+  the $\lambda$-term on our `initial state', while Brzozowski has it on the
   terminal states. This means we also need to reverse the direction of Arden's
   lemma.
 
@@ -964,7 +964,7 @@
   limitations. One is that there seems to be no notion of a minimal regular
   expression, like there is for automata. For an implementation of a simple
   regular expression matcher, whose correctness has been formally
-  established, we refer the reader to \cite{OwensSlind08}.
+  established, we refer the reader to Owens and Slind \cite{OwensSlind08}.
 
 *}

--- a/Paper/document/root.tex	Fri Feb 18 14:26:23 2011 +0000
+++ b/Paper/document/root.tex	Fri Feb 18 15:06:06 2011 +0000
@@ -38,8 +38,8 @@
 \begin{abstract} 
 There are numerous textbooks on regular languages. Nearly all of them
 introduce the subject by describing finite automata and only mentioning on the
-side a connection with regular expressions. Unfortunately, automata are a
-hassle for formalisations in HOL-based theorem provers. The reason is that
+side a connection with regular expressions. Unfortunately, automata are difficult
+to formalise in HOL-based theorem provers. The reason is that
 they need to be represented as graphs, matrices or functions, none of which
 are inductive datatypes. Also convenient operations for disjoint unions of
 graphs and functions are not easily formalisiable in HOL. In contrast, regular

author	urbanc
	Fri, 18 Feb 2011 15:06:06 +0000
changeset 115	c5f138b5fc88
parent 114	c5eb5f3065ae
child 116	342983676c8f

Paper/Paper.thy		file \| annotate \| diff \| comparison \| revisions
Paper/document/root.tex		file \| annotate \| diff \| comparison \| revisions