# HG changeset patch # User urbanc # Date 1313737077 0 # Node ID 9fbf6d9f85ae7aee9f004d6c34ae7da98e99f7fd # Parent 204856ef5573b2fcb02a3aa9854b4ee699e57966 added comments by Xingyuan diff -r 204856ef5573 -r 9fbf6d9f85ae Journal/Paper.thy --- a/Journal/Paper.thy Wed Aug 17 17:36:19 2011 +0000 +++ b/Journal/Paper.thy Fri Aug 19 06:57:57 2011 +0000 @@ -150,17 +150,17 @@ A popular choice for a theorem prover would be one based on Higher-Order Logic (HOL), for example HOL4, HOLlight or Isabelle/HOL. For the development - presented in this paper we will use the latter. HOL is a predicate calculus + presented in this paper we will use the Isabelle/HOL. HOL is a predicate calculus that allows quantification over predicate variables. Its type system is based on Church's Simple Theory of Types \cite{Church40}. Although many mathematical concepts can be conveniently expressed in HOL, there are some limitations that hurt badly, if one attempts a simple-minded formalisation of regular languages in it. - The typical approach to regular languages is to introduce finite - deterministic automata and then define everything in terms of them \cite{ - HopcroftUllman69,Kozen97}. For example, a regular language is normally - defined as: + The typical approach (for example \cite{HopcroftUllman69,Kozen97}) to + regular languages is to introduce finite deterministic automata and then + define everything in terms of them. For example, a regular language is + normally defined as: \begin{dfntn}\label{baddef} A language @{text A} is \emph{regular}, provided there is a @@ -315,8 +315,8 @@ larger formalisations of automata theory are carried out in Nuprl \cite{Constable00} and in Coq \cite{Filliatre97}. - Also one might consider automata theory and regular languages as a well-worn - stock subject where everything is crystal clear. However, paper proofs about + Also one might consider automata theory and regular languages as a meticulously + researched subject where everything is crystal clear. However, paper proofs about automata often involve subtle side-conditions which are easily overlooked, but which make formal reasoning rather painful. For example Kozen's proof of the Myhill-Nerode theorem requires that automata do not have inaccessible @@ -440,11 +440,11 @@ involving equivalence classes of languages. For this we will use Arden's Lemma (see for example \cite[Page 100]{Sakarovitch09}), which solves equations of the form @{term "X = A \ X \ B"} provided - @{term "[] \ A"}. However we will need the following `reverse' - version of Arden's Lemma (`reverse' in the sense of changing the order of @{term "A \ X"} to + @{term "[] \ A"}. However we will need the following `reversed' + version of Arden's Lemma (`reversed' in the sense of changing the order of @{term "A \ X"} to \mbox{@{term "X \ A"}}). - \begin{lmm}[Reverse Arden's Lemma]\label{arden}\mbox{}\\ + \begin{lmm}[Reversed Arden's Lemma]\label{arden}\mbox{}\\ If @{thm (prem 1) arden} then @{thm (lhs) arden} if and only if @{thm (rhs) arden}. @@ -456,7 +456,8 @@ we have @{term "A\ = A \ A\ \ {[]}"}, which is equal to @{term "A\ = A\ \ A \ {[]}"}. Adding @{text B} to both sides gives @{term "B \ A\ = B \ (A\ \ A \ {[]})"}, whose right-hand side - is equal to @{term "(B \ A\) \ A \ B"}. This completes this direction. + is equal to @{term "(B \ A\) \ A \ B"}. Applying the assumed equation + completes this direction. For the other direction we assume @{thm (lhs) arden}. By a simple induction on @{text n}, we can establish the property @@ -540,7 +541,6 @@ text {* \noindent - \footnote{Folklore: Henzinger (arden-DFA-regexp.pdf); Hofmann} The key definition in the Myhill-Nerode theorem is the \emph{Myhill-Nerode relation}, which states that w.r.t.~a language two strings are related, provided there is no distinguishing extension in this @@ -662,7 +662,7 @@ way, because the Myhill-Nerode relation determines the `direction' of the transitions---the successor `state' of an equivalence class @{text Y} can be reached by adding a character to the end of @{text Y}. This is also the - reason why we have to use our reverse version of Arden's Lemma.} + reason why we have to use our reversed version of Arden's Lemma.} In our running example we have the initial equational system \begin{equation}\label{exmpcs} @@ -794,7 +794,7 @@ \noindent - That means we eliminated the dependency of @{text "X\<^isub>3"} on the + That means we eliminated the recursive occurrence of @{text "X\<^isub>3"} on the right-hand side. Note we used the abbreviation @{text "\<^raw:\ensuremath{\bigplus}>{ATOM c\<^isub>1,\, ATOM c\<^isub>m}"} to stand for a regular expression that matches with every character. In @@ -1224,7 +1224,7 @@ \begin{proof} We set in \eqref{finiteimageD}, @{text f} to be @{text "X \ tag ` X"}. We have @{text "range f"} to be a subset of @{term "Pow (range tag)"}, which we know must be - finite by assumption. Now @{term "f (UNIV // =tag=)"} is a subset of @{text "range f"}, + finite by assumption. Now @{term "f ` (UNIV // =tag=)"} is a subset of @{text "range f"}, and so also finite. Injectivity amounts to showing that @{text "X = Y"} under the assumptions that @{text "X, Y \ "}~@{term "UNIV // =tag="} and @{text "f X = f Y"}. From the assumptions we obtain \mbox{@{text "x \ X"}} and @{text "y \ Y"} with @@ -1899,7 +1899,7 @@ Let us now return to our proof for the second direction in the Myhill-Nerode theorem. The point of the above calculations is to use - @{text "\<^raw:$\threesim$>\<^bsub>(\x. ders x r)\<^esub>"} as tagging-relation. + @{text "\<^raw:$\threesim$>\<^bsub>(\x. pders x r)\<^esub>"} as tagging-relation. \begin{proof}[Proof of Theorem~\ref{myhillnerodetwo} (second version)] @@ -1911,7 +1911,7 @@ \end{center} \noindent - which means the tagging-relation @{text "\<^raw:$\threesim$>\<^bsub>(\x. ders x r)\<^esub>"} refines @{term "\(lang r)"}. + which means the tagging-relation @{text "\<^raw:$\threesim$>\<^bsub>(\x. pders x r)\<^esub>"} refines @{term "\(lang r)"}. So we know by Lemma~\ref{fintwo}, \mbox{@{term "finite (UNIV // (\(lang r)))"}} holds if @{term "finite (UNIV // (=(\x. pders x r)=))"}. In order to establish the latter, we can use Lemma~\ref{finone} and show that the range of the @@ -1925,7 +1925,7 @@ \noindent Now the range of @{term "\x. pders x r"} is a subset of @{term "Pow (pders_lang UNIV r)"}, which we know is finite by Theorem~\ref{antimirov}. Consequently there - are only finitely many equivalence classes of @{text "\<^raw:$\threesim$>\<^bsub>(\x. ders x r)\<^esub>"}, + are only finitely many equivalence classes of @{text "\<^raw:$\threesim$>\<^bsub>(\x. pders x r)\<^esub>"}, which refines @{term "\(lang r)"}, and therefore we can again conclude the second part of the Myhill-Nerode theorem. \end{proof} @@ -2107,8 +2107,8 @@ algebraic method} used to convert a finite automaton to a regular expression \cite{Brzozowski64}. The close connection can be seen by considering the equivalence classes as the states of the minimal automaton for the regular - language. However there are some subtle differences. Since we identify - equivalence classes with the states of the automaton, then the most natural + language. However there are some subtle differences. Because our equivalence + classes (or correspondingly states) arise from the Myhil-Nerode Relation, the most natural choice is to characterise each state with the set of strings starting from the initial state leading up to that state. Usually, however, the states are characterised as the strings starting from that state leading to the @@ -2116,7 +2116,7 @@ equational system is set up. We have the $\lambda$-term on our `initial state', while Brzozowski has it on the terminal states. This means we also need to reverse the direction of Arden's Lemma. We have not found anything - in the literature about this way of proving the first direction of the + in the literature about our way of proving the first direction of the Myhill-Nerode theorem, but it appears to be folklore. We presented two proofs for the second direction of the Myhill-Nerode @@ -2151,7 +2151,7 @@ Theorem~\ref{antimirov}). However, since partial derivatives use sets of regular expressions, one needs to carefully analyse whether the resulting algorithm is still executable. Given the existing infrastructure for - executable sets in Isabelle/HOL, it should. + executable sets in Isabelle/HOL \cite{Haftmann09}, it should. Our formalisation of the Myhill-Nerode theorem consists of 780 lines of Isabelle/Isar code for the first direction and 460 for the second (the one diff -r 204856ef5573 -r 9fbf6d9f85ae Journal/document/root.bib --- a/Journal/document/root.bib Wed Aug 17 17:36:19 2011 +0000 +++ b/Journal/document/root.bib Fri Aug 19 06:57:57 2011 +0000 @@ -1,3 +1,13 @@ + + + +@PhdThesis{Haftmann09, + author = {F.~Haftmann}, + title = {{C}ode {G}eneration from {S}pecifications in {H}igher-{O}rder {L}ogic}, + school = {Technical University of Munich}, + year = {2009} +} + @article{Harper99, author = {R.~Harper}, title = {{P}roof-{D}irected {D}ebugging},