diff -r 11c3c302fa2e -r 204856ef5573 Journal/Paper.thy --- a/Journal/Paper.thy Wed Aug 17 07:43:09 2011 +0000 +++ b/Journal/Paper.thy Wed Aug 17 17:36:19 2011 +0000 @@ -157,10 +157,10 @@ limitations that hurt badly, if one attempts a simple-minded formalisation of regular languages in it. - The typical approach to regular languages is to - introduce finite automata and then define everything in terms of them - \cite{ HopcroftUllman69,Kozen97}. For example, a regular language is - normally defined as: + The typical approach to regular languages is to introduce finite + deterministic automata and then define everything in terms of them \cite{ + HopcroftUllman69,Kozen97}. For example, a regular language is normally + defined as: \begin{dfntn}\label{baddef} A language @{text A} is \emph{regular}, provided there is a @@ -252,13 +252,14 @@ \noindent changes the type---the disjoint union is not a set, but a set of pairs. Using this definition for disjoint union means we do not have a - single type for the states of automata. As a result we will not be able to define a - regular language as one for which there exists an automaton that recognises - all its strings (Definition~\ref{baddef}). This is because we cannot make a - definition in HOL that is only polymorphic in the state type and there is no type - quantification available in HOL (unlike in Coq, for example).\footnote{Slind - already pointed out this problem in an email to the HOL4 mailing list on - 21st April 2005.} + single type for the states of automata. As a result we will not be able to + define in our fomalisation a regular language as one for which there exists + an automaton that recognises all its strings (Definition~\ref{baddef}). This + is because we cannot make a definition in HOL that is only polymorphic in + the state type, but not in the predicate for regularity; and there is no + type quantification available in HOL (unlike in Coq, for + example).\footnote{Slind already pointed out this problem in an email to the + HOL4 mailing list on 21st April 2005.} An alternative, which provides us with a single type for states of automata, is to give every state node an identity, for example a natural number, and @@ -268,12 +269,15 @@ matrices results in very adhoc constructions, which are not pleasant to reason about. - Functions are much better supported in Isabelle/HOL, but they still lead to similar - problems as with graphs. Composing, for example, two non-deterministic automata in parallel - requires also the formalisation of disjoint unions. Nipkow \cite{Nipkow98} - dismisses for this the option of using identities, because it leads according to - him to ``messy proofs''. Since he does not need to define what regular - languages are, Nipkow opts for a variant of \eqref{disjointunion} using bit lists, but writes + Functions are much better supported in Isabelle/HOL, but they still lead to + similar problems as with graphs. Composing, for example, two + non-deterministic automata in parallel requires also the formalisation of + disjoint unions. Nipkow \cite{Nipkow98} dismisses for this the option of + using identities, because it leads according to him to ``messy + proofs''. Since he does not need to define what regular languages are, + Nipkow opts for a variant of \eqref{disjointunion} using bit lists, but + writes\smallskip + \begin{quote} \it% @@ -281,10 +285,10 @@ `` & All lemmas appear obvious given a picture of the composition of automata\ldots Yet their proofs require a painful amount of detail.'' \end{tabular} - \end{quote} + \end{quote}\smallskip \noindent - and + and\smallskip \begin{quote} \it% @@ -292,7 +296,7 @@ `` & If the reader finds the above treatment in terms of bit lists revoltingly concrete, I cannot disagree. A more abstract approach is clearly desirable.'' \end{tabular} - \end{quote} + \end{quote}\smallskip \noindent @@ -1315,10 +1319,10 @@ \begin{center} \begin{tabular}{c} \scalebox{1}{ - \begin{tikzpicture} - \node[draw,minimum height=3.8ex] (x) { $\hspace{4.8em}@{text x}\hspace{4.8em}$ }; - \node[draw,minimum height=3.8ex, right=-0.03em of x] (za) { $\hspace{0.6em}@{text "z\<^isub>p"}\hspace{0.6em}$ }; - \node[draw,minimum height=3.8ex, right=-0.03em of za] (zza) { $\hspace{2.6em}@{text "z\<^isub>s"}\hspace{2.6em}$ }; + \begin{tikzpicture}[fill=gray!20] + \node[draw,minimum height=3.8ex, fill] (x) { $\hspace{4.8em}@{text x}\hspace{4.8em}$ }; + \node[draw,minimum height=3.8ex, right=-0.03em of x, fill] (za) { $\hspace{0.6em}@{text "z\<^isub>p"}\hspace{0.6em}$ }; + \node[draw,minimum height=3.8ex, right=-0.03em of za, fill] (zza) { $\hspace{2.6em}@{text "z\<^isub>s"}\hspace{2.6em}$ }; \draw[decoration={brace,transform={yscale=3}},decorate] (x.north west) -- ($(za.north west)+(0em,0em)$) @@ -1342,10 +1346,10 @@ \end{tikzpicture}} \\[2mm] \scalebox{1}{ - \begin{tikzpicture} - \node[draw,minimum height=3.8ex] (xa) { $\hspace{3em}@{text "x\<^isub>p"}\hspace{3em}$ }; - \node[draw,minimum height=3.8ex, right=-0.03em of xa] (xxa) { $\hspace{0.2em}@{text "x\<^isub>s"}\hspace{0.2em}$ }; - \node[draw,minimum height=3.8ex, right=-0.03em of xxa] (z) { $\hspace{5em}@{text z}\hspace{5em}$ }; + \begin{tikzpicture}[fill=gray!20] + \node[draw,minimum height=3.8ex, fill] (xa) { $\hspace{3em}@{text "x\<^isub>p"}\hspace{3em}$ }; + \node[draw,minimum height=3.8ex, right=-0.03em of xa, fill] (xxa) { $\hspace{0.2em}@{text "x\<^isub>s"}\hspace{0.2em}$ }; + \node[draw,minimum height=3.8ex, right=-0.03em of xxa, fill] (z) { $\hspace{5em}@{text z}\hspace{5em}$ }; \draw[decoration={brace,transform={yscale=3}},decorate] (xa.north west) -- ($(xxa.north east)+(0em,0em)$) @@ -1458,11 +1462,11 @@ \begin{center} \scalebox{1}{ - \begin{tikzpicture} - \node[draw,minimum height=3.8ex] (xa) { $\hspace{4em}@{text "x\<^bsub>pmax\<^esub>"}\hspace{4em}$ }; - \node[draw,minimum height=3.8ex, right=-0.03em of xa] (xxa) { $\hspace{0.5em}@{text "x\<^bsub>s\<^esub>"}\hspace{0.5em}$ }; - \node[draw,minimum height=3.8ex, right=-0.03em of xxa] (za) { $\hspace{2em}@{text "z\<^isub>a"}\hspace{2em}$ }; - \node[draw,minimum height=3.8ex, right=-0.03em of za] (zb) { $\hspace{7em}@{text "z\<^isub>b"}\hspace{7em}$ }; + \begin{tikzpicture}[fill=gray!20] + \node[draw,minimum height=3.8ex, fill] (xa) { $\hspace{4em}@{text "x\<^bsub>pmax\<^esub>"}\hspace{4em}$ }; + \node[draw,minimum height=3.8ex, right=-0.03em of xa, fill] (xxa) { $\hspace{0.5em}@{text "x\<^bsub>s\<^esub>"}\hspace{0.5em}$ }; + \node[draw,minimum height=3.8ex, right=-0.03em of xxa, fill] (za) { $\hspace{2em}@{text "z\<^isub>a"}\hspace{2em}$ }; + \node[draw,minimum height=3.8ex, right=-0.03em of za, fill] (zb) { $\hspace{7em}@{text "z\<^isub>b"}\hspace{7em}$ }; \draw[decoration={brace,transform={yscale=3}},decorate] (xa.north west) -- ($(xxa.north east)+(0em,0em)$) @@ -1654,13 +1658,11 @@ follows. \begin{center} - \begin{tabular}{c@ {\hspace{10mm}}c} + \begin{tabular}{c} \begin{tabular}{@ {}l@ {\hspace{1mm}}c@ {\hspace{1.5mm}}l@ {}} @{thm (lhs) nullable.simps(1)} & @{text "\"} & @{thm (rhs) nullable.simps(1)}\\ @{thm (lhs) nullable.simps(2)} & @{text "\"} & @{thm (rhs) nullable.simps(2)}\\ @{thm (lhs) nullable.simps(3)} & @{text "\"} & @{thm (rhs) nullable.simps(3)}\\ - \end{tabular} & - \begin{tabular}{@ {}l@ {\hspace{1mm}}c@ {\hspace{1.5mm}}l@ {}} @{thm (lhs) nullable.simps(4)[where ?r1.0="r\<^isub>1" and ?r2.0="r\<^isub>2"]} & @{text "\"} & @{thm (rhs) nullable.simps(4)[where ?r1.0="r\<^isub>1" and ?r2.0="r\<^isub>2"]}\\ @{thm (lhs) nullable.simps(5)[where ?r1.0="r\<^isub>1" and ?r2.0="r\<^isub>2"]} @@ -1685,8 +1687,15 @@ \noindent The importance of this fact in the context of the Myhill-Nerode theorem is that we can use \eqref{mhders} and \eqref{Dersders} in order to - establish that @{term "x \(lang r) y"} is equivalent to - \mbox{@{term "lang (ders x r) = lang (ders y r)"}}. Hence + establish that + + \begin{center} + @{term "x \(lang r) y"} \hspace{4mm}if and only if\hspace{4mm} + @{term "lang (ders x r) = lang (ders y r)"}. + \end{center} + + \noindent + holds and hence \begin{equation} @{term "x \(lang r) y"}\hspace{4mm}\mbox{provided}\hspace{4mm}@{term "ders x r = ders y r"} @@ -1694,7 +1703,7 @@ \noindent - which means the right-hand side (seen as a relation) refines the Myhill-Nerode + This means the right-hand side (seen as a relation) refines the Myhill-Nerode relation. Consequently, we can use @{text "\<^raw:$\threesim$>\<^bsub>(\x. ders x r)\<^esub>"} as a tagging-relation. However, in order to be useful for the second part of the @@ -1840,7 +1849,7 @@ of derivatives already built in. Antimirov also proved that for every language and regular expression - there are only finitely many partial derivatives, whereby the partial + there are only finitely many partial derivatives, whereby the set of partial derivatives of @{text r} w.r.t.~a language @{text A} is defined as \begin{equation}\label{Pdersdef} @@ -1853,7 +1862,7 @@ \end{thrm} \noindent - Antimirov's proof first shows this theorem for the language @{term UNIV1}, + Antimirov's proof first establishes this theorem for the language @{term UNIV1}, which is the set of all non-empty strings. For this he proves: \begin{equation}\label{pdersunivone} @@ -1885,10 +1894,10 @@ and for all languages @{text "A"}, @{term "pders_lang A r"} is a subset of @{term "pders_lang UNIV r"}. Since we follow Antimirov's proof quite closely in our formalisation (only the last two cases of - \eqref{pdersunivone} involve some non-routine induction argument), we omit + \eqref{pdersunivone} involve some non-routine induction arguments), we omit the details. - Let us now return to our proof about the second direction in the Myhill-Nerode + Let us now return to our proof for the second direction in the Myhill-Nerode theorem. The point of the above calculations is to use @{text "\<^raw:$\threesim$>\<^bsub>(\x. ders x r)\<^esub>"} as tagging-relation. @@ -1915,14 +1924,14 @@ \noindent Now the range of @{term "\x. pders x r"} is a subset of @{term "Pow (pders_lang UNIV r)"}, - which we know is finite by Theorem~\ref{antimirov}. This means there + which we know is finite by Theorem~\ref{antimirov}. Consequently there are only finitely many equivalence classes of @{text "\<^raw:$\threesim$>\<^bsub>(\x. ders x r)\<^esub>"}, - which refines @{term "\(lang r)"}, and consequently we can again conclude the + which refines @{term "\(lang r)"}, and therefore we can again conclude the second part of the Myhill-Nerode theorem. \end{proof} *} -section {* Closure Properties of Regular Languages *} +section {* Closure Properties of Regular Languages\label{closure} *} text {* \noindent @@ -1960,7 +1969,8 @@ \end{center} \noindent - Closure of regular languages under reversal, that is + Since all finite languages are regular, then by closure under complement also + all co-finite languages. Closure of regular languages under reversal, that is \begin{center} @{text "A\<^bsup>-1\<^esup> \ {s\<^bsup>-1\<^esup> | s \ A}"} @@ -2051,10 +2061,10 @@ proved using the Myhill-Nerode theorem. Our insistence on regular expressions for proving the Myhill-Nerode theorem - arose from the limitations of HOL, on which the popular theorem provers HOL4, - HOLlight and Isabelle/HOL are based. In order to guarantee consistency, - formalisations can extend HOL with definitions that introduce a new concept in - only terms of already existing notions. A convenient definition for automata + arose from the limitations of HOL, used in the popular theorem provers HOL4, + HOLlight and Isabelle/HOL. In order to guarantee consistency, + formalisations in HOL can only extend the logic with definitions that introduce a new concept in + terms of already existing notions. A convenient definition for automata (based on graphs) uses a polymorphic type for the state nodes. This allows us to use the standard operation for disjoint union whenever we need to compose two automata. Unfortunately, we cannot use such a polymorphic definition @@ -2062,14 +2072,14 @@ over set of strings). Consider the following attempt: \begin{center} - @{text "is_regular A \ \M(\). is_finite_automata (M) \ \(M) = A"} + @{text "is_regular A \ \M(\). is_dfa (M) \ \(M) = A"} \end{center} \noindent In this definifion, the definiens is polymorphic in the type of the automata - @{text "M"} (indicated by the @{text "\"}), but the definiendum @{text - "is_regular"} is not. Such definitions are excluded in HOL, because they can - lead easily to inconsistencies (see \cite{PittsHOL4} for a simple + @{text "M"} (indicated by dependency on @{text "\"}), but the definiendum + @{text "is_regular"} is not. Such definitions are excluded in HOL, because + they can lead easily to inconsistencies (see \cite{PittsHOL4} for a simple example). Also HOL does not contain type-quantifiers which would allow us to get rid of the polymorphism by quantifying over the type-variable @{text "\"}. Therefore when defining regularity in terms of automata, the only @@ -2079,8 +2089,10 @@ such states. This makes formalisations quite fiddly and rather unpleasant. Regular expressions proved much more convenient for reasoning in HOL since they can be defined as inductive datatype and a reasoning - infrastructure comes for free. We showed they can be used for establishing - the central result in regular language theory---the Myhill-Nerode theorem. + infrastructure comes for free. The definition of regularity in terms of + regular expressions poses no problem at all for HOL. We showed in this + paper that they can be used for establishing the central result in regular + language theory---the Myhill-Nerode theorem. While regular expressions are convenient, they have some limitations. One is that there seems to be no method of calculating a minimal regular expression @@ -2141,20 +2153,21 @@ algorithm is still executable. Given the existing infrastructure for executable sets in Isabelle/HOL, it should. - Our formalisation of the Myhill-Nerode theorem consists of 780 lines of Isabelle/Isar code for the first direction and 460 for the second (the one - based on tagging functions), plus around 300 lines of standard material + based on tagging-functions), plus around 300 lines of standard material about regular languages. The formalisation of derivatives and partial derivatives shown in Section~\ref{derivatives} consists of 390 lines of - code. The algorithm for solving equational systems, which we used in the - first direction, is conceptually relatively simple. Still the use of sets - over which the algorithm operates means it is not as easy to formalise as - one might hope. However, it seems sets cannot be avoided since the `input' - of the algorithm consists of equivalence classes and we cannot see how to - reformulate the theory so that we can use lists. Lists would be much easier - to reason about, since we can define functions over them by recursion. For - sets we have to use set-comprehensions, which is slightly unwieldy. + code. The closure properties in Section~\ref{closure} can be established in + 190 lines of code. The algorithm for solving equational systems, which we + used in the first direction, is conceptually relatively simple. Still the + use of sets over which the algorithm operates means it is not as easy to + formalise as one might hope. However, it seems sets cannot be avoided since + the `input' of the algorithm consists of equivalence classes and we cannot + see how to reformulate the theory so that we can use lists. Lists would be + much easier to reason about, since we can define functions over them by + recursion. For sets we have to use set-comprehensions, which is slightly + unwieldy. While our formalisation might appear large, it should be seen in the context of the work done by Constable at al \cite{Constable00} who