Journal/Paper.thy
       
     1 (*<*)
       
     2 theory Paper
       
     3 imports "../Derivs" "~~/src/HOL/Library/LaTeXsugar" 
       
     4 begin
       
     5 
       
     6 declare [[show_question_marks = false]]
       
     7 
       
     8 consts
       
     9  REL :: "(string \<times> string) \<Rightarrow> bool"
       
    10  UPLUS :: "'a set \<Rightarrow> 'a set \<Rightarrow> (nat \<times> 'a) set"
       
    11 
       
    12 abbreviation
       
    13   "EClass x R \<equiv> R `` {x}"
       
    14 
       
    15 abbreviation 
       
    16   "Append_rexp2 r_itm r \<equiv> Append_rexp r r_itm"
       
    17 
       
    18 
       
    19 notation (latex output)
       
    20   str_eq_rel ("\<approx>\<^bsub>_\<^esub>") and
       
    21   str_eq ("_ \<approx>\<^bsub>_\<^esub> _") and
       
    22   Seq (infixr "\<cdot>" 100) and
       
    23   Star ("_\<^bsup>\<star>\<^esup>") and
       
    24   pow ("_\<^bsup>_\<^esup>" [100, 100] 100) and
       
    25   Suc ("_+1" [100] 100) and
       
    26   quotient ("_ \<^raw:\ensuremath{\!\sslash\!}> _" [90, 90] 90) and
       
    27   REL ("\<approx>") and
       
    28   UPLUS ("_ \<^raw:\ensuremath{\uplus}> _" [90, 90] 90) and
       
    29   L ("\<^raw:\ensuremath{\cal{L}}>'(_')" [0] 101) and
       
    30   Lam ("\<lambda>'(_')" [100] 100) and 
       
    31   Trn ("'(_, _')" [100, 100] 100) and 
       
    32   EClass ("\<lbrakk>_\<rbrakk>\<^bsub>_\<^esub>" [100, 100] 100) and
       
    33   transition ("_ \<^raw:\ensuremath{\stackrel{\text{>_\<^raw:}}{\Longmapsto}}> _" [100, 100, 100] 100) and
       
    34   Setalt ("\<^raw:\ensuremath{\bigplus}>_" [1000] 999) and
       
    35   Append_rexp2 ("_ \<^raw:\ensuremath{\triangleleft}> _" [100, 100] 100) and
       
    36   Append_rexp_rhs ("_ \<^raw:\ensuremath{\triangleleft}> _" [100, 100] 50) and
       
    37   uminus ("\<^raw:\ensuremath{\overline{>_\<^raw:}}>" [100] 100) and
       
    38   tag_str_ALT ("tag\<^isub>A\<^isub>L\<^isub>T _ _" [100, 100] 100) and
       
    39   tag_str_ALT ("tag\<^isub>A\<^isub>L\<^isub>T _ _ _" [100, 100, 100] 100) and
       
    40   tag_str_SEQ ("tag\<^isub>S\<^isub>E\<^isub>Q _ _" [100, 100] 100) and
       
    41   tag_str_SEQ ("tag\<^isub>S\<^isub>E\<^isub>Q _ _ _" [100, 100, 100] 100) and
       
    42   tag_str_STAR ("tag\<^isub>S\<^isub>T\<^isub>A\<^isub>R _" [100] 100) and
       
    43   tag_str_STAR ("tag\<^isub>S\<^isub>T\<^isub>A\<^isub>R _ _" [100, 100] 100)
       
    44 lemma meta_eq_app:
       
    45   shows "f \<equiv> \<lambda>x. g x \<Longrightarrow> f x \<equiv> g x"
       
    46   by auto
       
    47 
       
    48 (*>*)
       
    49 
       
    50 
       
    51 section {* Introduction *}
       
    52 
       
    53 text {*
       
    54   Regular languages are an important and well-understood subject in Computer
       
    55   Science, with many beautiful theorems and many useful algorithms. There is a
       
    56   wide range of textbooks on this subject, many of which are aimed at students
       
    57   and contain very detailed `pencil-and-paper' proofs
       
    58   (e.g.~\cite{Kozen97}). It seems natural to exercise theorem provers by
       
  formalising the theorems and by formally verifying the algorithms.
       
    60 
       
    61   There is however a problem: the typical approach to regular languages is to
       
    62   introduce finite automata and then define everything in terms of them.  For
       
    63   example, a regular language is normally defined as one whose strings are
       
    64   recognised by a finite deterministic automaton. This approach has many
       
    65   benefits. Among them is the fact that it is easy to convince oneself that
       
    66   regular languages are closed under complementation: one just has to exchange
       
    67   the accepting and non-accepting states in the corresponding automaton to
       
    68   obtain an automaton for the complement language.  The problem, however, lies with
       
    69   formalising such reasoning in a HOL-based theorem prover, in our case
       
    70   Isabelle/HOL. Automata are built up from states and transitions that 
       
    71   need to be represented as graphs, matrices or functions, none
       
    72   of which can be defined as an inductive datatype. 
       
    73 
       
  In the case of graphs and matrices, this means we have to build our own
  reasoning infrastructure for them, as neither Isabelle/HOL nor HOL4 nor
  HOLlight supports them with libraries. Even worse, reasoning about graphs and
       
    77   matrices can be a real hassle in HOL-based theorem provers.  Consider for
       
    78   example the operation of sequencing two automata, say $A_1$ and $A_2$, by
       
    79   connecting the accepting states of $A_1$ to the initial state of $A_2$:\\[-5.5mm]  
       
    80   %  
       
    81   \begin{center}
       
    82   \begin{tabular}{ccc}
       
    83   \begin{tikzpicture}[scale=0.8]
       
    84   %\draw[step=2mm] (-1,-1) grid (1,1);
       
    85   
       
    86   \draw[rounded corners=1mm, very thick] (-1.0,-0.3) rectangle (-0.2,0.3);
       
    87   \draw[rounded corners=1mm, very thick] ( 0.2,-0.3) rectangle ( 1.0,0.3);
       
    88 
       
    89   \node (A) at (-1.0,0.0) [circle, very thick, draw, fill=white, inner sep=0.4mm] {};
       
    90   \node (B) at ( 0.2,0.0) [circle, very thick, draw, fill=white, inner sep=0.4mm] {};
       
    91   
       
    92   \node (C) at (-0.2, 0.13) [circle, very thick, draw, fill=white, inner sep=0.4mm] {};
       
    93   \node (D) at (-0.2,-0.13) [circle, very thick, draw, fill=white, inner sep=0.4mm] {};
       
    94 
       
    95   \node (E) at (1.0, 0.2) [circle, very thick, draw, fill=white, inner sep=0.4mm] {};
       
    96   \node (F) at (1.0,-0.0) [circle, very thick, draw, fill=white, inner sep=0.4mm] {};
       
    97   \node (G) at (1.0,-0.2) [circle, very thick, draw, fill=white, inner sep=0.4mm] {};
       
    98 
       
    99   \draw (-0.6,0.0) node {\footnotesize$A_1$};
       
   100   \draw ( 0.6,0.0) node {\footnotesize$A_2$};
       
   101   \end{tikzpicture}
       
   102 
       
   103   & 
       
   104 
       
   105   \raisebox{1.1mm}{\bf\Large$\;\;\;\Rightarrow\,\;\;$}
       
   106 
       
   107   &
       
   108 
       
   109   \begin{tikzpicture}[scale=0.8]
       
   110   %\draw[step=2mm] (-1,-1) grid (1,1);
       
   111   
       
   112   \draw[rounded corners=1mm, very thick] (-1.0,-0.3) rectangle (-0.2,0.3);
       
   113   \draw[rounded corners=1mm, very thick] ( 0.2,-0.3) rectangle ( 1.0,0.3);
       
   114 
       
   115   \node (A) at (-1.0,0.0) [circle, very thick, draw, fill=white, inner sep=0.4mm] {};
       
   116   \node (B) at ( 0.2,0.0) [circle, very thick, draw, fill=white, inner sep=0.4mm] {};
       
   117   
       
   118   \node (C) at (-0.2, 0.13) [circle, very thick, draw, fill=white, inner sep=0.4mm] {};
       
   119   \node (D) at (-0.2,-0.13) [circle, very thick, draw, fill=white, inner sep=0.4mm] {};
       
   120 
       
   121   \node (E) at (1.0, 0.2) [circle, very thick, draw, fill=white, inner sep=0.4mm] {};
       
   122   \node (F) at (1.0,-0.0) [circle, very thick, draw, fill=white, inner sep=0.4mm] {};
       
   123   \node (G) at (1.0,-0.2) [circle, very thick, draw, fill=white, inner sep=0.4mm] {};
       
   124   
       
   125   \draw (C) to [very thick, bend left=45] (B);
       
   126   \draw (D) to [very thick, bend right=45] (B);
       
   127 
       
   128   \draw (-0.6,0.0) node {\footnotesize$A_1$};
       
   129   \draw ( 0.6,0.0) node {\footnotesize$A_2$};
       
   130   \end{tikzpicture}
       
   131 
       
   132   \end{tabular}
       
   133   \end{center}
       
   134 
       
   135   \noindent
       
   136   On `paper' we can define the corresponding graph in terms of the disjoint 
       
   137   union of the state nodes. Unfortunately in HOL, the standard definition for disjoint 
       
   138   union, namely 
       
   139   %
       
   140   \begin{equation}\label{disjointunion}
       
   141   @{term "UPLUS A\<^isub>1 A\<^isub>2 \<equiv> {(1, x) | x. x \<in> A\<^isub>1} \<union> {(2, y) | y. y \<in> A\<^isub>2}"}
       
   142   \end{equation}
       
   143 
       
   144   \noindent
       
   145   changes the type---the disjoint union is not a set, but a set of pairs. 
       
   146   Using this definition for disjoint union means we do not have a single type for automata
       
   147   and hence will not be able to state certain properties about \emph{all}
       
   148   automata, since there is no type quantification available in HOL (unlike in Coq, for example). An
       
   149   alternative, which provides us with a single type for automata, is to give every 
       
   150   state node an identity, for example a natural
       
   151   number, and then be careful to rename these identities apart whenever
       
   152   connecting two automata. This results in clunky proofs
       
   153   establishing that properties are invariant under renaming. Similarly,
       
  connecting two automata represented as matrices results in very ad hoc
       
   155   constructions, which are not pleasant to reason about.
       
   156 
       
   157   Functions are much better supported in Isabelle/HOL, but they still lead to similar
       
   158   problems as with graphs.  Composing, for example, two non-deterministic automata in parallel
       
  also requires the formalisation of disjoint unions. Nipkow \cite{Nipkow98} 
  dismisses the option of using identities for this purpose, because according to 
  him it leads to ``messy proofs''. He
       
   162   opts for a variant of \eqref{disjointunion} using bit lists, but writes 
       
   163 
       
   164   \begin{quote}
       
   165   \it%
       
   166   \begin{tabular}{@ {}l@ {}p{0.88\textwidth}@ {}}
       
   167   `` & All lemmas appear obvious given a picture of the composition of automata\ldots
       
   168        Yet their proofs require a painful amount of detail.''
       
   169   \end{tabular}
       
   170   \end{quote}
       
   171 
       
   172   \noindent
       
   173   and
       
   174   
       
   175   \begin{quote}
       
   176   \it%
       
   177   \begin{tabular}{@ {}l@ {}p{0.88\textwidth}@ {}}
       
   178   `` & If the reader finds the above treatment in terms of bit lists revoltingly
       
   179        concrete, I cannot disagree. A more abstract approach is clearly desirable.''
       
   180   \end{tabular}
       
   181   \end{quote}
       
   182 
       
   183 
       
   184   \noindent
       
   185   Moreover, it is not so clear how to conveniently impose a finiteness condition 
       
  upon functions in order to represent \emph{finite} automata. The best option is
  probably to resort to more advanced reasoning frameworks, such as \emph{locales}
       
   188   or \emph{type classes},
       
   189   which are \emph{not} available in all HOL-based theorem provers.
       
   190 
       
  {\bf add comments from Brzozowski}
       
   192 
       
  Because of these problems with representing automata, there seems
       
   194   to be no substantial formalisation of automata theory and regular languages 
       
   195   carried out in HOL-based theorem provers. Nipkow  \cite{Nipkow98} establishes
       
   196   the link between regular expressions and automata in
       
   197   the context of lexing. Berghofer and Reiter \cite{BerghoferReiter09} 
       
   198   formalise automata working over 
       
   199   bit strings in the context of Presburger arithmetic.
       
   200   The only larger formalisations of automata theory 
       
   201   are carried out in Nuprl \cite{Constable00} and in Coq \cite{Filliatre97}.
       
   202   
       
   203   In this paper, we will not attempt to formalise automata theory in
       
   204   Isabelle/HOL, but take a different approach to regular
       
   205   languages. Instead of defining a regular language as one where there exists
       
   206   an automaton that recognises all strings of the language, we define a
       
   207   regular language as:
       
   208 
       
   209   \begin{definition}
       
   210   A language @{text A} is \emph{regular}, provided there is a regular expression that matches all
       
   211   strings of @{text "A"}.
       
   212   \end{definition}
       
   213   
       
   214   \noindent
       
   215   The reason is that regular expressions, unlike graphs, matrices and functions, can
       
  be easily defined as an inductive datatype. Consequently a corresponding reasoning 
       
   217   infrastructure comes for free. This has recently been exploited in HOL4 with a formalisation
       
   218   of regular expression matching based on derivatives \cite{OwensSlind08} and
       
   219   with an equivalence checker for regular expressions in Isabelle/HOL \cite{KraussNipkow11}.  
       
   220   The purpose of this paper is to
       
   221   show that a central result about regular languages---the Myhill-Nerode theorem---can 
       
   222   be recreated by only using regular expressions. This theorem gives necessary
       
   223   and sufficient conditions for when a language is regular. As a corollary of this
       
   224   theorem we can easily establish the usual closure properties, including 
       
   225   complementation, for regular languages.\smallskip
       
   226   
       
   227   \noindent
       
   228   {\bf Contributions:} 
       
   229   There is an extensive literature on regular languages.
       
  To the best of our knowledge, our proof of the Myhill-Nerode theorem is the
  first that is based only on regular expressions. We prove the part of this theorem 
  stating that the language of a regular expression gives rise to only finitely many 
  equivalence classes, using certain 
  tagging-functions. Again to the best of our knowledge, these tagging-functions have
       
   234   not been used before to establish the Myhill-Nerode theorem.
       
   235 *}
       
   236 
       
   237 section {* Preliminaries *}
       
   238 
       
   239 text {*
       
   240   Strings in Isabelle/HOL are lists of characters with the \emph{empty string}
       
   241   being represented by the empty list, written @{term "[]"}.  \emph{Languages}
       
   242   are sets of strings. The language containing all strings is written in
       
   243   Isabelle/HOL as @{term "UNIV::string set"}. The concatenation of two languages 
       
   244   is written @{term "A ;; B"} and a language raised to the power @{text n} is written 
       
   245   @{term "A \<up> n"}. They are defined as usual
       
   246 
       
   247   \begin{center}
       
   248   @{thm Seq_def[THEN eq_reflection, where A1="A" and B1="B"]}
       
   249   \hspace{7mm}
       
   250   @{thm pow.simps(1)[THEN eq_reflection, where A1="A"]}
       
   251   \hspace{7mm}
       
   252   @{thm pow.simps(2)[THEN eq_reflection, where A1="A" and n1="n"]}
       
   253   \end{center}
       
   254 
       
   255   \noindent
       
   256   where @{text "@"} is the list-append operation. The Kleene-star of a language @{text A}
       
   257   is defined as the union over all powers, namely @{thm Star_def}. In the paper
       
   258   we will make use of the following properties of these constructions.
       
   259   
       
   260   \begin{proposition}\label{langprops}\mbox{}\\
       
   261   \begin{tabular}{@ {}ll}
       
   262   (i)   & @{thm star_cases}     \\ 
       
   263   (ii)  & @{thm[mode=IfThen] pow_length}\\
       
   264   (iii) & @{thm seq_Union_left} \\ 
       
   265   \end{tabular}
       
   266   \end{proposition}
       
   267 
       
   268   \noindent
       
   269   In @{text "(ii)"} we use the notation @{term "length s"} for the length of a
       
  string; this property states that if \mbox{@{term "[] \<notin> A"}} then the strings
  in @{term "A \<up> (Suc n)"} must be longer than @{text n}.  We omit
       
   272   the proofs for these properties, but invite the reader to consult our
       
   273   formalisation.\footnote{Available at \url{http://www4.in.tum.de/~urbanc/regexp.html}}
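
  As a small concrete illustration of @{text "(ii)"}, using a made-up language that is not part
  of the formalisation, take $A = \{ab,\, b\}$. Then @{term "[] \<notin> A"} holds and, taking
  $n = 2$,

  \begin{center}
  $b\,b\,b \in A^{3}$ has length $3 > 2$\hspace{7mm}$ab\,b\,ab \in A^{3}$ has length $5 > 2$
  \end{center}

  \noindent
  since every string in $A^{3}$ is a concatenation of three non-empty strings.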
       
   274 
       
   275   The notation in Isabelle/HOL for the quotient of a language @{text A} according to an 
       
   276   equivalence relation @{term REL} is @{term "A // REL"}. We will write 
       
   277   @{text "\<lbrakk>x\<rbrakk>\<^isub>\<approx>"} for the equivalence class defined 
       
   278   as \mbox{@{text "{y | y \<approx> x}"}}.
       
   279 
       
   280 
       
   281   Central to our proof will be the solution of equational systems
       
   282   involving equivalence classes of languages. For this we will use Arden's Lemma \cite{Brzozowski64},
       
   283   which solves equations of the form @{term "X = A ;; X \<union> B"} provided
       
   284   @{term "[] \<notin> A"}. However we will need the following `reverse' 
       
   285   version of Arden's Lemma (`reverse' in the sense of changing the order of @{term "A ;; X"} to
       
   286   \mbox{@{term "X ;; A"}}).
       
   287 
       
   288   \begin{lemma}[Reverse Arden's Lemma]\label{arden}\mbox{}\\
       
   289   If @{thm (prem 1) arden} then
       
   290   @{thm (lhs) arden} if and only if
       
   291   @{thm (rhs) arden}.
       
   292   \end{lemma}
       
   293 
       
   294   \begin{proof}
       
   295   For the right-to-left direction we assume @{thm (rhs) arden} and show
       
   296   that @{thm (lhs) arden} holds. From Prop.~\ref{langprops}@{text "(i)"} 
       
   297   we have @{term "A\<star> = {[]} \<union> A ;; A\<star>"},
       
  which is equal to @{term "A\<star> = {[]} \<union> A\<star> ;; A"}. Concatenating @{text B} to both 
  sides gives @{term "B ;; A\<star> = B ;; ({[]} \<union> A\<star> ;; A)"}, whose right-hand side
       
   300   is equal to @{term "(B ;; A\<star>) ;; A \<union> B"}. This completes this direction. 
       
   301 
       
   302   For the other direction we assume @{thm (lhs) arden}. By a simple induction
       
   303   on @{text n}, we can establish the property
       
   304 
       
   305   \begin{center}
       
   306   @{text "(*)"}\hspace{5mm} @{thm (concl) arden_helper}
       
   307   \end{center}
       
   308   
       
   309   \noindent
       
   310   Using this property we can show that @{term "B ;; (A \<up> n) \<subseteq> X"} holds for
       
   311   all @{text n}. From this we can infer @{term "B ;; A\<star> \<subseteq> X"} using the definition
       
   312   of @{text "\<star>"}.
       
   313   For the inclusion in the other direction we assume a string @{text s}
       
   314   with length @{text k} is an element in @{text X}. Since @{thm (prem 1) arden}
       
   315   we know by Prop.~\ref{langprops}@{text "(ii)"} that 
       
   316   @{term "s \<notin> X ;; (A \<up> Suc k)"} since its length is only @{text k}
       
   317   (the strings in @{term "X ;; (A \<up> Suc k)"} are all longer). 
       
   318   From @{text "(*)"} it follows then that
       
   319   @{term s} must be an element in @{term "(\<Union>m\<in>{0..k}. B ;; (A \<up> m))"}. This in turn
       
   320   implies that @{term s} is in @{term "(\<Union>n. B ;; (A \<up> n))"}. Using Prop.~\ref{langprops}@{text "(iii)"} 
       
   321   this is equal to @{term "B ;; A\<star>"}, as we needed to show.\qed
       
   322   \end{proof}
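
  \noindent
  As a concrete instance of this lemma, with a made-up language that is not taken from the
  formalisation, let $A = \{a\}$ and $B = \{b\}$. Then @{term "[] \<notin> A"} holds and the lemma
  states that

  \begin{center}
  $X \,=\, X \cdot \{a\} \,\cup\, \{b\}$ \hspace{5mm} if and only if \hspace{5mm} $X \,=\, \{b\} \cdot \{a\}^\star$
  \end{center}

  \noindent
  that is, the only solution of the equation on the left is the set of strings $b$, $ba$, $baa$, and so on.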
       
   323 
       
   324   \noindent
       
   325   Regular expressions are defined as the inductive datatype
       
   326 
       
   327   \begin{center}
       
   328   @{text r} @{text "::="}
       
   329   @{term NULL}\hspace{1.5mm}@{text"|"}\hspace{1.5mm} 
       
   330   @{term EMPTY}\hspace{1.5mm}@{text"|"}\hspace{1.5mm} 
       
   331   @{term "CHAR c"}\hspace{1.5mm}@{text"|"}\hspace{1.5mm} 
       
   332   @{term "SEQ r r"}\hspace{1.5mm}@{text"|"}\hspace{1.5mm} 
       
   333   @{term "ALT r r"}\hspace{1.5mm}@{text"|"}\hspace{1.5mm} 
       
   334   @{term "STAR r"}
       
   335   \end{center}
       
   336 
       
   337   \noindent
       
   338   and the language matched by a regular expression is defined as
       
   339 
       
   340   \begin{center}
       
   341   \begin{tabular}{c@ {\hspace{10mm}}c}
       
   342   \begin{tabular}{rcl}
       
   343   @{thm (lhs) L_rexp.simps(1)} & @{text "\<equiv>"} & @{thm (rhs) L_rexp.simps(1)}\\
       
   344   @{thm (lhs) L_rexp.simps(2)} & @{text "\<equiv>"} & @{thm (rhs) L_rexp.simps(2)}\\
       
   345   @{thm (lhs) L_rexp.simps(3)[where c="c"]} & @{text "\<equiv>"} & @{thm (rhs) L_rexp.simps(3)[where c="c"]}\\
       
   346   \end{tabular}
       
   347   &
       
   348   \begin{tabular}{rcl}
       
   349   @{thm (lhs) L_rexp.simps(4)[where ?r1.0="r\<^isub>1" and ?r2.0="r\<^isub>2"]} & @{text "\<equiv>"} &
       
   350        @{thm (rhs) L_rexp.simps(4)[where ?r1.0="r\<^isub>1" and ?r2.0="r\<^isub>2"]}\\
       
   351   @{thm (lhs) L_rexp.simps(5)[where ?r1.0="r\<^isub>1" and ?r2.0="r\<^isub>2"]} & @{text "\<equiv>"} &
       
   352        @{thm (rhs) L_rexp.simps(5)[where ?r1.0="r\<^isub>1" and ?r2.0="r\<^isub>2"]}\\
       
   353   @{thm (lhs) L_rexp.simps(6)[where r="r"]} & @{text "\<equiv>"} &
       
   354       @{thm (rhs) L_rexp.simps(6)[where r="r"]}\\
       
   355   \end{tabular}
       
   356   \end{tabular}
       
   357   \end{center}
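
  \noindent
  For example, under these clauses the made-up regular expression
  @{text "SEQ (CHAR a) (STAR (CHAR b))"} matches the language
  $\{a\} \cdot \{b\}^\star$, that is the strings $a$, $ab$, $abb$, and so on.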
       
   358 
       
   359   Given a finite set of regular expressions @{text rs}, we will make use of the operation of generating 
       
   360   a regular expression that matches the union of all languages of @{text rs}. We only need to know the 
       
   361   existence
       
   362   of such a regular expression and therefore we use Isabelle/HOL's @{const "fold_graph"} and Hilbert's
       
   363   @{text "\<epsilon>"} to define @{term "\<Uplus>rs"}. This operation, roughly speaking, folds @{const ALT} over the 
       
   364   set @{text rs} with @{const NULL} for the empty set. We can prove that for a finite set @{text rs}
       
   365   %
       
   366   \begin{equation}\label{uplus}
       
   367   \mbox{@{thm (lhs) folds_alt_simp} @{text "= \<Union> (\<calL> ` rs)"}}
       
   368   \end{equation}
       
   369 
       
   370   \noindent
       
   371   holds, whereby @{text "\<calL> ` rs"} stands for the 
       
   372   image of the set @{text rs} under function @{text "\<calL>"}.
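
  For instance, for the made-up set @{text "rs = {CHAR a, CHAR b}"}, property \eqref{uplus}
  tells us that @{text "\<Uplus>rs"} is some regular expression, whose exact shape is left
  underspecified by the definition, with language $\{a\} \cup \{b\}$.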
       
   373 *}
       
   374 
       
   375 
       
   376 section {* The Myhill-Nerode Theorem, First Part *}
       
   377 
       
   378 text {*
       
   379   The key definition in the Myhill-Nerode theorem is the
       
   380   \emph{Myhill-Nerode relation}, which states that w.r.t.~a language two 
       
   381   strings are related, provided there is no distinguishing extension in this
       
  language. This can be defined as a ternary relation.
       
   383 
       
   384   \begin{definition}[Myhill-Nerode Relation] Given a language @{text A}, two strings @{text x} and
       
   385   @{text y} are Myhill-Nerode related provided
       
   386   \begin{center}
       
   387   @{thm str_eq_def[simplified str_eq_rel_def Pair_Collect]}
       
   388   \end{center}
       
   389   \end{definition}
       
   390 
       
   391   \noindent
       
   392   It is easy to see that @{term "\<approx>A"} is an equivalence relation, which
       
   393   partitions the set of all strings, @{text "UNIV"}, into a set of disjoint
       
   394   equivalence classes. To illustrate this quotient construction, let us give a simple 
       
   395   example: consider the regular language containing just
       
   396   the string @{text "[c]"}. The relation @{term "\<approx>({[c]})"} partitions @{text UNIV}
       
   397   into three equivalence classes @{text "X\<^isub>1"}, @{text "X\<^isub>2"} and  @{text "X\<^isub>3"}
       
   398   as follows
       
   399   
       
   400   \begin{center}
       
   401   @{text "X\<^isub>1 = {[]}"}\hspace{5mm}
       
   402   @{text "X\<^isub>2 = {[c]}"}\hspace{5mm}
       
   403   @{text "X\<^isub>3 = UNIV - {[], [c]}"}
       
   404   \end{center}
       
   405 
       
   406   One direction of the Myhill-Nerode theorem establishes 
       
  that if there are finitely many equivalence classes, as in the example above, then 
       
   408   the language is regular. In our setting we therefore have to show:
       
   409   
       
   410   \begin{theorem}\label{myhillnerodeone}
       
   411   @{thm[mode=IfThen] Myhill_Nerode1}
       
   412   \end{theorem}
       
   413 
       
   414   \noindent
       
   415   To prove this theorem, we first define the set @{term "finals A"} as those equivalence
       
   416   classes from @{term "UNIV // \<approx>A"} that contain strings of @{text A}, namely
       
   417   %
       
   418   \begin{equation} 
       
   419   @{thm finals_def}
       
   420   \end{equation}
       
   421 
       
   422   \noindent
       
   423   In our running example, @{text "X\<^isub>2"} is the only 
       
   424   equivalence class in @{term "finals {[c]}"}.
       
   425   It is straightforward to show that in general @{thm lang_is_union_of_finals} and 
       
   426   @{thm finals_in_partitions} hold. 
       
   427   Therefore if we know that there exists a regular expression for every
       
   428   equivalence class in \mbox{@{term "finals A"}} (which by assumption must be
       
   429   a finite set), then we can use @{text "\<bigplus>"} to obtain a regular expression 
       
   430   that matches every string in @{text A}.
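
  In the running example we have @{text "finals {[c]} = {X\<^isub>2}"}, and any regular
  expression for the single equivalence class @{text "X\<^isub>2"}, for instance
  @{text "CHAR c"}, already matches every string of the language @{text "{[c]}"}.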
       
   431 
       
   432 
       
   433   Our proof of Thm.~\ref{myhillnerodeone} relies on a method that can calculate a
       
   434   regular expression for \emph{every} equivalence class, not just the ones 
       
   435   in @{term "finals A"}. We
       
   436   first define the notion of \emph{one-character-transition} between 
       
   437   two equivalence classes
       
   438   %
       
   439   \begin{equation} 
       
   440   @{thm transition_def}
       
   441   \end{equation}
       
   442 
       
   443   \noindent
       
   444   which means that if we concatenate the character @{text c} to the end of all 
       
   445   strings in the equivalence class @{text Y}, we obtain a subset of 
       
  @{text X}. Note that we do not define an automaton here; we merely relate two sets
       
   447   (with the help of a character). In our concrete example we have 
       
   448   @{term "X\<^isub>1 \<Turnstile>c\<Rightarrow> X\<^isub>2"}, @{term "X\<^isub>1 \<Turnstile>d\<Rightarrow> X\<^isub>3"} with @{text d} being any 
       
   449   other character than @{text c}, and @{term "X\<^isub>3 \<Turnstile>d\<Rightarrow> X\<^isub>3"} for any @{text d}.
       
   450   
       
   451   Next we construct an \emph{initial equational system} that
       
   452   contains an equation for each equivalence class. We first give
       
   453   an informal description of this construction. Suppose we have 
       
  the equivalence classes @{text "X\<^isub>1,\<dots>,X\<^isub>n"}; there must be one and only one that
       
   455   contains the empty string @{text "[]"} (since equivalence classes are disjoint).
       
   456   Let us assume @{text "[] \<in> X\<^isub>1"}. We build the following equational system
       
   457   
       
   458   \begin{center}
       
   459   \begin{tabular}{rcl}
       
   460   @{text "X\<^isub>1"} & @{text "="} & @{text "(Y\<^isub>1\<^isub>1, CHAR c\<^isub>1\<^isub>1) + \<dots> + (Y\<^isub>1\<^isub>p, CHAR c\<^isub>1\<^isub>p) + \<lambda>(EMPTY)"} \\
       
   461   @{text "X\<^isub>2"} & @{text "="} & @{text "(Y\<^isub>2\<^isub>1, CHAR c\<^isub>2\<^isub>1) + \<dots> + (Y\<^isub>2\<^isub>o, CHAR c\<^isub>2\<^isub>o)"} \\
       
   462   & $\vdots$ \\
       
   463   @{text "X\<^isub>n"} & @{text "="} & @{text "(Y\<^isub>n\<^isub>1, CHAR c\<^isub>n\<^isub>1) + \<dots> + (Y\<^isub>n\<^isub>q, CHAR c\<^isub>n\<^isub>q)"}\\
       
   464   \end{tabular}
       
   465   \end{center}
       
   466 
       
   467   \noindent
       
   468   where the terms @{text "(Y\<^isub>i\<^isub>j, CHAR c\<^isub>i\<^isub>j)"}
       
   469   stand for all transitions @{term "Y\<^isub>i\<^isub>j \<Turnstile>c\<^isub>i\<^isub>j\<Rightarrow>
       
   470   X\<^isub>i"}. 
       
   471   %The intuition behind the equational system is that every 
       
   472   %equation @{text "X\<^isub>i = rhs\<^isub>i"} in this system
       
   473   %corresponds roughly to a state of an automaton whose name is @{text X\<^isub>i} and its predecessor states 
       
   474   %are the @{text "Y\<^isub>i\<^isub>j"}; the @{text "c\<^isub>i\<^isub>j"} are the labels of the transitions from these 
       
   475   %predecessor states to @{text X\<^isub>i}. 
       
   476   There can only be
       
   477   finitely many terms of the form @{text "(Y\<^isub>i\<^isub>j, CHAR c\<^isub>i\<^isub>j)"} in a right-hand side 
       
   478   since by assumption there are only finitely many
       
   479   equivalence classes and only finitely many characters.
       
   480   The term @{text "\<lambda>(EMPTY)"} in the first equation acts as a marker for the initial state, that
       
   481   is the equivalence class
       
   482   containing @{text "[]"}.\footnote{Note that we mark, roughly speaking, the
       
   483   single `initial' state in the equational system, which is different from
       
   484   the method by Brzozowski \cite{Brzozowski64}, where he marks the
       
   485   `terminal' states. We are forced to set up the equational system in our
       
   486   way, because the Myhill-Nerode relation determines the `direction' of the
       
   487   transitions---the successor `state' of an equivalence class @{text Y} can
       
   488   be reached by adding a character to the end of @{text Y}. This is also the
       
   489   reason why we have to use our reverse version of Arden's Lemma.}
       
   490   %In our initial equation system there can only be
       
   491   %finitely many terms of the form @{text "(Y\<^isub>i\<^isub>j, CHAR c\<^isub>i\<^isub>j)"} in a right-hand side 
       
   492   %since by assumption there are only finitely many
       
   493   %equivalence classes and only finitely many characters. 
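
  To make this construction concrete, for the running example with the three equivalence
  classes @{text "X\<^isub>1"}, @{text "X\<^isub>2"} and @{text "X\<^isub>3"} from above, the
  initial equational system reads (writing @{text d} for any character different from
  @{text c}, and @{text "d'"} for an arbitrary character)

  \begin{center}
  \begin{tabular}{rcl}
  @{text "X\<^isub>1"} & @{text "="} & @{text "\<lambda>(EMPTY)"}\\
  @{text "X\<^isub>2"} & @{text "="} & @{text "(X\<^isub>1, CHAR c)"}\\
  @{text "X\<^isub>3"} & @{text "="} & @{text "(X\<^isub>1, CHAR d) + (X\<^isub>2, CHAR d') + (X\<^isub>3, CHAR d')"}\\
  \end{tabular}
  \end{center}

  \noindent
  where the last right-hand side contains one term for every such character.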
       
   494   Overloading the function @{text \<calL>} for the two kinds of terms in the
       
   495   equational system, we have
       
   496   
       
   497   \begin{center}
       
   498   @{text "\<calL>(Y, r) \<equiv>"} %
       
   499   @{thm (rhs) L_rhs_trm.simps(2)[where X="Y" and r="r", THEN eq_reflection]}\hspace{10mm}
       
   500   @{thm L_rhs_trm.simps(1)[where r="r", THEN eq_reflection]}
       
   501   \end{center}
       
   502 
       
   503   \noindent
       
   504   and we can prove for @{text "X\<^isub>2\<^isub>.\<^isub>.\<^isub>n"} that the following equations
       
   505   %
       
   506   \begin{equation}\label{inv1}
       
   507   @{text "X\<^isub>i = \<calL>(Y\<^isub>i\<^isub>1, CHAR c\<^isub>i\<^isub>1) \<union> \<dots> \<union> \<calL>(Y\<^isub>i\<^isub>q, CHAR c\<^isub>i\<^isub>q)"}.
       
   508   \end{equation}
       
   509 
       
   510   \noindent
       
   511   hold. Similarly for @{text "X\<^isub>1"} we can show the following equation
       
   512   %
       
   513   \begin{equation}\label{inv2}
       
   514   @{text "X\<^isub>1 = \<calL>(Y\<^isub>1\<^isub>1, CHAR c\<^isub>1\<^isub>1) \<union> \<dots> \<union> \<calL>(Y\<^isub>1\<^isub>p, CHAR c\<^isub>1\<^isub>p) \<union> \<calL>(\<lambda>(EMPTY))"}.
       
   515   \end{equation}
       
   516 
       
   517   \noindent
       
   518   holds. The reason for adding the @{text \<lambda>}-marker to our initial equational system is 
       
   519   to obtain this equation: it only holds with the marker, since none of 
       
   520   the other terms contain the empty string. The point of the initial equational system is
       
   521   that solving it means we will be able to extract a regular expression for every equivalence class. 
       
   522 
       
   523   Our representation for the equations in Isabelle/HOL are pairs,
       
   524   where the first component is an equivalence class (a set of strings)
       
   525   and the second component
       
   526   is a set of terms. Given a set of equivalence
       
   527   classes @{text CS}, our initial equational system @{term "Init CS"} is thus 
       
   528   formally defined as
       
   529   %
       
   530   \begin{equation}\label{initcs}
       
   531   \mbox{\begin{tabular}{rcl}     
       
   532   @{thm (lhs) Init_rhs_def} & @{text "\<equiv>"} & 
       
   533   @{text "if"}~@{term "[] \<in> X"}\\
       
   534   & & @{text "then"}~@{term "{Trn Y (CHAR c) | Y c. Y \<in> CS \<and> Y \<Turnstile>c\<Rightarrow> X} \<union> {Lam EMPTY}"}\\
       
   535   & & @{text "else"}~@{term "{Trn Y (CHAR c)| Y c. Y \<in> CS \<and> Y \<Turnstile>c\<Rightarrow> X}"}\\
       
   536   @{thm (lhs) Init_def}     & @{text "\<equiv>"} & @{thm (rhs) Init_def}
       
   537   \end{tabular}}
       
   538   \end{equation}
       
   539 
       
   540   
       
   541 
       
   542   \noindent
       
   543   Because we use sets of terms 
       
   544   for representing the right-hand sides of equations, we can 
       
   545   prove \eqref{inv1} and \eqref{inv2} more concisely as
       
   546   %
       
   547   \begin{lemma}\label{inv}
       
   548   If @{thm (prem 1) test} then @{text "X = \<Union> \<calL> ` rhs"}.
       
   549   \end{lemma}
       
   550 
       
   551   \noindent
       
   552   Our proof of Thm.~\ref{myhillnerodeone} will proceed by transforming the
       
   553   initial equational system into one in \emph{solved form} maintaining the invariant
       
   554   in Lem.~\ref{inv}. From the solved form we will be able to read
       
   555   off the regular expressions. 
       
   556 
       
   557   In order to transform an equational system into solved form, we have two 
       
   558   operations: one that takes an equation of the form @{text "X = rhs"} and removes
       
   559   any recursive occurrences of @{text X} in the @{text rhs} using our variant of Arden's 
       
   560   Lemma. The other operation takes an equation @{text "X = rhs"}
       
   561   and substitutes @{text X} throughout the rest of the equational system
       
   562   adjusting the remaining regular expressions appropriately. To define this adjustment 
       
   563   we define the \emph{append-operation} taking a term and a regular expression as argument
       
   564 
       
   565   \begin{center}
       
   566   @{thm Append_rexp.simps(2)[where X="Y" and r="r\<^isub>1" and rexp="r\<^isub>2", THEN eq_reflection]}\hspace{10mm}
       
   567   @{thm Append_rexp.simps(1)[where r="r\<^isub>1" and rexp="r\<^isub>2", THEN eq_reflection]}
       
   568   \end{center}
       
   569 
       
   570   \noindent
       
   571   We lift this operation to entire right-hand sides of equations, written as
       
   572   @{thm (lhs) Append_rexp_rhs_def[where rexp="r"]}. With this we can define
       
   573   the \emph{arden-operation} for an equation of the form @{text "X = rhs"} as:
       
   574   %
       
   575   \begin{equation}\label{arden_def}
       
   576   \mbox{\begin{tabular}{rc@ {\hspace{2mm}}r@ {\hspace{1mm}}l}
       
   577   @{thm (lhs) Arden_def} & @{text "\<equiv>"}~~\mbox{} & \multicolumn{2}{@ {\hspace{-2mm}}l}{@{text "let"}}\\ 
       
   578    & & @{text "rhs' ="} & @{term "rhs - {Trn X r | r. Trn X r \<in> rhs}"} \\
       
   579    & & @{text "r' ="}   & @{term "STAR (\<Uplus> {r. Trn X r \<in> rhs})"}\\
       
   580    & &  \multicolumn{2}{@ {\hspace{-2mm}}l}{@{text "in"}~~@{term "append_rhs_rexp rhs' r'"}}\\ 
       
   581   \end{tabular}}
       
   582   \end{equation}
       
   583 
       
   584   \noindent
       
   585   In this definition, we first delete all terms of the form @{text "(X, r)"} from @{text rhs};
       
  then we calculate the combined regular expression for all @{text r} coming 
       
   587   from the deleted @{text "(X, r)"}, and take the @{const STAR} of it;
       
   588   finally we append this regular expression to @{text rhs'}. It can be easily seen 
       
   589   that this operation mimics Arden's Lemma on the level of equations. To ensure
       
   590   the non-emptiness condition of Arden's Lemma we say that a right-hand side is
       
   591   @{text ardenable} provided
       
   592 
       
   593   \begin{center}
       
   594   @{thm ardenable_def}
       
   595   \end{center}
       
   596 
       
   597   \noindent
       
   598   This allows us to prove a version of Arden's Lemma on the level of equations.
       
   599 
       
   600   \begin{lemma}\label{ardenable}
       
   601   Given an equation @{text "X = rhs"}.
       
   602   If @{text "X = \<Union>\<calL> ` rhs"},
       
   603   @{thm (prem 2) Arden_keeps_eq}, and
       
   604   @{thm (prem 3) Arden_keeps_eq}, then
       
   605   @{text "X = \<Union>\<calL> ` (Arden X rhs)"}.
       
   606   \end{lemma}
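
  \noindent
  To illustrate the @{const Arden} operation, suppose a right-hand side has the made-up form
  @{text "{(X, CHAR c), (Y, CHAR d)}"}, standing for the equation
  $X = X \cdot \{c\} \,\cup\, Y \cdot \{d\}$. Then @{text "Arden X rhs"} deletes the term
  @{text "(X, CHAR c)"}, forms the regular expression @{text "STAR (\<Uplus>{CHAR c})"} and
  appends it to the remaining term, yielding, up to the exact shape of @{text "\<Uplus>"}, the
  right-hand side @{text "{(Y, SEQ (CHAR d) (STAR (CHAR c)))}"} with meaning
  $Y \cdot \{d\} \cdot \{c\}^\star$, in agreement with Lem.~\ref{arden}.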
       
   607   
       
   608   \noindent
       
  Our @{text ardenable} condition is slightly stronger than needed for applying Arden's Lemma,
  but we can still ensure that it holds throughout our algorithm for transforming equations
  into solved form. The \emph{substitution-operation} takes an equation
       
   612   of the form @{text "X = xrhs"} and substitutes it into the right-hand side @{text rhs}.
       
   613 
       
   614   \begin{center}
       
   615   \begin{tabular}{rc@ {\hspace{2mm}}r@ {\hspace{1mm}}l}
       
   616   @{thm (lhs) Subst_def} & @{text "\<equiv>"}~~\mbox{} & \multicolumn{2}{@ {\hspace{-2mm}}l}{@{text "let"}}\\ 
       
   617    & & @{text "rhs' ="} & @{term "rhs - {Trn X r | r. Trn X r \<in> rhs}"} \\
       
   618    & & @{text "r' ="}   & @{term "\<Uplus> {r. Trn X r \<in> rhs}"}\\
       
   619    & &  \multicolumn{2}{@ {\hspace{-2mm}}l}{@{text "in"}~~@{term "rhs' \<union> append_rhs_rexp xrhs r'"}}\\ 
       
   620   \end{tabular}
       
   621   \end{center}
       
   622 
       
   623   \noindent
       
  We again first delete all occurrences of @{text "(X, r)"} in @{text rhs}; we then calculate
  the regular expression corresponding to the deleted terms; finally we append this
  regular expression to @{text "xrhs"} and take the union with @{text rhs'}. When we use
       
   627   the substitution operation we will arrange it so that @{text "xrhs"} does not contain
       
   628   any occurrence of @{text X}.
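
  Continuing the schematic illustration from above, substituting a made-up equation
  @{text "X = xrhs"} into the right-hand side @{text "{(X, CHAR c), (Z, CHAR d)}"} first
  removes the term @{text "(X, CHAR c)"}, then appends the regular expression
  @{text "\<Uplus>{CHAR c}"} to every term of @{text "xrhs"}, and finally adds the remaining
  term @{text "(Z, CHAR d)"} back in. On the level of languages, the occurrence of
  $X \cdot \{c\}$ is thereby replaced by the language of @{text "xrhs"} concatenated
  with $\{c\}$.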
       
   629 
       
   630   With these two operations in place, we can define the operation that removes one equation
       
   631   from an equational systems @{text ES}. The operation @{const Subst_all}
       
   632   substitutes an equation @{text "X = xrhs"} throughout an equational system @{text ES}; 
       
   633   @{const Remove} then completely removes such an equation from @{text ES} by substituting 
       
  it into the rest of the equational system, but first eliminating all recursive occurrences
       
   635   of @{text X} by applying @{const Arden} to @{text "xrhs"}.
       
   636 
       
   637   \begin{center}
       
   638   \begin{tabular}{rcl}
       
   639   @{thm (lhs) Subst_all_def} & @{text "\<equiv>"} & @{thm (rhs) Subst_all_def}\\
       
   640   @{thm (lhs) Remove_def}    & @{text "\<equiv>"} & @{thm (rhs) Remove_def}
       
   641   \end{tabular}
       
   642   \end{center}
       
   643 
       
   644   \noindent
       
   645   Finally, we can define how an equational system should be solved. For this 
       
  we will need to iterate the process of eliminating equations until only one equation
  is left in the system. However, we do not want just any equation
  to be the last one, but the one involving the equivalence class for 
  which we want to calculate the regular 
  expression. Let us suppose this equivalence class is @{text X}. 
  Since @{text X} is the one to be solved, in every iteration step we have to pick an
  equation to be eliminated that is different from @{text X}. In this way 
  @{text X} is kept until the final step. The choice is implemented using Hilbert's choice 
       
   654   operator, written @{text SOME} in the definition below.
       
   655   
       
   656   \begin{center}
       
   657   \begin{tabular}{rc@ {\hspace{4mm}}r@ {\hspace{1mm}}l}
       
   658   @{thm (lhs) Iter_def} & @{text "\<equiv>"}~~\mbox{} & \multicolumn{2}{@ {\hspace{-4mm}}l}{@{text "let"}}\\ 
       
   659    & & @{text "(Y, yrhs) ="} & @{term "SOME (Y, yrhs). (Y, yrhs) \<in> ES \<and> X \<noteq> Y"} \\
       
   660    & &  \multicolumn{2}{@ {\hspace{-4mm}}l}{@{text "in"}~~@{term "Remove ES Y yrhs"}}\\ 
       
   661   \end{tabular}
       
   662   \end{center}
       
   663 
       
   664   \noindent
       
   665   The last definition we need applies @{term Iter} over and over until a condition 
       
   666   @{text Cond} is \emph{not} satisfied anymore. This condition states that there
       
  is more than one equation left in the equational system @{text ES}. To solve
       
   668   an equational system we use Isabelle/HOL's @{text while}-operator as follows:
       
   669   
       
   670   \begin{center}
       
   671   @{thm Solve_def}
       
   672   \end{center}
       
   673 
       
   674   \noindent
       
   675   We are not concerned here with the definition of this operator
       
   676   (see Berghofer and Nipkow \cite{BerghoferNipkow00}), but note that we eliminate
       
   677   in each @{const Iter}-step a single equation, and therefore 
       
   678   have a well-founded termination order by taking the cardinality 
       
   679   of the equational system @{text ES}. This enables us to prove
       
   680   properties about our definition of @{const Solve} when we `call' it with
       
   681   the equivalence class @{text X} and the initial equational system 
       
   682   @{term "Init (UNIV // \<approx>A)"} from
       
   683   \eqref{initcs} using the principle:
       
   684   %
       
   685   \begin{equation}\label{whileprinciple}
       
   686   \mbox{\begin{tabular}{l}
       
   687   @{term "invariant (Init (UNIV // \<approx>A))"} \\
       
   688   @{term "\<forall>ES. invariant ES \<and> Cond ES \<longrightarrow> invariant (Iter X ES)"}\\
       
   689   @{term "\<forall>ES. invariant ES \<and> Cond ES \<longrightarrow> card (Iter X ES) < card ES"}\\
       
   690   @{term "\<forall>ES. invariant ES \<and> \<not> Cond ES \<longrightarrow> P ES"}\\
       
   691   \hline
       
   692   \multicolumn{1}{c}{@{term "P (Solve X (Init (UNIV // \<approx>A)))"}}
       
   693   \end{tabular}}
       
   694   \end{equation}
       
   695 
       
   696   \noindent
       
   697   This principle states that given an invariant (which we will specify below) 
       
   698   we can prove a property
       
   699   @{text "P"} involving @{const Solve}. For this we have to discharge the following
       
   700   proof obligations: first the
       
   701   initial equational system satisfies the invariant; second the iteration
       
   702   step @{text "Iter"} preserves the invariant as long as the condition @{term Cond} holds;
       
  third @{text "Iter"} decreases the termination order; and fourth,
  once the condition does not hold anymore, the property @{text P} must hold.
       
   705 
       
   706   The property @{term P} in our proof will state that @{term "Solve X (Init (UNIV // \<approx>A))"}
       
   707   returns with a single equation @{text "X = xrhs"} for some @{text "xrhs"}, and
       
   708   that this equational system still satisfies the invariant. In order to get
       
   709   the proof through, the invariant is composed of the following six properties:
       
   710 
       
   711   \begin{center}
       
   712   \begin{tabular}{@ {}rcl@ {\hspace{-13mm}}l @ {}}
       
   713   @{text "invariant ES"} & @{text "\<equiv>"} &
       
   714       @{term "finite ES"} & @{text "(finiteness)"}\\
       
   715   & @{text "\<and>"} & @{thm (rhs) finite_rhs_def} & @{text "(finiteness rhs)"}\\
       
   716   & @{text "\<and>"} & @{text "\<forall>(X, rhs)\<in>ES. X = \<Union>\<calL> ` rhs"} & @{text "(soundness)"}\\
       
   717   & @{text "\<and>"} & @{thm (rhs) distinctness_def}\\
       
   718   &             &  & @{text "(distinctness)"}\\
       
   719   & @{text "\<and>"} & @{thm (rhs) ardenable_all_def} & @{text "(ardenable)"}\\   
       
   720   & @{text "\<and>"} & @{thm (rhs) validity_def} & @{text "(validity)"}\\
       
   721   \end{tabular}
       
   722   \end{center}
       
   723  
       
   724   \noindent
       
   725   The first two ensure that the equational system is always finite (number of equations
       
   726   and number of terms in each equation); the third makes sure the `meaning' of the 
       
   727   equations is preserved under our transformations. The other properties are a bit more
       
   728   technical, but are needed to get our proof through. Distinctness states that every
       
   729   equation in the system is distinct. @{text Ardenable} ensures that we can always
       
   730   apply the @{text Arden} operation. 
       
   731   The last property states that every @{text rhs} can only contain equivalence classes
       
   732   for which there is an equation. Therefore @{text lhss} is just the set containing 
       
   733   the first components of an equational system,
       
   734   while @{text "rhss"} collects all equivalence classes @{text X} in the terms of the 
       
   735   form @{term "Trn X r"}. That means formally @{thm (lhs) lhss_def}~@{text "\<equiv> {X | (X, rhs) \<in> ES}"} 
       
   736   and @{thm (lhs) rhss_def}~@{text "\<equiv> {X | (X, r) \<in> rhs}"}.
       
   737   
       
   738 
       
   739   It is straightforward to prove that the initial equational system satisfies the
       
   740   invariant.
       
   741 
       
   742   \begin{lemma}\label{invzero}
       
   743   @{thm[mode=IfThen] Init_ES_satisfies_invariant}
       
   744   \end{lemma}
       
   745 
       
   746   \begin{proof}
       
  Finiteness is given by the assumption and the way we set up the 
       
   748   initial equational system. Soundness is proved in Lem.~\ref{inv}. Distinctness
       
   749   follows from the fact that the equivalence classes are disjoint. The @{text ardenable}
       
   750   property also follows from the setup of the initial equational system, as does 
       
   751   validity.\qed
       
   752   \end{proof}
       
   753 
       
   754   \noindent
       
   755   Next we show that @{text Iter} preserves the invariant.
       
   756 
       
   757   \begin{lemma}\label{iterone}
       
   758   @{thm[mode=IfThen] iteration_step_invariant[where xrhs="rhs"]}
       
   759   \end{lemma}
       
   760 
       
   761   \begin{proof} 
       
   762   The argument boils down to choosing an equation @{text "Y = yrhs"} to be eliminated
       
  and showing that @{term "Subst_all (ES - {(Y, yrhs)}) Y (Arden Y yrhs)"} 
       
   764   preserves the invariant.
       
   765   We prove this as follows:
       
   766 
       
   767   \begin{center}
       
   768   @{text "\<forall> ES."} @{thm (prem 1) Subst_all_satisfies_invariant} implies
       
   769   @{thm (concl) Subst_all_satisfies_invariant}
       
   770   \end{center}
       
   771 
       
   772   \noindent
       
   773   Finiteness is straightforward, as the @{const Subst} and @{const Arden} operations 
       
   774   keep the equational system finite. These operations also preserve soundness 
       
   775   and distinctness (we proved soundness for @{const Arden} in Lem.~\ref{ardenable}).
       
   776   The property @{text ardenable} is clearly preserved because the append-operation
       
  cannot make a regular expression match the empty string. Validity is
       
   778   given because @{const Arden} removes an equivalence class from @{text yrhs}
       
   779   and then @{const Subst_all} removes @{text Y} from the equational system.
       
   780   Having proved the implication above, we can instantiate @{text "ES"} with @{text "ES - {(Y, yrhs)}"}
       
  which matches our proof obligation for @{const "Subst_all"}. Since
       
   782   \mbox{@{term "ES = ES - {(Y, yrhs)} \<union> {(Y, yrhs)}"}}, we can use the assumption 
       
   783   to complete the proof.\qed 
       
   784   \end{proof}
       
   785 
       
   786   \noindent
       
   787   We also need the fact that @{text Iter} decreases the termination measure.
       
   788 
       
   789   \begin{lemma}\label{itertwo}
       
   790   @{thm[mode=IfThen] iteration_step_measure[simplified (no_asm), where xrhs="rhs"]}
       
   791   \end{lemma}
       
   792 
       
   793   \begin{proof}
       
   794   By assumption we know that @{text "ES"} is finite and has more than one element.
       
   795   Therefore there must be an element @{term "(Y, yrhs) \<in> ES"} with 
       
   796   @{term "(Y, yrhs) \<noteq> (X, rhs)"}. Using the distinctness property we can infer
       
   797   that @{term "Y \<noteq> X"}. We further know that @{text "Remove ES Y yrhs"}
       
   798   removes the equation @{text "Y = yrhs"} from the system, and therefore 
       
   799   the cardinality of @{const Iter} strictly decreases.\qed
       
   800   \end{proof}
       
   801 
       
   802   \noindent
       
  This brings us to the property we want to establish for @{text Solve}.
       
   804 
       
   805 
       
   806   \begin{lemma}
       
   807   If @{thm (prem 1) Solve} and @{thm (prem 2) Solve} then there exists
       
   808   a @{text rhs} such that  @{term "Solve X (Init (UNIV // \<approx>A)) = {(X, rhs)}"}
       
   809   and @{term "invariant {(X, rhs)}"}.
       
   810   \end{lemma}
       
   811 
       
   812   \begin{proof} 
       
   813   In order to prove this lemma using \eqref{whileprinciple}, we have to use a slightly
       
   814   stronger invariant since Lem.~\ref{iterone} and \ref{itertwo} have the precondition 
       
   815   that @{term "(X, rhs) \<in> ES"} for some @{text rhs}. This precondition is needed
       
   816   in order to choose in the @{const Iter}-step an equation that is not \mbox{@{term "X = rhs"}}.
       
   817   Therefore our invariant cannot be just @{term "invariant ES"}, but must be 
       
   818   @{term "invariant ES \<and> (\<exists>rhs. (X, rhs) \<in> ES)"}. By assumption 
       
   819   @{thm (prem 2) Solve} and Lem.~\ref{invzero}, the more general invariant holds for
       
   820   the initial equational system. This is premise 1 of~\eqref{whileprinciple}.
       
   821   Premise 2 is given by Lem.~\ref{iterone} and the fact that @{const Iter} might
       
   822   modify the @{text rhs} in the equation @{term "X = rhs"}, but does not remove it.
       
   823   Premise 3 of~\eqref{whileprinciple} is by Lem.~\ref{itertwo}. Now in premise 4
       
  we would like to show that there exists a @{text rhs} such that @{term "ES = {(X, rhs)}"}
  and that @{text "invariant {(X, rhs)}"} holds, provided the condition @{text "Cond"}
  does not hold. By the stronger invariant we know there exists such a @{text "rhs"}
       
   827   with @{term "(X, rhs) \<in> ES"}. Because @{text Cond} is not true, we know the cardinality
       
   828   of @{text ES} is @{text 1}. This means @{text "ES"} must actually be the set @{text "{(X, rhs)}"},
       
   829   for which the invariant holds. This allows us to conclude that 
       
   830   @{term "Solve X (Init (UNIV // \<approx>A)) = {(X, rhs)}"} and @{term "invariant {(X, rhs)}"} hold,
       
   831   as needed.\qed
       
   832   \end{proof}
       
   833 
       
   834   \noindent
       
   835   With this lemma in place we can show that for every equivalence class in @{term "UNIV // \<approx>A"}
       
   836   there exists a regular expression.
       
   837 
       
   838   \begin{lemma}\label{every_eqcl_has_reg}
       
   839   @{thm[mode=IfThen] every_eqcl_has_reg}
       
   840   \end{lemma}
       
   841 
       
   842   \begin{proof}
       
   843   By the preceding lemma, we know that there exists a @{text "rhs"} such
       
   844   that @{term "Solve X (Init (UNIV // \<approx>A))"} returns the equation @{text "X = rhs"},
       
   845   and that the invariant holds for this equation. That means we 
       
   846   know @{text "X = \<Union>\<calL> ` rhs"}. We further know that
       
   847   this is equal to \mbox{@{text "\<Union>\<calL> ` (Arden X rhs)"}} using the properties of the 
       
   848   invariant and Lem.~\ref{ardenable}. Using the validity property for the equation @{text "X = rhs"},
       
  we can infer that @{term "rhss rhs \<subseteq> {X}"} and, because the @{text Arden} operation
  removes this @{text X} from @{text rhs}, that @{term "rhss (Arden X rhs) = {}"}.
       
   851   This means the right-hand side @{term "Arden X rhs"} can only consist of terms of the form @{term "Lam r"}.
       
   852   So we can collect those (finitely many) regular expressions @{text rs} and have @{term "X = L (\<Uplus>rs)"}.
       
   853   With this we can conclude the proof.\qed
       
   854   \end{proof}
       
   855 
       
   856   \noindent
       
   857   Lem.~\ref{every_eqcl_has_reg} allows us to finally give a proof for the first direction
       
   858   of the Myhill-Nerode theorem.
       
   859 
       
   860   \begin{proof}[of Thm.~\ref{myhillnerodeone}]
       
   861   By Lem.~\ref{every_eqcl_has_reg} we know that there exists a regular expression for
       
   862   every equivalence class in @{term "UNIV // \<approx>A"}. Since @{text "finals A"} is
       
   863   a subset of  @{term "UNIV // \<approx>A"}, we also know that for every equivalence class
       
   864   in @{term "finals A"} there exists a regular expression. Moreover by assumption 
       
   865   we know that @{term "finals A"} must be finite, and therefore there must be a finite
       
   866   set of regular expressions @{text "rs"} such that
       
   867   @{term "\<Union>(finals A) = L (\<Uplus>rs)"}.
       
   868   Since the left-hand side is equal to @{text A}, we can use @{term "\<Uplus>rs"} 
       
   869   as the regular expression that is needed in the theorem.\qed
       
   870   \end{proof}
       
   871 *}
       
   872 
       
   873 
       
   874 
       
   875 
       
   876 section {* Myhill-Nerode, Second Part *}
       
   877 
       
   878 text {*
       
   879   We will prove in this section the second part of the Myhill-Nerode
       
   880   theorem. It can be formulated in our setting as follows:
       
   881 
       
   882   \begin{theorem}
       
  If @{text "r"} is a regular expression, then @{thm Myhill_Nerode2}.
       
   884   \end{theorem}  
       
   885 
       
   886   \noindent
       
   887   The proof will be by induction on the structure of @{text r}. It turns out
       
   888   the base cases are straightforward.
       
   889 
       
   890 
       
   891   \begin{proof}[Base Cases]
       
   892   The cases for @{const NULL}, @{const EMPTY} and @{const CHAR} are routine, because 
       
   893   we can easily establish that
       
   894 
       
   895   \begin{center}
       
   896   \begin{tabular}{l}
       
   897   @{thm quot_null_eq}\\
       
   898   @{thm quot_empty_subset}\\
       
   899   @{thm quot_char_subset}
       
   900   \end{tabular}
       
   901   \end{center}
       
   902 
       
   903   \noindent
       
   904   hold, which shows that @{term "UNIV // \<approx>(L r)"} must be finite.\qed
       
   905   \end{proof}
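
         \noindent
         To give an intuition for the first of these facts (this remark is only an
         illustration and not part of the formalised proof): in the case of @{const NULL}
         no string of the form @{text "x @ z"} can ever be in the empty language, and
         therefore any two strings @{text x} and @{text y} are Myhill-Nerode related.
         Consequently the quotient consists of the single equivalence class @{const UNIV}
         and is trivially finite. Similar considerations apply to @{const EMPTY} and
         @{const CHAR}, where the quotient is contained in a fixed finite set of classes.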
       
   906 
       
   907   \noindent
       
   908   Much more interesting, however, are the inductive cases. They seem hard to solve 
       
   909   directly. The reader is invited to try. 
       
   910 
       
   911   Our proof will rely on some
       
   912   \emph{tagging-functions} defined over strings. Given the inductive hypothesis, it will 
       
   913   be easy to prove that the \emph{range} of these tagging-functions is finite
       
   914   (the range of a function @{text f} is defined as @{text "range f \<equiv> f ` UNIV"}).
       
   915   With this we will be able to infer that the tagging-functions, seen as relations,
       
   916   give rise to finitely many equivalence classes of @{const UNIV}. Finally we 
       
   917   will show that the tagging-relations are more refined than @{term "\<approx>(L r)"}, which
       
   918   implies that @{term "UNIV // \<approx>(L r)"} must also be finite (a relation @{text "R\<^isub>1"} 
       
   919   is said to \emph{refine} @{text "R\<^isub>2"} provided @{text "R\<^isub>1 \<subseteq> R\<^isub>2"}).
       
   920   We formally define the notion of a \emph{tagging-relation} as follows.
       
   921 
       
   922   \begin{definition}[Tagging-Relation] Given a tagging-function @{text tag}, then two strings @{text x}
       
   923   and @{text y} are \emph{tag-related} provided
       
   924   \begin{center}
       
   925   @{text "x =tag= y \<equiv> tag x = tag y"}\;.
       
   926   \end{center}
       
   927   \end{definition}
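
         \noindent
         As a simple illustration (not taken from our formalisation), consider the
         tagging-function @{text "tag x \<equiv> \<approx>A `` {x}"}, mapping a string to its
         Myhill-Nerode equivalence class with respect to some language @{text A}.
         Then two strings are tag-related
         %
         \begin{center}
         @{text "x =tag= y  \<equiv>  \<approx>A `` {x} = \<approx>A `` {y}"}
         \end{center}
         %
         \noindent
         precisely when @{text "x \<approx>A y"} holds, and so this tagging-relation refines
         @{text "\<approx>A"} trivially. The tagging-functions used below are more informative:
         under the induction hypothesis their ranges can be shown to be finite.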
       
   928 
       
   929 
       
   930   In order to establish finiteness of a set @{text A}, we shall use the following powerful
       
   931   principle from Isabelle/HOL's library.
       
   932   %
       
   933   \begin{equation}\label{finiteimageD}
       
   934   @{thm[mode=IfThen] finite_imageD}
       
   935   \end{equation}
       
   936 
       
   937   \noindent
       
   938   It states that if the image of a set @{text A} under a function @{text f} that is injective on @{text A}

   939   is finite, then @{text A} itself must be finite. We can use it to establish the following
       
   940   two lemmas.
       
   941 
       
   942   \begin{lemma}\label{finone}
       
   943   @{thm[mode=IfThen] finite_eq_tag_rel}
       
   944   \end{lemma}
       
   945 
       
   946   \begin{proof}
       
   947   In \eqref{finiteimageD} we set @{text f} to be @{text "X \<mapsto> tag ` X"}. Then

   948   @{text "range f"} is a subset of @{term "Pow (range tag)"}, which we know must be
       
   949   finite by assumption. Now @{term "f (UNIV // =tag=)"} is a subset of @{text "range f"},
       
   950   and so also finite. Injectivity amounts to showing that @{text "X = Y"} under the
       
   951   assumptions that @{text "X, Y \<in> "}~@{term "UNIV // =tag="} and @{text "f X = f Y"}.
       
   952   From the assumptions we can obtain @{text "x \<in> X"} and @{text "y \<in> Y"} with 
       
   953   @{text "tag x = tag y"}. Since @{text x} and @{text y} are tag-related, this in 
       
   954   turn means that the equivalence classes @{text X}
       
   955   and @{text Y} must be equal.\qed
       
   956   \end{proof}
       
   957 
       
   958   \begin{lemma}\label{fintwo} 
       
   959   Assume we are given two equivalence relations @{text "R\<^isub>1"} and @{text "R\<^isub>2"}, where

   960   @{text "R\<^isub>1"} refines @{text "R\<^isub>2"}.
       
   961   If @{thm (prem 1) refined_partition_finite[where ?R1.0="R\<^isub>1" and ?R2.0="R\<^isub>2"]}
       
   962   then @{thm (concl) refined_partition_finite[where ?R1.0="R\<^isub>1" and ?R2.0="R\<^isub>2"]}.
       
   963   \end{lemma}
       
   964 
       
   965   \begin{proof}
       
   966   We prove this lemma again using \eqref{finiteimageD}. This time we set @{text f} to
       
   967   be @{text "X \<mapsto>"}~@{term "{R\<^isub>1 `` {x} | x. x \<in> X}"}. It is easy to see that 
       
   968   @{term "finite (f ` (UNIV // R\<^isub>2))"} because it is a subset of @{term "Pow (UNIV // R\<^isub>1)"},
       
   969   which is finite by assumption. What remains to be shown is that @{text f} is injective
       
   970   on @{term "UNIV // R\<^isub>2"}. This is equivalent to showing that two equivalence 
       
   971   classes, say @{text "X"} and @{text Y}, in @{term "UNIV // R\<^isub>2"} are equal, provided
       
   972   @{text "f X = f Y"}. For @{text X} and @{text Y} to be equal, it suffices to find two elements

   973   @{text "x \<in> X"} and @{text "y \<in> Y"} that are @{text R\<^isub>2}-related.
       
   974   We know there exists an @{text "x \<in> X"} with \mbox{@{term "X = R\<^isub>2 `` {x}"}}. 
       
   975   From the latter fact we can infer that @{term "R\<^isub>1 ``{x} \<in> f X"}
       
   976   and further @{term "R\<^isub>1 ``{x} \<in> f Y"}. This means we can obtain a @{text "y \<in> Y"}
       
   977   such that @{term "R\<^isub>1 `` {x} = R\<^isub>1 `` {y}"} holds. Consequently @{text x} and @{text y}
       
   978   are @{text "R\<^isub>1"}-related. Since by assumption @{text "R\<^isub>1"} refines @{text "R\<^isub>2"},
       
   979   they must also be @{text "R\<^isub>2"}-related, as we need to show.\qed
       
   980   \end{proof}
       
   981 
       
   982   \noindent
       
   983   Chaining Lem.~\ref{finone} and \ref{fintwo} together means that, in order to show
       
   984   that @{term "UNIV // \<approx>(L r)"} is finite, we have to find a tagging-function whose
       
   985   range can be shown to be finite and whose tagging-relation refines @{term "\<approx>(L r)"}.
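         In other words, for each inductive case the proof obligations chain together as
         follows (an informal summary of the two lemmas above):
         %
         \begin{center}
         @{text "finite (range tag)"}
         $\;\Longrightarrow\;$ @{text "finite (UNIV // =tag=)"}
         $\;\Longrightarrow\;$ @{text "finite (UNIV // \<approx>(L r))"}
         \end{center}
         %
         \noindent
         where the first implication is given by Lem.~\ref{finone}, and the second by
         Lem.~\ref{fintwo} together with the fact that @{text "=tag="} refines @{text "\<approx>(L r)"}.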
       
   986   Let us attempt the @{const ALT}-case first.
       
   987 
       
   988   \begin{proof}[@{const "ALT"}-Case] 
       
   989   We take as tagging-function 
       
   990   %
       
   991   \begin{center}
       
   992   @{thm tag_str_ALT_def[where A="A" and B="B", THEN meta_eq_app]}
       
   993   \end{center}
       
   994 
       
   995   \noindent
       
   996   where @{text "A"} and @{text "B"} are some arbitrary languages.
       
   997   We can show in general that if @{term "finite (UNIV // \<approx>A)"} and @{term "finite (UNIV // \<approx>B)"}

   998   then @{term "finite ((UNIV // \<approx>A) \<times> (UNIV // \<approx>B))"} holds. The range of

   999   @{term "tag_str_ALT A B"} is a subset of this product set, and hence finite. It remains to be shown
       
  1000   that @{text "=tag\<^isub>A\<^isub>L\<^isub>T A B="} refines @{term "\<approx>(A \<union> B)"}. This amounts to 
       
  1001   showing
       
  1002   %
       
  1003   \begin{center}
       
  1004   @{term "tag\<^isub>A\<^isub>L\<^isub>T A B x = tag\<^isub>A\<^isub>L\<^isub>T A B y \<longrightarrow> x \<approx>(A \<union> B) y"}
       
  1005   \end{center}
       
  1006   %
       
  1007   \noindent
       
  1008   which by unfolding the Myhill-Nerode relation is identical to
       
  1009   %
       
  1010   \begin{equation}\label{pattern}
       
  1011   @{text "\<forall>z. tag\<^isub>A\<^isub>L\<^isub>T A B x = tag\<^isub>A\<^isub>L\<^isub>T A B y \<and> x @ z \<in> A \<union> B \<longrightarrow> y @ z \<in> A \<union> B"}
       
  1012   \end{equation}
       
  1013   %
       
  1014   \noindent
       
  1015   since both @{text "=tag\<^isub>A\<^isub>L\<^isub>T A B="} and @{term "\<approx>(A \<union> B)"} are symmetric. To solve
       
  1016   \eqref{pattern} we just have to unfold the definition of the tagging-function and analyse 
       
  1017   in which set, @{text A} or @{text B}, the string @{term "x @ z"} is. 
       
  1018   The definition of the tagging-function will give us in each case the
       
  1019   information to infer that @{text "y @ z \<in> A \<union> B"}.
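         To spell out one of the two cases (the other is symmetric), suppose @{text "x @ z \<in> A"}.
         Since the tag of a string records, in its first component, the @{text "\<approx>A"}-equivalence
         class of that string (as can be read off the definition above), the equality of the
         tags gives us
         %
         \begin{center}
         @{text "\<approx>A `` {x} = \<approx>A `` {y}"}\hspace{5mm}and therefore\hspace{5mm}@{text "x \<approx>A y"}\;.
         \end{center}
         %
         \noindent
         Unfolding the Myhill-Nerode relation then yields @{text "y @ z \<in> A"} and hence
         @{text "y @ z \<in> A \<union> B"}.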
       
  1020   Finally we
       
  1021   can discharge this case by setting @{text A} to @{term "L r\<^isub>1"} and @{text B} to @{term "L r\<^isub>2"}.\qed
       
  1022   \end{proof}
       
  1023 
       
  1024 
       
  1025   \noindent
       
  1026   The pattern in \eqref{pattern} is repeated for the other two cases. Unfortunately,
       
  1027   they are slightly more complicated. In the @{const SEQ}-case we essentially have
       
  1028   to be able to infer that 
       
  1029   %
       
  1030   \begin{center}
       
  1031   @{text "\<dots>"}@{term "x @ z \<in> A ;; B \<longrightarrow> y @ z \<in> A ;; B"}
       
  1032   \end{center}
       
  1033   %
       
  1034   \noindent
       
  1035   using the information given by the appropriate tagging-function. The complication 
       
  1036   is to find out how the string @{text "x @ z"} can be split up so that it is in @{term "A ;; B"}

  1037   (this was easy in the case of @{term "A \<union> B"}). To deal with this complication we define the
       
  1038   notions of \emph{string prefixes} 
       
  1039   %
       
  1040   \begin{center}
       
  1041   @{text "x \<le> y \<equiv> \<exists>z. y = x @ z"}\hspace{10mm}
       
  1042   @{text "x < y \<equiv> x \<le> y \<and> x \<noteq> y"}
       
  1043   \end{center}
       
  1044   %
       
  1045   \noindent
       
  1046   and \emph{string subtraction}:
       
  1047   %
       
  1048   \begin{center}
       
  1049   @{text "[] - y \<equiv> []"}\hspace{10mm}
       
  1050   @{text "x - [] \<equiv> x"}\hspace{10mm}
       
  1051   @{text "cx - dy \<equiv> if c = d then x - y else cx"}
       
  1052   \end{center}
       
  1053   %
       
  1054   \noindent
       
  1055   where @{text c} and @{text d} are characters, and @{text x} and @{text y} are strings. 
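
         To illustrate these definitions with a small example (not part of the
         formalisation): for @{text "x = [a, b, c]"} and @{text "x' = [a, b]"} we have
         @{text "x' < x"} and the subtraction unfolds as
         %
         \begin{center}
         @{text "[a, b, c] - [a, b]  =  [b, c] - [b]  =  [c] - []  =  [c]"}\;,
         \end{center}
         %
         \noindent
         whereas, if the first characters differ, the first string is returned unchanged,
         for instance @{text "[a, b] - [b] = [a, b]"}.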
       
  1056   
       
  1057   Now, assuming @{term "x @ z \<in> A ;; B"}, there are only two possible ways this string can be `split'

  1058   so that it is in @{term "A ;; B"}:
       
  1059   %
       
  1060   \begin{center}
       
  1061   \begin{tabular}{@ {}c@ {\hspace{10mm}}c@ {}}
       
  1062   \scalebox{0.7}{
       
  1063   \begin{tikzpicture}
       
  1064     \node[draw,minimum height=3.8ex] (xa) { $\hspace{3em}@{text "x'"}\hspace{3em}$ };
       
  1065     \node[draw,minimum height=3.8ex, right=-0.03em of xa] (xxa) { $\hspace{0.2em}@{text "x - x'"}\hspace{0.2em}$ };
       
  1066     \node[draw,minimum height=3.8ex, right=-0.03em of xxa] (z) { $\hspace{5em}@{text z}\hspace{5em}$ };
       
  1067 
       
  1068     \draw[decoration={brace,transform={yscale=3}},decorate]
       
  1069            (xa.north west) -- ($(xxa.north east)+(0em,0em)$)
       
  1070                node[midway, above=0.5em]{@{text x}};
       
  1071 
       
  1072     \draw[decoration={brace,transform={yscale=3}},decorate]
       
  1073            (z.north west) -- ($(z.north east)+(0em,0em)$)
       
  1074                node[midway, above=0.5em]{@{text z}};
       
  1075 
       
  1076     \draw[decoration={brace,transform={yscale=3}},decorate]
       
  1077            ($(xa.north west)+(0em,3ex)$) -- ($(z.north east)+(0em,3ex)$)
       
  1078                node[midway, above=0.8em]{@{term "x @ z \<in> A ;; B"}};
       
  1079 
       
  1080     \draw[decoration={brace,transform={yscale=3}},decorate]
       
  1081            ($(z.south east)+(0em,0ex)$) -- ($(xxa.south west)+(0em,0ex)$)
       
  1082                node[midway, below=0.5em]{@{term "(x - x') @ z \<in> B"}};
       
  1083 
       
  1084     \draw[decoration={brace,transform={yscale=3}},decorate]
       
  1085            ($(xa.south east)+(0em,0ex)$) -- ($(xa.south west)+(0em,0ex)$)
       
  1086                node[midway, below=0.5em]{@{term "x' \<in> A"}};
       
  1087   \end{tikzpicture}}
       
  1088   &
       
  1089   \scalebox{0.7}{
       
  1090   \begin{tikzpicture}
       
  1091     \node[draw,minimum height=3.8ex] (x) { $\hspace{4.8em}@{text x}\hspace{4.8em}$ };
       
  1092     \node[draw,minimum height=3.8ex, right=-0.03em of x] (za) { $\hspace{0.6em}@{text "z'"}\hspace{0.6em}$ };
       
  1093     \node[draw,minimum height=3.8ex, right=-0.03em of za] (zza) { $\hspace{2.6em}@{text "z - z'"}\hspace{2.6em}$  };
       
  1094 
       
  1095     \draw[decoration={brace,transform={yscale=3}},decorate]
       
  1096            (x.north west) -- ($(za.north west)+(0em,0em)$)
       
  1097                node[midway, above=0.5em]{@{text x}};
       
  1098 
       
  1099     \draw[decoration={brace,transform={yscale=3}},decorate]
       
  1100            ($(za.north west)+(0em,0ex)$) -- ($(zza.north east)+(0em,0ex)$)
       
  1101                node[midway, above=0.5em]{@{text z}};
       
  1102 
       
  1103     \draw[decoration={brace,transform={yscale=3}},decorate]
       
  1104            ($(x.north west)+(0em,3ex)$) -- ($(zza.north east)+(0em,3ex)$)
       
  1105                node[midway, above=0.8em]{@{term "x @ z \<in> A ;; B"}};
       
  1106 
       
  1107     \draw[decoration={brace,transform={yscale=3}},decorate]
       
  1108            ($(za.south east)+(0em,0ex)$) -- ($(x.south west)+(0em,0ex)$)
       
  1109                node[midway, below=0.5em]{@{text "x @ z' \<in> A"}};
       
  1110 
       
  1111     \draw[decoration={brace,transform={yscale=3}},decorate]
       
  1112            ($(zza.south east)+(0em,0ex)$) -- ($(za.south east)+(0em,0ex)$)
       
  1113                node[midway, below=0.5em]{@{text "(z - z') \<in> B"}};
       
  1114   \end{tikzpicture}}
       
  1115   \end{tabular}
       
  1116   \end{center}
       
  1117   %
       
  1118   \noindent
       
  1119   Either there is a prefix of @{text x} in @{text A} and the rest is in @{text B} (first picture),
       
  1120   or @{text x} extended by a prefix of @{text "z"} is in @{text A} and the rest of @{text z} is in @{text B} (second picture).
       
  1121   In both cases we have to show that @{term "y @ z \<in> A ;; B"}. For this we use the 
       
  1122   following tagging-function
       
  1123   %
       
  1124   \begin{center}
       
  1125   @{thm tag_str_SEQ_def[where ?L1.0="A" and ?L2.0="B", THEN meta_eq_app]}
       
  1126   \end{center}
       
  1127 
       
  1128   \noindent
       
  1129   with the idea that in the first split we have to make sure that @{text "(x - x') @ z"}
       
  1130   is in the language @{text B}.
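         Spelled out in conventional set notation (this is only a reading of the definition
         above), the tag of a string @{text x} is the pair consisting of the @{text "\<approx>A"}-class
         of @{text x} and the set of @{text "\<approx>B"}-classes of all remainders of @{text x} after
         removing a prefix that lies in @{text A}:
         %
         \begin{center}
         @{text "tag\<^isub>S\<^isub>E\<^isub>Q A B x  \<equiv>  (\<approx>A `` {x}, {\<approx>B `` {x - x'} | x' \<le> x \<and> x' \<in> A})"}\;.
         \end{center}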
       
  1131 
       
  1132   \begin{proof}[@{const SEQ}-Case]
       
  1133   If @{term "finite (UNIV // \<approx>A)"} and @{term "finite (UNIV // \<approx>B)"}
       
  1134   then @{term "finite ((UNIV // \<approx>A) \<times> (Pow (UNIV // \<approx>B)))"} holds. The range of
       
  1135   @{term "tag_str_SEQ A B"} is a subset of this product set, and therefore finite.
       
  1136   We have to show that the tagging-relation refines the Myhill-Nerode relation @{text "\<approx>(A ;; B)"}, that is
       
  1137   %
       
  1138   \begin{center}
       
  1139   @{term "\<forall>z. tag_str_SEQ A B x = tag_str_SEQ A B y \<and> x @ z \<in> A ;; B \<longrightarrow> y @ z \<in> A ;; B"}
       
  1140   \end{center}
       
  1141   %
       
  1142   \noindent
       
  1143   There are two cases to be considered (see pictures above). First, there exists 
       
  1144   an @{text "x'"} such that
       
  1145   @{text "x' \<in> A"}, @{text "x' \<le> x"} and @{text "(x - x') @ z \<in> B"} hold. We therefore have
       
  1146   %
       
  1147   \begin{center}
       
  1148   @{term "(\<approx>B `` {x - x'}) \<in> ({\<approx>B `` {x - x'} |x'. x' \<le> x \<and> x' \<in> A})"}
       
  1149   \end{center}
       
  1150   %
       
  1151   \noindent
       
  1152   and by the assumption about @{term "tag_str_SEQ A B"} also 
       
  1153   %
       
  1154   \begin{center}
       
  1155   @{term "(\<approx>B `` {x - x'}) \<in> ({\<approx>B `` {y - y'} |y'. y' \<le> y \<and> y' \<in> A})"}
       
  1156   \end{center}
       
  1157   %
       
  1158   \noindent
       
  1159   That means there must be a @{text "y'"} with @{text "y' \<le> y"} and @{text "y' \<in> A"} such that
       
  1160   @{term "\<approx>B `` {x - x'} = \<approx>B `` {y - y'}"}. This equality means that
       
  1161   @{term "(x - x') \<approx>B (y - y')"} holds. Unfolding the Myhill-Nerode
       
  1162   relation, and using the fact that @{text "(x - x') @ z \<in> B"}, we
       
  1163   have @{text "(y - y') @ z \<in> B"}. We already know @{text "y' \<in> A"}, therefore
       
  1164   @{term "y @ z \<in> A ;; B"}, as needed in this case.
       
  1165 
       
  1166   Second, there exists a @{text "z'"} with @{text "z' \<le> z"} such that @{term "x @ z' \<in> A"} and @{text "z - z' \<in> B"}.

  1167   By the assumption about @{term "tag_str_SEQ A B"} we have

  1168   @{term "\<approx>A `` {x} = \<approx>A `` {y}"} and thus @{term "x \<approx>A y"}, which by the Myhill-Nerode

  1169   relation means that @{term "y @ z' \<in> A"} holds. Using @{text "z - z' \<in> B"}, we can conclude also in this case

  1170   that @{term "y @ z \<in> A ;; B"} holds. We can again complete the @{const SEQ}-case

  1171   by setting @{text A} to @{term "L r\<^isub>1"} and @{text B} to @{term "L r\<^isub>2"}.\qed
       
  1172   \end{proof}
       
  1173   
       
  1174   \noindent
       
  1175   The case for @{const STAR} is similar to @{const SEQ}, but poses a few extra challenges. When
       
  1176   we analyse the case that @{text "x @ z"} is an element in @{term "A\<star>"} and @{text x} is not the 
       
  1177   empty string, we 
       
  1178   have the following picture:
       
  1179   %
       
  1180   \begin{center}
       
  1181   \scalebox{0.7}{
       
  1182   \begin{tikzpicture}
       
  1183     \node[draw,minimum height=3.8ex] (xa) { $\hspace{4em}@{text "x'\<^isub>m\<^isub>a\<^isub>x"}\hspace{4em}$ };
       
  1184     \node[draw,minimum height=3.8ex, right=-0.03em of xa] (xxa) { $\hspace{0.5em}@{text "x - x'\<^isub>m\<^isub>a\<^isub>x"}\hspace{0.5em}$ };
       
  1185     \node[draw,minimum height=3.8ex, right=-0.03em of xxa] (za) { $\hspace{2em}@{text "z\<^isub>a"}\hspace{2em}$ };
       
  1186     \node[draw,minimum height=3.8ex, right=-0.03em of za] (zb) { $\hspace{7em}@{text "z\<^isub>b"}\hspace{7em}$ };
       
  1187 
       
  1188     \draw[decoration={brace,transform={yscale=3}},decorate]
       
  1189            (xa.north west) -- ($(xxa.north east)+(0em,0em)$)
       
  1190                node[midway, above=0.5em]{@{text x}};
       
  1191 
       
  1192     \draw[decoration={brace,transform={yscale=3}},decorate]
       
  1193            (za.north west) -- ($(zb.north east)+(0em,0em)$)
       
  1194                node[midway, above=0.5em]{@{text z}};
       
  1195 
       
  1196     \draw[decoration={brace,transform={yscale=3}},decorate]
       
  1197            ($(xa.north west)+(0em,3ex)$) -- ($(zb.north east)+(0em,3ex)$)
       
  1198                node[midway, above=0.8em]{@{term "x @ z \<in> A\<star>"}};
       
  1199 
       
  1200     \draw[decoration={brace,transform={yscale=3}},decorate]
       
  1201            ($(za.south east)+(0em,0ex)$) -- ($(xxa.south west)+(0em,0ex)$)
       
  1202                node[midway, below=0.5em]{@{text "(x - x'\<^isub>m\<^isub>a\<^isub>x) @ z\<^isub>a \<in> A"}};
       
  1203 
       
  1204     \draw[decoration={brace,transform={yscale=3}},decorate]
       
  1205            ($(xa.south east)+(0em,0ex)$) -- ($(xa.south west)+(0em,0ex)$)
       
  1206                node[midway, below=0.5em]{@{term "x'\<^isub>m\<^isub>a\<^isub>x \<in> A\<star>"}};
       
  1207 
       
  1208     \draw[decoration={brace,transform={yscale=3}},decorate]
       
  1209            ($(zb.south east)+(0em,0ex)$) -- ($(zb.south west)+(0em,0ex)$)
       
  1210                node[midway, below=0.5em]{@{term "z\<^isub>b \<in> A\<star>"}};
       
  1211 
       
  1212     \draw[decoration={brace,transform={yscale=3}},decorate]
       
  1213            ($(zb.south east)+(0em,-4ex)$) -- ($(xxa.south west)+(0em,-4ex)$)
       
  1214                node[midway, below=0.5em]{@{term "(x - x'\<^isub>m\<^isub>a\<^isub>x) @ z \<in> A\<star>"}};
       
  1215   \end{tikzpicture}}
       
  1216   \end{center}
       
  1217   %
       
  1218   \noindent
       
  1219   We can find a strict prefix @{text "x'"} of @{text x} such that @{term "x' \<in> A\<star>"},
       
  1220   @{text "x' < x"} and the rest @{term "(x - x') @ z \<in> A\<star>"}. For example the empty string 
       
  1221   @{text "[]"} would do.
       
  1222   There are potentially many such prefixes, but there can only be finitely many of them (the 
       
  1223   string @{text x} is finite). Let us therefore choose the longest one and call it 
       
  1224   @{text "x'\<^isub>m\<^isub>a\<^isub>x"}. Now for the rest of the string @{text "(x - x'\<^isub>m\<^isub>a\<^isub>x) @ z"} we
       
  1225   know it is in @{term "A\<star>"}. By definition of @{term "A\<star>"}, we can separate
       
  1226   this string into two parts, say @{text "a"} and @{text "b"}, such that @{text "a \<in> A"}
       
  1227   and @{term "b \<in> A\<star>"}. Now @{text a} must be strictly longer than @{text "x - x'\<^isub>m\<^isub>a\<^isub>x"},
       
  1228   otherwise @{text "x'\<^isub>m\<^isub>a\<^isub>x"} is not the longest prefix. That means @{text a}
       
  1229   `overlaps' with @{text z}, splitting it into two components @{text "z\<^isub>a"} and
       
  1230    @{text "z\<^isub>b"}. For this we know that @{text "(x - x'\<^isub>m\<^isub>a\<^isub>x) @ z\<^isub>a \<in> A"} and
       
  1231   @{term "z\<^isub>b \<in> A\<star>"}. To cut a long story short, we have divided @{term "x @ z \<in> A\<star>"}
       
  1232   such that we have a string @{text a} with @{text "a \<in> A"} that lies just on the
       
  1233   `border' of @{text x} and @{text z}. This string is @{text "(x - x'\<^isub>m\<^isub>a\<^isub>x) @ z\<^isub>a"}.
       
  1234 
       
  1235   In order to show that @{term "x @ z \<in> A\<star>"} implies @{term "y @ z \<in> A\<star>"}, we use
       
  1236   the following tagging-function:
       
  1237   %
       
  1238   \begin{center}
       
  1239   @{thm tag_str_STAR_def[where ?L1.0="A", THEN meta_eq_app]}\smallskip
       
  1240   \end{center}
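
         \noindent
         Spelled out (only as a reading of the definition above), the tag of a string
         @{text x} collects the @{text "\<approx>A"}-classes of all remainders of @{text x} after
         removing a strict prefix that lies in @{term "A\<star>"}:
         %
         \begin{center}
         @{text "tag\<^isub>S\<^isub>T\<^isub>A\<^isub>R A x  \<equiv>  {\<approx>A `` {x - x'} | x' < x \<and> x' \<in> A\<star>}"}\;.
         \end{center}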
       
  1241 
       
  1242   \begin{proof}[@{const STAR}-Case]
       
  1243   If @{term "finite (UNIV // \<approx>A)"} 
       
  1244   then @{term "finite (Pow (UNIV // \<approx>A))"} holds. The range of
       
  1245   @{term "tag_str_STAR A"} is a subset of this set, and therefore finite.
       
  1246   Again we have to show that the tagging-relation refines the Myhill-Nerode relation @{text "\<approx>(A\<star>)"}, that is
       
  1247   %
       
  1248   \begin{center}
       
  1249   @{term "\<forall>z. tag_str_STAR A x = tag_str_STAR A y \<and> x @ z \<in> A\<star> \<longrightarrow> y @ z \<in> A\<star>"}
       
  1250   \end{center}  
       
  1251   %
       
  1252   \noindent
       
  1253   We first need to consider the case that @{text x} is the empty string.
       
  1254   From the assumption we can infer that @{text y} is also the empty string and

  1255   then clearly @{term "y @ z \<in> A\<star>"} holds. In case @{text x} is not the empty
       
  1256   string, we can divide the string @{text "x @ z"} as shown in the picture 
       
  1257   above. By the tagging-function we have
       
  1258   %
       
  1259   \begin{center}
       
  1260   @{term "\<approx>A `` {(x - x'\<^isub>m\<^isub>a\<^isub>x)} \<in> ({\<approx>A `` {x - x'} |x'. x' < x \<and> x' \<in> A\<star>})"}
       
  1261   \end{center}
       
  1262   %
       
  1263   \noindent
       
  1264   which by assumption is equal to
       
  1265   %
       
  1266   \begin{center}
       
  1267   @{term "\<approx>A `` {(x - x'\<^isub>m\<^isub>a\<^isub>x)} \<in> ({\<approx>A `` {y - y'} |y'. y' < y \<and> y' \<in> A\<star>})"}
       
  1268   \end{center}
       
  1269   %
       
  1270   \noindent 
       
  1271   and from this we obtain a @{text "y'"} with @{term "y' \<in> A\<star>"} and @{text "y' < y"}

  1272   such that @{term "(x - x'\<^isub>m\<^isub>a\<^isub>x) \<approx>A (y - y')"} holds. Unfolding the Myhill-Nerode

  1273   relation gives us @{term "(y - y') @ z\<^isub>a \<in> A"}. We also know that @{term "z\<^isub>b \<in> A\<star>"}.
       
  1274   Therefore @{term "y' @ ((y - y') @ z\<^isub>a) @ z\<^isub>b \<in> A\<star>"}, which means
       
  1275   @{term "y @ z \<in> A\<star>"}. As the last step we have to set @{text "A"} to @{term "L r"} and
       
  1276   complete the proof.\qed
       
  1277   \end{proof}
       
  1278 *}
       
  1279 
       
  1280 section {* Second Part based on Partial Derivatives *}
       
  1281 
       
  1282 text {*
       
  1283    We briefly considered using the method Brzozowski presented in the
       
  1284   Appendix of~\cite{Brzozowski64} in order to prove the second
       
  1285   direction of the Myhill-Nerode theorem. There he calculates the
       
  1286   derivatives of regular expressions and shows that for every

  1287   regular expression there can be only finitely many of them (if

  1288   regarded as equal modulo ACI). We could have used as tagging-function
       
  1289   the set of derivatives of a regular expression with respect to a
       
  1290   language.  Using the fact that two strings are Myhill-Nerode related
       
  1291   whenever their derivatives are the same, together with the fact that

  1292   there are only finitely many such derivatives, would give us an argument

  1293   similar to ours. However, it seems not so easy to calculate the set

  1294   of derivatives modulo ACI. Therefore we preferred our direct method
       
  1295   of using tagging-functions. 
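
         For readers unfamiliar with this notion: the derivative of a regular expression
         @{text r} with respect to a character @{text c} (written @{text "der c r"} here
         purely for the purposes of this remark) is a regular expression whose language
         contains exactly the strings @{text s} such that @{text "c # s"} is matched by
         @{text r}; it is extended to strings by @{text "ders [] r \<equiv> r"} and
         @{text "ders (c # s) r \<equiv> ders s (der c r)"}. The standard clauses, given here
         only for illustration and not part of our formalisation, are
         %
         \begin{center}
         \begin{tabular}{l}
         @{text "der c NULL \<equiv> NULL"}\\
         @{text "der c EMPTY \<equiv> NULL"}\\
         @{text "der c (CHAR d) \<equiv> if c = d then EMPTY else NULL"}\\
         @{text "der c (ALT r\<^isub>1 r\<^isub>2) \<equiv> ALT (der c r\<^isub>1) (der c r\<^isub>2)"}\\
         @{text "der c (SEQ r\<^isub>1 r\<^isub>2) \<equiv> if [] \<in> L r\<^isub>1 then ALT (SEQ (der c r\<^isub>1) r\<^isub>2) (der c r\<^isub>2) else SEQ (der c r\<^isub>1) r\<^isub>2"}\\
         @{text "der c (STAR r) \<equiv> SEQ (der c r) (STAR r)"}
         \end{tabular}
         \end{center}
         %
         \noindent
         Since @{text "x @ z \<in> L r"} holds exactly when @{text "z \<in> L (ders x r)"}, two
         strings with the same derivative of @{text r} are Myhill-Nerode related with
         respect to @{term "L r"}.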
       
  1296 
       
  1297 *}
       
  1298 
       
  1299 section {* Closure Properties *}
       
  1300 
       
  1301 
       
  1302 section {* Conclusion and Related Work *}
       
  1303 
       
  1304 text {*
       
  1305   In this paper we took the view that a regular language is one where there
       
  1306   exists a regular expression that matches all of its strings. Regular
       
  1307   expressions can conveniently be defined as a datatype in HOL-based theorem
       
  1308   provers. For us it was therefore interesting to find out how far we can push
       
  1309   this point of view. We have established in Isabelle/HOL both directions 
       
  1310   of the Myhill-Nerode theorem.
       
  1311   %
       
  1312   \begin{theorem}[The Myhill-Nerode Theorem]\mbox{}\\
       
  1313   A language @{text A} is regular if and only if @{thm (rhs) Myhill_Nerode}.
       
  1314   \end{theorem}
       
  1315   %
       
  1316   \noindent
       
  1317   Having formalised this theorem means we
       
  1318   pushed our point of view quite far. Using this theorem we can obviously prove when a language
       
  1319   is \emph{not} regular---by establishing that it has infinitely many
       
  1320   equivalence classes generated by the Myhill-Nerode relation (this is usually
       
  1321   the purpose of the pumping lemma \cite{Kozen97}).  We can also use it to
       
  1322   establish the standard textbook results about closure properties of regular
       
  1323   languages. Particularly interesting is the case of closure under complement, because

  1324   it seems difficult to construct a regular expression for the complement

  1325   language by direct means. However, the existence of such a regular expression
       
  1326   can be easily proved using the Myhill-Nerode theorem since 
       
  1327   %
       
  1328   \begin{center}
       
  1329   @{term "s\<^isub>1 \<approx>A s\<^isub>2"} if and only if @{term "s\<^isub>1 \<approx>(-A) s\<^isub>2"}
       
  1330   \end{center}
       
  1331   %
       
  1332   \noindent
       
  1333   holds for any strings @{text "s\<^isub>1"} and @{text
       
  1334   "s\<^isub>2"}. Therefore @{text A} and the complement language @{term "-A"} give rise to the same
       
  1335   partitions.  Proving the existence of such a regular expression via automata 
       
  1336   using the standard method would 
       
  1337   be quite involved. It includes the
       
  1338   steps: regular expression @{text "\<Rightarrow>"} non-deterministic automaton @{text
       
  1339   "\<Rightarrow>"} deterministic automaton @{text "\<Rightarrow>"} complement automaton @{text "\<Rightarrow>"}
       
  1340   regular expression.
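
         Incidentally, the equivalence displayed above is easy to see once spelled out:
         unfolding the Myhill-Nerode relation we have
         %
         \begin{center}
         @{text "s\<^isub>1 \<approx>A s\<^isub>2  \<equiv>  \<forall>z. s\<^isub>1 @ z \<in> A \<longleftrightarrow> s\<^isub>2 @ z \<in> A"}
         \end{center}
         %
         \noindent
         and a bi-implication is unaffected by negating both of its sides; hence the
         right-hand side is equivalent to @{text "\<forall>z. s\<^isub>1 @ z \<in> -A \<longleftrightarrow> s\<^isub>2 @ z \<in> -A"}, which
         is just @{text "s\<^isub>1 \<approx>(-A) s\<^isub>2"}.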
       
  1341 
       
  1342   While regular expressions are convenient in formalisations, they have some
       
  1343   limitations. One is that there seems to be no method of calculating a
       
  1344   minimal regular expression (for example in terms of length) for a regular
       
  1345   language, as there is
       
  1346   for automata. On the other hand, efficient regular expression matching,
       
  1347   without using automata, poses no problem \cite{OwensReppyTuron09}.
       
  1348   For an implementation of a simple regular expression matcher,
       
  1349   whose correctness has been formally established, we refer the reader to
       
  1350   Owens and Slind \cite{OwensSlind08}.
       
  1351 
       
  1352 
       
  1353   Our formalisation consists of 780 lines of Isabelle/Isar code for the first
       
  1354   direction and 460 for the second, plus around 300 lines of standard material about
       
  1355   regular languages. While this might be seen as too large to count as a
       
  1356   concise proof pearl, this should be seen in the context of the work done by
       
  1357   Constable et al.~\cite{Constable00} who formalised the Myhill-Nerode theorem
       
  1358   in Nuprl using automata. They write that their four-member team needed
       
  1359   something of the order of 18 months for their formalisation. The
       
  1360   estimate for our formalisation is that we needed approximately 3 months and
       
  1361   this included the time to find our proof arguments. Unlike Constable et al,
       
  1362   who were able to follow the proofs from \cite{HopcroftUllman69}, we had to
       
  1363   find our own arguments.  So for us the formalisation was not the
       
  1364   bottleneck. It is hard to gauge the size of a formalisation in Nuprl, but
       
  1365   from what is shown in the Nuprl Math Library about their development it
       
  1366   seems substantially larger than ours. Our code can be found in the
       
  1367   Mercurial Repository at
       
  1368   \mbox{\url{http://www4.in.tum.de/~urbanc/regexp.html}}.
       
  1369 
       
  1370 
       
  1371   Our proof of the first direction is very much inspired by \emph{Brzozowski's
       
  1372   algebraic method} used to convert a finite automaton to a regular
       
  1373   expression \cite{Brzozowski64}. The close connection can be seen by considering the equivalence
       
  1374   classes as the states of the minimal automaton for the regular language.
       
  1375   However, there are some subtle differences. Since we identify equivalence

  1376   classes with the states of the automaton, the most natural choice is to
       
  1377   characterise each state with the set of strings starting from the initial
       
  1378   state leading up to that state. Usually, however, the states are characterised by the

  1379   strings starting from that state leading to the terminal states.  The first

  1380   choice has consequences for how the initial equational system is set up. We have
       
  1381   the $\lambda$-term on our `initial state', while Brzozowski has it on the
       
  1382   terminal states. This means we also need to reverse the direction of Arden's
       
  1383   Lemma.
       
  1384 
       
  1385   This is also where our method shines, because we can completely
       
  1386   side-step the standard argument \cite{Kozen97} where automata need
       
  1387   to be composed, which, as stated in the Introduction, is not so easy
       
  1388   to formalise in a HOL-based theorem prover. However, it is also the
       
  1389   direction where we had to spend most of the `conceptual' time, as
       
  1390   our proof-argument based on tagging-functions is new for
       
  1391   establishing the Myhill-Nerode theorem. All standard proofs of this
       
  1392   direction proceed by arguments over automata.\medskip
       
  1393   
       
  1394   \noindent
       
  1395   {\bf Acknowledgements:} We are grateful for the comments we received from Larry
       
  1396   Paulson.
       
  1397 
       
  1398 *}
       
  1399 
       
  1400 
       
  1401 (*<*)
       
  1402 end
       
  1403 (*>*)