31
b6815473ee2e
1. Myhill.thy is proved to correct some typos. 2. Some sections are added to the first direction. 3. The small theory of list difference is now separated into Prefix_subtract.thy and the ROOT.ML
zhang
diff
changeset
+ − 1
theory Myhill
62
d94209ad2880
Copy Myhill.thy into Myhill_2.thyand put Myhill_2.thy into logic ListP. Myhill.thy is now almost empty ready to be extended by new manuscripts.
zhang
diff
changeset
+ − 2
imports Myhill_2
31
b6815473ee2e
1. Myhill.thy is proved to correct some typos. 2. Some sections are added to the first direction. 3. The small theory of list difference is now separated into Prefix_subtract.thy and the ROOT.ML
zhang
diff
changeset
+ − 3
begin
b6815473ee2e
1. Myhill.thy is proved to correct some typos. 2. Some sections are added to the first direction. 3. The small theory of list difference is now separated into Prefix_subtract.thy and the ROOT.ML
zhang
diff
changeset
+ − 4
99
+ − 5
section {* Preliminaries \label{sec_prelim}*}
73
+ − 6
+ − 7
subsection {* Finite automata and \mht \label{sec_fa_mh} *}
31
b6815473ee2e
1. Myhill.thy is proved to correct some typos. 2. Some sections are added to the first direction. 3. The small theory of list difference is now separated into Prefix_subtract.thy and the ROOT.ML
zhang
diff
changeset
+ − 8
b6815473ee2e
1. Myhill.thy is proved to correct some typos. 2. Some sections are added to the first direction. 3. The small theory of list difference is now separated into Prefix_subtract.thy and the ROOT.ML
zhang
diff
changeset
+ − 9
text {*
64
b69d4e04e64a
Added something to Myhill.thy, trying to explain the relationship between finite automata and language partition.
zhang
diff
changeset
+ − 10
69
ecf6c61a4541
A formal presentation of automaton and Myhill-Nerrode theorem added to serve as a basis for further discussion
zhang
diff
changeset
+ − 11
A {\em determinisitc finite automata (DFA)} $M$ is a 5-tuple
ecf6c61a4541
A formal presentation of automaton and Myhill-Nerrode theorem added to serve as a basis for further discussion
zhang
diff
changeset
+ − 12
$(Q, \Sigma, \delta, s, F)$, where:
ecf6c61a4541
A formal presentation of automaton and Myhill-Nerrode theorem added to serve as a basis for further discussion
zhang
diff
changeset
+ − 13
\begin{enumerate}
ecf6c61a4541
A formal presentation of automaton and Myhill-Nerrode theorem added to serve as a basis for further discussion
zhang
diff
changeset
+ − 14
\item $Q$ is a finite set of {\em states}, also denoted $Q_M$.
ecf6c61a4541
A formal presentation of automaton and Myhill-Nerrode theorem added to serve as a basis for further discussion
zhang
diff
changeset
+ − 15
\item $\Sigma$ is a finite set of {\em alphabets}, also denoted $\Sigma_M$.
ecf6c61a4541
A formal presentation of automaton and Myhill-Nerrode theorem added to serve as a basis for further discussion
zhang
diff
changeset
+ − 16
\item $\delta$ is a {\em transition function} of type @{text "Q \<times> \<Sigma> \<Rightarrow> Q"} (a total function),
ecf6c61a4541
A formal presentation of automaton and Myhill-Nerrode theorem added to serve as a basis for further discussion
zhang
diff
changeset
+ − 17
also denoted $\delta_M$.
ecf6c61a4541
A formal presentation of automaton and Myhill-Nerrode theorem added to serve as a basis for further discussion
zhang
diff
changeset
+ − 18
\item @{text "s \<in> Q"} is a state called {\em initial state}, also denoted $s_M$.
ecf6c61a4541
A formal presentation of automaton and Myhill-Nerrode theorem added to serve as a basis for further discussion
zhang
diff
changeset
+ − 19
\item @{text "F \<subseteq> Q"} is a set of states named {\em accepting states}, also denoted $F_M$.
ecf6c61a4541
A formal presentation of automaton and Myhill-Nerrode theorem added to serve as a basis for further discussion
zhang
diff
changeset
+ − 20
\end{enumerate}
ecf6c61a4541
A formal presentation of automaton and Myhill-Nerrode theorem added to serve as a basis for further discussion
zhang
diff
changeset
+ − 21
Therefore, we have $M = (Q_M, \Sigma_M, \delta_M, s_M, F_M)$. Every DFA $M$ can be interpreted as
ecf6c61a4541
A formal presentation of automaton and Myhill-Nerrode theorem added to serve as a basis for further discussion
zhang
diff
changeset
+ − 22
a function assigning states to strings, denoted $\dfa{M}$, the definition of which is as the following:
ecf6c61a4541
A formal presentation of automaton and Myhill-Nerrode theorem added to serve as a basis for further discussion
zhang
diff
changeset
+ − 23
\begin{equation}
ecf6c61a4541
A formal presentation of automaton and Myhill-Nerrode theorem added to serve as a basis for further discussion
zhang
diff
changeset
+ − 24
\begin{aligned}
ecf6c61a4541
A formal presentation of automaton and Myhill-Nerrode theorem added to serve as a basis for further discussion
zhang
diff
changeset
+ − 25
\dfa{M}([]) &\equiv s_M \\
ecf6c61a4541
A formal presentation of automaton and Myhill-Nerrode theorem added to serve as a basis for further discussion
zhang
diff
changeset
+ − 26
\dfa{M}(xa) &\equiv \delta_M(\dfa{M}(x), a)
ecf6c61a4541
A formal presentation of automaton and Myhill-Nerrode theorem added to serve as a basis for further discussion
zhang
diff
changeset
+ − 27
\end{aligned}
ecf6c61a4541
A formal presentation of automaton and Myhill-Nerrode theorem added to serve as a basis for further discussion
zhang
diff
changeset
+ − 28
\end{equation}
ecf6c61a4541
A formal presentation of automaton and Myhill-Nerrode theorem added to serve as a basis for further discussion
zhang
diff
changeset
+ − 29
A string @{text "x"} is said to be {\em accepted} (or {\em recognized}) by a DFA $M$ if
ecf6c61a4541
A formal presentation of automaton and Myhill-Nerrode theorem added to serve as a basis for further discussion
zhang
diff
changeset
+ − 30
$\dfa{M}(x) \in F_M$. The language recoginzed by DFA $M$, denoted
ecf6c61a4541
A formal presentation of automaton and Myhill-Nerrode theorem added to serve as a basis for further discussion
zhang
diff
changeset
+ − 31
$L(M)$, is defined as:
ecf6c61a4541
A formal presentation of automaton and Myhill-Nerrode theorem added to serve as a basis for further discussion
zhang
diff
changeset
+ − 32
\begin{equation}
ecf6c61a4541
A formal presentation of automaton and Myhill-Nerrode theorem added to serve as a basis for further discussion
zhang
diff
changeset
+ − 33
L(M) \equiv \{x~|~\dfa{M}(x) \in F_M\}
ecf6c61a4541
A formal presentation of automaton and Myhill-Nerrode theorem added to serve as a basis for further discussion
zhang
diff
changeset
+ − 34
\end{equation}
ecf6c61a4541
A formal presentation of automaton and Myhill-Nerrode theorem added to serve as a basis for further discussion
zhang
diff
changeset
+ − 35
The standard way of specifying a laugage $\Lang$ as {\em regular} is by stipulating that:
ecf6c61a4541
A formal presentation of automaton and Myhill-Nerrode theorem added to serve as a basis for further discussion
zhang
diff
changeset
+ − 36
$\Lang = L(M)$ for some DFA $M$.
64
b69d4e04e64a
Added something to Myhill.thy, trying to explain the relationship between finite automata and language partition.
zhang
diff
changeset
+ − 37
69
ecf6c61a4541
A formal presentation of automaton and Myhill-Nerrode theorem added to serve as a basis for further discussion
zhang
diff
changeset
+ − 38
For any DFA $M$, the DFA obtained by changing initial state to another $p \in Q_M$ is denoted $M_p$,
ecf6c61a4541
A formal presentation of automaton and Myhill-Nerrode theorem added to serve as a basis for further discussion
zhang
diff
changeset
+ − 39
which is defined as:
ecf6c61a4541
A formal presentation of automaton and Myhill-Nerrode theorem added to serve as a basis for further discussion
zhang
diff
changeset
+ − 40
\begin{equation}
ecf6c61a4541
A formal presentation of automaton and Myhill-Nerrode theorem added to serve as a basis for further discussion
zhang
diff
changeset
+ − 41
M_p \ \equiv\ (Q_M, \Sigma_M, \delta_M, p, F_M)
ecf6c61a4541
A formal presentation of automaton and Myhill-Nerrode theorem added to serve as a basis for further discussion
zhang
diff
changeset
+ − 42
\end{equation}
ecf6c61a4541
A formal presentation of automaton and Myhill-Nerrode theorem added to serve as a basis for further discussion
zhang
diff
changeset
+ − 43
Two states $p, q \in Q_M$ are said to be {\em equivalent}, denoted $p \approx_M q$, iff.
ecf6c61a4541
A formal presentation of automaton and Myhill-Nerrode theorem added to serve as a basis for further discussion
zhang
diff
changeset
+ − 44
\begin{equation}\label{m_eq_def}
ecf6c61a4541
A formal presentation of automaton and Myhill-Nerrode theorem added to serve as a basis for further discussion
zhang
diff
changeset
+ − 45
L(M_p) = L(M_q)
ecf6c61a4541
A formal presentation of automaton and Myhill-Nerrode theorem added to serve as a basis for further discussion
zhang
diff
changeset
+ − 46
\end{equation}
ecf6c61a4541
A formal presentation of automaton and Myhill-Nerrode theorem added to serve as a basis for further discussion
zhang
diff
changeset
+ − 47
It is obvious that $\approx_M$ is an equivalent relation over $Q_M$. and
ecf6c61a4541
A formal presentation of automaton and Myhill-Nerrode theorem added to serve as a basis for further discussion
zhang
diff
changeset
+ − 48
the partition induced by $\approx_M$ has $|Q_M|$ equivalent classes.
ecf6c61a4541
A formal presentation of automaton and Myhill-Nerrode theorem added to serve as a basis for further discussion
zhang
diff
changeset
+ − 49
By overloading $\approx_M$, an equivalent relation over strings can be defined:
ecf6c61a4541
A formal presentation of automaton and Myhill-Nerrode theorem added to serve as a basis for further discussion
zhang
diff
changeset
+ − 50
\begin{equation}
ecf6c61a4541
A formal presentation of automaton and Myhill-Nerrode theorem added to serve as a basis for further discussion
zhang
diff
changeset
+ − 51
x \approx_M y ~ ~ \equiv ~ ~ \dfa{M}(x) \approx_M \dfa{M}(y)
ecf6c61a4541
A formal presentation of automaton and Myhill-Nerrode theorem added to serve as a basis for further discussion
zhang
diff
changeset
+ − 52
\end{equation}
ecf6c61a4541
A formal presentation of automaton and Myhill-Nerrode theorem added to serve as a basis for further discussion
zhang
diff
changeset
+ − 53
It can be proved that the the partition induced by $\approx_M$ also has $|Q_M|$ equivalent classes.
ecf6c61a4541
A formal presentation of automaton and Myhill-Nerrode theorem added to serve as a basis for further discussion
zhang
diff
changeset
+ − 54
It is also easy to show that: if $x \approx_M y$, then $x \approx_{L(M)} y$, and this means
ecf6c61a4541
A formal presentation of automaton and Myhill-Nerrode theorem added to serve as a basis for further discussion
zhang
diff
changeset
+ − 55
$\approx_M$ is a more refined equivalent relation than $\approx_{L(M)}$. Since partition induced by
73
+ − 56
$\approx_M$ is finite, the one induced by $\approx_{L(M)}$ must also be finite, and this is
+ − 57
one of the two directions of \mht:
+ − 58
\begin{Lem}[\mht , Direction two]
69
ecf6c61a4541
A formal presentation of automaton and Myhill-Nerrode theorem added to serve as a basis for further discussion
zhang
diff
changeset
+ − 59
If a language $\Lang$ is regular (i.e. $\Lang = L(M)$ for some DFA $M$), then
ecf6c61a4541
A formal presentation of automaton and Myhill-Nerrode theorem added to serve as a basis for further discussion
zhang
diff
changeset
+ − 60
the partition induced by $\approx_\Lang$ is finite.
ecf6c61a4541
A formal presentation of automaton and Myhill-Nerrode theorem added to serve as a basis for further discussion
zhang
diff
changeset
+ − 61
\end{Lem}
68
+ − 62
69
ecf6c61a4541
A formal presentation of automaton and Myhill-Nerrode theorem added to serve as a basis for further discussion
zhang
diff
changeset
+ − 63
The other direction is:
73
+ − 64
\begin{Lem}[\mht , Direction one]\label{auto_mh_d1}
69
ecf6c61a4541
A formal presentation of automaton and Myhill-Nerrode theorem added to serve as a basis for further discussion
zhang
diff
changeset
+ − 65
If the partition induced by $\approx_\Lang$ is finite, then
ecf6c61a4541
A formal presentation of automaton and Myhill-Nerrode theorem added to serve as a basis for further discussion
zhang
diff
changeset
+ − 66
$\Lang$ is regular (i.e. $\Lang = L(M)$ for some DFA $M$).
ecf6c61a4541
A formal presentation of automaton and Myhill-Nerrode theorem added to serve as a basis for further discussion
zhang
diff
changeset
+ − 67
\end{Lem}
73
+ − 68
The $M$ we are seeking when prove lemma \ref{auto_mh_d2} can be constructed out of $\approx_\Lang$,
+ − 69
denoted $M_\Lang$ and defined as the following:
69
ecf6c61a4541
A formal presentation of automaton and Myhill-Nerrode theorem added to serve as a basis for further discussion
zhang
diff
changeset
+ − 70
\begin{subequations}
ecf6c61a4541
A formal presentation of automaton and Myhill-Nerrode theorem added to serve as a basis for further discussion
zhang
diff
changeset
+ − 71
\begin{eqnarray}
ecf6c61a4541
A formal presentation of automaton and Myhill-Nerrode theorem added to serve as a basis for further discussion
zhang
diff
changeset
+ − 72
Q_{M_\Lang} ~ & \equiv & ~ \{ \cls{x}{\approx_\Lang}~|~ x \in \Sigma^* \}\\
ecf6c61a4541
A formal presentation of automaton and Myhill-Nerrode theorem added to serve as a basis for further discussion
zhang
diff
changeset
+ − 73
\Sigma_{M_\Lang} ~ & \equiv & ~ \Sigma_M \\
ecf6c61a4541
A formal presentation of automaton and Myhill-Nerrode theorem added to serve as a basis for further discussion
zhang
diff
changeset
+ − 74
\delta_{M_\Lang} ~ & \equiv & ~ \left (\lambda (\cls{x}{\approx_\Lang}, a). \cls{xa}{\approx_\Lang} \right) \\
ecf6c61a4541
A formal presentation of automaton and Myhill-Nerrode theorem added to serve as a basis for further discussion
zhang
diff
changeset
+ − 75
s_{M_\Lang} ~ & \equiv & ~ \cls{[]}{\approx_\Lang} \\
ecf6c61a4541
A formal presentation of automaton and Myhill-Nerrode theorem added to serve as a basis for further discussion
zhang
diff
changeset
+ − 76
F_{M_\Lang} ~ & \equiv & ~ \{ \cls{x}{\approx_\Lang}~|~ x \in \Lang \}
ecf6c61a4541
A formal presentation of automaton and Myhill-Nerrode theorem added to serve as a basis for further discussion
zhang
diff
changeset
+ − 77
\end{eqnarray}
ecf6c61a4541
A formal presentation of automaton and Myhill-Nerrode theorem added to serve as a basis for further discussion
zhang
diff
changeset
+ − 78
\end{subequations}
73
+ − 79
It can be proved that $Q_{M_\Lang}$ is indeed finite and $\Lang = L(M_\Lang)$, so lemma \ref{auto_mh_d1} holds.
+ − 80
It can also be proved that $M_\Lang$ is the minimal DFA (therefore unique) which recoginzes $\Lang$.
+ − 81
+ − 82
68
+ − 83
69
ecf6c61a4541
A formal presentation of automaton and Myhill-Nerrode theorem added to serve as a basis for further discussion
zhang
diff
changeset
+ − 84
*}
68
+ − 85
73
+ − 86
subsection {* The objective and the underlying intuition *}
+ − 87
69
ecf6c61a4541
A formal presentation of automaton and Myhill-Nerrode theorem added to serve as a basis for further discussion
zhang
diff
changeset
+ − 88
text {*
73
+ − 89
It is now obvious from section \ref{sec_fa_mh} that \mht\ can be established easily when
+ − 90
{\em reglar languages} are defined as ones recognized by finite automata.
+ − 91
Under the context where the use of finite automata is forbiden, the situation is quite different.
+ − 92
The theorem now has to be expressed as:
+ − 93
\begin{Thm}[\mht , Regular expression version]
+ − 94
A language $\Lang$ is regular (i.e. $\Lang = L(\re)$ for some {\em regular expression} $\re$)
+ − 95
iff. the partition induced by $\approx_\Lang$ is finite.
+ − 96
\end{Thm}
+ − 97
The proof of this version consists of two directions (if the use of automata are not allowed):
+ − 98
\begin{description}
+ − 99
\item [Direction one:]
+ − 100
generating a regular expression $\re$ out of the finite partition induced by $\approx_\Lang$,
+ − 101
such that $\Lang = L(\re)$.
+ − 102
\item [Direction two:]
+ − 103
showing the finiteness of the partition induced by $\approx_\Lang$, under the assmption
+ − 104
that $\Lang$ is recognized by some regular expression $\re$ (i.e. $\Lang = L(\re)$).
+ − 105
\end{description}
+ − 106
The development of these two directions consititutes the body of this paper.
64
b69d4e04e64a
Added something to Myhill.thy, trying to explain the relationship between finite automata and language partition.
zhang
diff
changeset
+ − 107
73
+ − 108
*}
+ − 109
+ − 110
section {* Direction @{text "regular language \<Rightarrow>finite partition"} *}
+ − 111
+ − 112
text {*
+ − 113
Although not used explicitly, the notion of finite autotmata and its relationship with
+ − 114
language partition, as outlined in section \ref{sec_fa_mh}, still servers as important intuitive
+ − 115
guides in the development of this paper.
+ − 116
For example, {\em Direction one} follows the {\em Brzozowski algebraic method}
+ − 117
used to convert finite autotmata to regular expressions, under the intuition that every
+ − 118
partition member $\cls{x}{\approx_\Lang}$ is a state in the DFA $M_\Lang$ constructed to prove
+ − 119
lemma \ref{auto_mh_d1} of section \ref{sec_fa_mh}.
+ − 120
78
+ − 121
The basic idea of Brzozowski method is to extract an equational system out of the
+ − 122
transition relationship of the automaton in question. In the equational system, every
+ − 123
automaton state is represented by an unknown, the solution of which is expected to be
+ − 124
a regular expresion characterizing the state in a certain sense. There are two choices of
+ − 125
how a automaton state can be characterized. The first is to characterize by the set of
+ − 126
strings leading from the state in question into accepting states.
+ − 127
The other choice is to characterize by the set of strings leading from initial state
+ − 128
into the state in question. For the second choice, the language recognized the automaton
+ − 129
can be characterized by the solution of initial state, while for the second choice,
+ − 130
the language recoginzed by the automaton can be characterized by
+ − 131
combining solutions of all accepting states by @{text "+"}. Because of the automaton
+ − 132
used as our intuitive guide, the $M_\Lang$, the states of which are
+ − 133
sets of strings leading from initial state, the second choice is used in this paper.
73
+ − 134
78
+ − 135
Supposing the automaton in Fig \ref{fig_auto_part_rel} is the $M_\Lang$ for some language $\Lang$,
+ − 136
and suppose $\Sigma = \{a, b, c, d, e\}$. Under the second choice, the equational system extracted is:
+ − 137
\begin{subequations}
+ − 138
\begin{eqnarray}
+ − 139
X_0 & = & X_1 \cdot c + X_2 \cdot d + \lambda \label{x_0_def_o} \\
+ − 140
X_1 & = & X_0 \cdot a + X_1 \cdot b + X_2 \cdot d \label{x_1_def_o} \\
+ − 141
X_2 & = & X_0 \cdot b + X_1 \cdot d + X_2 \cdot a \\
+ − 142
X_3 & = & \begin{aligned}
+ − 143
& X_0 \cdot (c + d + e) + X_1 \cdot (a + e) + X_2 \cdot (b + e) + \\
+ − 144
& X_3 \cdot (a + b + c + d + e)
+ − 145
\end{aligned}
+ − 146
\end{eqnarray}
+ − 147
\end{subequations}
+ − 148
64
b69d4e04e64a
Added something to Myhill.thy, trying to explain the relationship between finite automata and language partition.
zhang
diff
changeset
+ − 149
\begin{figure}[h!]
b69d4e04e64a
Added something to Myhill.thy, trying to explain the relationship between finite automata and language partition.
zhang
diff
changeset
+ − 150
\centering
b69d4e04e64a
Added something to Myhill.thy, trying to explain the relationship between finite automata and language partition.
zhang
diff
changeset
+ − 151
\begin{minipage}{0.5\textwidth}
b69d4e04e64a
Added something to Myhill.thy, trying to explain the relationship between finite automata and language partition.
zhang
diff
changeset
+ − 152
\scalebox{0.8}{
b69d4e04e64a
Added something to Myhill.thy, trying to explain the relationship between finite automata and language partition.
zhang
diff
changeset
+ − 153
\begin{tikzpicture}[ultra thick,>=latex]
b69d4e04e64a
Added something to Myhill.thy, trying to explain the relationship between finite automata and language partition.
zhang
diff
changeset
+ − 154
\node[draw,circle,initial] (start) {$X_0$};
b69d4e04e64a
Added something to Myhill.thy, trying to explain the relationship between finite automata and language partition.
zhang
diff
changeset
+ − 155
\node[draw,circle,accepting] at ($(start) + (3.5cm,2cm)$) (ac1) {$X_1$};
b69d4e04e64a
Added something to Myhill.thy, trying to explain the relationship between finite automata and language partition.
zhang
diff
changeset
+ − 156
\node[draw,circle,accepting] at ($(start) + (3.5cm,-2cm)$) (ac2) {$X_2$};
b69d4e04e64a
Added something to Myhill.thy, trying to explain the relationship between finite automata and language partition.
zhang
diff
changeset
+ − 157
\node[draw,circle] at ($(start) + (6.5cm,0cm)$) (ab) {$X_3$};
b69d4e04e64a
Added something to Myhill.thy, trying to explain the relationship between finite automata and language partition.
zhang
diff
changeset
+ − 158
b69d4e04e64a
Added something to Myhill.thy, trying to explain the relationship between finite automata and language partition.
zhang
diff
changeset
+ − 159
\path[->] (start) edge [bend left] node [midway, above] {$a$} (ac1);
b69d4e04e64a
Added something to Myhill.thy, trying to explain the relationship between finite automata and language partition.
zhang
diff
changeset
+ − 160
\path[->] (start) edge [bend right] node [midway, below] {$b$} (ac2);
b69d4e04e64a
Added something to Myhill.thy, trying to explain the relationship between finite automata and language partition.
zhang
diff
changeset
+ − 161
\path[->] (ac1) edge [loop above] node [midway, above] {$b$} (ac1);
b69d4e04e64a
Added something to Myhill.thy, trying to explain the relationship between finite automata and language partition.
zhang
diff
changeset
+ − 162
\path[->] (ac2) edge [loop below] node [midway, below] {$a$} (ac2);
b69d4e04e64a
Added something to Myhill.thy, trying to explain the relationship between finite automata and language partition.
zhang
diff
changeset
+ − 163
\path[->] (ac1) edge [bend right] node [midway, left] {$c$} (ac2);
b69d4e04e64a
Added something to Myhill.thy, trying to explain the relationship between finite automata and language partition.
zhang
diff
changeset
+ − 164
\path[->] (ac2) edge [bend right] node [midway, right] {$c$} (ac1);
b69d4e04e64a
Added something to Myhill.thy, trying to explain the relationship between finite automata and language partition.
zhang
diff
changeset
+ − 165
\path[->] (ac1) edge node [midway, sloped, above] {$d$} (start);
b69d4e04e64a
Added something to Myhill.thy, trying to explain the relationship between finite automata and language partition.
zhang
diff
changeset
+ − 166
\path[->] (ac2) edge node [midway, sloped, above] {$d$} (start);
b69d4e04e64a
Added something to Myhill.thy, trying to explain the relationship between finite automata and language partition.
zhang
diff
changeset
+ − 167
b69d4e04e64a
Added something to Myhill.thy, trying to explain the relationship between finite automata and language partition.
zhang
diff
changeset
+ − 168
\path [draw, rounded corners,->,dashed] (start) -- ($(start) + (0cm, 3.7cm)$)
b69d4e04e64a
Added something to Myhill.thy, trying to explain the relationship between finite automata and language partition.
zhang
diff
changeset
+ − 169
-- ($(ab) + (0cm, 3.7cm)$) node[midway,above,sloped]{$\Sigma - \{a, b\}$} -- (ab);
b69d4e04e64a
Added something to Myhill.thy, trying to explain the relationship between finite automata and language partition.
zhang
diff
changeset
+ − 170
\path[->,dashed] (ac1) edge node [midway, above, sloped] {$\Sigma - \{b,c,d\}$} (ab);
b69d4e04e64a
Added something to Myhill.thy, trying to explain the relationship between finite automata and language partition.
zhang
diff
changeset
+ − 171
\path[->,dashed] (ac2) edge node [midway, below, sloped] {$\Sigma - \{a,c,d\}$} (ab);
b69d4e04e64a
Added something to Myhill.thy, trying to explain the relationship between finite automata and language partition.
zhang
diff
changeset
+ − 172
\path[->,dashed] (ab) edge [loop right] node [midway, right] {$\Sigma$} (ab);
b69d4e04e64a
Added something to Myhill.thy, trying to explain the relationship between finite automata and language partition.
zhang
diff
changeset
+ − 173
\end{tikzpicture}}
b69d4e04e64a
Added something to Myhill.thy, trying to explain the relationship between finite automata and language partition.
zhang
diff
changeset
+ − 174
\end{minipage}
78
+ − 175
\caption{An example automaton}\label{fig_auto_part_rel}
64
b69d4e04e64a
Added something to Myhill.thy, trying to explain the relationship between finite automata and language partition.
zhang
diff
changeset
+ − 176
\end{figure}
b69d4e04e64a
Added something to Myhill.thy, trying to explain the relationship between finite automata and language partition.
zhang
diff
changeset
+ − 177
78
+ − 178
Every $\cdot$-item on the right side of equations describes some state transtions, except
+ − 179
the $\lambda$ in \eqref{x_0_def_o}, which represents empty string @{text "[]"}.
+ − 180
The reason is that: every state is characterized by the
+ − 181
set of incoming strings leading from initial state. For non-initial state, every such
+ − 182
string can be splitted into a prefix leading into a preceding state and a single character suffix
+ − 183
transiting into from the preceding state. The exception happens at
+ − 184
initial state, where the empty string is a incoming string which can not be splitted. The $\lambda$
+ − 185
in \eqref{x_0_def_o} is introduce to repsent this indivisible string. There is one and only one
+ − 186
$\lambda$ in every equational system such obtained, becasue $[]$ can only be contaied in one
+ − 187
equivalent class (the intial state in $M_\Lang$) and equivalent classes are disjoint.
+ − 188
+ − 189
Suppose all unknowns ($X_0, X_1, X_2, X_3$) are solvable, the regular expression charactering
+ − 190
laugnage $\Lang$ is $X_1 + X_2$. This paper gives a procedure
+ − 191
by which arbitrarily picked unknown can be solved. The basic idea to solve $X_i$ is by
+ − 192
eliminating all variables other than $X_i$ from the equational system. If
+ − 193
$X_0$ is the one picked to be solved, variables $X_1, X_2, X_3$ have to be removed one by
+ − 194
one. The order to remove does not matter as long as the remaing equations are kept valid.
+ − 195
Suppose $X_1$ is the first one to remove, the action is to replace all occurences of $X_1$
+ − 196
in remaining equations by the right hand side of its characterizing equation, i.e.
+ − 197
the $ X_0 \cdot a + X_1 \cdot b + X_2 \cdot d$ in \eqref{x_1_def_o}. However, because
+ − 198
of the recursive occurence of $X_1$, this replacement does not really removed $X_1$. Arden's
+ − 199
lemma is invoked to transform recursive equations like \eqref{x_1_def_o} into non-recursive
+ − 200
ones. For example, the recursive equation \eqref{x_1_def_o} is transformed into the follwing
+ − 201
non-recursive one:
+ − 202
\begin{equation}
99
+ − 203
X_1 = (X_0 \cdot a + X_2 \cdot d) \cdot b^* = X_0 \cdot (a \cdot b^*) + X_2 \cdot (d \cdot b^*)
78
+ − 204
\end{equation}
99
+ − 205
which, by Arden's lemma, still characterizes $X_1$ correctly. By subsituting
+ − 206
$(X_0 \cdot a + X_2 \cdot d) \cdot b^*$ for all $X_1$ and removing \eqref{x_1_def_o},
78
+ − 207
we get:
+ − 208
\begin{subequations}
+ − 209
\begin{eqnarray}
99
+ − 210
X_0 & = & \begin{aligned}
+ − 211
& (X_0 \cdot (a \cdot b^*) + X_2 \cdot (d \cdot b^*)) \cdot c +
+ − 212
X_2 \cdot d + \lambda = \\
+ − 213
& X_0 \cdot (a \cdot b^* \cdot c) + X_2 \cdot (d \cdot b^* \cdot c) +
+ − 214
X_2 \cdot d + \lambda = \\
+ − 215
& X_0 \cdot (a \cdot b^* \cdot c) + X_2 \cdot (d \cdot b^* \cdot c + d) + \lambda
+ − 216
\end{aligned} \label{x_0_def_1} \\
+ − 217
X_2 & = & \begin{aligned}
+ − 218
& X_0 \cdot b + (X_0 \cdot (a \cdot b^*) + X_2 \cdot (d \cdot b^*)) \cdot d + X_2 \cdot a = \\
+ − 219
& X_0 \cdot b + X_0 \cdot (a \cdot b^* \cdot d) + X_2 \cdot (d \cdot b^* \cdot d) + X_2 \cdot a = \\
+ − 220
& X_0 \cdot (b + a \cdot b^* \cdot d) + X_2 \cdot (d \cdot b^* \cdot d + a)
+ − 221
\end{aligned} \\
78
+ − 222
X_3 & = & \begin{aligned}
+ − 223
& X_0 \cdot (c + d + e) + ((X_0 \cdot a + X_2 \cdot d) \cdot b^*) \cdot (a + e)\\
99
+ − 224
& + X_2 \cdot (b + e) + X_3 \cdot (a + b + c + d + e) \label{x_3_def_1}
78
+ − 225
\end{aligned}
+ − 226
\end{eqnarray}
+ − 227
\end{subequations}
99
+ − 228
Suppose $X_3$ is the one to remove next, since $X_3$ dose not appear in either $X_0$ or $X_2$,
+ − 229
the removal of equation \eqref{x_3_def_1} changes nothing in the rest equations. Therefore, we get:
+ − 230
\begin{subequations}
+ − 231
\begin{eqnarray}
+ − 232
X_0 & = & X_0 \cdot (a \cdot b^* \cdot c) + X_2 \cdot (d \cdot b^* \cdot c + d) + \lambda \label{x_0_def_2} \\
+ − 233
X_2 & = & X_0 \cdot (b + a \cdot b^* \cdot d) + X_2 \cdot (d \cdot b^* \cdot d + a) \label{x_2_def_2}
+ − 234
\end{eqnarray}
+ − 235
\end{subequations}
+ − 236
Actually, since absorbing state like $X_3$ contributes nothing to the language $\Lang$, it could have been removed
+ − 237
at the very beginning of this precedure without affecting the final result. Now, the last unknown to remove
+ − 238
is $X_2$ and the Arden's transformaton of \eqref{x_2_def_2} is:
+ − 239
\begin{equation} \label{x_2_ardened}
+ − 240
X_2 ~ = ~ (X_0 \cdot (b + a \cdot b^* \cdot d)) \cdot (d \cdot b^* \cdot d + a)^* =
+ − 241
X_0 \cdot ((b + a \cdot b^* \cdot d) \cdot (d \cdot b^* \cdot d + a)^*)
+ − 242
\end{equation}
+ − 243
By substituting the right hand side of \eqref{x_2_ardened} into \eqref{x_0_def_2}, we get:
+ − 244
\begin{equation}
+ − 245
\begin{aligned}
+ − 246
X_0 & = && X_0 \cdot (a \cdot b^* \cdot c) + \\
+ − 247
& && X_0 \cdot ((b + a \cdot b^* \cdot d) \cdot (d \cdot b^* \cdot d + a)^*) \cdot
+ − 248
(d \cdot b^* \cdot c + d) + \lambda \\
+ − 249
& = && X_0 \cdot ((a \cdot b^* \cdot c) + \\
+ − 250
& && \hspace{3em} ((b + a \cdot b^* \cdot d) \cdot (d \cdot b^* \cdot d + a)^*) \cdot
+ − 251
(d \cdot b^* \cdot c + d)) + \lambda
+ − 252
\end{aligned}
+ − 253
\end{equation}
+ − 254
By applying Arden's transformation to this, we get the solution of $X_0$ as:
+ − 255
\begin{equation}
+ − 256
\begin{aligned}
+ − 257
X_0 = ((a \cdot b^* \cdot c) +
+ − 258
((b + a \cdot b^* \cdot d) \cdot (d \cdot b^* \cdot d + a)^*) \cdot
+ − 259
(d \cdot b^* \cdot c + d))^*
+ − 260
\end{aligned}
+ − 261
\end{equation}
+ − 262
Using the same method, solutions for $X_1$ and $X_2$ can be obtained as well and the
+ − 263
regular expressoin for $\Lang$ is just $X_1 + X_2$. The formalization of this procedure
+ − 264
consititues the first direction of the {\em regular expression} verion of
+ − 265
\mht. Detailed explaination are given in {\bf paper.pdf} and more details and comments
+ − 266
can be found in the formal scripts.
78
+ − 267
*}
73
+ − 268
99
+ − 269
section {* Direction @{text "finite partition \<Rightarrow> regular language"} *}
+ − 270
+ − 271
text {*
+ − 272
It is well known in the theory of regular languages that
+ − 273
the existence of finite language partition amounts to the existence of
+ − 274
a minimal automaton, i.e. the $M_\Lang$ constructed in section \ref{sec_prelim}, which recoginzes the
+ − 275
same language $\Lang$. The standard way to prove the existence of finite language partition
+ − 276
is to construct a automaton out of the regular expression which recoginzes the same language, from
+ − 277
which the existence of finite language partition follows immediately. As discussed in the introducton of
+ − 278
{\bf paper.pdf} as well as in [5], the problem for this approach happens when automata
+ − 279
of sub regular expressions are combined to form the automaton of the mother regular expression,
+ − 280
no matter what kind of representation is used, the formalization is cubersome, as said
+ − 281
by Nipkow in [5]: `{\em a more abstract mathod is clearly desirable}'. In this section,
+ − 282
an {\em intrinsically abstract} method is given, which completely avoid the mentioning of automata,
+ − 283
let along any particular representations.
+ − 284
*}
+ − 285
+ − 286
text {*
+ − 287
The main proof structure is a structural induction on regular expressions,
+ − 288
where base cases (cases for @{const "NULL"}, @{const "EMPTY"}, @{const "CHAR"}) are quite straightforward to
+ − 289
proof. Real difficulty lies in inductive cases. By inductive hypothesis, languages defined by
+ − 290
sub-expressions induce finite partitiions. Under such hypothsis, we need to prove that the language
+ − 291
defined by the composite regular expression gives rise to finite partion.
+ − 292
The basic idea is to attach a tag @{text "tag(x)"} to every string
+ − 293
@{text "x"}. The tagging fuction @{text "tag"} is carefully devised, which returns tags
+ − 294
made of equivalent classes of the partitions induced by subexpressoins, and therefore has a finite
+ − 295
range. Let @{text "Lang"} be the composite language, it is proved that:
+ − 296
\begin{quote}
+ − 297
If strings with the same tag are equivalent with respect to @{text "Lang"}, expressed as:
+ − 298
\[
+ − 299
@{text "tag(x) = tag(y) \<Longrightarrow> x \<approx>Lang y"}
+ − 300
\]
+ − 301
then the partition induced by @{text "Lang"} must be finite.
+ − 302
\end{quote}
+ − 303
There are two arguments for this. The first goes as the following:
+ − 304
\begin{enumerate}
+ − 305
\item First, the tagging function @{text "tag"} induces an equivalent relation @{text "(=tag=)"}
+ − 306
(defiintion of @{text "f_eq_rel"} and lemma @{text "equiv_f_eq_rel"}).
+ − 307
\item It is shown that: if the range of @{text "tag"} (denoted @{text "range(tag)"}) is finite,
+ − 308
the partition given rise by @{text "(=tag=)"} is finite (lemma @{text "finite_eq_f_rel"}).
+ − 309
Since tags are made from equivalent classes from component partitions, and the inductive
+ − 310
hypothesis ensures the finiteness of these partitions, it is not difficult to prove
+ − 311
the finiteness of @{text "range(tag)"}.
+ − 312
\item It is proved that if equivalent relation @{text "R1"} is more refined than @{text "R2"}
+ − 313
(expressed as @{text "R1 \<subseteq> R2"}),
+ − 314
and the partition induced by @{text "R1"} is finite, then the partition induced by @{text "R2"}
+ − 315
is finite as well (lemma @{text "refined_partition_finite"}).
+ − 316
\item The injectivity assumption @{text "tag(x) = tag(y) \<Longrightarrow> x \<approx>Lang y"} implies that
+ − 317
@{text "(=tag=)"} is more refined than @{text "(\<approx>Lang)"}.
+ − 318
\item Combining the points above, we have: the partition induced by language @{text "Lang"}
+ − 319
is finite (lemma @{text "tag_finite_imageD"}).
+ − 320
\end{enumerate}
+ − 321
+ − 322
We could have followed another approach given in appendix II of Brzozowski's paper [?], where
+ − 323
the set of derivatives of any regular expression can be proved to be finite.
+ − 324
Since it is easy to show that strings with same derivative are equivalent with respect to the
+ − 325
language, then the second direction follows. We believe that our
+ − 326
apporoach is easy to formalize, with no need to do complicated derivation calculations
+ − 327
and countings as in [???].
+ − 328
*}
+ − 329
+ − 330
31
b6815473ee2e
1. Myhill.thy is proved to correct some typos. 2. Some sections are added to the first direction. 3. The small theory of list difference is now separated into Prefix_subtract.thy and the ROOT.ML
zhang
diff
changeset
+ − 331
end