198 of the recursive occurence of $X_1$, this replacement does not really removed $X_1$. Arden's |
198 of the recursive occurence of $X_1$, this replacement does not really removed $X_1$. Arden's |
199 lemma is invoked to transform recursive equations like \eqref{x_1_def_o} into non-recursive |
199 lemma is invoked to transform recursive equations like \eqref{x_1_def_o} into non-recursive |
200 ones. For example, the recursive equation \eqref{x_1_def_o} is transformed into the follwing |
200 ones. For example, the recursive equation \eqref{x_1_def_o} is transformed into the follwing |
201 non-recursive one: |
201 non-recursive one: |
202 \begin{equation} |
202 \begin{equation} |
203 X_1 = (X_0 \cdot a + X_2 \cdot d) \cdot b^* |
203 X_1 = (X_0 \cdot a + X_2 \cdot d) \cdot b^* = X_0 \cdot (a \cdot b^*) + X_2 \cdot (d \cdot b^*) |
204 \end{equation} |
204 \end{equation} |
205 which, by Arden's lemma, still characterizes $X_1$ correctly. By subsitute |
205 which, by Arden's lemma, still characterizes $X_1$ correctly. By subsituting |
206 $(X_0 \cdot a + X_2 \cdot d) \cdot b^*$ for all $X_1$ and remove \eqref{x_1_def_o}, |
206 $(X_0 \cdot a + X_2 \cdot d) \cdot b^*$ for all $X_1$ and removing \eqref{x_1_def_o}, |
207 we get: |
207 we get: |
208 |
|
209 \begin{subequations} |
208 \begin{subequations} |
210 \begin{eqnarray} |
209 \begin{eqnarray} |
211 X_0 & = & ((X_0 \cdot a + X_2 \cdot d) \cdot b^*) \cdot c + X_2 \cdot d + \lambda \label{x_0_def_1} \\ |
210 X_0 & = & \begin{aligned} |
212 X_2 & = & X_0 \cdot b + ((X_0 \cdot a + X_2 \cdot d) \cdot b^*) \cdot d + X_2 \cdot a \\ |
211 & (X_0 \cdot (a \cdot b^*) + X_2 \cdot (d \cdot b^*)) \cdot c + |
|
212 X_2 \cdot d + \lambda = \\ |
|
213 & X_0 \cdot (a \cdot b^* \cdot c) + X_2 \cdot (d \cdot b^* \cdot c) + |
|
214 X_2 \cdot d + \lambda = \\ |
|
215 & X_0 \cdot (a \cdot b^* \cdot c) + X_2 \cdot (d \cdot b^* \cdot c + d) + \lambda |
|
216 \end{aligned} \label{x_0_def_1} \\ |
|
217 X_2 & = & \begin{aligned} |
|
218 & X_0 \cdot b + (X_0 \cdot (a \cdot b^*) + X_2 \cdot (d \cdot b^*)) \cdot d + X_2 \cdot a = \\ |
|
219 & X_0 \cdot b + X_0 \cdot (a \cdot b^* \cdot d) + X_2 \cdot (d \cdot b^* \cdot d) + X_2 \cdot a = \\ |
|
220 & X_0 \cdot (b + a \cdot b^* \cdot d) + X_2 \cdot (d \cdot b^* \cdot d + a) |
|
221 \end{aligned} \\ |
213 X_3 & = & \begin{aligned} |
222 X_3 & = & \begin{aligned} |
214 & X_0 \cdot (c + d + e) + ((X_0 \cdot a + X_2 \cdot d) \cdot b^*) \cdot (a + e)\\ |
223 & X_0 \cdot (c + d + e) + ((X_0 \cdot a + X_2 \cdot d) \cdot b^*) \cdot (a + e)\\ |
215 & + X_2 \cdot (b + e) + X_3 \cdot (a + b + c + d + e) |
224 & + X_2 \cdot (b + e) + X_3 \cdot (a + b + c + d + e) \label{x_3_def_1} |
216 \end{aligned} |
225 \end{aligned} |
217 \end{eqnarray} |
226 \end{eqnarray} |
218 \end{subequations} |
227 \end{subequations} |
219 |
228 Suppose $X_3$ is the one to remove next, since $X_3$ dose not appear in either $X_0$ or $X_2$, |
|
229 the removal of equation \eqref{x_3_def_1} changes nothing in the rest equations. Therefore, we get: |
|
230 \begin{subequations} |
|
231 \begin{eqnarray} |
|
232 X_0 & = & X_0 \cdot (a \cdot b^* \cdot c) + X_2 \cdot (d \cdot b^* \cdot c + d) + \lambda \label{x_0_def_2} \\ |
|
233 X_2 & = & X_0 \cdot (b + a \cdot b^* \cdot d) + X_2 \cdot (d \cdot b^* \cdot d + a) \label{x_2_def_2} |
|
234 \end{eqnarray} |
|
235 \end{subequations} |
|
236 Actually, since absorbing state like $X_3$ contributes nothing to the language $\Lang$, it could have been removed |
|
237 at the very beginning of this precedure without affecting the final result. Now, the last unknown to remove |
|
238 is $X_2$ and the Arden's transformaton of \eqref{x_2_def_2} is: |
|
239 \begin{equation} \label{x_2_ardened} |
|
240 X_2 ~ = ~ (X_0 \cdot (b + a \cdot b^* \cdot d)) \cdot (d \cdot b^* \cdot d + a)^* = |
|
241 X_0 \cdot ((b + a \cdot b^* \cdot d) \cdot (d \cdot b^* \cdot d + a)^*) |
|
242 \end{equation} |
|
243 By substituting the right hand side of \eqref{x_2_ardened} into \eqref{x_0_def_2}, we get: |
|
244 \begin{equation} |
|
245 \begin{aligned} |
|
246 X_0 & = && X_0 \cdot (a \cdot b^* \cdot c) + \\ |
|
247 & && X_0 \cdot ((b + a \cdot b^* \cdot d) \cdot (d \cdot b^* \cdot d + a)^*) \cdot |
|
248 (d \cdot b^* \cdot c + d) + \lambda \\ |
|
249 & = && X_0 \cdot ((a \cdot b^* \cdot c) + \\ |
|
250 & && \hspace{3em} ((b + a \cdot b^* \cdot d) \cdot (d \cdot b^* \cdot d + a)^*) \cdot |
|
251 (d \cdot b^* \cdot c + d)) + \lambda |
|
252 \end{aligned} |
|
253 \end{equation} |
|
254 By applying Arden's transformation to this, we get the solution of $X_0$ as: |
|
255 \begin{equation} |
|
256 \begin{aligned} |
|
257 X_0 = ((a \cdot b^* \cdot c) + |
|
258 ((b + a \cdot b^* \cdot d) \cdot (d \cdot b^* \cdot d + a)^*) \cdot |
|
259 (d \cdot b^* \cdot c + d))^* |
|
260 \end{aligned} |
|
261 \end{equation} |
|
262 Using the same method, solutions for $X_1$ and $X_2$ can be obtained as well and the |
|
263 regular expressoin for $\Lang$ is just $X_1 + X_2$. The formalization of this procedure |
|
264 consititues the first direction of the {\em regular expression} verion of |
|
265 \mht. Detailed explaination are given in {\bf paper.pdf} and more details and comments |
|
266 can be found in the formal scripts. |
220 *} |
267 *} |
221 |
268 |
|
269 section {* Direction @{text "finite partition \<Rightarrow> regular language"} *} |
|
270 |
|
271 text {* |
|
272 It is well known in the theory of regular languages that |
|
273 the existence of finite language partition amounts to the existence of |
|
274 a minimal automaton, i.e. the $M_\Lang$ constructed in section \ref{sec_prelim}, which recoginzes the |
|
275 same language $\Lang$. The standard way to prove the existence of finite language partition |
|
276 is to construct a automaton out of the regular expression which recoginzes the same language, from |
|
277 which the existence of finite language partition follows immediately. As discussed in the introducton of |
|
278 {\bf paper.pdf} as well as in [5], the problem for this approach happens when automata |
|
279 of sub regular expressions are combined to form the automaton of the mother regular expression, |
|
280 no matter what kind of representation is used, the formalization is cubersome, as said |
|
281 by Nipkow in [5]: `{\em a more abstract mathod is clearly desirable}'. In this section, |
|
282 an {\em intrinsically abstract} method is given, which completely avoid the mentioning of automata, |
|
283 let along any particular representations. |
|
284 *} |
|
285 |
|
286 text {* |
|
287 The main proof structure is a structural induction on regular expressions, |
|
288 where base cases (cases for @{const "NULL"}, @{const "EMPTY"}, @{const "CHAR"}) are quite straightforward to |
|
289 proof. Real difficulty lies in inductive cases. By inductive hypothesis, languages defined by |
|
290 sub-expressions induce finite partitiions. Under such hypothsis, we need to prove that the language |
|
291 defined by the composite regular expression gives rise to finite partion. |
|
292 The basic idea is to attach a tag @{text "tag(x)"} to every string |
|
293 @{text "x"}. The tagging fuction @{text "tag"} is carefully devised, which returns tags |
|
294 made of equivalent classes of the partitions induced by subexpressoins, and therefore has a finite |
|
295 range. Let @{text "Lang"} be the composite language, it is proved that: |
|
296 \begin{quote} |
|
297 If strings with the same tag are equivalent with respect to @{text "Lang"}, expressed as: |
|
298 \[ |
|
299 @{text "tag(x) = tag(y) \<Longrightarrow> x \<approx>Lang y"} |
|
300 \] |
|
301 then the partition induced by @{text "Lang"} must be finite. |
|
302 \end{quote} |
|
303 There are two arguments for this. The first goes as the following: |
|
304 \begin{enumerate} |
|
305 \item First, the tagging function @{text "tag"} induces an equivalent relation @{text "(=tag=)"} |
|
306 (defiintion of @{text "f_eq_rel"} and lemma @{text "equiv_f_eq_rel"}). |
|
307 \item It is shown that: if the range of @{text "tag"} (denoted @{text "range(tag)"}) is finite, |
|
308 the partition given rise by @{text "(=tag=)"} is finite (lemma @{text "finite_eq_f_rel"}). |
|
309 Since tags are made from equivalent classes from component partitions, and the inductive |
|
310 hypothesis ensures the finiteness of these partitions, it is not difficult to prove |
|
311 the finiteness of @{text "range(tag)"}. |
|
312 \item It is proved that if equivalent relation @{text "R1"} is more refined than @{text "R2"} |
|
313 (expressed as @{text "R1 \<subseteq> R2"}), |
|
314 and the partition induced by @{text "R1"} is finite, then the partition induced by @{text "R2"} |
|
315 is finite as well (lemma @{text "refined_partition_finite"}). |
|
316 \item The injectivity assumption @{text "tag(x) = tag(y) \<Longrightarrow> x \<approx>Lang y"} implies that |
|
317 @{text "(=tag=)"} is more refined than @{text "(\<approx>Lang)"}. |
|
318 \item Combining the points above, we have: the partition induced by language @{text "Lang"} |
|
319 is finite (lemma @{text "tag_finite_imageD"}). |
|
320 \end{enumerate} |
|
321 |
|
322 We could have followed another approach given in appendix II of Brzozowski's paper [?], where |
|
323 the set of derivatives of any regular expression can be proved to be finite. |
|
324 Since it is easy to show that strings with same derivative are equivalent with respect to the |
|
325 language, then the second direction follows. We believe that our |
|
326 apporoach is easy to formalize, with no need to do complicated derivation calculations |
|
327 and countings as in [???]. |
|
328 *} |
|
329 |
|
330 |
222 end |
331 end |