lexing: changeset 536:aff7bf93b9c7

--- a/ChengsongTanPhdThesis/Chapters/Bitcoded1.tex	Mon Jun 06 15:57:17 2022 +0100
+++ b/ChengsongTanPhdThesis/Chapters/Bitcoded1.tex	Mon Jun 06 16:45:42 2022 +0100
@@ -393,7 +393,7 @@
 \noindent
 This algorithm keeps the regular expression size small, for example,
 with this simplification our previous $(a + aa)^*$ example's 8000 nodes
-will be reduced to just 6 and stays constant, no matter how long the
+will be reduced to just 17 and stays constant, no matter how long the
 input string is.
 
 
@@ -427,10 +427,12 @@
 
 Ausaf and Urban cleverly defined an auxiliary function called $\flex$,
 defined as
-\[
-\flex \; r \; f \; [] \; v \; = \; f\; v
-\flex \; r \; f \; c :: s \; v =  \flex r \; \lambda v. \, f (\inj \; r\; c\; v)\; s \; v
-\]
+\begin{center}
+\begin{tabular}{lcr}
+$\flex \; r \; f \; [] \; v$       &  $=$ &   $f\; v$\\
+$\flex \; r \; f \; c :: s \; v$ &  $=$ &   $\flex \; r \; \lambda v. \, f (\inj \; r\; c\; v)\; s \; v$
+\end{tabular}
+\end{center}
 which accumulates the characters that needs to be injected back, 
 and does the injection in a stack-like manner (last taken derivative first injected).
 $\flex$ is connected to the $\lexer$:

--- a/ChengsongTanPhdThesis/Chapters/Inj.tex	Mon Jun 06 15:57:17 2022 +0100
+++ b/ChengsongTanPhdThesis/Chapters/Inj.tex	Mon Jun 06 16:45:42 2022 +0100
@@ -57,17 +57,17 @@
 is denoted by the Kleene star operator:
 \begin{center}
 \begin{tabular}{lcl}
-$\bigcup_{i \geq 0} A^i$ & $\denote$ & $A^*$\\
+ $A*$ & $\dn$ & $\bigcup_{i \geq 0} A^i$ \\
 \end{tabular}
 \end{center}
 
-In Isabelle of course we cannot easily get a counterpart of
-the $\bigcup_{i \geq 0}$ operator, so we instead define the Kleene star
+To get an induction principle in Isabelle/HOL, 
+we instead define the Kleene star
 as an inductive set: 
 \begin{center}
 \begin{tabular}{lcl}
-$[] \in A^*$  & &\\
-$s_1 \in A \land \; s_2 \in A^* $ & $\implies$ & $s_1 @ s_2 \in A^*$\\
+$[] \in A*$  & &\\
+$s_1 \in A \land \; s_2 \in A* $ & $\implies$ & $s_1 @ s_2 \in A*$\\
 \end{tabular}
 \end{center}
 \subsection{Language Derivatives}
@@ -89,48 +89,54 @@
 which is essentially the left quotient $A \backslash L'$ of $A$ against 
 the singleton language $L' = \{w\}$
 in formal language theory.
-For this dissertation the $\textit{Ders}$ notation would suffice, there is
-no need for a more general derivative definition.
+For this dissertation the $\textit{Ders}$ definition with 
+a single string suffices.
 
 With the  sequencing, Kleene star, and $\textit{Der}$ operator on languages,
 we have a  few properties of how the language derivative can be defined using 
 sub-languages.
 \begin{lemma}
-$\Der \; c \; (A @ B) = \textit{if} \;  [] \in A \; \textit{then} ((\Der \; c \; A) @ B ) \cup \Der \; c\; B \quad \textit{else}\; (\Der \; c \; A) @ B$
+\[
+	\Der \; c \; (A @ B) =
+	\begin{cases}
+	((\Der \; c \; A) @ B ) \cup \Der \; c\; B, &  \text{if} \;  [] \in A  \\
+	 (\Der \; c \; A) @ B, & \text{otherwise}
+	 \end{cases}	
+\]
 \end{lemma}
 \noindent
 This lemma states that if $A$ contains the empty string, $\Der$ can "pierce" through it
 and get to $B$.
 
-The language $A^*$'s derivative can be described using the language derivative
+The language $A*$'s derivative can be described using the language derivative
 of $A$:
 \begin{lemma}
-$\textit{Der} \;c \;A^* = (\textit{Der}\; c A) @ (A^*)$\\
+$\textit{Der} \;c \;A* = (\textit{Der}\; c A) @ (A*)$\\
 \end{lemma}
 \begin{proof}
 \begin{itemize}
 \item{$\subseteq$}
 The set 
-\[ \{s \mid c :: s \in A^*\} \]
+\[ \{s \mid c :: s \in A*\} \]
 is enclosed in the set
-\[ \{s_1 @ s_2 \mid s_1 \, s_2. s_1 \in \{s \mid c :: s \in A\} \land s_2 \in A^* \} \]
+\[ \{s_1 @ s_2 \mid s_1 \, s_2. s_1 \in \{s \mid c :: s \in A\} \land s_2 \in A* \} \]
 because whenever you have a string starting with a character 
-in the language of a Kleene star $A^*$, then that character together with some sub-string
+in the language of a Kleene star $A*$, then that character together with some sub-string
 immediately after it will form the first iteration, and the rest of the string will 
-be still in $A^*$.
+be still in $A*$.
 \item{$\supseteq$}
 Note that
-\[ \Der \; c \; A^* = \Der \; c \;  (\{ [] \} \cup (A @ A^*) ) \]
+\[ \Der \; c \; A* = \Der \; c \;  (\{ [] \} \cup (A @ A*) ) \]
 and 
-\[ \Der \; c \;  (\{ [] \} \cup (A @ A^*) ) = \Der\; c \; (A @ A^*) \]
+\[ \Der \; c \;  (\{ [] \} \cup (A @ A*) ) = \Der\; c \; (A @ A*) \]
 where the $\textit{RHS}$ of the above equatioin can be rewritten
-as \[ (\Der \; c\; A) @ A^* \cup A' \], $A'$ being a possibly empty set.
+as \[ (\Der \; c\; A) @ A* \cup A' \], $A'$ being a possibly empty set.
 \end{itemize}
 \end{proof}
 Before we define the $\textit{Der}$ and $\textit{Ders}$ counterpart
 for regular languages, we need to first give definitions for regular expressions.
 
-\subsection{Regular Expressions and Their Language Interpretation}
+\subsection{Regular Expressions and Their Meaning}
 Suppose we have an alphabet $\Sigma$, the strings  whose characters
 are from $\Sigma$
 can be expressed as $\Sigma^*$.
@@ -151,17 +157,22 @@
 %TODO: FILL in the other defs
 \begin{center}
 \begin{tabular}{lcl}
+$L \; (\ZERO)$ & $\dn$ & $\phi$\\
+$L \; (\ONE)$ & $\dn$ & $\{[]\}$\\
+$L \; (c)$ & $\dn$ & $\{[c]\}$\\
 $L \; (r_1 + r_2)$ & $\dn$ & $ L \; (r_1) \cup L \; ( r_2)$\\
 $L \; (r_1 \cdot r_2)$ & $\dn$ & $ L \; (r_1) \cap L \; (r_2)$\\
+$L \; (r^*)$ & $\dn$ & $ (L(r))^*$
 \end{tabular}
 \end{center}
+\noindent
 Which is also called the "language interpretation" of
 a regular expression.
 
 Now with semantic derivatives of a language and regular expressions and
 their language interpretations in place, we are ready to define derivatives on regexes.
 \subsection{Brzozowski Derivatives and a Regular Expression Matcher}
-The language derivatives acts on a string set and chops off a character from
+The language derivative acts on a string set and chops off a character from
 all strings in that set, we want to define a derivative operation on regular expressions
 so that after derivative $L(r\backslash c)$ 
 will look as if it was obtained by doing a language derivative on $L(r)$:
@@ -181,7 +192,7 @@
 \noindent
 and language Kleene star:
 \[
-\textit{Der} \;c \;A^* = (\textit{Der}\; c A) @ (A^*)
+\textit{Der} \;c \;A* = (\textit{Der}\; c A) @ (A*)
 \]
 to get derivative of the Kleene star regular expression:
 \[

--- a/thys2/blexer2.sc	Mon Jun 06 15:57:17 2022 +0100
+++ b/thys2/blexer2.sc	Mon Jun 06 16:45:42 2022 +0100
@@ -1260,7 +1260,17 @@
     }
   }
 }
-small()
+// small()
+
+def aaa_star() = {
+  val r = STAR(("a" | "aa"))
+  for(i <- 0 to 100) {
+    val prog = "a" * i
+    println(asize(bders_simp(prog.toList, internalise(r))))
+  }
+}
+aaa_star()
+
 def generator_test() {
 
   test(rexp(4), 1000000) { (r: Rexp) =>

author	Chengsong
	Mon, 06 Jun 2022 16:45:42 +0100
changeset 536	aff7bf93b9c7
parent 535	ce91c29d2885
child 537	50e590823220

ChengsongTanPhdThesis/Chapters/Bitcoded1.tex		file \| annotate \| diff \| comparison \| revisions
ChengsongTanPhdThesis/Chapters/Inj.tex		file \| annotate \| diff \| comparison \| revisions
thys2/blexer2.sc		file \| annotate \| diff \| comparison \| revisions