# HG changeset patch # User Chengsong # Date 1654530342 -3600 # Node ID aff7bf93b9c739434ba7b8c3bafa485205612a14 # Parent ce91c29d2885011e0f03ba64f9b6ecb46a00a1b3 comments addressed all diff -r ce91c29d2885 -r aff7bf93b9c7 ChengsongTanPhdThesis/Chapters/Bitcoded1.tex --- a/ChengsongTanPhdThesis/Chapters/Bitcoded1.tex Mon Jun 06 15:57:17 2022 +0100 +++ b/ChengsongTanPhdThesis/Chapters/Bitcoded1.tex Mon Jun 06 16:45:42 2022 +0100 @@ -393,7 +393,7 @@ \noindent This algorithm keeps the regular expression size small, for example, with this simplification our previous $(a + aa)^*$ example's 8000 nodes -will be reduced to just 6 and stays constant, no matter how long the +will be reduced to just 17 and stays constant, no matter how long the input string is. @@ -427,10 +427,12 @@ Ausaf and Urban cleverly defined an auxiliary function called $\flex$, defined as -\[ -\flex \; r \; f \; [] \; v \; = \; f\; v -\flex \; r \; f \; c :: s \; v = \flex r \; \lambda v. \, f (\inj \; r\; c\; v)\; s \; v -\] +\begin{center} +\begin{tabular}{lcr} +$\flex \; r \; f \; [] \; v$ & $=$ & $f\; v$\\ +$\flex \; r \; f \; c :: s \; v$ & $=$ & $\flex \; r \; \lambda v. \, f (\inj \; r\; c\; v)\; s \; v$ +\end{tabular} +\end{center} which accumulates the characters that needs to be injected back, and does the injection in a stack-like manner (last taken derivative first injected). $\flex$ is connected to the $\lexer$: diff -r ce91c29d2885 -r aff7bf93b9c7 ChengsongTanPhdThesis/Chapters/Inj.tex --- a/ChengsongTanPhdThesis/Chapters/Inj.tex Mon Jun 06 15:57:17 2022 +0100 +++ b/ChengsongTanPhdThesis/Chapters/Inj.tex Mon Jun 06 16:45:42 2022 +0100 @@ -57,17 +57,17 @@ is denoted by the Kleene star operator: \begin{center} \begin{tabular}{lcl} -$\bigcup_{i \geq 0} A^i$ & $\denote$ & $A^*$\\ + $A*$ & $\dn$ & $\bigcup_{i \geq 0} A^i$ \\ \end{tabular} \end{center} -In Isabelle of course we cannot easily get a counterpart of -the $\bigcup_{i \geq 0}$ operator, so we instead define the Kleene star +To get an induction principle in Isabelle/HOL, +we instead define the Kleene star as an inductive set: \begin{center} \begin{tabular}{lcl} -$[] \in A^*$ & &\\ -$s_1 \in A \land \; s_2 \in A^* $ & $\implies$ & $s_1 @ s_2 \in A^*$\\ +$[] \in A*$ & &\\ +$s_1 \in A \land \; s_2 \in A* $ & $\implies$ & $s_1 @ s_2 \in A*$\\ \end{tabular} \end{center} \subsection{Language Derivatives} @@ -89,48 +89,54 @@ which is essentially the left quotient $A \backslash L'$ of $A$ against the singleton language $L' = \{w\}$ in formal language theory. -For this dissertation the $\textit{Ders}$ notation would suffice, there is -no need for a more general derivative definition. +For this dissertation the $\textit{Ders}$ definition with +a single string suffices. With the sequencing, Kleene star, and $\textit{Der}$ operator on languages, we have a few properties of how the language derivative can be defined using sub-languages. \begin{lemma} -$\Der \; c \; (A @ B) = \textit{if} \; [] \in A \; \textit{then} ((\Der \; c \; A) @ B ) \cup \Der \; c\; B \quad \textit{else}\; (\Der \; c \; A) @ B$ +\[ + \Der \; c \; (A @ B) = + \begin{cases} + ((\Der \; c \; A) @ B ) \cup \Der \; c\; B, & \text{if} \; [] \in A \\ + (\Der \; c \; A) @ B, & \text{otherwise} + \end{cases} +\] \end{lemma} \noindent This lemma states that if $A$ contains the empty string, $\Der$ can "pierce" through it and get to $B$. -The language $A^*$'s derivative can be described using the language derivative +The language $A*$'s derivative can be described using the language derivative of $A$: \begin{lemma} -$\textit{Der} \;c \;A^* = (\textit{Der}\; c A) @ (A^*)$\\ +$\textit{Der} \;c \;A* = (\textit{Der}\; c A) @ (A*)$\\ \end{lemma} \begin{proof} \begin{itemize} \item{$\subseteq$} The set -\[ \{s \mid c :: s \in A^*\} \] +\[ \{s \mid c :: s \in A*\} \] is enclosed in the set -\[ \{s_1 @ s_2 \mid s_1 \, s_2. s_1 \in \{s \mid c :: s \in A\} \land s_2 \in A^* \} \] +\[ \{s_1 @ s_2 \mid s_1 \, s_2. s_1 \in \{s \mid c :: s \in A\} \land s_2 \in A* \} \] because whenever you have a string starting with a character -in the language of a Kleene star $A^*$, then that character together with some sub-string +in the language of a Kleene star $A*$, then that character together with some sub-string immediately after it will form the first iteration, and the rest of the string will -be still in $A^*$. +be still in $A*$. \item{$\supseteq$} Note that -\[ \Der \; c \; A^* = \Der \; c \; (\{ [] \} \cup (A @ A^*) ) \] +\[ \Der \; c \; A* = \Der \; c \; (\{ [] \} \cup (A @ A*) ) \] and -\[ \Der \; c \; (\{ [] \} \cup (A @ A^*) ) = \Der\; c \; (A @ A^*) \] +\[ \Der \; c \; (\{ [] \} \cup (A @ A*) ) = \Der\; c \; (A @ A*) \] where the $\textit{RHS}$ of the above equatioin can be rewritten -as \[ (\Der \; c\; A) @ A^* \cup A' \], $A'$ being a possibly empty set. +as \[ (\Der \; c\; A) @ A* \cup A' \], $A'$ being a possibly empty set. \end{itemize} \end{proof} Before we define the $\textit{Der}$ and $\textit{Ders}$ counterpart for regular languages, we need to first give definitions for regular expressions. -\subsection{Regular Expressions and Their Language Interpretation} +\subsection{Regular Expressions and Their Meaning} Suppose we have an alphabet $\Sigma$, the strings whose characters are from $\Sigma$ can be expressed as $\Sigma^*$. @@ -151,17 +157,22 @@ %TODO: FILL in the other defs \begin{center} \begin{tabular}{lcl} +$L \; (\ZERO)$ & $\dn$ & $\phi$\\ +$L \; (\ONE)$ & $\dn$ & $\{[]\}$\\ +$L \; (c)$ & $\dn$ & $\{[c]\}$\\ $L \; (r_1 + r_2)$ & $\dn$ & $ L \; (r_1) \cup L \; ( r_2)$\\ $L \; (r_1 \cdot r_2)$ & $\dn$ & $ L \; (r_1) \cap L \; (r_2)$\\ +$L \; (r^*)$ & $\dn$ & $ (L(r))^*$ \end{tabular} \end{center} +\noindent Which is also called the "language interpretation" of a regular expression. Now with semantic derivatives of a language and regular expressions and their language interpretations in place, we are ready to define derivatives on regexes. \subsection{Brzozowski Derivatives and a Regular Expression Matcher} -The language derivatives acts on a string set and chops off a character from +The language derivative acts on a string set and chops off a character from all strings in that set, we want to define a derivative operation on regular expressions so that after derivative $L(r\backslash c)$ will look as if it was obtained by doing a language derivative on $L(r)$: @@ -181,7 +192,7 @@ \noindent and language Kleene star: \[ -\textit{Der} \;c \;A^* = (\textit{Der}\; c A) @ (A^*) +\textit{Der} \;c \;A* = (\textit{Der}\; c A) @ (A*) \] to get derivative of the Kleene star regular expression: \[ diff -r ce91c29d2885 -r aff7bf93b9c7 thys2/blexer2.sc --- a/thys2/blexer2.sc Mon Jun 06 15:57:17 2022 +0100 +++ b/thys2/blexer2.sc Mon Jun 06 16:45:42 2022 +0100 @@ -1260,7 +1260,17 @@ } } } -small() +// small() + +def aaa_star() = { + val r = STAR(("a" | "aa")) + for(i <- 0 to 100) { + val prog = "a" * i + println(asize(bders_simp(prog.toList, internalise(r)))) + } +} +aaa_star() + def generator_test() { test(rexp(4), 1000000) { (r: Rexp) =>