% Chapter Template% Main chapter title\chapter{Correctness of Bit-coded Algorithm with Simplification}\label{Bitcoded2} % Change X to a consecutive number; for referencing this chapter elsewhere, use \ref{ChapterX}%Then we illustrate how the algorithm without bitcodes falls short for such aggressive %simplifications and therefore introduce our version of the bitcoded algorithm and %its correctness proof in %Chapter 3\ref{Chapter3}. Now we introduce the simplifications, which is why we introduce the bitcodes in the first place.\subsection*{Simplification Rules}This section introduces aggressive (in terms of size) simplification ruleson annotated regular expressionsto keep derivatives small. Such simplifications are promisingas we havegenerated test data that showthat a good tight bound can be achieved. We could onlypartially cover the search space as there are infinitely many regularexpressions and strings. One modification we introduced is to allow a list of annotated regularexpressions in the $\sum$ constructor. This allows us to not justdelete unnecessary $\ZERO$s and $\ONE$s from regular expressions, butalso unnecessary ``copies'' of regular expressions (very similar tosimplifying $r + r$ to just $r$, but in a more general setting). Anothermodification is that we use simplification rules inspired by Antimirov'swork on partial derivatives. They maintain the idea that only the first``copy'' of a regular expression in an alternative contributes to thecalculation of a POSIX value. All subsequent copies can be pruned away fromthe regular expression. A recursive definition of our simplification function that looks somewhat similar to our Scala code is given below:%\comment{Use $\ZERO$, $\ONE$ and so on. %Is it $ALTS$ or $ALTS$?}\\\begin{center} \begin{tabular}{@{}lcl@{}} $\textit{simp} \; (_{bs}a_1\cdot a_2)$ & $\dn$ & $ (\textit{simp} \; a_1, \textit{simp} \; a_2) \; \textit{match} $ \\ &&$\quad\textit{case} \; (\ZERO, \_) \Rightarrow \ZERO$ \\ &&$\quad\textit{case} \; (\_, \ZERO) \Rightarrow \ZERO$ \\ &&$\quad\textit{case} \; (\ONE, a_2') \Rightarrow \textit{fuse} \; bs \; a_2'$ \\ &&$\quad\textit{case} \; (a_1', \ONE) \Rightarrow \textit{fuse} \; bs \; a_1'$ \\ &&$\quad\textit{case} \; (a_1', a_2') \Rightarrow _{bs}a_1' \cdot a_2'$ \\ $\textit{simp} \; (_{bs}\sum \textit{as})$ & $\dn$ & $\textit{distinct}( \textit{flatten} ( \textit{map} \; simp \; as)) \; \textit{match} $ \\ &&$\quad\textit{case} \; [] \Rightarrow \ZERO$ \\ &&$\quad\textit{case} \; a :: [] \Rightarrow \textit{fuse bs a}$ \\ &&$\quad\textit{case} \; as' \Rightarrow _{bs}\sum \textit{as'}$\\ $\textit{simp} \; a$ & $\dn$ & $\textit{a} \qquad \textit{otherwise}$ \end{tabular} \end{center} \noindentThe simplification does a pattern matching on the regular expression.When it detected that the regular expression is an alternative orsequence, it will try to simplify its child regular expressionsrecursively and then see if one of the children turns into $\ZERO$ or$\ONE$, which might trigger further simplification at the current level.The most involved part is the $\sum$ clause, where we use twoauxiliary functions $\textit{flatten}$ and $\textit{distinct}$ to open up nestedalternatives and reduce as many duplicates as possible. Function$\textit{distinct}$ keeps the first occurring copy only and removes all later oneswhen detected duplicates. Function $\textit{flatten}$ opens up nested $\sum$s.Its recursive definition is given below: \begin{center} \begin{tabular}{@{}lcl@{}} $\textit{flatten} \; (_{bs}\sum \textit{as}) :: \textit{as'}$ & $\dn$ & $(\textit{map} \; (\textit{fuse}\;bs)\; \textit{as}) \; @ \; \textit{flatten} \; as' $ \\ $\textit{flatten} \; \ZERO :: as'$ & $\dn$ & $ \textit{flatten} \; \textit{as'} $ \\ $\textit{flatten} \; a :: as'$ & $\dn$ & $a :: \textit{flatten} \; \textit{as'}$ \quad(otherwise) \end{tabular} \end{center} \noindentHere $\textit{flatten}$ behaves like the traditional functional programming flattenfunction, except that it also removes $\ZERO$s. Or in terms of regular expressions, itremoves parentheses, for example changing $a+(b+c)$ into $a+b+c$.Having defined the $\simp$ function,we can use the previous notation of naturalextension from derivative w.r.t.~character to derivativew.r.t.~string:%\comment{simp in the [] case?}\begin{center}\begin{tabular}{lcl}$r \backslash_{simp} (c\!::\!s) $ & $\dn$ & $(r \backslash_{simp}\, c) \backslash_{simp}\, s$ \\$r \backslash_{simp} [\,] $ & $\dn$ & $r$\end{tabular}\end{center}\noindentto obtain an optimised version of the algorithm: \begin{center}\begin{tabular}{lcl} $\textit{blexer\_simp}\;r\,s$ & $\dn$ & $\textit{let}\;a = (r^\uparrow)\backslash_{simp}\, s\;\textit{in}$\\ & & $\;\;\textit{if}\; \textit{bnullable}(a)$\\ & & $\;\;\textit{then}\;\textit{decode}\,(\textit{bmkeps}\,a)\,r$\\ & & $\;\;\textit{else}\;\textit{None}$\end{tabular}\end{center}\noindentThis algorithm keeps the regular expression size small, for example,with this simplification our previous $(a + aa)^*$ example's 8000 nodeswill be reduced to just 17 and stays constant, no matter how long theinput string is.%----------------------------------------------------------------------------------------% SECTION common identities%----------------------------------------------------------------------------------------\section{Common Identities In Simplification-Related Functions} Need to prove these before starting on the big correctness proof.%-----------------------------------% SUBSECTION %-----------------------------------\subsection{Idempotency of $\simp$}\begin{equation} \simp \;r = \simp\; \simp \; r \end{equation}This property means we do not have to repeatedlyapply simplification in each step, which justifiesour definition of $\blexersimp$.It will also be useful in future proofs where properties such as closed forms are needed.The proof is by structural induction on $r$.%-----------------------------------% SUBSECTION %-----------------------------------\subsection{Syntactic Equivalence Under $\simp$}We prove that minor differences can be annhilatedby $\simp$.For example,\begin{center}$\simp \;(\simpALTs\; (\map \;(\_\backslash \; x)\; (\distinct \; \mathit{rs}\; \phi))) = \simp \;(\simpALTs \;(\distinct \;(\map \;(\_ \backslash\; x) \; \mathit{rs}) \; \phi))$\end{center}%----------------------------------------------------------------------------------------% SECTION corretness proof%----------------------------------------------------------------------------------------\section{Proof Technique of Correctness of Bit-coded Algorithm with Simplification}The non-trivial part of proving the correctness of the algorithm with simplificationcompared with not having simplification is that we can no longer use the argument in \cref{flex_retrieve}.The function \retrieve needs the structure of the annotated regular expression to agree with the structure of the value, but simplification will always mess with the structure:%TODO: after simp does not agree with each other: (a + 0) --> a v.s. Left(Char(a))