lexing: ChengsongTanPhdThesis/Chapters/ChapterBitcoded1.tex@823d9b19d21c (annotated)

527 2c907b118f78 all chapters put in Chengsong parents: diff changeset	1	% Chapter Template
2c907b118f78 all chapters put in Chengsong parents: diff changeset	2
2c907b118f78 all chapters put in Chengsong parents: diff changeset	3	% Main chapter title
2c907b118f78 all chapters put in Chengsong parents: diff changeset	4	\chapter{Correctness of Bit-coded Algorithm without Simplification}
2c907b118f78 all chapters put in Chengsong parents: diff changeset	5
2c907b118f78 all chapters put in Chengsong parents: diff changeset	6	\label{ChapterBitcoded1} % Change X to a consecutive number; for referencing this chapter elsewhere, use \ref{ChapterX}
2c907b118f78 all chapters put in Chengsong parents: diff changeset	7	%Then we illustrate how the algorithm without bitcodes falls short for such aggressive
2c907b118f78 all chapters put in Chengsong parents: diff changeset	8	%simplifications and therefore introduce our version of the bitcoded algorithm and
2c907b118f78 all chapters put in Chengsong parents: diff changeset	9	%its correctness proof in
2c907b118f78 all chapters put in Chengsong parents: diff changeset	10	%Chapter 3\ref{Chapter3}.
2c907b118f78 all chapters put in Chengsong parents: diff changeset	11
530 823d9b19d21c all comments addressed Chengsong parents: 528 diff changeset	12	\section*{Bit-coded Algorithm}
823d9b19d21c all comments addressed Chengsong parents: 528 diff changeset	13	Bits and bitcodes (lists of bits) are defined as:
527 2c907b118f78 all chapters put in Chengsong parents: diff changeset	14
530 823d9b19d21c all comments addressed Chengsong parents: 528 diff changeset	15	\begin{center}
823d9b19d21c all comments addressed Chengsong parents: 528 diff changeset	16	$b ::= 1 \mid 0 \qquad
823d9b19d21c all comments addressed Chengsong parents: 528 diff changeset	17	bs ::= [] \mid b::bs
823d9b19d21c all comments addressed Chengsong parents: 528 diff changeset	18	$
823d9b19d21c all comments addressed Chengsong parents: 528 diff changeset	19	\end{center}
823d9b19d21c all comments addressed Chengsong parents: 528 diff changeset	20
823d9b19d21c all comments addressed Chengsong parents: 528 diff changeset	21	\noindent
823d9b19d21c all comments addressed Chengsong parents: 528 diff changeset	22	The $1$ and $0$ are not in bold in order to avoid
823d9b19d21c all comments addressed Chengsong parents: 528 diff changeset	23	confusion with the regular expressions $\ZERO$ and $\ONE$. Bitcodes (or
823d9b19d21c all comments addressed Chengsong parents: 528 diff changeset	24	bit-lists) can be used to encode values (or potentially incomplete values) in a
823d9b19d21c all comments addressed Chengsong parents: 528 diff changeset	25	compact form. This can be straightforwardly seen in the following
823d9b19d21c all comments addressed Chengsong parents: 528 diff changeset	26	coding function from values to bitcodes:
823d9b19d21c all comments addressed Chengsong parents: 528 diff changeset	27
823d9b19d21c all comments addressed Chengsong parents: 528 diff changeset	28	\begin{center}
823d9b19d21c all comments addressed Chengsong parents: 528 diff changeset	29	\begin{tabular}{lcl}
823d9b19d21c all comments addressed Chengsong parents: 528 diff changeset	30	$\textit{code}(\Empty)$ & $\dn$ & $[]$\\
823d9b19d21c all comments addressed Chengsong parents: 528 diff changeset	31	$\textit{code}(\Char\,c)$ & $\dn$ & $[]$\\
823d9b19d21c all comments addressed Chengsong parents: 528 diff changeset	32	$\textit{code}(\Left\,v)$ & $\dn$ & $0 :: code(v)$\\
823d9b19d21c all comments addressed Chengsong parents: 528 diff changeset	33	$\textit{code}(\Right\,v)$ & $\dn$ & $1 :: code(v)$\\
823d9b19d21c all comments addressed Chengsong parents: 528 diff changeset	34	$\textit{code}(\Seq\,v_1\,v_2)$ & $\dn$ & $code(v_1) \,@\, code(v_2)$\\
823d9b19d21c all comments addressed Chengsong parents: 528 diff changeset	35	$\textit{code}(\Stars\,[])$ & $\dn$ & $[0]$\\
823d9b19d21c all comments addressed Chengsong parents: 528 diff changeset	36	$\textit{code}(\Stars\,(v\!::\!vs))$ & $\dn$ & $1 :: code(v) \;@\;
823d9b19d21c all comments addressed Chengsong parents: 528 diff changeset	37	code(\Stars\,vs)$
823d9b19d21c all comments addressed Chengsong parents: 528 diff changeset	38	\end{tabular}
823d9b19d21c all comments addressed Chengsong parents: 528 diff changeset	39	\end{center}
823d9b19d21c all comments addressed Chengsong parents: 528 diff changeset	40
823d9b19d21c all comments addressed Chengsong parents: 528 diff changeset	41	\noindent
823d9b19d21c all comments addressed Chengsong parents: 528 diff changeset	42	Here $\textit{code}$ encodes a value into a bitcodes by converting
823d9b19d21c all comments addressed Chengsong parents: 528 diff changeset	43	$\Left$ into $0$, $\Right$ into $1$, and marks the start of a non-empty
823d9b19d21c all comments addressed Chengsong parents: 528 diff changeset	44	star iteration by $1$. The border where a local star terminates
823d9b19d21c all comments addressed Chengsong parents: 528 diff changeset	45	is marked by $0$. This coding is lossy, as it throws away the information about
823d9b19d21c all comments addressed Chengsong parents: 528 diff changeset	46	characters, and also does not encode the ``boundary'' between two
823d9b19d21c all comments addressed Chengsong parents: 528 diff changeset	47	sequence values. Moreover, with only the bitcode we cannot even tell
823d9b19d21c all comments addressed Chengsong parents: 528 diff changeset	48	whether the $1$s and $0$s are for $\Left/\Right$ or $\Stars$. The
823d9b19d21c all comments addressed Chengsong parents: 528 diff changeset	49	reason for choosing this compact way of storing information is that the
823d9b19d21c all comments addressed Chengsong parents: 528 diff changeset	50	relatively small size of bits can be easily manipulated and ``moved
823d9b19d21c all comments addressed Chengsong parents: 528 diff changeset	51	around'' in a regular expression. In order to recover values, we will
823d9b19d21c all comments addressed Chengsong parents: 528 diff changeset	52	need the corresponding regular expression as an extra information. This
823d9b19d21c all comments addressed Chengsong parents: 528 diff changeset	53	means the decoding function is defined as:
823d9b19d21c all comments addressed Chengsong parents: 528 diff changeset	54
823d9b19d21c all comments addressed Chengsong parents: 528 diff changeset	55
823d9b19d21c all comments addressed Chengsong parents: 528 diff changeset	56	%\begin{definition}[Bitdecoding of Values]\mbox{}
823d9b19d21c all comments addressed Chengsong parents: 528 diff changeset	57	\begin{center}
823d9b19d21c all comments addressed Chengsong parents: 528 diff changeset	58	\begin{tabular}{@{}l@{\hspace{1mm}}c@{\hspace{1mm}}l@{}}
823d9b19d21c all comments addressed Chengsong parents: 528 diff changeset	59	$\textit{decode}'\,bs\,(\ONE)$ & $\dn$ & $(\Empty, bs)$\\
823d9b19d21c all comments addressed Chengsong parents: 528 diff changeset	60	$\textit{decode}'\,bs\,(c)$ & $\dn$ & $(\Char\,c, bs)$\\
823d9b19d21c all comments addressed Chengsong parents: 528 diff changeset	61	$\textit{decode}'\,(0\!::\!bs)\;(r_1 + r_2)$ & $\dn$ &
823d9b19d21c all comments addressed Chengsong parents: 528 diff changeset	62	$\textit{let}\,(v, bs_1) = \textit{decode}'\,bs\,r_1\;\textit{in}\;
823d9b19d21c all comments addressed Chengsong parents: 528 diff changeset	63	(\Left\,v, bs_1)$\\
823d9b19d21c all comments addressed Chengsong parents: 528 diff changeset	64	$\textit{decode}'\,(1\!::\!bs)\;(r_1 + r_2)$ & $\dn$ &
823d9b19d21c all comments addressed Chengsong parents: 528 diff changeset	65	$\textit{let}\,(v, bs_1) = \textit{decode}'\,bs\,r_2\;\textit{in}\;
823d9b19d21c all comments addressed Chengsong parents: 528 diff changeset	66	(\Right\,v, bs_1)$\\
823d9b19d21c all comments addressed Chengsong parents: 528 diff changeset	67	$\textit{decode}'\,bs\;(r_1\cdot r_2)$ & $\dn$ &
823d9b19d21c all comments addressed Chengsong parents: 528 diff changeset	68	$\textit{let}\,(v_1, bs_1) = \textit{decode}'\,bs\,r_1\;\textit{in}$\\
823d9b19d21c all comments addressed Chengsong parents: 528 diff changeset	69	& & $\textit{let}\,(v_2, bs_2) = \textit{decode}'\,bs_1\,r_2$\\
823d9b19d21c all comments addressed Chengsong parents: 528 diff changeset	70	& & \hspace{35mm}$\textit{in}\;(\Seq\,v_1\,v_2, bs_2)$\\
823d9b19d21c all comments addressed Chengsong parents: 528 diff changeset	71	$\textit{decode}'\,(0\!::\!bs)\,(r^*)$ & $\dn$ & $(\Stars\,[], bs)$\\
823d9b19d21c all comments addressed Chengsong parents: 528 diff changeset	72	$\textit{decode}'\,(1\!::\!bs)\,(r^*)$ & $\dn$ &
823d9b19d21c all comments addressed Chengsong parents: 528 diff changeset	73	$\textit{let}\,(v, bs_1) = \textit{decode}'\,bs\,r\;\textit{in}$\\
823d9b19d21c all comments addressed Chengsong parents: 528 diff changeset	74	& & $\textit{let}\,(\Stars\,vs, bs_2) = \textit{decode}'\,bs_1\,r^*$\\
823d9b19d21c all comments addressed Chengsong parents: 528 diff changeset	75	& & \hspace{35mm}$\textit{in}\;(\Stars\,v\!::\!vs, bs_2)$\bigskip\\
823d9b19d21c all comments addressed Chengsong parents: 528 diff changeset	76
823d9b19d21c all comments addressed Chengsong parents: 528 diff changeset	77	$\textit{decode}\,bs\,r$ & $\dn$ &
823d9b19d21c all comments addressed Chengsong parents: 528 diff changeset	78	$\textit{let}\,(v, bs') = \textit{decode}'\,bs\,r\;\textit{in}$\\
823d9b19d21c all comments addressed Chengsong parents: 528 diff changeset	79	& & $\textit{if}\;bs' = []\;\textit{then}\;\textit{Some}\,v\;
823d9b19d21c all comments addressed Chengsong parents: 528 diff changeset	80	\textit{else}\;\textit{None}$
823d9b19d21c all comments addressed Chengsong parents: 528 diff changeset	81	\end{tabular}
823d9b19d21c all comments addressed Chengsong parents: 528 diff changeset	82	\end{center}
823d9b19d21c all comments addressed Chengsong parents: 528 diff changeset	83	%\end{definition}
823d9b19d21c all comments addressed Chengsong parents: 528 diff changeset	84
823d9b19d21c all comments addressed Chengsong parents: 528 diff changeset	85	Sulzmann and Lu's integrated the bitcodes into regular expressions to
823d9b19d21c all comments addressed Chengsong parents: 528 diff changeset	86	create annotated regular expressions \cite{Sulzmann2014}.
823d9b19d21c all comments addressed Chengsong parents: 528 diff changeset	87	\emph{Annotated regular expressions} are defined by the following
823d9b19d21c all comments addressed Chengsong parents: 528 diff changeset	88	grammar:%\comment{ALTS should have an $as$ in the definitions, not just $a_1$ and $a_2$}
823d9b19d21c all comments addressed Chengsong parents: 528 diff changeset	89
823d9b19d21c all comments addressed Chengsong parents: 528 diff changeset	90	\begin{center}
823d9b19d21c all comments addressed Chengsong parents: 528 diff changeset	91	\begin{tabular}{lcl}
823d9b19d21c all comments addressed Chengsong parents: 528 diff changeset	92	$\textit{a}$ & $::=$ & $\ZERO$\\
823d9b19d21c all comments addressed Chengsong parents: 528 diff changeset	93	& $\mid$ & $_{bs}\ONE$\\
823d9b19d21c all comments addressed Chengsong parents: 528 diff changeset	94	& $\mid$ & $_{bs}{\bf c}$\\
823d9b19d21c all comments addressed Chengsong parents: 528 diff changeset	95	& $\mid$ & $_{bs}\sum\,as$\\
823d9b19d21c all comments addressed Chengsong parents: 528 diff changeset	96	& $\mid$ & $_{bs}a_1\cdot a_2$\\
823d9b19d21c all comments addressed Chengsong parents: 528 diff changeset	97	& $\mid$ & $_{bs}a^*$
823d9b19d21c all comments addressed Chengsong parents: 528 diff changeset	98	\end{tabular}
823d9b19d21c all comments addressed Chengsong parents: 528 diff changeset	99	\end{center}
823d9b19d21c all comments addressed Chengsong parents: 528 diff changeset	100	%(in \textit{ALTS})
823d9b19d21c all comments addressed Chengsong parents: 528 diff changeset	101
823d9b19d21c all comments addressed Chengsong parents: 528 diff changeset	102	\noindent
823d9b19d21c all comments addressed Chengsong parents: 528 diff changeset	103	where $bs$ stands for bitcodes, $a$ for $\mathbf{a}$nnotated regular
823d9b19d21c all comments addressed Chengsong parents: 528 diff changeset	104	expressions and $as$ for a list of annotated regular expressions.
823d9b19d21c all comments addressed Chengsong parents: 528 diff changeset	105	The alternative constructor($\sum$) has been generalized to
823d9b19d21c all comments addressed Chengsong parents: 528 diff changeset	106	accept a list of annotated regular expressions rather than just 2.
823d9b19d21c all comments addressed Chengsong parents: 528 diff changeset	107	We will show that these bitcodes encode information about
823d9b19d21c all comments addressed Chengsong parents: 528 diff changeset	108	the (POSIX) value that should be generated by the Sulzmann and Lu
823d9b19d21c all comments addressed Chengsong parents: 528 diff changeset	109	algorithm.
823d9b19d21c all comments addressed Chengsong parents: 528 diff changeset	110
823d9b19d21c all comments addressed Chengsong parents: 528 diff changeset	111
823d9b19d21c all comments addressed Chengsong parents: 528 diff changeset	112	To do lexing using annotated regular expressions, we shall first
823d9b19d21c all comments addressed Chengsong parents: 528 diff changeset	113	transform the usual (un-annotated) regular expressions into annotated
823d9b19d21c all comments addressed Chengsong parents: 528 diff changeset	114	regular expressions. This operation is called \emph{internalisation} and
823d9b19d21c all comments addressed Chengsong parents: 528 diff changeset	115	defined as follows:
823d9b19d21c all comments addressed Chengsong parents: 528 diff changeset	116
823d9b19d21c all comments addressed Chengsong parents: 528 diff changeset	117	%\begin{definition}
823d9b19d21c all comments addressed Chengsong parents: 528 diff changeset	118	\begin{center}
823d9b19d21c all comments addressed Chengsong parents: 528 diff changeset	119	\begin{tabular}{lcl}
823d9b19d21c all comments addressed Chengsong parents: 528 diff changeset	120	$(\ZERO)^\uparrow$ & $\dn$ & $\ZERO$\\
823d9b19d21c all comments addressed Chengsong parents: 528 diff changeset	121	$(\ONE)^\uparrow$ & $\dn$ & $_{[]}\ONE$\\
823d9b19d21c all comments addressed Chengsong parents: 528 diff changeset	122	$(c)^\uparrow$ & $\dn$ & $_{[]}{\bf c}$\\
823d9b19d21c all comments addressed Chengsong parents: 528 diff changeset	123	$(r_1 + r_2)^\uparrow$ & $\dn$ &
823d9b19d21c all comments addressed Chengsong parents: 528 diff changeset	124	$_{[]}\sum[\textit{fuse}\,[0]\,r_1^\uparrow,\,
823d9b19d21c all comments addressed Chengsong parents: 528 diff changeset	125	\textit{fuse}\,[1]\,r_2^\uparrow]$\\
823d9b19d21c all comments addressed Chengsong parents: 528 diff changeset	126	$(r_1\cdot r_2)^\uparrow$ & $\dn$ &
823d9b19d21c all comments addressed Chengsong parents: 528 diff changeset	127	$_{[]}r_1^\uparrow \cdot r_2^\uparrow$\\
823d9b19d21c all comments addressed Chengsong parents: 528 diff changeset	128	$(r^*)^\uparrow$ & $\dn$ &
823d9b19d21c all comments addressed Chengsong parents: 528 diff changeset	129	$_{[]}(r^\uparrow)^*$\\
823d9b19d21c all comments addressed Chengsong parents: 528 diff changeset	130	\end{tabular}
823d9b19d21c all comments addressed Chengsong parents: 528 diff changeset	131	\end{center}
823d9b19d21c all comments addressed Chengsong parents: 528 diff changeset	132	%\end{definition}
823d9b19d21c all comments addressed Chengsong parents: 528 diff changeset	133
823d9b19d21c all comments addressed Chengsong parents: 528 diff changeset	134	\noindent
823d9b19d21c all comments addressed Chengsong parents: 528 diff changeset	135	We use up arrows here to indicate that the basic un-annotated regular
823d9b19d21c all comments addressed Chengsong parents: 528 diff changeset	136	expressions are ``lifted up'' into something slightly more complex. In the
823d9b19d21c all comments addressed Chengsong parents: 528 diff changeset	137	fourth clause, $\textit{fuse}$ is an auxiliary function that helps to
823d9b19d21c all comments addressed Chengsong parents: 528 diff changeset	138	attach bits to the front of an annotated regular expression. Its
823d9b19d21c all comments addressed Chengsong parents: 528 diff changeset	139	definition is as follows:
823d9b19d21c all comments addressed Chengsong parents: 528 diff changeset	140
823d9b19d21c all comments addressed Chengsong parents: 528 diff changeset	141	\begin{center}
823d9b19d21c all comments addressed Chengsong parents: 528 diff changeset	142	\begin{tabular}{lcl}
823d9b19d21c all comments addressed Chengsong parents: 528 diff changeset	143	$\textit{fuse}\;bs \; \ZERO$ & $\dn$ & $\ZERO$\\
823d9b19d21c all comments addressed Chengsong parents: 528 diff changeset	144	$\textit{fuse}\;bs\; _{bs'}\ONE$ & $\dn$ &
823d9b19d21c all comments addressed Chengsong parents: 528 diff changeset	145	$_{bs @ bs'}\ONE$\\
823d9b19d21c all comments addressed Chengsong parents: 528 diff changeset	146	$\textit{fuse}\;bs\;_{bs'}{\bf c}$ & $\dn$ &
823d9b19d21c all comments addressed Chengsong parents: 528 diff changeset	147	$_{bs@bs'}{\bf c}$\\
823d9b19d21c all comments addressed Chengsong parents: 528 diff changeset	148	$\textit{fuse}\;bs\,_{bs'}\sum\textit{as}$ & $\dn$ &
823d9b19d21c all comments addressed Chengsong parents: 528 diff changeset	149	$_{bs@bs'}\sum\textit{as}$\\
823d9b19d21c all comments addressed Chengsong parents: 528 diff changeset	150	$\textit{fuse}\;bs\; _{bs'}a_1\cdot a_2$ & $\dn$ &
823d9b19d21c all comments addressed Chengsong parents: 528 diff changeset	151	$_{bs@bs'}a_1 \cdot a_2$\\
823d9b19d21c all comments addressed Chengsong parents: 528 diff changeset	152	$\textit{fuse}\;bs\,_{bs'}a^*$ & $\dn$ &
823d9b19d21c all comments addressed Chengsong parents: 528 diff changeset	153	$_{bs @ bs'}a^*$
823d9b19d21c all comments addressed Chengsong parents: 528 diff changeset	154	\end{tabular}
823d9b19d21c all comments addressed Chengsong parents: 528 diff changeset	155	\end{center}
823d9b19d21c all comments addressed Chengsong parents: 528 diff changeset	156
823d9b19d21c all comments addressed Chengsong parents: 528 diff changeset	157	\noindent
823d9b19d21c all comments addressed Chengsong parents: 528 diff changeset	158	After internalising the regular expression, we perform successive
823d9b19d21c all comments addressed Chengsong parents: 528 diff changeset	159	derivative operations on the annotated regular expressions. This
823d9b19d21c all comments addressed Chengsong parents: 528 diff changeset	160	derivative operation is the same as what we had previously for the
823d9b19d21c all comments addressed Chengsong parents: 528 diff changeset	161	basic regular expressions, except that we beed to take care of
823d9b19d21c all comments addressed Chengsong parents: 528 diff changeset	162	the bitcodes:
527 2c907b118f78 all chapters put in Chengsong parents: diff changeset	163
2c907b118f78 all chapters put in Chengsong parents: diff changeset	164
530 823d9b19d21c all comments addressed Chengsong parents: 528 diff changeset	165	\iffalse
823d9b19d21c all comments addressed Chengsong parents: 528 diff changeset	166	%\begin{definition}{bder}
823d9b19d21c all comments addressed Chengsong parents: 528 diff changeset	167	\begin{center}
823d9b19d21c all comments addressed Chengsong parents: 528 diff changeset	168	\begin{tabular}{@{}lcl@{}}
823d9b19d21c all comments addressed Chengsong parents: 528 diff changeset	169	$(\textit{ZERO})\,\backslash c$ & $\dn$ & $\textit{ZERO}$\\
823d9b19d21c all comments addressed Chengsong parents: 528 diff changeset	170	$(\textit{ONE}\;bs)\,\backslash c$ & $\dn$ & $\textit{ZERO}$\\
823d9b19d21c all comments addressed Chengsong parents: 528 diff changeset	171	$(\textit{CHAR}\;bs\,d)\,\backslash c$ & $\dn$ &
823d9b19d21c all comments addressed Chengsong parents: 528 diff changeset	172	$\textit{if}\;c=d\; \;\textit{then}\;
823d9b19d21c all comments addressed Chengsong parents: 528 diff changeset	173	\textit{ONE}\;bs\;\textit{else}\;\textit{ZERO}$\\
823d9b19d21c all comments addressed Chengsong parents: 528 diff changeset	174	$(\textit{ALTS}\;bs\,as)\,\backslash c$ & $\dn$ &
823d9b19d21c all comments addressed Chengsong parents: 528 diff changeset	175	$\textit{ALTS}\;bs\,(map (\backslash c) as)$\\
823d9b19d21c all comments addressed Chengsong parents: 528 diff changeset	176	$(\textit{SEQ}\;bs\,a_1\,a_2)\,\backslash c$ & $\dn$ &
823d9b19d21c all comments addressed Chengsong parents: 528 diff changeset	177	$\textit{if}\;\textit{bnullable}\,a_1$\\
823d9b19d21c all comments addressed Chengsong parents: 528 diff changeset	178	& &$\textit{then}\;\textit{ALTS}\,bs\,List((\textit{SEQ}\,[]\,(a_1\,\backslash c)\,a_2),$\\
823d9b19d21c all comments addressed Chengsong parents: 528 diff changeset	179	& &$\phantom{\textit{then}\;\textit{ALTS}\,bs\,}(\textit{fuse}\,(\textit{bmkeps}\,a_1)\,(a_2\,\backslash c)))$\\
823d9b19d21c all comments addressed Chengsong parents: 528 diff changeset	180	& &$\textit{else}\;\textit{SEQ}\,bs\,(a_1\,\backslash c)\,a_2$\\
823d9b19d21c all comments addressed Chengsong parents: 528 diff changeset	181	$(\textit{STAR}\,bs\,a)\,\backslash c$ & $\dn$ &
823d9b19d21c all comments addressed Chengsong parents: 528 diff changeset	182	$\textit{SEQ}\;bs\,(\textit{fuse}\, [\Z] (r\,\backslash c))\,
823d9b19d21c all comments addressed Chengsong parents: 528 diff changeset	183	(\textit{STAR}\,[]\,r)$
823d9b19d21c all comments addressed Chengsong parents: 528 diff changeset	184	\end{tabular}
823d9b19d21c all comments addressed Chengsong parents: 528 diff changeset	185	\end{center}
823d9b19d21c all comments addressed Chengsong parents: 528 diff changeset	186	%\end{definition}
823d9b19d21c all comments addressed Chengsong parents: 528 diff changeset	187
823d9b19d21c all comments addressed Chengsong parents: 528 diff changeset	188	\begin{center}
823d9b19d21c all comments addressed Chengsong parents: 528 diff changeset	189	\begin{tabular}{@{}lcl@{}}
823d9b19d21c all comments addressed Chengsong parents: 528 diff changeset	190	$(\textit{ZERO})\,\backslash c$ & $\dn$ & $\textit{ZERO}$\\
823d9b19d21c all comments addressed Chengsong parents: 528 diff changeset	191	$(_{bs}\textit{ONE})\,\backslash c$ & $\dn$ & $\textit{ZERO}$\\
823d9b19d21c all comments addressed Chengsong parents: 528 diff changeset	192	$(_{bs}\textit{CHAR}\;d)\,\backslash c$ & $\dn$ &
823d9b19d21c all comments addressed Chengsong parents: 528 diff changeset	193	$\textit{if}\;c=d\; \;\textit{then}\;
823d9b19d21c all comments addressed Chengsong parents: 528 diff changeset	194	_{bs}\textit{ONE}\;\textit{else}\;\textit{ZERO}$\\
823d9b19d21c all comments addressed Chengsong parents: 528 diff changeset	195	$(_{bs}\textit{ALTS}\;\textit{as})\,\backslash c$ & $\dn$ &
823d9b19d21c all comments addressed Chengsong parents: 528 diff changeset	196	$_{bs}\textit{ALTS}\;(\textit{as}.\textit{map}(\backslash c))$\\
823d9b19d21c all comments addressed Chengsong parents: 528 diff changeset	197	$(_{bs}\textit{SEQ}\;a_1\,a_2)\,\backslash c$ & $\dn$ &
823d9b19d21c all comments addressed Chengsong parents: 528 diff changeset	198	$\textit{if}\;\textit{bnullable}\,a_1$\\
823d9b19d21c all comments addressed Chengsong parents: 528 diff changeset	199	& &$\textit{then}\;_{bs}\textit{ALTS}\,List((_{[]}\textit{SEQ}\,(a_1\,\backslash c)\,a_2),$\\
823d9b19d21c all comments addressed Chengsong parents: 528 diff changeset	200	& &$\phantom{\textit{then}\;_{bs}\textit{ALTS}\,}(\textit{fuse}\,(\textit{bmkeps}\,a_1)\,(a_2\,\backslash c)))$\\
823d9b19d21c all comments addressed Chengsong parents: 528 diff changeset	201	& &$\textit{else}\;_{bs}\textit{SEQ}\,(a_1\,\backslash c)\,a_2$\\
823d9b19d21c all comments addressed Chengsong parents: 528 diff changeset	202	$(_{bs}\textit{STAR}\,a)\,\backslash c$ & $\dn$ &
823d9b19d21c all comments addressed Chengsong parents: 528 diff changeset	203	$_{bs}\textit{SEQ}\;(\textit{fuse}\, [0] \; r\,\backslash c )\,
823d9b19d21c all comments addressed Chengsong parents: 528 diff changeset	204	(_{bs}\textit{STAR}\,[]\,r)$
823d9b19d21c all comments addressed Chengsong parents: 528 diff changeset	205	\end{tabular}
823d9b19d21c all comments addressed Chengsong parents: 528 diff changeset	206	\end{center}
823d9b19d21c all comments addressed Chengsong parents: 528 diff changeset	207	%\end{definition}
823d9b19d21c all comments addressed Chengsong parents: 528 diff changeset	208	\fi
527 2c907b118f78 all chapters put in Chengsong parents: diff changeset	209
530 823d9b19d21c all comments addressed Chengsong parents: 528 diff changeset	210	\begin{center}
823d9b19d21c all comments addressed Chengsong parents: 528 diff changeset	211	\begin{tabular}{@{}lcl@{}}
823d9b19d21c all comments addressed Chengsong parents: 528 diff changeset	212	$(\ZERO)\,\backslash c$ & $\dn$ & $\ZERO$\\
823d9b19d21c all comments addressed Chengsong parents: 528 diff changeset	213	$(_{bs}\ONE)\,\backslash c$ & $\dn$ & $\ZERO$\\
823d9b19d21c all comments addressed Chengsong parents: 528 diff changeset	214	$(_{bs}{\bf d})\,\backslash c$ & $\dn$ &
823d9b19d21c all comments addressed Chengsong parents: 528 diff changeset	215	$\textit{if}\;c=d\; \;\textit{then}\;
823d9b19d21c all comments addressed Chengsong parents: 528 diff changeset	216	_{bs}\ONE\;\textit{else}\;\ZERO$\\
823d9b19d21c all comments addressed Chengsong parents: 528 diff changeset	217	$(_{bs}\sum \;\textit{as})\,\backslash c$ & $\dn$ &
823d9b19d21c all comments addressed Chengsong parents: 528 diff changeset	218	$_{bs}\sum\;(\textit{map} (\_\backslash c) as )$\\
823d9b19d21c all comments addressed Chengsong parents: 528 diff changeset	219	$(_{bs}\;a_1\cdot a_2)\,\backslash c$ & $\dn$ &
823d9b19d21c all comments addressed Chengsong parents: 528 diff changeset	220	$\textit{if}\;\textit{bnullable}\,a_1$\\
823d9b19d21c all comments addressed Chengsong parents: 528 diff changeset	221	& &$\textit{then}\;_{bs}\sum\,[(_{[]}\,(a_1\,\backslash c)\cdot\,a_2),$\\
823d9b19d21c all comments addressed Chengsong parents: 528 diff changeset	222	& &$\phantom{\textit{then},\;_{bs}\sum\,}(\textit{fuse}\,(\textit{bmkeps}\,a_1)\,(a_2\,\backslash c))]$\\
823d9b19d21c all comments addressed Chengsong parents: 528 diff changeset	223	& &$\textit{else}\;_{bs}\,(a_1\,\backslash c)\cdot a_2$\\
823d9b19d21c all comments addressed Chengsong parents: 528 diff changeset	224	$(_{bs}a^*)\,\backslash c$ & $\dn$ &
823d9b19d21c all comments addressed Chengsong parents: 528 diff changeset	225	$_{bs}(\textit{fuse}\, [0] \; r\,\backslash c)\cdot
823d9b19d21c all comments addressed Chengsong parents: 528 diff changeset	226	(_{[]}r^*))$
823d9b19d21c all comments addressed Chengsong parents: 528 diff changeset	227	\end{tabular}
823d9b19d21c all comments addressed Chengsong parents: 528 diff changeset	228	\end{center}
823d9b19d21c all comments addressed Chengsong parents: 528 diff changeset	229
823d9b19d21c all comments addressed Chengsong parents: 528 diff changeset	230	%\end{definition}
823d9b19d21c all comments addressed Chengsong parents: 528 diff changeset	231	\noindent
823d9b19d21c all comments addressed Chengsong parents: 528 diff changeset	232	For instance, when we do derivative of $_{bs}a^*$ with respect to c,
823d9b19d21c all comments addressed Chengsong parents: 528 diff changeset	233	we need to unfold it into a sequence,
823d9b19d21c all comments addressed Chengsong parents: 528 diff changeset	234	and attach an additional bit $0$ to the front of $r \backslash c$
823d9b19d21c all comments addressed Chengsong parents: 528 diff changeset	235	to indicate one more star iteration. Also the sequence clause
823d9b19d21c all comments addressed Chengsong parents: 528 diff changeset	236	is more subtle---when $a_1$ is $\textit{bnullable}$ (here
823d9b19d21c all comments addressed Chengsong parents: 528 diff changeset	237	\textit{bnullable} is exactly the same as $\textit{nullable}$, except
823d9b19d21c all comments addressed Chengsong parents: 528 diff changeset	238	that it is for annotated regular expressions, therefore we omit the
823d9b19d21c all comments addressed Chengsong parents: 528 diff changeset	239	definition). Assume that $\textit{bmkeps}$ correctly extracts the bitcode for how
823d9b19d21c all comments addressed Chengsong parents: 528 diff changeset	240	$a_1$ matches the string prior to character $c$ (more on this later),
823d9b19d21c all comments addressed Chengsong parents: 528 diff changeset	241	then the right branch of alternative, which is $\textit{fuse} \; \bmkeps \; a_1 (a_2
823d9b19d21c all comments addressed Chengsong parents: 528 diff changeset	242	\backslash c)$ will collapse the regular expression $a_1$(as it has
823d9b19d21c all comments addressed Chengsong parents: 528 diff changeset	243	already been fully matched) and store the parsing information at the
823d9b19d21c all comments addressed Chengsong parents: 528 diff changeset	244	head of the regular expression $a_2 \backslash c$ by fusing to it. The
823d9b19d21c all comments addressed Chengsong parents: 528 diff changeset	245	bitsequence $\textit{bs}$, which was initially attached to the
823d9b19d21c all comments addressed Chengsong parents: 528 diff changeset	246	first element of the sequence $a_1 \cdot a_2$, has
823d9b19d21c all comments addressed Chengsong parents: 528 diff changeset	247	now been elevated to the top-level of $\sum$, as this information will be
823d9b19d21c all comments addressed Chengsong parents: 528 diff changeset	248	needed whichever way the sequence is matched---no matter whether $c$ belongs
823d9b19d21c all comments addressed Chengsong parents: 528 diff changeset	249	to $a_1$ or $ a_2$. After building these derivatives and maintaining all
823d9b19d21c all comments addressed Chengsong parents: 528 diff changeset	250	the lexing information, we complete the lexing by collecting the
823d9b19d21c all comments addressed Chengsong parents: 528 diff changeset	251	bitcodes using a generalised version of the $\textit{mkeps}$ function
823d9b19d21c all comments addressed Chengsong parents: 528 diff changeset	252	for annotated regular expressions, called $\textit{bmkeps}$:
823d9b19d21c all comments addressed Chengsong parents: 528 diff changeset	253
823d9b19d21c all comments addressed Chengsong parents: 528 diff changeset	254
823d9b19d21c all comments addressed Chengsong parents: 528 diff changeset	255	%\begin{definition}[\textit{bmkeps}]\mbox{}
823d9b19d21c all comments addressed Chengsong parents: 528 diff changeset	256	\begin{center}
823d9b19d21c all comments addressed Chengsong parents: 528 diff changeset	257	\begin{tabular}{lcl}
823d9b19d21c all comments addressed Chengsong parents: 528 diff changeset	258	$\textit{bmkeps}\,(_{bs}\ONE)$ & $\dn$ & $bs$\\
823d9b19d21c all comments addressed Chengsong parents: 528 diff changeset	259	$\textit{bmkeps}\,(_{bs}\sum a::\textit{as})$ & $\dn$ &
823d9b19d21c all comments addressed Chengsong parents: 528 diff changeset	260	$\textit{if}\;\textit{bnullable}\,a$\\
823d9b19d21c all comments addressed Chengsong parents: 528 diff changeset	261	& &$\textit{then}\;bs\,@\,\textit{bmkeps}\,a$\\
823d9b19d21c all comments addressed Chengsong parents: 528 diff changeset	262	& &$\textit{else}\;bs\,@\,\textit{bmkeps}\,(_{bs}\sum \textit{as})$\\
823d9b19d21c all comments addressed Chengsong parents: 528 diff changeset	263	$\textit{bmkeps}\,(_{bs} a_1 \cdot a_2)$ & $\dn$ &
823d9b19d21c all comments addressed Chengsong parents: 528 diff changeset	264	$bs \,@\,\textit{bmkeps}\,a_1\,@\, \textit{bmkeps}\,a_2$\\
823d9b19d21c all comments addressed Chengsong parents: 528 diff changeset	265	$\textit{bmkeps}\,(_{bs}a^*)$ & $\dn$ &
823d9b19d21c all comments addressed Chengsong parents: 528 diff changeset	266	$bs \,@\, [0]$
823d9b19d21c all comments addressed Chengsong parents: 528 diff changeset	267	\end{tabular}
823d9b19d21c all comments addressed Chengsong parents: 528 diff changeset	268	\end{center}
823d9b19d21c all comments addressed Chengsong parents: 528 diff changeset	269	%\end{definition}
823d9b19d21c all comments addressed Chengsong parents: 528 diff changeset	270
823d9b19d21c all comments addressed Chengsong parents: 528 diff changeset	271	\noindent
823d9b19d21c all comments addressed Chengsong parents: 528 diff changeset	272	This function completes the value information by travelling along the
823d9b19d21c all comments addressed Chengsong parents: 528 diff changeset	273	path of the regular expression that corresponds to a POSIX value and
823d9b19d21c all comments addressed Chengsong parents: 528 diff changeset	274	collecting all the bitcodes, and using $S$ to indicate the end of star
823d9b19d21c all comments addressed Chengsong parents: 528 diff changeset	275	iterations. If we take the bitcodes produced by $\textit{bmkeps}$ and
823d9b19d21c all comments addressed Chengsong parents: 528 diff changeset	276	decode them, we get the value we expect. The corresponding lexing
823d9b19d21c all comments addressed Chengsong parents: 528 diff changeset	277	algorithm looks as follows:
527 2c907b118f78 all chapters put in Chengsong parents: diff changeset	278
530 823d9b19d21c all comments addressed Chengsong parents: 528 diff changeset	279	\begin{center}
823d9b19d21c all comments addressed Chengsong parents: 528 diff changeset	280	\begin{tabular}{lcl}
823d9b19d21c all comments addressed Chengsong parents: 528 diff changeset	281	$\textit{blexer}\;r\,s$ & $\dn$ &
823d9b19d21c all comments addressed Chengsong parents: 528 diff changeset	282	$\textit{let}\;a = (r^\uparrow)\backslash s\;\textit{in}$\\
823d9b19d21c all comments addressed Chengsong parents: 528 diff changeset	283	& & $\;\;\textit{if}\; \textit{bnullable}(a)$\\
823d9b19d21c all comments addressed Chengsong parents: 528 diff changeset	284	& & $\;\;\textit{then}\;\textit{decode}\,(\textit{bmkeps}\,a)\,r$\\
823d9b19d21c all comments addressed Chengsong parents: 528 diff changeset	285	& & $\;\;\textit{else}\;\textit{None}$
823d9b19d21c all comments addressed Chengsong parents: 528 diff changeset	286	\end{tabular}
823d9b19d21c all comments addressed Chengsong parents: 528 diff changeset	287	\end{center}
823d9b19d21c all comments addressed Chengsong parents: 528 diff changeset	288
823d9b19d21c all comments addressed Chengsong parents: 528 diff changeset	289	\noindent
823d9b19d21c all comments addressed Chengsong parents: 528 diff changeset	290	In this definition $\_\backslash s$ is the generalisation of the derivative
823d9b19d21c all comments addressed Chengsong parents: 528 diff changeset	291	operation from characters to strings (just like the derivatives for un-annotated
823d9b19d21c all comments addressed Chengsong parents: 528 diff changeset	292	regular expressions).
823d9b19d21c all comments addressed Chengsong parents: 528 diff changeset	293
823d9b19d21c all comments addressed Chengsong parents: 528 diff changeset	294	Now we introduce the simplifications, which is why we introduce the
823d9b19d21c all comments addressed Chengsong parents: 528 diff changeset	295	bitcodes in the first place.
823d9b19d21c all comments addressed Chengsong parents: 528 diff changeset	296
823d9b19d21c all comments addressed Chengsong parents: 528 diff changeset	297	\subsection*{Simplification Rules}
823d9b19d21c all comments addressed Chengsong parents: 528 diff changeset	298
823d9b19d21c all comments addressed Chengsong parents: 528 diff changeset	299	This section introduces aggressive (in terms of size) simplification rules
823d9b19d21c all comments addressed Chengsong parents: 528 diff changeset	300	on annotated regular expressions
823d9b19d21c all comments addressed Chengsong parents: 528 diff changeset	301	to keep derivatives small. Such simplifications are promising
823d9b19d21c all comments addressed Chengsong parents: 528 diff changeset	302	as we have
823d9b19d21c all comments addressed Chengsong parents: 528 diff changeset	303	generated test data that show
823d9b19d21c all comments addressed Chengsong parents: 528 diff changeset	304	that a good tight bound can be achieved. We could only
823d9b19d21c all comments addressed Chengsong parents: 528 diff changeset	305	partially cover the search space as there are infinitely many regular
823d9b19d21c all comments addressed Chengsong parents: 528 diff changeset	306	expressions and strings.
823d9b19d21c all comments addressed Chengsong parents: 528 diff changeset	307
823d9b19d21c all comments addressed Chengsong parents: 528 diff changeset	308	One modification we introduced is to allow a list of annotated regular
823d9b19d21c all comments addressed Chengsong parents: 528 diff changeset	309	expressions in the $\sum$ constructor. This allows us to not just
823d9b19d21c all comments addressed Chengsong parents: 528 diff changeset	310	delete unnecessary $\ZERO$s and $\ONE$s from regular expressions, but
823d9b19d21c all comments addressed Chengsong parents: 528 diff changeset	311	also unnecessary ``copies'' of regular expressions (very similar to
823d9b19d21c all comments addressed Chengsong parents: 528 diff changeset	312	simplifying $r + r$ to just $r$, but in a more general setting). Another
823d9b19d21c all comments addressed Chengsong parents: 528 diff changeset	313	modification is that we use simplification rules inspired by Antimirov's
823d9b19d21c all comments addressed Chengsong parents: 528 diff changeset	314	work on partial derivatives. They maintain the idea that only the first
823d9b19d21c all comments addressed Chengsong parents: 528 diff changeset	315	``copy'' of a regular expression in an alternative contributes to the
823d9b19d21c all comments addressed Chengsong parents: 528 diff changeset	316	calculation of a POSIX value. All subsequent copies can be pruned away from
823d9b19d21c all comments addressed Chengsong parents: 528 diff changeset	317	the regular expression. A recursive definition of our simplification function
823d9b19d21c all comments addressed Chengsong parents: 528 diff changeset	318	that looks somewhat similar to our Scala code is given below:
823d9b19d21c all comments addressed Chengsong parents: 528 diff changeset	319	%\comment{Use $\ZERO$, $\ONE$ and so on.
823d9b19d21c all comments addressed Chengsong parents: 528 diff changeset	320	%Is it $ALTS$ or $ALTS$?}\\
823d9b19d21c all comments addressed Chengsong parents: 528 diff changeset	321
823d9b19d21c all comments addressed Chengsong parents: 528 diff changeset	322	\begin{center}
823d9b19d21c all comments addressed Chengsong parents: 528 diff changeset	323	\begin{tabular}{@{}lcl@{}}
823d9b19d21c all comments addressed Chengsong parents: 528 diff changeset	324
823d9b19d21c all comments addressed Chengsong parents: 528 diff changeset	325	$\textit{simp} \; (_{bs}a_1\cdot a_2)$ & $\dn$ & $ (\textit{simp} \; a_1, \textit{simp} \; a_2) \; \textit{match} $ \\
823d9b19d21c all comments addressed Chengsong parents: 528 diff changeset	326	&&$\quad\textit{case} \; (\ZERO, \_) \Rightarrow \ZERO$ \\
823d9b19d21c all comments addressed Chengsong parents: 528 diff changeset	327	&&$\quad\textit{case} \; (\_, \ZERO) \Rightarrow \ZERO$ \\
823d9b19d21c all comments addressed Chengsong parents: 528 diff changeset	328	&&$\quad\textit{case} \; (\ONE, a_2') \Rightarrow \textit{fuse} \; bs \; a_2'$ \\
823d9b19d21c all comments addressed Chengsong parents: 528 diff changeset	329	&&$\quad\textit{case} \; (a_1', \ONE) \Rightarrow \textit{fuse} \; bs \; a_1'$ \\
823d9b19d21c all comments addressed Chengsong parents: 528 diff changeset	330	&&$\quad\textit{case} \; (a_1', a_2') \Rightarrow _{bs}a_1' \cdot a_2'$ \\
823d9b19d21c all comments addressed Chengsong parents: 528 diff changeset	331
823d9b19d21c all comments addressed Chengsong parents: 528 diff changeset	332	$\textit{simp} \; (_{bs}\sum \textit{as})$ & $\dn$ & $\textit{distinct}( \textit{flatten} ( \textit{map} \; simp \; as)) \; \textit{match} $ \\
823d9b19d21c all comments addressed Chengsong parents: 528 diff changeset	333	&&$\quad\textit{case} \; [] \Rightarrow \ZERO$ \\
823d9b19d21c all comments addressed Chengsong parents: 528 diff changeset	334	&&$\quad\textit{case} \; a :: [] \Rightarrow \textit{fuse bs a}$ \\
823d9b19d21c all comments addressed Chengsong parents: 528 diff changeset	335	&&$\quad\textit{case} \; as' \Rightarrow _{bs}\sum \textit{as'}$\\
527 2c907b118f78 all chapters put in Chengsong parents: diff changeset	336
530 823d9b19d21c all comments addressed Chengsong parents: 528 diff changeset	337	$\textit{simp} \; a$ & $\dn$ & $\textit{a} \qquad \textit{otherwise}$
823d9b19d21c all comments addressed Chengsong parents: 528 diff changeset	338	\end{tabular}
823d9b19d21c all comments addressed Chengsong parents: 528 diff changeset	339	\end{center}
823d9b19d21c all comments addressed Chengsong parents: 528 diff changeset	340
823d9b19d21c all comments addressed Chengsong parents: 528 diff changeset	341	\noindent
823d9b19d21c all comments addressed Chengsong parents: 528 diff changeset	342	The simplification does a pattern matching on the regular expression.
823d9b19d21c all comments addressed Chengsong parents: 528 diff changeset	343	When it detected that the regular expression is an alternative or
823d9b19d21c all comments addressed Chengsong parents: 528 diff changeset	344	sequence, it will try to simplify its child regular expressions
823d9b19d21c all comments addressed Chengsong parents: 528 diff changeset	345	recursively and then see if one of the children turns into $\ZERO$ or
823d9b19d21c all comments addressed Chengsong parents: 528 diff changeset	346	$\ONE$, which might trigger further simplification at the current level.
823d9b19d21c all comments addressed Chengsong parents: 528 diff changeset	347	The most involved part is the $\sum$ clause, where we use two
823d9b19d21c all comments addressed Chengsong parents: 528 diff changeset	348	auxiliary functions $\textit{flatten}$ and $\textit{distinct}$ to open up nested
823d9b19d21c all comments addressed Chengsong parents: 528 diff changeset	349	alternatives and reduce as many duplicates as possible. Function
823d9b19d21c all comments addressed Chengsong parents: 528 diff changeset	350	$\textit{distinct}$ keeps the first occurring copy only and removes all later ones
823d9b19d21c all comments addressed Chengsong parents: 528 diff changeset	351	when detected duplicates. Function $\textit{flatten}$ opens up nested $\sum$s.
823d9b19d21c all comments addressed Chengsong parents: 528 diff changeset	352	Its recursive definition is given below:
823d9b19d21c all comments addressed Chengsong parents: 528 diff changeset	353
823d9b19d21c all comments addressed Chengsong parents: 528 diff changeset	354	\begin{center}
823d9b19d21c all comments addressed Chengsong parents: 528 diff changeset	355	\begin{tabular}{@{}lcl@{}}
823d9b19d21c all comments addressed Chengsong parents: 528 diff changeset	356	$\textit{flatten} \; (_{bs}\sum \textit{as}) :: \textit{as'}$ & $\dn$ & $(\textit{map} \;
823d9b19d21c all comments addressed Chengsong parents: 528 diff changeset	357	(\textit{fuse}\;bs)\; \textit{as}) \; @ \; \textit{flatten} \; as' $ \\
823d9b19d21c all comments addressed Chengsong parents: 528 diff changeset	358	$\textit{flatten} \; \ZERO :: as'$ & $\dn$ & $ \textit{flatten} \; \textit{as'} $ \\
823d9b19d21c all comments addressed Chengsong parents: 528 diff changeset	359	$\textit{flatten} \; a :: as'$ & $\dn$ & $a :: \textit{flatten} \; \textit{as'}$ \quad(otherwise)
823d9b19d21c all comments addressed Chengsong parents: 528 diff changeset	360	\end{tabular}
823d9b19d21c all comments addressed Chengsong parents: 528 diff changeset	361	\end{center}
823d9b19d21c all comments addressed Chengsong parents: 528 diff changeset	362
823d9b19d21c all comments addressed Chengsong parents: 528 diff changeset	363	\noindent
823d9b19d21c all comments addressed Chengsong parents: 528 diff changeset	364	Here $\textit{flatten}$ behaves like the traditional functional programming flatten
823d9b19d21c all comments addressed Chengsong parents: 528 diff changeset	365	function, except that it also removes $\ZERO$s. Or in terms of regular expressions, it
823d9b19d21c all comments addressed Chengsong parents: 528 diff changeset	366	removes parentheses, for example changing $a+(b+c)$ into $a+b+c$.
823d9b19d21c all comments addressed Chengsong parents: 528 diff changeset	367
823d9b19d21c all comments addressed Chengsong parents: 528 diff changeset	368	Having defined the $\simp$ function,
823d9b19d21c all comments addressed Chengsong parents: 528 diff changeset	369	we can use the previous notation of natural
823d9b19d21c all comments addressed Chengsong parents: 528 diff changeset	370	extension from derivative w.r.t.~character to derivative
823d9b19d21c all comments addressed Chengsong parents: 528 diff changeset	371	w.r.t.~string:%\comment{simp in the [] case?}
823d9b19d21c all comments addressed Chengsong parents: 528 diff changeset	372
527 2c907b118f78 all chapters put in Chengsong parents: diff changeset	373	\begin{center}
530 823d9b19d21c all comments addressed Chengsong parents: 528 diff changeset	374	\begin{tabular}{lcl}
823d9b19d21c all comments addressed Chengsong parents: 528 diff changeset	375	$r \backslash_{simp} (c\!::\!s) $ & $\dn$ & $(r \backslash_{simp}\, c) \backslash_{simp}\, s$ \\
823d9b19d21c all comments addressed Chengsong parents: 528 diff changeset	376	$r \backslash_{simp} [\,] $ & $\dn$ & $r$
823d9b19d21c all comments addressed Chengsong parents: 528 diff changeset	377	\end{tabular}
527 2c907b118f78 all chapters put in Chengsong parents: diff changeset	378	\end{center}
2c907b118f78 all chapters put in Chengsong parents: diff changeset	379
530 823d9b19d21c all comments addressed Chengsong parents: 528 diff changeset	380	\noindent
823d9b19d21c all comments addressed Chengsong parents: 528 diff changeset	381	to obtain an optimised version of the algorithm:
823d9b19d21c all comments addressed Chengsong parents: 528 diff changeset	382
823d9b19d21c all comments addressed Chengsong parents: 528 diff changeset	383	\begin{center}
823d9b19d21c all comments addressed Chengsong parents: 528 diff changeset	384	\begin{tabular}{lcl}
823d9b19d21c all comments addressed Chengsong parents: 528 diff changeset	385	$\textit{blexer\_simp}\;r\,s$ & $\dn$ &
823d9b19d21c all comments addressed Chengsong parents: 528 diff changeset	386	$\textit{let}\;a = (r^\uparrow)\backslash_{simp}\, s\;\textit{in}$\\
823d9b19d21c all comments addressed Chengsong parents: 528 diff changeset	387	& & $\;\;\textit{if}\; \textit{bnullable}(a)$\\
823d9b19d21c all comments addressed Chengsong parents: 528 diff changeset	388	& & $\;\;\textit{then}\;\textit{decode}\,(\textit{bmkeps}\,a)\,r$\\
823d9b19d21c all comments addressed Chengsong parents: 528 diff changeset	389	& & $\;\;\textit{else}\;\textit{None}$
823d9b19d21c all comments addressed Chengsong parents: 528 diff changeset	390	\end{tabular}
823d9b19d21c all comments addressed Chengsong parents: 528 diff changeset	391	\end{center}
823d9b19d21c all comments addressed Chengsong parents: 528 diff changeset	392
823d9b19d21c all comments addressed Chengsong parents: 528 diff changeset	393	\noindent
823d9b19d21c all comments addressed Chengsong parents: 528 diff changeset	394	This algorithm keeps the regular expression size small, for example,
823d9b19d21c all comments addressed Chengsong parents: 528 diff changeset	395	with this simplification our previous $(a + aa)^*$ example's 8000 nodes
823d9b19d21c all comments addressed Chengsong parents: 528 diff changeset	396	will be reduced to just 6 and stays constant, no matter how long the
823d9b19d21c all comments addressed Chengsong parents: 528 diff changeset	397	input string is.
823d9b19d21c all comments addressed Chengsong parents: 528 diff changeset	398
527 2c907b118f78 all chapters put in Chengsong parents: diff changeset	399
2c907b118f78 all chapters put in Chengsong parents: diff changeset	400
2c907b118f78 all chapters put in Chengsong parents: diff changeset	401
2c907b118f78 all chapters put in Chengsong parents: diff changeset	402
2c907b118f78 all chapters put in Chengsong parents: diff changeset	403
530 823d9b19d21c all comments addressed Chengsong parents: 528 diff changeset	404
823d9b19d21c all comments addressed Chengsong parents: 528 diff changeset	405
823d9b19d21c all comments addressed Chengsong parents: 528 diff changeset	406
823d9b19d21c all comments addressed Chengsong parents: 528 diff changeset	407
823d9b19d21c all comments addressed Chengsong parents: 528 diff changeset	408
823d9b19d21c all comments addressed Chengsong parents: 528 diff changeset	409	%-----------------------------------
823d9b19d21c all comments addressed Chengsong parents: 528 diff changeset	410	% SUBSECTION 1
823d9b19d21c all comments addressed Chengsong parents: 528 diff changeset	411	%-----------------------------------
823d9b19d21c all comments addressed Chengsong parents: 528 diff changeset	412	\section{Specifications of Some Helper Functions}
823d9b19d21c all comments addressed Chengsong parents: 528 diff changeset	413	Here we give some functions' definitions,
823d9b19d21c all comments addressed Chengsong parents: 528 diff changeset	414	which we will use later.
823d9b19d21c all comments addressed Chengsong parents: 528 diff changeset	415	\begin{center}
823d9b19d21c all comments addressed Chengsong parents: 528 diff changeset	416	\begin{tabular}{ccc}
823d9b19d21c all comments addressed Chengsong parents: 528 diff changeset	417	$\retrieve \; \ACHAR \, \textit{bs} \, c \; \Char(c) = \textit{bs}$
823d9b19d21c all comments addressed Chengsong parents: 528 diff changeset	418	\end{tabular}
823d9b19d21c all comments addressed Chengsong parents: 528 diff changeset	419	\end{center}
823d9b19d21c all comments addressed Chengsong parents: 528 diff changeset	420
823d9b19d21c all comments addressed Chengsong parents: 528 diff changeset	421
527 2c907b118f78 all chapters put in Chengsong parents: diff changeset	422	%----------------------------------------------------------------------------------------
528 28751de4b4ba revised according to comments Chengsong parents: 527 diff changeset	423	% SECTION correctness proof
527 2c907b118f78 all chapters put in Chengsong parents: diff changeset	424	%----------------------------------------------------------------------------------------
528 28751de4b4ba revised according to comments Chengsong parents: 527 diff changeset	425	\section{Correctness of Bit-coded Algorithm (Without Simplification)}
28751de4b4ba revised according to comments Chengsong parents: 527 diff changeset	426	We now give the proof the correctness of the algorithm with bit-codes.
28751de4b4ba revised according to comments Chengsong parents: 527 diff changeset	427
28751de4b4ba revised according to comments Chengsong parents: 527 diff changeset	428	Ausaf and Urban cleverly defined an auxiliary function called $\flex$,
28751de4b4ba revised according to comments Chengsong parents: 527 diff changeset	429	defined as
28751de4b4ba revised according to comments Chengsong parents: 527 diff changeset	430	\[
28751de4b4ba revised according to comments Chengsong parents: 527 diff changeset	431	\flex \; r \; f \; [] \; v \; = \; f\; v
28751de4b4ba revised according to comments Chengsong parents: 527 diff changeset	432	\flex \; r \; f \; c :: s \; v = \flex r \; \lambda v. \, f (\inj \; r\; c\; v)\; s \; v
28751de4b4ba revised according to comments Chengsong parents: 527 diff changeset	433	\]
28751de4b4ba revised according to comments Chengsong parents: 527 diff changeset	434	which accumulates the characters that needs to be injected back,
28751de4b4ba revised according to comments Chengsong parents: 527 diff changeset	435	and does the injection in a stack-like manner (last taken derivative first injected).
28751de4b4ba revised according to comments Chengsong parents: 527 diff changeset	436	$\flex$ is connected to the $\lexer$:
28751de4b4ba revised according to comments Chengsong parents: 527 diff changeset	437	\begin{lemma}
28751de4b4ba revised according to comments Chengsong parents: 527 diff changeset	438	$\flex \; r \; \textit{id}\; s \; \mkeps (r\backslash s) = \lexer \; r \; s$
28751de4b4ba revised according to comments Chengsong parents: 527 diff changeset	439	\end{lemma}
28751de4b4ba revised according to comments Chengsong parents: 527 diff changeset	440	$\flex$ provides us a bridge between $\lexer$ and $\blexer$.
28751de4b4ba revised according to comments Chengsong parents: 527 diff changeset	441	What is even better about $\flex$ is that it allows us to
28751de4b4ba revised according to comments Chengsong parents: 527 diff changeset	442	directly operate on the value $\mkeps (r\backslash v)$,
28751de4b4ba revised according to comments Chengsong parents: 527 diff changeset	443	which is pivotal in the definition of $\lexer $ and $\blexer$, but not visible as an argument.
28751de4b4ba revised according to comments Chengsong parents: 527 diff changeset	444	When the value created by $\mkeps$ becomes available, one can
28751de4b4ba revised according to comments Chengsong parents: 527 diff changeset	445	prove some stepwise properties of lexing nicely:
28751de4b4ba revised according to comments Chengsong parents: 527 diff changeset	446	\begin{lemma}\label{flexStepwise}
28751de4b4ba revised according to comments Chengsong parents: 527 diff changeset	447	$\textit{flex} \; r \; f \; s@[c] \; v= \flex \; r \; f\; s \; (\inj \; (r\backslash s) \; c \; v) $
28751de4b4ba revised according to comments Chengsong parents: 527 diff changeset	448	\end{lemma}
28751de4b4ba revised according to comments Chengsong parents: 527 diff changeset	449
28751de4b4ba revised according to comments Chengsong parents: 527 diff changeset	450	And for $\blexer$ we have a function with stepwise properties like $\flex$ as well,
28751de4b4ba revised according to comments Chengsong parents: 527 diff changeset	451	called $\retrieve$\ref{retrieveDef}.
28751de4b4ba revised according to comments Chengsong parents: 527 diff changeset	452	$\retrieve$ takes bit-codes from annotated regular expressions
28751de4b4ba revised according to comments Chengsong parents: 527 diff changeset	453	guided by a value.
28751de4b4ba revised according to comments Chengsong parents: 527 diff changeset	454	$\retrieve$ is connected to the $\blexer$ in the following way:
28751de4b4ba revised according to comments Chengsong parents: 527 diff changeset	455	\begin{lemma}\label{blexer_retrieve}
28751de4b4ba revised according to comments Chengsong parents: 527 diff changeset	456	$\blexer \; r \; s = \decode \; (\retrieve \; (\internalise \; r) \; (\mkeps \; (r \backslash s) )) \; r$
28751de4b4ba revised according to comments Chengsong parents: 527 diff changeset	457	\end{lemma}
28751de4b4ba revised according to comments Chengsong parents: 527 diff changeset	458	If you take derivative of an annotated regular expression,
28751de4b4ba revised according to comments Chengsong parents: 527 diff changeset	459	you can $\retrieve$ the same bit-codes as before the derivative took place,
28751de4b4ba revised according to comments Chengsong parents: 527 diff changeset	460	provided that you use the corresponding value:
28751de4b4ba revised according to comments Chengsong parents: 527 diff changeset	461
28751de4b4ba revised according to comments Chengsong parents: 527 diff changeset	462	\begin{lemma}\label{retrieveStepwise}
28751de4b4ba revised according to comments Chengsong parents: 527 diff changeset	463	$\retrieve \; (r \backslash c) \; v= \retrieve \; r \; (\inj \; r\; c\; v)$
28751de4b4ba revised according to comments Chengsong parents: 527 diff changeset	464	\end{lemma}
28751de4b4ba revised according to comments Chengsong parents: 527 diff changeset	465	The other good thing about $\retrieve$ is that it can be connected to $\flex$:
28751de4b4ba revised according to comments Chengsong parents: 527 diff changeset	466	%centralLemma1
28751de4b4ba revised according to comments Chengsong parents: 527 diff changeset	467	\begin{lemma}\label{flex_retrieve}
28751de4b4ba revised according to comments Chengsong parents: 527 diff changeset	468	$\flex \; r \; \textit{id}\; s\; v = \decode \; (\retrieve \; (r\backslash s )\; v) \; r$
28751de4b4ba revised according to comments Chengsong parents: 527 diff changeset	469	\end{lemma}
28751de4b4ba revised according to comments Chengsong parents: 527 diff changeset	470	\begin{proof}
28751de4b4ba revised according to comments Chengsong parents: 527 diff changeset	471	By induction on $s$. The induction tactic is reverse induction on strings.
28751de4b4ba revised according to comments Chengsong parents: 527 diff changeset	472	$v$ is allowed to be arbitrary.
28751de4b4ba revised according to comments Chengsong parents: 527 diff changeset	473	The crucial point is to rewrite
28751de4b4ba revised according to comments Chengsong parents: 527 diff changeset	474	\[
28751de4b4ba revised according to comments Chengsong parents: 527 diff changeset	475	\retrieve \; (r \backslash s@[c]) \; \mkeps (r \backslash s@[c])
28751de4b4ba revised according to comments Chengsong parents: 527 diff changeset	476	\]
28751de4b4ba revised according to comments Chengsong parents: 527 diff changeset	477	as
28751de4b4ba revised according to comments Chengsong parents: 527 diff changeset	478	\[
28751de4b4ba revised according to comments Chengsong parents: 527 diff changeset	479	\retrieve \; (r \backslash s) \; (\inj \; (r \backslash s) \; c\; \mkeps (r \backslash s@[c]))
28751de4b4ba revised according to comments Chengsong parents: 527 diff changeset	480	\].
28751de4b4ba revised according to comments Chengsong parents: 527 diff changeset	481	This enables us to equate
28751de4b4ba revised according to comments Chengsong parents: 527 diff changeset	482	\[
28751de4b4ba revised according to comments Chengsong parents: 527 diff changeset	483	\retrieve \; (r \backslash s@[c]) \; \mkeps (r \backslash s@[c])
28751de4b4ba revised according to comments Chengsong parents: 527 diff changeset	484	\]
28751de4b4ba revised according to comments Chengsong parents: 527 diff changeset	485	with
28751de4b4ba revised according to comments Chengsong parents: 527 diff changeset	486	\[
28751de4b4ba revised according to comments Chengsong parents: 527 diff changeset	487	\flex \; r \; \textit{id} \; s \; (\inj \; (r\backslash s) \; c\; (\mkeps (r\backslash s@[c])))
28751de4b4ba revised according to comments Chengsong parents: 527 diff changeset	488	\],
28751de4b4ba revised according to comments Chengsong parents: 527 diff changeset	489	which in turn can be rewritten as
28751de4b4ba revised according to comments Chengsong parents: 527 diff changeset	490	\[
28751de4b4ba revised according to comments Chengsong parents: 527 diff changeset	491	\flex \; r \; \textit{id} \; s@[c] \; (\mkeps (r\backslash s@[c]))
28751de4b4ba revised according to comments Chengsong parents: 527 diff changeset	492	\].
28751de4b4ba revised according to comments Chengsong parents: 527 diff changeset	493	\end{proof}
28751de4b4ba revised according to comments Chengsong parents: 527 diff changeset	494
28751de4b4ba revised according to comments Chengsong parents: 527 diff changeset	495	With the above lemma we can now link $\flex$ and $\blexer$.
28751de4b4ba revised according to comments Chengsong parents: 527 diff changeset	496
28751de4b4ba revised according to comments Chengsong parents: 527 diff changeset	497	\begin{lemma}\label{flex_blexer}
28751de4b4ba revised according to comments Chengsong parents: 527 diff changeset	498	$\textit{flex} \; r \; \textit{id} \; s \; \mkeps(r \backslash s) = \blexer \; r \; s$
28751de4b4ba revised according to comments Chengsong parents: 527 diff changeset	499	\end{lemma}
28751de4b4ba revised according to comments Chengsong parents: 527 diff changeset	500	\begin{proof}
28751de4b4ba revised according to comments Chengsong parents: 527 diff changeset	501	Using two of the above lemmas: \ref{flex_retrieve} and \ref{blexer_retrieve}.
28751de4b4ba revised according to comments Chengsong parents: 527 diff changeset	502	\end{proof}
28751de4b4ba revised according to comments Chengsong parents: 527 diff changeset	503	Finally
28751de4b4ba revised according to comments Chengsong parents: 527 diff changeset	504
28751de4b4ba revised according to comments Chengsong parents: 527 diff changeset	505
530 823d9b19d21c all comments addressed Chengsong parents: 528 diff changeset	506

author	Chengsong
	Mon, 30 May 2022 20:36:15 +0100
changeset 530	823d9b19d21c
parent 528	28751de4b4ba
permissions	-rwxr-xr-x