lexing: ChengsongTanPhdThesis/Chapters/Bitcoded1.tex@660cf698eb26 (annotated)

532 cc54ce075db5 restructured Chengsong parents: diff changeset	1	% Chapter Template
cc54ce075db5 restructured Chengsong parents: diff changeset	2
cc54ce075db5 restructured Chengsong parents: diff changeset	3	% Main chapter title
538 8016a2480704 intro and chap2 Chengsong parents: 537 diff changeset	4	\chapter{Bit-coded Algorithm of Sulzmann and Lu}
532 cc54ce075db5 restructured Chengsong parents: diff changeset	5
cc54ce075db5 restructured Chengsong parents: diff changeset	6	\label{Bitcoded1} % Change X to a consecutive number; for referencing this chapter elsewhere, use \ref{ChapterX}
cc54ce075db5 restructured Chengsong parents: diff changeset	7	%Then we illustrate how the algorithm without bitcodes falls short for such aggressive
cc54ce075db5 restructured Chengsong parents: diff changeset	8	%simplifications and therefore introduce our version of the bitcoded algorithm and
cc54ce075db5 restructured Chengsong parents: diff changeset	9	%its correctness proof in
cc54ce075db5 restructured Chengsong parents: diff changeset	10	%Chapter 3\ref{Chapter3}.
564 3cbcd7cda0a9 more Chengsong parents: 543 diff changeset	11	In this chapter, we are going to describe the bit-coded algorithm
653 bc5571c38d1f more updates in section 4.2 and incorporating Christian comments Chengsong parents: 649 diff changeset	12	introduced by Sulzmann and Lu \parencite{Sulzmann2014} and their correctness proof.
667 660cf698eb26 added example of how inj and lexer works Chengsong parents: 657 diff changeset	13	Just like in chapter \ref{Inj}, the algorithms and proofs have been included
660cf698eb26 added example of how inj and lexer works Chengsong parents: 657 diff changeset	14	for self-containedness reasons,
660cf698eb26 added example of how inj and lexer works Chengsong parents: 657 diff changeset	15	even though they have been originally found and described by
660cf698eb26 added example of how inj and lexer works Chengsong parents: 657 diff changeset	16	Sulzmann and Lu (\cite{Sulzmann2014}) and
660cf698eb26 added example of how inj and lexer works Chengsong parents: 657 diff changeset	17	Ausaf et al. (\cite{AusafDyckhoffUrban2016} and \cite{Ausaf}).
660cf698eb26 added example of how inj and lexer works Chengsong parents: 657 diff changeset	18	%In addition to this, the
660cf698eb26 added example of how inj and lexer works Chengsong parents: 657 diff changeset	19	The details of the proofs in this thesis
660cf698eb26 added example of how inj and lexer works Chengsong parents: 657 diff changeset	20	also follow more closely the actual Isabelle formalisation.
660cf698eb26 added example of how inj and lexer works Chengsong parents: 657 diff changeset	21	For example, lemma \ref{flexStepwise} and \ref{retrieveStepwise}
660cf698eb26 added example of how inj and lexer works Chengsong parents: 657 diff changeset	22	are not included in the publications by Ausaf et al., despite them being
660cf698eb26 added example of how inj and lexer works Chengsong parents: 657 diff changeset	23	some of the key lemmas leading to the correctness result.
660cf698eb26 added example of how inj and lexer works Chengsong parents: 657 diff changeset	24
660cf698eb26 added example of how inj and lexer works Chengsong parents: 657 diff changeset	25	We will first motivate the bit-coded algorithm in section \ref{bitMotivate},
660cf698eb26 added example of how inj and lexer works Chengsong parents: 657 diff changeset	26	and then introduce their formal definitions in section \ref{bitDef},
660cf698eb26 added example of how inj and lexer works Chengsong parents: 657 diff changeset	27	followed by a description of the correctness proof of $\blexer$ in section \ref{blexerProof}.
660cf698eb26 added example of how inj and lexer works Chengsong parents: 657 diff changeset	28
653 bc5571c38d1f more updates in section 4.2 and incorporating Christian comments Chengsong parents: 649 diff changeset	29	%to address the growth problem of
bc5571c38d1f more updates in section 4.2 and incorporating Christian comments Chengsong parents: 649 diff changeset	30	%derivatives of
bc5571c38d1f more updates in section 4.2 and incorporating Christian comments Chengsong parents: 649 diff changeset	31	%regular expressions.
667 660cf698eb26 added example of how inj and lexer works Chengsong parents: 657 diff changeset	32	%We have implemented their algorithm in Scala and Isabelle,
660cf698eb26 added example of how inj and lexer works Chengsong parents: 657 diff changeset	33	%and found problems
660cf698eb26 added example of how inj and lexer works Chengsong parents: 657 diff changeset	34	%in their algorithm, such as de-duplication not working properly and redundant
660cf698eb26 added example of how inj and lexer works Chengsong parents: 657 diff changeset	35	%fixpoint construction.
660cf698eb26 added example of how inj and lexer works Chengsong parents: 657 diff changeset	36	\section{The Motivation Behind Using Bitcodes}\label{bitMotivate}
624 8ffa28fce271 all comments incorporated!!+related work Chengsong parents: 622 diff changeset	37	Let us give again the definition of $\lexer$ from Chapter \ref{Inj}:
579 35df9cdd36ca more chap3 Chengsong parents: 576 diff changeset	38	\begin{center}
35df9cdd36ca more chap3 Chengsong parents: 576 diff changeset	39	\begin{tabular}{lcl}
35df9cdd36ca more chap3 Chengsong parents: 576 diff changeset	40	$\lexer \; r \; [] $ & $=$ & $\textit{if} \; (\nullable \; r)\; \textit{then}\; \Some(\mkeps \; r) \; \textit{else} \; \None$\\
35df9cdd36ca more chap3 Chengsong parents: 576 diff changeset	41	$\lexer \; r \;c::s$ & $=$ & $\textit{case}\; (\lexer \; (r\backslash c) \; s) \;\textit{of}\; $\\
35df9cdd36ca more chap3 Chengsong parents: 576 diff changeset	42	& & $\quad \phantom{\mid}\; \None \implies \None$\\
35df9cdd36ca more chap3 Chengsong parents: 576 diff changeset	43	& & $\quad \mid \Some(v) \implies \Some(\inj \; r\; c\; v)$
35df9cdd36ca more chap3 Chengsong parents: 576 diff changeset	44	\end{tabular}
35df9cdd36ca more chap3 Chengsong parents: 576 diff changeset	45	\end{center}
35df9cdd36ca more chap3 Chengsong parents: 576 diff changeset	46	\noindent
624 8ffa28fce271 all comments incorporated!!+related work Chengsong parents: 622 diff changeset	47	The first problem with this algorithm is that
8ffa28fce271 all comments incorporated!!+related work Chengsong parents: 622 diff changeset	48	for the function $\inj$ to work properly
8ffa28fce271 all comments incorporated!!+related work Chengsong parents: 622 diff changeset	49	we cannot destroy the structure of a regular expression,
638 dd9dde2d902b comments till chap4 Chengsong parents: 624 diff changeset	50	and therefore cannot simplify too aggressively.
624 8ffa28fce271 all comments incorporated!!+related work Chengsong parents: 622 diff changeset	51	For example,
8ffa28fce271 all comments incorporated!!+related work Chengsong parents: 622 diff changeset	52	\[
8ffa28fce271 all comments incorporated!!+related work Chengsong parents: 622 diff changeset	53	r + (r + a) \rightarrow r + a
8ffa28fce271 all comments incorporated!!+related work Chengsong parents: 622 diff changeset	54	\]
638 dd9dde2d902b comments till chap4 Chengsong parents: 624 diff changeset	55	cannot be applied because that would require
624 8ffa28fce271 all comments incorporated!!+related work Chengsong parents: 622 diff changeset	56	breaking up the inner alternative.
638 dd9dde2d902b comments till chap4 Chengsong parents: 624 diff changeset	57	The $\lexer$ plus $\textit{simp}$ therefore only enables
624 8ffa28fce271 all comments incorporated!!+related work Chengsong parents: 622 diff changeset	58	same-level de-duplications like
8ffa28fce271 all comments incorporated!!+related work Chengsong parents: 622 diff changeset	59	\[
8ffa28fce271 all comments incorporated!!+related work Chengsong parents: 622 diff changeset	60	r + r \rightarrow r.
8ffa28fce271 all comments incorporated!!+related work Chengsong parents: 622 diff changeset	61	\]
8ffa28fce271 all comments incorporated!!+related work Chengsong parents: 622 diff changeset	62	Secondly, the algorithm recursively calls $\lexer$ on
579 35df9cdd36ca more chap3 Chengsong parents: 576 diff changeset	63	each new character input,
35df9cdd36ca more chap3 Chengsong parents: 576 diff changeset	64	and before starting a child call
35df9cdd36ca more chap3 Chengsong parents: 576 diff changeset	65	it stores information of previous lexing steps
35df9cdd36ca more chap3 Chengsong parents: 576 diff changeset	66	on a stack, in the form of regular expressions
35df9cdd36ca more chap3 Chengsong parents: 576 diff changeset	67	and characters: $r_0$, $c_0$, $r_1$, $c_1$, etc.
35df9cdd36ca more chap3 Chengsong parents: 576 diff changeset	68	Each descent into deeper recursive calls in $\lexer$
35df9cdd36ca more chap3 Chengsong parents: 576 diff changeset	69	causes a new pair of $r_i, c_i$ to be pushed to the call stack.
580 e0f0a81f907b halfway chap3 Chengsong parents: 579 diff changeset	70	\begin{figure}[H]
579 35df9cdd36ca more chap3 Chengsong parents: 576 diff changeset	71	\begin{tikzpicture}[->,>=stealth',shorten >=1pt,auto,thick]
35df9cdd36ca more chap3 Chengsong parents: 576 diff changeset	72	%\draw (-6,-6) grid (6,6);
35df9cdd36ca more chap3 Chengsong parents: 576 diff changeset	73	\node [ circle ] (r) at (-6, 5) {$r$};
35df9cdd36ca more chap3 Chengsong parents: 576 diff changeset	74
35df9cdd36ca more chap3 Chengsong parents: 576 diff changeset	75	%\node (-4, 6) (c1) circle [radius = 0.3] {$c_1$};
35df9cdd36ca more chap3 Chengsong parents: 576 diff changeset	76	\node [circle, minimum size = 0.1, draw] (c1) at (-4, 5.4) {$c_1$};
35df9cdd36ca more chap3 Chengsong parents: 576 diff changeset	77	%
35df9cdd36ca more chap3 Chengsong parents: 576 diff changeset	78	%\node (-2, 5) (r1) circle [radius = 0.5] {$r_1$};
35df9cdd36ca more chap3 Chengsong parents: 576 diff changeset	79	\node [minimum size = 0.5, circle, draw] (r1) at (-2, 5) {$r_1$};
35df9cdd36ca more chap3 Chengsong parents: 576 diff changeset	80
35df9cdd36ca more chap3 Chengsong parents: 576 diff changeset	81	\node [minimum width = 2cm, rectangle, draw] (stack) at (0, 3) {Stack};
35df9cdd36ca more chap3 Chengsong parents: 576 diff changeset	82
35df9cdd36ca more chap3 Chengsong parents: 576 diff changeset	83	\path
35df9cdd36ca more chap3 Chengsong parents: 576 diff changeset	84	(r)
35df9cdd36ca more chap3 Chengsong parents: 576 diff changeset	85	edge [->, >=stealth',shorten >=1pt] node[left] {} (r1);
35df9cdd36ca more chap3 Chengsong parents: 576 diff changeset	86
35df9cdd36ca more chap3 Chengsong parents: 576 diff changeset	87	\path (r1)
35df9cdd36ca more chap3 Chengsong parents: 576 diff changeset	88	edge [bend right, dashed] node {saved} (stack);
35df9cdd36ca more chap3 Chengsong parents: 576 diff changeset	89	\path (c1)
35df9cdd36ca more chap3 Chengsong parents: 576 diff changeset	90	edge [bend right, dashed] node {} (stack);
35df9cdd36ca more chap3 Chengsong parents: 576 diff changeset	91
35df9cdd36ca more chap3 Chengsong parents: 576 diff changeset	92
35df9cdd36ca more chap3 Chengsong parents: 576 diff changeset	93	\end{tikzpicture}
638 dd9dde2d902b comments till chap4 Chengsong parents: 624 diff changeset	94	\caption{First derivative taken}
579 35df9cdd36ca more chap3 Chengsong parents: 576 diff changeset	95	\end{figure}
35df9cdd36ca more chap3 Chengsong parents: 576 diff changeset	96
35df9cdd36ca more chap3 Chengsong parents: 576 diff changeset	97
35df9cdd36ca more chap3 Chengsong parents: 576 diff changeset	98
580 e0f0a81f907b halfway chap3 Chengsong parents: 579 diff changeset	99	\begin{figure}[H]
579 35df9cdd36ca more chap3 Chengsong parents: 576 diff changeset	100	\begin{tikzpicture}[->,>=stealth',shorten >=1pt,auto,thick]
35df9cdd36ca more chap3 Chengsong parents: 576 diff changeset	101	%\draw (-6,-6) grid (6,6);
35df9cdd36ca more chap3 Chengsong parents: 576 diff changeset	102	\node [ circle ] (r) at (-6, 5) {$r$};
35df9cdd36ca more chap3 Chengsong parents: 576 diff changeset	103
35df9cdd36ca more chap3 Chengsong parents: 576 diff changeset	104	%\node (-4, 6) (c1) circle [radius = 0.3] {$c_1$};
35df9cdd36ca more chap3 Chengsong parents: 576 diff changeset	105	\node [circle, minimum size = 0.1, ] (c1) at (-4, 5.4) {$c_1$};
35df9cdd36ca more chap3 Chengsong parents: 576 diff changeset	106	%
35df9cdd36ca more chap3 Chengsong parents: 576 diff changeset	107	%\node (-2, 5) (r1) circle [radius = 0.5] {$r_1$};
35df9cdd36ca more chap3 Chengsong parents: 576 diff changeset	108	\node [minimum size = 0.5, circle, ] (r1) at (-2, 5) {$r_1$};
35df9cdd36ca more chap3 Chengsong parents: 576 diff changeset	109
35df9cdd36ca more chap3 Chengsong parents: 576 diff changeset	110
35df9cdd36ca more chap3 Chengsong parents: 576 diff changeset	111	\node [circle, minimum size = 0.1, draw] (c2) at (0, 5.4) {$c_2$};
35df9cdd36ca more chap3 Chengsong parents: 576 diff changeset	112	%
35df9cdd36ca more chap3 Chengsong parents: 576 diff changeset	113	%\node (2, 5) (r2) circle [radius = 0.5] {$r_2$};
35df9cdd36ca more chap3 Chengsong parents: 576 diff changeset	114	\node [circle, draw] (r2) at (2, 5) {$r_2$};
35df9cdd36ca more chap3 Chengsong parents: 576 diff changeset	115	\node [minimum width = 3cm, minimum height = 1cm, rectangle, draw] (stack) at (0, 2) {\large Stack};
35df9cdd36ca more chap3 Chengsong parents: 576 diff changeset	116
35df9cdd36ca more chap3 Chengsong parents: 576 diff changeset	117	\path
35df9cdd36ca more chap3 Chengsong parents: 576 diff changeset	118	(r)
35df9cdd36ca more chap3 Chengsong parents: 576 diff changeset	119	edge [->, >=stealth',shorten >=1pt] node[left] {} (r1);
35df9cdd36ca more chap3 Chengsong parents: 576 diff changeset	120
35df9cdd36ca more chap3 Chengsong parents: 576 diff changeset	121	\path (r2)
35df9cdd36ca more chap3 Chengsong parents: 576 diff changeset	122	edge [bend right, dashed] node {} (stack);
35df9cdd36ca more chap3 Chengsong parents: 576 diff changeset	123	\path (c2)
35df9cdd36ca more chap3 Chengsong parents: 576 diff changeset	124	edge [bend right, dashed] node {} (stack);
35df9cdd36ca more chap3 Chengsong parents: 576 diff changeset	125
35df9cdd36ca more chap3 Chengsong parents: 576 diff changeset	126	\path (r1)
35df9cdd36ca more chap3 Chengsong parents: 576 diff changeset	127	edge [] node {} (r2);
35df9cdd36ca more chap3 Chengsong parents: 576 diff changeset	128
35df9cdd36ca more chap3 Chengsong parents: 576 diff changeset	129	\end{tikzpicture}
638 dd9dde2d902b comments till chap4 Chengsong parents: 624 diff changeset	130	\caption{Second derivative taken}
579 35df9cdd36ca more chap3 Chengsong parents: 576 diff changeset	131	\end{figure}
35df9cdd36ca more chap3 Chengsong parents: 576 diff changeset	132	\noindent
640 bd1354127574 more proofreading done, last version before submission Chengsong parents: 639 diff changeset	133	As the number of derivative steps increases,
579 35df9cdd36ca more chap3 Chengsong parents: 576 diff changeset	134	the stack would increase:
35df9cdd36ca more chap3 Chengsong parents: 576 diff changeset	135
580 e0f0a81f907b halfway chap3 Chengsong parents: 579 diff changeset	136	\begin{figure}[H]
579 35df9cdd36ca more chap3 Chengsong parents: 576 diff changeset	137	\begin{tikzpicture}[->,>=stealth',shorten >=1pt,auto,thick]
35df9cdd36ca more chap3 Chengsong parents: 576 diff changeset	138	%\draw (-6,-6) grid (6,6);
35df9cdd36ca more chap3 Chengsong parents: 576 diff changeset	139	\node [ circle ] (r) at (-6, 5) {$r$};
35df9cdd36ca more chap3 Chengsong parents: 576 diff changeset	140
35df9cdd36ca more chap3 Chengsong parents: 576 diff changeset	141	%\node (-4, 6) (c1) circle [radius = 0.3] {$c_1$};
35df9cdd36ca more chap3 Chengsong parents: 576 diff changeset	142	\node [circle, minimum size = 0.1, ] (c1) at (-4, 5.4) {$c_1$};
35df9cdd36ca more chap3 Chengsong parents: 576 diff changeset	143	%
35df9cdd36ca more chap3 Chengsong parents: 576 diff changeset	144	%\node (-2, 5) (r1) circle [radius = 0.5] {$r_1$};
35df9cdd36ca more chap3 Chengsong parents: 576 diff changeset	145	\node [minimum size = 0.5, circle, ] (r1) at (-2, 5) {$r_1$};
35df9cdd36ca more chap3 Chengsong parents: 576 diff changeset	146
35df9cdd36ca more chap3 Chengsong parents: 576 diff changeset	147
35df9cdd36ca more chap3 Chengsong parents: 576 diff changeset	148	\node [circle, minimum size = 0.1, ] (c2) at (0, 5.4) {$c_2$};
35df9cdd36ca more chap3 Chengsong parents: 576 diff changeset	149	%
35df9cdd36ca more chap3 Chengsong parents: 576 diff changeset	150	%\node (2, 5) (r2) circle [radius = 0.5] {$r_2$};
35df9cdd36ca more chap3 Chengsong parents: 576 diff changeset	151	\node [circle, ] (r2) at (2, 5) {$r_2$};
35df9cdd36ca more chap3 Chengsong parents: 576 diff changeset	152	\node [minimum width = 4cm, minimum height = 2.5cm, rectangle, draw] (stack) at (0, 1) { \large Stack};
35df9cdd36ca more chap3 Chengsong parents: 576 diff changeset	153
35df9cdd36ca more chap3 Chengsong parents: 576 diff changeset	154	\node [] (ldots) at (3.5, 5) {};
35df9cdd36ca more chap3 Chengsong parents: 576 diff changeset	155	%\node (6, 5) (rn) circle [radius = 0.5] {$r_n$};
35df9cdd36ca more chap3 Chengsong parents: 576 diff changeset	156
35df9cdd36ca more chap3 Chengsong parents: 576 diff changeset	157	\node [minimum size = 0.5, circle, ] (rn) at (6, 5) {};
35df9cdd36ca more chap3 Chengsong parents: 576 diff changeset	158
35df9cdd36ca more chap3 Chengsong parents: 576 diff changeset	159	\node (rldots) at ($(ldots)!.4!(rn)$) {\ldots};
35df9cdd36ca more chap3 Chengsong parents: 576 diff changeset	160
35df9cdd36ca more chap3 Chengsong parents: 576 diff changeset	161	\path
35df9cdd36ca more chap3 Chengsong parents: 576 diff changeset	162	(r)
35df9cdd36ca more chap3 Chengsong parents: 576 diff changeset	163	edge [->, >=stealth',shorten >=1pt] node[left] {} (r1);
35df9cdd36ca more chap3 Chengsong parents: 576 diff changeset	164
35df9cdd36ca more chap3 Chengsong parents: 576 diff changeset	165	\path (rldots)
35df9cdd36ca more chap3 Chengsong parents: 576 diff changeset	166	edge [bend left, dashed] node {} (stack);
35df9cdd36ca more chap3 Chengsong parents: 576 diff changeset	167
35df9cdd36ca more chap3 Chengsong parents: 576 diff changeset	168	\path (r1)
35df9cdd36ca more chap3 Chengsong parents: 576 diff changeset	169	edge [] node {} (r2);
35df9cdd36ca more chap3 Chengsong parents: 576 diff changeset	170
35df9cdd36ca more chap3 Chengsong parents: 576 diff changeset	171	\path (r2)
35df9cdd36ca more chap3 Chengsong parents: 576 diff changeset	172	edge [] node {} (ldots);
35df9cdd36ca more chap3 Chengsong parents: 576 diff changeset	173	\path (ldots)
35df9cdd36ca more chap3 Chengsong parents: 576 diff changeset	174	edge [bend left, dashed] node {} (stack);
35df9cdd36ca more chap3 Chengsong parents: 576 diff changeset	175	\path (5.03, 4.9)
35df9cdd36ca more chap3 Chengsong parents: 576 diff changeset	176	edge [bend left, dashed] node {} (stack);
35df9cdd36ca more chap3 Chengsong parents: 576 diff changeset	177	\end{tikzpicture}
638 dd9dde2d902b comments till chap4 Chengsong parents: 624 diff changeset	178	\caption{More derivatives taken}
579 35df9cdd36ca more chap3 Chengsong parents: 576 diff changeset	179	\end{figure}
580 e0f0a81f907b halfway chap3 Chengsong parents: 579 diff changeset	180
579 35df9cdd36ca more chap3 Chengsong parents: 576 diff changeset	181
580 e0f0a81f907b halfway chap3 Chengsong parents: 579 diff changeset	182	\begin{figure}[H]
579 35df9cdd36ca more chap3 Chengsong parents: 576 diff changeset	183	\begin{tikzpicture}[->,>=stealth',shorten >=1pt,auto,thick]
35df9cdd36ca more chap3 Chengsong parents: 576 diff changeset	184	%\draw (-6,-6) grid (6,6);
35df9cdd36ca more chap3 Chengsong parents: 576 diff changeset	185	\node [radius = 0.5, circle, draw] (r) at (-6, 5) {$r$};
35df9cdd36ca more chap3 Chengsong parents: 576 diff changeset	186
35df9cdd36ca more chap3 Chengsong parents: 576 diff changeset	187	%\node (-4, 6) (c1) circle [radius = 0.3] {$c_1$};
35df9cdd36ca more chap3 Chengsong parents: 576 diff changeset	188	\node [circle, minimum size = 0.1, draw] (c1) at (-4, 5.4) {$c_1$};
35df9cdd36ca more chap3 Chengsong parents: 576 diff changeset	189	%
35df9cdd36ca more chap3 Chengsong parents: 576 diff changeset	190	%\node (-2, 5) (r1) circle [radius = 0.5] {$r_1$};
35df9cdd36ca more chap3 Chengsong parents: 576 diff changeset	191	\node [minimum size = 0.5, circle, draw] (r1) at (-2, 5) {$r_1$};
35df9cdd36ca more chap3 Chengsong parents: 576 diff changeset	192	%
35df9cdd36ca more chap3 Chengsong parents: 576 diff changeset	193	%\node (0, 6) (c2) circle [radius = 0.3] {$c_2$};
35df9cdd36ca more chap3 Chengsong parents: 576 diff changeset	194	\node [circle, minimum size = 0.1, draw] (c2) at (0, 5.4) {$c_2$};
35df9cdd36ca more chap3 Chengsong parents: 576 diff changeset	195	%
35df9cdd36ca more chap3 Chengsong parents: 576 diff changeset	196	%\node (2, 5) (r2) circle [radius = 0.5] {$r_2$};
35df9cdd36ca more chap3 Chengsong parents: 576 diff changeset	197	\node [circle, draw] (r2) at (2, 5) {$r_2$};
35df9cdd36ca more chap3 Chengsong parents: 576 diff changeset	198	%
35df9cdd36ca more chap3 Chengsong parents: 576 diff changeset	199	%
35df9cdd36ca more chap3 Chengsong parents: 576 diff changeset	200	\node [] (ldots) at (4.5, 5) {};
35df9cdd36ca more chap3 Chengsong parents: 576 diff changeset	201	%\node (6, 5) (rn) circle [radius = 0.5] {$r_n$};
35df9cdd36ca more chap3 Chengsong parents: 576 diff changeset	202
35df9cdd36ca more chap3 Chengsong parents: 576 diff changeset	203	\node [minimum size = 0.5, circle, draw] (rn) at (6, 5) {$r_n$};
35df9cdd36ca more chap3 Chengsong parents: 576 diff changeset	204
35df9cdd36ca more chap3 Chengsong parents: 576 diff changeset	205	\node at ($(ldots)!.4!(rn)$) {\ldots};
35df9cdd36ca more chap3 Chengsong parents: 576 diff changeset	206
35df9cdd36ca more chap3 Chengsong parents: 576 diff changeset	207
35df9cdd36ca more chap3 Chengsong parents: 576 diff changeset	208
35df9cdd36ca more chap3 Chengsong parents: 576 diff changeset	209
35df9cdd36ca more chap3 Chengsong parents: 576 diff changeset	210	\node [minimum size = 6cm, rectangle, draw] (stack) at (0, 0) {\Huge Stack};
35df9cdd36ca more chap3 Chengsong parents: 576 diff changeset	211
35df9cdd36ca more chap3 Chengsong parents: 576 diff changeset	212	\path
35df9cdd36ca more chap3 Chengsong parents: 576 diff changeset	213	(r)
35df9cdd36ca more chap3 Chengsong parents: 576 diff changeset	214	edge [->, >=stealth',shorten >=1pt] node[left] {} (r1);
35df9cdd36ca more chap3 Chengsong parents: 576 diff changeset	215	\path
35df9cdd36ca more chap3 Chengsong parents: 576 diff changeset	216	(r1)
35df9cdd36ca more chap3 Chengsong parents: 576 diff changeset	217	edge [] node {} (r2);
35df9cdd36ca more chap3 Chengsong parents: 576 diff changeset	218	\path (r2)
35df9cdd36ca more chap3 Chengsong parents: 576 diff changeset	219	edge [] node {} (ldots);
35df9cdd36ca more chap3 Chengsong parents: 576 diff changeset	220	\path (r)
35df9cdd36ca more chap3 Chengsong parents: 576 diff changeset	221	edge [dashed, bend right] node {} (stack);
35df9cdd36ca more chap3 Chengsong parents: 576 diff changeset	222
35df9cdd36ca more chap3 Chengsong parents: 576 diff changeset	223	\path (r1)
35df9cdd36ca more chap3 Chengsong parents: 576 diff changeset	224	edge [dashed, ] node {} (stack);
35df9cdd36ca more chap3 Chengsong parents: 576 diff changeset	225
35df9cdd36ca more chap3 Chengsong parents: 576 diff changeset	226	\path (c1)
35df9cdd36ca more chap3 Chengsong parents: 576 diff changeset	227	edge [dashed, bend right] node {} (stack);
35df9cdd36ca more chap3 Chengsong parents: 576 diff changeset	228
35df9cdd36ca more chap3 Chengsong parents: 576 diff changeset	229	\path (c2)
35df9cdd36ca more chap3 Chengsong parents: 576 diff changeset	230	edge [dashed] node {} (stack);
35df9cdd36ca more chap3 Chengsong parents: 576 diff changeset	231	\path (4.5, 5)
35df9cdd36ca more chap3 Chengsong parents: 576 diff changeset	232	edge [dashed, bend left] node {} (stack);
35df9cdd36ca more chap3 Chengsong parents: 576 diff changeset	233	\path (4.9, 5)
35df9cdd36ca more chap3 Chengsong parents: 576 diff changeset	234	edge [dashed, bend left] node {} (stack);
35df9cdd36ca more chap3 Chengsong parents: 576 diff changeset	235	\path (5.3, 5)
35df9cdd36ca more chap3 Chengsong parents: 576 diff changeset	236	edge [dashed, bend left] node {} (stack);
35df9cdd36ca more chap3 Chengsong parents: 576 diff changeset	237	\path (r2)
35df9cdd36ca more chap3 Chengsong parents: 576 diff changeset	238	edge [dashed, ] node {} (stack);
35df9cdd36ca more chap3 Chengsong parents: 576 diff changeset	239	\path (rn)
35df9cdd36ca more chap3 Chengsong parents: 576 diff changeset	240	edge [dashed, bend left] node {} (stack);
35df9cdd36ca more chap3 Chengsong parents: 576 diff changeset	241	\end{tikzpicture}
638 dd9dde2d902b comments till chap4 Chengsong parents: 624 diff changeset	242	\caption{Before injection phase starts}
580 e0f0a81f907b halfway chap3 Chengsong parents: 579 diff changeset	243	\end{figure}
e0f0a81f907b halfway chap3 Chengsong parents: 579 diff changeset	244
e0f0a81f907b halfway chap3 Chengsong parents: 579 diff changeset	245
e0f0a81f907b halfway chap3 Chengsong parents: 579 diff changeset	246	\noindent
e0f0a81f907b halfway chap3 Chengsong parents: 579 diff changeset	247	After all derivatives have been taken, the stack grows to a maximum size
e0f0a81f907b halfway chap3 Chengsong parents: 579 diff changeset	248	and the pair of regular expressions and characters $r_i, c_{i+1}$
e0f0a81f907b halfway chap3 Chengsong parents: 579 diff changeset	249	are then popped out and used in the injection phase.
e0f0a81f907b halfway chap3 Chengsong parents: 579 diff changeset	250
e0f0a81f907b halfway chap3 Chengsong parents: 579 diff changeset	251
e0f0a81f907b halfway chap3 Chengsong parents: 579 diff changeset	252
e0f0a81f907b halfway chap3 Chengsong parents: 579 diff changeset	253	\begin{figure}[H]
e0f0a81f907b halfway chap3 Chengsong parents: 579 diff changeset	254	\begin{tikzpicture}[->,>=stealth',shorten >=1pt,auto,thick]
e0f0a81f907b halfway chap3 Chengsong parents: 579 diff changeset	255	%\draw (-6,-6) grid (6,6);
e0f0a81f907b halfway chap3 Chengsong parents: 579 diff changeset	256	\node [radius = 0.5, circle, ] (r) at (-6, 5) {$r$};
e0f0a81f907b halfway chap3 Chengsong parents: 579 diff changeset	257
e0f0a81f907b halfway chap3 Chengsong parents: 579 diff changeset	258	%\node (-4, 6) (c1) circle [radius = 0.3] {$c_1$};
e0f0a81f907b halfway chap3 Chengsong parents: 579 diff changeset	259	\node [circle, minimum size = 0.1, ] (c1) at (-4, 5.4) {$c_1$};
579 35df9cdd36ca more chap3 Chengsong parents: 576 diff changeset	260	%
580 e0f0a81f907b halfway chap3 Chengsong parents: 579 diff changeset	261	%\node (-2, 5) (r1) circle [radius = 0.5] {$r_1$};
e0f0a81f907b halfway chap3 Chengsong parents: 579 diff changeset	262	\node [minimum size = 0.5, circle, ] (r1) at (-2, 5) {$r_1$};
579 35df9cdd36ca more chap3 Chengsong parents: 576 diff changeset	263	%
580 e0f0a81f907b halfway chap3 Chengsong parents: 579 diff changeset	264	%\node (0, 6) (c2) circle [radius = 0.3] {$c_2$};
e0f0a81f907b halfway chap3 Chengsong parents: 579 diff changeset	265	\node [circle, minimum size = 0.1, ] (c2) at (0, 5.4) {$c_2$};
579 35df9cdd36ca more chap3 Chengsong parents: 576 diff changeset	266	%
580 e0f0a81f907b halfway chap3 Chengsong parents: 579 diff changeset	267	%\node (2, 5) (r2) circle [radius = 0.5] {$r_2$};
e0f0a81f907b halfway chap3 Chengsong parents: 579 diff changeset	268	\node [circle, ] (r2) at (2, 5) {$r_2$};
579 35df9cdd36ca more chap3 Chengsong parents: 576 diff changeset	269	%
35df9cdd36ca more chap3 Chengsong parents: 576 diff changeset	270	%
580 e0f0a81f907b halfway chap3 Chengsong parents: 579 diff changeset	271	\node [] (ldots) at (4.5, 5) {};
e0f0a81f907b halfway chap3 Chengsong parents: 579 diff changeset	272	%\node (6, 5) (rn) circle [radius = 0.5] {$r_n$};
e0f0a81f907b halfway chap3 Chengsong parents: 579 diff changeset	273
e0f0a81f907b halfway chap3 Chengsong parents: 579 diff changeset	274	\node [minimum size = 0.5, circle, ] (rn) at (6, 5) {$r_n$};
e0f0a81f907b halfway chap3 Chengsong parents: 579 diff changeset	275
e0f0a81f907b halfway chap3 Chengsong parents: 579 diff changeset	276	\node at ($(ldots)!.4!(rn)$) {\ldots};
e0f0a81f907b halfway chap3 Chengsong parents: 579 diff changeset	277
e0f0a81f907b halfway chap3 Chengsong parents: 579 diff changeset	278	\node [minimum size = 0.5, circle, ] (vn) at (6, -5) {$v_n$};
e0f0a81f907b halfway chap3 Chengsong parents: 579 diff changeset	279
e0f0a81f907b halfway chap3 Chengsong parents: 579 diff changeset	280	\node [] (ldots2) at (3.5, -5) {};
e0f0a81f907b halfway chap3 Chengsong parents: 579 diff changeset	281
e0f0a81f907b halfway chap3 Chengsong parents: 579 diff changeset	282	\node [minimum size = 0.5, circle, ] (v2) at (2, -5) {$v_2$};
e0f0a81f907b halfway chap3 Chengsong parents: 579 diff changeset	283
e0f0a81f907b halfway chap3 Chengsong parents: 579 diff changeset	284	\node at ($(ldots2)!.4!(v2)$) {\ldots};
e0f0a81f907b halfway chap3 Chengsong parents: 579 diff changeset	285
e0f0a81f907b halfway chap3 Chengsong parents: 579 diff changeset	286
e0f0a81f907b halfway chap3 Chengsong parents: 579 diff changeset	287	\node [circle, ] (v1) at (-2, -5) {$v_1$};
e0f0a81f907b halfway chap3 Chengsong parents: 579 diff changeset	288
e0f0a81f907b halfway chap3 Chengsong parents: 579 diff changeset	289	\node [radius = 0.5, circle, ] (v) at (-6, -5) {$v$};
e0f0a81f907b halfway chap3 Chengsong parents: 579 diff changeset	290
e0f0a81f907b halfway chap3 Chengsong parents: 579 diff changeset	291	\node [minimum size = 6cm, rectangle, draw] (stack) at (0, 0) {\Huge Stack};
e0f0a81f907b halfway chap3 Chengsong parents: 579 diff changeset	292
e0f0a81f907b halfway chap3 Chengsong parents: 579 diff changeset	293	\path
e0f0a81f907b halfway chap3 Chengsong parents: 579 diff changeset	294	(r)
e0f0a81f907b halfway chap3 Chengsong parents: 579 diff changeset	295	edge [->, >=stealth',shorten >=1pt] node[left] {} (r1);
e0f0a81f907b halfway chap3 Chengsong parents: 579 diff changeset	296	\path
e0f0a81f907b halfway chap3 Chengsong parents: 579 diff changeset	297	(r1)
e0f0a81f907b halfway chap3 Chengsong parents: 579 diff changeset	298	edge [] node {} (r2);
e0f0a81f907b halfway chap3 Chengsong parents: 579 diff changeset	299	\path (r2)
e0f0a81f907b halfway chap3 Chengsong parents: 579 diff changeset	300	edge [] node {} (ldots);
e0f0a81f907b halfway chap3 Chengsong parents: 579 diff changeset	301	\path (rn)
e0f0a81f907b halfway chap3 Chengsong parents: 579 diff changeset	302	edge [] node {$\mkeps$} (vn);
e0f0a81f907b halfway chap3 Chengsong parents: 579 diff changeset	303	\path (vn)
e0f0a81f907b halfway chap3 Chengsong parents: 579 diff changeset	304	edge [] node {} (ldots2);
e0f0a81f907b halfway chap3 Chengsong parents: 579 diff changeset	305	\path (v2)
e0f0a81f907b halfway chap3 Chengsong parents: 579 diff changeset	306	edge [] node {$\inj \; r_1 \; c_2\;v_2$} (v1);
e0f0a81f907b halfway chap3 Chengsong parents: 579 diff changeset	307
e0f0a81f907b halfway chap3 Chengsong parents: 579 diff changeset	308	\path (v1)
e0f0a81f907b halfway chap3 Chengsong parents: 579 diff changeset	309	edge [] node {$\inj \; r \; c_1 \; v_1$} (v);
e0f0a81f907b halfway chap3 Chengsong parents: 579 diff changeset	310
e0f0a81f907b halfway chap3 Chengsong parents: 579 diff changeset	311	\path (stack)
e0f0a81f907b halfway chap3 Chengsong parents: 579 diff changeset	312	edge [dashed] node {} (-4.2, -5.2);
e0f0a81f907b halfway chap3 Chengsong parents: 579 diff changeset	313	\path (stack)
e0f0a81f907b halfway chap3 Chengsong parents: 579 diff changeset	314	edge [dashed] node {} (-4, -5.2);
e0f0a81f907b halfway chap3 Chengsong parents: 579 diff changeset	315	\path (stack)
e0f0a81f907b halfway chap3 Chengsong parents: 579 diff changeset	316	edge [dashed] node {} (-0.1, -5.2);
e0f0a81f907b halfway chap3 Chengsong parents: 579 diff changeset	317	\path (stack)
e0f0a81f907b halfway chap3 Chengsong parents: 579 diff changeset	318	edge [dashed] node {} (0.2, -5.26);
e0f0a81f907b halfway chap3 Chengsong parents: 579 diff changeset	319	\path (stack)
e0f0a81f907b halfway chap3 Chengsong parents: 579 diff changeset	320	edge [dashed] node {} (3.2, -5);
e0f0a81f907b halfway chap3 Chengsong parents: 579 diff changeset	321	\path (stack)
e0f0a81f907b halfway chap3 Chengsong parents: 579 diff changeset	322	edge [dashed] node {} (2.7, -5);
e0f0a81f907b halfway chap3 Chengsong parents: 579 diff changeset	323	\path (stack)
e0f0a81f907b halfway chap3 Chengsong parents: 579 diff changeset	324	edge [dashed] node {} (3.7, -5);
e0f0a81f907b halfway chap3 Chengsong parents: 579 diff changeset	325	\end{tikzpicture}
e0f0a81f907b halfway chap3 Chengsong parents: 579 diff changeset	326	\caption{Stored $r_i, c_{i+1}$ Used by $\inj$}
e0f0a81f907b halfway chap3 Chengsong parents: 579 diff changeset	327	\end{figure}
e0f0a81f907b halfway chap3 Chengsong parents: 579 diff changeset	328	\noindent
e0f0a81f907b halfway chap3 Chengsong parents: 579 diff changeset	329	Storing all $r_i, c_{i+1}$ pairs recursively
e0f0a81f907b halfway chap3 Chengsong parents: 579 diff changeset	330	allows the algorithm to work in an elegant way, at the expense of
624 8ffa28fce271 all comments incorporated!!+related work Chengsong parents: 622 diff changeset	331	storing quite a bit of verbose information on the stack.
8ffa28fce271 all comments incorporated!!+related work Chengsong parents: 622 diff changeset	332	The stack seems to grow at least quadratically with respect
580 e0f0a81f907b halfway chap3 Chengsong parents: 579 diff changeset	333	to the input (not taking into account the size bloat of $r_i$),
624 8ffa28fce271 all comments incorporated!!+related work Chengsong parents: 622 diff changeset	334	which can be inefficient and prone to stack overflows.
667 660cf698eb26 added example of how inj and lexer works Chengsong parents: 657 diff changeset	335	\section{Bitcoded Algorithm}\label{bitDef}
580 e0f0a81f907b halfway chap3 Chengsong parents: 579 diff changeset	336	To address this,
624 8ffa28fce271 all comments incorporated!!+related work Chengsong parents: 622 diff changeset	337	Sulzmann and Lu defined a new datatype
580 e0f0a81f907b halfway chap3 Chengsong parents: 579 diff changeset	338	called \emph{annotated regular expression},
e0f0a81f907b halfway chap3 Chengsong parents: 579 diff changeset	339	which condenses all the partial lexing information
e0f0a81f907b halfway chap3 Chengsong parents: 579 diff changeset	340	(that was originally stored in $r_i, c_{i+1}$ pairs)
e0f0a81f907b halfway chap3 Chengsong parents: 579 diff changeset	341	into bitcodes.
581 9db2500629be chap3 almost done Chengsong parents: 580 diff changeset	342	The bitcodes are then carried with the regular
9db2500629be chap3 almost done Chengsong parents: 580 diff changeset	343	expression, and augmented or moved around
638 dd9dde2d902b comments till chap4 Chengsong parents: 624 diff changeset	344	as the lexing proceeds.
581 9db2500629be chap3 almost done Chengsong parents: 580 diff changeset	345	It becomes unnecessary
9db2500629be chap3 almost done Chengsong parents: 580 diff changeset	346	to remember all the
640 bd1354127574 more proofreading done, last version before submission Chengsong parents: 639 diff changeset	347	intermediate expressions, but only the most recent one
581 9db2500629be chap3 almost done Chengsong parents: 580 diff changeset	348	with this bit-carrying regular expression.
580 e0f0a81f907b halfway chap3 Chengsong parents: 579 diff changeset	349	Annotated regular expressions
e0f0a81f907b halfway chap3 Chengsong parents: 579 diff changeset	350	are defined as the following
638 dd9dde2d902b comments till chap4 Chengsong parents: 624 diff changeset	351	Isabelle datatype:\footnote{ We use subscript notation to indicate
640 bd1354127574 more proofreading done, last version before submission Chengsong parents: 639 diff changeset	352	that the bitcodes are auxiliary information that does not
638 dd9dde2d902b comments till chap4 Chengsong parents: 624 diff changeset	353	interfere with the structure of the regular expressions }
580 e0f0a81f907b halfway chap3 Chengsong parents: 579 diff changeset	354	\begin{center}
e0f0a81f907b halfway chap3 Chengsong parents: 579 diff changeset	355	\begin{tabular}{lcl}
e0f0a81f907b halfway chap3 Chengsong parents: 579 diff changeset	356	$\textit{a}$ & $::=$ & $\ZERO$\\
e0f0a81f907b halfway chap3 Chengsong parents: 579 diff changeset	357	& $\mid$ & $_{bs}\ONE$\\
e0f0a81f907b halfway chap3 Chengsong parents: 579 diff changeset	358	& $\mid$ & $_{bs}{\bf c}$\\
e0f0a81f907b halfway chap3 Chengsong parents: 579 diff changeset	359	& $\mid$ & $_{bs}\sum\,as$\\
e0f0a81f907b halfway chap3 Chengsong parents: 579 diff changeset	360	& $\mid$ & $_{bs}a_1\cdot a_2$\\
e0f0a81f907b halfway chap3 Chengsong parents: 579 diff changeset	361	& $\mid$ & $_{bs}a^*$
e0f0a81f907b halfway chap3 Chengsong parents: 579 diff changeset	362	\end{tabular}
e0f0a81f907b halfway chap3 Chengsong parents: 579 diff changeset	363	\end{center}
e0f0a81f907b halfway chap3 Chengsong parents: 579 diff changeset	364	\noindent
e0f0a81f907b halfway chap3 Chengsong parents: 579 diff changeset	365	where $bs$ stands for bit-codes, $a$ for $\mathbf{a}$nnotated regular
e0f0a81f907b halfway chap3 Chengsong parents: 579 diff changeset	366	expressions and $as$ for lists of annotated regular expressions.
638 dd9dde2d902b comments till chap4 Chengsong parents: 624 diff changeset	367	The alternative constructor, written $\sum$, has been generalised to
624 8ffa28fce271 all comments incorporated!!+related work Chengsong parents: 622 diff changeset	368	take a list of annotated regular expressions rather than just two.
8ffa28fce271 all comments incorporated!!+related work Chengsong parents: 622 diff changeset	369	Why is it generalised? This is because when we analyse nested
8ffa28fce271 all comments incorporated!!+related work Chengsong parents: 622 diff changeset	370	alternatives, there can be more than two elements at the same level
580 e0f0a81f907b halfway chap3 Chengsong parents: 579 diff changeset	371	after de-duplication, which can no longer be stored in a binary
e0f0a81f907b halfway chap3 Chengsong parents: 579 diff changeset	372	constructor.
e0f0a81f907b halfway chap3 Chengsong parents: 579 diff changeset	373	Bits and bitcodes (lists of bits) are defined as:
e0f0a81f907b halfway chap3 Chengsong parents: 579 diff changeset	374	\begin{center}
e0f0a81f907b halfway chap3 Chengsong parents: 579 diff changeset	375	$b ::= S \mid Z \qquad
e0f0a81f907b halfway chap3 Chengsong parents: 579 diff changeset	376	bs ::= [] \mid b::bs
e0f0a81f907b halfway chap3 Chengsong parents: 579 diff changeset	377	$
e0f0a81f907b halfway chap3 Chengsong parents: 579 diff changeset	378	\end{center}
e0f0a81f907b halfway chap3 Chengsong parents: 579 diff changeset	379	\noindent
e0f0a81f907b halfway chap3 Chengsong parents: 579 diff changeset	380	We use $S$ and $Z$ rather than $1$ and $0$ is to avoid
e0f0a81f907b halfway chap3 Chengsong parents: 579 diff changeset	381	confusion with the regular expressions $\ZERO$ and $\ONE$.
624 8ffa28fce271 all comments incorporated!!+related work Chengsong parents: 622 diff changeset	382	The idea is to use the bitcodes
580 e0f0a81f907b halfway chap3 Chengsong parents: 579 diff changeset	383	to indicate which choice was made at a certain point
e0f0a81f907b halfway chap3 Chengsong parents: 579 diff changeset	384	during lexing.
e0f0a81f907b halfway chap3 Chengsong parents: 579 diff changeset	385	For example,
e0f0a81f907b halfway chap3 Chengsong parents: 579 diff changeset	386	$(_{Z}a+_{S}b) \backslash a$ gives us
624 8ffa28fce271 all comments incorporated!!+related work Chengsong parents: 622 diff changeset	387	$_{Z}\ONE + \ZERO$, where the $Z$ bitcode will
580 e0f0a81f907b halfway chap3 Chengsong parents: 579 diff changeset	388	later be used to decode that a left branch was
638 dd9dde2d902b comments till chap4 Chengsong parents: 624 diff changeset	389	selected at the time when the part $a+b$ was analysed by
624 8ffa28fce271 all comments incorporated!!+related work Chengsong parents: 622 diff changeset	390	derivative.
580 e0f0a81f907b halfway chap3 Chengsong parents: 579 diff changeset	391	\subsection{A Bird's Eye View of the Bit-coded Lexer}
624 8ffa28fce271 all comments incorporated!!+related work Chengsong parents: 622 diff changeset	392	Before we give the details of the functions and definitions
580 e0f0a81f907b halfway chap3 Chengsong parents: 579 diff changeset	393	related to our
e0f0a81f907b halfway chap3 Chengsong parents: 579 diff changeset	394	$\blexer$ (\emph{b}-itcoded lexer), we first provide a high-level
624 8ffa28fce271 all comments incorporated!!+related work Chengsong parents: 622 diff changeset	395	view of the algorithm.
580 e0f0a81f907b halfway chap3 Chengsong parents: 579 diff changeset	396	\begin{figure}[H]
e0f0a81f907b halfway chap3 Chengsong parents: 579 diff changeset	397	\begin{tikzpicture}[->,>=stealth',shorten >=1pt,auto,thick]
e0f0a81f907b halfway chap3 Chengsong parents: 579 diff changeset	398	%\draw (-6,-6) grid (6,6);
e0f0a81f907b halfway chap3 Chengsong parents: 579 diff changeset	399
e0f0a81f907b halfway chap3 Chengsong parents: 579 diff changeset	400	\node [circle, draw] (r0) at (-6, 2) {$r$};
e0f0a81f907b halfway chap3 Chengsong parents: 579 diff changeset	401
e0f0a81f907b halfway chap3 Chengsong parents: 579 diff changeset	402	\node [radius = 0.5, circle, draw] (r) at (-6, 5) {$_{bs}a$};
e0f0a81f907b halfway chap3 Chengsong parents: 579 diff changeset	403	\path (r0)
e0f0a81f907b halfway chap3 Chengsong parents: 579 diff changeset	404	edge [] node {$\internalise$} (r);
e0f0a81f907b halfway chap3 Chengsong parents: 579 diff changeset	405	%\node (-4, 6) (c1) circle [radius = 0.3] {$c_1$};
e0f0a81f907b halfway chap3 Chengsong parents: 579 diff changeset	406	\node [circle, minimum size = 0.1, draw] (c1) at (-4, 5.4) {$c_1$};
579 35df9cdd36ca more chap3 Chengsong parents: 576 diff changeset	407	%
580 e0f0a81f907b halfway chap3 Chengsong parents: 579 diff changeset	408	%\node (-2, 5) (r1) circle [radius = 0.5] {$r_1$};
e0f0a81f907b halfway chap3 Chengsong parents: 579 diff changeset	409	\node [minimum size = 1.0cm, circle, draw] (r1) at (-2, 5) {$_{bs_1}a_1$};
e0f0a81f907b halfway chap3 Chengsong parents: 579 diff changeset	410	%
e0f0a81f907b halfway chap3 Chengsong parents: 579 diff changeset	411	%\node (0, 6) (c2) circle [radius = 0.3] {$c_2$};
e0f0a81f907b halfway chap3 Chengsong parents: 579 diff changeset	412	\node [circle, minimum size = 0.1, draw] (c2) at (0, 5.4) {$c_2$};
e0f0a81f907b halfway chap3 Chengsong parents: 579 diff changeset	413	%
e0f0a81f907b halfway chap3 Chengsong parents: 579 diff changeset	414	%\node (2, 5) (r2) circle [radius = 0.5] {$r_2$};
e0f0a81f907b halfway chap3 Chengsong parents: 579 diff changeset	415	\node [circle, draw, minimum size = 1.4cm] (r2) at (2, 5) {$_{bs_2}a_2$};
e0f0a81f907b halfway chap3 Chengsong parents: 579 diff changeset	416	%
e0f0a81f907b halfway chap3 Chengsong parents: 579 diff changeset	417	%
e0f0a81f907b halfway chap3 Chengsong parents: 579 diff changeset	418	\node [] (ldots) at (4.5, 5) {};
e0f0a81f907b halfway chap3 Chengsong parents: 579 diff changeset	419	%\node (6, 5) (rn) circle [radius = 0.5] {$r_n$};
e0f0a81f907b halfway chap3 Chengsong parents: 579 diff changeset	420
e0f0a81f907b halfway chap3 Chengsong parents: 579 diff changeset	421	\node [minimum size = 2.2cm, circle, draw] (rn) at (6, 5) {$_{bs_n}a_n$};
e0f0a81f907b halfway chap3 Chengsong parents: 579 diff changeset	422
e0f0a81f907b halfway chap3 Chengsong parents: 579 diff changeset	423	\node at ($(ldots)!.1!(rn)$) {\ldots};
e0f0a81f907b halfway chap3 Chengsong parents: 579 diff changeset	424
e0f0a81f907b halfway chap3 Chengsong parents: 579 diff changeset	425	\node [minimum size = 0.5, circle, ] (v) at (6, 2) {$v$};
e0f0a81f907b halfway chap3 Chengsong parents: 579 diff changeset	426
e0f0a81f907b halfway chap3 Chengsong parents: 579 diff changeset	427	%\node [] (v2) at (4, -5) {};
e0f0a81f907b halfway chap3 Chengsong parents: 579 diff changeset	428	%
e0f0a81f907b halfway chap3 Chengsong parents: 579 diff changeset	429	%\node [draw, cross out] (ldots2) at (5, -5) {};
579 35df9cdd36ca more chap3 Chengsong parents: 576 diff changeset	430	%
580 e0f0a81f907b halfway chap3 Chengsong parents: 579 diff changeset	431	%\node at ($(ldots2)!.4!(v2)$) {\ldots};
e0f0a81f907b halfway chap3 Chengsong parents: 579 diff changeset	432
e0f0a81f907b halfway chap3 Chengsong parents: 579 diff changeset	433
e0f0a81f907b halfway chap3 Chengsong parents: 579 diff changeset	434	\node [align = center] (decode) at (6.6, 3.2) {$\bmkeps$\\$\decode$};
e0f0a81f907b halfway chap3 Chengsong parents: 579 diff changeset	435
e0f0a81f907b halfway chap3 Chengsong parents: 579 diff changeset	436	\path (c1)
e0f0a81f907b halfway chap3 Chengsong parents: 579 diff changeset	437	edge [dashed, bend left] node {} (r0);
e0f0a81f907b halfway chap3 Chengsong parents: 579 diff changeset	438
e0f0a81f907b halfway chap3 Chengsong parents: 579 diff changeset	439	\path (c2)
e0f0a81f907b halfway chap3 Chengsong parents: 579 diff changeset	440	edge [dashed, bend left] node {} (r0);
e0f0a81f907b halfway chap3 Chengsong parents: 579 diff changeset	441
e0f0a81f907b halfway chap3 Chengsong parents: 579 diff changeset	442	\path (r1)
e0f0a81f907b halfway chap3 Chengsong parents: 579 diff changeset	443	edge [dashed, bend right] node {} (r2);
e0f0a81f907b halfway chap3 Chengsong parents: 579 diff changeset	444
e0f0a81f907b halfway chap3 Chengsong parents: 579 diff changeset	445
e0f0a81f907b halfway chap3 Chengsong parents: 579 diff changeset	446	\path
e0f0a81f907b halfway chap3 Chengsong parents: 579 diff changeset	447	(r)
e0f0a81f907b halfway chap3 Chengsong parents: 579 diff changeset	448	edge [dashed, bend right] node[left] {} (r1);
e0f0a81f907b halfway chap3 Chengsong parents: 579 diff changeset	449
e0f0a81f907b halfway chap3 Chengsong parents: 579 diff changeset	450	\path
e0f0a81f907b halfway chap3 Chengsong parents: 579 diff changeset	451	(r)
e0f0a81f907b halfway chap3 Chengsong parents: 579 diff changeset	452	edge [->, >=stealth',shorten >=1pt] node[left] {} (r1);
e0f0a81f907b halfway chap3 Chengsong parents: 579 diff changeset	453
e0f0a81f907b halfway chap3 Chengsong parents: 579 diff changeset	454	\path
e0f0a81f907b halfway chap3 Chengsong parents: 579 diff changeset	455	(r1)
e0f0a81f907b halfway chap3 Chengsong parents: 579 diff changeset	456	edge [] node {} (r2);
e0f0a81f907b halfway chap3 Chengsong parents: 579 diff changeset	457	\path (r2)
e0f0a81f907b halfway chap3 Chengsong parents: 579 diff changeset	458	edge [] node {} (ldots);
e0f0a81f907b halfway chap3 Chengsong parents: 579 diff changeset	459	\path (rn)
e0f0a81f907b halfway chap3 Chengsong parents: 579 diff changeset	460	edge [] node {} (v);
e0f0a81f907b halfway chap3 Chengsong parents: 579 diff changeset	461
e0f0a81f907b halfway chap3 Chengsong parents: 579 diff changeset	462	\path (r0)
e0f0a81f907b halfway chap3 Chengsong parents: 579 diff changeset	463	edge [dashed, bend right] node {} (decode);
e0f0a81f907b halfway chap3 Chengsong parents: 579 diff changeset	464	%\path (v)
e0f0a81f907b halfway chap3 Chengsong parents: 579 diff changeset	465	%edge [] node {} (ldots2);
579 35df9cdd36ca more chap3 Chengsong parents: 576 diff changeset	466
35df9cdd36ca more chap3 Chengsong parents: 576 diff changeset	467
35df9cdd36ca more chap3 Chengsong parents: 576 diff changeset	468
580 e0f0a81f907b halfway chap3 Chengsong parents: 579 diff changeset	469	\end{tikzpicture}
624 8ffa28fce271 all comments incorporated!!+related work Chengsong parents: 622 diff changeset	470	\caption{A bird's eye view of $\blexer$. The $_{bsi}a_{i}$s stand
8ffa28fce271 all comments incorporated!!+related work Chengsong parents: 622 diff changeset	471	for the annotated regular expressions in each derivative step.
640 bd1354127574 more proofreading done, last version before submission Chengsong parents: 639 diff changeset	472	The characters used in each derivative are written as $c_i$.
bd1354127574 more proofreading done, last version before submission Chengsong parents: 639 diff changeset	473	The size of the derivatives typically increases during each derivative step.
638 dd9dde2d902b comments till chap4 Chengsong parents: 624 diff changeset	474	The process starts with an internalise phase and concludes with a decoding phase.}
580 e0f0a81f907b halfway chap3 Chengsong parents: 579 diff changeset	475	\end{figure}
e0f0a81f907b halfway chap3 Chengsong parents: 579 diff changeset	476	\noindent
e0f0a81f907b halfway chap3 Chengsong parents: 579 diff changeset	477	The plain regular expressions
e0f0a81f907b halfway chap3 Chengsong parents: 579 diff changeset	478	are first ``lifted'' to an annotated regular expression,
624 8ffa28fce271 all comments incorporated!!+related work Chengsong parents: 622 diff changeset	479	with the function called \emph{internalise} ($r \rightarrow _{bs}a$).
580 e0f0a81f907b halfway chap3 Chengsong parents: 579 diff changeset	480	Then the annotated regular expression $_{bs}a$ will
624 8ffa28fce271 all comments incorporated!!+related work Chengsong parents: 622 diff changeset	481	be transformed by successive derivatives with respect
580 e0f0a81f907b halfway chap3 Chengsong parents: 579 diff changeset	482	to the input stream of characters $c_1, c_2$ etc.
e0f0a81f907b halfway chap3 Chengsong parents: 579 diff changeset	483	Each time a derivative is taken, the bitcodes
e0f0a81f907b halfway chap3 Chengsong parents: 579 diff changeset	484	are moved around, discarded or augmented together
e0f0a81f907b halfway chap3 Chengsong parents: 579 diff changeset	485	with the regular expression they are attached to.
e0f0a81f907b halfway chap3 Chengsong parents: 579 diff changeset	486	After all input has been consumed, the
e0f0a81f907b halfway chap3 Chengsong parents: 579 diff changeset	487	bitcodes are collected by $\bmkeps$,
e0f0a81f907b halfway chap3 Chengsong parents: 579 diff changeset	488	which traverses the nullable part of the regular expression
624 8ffa28fce271 all comments incorporated!!+related work Chengsong parents: 622 diff changeset	489	and collects the bitcodes along the way.
580 e0f0a81f907b halfway chap3 Chengsong parents: 579 diff changeset	490	The collected bitcodes are then $\decode$d with the guidance
624 8ffa28fce271 all comments incorporated!!+related work Chengsong parents: 622 diff changeset	491	of the input regular expression $r$ ( $_{bs}a \rightarrow r$).
580 e0f0a81f907b halfway chap3 Chengsong parents: 579 diff changeset	492	The most notable improvements of $\blexer$
e0f0a81f907b halfway chap3 Chengsong parents: 579 diff changeset	493	over $\lexer$ are
e0f0a81f907b halfway chap3 Chengsong parents: 579 diff changeset	494	\begin{itemize}
e0f0a81f907b halfway chap3 Chengsong parents: 579 diff changeset	495	\item
e0f0a81f907b halfway chap3 Chengsong parents: 579 diff changeset	496
638 dd9dde2d902b comments till chap4 Chengsong parents: 624 diff changeset	497	the absence of the second injection phase.
580 e0f0a81f907b halfway chap3 Chengsong parents: 579 diff changeset	498	\item
638 dd9dde2d902b comments till chap4 Chengsong parents: 624 diff changeset	499	one does not need to store each pair of the
580 e0f0a81f907b halfway chap3 Chengsong parents: 579 diff changeset	500	intermediate regular expressions $_{bs_i}a_i$ and $c_{i+1}$.
e0f0a81f907b halfway chap3 Chengsong parents: 579 diff changeset	501	The previous annotated regular expression $_{bs_i}a_i$'s information is passed
e0f0a81f907b halfway chap3 Chengsong parents: 579 diff changeset	502	on without loss to its successor $_{bs_{i+1}}a_{i+1}$,
624 8ffa28fce271 all comments incorporated!!+related work Chengsong parents: 622 diff changeset	503	and $c_{i+1}$'s information is stored in $L\;r$.\footnote{
8ffa28fce271 all comments incorporated!!+related work Chengsong parents: 622 diff changeset	504	which will be used during the decode phase, where we use $r$ as
8ffa28fce271 all comments incorporated!!+related work Chengsong parents: 622 diff changeset	505	a source of information.}
580 e0f0a81f907b halfway chap3 Chengsong parents: 579 diff changeset	506	\item
638 dd9dde2d902b comments till chap4 Chengsong parents: 624 diff changeset	507	simplification works much smoother--one can maintain correctness
624 8ffa28fce271 all comments incorporated!!+related work Chengsong parents: 622 diff changeset	508	while applying quite aggressive simplifications.
580 e0f0a81f907b halfway chap3 Chengsong parents: 579 diff changeset	509	\end{itemize}
624 8ffa28fce271 all comments incorporated!!+related work Chengsong parents: 622 diff changeset	510	%In the next section we will
8ffa28fce271 all comments incorporated!!+related work Chengsong parents: 622 diff changeset	511	%give the operations needed in $\blexer$,
8ffa28fce271 all comments incorporated!!+related work Chengsong parents: 622 diff changeset	512	%with some remarks on the idea behind their definitions.
580 e0f0a81f907b halfway chap3 Chengsong parents: 579 diff changeset	513	\subsection{Operations in $\textit{Blexer}$}
e0f0a81f907b halfway chap3 Chengsong parents: 579 diff changeset	514	The first operation we define related to bit-coded regular expressions
e0f0a81f907b halfway chap3 Chengsong parents: 579 diff changeset	515	is how we move bits to the inside of regular expressions.
624 8ffa28fce271 all comments incorporated!!+related work Chengsong parents: 622 diff changeset	516	This operation is called $\fuse$:
580 e0f0a81f907b halfway chap3 Chengsong parents: 579 diff changeset	517	\begin{center}
e0f0a81f907b halfway chap3 Chengsong parents: 579 diff changeset	518	\begin{tabular}{lcl}
624 8ffa28fce271 all comments incorporated!!+related work Chengsong parents: 622 diff changeset	519	$\textit{fuse}\;bs \; (\ZERO)$ & $\dn$ & $\ZERO$\\
8ffa28fce271 all comments incorporated!!+related work Chengsong parents: 622 diff changeset	520	$\textit{fuse}\;bs\; (_{bs'})\ONE$ & $\dn$ &
580 e0f0a81f907b halfway chap3 Chengsong parents: 579 diff changeset	521	$_{bs @ bs'}\ONE$\\
624 8ffa28fce271 all comments incorporated!!+related work Chengsong parents: 622 diff changeset	522	$\textit{fuse}\;bs\;(_{bs'}{\bf c})$ & $\dn$ &
580 e0f0a81f907b halfway chap3 Chengsong parents: 579 diff changeset	523	$_{bs@bs'}{\bf c}$\\
624 8ffa28fce271 all comments incorporated!!+related work Chengsong parents: 622 diff changeset	524	$\textit{fuse}\;bs\;(_{bs'}\sum\textit{as})$ & $\dn$ &
580 e0f0a81f907b halfway chap3 Chengsong parents: 579 diff changeset	525	$_{bs@bs'}\sum\textit{as}$\\
624 8ffa28fce271 all comments incorporated!!+related work Chengsong parents: 622 diff changeset	526	$\textit{fuse}\;bs\; (_{bs'}a_1\cdot a_2)$ & $\dn$ &
580 e0f0a81f907b halfway chap3 Chengsong parents: 579 diff changeset	527	$_{bs@bs'}a_1 \cdot a_2$\\
624 8ffa28fce271 all comments incorporated!!+related work Chengsong parents: 622 diff changeset	528	$\textit{fuse}\;bs\;(_{bs'}a^*)$ & $\dn$ &
580 e0f0a81f907b halfway chap3 Chengsong parents: 579 diff changeset	529	$_{bs @ bs'}a^*$
e0f0a81f907b halfway chap3 Chengsong parents: 579 diff changeset	530	\end{tabular}
e0f0a81f907b halfway chap3 Chengsong parents: 579 diff changeset	531	\end{center}
e0f0a81f907b halfway chap3 Chengsong parents: 579 diff changeset	532
e0f0a81f907b halfway chap3 Chengsong parents: 579 diff changeset	533	\noindent
624 8ffa28fce271 all comments incorporated!!+related work Chengsong parents: 622 diff changeset	534	With \emph{fuse} we are able to define the $\internalise$ function,
8ffa28fce271 all comments incorporated!!+related work Chengsong parents: 622 diff changeset	535	written $(\_)^\uparrow$,
580 e0f0a81f907b halfway chap3 Chengsong parents: 579 diff changeset	536	that translates a ``standard'' regular expression into an
e0f0a81f907b halfway chap3 Chengsong parents: 579 diff changeset	537	annotated regular expression.
e0f0a81f907b halfway chap3 Chengsong parents: 579 diff changeset	538	This function will be applied before we start
e0f0a81f907b halfway chap3 Chengsong parents: 579 diff changeset	539	with the derivative phase of the algorithm.
e0f0a81f907b halfway chap3 Chengsong parents: 579 diff changeset	540
e0f0a81f907b halfway chap3 Chengsong parents: 579 diff changeset	541	\begin{center}
e0f0a81f907b halfway chap3 Chengsong parents: 579 diff changeset	542	\begin{tabular}{lcl}
e0f0a81f907b halfway chap3 Chengsong parents: 579 diff changeset	543	$(\ZERO)^\uparrow$ & $\dn$ & $\ZERO$\\
e0f0a81f907b halfway chap3 Chengsong parents: 579 diff changeset	544	$(\ONE)^\uparrow$ & $\dn$ & $_{[]}\ONE$\\
e0f0a81f907b halfway chap3 Chengsong parents: 579 diff changeset	545	$(c)^\uparrow$ & $\dn$ & $_{[]}{\bf c}$\\
e0f0a81f907b halfway chap3 Chengsong parents: 579 diff changeset	546	$(r_1 + r_2)^\uparrow$ & $\dn$ &
e0f0a81f907b halfway chap3 Chengsong parents: 579 diff changeset	547	$_{[]}\sum[\textit{fuse}\,[Z]\,r_1^\uparrow,\,
e0f0a81f907b halfway chap3 Chengsong parents: 579 diff changeset	548	\textit{fuse}\,[S]\,r_2^\uparrow]$\\
e0f0a81f907b halfway chap3 Chengsong parents: 579 diff changeset	549	$(r_1\cdot r_2)^\uparrow$ & $\dn$ &
e0f0a81f907b halfway chap3 Chengsong parents: 579 diff changeset	550	$_{[]}r_1^\uparrow \cdot r_2^\uparrow$\\
e0f0a81f907b halfway chap3 Chengsong parents: 579 diff changeset	551	$(r^*)^\uparrow$ & $\dn$ &
e0f0a81f907b halfway chap3 Chengsong parents: 579 diff changeset	552	$_{[]}(r^\uparrow)^*$\\
e0f0a81f907b halfway chap3 Chengsong parents: 579 diff changeset	553	\end{tabular}
e0f0a81f907b halfway chap3 Chengsong parents: 579 diff changeset	554	\end{center}
e0f0a81f907b halfway chap3 Chengsong parents: 579 diff changeset	555	\noindent
e0f0a81f907b halfway chap3 Chengsong parents: 579 diff changeset	556	The opposite of $\textit{internalise}$ is
624 8ffa28fce271 all comments incorporated!!+related work Chengsong parents: 622 diff changeset	557	$\erase$, where all bit-codes are removed,
580 e0f0a81f907b halfway chap3 Chengsong parents: 579 diff changeset	558	and the alternative operator $\sum$ for annotated
640 bd1354127574 more proofreading done, last version before submission Chengsong parents: 639 diff changeset	559	regular expressions is transformed into the binary version
624 8ffa28fce271 all comments incorporated!!+related work Chengsong parents: 622 diff changeset	560	in (plain) regular expressions. This can be defined as follows:
8ffa28fce271 all comments incorporated!!+related work Chengsong parents: 622 diff changeset	561	\begin{center}\label{eraseDef}
580 e0f0a81f907b halfway chap3 Chengsong parents: 579 diff changeset	562	\begin{tabular}{lcl}
e0f0a81f907b halfway chap3 Chengsong parents: 579 diff changeset	563	$\ZERO_\downarrow$ & $\dn$ & $\ZERO$\\
e0f0a81f907b halfway chap3 Chengsong parents: 579 diff changeset	564	$( _{bs}\ONE )_\downarrow$ & $\dn$ & $\ONE$\\
e0f0a81f907b halfway chap3 Chengsong parents: 579 diff changeset	565	$( _{bs}\mathbf{c} )_\downarrow$ & $\dn$ & $\mathbf{c}$\\
e0f0a81f907b halfway chap3 Chengsong parents: 579 diff changeset	566	$( _{bs} a_1 \cdot a_2 )_\downarrow$ & $\dn$ &
e0f0a81f907b halfway chap3 Chengsong parents: 579 diff changeset	567	$ (a_1) _\downarrow \cdot (a_2) _\downarrow$\\
e0f0a81f907b halfway chap3 Chengsong parents: 579 diff changeset	568	$( _{bs} [])_\downarrow $ & $\dn$ & $\ZERO $\\
e0f0a81f907b halfway chap3 Chengsong parents: 579 diff changeset	569	$( _{bs} [a] )_\downarrow$ & $\dn$ & $a_\downarrow$\\
624 8ffa28fce271 all comments incorporated!!+related work Chengsong parents: 622 diff changeset	570	$(_{bs} \sum [a_1, \; a_2])_\downarrow$ & $\dn$ & $ (a_1) _\downarrow + ( a_2 ) _\downarrow $\\
580 e0f0a81f907b halfway chap3 Chengsong parents: 579 diff changeset	571	$(_{bs} \sum (a :: as))_\downarrow$ & $\dn$ & $ a_\downarrow + \; (_{[]} \sum as)_\downarrow$\\
e0f0a81f907b halfway chap3 Chengsong parents: 579 diff changeset	572	$( _{bs} a^* )_\downarrow$ & $\dn$ & $(a_\downarrow)^*$
e0f0a81f907b halfway chap3 Chengsong parents: 579 diff changeset	573	\end{tabular}
e0f0a81f907b halfway chap3 Chengsong parents: 579 diff changeset	574	\end{center}
e0f0a81f907b halfway chap3 Chengsong parents: 579 diff changeset	575	\noindent
638 dd9dde2d902b comments till chap4 Chengsong parents: 624 diff changeset	576	Where we abbreviate the $\erase\; a$ operation
dd9dde2d902b comments till chap4 Chengsong parents: 624 diff changeset	577	as $(a)_\downarrow$, for conciseness.
580 e0f0a81f907b halfway chap3 Chengsong parents: 579 diff changeset	578
624 8ffa28fce271 all comments incorporated!!+related work Chengsong parents: 622 diff changeset	579	Determining whether an annotated regular expression
638 dd9dde2d902b comments till chap4 Chengsong parents: 624 diff changeset	580	contains the empty string in its language requires
580 e0f0a81f907b halfway chap3 Chengsong parents: 579 diff changeset	581	a dedicated function $\bnullable$.
624 8ffa28fce271 all comments incorporated!!+related work Chengsong parents: 622 diff changeset	582	In our formalisation this function is defined by simply calling $\erase$
8ffa28fce271 all comments incorporated!!+related work Chengsong parents: 622 diff changeset	583	before $\nullable$.
580 e0f0a81f907b halfway chap3 Chengsong parents: 579 diff changeset	584	\begin{definition}
e0f0a81f907b halfway chap3 Chengsong parents: 579 diff changeset	585	$\bnullable \; a \dn \nullable \; (a_\downarrow)$
e0f0a81f907b halfway chap3 Chengsong parents: 579 diff changeset	586	\end{definition}
e0f0a81f907b halfway chap3 Chengsong parents: 579 diff changeset	587	The function for collecting
e0f0a81f907b halfway chap3 Chengsong parents: 579 diff changeset	588	bitcodes from a
e0f0a81f907b halfway chap3 Chengsong parents: 579 diff changeset	589	(b)nullable regular expression is called $\bmkeps$.
624 8ffa28fce271 all comments incorporated!!+related work Chengsong parents: 622 diff changeset	590	This function is a lifted version of the $\textit{mkeps}$ function,
580 e0f0a81f907b halfway chap3 Chengsong parents: 579 diff changeset	591	which follows the path $\mkeps$ takes to traverse the
e0f0a81f907b halfway chap3 Chengsong parents: 579 diff changeset	592	$\nullable$ branches when visiting a regular expression,
624 8ffa28fce271 all comments incorporated!!+related work Chengsong parents: 622 diff changeset	593	but collects bitcodes instead of generating a value.
580 e0f0a81f907b halfway chap3 Chengsong parents: 579 diff changeset	594	\begin{center}
e0f0a81f907b halfway chap3 Chengsong parents: 579 diff changeset	595	\begin{tabular}{lcl}
e0f0a81f907b halfway chap3 Chengsong parents: 579 diff changeset	596	$\textit{bmkeps}\,(_{bs}\ONE)$ & $\dn$ & $bs$\\
e0f0a81f907b halfway chap3 Chengsong parents: 579 diff changeset	597	$\textit{bmkeps}\,(_{bs}\sum a::\textit{as})$ & $\dn$ &
e0f0a81f907b halfway chap3 Chengsong parents: 579 diff changeset	598	$\textit{if}\;\textit{bnullable}\,a$\\
e0f0a81f907b halfway chap3 Chengsong parents: 579 diff changeset	599	& &$\textit{then}\;bs\,@\,\textit{bmkeps}\,a$\\
e0f0a81f907b halfway chap3 Chengsong parents: 579 diff changeset	600	& &$\textit{else}\;bs\,@\,\textit{bmkeps}\,(_{[]}\sum \textit{as})$\\
e0f0a81f907b halfway chap3 Chengsong parents: 579 diff changeset	601	$\textit{bmkeps}\,(_{bs} a_1 \cdot a_2)$ & $\dn$ &
e0f0a81f907b halfway chap3 Chengsong parents: 579 diff changeset	602	$bs \,@\,\textit{bmkeps}\,a_1\,@\, \textit{bmkeps}\,a_2$\\
e0f0a81f907b halfway chap3 Chengsong parents: 579 diff changeset	603	$\textit{bmkeps}\,(_{bs}a^*)$ & $\dn$ &
585 4969ef817d92 chap4 more Chengsong parents: 582 diff changeset	604	$bs \,@\, [S]$
580 e0f0a81f907b halfway chap3 Chengsong parents: 579 diff changeset	605	\end{tabular}
e0f0a81f907b halfway chap3 Chengsong parents: 579 diff changeset	606	\end{center}
e0f0a81f907b halfway chap3 Chengsong parents: 579 diff changeset	607	\noindent
624 8ffa28fce271 all comments incorporated!!+related work Chengsong parents: 622 diff changeset	608	This function, just like $\mkeps$,
581 9db2500629be chap3 almost done Chengsong parents: 580 diff changeset	609	visits a regular expression tree respecting
624 8ffa28fce271 all comments incorporated!!+related work Chengsong parents: 622 diff changeset	610	the POSIX rules.
581 9db2500629be chap3 almost done Chengsong parents: 580 diff changeset	611	It traverses each child of the sequence regular expression
9db2500629be chap3 almost done Chengsong parents: 580 diff changeset	612	from left to right and creates a bitcode by stitching
9db2500629be chap3 almost done Chengsong parents: 580 diff changeset	613	together bitcodes obtained from the children expressions.
9db2500629be chap3 almost done Chengsong parents: 580 diff changeset	614	In the case of alternative regular expressions,
9db2500629be chap3 almost done Chengsong parents: 580 diff changeset	615	it looks for the leftmost
9db2500629be chap3 almost done Chengsong parents: 580 diff changeset	616	$\nullable$ branch
638 dd9dde2d902b comments till chap4 Chengsong parents: 624 diff changeset	617	to visit and ignores the other siblings.
581 9db2500629be chap3 almost done Chengsong parents: 580 diff changeset	618	%Whenever there is some bitcodes attached to a
9db2500629be chap3 almost done Chengsong parents: 580 diff changeset	619	%node, it returns the bitcodes concatenated with whatever
9db2500629be chap3 almost done Chengsong parents: 580 diff changeset	620	%child recursive calls return.
9db2500629be chap3 almost done Chengsong parents: 580 diff changeset	621	The only time when $\bmkeps$ creates new bitcodes
9db2500629be chap3 almost done Chengsong parents: 580 diff changeset	622	is when it completes a star's iterations by attaching a $S$ to the end of the bitcode
624 8ffa28fce271 all comments incorporated!!+related work Chengsong parents: 622 diff changeset	623	list it returns.
8ffa28fce271 all comments incorporated!!+related work Chengsong parents: 622 diff changeset	624
580 e0f0a81f907b halfway chap3 Chengsong parents: 579 diff changeset	625	The bitcodes extracted by $\bmkeps$ need to be
638 dd9dde2d902b comments till chap4 Chengsong parents: 624 diff changeset	626	$\decode$d (with the guidance of a ``plain'' regular expression):
580 e0f0a81f907b halfway chap3 Chengsong parents: 579 diff changeset	627	%\begin{definition}[Bitdecoding of Values]\mbox{}
e0f0a81f907b halfway chap3 Chengsong parents: 579 diff changeset	628	\begin{center}
e0f0a81f907b halfway chap3 Chengsong parents: 579 diff changeset	629	\begin{tabular}{@{}l@{\hspace{1mm}}c@{\hspace{1mm}}l@{}}
e0f0a81f907b halfway chap3 Chengsong parents: 579 diff changeset	630	$\textit{decode}'\,bs\,(\ONE)$ & $\dn$ & $(\Empty, bs)$\\
e0f0a81f907b halfway chap3 Chengsong parents: 579 diff changeset	631	$\textit{decode}'\,bs\,(c)$ & $\dn$ & $(\Char\,c, bs)$\\
e0f0a81f907b halfway chap3 Chengsong parents: 579 diff changeset	632	$\textit{decode}'\,(Z\!::\!bs)\;(r_1 + r_2)$ & $\dn$ &
e0f0a81f907b halfway chap3 Chengsong parents: 579 diff changeset	633	$\textit{let}\,(v, bs_1) = \textit{decode}'\,bs\,r_1\;\textit{in}\;
e0f0a81f907b halfway chap3 Chengsong parents: 579 diff changeset	634	(\Left\,v, bs_1)$\\
e0f0a81f907b halfway chap3 Chengsong parents: 579 diff changeset	635	$\textit{decode}'\,(S\!::\!bs)\;(r_1 + r_2)$ & $\dn$ &
e0f0a81f907b halfway chap3 Chengsong parents: 579 diff changeset	636	$\textit{let}\,(v, bs_1) = \textit{decode}'\,bs\,r_2\;\textit{in}\;
e0f0a81f907b halfway chap3 Chengsong parents: 579 diff changeset	637	(\Right\,v, bs_1)$\\
e0f0a81f907b halfway chap3 Chengsong parents: 579 diff changeset	638	$\textit{decode}'\,bs\;(r_1\cdot r_2)$ & $\dn$ &
e0f0a81f907b halfway chap3 Chengsong parents: 579 diff changeset	639	$\textit{let}\,(v_1, bs_1) = \textit{decode}'\,bs\,r_1\;\textit{in}$\\
e0f0a81f907b halfway chap3 Chengsong parents: 579 diff changeset	640	& & $\textit{let}\,(v_2, bs_2) = \textit{decode}'\,bs_1\,r_2$\\
e0f0a81f907b halfway chap3 Chengsong parents: 579 diff changeset	641	& & \hspace{35mm}$\textit{in}\;(\Seq\,v_1\,v_2, bs_2)$\\
585 4969ef817d92 chap4 more Chengsong parents: 582 diff changeset	642	$\textit{decode}'\,(S\!::\!bs)\,(r^*)$ & $\dn$ & $(\Stars\,[], bs)$\\
4969ef817d92 chap4 more Chengsong parents: 582 diff changeset	643	$\textit{decode}'\,(Z\!::\!bs)\,(r^*)$ & $\dn$ &
580 e0f0a81f907b halfway chap3 Chengsong parents: 579 diff changeset	644	$\textit{let}\,(v, bs_1) = \textit{decode}'\,bs\,r\;\textit{in}$\\
e0f0a81f907b halfway chap3 Chengsong parents: 579 diff changeset	645	& & $\textit{let}\,(\Stars\,vs, bs_2) = \textit{decode}'\,bs_1\,r^*$\\
e0f0a81f907b halfway chap3 Chengsong parents: 579 diff changeset	646	& & \hspace{35mm}$\textit{in}\;(\Stars\,v\!::\!vs, bs_2)$\bigskip\\
e0f0a81f907b halfway chap3 Chengsong parents: 579 diff changeset	647
e0f0a81f907b halfway chap3 Chengsong parents: 579 diff changeset	648	$\textit{decode}\,bs\,r$ & $\dn$ &
e0f0a81f907b halfway chap3 Chengsong parents: 579 diff changeset	649	$\textit{let}\,(v, bs') = \textit{decode}'\,bs\,r\;\textit{in}$\\
e0f0a81f907b halfway chap3 Chengsong parents: 579 diff changeset	650	& & $\textit{if}\;bs' = []\;\textit{then}\;\textit{Some}\,v\;
e0f0a81f907b halfway chap3 Chengsong parents: 579 diff changeset	651	\textit{else}\;\textit{None}$
e0f0a81f907b halfway chap3 Chengsong parents: 579 diff changeset	652	\end{tabular}
e0f0a81f907b halfway chap3 Chengsong parents: 579 diff changeset	653	\end{center}
537 50e590823220 more Chengsong parents: 536 diff changeset	654	\noindent
580 e0f0a81f907b halfway chap3 Chengsong parents: 579 diff changeset	655	The function $\decode'$ returns a pair consisting of
624 8ffa28fce271 all comments incorporated!!+related work Chengsong parents: 622 diff changeset	656	a partially decoded value and some leftover bit-list.
580 e0f0a81f907b halfway chap3 Chengsong parents: 579 diff changeset	657	The function $\decode'$ succeeds if the left-over
e0f0a81f907b halfway chap3 Chengsong parents: 579 diff changeset	658	bit-sequence is empty.
624 8ffa28fce271 all comments incorporated!!+related work Chengsong parents: 622 diff changeset	659
8ffa28fce271 all comments incorporated!!+related work Chengsong parents: 622 diff changeset	660	%$\decode$ is terminating as $\decode'$ is terminating.
8ffa28fce271 all comments incorporated!!+related work Chengsong parents: 622 diff changeset	661	%$\decode'$ is terminating
8ffa28fce271 all comments incorporated!!+related work Chengsong parents: 622 diff changeset	662	%because at least one of $\decode'$'s parameters will go down in terms
8ffa28fce271 all comments incorporated!!+related work Chengsong parents: 622 diff changeset	663	%of size.\\
580 e0f0a81f907b halfway chap3 Chengsong parents: 579 diff changeset	664	The reverse operation of $\decode$ is $\code$.
624 8ffa28fce271 all comments incorporated!!+related work Chengsong parents: 622 diff changeset	665	This function encodes a value into a bitcode by converting
580 e0f0a81f907b halfway chap3 Chengsong parents: 579 diff changeset	666	$\Left$ into $Z$, $\Right$ into $S$, and marks the start of any non-empty
640 bd1354127574 more proofreading done, last version before submission Chengsong parents: 639 diff changeset	667	star iteration by $S$. $ Z$ marks the border where a star iteration
bd1354127574 more proofreading done, last version before submission Chengsong parents: 639 diff changeset	668	terminates.
580 e0f0a81f907b halfway chap3 Chengsong parents: 579 diff changeset	669	This coding is lossy, as it throws away the information about
e0f0a81f907b halfway chap3 Chengsong parents: 579 diff changeset	670	characters, and does not encode the ``boundary'' between two
e0f0a81f907b halfway chap3 Chengsong parents: 579 diff changeset	671	sequence values. Moreover, with only the bitcode we cannot even tell
624 8ffa28fce271 all comments incorporated!!+related work Chengsong parents: 622 diff changeset	672	whether the $S$s and $Z$s are for $\Left/\Right$ or $\Stars$,
640 bd1354127574 more proofreading done, last version before submission Chengsong parents: 639 diff changeset	673	but this will not be necessary for our correctness proof.
580 e0f0a81f907b halfway chap3 Chengsong parents: 579 diff changeset	674	\begin{center}
e0f0a81f907b halfway chap3 Chengsong parents: 579 diff changeset	675	\begin{tabular}{lcl}
e0f0a81f907b halfway chap3 Chengsong parents: 579 diff changeset	676	$\textit{code}(\Empty)$ & $\dn$ & $[]$\\
e0f0a81f907b halfway chap3 Chengsong parents: 579 diff changeset	677	$\textit{code}(\Char\,c)$ & $\dn$ & $[]$\\
e0f0a81f907b halfway chap3 Chengsong parents: 579 diff changeset	678	$\textit{code}(\Left\,v)$ & $\dn$ & $Z :: code(v)$\\
e0f0a81f907b halfway chap3 Chengsong parents: 579 diff changeset	679	$\textit{code}(\Right\,v)$ & $\dn$ & $S :: code(v)$\\
e0f0a81f907b halfway chap3 Chengsong parents: 579 diff changeset	680	$\textit{code}(\Seq\,v_1\,v_2)$ & $\dn$ & $code(v_1) \,@\, code(v_2)$\\
e0f0a81f907b halfway chap3 Chengsong parents: 579 diff changeset	681	$\textit{code}(\Stars\,[])$ & $\dn$ & $[Z]$\\
e0f0a81f907b halfway chap3 Chengsong parents: 579 diff changeset	682	$\textit{code}(\Stars\,(v\!::\!vs))$ & $\dn$ & $S :: code(v) \;@\;
e0f0a81f907b halfway chap3 Chengsong parents: 579 diff changeset	683	code(\Stars\,vs)$
e0f0a81f907b halfway chap3 Chengsong parents: 579 diff changeset	684	\end{tabular}
e0f0a81f907b halfway chap3 Chengsong parents: 579 diff changeset	685	\end{center}
e0f0a81f907b halfway chap3 Chengsong parents: 579 diff changeset	686	\noindent
624 8ffa28fce271 all comments incorporated!!+related work Chengsong parents: 622 diff changeset	687	We can prove that given a value $v$ and regular expression $r$
580 e0f0a81f907b halfway chap3 Chengsong parents: 579 diff changeset	688	with $\vdash v:r$,
e0f0a81f907b halfway chap3 Chengsong parents: 579 diff changeset	689	then we have the property that $\decode$ and $\code$ are
e0f0a81f907b halfway chap3 Chengsong parents: 579 diff changeset	690	reverse operations of one another:
e0f0a81f907b halfway chap3 Chengsong parents: 579 diff changeset	691	\begin{lemma}
e0f0a81f907b halfway chap3 Chengsong parents: 579 diff changeset	692	\[If \vdash v : r \; then \;\decode \; (\code \; v) \; r = \textit{Some}(v) \]
e0f0a81f907b halfway chap3 Chengsong parents: 579 diff changeset	693	\end{lemma}
e0f0a81f907b halfway chap3 Chengsong parents: 579 diff changeset	694	\begin{proof}
e0f0a81f907b halfway chap3 Chengsong parents: 579 diff changeset	695	By proving a more general version of the lemma, on $\decode'$:
e0f0a81f907b halfway chap3 Chengsong parents: 579 diff changeset	696	\[\vdash v : r \implies \decode' \; ((\code \; v) @ ds) \; r = (v, ds) \]
640 bd1354127574 more proofreading done, last version before submission Chengsong parents: 639 diff changeset	697	Then we obtain the property by setting $ds$ to be $[]$ and unfolding the $\decode$ definition.
580 e0f0a81f907b halfway chap3 Chengsong parents: 579 diff changeset	698	\end{proof}
e0f0a81f907b halfway chap3 Chengsong parents: 579 diff changeset	699	\noindent
638 dd9dde2d902b comments till chap4 Chengsong parents: 624 diff changeset	700	Now we define the central part of Sulzmann and Lu's second lexing algorithm,
624 8ffa28fce271 all comments incorporated!!+related work Chengsong parents: 622 diff changeset	701	the $\bder$ function (which stands for \emph{b}itcoded-derivative):
580 e0f0a81f907b halfway chap3 Chengsong parents: 579 diff changeset	702	\begin{center}
e0f0a81f907b halfway chap3 Chengsong parents: 579 diff changeset	703	\begin{tabular}{@{}lcl@{}}
e0f0a81f907b halfway chap3 Chengsong parents: 579 diff changeset	704	$(\ZERO)\,\backslash c$ & $\dn$ & $\ZERO$\\
e0f0a81f907b halfway chap3 Chengsong parents: 579 diff changeset	705	$(_{bs}\ONE)\,\backslash c$ & $\dn$ & $\ZERO$\\
e0f0a81f907b halfway chap3 Chengsong parents: 579 diff changeset	706	$(_{bs}{\bf d})\,\backslash c$ & $\dn$ &
e0f0a81f907b halfway chap3 Chengsong parents: 579 diff changeset	707	$\textit{if}\;c=d\; \;\textit{then}\;
e0f0a81f907b halfway chap3 Chengsong parents: 579 diff changeset	708	_{bs}\ONE\;\textit{else}\;\ZERO$\\
e0f0a81f907b halfway chap3 Chengsong parents: 579 diff changeset	709	$(_{bs}\sum \;\textit{as})\,\backslash c$ & $\dn$ &
e0f0a81f907b halfway chap3 Chengsong parents: 579 diff changeset	710	$_{bs}\sum\;(\textit{map} \; (\_\backslash c) \; as )$\\
e0f0a81f907b halfway chap3 Chengsong parents: 579 diff changeset	711	$(_{bs}\;a_1\cdot a_2)\,\backslash c$ & $\dn$ &
e0f0a81f907b halfway chap3 Chengsong parents: 579 diff changeset	712	$\textit{if}\;\textit{bnullable}\,a_1$\\
e0f0a81f907b halfway chap3 Chengsong parents: 579 diff changeset	713	& &$\textit{then}\;_{bs}\sum\,[(_{[]}\,(a_1\,\backslash c)\cdot\,a_2),$\\
e0f0a81f907b halfway chap3 Chengsong parents: 579 diff changeset	714	& &$\phantom{\textit{then},\;_{bs}\sum\,}(\textit{fuse}\,(\textit{bmkeps}\,a_1)\,(a_2\,\backslash c))]$\\
e0f0a81f907b halfway chap3 Chengsong parents: 579 diff changeset	715	& &$\textit{else}\;_{bs}\,(a_1\,\backslash c)\cdot a_2$\\
e0f0a81f907b halfway chap3 Chengsong parents: 579 diff changeset	716	$(_{bs}a^*)\,\backslash c$ & $\dn$ &
e0f0a81f907b halfway chap3 Chengsong parents: 579 diff changeset	717	$_{bs}(\textit{fuse}\, [Z] \; r\,\backslash c)\cdot
e0f0a81f907b halfway chap3 Chengsong parents: 579 diff changeset	718	(_{[]}r^*))$
e0f0a81f907b halfway chap3 Chengsong parents: 579 diff changeset	719	\end{tabular}
e0f0a81f907b halfway chap3 Chengsong parents: 579 diff changeset	720	\end{center}
e0f0a81f907b halfway chap3 Chengsong parents: 579 diff changeset	721	\noindent
639 80cc6dc4c98b until chap 7 Chengsong parents: 638 diff changeset	722	For $\bder \; c\; a$, we use the infix notation $a\backslash c$ for readability.
624 8ffa28fce271 all comments incorporated!!+related work Chengsong parents: 622 diff changeset	723	The $\bder$ function tells us how regular expressions can be recursively traversed,
580 e0f0a81f907b halfway chap3 Chengsong parents: 579 diff changeset	724	where the bitcodes are augmented and carried around
e0f0a81f907b halfway chap3 Chengsong parents: 579 diff changeset	725	when a derivative is taken.
e0f0a81f907b halfway chap3 Chengsong parents: 579 diff changeset	726	We give the intuition behind some of the more involved cases in
638 dd9dde2d902b comments till chap4 Chengsong parents: 624 diff changeset	727	$\bder$.
dd9dde2d902b comments till chap4 Chengsong parents: 624 diff changeset	728
580 e0f0a81f907b halfway chap3 Chengsong parents: 579 diff changeset	729	For example,
e0f0a81f907b halfway chap3 Chengsong parents: 579 diff changeset	730	in the \emph{star} case,
624 8ffa28fce271 all comments incorporated!!+related work Chengsong parents: 622 diff changeset	731	a derivative of $_{bs}a^*$ means
8ffa28fce271 all comments incorporated!!+related work Chengsong parents: 622 diff changeset	732	that one more star iteration needs to be taken.
8ffa28fce271 all comments incorporated!!+related work Chengsong parents: 622 diff changeset	733	We therefore need to unfold it into a sequence,
8ffa28fce271 all comments incorporated!!+related work Chengsong parents: 622 diff changeset	734	and attach an additional bit $Z$ to the front of $a \backslash c$
580 e0f0a81f907b halfway chap3 Chengsong parents: 579 diff changeset	735	as a record to indicate one new star iteration is unfolded.
e0f0a81f907b halfway chap3 Chengsong parents: 579 diff changeset	736
e0f0a81f907b halfway chap3 Chengsong parents: 579 diff changeset	737	\noindent
e0f0a81f907b halfway chap3 Chengsong parents: 579 diff changeset	738	\begin{center}
e0f0a81f907b halfway chap3 Chengsong parents: 579 diff changeset	739	\begin{tabular}{@{}lcl@{}}
e0f0a81f907b halfway chap3 Chengsong parents: 579 diff changeset	740	$(_{bs}a^*)\,\backslash c$ & $\dn$ &
e0f0a81f907b halfway chap3 Chengsong parents: 579 diff changeset	741	$_{bs}(\underbrace{\textit{fuse}\, [Z] \; a\,\backslash c}_{\text{One more iteration}})\cdot
e0f0a81f907b halfway chap3 Chengsong parents: 579 diff changeset	742	(_{[]}a^*))$
e0f0a81f907b halfway chap3 Chengsong parents: 579 diff changeset	743	\end{tabular}
e0f0a81f907b halfway chap3 Chengsong parents: 579 diff changeset	744	\end{center}
e0f0a81f907b halfway chap3 Chengsong parents: 579 diff changeset	745
e0f0a81f907b halfway chap3 Chengsong parents: 579 diff changeset	746	\noindent
e0f0a81f907b halfway chap3 Chengsong parents: 579 diff changeset	747	This information will be recovered later by the $\decode$ function.
624 8ffa28fce271 all comments incorporated!!+related work Chengsong parents: 622 diff changeset	748	%The intuition is that the bit $Z$ will be decoded at the right location,
8ffa28fce271 all comments incorporated!!+related work Chengsong parents: 622 diff changeset	749	%because we accumulate bits from left to right (a rigorous proof will be given
8ffa28fce271 all comments incorporated!!+related work Chengsong parents: 622 diff changeset	750	%later).
580 e0f0a81f907b halfway chap3 Chengsong parents: 579 diff changeset	751
e0f0a81f907b halfway chap3 Chengsong parents: 579 diff changeset	752	%\begin{tikzpicture}[ > = stealth, % arrow head style
e0f0a81f907b halfway chap3 Chengsong parents: 579 diff changeset	753	% shorten > = 1pt, % don't touch arrow head to node
e0f0a81f907b halfway chap3 Chengsong parents: 579 diff changeset	754	% semithick % line style
e0f0a81f907b halfway chap3 Chengsong parents: 579 diff changeset	755	% ]
e0f0a81f907b halfway chap3 Chengsong parents: 579 diff changeset	756	%
e0f0a81f907b halfway chap3 Chengsong parents: 579 diff changeset	757	% \tikzstyle{every state}=[
e0f0a81f907b halfway chap3 Chengsong parents: 579 diff changeset	758	% draw = black,
e0f0a81f907b halfway chap3 Chengsong parents: 579 diff changeset	759	% thin,
e0f0a81f907b halfway chap3 Chengsong parents: 579 diff changeset	760	% fill = cyan!29,
e0f0a81f907b halfway chap3 Chengsong parents: 579 diff changeset	761	% minimum size = 7mm
e0f0a81f907b halfway chap3 Chengsong parents: 579 diff changeset	762	% ]
e0f0a81f907b halfway chap3 Chengsong parents: 579 diff changeset	763	% \begin{scope}[node distance=1cm and 0cm, every node/.style=state]
e0f0a81f907b halfway chap3 Chengsong parents: 579 diff changeset	764	% \node (k) [rectangle split, rectangle split horizontal, rectangle split parts=2, rectangle split part fill={red!30,blue!20},]
e0f0a81f907b halfway chap3 Chengsong parents: 579 diff changeset	765	% {$bs$
e0f0a81f907b halfway chap3 Chengsong parents: 579 diff changeset	766	% \nodepart{two} $a^*$ };
e0f0a81f907b halfway chap3 Chengsong parents: 579 diff changeset	767	% \node (l) [below =of k, rectangle split, rectangle split horizontal, rectangle split parts=2, rectangle split part fill={red!30,blue!20},]
e0f0a81f907b halfway chap3 Chengsong parents: 579 diff changeset	768	% { $bs$ + [Z]
e0f0a81f907b halfway chap3 Chengsong parents: 579 diff changeset	769	% \nodepart{two} $(a\backslash c )\cdot a^*$ };
e0f0a81f907b halfway chap3 Chengsong parents: 579 diff changeset	770	% \end{scope}
e0f0a81f907b halfway chap3 Chengsong parents: 579 diff changeset	771	% \path[->]
e0f0a81f907b halfway chap3 Chengsong parents: 579 diff changeset	772	% (k) edge (l);
e0f0a81f907b halfway chap3 Chengsong parents: 579 diff changeset	773	%\end{tikzpicture}
e0f0a81f907b halfway chap3 Chengsong parents: 579 diff changeset	774	%
e0f0a81f907b halfway chap3 Chengsong parents: 579 diff changeset	775	%Pictorially the process looks like below.
e0f0a81f907b halfway chap3 Chengsong parents: 579 diff changeset	776	%Like before, the red region denotes
e0f0a81f907b halfway chap3 Chengsong parents: 579 diff changeset	777	%previous lexing information (stored as bitcodes in $bs$).
e0f0a81f907b halfway chap3 Chengsong parents: 579 diff changeset	778
e0f0a81f907b halfway chap3 Chengsong parents: 579 diff changeset	779	%\begin{tikzpicture}[every node/.append style={draw, rounded corners, inner sep=10pt}]
e0f0a81f907b halfway chap3 Chengsong parents: 579 diff changeset	780	% \begin{scope}[node distance=1cm]
e0f0a81f907b halfway chap3 Chengsong parents: 579 diff changeset	781	% \node (a) [rectangle split, rectangle split horizontal, rectangle split parts=2, rectangle split part fill={red!30,blue!20},]
e0f0a81f907b halfway chap3 Chengsong parents: 579 diff changeset	782	% {$bs$
e0f0a81f907b halfway chap3 Chengsong parents: 579 diff changeset	783	% \nodepart{two} $a^*$ };
e0f0a81f907b halfway chap3 Chengsong parents: 579 diff changeset	784	% \node (b) [rectangle split, rectangle split horizontal, rectangle split parts=2, rectangle split part fill={red!30,blue!20},]
e0f0a81f907b halfway chap3 Chengsong parents: 579 diff changeset	785	% { $bs$ + [Z]
e0f0a81f907b halfway chap3 Chengsong parents: 579 diff changeset	786	% \nodepart{two} $(a\backslash c )\cdot a^*$ };
e0f0a81f907b halfway chap3 Chengsong parents: 579 diff changeset	787	%%\caption{term 1 \ref{term:1}'s matching configuration}
e0f0a81f907b halfway chap3 Chengsong parents: 579 diff changeset	788	% \end{scope}
e0f0a81f907b halfway chap3 Chengsong parents: 579 diff changeset	789	%\end{tikzpicture}
e0f0a81f907b halfway chap3 Chengsong parents: 579 diff changeset	790
e0f0a81f907b halfway chap3 Chengsong parents: 579 diff changeset	791	\noindent
e0f0a81f907b halfway chap3 Chengsong parents: 579 diff changeset	792	Another place the $\bder$ function differs
624 8ffa28fce271 all comments incorporated!!+related work Chengsong parents: 622 diff changeset	793	from derivatives on regular expressions
580 e0f0a81f907b halfway chap3 Chengsong parents: 579 diff changeset	794	is the sequence case:
e0f0a81f907b halfway chap3 Chengsong parents: 579 diff changeset	795	\begin{center}
e0f0a81f907b halfway chap3 Chengsong parents: 579 diff changeset	796	\begin{tabular}{@{}lcl@{}}
e0f0a81f907b halfway chap3 Chengsong parents: 579 diff changeset	797
e0f0a81f907b halfway chap3 Chengsong parents: 579 diff changeset	798	$(_{bs}\;a_1\cdot a_2)\,\backslash c$ & $\dn$ &
e0f0a81f907b halfway chap3 Chengsong parents: 579 diff changeset	799	$\textit{if}\;\textit{bnullable}\,a_1$\\
e0f0a81f907b halfway chap3 Chengsong parents: 579 diff changeset	800	& &$\textit{then}\;_{bs}\sum\,[(_{[]}\,(a_1\,\backslash c)\cdot\,a_2),$\\
e0f0a81f907b halfway chap3 Chengsong parents: 579 diff changeset	801	& &$\phantom{\textit{then},\;_{bs}\sum\,}(\textit{fuse}\,(\textit{bmkeps}\,a_1)\,(a_2\,\backslash c))]$\\
e0f0a81f907b halfway chap3 Chengsong parents: 579 diff changeset	802	& &$\textit{else}\;_{bs}\,(a_1\,\backslash c)\cdot a_2$
e0f0a81f907b halfway chap3 Chengsong parents: 579 diff changeset	803	\end{tabular}
e0f0a81f907b halfway chap3 Chengsong parents: 579 diff changeset	804	\end{center}
e0f0a81f907b halfway chap3 Chengsong parents: 579 diff changeset	805	\noindent
624 8ffa28fce271 all comments incorporated!!+related work Chengsong parents: 622 diff changeset	806	%We assume that $\bmkeps$
8ffa28fce271 all comments incorporated!!+related work Chengsong parents: 622 diff changeset	807	%correctly extracts the bitcode for how $a_1$
8ffa28fce271 all comments incorporated!!+related work Chengsong parents: 622 diff changeset	808	%matches the string prior to $c$ (more on this later).
8ffa28fce271 all comments incorporated!!+related work Chengsong parents: 622 diff changeset	809	%The bitsequence $\textit{bs}$ which was initially
8ffa28fce271 all comments incorporated!!+related work Chengsong parents: 622 diff changeset	810	%attached to the first element of the sequence
8ffa28fce271 all comments incorporated!!+related work Chengsong parents: 622 diff changeset	811	%$a_1 \cdot a_2$, has now been
8ffa28fce271 all comments incorporated!!+related work Chengsong parents: 622 diff changeset	812	%elevated to the top level of the $\sum$ constructor.
8ffa28fce271 all comments incorporated!!+related work Chengsong parents: 622 diff changeset	813	%This is because this piece of information will be needed
8ffa28fce271 all comments incorporated!!+related work Chengsong parents: 622 diff changeset	814	%whichever way the sequence is matched,
8ffa28fce271 all comments incorporated!!+related work Chengsong parents: 622 diff changeset	815	%regardless of whether $c$ belongs to $a_1$ or $a_2$.
8ffa28fce271 all comments incorporated!!+related work Chengsong parents: 622 diff changeset	816	The difference mainly lies in the $\textit{if}$ branch (when $a_1$ is $\bnullable$):
580 e0f0a81f907b halfway chap3 Chengsong parents: 579 diff changeset	817	we use $\bmkeps$ to store the lexing information
e0f0a81f907b halfway chap3 Chengsong parents: 579 diff changeset	818	in $a_1$ before collapsing
e0f0a81f907b halfway chap3 Chengsong parents: 579 diff changeset	819	it (as it has been fully matched by string prior to $c$),
e0f0a81f907b halfway chap3 Chengsong parents: 579 diff changeset	820	and attach the collected bit-codes to the front of $a_2$
624 8ffa28fce271 all comments incorporated!!+related work Chengsong parents: 622 diff changeset	821	before throwing away $a_1$.
580 e0f0a81f907b halfway chap3 Chengsong parents: 579 diff changeset	822	In the injection-based lexer, $r_1$ is immediately thrown away in
638 dd9dde2d902b comments till chap4 Chengsong parents: 624 diff changeset	823	the $\textit{if}$ branch,
624 8ffa28fce271 all comments incorporated!!+related work Chengsong parents: 622 diff changeset	824	the information $r_1$ stores is therefore lost:
580 e0f0a81f907b halfway chap3 Chengsong parents: 579 diff changeset	825	\begin{center}
624 8ffa28fce271 all comments incorporated!!+related work Chengsong parents: 622 diff changeset	826	\begin{tabular}{lcl}
8ffa28fce271 all comments incorporated!!+related work Chengsong parents: 622 diff changeset	827	$(r_1 \cdot r_2)\backslash c$ & $\dn$ & $\mathit{if} \, [] \in L(r_1)$\\
8ffa28fce271 all comments incorporated!!+related work Chengsong parents: 622 diff changeset	828	& & $\mathit{then}\;(r_1\backslash c) \cdot r_2 \,+\, r_2\backslash c$\\
8ffa28fce271 all comments incorporated!!+related work Chengsong parents: 622 diff changeset	829	& & $\mathit{else}\; \ldots$\\
8ffa28fce271 all comments incorporated!!+related work Chengsong parents: 622 diff changeset	830	\end{tabular}
580 e0f0a81f907b halfway chap3 Chengsong parents: 579 diff changeset	831	\end{center}
e0f0a81f907b halfway chap3 Chengsong parents: 579 diff changeset	832	\noindent
624 8ffa28fce271 all comments incorporated!!+related work Chengsong parents: 622 diff changeset	833
8ffa28fce271 all comments incorporated!!+related work Chengsong parents: 622 diff changeset	834	%\noindent
8ffa28fce271 all comments incorporated!!+related work Chengsong parents: 622 diff changeset	835	%this loss of information makes it necessary
8ffa28fce271 all comments incorporated!!+related work Chengsong parents: 622 diff changeset	836	%to store on stack all the regular expressions'
8ffa28fce271 all comments incorporated!!+related work Chengsong parents: 622 diff changeset	837	%``snapshots'' before they were
8ffa28fce271 all comments incorporated!!+related work Chengsong parents: 622 diff changeset	838	%taken derivative of.
8ffa28fce271 all comments incorporated!!+related work Chengsong parents: 622 diff changeset	839	%So that the related information will be available
8ffa28fce271 all comments incorporated!!+related work Chengsong parents: 622 diff changeset	840	%once the child recursive
8ffa28fce271 all comments incorporated!!+related work Chengsong parents: 622 diff changeset	841	%call finishes.\\
580 e0f0a81f907b halfway chap3 Chengsong parents: 579 diff changeset	842	The rest of the clauses of $\bder$ is rather similar to
e0f0a81f907b halfway chap3 Chengsong parents: 579 diff changeset	843	$\der$. \\
e0f0a81f907b halfway chap3 Chengsong parents: 579 diff changeset	844	Generalising the derivative operation with bitcodes to strings, we have
e0f0a81f907b halfway chap3 Chengsong parents: 579 diff changeset	845	\begin{center}
e0f0a81f907b halfway chap3 Chengsong parents: 579 diff changeset	846	\begin{tabular}{@{}lcl@{}}
e0f0a81f907b halfway chap3 Chengsong parents: 579 diff changeset	847	$a\backslash_s [] $ & $\dn$ & $a$\\
e0f0a81f907b halfway chap3 Chengsong parents: 579 diff changeset	848	$a\backslash (c :: s) $ & $\dn$ & $(a \backslash c) \backslash_s s$
e0f0a81f907b halfway chap3 Chengsong parents: 579 diff changeset	849	\end{tabular}
e0f0a81f907b halfway chap3 Chengsong parents: 579 diff changeset	850	\end{center}
e0f0a81f907b halfway chap3 Chengsong parents: 579 diff changeset	851	\noindent
e0f0a81f907b halfway chap3 Chengsong parents: 579 diff changeset	852	As we did earlier, we omit the $s$ subscript at $\backslash_s$ when there is no danger
e0f0a81f907b halfway chap3 Chengsong parents: 579 diff changeset	853	of confusion.
e0f0a81f907b halfway chap3 Chengsong parents: 579 diff changeset	854	\subsection{Putting Things Together}
e0f0a81f907b halfway chap3 Chengsong parents: 579 diff changeset	855	Putting these operations altogether, we obtain a lexer with bit-coded regular expressions
e0f0a81f907b halfway chap3 Chengsong parents: 579 diff changeset	856	as its internal data structures, which we call $\blexer$:
e0f0a81f907b halfway chap3 Chengsong parents: 579 diff changeset	857
e0f0a81f907b halfway chap3 Chengsong parents: 579 diff changeset	858	\begin{center}
e0f0a81f907b halfway chap3 Chengsong parents: 579 diff changeset	859	\begin{tabular}{lcl}
e0f0a81f907b halfway chap3 Chengsong parents: 579 diff changeset	860	$\textit{blexer}\;r\,s$ & $\dn$ &
e0f0a81f907b halfway chap3 Chengsong parents: 579 diff changeset	861	$\textit{let}\;a = (r^\uparrow)\backslash s\;\textit{in}$\\
e0f0a81f907b halfway chap3 Chengsong parents: 579 diff changeset	862	& & $\;\;\textit{if}\; \textit{bnullable}(a)$\\
e0f0a81f907b halfway chap3 Chengsong parents: 579 diff changeset	863	& & $\;\;\textit{then}\;\textit{decode}\,(\textit{bmkeps}\,a)\,r$\\
e0f0a81f907b halfway chap3 Chengsong parents: 579 diff changeset	864	& & $\;\;\textit{else}\;\textit{None}$
e0f0a81f907b halfway chap3 Chengsong parents: 579 diff changeset	865	\end{tabular}
e0f0a81f907b halfway chap3 Chengsong parents: 579 diff changeset	866	\end{center}
e0f0a81f907b halfway chap3 Chengsong parents: 579 diff changeset	867	\noindent
624 8ffa28fce271 all comments incorporated!!+related work Chengsong parents: 622 diff changeset	868	This function first attaches bitcodes to a
8ffa28fce271 all comments incorporated!!+related work Chengsong parents: 622 diff changeset	869	plain regular expression $r$, and then builds successive derivatives
581 9db2500629be chap3 almost done Chengsong parents: 580 diff changeset	870	with respect to the input string $s$, and
624 8ffa28fce271 all comments incorporated!!+related work Chengsong parents: 622 diff changeset	871	then test whether the result is (b)nullable.
8ffa28fce271 all comments incorporated!!+related work Chengsong parents: 622 diff changeset	872	If yes, then extract the bitcodes from the nullable expression,
581 9db2500629be chap3 almost done Chengsong parents: 580 diff changeset	873	and decodes the bitcodes into a lexical value.
640 bd1354127574 more proofreading done, last version before submission Chengsong parents: 639 diff changeset	874	If there does not exist a match between $r$ and $s$, the lexer
638 dd9dde2d902b comments till chap4 Chengsong parents: 624 diff changeset	875	outputs $\None$ indicating a mismatch.
580 e0f0a81f907b halfway chap3 Chengsong parents: 579 diff changeset	876	Ausaf and Urban formally proved the correctness of the $\blexer$, namely
e0f0a81f907b halfway chap3 Chengsong parents: 579 diff changeset	877	\begin{property}
e0f0a81f907b halfway chap3 Chengsong parents: 579 diff changeset	878	$\blexer \;r \; s = \lexer \; r \; s$.
e0f0a81f907b halfway chap3 Chengsong parents: 579 diff changeset	879	\end{property}
581 9db2500629be chap3 almost done Chengsong parents: 580 diff changeset	880	\noindent
580 e0f0a81f907b halfway chap3 Chengsong parents: 579 diff changeset	881	This was claimed but not formalised in Sulzmann and Lu's work.
e0f0a81f907b halfway chap3 Chengsong parents: 579 diff changeset	882	We introduce the proof later, after we give all
e0f0a81f907b halfway chap3 Chengsong parents: 579 diff changeset	883	the needed auxiliary functions and definitions.
581 9db2500629be chap3 almost done Chengsong parents: 580 diff changeset	884	\subsection{An Example $\blexer$ Run}
624 8ffa28fce271 all comments incorporated!!+related work Chengsong parents: 622 diff changeset	885	Before introducing the proof we shall first walk
580 e0f0a81f907b halfway chap3 Chengsong parents: 579 diff changeset	886	through a concrete example of our $\blexer$ calculating the right
638 dd9dde2d902b comments till chap4 Chengsong parents: 624 diff changeset	887	lexical information through bit-coded regular expressions.
dd9dde2d902b comments till chap4 Chengsong parents: 624 diff changeset	888
581 9db2500629be chap3 almost done Chengsong parents: 580 diff changeset	889	Consider the regular expression $(aa)^* \cdot (b+c)$ matching the string $aab$.
9db2500629be chap3 almost done Chengsong parents: 580 diff changeset	890	We give again the bird's eye view of this particular example
9db2500629be chap3 almost done Chengsong parents: 580 diff changeset	891	in each stage of the algorithm:
9db2500629be chap3 almost done Chengsong parents: 580 diff changeset	892
9db2500629be chap3 almost done Chengsong parents: 580 diff changeset	893	\tikzset{three sided/.style={
9db2500629be chap3 almost done Chengsong parents: 580 diff changeset	894	draw=none,
9db2500629be chap3 almost done Chengsong parents: 580 diff changeset	895	append after command={
9db2500629be chap3 almost done Chengsong parents: 580 diff changeset	896	[-,shorten <= -0.5\pgflinewidth]
9db2500629be chap3 almost done Chengsong parents: 580 diff changeset	897	([shift={(-1.5\pgflinewidth,-0.5\pgflinewidth)}]\tikzlastnode.north east)
9db2500629be chap3 almost done Chengsong parents: 580 diff changeset	898	edge([shift={( 0.5\pgflinewidth,-0.5\pgflinewidth)}]\tikzlastnode.north west)
9db2500629be chap3 almost done Chengsong parents: 580 diff changeset	899	([shift={( 0.5\pgflinewidth,-0.5\pgflinewidth)}]\tikzlastnode.north west)
9db2500629be chap3 almost done Chengsong parents: 580 diff changeset	900	edge([shift={( 0.5\pgflinewidth,+0.5\pgflinewidth)}]\tikzlastnode.south west)
9db2500629be chap3 almost done Chengsong parents: 580 diff changeset	901	([shift={( 0.5\pgflinewidth,+0.5\pgflinewidth)}]\tikzlastnode.south west)
9db2500629be chap3 almost done Chengsong parents: 580 diff changeset	902	edge([shift={(-1.0\pgflinewidth,+0.5\pgflinewidth)}]\tikzlastnode.south east)
9db2500629be chap3 almost done Chengsong parents: 580 diff changeset	903	}
9db2500629be chap3 almost done Chengsong parents: 580 diff changeset	904	}
9db2500629be chap3 almost done Chengsong parents: 580 diff changeset	905	}
9db2500629be chap3 almost done Chengsong parents: 580 diff changeset	906
9db2500629be chap3 almost done Chengsong parents: 580 diff changeset	907	\tikzset{three sided1/.style={
9db2500629be chap3 almost done Chengsong parents: 580 diff changeset	908	draw=none,
9db2500629be chap3 almost done Chengsong parents: 580 diff changeset	909	append after command={
9db2500629be chap3 almost done Chengsong parents: 580 diff changeset	910	[-,shorten <= -0.5\pgflinewidth]
9db2500629be chap3 almost done Chengsong parents: 580 diff changeset	911	([shift={(1.5\pgflinewidth,-0.5\pgflinewidth)}]\tikzlastnode.north west)
9db2500629be chap3 almost done Chengsong parents: 580 diff changeset	912	edge([shift={(-0.5\pgflinewidth,-0.5\pgflinewidth)}]\tikzlastnode.north east)
9db2500629be chap3 almost done Chengsong parents: 580 diff changeset	913	([shift={(-0.5\pgflinewidth,-0.5\pgflinewidth)}]\tikzlastnode.north east)
9db2500629be chap3 almost done Chengsong parents: 580 diff changeset	914	edge([shift={(-0.5\pgflinewidth,+0.5\pgflinewidth)}]\tikzlastnode.south east)
9db2500629be chap3 almost done Chengsong parents: 580 diff changeset	915	([shift={(-0.5\pgflinewidth,+0.5\pgflinewidth)}]\tikzlastnode.south east)
9db2500629be chap3 almost done Chengsong parents: 580 diff changeset	916	edge([shift={(1.0\pgflinewidth,+0.5\pgflinewidth)}]\tikzlastnode.south west)
9db2500629be chap3 almost done Chengsong parents: 580 diff changeset	917	}
9db2500629be chap3 almost done Chengsong parents: 580 diff changeset	918	}
9db2500629be chap3 almost done Chengsong parents: 580 diff changeset	919	}
9db2500629be chap3 almost done Chengsong parents: 580 diff changeset	920
9db2500629be chap3 almost done Chengsong parents: 580 diff changeset	921	\begin{figure}[H]
9db2500629be chap3 almost done Chengsong parents: 580 diff changeset	922	\begin{tikzpicture}[->, >=stealth', shorten >= 1pt, auto, thick]
9db2500629be chap3 almost done Chengsong parents: 580 diff changeset	923	\node [rectangle, draw] (r) at (-6, -1) {$(aa)^*(b+c)$};
9db2500629be chap3 almost done Chengsong parents: 580 diff changeset	924	\node [rectangle, draw] (a) at (-6, 4) {$(aa)^*(_{Z}b + _{S}c)$};
9db2500629be chap3 almost done Chengsong parents: 580 diff changeset	925	\path (r)
9db2500629be chap3 almost done Chengsong parents: 580 diff changeset	926	edge [] node {$\internalise$} (a);
9db2500629be chap3 almost done Chengsong parents: 580 diff changeset	927	\node [rectangle, draw] (a1) at (-3, 1) {$(_{Z}(\ONE \cdot a) \cdot (aa)^*) (_{Z}b + _Sc)$};
9db2500629be chap3 almost done Chengsong parents: 580 diff changeset	928	\path (a)
9db2500629be chap3 almost done Chengsong parents: 580 diff changeset	929	edge [] node {$\backslash a$} (a1);
9db2500629be chap3 almost done Chengsong parents: 580 diff changeset	930
9db2500629be chap3 almost done Chengsong parents: 580 diff changeset	931	\node [rectangle, draw, three sided] (a21) at (-2.5, 4) {$(_{Z}\ONE \cdot (aa)^*)$};
9db2500629be chap3 almost done Chengsong parents: 580 diff changeset	932	\node [rectangle, draw, three sided1] (a22) at (-0.8, 4) {$(_{Z}b + _{S}c)$};
9db2500629be chap3 almost done Chengsong parents: 580 diff changeset	933	\path (a1)
9db2500629be chap3 almost done Chengsong parents: 580 diff changeset	934	edge [] node {$\backslash a$} (a21);
9db2500629be chap3 almost done Chengsong parents: 580 diff changeset	935	\node [rectangle, draw] (a3) at (0.5, 2) {$_{ZS}(_{Z}\ONE + \ZERO)$};
9db2500629be chap3 almost done Chengsong parents: 580 diff changeset	936	\path (a22)
9db2500629be chap3 almost done Chengsong parents: 580 diff changeset	937	edge [] node {$\backslash b$} (a3);
9db2500629be chap3 almost done Chengsong parents: 580 diff changeset	938	\path (a21)
9db2500629be chap3 almost done Chengsong parents: 580 diff changeset	939	edge [dashed, bend right] node {} (a3);
9db2500629be chap3 almost done Chengsong parents: 580 diff changeset	940	\node [rectangle, draw] (bs) at (2, 4) {$ZSZ$};
9db2500629be chap3 almost done Chengsong parents: 580 diff changeset	941	\path (a3)
9db2500629be chap3 almost done Chengsong parents: 580 diff changeset	942	edge [below] node {$\bmkeps$} (bs);
9db2500629be chap3 almost done Chengsong parents: 580 diff changeset	943	\node [rectangle, draw] (v) at (3, 0) {$\Seq \; (\Stars\; [\Seq \; a \; a]) \; (\Left \; b)$};
9db2500629be chap3 almost done Chengsong parents: 580 diff changeset	944	\path (bs)
9db2500629be chap3 almost done Chengsong parents: 580 diff changeset	945	edge [] node {$\decode$} (v);
9db2500629be chap3 almost done Chengsong parents: 580 diff changeset	946
9db2500629be chap3 almost done Chengsong parents: 580 diff changeset	947
9db2500629be chap3 almost done Chengsong parents: 580 diff changeset	948	\end{tikzpicture}
638 dd9dde2d902b comments till chap4 Chengsong parents: 624 diff changeset	949	\caption{$\blexer$ with the regular expression $(aa)^*(b+c)$ and $aab$}
581 9db2500629be chap3 almost done Chengsong parents: 580 diff changeset	950	\end{figure}
537 50e590823220 more Chengsong parents: 536 diff changeset	951	\noindent
581 9db2500629be chap3 almost done Chengsong parents: 580 diff changeset	952	The one dashed arrow indicates that $_Z(\ONE \cdot (aa)^*)$
9db2500629be chap3 almost done Chengsong parents: 580 diff changeset	953	turned into $ZS$ after derivative w.r.t $b$
638 dd9dde2d902b comments till chap4 Chengsong parents: 624 diff changeset	954	is taken, which calls $\bmkeps$ on the nullable $_Z\ONE\cdot (aa)^*$
dd9dde2d902b comments till chap4 Chengsong parents: 624 diff changeset	955	before processing $_Zb+_Sc$.
dd9dde2d902b comments till chap4 Chengsong parents: 624 diff changeset	956
581 9db2500629be chap3 almost done Chengsong parents: 580 diff changeset	957	The annotated regular expressions
624 8ffa28fce271 all comments incorporated!!+related work Chengsong parents: 622 diff changeset	958	would look overwhelming if we explicitly indicate all the
581 9db2500629be chap3 almost done Chengsong parents: 580 diff changeset	959	locations where bitcodes are attached.
9db2500629be chap3 almost done Chengsong parents: 580 diff changeset	960	For example,
9db2500629be chap3 almost done Chengsong parents: 580 diff changeset	961	$(aa)^*\cdot (b+c)$ would
9db2500629be chap3 almost done Chengsong parents: 580 diff changeset	962	look like $_{[]}(_{[]}(_{[]}a \cdot _{[]}a)^*\cdot _{[]}(_{[]}b+_{[]}c))$
9db2500629be chap3 almost done Chengsong parents: 580 diff changeset	963	after
9db2500629be chap3 almost done Chengsong parents: 580 diff changeset	964	internalise.
9db2500629be chap3 almost done Chengsong parents: 580 diff changeset	965	Therefore for readability we omit bitcodes if they are empty.
624 8ffa28fce271 all comments incorporated!!+related work Chengsong parents: 622 diff changeset	966	This applies to all annotated
638 dd9dde2d902b comments till chap4 Chengsong parents: 624 diff changeset	967	regular expressions in this thesis.
dd9dde2d902b comments till chap4 Chengsong parents: 624 diff changeset	968
581 9db2500629be chap3 almost done Chengsong parents: 580 diff changeset	969	%and assume we have just read the first character $a$:
9db2500629be chap3 almost done Chengsong parents: 580 diff changeset	970	%\begin{center}
9db2500629be chap3 almost done Chengsong parents: 580 diff changeset	971	%\begin{tikzpicture}[every node/.append style={draw, rounded corners, inner sep=10pt}]
9db2500629be chap3 almost done Chengsong parents: 580 diff changeset	972	% \node [rectangle split, rectangle split horizontal, rectangle split parts=2, rectangle split part fill={red!30,blue!20},]
9db2500629be chap3 almost done Chengsong parents: 580 diff changeset	973	% {$(_{[Z]}(\ONE \cdot a) \cdot (aa)^* )\cdot bc$
9db2500629be chap3 almost done Chengsong parents: 580 diff changeset	974	% \nodepart{two} $[Z] \iff \Seq \; (\Stars \; [\Seq\; a \; ??, \;??]) \; ??$};
9db2500629be chap3 almost done Chengsong parents: 580 diff changeset	975	%\end{tikzpicture}
9db2500629be chap3 almost done Chengsong parents: 580 diff changeset	976	%\end{center}
9db2500629be chap3 almost done Chengsong parents: 580 diff changeset	977	%\noindent
9db2500629be chap3 almost done Chengsong parents: 580 diff changeset	978	%We use the red area for (annotated) regular expressions and the blue
9db2500629be chap3 almost done Chengsong parents: 580 diff changeset	979	%area the (partially calculated) bitcodes
9db2500629be chap3 almost done Chengsong parents: 580 diff changeset	980	%and its corresponding (partial) values.
9db2500629be chap3 almost done Chengsong parents: 580 diff changeset	981	%The first derivative
9db2500629be chap3 almost done Chengsong parents: 580 diff changeset	982	%generates a $Z$ bitcode to indicate
9db2500629be chap3 almost done Chengsong parents: 580 diff changeset	983	%a new iteration has been started.
9db2500629be chap3 almost done Chengsong parents: 580 diff changeset	984	%This bitcode is attached to the front of
9db2500629be chap3 almost done Chengsong parents: 580 diff changeset	985	%the unrolled star iteration $\ONE\cdot a$
9db2500629be chap3 almost done Chengsong parents: 580 diff changeset	986	%for later decoding.
9db2500629be chap3 almost done Chengsong parents: 580 diff changeset	987	%\begin{center}
9db2500629be chap3 almost done Chengsong parents: 580 diff changeset	988	%\begin{tikzpicture}[]
9db2500629be chap3 almost done Chengsong parents: 580 diff changeset	989	% \node [rectangle split, rectangle split horizontal,
9db2500629be chap3 almost done Chengsong parents: 580 diff changeset	990	% rectangle split parts=2, rectangle split part fill={red!30,blue!20}, draw, rounded corners, inner sep=10pt]
9db2500629be chap3 almost done Chengsong parents: 580 diff changeset	991	% (der2) at (0,0)
9db2500629be chap3 almost done Chengsong parents: 580 diff changeset	992	% {$(_{[Z]}(\ONE \cdot \ONE) \cdot (aa)^*) \cdot bc $
9db2500629be chap3 almost done Chengsong parents: 580 diff changeset	993	% \nodepart{two} $[Z] \iff \Seq \; (\Stars \; [\Seq \; a\;a, ??]) \; ??$};
9db2500629be chap3 almost done Chengsong parents: 580 diff changeset	994	%
9db2500629be chap3 almost done Chengsong parents: 580 diff changeset	995	%\node [draw=none, minimum size = 0.1, ] (r) at (-7, 0) {$a_1$};
9db2500629be chap3 almost done Chengsong parents: 580 diff changeset	996	%\path
9db2500629be chap3 almost done Chengsong parents: 580 diff changeset	997	% (r)
9db2500629be chap3 almost done Chengsong parents: 580 diff changeset	998	% edge [->, >=stealth',shorten >=1pt, above] node {$\backslash a$} (der2);
9db2500629be chap3 almost done Chengsong parents: 580 diff changeset	999	%%\caption{term 1 \ref{term:1}'s matching configuration}
9db2500629be chap3 almost done Chengsong parents: 580 diff changeset	1000	%\end{tikzpicture}
9db2500629be chap3 almost done Chengsong parents: 580 diff changeset	1001	%\end{center}
9db2500629be chap3 almost done Chengsong parents: 580 diff changeset	1002	%\noindent
9db2500629be chap3 almost done Chengsong parents: 580 diff changeset	1003	%After we take derivative with respect to
9db2500629be chap3 almost done Chengsong parents: 580 diff changeset	1004	%second input character $a$, the annotated
9db2500629be chap3 almost done Chengsong parents: 580 diff changeset	1005	%regular expression has the second $a$ chopped off.
9db2500629be chap3 almost done Chengsong parents: 580 diff changeset	1006	%The second derivative does not involve any
9db2500629be chap3 almost done Chengsong parents: 580 diff changeset	1007	%new bitcodes being generated, because
9db2500629be chap3 almost done Chengsong parents: 580 diff changeset	1008	%there are no new iterations or bifurcations
9db2500629be chap3 almost done Chengsong parents: 580 diff changeset	1009	%in the regular expression requiring any $S$ or $Z$ marker
9db2500629be chap3 almost done Chengsong parents: 580 diff changeset	1010	%to indicate choices.
9db2500629be chap3 almost done Chengsong parents: 580 diff changeset	1011	%\begin{center}
9db2500629be chap3 almost done Chengsong parents: 580 diff changeset	1012	%\begin{tikzpicture}[every node/.append style={draw, rounded corners, inner sep=10pt}]
9db2500629be chap3 almost done Chengsong parents: 580 diff changeset	1013	% \node [rectangle split, rectangle split horizontal, rectangle split parts=2, rectangle split part fill={red!30,blue!20},]
9db2500629be chap3 almost done Chengsong parents: 580 diff changeset	1014	% {$(_{[Z]}(\ONE \cdot \ONE) \cdot (aa)^*) \cdot (\ONE \cdot c) $
9db2500629be chap3 almost done Chengsong parents: 580 diff changeset	1015	% \nodepart{two} $[Z] \iff \Seq \; (\Stars \; [\Seq \; a\;a, ??]) \; ??$};
9db2500629be chap3 almost done Chengsong parents: 580 diff changeset	1016	%%\caption{term 1 \ref{term:1}'s matching configuration}
9db2500629be chap3 almost done Chengsong parents: 580 diff changeset	1017	%\end{tikzpicture}
9db2500629be chap3 almost done Chengsong parents: 580 diff changeset	1018	%\end{center}
9db2500629be chap3 almost done Chengsong parents: 580 diff changeset	1019	%\noindent
9db2500629be chap3 almost done Chengsong parents: 580 diff changeset	1020	%
9db2500629be chap3 almost done Chengsong parents: 580 diff changeset	1021	%
9db2500629be chap3 almost done Chengsong parents: 580 diff changeset	1022	%\begin{center}
9db2500629be chap3 almost done Chengsong parents: 580 diff changeset	1023	%\begin{tikzpicture}[every node/.append style={draw, rounded corners, inner sep=10pt}]
9db2500629be chap3 almost done Chengsong parents: 580 diff changeset	1024	% \node [rectangle split, rectangle split horizontal, rectangle split parts=2, rectangle split part fill={red!30,blue!20},]
9db2500629be chap3 almost done Chengsong parents: 580 diff changeset	1025	% {$\stackrel{Bitcoded}{\longrightarrow} \Seq(\Stars[\Char(a), \Char(a)], ???)$
9db2500629be chap3 almost done Chengsong parents: 580 diff changeset	1026	% \nodepart{two} $\Seq(\ldots, \Seq(\Char(b), \Char(c)))$ $\stackrel{Inj}{\longleftarrow}$};
9db2500629be chap3 almost done Chengsong parents: 580 diff changeset	1027	%%\caption{term 1 \ref{term:1}'s matching configuration}
9db2500629be chap3 almost done Chengsong parents: 580 diff changeset	1028	%\end{tikzpicture}
9db2500629be chap3 almost done Chengsong parents: 580 diff changeset	1029	%\end{center}
9db2500629be chap3 almost done Chengsong parents: 580 diff changeset	1030	%\noindent
9db2500629be chap3 almost done Chengsong parents: 580 diff changeset	1031	%If we do this kind of "attachment"
9db2500629be chap3 almost done Chengsong parents: 580 diff changeset	1032	%and each time augment the attached partially
9db2500629be chap3 almost done Chengsong parents: 580 diff changeset	1033	%constructed value when taking off a
9db2500629be chap3 almost done Chengsong parents: 580 diff changeset	1034	%character:
9db2500629be chap3 almost done Chengsong parents: 580 diff changeset	1035	%\begin{center}
9db2500629be chap3 almost done Chengsong parents: 580 diff changeset	1036	%\begin{tikzpicture}[every node/.append style={draw, rounded corners, inner sep=10pt}]
9db2500629be chap3 almost done Chengsong parents: 580 diff changeset	1037	% \node [rectangle split, rectangle split horizontal, rectangle split parts=2, rectangle split part fill={red!30,blue!20},] (spPoint)
9db2500629be chap3 almost done Chengsong parents: 580 diff changeset	1038	% {$\Seq(\Stars[\Char(a), \Char(a)], \ldots)$
9db2500629be chap3 almost done Chengsong parents: 580 diff changeset	1039	% \nodepart{two} Remaining: $b c$};
9db2500629be chap3 almost done Chengsong parents: 580 diff changeset	1040	%\end{tikzpicture}\\
9db2500629be chap3 almost done Chengsong parents: 580 diff changeset	1041	%$\downarrow$\\
9db2500629be chap3 almost done Chengsong parents: 580 diff changeset	1042	%\begin{tikzpicture}[every node/.append style={draw, rounded corners, inner sep=10pt}]
9db2500629be chap3 almost done Chengsong parents: 580 diff changeset	1043	% \node [rectangle split, rectangle split horizontal, rectangle split parts=2, rectangle split part fill={red!30,blue!20},]
9db2500629be chap3 almost done Chengsong parents: 580 diff changeset	1044	% {$\Seq(\Stars[\Char(a), \Char(a)], \Seq(\Char(b), \ldots))$
9db2500629be chap3 almost done Chengsong parents: 580 diff changeset	1045	% \nodepart{two} Remaining: $c$};
9db2500629be chap3 almost done Chengsong parents: 580 diff changeset	1046	%\end{tikzpicture}\\
9db2500629be chap3 almost done Chengsong parents: 580 diff changeset	1047	%$\downarrow$\\
9db2500629be chap3 almost done Chengsong parents: 580 diff changeset	1048	%\begin{tikzpicture}[every node/.append style={draw, rounded corners, inner sep=10pt}]
9db2500629be chap3 almost done Chengsong parents: 580 diff changeset	1049	% \node [rectangle split, rectangle split horizontal, rectangle split parts=2, rectangle split part fill={red!30,blue!20},]
9db2500629be chap3 almost done Chengsong parents: 580 diff changeset	1050	% {$\Seq(\Stars[\Char(a), \Char(a)], \Seq(\Char(b), \Char(c)))$
9db2500629be chap3 almost done Chengsong parents: 580 diff changeset	1051	% \nodepart{two} EOF};
9db2500629be chap3 almost done Chengsong parents: 580 diff changeset	1052	%\end{tikzpicture}
9db2500629be chap3 almost done Chengsong parents: 580 diff changeset	1053	%\end{center}
638 dd9dde2d902b comments till chap4 Chengsong parents: 624 diff changeset	1054
581 9db2500629be chap3 almost done Chengsong parents: 580 diff changeset	1055	In the next section we introduce the correctness proof
9db2500629be chap3 almost done Chengsong parents: 580 diff changeset	1056	found by Ausaf and Urban
9db2500629be chap3 almost done Chengsong parents: 580 diff changeset	1057	of the bitcoded lexer.
532 cc54ce075db5 restructured Chengsong parents: diff changeset	1058	%-----------------------------------
cc54ce075db5 restructured Chengsong parents: diff changeset	1059	% SUBSECTION 1
cc54ce075db5 restructured Chengsong parents: diff changeset	1060	%-----------------------------------
667 660cf698eb26 added example of how inj and lexer works Chengsong parents: 657 diff changeset	1061	\section{Correctness Proof of $\textit{Blexer}$}\label{blexerProof}
581 9db2500629be chap3 almost done Chengsong parents: 580 diff changeset	1062	Why is $\blexer$ correct?
9db2500629be chap3 almost done Chengsong parents: 580 diff changeset	1063	In other words, why is it the case that
9db2500629be chap3 almost done Chengsong parents: 580 diff changeset	1064	$\blexer$ outputs the same value as $\lexer$?
9db2500629be chap3 almost done Chengsong parents: 580 diff changeset	1065	Intuitively,
9db2500629be chap3 almost done Chengsong parents: 580 diff changeset	1066	that is because
9db2500629be chap3 almost done Chengsong parents: 580 diff changeset	1067	\begin{itemize}
9db2500629be chap3 almost done Chengsong parents: 580 diff changeset	1068	\item
9db2500629be chap3 almost done Chengsong parents: 580 diff changeset	1069	$\blexer$ follows an almost identical
9db2500629be chap3 almost done Chengsong parents: 580 diff changeset	1070	path as that of $\lexer$,
9db2500629be chap3 almost done Chengsong parents: 580 diff changeset	1071	for example $r_1, r_2, \ldots$ and $a_1, a_2, \ldots$ being produced,
9db2500629be chap3 almost done Chengsong parents: 580 diff changeset	1072	which are the same up to the application of $\erase$.
9db2500629be chap3 almost done Chengsong parents: 580 diff changeset	1073	\item
9db2500629be chap3 almost done Chengsong parents: 580 diff changeset	1074	The bit-encodings work properly,
9db2500629be chap3 almost done Chengsong parents: 580 diff changeset	1075	allowing the possibility of
9db2500629be chap3 almost done Chengsong parents: 580 diff changeset	1076	pulling out the right lexical value from an
9db2500629be chap3 almost done Chengsong parents: 580 diff changeset	1077	annotated regular expression at
624 8ffa28fce271 all comments incorporated!!+related work Chengsong parents: 622 diff changeset	1078	any stage, using $\bmkeps$ or $\retrieve$ (details in
8ffa28fce271 all comments incorporated!!+related work Chengsong parents: 622 diff changeset	1079	lemmas \ref{bmkepsRetrieve} and \ref{blexer_retrieve}).
581 9db2500629be chap3 almost done Chengsong parents: 580 diff changeset	1080	\end{itemize}
9db2500629be chap3 almost done Chengsong parents: 580 diff changeset	1081	We will elaborate on this, with the help of
9db2500629be chap3 almost done Chengsong parents: 580 diff changeset	1082	some helper functions such as $\retrieve$ and
9db2500629be chap3 almost done Chengsong parents: 580 diff changeset	1083	$\flex$.
9db2500629be chap3 almost done Chengsong parents: 580 diff changeset	1084	\subsection{Specifications of Some Helper Functions}
622 4b1149fb5aec incorporated more comments, bib Chengsong parents: 585 diff changeset	1085	The functions we introduce will give a more detailed view into
640 bd1354127574 more proofreading done, last version before submission Chengsong parents: 639 diff changeset	1086	the lexing process, which is not possible
581 9db2500629be chap3 almost done Chengsong parents: 580 diff changeset	1087	using $\lexer$ or $\blexer$ alone.
9db2500629be chap3 almost done Chengsong parents: 580 diff changeset	1088	\subsubsection{$\textit{Retrieve}$}
9db2500629be chap3 almost done Chengsong parents: 580 diff changeset	1089	The first function we shall introduce is $\retrieve$.
9db2500629be chap3 almost done Chengsong parents: 580 diff changeset	1090	Sulzmann and Lu gave its definition, and
638 dd9dde2d902b comments till chap4 Chengsong parents: 624 diff changeset	1091	Ausaf and Urban found it
624 8ffa28fce271 all comments incorporated!!+related work Chengsong parents: 622 diff changeset	1092	useful in their mechanised proofs.
581 9db2500629be chap3 almost done Chengsong parents: 580 diff changeset	1093	Our bit-coded lexer ``retrieve''s the bitcodes using $\bmkeps$,
624 8ffa28fce271 all comments incorporated!!+related work Chengsong parents: 622 diff changeset	1094	after all the derivatives have been taken:
532 cc54ce075db5 restructured Chengsong parents: diff changeset	1095	\begin{center}
542 a7344c9afbaf chapter3 finished Chengsong parents: 538 diff changeset	1096	\begin{tabular}{lcl}
a7344c9afbaf chapter3 finished Chengsong parents: 538 diff changeset	1097	& & $\ldots$\\
a7344c9afbaf chapter3 finished Chengsong parents: 538 diff changeset	1098	& & $\;\;\textit{if}\; \textit{bnullable}(a)$\\
a7344c9afbaf chapter3 finished Chengsong parents: 538 diff changeset	1099	& & $\;\;\textit{then}\;\textit{decode}\,(\textit{bmkeps}\,a)\,r$\\
a7344c9afbaf chapter3 finished Chengsong parents: 538 diff changeset	1100	& & $\ldots$
532 cc54ce075db5 restructured Chengsong parents: diff changeset	1101	\end{tabular}
cc54ce075db5 restructured Chengsong parents: diff changeset	1102	\end{center}
542 a7344c9afbaf chapter3 finished Chengsong parents: 538 diff changeset	1103	\noindent
624 8ffa28fce271 all comments incorporated!!+related work Chengsong parents: 622 diff changeset	1104	The function $\bmkeps$ retrieves the value $v$'s
581 9db2500629be chap3 almost done Chengsong parents: 580 diff changeset	1105	information in the format
9db2500629be chap3 almost done Chengsong parents: 580 diff changeset	1106	of bitcodes, by travelling along the
9db2500629be chap3 almost done Chengsong parents: 580 diff changeset	1107	path of the regular expression that corresponds to a POSIX match,
9db2500629be chap3 almost done Chengsong parents: 580 diff changeset	1108	collecting all the bitcodes.
542 a7344c9afbaf chapter3 finished Chengsong parents: 538 diff changeset	1109	We know that this "retrieved" bitcode leads to the correct value after decoding,
624 8ffa28fce271 all comments incorporated!!+related work Chengsong parents: 622 diff changeset	1110	which is $v_0$ in the injection-based lexing diagram:
542 a7344c9afbaf chapter3 finished Chengsong parents: 538 diff changeset	1111	\vspace{20mm}
581 9db2500629be chap3 almost done Chengsong parents: 580 diff changeset	1112	\begin{figure}[H]%\label{graph:injLexer}
9db2500629be chap3 almost done Chengsong parents: 580 diff changeset	1113	\begin{center}
542 a7344c9afbaf chapter3 finished Chengsong parents: 538 diff changeset	1114	\begin{tikzcd}[
a7344c9afbaf chapter3 finished Chengsong parents: 538 diff changeset	1115	every matrix/.append style = {name=p},
a7344c9afbaf chapter3 finished Chengsong parents: 538 diff changeset	1116	remember picture, overlay,
a7344c9afbaf chapter3 finished Chengsong parents: 538 diff changeset	1117	]
a7344c9afbaf chapter3 finished Chengsong parents: 538 diff changeset	1118	a_0 \arrow[r, "\backslash c_0"] \arrow[d] & a_1 \arrow[r, "\backslash c_1"] \arrow[d] & a_2 \arrow[r, dashed] \arrow[d] & a_n \arrow[d] \\
a7344c9afbaf chapter3 finished Chengsong parents: 538 diff changeset	1119	v_0 & v_1 \arrow[l,"inj_{r_0} c_0"] & v_2 \arrow[l, "inj_{r_1} c_1"] & v_n \arrow[l, dashed]
a7344c9afbaf chapter3 finished Chengsong parents: 538 diff changeset	1120	\end{tikzcd}
581 9db2500629be chap3 almost done Chengsong parents: 580 diff changeset	1121	\end{center}
542 a7344c9afbaf chapter3 finished Chengsong parents: 538 diff changeset	1122	\begin{tikzpicture}[
a7344c9afbaf chapter3 finished Chengsong parents: 538 diff changeset	1123	remember picture, overlay,
a7344c9afbaf chapter3 finished Chengsong parents: 538 diff changeset	1124	E/.style = {ellipse, draw=blue, dashed,
a7344c9afbaf chapter3 finished Chengsong parents: 538 diff changeset	1125	inner xsep=4mm,inner ysep=-4mm, rotate=90, fit=#1}
a7344c9afbaf chapter3 finished Chengsong parents: 538 diff changeset	1126	]
a7344c9afbaf chapter3 finished Chengsong parents: 538 diff changeset	1127	\node[E = (p-1-1) (p-2-1)] {};
a7344c9afbaf chapter3 finished Chengsong parents: 538 diff changeset	1128	\node[E = (p-1-4) (p-2-4)] {};
581 9db2500629be chap3 almost done Chengsong parents: 580 diff changeset	1129	\node[E = (p-1-2) (p-2-2)] {};
9db2500629be chap3 almost done Chengsong parents: 580 diff changeset	1130	\node[E = (p-1-3) (p-2-3)] {};
542 a7344c9afbaf chapter3 finished Chengsong parents: 538 diff changeset	1131	\end{tikzpicture}
624 8ffa28fce271 all comments incorporated!!+related work Chengsong parents: 622 diff changeset	1132	\caption{Injection-based lexing diagram,
8ffa28fce271 all comments incorporated!!+related work Chengsong parents: 622 diff changeset	1133	with matching value and regular expression pairs
8ffa28fce271 all comments incorporated!!+related work Chengsong parents: 622 diff changeset	1134	encircled}\label{Inj2}
581 9db2500629be chap3 almost done Chengsong parents: 580 diff changeset	1135	\end{figure}
542 a7344c9afbaf chapter3 finished Chengsong parents: 538 diff changeset	1136	\vspace{20mm}
a7344c9afbaf chapter3 finished Chengsong parents: 538 diff changeset	1137	\noindent
624 8ffa28fce271 all comments incorporated!!+related work Chengsong parents: 622 diff changeset	1138	We have that $\vdash v_i:(a_i)_\downarrow$,
8ffa28fce271 all comments incorporated!!+related work Chengsong parents: 622 diff changeset	1139	where $v_i$ are the intermediate values generated step by step in $\lexer$.
8ffa28fce271 all comments incorporated!!+related work Chengsong parents: 622 diff changeset	1140	These values are not explicitly created in $\blexer$.
8ffa28fce271 all comments incorporated!!+related work Chengsong parents: 622 diff changeset	1141
8ffa28fce271 all comments incorporated!!+related work Chengsong parents: 622 diff changeset	1142	We encircled in the diagram all the pairs $v_i, a_i$ to show that these values
582 3e19073e91f4 chap3 done Chengsong parents: 581 diff changeset	1143	and regular expressions share the same structure.
3e19073e91f4 chap3 done Chengsong parents: 581 diff changeset	1144	These pairs all contain adequate information to
3e19073e91f4 chap3 done Chengsong parents: 581 diff changeset	1145	obtain the final lexing result.
3e19073e91f4 chap3 done Chengsong parents: 581 diff changeset	1146	For example, in the leftmost pair the
581 9db2500629be chap3 almost done Chengsong parents: 580 diff changeset	1147	lexical information is condensed in
582 3e19073e91f4 chap3 done Chengsong parents: 581 diff changeset	1148	$v_0$, whereas for the rightmost pair,
638 dd9dde2d902b comments till chap4 Chengsong parents: 624 diff changeset	1149	the lexing result hides is in the bitcodes of $a_n$.
dd9dde2d902b comments till chap4 Chengsong parents: 624 diff changeset	1150
624 8ffa28fce271 all comments incorporated!!+related work Chengsong parents: 622 diff changeset	1151	The function $\bmkeps$ is able to extract from $a_n$ the result
582 3e19073e91f4 chap3 done Chengsong parents: 581 diff changeset	1152	by looking for nullable parts of the regular expression.
3e19073e91f4 chap3 done Chengsong parents: 581 diff changeset	1153	However for regular expressions $a_i$ in general,
3e19073e91f4 chap3 done Chengsong parents: 581 diff changeset	1154	they might not necessarily be nullable.
3e19073e91f4 chap3 done Chengsong parents: 581 diff changeset	1155	For those regular expressions that are not nullable,
3e19073e91f4 chap3 done Chengsong parents: 581 diff changeset	1156	$\bmkeps$ will not work.
3e19073e91f4 chap3 done Chengsong parents: 581 diff changeset	1157	Therefore they need some additional ``help''
3e19073e91f4 chap3 done Chengsong parents: 581 diff changeset	1158	finding the POSIX bit-encoding.
581 9db2500629be chap3 almost done Chengsong parents: 580 diff changeset	1159	The most straightforward ``help'' comes from $a_i$'s corresponding
9db2500629be chap3 almost done Chengsong parents: 580 diff changeset	1160	value $v_i$, and this suggests a function $f$ satisfying the
9db2500629be chap3 almost done Chengsong parents: 580 diff changeset	1161	following properties:
542 a7344c9afbaf chapter3 finished Chengsong parents: 538 diff changeset	1162	\begin{itemize}
a7344c9afbaf chapter3 finished Chengsong parents: 538 diff changeset	1163	\item
581 9db2500629be chap3 almost done Chengsong parents: 580 diff changeset	1164	$f \; a_i\;v_i = f \; a_n \; v_n = \bmkeps \; a_n$%\decode \; (\bmkeps \; a_n) \; (\erase \; a_0)$
542 a7344c9afbaf chapter3 finished Chengsong parents: 538 diff changeset	1165	\item
582 3e19073e91f4 chap3 done Chengsong parents: 581 diff changeset	1166	$f \; a_i\;v_i = f \; a_0 \; v_0 = \code \; v_0$ %\decode \;(\code \; v_0) \; (\erase \; a_0)$
542 a7344c9afbaf chapter3 finished Chengsong parents: 538 diff changeset	1167	\end{itemize}
a7344c9afbaf chapter3 finished Chengsong parents: 538 diff changeset	1168	\noindent
624 8ffa28fce271 all comments incorporated!!+related work Chengsong parents: 622 diff changeset	1169	Sulzmann and Lu came up with the following definition for $f$ satisfying
582 3e19073e91f4 chap3 done Chengsong parents: 581 diff changeset	1170	the above
624 8ffa28fce271 all comments incorporated!!+related work Chengsong parents: 622 diff changeset	1171	requirements, and named it \emph{retrieve}:
8ffa28fce271 all comments incorporated!!+related work Chengsong parents: 622 diff changeset	1172	\vspace{-7mm}
542 a7344c9afbaf chapter3 finished Chengsong parents: 538 diff changeset	1173	\begin{center}
582 3e19073e91f4 chap3 done Chengsong parents: 581 diff changeset	1174	\begin{tabular}{llcl}
3e19073e91f4 chap3 done Chengsong parents: 581 diff changeset	1175	$\retrieve \; \, _{bs} \ONE$ & $ \Empty$ & $\dn$ & $\textit{bs}$\\
3e19073e91f4 chap3 done Chengsong parents: 581 diff changeset	1176
3e19073e91f4 chap3 done Chengsong parents: 581 diff changeset	1177	$\retrieve \; \, _{bs} \mathbf{c}$ & $ (\Char \; c) $ & $\dn$ & $ \textit{bs}$\\
3e19073e91f4 chap3 done Chengsong parents: 581 diff changeset	1178
3e19073e91f4 chap3 done Chengsong parents: 581 diff changeset	1179	$\retrieve \; \, _{bs} a_1 \cdot a_2$ & $ (\Seq \; v_1 \; v_2) $ &
3e19073e91f4 chap3 done Chengsong parents: 581 diff changeset	1180	$\dn$ & $\textit{bs} \; @\; (\retrieve \; a_1\; v_1)\; @ \;(\retrieve \; a_2 \; v_2)$\\
3e19073e91f4 chap3 done Chengsong parents: 581 diff changeset	1181
3e19073e91f4 chap3 done Chengsong parents: 581 diff changeset	1182	$\retrieve \; \, _{bs} \Sigma (a :: \textit{as})$ & $ (\Left \; v) $ &
3e19073e91f4 chap3 done Chengsong parents: 581 diff changeset	1183	$\dn$ & $\textit{bs}\; @\; (\retrieve \; a \; v)$\\
3e19073e91f4 chap3 done Chengsong parents: 581 diff changeset	1184
3e19073e91f4 chap3 done Chengsong parents: 581 diff changeset	1185	$\retrieve \; \, _{bs} \Sigma (a :: \textit{as})$ & $ (\Right \; v) $ &
3e19073e91f4 chap3 done Chengsong parents: 581 diff changeset	1186	$\dn$ & $\textit{bs}\; @\; (\retrieve \; (_{[]}\Sigma \textit{as}) \; v)$\\
3e19073e91f4 chap3 done Chengsong parents: 581 diff changeset	1187
3e19073e91f4 chap3 done Chengsong parents: 581 diff changeset	1188	$\retrieve \; \, _{bs} a^* $ & $ (\Stars \; (v :: vs)) $ & $\dn$ &
585 4969ef817d92 chap4 more Chengsong parents: 582 diff changeset	1189	$\textit{bs}\; @\; [Z] \; @ \; (\retrieve \; a \; v)\; @ \;
582 3e19073e91f4 chap3 done Chengsong parents: 581 diff changeset	1190	(\retrieve \; (_{[]} a^*) \; (\Stars \; vs) )$\\
3e19073e91f4 chap3 done Chengsong parents: 581 diff changeset	1191
585 4969ef817d92 chap4 more Chengsong parents: 582 diff changeset	1192	$\retrieve \; \, _{bs} a^* $ & $ (\Stars \; []) $ & $\dn$ & $\textit{bs} \; @ \; [S]$
542 a7344c9afbaf chapter3 finished Chengsong parents: 538 diff changeset	1193	\end{tabular}
a7344c9afbaf chapter3 finished Chengsong parents: 538 diff changeset	1194	\end{center}
a7344c9afbaf chapter3 finished Chengsong parents: 538 diff changeset	1195	\noindent
624 8ffa28fce271 all comments incorporated!!+related work Chengsong parents: 622 diff changeset	1196	As stated, $\retrieve$ collects the right bit-codes from the
8ffa28fce271 all comments incorporated!!+related work Chengsong parents: 622 diff changeset	1197	final derivative $a_n$, guided by $v_n$. This can be proved
8ffa28fce271 all comments incorporated!!+related work Chengsong parents: 622 diff changeset	1198	as follows:
582 3e19073e91f4 chap3 done Chengsong parents: 581 diff changeset	1199	\begin{lemma}\label{bmkepsRetrieve}
657 00171b627b8d Fixed some annotated/unannotated a/r notation inconsistencies. Chengsong parents: 653 diff changeset	1200	\mbox{}
00171b627b8d Fixed some annotated/unannotated a/r notation inconsistencies. Chengsong parents: 653 diff changeset	1201	\begin{itemize}
00171b627b8d Fixed some annotated/unannotated a/r notation inconsistencies. Chengsong parents: 653 diff changeset	1202	\item
00171b627b8d Fixed some annotated/unannotated a/r notation inconsistencies. Chengsong parents: 653 diff changeset	1203	If $\forall v \in vs. \vdash v : r$, and $\code \; v = \retrieve \; (\internalise\; r) \; v$\\
00171b627b8d Fixed some annotated/unannotated a/r notation inconsistencies. Chengsong parents: 653 diff changeset	1204	then $\code \; (\Stars \; vs) = \retrieve \; _{[]}((\internalise \; r)^*) \; (\Stars \; vs)$
00171b627b8d Fixed some annotated/unannotated a/r notation inconsistencies. Chengsong parents: 653 diff changeset	1205	\item
00171b627b8d Fixed some annotated/unannotated a/r notation inconsistencies. Chengsong parents: 653 diff changeset	1206	$\vdash v : r \implies \retrieve \; (r)^\uparrow \; v = \code \; v$
00171b627b8d Fixed some annotated/unannotated a/r notation inconsistencies. Chengsong parents: 653 diff changeset	1207	\item
00171b627b8d Fixed some annotated/unannotated a/r notation inconsistencies. Chengsong parents: 653 diff changeset	1208	$\bnullable \; a \implies \bmkeps \; a = \retrieve \; a \; (\mkeps \; (a_\downarrow))$
00171b627b8d Fixed some annotated/unannotated a/r notation inconsistencies. Chengsong parents: 653 diff changeset	1209	\end{itemize}
00171b627b8d Fixed some annotated/unannotated a/r notation inconsistencies. Chengsong parents: 653 diff changeset	1210	\begin{proof}
00171b627b8d Fixed some annotated/unannotated a/r notation inconsistencies. Chengsong parents: 653 diff changeset	1211	The first part is by induction on the value list $vs$.
00171b627b8d Fixed some annotated/unannotated a/r notation inconsistencies. Chengsong parents: 653 diff changeset	1212	The second part is by induction on $r$,
00171b627b8d Fixed some annotated/unannotated a/r notation inconsistencies. Chengsong parents: 653 diff changeset	1213	and the star case uses the first part.
00171b627b8d Fixed some annotated/unannotated a/r notation inconsistencies. Chengsong parents: 653 diff changeset	1214	The last part is by a routine induction on $a$.
00171b627b8d Fixed some annotated/unannotated a/r notation inconsistencies. Chengsong parents: 653 diff changeset	1215	\end{proof}
00171b627b8d Fixed some annotated/unannotated a/r notation inconsistencies. Chengsong parents: 653 diff changeset	1216	\noindent
542 a7344c9afbaf chapter3 finished Chengsong parents: 538 diff changeset	1217	\end{lemma}
a7344c9afbaf chapter3 finished Chengsong parents: 538 diff changeset	1218	\begin{proof}
a7344c9afbaf chapter3 finished Chengsong parents: 538 diff changeset	1219	By a routine induction on $a$.
a7344c9afbaf chapter3 finished Chengsong parents: 538 diff changeset	1220	\end{proof}
582 3e19073e91f4 chap3 done Chengsong parents: 581 diff changeset	1221	\noindent
624 8ffa28fce271 all comments incorporated!!+related work Chengsong parents: 622 diff changeset	1222	%The design of $\retrieve$ enables us to extract bitcodes
8ffa28fce271 all comments incorporated!!+related work Chengsong parents: 622 diff changeset	1223	%from both annotated regular expressions and values.
8ffa28fce271 all comments incorporated!!+related work Chengsong parents: 622 diff changeset	1224	%$\retrieve$ always ``sucks up'' all the information.
657 00171b627b8d Fixed some annotated/unannotated a/r notation inconsistencies. Chengsong parents: 653 diff changeset	1225	%When more information is stored in the value, we would be able to
00171b627b8d Fixed some annotated/unannotated a/r notation inconsistencies. Chengsong parents: 653 diff changeset	1226	%extract more from the value, and vice versa.
00171b627b8d Fixed some annotated/unannotated a/r notation inconsistencies. Chengsong parents: 653 diff changeset	1227	%For example in star iterations, $\retrieve$ will be able to extract from $\Stars \; vs$
00171b627b8d Fixed some annotated/unannotated a/r notation inconsistencies. Chengsong parents: 653 diff changeset	1228	%exactly the same bitcodes as $\code \; (\Stars \; vs)$:
00171b627b8d Fixed some annotated/unannotated a/r notation inconsistencies. Chengsong parents: 653 diff changeset	1229	%\begin{lemma}\label{retrieveEncodeSTARS}
00171b627b8d Fixed some annotated/unannotated a/r notation inconsistencies. Chengsong parents: 653 diff changeset	1230	% If $\forall v \in vs. \vdash v : r$, and $\code \; v = \retrieve \; (\internalise\; r) \; v$\\
00171b627b8d Fixed some annotated/unannotated a/r notation inconsistencies. Chengsong parents: 653 diff changeset	1231	% then $\code \; (\Stars \; vs) = \retrieve \; _{[]}((\internalise \; r)^*) \; (\Stars \; vs)$
00171b627b8d Fixed some annotated/unannotated a/r notation inconsistencies. Chengsong parents: 653 diff changeset	1232	%\end{lemma}
00171b627b8d Fixed some annotated/unannotated a/r notation inconsistencies. Chengsong parents: 653 diff changeset	1233	%\begin{proof}
00171b627b8d Fixed some annotated/unannotated a/r notation inconsistencies. Chengsong parents: 653 diff changeset	1234	% By induction on the value list $vs$.
00171b627b8d Fixed some annotated/unannotated a/r notation inconsistencies. Chengsong parents: 653 diff changeset	1235	%\end{proof}
00171b627b8d Fixed some annotated/unannotated a/r notation inconsistencies. Chengsong parents: 653 diff changeset	1236	%\noindent
00171b627b8d Fixed some annotated/unannotated a/r notation inconsistencies. Chengsong parents: 653 diff changeset	1237	%Similarly, when the value list does not hold information,
00171b627b8d Fixed some annotated/unannotated a/r notation inconsistencies. Chengsong parents: 653 diff changeset	1238	%only the bitcodes plus some delimiter will be recorded:
00171b627b8d Fixed some annotated/unannotated a/r notation inconsistencies. Chengsong parents: 653 diff changeset	1239	%\begin{center}
00171b627b8d Fixed some annotated/unannotated a/r notation inconsistencies. Chengsong parents: 653 diff changeset	1240	% $\retrieve \; _{bs}((\internalise \; r)^*) \; (\Stars \; [] ) = bs @ [S]$.
00171b627b8d Fixed some annotated/unannotated a/r notation inconsistencies. Chengsong parents: 653 diff changeset	1241	%\end{center}
00171b627b8d Fixed some annotated/unannotated a/r notation inconsistencies. Chengsong parents: 653 diff changeset	1242	%In general, if we have a regular expression that is just internalised
00171b627b8d Fixed some annotated/unannotated a/r notation inconsistencies. Chengsong parents: 653 diff changeset	1243	%and the final lexing result value, we can $\retrieve$ the bitcodes
00171b627b8d Fixed some annotated/unannotated a/r notation inconsistencies. Chengsong parents: 653 diff changeset	1244	%that look as if we have en$\code$-ed the value as bitcodes:
00171b627b8d Fixed some annotated/unannotated a/r notation inconsistencies. Chengsong parents: 653 diff changeset	1245	%\begin{lemma}
00171b627b8d Fixed some annotated/unannotated a/r notation inconsistencies. Chengsong parents: 653 diff changeset	1246	% $\vdash v : r \implies \retrieve \; (r)^\uparrow \; v = \code \; v$
00171b627b8d Fixed some annotated/unannotated a/r notation inconsistencies. Chengsong parents: 653 diff changeset	1247	%\end{lemma}
00171b627b8d Fixed some annotated/unannotated a/r notation inconsistencies. Chengsong parents: 653 diff changeset	1248	%\begin{proof}
00171b627b8d Fixed some annotated/unannotated a/r notation inconsistencies. Chengsong parents: 653 diff changeset	1249	% By induction on $r$.
00171b627b8d Fixed some annotated/unannotated a/r notation inconsistencies. Chengsong parents: 653 diff changeset	1250	% The star case uses lemma \ref{retrieveEncodeSTARS}.
00171b627b8d Fixed some annotated/unannotated a/r notation inconsistencies. Chengsong parents: 653 diff changeset	1251	%\end{proof}
00171b627b8d Fixed some annotated/unannotated a/r notation inconsistencies. Chengsong parents: 653 diff changeset	1252	%\noindent
624 8ffa28fce271 all comments incorporated!!+related work Chengsong parents: 622 diff changeset	1253	The following property of $\retrieve$ is crucial, as
582 3e19073e91f4 chap3 done Chengsong parents: 581 diff changeset	1254	it provides a "bridge" between $a_0$ and $a_n $ in the
3e19073e91f4 chap3 done Chengsong parents: 581 diff changeset	1255	lexing diagram by linking adjacent regular expressions $a_i$ and
3e19073e91f4 chap3 done Chengsong parents: 581 diff changeset	1256	$a_{i+1}$.
3e19073e91f4 chap3 done Chengsong parents: 581 diff changeset	1257	The property says that one
3e19073e91f4 chap3 done Chengsong parents: 581 diff changeset	1258	can retrieve the same bits from a derivative
3e19073e91f4 chap3 done Chengsong parents: 581 diff changeset	1259	regular expression as those from
3e19073e91f4 chap3 done Chengsong parents: 581 diff changeset	1260	before the derivative took place,
624 8ffa28fce271 all comments incorporated!!+related work Chengsong parents: 622 diff changeset	1261	provided that the corresponding values are used respectively:
542 a7344c9afbaf chapter3 finished Chengsong parents: 538 diff changeset	1262	\begin{lemma}\label{retrieveStepwise}
657 00171b627b8d Fixed some annotated/unannotated a/r notation inconsistencies. Chengsong parents: 653 diff changeset	1263	$\vdash v : (a\backslash c)_\downarrow \implies \retrieve \; (a \backslash c) \; v=
00171b627b8d Fixed some annotated/unannotated a/r notation inconsistencies. Chengsong parents: 653 diff changeset	1264	\retrieve \; a \; (\inj \; a_\downarrow\; c\; v)$
542 a7344c9afbaf chapter3 finished Chengsong parents: 538 diff changeset	1265	\end{lemma}
a7344c9afbaf chapter3 finished Chengsong parents: 538 diff changeset	1266	\begin{proof}
640 bd1354127574 more proofreading done, last version before submission Chengsong parents: 639 diff changeset	1267	We do an induction on $r$, generalising over $v$.
624 8ffa28fce271 all comments incorporated!!+related work Chengsong parents: 622 diff changeset	1268	The induction principle in our formalisation
8ffa28fce271 all comments incorporated!!+related work Chengsong parents: 622 diff changeset	1269	is function $\erase$'s cases.
542 a7344c9afbaf chapter3 finished Chengsong parents: 538 diff changeset	1270	\end{proof}
a7344c9afbaf chapter3 finished Chengsong parents: 538 diff changeset	1271	\noindent
649 ef2b8abcbc55 more Chengsong parents: 640 diff changeset	1272	Intuitively the lemma can be understood as all lexical value
ef2b8abcbc55 more Chengsong parents: 640 diff changeset	1273	information being preserved and recoverable throughout each lexing step.
ef2b8abcbc55 more Chengsong parents: 640 diff changeset	1274	We shall see in the next chapter that this no longer
ef2b8abcbc55 more Chengsong parents: 640 diff changeset	1275	holds with simplifications which takes away certain sub-expressions
ef2b8abcbc55 more Chengsong parents: 640 diff changeset	1276	corresponding to non-POSIX lexical values.
ef2b8abcbc55 more Chengsong parents: 640 diff changeset	1277
657 00171b627b8d Fixed some annotated/unannotated a/r notation inconsistencies. Chengsong parents: 653 diff changeset	1278	%Before we move on to the next
00171b627b8d Fixed some annotated/unannotated a/r notation inconsistencies. Chengsong parents: 653 diff changeset	1279	%helper function, we rewrite $\blexer$ in
00171b627b8d Fixed some annotated/unannotated a/r notation inconsistencies. Chengsong parents: 653 diff changeset	1280	%the following way which makes later proofs more convenient:
00171b627b8d Fixed some annotated/unannotated a/r notation inconsistencies. Chengsong parents: 653 diff changeset	1281	%\begin{lemma}\label{blexer_retrieve}
00171b627b8d Fixed some annotated/unannotated a/r notation inconsistencies. Chengsong parents: 653 diff changeset	1282	%$\blexer \; r \; s = \decode \; (\retrieve \; (\internalise \; r) \; (\mkeps \; (r \backslash s) )) \; r$
00171b627b8d Fixed some annotated/unannotated a/r notation inconsistencies. Chengsong parents: 653 diff changeset	1283	%\end{lemma}
00171b627b8d Fixed some annotated/unannotated a/r notation inconsistencies. Chengsong parents: 653 diff changeset	1284	%\begin{proof}
00171b627b8d Fixed some annotated/unannotated a/r notation inconsistencies. Chengsong parents: 653 diff changeset	1285	% Using lemma \ref{bmkepsRetrieve}.
00171b627b8d Fixed some annotated/unannotated a/r notation inconsistencies. Chengsong parents: 653 diff changeset	1286	%\end{proof}
00171b627b8d Fixed some annotated/unannotated a/r notation inconsistencies. Chengsong parents: 653 diff changeset	1287	%\noindent
624 8ffa28fce271 all comments incorporated!!+related work Chengsong parents: 622 diff changeset	1288	The function $\retrieve$ allows connecting
582 3e19073e91f4 chap3 done Chengsong parents: 581 diff changeset	1289	between the intermediate
3e19073e91f4 chap3 done Chengsong parents: 581 diff changeset	1290	results $a_i$ and $a_{i+1}$ in $\blexer$.
3e19073e91f4 chap3 done Chengsong parents: 581 diff changeset	1291	For plain regular expressions something similar is required.
542 a7344c9afbaf chapter3 finished Chengsong parents: 538 diff changeset	1292
638 dd9dde2d902b comments till chap4 Chengsong parents: 624 diff changeset	1293	\subsection{The Function $\flex$}
624 8ffa28fce271 all comments incorporated!!+related work Chengsong parents: 622 diff changeset	1294	Ausaf and Urban defined an auxiliary function called $\flex$ for $\lexer$,
532 cc54ce075db5 restructured Chengsong parents: diff changeset	1295	defined as
536 aff7bf93b9c7 comments addressed all Chengsong parents: 532 diff changeset	1296	\begin{center}
582 3e19073e91f4 chap3 done Chengsong parents: 581 diff changeset	1297	\begin{tabular}{lcl}
536 aff7bf93b9c7 comments addressed all Chengsong parents: 532 diff changeset	1298	$\flex \; r \; f \; [] \; v$ & $=$ & $f\; v$\\
582 3e19073e91f4 chap3 done Chengsong parents: 581 diff changeset	1299	$\flex \; r \; f \; (c :: s) \; v$ & $=$ & $\flex \; r \; (\lambda v. \, f (\inj \; r\; c\; v)) \;\; s \; v$
536 aff7bf93b9c7 comments addressed all Chengsong parents: 532 diff changeset	1300	\end{tabular}
aff7bf93b9c7 comments addressed all Chengsong parents: 532 diff changeset	1301	\end{center}
582 3e19073e91f4 chap3 done Chengsong parents: 581 diff changeset	1302	which accumulates the characters that need to be injected back,
640 bd1354127574 more proofreading done, last version before submission Chengsong parents: 639 diff changeset	1303	and does the injection in a stack-like manner (the last character being chopped off in the derivatives phase is first injected).
624 8ffa28fce271 all comments incorporated!!+related work Chengsong parents: 622 diff changeset	1304	The function
582 3e19073e91f4 chap3 done Chengsong parents: 581 diff changeset	1305	$\flex$ can calculate what $\lexer$ calculates, given the input regular expression
3e19073e91f4 chap3 done Chengsong parents: 581 diff changeset	1306	$r$, the identity function $id$, the input string $s$ and the value
3e19073e91f4 chap3 done Chengsong parents: 581 diff changeset	1307	$v_n= \mkeps \; (r\backslash s)$:
532 cc54ce075db5 restructured Chengsong parents: diff changeset	1308	\begin{lemma}
582 3e19073e91f4 chap3 done Chengsong parents: 581 diff changeset	1309	$\flex \;\; r \;\; \textit{id}\;\; s \;\; (\mkeps \;(r\backslash s)) = \lexer \; r \; s$
532 cc54ce075db5 restructured Chengsong parents: diff changeset	1310	\end{lemma}
542 a7344c9afbaf chapter3 finished Chengsong parents: 538 diff changeset	1311	\begin{proof}
a7344c9afbaf chapter3 finished Chengsong parents: 538 diff changeset	1312	By reverse induction on $s$.
a7344c9afbaf chapter3 finished Chengsong parents: 538 diff changeset	1313	\end{proof}
582 3e19073e91f4 chap3 done Chengsong parents: 581 diff changeset	1314	\noindent
3e19073e91f4 chap3 done Chengsong parents: 581 diff changeset	1315	The main advantage of using $\flex$
624 8ffa28fce271 all comments incorporated!!+related work Chengsong parents: 622 diff changeset	1316	in $\lexer$ is that
582 3e19073e91f4 chap3 done Chengsong parents: 581 diff changeset	1317	$\flex$ makes the value $v$ and function $f$
3e19073e91f4 chap3 done Chengsong parents: 581 diff changeset	1318	that manipulates the value explicit parameters,
3e19073e91f4 chap3 done Chengsong parents: 581 diff changeset	1319	which ``exposes'' $\lexer$'s recursive call
3e19073e91f4 chap3 done Chengsong parents: 581 diff changeset	1320	arguments and allows us to ``set breakpoints'' and ``resume''
624 8ffa28fce271 all comments incorporated!!+related work Chengsong parents: 622 diff changeset	1321	at any point during $\lexer$'s recursive calls. Therefore
8ffa28fce271 all comments incorporated!!+related work Chengsong parents: 622 diff changeset	1322	the following stepwise property holds.
532 cc54ce075db5 restructured Chengsong parents: diff changeset	1323	\begin{lemma}\label{flexStepwise}
582 3e19073e91f4 chap3 done Chengsong parents: 581 diff changeset	1324	$\textit{flex} \;\; r \;\; f \;\; (s@[c]) \;\; v = \flex \;\; r \;\; f \;\; s \;\; (\inj \; (r\backslash s) \; c \; v) $
532 cc54ce075db5 restructured Chengsong parents: diff changeset	1325	\end{lemma}
cc54ce075db5 restructured Chengsong parents: diff changeset	1326	\begin{proof}
582 3e19073e91f4 chap3 done Chengsong parents: 581 diff changeset	1327	By induction on the shape of $r\backslash s$.
532 cc54ce075db5 restructured Chengsong parents: diff changeset	1328	\end{proof}
582 3e19073e91f4 chap3 done Chengsong parents: 581 diff changeset	1329	\noindent
3e19073e91f4 chap3 done Chengsong parents: 581 diff changeset	1330	With $\flex$ and $\retrieve$,
3e19073e91f4 chap3 done Chengsong parents: 581 diff changeset	1331	we are ready to connect $\lexer$ and $\blexer$,
624 8ffa28fce271 all comments incorporated!!+related work Chengsong parents: 622 diff changeset	1332	giving the correctness proof for $\blexer$.
532 cc54ce075db5 restructured Chengsong parents: diff changeset	1333
542 a7344c9afbaf chapter3 finished Chengsong parents: 538 diff changeset	1334	%----------------------------------------------------------------------------------------
a7344c9afbaf chapter3 finished Chengsong parents: 538 diff changeset	1335	% SECTION correctness proof
a7344c9afbaf chapter3 finished Chengsong parents: 538 diff changeset	1336	%----------------------------------------------------------------------------------------
624 8ffa28fce271 all comments incorporated!!+related work Chengsong parents: 622 diff changeset	1337	\section{Correctness of the Bit-coded Algorithm (Without Simplification)}
8ffa28fce271 all comments incorporated!!+related work Chengsong parents: 622 diff changeset	1338	We can first show that
638 dd9dde2d902b comments till chap4 Chengsong parents: 624 diff changeset	1339	$\flex$ and $\blexer$ calculates the same value.
582 3e19073e91f4 chap3 done Chengsong parents: 581 diff changeset	1340	\begin{lemma}\label{flex_retrieve}
3e19073e91f4 chap3 done Chengsong parents: 581 diff changeset	1341	If $\vdash v: (r\backslash s)$, then\\
3e19073e91f4 chap3 done Chengsong parents: 581 diff changeset	1342	$\flex \; r \; \textit{id}\; s\; v = \decode \; (\retrieve \; (r\backslash s )\; v) \; r$
532 cc54ce075db5 restructured Chengsong parents: diff changeset	1343	\end{lemma}
cc54ce075db5 restructured Chengsong parents: diff changeset	1344	\begin{proof}
582 3e19073e91f4 chap3 done Chengsong parents: 581 diff changeset	1345	By induction on $s$.
3e19073e91f4 chap3 done Chengsong parents: 581 diff changeset	1346	We prove the interesting case where
3e19073e91f4 chap3 done Chengsong parents: 581 diff changeset	1347	both $\flex$ and $\decode$ successfully terminates
3e19073e91f4 chap3 done Chengsong parents: 581 diff changeset	1348	with some value.
3e19073e91f4 chap3 done Chengsong parents: 581 diff changeset	1349	We take advantage of the stepwise properties
3e19073e91f4 chap3 done Chengsong parents: 581 diff changeset	1350	both sides.
3e19073e91f4 chap3 done Chengsong parents: 581 diff changeset	1351	The induction tactic is reverse induction on string $s$.
3e19073e91f4 chap3 done Chengsong parents: 581 diff changeset	1352	The inductive hypothesis says that $\flex \; r \; id \;s \; v =
3e19073e91f4 chap3 done Chengsong parents: 581 diff changeset	1353	\decode \; (\retrieve \; (r\backslash s) \; v) \; r$ holds,
3e19073e91f4 chap3 done Chengsong parents: 581 diff changeset	1354	where $v$ can be any value satisfying
3e19073e91f4 chap3 done Chengsong parents: 581 diff changeset	1355	the assumption $\vdash v: (r\backslash s)$.
3e19073e91f4 chap3 done Chengsong parents: 581 diff changeset	1356	The crucial point is to rewrite
3e19073e91f4 chap3 done Chengsong parents: 581 diff changeset	1357	\[
3e19073e91f4 chap3 done Chengsong parents: 581 diff changeset	1358	\retrieve \;\; (r \backslash (s@[c])) \;\; (\mkeps\; (r \backslash (s@[c]) ))
3e19073e91f4 chap3 done Chengsong parents: 581 diff changeset	1359	\]
3e19073e91f4 chap3 done Chengsong parents: 581 diff changeset	1360	as
3e19073e91f4 chap3 done Chengsong parents: 581 diff changeset	1361	\[
3e19073e91f4 chap3 done Chengsong parents: 581 diff changeset	1362	\retrieve \;\; (r \backslash s)
3e19073e91f4 chap3 done Chengsong parents: 581 diff changeset	1363	\;\; (\inj \; (r \backslash s) \; c\; \mkeps (r \backslash (s@[c])))
3e19073e91f4 chap3 done Chengsong parents: 581 diff changeset	1364	\]
3e19073e91f4 chap3 done Chengsong parents: 581 diff changeset	1365	using lemma \ref{retrieveStepwise}.
3e19073e91f4 chap3 done Chengsong parents: 581 diff changeset	1366	This enables us to equate
3e19073e91f4 chap3 done Chengsong parents: 581 diff changeset	1367	\[
3e19073e91f4 chap3 done Chengsong parents: 581 diff changeset	1368	\retrieve \; (r \backslash (s@[c])) \; (\mkeps \; (r \backslash (s@[c]) ))
3e19073e91f4 chap3 done Chengsong parents: 581 diff changeset	1369	\]
3e19073e91f4 chap3 done Chengsong parents: 581 diff changeset	1370	with
3e19073e91f4 chap3 done Chengsong parents: 581 diff changeset	1371	\[
3e19073e91f4 chap3 done Chengsong parents: 581 diff changeset	1372	\flex \; r \; \textit{id} \; s \; (\inj \; (r\backslash s) \; c\; (\mkeps (r\backslash s@[c])))
3e19073e91f4 chap3 done Chengsong parents: 581 diff changeset	1373	\]
3e19073e91f4 chap3 done Chengsong parents: 581 diff changeset	1374	using IH,
3e19073e91f4 chap3 done Chengsong parents: 581 diff changeset	1375	which in turn can be rewritten as
3e19073e91f4 chap3 done Chengsong parents: 581 diff changeset	1376	\[
3e19073e91f4 chap3 done Chengsong parents: 581 diff changeset	1377	\flex \; r \; \textit{id} \; (s@[c]) \; (\mkeps \; (r \backslash (s@[c])))
3e19073e91f4 chap3 done Chengsong parents: 581 diff changeset	1378	\].
532 cc54ce075db5 restructured Chengsong parents: diff changeset	1379	\end{proof}
582 3e19073e91f4 chap3 done Chengsong parents: 581 diff changeset	1380	\noindent
3e19073e91f4 chap3 done Chengsong parents: 581 diff changeset	1381	With this pivotal lemma we can now link $\flex$ and $\blexer$
3e19073e91f4 chap3 done Chengsong parents: 581 diff changeset	1382	and finally give the correctness of $\blexer$--it outputs the same result as $\lexer$:
649 ef2b8abcbc55 more Chengsong parents: 640 diff changeset	1383	\begin{theorem}\label{blexerCorrect}
542 a7344c9afbaf chapter3 finished Chengsong parents: 538 diff changeset	1384	$\blexer\; r \; s = \lexer \; r \; s$
a7344c9afbaf chapter3 finished Chengsong parents: 538 diff changeset	1385	\end{theorem}
a7344c9afbaf chapter3 finished Chengsong parents: 538 diff changeset	1386	\begin{proof}
657 00171b627b8d Fixed some annotated/unannotated a/r notation inconsistencies. Chengsong parents: 653 diff changeset	1387	We can rewrite the expression $\blexer \; r \; s$ by using lemma \ref{bmkepsRetrieve}:
00171b627b8d Fixed some annotated/unannotated a/r notation inconsistencies. Chengsong parents: 653 diff changeset	1388	\[
00171b627b8d Fixed some annotated/unannotated a/r notation inconsistencies. Chengsong parents: 653 diff changeset	1389	\blexer \; r \; s = \decode \; (\retrieve \; (\; r^\uparrow) \; (\mkeps \; (r \backslash s) )) \; r
00171b627b8d Fixed some annotated/unannotated a/r notation inconsistencies. Chengsong parents: 653 diff changeset	1390	\]
00171b627b8d Fixed some annotated/unannotated a/r notation inconsistencies. Chengsong parents: 653 diff changeset	1391	From \ref{flex_retrieve} we have that the left-hand-side is equal to
00171b627b8d Fixed some annotated/unannotated a/r notation inconsistencies. Chengsong parents: 653 diff changeset	1392	\[
00171b627b8d Fixed some annotated/unannotated a/r notation inconsistencies. Chengsong parents: 653 diff changeset	1393	\textit{flex} \; r \; \textit{id} \; s \; \mkeps \; (r \backslash s),
00171b627b8d Fixed some annotated/unannotated a/r notation inconsistencies. Chengsong parents: 653 diff changeset	1394	\]
00171b627b8d Fixed some annotated/unannotated a/r notation inconsistencies. Chengsong parents: 653 diff changeset	1395	which in turn equals
00171b627b8d Fixed some annotated/unannotated a/r notation inconsistencies. Chengsong parents: 653 diff changeset	1396	\[
00171b627b8d Fixed some annotated/unannotated a/r notation inconsistencies. Chengsong parents: 653 diff changeset	1397	\lexer \; r \; s.
00171b627b8d Fixed some annotated/unannotated a/r notation inconsistencies. Chengsong parents: 653 diff changeset	1398	\]
00171b627b8d Fixed some annotated/unannotated a/r notation inconsistencies. Chengsong parents: 653 diff changeset	1399	%which immediately implies this theorem.
542 a7344c9afbaf chapter3 finished Chengsong parents: 538 diff changeset	1400	\end{proof}
a7344c9afbaf chapter3 finished Chengsong parents: 538 diff changeset	1401	\noindent
657 00171b627b8d Fixed some annotated/unannotated a/r notation inconsistencies. Chengsong parents: 653 diff changeset	1402	To piece things together
00171b627b8d Fixed some annotated/unannotated a/r notation inconsistencies. Chengsong parents: 653 diff changeset	1403	%and spell out the correctness
00171b627b8d Fixed some annotated/unannotated a/r notation inconsistencies. Chengsong parents: 653 diff changeset	1404	%result of the bitcoded lexer more explicitly,
00171b627b8d Fixed some annotated/unannotated a/r notation inconsistencies. Chengsong parents: 653 diff changeset	1405	and give the correctness result explicitly,
576 3e1b699696b6 thesis chap5 Chengsong parents: 575 diff changeset	1406	we use the fact from the previous chapter that
3e1b699696b6 thesis chap5 Chengsong parents: 575 diff changeset	1407	\[
624 8ffa28fce271 all comments incorporated!!+related work Chengsong parents: 622 diff changeset	1408	(r, s) \rightarrow v \;\; \textit{iff} \;\; \lexer \; r \; s =\Some \; v
8ffa28fce271 all comments incorporated!!+related work Chengsong parents: 622 diff changeset	1409	\quad \quad
8ffa28fce271 all comments incorporated!!+related work Chengsong parents: 622 diff changeset	1410	\nexists v. (r, s) \rightarrow v\;\; \textit{iff} \;\; \lexer \;r\;s = None
576 3e1b699696b6 thesis chap5 Chengsong parents: 575 diff changeset	1411	\]
3e1b699696b6 thesis chap5 Chengsong parents: 575 diff changeset	1412	to obtain this corollary:
3e1b699696b6 thesis chap5 Chengsong parents: 575 diff changeset	1413	\begin{corollary}\label{blexerPOSIX}
657 00171b627b8d Fixed some annotated/unannotated a/r notation inconsistencies. Chengsong parents: 653 diff changeset	1414	The $\blexer$ function correctly implements a POSIX lexer, namely
00171b627b8d Fixed some annotated/unannotated a/r notation inconsistencies. Chengsong parents: 653 diff changeset	1415	\begin{itemize}
00171b627b8d Fixed some annotated/unannotated a/r notation inconsistencies. Chengsong parents: 653 diff changeset	1416	\item
00171b627b8d Fixed some annotated/unannotated a/r notation inconsistencies. Chengsong parents: 653 diff changeset	1417	$(r, s) \rightarrow v \;\; \textit{iff} \;\; \blexer \; r \; s = \Some \; v$
00171b627b8d Fixed some annotated/unannotated a/r notation inconsistencies. Chengsong parents: 653 diff changeset	1418	\item
624 8ffa28fce271 all comments incorporated!!+related work Chengsong parents: 622 diff changeset	1419	$\nexists v. (r, s) \rightarrow v\;\; \textit{iff} \;\; \blexer \;r\;s = None$
657 00171b627b8d Fixed some annotated/unannotated a/r notation inconsistencies. Chengsong parents: 653 diff changeset	1420	\end{itemize}
576 3e1b699696b6 thesis chap5 Chengsong parents: 575 diff changeset	1421	\end{corollary}
624 8ffa28fce271 all comments incorporated!!+related work Chengsong parents: 622 diff changeset	1422	Our main reason for analysing the bit-coded algorithm over
8ffa28fce271 all comments incorporated!!+related work Chengsong parents: 622 diff changeset	1423	the injection-based one is that it allows us to define
542 a7344c9afbaf chapter3 finished Chengsong parents: 538 diff changeset	1424	more aggressive simplifications.
a7344c9afbaf chapter3 finished Chengsong parents: 538 diff changeset	1425	We will elaborate on this in the next chapter.
532 cc54ce075db5 restructured Chengsong parents: diff changeset	1426
cc54ce075db5 restructured Chengsong parents: diff changeset	1427

author	Chengsong
	Tue, 25 Jul 2023 17:28:29 +0100
changeset 667	660cf698eb26
parent 657	00171b627b8d
child 668	3831621d7b14
permissions	-rwxr-xr-x