lexing: ChengsongTanPhdThesis/Chapters/Bitcoded1.tex@e0f0a81f907b (annotated)

532 cc54ce075db5 restructured Chengsong parents: diff changeset	1	% Chapter Template
cc54ce075db5 restructured Chengsong parents: diff changeset	2
cc54ce075db5 restructured Chengsong parents: diff changeset	3	% Main chapter title
538 8016a2480704 intro and chap2 Chengsong parents: 537 diff changeset	4	\chapter{Bit-coded Algorithm of Sulzmann and Lu}
532 cc54ce075db5 restructured Chengsong parents: diff changeset	5
cc54ce075db5 restructured Chengsong parents: diff changeset	6	\label{Bitcoded1} % Change X to a consecutive number; for referencing this chapter elsewhere, use \ref{ChapterX}
cc54ce075db5 restructured Chengsong parents: diff changeset	7	%Then we illustrate how the algorithm without bitcodes falls short for such aggressive
cc54ce075db5 restructured Chengsong parents: diff changeset	8	%simplifications and therefore introduce our version of the bitcoded algorithm and
cc54ce075db5 restructured Chengsong parents: diff changeset	9	%its correctness proof in
cc54ce075db5 restructured Chengsong parents: diff changeset	10	%Chapter 3\ref{Chapter3}.
564 3cbcd7cda0a9 more Chengsong parents: 543 diff changeset	11	In this chapter, we are going to describe the bit-coded algorithm
3cbcd7cda0a9 more Chengsong parents: 543 diff changeset	12	introduced by Sulzmann and Lu \parencite{Sulzmann2014} to address the growth problem of
3cbcd7cda0a9 more Chengsong parents: 543 diff changeset	13	regular expressions.
579 35df9cdd36ca more chap3 Chengsong parents: 576 diff changeset	14	We have implemented their algorithm in Scala, and found out inefficiencies
35df9cdd36ca more chap3 Chengsong parents: 576 diff changeset	15	in their algorithm such as de-duplication not working properly and redundant
35df9cdd36ca more chap3 Chengsong parents: 576 diff changeset	16	fixpoint construction. Their algorithm is improved and verified with the help of
35df9cdd36ca more chap3 Chengsong parents: 576 diff changeset	17	formal proofs.
580 e0f0a81f907b halfway chap3 Chengsong parents: 579 diff changeset	18	\section{The Motivation Behind Using Bitcodes}
579 35df9cdd36ca more chap3 Chengsong parents: 576 diff changeset	19	We first do a recap of what was going on
35df9cdd36ca more chap3 Chengsong parents: 576 diff changeset	20	in the lexer algorithm in Chapter \ref{Inj},
35df9cdd36ca more chap3 Chengsong parents: 576 diff changeset	21	\begin{center}
35df9cdd36ca more chap3 Chengsong parents: 576 diff changeset	22	\begin{tabular}{lcl}
35df9cdd36ca more chap3 Chengsong parents: 576 diff changeset	23	$\lexer \; r \; [] $ & $=$ & $\textit{if} \; (\nullable \; r)\; \textit{then}\; \Some(\mkeps \; r) \; \textit{else} \; \None$\\
35df9cdd36ca more chap3 Chengsong parents: 576 diff changeset	24	$\lexer \; r \;c::s$ & $=$ & $\textit{case}\; (\lexer \; (r\backslash c) \; s) \;\textit{of}\; $\\
35df9cdd36ca more chap3 Chengsong parents: 576 diff changeset	25	& & $\quad \phantom{\mid}\; \None \implies \None$\\
35df9cdd36ca more chap3 Chengsong parents: 576 diff changeset	26	& & $\quad \mid \Some(v) \implies \Some(\inj \; r\; c\; v)$
35df9cdd36ca more chap3 Chengsong parents: 576 diff changeset	27	\end{tabular}
35df9cdd36ca more chap3 Chengsong parents: 576 diff changeset	28	\end{center}
35df9cdd36ca more chap3 Chengsong parents: 576 diff changeset	29	\noindent
35df9cdd36ca more chap3 Chengsong parents: 576 diff changeset	30	The algorithm recursively calls $\lexer$ on
35df9cdd36ca more chap3 Chengsong parents: 576 diff changeset	31	each new character input,
35df9cdd36ca more chap3 Chengsong parents: 576 diff changeset	32	and before starting a child call
35df9cdd36ca more chap3 Chengsong parents: 576 diff changeset	33	it stores information of previous lexing steps
35df9cdd36ca more chap3 Chengsong parents: 576 diff changeset	34	on a stack, in the form of regular expressions
35df9cdd36ca more chap3 Chengsong parents: 576 diff changeset	35	and characters: $r_0$, $c_0$, $r_1$, $c_1$, etc.
35df9cdd36ca more chap3 Chengsong parents: 576 diff changeset	36	Each descent into deeper recursive calls in $\lexer$
35df9cdd36ca more chap3 Chengsong parents: 576 diff changeset	37	causes a new pair of $r_i, c_i$ to be pushed to the call stack.
580 e0f0a81f907b halfway chap3 Chengsong parents: 579 diff changeset	38	\begin{figure}[H]
579 35df9cdd36ca more chap3 Chengsong parents: 576 diff changeset	39	\begin{tikzpicture}[->,>=stealth',shorten >=1pt,auto,thick]
35df9cdd36ca more chap3 Chengsong parents: 576 diff changeset	40	%\draw (-6,-6) grid (6,6);
35df9cdd36ca more chap3 Chengsong parents: 576 diff changeset	41	\node [ circle ] (r) at (-6, 5) {$r$};
35df9cdd36ca more chap3 Chengsong parents: 576 diff changeset	42
35df9cdd36ca more chap3 Chengsong parents: 576 diff changeset	43	%\node (-4, 6) (c1) circle [radius = 0.3] {$c_1$};
35df9cdd36ca more chap3 Chengsong parents: 576 diff changeset	44	\node [circle, minimum size = 0.1, draw] (c1) at (-4, 5.4) {$c_1$};
35df9cdd36ca more chap3 Chengsong parents: 576 diff changeset	45	%
35df9cdd36ca more chap3 Chengsong parents: 576 diff changeset	46	%\node (-2, 5) (r1) circle [radius = 0.5] {$r_1$};
35df9cdd36ca more chap3 Chengsong parents: 576 diff changeset	47	\node [minimum size = 0.5, circle, draw] (r1) at (-2, 5) {$r_1$};
35df9cdd36ca more chap3 Chengsong parents: 576 diff changeset	48
35df9cdd36ca more chap3 Chengsong parents: 576 diff changeset	49	\node [minimum width = 2cm, rectangle, draw] (stack) at (0, 3) {Stack};
35df9cdd36ca more chap3 Chengsong parents: 576 diff changeset	50
35df9cdd36ca more chap3 Chengsong parents: 576 diff changeset	51	\path
35df9cdd36ca more chap3 Chengsong parents: 576 diff changeset	52	(r)
35df9cdd36ca more chap3 Chengsong parents: 576 diff changeset	53	edge [->, >=stealth',shorten >=1pt] node[left] {} (r1);
35df9cdd36ca more chap3 Chengsong parents: 576 diff changeset	54
35df9cdd36ca more chap3 Chengsong parents: 576 diff changeset	55	\path (r1)
35df9cdd36ca more chap3 Chengsong parents: 576 diff changeset	56	edge [bend right, dashed] node {saved} (stack);
35df9cdd36ca more chap3 Chengsong parents: 576 diff changeset	57	\path (c1)
35df9cdd36ca more chap3 Chengsong parents: 576 diff changeset	58	edge [bend right, dashed] node {} (stack);
35df9cdd36ca more chap3 Chengsong parents: 576 diff changeset	59
35df9cdd36ca more chap3 Chengsong parents: 576 diff changeset	60
35df9cdd36ca more chap3 Chengsong parents: 576 diff changeset	61	\end{tikzpicture}
35df9cdd36ca more chap3 Chengsong parents: 576 diff changeset	62	\caption{First Derivative Taken}
35df9cdd36ca more chap3 Chengsong parents: 576 diff changeset	63	\end{figure}
35df9cdd36ca more chap3 Chengsong parents: 576 diff changeset	64
35df9cdd36ca more chap3 Chengsong parents: 576 diff changeset	65
35df9cdd36ca more chap3 Chengsong parents: 576 diff changeset	66
580 e0f0a81f907b halfway chap3 Chengsong parents: 579 diff changeset	67	\begin{figure}[H]
579 35df9cdd36ca more chap3 Chengsong parents: 576 diff changeset	68	\begin{tikzpicture}[->,>=stealth',shorten >=1pt,auto,thick]
35df9cdd36ca more chap3 Chengsong parents: 576 diff changeset	69	%\draw (-6,-6) grid (6,6);
35df9cdd36ca more chap3 Chengsong parents: 576 diff changeset	70	\node [ circle ] (r) at (-6, 5) {$r$};
35df9cdd36ca more chap3 Chengsong parents: 576 diff changeset	71
35df9cdd36ca more chap3 Chengsong parents: 576 diff changeset	72	%\node (-4, 6) (c1) circle [radius = 0.3] {$c_1$};
35df9cdd36ca more chap3 Chengsong parents: 576 diff changeset	73	\node [circle, minimum size = 0.1, ] (c1) at (-4, 5.4) {$c_1$};
35df9cdd36ca more chap3 Chengsong parents: 576 diff changeset	74	%
35df9cdd36ca more chap3 Chengsong parents: 576 diff changeset	75	%\node (-2, 5) (r1) circle [radius = 0.5] {$r_1$};
35df9cdd36ca more chap3 Chengsong parents: 576 diff changeset	76	\node [minimum size = 0.5, circle, ] (r1) at (-2, 5) {$r_1$};
35df9cdd36ca more chap3 Chengsong parents: 576 diff changeset	77
35df9cdd36ca more chap3 Chengsong parents: 576 diff changeset	78
35df9cdd36ca more chap3 Chengsong parents: 576 diff changeset	79	\node [circle, minimum size = 0.1, draw] (c2) at (0, 5.4) {$c_2$};
35df9cdd36ca more chap3 Chengsong parents: 576 diff changeset	80	%
35df9cdd36ca more chap3 Chengsong parents: 576 diff changeset	81	%\node (2, 5) (r2) circle [radius = 0.5] {$r_2$};
35df9cdd36ca more chap3 Chengsong parents: 576 diff changeset	82	\node [circle, draw] (r2) at (2, 5) {$r_2$};
35df9cdd36ca more chap3 Chengsong parents: 576 diff changeset	83	\node [minimum width = 3cm, minimum height = 1cm, rectangle, draw] (stack) at (0, 2) {\large Stack};
35df9cdd36ca more chap3 Chengsong parents: 576 diff changeset	84
35df9cdd36ca more chap3 Chengsong parents: 576 diff changeset	85	\path
35df9cdd36ca more chap3 Chengsong parents: 576 diff changeset	86	(r)
35df9cdd36ca more chap3 Chengsong parents: 576 diff changeset	87	edge [->, >=stealth',shorten >=1pt] node[left] {} (r1);
35df9cdd36ca more chap3 Chengsong parents: 576 diff changeset	88
35df9cdd36ca more chap3 Chengsong parents: 576 diff changeset	89	\path (r2)
35df9cdd36ca more chap3 Chengsong parents: 576 diff changeset	90	edge [bend right, dashed] node {} (stack);
35df9cdd36ca more chap3 Chengsong parents: 576 diff changeset	91	\path (c2)
35df9cdd36ca more chap3 Chengsong parents: 576 diff changeset	92	edge [bend right, dashed] node {} (stack);
35df9cdd36ca more chap3 Chengsong parents: 576 diff changeset	93
35df9cdd36ca more chap3 Chengsong parents: 576 diff changeset	94	\path (r1)
35df9cdd36ca more chap3 Chengsong parents: 576 diff changeset	95	edge [] node {} (r2);
35df9cdd36ca more chap3 Chengsong parents: 576 diff changeset	96
35df9cdd36ca more chap3 Chengsong parents: 576 diff changeset	97	\end{tikzpicture}
35df9cdd36ca more chap3 Chengsong parents: 576 diff changeset	98	\caption{Second Derivative Taken}
35df9cdd36ca more chap3 Chengsong parents: 576 diff changeset	99	\end{figure}
35df9cdd36ca more chap3 Chengsong parents: 576 diff changeset	100	\noindent
35df9cdd36ca more chap3 Chengsong parents: 576 diff changeset	101	As the number of derivative steps increase,
35df9cdd36ca more chap3 Chengsong parents: 576 diff changeset	102	the stack would increase:
35df9cdd36ca more chap3 Chengsong parents: 576 diff changeset	103
580 e0f0a81f907b halfway chap3 Chengsong parents: 579 diff changeset	104	\begin{figure}[H]
579 35df9cdd36ca more chap3 Chengsong parents: 576 diff changeset	105	\begin{tikzpicture}[->,>=stealth',shorten >=1pt,auto,thick]
35df9cdd36ca more chap3 Chengsong parents: 576 diff changeset	106	%\draw (-6,-6) grid (6,6);
35df9cdd36ca more chap3 Chengsong parents: 576 diff changeset	107	\node [ circle ] (r) at (-6, 5) {$r$};
35df9cdd36ca more chap3 Chengsong parents: 576 diff changeset	108
35df9cdd36ca more chap3 Chengsong parents: 576 diff changeset	109	%\node (-4, 6) (c1) circle [radius = 0.3] {$c_1$};
35df9cdd36ca more chap3 Chengsong parents: 576 diff changeset	110	\node [circle, minimum size = 0.1, ] (c1) at (-4, 5.4) {$c_1$};
35df9cdd36ca more chap3 Chengsong parents: 576 diff changeset	111	%
35df9cdd36ca more chap3 Chengsong parents: 576 diff changeset	112	%\node (-2, 5) (r1) circle [radius = 0.5] {$r_1$};
35df9cdd36ca more chap3 Chengsong parents: 576 diff changeset	113	\node [minimum size = 0.5, circle, ] (r1) at (-2, 5) {$r_1$};
35df9cdd36ca more chap3 Chengsong parents: 576 diff changeset	114
35df9cdd36ca more chap3 Chengsong parents: 576 diff changeset	115
35df9cdd36ca more chap3 Chengsong parents: 576 diff changeset	116	\node [circle, minimum size = 0.1, ] (c2) at (0, 5.4) {$c_2$};
35df9cdd36ca more chap3 Chengsong parents: 576 diff changeset	117	%
35df9cdd36ca more chap3 Chengsong parents: 576 diff changeset	118	%\node (2, 5) (r2) circle [radius = 0.5] {$r_2$};
35df9cdd36ca more chap3 Chengsong parents: 576 diff changeset	119	\node [circle, ] (r2) at (2, 5) {$r_2$};
35df9cdd36ca more chap3 Chengsong parents: 576 diff changeset	120	\node [minimum width = 4cm, minimum height = 2.5cm, rectangle, draw] (stack) at (0, 1) { \large Stack};
35df9cdd36ca more chap3 Chengsong parents: 576 diff changeset	121
35df9cdd36ca more chap3 Chengsong parents: 576 diff changeset	122	\node [] (ldots) at (3.5, 5) {};
35df9cdd36ca more chap3 Chengsong parents: 576 diff changeset	123	%\node (6, 5) (rn) circle [radius = 0.5] {$r_n$};
35df9cdd36ca more chap3 Chengsong parents: 576 diff changeset	124
35df9cdd36ca more chap3 Chengsong parents: 576 diff changeset	125	\node [minimum size = 0.5, circle, ] (rn) at (6, 5) {};
35df9cdd36ca more chap3 Chengsong parents: 576 diff changeset	126
35df9cdd36ca more chap3 Chengsong parents: 576 diff changeset	127	\node (rldots) at ($(ldots)!.4!(rn)$) {\ldots};
35df9cdd36ca more chap3 Chengsong parents: 576 diff changeset	128
35df9cdd36ca more chap3 Chengsong parents: 576 diff changeset	129	\path
35df9cdd36ca more chap3 Chengsong parents: 576 diff changeset	130	(r)
35df9cdd36ca more chap3 Chengsong parents: 576 diff changeset	131	edge [->, >=stealth',shorten >=1pt] node[left] {} (r1);
35df9cdd36ca more chap3 Chengsong parents: 576 diff changeset	132
35df9cdd36ca more chap3 Chengsong parents: 576 diff changeset	133	\path (rldots)
35df9cdd36ca more chap3 Chengsong parents: 576 diff changeset	134	edge [bend left, dashed] node {} (stack);
35df9cdd36ca more chap3 Chengsong parents: 576 diff changeset	135
35df9cdd36ca more chap3 Chengsong parents: 576 diff changeset	136	\path (r1)
35df9cdd36ca more chap3 Chengsong parents: 576 diff changeset	137	edge [] node {} (r2);
35df9cdd36ca more chap3 Chengsong parents: 576 diff changeset	138
35df9cdd36ca more chap3 Chengsong parents: 576 diff changeset	139	\path (r2)
35df9cdd36ca more chap3 Chengsong parents: 576 diff changeset	140	edge [] node {} (ldots);
35df9cdd36ca more chap3 Chengsong parents: 576 diff changeset	141	\path (ldots)
35df9cdd36ca more chap3 Chengsong parents: 576 diff changeset	142	edge [bend left, dashed] node {} (stack);
35df9cdd36ca more chap3 Chengsong parents: 576 diff changeset	143	\path (5.03, 4.9)
35df9cdd36ca more chap3 Chengsong parents: 576 diff changeset	144	edge [bend left, dashed] node {} (stack);
35df9cdd36ca more chap3 Chengsong parents: 576 diff changeset	145	\end{tikzpicture}
35df9cdd36ca more chap3 Chengsong parents: 576 diff changeset	146	\caption{More Derivatives Taken}
35df9cdd36ca more chap3 Chengsong parents: 576 diff changeset	147	\end{figure}
580 e0f0a81f907b halfway chap3 Chengsong parents: 579 diff changeset	148
579 35df9cdd36ca more chap3 Chengsong parents: 576 diff changeset	149
580 e0f0a81f907b halfway chap3 Chengsong parents: 579 diff changeset	150	\begin{figure}[H]
579 35df9cdd36ca more chap3 Chengsong parents: 576 diff changeset	151	\begin{tikzpicture}[->,>=stealth',shorten >=1pt,auto,thick]
35df9cdd36ca more chap3 Chengsong parents: 576 diff changeset	152	%\draw (-6,-6) grid (6,6);
35df9cdd36ca more chap3 Chengsong parents: 576 diff changeset	153	\node [radius = 0.5, circle, draw] (r) at (-6, 5) {$r$};
35df9cdd36ca more chap3 Chengsong parents: 576 diff changeset	154
35df9cdd36ca more chap3 Chengsong parents: 576 diff changeset	155	%\node (-4, 6) (c1) circle [radius = 0.3] {$c_1$};
35df9cdd36ca more chap3 Chengsong parents: 576 diff changeset	156	\node [circle, minimum size = 0.1, draw] (c1) at (-4, 5.4) {$c_1$};
35df9cdd36ca more chap3 Chengsong parents: 576 diff changeset	157	%
35df9cdd36ca more chap3 Chengsong parents: 576 diff changeset	158	%\node (-2, 5) (r1) circle [radius = 0.5] {$r_1$};
35df9cdd36ca more chap3 Chengsong parents: 576 diff changeset	159	\node [minimum size = 0.5, circle, draw] (r1) at (-2, 5) {$r_1$};
35df9cdd36ca more chap3 Chengsong parents: 576 diff changeset	160	%
35df9cdd36ca more chap3 Chengsong parents: 576 diff changeset	161	%\node (0, 6) (c2) circle [radius = 0.3] {$c_2$};
35df9cdd36ca more chap3 Chengsong parents: 576 diff changeset	162	\node [circle, minimum size = 0.1, draw] (c2) at (0, 5.4) {$c_2$};
35df9cdd36ca more chap3 Chengsong parents: 576 diff changeset	163	%
35df9cdd36ca more chap3 Chengsong parents: 576 diff changeset	164	%\node (2, 5) (r2) circle [radius = 0.5] {$r_2$};
35df9cdd36ca more chap3 Chengsong parents: 576 diff changeset	165	\node [circle, draw] (r2) at (2, 5) {$r_2$};
35df9cdd36ca more chap3 Chengsong parents: 576 diff changeset	166	%
35df9cdd36ca more chap3 Chengsong parents: 576 diff changeset	167	%
35df9cdd36ca more chap3 Chengsong parents: 576 diff changeset	168	\node [] (ldots) at (4.5, 5) {};
35df9cdd36ca more chap3 Chengsong parents: 576 diff changeset	169	%\node (6, 5) (rn) circle [radius = 0.5] {$r_n$};
35df9cdd36ca more chap3 Chengsong parents: 576 diff changeset	170
35df9cdd36ca more chap3 Chengsong parents: 576 diff changeset	171	\node [minimum size = 0.5, circle, draw] (rn) at (6, 5) {$r_n$};
35df9cdd36ca more chap3 Chengsong parents: 576 diff changeset	172
35df9cdd36ca more chap3 Chengsong parents: 576 diff changeset	173	\node at ($(ldots)!.4!(rn)$) {\ldots};
35df9cdd36ca more chap3 Chengsong parents: 576 diff changeset	174
35df9cdd36ca more chap3 Chengsong parents: 576 diff changeset	175
35df9cdd36ca more chap3 Chengsong parents: 576 diff changeset	176
35df9cdd36ca more chap3 Chengsong parents: 576 diff changeset	177
35df9cdd36ca more chap3 Chengsong parents: 576 diff changeset	178	\node [minimum size = 6cm, rectangle, draw] (stack) at (0, 0) {\Huge Stack};
35df9cdd36ca more chap3 Chengsong parents: 576 diff changeset	179
35df9cdd36ca more chap3 Chengsong parents: 576 diff changeset	180	\path
35df9cdd36ca more chap3 Chengsong parents: 576 diff changeset	181	(r)
35df9cdd36ca more chap3 Chengsong parents: 576 diff changeset	182	edge [->, >=stealth',shorten >=1pt] node[left] {} (r1);
35df9cdd36ca more chap3 Chengsong parents: 576 diff changeset	183	\path
35df9cdd36ca more chap3 Chengsong parents: 576 diff changeset	184	(r1)
35df9cdd36ca more chap3 Chengsong parents: 576 diff changeset	185	edge [] node {} (r2);
35df9cdd36ca more chap3 Chengsong parents: 576 diff changeset	186	\path (r2)
35df9cdd36ca more chap3 Chengsong parents: 576 diff changeset	187	edge [] node {} (ldots);
35df9cdd36ca more chap3 Chengsong parents: 576 diff changeset	188	\path (r)
35df9cdd36ca more chap3 Chengsong parents: 576 diff changeset	189	edge [dashed, bend right] node {} (stack);
35df9cdd36ca more chap3 Chengsong parents: 576 diff changeset	190
35df9cdd36ca more chap3 Chengsong parents: 576 diff changeset	191	\path (r1)
35df9cdd36ca more chap3 Chengsong parents: 576 diff changeset	192	edge [dashed, ] node {} (stack);
35df9cdd36ca more chap3 Chengsong parents: 576 diff changeset	193
35df9cdd36ca more chap3 Chengsong parents: 576 diff changeset	194	\path (c1)
35df9cdd36ca more chap3 Chengsong parents: 576 diff changeset	195	edge [dashed, bend right] node {} (stack);
35df9cdd36ca more chap3 Chengsong parents: 576 diff changeset	196
35df9cdd36ca more chap3 Chengsong parents: 576 diff changeset	197	\path (c2)
35df9cdd36ca more chap3 Chengsong parents: 576 diff changeset	198	edge [dashed] node {} (stack);
35df9cdd36ca more chap3 Chengsong parents: 576 diff changeset	199	\path (4.5, 5)
35df9cdd36ca more chap3 Chengsong parents: 576 diff changeset	200	edge [dashed, bend left] node {} (stack);
35df9cdd36ca more chap3 Chengsong parents: 576 diff changeset	201	\path (4.9, 5)
35df9cdd36ca more chap3 Chengsong parents: 576 diff changeset	202	edge [dashed, bend left] node {} (stack);
35df9cdd36ca more chap3 Chengsong parents: 576 diff changeset	203	\path (5.3, 5)
35df9cdd36ca more chap3 Chengsong parents: 576 diff changeset	204	edge [dashed, bend left] node {} (stack);
35df9cdd36ca more chap3 Chengsong parents: 576 diff changeset	205	\path (r2)
35df9cdd36ca more chap3 Chengsong parents: 576 diff changeset	206	edge [dashed, ] node {} (stack);
35df9cdd36ca more chap3 Chengsong parents: 576 diff changeset	207	\path (rn)
35df9cdd36ca more chap3 Chengsong parents: 576 diff changeset	208	edge [dashed, bend left] node {} (stack);
35df9cdd36ca more chap3 Chengsong parents: 576 diff changeset	209	\end{tikzpicture}
580 e0f0a81f907b halfway chap3 Chengsong parents: 579 diff changeset	210	\caption{Before Injection Phase Starts}
e0f0a81f907b halfway chap3 Chengsong parents: 579 diff changeset	211	\end{figure}
e0f0a81f907b halfway chap3 Chengsong parents: 579 diff changeset	212
e0f0a81f907b halfway chap3 Chengsong parents: 579 diff changeset	213
e0f0a81f907b halfway chap3 Chengsong parents: 579 diff changeset	214	\noindent
e0f0a81f907b halfway chap3 Chengsong parents: 579 diff changeset	215	After all derivatives have been taken, the stack grows to a maximum size
e0f0a81f907b halfway chap3 Chengsong parents: 579 diff changeset	216	and the pair of regular expressions and characters $r_i, c_{i+1}$
e0f0a81f907b halfway chap3 Chengsong parents: 579 diff changeset	217	are then popped out and used in the injection phase.
e0f0a81f907b halfway chap3 Chengsong parents: 579 diff changeset	218
e0f0a81f907b halfway chap3 Chengsong parents: 579 diff changeset	219
e0f0a81f907b halfway chap3 Chengsong parents: 579 diff changeset	220
e0f0a81f907b halfway chap3 Chengsong parents: 579 diff changeset	221	\begin{figure}[H]
e0f0a81f907b halfway chap3 Chengsong parents: 579 diff changeset	222	\begin{tikzpicture}[->,>=stealth',shorten >=1pt,auto,thick]
e0f0a81f907b halfway chap3 Chengsong parents: 579 diff changeset	223	%\draw (-6,-6) grid (6,6);
e0f0a81f907b halfway chap3 Chengsong parents: 579 diff changeset	224	\node [radius = 0.5, circle, ] (r) at (-6, 5) {$r$};
e0f0a81f907b halfway chap3 Chengsong parents: 579 diff changeset	225
e0f0a81f907b halfway chap3 Chengsong parents: 579 diff changeset	226	%\node (-4, 6) (c1) circle [radius = 0.3] {$c_1$};
e0f0a81f907b halfway chap3 Chengsong parents: 579 diff changeset	227	\node [circle, minimum size = 0.1, ] (c1) at (-4, 5.4) {$c_1$};
579 35df9cdd36ca more chap3 Chengsong parents: 576 diff changeset	228	%
580 e0f0a81f907b halfway chap3 Chengsong parents: 579 diff changeset	229	%\node (-2, 5) (r1) circle [radius = 0.5] {$r_1$};
e0f0a81f907b halfway chap3 Chengsong parents: 579 diff changeset	230	\node [minimum size = 0.5, circle, ] (r1) at (-2, 5) {$r_1$};
579 35df9cdd36ca more chap3 Chengsong parents: 576 diff changeset	231	%
580 e0f0a81f907b halfway chap3 Chengsong parents: 579 diff changeset	232	%\node (0, 6) (c2) circle [radius = 0.3] {$c_2$};
e0f0a81f907b halfway chap3 Chengsong parents: 579 diff changeset	233	\node [circle, minimum size = 0.1, ] (c2) at (0, 5.4) {$c_2$};
579 35df9cdd36ca more chap3 Chengsong parents: 576 diff changeset	234	%
580 e0f0a81f907b halfway chap3 Chengsong parents: 579 diff changeset	235	%\node (2, 5) (r2) circle [radius = 0.5] {$r_2$};
e0f0a81f907b halfway chap3 Chengsong parents: 579 diff changeset	236	\node [circle, ] (r2) at (2, 5) {$r_2$};
579 35df9cdd36ca more chap3 Chengsong parents: 576 diff changeset	237	%
35df9cdd36ca more chap3 Chengsong parents: 576 diff changeset	238	%
580 e0f0a81f907b halfway chap3 Chengsong parents: 579 diff changeset	239	\node [] (ldots) at (4.5, 5) {};
e0f0a81f907b halfway chap3 Chengsong parents: 579 diff changeset	240	%\node (6, 5) (rn) circle [radius = 0.5] {$r_n$};
e0f0a81f907b halfway chap3 Chengsong parents: 579 diff changeset	241
e0f0a81f907b halfway chap3 Chengsong parents: 579 diff changeset	242	\node [minimum size = 0.5, circle, ] (rn) at (6, 5) {$r_n$};
e0f0a81f907b halfway chap3 Chengsong parents: 579 diff changeset	243
e0f0a81f907b halfway chap3 Chengsong parents: 579 diff changeset	244	\node at ($(ldots)!.4!(rn)$) {\ldots};
e0f0a81f907b halfway chap3 Chengsong parents: 579 diff changeset	245
e0f0a81f907b halfway chap3 Chengsong parents: 579 diff changeset	246	\node [minimum size = 0.5, circle, ] (vn) at (6, -5) {$v_n$};
e0f0a81f907b halfway chap3 Chengsong parents: 579 diff changeset	247
e0f0a81f907b halfway chap3 Chengsong parents: 579 diff changeset	248	\node [] (ldots2) at (3.5, -5) {};
e0f0a81f907b halfway chap3 Chengsong parents: 579 diff changeset	249
e0f0a81f907b halfway chap3 Chengsong parents: 579 diff changeset	250	\node [minimum size = 0.5, circle, ] (v2) at (2, -5) {$v_2$};
e0f0a81f907b halfway chap3 Chengsong parents: 579 diff changeset	251
e0f0a81f907b halfway chap3 Chengsong parents: 579 diff changeset	252	\node at ($(ldots2)!.4!(v2)$) {\ldots};
e0f0a81f907b halfway chap3 Chengsong parents: 579 diff changeset	253
e0f0a81f907b halfway chap3 Chengsong parents: 579 diff changeset	254
e0f0a81f907b halfway chap3 Chengsong parents: 579 diff changeset	255	\node [circle, ] (v1) at (-2, -5) {$v_1$};
e0f0a81f907b halfway chap3 Chengsong parents: 579 diff changeset	256
e0f0a81f907b halfway chap3 Chengsong parents: 579 diff changeset	257	\node [radius = 0.5, circle, ] (v) at (-6, -5) {$v$};
e0f0a81f907b halfway chap3 Chengsong parents: 579 diff changeset	258
e0f0a81f907b halfway chap3 Chengsong parents: 579 diff changeset	259	\node [minimum size = 6cm, rectangle, draw] (stack) at (0, 0) {\Huge Stack};
e0f0a81f907b halfway chap3 Chengsong parents: 579 diff changeset	260
e0f0a81f907b halfway chap3 Chengsong parents: 579 diff changeset	261	\path
e0f0a81f907b halfway chap3 Chengsong parents: 579 diff changeset	262	(r)
e0f0a81f907b halfway chap3 Chengsong parents: 579 diff changeset	263	edge [->, >=stealth',shorten >=1pt] node[left] {} (r1);
e0f0a81f907b halfway chap3 Chengsong parents: 579 diff changeset	264	\path
e0f0a81f907b halfway chap3 Chengsong parents: 579 diff changeset	265	(r1)
e0f0a81f907b halfway chap3 Chengsong parents: 579 diff changeset	266	edge [] node {} (r2);
e0f0a81f907b halfway chap3 Chengsong parents: 579 diff changeset	267	\path (r2)
e0f0a81f907b halfway chap3 Chengsong parents: 579 diff changeset	268	edge [] node {} (ldots);
e0f0a81f907b halfway chap3 Chengsong parents: 579 diff changeset	269	\path (rn)
e0f0a81f907b halfway chap3 Chengsong parents: 579 diff changeset	270	edge [] node {$\mkeps$} (vn);
e0f0a81f907b halfway chap3 Chengsong parents: 579 diff changeset	271	\path (vn)
e0f0a81f907b halfway chap3 Chengsong parents: 579 diff changeset	272	edge [] node {} (ldots2);
e0f0a81f907b halfway chap3 Chengsong parents: 579 diff changeset	273	\path (v2)
e0f0a81f907b halfway chap3 Chengsong parents: 579 diff changeset	274	edge [] node {$\inj \; r_1 \; c_2\;v_2$} (v1);
e0f0a81f907b halfway chap3 Chengsong parents: 579 diff changeset	275
e0f0a81f907b halfway chap3 Chengsong parents: 579 diff changeset	276	\path (v1)
e0f0a81f907b halfway chap3 Chengsong parents: 579 diff changeset	277	edge [] node {$\inj \; r \; c_1 \; v_1$} (v);
e0f0a81f907b halfway chap3 Chengsong parents: 579 diff changeset	278
e0f0a81f907b halfway chap3 Chengsong parents: 579 diff changeset	279	\path (stack)
e0f0a81f907b halfway chap3 Chengsong parents: 579 diff changeset	280	edge [dashed] node {} (-4.2, -5.2);
e0f0a81f907b halfway chap3 Chengsong parents: 579 diff changeset	281	\path (stack)
e0f0a81f907b halfway chap3 Chengsong parents: 579 diff changeset	282	edge [dashed] node {} (-4, -5.2);
e0f0a81f907b halfway chap3 Chengsong parents: 579 diff changeset	283	\path (stack)
e0f0a81f907b halfway chap3 Chengsong parents: 579 diff changeset	284	edge [dashed] node {} (-0.1, -5.2);
e0f0a81f907b halfway chap3 Chengsong parents: 579 diff changeset	285	\path (stack)
e0f0a81f907b halfway chap3 Chengsong parents: 579 diff changeset	286	edge [dashed] node {} (0.2, -5.26);
e0f0a81f907b halfway chap3 Chengsong parents: 579 diff changeset	287	\path (stack)
e0f0a81f907b halfway chap3 Chengsong parents: 579 diff changeset	288	edge [dashed] node {} (3.2, -5);
e0f0a81f907b halfway chap3 Chengsong parents: 579 diff changeset	289	\path (stack)
e0f0a81f907b halfway chap3 Chengsong parents: 579 diff changeset	290	edge [dashed] node {} (2.7, -5);
e0f0a81f907b halfway chap3 Chengsong parents: 579 diff changeset	291	\path (stack)
e0f0a81f907b halfway chap3 Chengsong parents: 579 diff changeset	292	edge [dashed] node {} (3.7, -5);
e0f0a81f907b halfway chap3 Chengsong parents: 579 diff changeset	293	\end{tikzpicture}
e0f0a81f907b halfway chap3 Chengsong parents: 579 diff changeset	294	\caption{Stored $r_i, c_{i+1}$ Used by $\inj$}
e0f0a81f907b halfway chap3 Chengsong parents: 579 diff changeset	295	\end{figure}
e0f0a81f907b halfway chap3 Chengsong parents: 579 diff changeset	296	\noindent
e0f0a81f907b halfway chap3 Chengsong parents: 579 diff changeset	297	Storing all $r_i, c_{i+1}$ pairs recursively
e0f0a81f907b halfway chap3 Chengsong parents: 579 diff changeset	298	allows the algorithm to work in an elegant way, at the expense of
e0f0a81f907b halfway chap3 Chengsong parents: 579 diff changeset	299	storing quite a bit of verbose information.
e0f0a81f907b halfway chap3 Chengsong parents: 579 diff changeset	300	The stack seems to grow at least quadratically fast with respect
e0f0a81f907b halfway chap3 Chengsong parents: 579 diff changeset	301	to the input (not taking into account the size bloat of $r_i$),
e0f0a81f907b halfway chap3 Chengsong parents: 579 diff changeset	302	which can be inefficient and prone to stack overflow.
e0f0a81f907b halfway chap3 Chengsong parents: 579 diff changeset	303	\section{Bitcoded Algorithm}
e0f0a81f907b halfway chap3 Chengsong parents: 579 diff changeset	304	To address this,
e0f0a81f907b halfway chap3 Chengsong parents: 579 diff changeset	305	Sulzmann and Lu chose to define a new datatype
e0f0a81f907b halfway chap3 Chengsong parents: 579 diff changeset	306	called \emph{annotated regular expression},
e0f0a81f907b halfway chap3 Chengsong parents: 579 diff changeset	307	which condenses all the partial lexing information
e0f0a81f907b halfway chap3 Chengsong parents: 579 diff changeset	308	(that was originally stored in $r_i, c_{i+1}$ pairs)
e0f0a81f907b halfway chap3 Chengsong parents: 579 diff changeset	309	into bitcodes.
e0f0a81f907b halfway chap3 Chengsong parents: 579 diff changeset	310	Annotated regular expressions
e0f0a81f907b halfway chap3 Chengsong parents: 579 diff changeset	311	are defined as the following
e0f0a81f907b halfway chap3 Chengsong parents: 579 diff changeset	312	Isabelle datatype \footnote{ We use subscript notation to indicate
e0f0a81f907b halfway chap3 Chengsong parents: 579 diff changeset	313	that the bitcodes are auxiliary information that do not
e0f0a81f907b halfway chap3 Chengsong parents: 579 diff changeset	314	interfere with the structure of the regular expressions }:
e0f0a81f907b halfway chap3 Chengsong parents: 579 diff changeset	315	\begin{center}
e0f0a81f907b halfway chap3 Chengsong parents: 579 diff changeset	316	\begin{tabular}{lcl}
e0f0a81f907b halfway chap3 Chengsong parents: 579 diff changeset	317	$\textit{a}$ & $::=$ & $\ZERO$\\
e0f0a81f907b halfway chap3 Chengsong parents: 579 diff changeset	318	& $\mid$ & $_{bs}\ONE$\\
e0f0a81f907b halfway chap3 Chengsong parents: 579 diff changeset	319	& $\mid$ & $_{bs}{\bf c}$\\
e0f0a81f907b halfway chap3 Chengsong parents: 579 diff changeset	320	& $\mid$ & $_{bs}\sum\,as$\\
e0f0a81f907b halfway chap3 Chengsong parents: 579 diff changeset	321	& $\mid$ & $_{bs}a_1\cdot a_2$\\
e0f0a81f907b halfway chap3 Chengsong parents: 579 diff changeset	322	& $\mid$ & $_{bs}a^*$
e0f0a81f907b halfway chap3 Chengsong parents: 579 diff changeset	323	\end{tabular}
e0f0a81f907b halfway chap3 Chengsong parents: 579 diff changeset	324	\end{center}
e0f0a81f907b halfway chap3 Chengsong parents: 579 diff changeset	325	\noindent
e0f0a81f907b halfway chap3 Chengsong parents: 579 diff changeset	326	where $bs$ stands for bit-codes, $a$ for $\mathbf{a}$nnotated regular
e0f0a81f907b halfway chap3 Chengsong parents: 579 diff changeset	327	expressions and $as$ for lists of annotated regular expressions.
e0f0a81f907b halfway chap3 Chengsong parents: 579 diff changeset	328	The alternative constructor, written, $\sum$, has been generalised to
e0f0a81f907b halfway chap3 Chengsong parents: 579 diff changeset	329	accept a list of annotated regular expressions rather than just two.
e0f0a81f907b halfway chap3 Chengsong parents: 579 diff changeset	330	Why is it generalised? This is because when we open up nested
e0f0a81f907b halfway chap3 Chengsong parents: 579 diff changeset	331	alternatives, there could be more than two elements at the same level
e0f0a81f907b halfway chap3 Chengsong parents: 579 diff changeset	332	after de-duplication, which can no longer be stored in a binary
e0f0a81f907b halfway chap3 Chengsong parents: 579 diff changeset	333	constructor.
e0f0a81f907b halfway chap3 Chengsong parents: 579 diff changeset	334	Bits and bitcodes (lists of bits) are defined as:
e0f0a81f907b halfway chap3 Chengsong parents: 579 diff changeset	335	\begin{center}
e0f0a81f907b halfway chap3 Chengsong parents: 579 diff changeset	336	$b ::= S \mid Z \qquad
e0f0a81f907b halfway chap3 Chengsong parents: 579 diff changeset	337	bs ::= [] \mid b::bs
e0f0a81f907b halfway chap3 Chengsong parents: 579 diff changeset	338	$
e0f0a81f907b halfway chap3 Chengsong parents: 579 diff changeset	339	\end{center}
e0f0a81f907b halfway chap3 Chengsong parents: 579 diff changeset	340	\noindent
e0f0a81f907b halfway chap3 Chengsong parents: 579 diff changeset	341	We use $S$ and $Z$ rather than $1$ and $0$ is to avoid
e0f0a81f907b halfway chap3 Chengsong parents: 579 diff changeset	342	confusion with the regular expressions $\ZERO$ and $\ONE$.
e0f0a81f907b halfway chap3 Chengsong parents: 579 diff changeset	343	The idea is to use the attached bitcodes
e0f0a81f907b halfway chap3 Chengsong parents: 579 diff changeset	344	to indicate which choice was made at a certain point
e0f0a81f907b halfway chap3 Chengsong parents: 579 diff changeset	345	during lexing.
e0f0a81f907b halfway chap3 Chengsong parents: 579 diff changeset	346	For example,
e0f0a81f907b halfway chap3 Chengsong parents: 579 diff changeset	347	$(_{Z}a+_{S}b) \backslash a$ gives us
e0f0a81f907b halfway chap3 Chengsong parents: 579 diff changeset	348	$_{Z}\ONE + \ZERO$, this $Z$ bitcode will
e0f0a81f907b halfway chap3 Chengsong parents: 579 diff changeset	349	later be used to decode that a left branch was
e0f0a81f907b halfway chap3 Chengsong parents: 579 diff changeset	350	selected at the time when the part $a+b$ is being taken
e0f0a81f907b halfway chap3 Chengsong parents: 579 diff changeset	351	derivative of.
e0f0a81f907b halfway chap3 Chengsong parents: 579 diff changeset	352	\subsection{A Bird's Eye View of the Bit-coded Lexer}
e0f0a81f907b halfway chap3 Chengsong parents: 579 diff changeset	353	Before we give out the rest of the functions and definitions
e0f0a81f907b halfway chap3 Chengsong parents: 579 diff changeset	354	related to our
e0f0a81f907b halfway chap3 Chengsong parents: 579 diff changeset	355	$\blexer$ (\emph{b}-itcoded lexer), we first provide a high-level
e0f0a81f907b halfway chap3 Chengsong parents: 579 diff changeset	356	view of the algorithm, so the reader does not get lost in
e0f0a81f907b halfway chap3 Chengsong parents: 579 diff changeset	357	the details.
e0f0a81f907b halfway chap3 Chengsong parents: 579 diff changeset	358	\begin{figure}[H]
e0f0a81f907b halfway chap3 Chengsong parents: 579 diff changeset	359	\begin{tikzpicture}[->,>=stealth',shorten >=1pt,auto,thick]
e0f0a81f907b halfway chap3 Chengsong parents: 579 diff changeset	360	%\draw (-6,-6) grid (6,6);
e0f0a81f907b halfway chap3 Chengsong parents: 579 diff changeset	361
e0f0a81f907b halfway chap3 Chengsong parents: 579 diff changeset	362	\node [circle, draw] (r0) at (-6, 2) {$r$};
e0f0a81f907b halfway chap3 Chengsong parents: 579 diff changeset	363
e0f0a81f907b halfway chap3 Chengsong parents: 579 diff changeset	364	\node [radius = 0.5, circle, draw] (r) at (-6, 5) {$_{bs}a$};
e0f0a81f907b halfway chap3 Chengsong parents: 579 diff changeset	365	\path (r0)
e0f0a81f907b halfway chap3 Chengsong parents: 579 diff changeset	366	edge [] node {$\internalise$} (r);
e0f0a81f907b halfway chap3 Chengsong parents: 579 diff changeset	367	%\node (-4, 6) (c1) circle [radius = 0.3] {$c_1$};
e0f0a81f907b halfway chap3 Chengsong parents: 579 diff changeset	368	\node [circle, minimum size = 0.1, draw] (c1) at (-4, 5.4) {$c_1$};
579 35df9cdd36ca more chap3 Chengsong parents: 576 diff changeset	369	%
580 e0f0a81f907b halfway chap3 Chengsong parents: 579 diff changeset	370	%\node (-2, 5) (r1) circle [radius = 0.5] {$r_1$};
e0f0a81f907b halfway chap3 Chengsong parents: 579 diff changeset	371	\node [minimum size = 1.0cm, circle, draw] (r1) at (-2, 5) {$_{bs_1}a_1$};
e0f0a81f907b halfway chap3 Chengsong parents: 579 diff changeset	372	%
e0f0a81f907b halfway chap3 Chengsong parents: 579 diff changeset	373	%\node (0, 6) (c2) circle [radius = 0.3] {$c_2$};
e0f0a81f907b halfway chap3 Chengsong parents: 579 diff changeset	374	\node [circle, minimum size = 0.1, draw] (c2) at (0, 5.4) {$c_2$};
e0f0a81f907b halfway chap3 Chengsong parents: 579 diff changeset	375	%
e0f0a81f907b halfway chap3 Chengsong parents: 579 diff changeset	376	%\node (2, 5) (r2) circle [radius = 0.5] {$r_2$};
e0f0a81f907b halfway chap3 Chengsong parents: 579 diff changeset	377	\node [circle, draw, minimum size = 1.4cm] (r2) at (2, 5) {$_{bs_2}a_2$};
e0f0a81f907b halfway chap3 Chengsong parents: 579 diff changeset	378	%
e0f0a81f907b halfway chap3 Chengsong parents: 579 diff changeset	379	%
e0f0a81f907b halfway chap3 Chengsong parents: 579 diff changeset	380	\node [] (ldots) at (4.5, 5) {};
e0f0a81f907b halfway chap3 Chengsong parents: 579 diff changeset	381	%\node (6, 5) (rn) circle [radius = 0.5] {$r_n$};
e0f0a81f907b halfway chap3 Chengsong parents: 579 diff changeset	382
e0f0a81f907b halfway chap3 Chengsong parents: 579 diff changeset	383	\node [minimum size = 2.2cm, circle, draw] (rn) at (6, 5) {$_{bs_n}a_n$};
e0f0a81f907b halfway chap3 Chengsong parents: 579 diff changeset	384
e0f0a81f907b halfway chap3 Chengsong parents: 579 diff changeset	385	\node at ($(ldots)!.1!(rn)$) {\ldots};
e0f0a81f907b halfway chap3 Chengsong parents: 579 diff changeset	386
e0f0a81f907b halfway chap3 Chengsong parents: 579 diff changeset	387	\node [minimum size = 0.5, circle, ] (v) at (6, 2) {$v$};
e0f0a81f907b halfway chap3 Chengsong parents: 579 diff changeset	388
e0f0a81f907b halfway chap3 Chengsong parents: 579 diff changeset	389	%\node [] (v2) at (4, -5) {};
e0f0a81f907b halfway chap3 Chengsong parents: 579 diff changeset	390	%
e0f0a81f907b halfway chap3 Chengsong parents: 579 diff changeset	391	%\node [draw, cross out] (ldots2) at (5, -5) {};
579 35df9cdd36ca more chap3 Chengsong parents: 576 diff changeset	392	%
580 e0f0a81f907b halfway chap3 Chengsong parents: 579 diff changeset	393	%\node at ($(ldots2)!.4!(v2)$) {\ldots};
e0f0a81f907b halfway chap3 Chengsong parents: 579 diff changeset	394
e0f0a81f907b halfway chap3 Chengsong parents: 579 diff changeset	395
e0f0a81f907b halfway chap3 Chengsong parents: 579 diff changeset	396	\node [align = center] (decode) at (6.6, 3.2) {$\bmkeps$\\$\decode$};
e0f0a81f907b halfway chap3 Chengsong parents: 579 diff changeset	397
e0f0a81f907b halfway chap3 Chengsong parents: 579 diff changeset	398	\path (c1)
e0f0a81f907b halfway chap3 Chengsong parents: 579 diff changeset	399	edge [dashed, bend left] node {} (r0);
e0f0a81f907b halfway chap3 Chengsong parents: 579 diff changeset	400
e0f0a81f907b halfway chap3 Chengsong parents: 579 diff changeset	401	\path (c2)
e0f0a81f907b halfway chap3 Chengsong parents: 579 diff changeset	402	edge [dashed, bend left] node {} (r0);
e0f0a81f907b halfway chap3 Chengsong parents: 579 diff changeset	403
e0f0a81f907b halfway chap3 Chengsong parents: 579 diff changeset	404	\path (r1)
e0f0a81f907b halfway chap3 Chengsong parents: 579 diff changeset	405	edge [dashed, bend right] node {} (r2);
e0f0a81f907b halfway chap3 Chengsong parents: 579 diff changeset	406
e0f0a81f907b halfway chap3 Chengsong parents: 579 diff changeset	407
e0f0a81f907b halfway chap3 Chengsong parents: 579 diff changeset	408	\path
e0f0a81f907b halfway chap3 Chengsong parents: 579 diff changeset	409	(r)
e0f0a81f907b halfway chap3 Chengsong parents: 579 diff changeset	410	edge [dashed, bend right] node[left] {} (r1);
e0f0a81f907b halfway chap3 Chengsong parents: 579 diff changeset	411
e0f0a81f907b halfway chap3 Chengsong parents: 579 diff changeset	412	\path
e0f0a81f907b halfway chap3 Chengsong parents: 579 diff changeset	413	(r)
e0f0a81f907b halfway chap3 Chengsong parents: 579 diff changeset	414	edge [->, >=stealth',shorten >=1pt] node[left] {} (r1);
e0f0a81f907b halfway chap3 Chengsong parents: 579 diff changeset	415
e0f0a81f907b halfway chap3 Chengsong parents: 579 diff changeset	416	\path
e0f0a81f907b halfway chap3 Chengsong parents: 579 diff changeset	417	(r1)
e0f0a81f907b halfway chap3 Chengsong parents: 579 diff changeset	418	edge [] node {} (r2);
e0f0a81f907b halfway chap3 Chengsong parents: 579 diff changeset	419	\path (r2)
e0f0a81f907b halfway chap3 Chengsong parents: 579 diff changeset	420	edge [] node {} (ldots);
e0f0a81f907b halfway chap3 Chengsong parents: 579 diff changeset	421	\path (rn)
e0f0a81f907b halfway chap3 Chengsong parents: 579 diff changeset	422	edge [] node {} (v);
e0f0a81f907b halfway chap3 Chengsong parents: 579 diff changeset	423
e0f0a81f907b halfway chap3 Chengsong parents: 579 diff changeset	424	\path (r0)
e0f0a81f907b halfway chap3 Chengsong parents: 579 diff changeset	425	edge [dashed, bend right] node {} (decode);
e0f0a81f907b halfway chap3 Chengsong parents: 579 diff changeset	426	%\path (v)
e0f0a81f907b halfway chap3 Chengsong parents: 579 diff changeset	427	%edge [] node {} (ldots2);
579 35df9cdd36ca more chap3 Chengsong parents: 576 diff changeset	428
35df9cdd36ca more chap3 Chengsong parents: 576 diff changeset	429
35df9cdd36ca more chap3 Chengsong parents: 576 diff changeset	430
580 e0f0a81f907b halfway chap3 Chengsong parents: 579 diff changeset	431	\end{tikzpicture}
e0f0a81f907b halfway chap3 Chengsong parents: 579 diff changeset	432	\caption{Bird's Eye View of $\blexer$}
e0f0a81f907b halfway chap3 Chengsong parents: 579 diff changeset	433	\end{figure}
e0f0a81f907b halfway chap3 Chengsong parents: 579 diff changeset	434	\noindent
e0f0a81f907b halfway chap3 Chengsong parents: 579 diff changeset	435	The plain regular expressions
e0f0a81f907b halfway chap3 Chengsong parents: 579 diff changeset	436	are first ``lifted'' to an annotated regular expression,
e0f0a81f907b halfway chap3 Chengsong parents: 579 diff changeset	437	with the function called \emph{internalise}.
e0f0a81f907b halfway chap3 Chengsong parents: 579 diff changeset	438	Then the annotated regular expression $_{bs}a$ will
e0f0a81f907b halfway chap3 Chengsong parents: 579 diff changeset	439	go through successive derivatives with respect
e0f0a81f907b halfway chap3 Chengsong parents: 579 diff changeset	440	to the input stream of characters $c_1, c_2$ etc.
e0f0a81f907b halfway chap3 Chengsong parents: 579 diff changeset	441	Each time a derivative is taken, the bitcodes
e0f0a81f907b halfway chap3 Chengsong parents: 579 diff changeset	442	are moved around, discarded or augmented together
e0f0a81f907b halfway chap3 Chengsong parents: 579 diff changeset	443	with the regular expression they are attached to.
e0f0a81f907b halfway chap3 Chengsong parents: 579 diff changeset	444	After all input has been consumed, the
e0f0a81f907b halfway chap3 Chengsong parents: 579 diff changeset	445	bitcodes are collected by $\bmkeps$,
e0f0a81f907b halfway chap3 Chengsong parents: 579 diff changeset	446	which traverses the nullable part of the regular expression
e0f0a81f907b halfway chap3 Chengsong parents: 579 diff changeset	447	and collect the bitcodes along the way.
e0f0a81f907b halfway chap3 Chengsong parents: 579 diff changeset	448	The collected bitcodes are then $\decode$d with the guidance
e0f0a81f907b halfway chap3 Chengsong parents: 579 diff changeset	449	of the input regular expression $r$.
e0f0a81f907b halfway chap3 Chengsong parents: 579 diff changeset	450	The most notable improvements of $\blexer$
e0f0a81f907b halfway chap3 Chengsong parents: 579 diff changeset	451	over $\lexer$ are
e0f0a81f907b halfway chap3 Chengsong parents: 579 diff changeset	452	\begin{itemize}
e0f0a81f907b halfway chap3 Chengsong parents: 579 diff changeset	453	\item
e0f0a81f907b halfway chap3 Chengsong parents: 579 diff changeset	454
e0f0a81f907b halfway chap3 Chengsong parents: 579 diff changeset	455	An absence of the second injection phase.
e0f0a81f907b halfway chap3 Chengsong parents: 579 diff changeset	456	\item
e0f0a81f907b halfway chap3 Chengsong parents: 579 diff changeset	457	One need not store each pair of the
e0f0a81f907b halfway chap3 Chengsong parents: 579 diff changeset	458	intermediate regular expressions $_{bs_i}a_i$ and $c_{i+1}$.
e0f0a81f907b halfway chap3 Chengsong parents: 579 diff changeset	459	The previous annotated regular expression $_{bs_i}a_i$'s information is passed
e0f0a81f907b halfway chap3 Chengsong parents: 579 diff changeset	460	on without loss to its successor $_{bs_{i+1}}a_{i+1}$,
e0f0a81f907b halfway chap3 Chengsong parents: 579 diff changeset	461	and $c_{i+1}$'s already contained in the strings in $L\;r$ \footnote{
e0f0a81f907b halfway chap3 Chengsong parents: 579 diff changeset	462	which can be easily recovered if we decode in the correct order}.
e0f0a81f907b halfway chap3 Chengsong parents: 579 diff changeset	463	\item
e0f0a81f907b halfway chap3 Chengsong parents: 579 diff changeset	464	The simplification works much better--one can maintain correctness
e0f0a81f907b halfway chap3 Chengsong parents: 579 diff changeset	465	while applying quite strong simplifications, as we shall wee.
e0f0a81f907b halfway chap3 Chengsong parents: 579 diff changeset	466	\end{itemize}
e0f0a81f907b halfway chap3 Chengsong parents: 579 diff changeset	467	\noindent
e0f0a81f907b halfway chap3 Chengsong parents: 579 diff changeset	468	In the next section we will
e0f0a81f907b halfway chap3 Chengsong parents: 579 diff changeset	469	give the operations needed in $\blexer$,
e0f0a81f907b halfway chap3 Chengsong parents: 579 diff changeset	470	with some remarks on the idea behind their definitions.
e0f0a81f907b halfway chap3 Chengsong parents: 579 diff changeset	471	\subsection{Operations in $\textit{Blexer}$}
e0f0a81f907b halfway chap3 Chengsong parents: 579 diff changeset	472	The first operation we define related to bit-coded regular expressions
e0f0a81f907b halfway chap3 Chengsong parents: 579 diff changeset	473	is how we move bits to the inside of regular expressions.
e0f0a81f907b halfway chap3 Chengsong parents: 579 diff changeset	474	Called $\fuse$, this operation attaches bit-codes
e0f0a81f907b halfway chap3 Chengsong parents: 579 diff changeset	475	to the front of an annotated regular expression:
e0f0a81f907b halfway chap3 Chengsong parents: 579 diff changeset	476	\begin{center}
e0f0a81f907b halfway chap3 Chengsong parents: 579 diff changeset	477	\begin{tabular}{lcl}
e0f0a81f907b halfway chap3 Chengsong parents: 579 diff changeset	478	$\textit{fuse}\;bs \; \ZERO$ & $\dn$ & $\ZERO$\\
e0f0a81f907b halfway chap3 Chengsong parents: 579 diff changeset	479	$\textit{fuse}\;bs\; _{bs'}\ONE$ & $\dn$ &
e0f0a81f907b halfway chap3 Chengsong parents: 579 diff changeset	480	$_{bs @ bs'}\ONE$\\
e0f0a81f907b halfway chap3 Chengsong parents: 579 diff changeset	481	$\textit{fuse}\;bs\;_{bs'}{\bf c}$ & $\dn$ &
e0f0a81f907b halfway chap3 Chengsong parents: 579 diff changeset	482	$_{bs@bs'}{\bf c}$\\
e0f0a81f907b halfway chap3 Chengsong parents: 579 diff changeset	483	$\textit{fuse}\;bs\,_{bs'}\sum\textit{as}$ & $\dn$ &
e0f0a81f907b halfway chap3 Chengsong parents: 579 diff changeset	484	$_{bs@bs'}\sum\textit{as}$\\
e0f0a81f907b halfway chap3 Chengsong parents: 579 diff changeset	485	$\textit{fuse}\;bs\; _{bs'}a_1\cdot a_2$ & $\dn$ &
e0f0a81f907b halfway chap3 Chengsong parents: 579 diff changeset	486	$_{bs@bs'}a_1 \cdot a_2$\\
e0f0a81f907b halfway chap3 Chengsong parents: 579 diff changeset	487	$\textit{fuse}\;bs\,_{bs'}a^*$ & $\dn$ &
e0f0a81f907b halfway chap3 Chengsong parents: 579 diff changeset	488	$_{bs @ bs'}a^*$
e0f0a81f907b halfway chap3 Chengsong parents: 579 diff changeset	489	\end{tabular}
e0f0a81f907b halfway chap3 Chengsong parents: 579 diff changeset	490	\end{center}
e0f0a81f907b halfway chap3 Chengsong parents: 579 diff changeset	491
e0f0a81f907b halfway chap3 Chengsong parents: 579 diff changeset	492	\noindent
e0f0a81f907b halfway chap3 Chengsong parents: 579 diff changeset	493	With \emph{fuse} we are able to define the $\internalise$ function
e0f0a81f907b halfway chap3 Chengsong parents: 579 diff changeset	494	that translates a ``standard'' regular expression into an
e0f0a81f907b halfway chap3 Chengsong parents: 579 diff changeset	495	annotated regular expression.
e0f0a81f907b halfway chap3 Chengsong parents: 579 diff changeset	496	This function will be applied before we start
e0f0a81f907b halfway chap3 Chengsong parents: 579 diff changeset	497	with the derivative phase of the algorithm.
e0f0a81f907b halfway chap3 Chengsong parents: 579 diff changeset	498
e0f0a81f907b halfway chap3 Chengsong parents: 579 diff changeset	499	\begin{center}
e0f0a81f907b halfway chap3 Chengsong parents: 579 diff changeset	500	\begin{tabular}{lcl}
e0f0a81f907b halfway chap3 Chengsong parents: 579 diff changeset	501	$(\ZERO)^\uparrow$ & $\dn$ & $\ZERO$\\
e0f0a81f907b halfway chap3 Chengsong parents: 579 diff changeset	502	$(\ONE)^\uparrow$ & $\dn$ & $_{[]}\ONE$\\
e0f0a81f907b halfway chap3 Chengsong parents: 579 diff changeset	503	$(c)^\uparrow$ & $\dn$ & $_{[]}{\bf c}$\\
e0f0a81f907b halfway chap3 Chengsong parents: 579 diff changeset	504	$(r_1 + r_2)^\uparrow$ & $\dn$ &
e0f0a81f907b halfway chap3 Chengsong parents: 579 diff changeset	505	$_{[]}\sum[\textit{fuse}\,[Z]\,r_1^\uparrow,\,
e0f0a81f907b halfway chap3 Chengsong parents: 579 diff changeset	506	\textit{fuse}\,[S]\,r_2^\uparrow]$\\
e0f0a81f907b halfway chap3 Chengsong parents: 579 diff changeset	507	$(r_1\cdot r_2)^\uparrow$ & $\dn$ &
e0f0a81f907b halfway chap3 Chengsong parents: 579 diff changeset	508	$_{[]}r_1^\uparrow \cdot r_2^\uparrow$\\
e0f0a81f907b halfway chap3 Chengsong parents: 579 diff changeset	509	$(r^*)^\uparrow$ & $\dn$ &
e0f0a81f907b halfway chap3 Chengsong parents: 579 diff changeset	510	$_{[]}(r^\uparrow)^*$\\
e0f0a81f907b halfway chap3 Chengsong parents: 579 diff changeset	511	\end{tabular}
e0f0a81f907b halfway chap3 Chengsong parents: 579 diff changeset	512	\end{center}
e0f0a81f907b halfway chap3 Chengsong parents: 579 diff changeset	513	\noindent
e0f0a81f907b halfway chap3 Chengsong parents: 579 diff changeset	514	We use an up arrow with postfix notation
e0f0a81f907b halfway chap3 Chengsong parents: 579 diff changeset	515	to denote this operation
e0f0a81f907b halfway chap3 Chengsong parents: 579 diff changeset	516	for convenience.
e0f0a81f907b halfway chap3 Chengsong parents: 579 diff changeset	517	The opposite of $\textit{internalise}$ is
e0f0a81f907b halfway chap3 Chengsong parents: 579 diff changeset	518	$\erase$, where all the bit-codes are removed,
e0f0a81f907b halfway chap3 Chengsong parents: 579 diff changeset	519	and the alternative operator $\sum$ for annotated
e0f0a81f907b halfway chap3 Chengsong parents: 579 diff changeset	520	regular expressions is transformed to the binary version
e0f0a81f907b halfway chap3 Chengsong parents: 579 diff changeset	521	in plain regular expressions.
e0f0a81f907b halfway chap3 Chengsong parents: 579 diff changeset	522	\begin{center}
e0f0a81f907b halfway chap3 Chengsong parents: 579 diff changeset	523	\begin{tabular}{lcl}
e0f0a81f907b halfway chap3 Chengsong parents: 579 diff changeset	524	$\ZERO_\downarrow$ & $\dn$ & $\ZERO$\\
e0f0a81f907b halfway chap3 Chengsong parents: 579 diff changeset	525	$( _{bs}\ONE )_\downarrow$ & $\dn$ & $\ONE$\\
e0f0a81f907b halfway chap3 Chengsong parents: 579 diff changeset	526	$( _{bs}\mathbf{c} )_\downarrow$ & $\dn$ & $\mathbf{c}$\\
e0f0a81f907b halfway chap3 Chengsong parents: 579 diff changeset	527	$( _{bs} a_1 \cdot a_2 )_\downarrow$ & $\dn$ &
e0f0a81f907b halfway chap3 Chengsong parents: 579 diff changeset	528	$ (a_1) _\downarrow \cdot (a_2) _\downarrow$\\
e0f0a81f907b halfway chap3 Chengsong parents: 579 diff changeset	529	$( _{bs} [])_\downarrow $ & $\dn$ & $\ZERO $\\
e0f0a81f907b halfway chap3 Chengsong parents: 579 diff changeset	530	$( _{bs} [a] )_\downarrow$ & $\dn$ & $a_\downarrow$\\
e0f0a81f907b halfway chap3 Chengsong parents: 579 diff changeset	531	$_{bs} \sum [a_1, \; a_2]$ & $\dn$ & $ (a_1) _\downarrow + ( a_2 ) _\downarrow $\\
e0f0a81f907b halfway chap3 Chengsong parents: 579 diff changeset	532	$(_{bs} \sum (a :: as))_\downarrow$ & $\dn$ & $ a_\downarrow + \; (_{[]} \sum as)_\downarrow$\\
e0f0a81f907b halfway chap3 Chengsong parents: 579 diff changeset	533	$( _{bs} a^* )_\downarrow$ & $\dn$ & $(a_\downarrow)^*$
e0f0a81f907b halfway chap3 Chengsong parents: 579 diff changeset	534	\end{tabular}
e0f0a81f907b halfway chap3 Chengsong parents: 579 diff changeset	535	\end{center}
e0f0a81f907b halfway chap3 Chengsong parents: 579 diff changeset	536	\noindent
e0f0a81f907b halfway chap3 Chengsong parents: 579 diff changeset	537	We also abbreviate the $\erase\; a$ operation
e0f0a81f907b halfway chap3 Chengsong parents: 579 diff changeset	538	as $a_\downarrow$, for conciseness.
e0f0a81f907b halfway chap3 Chengsong parents: 579 diff changeset	539
e0f0a81f907b halfway chap3 Chengsong parents: 579 diff changeset	540	Testing whether an annotated regular expression
e0f0a81f907b halfway chap3 Chengsong parents: 579 diff changeset	541	contains the empty string in its lauguage requires
e0f0a81f907b halfway chap3 Chengsong parents: 579 diff changeset	542	a dedicated function $\bnullable$.
e0f0a81f907b halfway chap3 Chengsong parents: 579 diff changeset	543	$\bnullable$ simply calls $\erase$ before $\nullable$.
e0f0a81f907b halfway chap3 Chengsong parents: 579 diff changeset	544	\begin{definition}
e0f0a81f907b halfway chap3 Chengsong parents: 579 diff changeset	545	$\bnullable \; a \dn \nullable \; (a_\downarrow)$
e0f0a81f907b halfway chap3 Chengsong parents: 579 diff changeset	546	\end{definition}
e0f0a81f907b halfway chap3 Chengsong parents: 579 diff changeset	547	The function for collecting
e0f0a81f907b halfway chap3 Chengsong parents: 579 diff changeset	548	bitcodes from a
e0f0a81f907b halfway chap3 Chengsong parents: 579 diff changeset	549	(b)nullable regular expression is called $\bmkeps$.
e0f0a81f907b halfway chap3 Chengsong parents: 579 diff changeset	550	$\bmkeps$ is a variation of the $\textit{mkeps}$ function,
e0f0a81f907b halfway chap3 Chengsong parents: 579 diff changeset	551	which follows the path $\mkeps$ takes to traverse the
e0f0a81f907b halfway chap3 Chengsong parents: 579 diff changeset	552	$\nullable$ branches when visiting a regular expression,
e0f0a81f907b halfway chap3 Chengsong parents: 579 diff changeset	553	but gives back bitcodes instead of a value.
e0f0a81f907b halfway chap3 Chengsong parents: 579 diff changeset	554	\begin{center}
e0f0a81f907b halfway chap3 Chengsong parents: 579 diff changeset	555	\begin{tabular}{lcl}
e0f0a81f907b halfway chap3 Chengsong parents: 579 diff changeset	556	$\textit{bmkeps}\,(_{bs}\ONE)$ & $\dn$ & $bs$\\
e0f0a81f907b halfway chap3 Chengsong parents: 579 diff changeset	557	$\textit{bmkeps}\,(_{bs}\sum a::\textit{as})$ & $\dn$ &
e0f0a81f907b halfway chap3 Chengsong parents: 579 diff changeset	558	$\textit{if}\;\textit{bnullable}\,a$\\
e0f0a81f907b halfway chap3 Chengsong parents: 579 diff changeset	559	& &$\textit{then}\;bs\,@\,\textit{bmkeps}\,a$\\
e0f0a81f907b halfway chap3 Chengsong parents: 579 diff changeset	560	& &$\textit{else}\;bs\,@\,\textit{bmkeps}\,(_{[]}\sum \textit{as})$\\
e0f0a81f907b halfway chap3 Chengsong parents: 579 diff changeset	561	$\textit{bmkeps}\,(_{bs} a_1 \cdot a_2)$ & $\dn$ &
e0f0a81f907b halfway chap3 Chengsong parents: 579 diff changeset	562	$bs \,@\,\textit{bmkeps}\,a_1\,@\, \textit{bmkeps}\,a_2$\\
e0f0a81f907b halfway chap3 Chengsong parents: 579 diff changeset	563	$\textit{bmkeps}\,(_{bs}a^*)$ & $\dn$ &
e0f0a81f907b halfway chap3 Chengsong parents: 579 diff changeset	564	$bs \,@\, [Z]$
e0f0a81f907b halfway chap3 Chengsong parents: 579 diff changeset	565	\end{tabular}
e0f0a81f907b halfway chap3 Chengsong parents: 579 diff changeset	566	\end{center}
e0f0a81f907b halfway chap3 Chengsong parents: 579 diff changeset	567	\noindent
e0f0a81f907b halfway chap3 Chengsong parents: 579 diff changeset	568	$\bmkeps$ retrieves the value $v$'s
e0f0a81f907b halfway chap3 Chengsong parents: 579 diff changeset	569	information in the format
e0f0a81f907b halfway chap3 Chengsong parents: 579 diff changeset	570	of bitcodes, by travelling along the
e0f0a81f907b halfway chap3 Chengsong parents: 579 diff changeset	571	path of the regular expression that corresponds to a POSIX match,
e0f0a81f907b halfway chap3 Chengsong parents: 579 diff changeset	572	collecting all the bitcodes, and attaching $S$ to indicate the end of star
e0f0a81f907b halfway chap3 Chengsong parents: 579 diff changeset	573	iterations. \\
e0f0a81f907b halfway chap3 Chengsong parents: 579 diff changeset	574	The bitcodes extracted by $\bmkeps$ need to be
e0f0a81f907b halfway chap3 Chengsong parents: 579 diff changeset	575	$\decode$d (with the guidance of a plain regular expression):
e0f0a81f907b halfway chap3 Chengsong parents: 579 diff changeset	576	%\begin{definition}[Bitdecoding of Values]\mbox{}
e0f0a81f907b halfway chap3 Chengsong parents: 579 diff changeset	577	\begin{center}
e0f0a81f907b halfway chap3 Chengsong parents: 579 diff changeset	578	\begin{tabular}{@{}l@{\hspace{1mm}}c@{\hspace{1mm}}l@{}}
e0f0a81f907b halfway chap3 Chengsong parents: 579 diff changeset	579	$\textit{decode}'\,bs\,(\ONE)$ & $\dn$ & $(\Empty, bs)$\\
e0f0a81f907b halfway chap3 Chengsong parents: 579 diff changeset	580	$\textit{decode}'\,bs\,(c)$ & $\dn$ & $(\Char\,c, bs)$\\
e0f0a81f907b halfway chap3 Chengsong parents: 579 diff changeset	581	$\textit{decode}'\,(Z\!::\!bs)\;(r_1 + r_2)$ & $\dn$ &
e0f0a81f907b halfway chap3 Chengsong parents: 579 diff changeset	582	$\textit{let}\,(v, bs_1) = \textit{decode}'\,bs\,r_1\;\textit{in}\;
e0f0a81f907b halfway chap3 Chengsong parents: 579 diff changeset	583	(\Left\,v, bs_1)$\\
e0f0a81f907b halfway chap3 Chengsong parents: 579 diff changeset	584	$\textit{decode}'\,(S\!::\!bs)\;(r_1 + r_2)$ & $\dn$ &
e0f0a81f907b halfway chap3 Chengsong parents: 579 diff changeset	585	$\textit{let}\,(v, bs_1) = \textit{decode}'\,bs\,r_2\;\textit{in}\;
e0f0a81f907b halfway chap3 Chengsong parents: 579 diff changeset	586	(\Right\,v, bs_1)$\\
e0f0a81f907b halfway chap3 Chengsong parents: 579 diff changeset	587	$\textit{decode}'\,bs\;(r_1\cdot r_2)$ & $\dn$ &
e0f0a81f907b halfway chap3 Chengsong parents: 579 diff changeset	588	$\textit{let}\,(v_1, bs_1) = \textit{decode}'\,bs\,r_1\;\textit{in}$\\
e0f0a81f907b halfway chap3 Chengsong parents: 579 diff changeset	589	& & $\textit{let}\,(v_2, bs_2) = \textit{decode}'\,bs_1\,r_2$\\
e0f0a81f907b halfway chap3 Chengsong parents: 579 diff changeset	590	& & \hspace{35mm}$\textit{in}\;(\Seq\,v_1\,v_2, bs_2)$\\
e0f0a81f907b halfway chap3 Chengsong parents: 579 diff changeset	591	$\textit{decode}'\,(Z\!::\!bs)\,(r^*)$ & $\dn$ & $(\Stars\,[], bs)$\\
e0f0a81f907b halfway chap3 Chengsong parents: 579 diff changeset	592	$\textit{decode}'\,(S\!::\!bs)\,(r^*)$ & $\dn$ &
e0f0a81f907b halfway chap3 Chengsong parents: 579 diff changeset	593	$\textit{let}\,(v, bs_1) = \textit{decode}'\,bs\,r\;\textit{in}$\\
e0f0a81f907b halfway chap3 Chengsong parents: 579 diff changeset	594	& & $\textit{let}\,(\Stars\,vs, bs_2) = \textit{decode}'\,bs_1\,r^*$\\
e0f0a81f907b halfway chap3 Chengsong parents: 579 diff changeset	595	& & \hspace{35mm}$\textit{in}\;(\Stars\,v\!::\!vs, bs_2)$\bigskip\\
e0f0a81f907b halfway chap3 Chengsong parents: 579 diff changeset	596
e0f0a81f907b halfway chap3 Chengsong parents: 579 diff changeset	597	$\textit{decode}\,bs\,r$ & $\dn$ &
e0f0a81f907b halfway chap3 Chengsong parents: 579 diff changeset	598	$\textit{let}\,(v, bs') = \textit{decode}'\,bs\,r\;\textit{in}$\\
e0f0a81f907b halfway chap3 Chengsong parents: 579 diff changeset	599	& & $\textit{if}\;bs' = []\;\textit{then}\;\textit{Some}\,v\;
e0f0a81f907b halfway chap3 Chengsong parents: 579 diff changeset	600	\textit{else}\;\textit{None}$
e0f0a81f907b halfway chap3 Chengsong parents: 579 diff changeset	601	\end{tabular}
e0f0a81f907b halfway chap3 Chengsong parents: 579 diff changeset	602	\end{center}
537 50e590823220 more Chengsong parents: 536 diff changeset	603	\noindent
580 e0f0a81f907b halfway chap3 Chengsong parents: 579 diff changeset	604	The function $\decode'$ returns a pair consisting of
e0f0a81f907b halfway chap3 Chengsong parents: 579 diff changeset	605	a partially decoded value and some leftover bit list that cannot
e0f0a81f907b halfway chap3 Chengsong parents: 579 diff changeset	606	be decide yet.
e0f0a81f907b halfway chap3 Chengsong parents: 579 diff changeset	607	The function $\decode'$ succeeds if the left-over
e0f0a81f907b halfway chap3 Chengsong parents: 579 diff changeset	608	bit-sequence is empty.
e0f0a81f907b halfway chap3 Chengsong parents: 579 diff changeset	609	$\decode$ is terminating as $\decode'$ is terminating.
e0f0a81f907b halfway chap3 Chengsong parents: 579 diff changeset	610	$\decode'$ is terminating
e0f0a81f907b halfway chap3 Chengsong parents: 579 diff changeset	611	because at least one of $\decode'$'s parameters will go down in terms
e0f0a81f907b halfway chap3 Chengsong parents: 579 diff changeset	612	of size.\\
e0f0a81f907b halfway chap3 Chengsong parents: 579 diff changeset	613	The reverse operation of $\decode$ is $\code$.
e0f0a81f907b halfway chap3 Chengsong parents: 579 diff changeset	614	$\textit{code}$ encodes a value into a bitcode by converting
e0f0a81f907b halfway chap3 Chengsong parents: 579 diff changeset	615	$\Left$ into $Z$, $\Right$ into $S$, and marks the start of any non-empty
e0f0a81f907b halfway chap3 Chengsong parents: 579 diff changeset	616	star iteration by $S$. The border where a star iteration
e0f0a81f907b halfway chap3 Chengsong parents: 579 diff changeset	617	terminates is marked by $Z$.
e0f0a81f907b halfway chap3 Chengsong parents: 579 diff changeset	618	This coding is lossy, as it throws away the information about
e0f0a81f907b halfway chap3 Chengsong parents: 579 diff changeset	619	characters, and does not encode the ``boundary'' between two
e0f0a81f907b halfway chap3 Chengsong parents: 579 diff changeset	620	sequence values. Moreover, with only the bitcode we cannot even tell
e0f0a81f907b halfway chap3 Chengsong parents: 579 diff changeset	621	whether the $S$s and $Z$s are for $\Left/\Right$ or $\Stars$.
e0f0a81f907b halfway chap3 Chengsong parents: 579 diff changeset	622	\begin{center}
e0f0a81f907b halfway chap3 Chengsong parents: 579 diff changeset	623	\begin{tabular}{lcl}
e0f0a81f907b halfway chap3 Chengsong parents: 579 diff changeset	624	$\textit{code}(\Empty)$ & $\dn$ & $[]$\\
e0f0a81f907b halfway chap3 Chengsong parents: 579 diff changeset	625	$\textit{code}(\Char\,c)$ & $\dn$ & $[]$\\
e0f0a81f907b halfway chap3 Chengsong parents: 579 diff changeset	626	$\textit{code}(\Left\,v)$ & $\dn$ & $Z :: code(v)$\\
e0f0a81f907b halfway chap3 Chengsong parents: 579 diff changeset	627	$\textit{code}(\Right\,v)$ & $\dn$ & $S :: code(v)$\\
e0f0a81f907b halfway chap3 Chengsong parents: 579 diff changeset	628	$\textit{code}(\Seq\,v_1\,v_2)$ & $\dn$ & $code(v_1) \,@\, code(v_2)$\\
e0f0a81f907b halfway chap3 Chengsong parents: 579 diff changeset	629	$\textit{code}(\Stars\,[])$ & $\dn$ & $[Z]$\\
e0f0a81f907b halfway chap3 Chengsong parents: 579 diff changeset	630	$\textit{code}(\Stars\,(v\!::\!vs))$ & $\dn$ & $S :: code(v) \;@\;
e0f0a81f907b halfway chap3 Chengsong parents: 579 diff changeset	631	code(\Stars\,vs)$
e0f0a81f907b halfway chap3 Chengsong parents: 579 diff changeset	632	\end{tabular}
e0f0a81f907b halfway chap3 Chengsong parents: 579 diff changeset	633	\end{center}
e0f0a81f907b halfway chap3 Chengsong parents: 579 diff changeset	634	\noindent
e0f0a81f907b halfway chap3 Chengsong parents: 579 diff changeset	635	Assume we have a value $v$ and regular expression $r$
e0f0a81f907b halfway chap3 Chengsong parents: 579 diff changeset	636	with $\vdash v:r$,
e0f0a81f907b halfway chap3 Chengsong parents: 579 diff changeset	637	then we have the property that $\decode$ and $\code$ are
e0f0a81f907b halfway chap3 Chengsong parents: 579 diff changeset	638	reverse operations of one another:
e0f0a81f907b halfway chap3 Chengsong parents: 579 diff changeset	639	\begin{lemma}
e0f0a81f907b halfway chap3 Chengsong parents: 579 diff changeset	640	\[If \vdash v : r \; then \;\decode \; (\code \; v) \; r = \textit{Some}(v) \]
e0f0a81f907b halfway chap3 Chengsong parents: 579 diff changeset	641	\end{lemma}
e0f0a81f907b halfway chap3 Chengsong parents: 579 diff changeset	642	\begin{proof}
e0f0a81f907b halfway chap3 Chengsong parents: 579 diff changeset	643	By proving a more general version of the lemma, on $\decode'$:
e0f0a81f907b halfway chap3 Chengsong parents: 579 diff changeset	644	\[\vdash v : r \implies \decode' \; ((\code \; v) @ ds) \; r = (v, ds) \]
e0f0a81f907b halfway chap3 Chengsong parents: 579 diff changeset	645	Then setting $ds$ to be $[]$ and unfolding $\decode$ definition,
e0f0a81f907b halfway chap3 Chengsong parents: 579 diff changeset	646	we obtain the property.
e0f0a81f907b halfway chap3 Chengsong parents: 579 diff changeset	647	\end{proof}
e0f0a81f907b halfway chap3 Chengsong parents: 579 diff changeset	648	\noindent
e0f0a81f907b halfway chap3 Chengsong parents: 579 diff changeset	649	Now we give out the central part of this lexing algorithm,
e0f0a81f907b halfway chap3 Chengsong parents: 579 diff changeset	650	the $\bder$ function (stands for \emph{b}itcoded-derivative):
e0f0a81f907b halfway chap3 Chengsong parents: 579 diff changeset	651	\begin{center}
e0f0a81f907b halfway chap3 Chengsong parents: 579 diff changeset	652	\begin{tabular}{@{}lcl@{}}
e0f0a81f907b halfway chap3 Chengsong parents: 579 diff changeset	653	$(\ZERO)\,\backslash c$ & $\dn$ & $\ZERO$\\
e0f0a81f907b halfway chap3 Chengsong parents: 579 diff changeset	654	$(_{bs}\ONE)\,\backslash c$ & $\dn$ & $\ZERO$\\
e0f0a81f907b halfway chap3 Chengsong parents: 579 diff changeset	655	$(_{bs}{\bf d})\,\backslash c$ & $\dn$ &
e0f0a81f907b halfway chap3 Chengsong parents: 579 diff changeset	656	$\textit{if}\;c=d\; \;\textit{then}\;
e0f0a81f907b halfway chap3 Chengsong parents: 579 diff changeset	657	_{bs}\ONE\;\textit{else}\;\ZERO$\\
e0f0a81f907b halfway chap3 Chengsong parents: 579 diff changeset	658	$(_{bs}\sum \;\textit{as})\,\backslash c$ & $\dn$ &
e0f0a81f907b halfway chap3 Chengsong parents: 579 diff changeset	659	$_{bs}\sum\;(\textit{map} \; (\_\backslash c) \; as )$\\
e0f0a81f907b halfway chap3 Chengsong parents: 579 diff changeset	660	$(_{bs}\;a_1\cdot a_2)\,\backslash c$ & $\dn$ &
e0f0a81f907b halfway chap3 Chengsong parents: 579 diff changeset	661	$\textit{if}\;\textit{bnullable}\,a_1$\\
e0f0a81f907b halfway chap3 Chengsong parents: 579 diff changeset	662	& &$\textit{then}\;_{bs}\sum\,[(_{[]}\,(a_1\,\backslash c)\cdot\,a_2),$\\
e0f0a81f907b halfway chap3 Chengsong parents: 579 diff changeset	663	& &$\phantom{\textit{then},\;_{bs}\sum\,}(\textit{fuse}\,(\textit{bmkeps}\,a_1)\,(a_2\,\backslash c))]$\\
e0f0a81f907b halfway chap3 Chengsong parents: 579 diff changeset	664	& &$\textit{else}\;_{bs}\,(a_1\,\backslash c)\cdot a_2$\\
e0f0a81f907b halfway chap3 Chengsong parents: 579 diff changeset	665	$(_{bs}a^*)\,\backslash c$ & $\dn$ &
e0f0a81f907b halfway chap3 Chengsong parents: 579 diff changeset	666	$_{bs}(\textit{fuse}\, [Z] \; r\,\backslash c)\cdot
e0f0a81f907b halfway chap3 Chengsong parents: 579 diff changeset	667	(_{[]}r^*))$
e0f0a81f907b halfway chap3 Chengsong parents: 579 diff changeset	668	\end{tabular}
e0f0a81f907b halfway chap3 Chengsong parents: 579 diff changeset	669	\end{center}
e0f0a81f907b halfway chap3 Chengsong parents: 579 diff changeset	670	\noindent
e0f0a81f907b halfway chap3 Chengsong parents: 579 diff changeset	671	For most time we use the infix notation $(\_\backslash\_)$
e0f0a81f907b halfway chap3 Chengsong parents: 579 diff changeset	672	to mean $\bder$ for brevity when
e0f0a81f907b halfway chap3 Chengsong parents: 579 diff changeset	673	there is no danger of confusion with derivatives on plain regular expressions.
e0f0a81f907b halfway chap3 Chengsong parents: 579 diff changeset	674	For example, we write $( _{[]}r^* ) \backslash c$ instead of $\bder \;c \; _{[]}r^*$,
e0f0a81f907b halfway chap3 Chengsong parents: 579 diff changeset	675	as the bitcodes at the front of $r^*$ indicates that it is
e0f0a81f907b halfway chap3 Chengsong parents: 579 diff changeset	676	a bit-coded regular expression, not a plain one.\\
e0f0a81f907b halfway chap3 Chengsong parents: 579 diff changeset	677	$\bder$ tells us how regular expressions can be recursively traversed,
e0f0a81f907b halfway chap3 Chengsong parents: 579 diff changeset	678	where the bitcodes are augmented and carried around
e0f0a81f907b halfway chap3 Chengsong parents: 579 diff changeset	679	when a derivative is taken.
e0f0a81f907b halfway chap3 Chengsong parents: 579 diff changeset	680	We give the intuition behind some of the more involved cases in
e0f0a81f907b halfway chap3 Chengsong parents: 579 diff changeset	681	$\bder$. \\
e0f0a81f907b halfway chap3 Chengsong parents: 579 diff changeset	682	For example,
e0f0a81f907b halfway chap3 Chengsong parents: 579 diff changeset	683	in the \emph{star} case,
e0f0a81f907b halfway chap3 Chengsong parents: 579 diff changeset	684	a derivative on $_{bs}a^*$ means
e0f0a81f907b halfway chap3 Chengsong parents: 579 diff changeset	685	that one more star iteratoin needs to be taken.
e0f0a81f907b halfway chap3 Chengsong parents: 579 diff changeset	686	we need to unfold it into a sequence,
e0f0a81f907b halfway chap3 Chengsong parents: 579 diff changeset	687	and attach an additional bit $Z$ to the front of $r \backslash c$
e0f0a81f907b halfway chap3 Chengsong parents: 579 diff changeset	688	as a record to indicate one new star iteration is unfolded.
e0f0a81f907b halfway chap3 Chengsong parents: 579 diff changeset	689
e0f0a81f907b halfway chap3 Chengsong parents: 579 diff changeset	690	\noindent
e0f0a81f907b halfway chap3 Chengsong parents: 579 diff changeset	691	\begin{center}
e0f0a81f907b halfway chap3 Chengsong parents: 579 diff changeset	692	\begin{tabular}{@{}lcl@{}}
e0f0a81f907b halfway chap3 Chengsong parents: 579 diff changeset	693	$(_{bs}a^*)\,\backslash c$ & $\dn$ &
e0f0a81f907b halfway chap3 Chengsong parents: 579 diff changeset	694	$_{bs}(\underbrace{\textit{fuse}\, [Z] \; a\,\backslash c}_{\text{One more iteration}})\cdot
e0f0a81f907b halfway chap3 Chengsong parents: 579 diff changeset	695	(_{[]}a^*))$
e0f0a81f907b halfway chap3 Chengsong parents: 579 diff changeset	696	\end{tabular}
e0f0a81f907b halfway chap3 Chengsong parents: 579 diff changeset	697	\end{center}
e0f0a81f907b halfway chap3 Chengsong parents: 579 diff changeset	698
e0f0a81f907b halfway chap3 Chengsong parents: 579 diff changeset	699	\noindent
e0f0a81f907b halfway chap3 Chengsong parents: 579 diff changeset	700	This information will be recovered later by the $\decode$ function.
e0f0a81f907b halfway chap3 Chengsong parents: 579 diff changeset	701	The intuition is that the bit $Z$ will be decoded at the right location,
e0f0a81f907b halfway chap3 Chengsong parents: 579 diff changeset	702	because we accumulate bits from left to right (a rigorous proof will be given
e0f0a81f907b halfway chap3 Chengsong parents: 579 diff changeset	703	later).
e0f0a81f907b halfway chap3 Chengsong parents: 579 diff changeset	704
e0f0a81f907b halfway chap3 Chengsong parents: 579 diff changeset	705	%\begin{tikzpicture}[ > = stealth, % arrow head style
e0f0a81f907b halfway chap3 Chengsong parents: 579 diff changeset	706	% shorten > = 1pt, % don't touch arrow head to node
e0f0a81f907b halfway chap3 Chengsong parents: 579 diff changeset	707	% semithick % line style
e0f0a81f907b halfway chap3 Chengsong parents: 579 diff changeset	708	% ]
e0f0a81f907b halfway chap3 Chengsong parents: 579 diff changeset	709	%
e0f0a81f907b halfway chap3 Chengsong parents: 579 diff changeset	710	% \tikzstyle{every state}=[
e0f0a81f907b halfway chap3 Chengsong parents: 579 diff changeset	711	% draw = black,
e0f0a81f907b halfway chap3 Chengsong parents: 579 diff changeset	712	% thin,
e0f0a81f907b halfway chap3 Chengsong parents: 579 diff changeset	713	% fill = cyan!29,
e0f0a81f907b halfway chap3 Chengsong parents: 579 diff changeset	714	% minimum size = 7mm
e0f0a81f907b halfway chap3 Chengsong parents: 579 diff changeset	715	% ]
e0f0a81f907b halfway chap3 Chengsong parents: 579 diff changeset	716	% \begin{scope}[node distance=1cm and 0cm, every node/.style=state]
e0f0a81f907b halfway chap3 Chengsong parents: 579 diff changeset	717	% \node (k) [rectangle split, rectangle split horizontal, rectangle split parts=2, rectangle split part fill={red!30,blue!20},]
e0f0a81f907b halfway chap3 Chengsong parents: 579 diff changeset	718	% {$bs$
e0f0a81f907b halfway chap3 Chengsong parents: 579 diff changeset	719	% \nodepart{two} $a^*$ };
e0f0a81f907b halfway chap3 Chengsong parents: 579 diff changeset	720	% \node (l) [below =of k, rectangle split, rectangle split horizontal, rectangle split parts=2, rectangle split part fill={red!30,blue!20},]
e0f0a81f907b halfway chap3 Chengsong parents: 579 diff changeset	721	% { $bs$ + [Z]
e0f0a81f907b halfway chap3 Chengsong parents: 579 diff changeset	722	% \nodepart{two} $(a\backslash c )\cdot a^*$ };
e0f0a81f907b halfway chap3 Chengsong parents: 579 diff changeset	723	% \end{scope}
e0f0a81f907b halfway chap3 Chengsong parents: 579 diff changeset	724	% \path[->]
e0f0a81f907b halfway chap3 Chengsong parents: 579 diff changeset	725	% (k) edge (l);
e0f0a81f907b halfway chap3 Chengsong parents: 579 diff changeset	726	%\end{tikzpicture}
e0f0a81f907b halfway chap3 Chengsong parents: 579 diff changeset	727	%
e0f0a81f907b halfway chap3 Chengsong parents: 579 diff changeset	728	%Pictorially the process looks like below.
e0f0a81f907b halfway chap3 Chengsong parents: 579 diff changeset	729	%Like before, the red region denotes
e0f0a81f907b halfway chap3 Chengsong parents: 579 diff changeset	730	%previous lexing information (stored as bitcodes in $bs$).
e0f0a81f907b halfway chap3 Chengsong parents: 579 diff changeset	731
e0f0a81f907b halfway chap3 Chengsong parents: 579 diff changeset	732	%\begin{tikzpicture}[every node/.append style={draw, rounded corners, inner sep=10pt}]
e0f0a81f907b halfway chap3 Chengsong parents: 579 diff changeset	733	% \begin{scope}[node distance=1cm]
e0f0a81f907b halfway chap3 Chengsong parents: 579 diff changeset	734	% \node (a) [rectangle split, rectangle split horizontal, rectangle split parts=2, rectangle split part fill={red!30,blue!20},]
e0f0a81f907b halfway chap3 Chengsong parents: 579 diff changeset	735	% {$bs$
e0f0a81f907b halfway chap3 Chengsong parents: 579 diff changeset	736	% \nodepart{two} $a^*$ };
e0f0a81f907b halfway chap3 Chengsong parents: 579 diff changeset	737	% \node (b) [rectangle split, rectangle split horizontal, rectangle split parts=2, rectangle split part fill={red!30,blue!20},]
e0f0a81f907b halfway chap3 Chengsong parents: 579 diff changeset	738	% { $bs$ + [Z]
e0f0a81f907b halfway chap3 Chengsong parents: 579 diff changeset	739	% \nodepart{two} $(a\backslash c )\cdot a^*$ };
e0f0a81f907b halfway chap3 Chengsong parents: 579 diff changeset	740	%%\caption{term 1 \ref{term:1}'s matching configuration}
e0f0a81f907b halfway chap3 Chengsong parents: 579 diff changeset	741	% \end{scope}
e0f0a81f907b halfway chap3 Chengsong parents: 579 diff changeset	742	%\end{tikzpicture}
e0f0a81f907b halfway chap3 Chengsong parents: 579 diff changeset	743
e0f0a81f907b halfway chap3 Chengsong parents: 579 diff changeset	744	\noindent
e0f0a81f907b halfway chap3 Chengsong parents: 579 diff changeset	745	Another place the $\bder$ function differs
e0f0a81f907b halfway chap3 Chengsong parents: 579 diff changeset	746	from derivatives on plain regular expressions
e0f0a81f907b halfway chap3 Chengsong parents: 579 diff changeset	747	is the sequence case:
e0f0a81f907b halfway chap3 Chengsong parents: 579 diff changeset	748	\begin{center}
e0f0a81f907b halfway chap3 Chengsong parents: 579 diff changeset	749	\begin{tabular}{@{}lcl@{}}
e0f0a81f907b halfway chap3 Chengsong parents: 579 diff changeset	750
e0f0a81f907b halfway chap3 Chengsong parents: 579 diff changeset	751	$(_{bs}\;a_1\cdot a_2)\,\backslash c$ & $\dn$ &
e0f0a81f907b halfway chap3 Chengsong parents: 579 diff changeset	752	$\textit{if}\;\textit{bnullable}\,a_1$\\
e0f0a81f907b halfway chap3 Chengsong parents: 579 diff changeset	753	& &$\textit{then}\;_{bs}\sum\,[(_{[]}\,(a_1\,\backslash c)\cdot\,a_2),$\\
e0f0a81f907b halfway chap3 Chengsong parents: 579 diff changeset	754	& &$\phantom{\textit{then},\;_{bs}\sum\,}(\textit{fuse}\,(\textit{bmkeps}\,a_1)\,(a_2\,\backslash c))]$\\
e0f0a81f907b halfway chap3 Chengsong parents: 579 diff changeset	755	& &$\textit{else}\;_{bs}\,(a_1\,\backslash c)\cdot a_2$
e0f0a81f907b halfway chap3 Chengsong parents: 579 diff changeset	756	\end{tabular}
e0f0a81f907b halfway chap3 Chengsong parents: 579 diff changeset	757	\end{center}
e0f0a81f907b halfway chap3 Chengsong parents: 579 diff changeset	758	\noindent
e0f0a81f907b halfway chap3 Chengsong parents: 579 diff changeset	759	The difference is that (when $a_1$ is $\bnullable$)
e0f0a81f907b halfway chap3 Chengsong parents: 579 diff changeset	760	we use $\bmkeps$ to store the lexing information
e0f0a81f907b halfway chap3 Chengsong parents: 579 diff changeset	761	in $a_1$ before collapsing
e0f0a81f907b halfway chap3 Chengsong parents: 579 diff changeset	762	it (as it has been fully matched by string prior to $c$),
e0f0a81f907b halfway chap3 Chengsong parents: 579 diff changeset	763	and attach the collected bit-codes to the front of $a_2$
e0f0a81f907b halfway chap3 Chengsong parents: 579 diff changeset	764	before throwing away $a_1$. We assume that $\bmkeps$
e0f0a81f907b halfway chap3 Chengsong parents: 579 diff changeset	765	correctly extracts the bitcode for how $a_1$
e0f0a81f907b halfway chap3 Chengsong parents: 579 diff changeset	766	matches the string prior to $c$ (more on this later).
e0f0a81f907b halfway chap3 Chengsong parents: 579 diff changeset	767	The bitsequence $\textit{bs}$ which was initially
e0f0a81f907b halfway chap3 Chengsong parents: 579 diff changeset	768	attached to the first element of the sequence
e0f0a81f907b halfway chap3 Chengsong parents: 579 diff changeset	769	$a_1 \cdot a_2$, has now been
e0f0a81f907b halfway chap3 Chengsong parents: 579 diff changeset	770	elevated to the top level of the $\sum$ constructor.
e0f0a81f907b halfway chap3 Chengsong parents: 579 diff changeset	771	This is because this piece of information will be needed
e0f0a81f907b halfway chap3 Chengsong parents: 579 diff changeset	772	whichever way the sequence is matched,
e0f0a81f907b halfway chap3 Chengsong parents: 579 diff changeset	773	regardless of whether $c$ belongs to $a_1$ or $a_2$.
e0f0a81f907b halfway chap3 Chengsong parents: 579 diff changeset	774
e0f0a81f907b halfway chap3 Chengsong parents: 579 diff changeset	775	In the injection-based lexer, $r_1$ is immediately thrown away in
e0f0a81f907b halfway chap3 Chengsong parents: 579 diff changeset	776	subsequent derivatives on the right branch (when $r_1$ is $\nullable$),
e0f0a81f907b halfway chap3 Chengsong parents: 579 diff changeset	777	depite $r_1$ potentially storing information of previous matches:
e0f0a81f907b halfway chap3 Chengsong parents: 579 diff changeset	778	\begin{center}
e0f0a81f907b halfway chap3 Chengsong parents: 579 diff changeset	779	$(r_1 \cdot r_2 )\backslash c = (r_1 \backslash c) \cdot r_2 + r_2 \backslash c$
e0f0a81f907b halfway chap3 Chengsong parents: 579 diff changeset	780	\end{center}
e0f0a81f907b halfway chap3 Chengsong parents: 579 diff changeset	781	\noindent
e0f0a81f907b halfway chap3 Chengsong parents: 579 diff changeset	782	this loss of information makes it necessary
e0f0a81f907b halfway chap3 Chengsong parents: 579 diff changeset	783	to store on stack all the regular expressions'
e0f0a81f907b halfway chap3 Chengsong parents: 579 diff changeset	784	``snapshots'' before they were
e0f0a81f907b halfway chap3 Chengsong parents: 579 diff changeset	785	taken derivative of.
e0f0a81f907b halfway chap3 Chengsong parents: 579 diff changeset	786	So that the related information will be available
e0f0a81f907b halfway chap3 Chengsong parents: 579 diff changeset	787	once the child recursive
e0f0a81f907b halfway chap3 Chengsong parents: 579 diff changeset	788	call finishes.\\
e0f0a81f907b halfway chap3 Chengsong parents: 579 diff changeset	789	The rest of the clauses of $\bder$ is rather similar to
e0f0a81f907b halfway chap3 Chengsong parents: 579 diff changeset	790	$\der$. \\
e0f0a81f907b halfway chap3 Chengsong parents: 579 diff changeset	791	Generalising the derivative operation with bitcodes to strings, we have
e0f0a81f907b halfway chap3 Chengsong parents: 579 diff changeset	792	\begin{center}
e0f0a81f907b halfway chap3 Chengsong parents: 579 diff changeset	793	\begin{tabular}{@{}lcl@{}}
e0f0a81f907b halfway chap3 Chengsong parents: 579 diff changeset	794	$a\backslash_s [] $ & $\dn$ & $a$\\
e0f0a81f907b halfway chap3 Chengsong parents: 579 diff changeset	795	$a\backslash (c :: s) $ & $\dn$ & $(a \backslash c) \backslash_s s$
e0f0a81f907b halfway chap3 Chengsong parents: 579 diff changeset	796	\end{tabular}
e0f0a81f907b halfway chap3 Chengsong parents: 579 diff changeset	797	\end{center}
e0f0a81f907b halfway chap3 Chengsong parents: 579 diff changeset	798	\noindent
e0f0a81f907b halfway chap3 Chengsong parents: 579 diff changeset	799	As we did earlier, we omit the $s$ subscript at $\backslash_s$ when there is no danger
e0f0a81f907b halfway chap3 Chengsong parents: 579 diff changeset	800	of confusion.
e0f0a81f907b halfway chap3 Chengsong parents: 579 diff changeset	801	\subsection{Putting Things Together}
e0f0a81f907b halfway chap3 Chengsong parents: 579 diff changeset	802	Putting these operations altogether, we obtain a lexer with bit-coded regular expressions
e0f0a81f907b halfway chap3 Chengsong parents: 579 diff changeset	803	as its internal data structures, which we call $\blexer$:
e0f0a81f907b halfway chap3 Chengsong parents: 579 diff changeset	804
e0f0a81f907b halfway chap3 Chengsong parents: 579 diff changeset	805	\begin{center}
e0f0a81f907b halfway chap3 Chengsong parents: 579 diff changeset	806	\begin{tabular}{lcl}
e0f0a81f907b halfway chap3 Chengsong parents: 579 diff changeset	807	$\textit{blexer}\;r\,s$ & $\dn$ &
e0f0a81f907b halfway chap3 Chengsong parents: 579 diff changeset	808	$\textit{let}\;a = (r^\uparrow)\backslash s\;\textit{in}$\\
e0f0a81f907b halfway chap3 Chengsong parents: 579 diff changeset	809	& & $\;\;\textit{if}\; \textit{bnullable}(a)$\\
e0f0a81f907b halfway chap3 Chengsong parents: 579 diff changeset	810	& & $\;\;\textit{then}\;\textit{decode}\,(\textit{bmkeps}\,a)\,r$\\
e0f0a81f907b halfway chap3 Chengsong parents: 579 diff changeset	811	& & $\;\;\textit{else}\;\textit{None}$
e0f0a81f907b halfway chap3 Chengsong parents: 579 diff changeset	812	\end{tabular}
e0f0a81f907b halfway chap3 Chengsong parents: 579 diff changeset	813	\end{center}
e0f0a81f907b halfway chap3 Chengsong parents: 579 diff changeset	814
e0f0a81f907b halfway chap3 Chengsong parents: 579 diff changeset	815	\noindent
e0f0a81f907b halfway chap3 Chengsong parents: 579 diff changeset	816	Ausaf and Urban formally proved the correctness of the $\blexer$, namely
e0f0a81f907b halfway chap3 Chengsong parents: 579 diff changeset	817	\begin{property}
e0f0a81f907b halfway chap3 Chengsong parents: 579 diff changeset	818	$\blexer \;r \; s = \lexer \; r \; s$.
e0f0a81f907b halfway chap3 Chengsong parents: 579 diff changeset	819	\end{property}
e0f0a81f907b halfway chap3 Chengsong parents: 579 diff changeset	820	This was claimed but not formalised in Sulzmann and Lu's work.
e0f0a81f907b halfway chap3 Chengsong parents: 579 diff changeset	821	We introduce the proof later, after we give all
e0f0a81f907b halfway chap3 Chengsong parents: 579 diff changeset	822	the needed auxiliary functions and definitions.
e0f0a81f907b halfway chap3 Chengsong parents: 579 diff changeset	823	But before this we shall first walk the reader
e0f0a81f907b halfway chap3 Chengsong parents: 579 diff changeset	824	through a concrete example of our $\blexer$ calculating the right
e0f0a81f907b halfway chap3 Chengsong parents: 579 diff changeset	825	lexical information through bit-coded regular expressions.\\
564 3cbcd7cda0a9 more Chengsong parents: 543 diff changeset	826	Consider the regular expression $(aa)^* \cdot bc$ matching the string $aabc$
580 e0f0a81f907b halfway chap3 Chengsong parents: 579 diff changeset	827	and assume we have just read the first character $a$:
532 cc54ce075db5 restructured Chengsong parents: diff changeset	828	\begin{center}
537 50e590823220 more Chengsong parents: 536 diff changeset	829	\begin{tikzpicture}[every node/.append style={draw, rounded corners, inner sep=10pt}]
50e590823220 more Chengsong parents: 536 diff changeset	830	\node [rectangle split, rectangle split horizontal, rectangle split parts=2, rectangle split part fill={red!30,blue!20},]
580 e0f0a81f907b halfway chap3 Chengsong parents: 579 diff changeset	831	{$\ONE \cdot a \cdot (aa)^* \cdot bc$
e0f0a81f907b halfway chap3 Chengsong parents: 579 diff changeset	832	\nodepart{two} $[] \iff \Seq \; (\Stars \; [\Seq\; a \; ??, \;]), ??$};
537 50e590823220 more Chengsong parents: 536 diff changeset	833	\end{tikzpicture}
50e590823220 more Chengsong parents: 536 diff changeset	834	\end{center}
564 3cbcd7cda0a9 more Chengsong parents: 543 diff changeset	835	\noindent
580 e0f0a81f907b halfway chap3 Chengsong parents: 579 diff changeset	836	We use the red area for (annotated) regular expressions and the blue
e0f0a81f907b halfway chap3 Chengsong parents: 579 diff changeset	837	area the (partial) bitcodes and (partial) values.
564 3cbcd7cda0a9 more Chengsong parents: 543 diff changeset	838	In the injection-based lexing algorithm, we ``neglect" the red area
3cbcd7cda0a9 more Chengsong parents: 543 diff changeset	839	by putting all the characters we have consumed and
3cbcd7cda0a9 more Chengsong parents: 543 diff changeset	840	intermediate regular expressions on the stack when
3cbcd7cda0a9 more Chengsong parents: 543 diff changeset	841	we go from left to right in the derivative phase.
3cbcd7cda0a9 more Chengsong parents: 543 diff changeset	842	The red area grows till the string is exhausted.
3cbcd7cda0a9 more Chengsong parents: 543 diff changeset	843	During the injection phase, the value in the blue area
3cbcd7cda0a9 more Chengsong parents: 543 diff changeset	844	is built up incrementally, while the red area shrinks.
3cbcd7cda0a9 more Chengsong parents: 543 diff changeset	845	Before we have recovered all characters and intermediate
3cbcd7cda0a9 more Chengsong parents: 543 diff changeset	846	derivative regular expressions from the stack,
3cbcd7cda0a9 more Chengsong parents: 543 diff changeset	847	what values these characters and regular expressions correspond
3cbcd7cda0a9 more Chengsong parents: 543 diff changeset	848	to are unknown:
537 50e590823220 more Chengsong parents: 536 diff changeset	849	\begin{center}
50e590823220 more Chengsong parents: 536 diff changeset	850	\begin{tikzpicture}[every node/.append style={draw, rounded corners, inner sep=10pt}]
50e590823220 more Chengsong parents: 536 diff changeset	851	\node [rectangle split, rectangle split horizontal, rectangle split parts=2, rectangle split part fill={white!30,blue!20},]
564 3cbcd7cda0a9 more Chengsong parents: 543 diff changeset	852	{$(\ONE \cdot \ONE) \cdot (aa)^* \cdot bc $ correspond to:$???$
3cbcd7cda0a9 more Chengsong parents: 543 diff changeset	853	\nodepart{two} $b c$ corresponds to $\Seq(\ldots, \Seq(\Char(b), \Char(c)))$};
537 50e590823220 more Chengsong parents: 536 diff changeset	854	%\caption{term 1 \ref{term:1}'s matching configuration}
50e590823220 more Chengsong parents: 536 diff changeset	855	\end{tikzpicture}
50e590823220 more Chengsong parents: 536 diff changeset	856	\end{center}
50e590823220 more Chengsong parents: 536 diff changeset	857	\noindent
564 3cbcd7cda0a9 more Chengsong parents: 543 diff changeset	858	However, they should be calculable,
3cbcd7cda0a9 more Chengsong parents: 543 diff changeset	859	as characters and regular expression shapes
3cbcd7cda0a9 more Chengsong parents: 543 diff changeset	860	after taking derivative w.r.t those characters
3cbcd7cda0a9 more Chengsong parents: 543 diff changeset	861	have already been known, therefore in our example,
3cbcd7cda0a9 more Chengsong parents: 543 diff changeset	862	we know that the value starts with two $a$s,
3cbcd7cda0a9 more Chengsong parents: 543 diff changeset	863	and makes up to an iteration in a Kleene star:
3cbcd7cda0a9 more Chengsong parents: 543 diff changeset	864	(We have put the injection-based lexing's partial
3cbcd7cda0a9 more Chengsong parents: 543 diff changeset	865	result in the right part of the split rectangle
3cbcd7cda0a9 more Chengsong parents: 543 diff changeset	866	to contrast it with the partial valued produced
3cbcd7cda0a9 more Chengsong parents: 543 diff changeset	867	in a forward manner)
537 50e590823220 more Chengsong parents: 536 diff changeset	868	\begin{center}
50e590823220 more Chengsong parents: 536 diff changeset	869	\begin{tikzpicture}[every node/.append style={draw, rounded corners, inner sep=10pt}]
50e590823220 more Chengsong parents: 536 diff changeset	870	\node [rectangle split, rectangle split horizontal, rectangle split parts=2, rectangle split part fill={red!30,blue!20},]
564 3cbcd7cda0a9 more Chengsong parents: 543 diff changeset	871	{$\stackrel{Bitcoded}{\longrightarrow} \Seq(\Stars[\Char(a), \Char(a)], ???)$
3cbcd7cda0a9 more Chengsong parents: 543 diff changeset	872	\nodepart{two} $\Seq(\ldots, \Seq(\Char(b), \Char(c)))$ $\stackrel{Inj}{\longleftarrow}$};
537 50e590823220 more Chengsong parents: 536 diff changeset	873	%\caption{term 1 \ref{term:1}'s matching configuration}
50e590823220 more Chengsong parents: 536 diff changeset	874	\end{tikzpicture}
50e590823220 more Chengsong parents: 536 diff changeset	875	\end{center}
50e590823220 more Chengsong parents: 536 diff changeset	876	\noindent
50e590823220 more Chengsong parents: 536 diff changeset	877	If we do this kind of "attachment"
50e590823220 more Chengsong parents: 536 diff changeset	878	and each time augment the attached partially
50e590823220 more Chengsong parents: 536 diff changeset	879	constructed value when taking off a
50e590823220 more Chengsong parents: 536 diff changeset	880	character:
50e590823220 more Chengsong parents: 536 diff changeset	881	\begin{center}
564 3cbcd7cda0a9 more Chengsong parents: 543 diff changeset	882	\begin{tikzpicture}[every node/.append style={draw, rounded corners, inner sep=10pt}]
3cbcd7cda0a9 more Chengsong parents: 543 diff changeset	883	\node [rectangle split, rectangle split horizontal, rectangle split parts=2, rectangle split part fill={red!30,blue!20},] (spPoint)
3cbcd7cda0a9 more Chengsong parents: 543 diff changeset	884	{$\Seq(\Stars[\Char(a), \Char(a)], \ldots)$
3cbcd7cda0a9 more Chengsong parents: 543 diff changeset	885	\nodepart{two} Remaining: $b c$};
3cbcd7cda0a9 more Chengsong parents: 543 diff changeset	886	\end{tikzpicture}\\
3cbcd7cda0a9 more Chengsong parents: 543 diff changeset	887	$\downarrow$\\
537 50e590823220 more Chengsong parents: 536 diff changeset	888	\begin{tikzpicture}[every node/.append style={draw, rounded corners, inner sep=10pt}]
50e590823220 more Chengsong parents: 536 diff changeset	889	\node [rectangle split, rectangle split horizontal, rectangle split parts=2, rectangle split part fill={red!30,blue!20},]
50e590823220 more Chengsong parents: 536 diff changeset	890	{$\Seq(\Stars[\Char(a), \Char(a)], \Seq(\Char(b), \ldots))$
564 3cbcd7cda0a9 more Chengsong parents: 543 diff changeset	891	\nodepart{two} Remaining: $c$};
537 50e590823220 more Chengsong parents: 536 diff changeset	892	\end{tikzpicture}\\
564 3cbcd7cda0a9 more Chengsong parents: 543 diff changeset	893	$\downarrow$\\
537 50e590823220 more Chengsong parents: 536 diff changeset	894	\begin{tikzpicture}[every node/.append style={draw, rounded corners, inner sep=10pt}]
50e590823220 more Chengsong parents: 536 diff changeset	895	\node [rectangle split, rectangle split horizontal, rectangle split parts=2, rectangle split part fill={red!30,blue!20},]
50e590823220 more Chengsong parents: 536 diff changeset	896	{$\Seq(\Stars[\Char(a), \Char(a)], \Seq(\Char(b), \Char(c)))$
50e590823220 more Chengsong parents: 536 diff changeset	897	\nodepart{two} EOF};
50e590823220 more Chengsong parents: 536 diff changeset	898	\end{tikzpicture}
50e590823220 more Chengsong parents: 536 diff changeset	899	\end{center}
50e590823220 more Chengsong parents: 536 diff changeset	900	\noindent
50e590823220 more Chengsong parents: 536 diff changeset	901	In the end we could recover the value without a backward phase.
50e590823220 more Chengsong parents: 536 diff changeset	902	But (partial) values are a bit clumsy to stick together with a regular expression, so
50e590823220 more Chengsong parents: 536 diff changeset	903	we instead use bit-codes to encode them.
50e590823220 more Chengsong parents: 536 diff changeset	904
50e590823220 more Chengsong parents: 536 diff changeset	905	Bits and bitcodes (lists of bits) are defined as:
50e590823220 more Chengsong parents: 536 diff changeset	906	\begin{center}
50e590823220 more Chengsong parents: 536 diff changeset	907	$b ::= S \mid Z \qquad
532 cc54ce075db5 restructured Chengsong parents: diff changeset	908	bs ::= [] \mid b::bs
cc54ce075db5 restructured Chengsong parents: diff changeset	909	$
cc54ce075db5 restructured Chengsong parents: diff changeset	910	\end{center}
cc54ce075db5 restructured Chengsong parents: diff changeset	911
cc54ce075db5 restructured Chengsong parents: diff changeset	912	\noindent
cc54ce075db5 restructured Chengsong parents: diff changeset	913
cc54ce075db5 restructured Chengsong parents: diff changeset	914	%-----------------------------------
cc54ce075db5 restructured Chengsong parents: diff changeset	915	% SUBSECTION 1
cc54ce075db5 restructured Chengsong parents: diff changeset	916	%-----------------------------------
cc54ce075db5 restructured Chengsong parents: diff changeset	917	\section{Specifications of Some Helper Functions}
542 a7344c9afbaf chapter3 finished Chengsong parents: 538 diff changeset	918	The functions we introduce will give a more detailed glimpse into
a7344c9afbaf chapter3 finished Chengsong parents: 538 diff changeset	919	the lexing process, which might not be possible
a7344c9afbaf chapter3 finished Chengsong parents: 538 diff changeset	920	using $\lexer$ or $\blexer$ themselves.
a7344c9afbaf chapter3 finished Chengsong parents: 538 diff changeset	921	The first function we shall look at is $\retrieve$.
543 b2bea5968b89 thesis_thys Chengsong parents: 542 diff changeset	922	\subsection{$\textit{Retrieve}$}
542 a7344c9afbaf chapter3 finished Chengsong parents: 538 diff changeset	923	Our bit-coded lexer "retrieve"s the bitcodes using $\bmkeps$
a7344c9afbaf chapter3 finished Chengsong parents: 538 diff changeset	924	after we finished doing all the derivatives:
532 cc54ce075db5 restructured Chengsong parents: diff changeset	925	\begin{center}
542 a7344c9afbaf chapter3 finished Chengsong parents: 538 diff changeset	926	\begin{tabular}{lcl}
a7344c9afbaf chapter3 finished Chengsong parents: 538 diff changeset	927	& & $\ldots$\\
a7344c9afbaf chapter3 finished Chengsong parents: 538 diff changeset	928	& & $\;\;\textit{if}\; \textit{bnullable}(a)$\\
a7344c9afbaf chapter3 finished Chengsong parents: 538 diff changeset	929	& & $\;\;\textit{then}\;\textit{decode}\,(\textit{bmkeps}\,a)\,r$\\
a7344c9afbaf chapter3 finished Chengsong parents: 538 diff changeset	930	& & $\ldots$
532 cc54ce075db5 restructured Chengsong parents: diff changeset	931	\end{tabular}
cc54ce075db5 restructured Chengsong parents: diff changeset	932	\end{center}
542 a7344c9afbaf chapter3 finished Chengsong parents: 538 diff changeset	933	\noindent
a7344c9afbaf chapter3 finished Chengsong parents: 538 diff changeset	934	Recall that $\bmkeps$ looks for the leftmost branch of an alternative
a7344c9afbaf chapter3 finished Chengsong parents: 538 diff changeset	935	and completes a star's iterations by attaching a $Z$ at the end of the bitcodes
a7344c9afbaf chapter3 finished Chengsong parents: 538 diff changeset	936	extracted. It "retrieves" a sequence by visiting both children and then stitch together
a7344c9afbaf chapter3 finished Chengsong parents: 538 diff changeset	937	two bitcodes using concatenation. After the entire tree structure of the regular
a7344c9afbaf chapter3 finished Chengsong parents: 538 diff changeset	938	expression has been traversed using the above manner, we get a bitcode encoding the
a7344c9afbaf chapter3 finished Chengsong parents: 538 diff changeset	939	lexing result.
a7344c9afbaf chapter3 finished Chengsong parents: 538 diff changeset	940	We know that this "retrieved" bitcode leads to the correct value after decoding,
a7344c9afbaf chapter3 finished Chengsong parents: 538 diff changeset	941	which is $v_0$ in the bird's eye view of the injection-based lexing diagram.
a7344c9afbaf chapter3 finished Chengsong parents: 538 diff changeset	942	Now assume we keep every other data structure in the diagram \ref{InjFigure},
a7344c9afbaf chapter3 finished Chengsong parents: 538 diff changeset	943	and only replace all the plain regular expression by their annotated counterparts,
a7344c9afbaf chapter3 finished Chengsong parents: 538 diff changeset	944	computed during a $\blexer$ run.
a7344c9afbaf chapter3 finished Chengsong parents: 538 diff changeset	945	Then we obtain a diagram for the annotated regular expression derivatives and
a7344c9afbaf chapter3 finished Chengsong parents: 538 diff changeset	946	their corresponding values, though the values are never calculated in $\blexer$.
a7344c9afbaf chapter3 finished Chengsong parents: 538 diff changeset	947	We have that $a_n$ contains all the lexing result information.
a7344c9afbaf chapter3 finished Chengsong parents: 538 diff changeset	948	\vspace{20mm}
a7344c9afbaf chapter3 finished Chengsong parents: 538 diff changeset	949	\begin{center}%\label{graph:injLexer}
a7344c9afbaf chapter3 finished Chengsong parents: 538 diff changeset	950	\begin{tikzcd}[
a7344c9afbaf chapter3 finished Chengsong parents: 538 diff changeset	951	every matrix/.append style = {name=p},
a7344c9afbaf chapter3 finished Chengsong parents: 538 diff changeset	952	remember picture, overlay,
a7344c9afbaf chapter3 finished Chengsong parents: 538 diff changeset	953	]
a7344c9afbaf chapter3 finished Chengsong parents: 538 diff changeset	954	a_0 \arrow[r, "\backslash c_0"] \arrow[d] & a_1 \arrow[r, "\backslash c_1"] \arrow[d] & a_2 \arrow[r, dashed] \arrow[d] & a_n \arrow[d] \\
a7344c9afbaf chapter3 finished Chengsong parents: 538 diff changeset	955	v_0 & v_1 \arrow[l,"inj_{r_0} c_0"] & v_2 \arrow[l, "inj_{r_1} c_1"] & v_n \arrow[l, dashed]
a7344c9afbaf chapter3 finished Chengsong parents: 538 diff changeset	956	\end{tikzcd}
a7344c9afbaf chapter3 finished Chengsong parents: 538 diff changeset	957	\begin{tikzpicture}[
a7344c9afbaf chapter3 finished Chengsong parents: 538 diff changeset	958	remember picture, overlay,
a7344c9afbaf chapter3 finished Chengsong parents: 538 diff changeset	959	E/.style = {ellipse, draw=blue, dashed,
a7344c9afbaf chapter3 finished Chengsong parents: 538 diff changeset	960	inner xsep=4mm,inner ysep=-4mm, rotate=90, fit=#1}
a7344c9afbaf chapter3 finished Chengsong parents: 538 diff changeset	961	]
a7344c9afbaf chapter3 finished Chengsong parents: 538 diff changeset	962	\node[E = (p-1-1) (p-2-1)] {};
a7344c9afbaf chapter3 finished Chengsong parents: 538 diff changeset	963	\node[E = (p-1-4) (p-2-4)] {};
a7344c9afbaf chapter3 finished Chengsong parents: 538 diff changeset	964	\end{tikzpicture}
a7344c9afbaf chapter3 finished Chengsong parents: 538 diff changeset	965
a7344c9afbaf chapter3 finished Chengsong parents: 538 diff changeset	966	\end{center}
a7344c9afbaf chapter3 finished Chengsong parents: 538 diff changeset	967	\vspace{20mm}
a7344c9afbaf chapter3 finished Chengsong parents: 538 diff changeset	968	\noindent
a7344c9afbaf chapter3 finished Chengsong parents: 538 diff changeset	969	On the other hand, $v_0$ also encodes the correct lexing result, as we have proven for $\lexer$.
a7344c9afbaf chapter3 finished Chengsong parents: 538 diff changeset	970	Encircled in the diagram are the two pairs $v_0, a_0$ and $v_n, a_n$, which both
a7344c9afbaf chapter3 finished Chengsong parents: 538 diff changeset	971	encode the correct lexical result. Though for the leftmost pair, we have
a7344c9afbaf chapter3 finished Chengsong parents: 538 diff changeset	972	the information condensed in $v_0$ the value part, whereas for the rightmost pair,
a7344c9afbaf chapter3 finished Chengsong parents: 538 diff changeset	973	the information is concentrated on $a_n$.
a7344c9afbaf chapter3 finished Chengsong parents: 538 diff changeset	974	We know that in the intermediate steps the pairs $v_i, a_i$, must in some way encode the complete
a7344c9afbaf chapter3 finished Chengsong parents: 538 diff changeset	975	lexing information as well. Therefore, we need a unified approach to extract such lexing result
a7344c9afbaf chapter3 finished Chengsong parents: 538 diff changeset	976	from a value $v_i$ and its annotated regular expression $a_i$.
a7344c9afbaf chapter3 finished Chengsong parents: 538 diff changeset	977	And the function $f$ must satisfy these requirements:
a7344c9afbaf chapter3 finished Chengsong parents: 538 diff changeset	978	\begin{itemize}
a7344c9afbaf chapter3 finished Chengsong parents: 538 diff changeset	979	\item
a7344c9afbaf chapter3 finished Chengsong parents: 538 diff changeset	980	$f \; a_i\;v_i = f \; a_n \; v_n = \decode \; (\bmkeps \; a_n) \; (\erase \; a_0)$
a7344c9afbaf chapter3 finished Chengsong parents: 538 diff changeset	981	\item
a7344c9afbaf chapter3 finished Chengsong parents: 538 diff changeset	982	$f \; a_i\;v_i = f \; a_0 \; v_0 = v_0 = \decode \;(\code \; v_0) \; (\erase \; a_0)$
a7344c9afbaf chapter3 finished Chengsong parents: 538 diff changeset	983	\end{itemize}
a7344c9afbaf chapter3 finished Chengsong parents: 538 diff changeset	984	\noindent
a7344c9afbaf chapter3 finished Chengsong parents: 538 diff changeset	985	If we factor out the common part $\decode \; \_ \; (\erase \; a_0)$,
a7344c9afbaf chapter3 finished Chengsong parents: 538 diff changeset	986	The core of the function $f$ is something that produces the bitcodes
a7344c9afbaf chapter3 finished Chengsong parents: 538 diff changeset	987	$\code \; v_0$.
a7344c9afbaf chapter3 finished Chengsong parents: 538 diff changeset	988	It is unclear how, but Sulzmann and Lu came up with a function satisfying all the above
a7344c9afbaf chapter3 finished Chengsong parents: 538 diff changeset	989	requirements, named \emph{retrieve}:
532 cc54ce075db5 restructured Chengsong parents: diff changeset	990
cc54ce075db5 restructured Chengsong parents: diff changeset	991
cc54ce075db5 restructured Chengsong parents: diff changeset	992
542 a7344c9afbaf chapter3 finished Chengsong parents: 538 diff changeset	993	\begin{center}
a7344c9afbaf chapter3 finished Chengsong parents: 538 diff changeset	994	\begin{tabular}{lcr}
a7344c9afbaf chapter3 finished Chengsong parents: 538 diff changeset	995	$\retrieve \; \, (_{bs} \ONE) \; \, (\Empty)$ & $\dn$ & $\textit{bs}$\\
a7344c9afbaf chapter3 finished Chengsong parents: 538 diff changeset	996	$\retrieve \; \, (_{bs} \mathbf{c} ) \; \Char(c)$ & $\dn$ & $ \textit{bs}$\\
a7344c9afbaf chapter3 finished Chengsong parents: 538 diff changeset	997	$\retrieve \; \, (_{bs} a_1 \cdot a_2) \; \Seq(v_1, v_2)$ & $\dn$ & $\textit{bs} @ (\retrieve \; a_1\; v_1) @ (\retrieve \; a_2 \; v_2)$\\
a7344c9afbaf chapter3 finished Chengsong parents: 538 diff changeset	998	$\retrieve \; \, (_{bs} \Sigma (a :: \textit{as}) \; \,\Left(v)$ & $\dn$ & $\textit{bs} @ (\retrieve \; a \; v)$\\
a7344c9afbaf chapter3 finished Chengsong parents: 538 diff changeset	999	$\retrieve \; \, (_{bs} \Sigma (a :: \textit{as} \; \, \Right(v)$ & $\dn$ & $\textit{bs} @ (\retrieve \; (_{[]}\Sigma \textit{as}) \; v)$\\
a7344c9afbaf chapter3 finished Chengsong parents: 538 diff changeset	1000	$\retrieve \; \, (_{bs} a^) \; \, (\Stars(v :: vs)) $ & $\dn$ & $\textit{bs} @ (\retrieve \; a \; v) @ (\retrieve \; (_{[]} a^) \; (\Stars(vs)))$\\
a7344c9afbaf chapter3 finished Chengsong parents: 538 diff changeset	1001	$\retrieve \; \, (_{bs} a^*) \; \, (\Stars([]) $ & $\dn$ & $\textit{bs} @ [Z]$
a7344c9afbaf chapter3 finished Chengsong parents: 538 diff changeset	1002	\end{tabular}
a7344c9afbaf chapter3 finished Chengsong parents: 538 diff changeset	1003	\end{center}
a7344c9afbaf chapter3 finished Chengsong parents: 538 diff changeset	1004	\noindent
a7344c9afbaf chapter3 finished Chengsong parents: 538 diff changeset	1005	As promised, $\retrieve$ collects the right bit-codes from the
a7344c9afbaf chapter3 finished Chengsong parents: 538 diff changeset	1006	final derivative $a_n$:
a7344c9afbaf chapter3 finished Chengsong parents: 538 diff changeset	1007	\begin{lemma}
a7344c9afbaf chapter3 finished Chengsong parents: 538 diff changeset	1008	$\bnullable \; a \implies \bmkeps \; a = \retrieve \; a \; (\mkeps \; (\erase \; a))$
a7344c9afbaf chapter3 finished Chengsong parents: 538 diff changeset	1009	\end{lemma}
a7344c9afbaf chapter3 finished Chengsong parents: 538 diff changeset	1010	\begin{proof}
a7344c9afbaf chapter3 finished Chengsong parents: 538 diff changeset	1011	By a routine induction on $a$.
a7344c9afbaf chapter3 finished Chengsong parents: 538 diff changeset	1012	\end{proof}
a7344c9afbaf chapter3 finished Chengsong parents: 538 diff changeset	1013	The design of $\retrieve$ enables extraction of bit-codes
a7344c9afbaf chapter3 finished Chengsong parents: 538 diff changeset	1014	from not only $\bnullable$ (annotated) regular expressions,
a7344c9afbaf chapter3 finished Chengsong parents: 538 diff changeset	1015	but also those that are not $\bnullable$.
a7344c9afbaf chapter3 finished Chengsong parents: 538 diff changeset	1016	For example, if we have the regular expression just internalised
a7344c9afbaf chapter3 finished Chengsong parents: 538 diff changeset	1017	and the lexing result value, we could $\retrieve$ the bitcdoes
a7344c9afbaf chapter3 finished Chengsong parents: 538 diff changeset	1018	that look as if we have en$\code$-ed the value:
a7344c9afbaf chapter3 finished Chengsong parents: 538 diff changeset	1019	\begin{lemma}
a7344c9afbaf chapter3 finished Chengsong parents: 538 diff changeset	1020	$\vdash v : r \implies \retrieve \; (r)^\uparrow \; v = \code \; v$
a7344c9afbaf chapter3 finished Chengsong parents: 538 diff changeset	1021	\end{lemma}
a7344c9afbaf chapter3 finished Chengsong parents: 538 diff changeset	1022	\begin{proof}
a7344c9afbaf chapter3 finished Chengsong parents: 538 diff changeset	1023	By induction on $r$.
a7344c9afbaf chapter3 finished Chengsong parents: 538 diff changeset	1024	\end{proof}
a7344c9afbaf chapter3 finished Chengsong parents: 538 diff changeset	1025	\noindent
a7344c9afbaf chapter3 finished Chengsong parents: 538 diff changeset	1026	The following property is more interesting, as
a7344c9afbaf chapter3 finished Chengsong parents: 538 diff changeset	1027	it provides a "bridge" between $a_0, v_0$ and $a_n, v_n$ in the
a7344c9afbaf chapter3 finished Chengsong parents: 538 diff changeset	1028	lexing diagram.
a7344c9afbaf chapter3 finished Chengsong parents: 538 diff changeset	1029	If you take derivative of an annotated regular expression,
a7344c9afbaf chapter3 finished Chengsong parents: 538 diff changeset	1030	you can $\retrieve$ the same bit-codes as before the derivative took place,
a7344c9afbaf chapter3 finished Chengsong parents: 538 diff changeset	1031	provided that you use the corresponding value:
a7344c9afbaf chapter3 finished Chengsong parents: 538 diff changeset	1032
a7344c9afbaf chapter3 finished Chengsong parents: 538 diff changeset	1033	\begin{lemma}\label{retrieveStepwise}
a7344c9afbaf chapter3 finished Chengsong parents: 538 diff changeset	1034	$\vdash v : (r\backslash c) \implies \retrieve \; (r \backslash c) \; v= \retrieve \; r \; (\inj \; r\; c\; v)$
a7344c9afbaf chapter3 finished Chengsong parents: 538 diff changeset	1035	\end{lemma}
a7344c9afbaf chapter3 finished Chengsong parents: 538 diff changeset	1036	\begin{proof}
a7344c9afbaf chapter3 finished Chengsong parents: 538 diff changeset	1037	By induction on $r$, where $v$ is allowed to be arbitrary.
a7344c9afbaf chapter3 finished Chengsong parents: 538 diff changeset	1038	The induction principle is function $\erase$'s cases.
a7344c9afbaf chapter3 finished Chengsong parents: 538 diff changeset	1039	\end{proof}
a7344c9afbaf chapter3 finished Chengsong parents: 538 diff changeset	1040	\noindent
a7344c9afbaf chapter3 finished Chengsong parents: 538 diff changeset	1041	$\retrieve$ is connected to the $\blexer$ in the following way:
a7344c9afbaf chapter3 finished Chengsong parents: 538 diff changeset	1042	\begin{lemma}\label{blexer_retrieve}
a7344c9afbaf chapter3 finished Chengsong parents: 538 diff changeset	1043	$\blexer \; r \; s = \decode \; (\retrieve \; (\internalise \; r) \; (\mkeps \; (r \backslash s) )) \; r$
a7344c9afbaf chapter3 finished Chengsong parents: 538 diff changeset	1044	\end{lemma}
a7344c9afbaf chapter3 finished Chengsong parents: 538 diff changeset	1045	\noindent
564 3cbcd7cda0a9 more Chengsong parents: 543 diff changeset	1046	$\retrieve$ allows free navigation on the diagram \ref{InjFigure} for annotated regular expressiones of $\blexer$.
542 a7344c9afbaf chapter3 finished Chengsong parents: 538 diff changeset	1047	For plain regular expressions something similar is required as well.
a7344c9afbaf chapter3 finished Chengsong parents: 538 diff changeset	1048
a7344c9afbaf chapter3 finished Chengsong parents: 538 diff changeset	1049	\subsection{$\flex$}
a7344c9afbaf chapter3 finished Chengsong parents: 538 diff changeset	1050	Ausaf and Urban cleverly defined an auxiliary function called $\flex$ for $\lexer$,
532 cc54ce075db5 restructured Chengsong parents: diff changeset	1051	defined as
536 aff7bf93b9c7 comments addressed all Chengsong parents: 532 diff changeset	1052	\begin{center}
aff7bf93b9c7 comments addressed all Chengsong parents: 532 diff changeset	1053	\begin{tabular}{lcr}
aff7bf93b9c7 comments addressed all Chengsong parents: 532 diff changeset	1054	$\flex \; r \; f \; [] \; v$ & $=$ & $f\; v$\\
aff7bf93b9c7 comments addressed all Chengsong parents: 532 diff changeset	1055	$\flex \; r \; f \; c :: s \; v$ & $=$ & $\flex \; r \; \lambda v. \, f (\inj \; r\; c\; v)\; s \; v$
aff7bf93b9c7 comments addressed all Chengsong parents: 532 diff changeset	1056	\end{tabular}
aff7bf93b9c7 comments addressed all Chengsong parents: 532 diff changeset	1057	\end{center}
532 cc54ce075db5 restructured Chengsong parents: diff changeset	1058	which accumulates the characters that needs to be injected back,
cc54ce075db5 restructured Chengsong parents: diff changeset	1059	and does the injection in a stack-like manner (last taken derivative first injected).
cc54ce075db5 restructured Chengsong parents: diff changeset	1060	$\flex$ is connected to the $\lexer$:
cc54ce075db5 restructured Chengsong parents: diff changeset	1061	\begin{lemma}
cc54ce075db5 restructured Chengsong parents: diff changeset	1062	$\flex \; r \; \textit{id}\; s \; \mkeps (r\backslash s) = \lexer \; r \; s$
cc54ce075db5 restructured Chengsong parents: diff changeset	1063	\end{lemma}
542 a7344c9afbaf chapter3 finished Chengsong parents: 538 diff changeset	1064	\begin{proof}
a7344c9afbaf chapter3 finished Chengsong parents: 538 diff changeset	1065	By reverse induction on $s$.
a7344c9afbaf chapter3 finished Chengsong parents: 538 diff changeset	1066	\end{proof}
a7344c9afbaf chapter3 finished Chengsong parents: 538 diff changeset	1067	$\flex$ provides us a bridge between $\lexer$'s intermediate steps.
532 cc54ce075db5 restructured Chengsong parents: diff changeset	1068	What is even better about $\flex$ is that it allows us to
cc54ce075db5 restructured Chengsong parents: diff changeset	1069	directly operate on the value $\mkeps (r\backslash v)$,
cc54ce075db5 restructured Chengsong parents: diff changeset	1070	which is pivotal in the definition of $\lexer $ and $\blexer$, but not visible as an argument.
cc54ce075db5 restructured Chengsong parents: diff changeset	1071	When the value created by $\mkeps$ becomes available, one can
cc54ce075db5 restructured Chengsong parents: diff changeset	1072	prove some stepwise properties of lexing nicely:
cc54ce075db5 restructured Chengsong parents: diff changeset	1073	\begin{lemma}\label{flexStepwise}
cc54ce075db5 restructured Chengsong parents: diff changeset	1074	$\textit{flex} \; r \; f \; s@[c] \; v= \flex \; r \; f\; s \; (\inj \; (r\backslash s) \; c \; v) $
cc54ce075db5 restructured Chengsong parents: diff changeset	1075	\end{lemma}
542 a7344c9afbaf chapter3 finished Chengsong parents: 538 diff changeset	1076	\begin{proof}
a7344c9afbaf chapter3 finished Chengsong parents: 538 diff changeset	1077	By induction on the shape of $r\backslash s$
a7344c9afbaf chapter3 finished Chengsong parents: 538 diff changeset	1078	\end{proof}
a7344c9afbaf chapter3 finished Chengsong parents: 538 diff changeset	1079	\noindent
a7344c9afbaf chapter3 finished Chengsong parents: 538 diff changeset	1080	With $\flex$ and $\retrieve$ ready, we are ready to connect $\lexer$ and $\blexer$ .
532 cc54ce075db5 restructured Chengsong parents: diff changeset	1081
542 a7344c9afbaf chapter3 finished Chengsong parents: 538 diff changeset	1082	\subsection{Correctness Proof of Bit-coded Algorithm}
532 cc54ce075db5 restructured Chengsong parents: diff changeset	1083	\begin{lemma}\label{flex_retrieve}
cc54ce075db5 restructured Chengsong parents: diff changeset	1084	$\flex \; r \; \textit{id}\; s\; v = \decode \; (\retrieve \; (r\backslash s )\; v) \; r$
cc54ce075db5 restructured Chengsong parents: diff changeset	1085	\end{lemma}
cc54ce075db5 restructured Chengsong parents: diff changeset	1086	\begin{proof}
cc54ce075db5 restructured Chengsong parents: diff changeset	1087	By induction on $s$. The induction tactic is reverse induction on strings.
cc54ce075db5 restructured Chengsong parents: diff changeset	1088	$v$ is allowed to be arbitrary.
cc54ce075db5 restructured Chengsong parents: diff changeset	1089	The crucial point is to rewrite
cc54ce075db5 restructured Chengsong parents: diff changeset	1090	\[
cc54ce075db5 restructured Chengsong parents: diff changeset	1091	\retrieve \; (r \backslash s@[c]) \; \mkeps (r \backslash s@[c])
cc54ce075db5 restructured Chengsong parents: diff changeset	1092	\]
cc54ce075db5 restructured Chengsong parents: diff changeset	1093	as
cc54ce075db5 restructured Chengsong parents: diff changeset	1094	\[
cc54ce075db5 restructured Chengsong parents: diff changeset	1095	\retrieve \; (r \backslash s) \; (\inj \; (r \backslash s) \; c\; \mkeps (r \backslash s@[c]))
cc54ce075db5 restructured Chengsong parents: diff changeset	1096	\].
cc54ce075db5 restructured Chengsong parents: diff changeset	1097	This enables us to equate
cc54ce075db5 restructured Chengsong parents: diff changeset	1098	\[
cc54ce075db5 restructured Chengsong parents: diff changeset	1099	\retrieve \; (r \backslash s@[c]) \; \mkeps (r \backslash s@[c])
cc54ce075db5 restructured Chengsong parents: diff changeset	1100	\]
cc54ce075db5 restructured Chengsong parents: diff changeset	1101	with
cc54ce075db5 restructured Chengsong parents: diff changeset	1102	\[
cc54ce075db5 restructured Chengsong parents: diff changeset	1103	\flex \; r \; \textit{id} \; s \; (\inj \; (r\backslash s) \; c\; (\mkeps (r\backslash s@[c])))
cc54ce075db5 restructured Chengsong parents: diff changeset	1104	\],
cc54ce075db5 restructured Chengsong parents: diff changeset	1105	which in turn can be rewritten as
cc54ce075db5 restructured Chengsong parents: diff changeset	1106	\[
cc54ce075db5 restructured Chengsong parents: diff changeset	1107	\flex \; r \; \textit{id} \; s@[c] \; (\mkeps (r\backslash s@[c]))
cc54ce075db5 restructured Chengsong parents: diff changeset	1108	\].
cc54ce075db5 restructured Chengsong parents: diff changeset	1109	\end{proof}
cc54ce075db5 restructured Chengsong parents: diff changeset	1110
cc54ce075db5 restructured Chengsong parents: diff changeset	1111	With the above lemma we can now link $\flex$ and $\blexer$.
cc54ce075db5 restructured Chengsong parents: diff changeset	1112
542 a7344c9afbaf chapter3 finished Chengsong parents: 538 diff changeset	1113	%----------------------------------------------------------------------------------------
a7344c9afbaf chapter3 finished Chengsong parents: 538 diff changeset	1114	% SECTION correctness proof
a7344c9afbaf chapter3 finished Chengsong parents: 538 diff changeset	1115	%----------------------------------------------------------------------------------------
a7344c9afbaf chapter3 finished Chengsong parents: 538 diff changeset	1116	\section{Correctness of Bit-coded Algorithm (Without Simplification)}
a7344c9afbaf chapter3 finished Chengsong parents: 538 diff changeset	1117	We now give the proof the correctness of the algorithm with bit-codes.
a7344c9afbaf chapter3 finished Chengsong parents: 538 diff changeset	1118
532 cc54ce075db5 restructured Chengsong parents: diff changeset	1119	\begin{lemma}\label{flex_blexer}
cc54ce075db5 restructured Chengsong parents: diff changeset	1120	$\textit{flex} \; r \; \textit{id} \; s \; \mkeps(r \backslash s) = \blexer \; r \; s$
cc54ce075db5 restructured Chengsong parents: diff changeset	1121	\end{lemma}
cc54ce075db5 restructured Chengsong parents: diff changeset	1122	\begin{proof}
cc54ce075db5 restructured Chengsong parents: diff changeset	1123	Using two of the above lemmas: \ref{flex_retrieve} and \ref{blexer_retrieve}.
cc54ce075db5 restructured Chengsong parents: diff changeset	1124	\end{proof}
542 a7344c9afbaf chapter3 finished Chengsong parents: 538 diff changeset	1125	Finally the correctness of $\blexer$ is given as it outputs the same result as $\lexer$:
a7344c9afbaf chapter3 finished Chengsong parents: 538 diff changeset	1126	\begin{theorem}
a7344c9afbaf chapter3 finished Chengsong parents: 538 diff changeset	1127	$\blexer\; r \; s = \lexer \; r \; s$
a7344c9afbaf chapter3 finished Chengsong parents: 538 diff changeset	1128	\end{theorem}
a7344c9afbaf chapter3 finished Chengsong parents: 538 diff changeset	1129	\begin{proof}
a7344c9afbaf chapter3 finished Chengsong parents: 538 diff changeset	1130	Straightforward corollary of \ref{flex_blexer}.
a7344c9afbaf chapter3 finished Chengsong parents: 538 diff changeset	1131	\end{proof}
a7344c9afbaf chapter3 finished Chengsong parents: 538 diff changeset	1132	\noindent
576 3e1b699696b6 thesis chap5 Chengsong parents: 575 diff changeset	1133	To piece things together and spell out the exact correctness
3e1b699696b6 thesis chap5 Chengsong parents: 575 diff changeset	1134	of the bitcoded lexer
3e1b699696b6 thesis chap5 Chengsong parents: 575 diff changeset	1135	in terms of producing POSIX values,
3e1b699696b6 thesis chap5 Chengsong parents: 575 diff changeset	1136	we use the fact from the previous chapter that
3e1b699696b6 thesis chap5 Chengsong parents: 575 diff changeset	1137	\[
3e1b699696b6 thesis chap5 Chengsong parents: 575 diff changeset	1138	If \; (r, s) \rightarrow v \; then \; \lexer \; r \; s = v
3e1b699696b6 thesis chap5 Chengsong parents: 575 diff changeset	1139	\]
3e1b699696b6 thesis chap5 Chengsong parents: 575 diff changeset	1140	to obtain this corollary:
3e1b699696b6 thesis chap5 Chengsong parents: 575 diff changeset	1141	\begin{corollary}\label{blexerPOSIX}
3e1b699696b6 thesis chap5 Chengsong parents: 575 diff changeset	1142	$If \; (r, s) \rightarrow v \; then \blexer \; r \; s = v$
3e1b699696b6 thesis chap5 Chengsong parents: 575 diff changeset	1143	\end{corollary}
542 a7344c9afbaf chapter3 finished Chengsong parents: 538 diff changeset	1144	Our main reason for wanting a bit-coded algorithm over
a7344c9afbaf chapter3 finished Chengsong parents: 538 diff changeset	1145	the injection-based one is for its capabilities of allowing
a7344c9afbaf chapter3 finished Chengsong parents: 538 diff changeset	1146	more aggressive simplifications.
a7344c9afbaf chapter3 finished Chengsong parents: 538 diff changeset	1147	We will elaborate on this in the next chapter.
532 cc54ce075db5 restructured Chengsong parents: diff changeset	1148
cc54ce075db5 restructured Chengsong parents: diff changeset	1149

author	Chengsong
	Wed, 17 Aug 2022 22:59:15 +0100
changeset 580	e0f0a81f907b
parent 579	35df9cdd36ca
child 581	9db2500629be
permissions	-rwxr-xr-x