ChengsongTanPhdThesis/Chapters/Bitcoded1.tex
changeset 581 9db2500629be
parent 580 e0f0a81f907b
child 582 3e19073e91f4
equal deleted inserted replaced
580:e0f0a81f907b 581:9db2500629be
   305 Sulzmann and Lu chose to  define a new datatype 
   305 Sulzmann and Lu chose to  define a new datatype 
   306 called \emph{annotated regular expression},
   306 called \emph{annotated regular expression},
   307 which condenses all the partial lexing information
   307 which condenses all the partial lexing information
   308 (that was originally stored in $r_i, c_{i+1}$ pairs)
   308 (that was originally stored in $r_i, c_{i+1}$ pairs)
   309 into bitcodes.
   309 into bitcodes.
       
   310 The bitcodes are then carried with the regular
       
   311 expression, and augmented or moved around
       
   312 as the lexing goes on.
       
   313 It becomes unnecessary
       
   314 to remember all the 
       
   315 intermediate expresssions, but only the most recent one
       
   316 with this bit-carrying regular expression.
   310 Annotated regular expressions 
   317 Annotated regular expressions 
   311 are defined as the following 
   318 are defined as the following 
   312 Isabelle datatype \footnote{ We use subscript notation to indicate
   319 Isabelle datatype \footnote{ We use subscript notation to indicate
   313 	that the bitcodes are auxiliary information that do not
   320 	that the bitcodes are auxiliary information that do not
   314 interfere with the structure of the regular expressions }:
   321 interfere with the structure of the regular expressions }:
   563   $\textit{bmkeps}\,(_{bs}a^*)$ & $\dn$ &
   570   $\textit{bmkeps}\,(_{bs}a^*)$ & $\dn$ &
   564      $bs \,@\, [Z]$
   571      $bs \,@\, [Z]$
   565 \end{tabular}    
   572 \end{tabular}    
   566 \end{center}    
   573 \end{center}    
   567 \noindent
   574 \noindent
   568 $\bmkeps$ retrieves the value $v$'s
   575 $\bmkeps$, just like $\mkeps$, 
   569 information in the format
   576 visits a regular expression tree respecting
   570 of bitcodes, by travelling along the
   577 the POSIX rules. The difference, however, is that
   571 path of the regular expression that corresponds to a POSIX match,
   578 it does not create values, but only bitcodes.
   572 collecting all the bitcodes, and attaching $S$ to indicate the end of star
   579 It traverses each child of the sequence regular expression
   573 iterations. \\
   580 from left to right and creates a bitcode by stitching
       
   581 together bitcodes obtained from the children expressions.
       
   582 In the case of alternative regular expressions, 
       
   583 it looks for the leftmost
       
   584 $\nullable$ branch
       
   585 to visit and ignores other siblings.
       
   586 %Whenever there is some bitcodes attached to a
       
   587 %node, it returns the bitcodes concatenated with whatever
       
   588 %child recursive calls return.
       
   589 The only time when $\bmkeps$ creates new bitcodes
       
   590 is when it completes a star's iterations by attaching a $S$ to the end of the bitcode
       
   591 list it returns.\\
   574 The bitcodes extracted by $\bmkeps$ need to be 
   592 The bitcodes extracted by $\bmkeps$ need to be 
   575 $\decode$d (with the guidance of a plain regular expression):
   593 $\decode$d (with the guidance of a plain regular expression):
   576 %\begin{definition}[Bitdecoding of Values]\mbox{}
   594 %\begin{definition}[Bitdecoding of Values]\mbox{}
   577 \begin{center}
   595 \begin{center}
   578 \begin{tabular}{@{}l@{\hspace{1mm}}c@{\hspace{1mm}}l@{}}
   596 \begin{tabular}{@{}l@{\hspace{1mm}}c@{\hspace{1mm}}l@{}}
   811   & & $\;\;\textit{else}\;\textit{None}$
   829   & & $\;\;\textit{else}\;\textit{None}$
   812 \end{tabular}
   830 \end{tabular}
   813 \end{center}
   831 \end{center}
   814 
   832 
   815 \noindent
   833 \noindent
       
   834 $\blexer$ first attaches bitcodes to a
       
   835 plain regular expression, and then do successive derivatives
       
   836 with respect to the input string $s$, and
       
   837 then test whether the result is nullable.
       
   838 If yes, then extract the bitcodes out of the nullable expression,
       
   839 and decodes the bitcodes into a lexical value.
       
   840 If there does not exists a match between $r$ and $s$ the lexer
       
   841 outputs $\None$ indicating a failed lex.\\
   816 Ausaf and Urban formally proved the correctness of the $\blexer$, namely
   842 Ausaf and Urban formally proved the correctness of the $\blexer$, namely
   817 \begin{property}
   843 \begin{property}
   818 $\blexer \;r \; s = \lexer \; r \; s$.
   844 $\blexer \;r \; s = \lexer \; r \; s$.
   819 \end{property}
   845 \end{property}
       
   846 \noindent
   820 This was claimed but not formalised in Sulzmann and Lu's work.
   847 This was claimed but not formalised in Sulzmann and Lu's work.
   821 We introduce the proof later, after we give all
   848 We introduce the proof later, after we give all
   822 the needed auxiliary functions and definitions.
   849 the needed auxiliary functions and definitions.
   823 But before this we shall first walk the reader 
   850 \subsection{An Example $\blexer$ Run}
       
   851 Before introducing the proof we shall first walk the reader 
   824 through a concrete example of our $\blexer$ calculating the right 
   852 through a concrete example of our $\blexer$ calculating the right 
   825 lexical information through bit-coded regular expressions.\\
   853 lexical information through bit-coded regular expressions.\\
   826 Consider the regular expression $(aa)^* \cdot bc$ matching the string $aabc$
   854 Consider the regular expression $(aa)^* \cdot (b+c)$ matching the string $aab$.
   827 and assume we have just read the first character $a$:
   855 We give again the bird's eye view of this particular example 
   828 \begin{center}
   856 in each stage of the algorithm:
   829 \begin{tikzpicture}[every node/.append style={draw, rounded corners, inner sep=10pt}]
   857 
   830     \node [rectangle split, rectangle split horizontal, rectangle split parts=2, rectangle split part fill={red!30,blue!20},]
   858 \tikzset{three sided/.style={
   831 	    {$\ONE \cdot a \cdot (aa)^* \cdot bc$ 
   859         draw=none,
   832 	    \nodepart{two} $[] \iff \Seq \; (\Stars \; [\Seq\; a \; ??, \;]), ??$};
   860         append after command={
   833 \end{tikzpicture}
   861             [-,shorten <= -0.5\pgflinewidth]
   834 \end{center}
   862             ([shift={(-1.5\pgflinewidth,-0.5\pgflinewidth)}]\tikzlastnode.north east)
   835 \noindent
   863         edge([shift={( 0.5\pgflinewidth,-0.5\pgflinewidth)}]\tikzlastnode.north west) 
   836 We use the red area for (annotated) regular expressions and the blue 
   864             ([shift={( 0.5\pgflinewidth,-0.5\pgflinewidth)}]\tikzlastnode.north west)
   837 area the (partial) bitcodes and (partial) values.
   865         edge([shift={( 0.5\pgflinewidth,+0.5\pgflinewidth)}]\tikzlastnode.south west)            
   838 In the injection-based lexing algorithm, we ``neglect" the red area
   866             ([shift={( 0.5\pgflinewidth,+0.5\pgflinewidth)}]\tikzlastnode.south west)
   839 by putting all the characters we have consumed and
   867         edge([shift={(-1.0\pgflinewidth,+0.5\pgflinewidth)}]\tikzlastnode.south east)
   840 intermediate regular expressions on the stack when 
   868         }
   841 we go from left to right in the derivative phase.
   869     }
   842 The red area grows till the string is exhausted.
   870 }
   843 During the injection phase, the value in the blue area
   871 
   844 is built up incrementally, while the red area shrinks.
   872 \tikzset{three sided1/.style={
   845 Before we have recovered all characters and intermediate
   873         draw=none,
   846 derivative regular expressions from the stack,
   874         append after command={
   847 what values these characters and regular expressions correspond 
   875             [-,shorten <= -0.5\pgflinewidth]
   848 to are unknown: 
   876             ([shift={(1.5\pgflinewidth,-0.5\pgflinewidth)}]\tikzlastnode.north west)
   849 \begin{center}
   877         edge([shift={(-0.5\pgflinewidth,-0.5\pgflinewidth)}]\tikzlastnode.north east) 
   850 \begin{tikzpicture}[every node/.append style={draw, rounded corners, inner sep=10pt}]
   878             ([shift={(-0.5\pgflinewidth,-0.5\pgflinewidth)}]\tikzlastnode.north east)
   851     \node [rectangle split, rectangle split horizontal, rectangle split parts=2, rectangle split part fill={white!30,blue!20},]
   879         edge([shift={(-0.5\pgflinewidth,+0.5\pgflinewidth)}]\tikzlastnode.south east)            
   852 	    {$(\ONE \cdot \ONE) \cdot (aa)^* \cdot bc $ correspond to:$???$
   880             ([shift={(-0.5\pgflinewidth,+0.5\pgflinewidth)}]\tikzlastnode.south east)
   853          \nodepart{two}  $b c$ corresponds to  $\Seq(\ldots, \Seq(\Char(b), \Char(c)))$};
   881         edge([shift={(1.0\pgflinewidth,+0.5\pgflinewidth)}]\tikzlastnode.south west)
   854 %\caption{term 1 \ref{term:1}'s matching configuration}
   882         }
   855 \end{tikzpicture}
   883     }
   856 \end{center}
   884 }
   857 \noindent
   885 
   858 However, they should be calculable,
   886 \begin{figure}[H]
   859 as characters and regular expression shapes
   887 	\begin{tikzpicture}[->, >=stealth', shorten >= 1pt, auto, thick]
   860 after taking derivative w.r.t those characters
   888 		\node [rectangle, draw] (r) at (-6, -1) {$(aa)^*(b+c)$};
   861 have already been known, therefore in our example,
   889 		\node [rectangle, draw] (a) at (-6, 4)	  {$(aa)^*(_{Z}b + _{S}c)$};
   862 we know that the value starts with two $a$s,
   890 		\path	(r)
   863 and makes up to an iteration in a Kleene star:
   891 			edge [] node {$\internalise$} (a);
   864 (We have put the injection-based lexing's partial 
   892 		\node [rectangle, draw] (a1) at (-3, 1) {$(_{Z}(\ONE \cdot a) \cdot (aa)^*) (_{Z}b + _Sc)$};
   865 result in the right part of the split rectangle
   893 		\path	(a)
   866 to contrast it with the partial valued produced
   894 			edge [] node {$\backslash a$} (a1);
   867 in a forward manner)
   895 
   868 \begin{center}
   896 		\node [rectangle, draw, three sided] (a21) at (-2.5, 4) {$(_{Z}\ONE \cdot (aa)^*)$};
   869 \begin{tikzpicture}[every node/.append style={draw, rounded corners, inner sep=10pt}]
   897 		\node [rectangle, draw, three sided1] (a22) at (-0.8, 4) {$(_{Z}b + _{S}c)$};
   870     \node [rectangle split, rectangle split horizontal, rectangle split parts=2, rectangle split part fill={red!30,blue!20},]
   898 		\path	(a1)
   871 	    {$\stackrel{Bitcoded}{\longrightarrow} \Seq(\Stars[\Char(a), \Char(a)], ???)$
   899 			edge [] node {$\backslash a$} (a21);
   872 	\nodepart{two} $\Seq(\ldots, \Seq(\Char(b), \Char(c)))$  $\stackrel{Inj}{\longleftarrow}$};
   900 		\node [rectangle, draw] (a3) at (0.5, 2) {$_{ZS}(_{Z}\ONE + \ZERO)$};
   873 %\caption{term 1 \ref{term:1}'s matching configuration}
   901 		\path	(a22)
   874 \end{tikzpicture}
   902 			edge [] node {$\backslash b$} (a3);
   875 \end{center}
   903 		\path	(a21)
   876 \noindent
   904 			edge [dashed, bend right] node {} (a3);
   877 If we do this kind of "attachment"
   905 		\node [rectangle, draw] (bs) at (2, 4) {$ZSZ$};
   878 and each time augment the attached partially
   906 		\path	(a3)
   879 constructed value when taking off a 
   907 			edge [below] node {$\bmkeps$} (bs);
   880 character:
   908 		\node [rectangle, draw] (v) at (3, 0) {$\Seq \; (\Stars\; [\Seq \; a \; a]) \; (\Left \; b)$};
   881 \begin{center}
   909 		\path 	(bs)
   882 \begin{tikzpicture}[every node/.append style={draw, rounded corners, inner sep=10pt}]
   910 			edge [] node {$\decode$} (v);
   883 	\node [rectangle split, rectangle split horizontal, rectangle split parts=2, rectangle split part fill={red!30,blue!20},] (spPoint)
   911 
   884         {$\Seq(\Stars[\Char(a), \Char(a)], \ldots)$
   912 
   885          \nodepart{two} Remaining: $b c$};
   913 	\end{tikzpicture}
   886 \end{tikzpicture}\\
   914 	\caption{$\blexer \;\;\;\; (aa)^*(b+c) \;\;\;\; aab$}
   887 $\downarrow$\\
   915 \end{figure}
   888 \begin{tikzpicture}[every node/.append style={draw, rounded corners, inner sep=10pt}]
   916 \noindent
   889     \node [rectangle split, rectangle split horizontal, rectangle split parts=2, rectangle split part fill={red!30,blue!20},]
   917 The one dashed arrow indicates that $_Z(\ONE \cdot (aa)^*)$
   890         {$\Seq(\Stars[\Char(a), \Char(a)], \Seq(\Char(b), \ldots))$
   918 turned into $ZS$ after derivative w.r.t $b$
   891          \nodepart{two} Remaining: $c$};
   919 is taken, which calls $\bmkeps$ on the nuallable $_Z\ONE\cdot (aa)^*$
   892 \end{tikzpicture}\\
   920 before processing $_Zb+_Sc$.\\
   893 $\downarrow$\\
   921 The annotated regular expressions
   894 \begin{tikzpicture}[every node/.append style={draw, rounded corners, inner sep=10pt}]
   922 would look too cumbersome if we explicitly indicate all the
   895     \node [rectangle split, rectangle split horizontal, rectangle split parts=2, rectangle split part fill={red!30,blue!20},]
   923 locations where bitcodes are attached.
   896         {$\Seq(\Stars[\Char(a), \Char(a)], \Seq(\Char(b), \Char(c)))$
   924 For example,
   897          \nodepart{two} EOF};
   925 $(aa)^*\cdot (b+c)$ would 
   898 \end{tikzpicture}
   926 look like $_{[]}(_{[]}(_{[]}a \cdot _{[]}a)^*\cdot _{[]}(_{[]}b+_{[]}c))$ 
   899 \end{center}
   927 after 
   900 \noindent
   928 internalise.
   901 In the end we could recover the value without a backward phase.
   929 Therefore for readability we omit bitcodes if they are empty. 
   902 But (partial) values are a bit clumsy to stick together with a regular expression, so 
   930 This applies to all example annotated 
   903 we instead use bit-codes to encode them.
   931 regular expressions in this thesis.\\
   904 
   932 %and assume we have just read the first character $a$:
   905 Bits and bitcodes (lists of bits) are defined as:
   933 %\begin{center}
   906 \begin{center}
   934 %\begin{tikzpicture}[every node/.append style={draw, rounded corners, inner sep=10pt}]
   907 		$b ::=   S \mid  Z \qquad
   935 %    \node [rectangle split, rectangle split horizontal, rectangle split parts=2, rectangle split part fill={red!30,blue!20},]
   908 bs ::= [] \mid b::bs    
   936 %	    {$(_{[Z]}(\ONE \cdot a) \cdot (aa)^* )\cdot bc$ 
   909 $
   937 %	    \nodepart{two} $[Z] \iff \Seq \; (\Stars \; [\Seq\; a \; ??, \;??]) \; ??$};
   910 \end{center}
   938 %\end{tikzpicture}
   911 
   939 %\end{center}
   912 \noindent
   940 %\noindent
   913 
   941 %We use the red area for (annotated) regular expressions and the blue 
       
   942 %area the (partially calculated) bitcodes 
       
   943 %and its corresponding (partial) values.
       
   944 %The first derivative 
       
   945 %generates a $Z$ bitcode to indicate
       
   946 %a new iteration has been started.
       
   947 %This bitcode is attached to the front of
       
   948 %the unrolled star iteration $\ONE\cdot a$
       
   949 %for later decoding.
       
   950 %\begin{center}
       
   951 %\begin{tikzpicture}[]
       
   952 %    \node [rectangle split, rectangle split horizontal, 
       
   953 %	    rectangle split parts=2, rectangle split part fill={red!30,blue!20}, draw, rounded corners, inner sep=10pt]
       
   954 %	    (der2) at (0,0)
       
   955 %	    {$(_{[Z]}(\ONE \cdot \ONE) \cdot (aa)^*) \cdot bc $ 
       
   956 %	    \nodepart{two} $[Z] \iff \Seq \; (\Stars \; [\Seq \; a\;a, ??]) \; ??$};
       
   957 %
       
   958 %\node [draw=none, minimum size = 0.1, ] (r) at (-7, 0) {$a_1$};
       
   959 %\path
       
   960 %	(r)
       
   961 %	edge [->, >=stealth',shorten >=1pt, above] node {$\backslash a$} (der2);
       
   962 %%\caption{term 1 \ref{term:1}'s matching configuration}
       
   963 %\end{tikzpicture}
       
   964 %\end{center}
       
   965 %\noindent
       
   966 %After we take derivative with respect to 
       
   967 %second input character $a$, the annotated
       
   968 %regular expression has the second $a$ chopped off.
       
   969 %The second derivative does not involve any 
       
   970 %new bitcodes being generated, because
       
   971 %there are no new iterations or bifurcations
       
   972 %in the regular expression requiring any $S$ or $Z$ marker
       
   973 %to indicate choices.
       
   974 %\begin{center}
       
   975 %\begin{tikzpicture}[every node/.append style={draw, rounded corners, inner sep=10pt}]
       
   976 %    \node [rectangle split, rectangle split horizontal, rectangle split parts=2, rectangle split part fill={red!30,blue!20},]
       
   977 %	    {$(_{[Z]}(\ONE \cdot \ONE) \cdot (aa)^*) \cdot (\ONE \cdot c) $ 
       
   978 %	    \nodepart{two} $[Z] \iff \Seq \; (\Stars \; [\Seq \; a\;a, ??]) \; ??$};
       
   979 %%\caption{term 1 \ref{term:1}'s matching configuration}
       
   980 %\end{tikzpicture}
       
   981 %\end{center}
       
   982 %\noindent
       
   983 %
       
   984 %
       
   985 %\begin{center}
       
   986 %\begin{tikzpicture}[every node/.append style={draw, rounded corners, inner sep=10pt}]
       
   987 %    \node [rectangle split, rectangle split horizontal, rectangle split parts=2, rectangle split part fill={red!30,blue!20},]
       
   988 %	    {$\stackrel{Bitcoded}{\longrightarrow} \Seq(\Stars[\Char(a), \Char(a)], ???)$
       
   989 %	\nodepart{two} $\Seq(\ldots, \Seq(\Char(b), \Char(c)))$  $\stackrel{Inj}{\longleftarrow}$};
       
   990 %%\caption{term 1 \ref{term:1}'s matching configuration}
       
   991 %\end{tikzpicture}
       
   992 %\end{center}
       
   993 %\noindent
       
   994 %If we do this kind of "attachment"
       
   995 %and each time augment the attached partially
       
   996 %constructed value when taking off a 
       
   997 %character:
       
   998 %\begin{center}
       
   999 %\begin{tikzpicture}[every node/.append style={draw, rounded corners, inner sep=10pt}]
       
  1000 %	\node [rectangle split, rectangle split horizontal, rectangle split parts=2, rectangle split part fill={red!30,blue!20},] (spPoint)
       
  1001 %        {$\Seq(\Stars[\Char(a), \Char(a)], \ldots)$
       
  1002 %         \nodepart{two} Remaining: $b c$};
       
  1003 %\end{tikzpicture}\\
       
  1004 %$\downarrow$\\
       
  1005 %\begin{tikzpicture}[every node/.append style={draw, rounded corners, inner sep=10pt}]
       
  1006 %    \node [rectangle split, rectangle split horizontal, rectangle split parts=2, rectangle split part fill={red!30,blue!20},]
       
  1007 %        {$\Seq(\Stars[\Char(a), \Char(a)], \Seq(\Char(b), \ldots))$
       
  1008 %         \nodepart{two} Remaining: $c$};
       
  1009 %\end{tikzpicture}\\
       
  1010 %$\downarrow$\\
       
  1011 %\begin{tikzpicture}[every node/.append style={draw, rounded corners, inner sep=10pt}]
       
  1012 %    \node [rectangle split, rectangle split horizontal, rectangle split parts=2, rectangle split part fill={red!30,blue!20},]
       
  1013 %        {$\Seq(\Stars[\Char(a), \Char(a)], \Seq(\Char(b), \Char(c)))$
       
  1014 %         \nodepart{two} EOF};
       
  1015 %\end{tikzpicture}
       
  1016 %\end{center}
       
  1017 \noindent
       
  1018 In the next section we introduce the correctness proof
       
  1019 found by Ausaf and Urban
       
  1020 of the bitcoded lexer.
   914 %-----------------------------------
  1021 %-----------------------------------
   915 %	SUBSECTION 1
  1022 %	SUBSECTION 1
   916 %-----------------------------------
  1023 %-----------------------------------
   917 \section{Specifications of Some Helper Functions}
  1024 \section{Correctness Proof of $\textit{Blexer}$}
       
  1025 Why is $\blexer$ correct?
       
  1026 In other words, why is it the case that 
       
  1027 $\blexer$ outputs the same value as $\lexer$?
       
  1028 Intuitively,
       
  1029 that is because 
       
  1030 \begin{itemize}
       
  1031 	\item
       
  1032 		$\blexer$ follows an almost identical
       
  1033 		path as that of $\lexer$, 
       
  1034 		for example $r_1, r_2, \ldots$ and $a_1, a_2, \ldots$ being produced,
       
  1035 		which are the same up to the application of $\erase$.
       
  1036 	\item
       
  1037 		The bit-encodings work properly,
       
  1038 		allowing the possibility of 
       
  1039 		pulling out the right lexical value from an
       
  1040 		annotated regular expression at 
       
  1041 		any stage of the algorithm.
       
  1042 \end{itemize}
       
  1043 We will elaborate on this, with the help of
       
  1044 some helper functions such as $\retrieve$ and
       
  1045 $\flex$.
       
  1046 \subsection{Specifications of Some Helper Functions}
   918 The functions we introduce will give a more detailed glimpse into 
  1047 The functions we introduce will give a more detailed glimpse into 
   919 the lexing process, which might not be possible
  1048 the lexing process, which is not be possible
   920 using $\lexer$ or $\blexer$ themselves.
  1049 using $\lexer$ or $\blexer$ alone.
   921 The first function we shall look at is $\retrieve$.
  1050 \subsubsection{$\textit{Retrieve}$}
   922 \subsection{$\textit{Retrieve}$}
  1051 The first function we shall introduce is $\retrieve$.
   923 Our bit-coded lexer "retrieve"s the bitcodes using $\bmkeps$
  1052 Sulzmann and Lu gave its definition, and
   924 after we finished doing all the derivatives:
  1053 Ausaf and Urban found
       
  1054 its usage in mechanised proofs.
       
  1055 Our bit-coded lexer ``retrieve''s the bitcodes using $\bmkeps$,
       
  1056 after all the derivatives has been taken:
   925 \begin{center}
  1057 \begin{center}
   926 \begin{tabular}{lcl}
  1058 \begin{tabular}{lcl}
   927 	& & $\ldots$\\
  1059 	& & $\ldots$\\
   928   & & $\;\;\textit{if}\; \textit{bnullable}(a)$\\
  1060   & & $\;\;\textit{if}\; \textit{bnullable}(a)$\\
   929   & & $\;\;\textit{then}\;\textit{decode}\,(\textit{bmkeps}\,a)\,r$\\
  1061   & & $\;\;\textit{then}\;\textit{decode}\,(\textit{bmkeps}\,a)\,r$\\
   930   & & $\ldots$
  1062   & & $\ldots$
   931 \end{tabular}
  1063 \end{tabular}
   932 \end{center}
  1064 \end{center}
   933 \noindent
  1065 \noindent
   934 Recall that $\bmkeps$ looks for the leftmost branch of an alternative
  1066 $\bmkeps$ retrieves the value $v$'s
   935 and completes a star's iterations by attaching a $Z$ at the end of the bitcodes
  1067 information in the format
   936 extracted. It "retrieves" a sequence by visiting both children and then stitch together 
  1068 of bitcodes, by travelling along the
   937 two bitcodes using concatenation. After the entire tree structure of the regular 
  1069 path of the regular expression that corresponds to a POSIX match,
   938 expression has been traversed using the above manner, we get a bitcode encoding the 
  1070 collecting all the bitcodes.
   939 lexing result.
       
   940 We know that this "retrieved" bitcode leads to the correct value after decoding,
  1071 We know that this "retrieved" bitcode leads to the correct value after decoding,
   941 which is $v_0$ in the bird's eye view of the injection-based lexing diagram.
  1072 which is $v_0$ in the injection-based lexing diagram.
   942 Now assume we keep every other data structure in the diagram \ref{InjFigure},
  1073 As an observation we pointed at the beginning of this section,
   943 and only replace all the plain regular expression by their annotated counterparts,
  1074 the annotated regular expressions generated in successive derivative steps
   944 computed during a $\blexer$ run.
  1075 in $\blexer$ after $\erase$ has the same structure 
   945 Then we obtain a diagram for the annotated regular expression derivatives and
  1076 as those appeared in $\lexer$.
   946 their corresponding values, though the values are never calculated in $\blexer$.
  1077 We redraw the diagram below to visualise this fact.
   947 We have that $a_n$ contains all the lexing result information.
  1078 We pretend that all the values are 
       
  1079 ready despite they are only calculated in $\lexer$.
       
  1080 In general we have $\vdash v_i:(a_i)_\downarrow$.
   948 \vspace{20mm}
  1081 \vspace{20mm}
   949 \begin{center}%\label{graph:injLexer}
  1082 \begin{figure}[H]%\label{graph:injLexer}
       
  1083 \begin{center}
   950 \begin{tikzcd}[
  1084 \begin{tikzcd}[
   951 	every matrix/.append style = {name=p},
  1085 	every matrix/.append style = {name=p},
   952 	remember picture, overlay,
  1086 	remember picture, overlay,
   953 	]
  1087 	]
   954 	a_0 \arrow[r, "\backslash c_0"]  \arrow[d] & a_1 \arrow[r, "\backslash c_1"] \arrow[d] & a_2 \arrow[r, dashed] \arrow[d] & a_n \arrow[d] \\
  1088 	a_0 \arrow[r, "\backslash c_0"]  \arrow[d] & a_1 \arrow[r, "\backslash c_1"] \arrow[d] & a_2 \arrow[r, dashed] \arrow[d] & a_n \arrow[d] \\
   955 v_0           & v_1 \arrow[l,"inj_{r_0} c_0"]                & v_2 \arrow[l, "inj_{r_1} c_1"]              & v_n \arrow[l, dashed]         
  1089 v_0           & v_1 \arrow[l,"inj_{r_0} c_0"]                & v_2 \arrow[l, "inj_{r_1} c_1"]              & v_n \arrow[l, dashed]         
   956 \end{tikzcd}
  1090 \end{tikzcd}
       
  1091 \end{center}
   957 \begin{tikzpicture}[
  1092 \begin{tikzpicture}[
   958 	remember picture, overlay,
  1093 	remember picture, overlay,
   959 E/.style = {ellipse, draw=blue, dashed,
  1094 E/.style = {ellipse, draw=blue, dashed,
   960             inner xsep=4mm,inner ysep=-4mm, rotate=90, fit=#1}
  1095             inner xsep=4mm,inner ysep=-4mm, rotate=90, fit=#1}
   961                         ]
  1096                         ]
   962 \node[E = (p-1-1) (p-2-1)] {};
  1097 \node[E = (p-1-1) (p-2-1)] {};
   963 \node[E = (p-1-4) (p-2-4)] {};
  1098 \node[E = (p-1-4) (p-2-4)] {};
       
  1099 \node[E = (p-1-2) (p-2-2)] {};
       
  1100 \node[E = (p-1-3) (p-2-3)] {};
   964 \end{tikzpicture}
  1101 \end{tikzpicture}
   965 
  1102 
   966 \end{center}
  1103 \end{figure}
   967 \vspace{20mm}
  1104 \vspace{20mm}
   968 \noindent
  1105 \noindent
   969 On the other hand, $v_0$ also encodes the correct lexing result, as we have proven for $\lexer$.
  1106 We encircle in the diagram  all the pairs $v_i, a_i$ to show that these values
   970 Encircled in the diagram  are the two pairs $v_0, a_0$ and $v_n, a_n$, which both 
  1107 and regular expressions correspond to each other.
   971 encode the correct lexical result. Though for the leftmost pair, we have
  1108 For the leftmost pair, we have that the 
   972 the information condensed in $v_0$ the value part, whereas for the rightmost pair,
  1109 lexical information is condensed in 
   973 the information is concentrated on $a_n$.
  1110 $v_0$--the value part, whereas for the rightmost pair,
   974 We know that in the intermediate steps the pairs $v_i, a_i$, must in some way encode the complete
  1111 the lexing result is in the bitcodes of $a_n$.
   975 lexing information as well. Therefore, we need a unified approach to extract such lexing result
  1112 $\bmkeps$ is able to extract from $a_n$ the result
   976 from a value $v_i$ and its annotated regular expression $a_i$. 
  1113 by looking for nullable parts of the regular expression,
   977 And the function $f$ must satisfy these requirements:
  1114 however for regular expressions $a_i$ in general
       
  1115 they might not necessarily be nullable and therefore
       
  1116 needs some ``help'' finding the POSIX bit-encoding.
       
  1117 The most straightforward ``help'' comes from $a_i$'s corresponding
       
  1118 value $v_i$, and this suggests a function $f$ satisfying the
       
  1119 following properties:
   978 \begin{itemize}
  1120 \begin{itemize}
   979 	\item
  1121 	\item
   980 		$f \; a_i\;v_i = f \; a_n \; v_n = \decode \; (\bmkeps \; a_n) \; (\erase \; a_0)$
  1122 		$f \; a_i\;v_i = f \; a_n \; v_n = \bmkeps \; a_n$%\decode \; (\bmkeps \; a_n) \; (\erase \; a_0)$
   981 	\item
  1123 	\item
   982 		$f \; a_i\;v_i = f \; a_0 \; v_0 = v_0 = \decode \;(\code \; v_0) \; (\erase \; a_0)$
  1124 		$f \; a_i\;v_i = f \; a_0 \; v_0 = \decode \;(\code \; v_0) \; (\erase \; a_0)$
   983 \end{itemize}
  1125 \end{itemize}
   984 \noindent
  1126 \noindent
   985 If we factor out the common part $\decode \; \_ \; (\erase \; a_0)$,
  1127 If we factor out the common part $\decode \; \_ \; (\erase \; a_0)$,
   986 The core of the function $f$ is something that produces the bitcodes
  1128 The core of the function $f$ is something that produces the bitcodes
   987 $\code \; v_0$.
  1129 $\code \; v_0$.