811 & & $\;\;\textit{else}\;\textit{None}$ |
829 & & $\;\;\textit{else}\;\textit{None}$ |
812 \end{tabular} |
830 \end{tabular} |
813 \end{center} |
831 \end{center} |
814 |
832 |
815 \noindent |
833 \noindent |
|
834 $\blexer$ first attaches bitcodes to a |
|
835 plain regular expression, and then do successive derivatives |
|
836 with respect to the input string $s$, and |
|
837 then test whether the result is nullable. |
|
838 If yes, then extract the bitcodes out of the nullable expression, |
|
839 and decodes the bitcodes into a lexical value. |
|
840 If there does not exists a match between $r$ and $s$ the lexer |
|
841 outputs $\None$ indicating a failed lex.\\ |
816 Ausaf and Urban formally proved the correctness of the $\blexer$, namely |
842 Ausaf and Urban formally proved the correctness of the $\blexer$, namely |
817 \begin{property} |
843 \begin{property} |
818 $\blexer \;r \; s = \lexer \; r \; s$. |
844 $\blexer \;r \; s = \lexer \; r \; s$. |
819 \end{property} |
845 \end{property} |
|
846 \noindent |
820 This was claimed but not formalised in Sulzmann and Lu's work. |
847 This was claimed but not formalised in Sulzmann and Lu's work. |
821 We introduce the proof later, after we give all |
848 We introduce the proof later, after we give all |
822 the needed auxiliary functions and definitions. |
849 the needed auxiliary functions and definitions. |
823 But before this we shall first walk the reader |
850 \subsection{An Example $\blexer$ Run} |
|
851 Before introducing the proof we shall first walk the reader |
824 through a concrete example of our $\blexer$ calculating the right |
852 through a concrete example of our $\blexer$ calculating the right |
825 lexical information through bit-coded regular expressions.\\ |
853 lexical information through bit-coded regular expressions.\\ |
826 Consider the regular expression $(aa)^* \cdot bc$ matching the string $aabc$ |
854 Consider the regular expression $(aa)^* \cdot (b+c)$ matching the string $aab$. |
827 and assume we have just read the first character $a$: |
855 We give again the bird's eye view of this particular example |
828 \begin{center} |
856 in each stage of the algorithm: |
829 \begin{tikzpicture}[every node/.append style={draw, rounded corners, inner sep=10pt}] |
857 |
830 \node [rectangle split, rectangle split horizontal, rectangle split parts=2, rectangle split part fill={red!30,blue!20},] |
858 \tikzset{three sided/.style={ |
831 {$\ONE \cdot a \cdot (aa)^* \cdot bc$ |
859 draw=none, |
832 \nodepart{two} $[] \iff \Seq \; (\Stars \; [\Seq\; a \; ??, \;]), ??$}; |
860 append after command={ |
833 \end{tikzpicture} |
861 [-,shorten <= -0.5\pgflinewidth] |
834 \end{center} |
862 ([shift={(-1.5\pgflinewidth,-0.5\pgflinewidth)}]\tikzlastnode.north east) |
835 \noindent |
863 edge([shift={( 0.5\pgflinewidth,-0.5\pgflinewidth)}]\tikzlastnode.north west) |
836 We use the red area for (annotated) regular expressions and the blue |
864 ([shift={( 0.5\pgflinewidth,-0.5\pgflinewidth)}]\tikzlastnode.north west) |
837 area the (partial) bitcodes and (partial) values. |
865 edge([shift={( 0.5\pgflinewidth,+0.5\pgflinewidth)}]\tikzlastnode.south west) |
838 In the injection-based lexing algorithm, we ``neglect" the red area |
866 ([shift={( 0.5\pgflinewidth,+0.5\pgflinewidth)}]\tikzlastnode.south west) |
839 by putting all the characters we have consumed and |
867 edge([shift={(-1.0\pgflinewidth,+0.5\pgflinewidth)}]\tikzlastnode.south east) |
840 intermediate regular expressions on the stack when |
868 } |
841 we go from left to right in the derivative phase. |
869 } |
842 The red area grows till the string is exhausted. |
870 } |
843 During the injection phase, the value in the blue area |
871 |
844 is built up incrementally, while the red area shrinks. |
872 \tikzset{three sided1/.style={ |
845 Before we have recovered all characters and intermediate |
873 draw=none, |
846 derivative regular expressions from the stack, |
874 append after command={ |
847 what values these characters and regular expressions correspond |
875 [-,shorten <= -0.5\pgflinewidth] |
848 to are unknown: |
876 ([shift={(1.5\pgflinewidth,-0.5\pgflinewidth)}]\tikzlastnode.north west) |
849 \begin{center} |
877 edge([shift={(-0.5\pgflinewidth,-0.5\pgflinewidth)}]\tikzlastnode.north east) |
850 \begin{tikzpicture}[every node/.append style={draw, rounded corners, inner sep=10pt}] |
878 ([shift={(-0.5\pgflinewidth,-0.5\pgflinewidth)}]\tikzlastnode.north east) |
851 \node [rectangle split, rectangle split horizontal, rectangle split parts=2, rectangle split part fill={white!30,blue!20},] |
879 edge([shift={(-0.5\pgflinewidth,+0.5\pgflinewidth)}]\tikzlastnode.south east) |
852 {$(\ONE \cdot \ONE) \cdot (aa)^* \cdot bc $ correspond to:$???$ |
880 ([shift={(-0.5\pgflinewidth,+0.5\pgflinewidth)}]\tikzlastnode.south east) |
853 \nodepart{two} $b c$ corresponds to $\Seq(\ldots, \Seq(\Char(b), \Char(c)))$}; |
881 edge([shift={(1.0\pgflinewidth,+0.5\pgflinewidth)}]\tikzlastnode.south west) |
854 %\caption{term 1 \ref{term:1}'s matching configuration} |
882 } |
855 \end{tikzpicture} |
883 } |
856 \end{center} |
884 } |
857 \noindent |
885 |
858 However, they should be calculable, |
886 \begin{figure}[H] |
859 as characters and regular expression shapes |
887 \begin{tikzpicture}[->, >=stealth', shorten >= 1pt, auto, thick] |
860 after taking derivative w.r.t those characters |
888 \node [rectangle, draw] (r) at (-6, -1) {$(aa)^*(b+c)$}; |
861 have already been known, therefore in our example, |
889 \node [rectangle, draw] (a) at (-6, 4) {$(aa)^*(_{Z}b + _{S}c)$}; |
862 we know that the value starts with two $a$s, |
890 \path (r) |
863 and makes up to an iteration in a Kleene star: |
891 edge [] node {$\internalise$} (a); |
864 (We have put the injection-based lexing's partial |
892 \node [rectangle, draw] (a1) at (-3, 1) {$(_{Z}(\ONE \cdot a) \cdot (aa)^*) (_{Z}b + _Sc)$}; |
865 result in the right part of the split rectangle |
893 \path (a) |
866 to contrast it with the partial valued produced |
894 edge [] node {$\backslash a$} (a1); |
867 in a forward manner) |
895 |
868 \begin{center} |
896 \node [rectangle, draw, three sided] (a21) at (-2.5, 4) {$(_{Z}\ONE \cdot (aa)^*)$}; |
869 \begin{tikzpicture}[every node/.append style={draw, rounded corners, inner sep=10pt}] |
897 \node [rectangle, draw, three sided1] (a22) at (-0.8, 4) {$(_{Z}b + _{S}c)$}; |
870 \node [rectangle split, rectangle split horizontal, rectangle split parts=2, rectangle split part fill={red!30,blue!20},] |
898 \path (a1) |
871 {$\stackrel{Bitcoded}{\longrightarrow} \Seq(\Stars[\Char(a), \Char(a)], ???)$ |
899 edge [] node {$\backslash a$} (a21); |
872 \nodepart{two} $\Seq(\ldots, \Seq(\Char(b), \Char(c)))$ $\stackrel{Inj}{\longleftarrow}$}; |
900 \node [rectangle, draw] (a3) at (0.5, 2) {$_{ZS}(_{Z}\ONE + \ZERO)$}; |
873 %\caption{term 1 \ref{term:1}'s matching configuration} |
901 \path (a22) |
874 \end{tikzpicture} |
902 edge [] node {$\backslash b$} (a3); |
875 \end{center} |
903 \path (a21) |
876 \noindent |
904 edge [dashed, bend right] node {} (a3); |
877 If we do this kind of "attachment" |
905 \node [rectangle, draw] (bs) at (2, 4) {$ZSZ$}; |
878 and each time augment the attached partially |
906 \path (a3) |
879 constructed value when taking off a |
907 edge [below] node {$\bmkeps$} (bs); |
880 character: |
908 \node [rectangle, draw] (v) at (3, 0) {$\Seq \; (\Stars\; [\Seq \; a \; a]) \; (\Left \; b)$}; |
881 \begin{center} |
909 \path (bs) |
882 \begin{tikzpicture}[every node/.append style={draw, rounded corners, inner sep=10pt}] |
910 edge [] node {$\decode$} (v); |
883 \node [rectangle split, rectangle split horizontal, rectangle split parts=2, rectangle split part fill={red!30,blue!20},] (spPoint) |
911 |
884 {$\Seq(\Stars[\Char(a), \Char(a)], \ldots)$ |
912 |
885 \nodepart{two} Remaining: $b c$}; |
913 \end{tikzpicture} |
886 \end{tikzpicture}\\ |
914 \caption{$\blexer \;\;\;\; (aa)^*(b+c) \;\;\;\; aab$} |
887 $\downarrow$\\ |
915 \end{figure} |
888 \begin{tikzpicture}[every node/.append style={draw, rounded corners, inner sep=10pt}] |
916 \noindent |
889 \node [rectangle split, rectangle split horizontal, rectangle split parts=2, rectangle split part fill={red!30,blue!20},] |
917 The one dashed arrow indicates that $_Z(\ONE \cdot (aa)^*)$ |
890 {$\Seq(\Stars[\Char(a), \Char(a)], \Seq(\Char(b), \ldots))$ |
918 turned into $ZS$ after derivative w.r.t $b$ |
891 \nodepart{two} Remaining: $c$}; |
919 is taken, which calls $\bmkeps$ on the nuallable $_Z\ONE\cdot (aa)^*$ |
892 \end{tikzpicture}\\ |
920 before processing $_Zb+_Sc$.\\ |
893 $\downarrow$\\ |
921 The annotated regular expressions |
894 \begin{tikzpicture}[every node/.append style={draw, rounded corners, inner sep=10pt}] |
922 would look too cumbersome if we explicitly indicate all the |
895 \node [rectangle split, rectangle split horizontal, rectangle split parts=2, rectangle split part fill={red!30,blue!20},] |
923 locations where bitcodes are attached. |
896 {$\Seq(\Stars[\Char(a), \Char(a)], \Seq(\Char(b), \Char(c)))$ |
924 For example, |
897 \nodepart{two} EOF}; |
925 $(aa)^*\cdot (b+c)$ would |
898 \end{tikzpicture} |
926 look like $_{[]}(_{[]}(_{[]}a \cdot _{[]}a)^*\cdot _{[]}(_{[]}b+_{[]}c))$ |
899 \end{center} |
927 after |
900 \noindent |
928 internalise. |
901 In the end we could recover the value without a backward phase. |
929 Therefore for readability we omit bitcodes if they are empty. |
902 But (partial) values are a bit clumsy to stick together with a regular expression, so |
930 This applies to all example annotated |
903 we instead use bit-codes to encode them. |
931 regular expressions in this thesis.\\ |
904 |
932 %and assume we have just read the first character $a$: |
905 Bits and bitcodes (lists of bits) are defined as: |
933 %\begin{center} |
906 \begin{center} |
934 %\begin{tikzpicture}[every node/.append style={draw, rounded corners, inner sep=10pt}] |
907 $b ::= S \mid Z \qquad |
935 % \node [rectangle split, rectangle split horizontal, rectangle split parts=2, rectangle split part fill={red!30,blue!20},] |
908 bs ::= [] \mid b::bs |
936 % {$(_{[Z]}(\ONE \cdot a) \cdot (aa)^* )\cdot bc$ |
909 $ |
937 % \nodepart{two} $[Z] \iff \Seq \; (\Stars \; [\Seq\; a \; ??, \;??]) \; ??$}; |
910 \end{center} |
938 %\end{tikzpicture} |
911 |
939 %\end{center} |
912 \noindent |
940 %\noindent |
913 |
941 %We use the red area for (annotated) regular expressions and the blue |
|
942 %area the (partially calculated) bitcodes |
|
943 %and its corresponding (partial) values. |
|
944 %The first derivative |
|
945 %generates a $Z$ bitcode to indicate |
|
946 %a new iteration has been started. |
|
947 %This bitcode is attached to the front of |
|
948 %the unrolled star iteration $\ONE\cdot a$ |
|
949 %for later decoding. |
|
950 %\begin{center} |
|
951 %\begin{tikzpicture}[] |
|
952 % \node [rectangle split, rectangle split horizontal, |
|
953 % rectangle split parts=2, rectangle split part fill={red!30,blue!20}, draw, rounded corners, inner sep=10pt] |
|
954 % (der2) at (0,0) |
|
955 % {$(_{[Z]}(\ONE \cdot \ONE) \cdot (aa)^*) \cdot bc $ |
|
956 % \nodepart{two} $[Z] \iff \Seq \; (\Stars \; [\Seq \; a\;a, ??]) \; ??$}; |
|
957 % |
|
958 %\node [draw=none, minimum size = 0.1, ] (r) at (-7, 0) {$a_1$}; |
|
959 %\path |
|
960 % (r) |
|
961 % edge [->, >=stealth',shorten >=1pt, above] node {$\backslash a$} (der2); |
|
962 %%\caption{term 1 \ref{term:1}'s matching configuration} |
|
963 %\end{tikzpicture} |
|
964 %\end{center} |
|
965 %\noindent |
|
966 %After we take derivative with respect to |
|
967 %second input character $a$, the annotated |
|
968 %regular expression has the second $a$ chopped off. |
|
969 %The second derivative does not involve any |
|
970 %new bitcodes being generated, because |
|
971 %there are no new iterations or bifurcations |
|
972 %in the regular expression requiring any $S$ or $Z$ marker |
|
973 %to indicate choices. |
|
974 %\begin{center} |
|
975 %\begin{tikzpicture}[every node/.append style={draw, rounded corners, inner sep=10pt}] |
|
976 % \node [rectangle split, rectangle split horizontal, rectangle split parts=2, rectangle split part fill={red!30,blue!20},] |
|
977 % {$(_{[Z]}(\ONE \cdot \ONE) \cdot (aa)^*) \cdot (\ONE \cdot c) $ |
|
978 % \nodepart{two} $[Z] \iff \Seq \; (\Stars \; [\Seq \; a\;a, ??]) \; ??$}; |
|
979 %%\caption{term 1 \ref{term:1}'s matching configuration} |
|
980 %\end{tikzpicture} |
|
981 %\end{center} |
|
982 %\noindent |
|
983 % |
|
984 % |
|
985 %\begin{center} |
|
986 %\begin{tikzpicture}[every node/.append style={draw, rounded corners, inner sep=10pt}] |
|
987 % \node [rectangle split, rectangle split horizontal, rectangle split parts=2, rectangle split part fill={red!30,blue!20},] |
|
988 % {$\stackrel{Bitcoded}{\longrightarrow} \Seq(\Stars[\Char(a), \Char(a)], ???)$ |
|
989 % \nodepart{two} $\Seq(\ldots, \Seq(\Char(b), \Char(c)))$ $\stackrel{Inj}{\longleftarrow}$}; |
|
990 %%\caption{term 1 \ref{term:1}'s matching configuration} |
|
991 %\end{tikzpicture} |
|
992 %\end{center} |
|
993 %\noindent |
|
994 %If we do this kind of "attachment" |
|
995 %and each time augment the attached partially |
|
996 %constructed value when taking off a |
|
997 %character: |
|
998 %\begin{center} |
|
999 %\begin{tikzpicture}[every node/.append style={draw, rounded corners, inner sep=10pt}] |
|
1000 % \node [rectangle split, rectangle split horizontal, rectangle split parts=2, rectangle split part fill={red!30,blue!20},] (spPoint) |
|
1001 % {$\Seq(\Stars[\Char(a), \Char(a)], \ldots)$ |
|
1002 % \nodepart{two} Remaining: $b c$}; |
|
1003 %\end{tikzpicture}\\ |
|
1004 %$\downarrow$\\ |
|
1005 %\begin{tikzpicture}[every node/.append style={draw, rounded corners, inner sep=10pt}] |
|
1006 % \node [rectangle split, rectangle split horizontal, rectangle split parts=2, rectangle split part fill={red!30,blue!20},] |
|
1007 % {$\Seq(\Stars[\Char(a), \Char(a)], \Seq(\Char(b), \ldots))$ |
|
1008 % \nodepart{two} Remaining: $c$}; |
|
1009 %\end{tikzpicture}\\ |
|
1010 %$\downarrow$\\ |
|
1011 %\begin{tikzpicture}[every node/.append style={draw, rounded corners, inner sep=10pt}] |
|
1012 % \node [rectangle split, rectangle split horizontal, rectangle split parts=2, rectangle split part fill={red!30,blue!20},] |
|
1013 % {$\Seq(\Stars[\Char(a), \Char(a)], \Seq(\Char(b), \Char(c)))$ |
|
1014 % \nodepart{two} EOF}; |
|
1015 %\end{tikzpicture} |
|
1016 %\end{center} |
|
1017 \noindent |
|
1018 In the next section we introduce the correctness proof |
|
1019 found by Ausaf and Urban |
|
1020 of the bitcoded lexer. |
914 %----------------------------------- |
1021 %----------------------------------- |
915 % SUBSECTION 1 |
1022 % SUBSECTION 1 |
916 %----------------------------------- |
1023 %----------------------------------- |
917 \section{Specifications of Some Helper Functions} |
1024 \section{Correctness Proof of $\textit{Blexer}$} |
|
1025 Why is $\blexer$ correct? |
|
1026 In other words, why is it the case that |
|
1027 $\blexer$ outputs the same value as $\lexer$? |
|
1028 Intuitively, |
|
1029 that is because |
|
1030 \begin{itemize} |
|
1031 \item |
|
1032 $\blexer$ follows an almost identical |
|
1033 path as that of $\lexer$, |
|
1034 for example $r_1, r_2, \ldots$ and $a_1, a_2, \ldots$ being produced, |
|
1035 which are the same up to the application of $\erase$. |
|
1036 \item |
|
1037 The bit-encodings work properly, |
|
1038 allowing the possibility of |
|
1039 pulling out the right lexical value from an |
|
1040 annotated regular expression at |
|
1041 any stage of the algorithm. |
|
1042 \end{itemize} |
|
1043 We will elaborate on this, with the help of |
|
1044 some helper functions such as $\retrieve$ and |
|
1045 $\flex$. |
|
1046 \subsection{Specifications of Some Helper Functions} |
918 The functions we introduce will give a more detailed glimpse into |
1047 The functions we introduce will give a more detailed glimpse into |
919 the lexing process, which might not be possible |
1048 the lexing process, which is not be possible |
920 using $\lexer$ or $\blexer$ themselves. |
1049 using $\lexer$ or $\blexer$ alone. |
921 The first function we shall look at is $\retrieve$. |
1050 \subsubsection{$\textit{Retrieve}$} |
922 \subsection{$\textit{Retrieve}$} |
1051 The first function we shall introduce is $\retrieve$. |
923 Our bit-coded lexer "retrieve"s the bitcodes using $\bmkeps$ |
1052 Sulzmann and Lu gave its definition, and |
924 after we finished doing all the derivatives: |
1053 Ausaf and Urban found |
|
1054 its usage in mechanised proofs. |
|
1055 Our bit-coded lexer ``retrieve''s the bitcodes using $\bmkeps$, |
|
1056 after all the derivatives has been taken: |
925 \begin{center} |
1057 \begin{center} |
926 \begin{tabular}{lcl} |
1058 \begin{tabular}{lcl} |
927 & & $\ldots$\\ |
1059 & & $\ldots$\\ |
928 & & $\;\;\textit{if}\; \textit{bnullable}(a)$\\ |
1060 & & $\;\;\textit{if}\; \textit{bnullable}(a)$\\ |
929 & & $\;\;\textit{then}\;\textit{decode}\,(\textit{bmkeps}\,a)\,r$\\ |
1061 & & $\;\;\textit{then}\;\textit{decode}\,(\textit{bmkeps}\,a)\,r$\\ |
930 & & $\ldots$ |
1062 & & $\ldots$ |
931 \end{tabular} |
1063 \end{tabular} |
932 \end{center} |
1064 \end{center} |
933 \noindent |
1065 \noindent |
934 Recall that $\bmkeps$ looks for the leftmost branch of an alternative |
1066 $\bmkeps$ retrieves the value $v$'s |
935 and completes a star's iterations by attaching a $Z$ at the end of the bitcodes |
1067 information in the format |
936 extracted. It "retrieves" a sequence by visiting both children and then stitch together |
1068 of bitcodes, by travelling along the |
937 two bitcodes using concatenation. After the entire tree structure of the regular |
1069 path of the regular expression that corresponds to a POSIX match, |
938 expression has been traversed using the above manner, we get a bitcode encoding the |
1070 collecting all the bitcodes. |
939 lexing result. |
|
940 We know that this "retrieved" bitcode leads to the correct value after decoding, |
1071 We know that this "retrieved" bitcode leads to the correct value after decoding, |
941 which is $v_0$ in the bird's eye view of the injection-based lexing diagram. |
1072 which is $v_0$ in the injection-based lexing diagram. |
942 Now assume we keep every other data structure in the diagram \ref{InjFigure}, |
1073 As an observation we pointed at the beginning of this section, |
943 and only replace all the plain regular expression by their annotated counterparts, |
1074 the annotated regular expressions generated in successive derivative steps |
944 computed during a $\blexer$ run. |
1075 in $\blexer$ after $\erase$ has the same structure |
945 Then we obtain a diagram for the annotated regular expression derivatives and |
1076 as those appeared in $\lexer$. |
946 their corresponding values, though the values are never calculated in $\blexer$. |
1077 We redraw the diagram below to visualise this fact. |
947 We have that $a_n$ contains all the lexing result information. |
1078 We pretend that all the values are |
|
1079 ready despite they are only calculated in $\lexer$. |
|
1080 In general we have $\vdash v_i:(a_i)_\downarrow$. |
948 \vspace{20mm} |
1081 \vspace{20mm} |
949 \begin{center}%\label{graph:injLexer} |
1082 \begin{figure}[H]%\label{graph:injLexer} |
|
1083 \begin{center} |
950 \begin{tikzcd}[ |
1084 \begin{tikzcd}[ |
951 every matrix/.append style = {name=p}, |
1085 every matrix/.append style = {name=p}, |
952 remember picture, overlay, |
1086 remember picture, overlay, |
953 ] |
1087 ] |
954 a_0 \arrow[r, "\backslash c_0"] \arrow[d] & a_1 \arrow[r, "\backslash c_1"] \arrow[d] & a_2 \arrow[r, dashed] \arrow[d] & a_n \arrow[d] \\ |
1088 a_0 \arrow[r, "\backslash c_0"] \arrow[d] & a_1 \arrow[r, "\backslash c_1"] \arrow[d] & a_2 \arrow[r, dashed] \arrow[d] & a_n \arrow[d] \\ |
955 v_0 & v_1 \arrow[l,"inj_{r_0} c_0"] & v_2 \arrow[l, "inj_{r_1} c_1"] & v_n \arrow[l, dashed] |
1089 v_0 & v_1 \arrow[l,"inj_{r_0} c_0"] & v_2 \arrow[l, "inj_{r_1} c_1"] & v_n \arrow[l, dashed] |
956 \end{tikzcd} |
1090 \end{tikzcd} |
|
1091 \end{center} |
957 \begin{tikzpicture}[ |
1092 \begin{tikzpicture}[ |
958 remember picture, overlay, |
1093 remember picture, overlay, |
959 E/.style = {ellipse, draw=blue, dashed, |
1094 E/.style = {ellipse, draw=blue, dashed, |
960 inner xsep=4mm,inner ysep=-4mm, rotate=90, fit=#1} |
1095 inner xsep=4mm,inner ysep=-4mm, rotate=90, fit=#1} |
961 ] |
1096 ] |
962 \node[E = (p-1-1) (p-2-1)] {}; |
1097 \node[E = (p-1-1) (p-2-1)] {}; |
963 \node[E = (p-1-4) (p-2-4)] {}; |
1098 \node[E = (p-1-4) (p-2-4)] {}; |
|
1099 \node[E = (p-1-2) (p-2-2)] {}; |
|
1100 \node[E = (p-1-3) (p-2-3)] {}; |
964 \end{tikzpicture} |
1101 \end{tikzpicture} |
965 |
1102 |
966 \end{center} |
1103 \end{figure} |
967 \vspace{20mm} |
1104 \vspace{20mm} |
968 \noindent |
1105 \noindent |
969 On the other hand, $v_0$ also encodes the correct lexing result, as we have proven for $\lexer$. |
1106 We encircle in the diagram all the pairs $v_i, a_i$ to show that these values |
970 Encircled in the diagram are the two pairs $v_0, a_0$ and $v_n, a_n$, which both |
1107 and regular expressions correspond to each other. |
971 encode the correct lexical result. Though for the leftmost pair, we have |
1108 For the leftmost pair, we have that the |
972 the information condensed in $v_0$ the value part, whereas for the rightmost pair, |
1109 lexical information is condensed in |
973 the information is concentrated on $a_n$. |
1110 $v_0$--the value part, whereas for the rightmost pair, |
974 We know that in the intermediate steps the pairs $v_i, a_i$, must in some way encode the complete |
1111 the lexing result is in the bitcodes of $a_n$. |
975 lexing information as well. Therefore, we need a unified approach to extract such lexing result |
1112 $\bmkeps$ is able to extract from $a_n$ the result |
976 from a value $v_i$ and its annotated regular expression $a_i$. |
1113 by looking for nullable parts of the regular expression, |
977 And the function $f$ must satisfy these requirements: |
1114 however for regular expressions $a_i$ in general |
|
1115 they might not necessarily be nullable and therefore |
|
1116 needs some ``help'' finding the POSIX bit-encoding. |
|
1117 The most straightforward ``help'' comes from $a_i$'s corresponding |
|
1118 value $v_i$, and this suggests a function $f$ satisfying the |
|
1119 following properties: |
978 \begin{itemize} |
1120 \begin{itemize} |
979 \item |
1121 \item |
980 $f \; a_i\;v_i = f \; a_n \; v_n = \decode \; (\bmkeps \; a_n) \; (\erase \; a_0)$ |
1122 $f \; a_i\;v_i = f \; a_n \; v_n = \bmkeps \; a_n$%\decode \; (\bmkeps \; a_n) \; (\erase \; a_0)$ |
981 \item |
1123 \item |
982 $f \; a_i\;v_i = f \; a_0 \; v_0 = v_0 = \decode \;(\code \; v_0) \; (\erase \; a_0)$ |
1124 $f \; a_i\;v_i = f \; a_0 \; v_0 = \decode \;(\code \; v_0) \; (\erase \; a_0)$ |
983 \end{itemize} |
1125 \end{itemize} |
984 \noindent |
1126 \noindent |
985 If we factor out the common part $\decode \; \_ \; (\erase \; a_0)$, |
1127 If we factor out the common part $\decode \; \_ \; (\erase \; a_0)$, |
986 The core of the function $f$ is something that produces the bitcodes |
1128 The core of the function $f$ is something that produces the bitcodes |
987 $\code \; v_0$. |
1129 $\code \; v_0$. |