ChengsongTanPhdThesis/Chapters/Inj.tex
changeset 667 660cf698eb26
parent 666 6da4516ea87d
child 668 3831621d7b14
equal deleted inserted replaced
666:6da4516ea87d 667:660cf698eb26
  1330 a regular expression is defined recursively as
  1330 a regular expression is defined recursively as
  1331 a set of strings:
  1331 a set of strings:
  1332 %TODO: FILL in the other defs
  1332 %TODO: FILL in the other defs
  1333 \begin{center}
  1333 \begin{center}
  1334 \begin{tabular}{lcl}
  1334 \begin{tabular}{lcl}
  1335 $L \; \ZERO$ & $\dn$ & $\phi$\\
  1335 $L \; \ZERO$ & $\dn$ & $\varnothing$\\
  1336 $L \; \ONE$ & $\dn$ & $\{[]\}$\\
  1336 $L \; \ONE$ & $\dn$ & $\{[]\}$\\
  1337 $L \; c$ & $\dn$ & $\{[c]\}$\\
  1337 $L \; c$ & $\dn$ & $\{[c]\}$\\
  1338 $L \; (r_1 + r_2)$ & $\dn$ & $ L \; r_1 \cup L \; r_2$\\
  1338 $L \; (r_1 + r_2)$ & $\dn$ & $ L \; r_1 \cup L \; r_2$\\
  1339 $L \; (r_1 \cdot r_2)$ & $\dn$ & $ L \; r_1 @ L \; r_2$\\
  1339 $L \; (r_1 \cdot r_2)$ & $\dn$ & $ L \; r_1 @ L \; r_2$\\
  1340 $L \; (r^*)$ & $\dn$ & $ (L\;r)*$
  1340 $L \; (r^*)$ & $\dn$ & $ (L\;r)*$
  2070 to inject the character back into.
  2070 to inject the character back into.
  2071 Once the character is
  2071 Once the character is
  2072 injected back to that sub-value; 
  2072 injected back to that sub-value; 
  2073 $\inj$ assembles all parts
  2073 $\inj$ assembles all parts
  2074 to form a new value.
  2074 to form a new value.
  2075 \subsection{An Example of how Injection Works}
  2075 
  2076 We will provide a few examples on why
       
  2077 
       
  2078 \begin{center}
       
  2079 	\begin{tabular}{lcl}
       
  2080 		$\inj \; (a\cdot b)\cdot c \; \Seq \; \ONE \; b$ & $=$ & $(a+e)$\\
       
  2081 		$A$ & $B$ & $C$
       
  2082 	\end{tabular}
       
  2083 \end{center}
       
  2084 For instance, the last clause is an
  2076 For instance, the last clause is an
  2085 injection into a sequence value $v_{i+1}$
  2077 injection into a sequence value $v_{i+1}$
  2086 whose second child
  2078 whose second child
  2087 value is a star and the shape of the 
  2079 value is a star and the shape of the 
  2088 regular expression $r_i$ before injection 
  2080 regular expression $r_i$ before injection 
  2108 Finally, 
  2100 Finally, 
  2109 $\inj \; r \;c \; v$ is prepended
  2101 $\inj \; r \;c \; v$ is prepended
  2110 to the previous list of iterations and then
  2102 to the previous list of iterations and then
  2111 wrapped under the $\Stars$
  2103 wrapped under the $\Stars$
  2112 constructor, giving us $\Stars \; ((\inj \; r \; c \; v) ::vs)$.
  2104 constructor, giving us $\Stars \; ((\inj \; r \; c \; v) ::vs)$.
       
  2105 
       
  2106 Putting all together, Sulzmann and Lu obtained the following algorithm
       
  2107 outlined in the injection-based lexing diagram \ref{graph:inj}:
       
  2108 \begin{center}
       
  2109 \begin{tabular}{lcl}
       
  2110 	$\lexer \; r \; [] $ & $=$ & $\textit{if} \; (\nullable \; r)\; \textit{then}\;  \Some(\mkeps \; r) \; \textit{else} \; \None$\\
       
  2111 	$\lexer \; r \;c::s$ & $=$ & $\textit{case}\; (\lexer \; (r\backslash c) \; s) \;\textit{of}\; $\\
       
  2112 	& & $\quad \phantom{\mid}\; \None \implies \None$\\
       
  2113 	& & $\quad \mid           \Some(v) \implies \Some(\inj \; r\; c\; v)$
       
  2114 \end{tabular}
       
  2115 \end{center}
       
  2116 \noindent
       
  2117 
       
  2118 \subsection{Examples on How Injection and Lexer Works}
       
  2119 We will provide a few examples on how $\inj$ and $\lexer$
       
  2120 works by showing their values in each recursive call on some 
       
  2121 concrete examples.
       
  2122 We start with the call $\lexer \; (a+aa)^*\cdot c\; aac$
       
  2123 on the lexer, note that the value's character constructor
       
  2124 $\Char \;c$ is abbreviated as $c$ for readability.
       
  2125 Similarly the last derivative's sub-expression
       
  2126 is abbreviated as $r_{=0}$\footnote{which is equal to
       
  2127 $((\ZERO + (\ZERO a+ \ZERO))\cdot(a+aa)^* + (\ZERO + \ZERO a)\cdot (a+aa)^*)+
       
  2128 ((\ZERO+\ZERO a)\cdot(a+aa)^* + (\ZERO+\ZERO a)\cdot(a+aa)^*),
       
  2129 $.}
       
  2130 whose language interpretation is equivalent to that of
       
  2131 $\ZERO$ and therefore not crucial to be displayed fully expanded,
       
  2132 as they will not be injected into.%our example run.
       
  2133 \begin{center}
       
  2134 	\begin{tabular}{lcl}
       
  2135 		$(a+\textcolor{magenta}{a}\textcolor{blue}{a})^* \cdot \textcolor{red}{c}$ & $\stackrel{\backslash \textcolor{magenta}{a}}{\rightarrow}$ & $((\ONE + \textcolor{magenta}{\ONE} \textcolor{blue}{a})\cdot (a+aa)^*)\cdot \textcolor{red}{c}+\ZERO$\\
       
  2136 				   %& $\stackrel{\rightarrow}{\backslash a}$ & $((\ONE + \ONE a)\cdot (a+aa)^*)\cdot c + \ZERO$\\
       
  2137 				    & $\stackrel{\backslash \textcolor{blue}{a}}{\rightarrow}$  &
       
  2138 				    $(((\ZERO+(\ZERO a+ \textcolor{blue}{\ONE}))\cdot (a+aa)^* + (\ONE+\ONE a)\cdot (a+aa)^* )\cdot \textcolor{red}{\mathbf{c}} + \ZERO)+\ZERO$\\
       
  2139 				    & $\stackrel{\backslash \textcolor{red}{c}}{\rightarrow}$  &
       
  2140 				    $((r_{=0}\cdot c + \textcolor{red}{\ONE})+\ZERO)+\ZERO$\\
       
  2141 				    & $\stackrel{\mkeps}{\rightarrow}$ & $\Left (\Left \; (\Right \; \textcolor{red}{\Empty}))$ \\
       
  2142 				   & $\stackrel{\inj \;\textcolor{red}{c} }{\rightarrow}$ & 
       
  2143 				   $\Left \; (\Left \; (\Seq \;(\Left \; (\Seq \; (\Right \; (\Right\; \textcolor{blue}{\Empty})) \; \Stars \, [])) \; \textcolor{red}{c}))$\\
       
  2144 				   & $\stackrel{\inj \;\textcolor{blue}{a}}{\rightarrow}$ & 
       
  2145 				   $\Left\; (\Seq \; (\Seq\; (\Right \; (\Seq \; \textcolor{magenta}{\Empty }\; \textcolor{blue}{ \mathbf{a}}) )
       
  2146 				   \;\Stars \,[]) \; \textcolor{red}{c})$\\
       
  2147 				   & $\stackrel{\inj \;\textcolor{magenta}{a}}{\rightarrow}$ & $\Seq \; (\Stars \; [\Right \; (\Seq \; \mathbf{\textcolor{magenta}{a}} \; \textcolor{blue}{a})]) \; \textcolor{red}{ c}$
       
  2148 
       
  2149 				   %$\inj \; r \; c\A$
       
  2150 		%$\inj \; (a\cdot b)\cdot c \; \Seq \; \ONE \; b$ & $=$ & $(a+e)$\\
       
  2151 	\end{tabular}
       
  2152 \end{center}
       
  2153 \noindent
       
  2154 We have assigned different colours for each character,
       
  2155 as well as their corresponding locations in values and regular expressions.
       
  2156 The most recently injected character is marked with a bold font.
       
  2157 %Their corresponding derivative steps have been marked with the same
       
  2158 %colour. We also mark the specific location where the coloured character
       
  2159 %was injected into with the same colour, before and after the injection occurred.
       
  2160 %TODO: can be also used to motivate injection->blexer in Bitcoded1
       
  2161 To show the details of how $\inj$ works,
       
  2162 we zoom in to the second injection above, 
       
  2163 illustrating the recursive calls involved: 
       
  2164 \begin{center}
       
  2165 	\begin{tabular}{lcl}
       
  2166 $\inj \;\quad ((\ONE + {\ONE} 
       
  2167 	\textcolor{blue}{a})\cdot (a+aa)^*)\cdot 
       
  2168 	c + \ZERO \; \quad \textcolor{blue}{a} \; $ & &\\
       
  2169 $\quad \Left \; 
       
  2170 (\Left \; (\Seq \;(\Left \; (\Seq \; (\Right \; (\Right\; 
       
  2171 	\textcolor{blue}{\Empty})) \; \Stars \, [])) \; c))$ & \\$=$\\
       
  2172 	$\Left \; (\inj \; ((\ONE + \ONE 
       
  2173 	\textcolor{blue}{a})\cdot (a+aa)^*)\cdot 
       
  2174 	c \quad  \textcolor{blue}{a} \quad (\Left \; (\Seq\; ( \Left \; (\Seq \; (\Right \; (\Right\; 
       
  2175 	\textcolor{blue}{\Empty})) \; \Stars \, []) ) \; c ) )\;\;)$ &\\ $=$\\
       
  2176 	$\Left \; (\Seq \; (\inj \; (\ONE + \ONE \textcolor{blue}{a})\cdot(a+aa)^* \quad \textcolor{blue}{a} \quad
       
  2177 	(\Left \; (\Seq \; (\Right \; (\Right\;\textcolor{blue}{\Empty})))) ) \; c ) $ & \\$=$\\
       
  2178 	$\Left \; (\Seq \; (\Seq \; (\inj \quad (\ONE + \ONE \textcolor{blue}{a}) \quad \textcolor{blue}{a}\quad \Right \;(\Right \; \textcolor{blue}{\Empty}) ) \; \Stars \,[])\; c)$ &\\ $=$ \\
       
  2179 	$\Left \; (\Seq \; (\Seq \; (\Right \; (\inj \; \ONE \textcolor{blue}{a} \quad \textcolor{blue}{a}\quad (\Right \;\textcolor{blue}{\Empty})))) \; \Stars \, [])$
       
  2180 																		   & \\ $=$\\
       
  2181 																		   $\Left \; (\Seq \; (\Seq \; (\Right \; (\Seq \; (\mkeps \; \ONE)\;(\inj \;\textcolor{blue}{a} \; \textcolor{blue}{a} \; \textcolor{blue}{\Empty})) ) ) \; \Stars \, [] )$
       
  2182 																		   & \\ $=$\\
       
  2183 																		   $\Left \; (\Seq \; (\Seq \; (\Right \; (\Seq \; \Empty \; \textcolor{blue}{a}) ) ) \; \Stars \, [] )$
       
  2184  	\end{tabular}
       
  2185 \end{center}
       
  2186 \noindent
       
  2187 We will now introduce the proofs related to $\inj$ and
       
  2188 $\lexer$. These proofs have originally been found by Ausaf et al.
       
  2189 in their 2016 work \cite{AusafDyckhoffUrban2016}.
       
  2190 Proofs to some of these lemmas have also been discussed in more detail in 
       
  2191 Ausaf's thesis \cite{Ausaf}.
       
  2192 Nevertheless, we still introduce these proofs, to make this thesis
       
  2193 self-contained as we show inductive variants used in these proofs break
       
  2194 after more simplifications are introduced.
       
  2195 %Also note that lemmas like \ref{}described in this thesis is a more faithful
       
  2196 %representation of the actual accompanying Isabelle formalisation, and 
       
  2197 %therefore 
       
  2198 
       
  2199 
  2113 
  2200 
  2114 Recall that lemma 
  2201 Recall that lemma 
  2115 \ref{mePosix} tells us that
  2202 \ref{mePosix} tells us that
  2116 $\mkeps$ always generates the POSIX value.
  2203 $\mkeps$ always generates the POSIX value.
  2117 The function $\inj$ preserves the POSIXness, provided
  2204 The function $\inj$ preserves the POSIXness, provided
  2210 			as the POSIX value for $a\cdot b$.
  2297 			as the POSIX value for $a\cdot b$.
  2211 	\end{itemize}
  2298 	\end{itemize}
  2212 	The star case can be proven similarly.
  2299 	The star case can be proven similarly.
  2213 \end{proof}
  2300 \end{proof}
  2214 \noindent
  2301 \noindent
  2215 Putting all together, Sulzmann and Lu obtained the following algorithm
  2302 
  2216 outlined in the diagram \ref{graph:inj}:
  2303 
  2217 \begin{center}
  2304 
  2218 \begin{tabular}{lcl}
  2305 
  2219 	$\lexer \; r \; [] $ & $=$ & $\textit{if} \; (\nullable \; r)\; \textit{then}\;  \Some(\mkeps \; r) \; \textit{else} \; \None$\\
  2306 
  2220 	$\lexer \; r \;c::s$ & $=$ & $\textit{case}\; (\lexer \; (r\backslash c) \; s) \;\textit{of}\; $\\
  2307 
  2221 	& & $\quad \phantom{\mid}\; \None \implies \None$\\
  2308 
  2222 	& & $\quad \mid           \Some(v) \implies \Some(\inj \; r\; c\; v)$
  2309 %TODO: Cut from previous lexer def, need to make coherent
  2223 \end{tabular}
       
  2224 \end{center}
       
  2225 \noindent
       
  2226 The central property of the $\lexer$ is that it gives the correct result
  2310 The central property of the $\lexer$ is that it gives the correct result
  2227 according to
  2311 according to
  2228 POSIX rules. 
  2312 POSIX rules. 
  2229 \begin{theorem}\label{lexerCorrectness}
  2313 \begin{theorem}\label{lexerCorrectness}
  2230 	The $\lexer$ based on derivatives and injections is correct: 
  2314 	The $\lexer$ based on derivatives and injections is correct: