ChengsongTanPhdThesis/Chapters/Bitcoded1.tex
changeset 657 00171b627b8d
parent 653 bc5571c38d1f
child 667 660cf698eb26
equal deleted inserted replaced
656:753a3b0ee02b 657:00171b627b8d
  1179 \noindent
  1179 \noindent
  1180 As stated, $\retrieve$ collects the right bit-codes from the 
  1180 As stated, $\retrieve$ collects the right bit-codes from the 
  1181 final derivative $a_n$, guided by $v_n$. This can be proved
  1181 final derivative $a_n$, guided by $v_n$. This can be proved
  1182 as follows:
  1182 as follows:
  1183 \begin{lemma}\label{bmkepsRetrieve}
  1183 \begin{lemma}\label{bmkepsRetrieve}
  1184 	$\bnullable \; a \implies \bmkeps \; a = \retrieve \; a \; (\mkeps \; (\erase \; a))$
  1184 	\mbox{}
       
  1185 	\begin{itemize}
       
  1186 		\item
       
  1187 If $\forall v \in vs. \vdash v : r$, and  $\code \; v = \retrieve \; (\internalise\; r) \; v$\\
       
  1188 then $\code \; (\Stars \; vs) = \retrieve \; _{[]}((\internalise \;  r)^*) \; (\Stars \; vs)$
       
  1189 		\item
       
  1190 	$\vdash v : r \implies \retrieve \; (r)^\uparrow \; v = \code \; v$
       
  1191 		\item
       
  1192 	$\bnullable \; a \implies \bmkeps \; a = \retrieve \; a \; (\mkeps \; (a_\downarrow))$
       
  1193 	\end{itemize}
       
  1194 \begin{proof}
       
  1195 	The first part is by induction on the value list $vs$.
       
  1196 	The second part is by induction on $r$,
       
  1197 	and the star case uses the first part.
       
  1198 	The last part is by a routine induction on $a$.
       
  1199 \end{proof}
       
  1200 \noindent
  1185 \end{lemma}
  1201 \end{lemma}
  1186 \begin{proof}
  1202 \begin{proof}
  1187 	By a routine induction on $a$.
  1203 	By a routine induction on $a$.
  1188 \end{proof}
  1204 \end{proof}
  1189 \noindent
  1205 \noindent
  1190 %The design of $\retrieve$ enables us to extract bitcodes
  1206 %The design of $\retrieve$ enables us to extract bitcodes
  1191 %from both annotated regular expressions and values.
  1207 %from both annotated regular expressions and values.
  1192 %$\retrieve$ always ``sucks up'' all the information.
  1208 %$\retrieve$ always ``sucks up'' all the information.
  1193 When more information is stored in the value, we would be able to
  1209 %When more information is stored in the value, we would be able to
  1194 extract more from the value, and vice versa.
  1210 %extract more from the value, and vice versa.
  1195 For example in star iterations, $\retrieve$ will be able to extract from $\Stars \; vs$
  1211 %For example in star iterations, $\retrieve$ will be able to extract from $\Stars \; vs$
  1196 exactly the same bitcodes as $\code \; (\Stars \; vs)$:
  1212 %exactly the same bitcodes as $\code \; (\Stars \; vs)$:
  1197 \begin{lemma}\label{retrieveEncodeSTARS}
  1213 %\begin{lemma}\label{retrieveEncodeSTARS}
  1198   If $\forall v \in vs. \vdash v : r$, and  $\code \; v = \retrieve \; (\internalise\; r) \; v$\\
  1214 %  If $\forall v \in vs. \vdash v : r$, and  $\code \; v = \retrieve \; (\internalise\; r) \; v$\\
  1199   then $\code \; (\Stars \; vs) = \retrieve \; _{[]}((\internalise \;  r)^*) \; (\Stars \; vs)$
  1215 %  then $\code \; (\Stars \; vs) = \retrieve \; _{[]}((\internalise \;  r)^*) \; (\Stars \; vs)$
  1200 \end{lemma}
  1216 %\end{lemma}
  1201 \begin{proof}
  1217 %\begin{proof}
  1202 	By induction on the value list $vs$.
  1218 %	By induction on the value list $vs$.
  1203 \end{proof}
  1219 %\end{proof}
  1204 \noindent
  1220 %\noindent
  1205 Similarly, when the value list does not hold information,
  1221 %Similarly, when the value list does not hold information,
  1206 only the bitcodes plus some delimiter will be recorded:
  1222 %only the bitcodes plus some delimiter will be recorded:
  1207 \begin{center}
  1223 %\begin{center}
  1208   $\retrieve \; _{bs}((\internalise \;  r)^*) \; (\Stars \; [] )  = bs @ [S]$.
  1224 %  $\retrieve \; _{bs}((\internalise \;  r)^*) \; (\Stars \; [] )  = bs @ [S]$.
  1209 \end{center}
  1225 %\end{center}
  1210 In general, if we have a regular expression that is just internalised
  1226 %In general, if we have a regular expression that is just internalised
  1211 and the final lexing result value, we can $\retrieve$ the bitcodes
  1227 %and the final lexing result value, we can $\retrieve$ the bitcodes
  1212 that look as if we have en$\code$-ed the value as bitcodes:
  1228 %that look as if we have en$\code$-ed the value as bitcodes:
  1213 \begin{lemma}
  1229 %\begin{lemma}
  1214 	$\vdash v : r \implies \retrieve \; (r)^\uparrow \; v = \code \; v$
  1230 %	$\vdash v : r \implies \retrieve \; (r)^\uparrow \; v = \code \; v$
  1215 \end{lemma}
  1231 %\end{lemma}
  1216 \begin{proof}
  1232 %\begin{proof}
  1217 	By induction on $r$.
  1233 %	By induction on $r$.
  1218 	The star case uses lemma \ref{retrieveEncodeSTARS}.
  1234 %	The star case uses lemma \ref{retrieveEncodeSTARS}.
  1219 \end{proof}
  1235 %\end{proof}
  1220 \noindent
  1236 %\noindent
  1221 The following property of $\retrieve$ is crucial, as
  1237 The following property of $\retrieve$ is crucial, as
  1222 it provides a "bridge" between $a_0$ and $a_n $ in the
  1238 it provides a "bridge" between $a_0$ and $a_n $ in the
  1223 lexing diagram by linking adjacent regular expressions $a_i$ and
  1239 lexing diagram by linking adjacent regular expressions $a_i$ and
  1224 $a_{i+1}$.
  1240 $a_{i+1}$.
  1225 The property says that one 
  1241 The property says that one 
  1226 can retrieve the same bits from a derivative 
  1242 can retrieve the same bits from a derivative 
  1227 regular expression as those from 
  1243 regular expression as those from 
  1228 before the derivative took place,
  1244 before the derivative took place,
  1229 provided that the corresponding values are used respectively:
  1245 provided that the corresponding values are used respectively:
  1230 \begin{lemma}\label{retrieveStepwise}
  1246 \begin{lemma}\label{retrieveStepwise}
  1231 	$\vdash v : (r\backslash c) \implies \retrieve \; (r \backslash c)  \;  v= \retrieve \; r \; (\inj \; r\; c\; v)$
  1247 	$\vdash v : (a\backslash c)_\downarrow \implies \retrieve \; (a \backslash c)  \;  v= 
       
  1248 							\retrieve \; a \; (\inj \; a_\downarrow\; c\; v)$
  1232 \end{lemma}
  1249 \end{lemma}
  1233 \begin{proof}
  1250 \begin{proof}
  1234 	We do an induction on $r$, generalising over $v$.
  1251 	We do an induction on $r$, generalising over $v$.
  1235 	The induction principle in our formalisation
  1252 	The induction principle in our formalisation
  1236 	is function $\erase$'s cases.
  1253 	is function $\erase$'s cases.
  1240 information being preserved and recoverable throughout each lexing step.
  1257 information being preserved and recoverable throughout each lexing step.
  1241 We shall see in the next chapter that this no longer
  1258 We shall see in the next chapter that this no longer
  1242 holds with simplifications which takes away certain sub-expressions
  1259 holds with simplifications which takes away certain sub-expressions
  1243 corresponding to non-POSIX lexical values.
  1260 corresponding to non-POSIX lexical values.
  1244 
  1261 
  1245 Before we move on to the next
  1262 %Before we move on to the next
  1246 helper function, we rewrite $\blexer$ in
  1263 %helper function, we rewrite $\blexer$ in
  1247 the following way which makes later proofs more convenient:
  1264 %the following way which makes later proofs more convenient:
  1248 \begin{lemma}\label{blexer_retrieve}
  1265 %\begin{lemma}\label{blexer_retrieve}
  1249 $\blexer \; r \; s = \decode  \; (\retrieve \; (\internalise \; r) \; (\mkeps \; (r \backslash s) )) \; r$
  1266 %$\blexer \; r \; s = \decode  \; (\retrieve \; (\internalise \; r) \; (\mkeps \; (r \backslash s) )) \; r$
  1250 \end{lemma}
  1267 %\end{lemma}
  1251 \begin{proof}
  1268 %\begin{proof}
  1252 	Using lemma \ref{bmkepsRetrieve}.
  1269 %	Using lemma \ref{bmkepsRetrieve}.
  1253 \end{proof}
  1270 %\end{proof}
  1254 \noindent
  1271 %\noindent
  1255 The function $\retrieve$ allows connecting
  1272 The function $\retrieve$ allows connecting
  1256 between the intermediate 
  1273 between the intermediate 
  1257 results $a_i$ and $a_{i+1}$ in $\blexer$.
  1274 results $a_i$ and $a_{i+1}$ in $\blexer$.
  1258 For plain regular expressions something similar is required.
  1275 For plain regular expressions something similar is required.
  1259 
  1276 
  1349 and finally give the correctness of $\blexer$--it outputs the same result as $\lexer$:
  1366 and finally give the correctness of $\blexer$--it outputs the same result as $\lexer$:
  1350 \begin{theorem}\label{blexerCorrect}
  1367 \begin{theorem}\label{blexerCorrect}
  1351 	$\blexer\; r \; s = \lexer \; r \; s$
  1368 	$\blexer\; r \; s = \lexer \; r \; s$
  1352 \end{theorem}
  1369 \end{theorem}
  1353 \begin{proof}
  1370 \begin{proof}
  1354 	From \ref{flex_retrieve} we have that 
  1371 	We can rewrite the expression $\blexer \; r \; s$ by using lemma \ref{bmkepsRetrieve}:
  1355 	$\textit{flex} \; r \; \textit{id} \; s \; \mkeps(r \backslash s)  = \blexer \; r \; s$,
  1372 \[
  1356 	which immediately implies this theorem.
  1373 	\blexer \; r \; s = \decode  \; (\retrieve \; (\; r^\uparrow) \; (\mkeps \; (r \backslash s) )) \; r
       
  1374 \]
       
  1375 	From \ref{flex_retrieve} we have that the left-hand-side is equal to 
       
  1376 \[
       
  1377 	\textit{flex} \; r \; \textit{id} \; s \; \mkeps \; (r \backslash s),
       
  1378 \]
       
  1379 which in turn equals 
       
  1380 \[
       
  1381 	\lexer \; r \; s.
       
  1382 \]
       
  1383 	%which immediately implies this theorem.
  1357 \end{proof}
  1384 \end{proof}
  1358 \noindent
  1385 \noindent
  1359 To piece things together and spell out the correctness
  1386 To piece things together 
  1360 result of the bitcoded lexer more explicitly,
  1387 %and spell out the correctness
       
  1388 %result of the bitcoded lexer more explicitly,
       
  1389 and give the correctness result explicitly,
  1361 we use the fact from the previous chapter that
  1390 we use the fact from the previous chapter that
  1362 \[
  1391 \[
  1363 	 (r, s) \rightarrow v \;\; \textit{iff} \;\; \lexer \; r \; s =\Some \;  v
  1392 	 (r, s) \rightarrow v \;\; \textit{iff} \;\; \lexer \; r \; s =\Some \;  v
  1364 	 \quad \quad
  1393 	 \quad \quad
  1365 	 \nexists v. (r, s) \rightarrow v\;\; \textit{iff} \;\; \lexer \;r\;s = None
  1394 	 \nexists v. (r, s) \rightarrow v\;\; \textit{iff} \;\; \lexer \;r\;s = None
  1366 \]
  1395 \]
  1367 to obtain this corollary:
  1396 to obtain this corollary:
  1368 \begin{corollary}\label{blexerPOSIX}
  1397 \begin{corollary}\label{blexerPOSIX}
  1369 	The $\blexer$ function correctly implements a POSIX lexer, namely\\
  1398 	The $\blexer$ function correctly implements a POSIX lexer, namely
  1370 	$(r, s) \rightarrow v \;\; \textit{iff} \;\; \blexer \; r \; s = \Some \; v$\\
  1399 	\begin{itemize}
       
  1400 	\item
       
  1401 	$(r, s) \rightarrow v \;\; \textit{iff} \;\; \blexer \; r \; s = \Some \; v$
       
  1402 	\item
  1371 	$\nexists v. (r, s) \rightarrow v\;\; \textit{iff} \;\; \blexer \;r\;s = None$
  1403 	$\nexists v. (r, s) \rightarrow v\;\; \textit{iff} \;\; \blexer \;r\;s = None$
       
  1404 	\end{itemize}
  1372 \end{corollary}
  1405 \end{corollary}
  1373 Our main reason for analysing the bit-coded algorithm over 
  1406 Our main reason for analysing the bit-coded algorithm over 
  1374 the injection-based one is that it allows us to define
  1407 the injection-based one is that it allows us to define
  1375 more aggressive simplifications.
  1408 more aggressive simplifications.
  1376 We will elaborate on this in the next chapter.
  1409 We will elaborate on this in the next chapter.