1179 \noindent |
1179 \noindent |
1180 As stated, $\retrieve$ collects the right bit-codes from the |
1180 As stated, $\retrieve$ collects the right bit-codes from the |
1181 final derivative $a_n$, guided by $v_n$. This can be proved |
1181 final derivative $a_n$, guided by $v_n$. This can be proved |
1182 as follows: |
1182 as follows: |
1183 \begin{lemma}\label{bmkepsRetrieve} |
1183 \begin{lemma}\label{bmkepsRetrieve} |
1184 $\bnullable \; a \implies \bmkeps \; a = \retrieve \; a \; (\mkeps \; (\erase \; a))$ |
1184 \mbox{} |
|
1185 \begin{itemize} |
|
1186 \item |
|
1187 If $\forall v \in vs. \vdash v : r$, and $\code \; v = \retrieve \; (\internalise\; r) \; v$\\ |
|
1188 then $\code \; (\Stars \; vs) = \retrieve \; _{[]}((\internalise \; r)^*) \; (\Stars \; vs)$ |
|
1189 \item |
|
1190 $\vdash v : r \implies \retrieve \; (r)^\uparrow \; v = \code \; v$ |
|
1191 \item |
|
1192 $\bnullable \; a \implies \bmkeps \; a = \retrieve \; a \; (\mkeps \; (a_\downarrow))$ |
|
1193 \end{itemize} |
|
1194 \begin{proof} |
|
1195 The first part is by induction on the value list $vs$. |
|
1196 The second part is by induction on $r$, |
|
1197 and the star case uses the first part. |
|
1198 The last part is by a routine induction on $a$. |
|
1199 \end{proof} |
|
1200 \noindent |
1185 \end{lemma} |
1201 \end{lemma} |
1186 \begin{proof} |
1202 \begin{proof} |
1187 By a routine induction on $a$. |
1203 By a routine induction on $a$. |
1188 \end{proof} |
1204 \end{proof} |
1189 \noindent |
1205 \noindent |
1190 %The design of $\retrieve$ enables us to extract bitcodes |
1206 %The design of $\retrieve$ enables us to extract bitcodes |
1191 %from both annotated regular expressions and values. |
1207 %from both annotated regular expressions and values. |
1192 %$\retrieve$ always ``sucks up'' all the information. |
1208 %$\retrieve$ always ``sucks up'' all the information. |
1193 When more information is stored in the value, we would be able to |
1209 %When more information is stored in the value, we would be able to |
1194 extract more from the value, and vice versa. |
1210 %extract more from the value, and vice versa. |
1195 For example in star iterations, $\retrieve$ will be able to extract from $\Stars \; vs$ |
1211 %For example in star iterations, $\retrieve$ will be able to extract from $\Stars \; vs$ |
1196 exactly the same bitcodes as $\code \; (\Stars \; vs)$: |
1212 %exactly the same bitcodes as $\code \; (\Stars \; vs)$: |
1197 \begin{lemma}\label{retrieveEncodeSTARS} |
1213 %\begin{lemma}\label{retrieveEncodeSTARS} |
1198 If $\forall v \in vs. \vdash v : r$, and $\code \; v = \retrieve \; (\internalise\; r) \; v$\\ |
1214 % If $\forall v \in vs. \vdash v : r$, and $\code \; v = \retrieve \; (\internalise\; r) \; v$\\ |
1199 then $\code \; (\Stars \; vs) = \retrieve \; _{[]}((\internalise \; r)^*) \; (\Stars \; vs)$ |
1215 % then $\code \; (\Stars \; vs) = \retrieve \; _{[]}((\internalise \; r)^*) \; (\Stars \; vs)$ |
1200 \end{lemma} |
1216 %\end{lemma} |
1201 \begin{proof} |
1217 %\begin{proof} |
1202 By induction on the value list $vs$. |
1218 % By induction on the value list $vs$. |
1203 \end{proof} |
1219 %\end{proof} |
1204 \noindent |
1220 %\noindent |
1205 Similarly, when the value list does not hold information, |
1221 %Similarly, when the value list does not hold information, |
1206 only the bitcodes plus some delimiter will be recorded: |
1222 %only the bitcodes plus some delimiter will be recorded: |
1207 \begin{center} |
1223 %\begin{center} |
1208 $\retrieve \; _{bs}((\internalise \; r)^*) \; (\Stars \; [] ) = bs @ [S]$. |
1224 % $\retrieve \; _{bs}((\internalise \; r)^*) \; (\Stars \; [] ) = bs @ [S]$. |
1209 \end{center} |
1225 %\end{center} |
1210 In general, if we have a regular expression that is just internalised |
1226 %In general, if we have a regular expression that is just internalised |
1211 and the final lexing result value, we can $\retrieve$ the bitcodes |
1227 %and the final lexing result value, we can $\retrieve$ the bitcodes |
1212 that look as if we have en$\code$-ed the value as bitcodes: |
1228 %that look as if we have en$\code$-ed the value as bitcodes: |
1213 \begin{lemma} |
1229 %\begin{lemma} |
1214 $\vdash v : r \implies \retrieve \; (r)^\uparrow \; v = \code \; v$ |
1230 % $\vdash v : r \implies \retrieve \; (r)^\uparrow \; v = \code \; v$ |
1215 \end{lemma} |
1231 %\end{lemma} |
1216 \begin{proof} |
1232 %\begin{proof} |
1217 By induction on $r$. |
1233 % By induction on $r$. |
1218 The star case uses lemma \ref{retrieveEncodeSTARS}. |
1234 % The star case uses lemma \ref{retrieveEncodeSTARS}. |
1219 \end{proof} |
1235 %\end{proof} |
1220 \noindent |
1236 %\noindent |
1221 The following property of $\retrieve$ is crucial, as |
1237 The following property of $\retrieve$ is crucial, as |
1222 it provides a "bridge" between $a_0$ and $a_n $ in the |
1238 it provides a "bridge" between $a_0$ and $a_n $ in the |
1223 lexing diagram by linking adjacent regular expressions $a_i$ and |
1239 lexing diagram by linking adjacent regular expressions $a_i$ and |
1224 $a_{i+1}$. |
1240 $a_{i+1}$. |
1225 The property says that one |
1241 The property says that one |
1226 can retrieve the same bits from a derivative |
1242 can retrieve the same bits from a derivative |
1227 regular expression as those from |
1243 regular expression as those from |
1228 before the derivative took place, |
1244 before the derivative took place, |
1229 provided that the corresponding values are used respectively: |
1245 provided that the corresponding values are used respectively: |
1230 \begin{lemma}\label{retrieveStepwise} |
1246 \begin{lemma}\label{retrieveStepwise} |
1231 $\vdash v : (r\backslash c) \implies \retrieve \; (r \backslash c) \; v= \retrieve \; r \; (\inj \; r\; c\; v)$ |
1247 $\vdash v : (a\backslash c)_\downarrow \implies \retrieve \; (a \backslash c) \; v= |
|
1248 \retrieve \; a \; (\inj \; a_\downarrow\; c\; v)$ |
1232 \end{lemma} |
1249 \end{lemma} |
1233 \begin{proof} |
1250 \begin{proof} |
1234 We do an induction on $r$, generalising over $v$. |
1251 We do an induction on $r$, generalising over $v$. |
1235 The induction principle in our formalisation |
1252 The induction principle in our formalisation |
1236 is function $\erase$'s cases. |
1253 is function $\erase$'s cases. |
1240 information being preserved and recoverable throughout each lexing step. |
1257 information being preserved and recoverable throughout each lexing step. |
1241 We shall see in the next chapter that this no longer |
1258 We shall see in the next chapter that this no longer |
1242 holds with simplifications which takes away certain sub-expressions |
1259 holds with simplifications which takes away certain sub-expressions |
1243 corresponding to non-POSIX lexical values. |
1260 corresponding to non-POSIX lexical values. |
1244 |
1261 |
1245 Before we move on to the next |
1262 %Before we move on to the next |
1246 helper function, we rewrite $\blexer$ in |
1263 %helper function, we rewrite $\blexer$ in |
1247 the following way which makes later proofs more convenient: |
1264 %the following way which makes later proofs more convenient: |
1248 \begin{lemma}\label{blexer_retrieve} |
1265 %\begin{lemma}\label{blexer_retrieve} |
1249 $\blexer \; r \; s = \decode \; (\retrieve \; (\internalise \; r) \; (\mkeps \; (r \backslash s) )) \; r$ |
1266 %$\blexer \; r \; s = \decode \; (\retrieve \; (\internalise \; r) \; (\mkeps \; (r \backslash s) )) \; r$ |
1250 \end{lemma} |
1267 %\end{lemma} |
1251 \begin{proof} |
1268 %\begin{proof} |
1252 Using lemma \ref{bmkepsRetrieve}. |
1269 % Using lemma \ref{bmkepsRetrieve}. |
1253 \end{proof} |
1270 %\end{proof} |
1254 \noindent |
1271 %\noindent |
1255 The function $\retrieve$ allows connecting |
1272 The function $\retrieve$ allows connecting |
1256 between the intermediate |
1273 between the intermediate |
1257 results $a_i$ and $a_{i+1}$ in $\blexer$. |
1274 results $a_i$ and $a_{i+1}$ in $\blexer$. |
1258 For plain regular expressions something similar is required. |
1275 For plain regular expressions something similar is required. |
1259 |
1276 |
1349 and finally give the correctness of $\blexer$--it outputs the same result as $\lexer$: |
1366 and finally give the correctness of $\blexer$--it outputs the same result as $\lexer$: |
1350 \begin{theorem}\label{blexerCorrect} |
1367 \begin{theorem}\label{blexerCorrect} |
1351 $\blexer\; r \; s = \lexer \; r \; s$ |
1368 $\blexer\; r \; s = \lexer \; r \; s$ |
1352 \end{theorem} |
1369 \end{theorem} |
1353 \begin{proof} |
1370 \begin{proof} |
1354 From \ref{flex_retrieve} we have that |
1371 We can rewrite the expression $\blexer \; r \; s$ by using lemma \ref{bmkepsRetrieve}: |
1355 $\textit{flex} \; r \; \textit{id} \; s \; \mkeps(r \backslash s) = \blexer \; r \; s$, |
1372 \[ |
1356 which immediately implies this theorem. |
1373 \blexer \; r \; s = \decode \; (\retrieve \; (\; r^\uparrow) \; (\mkeps \; (r \backslash s) )) \; r |
|
1374 \] |
|
1375 From \ref{flex_retrieve} we have that the left-hand-side is equal to |
|
1376 \[ |
|
1377 \textit{flex} \; r \; \textit{id} \; s \; \mkeps \; (r \backslash s), |
|
1378 \] |
|
1379 which in turn equals |
|
1380 \[ |
|
1381 \lexer \; r \; s. |
|
1382 \] |
|
1383 %which immediately implies this theorem. |
1357 \end{proof} |
1384 \end{proof} |
1358 \noindent |
1385 \noindent |
1359 To piece things together and spell out the correctness |
1386 To piece things together |
1360 result of the bitcoded lexer more explicitly, |
1387 %and spell out the correctness |
|
1388 %result of the bitcoded lexer more explicitly, |
|
1389 and give the correctness result explicitly, |
1361 we use the fact from the previous chapter that |
1390 we use the fact from the previous chapter that |
1362 \[ |
1391 \[ |
1363 (r, s) \rightarrow v \;\; \textit{iff} \;\; \lexer \; r \; s =\Some \; v |
1392 (r, s) \rightarrow v \;\; \textit{iff} \;\; \lexer \; r \; s =\Some \; v |
1364 \quad \quad |
1393 \quad \quad |
1365 \nexists v. (r, s) \rightarrow v\;\; \textit{iff} \;\; \lexer \;r\;s = None |
1394 \nexists v. (r, s) \rightarrow v\;\; \textit{iff} \;\; \lexer \;r\;s = None |
1366 \] |
1395 \] |
1367 to obtain this corollary: |
1396 to obtain this corollary: |
1368 \begin{corollary}\label{blexerPOSIX} |
1397 \begin{corollary}\label{blexerPOSIX} |
1369 The $\blexer$ function correctly implements a POSIX lexer, namely\\ |
1398 The $\blexer$ function correctly implements a POSIX lexer, namely |
1370 $(r, s) \rightarrow v \;\; \textit{iff} \;\; \blexer \; r \; s = \Some \; v$\\ |
1399 \begin{itemize} |
|
1400 \item |
|
1401 $(r, s) \rightarrow v \;\; \textit{iff} \;\; \blexer \; r \; s = \Some \; v$ |
|
1402 \item |
1371 $\nexists v. (r, s) \rightarrow v\;\; \textit{iff} \;\; \blexer \;r\;s = None$ |
1403 $\nexists v. (r, s) \rightarrow v\;\; \textit{iff} \;\; \blexer \;r\;s = None$ |
|
1404 \end{itemize} |
1372 \end{corollary} |
1405 \end{corollary} |
1373 Our main reason for analysing the bit-coded algorithm over |
1406 Our main reason for analysing the bit-coded algorithm over |
1374 the injection-based one is that it allows us to define |
1407 the injection-based one is that it allows us to define |
1375 more aggressive simplifications. |
1408 more aggressive simplifications. |
1376 We will elaborate on this in the next chapter. |
1409 We will elaborate on this in the next chapter. |