206 \begin{itemize} |
206 \begin{itemize} |
207 \item |
207 \item |
208 We first introduce the operations such as |
208 We first introduce the operations such as |
209 derivatives, simplification, size calculation, etc. |
209 derivatives, simplification, size calculation, etc. |
210 associated with $\rrexp$s, which we have introduced |
210 associated with $\rrexp$s, which we have introduced |
211 in chapter \ref{Bitcoded2}. |
211 in chapter \ref{Bitcoded2}. As promised we will discuss |
|
212 why they are needed in \ref{whyRerase}. |
212 The operations on $\rrexp$s are identical to those on |
213 The operations on $\rrexp$s are identical to those on |
213 annotated regular expressions except that they dispense with |
214 annotated regular expressions except that they dispense with |
214 bitcodes. This means that all proofs about size of $\rrexp$s will apply to |
215 bitcodes. This means that all proofs about size of $\rrexp$s will apply to |
215 annotated regular expressions, because the size of a regular |
216 annotated regular expressions, because the size of a regular |
216 expression is independent of the bitcodes. |
217 expression is independent of the bitcodes. |
279 $\rerase{_{bs}\sum as}$ & $\dn$ & $\RALTS{\map \; \rerase{\_} \; as}$\\ |
280 $\rerase{_{bs}\sum as}$ & $\dn$ & $\RALTS{\map \; \rerase{\_} \; as}$\\ |
280 $\rerase{_{bs} a ^*}$ & $\dn$ & $\rerase{a} ^*$ |
281 $\rerase{_{bs} a ^*}$ & $\dn$ & $\rerase{a} ^*$ |
281 \end{tabular} |
282 \end{tabular} |
282 \end{center} |
283 \end{center} |
283 |
284 |
284 \subsection{Why a New Datatype?} |
285 \subsection{Why a New Datatype?}\label{whyRerase} |
285 The reason we |
286 \marginpar{\em added label so this section can be referenced by other parts of the thesis |
286 define a new datatype is that |
287 so that interested readers can jump to/be reassured that there will explanations.} |
287 the $\erase$ function |
288 Originally the erase operation $(\_)_\downarrow$ was |
288 does not preserve the structure of annotated |
289 used by Ausaf et al. in their proofs related to $\blexer$. |
289 regular expressions. |
290 This function was not part of the lexing algorithm, and the sole purpose was to |
290 We initially started by using |
291 bridge the gap between the $r$ |
291 plain regular expressions and tried to prove |
292 %$\textit{rexp}$ |
292 lemma \ref{rsizeAsize}, |
293 (un-annotated) and $\textit{arexp}$ (annotated) |
293 however the $\erase$ function messes with the structure of the |
294 regular expression datatypes so as to leverage the correctness |
294 annotated regular expression. |
295 theorem of $\lexer$.%to establish the correctness of $\blexer$. |
295 The $+$ constructor |
296 For example, lemma \ref{retrieveStepwise} %and \ref{bmkepsRetrieve} |
296 of basic regular expressions is only binary, whereas $\sum$ |
297 uses $\erase$ to convert an annotated regular expression $a$ into |
297 takes a list. Therefore we need to convert between |
298 a plain one so that it can be used by $\inj$ to create the desired value |
298 annotated and normal regular expressions as follows: |
299 $\inj\; (a)_\downarrow \; c \; v$. |
|
300 |
|
301 Ideally $\erase$ should only remove the auxiliary information not related to the |
|
302 structure--the |
|
303 bitcodes. However there exists a complication |
|
304 where the alternative constructors have different arity for $\textit{arexp}$ |
|
305 and $\textit{r}$: |
299 \begin{center} |
306 \begin{center} |
300 \begin{tabular}{lcl} |
307 \begin{tabular}{lcl} |
301 $\erase \; _{bs}\sum [] $ & $\dn$ & $\ZERO$\\ |
308 $\textit{r}$ & $::=$ & $\ldots \;|\; (\_ + \_) \; ::\; "\textit{r} \Rightarrow \textit{r} \Rightarrow \textit{r}" | \ldots$\\ |
302 $\erase \; _{bs}\sum [a]$ & $\dn$ & $a$\\ |
309 $\textit{arexp}$ & $::=$ & $\ldots\; |\; (\Sigma \_ ) \; ::\; "\textit{arexp} \; list \Rightarrow \textit{arexp}" | \ldots$ |
303 $\erase \; _{bs}\sum a :: as$ & $\dn$ & $a + (\erase \; _{[]} \sum as)\quad \text{if $as$ length over 1}$ |
310 \end{tabular} |
304 \end{tabular} |
311 \end{center} |
305 \end{center} |
312 \noindent |
306 \noindent |
313 To convert between the two |
307 As can be seen, alternative regular expressions with an empty argument list |
314 $\erase$ has to recursively disassemble a list into nested binary applications of the |
308 will be turned into a $\ZERO$. |
315 $(\_ + \_)$ operator, |
309 The singleton alternative $\sum [r]$ becomes $r$ during the |
316 handling corner cases like empty or |
310 $\erase$ function. |
317 singleton alternative lists: |
311 The annotated regular expression $\sum[a, b, c]$ would turn into |
318 %becomes $r$ during the |
312 $(a+(b+c))$. |
319 %$\erase$ function. |
313 All these operations change the size and structure of |
320 %The annotated regular expression $\sum[a, b, c]$ would turn into |
314 an annotated regular expression, adding unnecessary |
321 %$(a+(b+c))$. |
315 complexities to the size bound proof. |
322 \begin{center} |
316 |
323 \begin{tabular}{lcl} |
|
324 $ (_{bs}\sum [])_\downarrow $ & $\dn$ & $\ZERO$\\ |
|
325 $ (_{bs}\sum [a])_\downarrow$ & $\dn$ & $a$\\ |
|
326 $ (_{bs}\sum a_1 :: a_2)_\downarrow$ & $\dn$ & $(a_1)_\downarrow + (a_2)_\downarrow)$\\ |
|
327 $ (_{bs}\sum a :: as)_\downarrow$ & $\dn$ & $a_\downarrow + (\erase \; _{[]} \sum as)$ |
|
328 \end{tabular} |
|
329 \end{center} |
|
330 \noindent |
|
331 These operations inevitably change the structure and size of |
|
332 an annotated regular expression. For example, |
|
333 $a_1 = \sum _{Z}[x]$ has size 2, but $(a_1)_\downarrow = x$ |
|
334 only has size 1. |
|
335 %adding unnecessary |
|
336 %complexities to the size bound proof. |
|
337 %The reason we |
|
338 %define a new datatype is that |
|
339 %the $\erase$ function |
|
340 %does not preserve the structure of annotated |
|
341 %regular expressions. |
|
342 %We initially started by using |
|
343 %plain regular expressions and tried to prove |
|
344 %lemma \ref{rsizeAsize}, |
|
345 %however the $\erase$ function messes with the structure of the |
|
346 %annotated regular expression. |
|
347 %The $+$ constructor |
|
348 %of basic regular expressions is only binary, whereas $\sum$ |
|
349 %takes a list. Therefore we need to convert between |
|
350 %annotated and normal regular expressions as follows: |
317 For example, if we define the size of a basic plain regular expression |
351 For example, if we define the size of a basic plain regular expression |
318 in the usual way, |
352 in the usual way, |
319 \begin{center} |
353 \begin{center} |
320 \begin{tabular}{lcl} |
354 \begin{tabular}{lcl} |
321 $\llbracket \ONE \rrbracket_p$ & $\dn$ & $1$\\ |
355 $\llbracket \ONE \rrbracket_p$ & $\dn$ & $1$\\ |
322 $\llbracket \ZERO \rrbracket_p$ & $\dn$ & $1$ \\ |
356 $\llbracket \ZERO \rrbracket_p$ & $\dn$ & $1$ \\ |
323 $\llbracket r_1 \cdot r_2 \rrbracket_p$ & $\dn$ & $\llbracket r_1 \rrbracket_p + \llbracket r_2 \rrbracket_p + 1$\\ |
357 $\llbracket r_1 + r_2 \rrbracket_p$ & $\dn$ & $\llbracket r_1 \rrbracket_p + \llbracket r_2 \rrbracket_p + 1$\\ |
324 $\llbracket \mathbf{c} \rrbracket_p $ & $\dn$ & $1$\\ |
358 $\llbracket \mathbf{c} \rrbracket_p $ & $\dn$ & $1$\\ |
325 $\llbracket r_1 \cdot r_2 \rrbracket_p $ & $\dn$ & $\llbracket r_1 \rrbracket_p \; + \llbracket r_2 \rrbracket_p + 1$\\ |
359 $\llbracket r_1 \cdot r_2 \rrbracket_p $ & $\dn$ & $\llbracket r_1 \rrbracket_p \; + \llbracket r_2 \rrbracket_p + 1$\\ |
326 $\llbracket a^* \rrbracket_p $ & $\dn$ & $\llbracket a \rrbracket_p + 1$ |
360 $\llbracket a^* \rrbracket_p $ & $\dn$ & $\llbracket a \rrbracket_p + 1$ |
327 \end{tabular} |
361 \end{tabular} |
328 \end{center} |
362 \end{center} |
330 Then the property |
364 Then the property |
331 \begin{center} |
365 \begin{center} |
332 $\llbracket a \rrbracket \stackrel{?}{=} \llbracket a_\downarrow \rrbracket_p$ |
366 $\llbracket a \rrbracket \stackrel{?}{=} \llbracket a_\downarrow \rrbracket_p$ |
333 \end{center} |
367 \end{center} |
334 does not hold. |
368 does not hold. |
335 With $\textit{rerase}$, however, |
369 %With $\textit{rerase}$, however, |
336 only the bitcodes are thrown away. |
370 %only the bitcodes are thrown away. |
337 Everything about the structure remains intact. |
371 That leads to us defining the new regular expression datatype without |
338 Therefore it does not change the size |
372 bitcodes but with a list alternative constructor, and defining a new erase function |
339 of an annotated regular expression and we have: |
373 in a strictly structure-preserving manner: |
340 \begin{lemma}\label{rsizeAsize} |
374 \begin{center} |
341 $\rsize{\rerase a} = \asize a$ |
375 \begin{tabular}{lcl} |
342 \end{lemma} |
376 $\textit{rrexp}$ & $::=$ & $\ldots\; |\; (\sum \_ ) \; ::\; "\textit{rrexp} \; list \Rightarrow \textit{rrexp}" | \ldots$\\ |
343 \begin{proof} |
377 $\rerase{_{bs}\sum as}$ & $\dn$ & $\RALTS{\map \; \rerase{\_} \; as}$\\ |
344 By routine structural induction on $a$. |
378 \end{tabular} |
345 \end{proof} |
379 \end{center} |
|
380 \noindent |
|
381 %But |
|
382 %Everything about the structure remains intact. |
|
383 %Therefore it does not change the size |
|
384 %of an annotated regular expression and we have: |
346 \noindent |
385 \noindent |
347 One might be able to prove an inequality such as |
386 One might be able to prove an inequality such as |
348 $\llbracket a \rrbracket \leq \llbracket a_\downarrow \rrbracket_p $ |
387 $\llbracket a \rrbracket \leq \llbracket a_\downarrow \rrbracket_p $ |
349 and then estimate $\llbracket a_\downarrow \rrbracket_p$, |
388 and then estimate $\llbracket a_\downarrow \rrbracket_p$, |
350 but we found our approach more straightforward.\\ |
389 but we found our approach more straightforward.\\ |
473 Everything about the size of annotated regular expressions after the application |
512 Everything about the size of annotated regular expressions after the application |
474 of function $\bsimp$ and $\backslash_{simps}$ |
513 of function $\bsimp$ and $\backslash_{simps}$ |
475 can be calculated via the size of r-regular expressions after the application |
514 can be calculated via the size of r-regular expressions after the application |
476 of $\rsimp$ and $\backslash_{rsimps}$: |
515 of $\rsimp$ and $\backslash_{rsimps}$: |
477 \begin{lemma}\label{sizeRelations} |
516 \begin{lemma}\label{sizeRelations} |
478 The following two equalities hold: |
517 The following equalities hold: |
479 \begin{itemize} |
518 \begin{itemize} |
|
519 \item |
|
520 $\rsize{\rerase a} = \asize a$ |
480 \item |
521 \item |
481 $\asize{\bsimps \; a} = \rsize{\rsimp{ \rerase{a}}}$ |
522 $\asize{\bsimps \; a} = \rsize{\rsimp{ \rerase{a}}}$ |
482 \item |
523 \item |
483 $\asize{\bderssimp{a}{s}} = \rsize{\rderssimp{\rerase{a}}{s}}$ |
524 $\asize{\bderssimp{a}{s}} = \rsize{\rderssimp{\rerase{a}}{s}}$ |
484 \end{itemize} |
525 \end{itemize} |
485 \end{lemma} |
526 \end{lemma} |
486 \begin{proof} |
527 \begin{proof} |
487 The first part is by induction on the inductive cases |
528 First part follows from the definition of $(\_)_{\downarrow_r}$. |
|
529 The second part is by induction on the inductive cases |
488 of $\textit{bsimp}$. |
530 of $\textit{bsimp}$. |
489 The second part is by induction on the string $s$, |
531 The third part is by induction on the string $s$, |
490 where the inductive step follows from part one. |
532 where the inductive step follows from part one. |
491 \end{proof} |
533 \end{proof} |
492 \noindent |
534 \noindent |
493 With lemma \ref{sizeRelations}, |
535 With lemma \ref{sizeRelations}, |
494 we will be able to focus on |
536 we will be able to focus on |