250 for a formal proof, we need something |
267 for a formal proof, we need something |
251 stronger than language equivalence. |
268 stronger than language equivalence. |
252 With the help of $\sflat{\_}$ we can state the equation in Indianpaper |
269 With the help of $\sflat{\_}$ we can state the equation in Indianpaper |
253 more rigorously: |
270 more rigorously: |
254 \begin{lemma} |
271 \begin{lemma} |
255 $\sflat{(r_1 \cdot r_2) \backslash s} = \AALTS{[[]}{ (r_1 \backslash s) \cdot r_2 :: (\map (r_2 \backslash \_) (\vsuf{s}{r_1}))}$ |
272 $\sflat{(r_1 \cdot r_2) \backslash s} = \RALTS{ (r_1 \backslash s) \cdot r_2 :: (\map (r_2 \backslash \_) (\vsuf{s}{r_1}))}$ |
256 \end{lemma} |
273 \end{lemma} |
257 With this property we can prove the finiteness of the size of the regex $(r_1 \cdot r_2) \backslash s$, |
274 |
258 much like a recursive function's termination proof. |
275 The function $\vsuf{\_}{\_}$ is defined recursively on the structure of the string: |
259 The function $\vsuf{\_}{\_}$ is defined this way: |
276 |
260 %TODO: DEF of vsuf |
|
261 \begin{center} |
277 \begin{center} |
262 \begin{tabular}{lcl} |
278 \begin{tabular}{lcl} |
263 $\vsuf{[]}{\_} $ & $=$ & $[]$\\ |
279 $\vsuf{[]}{\_} $ & $=$ & $[]$\\ |
264 $\vsuf{c::cs}{r_1}$ & $ =$ & $ \textit{if} (\rnullable{r_1}) \textit{then} (\vsuf{cs}{(rder c r1)}) @ [c :: cs]$\\ |
280 $\vsuf{c::cs}{r_1}$ & $ =$ & $ \textit{if} (\rnullable{r_1}) \textit{then} \; (\vsuf{cs}{(\rder{c}{r_1})}) @ [c :: cs]$\\ |
265 && $\textit{else} (\vsuf{cs}{rder c r1}) $ |
281 && $\textit{else} \; (\vsuf{cs}{(\rder{c}{r_1}) }) $ |
266 \end{tabular} |
282 \end{tabular} |
267 \end{center} |
283 \end{center} |
268 It takes into account which prefixes $s'$ of $s$ would make $r \backslash s'$ nullable, |
284 It takes into account which prefixes $s'$ of $s$ would make $r \backslash s'$ nullable, |
269 and keep a list of suffixes $s''$ such that $s' @ s'' = s$. The list is sorted in |
285 and keep a list of suffixes $s''$ such that $s' @ s'' = s$. The list is sorted in |
270 the order $r_2\backslash s''$ appears in $(r_1\cdot r_2)\backslash s$. |
286 the order $r_2\backslash s''$ appears in $(r_1\cdot r_2)\backslash s$. |
271 In essence, $\vsuf{\_}{\_}$ is doing a "virtual derivative" of $r$, but instead of producing |
287 In essence, $\vsuf{\_}{\_}$ is doing a "virtual derivative" of $r_1 \cdot r_2$, but instead of producing |
272 the entire result of $(r_1 \cdot r_2) \backslash s$, |
288 the entire result of $(r_1 \cdot r_2) \backslash s$, |
273 it only stores all the $r_2 \backslash s''$ terms. |
289 it only stores all the $s''$ such that $r_2 \backslash s''$ are occurring terms in $(r_1\cdot r_2)\backslash s$. |
274 |
290 |
275 With this we can also add simplifications to both sides and get |
291 With this we can also add simplifications to both sides and get |
276 \begin{lemma} |
292 \begin{lemma} |
277 $\rsimp{\sflat{(r_1 \cdot r_2) \backslash s} }= \rsimp{\AALTS{[[]}{ (r_1 \backslash s) \cdot r_2 :: (\map (r_2 \backslash \_) (\vsuf{s}{r_1}))}}$ |
293 $\rsimp{\sflat{(r_1 \cdot r_2) \backslash s} }= \rsimp{\AALTS{[[]}{ (r_1 \backslash s) \cdot r_2 :: (\map (r_2 \backslash \_) (\vsuf{s}{r_1}))}}$ |
278 \end{lemma} |
294 \end{lemma} |
279 Together with the idempotency property of $\rsimp$, |
295 Together with the idempotency property of $\rsimp$, |
280 %TODO: state the idempotency property of rsimp |
296 %TODO: state the idempotency property of rsimp |
|
297 \begin{lemma} |
|
298 $\rsimp(r) = \rsimp (\rsimp(r))$ |
|
299 \end{lemma} |
281 it is possible to convert the above lemma to obtain a "closed form" |
300 it is possible to convert the above lemma to obtain a "closed form" |
282 for our lexer's intermediate result without bitcodes: |
301 for derivatives nested with simplification: |
283 \begin{lemma} |
302 \begin{lemma} |
284 $\rderssimp{\sflat{(r_1 \cdot r_2) \backslash s} }= \rsimp{\AALTS{[[]}{ (r_1 \backslash s) \cdot r_2 :: (\map (r_2 \backslash \_) (\vsuf{s}{r_1}))}}$ |
303 $\rderssimp{(r_1 \cdot r_2)}{s} = \rsimp{\AALTS{[[]}{ (r_1 \backslash s) \cdot r_2 :: (\map (r_2 \backslash \_) (\vsuf{s}{r_1}))}}$ |
285 \end{lemma} |
304 \end{lemma} |
286 And now the reason we have (2) in section 1 is clear: |
305 And now the reason we have (1) in section 1 is clear: |
287 $\rsize{\rsimp{\AALTS{[[]}{ (r_1 \backslash s) \cdot r_2 :: (\map (r_2 \backslash \_) (\vsuf{s}{r_1}))}}}$ |
306 $\rsize{\rsimp{\RALTS{ (r_1 \backslash s) \cdot r_2 :: (\map \;(r_2 \backslash \_) \; (\vsuf{s}{r_1}))}}}$, |
288 is bounded by $\llbracket\textit{\distinctWith}\,((\ASEQ{nil}{\bderssimp{r_1}{s}}{r_2}$) :: |
307 is equal to $\rsize{\rsimp{\RALTS{ ((r_1 \backslash s) \cdot r_2 :: (\map \; (r_2 \backslash \_) \; (\textit{Suffix}(r1, s))))}}}$ |
289 [$\bderssimp{ r_2}{s'} \;|\; s' \in \textit{Suffix}( r_1, s)])\,\approx\;{}\rrbracket + 1 $ , |
308 as $\vsuf{s}{r_1}$ is one way of expressing the list $\textit{Suffix}( r_1, s)$. |
290 as $\vsuf{s}{r_1}$ is a subset of s $\textit{Suffix}( r_1, s)])$. |
|
291 |
309 |
292 %---------------------------------------------------------------------------------------- |
310 %---------------------------------------------------------------------------------------- |
293 % SECTION 3 |
311 % SECTION 3 |
294 %---------------------------------------------------------------------------------------- |
312 %---------------------------------------------------------------------------------------- |
295 |
313 |
296 \section{interaction between $\distinctWith$ and $\flts$} |
314 \section{interaction between $\distinctWith$ and $\flts$} |
297 Note that it is not immediately obvious that |
315 Note that it is not immediately obvious that |
298 $\llbracket \distinctWith (\flts \textit{rs}) = \phi \rrbracket \leq \llbracket \distinctWith \textit{rs} = \phi \rrbracket $ |
316 \begin{center} |
299 |
317 $\llbracket \distinctWith (\flts \textit{rs}) = \phi \rrbracket \leq \llbracket \distinctWith \textit{rs} = \phi \rrbracket $. |
300 Sed ullamcorper quam eu nisl interdum at interdum enim egestas. Aliquam placerat justo sed lectus lobortis ut porta nisl porttitor. Vestibulum mi dolor, lacinia molestie gravida at, tempus vitae ligula. Donec eget quam sapien, in viverra eros. Donec pellentesque justo a massa fringilla non vestibulum metus vestibulum. Vestibulum in orci quis felis tempor lacinia. Vivamus ornare ultrices facilisis. Ut hendrerit volutpat vulputate. Morbi condimentum venenatis augue, id porta ipsum vulputate in. Curabitur luctus tempus justo. Vestibulum risus lectus, adipiscing nec condimentum quis, condimentum nec nisl. Aliquam dictum sagittis velit sed iaculis. Morbi tristique augue sit amet nulla pulvinar id facilisis ligula mollis. Nam elit libero, tincidunt ut aliquam at, molestie in quam. Aenean rhoncus vehicula hendrerit. |
318 \end{center} |
|
319 The intuition is that if we remove duplicates from the $\textit{LHS}$, at least the same amount of |
|
320 duplicates will be removed from the list $\textit{rs}$ in the $\textit{RHS}$. |
|
321 |
|
322 %---------------------------------------------------------------------------------------- |
|
323 % SECTION 4 |
|
324 %---------------------------------------------------------------------------------------- |
|
325 \section{a bound for the star regular expression} |
|
326 We have shown how to control the size of the sequence regular expression $r_1\cdot r_2$ using |
|
327 the "closed form" of $(r_1 \cdot r_2) \backslash s$ and then |
|
328 the property of the $\distinct$ function. |
|
329 Now we try to get a bound on $r^* \backslash s$ as well. |
|
330 Again, we first look at how a star's derivatives evolve, if they grow maximally: |
|
331 \begin{center} |
|
332 |
|
333 $r^* \quad \longrightarrow_{\backslash c} \quad (r\backslash c) \cdot r^* \quad \longrightarrow_{\backslash c'} \quad |
|
334 r \backslash cc' \cdot r^* + r \backslash c' \cdot r^* \quad \longrightarrow_{\backslash c''} \quad |
|
335 (r_1 \backslash cc'c'' \cdot r^* + r \backslash c'') + (r \backslash c'c'' \cdot r^* + r \backslash c'' \cdot r^*) \quad \longrightarrow_{\backslash c'''} |
|
336 \quad \ldots$ |
|
337 |
|
338 \end{center} |
|
339 When we have a string $s = c :: c' :: c'' \ldots$ such that $r \backslash c$, $r \backslash cc'$, $r \backslash c'$, |
|
340 $r \backslash cc'c''$, $r \backslash c'c''$, $r\backslash c''$ etc. are all nullable, |
|
341 the number of terms in $r^* \backslash s$ will grow exponentially, causing the size |
|
342 of the derivatives $r^* \backslash s$ to grow exponentially, even if we do not |
|
343 count the possible size explosions of $r \backslash c$ themselves. |
|
344 |
|
345 Thanks to $\flts$ and $\distinctWith$, we are able to open up regexes like |
|
346 $(r_1 \backslash cc'c'' \cdot r^* + r \backslash c'') + (r \backslash c'c'' \cdot r^* + r \backslash c'' \cdot r^*) $ |
|
347 into $\RALTS{[r_1 \backslash cc'c'' \cdot r^*, r \backslash c'', r \backslash c'c'' \cdot r^*, r \backslash c'' \cdot r^*]}$ |
|
348 and then de-duplicate terms of the form $r\backslash s' \cdot r^*$ ($s'$ being a substring of $s$). |
|
349 For this we define $\hflataux{\_}$ and $\hflat{\_}$, similar to $\sflataux{\_}$ and $\sflat{\_}$: |
|
350 %TODO: definitions of and \hflataux \hflat |
|
351 \begin{center} |
|
352 \begin{tabular}{ccc} |
|
353 $\hflataux{r_1 + r_2}$ & $=$ & $\hflataux{r_1} @ \hflataux{r_2}$\\ |
|
354 $\hflataux r$ & $=$ & $ [r]$ |
|
355 \end{tabular} |
|
356 \end{center} |
|
357 |
|
358 \begin{center} |
|
359 \begin{tabular}{ccc} |
|
360 $\hflat{r_1 + r_2}$ & $=$ & $\RALTS{\hflataux{r_1} @ \hflataux{r_2}}$\\ |
|
361 $\hflat r$ & $=$ & $ r$ |
|
362 \end{tabular} |
|
363 \end{center} |
|
364 Again these definitions are tailor-made for dealing with alternatives that have |
|
365 originated from a star's derivatives, so we don't attempt to open up all possible |
|
366 regexes of the form $\RALTS{rs}$, where $\textit{rs}$ might not contain precisely 2 |
|
367 elements. |
|
368 We give a predicate for such "star-created" regular expressions: |
|
369 \begin{center} |
|
370 \begin{tabular}{lcr} |
|
371 & & $\createdByStar{\RSEQ{ra}{\RSTAR{rb}}}$\\ |
|
372 $\createdByStar{r_1} \land \createdByStar{r_2} $ & $ \Longrightarrow$ & $\createdByStar{(r_1 + r_2)}$ |
|
373 \end{tabular} |
|
374 \end{center} |
|
375 |
|
376 One might wonder the actual bound rather than the loose bound we gave |
|
377 for the convenience of a easier proof. |
|
378 How much can the regex $r^* \backslash s$ grow? As hinted earlier, |
|
379 they can grow at a maximum speed |
|
380 of exponential $w.r.t$ the number of characters. |
|
381 But they will eventually level off when the string $s$ is long enough, |
|
382 as suggested by the finiteness bound proof. |
|
383 |
|
384 %TODO: give out a graph showing how the size of the regex grows and levels off at a size exponential to the original regex |
|
385 |
|
386 |