9 %its correctness proof in |
9 %its correctness proof in |
10 %Chapter 3\ref{Chapter3}. |
10 %Chapter 3\ref{Chapter3}. |
11 In this chapter, we are going to describe the bit-coded algorithm |
11 In this chapter, we are going to describe the bit-coded algorithm |
12 introduced by Sulzmann and Lu \parencite{Sulzmann2014} to address the growth problem of |
12 introduced by Sulzmann and Lu \parencite{Sulzmann2014} to address the growth problem of |
13 regular expressions. |
13 regular expressions. |
|
14 We have implemented their algorithm in Scala, and found out inefficiencies |
|
15 in their algorithm such as de-duplication not working properly and redundant |
|
16 fixpoint construction. Their algorithm is improved and verified with the help of |
|
17 formal proofs. |
14 \section{Bit-coded Algorithm} |
18 \section{Bit-coded Algorithm} |
15 The lexer algorithm in Chapter \ref{Inj}, as shown in \ref{InjFigure}, |
19 We first do a recap of what was going on |
|
20 in the lexer algorithm in Chapter \ref{Inj}, |
|
21 \begin{center} |
|
22 \begin{tabular}{lcl} |
|
23 $\lexer \; r \; [] $ & $=$ & $\textit{if} \; (\nullable \; r)\; \textit{then}\; \Some(\mkeps \; r) \; \textit{else} \; \None$\\ |
|
24 $\lexer \; r \;c::s$ & $=$ & $\textit{case}\; (\lexer \; (r\backslash c) \; s) \;\textit{of}\; $\\ |
|
25 & & $\quad \phantom{\mid}\; \None \implies \None$\\ |
|
26 & & $\quad \mid \Some(v) \implies \Some(\inj \; r\; c\; v)$ |
|
27 \end{tabular} |
|
28 \end{center} |
|
29 \noindent |
|
30 The algorithm recursively calls $\lexer$ on |
|
31 each new character input, |
|
32 and before starting a child call |
|
33 it stores information of previous lexing steps |
|
34 on a stack, in the form of regular expressions |
|
35 and characters: $r_0$, $c_0$, $r_1$, $c_1$, etc. |
|
36 Each descent into deeper recursive calls in $\lexer$ |
|
37 causes a new pair of $r_i, c_i$ to be pushed to the call stack. |
|
38 \begin{figure} |
|
39 \begin{tikzpicture}[->,>=stealth',shorten >=1pt,auto,thick] |
|
40 %\draw (-6,-6) grid (6,6); |
|
41 \node [ circle ] (r) at (-6, 5) {$r$}; |
|
42 |
|
43 %\node (-4, 6) (c1) circle [radius = 0.3] {$c_1$}; |
|
44 \node [circle, minimum size = 0.1, draw] (c1) at (-4, 5.4) {$c_1$}; |
|
45 % |
|
46 %\node (-2, 5) (r1) circle [radius = 0.5] {$r_1$}; |
|
47 \node [minimum size = 0.5, circle, draw] (r1) at (-2, 5) {$r_1$}; |
|
48 |
|
49 \node [minimum width = 2cm, rectangle, draw] (stack) at (0, 3) {Stack}; |
|
50 |
|
51 \path |
|
52 (r) |
|
53 edge [->, >=stealth',shorten >=1pt] node[left] {} (r1); |
|
54 |
|
55 \path (r1) |
|
56 edge [bend right, dashed] node {saved} (stack); |
|
57 \path (c1) |
|
58 edge [bend right, dashed] node {} (stack); |
|
59 |
|
60 |
|
61 \end{tikzpicture} |
|
62 \caption{First Derivative Taken} |
|
63 \end{figure} |
|
64 |
|
65 |
|
66 |
|
67 \begin{figure} |
|
68 \begin{tikzpicture}[->,>=stealth',shorten >=1pt,auto,thick] |
|
69 %\draw (-6,-6) grid (6,6); |
|
70 \node [ circle ] (r) at (-6, 5) {$r$}; |
|
71 |
|
72 %\node (-4, 6) (c1) circle [radius = 0.3] {$c_1$}; |
|
73 \node [circle, minimum size = 0.1, ] (c1) at (-4, 5.4) {$c_1$}; |
|
74 % |
|
75 %\node (-2, 5) (r1) circle [radius = 0.5] {$r_1$}; |
|
76 \node [minimum size = 0.5, circle, ] (r1) at (-2, 5) {$r_1$}; |
|
77 |
|
78 |
|
79 \node [circle, minimum size = 0.1, draw] (c2) at (0, 5.4) {$c_2$}; |
|
80 % |
|
81 %\node (2, 5) (r2) circle [radius = 0.5] {$r_2$}; |
|
82 \node [circle, draw] (r2) at (2, 5) {$r_2$}; |
|
83 \node [minimum width = 3cm, minimum height = 1cm, rectangle, draw] (stack) at (0, 2) {\large Stack}; |
|
84 |
|
85 \path |
|
86 (r) |
|
87 edge [->, >=stealth',shorten >=1pt] node[left] {} (r1); |
|
88 |
|
89 \path (r2) |
|
90 edge [bend right, dashed] node {} (stack); |
|
91 \path (c2) |
|
92 edge [bend right, dashed] node {} (stack); |
|
93 |
|
94 \path (r1) |
|
95 edge [] node {} (r2); |
|
96 |
|
97 \end{tikzpicture} |
|
98 \caption{Second Derivative Taken} |
|
99 \end{figure} |
|
100 \noindent |
|
101 As the number of derivative steps increase, |
|
102 the stack would increase: |
|
103 |
|
104 \begin{figure} |
|
105 \begin{tikzpicture}[->,>=stealth',shorten >=1pt,auto,thick] |
|
106 %\draw (-6,-6) grid (6,6); |
|
107 \node [ circle ] (r) at (-6, 5) {$r$}; |
|
108 |
|
109 %\node (-4, 6) (c1) circle [radius = 0.3] {$c_1$}; |
|
110 \node [circle, minimum size = 0.1, ] (c1) at (-4, 5.4) {$c_1$}; |
|
111 % |
|
112 %\node (-2, 5) (r1) circle [radius = 0.5] {$r_1$}; |
|
113 \node [minimum size = 0.5, circle, ] (r1) at (-2, 5) {$r_1$}; |
|
114 |
|
115 |
|
116 \node [circle, minimum size = 0.1, ] (c2) at (0, 5.4) {$c_2$}; |
|
117 % |
|
118 %\node (2, 5) (r2) circle [radius = 0.5] {$r_2$}; |
|
119 \node [circle, ] (r2) at (2, 5) {$r_2$}; |
|
120 \node [minimum width = 4cm, minimum height = 2.5cm, rectangle, draw] (stack) at (0, 1) { \large Stack}; |
|
121 |
|
122 \node [] (ldots) at (3.5, 5) {}; |
|
123 %\node (6, 5) (rn) circle [radius = 0.5] {$r_n$}; |
|
124 |
|
125 \node [minimum size = 0.5, circle, ] (rn) at (6, 5) {}; |
|
126 |
|
127 \node (rldots) at ($(ldots)!.4!(rn)$) {\ldots}; |
|
128 |
|
129 \path |
|
130 (r) |
|
131 edge [->, >=stealth',shorten >=1pt] node[left] {} (r1); |
|
132 |
|
133 \path (rldots) |
|
134 edge [bend left, dashed] node {} (stack); |
|
135 |
|
136 \path (r1) |
|
137 edge [] node {} (r2); |
|
138 |
|
139 \path (r2) |
|
140 edge [] node {} (ldots); |
|
141 \path (ldots) |
|
142 edge [bend left, dashed] node {} (stack); |
|
143 \path (5.03, 4.9) |
|
144 edge [bend left, dashed] node {} (stack); |
|
145 \end{tikzpicture} |
|
146 \caption{More Derivatives Taken} |
|
147 \end{figure} |
|
148 \noindent |
|
149 After all derivatives have been taken, the stack grows to a maximum size |
|
150 and the pair of regular expressions and characters $r_i, c_{i+1}$ |
|
151 are then popped out and used in the injection phase: |
|
152 |
|
153 \begin{tikzpicture}[->,>=stealth',shorten >=1pt,auto,thick] |
|
154 %\draw (-6,-6) grid (6,6); |
|
155 \node [radius = 0.5, circle, draw] (r) at (-6, 5) {$r$}; |
|
156 |
|
157 %\node (-4, 6) (c1) circle [radius = 0.3] {$c_1$}; |
|
158 \node [circle, minimum size = 0.1, draw] (c1) at (-4, 5.4) {$c_1$}; |
|
159 % |
|
160 %\node (-2, 5) (r1) circle [radius = 0.5] {$r_1$}; |
|
161 \node [minimum size = 0.5, circle, draw] (r1) at (-2, 5) {$r_1$}; |
|
162 % |
|
163 %\node (0, 6) (c2) circle [radius = 0.3] {$c_2$}; |
|
164 \node [circle, minimum size = 0.1, draw] (c2) at (0, 5.4) {$c_2$}; |
|
165 % |
|
166 %\node (2, 5) (r2) circle [radius = 0.5] {$r_2$}; |
|
167 \node [circle, draw] (r2) at (2, 5) {$r_2$}; |
|
168 % |
|
169 % |
|
170 \node [] (ldots) at (4.5, 5) {}; |
|
171 %\node (6, 5) (rn) circle [radius = 0.5] {$r_n$}; |
|
172 |
|
173 \node [minimum size = 0.5, circle, draw] (rn) at (6, 5) {$r_n$}; |
|
174 |
|
175 \node at ($(ldots)!.4!(rn)$) {\ldots}; |
|
176 |
|
177 \node [minimum size = 0.5, circle, draw] (vn) at (6, -5) {$v_n$}; |
|
178 |
|
179 \node [] (ldots2) at (3.5, -5) {}; |
|
180 |
|
181 \node [minimum size = 0.5, circle, draw] (v2) at (2, -5) {$v_2$}; |
|
182 |
|
183 \node at ($(ldots2)!.4!(v2)$) {\ldots}; |
|
184 |
|
185 |
|
186 \node [circle, draw] (v1) at (-2, -5) {$v_1$}; |
|
187 |
|
188 \node [radius = 0.5, circle, draw] (v) at (-6, -5) {$v$}; |
|
189 |
|
190 \node [minimum size = 6cm, rectangle, draw] (stack) at (0, 0) {\Huge Stack}; |
|
191 |
|
192 \path |
|
193 (r) |
|
194 edge [->, >=stealth',shorten >=1pt] node[left] {} (r1); |
|
195 \path |
|
196 (r1) |
|
197 edge [] node {} (r2); |
|
198 \path (r2) |
|
199 edge [] node {} (ldots); |
|
200 \path (rn) |
|
201 edge [] node {$\mkeps$} (vn); |
|
202 \path (vn) |
|
203 edge [] node {} (ldots2); |
|
204 \path (v2) |
|
205 edge [] node {$\inj \; r_1 \; c_2\;v_2$} (v1); |
|
206 |
|
207 \path (v1) |
|
208 edge [] node {$\inj \; r \; c_1 \; v_1$} (v); |
|
209 \path (r) |
|
210 edge [dashed, bend right] node {} (stack); |
|
211 |
|
212 \path (r1) |
|
213 edge [dashed, ] node {} (stack); |
|
214 |
|
215 \path (c1) |
|
216 edge [dashed, bend right] node {} (stack); |
|
217 |
|
218 \path (c2) |
|
219 edge [dashed] node {} (stack); |
|
220 \path (4.5, 5) |
|
221 edge [dashed, bend left] node {} (stack); |
|
222 \path (4.9, 5) |
|
223 edge [dashed, bend left] node {} (stack); |
|
224 \path (5.3, 5) |
|
225 edge [dashed, bend left] node {} (stack); |
|
226 \path (r2) |
|
227 edge [dashed, ] node {} (stack); |
|
228 \path (rn) |
|
229 edge [dashed, bend left] node {} (stack); |
|
230 \end{tikzpicture} |
|
231 %\begin{tikzpicture}[->,>=stealth',shorten >=1pt,auto,thick] |
|
232 %%\draw (-6,-6) grid (6,6); |
|
233 %\node [radius = 0.5, circle, draw] (r) at (-6, 5) {$r$}; |
|
234 % |
|
235 %%\node (-4, 6) (c1) circle [radius = 0.3] {$c_1$}; |
|
236 %\node [circle, minimum size = 0.1] (c1) at (-4, 5.4) {$c_1$}; |
|
237 %% |
|
238 %%\node (-2, 5) (r1) circle [radius = 0.5] {$r_1$}; |
|
239 %\node [minimum size = 0.5, circle, draw] (r1) at (-2, 5) {$r_1$}; |
|
240 %% |
|
241 %%\node (0, 6) (c2) circle [radius = 0.3] {$c_2$}; |
|
242 %\node [circle, minimum size = 0.1] (c2) at (0, 5.4) {$c_2$}; |
|
243 %% |
|
244 %%\node (2, 5) (r2) circle [radius = 0.5] {$r_2$}; |
|
245 %\node [circle, draw] (r2) at (2, 5) {$r_2$}; |
|
246 %% |
|
247 %% |
|
248 %\node [] (ldots) at (4.5, 5) {}; |
|
249 %%\node (6, 5) (rn) circle [radius = 0.5] {$r_n$}; |
|
250 % |
|
251 %\node [minimum size = 0.5, circle, draw] (rn) at (6, 5) {$r_n$}; |
|
252 % |
|
253 %\node at ($(ldots)!.4!(rn)$) {\ldots}; |
|
254 % |
|
255 %\node [minimum size = 0.5, circle, draw] (vn) at (6, -5) {$v_n$}; |
|
256 % |
|
257 %\node [] (ldots2) at (3.5, -5) {}; |
|
258 % |
|
259 %\node [minimum size = 0.5, circle, draw] (v2) at (2, -5) {$v_2$}; |
|
260 % |
|
261 %\node at ($(ldots2)!.4!(v2)$) {\ldots}; |
|
262 % |
|
263 % |
|
264 %\node [circle, draw] (v1) at (-2, -5) {$v_1$}; |
|
265 % |
|
266 %\node [radius = 0.5, circle, draw] (v) at (-6, -5) {$v$}; |
|
267 % |
|
268 %\node [minimum size = 6cm, rectangle, draw] (stack) at (0, 0) {\Huge Stack}; |
|
269 % |
|
270 %\path |
|
271 % (r) |
|
272 % edge [->, >=stealth',shorten >=1pt] node[left] {} (r1); |
|
273 %\path |
|
274 % (r1) |
|
275 % edge [] node {} (r2); |
|
276 %\path (r2) |
|
277 % edge [] node {} (ldots); |
|
278 %\path (rn) |
|
279 % edge [] node {$\mkeps$} (vn); |
|
280 %\path (vn) |
|
281 % edge [] node {} (ldots2); |
|
282 %\path (v2) |
|
283 % edge [] node {} (v1); |
|
284 % |
|
285 %\path (v1) |
|
286 % edge [] node {} (v); |
|
287 %\path (r) |
|
288 % edge [] node {saved} (stack); |
|
289 % |
|
290 %\path (r1) |
|
291 % edge [] node {saved} (stack); |
|
292 %\end{tikzpicture} |
|
293 \noindent |
|
294 The information stored in characters and regular expressions |
|
295 make the algorithm work in an elegant way, at the expense of being |
|
296 storing quite a bit of verbose information. |
|
297 |
|
298 |
|
299 |
|
300 The lexer algorithm in Chapter \ref{Inj}, |
16 stores information of previous lexing steps |
301 stores information of previous lexing steps |
17 on a stack, in the form of regular expressions |
302 on a stack, in the form of regular expressions |
18 and characters: $r_0$, $c_0$, $r_1$, $c_1$, etc. |
303 and characters: $r_0$, $c_0$, $r_1$, $c_1$, etc. |
|
304 The red part represents what we already know during the first |
|
305 derivative phase, |
|
306 and the blue part represents the unknown part of input. |
19 \begin{ceqn} |
307 \begin{ceqn} |
20 \begin{equation}%\label{graph:injLexer} |
308 \begin{equation}%\label{graph:injLexer} |
21 \begin{tikzcd}[ampersand replacement=\&, execute at end picture={ |
309 \begin{tikzcd}[ampersand replacement=\&, execute at end picture={ |
22 \begin{scope}[on background layer] |
310 \begin{scope}[on background layer] |
23 \node[rectangle, fill={red!30}, |
311 \node[rectangle, fill={red!30}, |