2070 to inject the character back into. |
2070 to inject the character back into. |
2071 Once the character is |
2071 Once the character is |
2072 injected back to that sub-value; |
2072 injected back to that sub-value; |
2073 $\inj$ assembles all parts |
2073 $\inj$ assembles all parts |
2074 to form a new value. |
2074 to form a new value. |
2075 \subsection{An Example of how Injection Works} |
2075 |
2076 We will provide a few examples on why |
|
2077 |
|
2078 \begin{center} |
|
2079 \begin{tabular}{lcl} |
|
2080 $\inj \; (a\cdot b)\cdot c \; \Seq \; \ONE \; b$ & $=$ & $(a+e)$\\ |
|
2081 $A$ & $B$ & $C$ |
|
2082 \end{tabular} |
|
2083 \end{center} |
|
2084 For instance, the last clause is an |
2076 For instance, the last clause is an |
2085 injection into a sequence value $v_{i+1}$ |
2077 injection into a sequence value $v_{i+1}$ |
2086 whose second child |
2078 whose second child |
2087 value is a star and the shape of the |
2079 value is a star and the shape of the |
2088 regular expression $r_i$ before injection |
2080 regular expression $r_i$ before injection |
2108 Finally, |
2100 Finally, |
2109 $\inj \; r \;c \; v$ is prepended |
2101 $\inj \; r \;c \; v$ is prepended |
2110 to the previous list of iterations and then |
2102 to the previous list of iterations and then |
2111 wrapped under the $\Stars$ |
2103 wrapped under the $\Stars$ |
2112 constructor, giving us $\Stars \; ((\inj \; r \; c \; v) ::vs)$. |
2104 constructor, giving us $\Stars \; ((\inj \; r \; c \; v) ::vs)$. |
|
2105 |
|
2106 Putting all together, Sulzmann and Lu obtained the following algorithm |
|
2107 outlined in the injection-based lexing diagram \ref{graph:inj}: |
|
2108 \begin{center} |
|
2109 \begin{tabular}{lcl} |
|
2110 $\lexer \; r \; [] $ & $=$ & $\textit{if} \; (\nullable \; r)\; \textit{then}\; \Some(\mkeps \; r) \; \textit{else} \; \None$\\ |
|
2111 $\lexer \; r \;c::s$ & $=$ & $\textit{case}\; (\lexer \; (r\backslash c) \; s) \;\textit{of}\; $\\ |
|
2112 & & $\quad \phantom{\mid}\; \None \implies \None$\\ |
|
2113 & & $\quad \mid \Some(v) \implies \Some(\inj \; r\; c\; v)$ |
|
2114 \end{tabular} |
|
2115 \end{center} |
|
2116 \noindent |
|
2117 |
|
2118 \subsection{Examples on How Injection and Lexer Works} |
|
2119 We will provide a few examples on how $\inj$ and $\lexer$ |
|
2120 works by showing their values in each recursive call on some |
|
2121 concrete examples. |
|
2122 We start with the call $\lexer \; (a+aa)^*\cdot c\; aac$ |
|
2123 on the lexer, note that the value's character constructor |
|
2124 $\Char \;c$ is abbreviated as $c$ for readability. |
|
2125 Similarly the last derivative's sub-expression |
|
2126 is abbreviated as $r_{=0}$\footnote{which is equal to |
|
2127 $((\ZERO + (\ZERO a+ \ZERO))\cdot(a+aa)^* + (\ZERO + \ZERO a)\cdot (a+aa)^*)+ |
|
2128 ((\ZERO+\ZERO a)\cdot(a+aa)^* + (\ZERO+\ZERO a)\cdot(a+aa)^*), |
|
2129 $.} |
|
2130 whose language interpretation is equivalent to that of |
|
2131 $\ZERO$ and therefore not crucial to be displayed fully expanded, |
|
2132 as they will not be injected into.%our example run. |
|
2133 \begin{center} |
|
2134 \begin{tabular}{lcl} |
|
2135 $(a+\textcolor{magenta}{a}\textcolor{blue}{a})^* \cdot \textcolor{red}{c}$ & $\stackrel{\backslash \textcolor{magenta}{a}}{\rightarrow}$ & $((\ONE + \textcolor{magenta}{\ONE} \textcolor{blue}{a})\cdot (a+aa)^*)\cdot \textcolor{red}{c}+\ZERO$\\ |
|
2136 %& $\stackrel{\rightarrow}{\backslash a}$ & $((\ONE + \ONE a)\cdot (a+aa)^*)\cdot c + \ZERO$\\ |
|
2137 & $\stackrel{\backslash \textcolor{blue}{a}}{\rightarrow}$ & |
|
2138 $(((\ZERO+(\ZERO a+ \textcolor{blue}{\ONE}))\cdot (a+aa)^* + (\ONE+\ONE a)\cdot (a+aa)^* )\cdot \textcolor{red}{\mathbf{c}} + \ZERO)+\ZERO$\\ |
|
2139 & $\stackrel{\backslash \textcolor{red}{c}}{\rightarrow}$ & |
|
2140 $((r_{=0}\cdot c + \textcolor{red}{\ONE})+\ZERO)+\ZERO$\\ |
|
2141 & $\stackrel{\mkeps}{\rightarrow}$ & $\Left (\Left \; (\Right \; \textcolor{red}{\Empty}))$ \\ |
|
2142 & $\stackrel{\inj \;\textcolor{red}{c} }{\rightarrow}$ & |
|
2143 $\Left \; (\Left \; (\Seq \;(\Left \; (\Seq \; (\Right \; (\Right\; \textcolor{blue}{\Empty})) \; \Stars \, [])) \; \textcolor{red}{c}))$\\ |
|
2144 & $\stackrel{\inj \;\textcolor{blue}{a}}{\rightarrow}$ & |
|
2145 $\Left\; (\Seq \; (\Seq\; (\Right \; (\Seq \; \textcolor{magenta}{\Empty }\; \textcolor{blue}{ \mathbf{a}}) ) |
|
2146 \;\Stars \,[]) \; \textcolor{red}{c})$\\ |
|
2147 & $\stackrel{\inj \;\textcolor{magenta}{a}}{\rightarrow}$ & $\Seq \; (\Stars \; [\Right \; (\Seq \; \mathbf{\textcolor{magenta}{a}} \; \textcolor{blue}{a})]) \; \textcolor{red}{ c}$ |
|
2148 |
|
2149 %$\inj \; r \; c\A$ |
|
2150 %$\inj \; (a\cdot b)\cdot c \; \Seq \; \ONE \; b$ & $=$ & $(a+e)$\\ |
|
2151 \end{tabular} |
|
2152 \end{center} |
|
2153 \noindent |
|
2154 We have assigned different colours for each character, |
|
2155 as well as their corresponding locations in values and regular expressions. |
|
2156 The most recently injected character is marked with a bold font. |
|
2157 %Their corresponding derivative steps have been marked with the same |
|
2158 %colour. We also mark the specific location where the coloured character |
|
2159 %was injected into with the same colour, before and after the injection occurred. |
|
2160 %TODO: can be also used to motivate injection->blexer in Bitcoded1 |
|
2161 To show the details of how $\inj$ works, |
|
2162 we zoom in to the second injection above, |
|
2163 illustrating the recursive calls involved: |
|
2164 \begin{center} |
|
2165 \begin{tabular}{lcl} |
|
2166 $\inj \;\quad ((\ONE + {\ONE} |
|
2167 \textcolor{blue}{a})\cdot (a+aa)^*)\cdot |
|
2168 c + \ZERO \; \quad \textcolor{blue}{a} \; $ & &\\ |
|
2169 $\quad \Left \; |
|
2170 (\Left \; (\Seq \;(\Left \; (\Seq \; (\Right \; (\Right\; |
|
2171 \textcolor{blue}{\Empty})) \; \Stars \, [])) \; c))$ & \\$=$\\ |
|
2172 $\Left \; (\inj \; ((\ONE + \ONE |
|
2173 \textcolor{blue}{a})\cdot (a+aa)^*)\cdot |
|
2174 c \quad \textcolor{blue}{a} \quad (\Left \; (\Seq\; ( \Left \; (\Seq \; (\Right \; (\Right\; |
|
2175 \textcolor{blue}{\Empty})) \; \Stars \, []) ) \; c ) )\;\;)$ &\\ $=$\\ |
|
2176 $\Left \; (\Seq \; (\inj \; (\ONE + \ONE \textcolor{blue}{a})\cdot(a+aa)^* \quad \textcolor{blue}{a} \quad |
|
2177 (\Left \; (\Seq \; (\Right \; (\Right\;\textcolor{blue}{\Empty})))) ) \; c ) $ & \\$=$\\ |
|
2178 $\Left \; (\Seq \; (\Seq \; (\inj \quad (\ONE + \ONE \textcolor{blue}{a}) \quad \textcolor{blue}{a}\quad \Right \;(\Right \; \textcolor{blue}{\Empty}) ) \; \Stars \,[])\; c)$ &\\ $=$ \\ |
|
2179 $\Left \; (\Seq \; (\Seq \; (\Right \; (\inj \; \ONE \textcolor{blue}{a} \quad \textcolor{blue}{a}\quad (\Right \;\textcolor{blue}{\Empty})))) \; \Stars \, [])$ |
|
2180 & \\ $=$\\ |
|
2181 $\Left \; (\Seq \; (\Seq \; (\Right \; (\Seq \; (\mkeps \; \ONE)\;(\inj \;\textcolor{blue}{a} \; \textcolor{blue}{a} \; \textcolor{blue}{\Empty})) ) ) \; \Stars \, [] )$ |
|
2182 & \\ $=$\\ |
|
2183 $\Left \; (\Seq \; (\Seq \; (\Right \; (\Seq \; \Empty \; \textcolor{blue}{a}) ) ) \; \Stars \, [] )$ |
|
2184 \end{tabular} |
|
2185 \end{center} |
|
2186 \noindent |
|
2187 We will now introduce the proofs related to $\inj$ and |
|
2188 $\lexer$. These proofs have originally been found by Ausaf et al. |
|
2189 in their 2016 work \cite{AusafDyckhoffUrban2016}. |
|
2190 Proofs to some of these lemmas have also been discussed in more detail in |
|
2191 Ausaf's thesis \cite{Ausaf}. |
|
2192 Nevertheless, we still introduce these proofs, to make this thesis |
|
2193 self-contained as we show inductive variants used in these proofs break |
|
2194 after more simplifications are introduced. |
|
2195 %Also note that lemmas like \ref{}described in this thesis is a more faithful |
|
2196 %representation of the actual accompanying Isabelle formalisation, and |
|
2197 %therefore |
|
2198 |
|
2199 |
2113 |
2200 |
2114 Recall that lemma |
2201 Recall that lemma |
2115 \ref{mePosix} tells us that |
2202 \ref{mePosix} tells us that |
2116 $\mkeps$ always generates the POSIX value. |
2203 $\mkeps$ always generates the POSIX value. |
2117 The function $\inj$ preserves the POSIXness, provided |
2204 The function $\inj$ preserves the POSIXness, provided |