ninems/ninems.tex
changeset 79 481c8000de6d
parent 77 058133a9ffe0
child 80 d9d61a648292
equal deleted inserted replaced
78:a67aff8fb06a 79:481c8000de6d
  1136   Figure 6 yields POSIX parse trees. We have tested this claim
  1136   Figure 6 yields POSIX parse trees. We have tested this claim
  1137   extensively by using the method in Figure~3 as a reference but yet
  1137   extensively by using the method in Figure~3 as a reference but yet
  1138   have to work out all proof details.''
  1138   have to work out all proof details.''
  1139 \end{quote}  
  1139 \end{quote}  
  1140 
  1140 
  1141 \noindent
  1141 \noindent We would settle this correctness claim. It is relatively
  1142 We would settle this correctness claim. It is relatively straightforward
  1142 straightforward to establish that after one simplification step, the part of a
  1143 to establish that after one simplification step, the part of a nullable
  1143 nullable derivative that corresponds to a POSIX value remains intact and can
  1144 derivative that corresponds to a POSIX value remains intact and can
  1144 still be collected, in other words, we can show that\comment{Double-check....I
  1145 still be collected, in other words, we can show that\comment{Double-check....I think this  is not the case}
  1145 think this  is not the case}
  1146 
  1146 
  1147 \begin{center}
  1147 \begin{center}
  1148 $\textit{bmkeps} \; r = \textit{bmkeps} \; \textit{simp} \; r\;( r\; \textit{nullable})$
  1148 $\textit{bmkeps} \; r = \textit{bmkeps} \; \textit{simp} \; r\;( r\; \textit{nullable})$
  1149 \end{center}
  1149 \end{center}
  1150 
  1150 
  1152 as this basically comes down to proving actions like removing the
  1152 as this basically comes down to proving actions like removing the
  1153 additional $r$ in $r+r$  does not delete important POSIX information in
  1153 additional $r$ in $r+r$  does not delete important POSIX information in
  1154 a regular expression. The hard part of this proof is to establish that
  1154 a regular expression. The hard part of this proof is to establish that
  1155 
  1155 
  1156 \begin{center}
  1156 \begin{center}
  1157 $\textit{bmkeps} \; \textit{blexer}\_{simp} \; r = \textit{bmkeps} \; \textit{blexer} \; \textit{simp} \; r$
  1157 	$\textit{bmkeps} \; \textit{blexer}\_{simp}(s, \; r) = \textit{bmkeps} \; \textit{blexer} \; \textit{simp}(s, \; r)$
  1158 \end{center}
  1158 \end{center}
  1159 
  1159 
  1160 \noindent\comment{OK from here on you still need to work. Did not read.}
  1160 \noindent That is, if we do derivative on regular expression $r$ and then
  1161 That is, if we do derivative on regular expression r and the simplified version, 
  1161 simplify it, and repeat this process until we exhaust the string, we get a
  1162 they can still provide the same POSIX value if there is one . 
  1162 regular expression $r''$ that provides the POSIX matching as the result $r'$ of
  1163 This is not as straightforward as the previous proposition, as the two regular expressions $r$ and $\textit{simp}\; r$
  1163 the normal derivative algorithm which only does derivative operation repeatedly
  1164   might become very different regular expressions after repeated application of $\textit{simp}$ and derivative.
  1164 and has no simplification at all.  This might seem at first glance very
  1165 The crucial point is to find the indispensable information of 
  1165 unintuitive, as the $r'$ is exponentially larger than $r''$. But this can be
  1166 a regular expression and how it is kept intact during simplification so that it performs 
  1166 explained in the following way: we are pruning away the possible matches that
  1167 as good as a regular expression that has not been simplified in the subsequent derivative operations.
  1167 are not POSIX. Since there are exponentially non-POSIX matchings and only 1
  1168 To aid this, we use the helping function retrieve described by Sulzmann and Lu:
  1168 POSIX matching, it is understandable that our $r''$ can be a lot smaller.  we
  1169 \\definition of retrieve\\
  1169 can still provide the same POSIX value if there is one.  This is not as
       
  1170 straightforward as the previous proposition, as the two regular expressions $r$
       
  1171 and $\textit{simp}\; r$ might become very different regular expressions after
       
  1172 repeated application of $\textit{simp}$ and derivative.  The crucial point is
       
  1173 to find the indispensable information of a regular expression and how it is
       
  1174 kept intact during simplification so that it performs as good as a regular
       
  1175 expression that has not been simplified in the subsequent derivative
       
  1176 operations.  To aid this, we use the helping function retrieve described by
       
  1177 Sulzmann and Lu: \\definition of retrieve\\
  1170  
  1178  
  1171 This function assembled the bitcode that corresponds to a parse tree for
  1179 This function assembled the bitcode that corresponds to a parse tree for
  1172 how the current derivative matches the suffix of the string(the
  1180 how the current derivative matches the suffix of the string(the
  1173 characters that have not yet appeared, but is stored in the value).
  1181 characters that have not yet appeared, but is stored in the value).
  1174 Sulzmann and Lu used this to connect the bitcoded algorithm to the older
  1182 Sulzmann and Lu used this to connect the bitcoded algorithm to the older
  1175 algorithm by the following equation:
  1183 algorithm by the following equation:
  1176  
  1184  
  1177  \begin{center}
  1185  \begin{center} $inj \;a\; c \; v = \textit{decode} \; (\textit{retrieve}\;
  1178  $inj \;a\; c \; v = \textit{decode} \; (\textit{retrieve}\; ((\textit{internalise}\; r)\backslash_{simp} c) v)$
  1186 	 ((\textit{internalise}\; r)\backslash_{simp} c) v)$ \end{center} A
  1179  \end{center}
  1187 	 little fact that needs to be stated to help comprehension:
  1180  A little fact that needs to be stated to help comprehension:
  1188 	 \begin{center} $r^\uparrow = a$($a$ stands for $\textit{annotated}).$
  1181  \begin{center}
  1189 	 \end{center} Ausaf and Urban also used this fact to prove  the
  1182  $r^\uparrow = a$($a$ stands for $\textit{annotated}).$
  1190 	 correctness of bitcoded algorithm without simplification.  Our purpose
  1183  \end{center}
  1191 	 of using this, however, is try to establish \\ $ \textit{retrieve} \;
  1184  Ausaf and Urban also used this fact to prove  the correctness of bitcoded algorithm without simplification.
  1192 	 a \; v \;=\; \textit{retrieve}  \; \textit{simp}(a) \; v'.$\\ The idea
  1185  Our purpose of using this, however, is try to establish \\
  1193 	 is that using $v'$, a simplified version of $v$ that possibly had gone
  1186 $ \textit{retrieve} \; a \; v \;=\; \textit{retrieve}  \; \textit{simp}(a) \; v'.$\\
  1194 	 through the same simplification step as $\textit{simp}(a)$ we are
  1187  The idea is that using $v'$,
  1195 	 still  able to extract the bit-sequence that gives the same parsing
  1188   a simplified version of $v$ that possibly had gone through the same simplification step as $\textit{simp}(a)$ we are still  able to extract the bit-sequence that gives the same parsing information as the unsimplified one.
  1196 	 information as the unsimplified one.  After establishing this, we
  1189  After establishing this, we might be able to finally bridge the gap of proving\\
  1197 	 might be able to finally bridge the gap of proving\\
  1190  $\textit{retrieve} \; r   \backslash  s \; v = \;\textit{retrieve} \; \textit{simp}(r)  \backslash  s \; v'$\\
  1198 	 $\textit{retrieve} \; r   \backslash  s \; v = \;\textit{retrieve} \;
  1191  and subsequently\\
  1199 	 \textit{simp}(r)  \backslash  s \; v'$\\ and subsequently\\
  1192  $\textit{retrieve} \; r \backslash  s \; v\; = \; \textit{retrieve} \; r  \backslash_{simp}   s \; v'$.\\
  1200 	 $\textit{retrieve} \; r \backslash  s \; v\; = \; \textit{retrieve} \;
  1193  This proves that our simplified version of regular expression still contains all the bitcodes needed.
  1201 	 r  \backslash_{simp}   s \; v'$.\\ This proves that our simplified
       
  1202 	 version of regular expression still contains all the bitcodes needed.
  1194 
  1203 
  1195 
  1204 
  1196 The second task is to speed up the more aggressive simplification.
  1205 The second task is to speed up the more aggressive simplification.
  1197 Currently it is slower than a naive simplification(the naive version as
  1206 Currently it is slower than a naive simplification(the naive version as
  1198 implemented in ADU of course can explode in some cases). So it needs to
  1207 implemented in ADU of course can explode in some cases). So it needs to