thys/Paper/Paper.thy
changeset 177 e85d10b238d0
parent 176 f1d800062d4f
child 178 2835d13be702
equal deleted inserted replaced
176:f1d800062d4f 177:e85d10b238d0
   822   r)"}} is nullable iff @{term r} is, and so on. The purpose of the record
   822   r)"}} is nullable iff @{term r} is, and so on. The purpose of the record
   823   regular expression is to mark certain parts of a regular expression and
   823   regular expression is to mark certain parts of a regular expression and
   824   then record in the calculated value which parts of the string were matched
   824   then record in the calculated value which parts of the string were matched
   825   by this part. The label can then serve as classification for the tokens.
   825   by this part. The label can then serve as classification for the tokens.
   826   For this recall the regular expression @{text "(r\<^bsub>key\<^esub> + r\<^bsub>id\<^esub>)\<^sup>\<star>"} for
   826   For this recall the regular expression @{text "(r\<^bsub>key\<^esub> + r\<^bsub>id\<^esub>)\<^sup>\<star>"} for
   827   keywords and identifiers from the Introduction. With record regular
   827   keywords and identifiers from the Introduction. With the record regular
   828   expression we can form \mbox{@{text "((key : r\<^bsub>key\<^esub>) + (id : r\<^bsub>id\<^esub>))\<^sup>\<star>"}}
   828   expression we can form \mbox{@{text "((key : r\<^bsub>key\<^esub>) + (id : r\<^bsub>id\<^esub>))\<^sup>\<star>"}}
   829   and then traverse the calculated value and only collect the underlying
   829   and then traverse the calculated value and only collect the underlying
   830   strings in record values. With this we obtain finite sequences of pairs of
   830   strings in record values. With this we obtain finite sequences of pairs of
   831   labels and strings, for example
   831   labels and strings, for example
   832 
   832 
   847   steps.
   847   steps.
   848 
   848 
   849   While the simplification of regular expressions according to 
   849   While the simplification of regular expressions according to 
   850   rules like
   850   rules like
   851 
   851 
   852   \begin{center}
   852   \begin{equation}\label{Simpl}
   853   \begin{tabular}{lcl}
   853   \begin{array}{lcl}
   854   @{term "ALT ZERO r"} & @{text "\<Rightarrow>"} & @{term r}\\
   854   @{term "ALT ZERO r"} & @{text "\<Rightarrow>"} & @{term r}\\
   855   @{term "ALT r ZERO"} & @{text "\<Rightarrow>"} & @{term r}\\
   855   @{term "ALT r ZERO"} & @{text "\<Rightarrow>"} & @{term r}\\
   856   @{term "SEQ ONE r"}  & @{text "\<Rightarrow>"} & @{term r}\\
   856   @{term "SEQ ONE r"}  & @{text "\<Rightarrow>"} & @{term r}\\
   857   @{term "SEQ r ONE"}  & @{text "\<Rightarrow>"} & @{term r}
   857   @{term "SEQ r ONE"}  & @{text "\<Rightarrow>"} & @{term r}
   858   \end{tabular}
   858   \end{array}
   859   \end{center}
   859   \end{equation}
   860 
   860 
   861   \noindent is well understood, there is an obstacle with the POSIX value
   861   \noindent is well understood, there is an obstacle with the POSIX value
   862   calculation algorithm by Sulzmann and Lu: if we build a derivative regular
   862   calculation algorithm by Sulzmann and Lu: if we build a derivative regular
   863   expression and then simplify it, we will calculate a POSIX value for this
   863   expression and then simplify it, we will calculate a POSIX value for this
   864   simplified derivative regular expression, \emph{not} for the original (unsimplified)
   864   simplified derivative regular expression, \emph{not} for the original (unsimplified)
   890   \end{tabular}
   890   \end{tabular}
   891   \end{center}
   891   \end{center}
   892 
   892 
   893   \noindent
   893   \noindent
   894   The functions @{text "simp\<^bsub>Alt\<^esub>"} and @{text "simp\<^bsub>Seq\<^esub>"} encode the simplification rules
   894   The functions @{text "simp\<^bsub>Alt\<^esub>"} and @{text "simp\<^bsub>Seq\<^esub>"} encode the simplification rules
   895   and compose the rectification functions. The main simplification function is then 
   895   in \eqref{Simpl} and compose the rectification functions (simplifications can occur
       
   896   deep inside the regular expression). The main simplification function is then 
   896 
   897 
   897   \begin{center}
   898   \begin{center}
   898   \begin{tabular}{lcl}
   899   \begin{tabular}{lcl}
   899   @{term "simp (ALT r\<^sub>1 r\<^sub>2)"} & $\dn$ & @{term "simp_ALT (simp r\<^sub>1) (simp r\<^sub>2)"}\\
   900   @{term "simp (ALT r\<^sub>1 r\<^sub>2)"} & $\dn$ & @{term "simp_ALT (simp r\<^sub>1) (simp r\<^sub>2)"}\\
   900   @{term "simp (SEQ r\<^sub>1 r\<^sub>2)"} & $\dn$ & @{term "simp_SEQ (simp r\<^sub>1) (simp r\<^sub>2)"}\\
   901   @{term "simp (SEQ r\<^sub>1 r\<^sub>2)"} & $\dn$ & @{term "simp_SEQ (simp r\<^sub>1) (simp r\<^sub>2)"}\\
   922   \noindent
   923   \noindent
   923   In the second clause we first calculate the derivative @{term "der c r"}
   924   In the second clause we first calculate the derivative @{term "der c r"}
   924   and then simplify the result. This gives us a simplified derivative
   925   and then simplify the result. This gives us a simplified derivative
   925   @{text "r\<^sub>s"} and a rectification function @{text "f\<^sub>r"}. The lexer
   926   @{text "r\<^sub>s"} and a rectification function @{text "f\<^sub>r"}. The lexer
   926   is then recursively called with the simplified derivative, but before
   927   is then recursively called with the simplified derivative, but before
   927   we inject the character @{term c} into the value, we need to rectify
   928   we inject the character @{term c} into the value @{term v}, we need to rectify
   928   it (that is construct @{term "f\<^sub>r v"}). We can prove that
   929   @{term v} (that is construct @{term "f\<^sub>r v"}). We can prove that
   929 
   930 
   930   \begin{theorem}
   931   \begin{theorem}
   931   @{thm slexer_correctness}
   932   @{thm slexer_correctness}
   932   \end{theorem}
   933   \end{theorem}
   933 
   934 
  1171   \url{http://www.inf.kcl.ac.uk/staff/urbanc/lex}.\medskip
  1172   \url{http://www.inf.kcl.ac.uk/staff/urbanc/lex}.\medskip
  1172 
  1173 
  1173   \noindent
  1174   \noindent
  1174   {\bf Acknowledgements:}
  1175   {\bf Acknowledgements:}
  1175   We are very grateful to Martin Sulzmann for his comments on our work and 
  1176   We are very grateful to Martin Sulzmann for his comments on our work and 
  1176   also patiently explaining to us the details in \cite{Sulzmann2014}.
  1177   moreover patiently explaining to us the details in \cite{Sulzmann2014}. We
       
  1178   also received very helpful comments from James Cheney and anonymous referees.
  1177   \small
  1179   \small
  1178   \bibliographystyle{plain}
  1180   \bibliographystyle{plain}
  1179   \bibliography{root}
  1181   \bibliography{root}
  1180 
  1182 
  1181 *}
  1183 *}