# HG changeset patch # User Christian Urban # Date 1503666320 -7200 # Node ID 12772d537b713e31dcff1fc0dbeb7f724cea853f # Parent 6746f5e1f1f84403ad555779728a547a1cd4bb17 updated diff -r 6746f5e1f1f8 -r 12772d537b71 Fahad/results.pdf Binary file Fahad/results.pdf has changed diff -r 6746f5e1f1f8 -r 12772d537b71 Fahad/results.tex --- /dev/null Thu Jan 01 00:00:00 1970 +0000 +++ b/Fahad/results.tex Fri Aug 25 15:05:20 2017 +0200 @@ -0,0 +1,59 @@ +\documentclass[11pt]{article} +\usepackage{times} +\usepackage{amsmath} +\usepackage{amssymb} + +\begin{document} + +\noindent +{\LARGE\bf Fahad's results for thesis}\bigskip + +\noindent +Part 1 (regular expressions) + +\begin{itemize} +\item he corrected the paper by Sulzmann and Lu by giving a new definition + for POSIX matching---his proof is completely different than the one + by Sulzmann and Lu and formalised in Isabelle + +\item he extended the results by Sulzmann and Lu to all regular + expression constructors, excluding back references; to deal with + backreferences is probably too difficult for the remaining time + +\item he formalised in Isabelle the definition of POSIX matching by Okui and + Suzuki and showed that it is equivalent to his definition (his + definition is declarative, whereas the one by Okui \& Suzuki is by a + position caluculus---so both definitions are quite different + technically, but the result is that they define the same `thing') + +\item from his correctness result for the algorithm by Sulzmann and + Lu, he proved the correctness of various stages of optimisations: POSIX + matching including simplification of regular expressions; bit-coded + representation of regular expressions has not yet been done, but + should be easy to do in the remaining time +\end{itemize} + +\noindent +Part 2 (TLS protocol in F-Star/internship at Microsoft Research in Cambridge) + +\begin{itemize} +\item he added features to the F-Star language in order to verify their + TLS parser in F-Star + +\item he created a verified byte-library for F-Star, which provides an + API to manipulate byte sequences and indexed arrays + +\item he modified the existing TLS parser in F-Star to use his new byte + library for the representation of strings + +\item verification of the parser is not yet done, but the hope is + he finishes this by the end of his internship (in Sepember) +\end{itemize} + + +\end{document} + +%%% Local Variables: +%%% mode: latex +%%% TeX-master: t +%%% End: diff -r 6746f5e1f1f8 -r 12772d537b71 Literature/backref-automata2.pdf Binary file Literature/backref-automata2.pdf has changed diff -r 6746f5e1f1f8 -r 12772d537b71 thys/Journal/Paper.thy --- a/thys/Journal/Paper.thy Fri Aug 18 14:51:29 2017 +0100 +++ b/thys/Journal/Paper.thy Fri Aug 25 15:05:20 2017 +0200 @@ -197,12 +197,13 @@ YES/NO answer for whether a string is being matched by a regular expression. Sulzmann and Lu~\cite{Sulzmann2014} extended this matcher to allow generation not just of a YES/NO answer but of an actual -matching, called a [lexical] {\em value}. They give a simple algorithm -to calculate a value that appears to be the value associated with -POSIX matching. The challenge then is to specify that value, in an -algorithm-independent fashion, and to show that Sulzmann and Lu's -derivative-based algorithm does indeed calculate a value that is -correct according to the specification. +matching, called a [lexical] {\em value}. \marginpar{explain values; +who introduced them} They give a simple algorithm to calculate a value +that appears to be the value associated with POSIX matching. The +challenge then is to specify that value, in an algorithm-independent +fashion, and to show that Sulzmann and Lu's derivative-based algorithm +does indeed calculate a value that is correct according to the +specification. The answer given by Sulzmann and Lu \cite{Sulzmann2014} is to define a relation (called an ``order relation'') on the set of values of @{term @@ -480,7 +481,7 @@ a regular expression correspond to the language of a regular expression, namely - \begin{proposition} + \begin{proposition}\label{inhabs} @{thm L_flat_Prf} \end{proposition} @@ -935,25 +936,30 @@ elements. Okui and Suzuki \cite{OkuiSuzuki2010,OkuiSuzukiTech} described - another ordering of values, which they use to establish the correctness of - their automata-based algorithm for POSIX matching. Their ordering - resembles some aspects of the one given by Sulzmann and Lu, but - is quite different. To begin with, Okui and Suzuki identify POSIX - values as least elements in their ordering. A more substantial - difference is that the ordering by Okui - and Suzuki uses \emph{positions} in order to identify and - compare subvalues, whereby positions are lists of natural - numbers. This allows them to quite naturally formalise the Longest - Match and Priority rules of the informal POSIX standard. Consider - for example the value @{term v} of the form @{term "Stars [Seq - (Char x) (Char y), Char z]"}, say. At position @{text "[0,1]"} of - this value is the subvalue @{text "Char y"} and at position @{text - "[1]"} the subvalue @{term "Char z"}. At the `root' position, or - empty list @{term "[]"}, is the whole value @{term v}. The - positions @{text "[0,1,0]"} and @{text "[2]"}, for example, are - outside of @{text v}. If it exists, the subvalue of @{term v} at a - position @{text p}, written @{term "at v p"}, can be recursively - defined by + another ordering of values, which they use to establish the + correctness of their automata-based algorithm for POSIX matching. + Their ordering resembles some aspects of the one given by Sulzmann + and Lu, but is quite different. To begin with, Okui and Suzuki + identify POSIX values as least, rather than maximal, elements in + their ordering. A more substantial difference is that the ordering + by Okui and Suzuki uses \emph{positions} in order to identify and + compare subvalues. Positions are lists of natural numbers. This + allows them to quite naturally formalise the Longest Match and + Priority rules of the informal POSIX standard. Consider for example + the value @{term v} + + \begin{center} + @{term "v == Stars [Seq (Char x) (Char y), Char z]"} + \end{center} + + \noindent + At position @{text "[0,1]"} of this value is the + subvalue @{text "Char y"} and at position @{text "[1]"} the + subvalue @{term "Char z"}. At the `root' position, or empty list + @{term "[]"}, is the whole value @{term v}. The positions @{text + "[0,1,0]"} and @{text "[2]"}, for example, are outside of @{text + v}. If it exists, the subvalue of @{term v} at a position @{text + p}, written @{term "at v p"}, can be recursively defined by \begin{center} \begin{tabular}{r@ {\hspace{0mm}}lcl} @@ -970,7 +976,7 @@ \end{tabular} \end{center} - \noindent We use Isabelle's notation @{term "vs ! n"} for the + \noindent In the last clause we use Isabelle's notation @{term "vs ! n"} for the @{text n}th element in a list. The set of positions inside a value @{text v}, written @{term "Pos v"}, is given by the clauses @@ -1117,11 +1123,12 @@ \noindent We can show that @{term "DUMMY :\val DUMMY"} is a partial order. Okui and Suzuki also show that it is a linear order - for lexical values \cite{OkuiSuzuki2010}, but we have not done - this. What we are going to show below is that for a given @{text r} - and @{text s}, the ordering has a unique minimal element on the set - @{term "LV r s"} , which is the POSIX value we defined in the - previous section. + for lexical values \cite{OkuiSuzuki2010} of a given regular + expression and given string, but we have not done this. It is not + essential for our results. What we are going to show below is that + for a given @{text r} and @{text s}, the ordering has a unique + minimal element on the set @{term "LV r s"}, which is the POSIX value + we defined in the previous section. Lemma 1 @@ -1145,28 +1152,44 @@ @{thm [mode=IfThen] Posix_PosOrd[where ?v1.0="v\<^sub>1" and ?v2.0="v\<^sub>2"]} \end{theorem} - \begin{proof} - By induction on our POSIX rules. The two base cases are straightforward: for example - for @{term "v\<^sub>1 = Void"}, we have that @{term "v\<^sub>2 \ LV ONE []"} must also - be of the form \mbox{@{term "v\<^sub>2 = Void"}}. Therfore we have @{term "v\<^sub>1 :\val v\<^sub>2"}. - The inductive cases are as follows: + \begin{proof} By induction on our POSIX rules. It is clear that + @{text "v\<^sub>1"} and @{text "v\<^sub>2"} have the same underlying + string. + + The two base cases are straightforward: for example for @{term + "v\<^sub>1 = Void"}, we have that @{term "v\<^sub>2 \ LV ONE + []"} must also be of the form \mbox{@{term "v\<^sub>2 = + Void"}}. Therefore we have @{term "v\<^sub>1 :\val + v\<^sub>2"}. The inductive cases are as follows: - \begin{itemize} - \item[$\bullet$] Case @{term "s \ (ALT r\<^sub>1 r\<^sub>2) \ (Left w\<^sub>1)"}: - In this case @{term "v\<^sub>1 = Left w\<^sub>1"} and the value @{term "v\<^sub>2"} is either - of the form @{term "Left w\<^sub>2"} or @{term "Right w\<^sub>2"}. In the latter case we - can immediately conclude with @{term "v\<^sub>1 :\val v\<^sub>2"} since a @{text Left}-value - with the same underlying string @{text s} is always smaller or equal than a @{text Right}-value. - In the former case we have @{term "w\<^sub>2 \ LV r\<^sub>1 s"} and can use the induction - hypothesis to infer @{term "w\<^sub>1 :\val w\<^sub>2"}. Because @{term "w\<^sub>1"} - and @{term "w\<^sub>2"} have the same underlying string @{text s}, we can conclude with - @{term "Left w\<^sub>1 :\val Left w\<^sub>2"}. + \begin{itemize} \item[$\bullet$] Case @{term "s \ (ALT r\<^sub>1 + r\<^sub>2) \ (Left w\<^sub>1)"}: In this case @{term + "v\<^sub>1 = Left w\<^sub>1"} and the value @{term "v\<^sub>2"} is + either of the form @{term "Left w\<^sub>2"} or @{term "Right + w\<^sub>2"}. In the latter case we can immediately conclude with + @{term "v\<^sub>1 :\val v\<^sub>2"} since a @{text + Left}-value with the same underlying string @{text s} is always + smaller or equal than a @{text Right}-value. In the former case we + have @{term "w\<^sub>2 \ LV r\<^sub>1 s"} and can use the + induction hypothesis to infer @{term "w\<^sub>1 :\val + w\<^sub>2"}. Because @{term "w\<^sub>1"} and @{term "w\<^sub>2"} + have the same underlying string @{text s}, we can conclude with + @{term "Left w\<^sub>1 :\val Left w\<^sub>2"}.\smallskip - \item[$\bullet$] Case @{term "s \ (ALT r\<^sub>1 r\<^sub>2) \ (Right w\<^sub>1)"}: - Similarly for the case where - @{term "v\<^sub>1 = Right w\<^sub>1"}. + \item[$\bullet$] Case @{term "s \ (ALT r\<^sub>1 r\<^sub>2) + \ (Right w\<^sub>1)"}: This case similar as the previous + case, except that we know that @{term "s \ L + r\<^sub>1"}. This is needed when @{term "v\<^sub>2 = Left + w\<^sub>2"}: since \mbox{@{term "flat v\<^sub>2 = flat w\<^sub>2"} + @{text "= s"}} and @{term "\ w\<^sub>2 : r\<^sub>1"}, we + can derive a contradiction using Prop.~\ref{inhabs}. So also in this + case \mbox{@{term "v\<^sub>1 :\val v\<^sub>2"}}.\smallskip - \item[$\bullet$] + \item[$\bullet$] Case @{term "(s\<^sub>1 @ s\<^sub>2) \ (SEQ r\<^sub>1 r\<^sub>2) + \ (Seq w\<^sub>1 w\<^sub>2)"}: Assume @{term "v\<^sub>2 = + Seq (u\<^sub>1) (u\<^sub>2)"} with @{term "\ u\<^sub>1 : r\<^sub>1"} + and \mbox{@{term "\ u\<^sub>2 : r\<^sub>2"}}. We have + \end{itemize} \end{proof} diff -r 6746f5e1f1f8 -r 12772d537b71 thys/Journal/document/root.tex --- a/thys/Journal/document/root.tex Fri Aug 18 14:51:29 2017 +0100 +++ b/thys/Journal/document/root.tex Fri Aug 25 15:05:20 2017 +0200 @@ -70,7 +70,10 @@ expressions. They can be used for a very simple regular expression matching algorithm. Sulzmann and Lu cleverly extended this algorithm in order to deal with POSIX matching, which is the underlying -disambiguation strategy for regular expressions needed in lexers. In +disambiguation strategy for regular expressions needed in lexers. +Their algorithm generates POSIX values which encode the information of +\emph{how} a regular expression matched a string---that is which part +of the string is matched by which part of the regular expression. In the first part of this paper we give our inductive definition of what a POSIX value is and show $(i)$ that such a value is unique (for given regular expression and string being matched) and $(ii)$ that Sulzmann diff -r 6746f5e1f1f8 -r 12772d537b71 thys/Paper/Paper.thy --- a/thys/Paper/Paper.thy Fri Aug 18 14:51:29 2017 +0100 +++ b/thys/Paper/Paper.thy Fri Aug 25 15:05:20 2017 +0200 @@ -143,7 +143,7 @@ YES/NO answer for whether a string is being matched by a regular expression. Sulzmann and Lu~\cite{Sulzmann2014} extended this matcher to allow generation not just of a YES/NO answer but of an actual -matching, called a [lexical] {\em value}. They give a simple algorithm +matching, called a [lexical] {\em value}. \marginpar{explain what values are} They give a simple algorithm to calculate a value that appears to be the value associated with POSIX matching. The challenge then is to specify that value, in an algorithm-independent fashion, and to show that Sulzmann and Lu's diff -r 6746f5e1f1f8 -r 12772d537b71 thys/Positions.thy --- a/thys/Positions.thy Fri Aug 18 14:51:29 2017 +0100 +++ b/thys/Positions.thy Fri Aug 25 15:05:20 2017 +0200 @@ -38,6 +38,7 @@ shows "[] \ Pos v" by (induct v rule: Pos.induct)(auto) + abbreviation "intlen vs \ int (length vs)" @@ -133,6 +134,14 @@ "v1 \val p v2 \ pflat_len v1 p > pflat_len v2 p \ (\q \ Pos v1 \ Pos v2. q \lex p \ pflat_len v1 q = pflat_len v2 q)" +lemma test: + shows "v1 \val p v2 \ + pflat_len v1 p > pflat_len v2 p \ + (\q \ Pos v1. q \lex p \ pflat_len v1 q = pflat_len v2 q) \ + (\q \ Pos v2. q \lex p \ pflat_len v1 q = pflat_len v2 q)" +unfolding PosOrd_def +apply(auto) +done definition PosOrd_ex:: "val \ val \ bool" ("_ :\val _" [60, 59] 60) @@ -233,8 +242,6 @@ by auto - - lemma PosOrd_shorterE: assumes "v1 :\val v2" shows "length (flat v2) \ length (flat v1)" @@ -257,73 +264,70 @@ unfolding prefix_list_def sprefix_list_def by (metis append_Nil2 append_eq_conv_conj drop_all le_less_linear) +lemma pflat_len_inside: + assumes "pflat_len v2 p < pflat_len v1 p" + shows "p \ Pos v1" +using assms +unfolding pflat_len_def +by (auto split: if_splits) lemma PosOrd_Left_Right: assumes "flat v1 = flat v2" shows "Left v1 :\val Right v2" unfolding PosOrd_ex_def apply(rule_tac x="[0]" in exI) -using assms -apply(auto simp add: PosOrd_def pflat_len_simps) +apply(auto simp add: PosOrd_def pflat_len_simps assms) done +lemma PosOrd_LeftE: + assumes "Left v1 :\val Left v2" "flat v1 = flat v2" + shows "v1 :\val v2" +using assms +unfolding PosOrd_ex_def test +apply(auto simp add: pflat_len_simps) +apply(frule pflat_len_inside) +apply(auto simp add: pflat_len_simps) +by (metis lex_simps(3) pflat_len_simps(3)) + +lemma PosOrd_LeftI: + assumes "v1 :\val v2" "flat v1 = flat v2" + shows "Left v1 :\val Left v2" +using assms +unfolding PosOrd_ex_def test +apply(auto simp add: pflat_len_simps) +by (metis less_numeral_extra(3) lex_simps(3) pflat_len_simps(3)) + lemma PosOrd_Left_eq: - assumes "flat v = flat v'" - shows "Left v :\val Left v' \ v :\val v'" + assumes "flat v1 = flat v2" + shows "Left v1 :\val Left v2 \ v1 :\val v2" +using assms PosOrd_LeftE PosOrd_LeftI +by blast + + +lemma PosOrd_RightE: + assumes "Right v1 :\val Right v2" "flat v1 = flat v2" + shows "v1 :\val v2" using assms -unfolding PosOrd_ex_def -apply(auto) -apply(case_tac p) -apply(simp add: PosOrd_def pflat_len_simps) -apply(case_tac a) -apply(simp add: PosOrd_def pflat_len_simps) -apply(rule_tac x="list" in exI) -apply(auto simp add: PosOrd_def pflat_len_simps)[1] -apply (smt Un_def lex_list.intros(2) mem_Collect_eq pflat_len_simps(3)) -apply (smt Collect_disj_eq lex_list.intros(2) mem_Collect_eq pflat_len_simps(3)) -apply(auto simp add: PosOrd_def pflat_len_outside)[1] -apply(rule_tac x="0#p" in exI) -apply(auto simp add: PosOrd_def pflat_len_simps) -done - +unfolding PosOrd_ex_def test +apply(auto simp add: pflat_len_simps) +apply(frule pflat_len_inside) +apply(auto simp add: pflat_len_simps) +by (metis lex_simps(3) pflat_len_simps(5)) lemma PosOrd_RightI: - assumes "v :\val v'" "flat v = flat v'" - shows "Right v :\val Right v'" -using assms(1) -unfolding PosOrd_ex_def -apply(auto) -apply(rule_tac x="Suc 0#p" in exI) -using assms(2) -apply(auto simp add: PosOrd_def pflat_len_simps) -done - -lemma PosOrd_RightE: - assumes "Right v1 :\val Right v2" - shows "v1 :\val v2" + assumes "v1 :\val v2" "flat v1 = flat v2" + shows "Right v1 :\val Right v2" using assms -apply(simp add: PosOrd_ex_def) -apply(erule exE) -apply(case_tac p) -apply(simp add: PosOrd_def) +unfolding PosOrd_ex_def test apply(auto simp add: pflat_len_simps) -apply(rule_tac x="[]" in exI) -apply(simp add: Pos_empty pflat_len_simps) -apply(case_tac a) -apply(simp add: pflat_len_def PosOrd_def) -apply(case_tac nat) -prefer 2 -apply(simp add: pflat_len_def PosOrd_def) -apply(auto simp add: pflat_len_simps PosOrd_def) -apply(rule_tac x="list" in exI) -apply(auto) -apply(drule_tac x="Suc 0#q" in bspec) -apply(simp) -apply(simp add: pflat_len_simps) -apply(drule_tac x="Suc 0#q" in bspec) -apply(simp) -apply(simp add: pflat_len_simps) -done +by (metis lex_simps(3) nat_neq_iff pflat_len_simps(5)) + + +lemma PosOrd_Right_eq: + assumes "flat v1 = flat v2" + shows "Right v1 :\val Right v2 \ v1 :\val v2" +using assms PosOrd_RightE PosOrd_RightI +by blast lemma PosOrd_SeqI1: