updated
author Christian Urban <urbanc@in.tum.de>
Fri, 25 Aug 2017 15:05:20 +0200
changeset 269 12772d537b71
parent 268 6746f5e1f1f8
child 270 462d893ecb3d
Fahad/results.pdf
Fahad/results.tex
Literature/backref-automata2.pdf
thys/Journal/Paper.thy
thys/Journal/document/root.tex
thys/Paper/Paper.thy
thys/Positions.thy
Binary file Fahad/results.pdf has changed
--- /dev/null	Thu Jan 01 00:00:00 1970 +0000
+++ b/Fahad/results.tex	Fri Aug 25 15:05:20 2017 +0200
@@ -0,0 +1,59 @@
+\documentclass[11pt]{article}
+\usepackage{times}
+\usepackage{amsmath}
+\usepackage{amssymb}
+
+\begin{document}
+
+\noindent
+{\LARGE\bf Fahad's results for the thesis}\bigskip
+
+\noindent
+Part 1 (regular expressions)
+
+\begin{itemize}
+\item he corrected the paper by Sulzmann and Lu by giving a new definition
+  for POSIX matching---his proof is completely different from the one
+  by Sulzmann and Lu and is formalised in Isabelle
+  
+\item he extended the results by Sulzmann and Lu to all regular
+  expression constructors, excluding back-references; dealing with
+  back-references is probably too difficult for the remaining time
+
+\item he formalised in Isabelle the definition of POSIX matching by Okui and
+  Suzuki and showed that it is equivalent to his definition (his
+  definition is declarative, whereas the one by Okui \& Suzuki is by a
+  position calculus---so both definitions are quite different
+  technically, but the result is that they define the same `thing')
+
+\item building on his correctness result for the algorithm by Sulzmann and
+  Lu, he proved the correctness of various optimisation stages: POSIX
+  matching including simplification of regular expressions; the bit-coded
+  representation of regular expressions has not yet been done, but
+  should be easy to do in the remaining time
+\end{itemize}  
+
+\noindent
+Part 2 (TLS protocol in F-Star/internship at Microsoft Research in Cambridge)
+
+\begin{itemize}
+\item he added features to the F-Star language in order to verify their
+  TLS parser in F-Star
+  
+\item he created a verified byte-library for F-Star, which provides an
+  API to manipulate byte sequences and indexed arrays 
+
+\item he modified the existing TLS parser in F-Star to use his new byte
+  library for the representation of strings
+
+\item verification of the parser is not yet done, but the hope is
+  that he finishes this by the end of his internship (in September)
+\end{itemize}
+
+
+\end{document}
+
+%%% Local Variables:
+%%% mode: latex
+%%% TeX-master: t
+%%% End:
Binary file Literature/backref-automata2.pdf has changed
--- a/thys/Journal/Paper.thy	Fri Aug 18 14:51:29 2017 +0100
+++ b/thys/Journal/Paper.thy	Fri Aug 25 15:05:20 2017 +0200
@@ -197,12 +197,13 @@
 YES/NO answer for whether a string is being matched by a regular
 expression.  Sulzmann and Lu~\cite{Sulzmann2014} extended this matcher
 to allow generation not just of a YES/NO answer but of an actual
-matching, called a [lexical] {\em value}. They give a simple algorithm
-to calculate a value that appears to be the value associated with
-POSIX matching.  The challenge then is to specify that value, in an
-algorithm-independent fashion, and to show that Sulzmann and Lu's
-derivative-based algorithm does indeed calculate a value that is
-correct according to the specification.
+matching, called a [lexical] {\em value}. \marginpar{explain values;
+who introduced them} They give a simple algorithm to calculate a value
+that appears to be the value associated with POSIX matching.  The
+challenge then is to specify that value, in an algorithm-independent
+fashion, and to show that Sulzmann and Lu's derivative-based algorithm
+does indeed calculate a value that is correct according to the
+specification.
 
 The answer given by Sulzmann and Lu \cite{Sulzmann2014} is to define a
 relation (called an ``order relation'') on the set of values of @{term
@@ -480,7 +481,7 @@
   a regular expression correspond to the language of a regular
   expression, namely
 
-  \begin{proposition}
+  \begin{proposition}\label{inhabs}
   @{thm L_flat_Prf}
   \end{proposition}
 
@@ -935,25 +936,30 @@
   elements.
  
   Okui and Suzuki \cite{OkuiSuzuki2010,OkuiSuzukiTech} described
-   another ordering of values, which they use to establish the correctness of
-   their automata-based algorithm for POSIX matching.  Their ordering
-   resembles some aspects of the one given by Sulzmann and Lu, but
-   is quite different. To begin with, Okui and Suzuki identify POSIX
-   values as least elements in their ordering. A more substantial 
-    difference is that the ordering by Okui
-   and Suzuki uses \emph{positions} in order to identify and
-   compare subvalues, whereby positions are lists of natural
-   numbers. This allows them to quite naturally formalise the Longest
-   Match and Priority rules of the informal POSIX standard.  Consider
-   for example the value @{term v} of the form @{term "Stars [Seq
-   (Char x) (Char y), Char z]"}, say.  At position @{text "[0,1]"} of
-   this value is the subvalue @{text "Char y"} and at position @{text
-   "[1]"} the subvalue @{term "Char z"}.  At the `root' position, or
-   empty list @{term "[]"}, is the whole value @{term v}. The
-   positions @{text "[0,1,0]"} and @{text "[2]"}, for example, are
-   outside of @{text v}. If it exists, the subvalue of @{term v} at a
-   position @{text p}, written @{term "at v p"}, can be recursively
-   defined by
+  another ordering of values, which they use to establish the
+  correctness of their automata-based algorithm for POSIX matching.
+  Their ordering resembles some aspects of the one given by Sulzmann
+  and Lu, but is quite different. To begin with, Okui and Suzuki
+  identify POSIX values as least, rather than maximal, elements in
+  their ordering. A more substantial difference is that the ordering
+  by Okui and Suzuki uses \emph{positions} in order to identify and
+  compare subvalues. Positions are lists of natural numbers. This
+  allows them to quite naturally formalise the Longest Match and
+  Priority rules of the informal POSIX standard.  Consider for example
+  the value @{term v}
+
+  \begin{center}
+  @{term "v == Stars [Seq (Char x) (Char y), Char z]"}
+  \end{center}
+
+  \noindent
+  At position @{text "[0,1]"} of this value is the
+  subvalue @{text "Char y"} and at position @{text "[1]"} the
+  subvalue @{term "Char z"}.  At the `root' position, or empty list
+  @{term "[]"}, is the whole value @{term v}. The positions @{text
+  "[0,1,0]"} and @{text "[2]"}, for example, are outside of @{text
+  v}. If it exists, the subvalue of @{term v} at a position @{text
+  p}, written @{term "at v p"}, can be recursively defined by
   
   \begin{center}
   \begin{tabular}{r@ {\hspace{0mm}}lcl}
@@ -970,7 +976,7 @@
   \end{tabular} 
   \end{center}
 
-  \noindent We use Isabelle's notation @{term "vs ! n"} for the
+  \noindent In the last clause we use Isabelle's notation @{term "vs ! n"} for the
   @{text n}th element in a list.  The set of positions inside a value @{text v},
   written @{term "Pos v"}, is given by the clauses
 
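To make the subvalue function concrete, the clauses unfold on the value @{term v} from above as follows. This is a sketch: only the clause using @{term "vs ! n"} is visible in this hunk; the Seq and root clauses are assumed to be the usual ones from Positions.thy.

    at (Stars [Seq (Char x) (Char y), Char z]) [0,1]
      = at ([Seq (Char x) (Char y), Char z] ! 0) [1]    (Stars clause)
      = at (Seq (Char x) (Char y)) [1]
      = at (Char y) []                                  (Seq clause, second component)
      = Char y                                          (root position)

So position [0,1] indeed picks out the subvalue Char y, as claimed in the text.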
@@ -1117,11 +1123,12 @@
 
  \noindent We can show that @{term "DUMMY :\<sqsubseteq>val DUMMY"} is
  a partial order.  Okui and Suzuki also show that it is a linear order
- for lexical values \cite{OkuiSuzuki2010}, but we have not done
- this. What we are going to show below is that for a given @{text r}
- and @{text s}, the ordering has a unique minimal element on the set
- @{term "LV r s"} , which is the POSIX value we defined in the
- previous section.
+ for lexical values \cite{OkuiSuzuki2010} of a given regular
+ expression and given string, but we have not done this. It is not
+ essential for our results. What we are going to show below is that
+ for a given @{text r} and @{text s}, the ordering has a unique
+ minimal element on the set @{term "LV r s"}, which is the POSIX value
+ we defined in the previous section.
 
 
  Lemma 1
@@ -1145,28 +1152,44 @@
   @{thm [mode=IfThen] Posix_PosOrd[where ?v1.0="v\<^sub>1" and ?v2.0="v\<^sub>2"]}
   \end{theorem}
 
-  \begin{proof}
-  By induction on our POSIX rules. The two base cases are straightforward: for example 
-  for @{term "v\<^sub>1 = Void"}, we have that @{term "v\<^sub>2 \<in> LV ONE []"} must also 
-  be of the form \mbox{@{term "v\<^sub>2 = Void"}}. Therfore we have @{term "v\<^sub>1 :\<sqsubseteq>val v\<^sub>2"}.
-  The inductive cases are as follows:
+  \begin{proof} By induction on our POSIX rules. It is clear that
+  @{text "v\<^sub>1"} and @{text "v\<^sub>2"} have the same underlying
+  string.
+
+  The two base cases are straightforward: for example for @{term
+  "v\<^sub>1 = Void"}, we have that @{term "v\<^sub>2 \<in> LV ONE
+  []"} must also be of the form \mbox{@{term "v\<^sub>2 =
+  Void"}}. Therefore we have @{term "v\<^sub>1 :\<sqsubseteq>val
+  v\<^sub>2"}.  The inductive cases are as follows:
 
-  \begin{itemize}
-  \item[$\bullet$] Case @{term "s \<in> (ALT r\<^sub>1 r\<^sub>2) \<rightarrow> (Left w\<^sub>1)"}:
-  In this case @{term "v\<^sub>1 = Left w\<^sub>1"} and the value @{term "v\<^sub>2"} is either 
-  of the form @{term "Left w\<^sub>2"} or @{term "Right w\<^sub>2"}. In the latter case we 
-  can immediately conclude with @{term "v\<^sub>1 :\<sqsubseteq>val v\<^sub>2"} since a @{text Left}-value 
-  with the same underlying string @{text s} is always smaller or equal than a @{text Right}-value.
-  In the former case we have @{term "w\<^sub>2 \<in> LV r\<^sub>1 s"} and can use the induction
-  hypothesis to infer @{term "w\<^sub>1 :\<sqsubseteq>val w\<^sub>2"}. Because @{term "w\<^sub>1"}
-  and @{term "w\<^sub>2"} have the same underlying string @{text s}, we can conclude with 
-  @{term "Left w\<^sub>1 :\<sqsubseteq>val Left w\<^sub>2"}. 
+  \begin{itemize} \item[$\bullet$] Case @{term "s \<in> (ALT r\<^sub>1
+  r\<^sub>2) \<rightarrow> (Left w\<^sub>1)"}: In this case @{term
+  "v\<^sub>1 = Left w\<^sub>1"} and the value @{term "v\<^sub>2"} is
+  either of the form @{term "Left w\<^sub>2"} or @{term "Right
+  w\<^sub>2"}. In the latter case we can immediately conclude with
+  @{term "v\<^sub>1 :\<sqsubseteq>val v\<^sub>2"} since a @{text
+  Left}-value with the same underlying string @{text s} is always
+  smaller or equal than a @{text Right}-value.  In the former case we
+  have @{term "w\<^sub>2 \<in> LV r\<^sub>1 s"} and can use the
+  induction hypothesis to infer @{term "w\<^sub>1 :\<sqsubseteq>val
+  w\<^sub>2"}. Because @{term "w\<^sub>1"} and @{term "w\<^sub>2"}
+  have the same underlying string @{text s}, we can conclude with
+  @{term "Left w\<^sub>1 :\<sqsubseteq>val Left w\<^sub>2"}.\smallskip
 
-  \item[$\bullet$] Case @{term "s \<in> (ALT r\<^sub>1 r\<^sub>2) \<rightarrow> (Right w\<^sub>1)"}:
-  Similarly for the case where
-  @{term "v\<^sub>1 = Right w\<^sub>1"}.
+  \item[$\bullet$] Case @{term "s \<in> (ALT r\<^sub>1 r\<^sub>2)
+  \<rightarrow> (Right w\<^sub>1)"}: This case is similar to the previous
+  case, except that we know that @{term "s \<notin> L
+  r\<^sub>1"}. This is needed when @{term "v\<^sub>2 = Left
+  w\<^sub>2"}: since \mbox{@{term "flat v\<^sub>2 = flat w\<^sub>2"}
+  @{text "= s"}} and @{term "\<Turnstile> w\<^sub>2 : r\<^sub>1"}, we
+  can derive a contradiction using Prop.~\ref{inhabs}. So also in this
+  case \mbox{@{term "v\<^sub>1 :\<sqsubseteq>val v\<^sub>2"}}.\smallskip
 
-  \item[$\bullet$]
+  \item[$\bullet$]  Case @{term "(s\<^sub>1 @ s\<^sub>2) \<in> (SEQ r\<^sub>1 r\<^sub>2)
+  \<rightarrow> (Seq w\<^sub>1 w\<^sub>2)"}: Assume @{term "v\<^sub>2 =
+  Seq (u\<^sub>1) (u\<^sub>2)"} with @{term "\<Turnstile> u\<^sub>1 : r\<^sub>1"}
+  and \mbox{@{term "\<Turnstile> u\<^sub>2 : r\<^sub>2"}}. We have
+  
   \end{itemize}
   \end{proof}
 
--- a/thys/Journal/document/root.tex	Fri Aug 18 14:51:29 2017 +0100
+++ b/thys/Journal/document/root.tex	Fri Aug 25 15:05:20 2017 +0200
@@ -70,7 +70,10 @@
 expressions. They can be used for a very simple regular expression
 matching algorithm.  Sulzmann and Lu cleverly extended this algorithm
 in order to deal with POSIX matching, which is the underlying
-disambiguation strategy for regular expressions needed in lexers.  In
+disambiguation strategy for regular expressions needed in lexers.
+Their algorithm generates POSIX values which encode the information of
+\emph{how} a regular expression matched a string---that is, which part
+of the string is matched by which part of the regular expression.  In
 the first part of this paper we give our inductive definition of what
 a POSIX value is and show $(i)$ that such a value is unique (for given
 regular expression and string being matched) and $(ii)$ that Sulzmann
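As a small concrete instance of the sentence added above (an illustration using the value notation of the accompanying theories, not text from the abstract itself): matching the string xy against the sequence of the two character regular expressions for x and y yields the value Seq (Char x) (Char y). The left Char x records that the first component of the sequence matched x, the right Char y that the second component matched y, and flattening the value recovers the matched string, flat (Seq (Char x) (Char y)) = xy. Left- and Right-values record in the same way which alternative of an ALT was taken.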
--- a/thys/Paper/Paper.thy	Fri Aug 18 14:51:29 2017 +0100
+++ b/thys/Paper/Paper.thy	Fri Aug 25 15:05:20 2017 +0200
@@ -143,7 +143,7 @@
 YES/NO answer for whether a string is being matched by a regular
 expression.  Sulzmann and Lu~\cite{Sulzmann2014} extended this matcher
 to allow generation not just of a YES/NO answer but of an actual
-matching, called a [lexical] {\em value}. They give a simple algorithm
+matching, called a [lexical] {\em value}. \marginpar{explain what values are} They give a simple algorithm
 to calculate a value that appears to be the value associated with
 POSIX matching.  The challenge then is to specify that value, in an
 algorithm-independent fashion, and to show that Sulzmann and Lu's
--- a/thys/Positions.thy	Fri Aug 18 14:51:29 2017 +0100
+++ b/thys/Positions.thy	Fri Aug 25 15:05:20 2017 +0200
@@ -38,6 +38,7 @@
   shows "[] \<in> Pos v"
 by (induct v rule: Pos.induct)(auto)
 
+
 abbreviation
   "intlen vs \<equiv> int (length vs)"
 
@@ -133,6 +134,14 @@
   "v1 \<sqsubset>val p v2 \<equiv> pflat_len v1 p > pflat_len v2 p \<and>
                    (\<forall>q \<in> Pos v1 \<union> Pos v2. q \<sqsubset>lex p \<longrightarrow> pflat_len v1 q = pflat_len v2 q)"
 
+lemma test:
+  shows "v1 \<sqsubset>val p v2 \<longleftrightarrow> 
+         pflat_len v1 p > pflat_len v2 p \<and>
+         (\<forall>q \<in> Pos v1. q \<sqsubset>lex p \<longrightarrow> pflat_len v1 q = pflat_len v2 q) \<and>
+         (\<forall>q \<in> Pos v2. q \<sqsubset>lex p \<longrightarrow> pflat_len v1 q = pflat_len v2 q)"
+unfolding PosOrd_def
+apply(auto)
+done
 
 
 definition PosOrd_ex:: "val \<Rightarrow> val \<Rightarrow> bool" ("_ :\<sqsubset>val _" [60, 59] 60)
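To see the ordering at work on a small example (a sketch, assuming that pflat_len v p is the length of the flattened subvalue of v at p, or -1 when p lies outside v, and that [] is lexicographically below every non-empty position), compare Left (Char c) and Right (Char c), which flatten to the same string:

    pflat_len (Left (Char c))  [0] = 1     ([0] lies inside the Left-value)
    pflat_len (Right (Char c)) [0] = -1    ([0] lies outside the Right-value)
    pflat_len (Left (Char c))  []  = 1 = pflat_len (Right (Char c)) []

The only position lexicographically below [0] is [], where both values agree, so [0] witnesses Left (Char c) :\<sqsubset>val Right (Char c). This is exactly the witness used in lemma PosOrd_Left_Right further down.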
@@ -233,8 +242,6 @@
 by auto
 
 
-
-
 lemma PosOrd_shorterE:
   assumes "v1 :\<sqsubset>val v2" 
   shows "length (flat v2) \<le> length (flat v1)"
@@ -257,73 +264,70 @@
 unfolding prefix_list_def sprefix_list_def
 by (metis append_Nil2 append_eq_conv_conj drop_all le_less_linear)
 
+lemma pflat_len_inside:
+  assumes "pflat_len v2 p < pflat_len v1 p"
+  shows "p \<in> Pos v1"
+using assms 
+unfolding pflat_len_def
+by (auto split: if_splits)
 
 lemma PosOrd_Left_Right:
   assumes "flat v1 = flat v2"
   shows "Left v1 :\<sqsubset>val Right v2" 
 unfolding PosOrd_ex_def
 apply(rule_tac x="[0]" in exI)
-using assms 
-apply(auto simp add: PosOrd_def pflat_len_simps)
+apply(auto simp add: PosOrd_def pflat_len_simps assms)
 done
 
+lemma PosOrd_LeftE:
+  assumes "Left v1 :\<sqsubset>val Left v2" "flat v1 = flat v2"
+  shows "v1 :\<sqsubset>val v2"
+using assms
+unfolding PosOrd_ex_def test
+apply(auto simp add: pflat_len_simps)
+apply(frule pflat_len_inside)
+apply(auto simp add: pflat_len_simps)
+by (metis lex_simps(3) pflat_len_simps(3))
+
+lemma PosOrd_LeftI:
+  assumes "v1 :\<sqsubset>val v2" "flat v1 = flat v2"
+  shows  "Left v1 :\<sqsubset>val Left v2"
+using assms
+unfolding PosOrd_ex_def test
+apply(auto simp add: pflat_len_simps)
+by (metis less_numeral_extra(3) lex_simps(3) pflat_len_simps(3))
+
 lemma PosOrd_Left_eq:
-  assumes "flat v = flat v'"
-  shows "Left v :\<sqsubset>val Left v' \<longleftrightarrow> v :\<sqsubset>val v'" 
+  assumes "flat v1 = flat v2"
+  shows "Left v1 :\<sqsubset>val Left v2 \<longleftrightarrow> v1 :\<sqsubset>val v2" 
+using assms PosOrd_LeftE PosOrd_LeftI
+by blast
+
+
+lemma PosOrd_RightE:
+  assumes "Right v1 :\<sqsubset>val Right v2" "flat v1 = flat v2"
+  shows "v1 :\<sqsubset>val v2"
 using assms
-unfolding PosOrd_ex_def
-apply(auto)
-apply(case_tac p)
-apply(simp add: PosOrd_def pflat_len_simps)
-apply(case_tac a)
-apply(simp add: PosOrd_def pflat_len_simps)
-apply(rule_tac x="list" in exI)
-apply(auto simp add: PosOrd_def pflat_len_simps)[1]
-apply (smt Un_def lex_list.intros(2) mem_Collect_eq pflat_len_simps(3))
-apply (smt Collect_disj_eq lex_list.intros(2) mem_Collect_eq pflat_len_simps(3))
-apply(auto simp add: PosOrd_def pflat_len_outside)[1]
-apply(rule_tac x="0#p" in exI)
-apply(auto simp add: PosOrd_def pflat_len_simps)
-done
-
+unfolding PosOrd_ex_def test
+apply(auto simp add: pflat_len_simps)
+apply(frule pflat_len_inside)
+apply(auto simp add: pflat_len_simps)
+by (metis lex_simps(3) pflat_len_simps(5))
 
 lemma PosOrd_RightI:
-  assumes "v :\<sqsubset>val v'" "flat v = flat v'"
-  shows "Right v :\<sqsubset>val Right v'" 
-using assms(1)
-unfolding PosOrd_ex_def
-apply(auto)
-apply(rule_tac x="Suc 0#p" in exI)
-using assms(2)
-apply(auto simp add: PosOrd_def pflat_len_simps)
-done
-
-lemma PosOrd_RightE:
-  assumes "Right v1 :\<sqsubset>val Right v2"
-  shows "v1 :\<sqsubset>val v2"
+  assumes "v1 :\<sqsubset>val v2" "flat v1 = flat v2"
+  shows  "Right v1 :\<sqsubset>val Right v2"
 using assms
-apply(simp add: PosOrd_ex_def)
-apply(erule exE)
-apply(case_tac p)
-apply(simp add: PosOrd_def)
+unfolding PosOrd_ex_def test
 apply(auto simp add: pflat_len_simps)
-apply(rule_tac x="[]" in exI)
-apply(simp add: Pos_empty pflat_len_simps)
-apply(case_tac a)
-apply(simp add: pflat_len_def PosOrd_def)
-apply(case_tac nat)
-prefer 2
-apply(simp add: pflat_len_def PosOrd_def)
-apply(auto simp add: pflat_len_simps PosOrd_def)
-apply(rule_tac x="list" in exI)
-apply(auto)
-apply(drule_tac x="Suc 0#q" in bspec)
-apply(simp)
-apply(simp add: pflat_len_simps)
-apply(drule_tac x="Suc 0#q" in bspec)
-apply(simp)
-apply(simp add: pflat_len_simps)
-done
+by (metis lex_simps(3) nat_neq_iff pflat_len_simps(5))
+
+
+lemma PosOrd_Right_eq:
+  assumes "flat v1 = flat v2"
+  shows "Right v1 :\<sqsubset>val Right v2 \<longleftrightarrow> v1 :\<sqsubset>val v2" 
+using assms PosOrd_RightE PosOrd_RightI
+by blast
 
 
 lemma PosOrd_SeqI1: