regexp: changeset 236:2d4f1334b5ca

Binary file csupp.pdf has changed

--- a/csupp.tex	Mon Sep 05 12:40:22 2011 +0000
+++ b/csupp.tex	Mon Sep 05 13:09:38 2011 +0000
@@ -39,123 +39,78 @@
 structure that can be analyzed by computers for further processing.
 One might think that parsing has been studied to death, and after
 \emph{yacc} and \emph{lex} no new results can be obtained in this area.
-However recent results and novel approaches make it increasingly clear,
+However recent developments and novel approaches make it increasingly clear,
 that this is not true anymore.
 
-We propose to approach the subject of parsing from a certification point
-of view. Increasingly, parsers are part of certified compilers, like \mbox{\emph{CompCert}},
-which are guaranteed to be correct and bug-free. Such certified compilers are
-crucial in areas where software just cannot fail. However, so far the
-parsers of these compilers have been left out of the certification.
-This is because parsing algorithms are often ad hoc and their semantics
-is not clearly specified. Unfortunately, this means parsers can harbour
-errors that potentially invalidate the whole certification and correctness
-of the compiler. In this project, we like to change that.
+We propose to on parsers from a certification point of view. Increasingly,
+parsers are part of certified compilers, like \mbox{\emph{CompCert}}, which
+are guaranteed to be correct and bug-free. Such certified compilers are
+crucial in areas where software just cannot fail. However, so far the parsers
+of these compilers have been left out of the certification.  This is because
+parsing algorithms are often ad hoc and their semantics is not clearly
+specified. Unfortunately, this means parsers can harbour errors that
+potentially invalidate the whole certification and correctness of the
+compiler. In this project, we like to change that with the help of theorem
+provers.
 
-Only in the last few years, theorem provers have become good enough
-for establishing the correctness of some standard lexing and
-parsing algorithms. For this, the algorithms still need to be formulated
-in way so that it is easy to reason about them. In our earlier work
-about lexing and regular languages, the authors showed that this
-precludes well-known algorithms based automata. However we showed 
-that regular
-languages can be formulated and reasoned about entirely in terms
-regular expressions, which can be easily represented in theorem
-provers. This work uses the device of derivatives of regular
-expressions. We like to extend this device to parsers and grammars.
+Only in the last few years, theorem provers have become good enough for
+establishing the correctness of some standard lexing and parsing
+algorithms. For this, the algorithms still need to be formulated in way so
+that it is easy to reason about them. In our earlier work about lexing and
+regular languages, we showed that this precludes well-known algorithms based
+automata. However we showed also that regular languages can be formulated and
+reasoned about entirely in terms regular expressions, which can be easily
+represented in theorem provers. This work uses the device of derivatives of
+regular expressions. We like to extend this device to parsers and grammars.
 The aim is to come up with elegant and practical useful parsing algorithms
-whose correctness can be certified in a
-theorem prover.
+whose correctness can be certified in a theorem prover.
+
 
 \section*{Proposed Work}
 
 A recent development in parsing is Parsing Expression Grammars (PEG), which
-are an extension of the standard Context Free Grammars
+are an extension of the weel-known Context Free Grammars
 (CFG)~\cite{Ford04a}. The extension introduces new regular operators, such as
 negation and conjunction, on the right-hand sides of grammar rules, as well as
-priority orderings on rules. With these extensions, PEG parsing becomes much
+priority orderings. With these extensions, PEG parsing becomes much
 more powerful. For example disambiguation, formerly expressed by semantic
-filters, can now be expressed directly using grammar rules. This means a
-simpler and more systematic treatment of ambiguity and more concise grammar
-specifications for programming languages.
-
-However, a serious disadvantage of PEG is that it does not allow left
-recursion, because parsing algorithms for PEG~\cite{Ford02b} can not deal with
-left recursions. Although a new PEG parsing algorithm has been proposed
-that can deal with left recursion~\cite{conf/pepm/WarthDM08}, there is no
-correctness proof, not even in ``paper-and-pencil'' form. One aim of this
-research is to solve this sorry state-of-affairs by either certifying this
-algorithm or inventing a new one. For this we will first formalize a fixed
-point semantics of PEG, based on which an efficient, certified parsing
-algorithm can be given given.
+filters, can now be expressed directly using grammar rules. 
 
-There are several existing works we can draw upon:
-\begin{enumerate}
-\item The works on PEG.
-  \begin {enumerate}
-  \item An operation semantics for PEG has already been given
-    in~\cite{Ford04a}, but it is not adequate to deal with left recursions. But this
-    work gives at least a precise description of what the original PEG meant
-    for. This will serve an a basis to show the conservativeness of
-    the fixed point semantics we are
-    going to develop.
-    
-  \item The new algorithm~\cite{conf/pepm/WarthDM08} which claimed to be able
-    to deal with left recursions. Although there is no correctness proof yet, this
-    may provide some useful inspirations for our new algorithm design.
-  \end{enumerate}
+However, there is serious disadvantage of PEG for applications: is does not
+support grammrs involving left recursion~\cite{Ford02b}. Although a new PEG
+parsing algorithm has been proposed that can deal with left
+recursion~\cite{conf/pepm/WarthDM08}, there is no correctness proof, not even
+in ``paper-and-pencil'' form. One aim of this research is to solve this sorry
+state-of-affairs by either certifying this algorithm or inventing a new
+one. For this we will first formalize a fixed point semantics of PEG, based on
+which an efficient, certified parsing algorithm can be given given. For this
+we take as starting point the paper~\cite{Ford04a}, which does not treat
+left-recursion, but gives an operational semantics for PEG parsing. For the
+semantics, it seems plausible that we can adapt work on Boolean
+Grammars~\cite{Okhotin/04a}, which are similar to PEGs, and for which the
+paper ~\cite{journals/iandc/KountouriotisNR09} gives a semantics to negation
+operators, but not to Kleene's star operation.
 
-\item The works on Boolean Grammars~\cite{Okhotin/04a}. Boolean Grammar is
-  very closely related to PEG, because it also contains negative and conjunctive
-  grammars. The main differences are: First, Boolean Grammar has no ordering on
-  productions; Second: Boolean Grammar does not contain STAR operator. There are
-  two works about Boolean Grammar which might be useful for this research:
-  
-  \begin{enumerate}
-  \item A fixed point semantics for Boolean
-    Grammar~\cite{journals/iandc/KountouriotisNR09}. The idea to define the
-    semantics of negative and conjunctive operators is certainly what we can
-    borrow. Therefore, this work gives the basis on which we can add in production
-    ordering and STAR operator.
-
-  \item A parsing algorithm for Boolean Grammar based on CYK
-    parsing~\cite{journals/iandc/KountouriotisNR09}. The draw back of CYK parsing
-    is that: the original grammar specification needs to be transformed into a
-    normal form. This transformation may lead to grammar explosion and is
-    undesirable. One aim of this research is to see whether this transformation
-    can be avoided. For this purpose, other parsing style may provide useful inspirations, for example:
+For the parsing algorithm, we might also be able to draw inspiration from
+parsers based on Cocke-Younger-Kasami (CYK)
+algorithms~\cite{journals/iandc/KountouriotisNR09} and
+Early~\cite{Earley70,AycHor02} parsers. The defect CYK algorithms is that the
+original grammar specification needs to be transformed into a normal
+form. This transformation may lead to grammar explosion and inefficient
+parsing. We will investigate whether this transformation can be avoided.
+Early style parsers, which have recently been certified by Ridge [???], 
+need to be extended to PEG parsing in order to be helpful for us.
 
-    \begin{enumerate}
-    \item Derivative
-      Parsing~\cite{Brzozowski64,Almeidaetal10,OwensReppyTuron09,journals/corr/abs-1010-5023}. Christian
-      Urban has used derivative methods to establish the correctness of a regular
-      expression matcher, as well the the finite partition property of regular
-      expression~\cite{WuZhangUrban11}.  There are well founded envisage that the
-      derivative methods may provide the foundation to the new parsing algorithms of PEG.
-
-    \item Early parsing~\cite{Earley70,AycHor02}. It is a refinement of CYK
-      parsing which does not require the transformation to normal forms, and
-      therefore provide one possible direction to adapt the current CYK based
-      parsing algorithm of Boolean Grammar for PEG grammar.
+Finally, we want to investigate whether derivatives of regular expressions
+~\cite{Brzozowski64,Almeidaetal10,OwensReppyTuron09,journals/corr/abs-1010-5023}
+can be extended to parsing. Lexing based on derivatives gives rise to very
+elegant regular expression matchers that can be certified in a theorem prover 
+with ease.  We will study whether the idea of taking a derivative of a regular
+expression can be extended to rules in grammars. The problem that needs to be
+overcome again arises from possible left recursion in parsing. 
 
-    \item The new parsing algorithm proposed by Tom Ridge[???]. Recently,
-      T. Ridge has proposed and certified an combinator style parsing algorithm for
-      CFG, which borrows some ideas from Early parsing. The proposed algorithm is
-      very simple and elegant. We are going to strive for a parsing algorithm as elegant as this one.
-    \end{enumerate}
     
-    Which of the above possibilities will finally get into our final solutions
-    is an interesting point about this current research.
-
-  \end{enumerate}
-\end{enumerate}
-
-Based on these works, we are quite confident that our idea may lead to some concrete results.
-
-\mbox{}\\[15cm]
-
-\noindent
-
+\newpage
 
 
 \small

author	urbanc
	Mon, 05 Sep 2011 13:09:38 +0000
changeset 236	2d4f1334b5ca
parent 235	a7ddcad0a023
child 237	e02155fe8136

csupp.pdf		file \| annotate \| diff \| comparison \| revisions
csupp.tex		file \| annotate \| diff \| comparison \| revisions