regexp: comparison csupp.tex

-:a7ddcad0a023
+:2d4f1334b5ca
 \noindent
 Parsing is the act of transforming plain text into some
 structure that can be analyzed by computers for further processing.
 One might think that parsing has been studied to death, and after
 \emph{yacc} and \emph{lex} no new results can be obtained in this area.
-However, recent results and novel approaches make it increasingly clear
+However, recent developments and novel approaches make it increasingly clear
 that this is not true anymore.
-We propose to approach the subject of parsing from a certification point
+We propose to work on parsers from a certification point of view. Increasingly,
-of view. Increasingly, parsers are part of certified compilers, like \mbox{\emph{CompCert}},
+parsers are part of certified compilers, like \mbox{\emph{CompCert}}, which
-which are guaranteed to be correct and bug-free. Such certified compilers are
+are guaranteed to be correct and bug-free. Such certified compilers are
-crucial in areas where software just cannot fail. However, so far the
+crucial in areas where software just cannot fail. However, so far the parsers
-parsers of these compilers have been left out of the certification.
+of these compilers have been left out of the certification.  This is because
-This is because parsing algorithms are often ad hoc and their semantics
+parsing algorithms are often ad hoc and their semantics is not clearly
-is not clearly specified. Unfortunately, this means parsers can harbour
+specified. Unfortunately, this means parsers can harbour errors that
-errors that potentially invalidate the whole certification and correctness
+potentially invalidate the whole certification and correctness of the
-of the compiler. In this project, we would like to change that.
+compiler. In this project, we would like to change that with the help of theorem
+provers.
-Only in the last few years, theorem provers have become good enough
+Only in the last few years, theorem provers have become good enough for
-for establishing the correctness of some standard lexing and
+establishing the correctness of some standard lexing and parsing
-parsing algorithms. For this, the algorithms still need to be formulated
+algorithms. For this, the algorithms still need to be formulated in a way so
-in a way so that it is easy to reason about them. In our earlier work
+that it is easy to reason about them. In our earlier work about lexing and
-about lexing and regular languages, the authors showed that this
+regular languages, we showed that this precludes well-known algorithms based on
-precludes well-known algorithms based on automata. However, we showed
+automata. However, we also showed that regular languages can be formulated and
-that regular
+reasoned about entirely in terms of regular expressions, which can be easily
-languages can be formulated and reasoned about entirely in terms of
+represented in theorem provers. This work uses the device of derivatives of
-regular expressions, which can be easily represented in theorem
+regular expressions. We would like to extend this device to parsers and grammars.
-provers. This work uses the device of derivatives of regular
-expressions. We would like to extend this device to parsers and grammars.
 The aim is to come up with elegant and practically useful parsing algorithms
-whose correctness can be certified in a
+whose correctness can be certified in a theorem prover.
-theorem prover.
 \section*{Proposed Work}
 A recent development in parsing is Parsing Expression Grammars (PEG), which
-are an extension of the standard Context Free Grammars
+are an extension of the well-known Context Free Grammars
 (CFG)~\cite{Ford04a}. The extension introduces new regular operators, such as
 negation and conjunction, on the right-hand sides of grammar rules, as well as
-priority orderings on rules. With these extensions, PEG parsing becomes much
+priority orderings. With these extensions, PEG parsing becomes much
 more powerful. For example, disambiguation, formerly expressed by semantic
-filters, can now be expressed directly using grammar rules. This means a
+filters, can now be expressed directly using grammar rules.
-simpler and more systematic treatment of ambiguity and more concise grammar
-specifications for programming languages.
-However, a serious disadvantage of PEG is that it does not allow left
+However, there is a serious disadvantage of PEG for applications: it does not
-recursion, because parsing algorithms for PEG~\cite{Ford02b} cannot deal with
+support grammars involving left recursion~\cite{Ford02b}. Although a new PEG
-left recursions. Although a new PEG parsing algorithm has been proposed
+parsing algorithm has been proposed that can deal with left
-that can deal with left recursion~\cite{conf/pepm/WarthDM08}, there is no
+recursion~\cite{conf/pepm/WarthDM08}, there is no correctness proof, not even
-correctness proof, not even in ``paper-and-pencil'' form. One aim of this
+in ``paper-and-pencil'' form. One aim of this research is to solve this sorry
-research is to solve this sorry state-of-affairs by either certifying this
+state-of-affairs by either certifying this algorithm or inventing a new
-algorithm or inventing a new one. For this we will first formalize a fixed
+one. For this we will first formalize a fixed point semantics of PEG, based on
-point semantics of PEG, based on which an efficient, certified parsing
+which an efficient, certified parsing algorithm can be given. For this
-algorithm can be given.
+we take as starting point the paper~\cite{Ford04a}, which does not treat
+left-recursion, but gives an operational semantics for PEG parsing. For the
+semantics, it seems plausible that we can adapt work on Boolean
+Grammars~\cite{Okhotin/04a}, which are similar to PEGs, and for which the
+paper~\cite{journals/iandc/KountouriotisNR09} gives a semantics to negation
+operators, but not to Kleene's star operation.
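To make the left-recursion problem concrete, the following is a minimal sketch of a naive recursive-descent recogniser with PEG-style ordered choice. It is an illustration only, not the algorithm of any cited paper; all names (\texttt{lit}, \texttt{seq}, \texttt{choice}, \texttt{ref}, \texttt{terminates}) are hypothetical. It shows that ordered choice commits to the first successful alternative, and that a left-recursive rule makes this naive strategy loop.

```python
# A parser here is a function (text, position) -> new position, or None on failure.

def lit(s):
    # Match the literal string s at position i.
    return lambda text, i: i + len(s) if text.startswith(s, i) else None

def seq(*ps):
    # Run the parsers one after another; fail as soon as one fails.
    def parse(text, i):
        for p in ps:
            i = p(text, i)
            if i is None:
                return None
        return i
    return parse

def choice(*ps):
    # Ordered choice: commit to the first alternative that succeeds.
    def parse(text, i):
        for p in ps:
            j = p(text, i)
            if j is not None:
                return j
        return None
    return parse

def ref(name, rules):
    # Late-bound reference to a named rule, so that rules may be recursive.
    return lambda text, i: rules[name](text, i)

# Right-recursive rule:  E <- "n" "+" E / "n"
right = {}
right["E"] = choice(seq(lit("n"), lit("+"), ref("E", right)), lit("n"))

# Left-recursive rule:   E <- E "+" "n" / "n"
left = {}
left["E"] = choice(seq(ref("E", left), lit("+"), lit("n")), lit("n"))

def terminates(rules, text):
    # The left-recursive rule calls itself without consuming any input,
    # which Python reports as a RecursionError.
    try:
        rules["E"](text, 0)
        return True
    except RecursionError:
        return False
```

With these definitions, \texttt{terminates(right, "n+n")} holds while \texttt{terminates(left, "n+n")} does not, and \texttt{choice(lit("a"), lit("ab"))} consumes only the first character of \texttt{"ab"} because the first alternative already succeeds.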
-There are several existing works we can draw upon:
+For the parsing algorithm, we might also be able to draw inspiration from
-\begin{enumerate}
+parsers based on Cocke-Younger-Kasami (CYK)
-\item The works on PEG.
+algorithms~\cite{journals/iandc/KountouriotisNR09} and
-\begin {enumerate}
+Earley~\cite{Earley70,AycHor02} parsers. The defect of CYK algorithms is that the
-\item An operational semantics for PEG has already been given
+original grammar specification needs to be transformed into a normal
-in~\cite{Ford04a}, but it is not adequate to deal with left recursions. But this
+form. This transformation may lead to grammar explosion and inefficient
-work gives at least a precise description of what the original PEG meant
+parsing. We will investigate whether this transformation can be avoided.
-for. This will serve as a basis to show the conservativeness of
+Earley-style parsers, which have recently been certified by Ridge [???],
-the fixed point semantics we are
+need to be extended to PEG parsing in order to be helpful for us.
-going to develop.
+Finally, we want to investigate whether derivatives of regular
+expressions~\cite{Brzozowski64,Almeidaetal10,OwensReppyTuron09,journals/corr/abs-1010-5023}
+can be extended to parsing. Lexing based on derivatives gives rise to very
+elegant regular expression matchers that can be certified in a theorem prover
+with ease.  We will study whether the idea of taking a derivative of a regular
+expression can be extended to rules in grammars. The problem that needs to be
+overcome again arises from possible left recursion in parsing.
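As an illustration of the derivative idea for plain regular expressions, here is a minimal sketch of a Brzozowski-style matcher. It is not the formalisation from the cited work; the class and function names (\texttt{Zero}, \texttt{One}, \texttt{Chr}, \texttt{Alt}, \texttt{Seq}, \texttt{Star}, \texttt{nullable}, \texttt{der}, \texttt{matches}) are assumptions chosen for this example, and no simplification of intermediate expressions is attempted.

```python
from dataclasses import dataclass

class Rexp:
    pass

@dataclass(frozen=True)
class Zero(Rexp):      # matches no string
    pass

@dataclass(frozen=True)
class One(Rexp):       # matches only the empty string
    pass

@dataclass(frozen=True)
class Chr(Rexp):       # matches the single character c
    c: str

@dataclass(frozen=True)
class Alt(Rexp):       # r1 | r2
    r1: Rexp
    r2: Rexp

@dataclass(frozen=True)
class Seq(Rexp):       # r1 followed by r2
    r1: Rexp
    r2: Rexp

@dataclass(frozen=True)
class Star(Rexp):      # zero or more repetitions of r
    r: Rexp

def nullable(r):
    # Can r match the empty string?
    if isinstance(r, (One, Star)):
        return True
    if isinstance(r, Alt):
        return nullable(r.r1) or nullable(r.r2)
    if isinstance(r, Seq):
        return nullable(r.r1) and nullable(r.r2)
    return False  # Zero and Chr

def der(c, r):
    # Derivative of r with respect to c: matches s exactly when r matches c + s.
    if isinstance(r, Chr):
        return One() if r.c == c else Zero()
    if isinstance(r, Alt):
        return Alt(der(c, r.r1), der(c, r.r2))
    if isinstance(r, Seq):
        first = Seq(der(c, r.r1), r.r2)
        return Alt(first, der(c, r.r2)) if nullable(r.r1) else first
    if isinstance(r, Star):
        return Seq(der(c, r.r), r)
    return Zero()  # Zero and One

def matches(r, s):
    # Take the derivative for each character, then test for nullability.
    for c in s:
        r = der(c, r)
    return nullable(r)

# Example expression: (a|b)*c
example = Seq(Star(Alt(Chr("a"), Chr("b"))), Chr("c"))
```

Here \texttt{matches(example, "abac")} holds and \texttt{matches(example, "ab")} does not. The matcher consists only of structural recursion over the expression, which is what makes this style convenient to reason about in a theorem prover. A grammar rule that refers to itself on the left has no such obviously terminating recursion.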
-\item The new algorithm~\cite{conf/pepm/WarthDM08} which claimed to be able
+\newpage
-to deal with left recursions. Although there is no correctness proof yet, this
-may provide some useful inspirations for our new algorithm design.
-\end{enumerate}
-\item The works on Boolean Grammars~\cite{Okhotin/04a}. Boolean Grammar is
-very closely related to PEG, because it also contains negative and conjunctive
-grammars. The main differences are: first, Boolean Grammar has no ordering on
-productions; second, Boolean Grammar does not contain a STAR operator. There are
-two works about Boolean Grammar which might be useful for this research:
-\begin{enumerate}
-\item A fixed point semantics for Boolean
-Grammar~\cite{journals/iandc/KountouriotisNR09}. The idea to define the
-semantics of negative and conjunctive operators is certainly what we can
-borrow. Therefore, this work gives the basis on which we can add in production
-ordering and STAR operator.
-\item A parsing algorithm for Boolean Grammar based on CYK
-parsing~\cite{journals/iandc/KountouriotisNR09}. The drawback of CYK parsing
-is that the original grammar specification needs to be transformed into a
-normal form. This transformation may lead to grammar explosion and is
-undesirable. One aim of this research is to see whether this transformation
-can be avoided. For this purpose, other parsing styles may provide useful inspiration, for example:
-\begin{enumerate}
-\item Derivative
-Parsing~\cite{Brzozowski64,Almeidaetal10,OwensReppyTuron09,journals/corr/abs-1010-5023}. Christian
-Urban has used derivative methods to establish the correctness of a regular
-expression matcher, as well as the finite partition property of regular
-expressions~\cite{WuZhangUrban11}.  There are well-founded reasons to expect that
-derivative methods may provide the foundation for new parsing algorithms for PEG.
-\item Earley parsing~\cite{Earley70,AycHor02}. It is a refinement of CYK
-parsing which does not require the transformation to normal forms, and
-therefore provides one possible direction to adapt the current CYK-based
-parsing algorithm of Boolean Grammar for PEG grammar.
-\item The new parsing algorithm proposed by Tom Ridge[???]. Recently,
-T. Ridge has proposed and certified a combinator-style parsing algorithm for
-CFG, which borrows some ideas from Earley parsing. The proposed algorithm is
-very simple and elegant. We are going to strive for a parsing algorithm as elegant as this one.
-\end{enumerate}
-Which of the above possibilities will finally get into our final solutions
-is an interesting point about this current research.
-\end{enumerate}
-\end{enumerate}
-Based on these works, we are quite confident that our idea may lead to some concrete results.
-\mbox{}\\[15cm]
-\noindent
 \small
 \bibliography{Journal/document/root}
 \bibliographystyle{abbrv}

changeset 236	2d4f1334b5ca
parent 235	a7ddcad0a023
child 242	093e45c44d91