regexp: csupp.tex@2d4f1334b5ca


\documentclass{article}
\usepackage{a4wide,ot1patch}
\usepackage[latin1]{inputenc}
\usepackage{multicol}
\usepackage{charter}
\usepackage{amsmath,amssymb,amsthm}
\usepackage{fancyheadings}

\addtolength{\oddsidemargin}{-6mm}
\addtolength{\evensidemargin}{-6mm}
\addtolength{\textwidth}{11mm}
\addtolength{\columnsep}{3mm}
\addtolength{\textheight}{8mm}
\addtolength{\topmargin}{-7.5mm}

\pagestyle{fancyplain}
\lhead[\fancyplain{}{A}]{\fancyplain{}{}}
\rhead[\fancyplain{}{C}]{\fancyplain{}{}}
\renewcommand{\headrulewidth}{0pt}

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\begin{document}


\begin{center}
\begin{tabular}{c}
\\[-5mm]
\LARGE\bf Certified Parsing\\[-10mm]
\mbox{}
\end{tabular}
\end{center}
\thispagestyle{empty}
\mbox{}\\[-5mm]

\begin{multicols}{2}
\section*{Background}
\noindent
Parsing is the act of transforming plain text into some
structure that can be analyzed by computers for further processing.
One might think that parsing has been studied to death, and after
\emph{yacc} and \emph{lex} no new results can be obtained in this area.
However recent developments and novel approaches make it increasingly clear,
that this is not true anymore.

We propose to on parsers from a certification point of view. Increasingly,
parsers are part of certified compilers, like \mbox{\emph{CompCert}}, which
are guaranteed to be correct and bug-free. Such certified compilers are
crucial in areas where software just cannot fail. However, so far the parsers
of these compilers have been left out of the certification.  This is because
parsing algorithms are often ad hoc and their semantics is not clearly
specified. Unfortunately, this means parsers can harbour errors that
potentially invalidate the whole certification and correctness of the
compiler. In this project, we like to change that with the help of theorem
provers.

Only in the last few years, theorem provers have become good enough for
establishing the correctness of some standard lexing and parsing
algorithms. For this, the algorithms still need to be formulated in way so
that it is easy to reason about them. In our earlier work about lexing and
regular languages, we showed that this precludes well-known algorithms based
automata. However we showed also that regular languages can be formulated and
reasoned about entirely in terms regular expressions, which can be easily
represented in theorem provers. This work uses the device of derivatives of
regular expressions. We like to extend this device to parsers and grammars.
The aim is to come up with elegant and practical useful parsing algorithms
whose correctness can be certified in a theorem prover.


\section*{Proposed Work}

A recent development in parsing is Parsing Expression Grammars (PEG), which
are an extension of the weel-known Context Free Grammars
(CFG)~\cite{Ford04a}. The extension introduces new regular operators, such as
negation and conjunction, on the right-hand sides of grammar rules, as well as
priority orderings. With these extensions, PEG parsing becomes much
more powerful. For example disambiguation, formerly expressed by semantic
filters, can now be expressed directly using grammar rules. 

However, there is serious disadvantage of PEG for applications: is does not
support grammrs involving left recursion~\cite{Ford02b}. Although a new PEG
parsing algorithm has been proposed that can deal with left
recursion~\cite{conf/pepm/WarthDM08}, there is no correctness proof, not even
in ``paper-and-pencil'' form. One aim of this research is to solve this sorry
state-of-affairs by either certifying this algorithm or inventing a new
one. For this we will first formalize a fixed point semantics of PEG, based on
which an efficient, certified parsing algorithm can be given given. For this
we take as starting point the paper~\cite{Ford04a}, which does not treat
left-recursion, but gives an operational semantics for PEG parsing. For the
semantics, it seems plausible that we can adapt work on Boolean
Grammars~\cite{Okhotin/04a}, which are similar to PEGs, and for which the
paper ~\cite{journals/iandc/KountouriotisNR09} gives a semantics to negation
operators, but not to Kleene's star operation.

For the parsing algorithm, we might also be able to draw inspiration from
parsers based on Cocke-Younger-Kasami (CYK)
algorithms~\cite{journals/iandc/KountouriotisNR09} and
Early~\cite{Earley70,AycHor02} parsers. The defect CYK algorithms is that the
original grammar specification needs to be transformed into a normal
form. This transformation may lead to grammar explosion and inefficient
parsing. We will investigate whether this transformation can be avoided.
Early style parsers, which have recently been certified by Ridge [???], 
need to be extended to PEG parsing in order to be helpful for us.

Finally, we want to investigate whether derivatives of regular expressions
~\cite{Brzozowski64,Almeidaetal10,OwensReppyTuron09,journals/corr/abs-1010-5023}
can be extended to parsing. Lexing based on derivatives gives rise to very
elegant regular expression matchers that can be certified in a theorem prover 
with ease.  We will study whether the idea of taking a derivative of a regular
expression can be extended to rules in grammars. The problem that needs to be
overcome again arises from possible left recursion in parsing. 

    
\newpage


\small
\bibliography{Journal/document/root}
\bibliographystyle{abbrv}
\end{multicols}

%  %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%  \noindent {\bf Objectives:} The overall goals of the project are as follows:
%  \begin{itemize}
%  \item To solve the POPLmark challenge.

%  \item To complete and greatly improve the existing implementation of the
%    nominal datatype package.
%  \item To explore the strengths of this package by proving the
%    safety of SML.
%  \item To provide a basis for extracting programs from safety proofs.

%  \item To make the nominal datatype package usable for teaching
%    students about the lambda-calculus and the theory of programming
%    languages. \smallskip
%  \end{itemize}



%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\end{document}

%%% Local Variables:
%%% mode: latex
%%% TeX-master: t
%%% TeX-command-default: "PdfLaTeX"
%%% TeX-view-style: (("." "kpdf %s.pdf"))
%%% End:
author	urbanc
	Mon, 05 Sep 2011 13:09:38 +0000
changeset 236	2d4f1334b5ca
parent 235	a7ddcad0a023
child 242	093e45c44d91
permissions	-rwxr-xr-x