\documentclass{article}+ −
\usepackage{a4wide,ot1patch}+ −
\usepackage[latin1]{inputenc}+ −
\usepackage{multicol}+ −
\usepackage{charter}+ −
\usepackage{amsmath,amssymb,amsthm}+ −
\usepackage{fancyheadings}+ −
+ −
\addtolength{\oddsidemargin}{-6mm}+ −
\addtolength{\evensidemargin}{-6mm}+ −
\addtolength{\textwidth}{11mm}+ −
\addtolength{\columnsep}{3mm}+ −
\addtolength{\textheight}{8mm}+ −
\addtolength{\topmargin}{-7.5mm}+ −
+ −
\pagestyle{fancyplain}+ −
\lhead[\fancyplain{}{A}]{\fancyplain{}{}}+ −
\rhead[\fancyplain{}{C}]{\fancyplain{}{}}+ −
\renewcommand{\headrulewidth}{0pt}+ −
+ −
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%+ −
\begin{document}+ −
+ −
+ −
\begin{center}+ −
\begin{tabular}{c}+ −
\\[-5mm]+ −
\LARGE\bf Novel Certified Parsers\\[-10mm]+ −
\mbox{}+ −
\end{tabular}+ −
\end{center}+ −
\thispagestyle{empty}+ −
\mbox{}\\[-5mm]+ −
+ −
\begin{multicols}{2}+ −
\section*{Background}+ −
+ −
\noindent+ −
Parsers transform plain text into some abstract structure that can be analyzed by+ −
computers for further processing. One might think that parsers have been+ −
studied to death, and after \emph{yacc} and \emph{lex} no new results can be+ −
obtained in this area. However recent developments and novel approaches make+ −
it increasingly clear, that this is not true anymore~\cite{Might11}. And+ −
there is a real practical need for new results: for example the future HTML5 + −
Standard abandons a well-defined grammar specification, in favour of a bespoke+ −
parser given as pseudo code. Proving any property about this parser is nearly + −
impossible.+ −
+ −
This work targets parsers from a certification point of view. Increasingly,+ −
parsers are part of certified compilers, like+ −
\mbox{\emph{CompCert}}~\cite{Leroy09}, which are guaranteed to be+ −
bug-free. Such certified compilers are crucial in areas where software just+ −
cannot fail. However, so far the parsers of these compilers have been left out+ −
of the certification. This is because parsing algorithms are often ad hoc and+ −
their semantics is not clearly specified. This means, unfortunately, parsers+ −
can harbour errors that potentially invalidate the whole certification and+ −
correctness of the compiler. In this project, we aim to change this situation+ −
with the help of the theorem prover Isabelle/HOL.+ −
+ −
Only in the last few years, theorem provers have become powerful enough for+ −
establishing the correctness of some standard lexing and parsing+ −
algorithms. For this, the algorithms still need to be formulated in a way so+ −
that it is easy to reason about them. In our earlier work about lexing and+ −
regular languages, we showed that this precludes well-known algorithms based+ −
automata~\cite{WuZhangUrban11}. However we showed also that regular languages can be formulated and+ −
reasoned about entirely in terms regular expressions, which can be easily+ −
represented in theorem provers. This work uses the device of derivatives of+ −
regular expressions. We like to extend this device to parsers and grammars.+ −
The aim is to come up with elegant and practical useful parsing algorithms+ −
whose correctness can be certified in Isabelle/HOL.+ −
+ −
+ −
\section*{Proposed Work}+ −
+ −
A recent development in the field of parsing are Parsing Expression Grammars+ −
(PEGs)~\cite{Ford04}, which+ −
are an extension of the well-known Context Free Grammars+ −
(CFGs). This extension introduces new regular operators, such as+ −
negation and conjunction, on the right-hand side of grammar rules, as well as+ −
priority orderings for rules. With these extensions, PEG parsing becomes much+ −
more powerful and more useful in practice. For example disambiguation, formerly expressed by semantic+ −
filters, can now be expressed directly using grammar rules. + −
+ −
However, there is a serious limitation of PEGs, which affects potential+ −
applications: grammars involving left recursion are not+ −
allowed~\cite{Ford02}. Although one PEG parsing algorithm has already been+ −
proposed that can deal with left recursion~\cite{WarthDM08}, there+ −
is no correctness proof for it, not even one with ``pencil and paper''. One aim of this+ −
research is to solve this sorry state-of-affairs by either certifying this+ −
algorithm or inventing a new one. For this we will first devise a+ −
fixed-point semantics of PEGs, against which we can certify a parser. For this+ −
semantics we take as starting point the paper~\cite{Ford04}, which does not+ −
treat left-recursion, but gives an operational semantics for PEG+ −
parsing. There are also good indications that we can adapt work on Boolean+ −
Grammars~\cite{Okhotin04}, which are similar to PEGs and for which the+ −
paper~\cite{KountouriotisNR09} gives a fixed-point semantics + −
for negation operators, but not to the Kleene star.+ −
+ −
For our parsing algorithm, we might be able to build upon+ −
the classic Cocke-Younger-Kasami (CYK)+ −
algorithms~\cite{KountouriotisNR09} and+ −
Early~\cite{AycHor02, Earley70} parsers. The defect of CYK algorithms, however,+ −
is that the grammar specifications given by the user need to be transformed+ −
into a normal form. This transformation may potentially lead to rule explosion+ −
and hence inefficient parsing. We will investigate whether this transformation+ −
can be avoided. The problem with Early-style parsers is that they need to be + −
extended to PEG parsing in order to be helpful for us.+ −
+ −
+ −
Finally, we want to investigate whether derivatives of regular expressions+ −
~\cite{Brzozowski64,Might11,OwensReppyTuron09}+ −
can be generalised to PEG parsing. In earlier work, we showed that lexing based on+ −
derivatives gives rise to very elegant regular expression matchers that can be+ −
certified in a theorem prover with ease. We will study whether the idea of+ −
taking a derivative of a regular expression can be extended to rules in+ −
grammars. The problem that needs to be addressed is again how to deal with + −
left-recursive grammar rules.+ −
+ −
+ −
+ −
\bibliography{Journal/document/root}+ −
\bibliographystyle{abbrv}+ −
\end{multicols}+ −
+ −
% %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%+ −
% \noindent {\bf Objectives:} The overall goals of the project are as follows:+ −
% \begin{itemize}+ −
% \item To solve the POPLmark challenge.+ −
+ −
% \item To complete and greatly improve the existing implementation of the+ −
% nominal datatype package.+ −
% \item To explore the strengths of this package by proving the+ −
% safety of SML.+ −
% \item To provide a basis for extracting programs from safety proofs.+ −
+ −
% \item To make the nominal datatype package usable for teaching+ −
% students about the lambda-calculus and the theory of programming+ −
% languages. \smallskip+ −
% \end{itemize}+ −
+ −
+ −
+ −
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%+ −
\end{document}+ −
+ −
%%% Local Variables:+ −
%%% mode: latex+ −
%%% TeX-master: t+ −
%%% TeX-command-default: "PdfLaTeX"+ −
%%% TeX-view-style: (("." "kpdf %s.pdf"))+ −
%%% End:+ −