\documentclass{article}\usepackage{a4wide,ot1patch}\usepackage[latin1]{inputenc}\usepackage{multicol}\usepackage{charter}\usepackage{amsmath,amssymb,amsthm}\usepackage{fancyheadings}\addtolength{\oddsidemargin}{-6mm}\addtolength{\evensidemargin}{-6mm}\addtolength{\textwidth}{11mm}\addtolength{\columnsep}{3mm}\addtolength{\textheight}{8mm}\addtolength{\topmargin}{-7.5mm}\pagestyle{fancyplain}\lhead[\fancyplain{}{A}]{\fancyplain{}{}}\rhead[\fancyplain{}{C}]{\fancyplain{}{}}\renewcommand{\headrulewidth}{0pt}%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%\begin{document}\begin{center}\begin{tabular}{c}\\[-5mm]\LARGE\bf Novel Certified Parsers\\[-10mm]\mbox{}\end{tabular}\end{center}\thispagestyle{empty}\mbox{}\\[-5mm]\begin{multicols}{2}\section*{Background}\noindentParsers transform plain text into some abstract structure that can be analyzed bycomputers for further processing. One might think that parsers have beenstudied to death, and after \emph{yacc} and \emph{lex} no new results can beobtained in this area. However recent developments and novel approaches makeit increasingly clear, that this is not true anymore~\cite{Might11}. Andthere is a real practical need for new results: for example the future HTML5 Standard abandons a well-defined grammar specification, in favour of a bespokeparser given as pseudo code. Proving any property about this parser is nearly impossible.This work targets parsers from a certification point of view. Increasingly,parsers are part of certified compilers, like\mbox{\emph{CompCert}}~\cite{Leroy09}, which are guaranteed to bebug-free. Such certified compilers are crucial in areas where software justcannot fail. However, so far the parsers of these compilers have been left outof the certification. This is because parsing algorithms are often ad hoc andtheir semantics is not clearly specified. This means, unfortunately, parserscan harbour errors that potentially invalidate the whole certification andcorrectness of the compiler. In this project, we aim to change this situationwith the help of the theorem prover Isabelle/HOL.Only in the last few years, theorem provers have become powerful enough forestablishing the correctness of some standard lexing and parsingalgorithms. For this, the algorithms still need to be formulated in a way sothat it is easy to reason about them. In our earlier work about lexing andregular languages, we showed that this precludes well-known algorithms basedautomata~\cite{WuZhangUrban11}. However we showed also that regular languages can be formulated andreasoned about entirely in terms regular expressions, which can be easilyrepresented in theorem provers. This work uses the device of derivatives ofregular expressions. We like to extend this device to parsers and grammars.The aim is to come up with elegant and practical useful parsing algorithmswhose correctness can be certified in Isabelle/HOL.\section*{Proposed Work}A recent development in the field of parsing are Parsing Expression Grammars(PEGs)~\cite{Ford04}, whichare an extension of the well-known Context Free Grammars(CFGs). This extension introduces new regular operators, such asnegation and conjunction, on the right-hand side of grammar rules, as well aspriority orderings for rules. With these extensions, PEG parsing becomes muchmore powerful and more useful in practice. For example disambiguation, formerly expressed by semanticfilters, can now be expressed directly using grammar rules. However, there is a serious limitation of PEGs, which affects potentialapplications: grammars involving left recursion are notallowed~\cite{Ford02}. Although one PEG parsing algorithm has already beenproposed that can deal with left recursion~\cite{WarthDM08}, thereis no correctness proof for it, not even one with ``pencil and paper''. One aim of thisresearch is to solve this sorry state-of-affairs by either certifying thisalgorithm or inventing a new one. For this we will first devise afixed-point semantics of PEGs, against which we can certify a parser. For thissemantics we take as starting point the paper~\cite{Ford04}, which does nottreat left-recursion, but gives an operational semantics for PEGparsing. There are also good indications that we can adapt work on BooleanGrammars~\cite{Okhotin04}, which are similar to PEGs and for which thepaper~\cite{KountouriotisNR09} gives a fixed-point semantics for negation operators, but not to the Kleene star.For our parsing algorithm, we might be able to build uponthe classic Cocke-Younger-Kasami (CYK)algorithms~\cite{KountouriotisNR09} andEarly~\cite{AycHor02, Earley70} parsers. The defect of CYK algorithms, however,is that the grammar specifications given by the user need to be transformedinto a normal form. This transformation may potentially lead to rule explosionand hence inefficient parsing. We will investigate whether this transformationcan be avoided. The problem with Early-style parsers is that they need to be extended to PEG parsing in order to be helpful for us.Finally, we want to investigate whether derivatives of regular expressions~\cite{Brzozowski64,Might11,OwensReppyTuron09}can be generalised to PEG parsing. In earlier work, we showed that lexing based onderivatives gives rise to very elegant regular expression matchers that can becertified in a theorem prover with ease. We will study whether the idea oftaking a derivative of a regular expression can be extended to rules ingrammars. The problem that needs to be addressed is again how to deal with left-recursive grammar rules.\bibliography{Journal/document/root}\bibliographystyle{abbrv}\end{multicols}% %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% \noindent {\bf Objectives:} The overall goals of the project are as follows:% \begin{itemize}% \item To solve the POPLmark challenge.% \item To complete and greatly improve the existing implementation of the% nominal datatype package.% \item To explore the strengths of this package by proving the% safety of SML.% \item To provide a basis for extracting programs from safety proofs.% \item To make the nominal datatype package usable for teaching% students about the lambda-calculus and the theory of programming% languages. \smallskip% \end{itemize}%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%\end{document}%%% Local Variables:%%% mode: latex%%% TeX-master: t%%% TeX-command-default: "PdfLaTeX"%%% TeX-view-style: (("." "kpdf %s.pdf"))%%% End: