author | urbanc |
Mon, 05 Sep 2011 13:44:01 +0000 | |
changeset 238 | a6513d0b16fc |
parent 236 | 2d4f1334b5ca |
child 242 | 093e45c44d91 |
permissions | -rwxr-xr-x |
225 | 1 |
\documentclass{article} |
2 |
\usepackage{a4wide,ot1patch} |
|
3 |
\usepackage[latin1]{inputenc} |
|
4 |
\usepackage{multicol} |
|
5 |
\usepackage{charter} |
|
6 |
\usepackage{amsmath,amssymb,amsthm} |
|
7 |
\usepackage{fancyheadings} |
|
8 |
||
9 |
\addtolength{\oddsidemargin}{-6mm} |
|
10 |
\addtolength{\evensidemargin}{-6mm} |
|
11 |
\addtolength{\textwidth}{11mm} |
|
12 |
\addtolength{\columnsep}{3mm} |
|
13 |
\addtolength{\textheight}{8mm} |
|
14 |
\addtolength{\topmargin}{-7.5mm} |
|
15 |
||
16 |
\pagestyle{fancyplain} |
|
17 |
\lhead[\fancyplain{}{A}]{\fancyplain{}{}} |
|
18 |
\rhead[\fancyplain{}{C}]{\fancyplain{}{}} |
|
19 |
\renewcommand{\headrulewidth}{0pt} |
|
20 |
||
21 |
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% |
|
22 |
\begin{document} |
|
23 |
||
24 |
||
25 |
\begin{center} |
|
26 |
\begin{tabular}{c} |
|
27 |
\\[-5mm] |
|
28 |
\LARGE\bf Certified Parsing\\[-10mm] |
|
229 | 29 |
\mbox{} |
225 | 30 |
\end{tabular} |
31 |
\end{center} |
|
32 |
\thispagestyle{empty} |
|
228 | 33 |
\mbox{}\\[-5mm] |
225 | 34 |
|
35 |
\begin{multicols}{2} |
|
228 | 36 |
\section*{Background} |
225 | 37 |
\noindent |
38 |
Parsing is the act of transforming plain text into some |
|
232
114064363ef0
Proposal paragraphs by Xingyuan completed (with references added).
zhang
parents:
230
diff
changeset
|
39 |
structure that can be analyzed by computers for further processing. |
235 | 40 |
One might think that parsing has been studied to death, and after |
229 | 41 |
\emph{yacc} and \emph{lex} no new results can be obtained in this area. |
236 | 42 |
However recent developments and novel approaches make it increasingly clear, |
225 | 43 |
that this is not true anymore. |
44 |
||
236 | 45 |
We propose to on parsers from a certification point of view. Increasingly, |
46 |
parsers are part of certified compilers, like \mbox{\emph{CompCert}}, which |
|
47 |
are guaranteed to be correct and bug-free. Such certified compilers are |
|
48 |
crucial in areas where software just cannot fail. However, so far the parsers |
|
49 |
of these compilers have been left out of the certification. This is because |
|
50 |
parsing algorithms are often ad hoc and their semantics is not clearly |
|
51 |
specified. Unfortunately, this means parsers can harbour errors that |
|
52 |
potentially invalidate the whole certification and correctness of the |
|
53 |
compiler. In this project, we like to change that with the help of theorem |
|
54 |
provers. |
|
225 | 55 |
|
236 | 56 |
Only in the last few years, theorem provers have become good enough for |
57 |
establishing the correctness of some standard lexing and parsing |
|
58 |
algorithms. For this, the algorithms still need to be formulated in way so |
|
59 |
that it is easy to reason about them. In our earlier work about lexing and |
|
60 |
regular languages, we showed that this precludes well-known algorithms based |
|
61 |
automata. However we showed also that regular languages can be formulated and |
|
62 |
reasoned about entirely in terms regular expressions, which can be easily |
|
63 |
represented in theorem provers. This work uses the device of derivatives of |
|
64 |
regular expressions. We like to extend this device to parsers and grammars. |
|
234 | 65 |
The aim is to come up with elegant and practical useful parsing algorithms |
236 | 66 |
whose correctness can be certified in a theorem prover. |
67 |
||
228 | 68 |
|
229 | 69 |
\section*{Proposed Work} |
70 |
||
234 | 71 |
A recent development in parsing is Parsing Expression Grammars (PEG), which |
236 | 72 |
are an extension of the weel-known Context Free Grammars |
234 | 73 |
(CFG)~\cite{Ford04a}. The extension introduces new regular operators, such as |
74 |
negation and conjunction, on the right-hand sides of grammar rules, as well as |
|
236 | 75 |
priority orderings. With these extensions, PEG parsing becomes much |
234 | 76 |
more powerful. For example disambiguation, formerly expressed by semantic |
236 | 77 |
filters, can now be expressed directly using grammar rules. |
230 | 78 |
|
236 | 79 |
However, there is serious disadvantage of PEG for applications: is does not |
80 |
support grammrs involving left recursion~\cite{Ford02b}. Although a new PEG |
|
81 |
parsing algorithm has been proposed that can deal with left |
|
82 |
recursion~\cite{conf/pepm/WarthDM08}, there is no correctness proof, not even |
|
83 |
in ``paper-and-pencil'' form. One aim of this research is to solve this sorry |
|
84 |
state-of-affairs by either certifying this algorithm or inventing a new |
|
85 |
one. For this we will first formalize a fixed point semantics of PEG, based on |
|
86 |
which an efficient, certified parsing algorithm can be given given. For this |
|
87 |
we take as starting point the paper~\cite{Ford04a}, which does not treat |
|
88 |
left-recursion, but gives an operational semantics for PEG parsing. For the |
|
89 |
semantics, it seems plausible that we can adapt work on Boolean |
|
90 |
Grammars~\cite{Okhotin/04a}, which are similar to PEGs, and for which the |
|
91 |
paper ~\cite{journals/iandc/KountouriotisNR09} gives a semantics to negation |
|
92 |
operators, but not to Kleene's star operation. |
|
234 | 93 |
|
236 | 94 |
For the parsing algorithm, we might also be able to draw inspiration from |
95 |
parsers based on Cocke-Younger-Kasami (CYK) |
|
96 |
algorithms~\cite{journals/iandc/KountouriotisNR09} and |
|
97 |
Early~\cite{Earley70,AycHor02} parsers. The defect CYK algorithms is that the |
|
98 |
original grammar specification needs to be transformed into a normal |
|
99 |
form. This transformation may lead to grammar explosion and inefficient |
|
100 |
parsing. We will investigate whether this transformation can be avoided. |
|
101 |
Early style parsers, which have recently been certified by Ridge [???], |
|
102 |
need to be extended to PEG parsing in order to be helpful for us. |
|
234 | 103 |
|
236 | 104 |
Finally, we want to investigate whether derivatives of regular expressions |
105 |
~\cite{Brzozowski64,Almeidaetal10,OwensReppyTuron09,journals/corr/abs-1010-5023} |
|
106 |
can be extended to parsing. Lexing based on derivatives gives rise to very |
|
107 |
elegant regular expression matchers that can be certified in a theorem prover |
|
108 |
with ease. We will study whether the idea of taking a derivative of a regular |
|
109 |
expression can be extended to rules in grammars. The problem that needs to be |
|
110 |
overcome again arises from possible left recursion in parsing. |
|
234 | 111 |
|
112 |
||
236 | 113 |
\newpage |
229 | 114 |
|
225 | 115 |
|
232
114064363ef0
Proposal paragraphs by Xingyuan completed (with references added).
zhang
parents:
230
diff
changeset
|
116 |
\small |
114064363ef0
Proposal paragraphs by Xingyuan completed (with references added).
zhang
parents:
230
diff
changeset
|
117 |
\bibliography{Journal/document/root} |
114064363ef0
Proposal paragraphs by Xingyuan completed (with references added).
zhang
parents:
230
diff
changeset
|
118 |
\bibliographystyle{abbrv} |
225 | 119 |
\end{multicols} |
120 |
||
121 |
% %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% |
|
122 |
% \noindent {\bf Objectives:} The overall goals of the project are as follows: |
|
123 |
% \begin{itemize} |
|
124 |
% \item To solve the POPLmark challenge. |
|
125 |
||
126 |
% \item To complete and greatly improve the existing implementation of the |
|
127 |
% nominal datatype package. |
|
128 |
% \item To explore the strengths of this package by proving the |
|
129 |
% safety of SML. |
|
130 |
% \item To provide a basis for extracting programs from safety proofs. |
|
131 |
||
229 | 132 |
% \item To make the nominal datatype package usable for teaching |
225 | 133 |
% students about the lambda-calculus and the theory of programming |
134 |
% languages. \smallskip |
|
135 |
% \end{itemize} |
|
136 |
||
137 |
||
138 |
||
139 |
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% |
|
140 |
\end{document} |
|
141 |
||
229 | 142 |
%%% Local Variables: |
225 | 143 |
%%% mode: latex |
144 |
%%% TeX-master: t |
|
145 |
%%% TeX-command-default: "PdfLaTeX" |
|
146 |
%%% TeX-view-style: (("." "kpdf %s.pdf")) |
|
229 | 147 |
%%% End: |