csupp.tex
author urbanc
Mon, 05 Sep 2011 20:59:50 +0000
changeset 241 68d48522ea9a
parent 236 2d4f1334b5ca
child 242 093e45c44d91
permissions -rwxr-xr-x
tuning on the derivatives and closures theories
Ignore whitespace changes - Everywhere: Within whitespace: At end of lines:
225
bc3ffe0dd1d8 added a start for a proposal
urbanc
parents:
diff changeset
     1
\documentclass{article}
bc3ffe0dd1d8 added a start for a proposal
urbanc
parents:
diff changeset
     2
\usepackage{a4wide,ot1patch}
bc3ffe0dd1d8 added a start for a proposal
urbanc
parents:
diff changeset
     3
\usepackage[latin1]{inputenc}
bc3ffe0dd1d8 added a start for a proposal
urbanc
parents:
diff changeset
     4
\usepackage{multicol}
bc3ffe0dd1d8 added a start for a proposal
urbanc
parents:
diff changeset
     5
\usepackage{charter}
bc3ffe0dd1d8 added a start for a proposal
urbanc
parents:
diff changeset
     6
\usepackage{amsmath,amssymb,amsthm}
bc3ffe0dd1d8 added a start for a proposal
urbanc
parents:
diff changeset
     7
\usepackage{fancyheadings}
bc3ffe0dd1d8 added a start for a proposal
urbanc
parents:
diff changeset
     8
bc3ffe0dd1d8 added a start for a proposal
urbanc
parents:
diff changeset
     9
\addtolength{\oddsidemargin}{-6mm}
bc3ffe0dd1d8 added a start for a proposal
urbanc
parents:
diff changeset
    10
\addtolength{\evensidemargin}{-6mm}
bc3ffe0dd1d8 added a start for a proposal
urbanc
parents:
diff changeset
    11
\addtolength{\textwidth}{11mm}
bc3ffe0dd1d8 added a start for a proposal
urbanc
parents:
diff changeset
    12
\addtolength{\columnsep}{3mm}
bc3ffe0dd1d8 added a start for a proposal
urbanc
parents:
diff changeset
    13
\addtolength{\textheight}{8mm}
bc3ffe0dd1d8 added a start for a proposal
urbanc
parents:
diff changeset
    14
\addtolength{\topmargin}{-7.5mm}
bc3ffe0dd1d8 added a start for a proposal
urbanc
parents:
diff changeset
    15
bc3ffe0dd1d8 added a start for a proposal
urbanc
parents:
diff changeset
    16
\pagestyle{fancyplain}
bc3ffe0dd1d8 added a start for a proposal
urbanc
parents:
diff changeset
    17
\lhead[\fancyplain{}{A}]{\fancyplain{}{}}
bc3ffe0dd1d8 added a start for a proposal
urbanc
parents:
diff changeset
    18
\rhead[\fancyplain{}{C}]{\fancyplain{}{}}
bc3ffe0dd1d8 added a start for a proposal
urbanc
parents:
diff changeset
    19
\renewcommand{\headrulewidth}{0pt}
bc3ffe0dd1d8 added a start for a proposal
urbanc
parents:
diff changeset
    20
bc3ffe0dd1d8 added a start for a proposal
urbanc
parents:
diff changeset
    21
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
bc3ffe0dd1d8 added a start for a proposal
urbanc
parents:
diff changeset
    22
\begin{document}
bc3ffe0dd1d8 added a start for a proposal
urbanc
parents:
diff changeset
    23
bc3ffe0dd1d8 added a start for a proposal
urbanc
parents:
diff changeset
    24
bc3ffe0dd1d8 added a start for a proposal
urbanc
parents:
diff changeset
    25
\begin{center}
bc3ffe0dd1d8 added a start for a proposal
urbanc
parents:
diff changeset
    26
\begin{tabular}{c}
bc3ffe0dd1d8 added a start for a proposal
urbanc
parents:
diff changeset
    27
\\[-5mm]
bc3ffe0dd1d8 added a start for a proposal
urbanc
parents:
diff changeset
    28
\LARGE\bf Certified Parsing\\[-10mm]
229
2087fc59f2a1 One passage added.
zhang
parents: 228
diff changeset
    29
\mbox{}
225
bc3ffe0dd1d8 added a start for a proposal
urbanc
parents:
diff changeset
    30
\end{tabular}
bc3ffe0dd1d8 added a start for a proposal
urbanc
parents:
diff changeset
    31
\end{center}
bc3ffe0dd1d8 added a start for a proposal
urbanc
parents:
diff changeset
    32
\thispagestyle{empty}
228
87a8dc29d7ae latest changes
urbanc
parents: 227
diff changeset
    33
\mbox{}\\[-5mm]
225
bc3ffe0dd1d8 added a start for a proposal
urbanc
parents:
diff changeset
    34
bc3ffe0dd1d8 added a start for a proposal
urbanc
parents:
diff changeset
    35
\begin{multicols}{2}
228
87a8dc29d7ae latest changes
urbanc
parents: 227
diff changeset
    36
\section*{Background}
225
bc3ffe0dd1d8 added a start for a proposal
urbanc
parents:
diff changeset
    37
\noindent
bc3ffe0dd1d8 added a start for a proposal
urbanc
parents:
diff changeset
    38
Parsing is the act of transforming plain text into some
232
114064363ef0 Proposal paragraphs by Xingyuan completed (with references added).
zhang
parents: 230
diff changeset
    39
structure that can be analyzed by computers for further processing.
235
a7ddcad0a023 some changes
urbanc
parents: 234
diff changeset
    40
One might think that parsing has been studied to death, and after
229
2087fc59f2a1 One passage added.
zhang
parents: 228
diff changeset
    41
\emph{yacc} and \emph{lex} no new results can be obtained in this area.
236
2d4f1334b5ca shortened
urbanc
parents: 235
diff changeset
    42
However recent developments and novel approaches make it increasingly clear,
225
bc3ffe0dd1d8 added a start for a proposal
urbanc
parents:
diff changeset
    43
that this is not true anymore.
bc3ffe0dd1d8 added a start for a proposal
urbanc
parents:
diff changeset
    44
236
2d4f1334b5ca shortened
urbanc
parents: 235
diff changeset
    45
We propose to on parsers from a certification point of view. Increasingly,
2d4f1334b5ca shortened
urbanc
parents: 235
diff changeset
    46
parsers are part of certified compilers, like \mbox{\emph{CompCert}}, which
2d4f1334b5ca shortened
urbanc
parents: 235
diff changeset
    47
are guaranteed to be correct and bug-free. Such certified compilers are
2d4f1334b5ca shortened
urbanc
parents: 235
diff changeset
    48
crucial in areas where software just cannot fail. However, so far the parsers
2d4f1334b5ca shortened
urbanc
parents: 235
diff changeset
    49
of these compilers have been left out of the certification.  This is because
2d4f1334b5ca shortened
urbanc
parents: 235
diff changeset
    50
parsing algorithms are often ad hoc and their semantics is not clearly
2d4f1334b5ca shortened
urbanc
parents: 235
diff changeset
    51
specified. Unfortunately, this means parsers can harbour errors that
2d4f1334b5ca shortened
urbanc
parents: 235
diff changeset
    52
potentially invalidate the whole certification and correctness of the
2d4f1334b5ca shortened
urbanc
parents: 235
diff changeset
    53
compiler. In this project, we like to change that with the help of theorem
2d4f1334b5ca shortened
urbanc
parents: 235
diff changeset
    54
provers.
225
bc3ffe0dd1d8 added a start for a proposal
urbanc
parents:
diff changeset
    55
236
2d4f1334b5ca shortened
urbanc
parents: 235
diff changeset
    56
Only in the last few years, theorem provers have become good enough for
2d4f1334b5ca shortened
urbanc
parents: 235
diff changeset
    57
establishing the correctness of some standard lexing and parsing
2d4f1334b5ca shortened
urbanc
parents: 235
diff changeset
    58
algorithms. For this, the algorithms still need to be formulated in way so
2d4f1334b5ca shortened
urbanc
parents: 235
diff changeset
    59
that it is easy to reason about them. In our earlier work about lexing and
2d4f1334b5ca shortened
urbanc
parents: 235
diff changeset
    60
regular languages, we showed that this precludes well-known algorithms based
2d4f1334b5ca shortened
urbanc
parents: 235
diff changeset
    61
automata. However we showed also that regular languages can be formulated and
2d4f1334b5ca shortened
urbanc
parents: 235
diff changeset
    62
reasoned about entirely in terms regular expressions, which can be easily
2d4f1334b5ca shortened
urbanc
parents: 235
diff changeset
    63
represented in theorem provers. This work uses the device of derivatives of
2d4f1334b5ca shortened
urbanc
parents: 235
diff changeset
    64
regular expressions. We like to extend this device to parsers and grammars.
234
eeadb4e51d74 some changes
urbanc
parents: 232
diff changeset
    65
The aim is to come up with elegant and practical useful parsing algorithms
236
2d4f1334b5ca shortened
urbanc
parents: 235
diff changeset
    66
whose correctness can be certified in a theorem prover.
2d4f1334b5ca shortened
urbanc
parents: 235
diff changeset
    67
228
87a8dc29d7ae latest changes
urbanc
parents: 227
diff changeset
    68
229
2087fc59f2a1 One passage added.
zhang
parents: 228
diff changeset
    69
\section*{Proposed Work}
2087fc59f2a1 One passage added.
zhang
parents: 228
diff changeset
    70
234
eeadb4e51d74 some changes
urbanc
parents: 232
diff changeset
    71
A recent development in parsing is Parsing Expression Grammars (PEG), which
236
2d4f1334b5ca shortened
urbanc
parents: 235
diff changeset
    72
are an extension of the weel-known Context Free Grammars
234
eeadb4e51d74 some changes
urbanc
parents: 232
diff changeset
    73
(CFG)~\cite{Ford04a}. The extension introduces new regular operators, such as
eeadb4e51d74 some changes
urbanc
parents: 232
diff changeset
    74
negation and conjunction, on the right-hand sides of grammar rules, as well as
236
2d4f1334b5ca shortened
urbanc
parents: 235
diff changeset
    75
priority orderings. With these extensions, PEG parsing becomes much
234
eeadb4e51d74 some changes
urbanc
parents: 232
diff changeset
    76
more powerful. For example disambiguation, formerly expressed by semantic
236
2d4f1334b5ca shortened
urbanc
parents: 235
diff changeset
    77
filters, can now be expressed directly using grammar rules. 
230
6bb8ad9093e6 More modification by Xingyuan.
zhang
parents: 229
diff changeset
    78
236
2d4f1334b5ca shortened
urbanc
parents: 235
diff changeset
    79
However, there is serious disadvantage of PEG for applications: is does not
2d4f1334b5ca shortened
urbanc
parents: 235
diff changeset
    80
support grammrs involving left recursion~\cite{Ford02b}. Although a new PEG
2d4f1334b5ca shortened
urbanc
parents: 235
diff changeset
    81
parsing algorithm has been proposed that can deal with left
2d4f1334b5ca shortened
urbanc
parents: 235
diff changeset
    82
recursion~\cite{conf/pepm/WarthDM08}, there is no correctness proof, not even
2d4f1334b5ca shortened
urbanc
parents: 235
diff changeset
    83
in ``paper-and-pencil'' form. One aim of this research is to solve this sorry
2d4f1334b5ca shortened
urbanc
parents: 235
diff changeset
    84
state-of-affairs by either certifying this algorithm or inventing a new
2d4f1334b5ca shortened
urbanc
parents: 235
diff changeset
    85
one. For this we will first formalize a fixed point semantics of PEG, based on
2d4f1334b5ca shortened
urbanc
parents: 235
diff changeset
    86
which an efficient, certified parsing algorithm can be given given. For this
2d4f1334b5ca shortened
urbanc
parents: 235
diff changeset
    87
we take as starting point the paper~\cite{Ford04a}, which does not treat
2d4f1334b5ca shortened
urbanc
parents: 235
diff changeset
    88
left-recursion, but gives an operational semantics for PEG parsing. For the
2d4f1334b5ca shortened
urbanc
parents: 235
diff changeset
    89
semantics, it seems plausible that we can adapt work on Boolean
2d4f1334b5ca shortened
urbanc
parents: 235
diff changeset
    90
Grammars~\cite{Okhotin/04a}, which are similar to PEGs, and for which the
2d4f1334b5ca shortened
urbanc
parents: 235
diff changeset
    91
paper ~\cite{journals/iandc/KountouriotisNR09} gives a semantics to negation
2d4f1334b5ca shortened
urbanc
parents: 235
diff changeset
    92
operators, but not to Kleene's star operation.
234
eeadb4e51d74 some changes
urbanc
parents: 232
diff changeset
    93
236
2d4f1334b5ca shortened
urbanc
parents: 235
diff changeset
    94
For the parsing algorithm, we might also be able to draw inspiration from
2d4f1334b5ca shortened
urbanc
parents: 235
diff changeset
    95
parsers based on Cocke-Younger-Kasami (CYK)
2d4f1334b5ca shortened
urbanc
parents: 235
diff changeset
    96
algorithms~\cite{journals/iandc/KountouriotisNR09} and
2d4f1334b5ca shortened
urbanc
parents: 235
diff changeset
    97
Early~\cite{Earley70,AycHor02} parsers. The defect CYK algorithms is that the
2d4f1334b5ca shortened
urbanc
parents: 235
diff changeset
    98
original grammar specification needs to be transformed into a normal
2d4f1334b5ca shortened
urbanc
parents: 235
diff changeset
    99
form. This transformation may lead to grammar explosion and inefficient
2d4f1334b5ca shortened
urbanc
parents: 235
diff changeset
   100
parsing. We will investigate whether this transformation can be avoided.
2d4f1334b5ca shortened
urbanc
parents: 235
diff changeset
   101
Early style parsers, which have recently been certified by Ridge [???], 
2d4f1334b5ca shortened
urbanc
parents: 235
diff changeset
   102
need to be extended to PEG parsing in order to be helpful for us.
234
eeadb4e51d74 some changes
urbanc
parents: 232
diff changeset
   103
236
2d4f1334b5ca shortened
urbanc
parents: 235
diff changeset
   104
Finally, we want to investigate whether derivatives of regular expressions
2d4f1334b5ca shortened
urbanc
parents: 235
diff changeset
   105
~\cite{Brzozowski64,Almeidaetal10,OwensReppyTuron09,journals/corr/abs-1010-5023}
2d4f1334b5ca shortened
urbanc
parents: 235
diff changeset
   106
can be extended to parsing. Lexing based on derivatives gives rise to very
2d4f1334b5ca shortened
urbanc
parents: 235
diff changeset
   107
elegant regular expression matchers that can be certified in a theorem prover 
2d4f1334b5ca shortened
urbanc
parents: 235
diff changeset
   108
with ease.  We will study whether the idea of taking a derivative of a regular
2d4f1334b5ca shortened
urbanc
parents: 235
diff changeset
   109
expression can be extended to rules in grammars. The problem that needs to be
2d4f1334b5ca shortened
urbanc
parents: 235
diff changeset
   110
overcome again arises from possible left recursion in parsing. 
234
eeadb4e51d74 some changes
urbanc
parents: 232
diff changeset
   111
eeadb4e51d74 some changes
urbanc
parents: 232
diff changeset
   112
    
236
2d4f1334b5ca shortened
urbanc
parents: 235
diff changeset
   113
\newpage
229
2087fc59f2a1 One passage added.
zhang
parents: 228
diff changeset
   114
225
bc3ffe0dd1d8 added a start for a proposal
urbanc
parents:
diff changeset
   115
232
114064363ef0 Proposal paragraphs by Xingyuan completed (with references added).
zhang
parents: 230
diff changeset
   116
\small
114064363ef0 Proposal paragraphs by Xingyuan completed (with references added).
zhang
parents: 230
diff changeset
   117
\bibliography{Journal/document/root}
114064363ef0 Proposal paragraphs by Xingyuan completed (with references added).
zhang
parents: 230
diff changeset
   118
\bibliographystyle{abbrv}
225
bc3ffe0dd1d8 added a start for a proposal
urbanc
parents:
diff changeset
   119
\end{multicols}
bc3ffe0dd1d8 added a start for a proposal
urbanc
parents:
diff changeset
   120
bc3ffe0dd1d8 added a start for a proposal
urbanc
parents:
diff changeset
   121
%  %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
bc3ffe0dd1d8 added a start for a proposal
urbanc
parents:
diff changeset
   122
%  \noindent {\bf Objectives:} The overall goals of the project are as follows:
bc3ffe0dd1d8 added a start for a proposal
urbanc
parents:
diff changeset
   123
%  \begin{itemize}
bc3ffe0dd1d8 added a start for a proposal
urbanc
parents:
diff changeset
   124
%  \item To solve the POPLmark challenge.
bc3ffe0dd1d8 added a start for a proposal
urbanc
parents:
diff changeset
   125
bc3ffe0dd1d8 added a start for a proposal
urbanc
parents:
diff changeset
   126
%  \item To complete and greatly improve the existing implementation of the
bc3ffe0dd1d8 added a start for a proposal
urbanc
parents:
diff changeset
   127
%    nominal datatype package.
bc3ffe0dd1d8 added a start for a proposal
urbanc
parents:
diff changeset
   128
%  \item To explore the strengths of this package by proving the
bc3ffe0dd1d8 added a start for a proposal
urbanc
parents:
diff changeset
   129
%    safety of SML.
bc3ffe0dd1d8 added a start for a proposal
urbanc
parents:
diff changeset
   130
%  \item To provide a basis for extracting programs from safety proofs.
bc3ffe0dd1d8 added a start for a proposal
urbanc
parents:
diff changeset
   131
229
2087fc59f2a1 One passage added.
zhang
parents: 228
diff changeset
   132
%  \item To make the nominal datatype package usable for teaching
225
bc3ffe0dd1d8 added a start for a proposal
urbanc
parents:
diff changeset
   133
%    students about the lambda-calculus and the theory of programming
bc3ffe0dd1d8 added a start for a proposal
urbanc
parents:
diff changeset
   134
%    languages. \smallskip
bc3ffe0dd1d8 added a start for a proposal
urbanc
parents:
diff changeset
   135
%  \end{itemize}
bc3ffe0dd1d8 added a start for a proposal
urbanc
parents:
diff changeset
   136
bc3ffe0dd1d8 added a start for a proposal
urbanc
parents:
diff changeset
   137
bc3ffe0dd1d8 added a start for a proposal
urbanc
parents:
diff changeset
   138
bc3ffe0dd1d8 added a start for a proposal
urbanc
parents:
diff changeset
   139
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
bc3ffe0dd1d8 added a start for a proposal
urbanc
parents:
diff changeset
   140
\end{document}
bc3ffe0dd1d8 added a start for a proposal
urbanc
parents:
diff changeset
   141
229
2087fc59f2a1 One passage added.
zhang
parents: 228
diff changeset
   142
%%% Local Variables:
225
bc3ffe0dd1d8 added a start for a proposal
urbanc
parents:
diff changeset
   143
%%% mode: latex
bc3ffe0dd1d8 added a start for a proposal
urbanc
parents:
diff changeset
   144
%%% TeX-master: t
bc3ffe0dd1d8 added a start for a proposal
urbanc
parents:
diff changeset
   145
%%% TeX-command-default: "PdfLaTeX"
bc3ffe0dd1d8 added a start for a proposal
urbanc
parents:
diff changeset
   146
%%% TeX-view-style: (("." "kpdf %s.pdf"))
229
2087fc59f2a1 One passage added.
zhang
parents: 228
diff changeset
   147
%%% End: