% csupp.tex, changeset 232 (114064363ef0), author zhang, Sun, 04 Sep 2011 07:28:48 +0000
% Proposal paragraphs by Xingyuan completed (with references added).
\documentclass{article}
\usepackage{a4wide,ot1patch}
\usepackage[latin1]{inputenc}
\usepackage{multicol}
\usepackage{charter}
\usepackage{amsmath,amssymb,amsthm}
\usepackage{fancyheadings}

\addtolength{\oddsidemargin}{-6mm}
\addtolength{\evensidemargin}{-6mm}
\addtolength{\textwidth}{11mm}
\addtolength{\columnsep}{3mm}
\addtolength{\textheight}{8mm}
\addtolength{\topmargin}{-7.5mm}

\pagestyle{fancyplain}
\lhead[\fancyplain{}{A}]{\fancyplain{}{}}
\rhead[\fancyplain{}{C}]{\fancyplain{}{}}
\renewcommand{\headrulewidth}{0pt}

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\begin{document}
\begin{center}
\begin{tabular}{c}
\\[-5mm]
\LARGE\bf Certified Parsing\\[-10mm]
\mbox{}
\end{tabular}
\end{center}
\thispagestyle{empty}
\mbox{}\\[-5mm]

\begin{multicols}{2}
\section*{Background}
\noindent
Parsing is the act of transforming plain text into some
structure that can be analyzed by computers for further processing.
One might think that parsing has been studied to death and that, after
\emph{yacc} and \emph{lex}, no new results can be obtained in this area.
However, recent results and novel approaches make it increasingly clear
that this is not the case.

We propose to approach the subject of parsing from a certification point
of view. Parsers are increasingly part of certified compilers, like \mbox{\emph{CompCert}},
which are guaranteed to be correct and bug-free. Such certified compilers are
crucial in areas where software must not fail. However, so far the
parsers of these compilers have been left out of the certification.
This is because parsing algorithms are often ad hoc and their semantics
is not clearly specified. Unfortunately, this means parsers can harbour
errors that potentially invalidate the whole certification and correctness
of the compiler. In this project, we would like to change that.

Only in the last few years have theorem provers become good enough
to establish the correctness of some standard lexing and
parsing algorithms. For this, the algorithms need to be formulated
in a way that makes them easy to reason about. In earlier work
about lexing and regular languages, the authors showed that this
precludes well-known algorithms working over graphs. However, regular
languages can be formulated and reasoned about entirely in terms
of regular expressions, which can be easily represented in theorem
provers. This work uses the device of derivatives of regular
expressions. We would like to extend this device to parsers and grammars.
The aim is to come up with elegant and useful parsing algorithms
whose correctness can be certified in a
theorem prover.
\section*{Proposed Work}
One new development in formal grammar is the introduction of Parsing Expression Grammar (PEG) as an extension of the standard Context Free Grammar (CFG)~\cite{Ford04a}. The extension introduces new regular operators, such as negation and conjunction, to the right-hand side of productions, as well as a priority ordering on productions. With these extensions, PEG becomes powerful enough that disambiguation formerly expressed using semantic filters can now be expressed directly in the production expressions. This means a simpler and more systematic treatment of ambiguity and a more concise grammar specification for programming languages.

However, one disadvantage of PEG is that it does not allow left recursion in grammar specifications, because the parsing algorithms that accompany PEG~\cite{Ford02b} cannot deal with left recursion. Although a PEG parsing algorithm that supports left recursion has been proposed~\cite{conf/pepm/WarthDM08}, it comes with no correctness proof, not even in paper-and-pencil form. One aim of this research is to formalize a fixed point semantics of PEG, on the basis of which an efficient, certified parsing algorithm can be given.

There are several existing works we can draw upon:
\begin{enumerate}
    \item Work on PEG.
        \begin{enumerate}
            \item An operational semantics for PEG has already been given in~\cite{Ford04a}, but it is not adequate for dealing with left recursion. This work does, however, give a precise description of what the original PEG was meant to be. It will serve as a basis for showing the conservativeness of the fixed point semantics we are going to develop.
            \item The new algorithm~\cite{conf/pepm/WarthDM08}, which is claimed to deal with left recursion. Although there is no correctness proof yet, it may provide useful inspiration for the design of our new algorithm.
        \end{enumerate}
    \item Work on Boolean Grammar~\cite{Okhotin/04a}. Boolean Grammar is very closely related to PEG, because it also contains negation and conjunction. The main differences are that, first, Boolean Grammar has no ordering on productions and, second, Boolean Grammar does not contain the STAR operator. Two pieces of work on Boolean Grammar might be useful for this research:
        \begin{enumerate}
            \item A fixed point semantics for Boolean Grammar~\cite{journals/iandc/KountouriotisNR09}. The way it defines the semantics of the negation and conjunction operators is certainly something we can borrow. This work therefore gives the basis to which we can add production ordering and the STAR operator.
            \item A parsing algorithm for Boolean Grammar based on CYK parsing~\cite{journals/iandc/KountouriotisNR09}. The drawback of CYK parsing is that the original grammar specification needs to be transformed into a normal form. This transformation may blow up the size of the grammar and is therefore undesirable. One aim of this research is to see whether this transformation can be avoided. For this purpose, other parsing styles may provide useful inspiration, for example:
                    \begin{enumerate}
                        \item Derivative parsing~\cite{Brzozowski64,Almeidaetal10,OwensReppyTuron09,journals/corr/abs-1010-5023}. Christian Urban has used derivative methods to establish the correctness of a regular expression matcher, as well as the finite partition property of regular expressions~\cite{WuZhangUrban11}. There is well-founded reason to expect that derivative methods may provide the foundation for new parsing algorithms for PEG.
                        \item Earley parsing~\cite{Earley70,AycHor02}. It can be seen as a refinement of CYK parsing which does not require the transformation to normal form, and therefore provides one possible direction for adapting the current CYK-based parsing algorithm for Boolean Grammar to PEG.
                        \item The new parsing algorithm proposed by Tom Ridge[???]. Recently, T. Ridge has proposed and certified a combinator-style parsing algorithm for CFG, which borrows some ideas from Earley parsing. The proposed algorithm is very simple and elegant. We are going to strive for a parsing algorithm as elegant as this one.
                    \end{enumerate}
                Which of these possibilities will make it into our final solution is one of the interesting questions of this research.
        \end{enumerate}
\end{enumerate}
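\noindent The following small examples, which are our own illustrations and not taken from the cited works, sketch the two PEG features discussed above. With ordered choice, written $/$, the rule $S \leftarrow a \;/\; a\,b$ commits to the first alternative as soon as it succeeds. On the input $ab$ it therefore consumes only $a$ and never tries $a\,b$, whereas the context-free production $S \rightarrow a \mid a\,b$ derives $ab$. For the left-recursive rule $A \leftarrow A\,a \;/\; b$, a top-down recogniser started at input position $i$ immediately calls itself at the same position:
\[
\mathit{parse}(A, i) \;\leadsto\; \mathit{parse}(A\,a, i) \;\leadsto\; \mathit{parse}(A, i) \;\leadsto\; \cdots
\]
No input is consumed between the calls, so the evaluation does not terminate. Read as a context-free production, the same rule has the least solution of $L = L \cdot \{a\} \cup \{b\}$ as its language, that is $\{b\,a^n \mid n \geq 0\}$. What meaning a fixed point semantics should assign to such rules in the presence of ordered choice and negation is the question this research sets out to answer.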
Based on this body of work, we are confident that our approach can lead to concrete results.

\mbox{}\\[15cm]

\noindent

\small
\bibliography{Journal/document/root}
\bibliographystyle{abbrv}
\end{multicols}
%  %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%  \noindent {\bf Objectives:} The overall goals of the project are as follows:
%  \begin{itemize}
%  \item To solve the POPLmark challenge.

%  \item To complete and greatly improve the existing implementation of the
%    nominal datatype package.
%  \item To explore the strengths of this package by proving the
%    safety of SML.
%  \item To provide a basis for extracting programs from safety proofs.

%  \item To make the nominal datatype package usable for teaching
%    students about the lambda-calculus and the theory of programming
%    languages. \smallskip
%  \end{itemize}

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\end{document}
%%% Local Variables:
%%% mode: latex
%%% TeX-master: t
%%% TeX-command-default: "PdfLaTeX"
%%% TeX-view-style: (("." "kpdf %s.pdf"))
%%% End: