\documentclass{article}
\usepackage{a4wide,ot1patch}
\usepackage[latin1]{inputenc}
\usepackage{multicol}
\usepackage{charter}
\usepackage{amsmath,amssymb,amsthm}
\usepackage{fancyheadings}

\addtolength{\oddsidemargin}{-6mm}
\addtolength{\evensidemargin}{-6mm}
\addtolength{\textwidth}{11mm}
\addtolength{\columnsep}{3mm}
\addtolength{\textheight}{8mm}
\addtolength{\topmargin}{-7.5mm}

\pagestyle{fancyplain}
\lhead[\fancyplain{}{A}]{\fancyplain{}{}}
\rhead[\fancyplain{}{C}]{\fancyplain{}{}}
\renewcommand{\headrulewidth}{0pt}

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\begin{document}

\begin{center}
\begin{tabular}{c}
\\[-5mm]
\LARGE\bf Certified Parsing\\[-10mm]
\mbox{}
\end{tabular}
\end{center}
\thispagestyle{empty}
\mbox{}\\[-5mm]

\begin{multicols}{2}
\section*{Background}
\noindent
Parsing is the act of transforming plain text into some
structure that can be analyzed by computers for further processing.
One might think that parsing has been studied to death and that after
\emph{yacc} and \emph{lex} no new results can be obtained in this area.
However, recent results and novel approaches make it increasingly clear
that this is no longer true.

We propose to approach the subject of parsing from a certification point
of view. Increasingly, parsers are part of certified compilers, like
\mbox{\emph{CompCert}}, which are guaranteed to be correct and bug-free.
Such certified compilers are crucial in areas where software just cannot
fail. However, so far the parsers of these compilers have been left out of
the certification. This is because parsing algorithms are often ad hoc and
their semantics is not clearly specified. Unfortunately, this means parsers
can harbour errors that potentially invalidate the whole certification and
correctness of the compiler. In this project, we would like to change that.

Only in the last few years have theorem provers become good enough
for establishing the correctness of some standard lexing and
parsing algorithms. For this, the algorithms still need to be formulated
in a way that makes it easy to reason about them. In our earlier work
on lexing and regular languages, we showed that this
precludes well-known algorithms based on automata. However, we showed
that regular languages can be formulated and reasoned about entirely in
terms of regular expressions, which can be easily represented in theorem
provers. That work uses the device of derivatives of regular
expressions. We would like to extend this device to parsers and grammars.
The aim is to come up with elegant and practically useful parsing algorithms
whose correctness can be certified in a theorem prover.

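To give a flavour of this device: the derivative of a regular expression $r$
with respect to a character $c$ is a regular expression for the tails of all
strings matched by $r$ that start with $c$. The following clauses are a
standard formulation (the notation is ours and is only meant as a sketch):
%
\begin{align*}
\mathit{der}\;c\;\varnothing     &= \varnothing\\
\mathit{der}\;c\;\varepsilon     &= \varnothing\\
\mathit{der}\;c\;d               &= \text{if } c = d \text{ then } \varepsilon \text{ else } \varnothing\\
\mathit{der}\;c\;(r_1 + r_2)     &= \mathit{der}\;c\;r_1 + \mathit{der}\;c\;r_2\\
\mathit{der}\;c\;(r_1 \cdot r_2) &= (\mathit{der}\;c\;r_1)\cdot r_2 \,+\,
   \text{if } \mathit{nullable}\;r_1 \text{ then } \mathit{der}\;c\;r_2 \text{ else } \varnothing\\
\mathit{der}\;c\;(r^\star)       &= (\mathit{der}\;c\;r)\cdot r^\star
\end{align*}
%
where $\mathit{nullable}\;r$ holds whenever $r$ can match the empty string.
A string $c_1\ldots c_n$ is matched by $r$ precisely when the expression
obtained by successively taking derivatives with respect to
$c_1,\ldots,c_n$ is nullable. Each clause is a simple structural recursion,
which is what makes this formulation so convenient in a theorem prover.
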
\section*{Proposed Work}

A recent development in parsing is Parsing Expression Grammars (PEGs), which
are an extension of the standard Context-Free Grammars
(CFGs)~\cite{Ford04a}. The extension introduces new regular operators, such as
negation and conjunction, on the right-hand sides of grammar rules, as well as
priority orderings on rules. With these extensions, PEG parsing becomes much
more powerful. For example, disambiguation, formerly expressed by semantic
filters, can now be expressed directly in the grammar rules. This means a
simpler and more systematic treatment of ambiguity and more concise grammar
specifications for programming languages.

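As a small, purely illustrative example (the concrete syntax below is ours),
the classic dangling-\texttt{else} ambiguity can be resolved in a PEG simply
by the order of the alternatives:
%
\begin{align*}
\mathit{Stmt} \;\leftarrow\;\; & \texttt{if}\;\mathit{Expr}\;\texttt{then}\;\mathit{Stmt}\;\texttt{else}\;\mathit{Stmt}\\
\;/\;\; & \texttt{if}\;\mathit{Expr}\;\texttt{then}\;\mathit{Stmt}\\
\;/\;\; & \ldots
\end{align*}
%
Since the ordered choice $/$ tries the alternatives from top to bottom, an
\texttt{else} always attaches to the nearest \texttt{if}. In a plain CFG
both readings would be derivable and a separate disambiguation filter would
be needed.
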
However, a serious disadvantage of PEGs is that they do not allow left
recursion, because the existing parsing algorithms for PEGs~\cite{Ford02b}
cannot deal with left recursion. Although a new PEG parsing algorithm has
been proposed that can deal with left recursion~\cite{conf/pepm/WarthDM08},
there is no correctness proof for it, not even in ``paper-and-pencil'' form.
One aim of this research is to remedy this sorry state of affairs by either
certifying this algorithm or inventing a new one. For this we will first
formalize a fixed-point semantics of PEGs, on the basis of which an
efficient, certified parsing algorithm can be given.

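To see why left recursion is problematic for PEG parsers, consider the
purely illustrative rule
%
\begin{align*}
\mathit{Expr} \;\leftarrow\; \mathit{Expr}\;\texttt{-}\;\mathit{Num} \;/\; \mathit{Num}
\end{align*}
%
A PEG parser tries the alternatives from left to right and expands
nonterminals by recursive descent: in order to parse $\mathit{Expr}$ it
first has to parse $\mathit{Expr}$ again, without having consumed any
input, and therefore fails to terminate. The usual workaround of rewriting
such rules into right-recursive form changes the associativity of the
operators and obscures the intended grammar.
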
There are several existing works we can draw upon:
\begin{enumerate}
\item Existing work on PEGs:
\begin{enumerate}
\item An operational semantics for PEGs has already been given
in~\cite{Ford04a}, but it is not adequate for dealing with left recursion.
However, this work at least gives a precise description of what the original
PEGs are meant to do. It will serve as a basis for showing the
conservativeness of the fixed-point semantics we are going to develop.

\item The new algorithm~\cite{conf/pepm/WarthDM08}, which is claimed to be
able to deal with left recursion. Although there is no correctness proof
yet, it may provide useful inspiration for the design of our new algorithm.
\end{enumerate}

\item The work on Boolean Grammars~\cite{Okhotin/04a}. Boolean Grammars are
very closely related to PEGs, because they also allow negation and
conjunction in grammar rules. The main differences are: first, Boolean
Grammars have no ordering on productions; second, Boolean Grammars do not
contain a star operator. There are two works on Boolean Grammars that might
be useful for this research:

\begin{enumerate}
\item A fixed-point semantics for Boolean
Grammars~\cite{journals/iandc/KountouriotisNR09}. The way the semantics of
the negation and conjunction operators is defined there is certainly
something we can borrow. This work therefore provides the basis on which we
can add production ordering and a star operator.

\item A parsing algorithm for Boolean Grammars based on CYK
parsing~\cite{journals/iandc/KountouriotisNR09}. The drawback of CYK parsing
is that the original grammar specification needs to be transformed into a
normal form. This transformation may lead to an explosion in the size of the
grammar and is undesirable (a small illustration of the normal-form issue is
given after this list). One aim of this research is to see whether this
transformation can be avoided. For this purpose, other parsing styles may
provide useful inspiration, for example:

\begin{enumerate}
\item Derivative
parsing~\cite{Brzozowski64,Almeidaetal10,OwensReppyTuron09,journals/corr/abs-1010-5023}. Christian
Urban has used derivative methods to establish the correctness of a regular
expression matcher, as well as the finite partition property of regular
expressions~\cite{WuZhangUrban11}. There is well-founded hope that
derivative methods may provide the foundation for new parsing algorithms for
PEGs.

\item Earley parsing~\cite{Earley70,AycHor02}. It is a refinement of CYK
parsing that does not require the transformation into normal form, and
therefore provides one possible direction for adapting the current CYK-based
parsing algorithm for Boolean Grammars to PEGs.

\item The new parsing algorithm proposed by Tom Ridge[???]. Recently,
T.~Ridge has proposed and certified a combinator-style parsing algorithm for
CFGs, which borrows some ideas from Earley parsing. The proposed algorithm
is very simple and elegant. We will strive for a parsing algorithm that is
as elegant as this one.
\end{enumerate}

Which of the above possibilities will finally make it into our solution
is an interesting question of this research.

\end{enumerate}
\end{enumerate}
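
To illustrate the normal-form issue mentioned in the CYK item above, recall
that in Chomsky normal form (the normal form required for Boolean Grammars
is, to our knowledge, similar in spirit) every production must have the
shape $A \rightarrow B\,C$ or $A \rightarrow a$. Even the tiny grammar
$S \rightarrow a\,S\,b \mid a\,b$ has to be rewritten with auxiliary
nonterminals, for instance as $S \rightarrow A\,X \mid A\,B$,
$X \rightarrow S\,B$, $A \rightarrow a$, $B \rightarrow b$. For realistic
programming-language grammars this rewriting increases the number of
productions considerably and obscures the correspondence between the
grammar and the intended language specification.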

Based on this existing work, we are quite confident that our ideas will lead
to concrete results.

\mbox{}\\[15cm]

\noindent

\small
\bibliography{Journal/document/root}
\bibliographystyle{abbrv}
\end{multicols}

% %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
% \noindent {\bf Objectives:} The overall goals of the project are as follows:
% \begin{itemize}
% \item To solve the POPLmark challenge.

% \item To complete and greatly improve the existing implementation of the
% nominal datatype package.
% \item To explore the strengths of this package by proving the
% safety of SML.
% \item To provide a basis for extracting programs from safety proofs.

% \item To make the nominal datatype package usable for teaching
% students about the lambda-calculus and the theory of programming
% languages. \smallskip
% \end{itemize}


%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\end{document}

%%% Local Variables:
%%% mode: latex
%%% TeX-master: t
%%% TeX-command-default: "PdfLaTeX"
%%% TeX-view-style: (("." "kpdf %s.pdf"))
%%% End: