regexp: csupp.tex@68d48522ea9a (annotated)

225 bc3ffe0dd1d8 added a start for a proposal urbanc parents: diff changeset	1	\documentclass{article}
bc3ffe0dd1d8 added a start for a proposal urbanc parents: diff changeset	2	\usepackage{a4wide,ot1patch}
bc3ffe0dd1d8 added a start for a proposal urbanc parents: diff changeset	3	\usepackage[latin1]{inputenc}
bc3ffe0dd1d8 added a start for a proposal urbanc parents: diff changeset	4	\usepackage{multicol}
bc3ffe0dd1d8 added a start for a proposal urbanc parents: diff changeset	5	\usepackage{charter}
bc3ffe0dd1d8 added a start for a proposal urbanc parents: diff changeset	6	\usepackage{amsmath,amssymb,amsthm}
bc3ffe0dd1d8 added a start for a proposal urbanc parents: diff changeset	7	\usepackage{fancyheadings}
bc3ffe0dd1d8 added a start for a proposal urbanc parents: diff changeset	8
bc3ffe0dd1d8 added a start for a proposal urbanc parents: diff changeset	9	\addtolength{\oddsidemargin}{-6mm}
bc3ffe0dd1d8 added a start for a proposal urbanc parents: diff changeset	10	\addtolength{\evensidemargin}{-6mm}
bc3ffe0dd1d8 added a start for a proposal urbanc parents: diff changeset	11	\addtolength{\textwidth}{11mm}
bc3ffe0dd1d8 added a start for a proposal urbanc parents: diff changeset	12	\addtolength{\columnsep}{3mm}
bc3ffe0dd1d8 added a start for a proposal urbanc parents: diff changeset	13	\addtolength{\textheight}{8mm}
bc3ffe0dd1d8 added a start for a proposal urbanc parents: diff changeset	14	\addtolength{\topmargin}{-7.5mm}
bc3ffe0dd1d8 added a start for a proposal urbanc parents: diff changeset	15
bc3ffe0dd1d8 added a start for a proposal urbanc parents: diff changeset	16	\pagestyle{fancyplain}
bc3ffe0dd1d8 added a start for a proposal urbanc parents: diff changeset	17	\lhead[\fancyplain{}{A}]{\fancyplain{}{}}
bc3ffe0dd1d8 added a start for a proposal urbanc parents: diff changeset	18	\rhead[\fancyplain{}{C}]{\fancyplain{}{}}
bc3ffe0dd1d8 added a start for a proposal urbanc parents: diff changeset	19	\renewcommand{\headrulewidth}{0pt}
bc3ffe0dd1d8 added a start for a proposal urbanc parents: diff changeset	20
bc3ffe0dd1d8 added a start for a proposal urbanc parents: diff changeset	21	%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
bc3ffe0dd1d8 added a start for a proposal urbanc parents: diff changeset	22	\begin{document}
bc3ffe0dd1d8 added a start for a proposal urbanc parents: diff changeset	23
bc3ffe0dd1d8 added a start for a proposal urbanc parents: diff changeset	24
bc3ffe0dd1d8 added a start for a proposal urbanc parents: diff changeset	25	\begin{center}
bc3ffe0dd1d8 added a start for a proposal urbanc parents: diff changeset	26	\begin{tabular}{c}
bc3ffe0dd1d8 added a start for a proposal urbanc parents: diff changeset	27	\\[-5mm]
bc3ffe0dd1d8 added a start for a proposal urbanc parents: diff changeset	28	\LARGE\bf Certified Parsing\\[-10mm]
229 2087fc59f2a1 One passage added. zhang parents: 228 diff changeset	29	\mbox{}
225 bc3ffe0dd1d8 added a start for a proposal urbanc parents: diff changeset	30	\end{tabular}
bc3ffe0dd1d8 added a start for a proposal urbanc parents: diff changeset	31	\end{center}
bc3ffe0dd1d8 added a start for a proposal urbanc parents: diff changeset	32	\thispagestyle{empty}
228 87a8dc29d7ae latest changes urbanc parents: 227 diff changeset	33	\mbox{}\\[-5mm]
225 bc3ffe0dd1d8 added a start for a proposal urbanc parents: diff changeset	34
bc3ffe0dd1d8 added a start for a proposal urbanc parents: diff changeset	35	\begin{multicols}{2}
228 87a8dc29d7ae latest changes urbanc parents: 227 diff changeset	36	\section*{Background}
225 bc3ffe0dd1d8 added a start for a proposal urbanc parents: diff changeset	37	\noindent
bc3ffe0dd1d8 added a start for a proposal urbanc parents: diff changeset	38	Parsing is the act of transforming plain text into some
232 114064363ef0 Proposal paragraphs by Xingyuan completed (with references added). zhang parents: 230 diff changeset	39	structure that can be analyzed by computers for further processing.
235 a7ddcad0a023 some changes urbanc parents: 234 diff changeset	40	One might think that parsing has been studied to death, and after
229 2087fc59f2a1 One passage added. zhang parents: 228 diff changeset	41	\emph{yacc} and \emph{lex} no new results can be obtained in this area.
236 2d4f1334b5ca shortened urbanc parents: 235 diff changeset	42	However recent developments and novel approaches make it increasingly clear,
225 bc3ffe0dd1d8 added a start for a proposal urbanc parents: diff changeset	43	that this is not true anymore.
bc3ffe0dd1d8 added a start for a proposal urbanc parents: diff changeset	44
236 2d4f1334b5ca shortened urbanc parents: 235 diff changeset	45	We propose to on parsers from a certification point of view. Increasingly,
2d4f1334b5ca shortened urbanc parents: 235 diff changeset	46	parsers are part of certified compilers, like \mbox{\emph{CompCert}}, which
2d4f1334b5ca shortened urbanc parents: 235 diff changeset	47	are guaranteed to be correct and bug-free. Such certified compilers are
2d4f1334b5ca shortened urbanc parents: 235 diff changeset	48	crucial in areas where software just cannot fail. However, so far the parsers
2d4f1334b5ca shortened urbanc parents: 235 diff changeset	49	of these compilers have been left out of the certification. This is because
2d4f1334b5ca shortened urbanc parents: 235 diff changeset	50	parsing algorithms are often ad hoc and their semantics is not clearly
2d4f1334b5ca shortened urbanc parents: 235 diff changeset	51	specified. Unfortunately, this means parsers can harbour errors that
2d4f1334b5ca shortened urbanc parents: 235 diff changeset	52	potentially invalidate the whole certification and correctness of the
2d4f1334b5ca shortened urbanc parents: 235 diff changeset	53	compiler. In this project, we like to change that with the help of theorem
2d4f1334b5ca shortened urbanc parents: 235 diff changeset	54	provers.
225 bc3ffe0dd1d8 added a start for a proposal urbanc parents: diff changeset	55
236 2d4f1334b5ca shortened urbanc parents: 235 diff changeset	56	Only in the last few years, theorem provers have become good enough for
2d4f1334b5ca shortened urbanc parents: 235 diff changeset	57	establishing the correctness of some standard lexing and parsing
2d4f1334b5ca shortened urbanc parents: 235 diff changeset	58	algorithms. For this, the algorithms still need to be formulated in way so
2d4f1334b5ca shortened urbanc parents: 235 diff changeset	59	that it is easy to reason about them. In our earlier work about lexing and
2d4f1334b5ca shortened urbanc parents: 235 diff changeset	60	regular languages, we showed that this precludes well-known algorithms based
2d4f1334b5ca shortened urbanc parents: 235 diff changeset	61	automata. However we showed also that regular languages can be formulated and
2d4f1334b5ca shortened urbanc parents: 235 diff changeset	62	reasoned about entirely in terms regular expressions, which can be easily
2d4f1334b5ca shortened urbanc parents: 235 diff changeset	63	represented in theorem provers. This work uses the device of derivatives of
2d4f1334b5ca shortened urbanc parents: 235 diff changeset	64	regular expressions. We like to extend this device to parsers and grammars.
234 eeadb4e51d74 some changes urbanc parents: 232 diff changeset	65	The aim is to come up with elegant and practical useful parsing algorithms
236 2d4f1334b5ca shortened urbanc parents: 235 diff changeset	66	whose correctness can be certified in a theorem prover.
2d4f1334b5ca shortened urbanc parents: 235 diff changeset	67
228 87a8dc29d7ae latest changes urbanc parents: 227 diff changeset	68
229 2087fc59f2a1 One passage added. zhang parents: 228 diff changeset	69	\section*{Proposed Work}
2087fc59f2a1 One passage added. zhang parents: 228 diff changeset	70
234 eeadb4e51d74 some changes urbanc parents: 232 diff changeset	71	A recent development in parsing is Parsing Expression Grammars (PEG), which
236 2d4f1334b5ca shortened urbanc parents: 235 diff changeset	72	are an extension of the weel-known Context Free Grammars
234 eeadb4e51d74 some changes urbanc parents: 232 diff changeset	73	(CFG)~\cite{Ford04a}. The extension introduces new regular operators, such as
eeadb4e51d74 some changes urbanc parents: 232 diff changeset	74	negation and conjunction, on the right-hand sides of grammar rules, as well as
236 2d4f1334b5ca shortened urbanc parents: 235 diff changeset	75	priority orderings. With these extensions, PEG parsing becomes much
234 eeadb4e51d74 some changes urbanc parents: 232 diff changeset	76	more powerful. For example disambiguation, formerly expressed by semantic
236 2d4f1334b5ca shortened urbanc parents: 235 diff changeset	77	filters, can now be expressed directly using grammar rules.
230 6bb8ad9093e6 More modification by Xingyuan. zhang parents: 229 diff changeset	78
236 2d4f1334b5ca shortened urbanc parents: 235 diff changeset	79	However, there is serious disadvantage of PEG for applications: is does not
2d4f1334b5ca shortened urbanc parents: 235 diff changeset	80	support grammrs involving left recursion~\cite{Ford02b}. Although a new PEG
2d4f1334b5ca shortened urbanc parents: 235 diff changeset	81	parsing algorithm has been proposed that can deal with left
2d4f1334b5ca shortened urbanc parents: 235 diff changeset	82	recursion~\cite{conf/pepm/WarthDM08}, there is no correctness proof, not even
2d4f1334b5ca shortened urbanc parents: 235 diff changeset	83	in ``paper-and-pencil'' form. One aim of this research is to solve this sorry
2d4f1334b5ca shortened urbanc parents: 235 diff changeset	84	state-of-affairs by either certifying this algorithm or inventing a new
2d4f1334b5ca shortened urbanc parents: 235 diff changeset	85	one. For this we will first formalize a fixed point semantics of PEG, based on
2d4f1334b5ca shortened urbanc parents: 235 diff changeset	86	which an efficient, certified parsing algorithm can be given given. For this
2d4f1334b5ca shortened urbanc parents: 235 diff changeset	87	we take as starting point the paper~\cite{Ford04a}, which does not treat
2d4f1334b5ca shortened urbanc parents: 235 diff changeset	88	left-recursion, but gives an operational semantics for PEG parsing. For the
2d4f1334b5ca shortened urbanc parents: 235 diff changeset	89	semantics, it seems plausible that we can adapt work on Boolean
2d4f1334b5ca shortened urbanc parents: 235 diff changeset	90	Grammars~\cite{Okhotin/04a}, which are similar to PEGs, and for which the
2d4f1334b5ca shortened urbanc parents: 235 diff changeset	91	paper ~\cite{journals/iandc/KountouriotisNR09} gives a semantics to negation
2d4f1334b5ca shortened urbanc parents: 235 diff changeset	92	operators, but not to Kleene's star operation.
234 eeadb4e51d74 some changes urbanc parents: 232 diff changeset	93
236 2d4f1334b5ca shortened urbanc parents: 235 diff changeset	94	For the parsing algorithm, we might also be able to draw inspiration from
2d4f1334b5ca shortened urbanc parents: 235 diff changeset	95	parsers based on Cocke-Younger-Kasami (CYK)
2d4f1334b5ca shortened urbanc parents: 235 diff changeset	96	algorithms~\cite{journals/iandc/KountouriotisNR09} and
2d4f1334b5ca shortened urbanc parents: 235 diff changeset	97	Early~\cite{Earley70,AycHor02} parsers. The defect CYK algorithms is that the
2d4f1334b5ca shortened urbanc parents: 235 diff changeset	98	original grammar specification needs to be transformed into a normal
2d4f1334b5ca shortened urbanc parents: 235 diff changeset	99	form. This transformation may lead to grammar explosion and inefficient
2d4f1334b5ca shortened urbanc parents: 235 diff changeset	100	parsing. We will investigate whether this transformation can be avoided.
2d4f1334b5ca shortened urbanc parents: 235 diff changeset	101	Early style parsers, which have recently been certified by Ridge [???],
2d4f1334b5ca shortened urbanc parents: 235 diff changeset	102	need to be extended to PEG parsing in order to be helpful for us.
234 eeadb4e51d74 some changes urbanc parents: 232 diff changeset	103
236 2d4f1334b5ca shortened urbanc parents: 235 diff changeset	104	Finally, we want to investigate whether derivatives of regular expressions
2d4f1334b5ca shortened urbanc parents: 235 diff changeset	105	~\cite{Brzozowski64,Almeidaetal10,OwensReppyTuron09,journals/corr/abs-1010-5023}
2d4f1334b5ca shortened urbanc parents: 235 diff changeset	106	can be extended to parsing. Lexing based on derivatives gives rise to very
2d4f1334b5ca shortened urbanc parents: 235 diff changeset	107	elegant regular expression matchers that can be certified in a theorem prover
2d4f1334b5ca shortened urbanc parents: 235 diff changeset	108	with ease. We will study whether the idea of taking a derivative of a regular
2d4f1334b5ca shortened urbanc parents: 235 diff changeset	109	expression can be extended to rules in grammars. The problem that needs to be
2d4f1334b5ca shortened urbanc parents: 235 diff changeset	110	overcome again arises from possible left recursion in parsing.
234 eeadb4e51d74 some changes urbanc parents: 232 diff changeset	111
eeadb4e51d74 some changes urbanc parents: 232 diff changeset	112
236 2d4f1334b5ca shortened urbanc parents: 235 diff changeset	113	\newpage
229 2087fc59f2a1 One passage added. zhang parents: 228 diff changeset	114
225 bc3ffe0dd1d8 added a start for a proposal urbanc parents: diff changeset	115
232 114064363ef0 Proposal paragraphs by Xingyuan completed (with references added). zhang parents: 230 diff changeset	116	\small
114064363ef0 Proposal paragraphs by Xingyuan completed (with references added). zhang parents: 230 diff changeset	117	\bibliography{Journal/document/root}
114064363ef0 Proposal paragraphs by Xingyuan completed (with references added). zhang parents: 230 diff changeset	118	\bibliographystyle{abbrv}
225 bc3ffe0dd1d8 added a start for a proposal urbanc parents: diff changeset	119	\end{multicols}
bc3ffe0dd1d8 added a start for a proposal urbanc parents: diff changeset	120
bc3ffe0dd1d8 added a start for a proposal urbanc parents: diff changeset	121	% %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
bc3ffe0dd1d8 added a start for a proposal urbanc parents: diff changeset	122	% \noindent {\bf Objectives:} The overall goals of the project are as follows:
bc3ffe0dd1d8 added a start for a proposal urbanc parents: diff changeset	123	% \begin{itemize}
bc3ffe0dd1d8 added a start for a proposal urbanc parents: diff changeset	124	% \item To solve the POPLmark challenge.
bc3ffe0dd1d8 added a start for a proposal urbanc parents: diff changeset	125
bc3ffe0dd1d8 added a start for a proposal urbanc parents: diff changeset	126	% \item To complete and greatly improve the existing implementation of the
bc3ffe0dd1d8 added a start for a proposal urbanc parents: diff changeset	127	% nominal datatype package.
bc3ffe0dd1d8 added a start for a proposal urbanc parents: diff changeset	128	% \item To explore the strengths of this package by proving the
bc3ffe0dd1d8 added a start for a proposal urbanc parents: diff changeset	129	% safety of SML.
bc3ffe0dd1d8 added a start for a proposal urbanc parents: diff changeset	130	% \item To provide a basis for extracting programs from safety proofs.
bc3ffe0dd1d8 added a start for a proposal urbanc parents: diff changeset	131
229 2087fc59f2a1 One passage added. zhang parents: 228 diff changeset	132	% \item To make the nominal datatype package usable for teaching
225 bc3ffe0dd1d8 added a start for a proposal urbanc parents: diff changeset	133	% students about the lambda-calculus and the theory of programming
bc3ffe0dd1d8 added a start for a proposal urbanc parents: diff changeset	134	% languages. \smallskip
bc3ffe0dd1d8 added a start for a proposal urbanc parents: diff changeset	135	% \end{itemize}
bc3ffe0dd1d8 added a start for a proposal urbanc parents: diff changeset	136
bc3ffe0dd1d8 added a start for a proposal urbanc parents: diff changeset	137
bc3ffe0dd1d8 added a start for a proposal urbanc parents: diff changeset	138
bc3ffe0dd1d8 added a start for a proposal urbanc parents: diff changeset	139	%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
bc3ffe0dd1d8 added a start for a proposal urbanc parents: diff changeset	140	\end{document}
bc3ffe0dd1d8 added a start for a proposal urbanc parents: diff changeset	141
229 2087fc59f2a1 One passage added. zhang parents: 228 diff changeset	142	%%% Local Variables:
225 bc3ffe0dd1d8 added a start for a proposal urbanc parents: diff changeset	143	%%% mode: latex
bc3ffe0dd1d8 added a start for a proposal urbanc parents: diff changeset	144	%%% TeX-master: t
bc3ffe0dd1d8 added a start for a proposal urbanc parents: diff changeset	145	%%% TeX-command-default: "PdfLaTeX"
bc3ffe0dd1d8 added a start for a proposal urbanc parents: diff changeset	146	%%% TeX-view-style: (("." "kpdf %s.pdf"))
229 2087fc59f2a1 One passage added. zhang parents: 228 diff changeset	147	%%% End:

author	urbanc
	Mon, 05 Sep 2011 20:59:50 +0000
changeset 241	68d48522ea9a
parent 236	2d4f1334b5ca
child 242	093e45c44d91
permissions	-rwxr-xr-x