csupp.tex
changeset 232 114064363ef0
parent 230 6bb8ad9093e6
child 234 eeadb4e51d74
equal deleted inserted replaced
231:999fce5f9d34 232:114064363ef0
    34 
    34 
    35 \begin{multicols}{2}
    35 \begin{multicols}{2}
    36 \section*{Background}
    36 \section*{Background}
    37 \noindent
    37 \noindent
    38 Parsing is the act of transforming plain text into some
    38 Parsing is the act of transforming plain text into some
    39 structure that can be analysed by computers for further processing.
    39 structure that can be analyzed by computers for further processing.
    40 One might think that parsing has been studied to death and after
    40 One might think that parsing has been studied to death and after
    41 \emph{yacc} and \emph{lex} no new results can be obtained in this area.
    41 \emph{yacc} and \emph{lex} no new results can be obtained in this area.
    42 However, recent results and novel approaches make it increasingly clear
    42 However, recent results and novel approaches make it increasingly clear
    43 that this is no longer true.
    43 that this is no longer true.
    44 
    44 
    45 We propose to approach the subject of parsing from a certification point
    45 We propose to approach the subject of parsing from a certification point
    46 of view. Parsers are increasingly part of certified compilers, like \mbox{\emph{CompCert}},
    46 of view. Parsers are increasingly part of certified compilers, like \mbox{\emph{CompCert}},
    47 which are guaranteed to be correct and bug-free. Such certified compilers are
    47 which are guaranteed to be correct and bug-free. Such certified compilers are
    48 crucial in areas where software just cannot fail. However, so far the
    48 crucial in areas where software just cannot fail. However, so far the
    49 parsers of these compilers have been left out of the certification.
    49 parsers of these compilers have been left out of the certification.
    50 This is because parsing algorithms are often adhoc and their semantics
    50 This is because parsing algorithms are often ad hoc and their semantics
    51 is not clearly specified. Unfortunately, this means parsers can harbour
    51 is not clearly specified. Unfortunately, this means parsers can harbour
    52 errors that potentially invalidate the whole certification and correctness
    52 errors that potentially invalidate the whole certification and correctness
    53 of the compiler. In this project, we would like to change that.
    53 of the compiler. In this project, we would like to change that.
    54 
    54 
    55 Only in the last few years, theorem provers have become good enough
    55 Only in the last few years, theorem provers have become good enough
    66 whose correctness and the absence of bugs can be certified in a
    66 whose correctness and the absence of bugs can be certified in a
    67 theorem prover.
    67 theorem prover.
    68 
    68 
    69 \section*{Proposed Work}
    69 \section*{Proposed Work}
    70 
    70 
    71 One new development in formal grammar is the Parsing Expression Grammar (PEG) which is proposed as an refinement of standard Context Free Grammar. The aim of this extension is to internalize disambiguition normally done with semantic methods. 
    71 One new development in formal grammars is the introduction of Parsing Expression Grammar (PEG) as an extension of the standard Context Free Grammar (CFG)\cite{Ford04a}. The extension adds new regular operators, such as negation and conjunction, to the right-hand side of productions, as well as a priority ordering on productions. With these extensions, PEG becomes more powerful: disambiguation formerly expressed using semantic filters can now be expressed directly using production expressions. This means a simpler and more systematic treatment of ambiguity and a more concise grammar specification for programming languages.
    72 The idea is to introduce negative, conjunctive operators as well as production priorities, so that the grammars written in
       
    73 PEG are unambiguous in the first place. Another benefit of PEG is that it admits a very efficient linear parsing algorithm.
       
    74 
    72 
    75 However, one disadvantage of PEG is that it does not allow left recursion in grammar specification, i.e., standard parsing algorithms of PEG can not deal with left recursion. Although some authors claimed PEG parsing algorithms for left recursion, none of them provide correctness proof, not even in paper-and-pencil form. 
    73 However, one disadvantage of PEG is that it does not allow left recursion in grammar specifications, because the accompanying algorithms of PEG\cite{Ford02b} cannot deal with left recursion. Although some authors have claimed a new PEG parsing algorithm for left recursion\cite{conf/pepm/WarthDM08}, there is no correctness proof, not even in paper-and-pencil form. One aim of this research is to formalize a fixed point semantics of PEG, on which an efficient, certified parsing algorithm can then be based.
    76 
    74 
       
    75 There are several existing works we can draw upon:
       
    76 \begin{enumerate}
       
    77     \item The works on PEG.
       
    78         \begin {enumerate}
       
    79             \item An operational semantics for PEG has already been given in \cite{Ford04a}, but it is not adequate to deal with left recursion. This work does, however, give a precise description of what the original PEG was meant for. It will serve as a basis to show the conservativeness of the fixed point semantics we are going to develop.
       
    80             \item The new algorithm\cite{conf/pepm/WarthDM08}, which is claimed to be able to deal with left recursion. Although there is no correctness proof yet, it may provide useful inspiration for the design of our new algorithm.
       
    81         \end{enumerate}
       
    82     \item The works on Boolean Grammar\cite{Okhotin/04a}. Boolean Grammar is very closely related to PEG, because it also contains negation and conjunction. The main differences are: first, Boolean Grammar has no ordering on productions; second, Boolean Grammar does not contain the STAR operator. There are two works on Boolean Grammar which might be useful for this research:
       
    83         \begin{enumerate}
       
    84             \item A fixed point semantics for Boolean Grammar\cite{journals/iandc/KountouriotisNR09}. Its way of defining the semantics of the negative and conjunctive operators is certainly something we can borrow. This work therefore gives the basis on which we can add production ordering and the STAR operator.
       
    85             \item A parsing algorithm for Boolean Grammar based on CYK parsing\cite{journals/iandc/KountouriotisNR09}. The drawback of CYK parsing is that the original grammar specification needs to be transformed into a normal form. This transformation may lead to grammar explosion and is undesirable. One aim of this research is to see whether this transformation can be avoided. For this purpose, other parsing styles may provide useful inspiration, for example:
       
    86                     \begin{enumerate}
       
    87                         \item Derivative Parsing\cite{Brzozowski64,Almeidaetal10,OwensReppyTuron09,journals/corr/abs-1010-5023}. Christian Urban has used derivative methods to establish the correctness of a regular expression matcher, as well as the finite partition property of regular expressions\cite{WuZhangUrban11}. There are well-founded reasons to expect that derivative methods may provide the foundation for new parsing algorithms for PEG.
       
    88                         \item Earley parsing\cite{Earley70,AycHor02}. It is a refinement of CYK parsing which does not require the transformation to normal form, and therefore provides one possible direction for adapting the current CYK-based parsing algorithm of Boolean Grammar to PEG.
       
    89                         \item The new parsing algorithm proposed by Tom Ridge[???]. Recently, T. Ridge has proposed and certified a combinator-style parsing algorithm for CFG, which borrows some ideas from Earley parsing. The proposed algorithm is very simple and elegant. We are going to strive for a parsing algorithm as elegant as this one.
       
    90                     \end{enumerate}
       
    91                 Which of the above possibilities will make it into our final solution is an open question of this research.
       
    92         \end{enumerate}
       
    93 \end{enumerate}
       
    94 Based on these works, we are quite confident that our approach will lead to concrete results.
    77 
    95 
    78 \mbox{}\\[15cm]
    96 \mbox{}\\[15cm]
    79 
    97 
    80 \noindent
    98 \noindent
    81 
    99 
    82 
   100 
    83 
   101 
    84 %\small
   102 \small
    85 %\bibliography{../../bib/all}
   103 \bibliography{Journal/document/root}
    86 %\bibliographystyle{abbrv}
   104 \bibliographystyle{abbrv}
    87 \end{multicols}
   105 \end{multicols}
    88 
   106 
    89 %  %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
   107 %  %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
    90 %  \noindent {\bf Objectives:} The overall goals of the project are as follows:
   108 %  \noindent {\bf Objectives:} The overall goals of the project are as follows:
    91 %  \begin{itemize}
   109 %  \begin{itemize}