csupp.tex
changeset 243 86a4182c73e7
parent 242 093e45c44d91
child 244 a9598a206c41
equal deleted inserted replaced
242:093e45c44d91 243:86a4182c73e7
    23 
    23 
    24 
    24 
    25 \begin{center}
    25 \begin{center}
    26 \begin{tabular}{c}
    26 \begin{tabular}{c}
    27 \\[-5mm]
    27 \\[-5mm]
    28 \LARGE\bf Certified Parsing\\[-10mm]
    28 \LARGE\bf Novel Certified Parsers\\[-10mm]
    29 \mbox{}
    29 \mbox{}
    30 \end{tabular}
    30 \end{tabular}
    31 \end{center}
    31 \end{center}
    32 \thispagestyle{empty}
    32 \thispagestyle{empty}
    33 \mbox{}\\[-5mm]
    33 \mbox{}\\[-5mm]
    39 Parsers transform plain text into some abstract structure that can be analyzed by
    39 Parsers transform plain text into some abstract structure that can be analyzed by
    40 computers for further processing.  One might think that parsers have been
    40 computers for further processing.  One might think that parsers have been
    41 studied to death, and after \emph{yacc} and \emph{lex} no new results can be
    41 studied to death, and after \emph{yacc} and \emph{lex} no new results can be
    42 obtained in this area.  However recent developments and novel approaches make
    42 obtained in this area.  However recent developments and novel approaches make
    43 it increasingly clear, that this is not true anymore~\cite{Might11}. And
    43 it increasingly clear, that this is not true anymore~\cite{Might11}. And
    44 there is a practical need for new results: for example the future HTML 5 
    44 there is a real practical need for new results: for example the future HTML 5 
    45 Standard abandons a well-defined grammar specification, in favour of a bespoke
    45 Standard abandons a well-defined grammar specification, in favour of a bespoke
    46 parser given as pseudo code.
    46 parser given as pseudo code.
    47 
    47 
    48 This work targets parsers from a certification point of view. Increasingly,
    48 This work targets parsers from a certification point of view. Increasingly,
    49 parsers are part of certified compilers, like
    49 parsers are part of certified compilers, like
    56 correctness of the compiler. In this project, we aim to change this situation
    56 correctness of the compiler. In this project, we aim to change this situation
    57 with the help of the theorem prover Isabelle/HOL.
    57 with the help of the theorem prover Isabelle/HOL.
    58 
    58 
    59 Only in the last few years, theorem provers have become powerful enough for
    59 Only in the last few years, theorem provers have become powerful enough for
    60 establishing the correctness of some standard lexing and parsing
    60 establishing the correctness of some standard lexing and parsing
    61 algorithms. For this, the algorithms still need to be formulated in way so
    61 algorithms. For this, the algorithms still need to be formulated in a way so
    62 that it is easy to reason about them. In our earlier work about lexing and
    62 that it is easy to reason about them. In our earlier work about lexing and
    63 regular languages, we showed that this precludes well-known algorithms based
    63 regular languages, we showed that this precludes well-known algorithms based
    64 automata~\cite{WuZhangUrban11}. However we showed also that regular languages can be formulated and
    64 automata~\cite{WuZhangUrban11}. However we showed also that regular languages can be formulated and
    65 reasoned about entirely in terms regular expressions, which can be easily
    65 reasoned about entirely in terms regular expressions, which can be easily
    66 represented in theorem provers. This work uses the device of derivatives of
    66 represented in theorem provers. This work uses the device of derivatives of
    89 algorithm or inventing a new one. For this we will first devise a
    89 algorithm or inventing a new one. For this we will first devise a
    90 fixed-point semantics of PEGs, against which we can certify a parser. For this
    90 fixed-point semantics of PEGs, against which we can certify a parser. For this
    91 semantics we take as starting point the paper~\cite{Ford04}, which does not
    91 semantics we take as starting point the paper~\cite{Ford04}, which does not
    92 treat left-recursion, but gives an operational semantics for PEG
    92 treat left-recursion, but gives an operational semantics for PEG
    93 parsing. There are also good indications that we can adapt work on Boolean
    93 parsing. There are also good indications that we can adapt work on Boolean
    94 Grammars~\cite{Okhotin04}, which are similar to PEGs, and for which the
    94 Grammars~\cite{Okhotin04}, which are similar to PEGs and for which the
    95 paper~\cite{KountouriotisNR09} gives a fixed-point semantics 
    95 paper~\cite{KountouriotisNR09} gives a fixed-point semantics 
    96 to negation operators, but not to the Kleene star.
    96 to negation operators, but not to the Kleene star.
    97 
    97 
    98 For the parsing algorithm, we might be able to draw inspiration from parsers
    98 For the parsing algorithm, we might be able to build upon
    99 based on the classic Cocke-Younger-Kasami (CYK)
    99 the classic Cocke-Younger-Kasami (CYK)
   100 algorithms~\cite{KountouriotisNR09} and
   100 algorithms~\cite{KountouriotisNR09} and
   101 Early~\cite{AycHor02, Earley70} parsers. The defect of CYK algorithms, however,
   101 Early~\cite{AycHor02, Earley70} parsers. The defect of CYK algorithms, however,
   102 is that the grammar specifications given by the user need to be transformed
   102 is that the grammar specifications given by the user need to be transformed
   103 into a normal form. This transformation may potentially lead to rule explosion
   103 into a normal form. This transformation may potentially lead to rule explosion
   104 and hence inefficient parsing. We will investigate whether this transformation
   104 and hence inefficient parsing. We will investigate whether this transformation
   110 ~\cite{Brzozowski64,Might11,OwensReppyTuron09}
   110 ~\cite{Brzozowski64,Might11,OwensReppyTuron09}
   111 can be generalised to PEG parsing. In earlier work, we showed that lexing based on
   111 can be generalised to PEG parsing. In earlier work, we showed that lexing based on
   112 derivatives gives rise to very elegant regular expression matchers that can be
   112 derivatives gives rise to very elegant regular expression matchers that can be
   113 certified in a theorem prover with ease.  We will study whether the idea of
   113 certified in a theorem prover with ease.  We will study whether the idea of
   114 taking a derivative of a regular expression can be extended to rules in
   114 taking a derivative of a regular expression can be extended to rules in
   115 grammars. The problem that needs to be overcome is again how to deal with possible
   115 grammars. The problem that needs to be addressed is again how to deal with 
   116 left-recursion in grammar rules.
   116 left-recursive grammar rules.
   117 
   117 
   118 
   118 
   119 
   119 
   120 \bibliography{Journal/document/root}
   120 \bibliography{Journal/document/root}
   121 \bibliographystyle{abbrv}
   121 \bibliographystyle{abbrv}