csupp.tex
changeset 258 1abf8586ee6b
parent 244 a9598a206c41
equal deleted inserted replaced
257:f512026d5d6e 258:1abf8586ee6b
    41 studied to death, and after \emph{yacc} and \emph{lex} no new results can be
    41 studied to death, and after \emph{yacc} and \emph{lex} no new results can be
    42 obtained in this area.  However recent developments and novel approaches make
    42 obtained in this area.  However recent developments and novel approaches make
    43 it increasingly clear, that this is not true anymore~\cite{Might11}. And
    43 it increasingly clear, that this is not true anymore~\cite{Might11}. And
    44 there is a real practical need for new results: for example the future HTML5 
    44 there is a real practical need for new results: for example the future HTML5 
    45 Standard abandons a well-defined grammar specification, in favour of a bespoke
    45 Standard abandons a well-defined grammar specification, in favour of a bespoke
    46 parser given as pseudo code.
    46 parser given as pseudo code. Proving any property about this parser is nearly 
       
    47 impossible.
    47 
    48 
    48 This work targets parsers from a certification point of view. Increasingly,
    49 This work targets parsers from a certification point of view. Increasingly,
    49 parsers are part of certified compilers, like
    50 parsers are part of certified compilers, like
    50 \mbox{\emph{CompCert}}~\cite{Leroy09}, which are guaranteed to be
    51 \mbox{\emph{CompCert}}~\cite{Leroy09}, which are guaranteed to be
    51 bug-free. Such certified compilers are crucial in areas where software just
    52 bug-free. Such certified compilers are crucial in areas where software just
    75 (PEGs)~\cite{Ford04}, which
    76 (PEGs)~\cite{Ford04}, which
    76 are an extension of the well-known Context Free Grammars
    77 are an extension of the well-known Context Free Grammars
    77 (CFGs). This extension introduces new regular operators, such as
    78 (CFGs). This extension introduces new regular operators, such as
    78 negation and conjunction, on the right-hand side of grammar rules, as well as
    79 negation and conjunction, on the right-hand side of grammar rules, as well as
    79 priority orderings for rules. With these extensions, PEG parsing becomes much
    80 priority orderings for rules. With these extensions, PEG parsing becomes much
    80 more powerful and more useful in practise. For example disambiguation, formerly expressed by semantic
    81 more powerful and more useful in practice. For example disambiguation, formerly expressed by semantic
    81 filters, can now be expressed directly using grammar rules. 
    82 filters, can now be expressed directly using grammar rules. 
    82 
    83 
    83 However, there is a serious limitation of PEGs, which affects potential
    84 However, there is a serious limitation of PEGs, which affects potential
    84 applications: grammars involving left recursion are not
    85 applications: grammars involving left recursion are not
    85 allowed~\cite{Ford02}. Although one PEG parsing algorithm has already been
    86 allowed~\cite{Ford02}. Although one PEG parsing algorithm has already been
    91 semantics we take as starting point the paper~\cite{Ford04}, which does not
    92 semantics we take as starting point the paper~\cite{Ford04}, which does not
    92 treat left-recursion, but gives an operational semantics for PEG
    93 treat left-recursion, but gives an operational semantics for PEG
    93 parsing. There are also good indications that we can adapt work on Boolean
    94 parsing. There are also good indications that we can adapt work on Boolean
    94 Grammars~\cite{Okhotin04}, which are similar to PEGs and for which the
    95 Grammars~\cite{Okhotin04}, which are similar to PEGs and for which the
    95 paper~\cite{KountouriotisNR09} gives a fixed-point semantics 
    96 paper~\cite{KountouriotisNR09} gives a fixed-point semantics 
    96 to negation operators, but not to the Kleene star.
    97 for negation operators, but not to the Kleene star.
    97 
    98 
    98 For the parsing algorithm, we might be able to build upon
    99 For our parsing algorithm, we might be able to build upon
    99 the classic Cocke-Younger-Kasami (CYK)
   100 the classic Cocke-Younger-Kasami (CYK)
   100 algorithms~\cite{KountouriotisNR09} and
   101 algorithms~\cite{KountouriotisNR09} and
   101 Early~\cite{AycHor02, Earley70} parsers. The defect of CYK algorithms, however,
   102 Early~\cite{AycHor02, Earley70} parsers. The defect of CYK algorithms, however,
   102 is that the grammar specifications given by the user need to be transformed
   103 is that the grammar specifications given by the user need to be transformed
   103 into a normal form. This transformation may potentially lead to rule explosion
   104 into a normal form. This transformation may potentially lead to rule explosion