41 studied to death, and after \emph{yacc} and \emph{lex} no new results can be |
41 studied to death, and after \emph{yacc} and \emph{lex} no new results can be |
42 obtained in this area. However recent developments and novel approaches make |
42 obtained in this area. However recent developments and novel approaches make |
43 it increasingly clear, that this is not true anymore~\cite{Might11}. And |
43 it increasingly clear, that this is not true anymore~\cite{Might11}. And |
44 there is a real practical need for new results: for example the future HTML5 |
44 there is a real practical need for new results: for example the future HTML5 |
45 Standard abandons a well-defined grammar specification, in favour of a bespoke |
45 Standard abandons a well-defined grammar specification, in favour of a bespoke |
46 parser given as pseudo code. |
46 parser given as pseudo code. Proving any property about this parser is nearly |
|
47 impossible. |
47 |
48 |
48 This work targets parsers from a certification point of view. Increasingly, |
49 This work targets parsers from a certification point of view. Increasingly, |
49 parsers are part of certified compilers, like |
50 parsers are part of certified compilers, like |
50 \mbox{\emph{CompCert}}~\cite{Leroy09}, which are guaranteed to be |
51 \mbox{\emph{CompCert}}~\cite{Leroy09}, which are guaranteed to be |
51 bug-free. Such certified compilers are crucial in areas where software just |
52 bug-free. Such certified compilers are crucial in areas where software just |
75 (PEGs)~\cite{Ford04}, which |
76 (PEGs)~\cite{Ford04}, which |
76 are an extension of the well-known Context Free Grammars |
77 are an extension of the well-known Context Free Grammars |
77 (CFGs). This extension introduces new regular operators, such as |
78 (CFGs). This extension introduces new regular operators, such as |
78 negation and conjunction, on the right-hand side of grammar rules, as well as |
79 negation and conjunction, on the right-hand side of grammar rules, as well as |
79 priority orderings for rules. With these extensions, PEG parsing becomes much |
80 priority orderings for rules. With these extensions, PEG parsing becomes much |
80 more powerful and more useful in practise. For example disambiguation, formerly expressed by semantic |
81 more powerful and more useful in practice. For example disambiguation, formerly expressed by semantic |
81 filters, can now be expressed directly using grammar rules. |
82 filters, can now be expressed directly using grammar rules. |
82 |
83 |
83 However, there is a serious limitation of PEGs, which affects potential |
84 However, there is a serious limitation of PEGs, which affects potential |
84 applications: grammars involving left recursion are not |
85 applications: grammars involving left recursion are not |
85 allowed~\cite{Ford02}. Although one PEG parsing algorithm has already been |
86 allowed~\cite{Ford02}. Although one PEG parsing algorithm has already been |
91 semantics we take as starting point the paper~\cite{Ford04}, which does not |
92 semantics we take as starting point the paper~\cite{Ford04}, which does not |
92 treat left-recursion, but gives an operational semantics for PEG |
93 treat left-recursion, but gives an operational semantics for PEG |
93 parsing. There are also good indications that we can adapt work on Boolean |
94 parsing. There are also good indications that we can adapt work on Boolean |
94 Grammars~\cite{Okhotin04}, which are similar to PEGs and for which the |
95 Grammars~\cite{Okhotin04}, which are similar to PEGs and for which the |
95 paper~\cite{KountouriotisNR09} gives a fixed-point semantics |
96 paper~\cite{KountouriotisNR09} gives a fixed-point semantics |
96 to negation operators, but not to the Kleene star. |
97 for negation operators, but not to the Kleene star. |
97 |
98 |
98 For the parsing algorithm, we might be able to build upon |
99 For our parsing algorithm, we might be able to build upon |
99 the classic Cocke-Younger-Kasami (CYK) |
100 the classic Cocke-Younger-Kasami (CYK) |
100 algorithms~\cite{KountouriotisNR09} and |
101 algorithms~\cite{KountouriotisNR09} and |
101 Early~\cite{AycHor02, Earley70} parsers. The defect of CYK algorithms, however, |
102 Early~\cite{AycHor02, Earley70} parsers. The defect of CYK algorithms, however, |
102 is that the grammar specifications given by the user need to be transformed |
103 is that the grammar specifications given by the user need to be transformed |
103 into a normal form. This transformation may potentially lead to rule explosion |
104 into a normal form. This transformation may potentially lead to rule explosion |