39 Parsers transform plain text into some abstract structure that can be analyzed by |
39 Parsers transform plain text into some abstract structure that can be analyzed by |
40 computers for further processing. One might think that parsers have been |
40 computers for further processing. One might think that parsers have been |
41 studied to death, and after \emph{yacc} and \emph{lex} no new results can be |
41 studied to death, and after \emph{yacc} and \emph{lex} no new results can be |
42 obtained in this area. However recent developments and novel approaches make |
42 obtained in this area. However recent developments and novel approaches make |
43 it increasingly clear, that this is not true anymore~\cite{Might11}. And |
43 it increasingly clear, that this is not true anymore~\cite{Might11}. And |
44 there is a practical need for new results: for example the future HTML 5 |
44 there is a real practical need for new results: for example the future HTML 5 |
45 Standard abandons a well-defined grammar specification, in favour of a bespoke |
45 Standard abandons a well-defined grammar specification, in favour of a bespoke |
46 parser given as pseudo code. |
46 parser given as pseudo code. |
47 |
47 |
48 This work targets parsers from a certification point of view. Increasingly, |
48 This work targets parsers from a certification point of view. Increasingly, |
49 parsers are part of certified compilers, like |
49 parsers are part of certified compilers, like |
56 correctness of the compiler. In this project, we aim to change this situation |
56 correctness of the compiler. In this project, we aim to change this situation |
57 with the help of the theorem prover Isabelle/HOL. |
57 with the help of the theorem prover Isabelle/HOL. |
58 |
58 |
59 Only in the last few years, theorem provers have become powerful enough for |
59 Only in the last few years, theorem provers have become powerful enough for |
60 establishing the correctness of some standard lexing and parsing |
60 establishing the correctness of some standard lexing and parsing |
61 algorithms. For this, the algorithms still need to be formulated in way so |
61 algorithms. For this, the algorithms still need to be formulated in a way so |
62 that it is easy to reason about them. In our earlier work about lexing and |
62 that it is easy to reason about them. In our earlier work about lexing and |
63 regular languages, we showed that this precludes well-known algorithms based |
63 regular languages, we showed that this precludes well-known algorithms based |
64 automata~\cite{WuZhangUrban11}. However we showed also that regular languages can be formulated and |
64 automata~\cite{WuZhangUrban11}. However we showed also that regular languages can be formulated and |
65 reasoned about entirely in terms regular expressions, which can be easily |
65 reasoned about entirely in terms regular expressions, which can be easily |
66 represented in theorem provers. This work uses the device of derivatives of |
66 represented in theorem provers. This work uses the device of derivatives of |
89 algorithm or inventing a new one. For this we will first devise a |
89 algorithm or inventing a new one. For this we will first devise a |
90 fixed-point semantics of PEGs, against which we can certify a parser. For this |
90 fixed-point semantics of PEGs, against which we can certify a parser. For this |
91 semantics we take as starting point the paper~\cite{Ford04}, which does not |
91 semantics we take as starting point the paper~\cite{Ford04}, which does not |
92 treat left-recursion, but gives an operational semantics for PEG |
92 treat left-recursion, but gives an operational semantics for PEG |
93 parsing. There are also good indications that we can adapt work on Boolean |
93 parsing. There are also good indications that we can adapt work on Boolean |
94 Grammars~\cite{Okhotin04}, which are similar to PEGs, and for which the |
94 Grammars~\cite{Okhotin04}, which are similar to PEGs and for which the |
95 paper~\cite{KountouriotisNR09} gives a fixed-point semantics |
95 paper~\cite{KountouriotisNR09} gives a fixed-point semantics |
96 to negation operators, but not to the Kleene star. |
96 to negation operators, but not to the Kleene star. |
97 |
97 |
98 For the parsing algorithm, we might be able to draw inspiration from parsers |
98 For the parsing algorithm, we might be able to build upon |
99 based on the classic Cocke-Younger-Kasami (CYK) |
99 the classic Cocke-Younger-Kasami (CYK) |
100 algorithms~\cite{KountouriotisNR09} and |
100 algorithms~\cite{KountouriotisNR09} and |
101 Early~\cite{AycHor02, Earley70} parsers. The defect of CYK algorithms, however, |
101 Early~\cite{AycHor02, Earley70} parsers. The defect of CYK algorithms, however, |
102 is that the grammar specifications given by the user need to be transformed |
102 is that the grammar specifications given by the user need to be transformed |
103 into a normal form. This transformation may potentially lead to rule explosion |
103 into a normal form. This transformation may potentially lead to rule explosion |
104 and hence inefficient parsing. We will investigate whether this transformation |
104 and hence inefficient parsing. We will investigate whether this transformation |
110 ~\cite{Brzozowski64,Might11,OwensReppyTuron09} |
110 ~\cite{Brzozowski64,Might11,OwensReppyTuron09} |
111 can be generalised to PEG parsing. In earlier work, we showed that lexing based on |
111 can be generalised to PEG parsing. In earlier work, we showed that lexing based on |
112 derivatives gives rise to very elegant regular expression matchers that can be |
112 derivatives gives rise to very elegant regular expression matchers that can be |
113 certified in a theorem prover with ease. We will study whether the idea of |
113 certified in a theorem prover with ease. We will study whether the idea of |
114 taking a derivative of a regular expression can be extended to rules in |
114 taking a derivative of a regular expression can be extended to rules in |
115 grammars. The problem that needs to be overcome is again how to deal with possible |
115 grammars. The problem that needs to be addressed is again how to deal with |
116 left-recursion in grammar rules. |
116 left-recursive grammar rules. |
117 |
117 |
118 |
118 |
119 |
119 |
120 \bibliography{Journal/document/root} |
120 \bibliography{Journal/document/root} |
121 \bibliographystyle{abbrv} |
121 \bibliographystyle{abbrv} |