|     32 \thispagestyle{empty} |     32 \thispagestyle{empty} | 
|     33 \mbox{}\\[-5mm] |     33 \mbox{}\\[-5mm] | 
|     34  |     34  | 
|     35 \begin{multicols}{2} |     35 \begin{multicols}{2} | 
|     36 \section*{Background} |     36 \section*{Background} | 
|         |     37  | 
|     37 \noindent |     38 \noindent | 
|     38 Parsing is the act of transforming plain text into some |     39 Parsers transform plain text into some abstract structure that can be analyzed by | 
|     39 structure that can be analyzed by computers for further processing. |     40 computers for further processing.  One might think that parsers have been | 
|     40 One might think that parsing has been studied to death, and after |     41 studied to death, and after \emph{yacc} and \emph{lex} no new results can be | 
|     41 \emph{yacc} and \emph{lex} no new results can be obtained in this area. |     42 obtained in this area.  However recent developments and novel approaches make | 
|     42 However recent developments and novel approaches make it increasingly clear, |     43 it increasingly clear, that this is not true anymore~\cite{Might11}. And | 
|     43 that this is not true anymore. |     44 there is a practical need for new results: for example the future HTML 5  | 
|         |     45 Standard abandons a well-defined grammar specification, in favour of a bespoke | 
|         |     46 parser given as pseudo code. | 
|     44  |     47  | 
|     45 We propose to on parsers from a certification point of view. Increasingly, |     48 This work targets parsers from a certification point of view. Increasingly, | 
|     46 parsers are part of certified compilers, like \mbox{\emph{CompCert}}, which |     49 parsers are part of certified compilers, like | 
|     47 are guaranteed to be correct and bug-free. Such certified compilers are |     50 \mbox{\emph{CompCert}}~\cite{Leroy09}, which are guaranteed to be | 
|     48 crucial in areas where software just cannot fail. However, so far the parsers |     51 bug-free. Such certified compilers are crucial in areas where software just | 
|     49 of these compilers have been left out of the certification.  This is because |     52 cannot fail. However, so far the parsers of these compilers have been left out | 
|     50 parsing algorithms are often ad hoc and their semantics is not clearly |     53 of the certification.  This is because parsing algorithms are often ad hoc and | 
|     51 specified. Unfortunately, this means parsers can harbour errors that |     54 their semantics is not clearly specified. This means, unfortunately, parsers | 
|     52 potentially invalidate the whole certification and correctness of the |     55 can harbour errors that potentially invalidate the whole certification and | 
|     53 compiler. In this project, we like to change that with the help of theorem |     56 correctness of the compiler. In this project, we aim to change this situation | 
|     54 provers. |     57 with the help of the theorem prover Isabelle/HOL. | 
|     55  |     58  | 
|     56 Only in the last few years, theorem provers have become good enough for |     59 Only in the last few years, theorem provers have become powerful enough for | 
|     57 establishing the correctness of some standard lexing and parsing |     60 establishing the correctness of some standard lexing and parsing | 
|     58 algorithms. For this, the algorithms still need to be formulated in way so |     61 algorithms. For this, the algorithms still need to be formulated in way so | 
|     59 that it is easy to reason about them. In our earlier work about lexing and |     62 that it is easy to reason about them. In our earlier work about lexing and | 
|     60 regular languages, we showed that this precludes well-known algorithms based |     63 regular languages, we showed that this precludes well-known algorithms based | 
|     61 automata. However we showed also that regular languages can be formulated and |     64 automata~\cite{WuZhangUrban11}. However we showed also that regular languages can be formulated and | 
|     62 reasoned about entirely in terms regular expressions, which can be easily |     65 reasoned about entirely in terms regular expressions, which can be easily | 
|     63 represented in theorem provers. This work uses the device of derivatives of |     66 represented in theorem provers. This work uses the device of derivatives of | 
|     64 regular expressions. We like to extend this device to parsers and grammars. |     67 regular expressions. We like to extend this device to parsers and grammars. | 
|     65 The aim is to come up with elegant and practical useful parsing algorithms |     68 The aim is to come up with elegant and practical useful parsing algorithms | 
|     66 whose correctness can be certified in a theorem prover. |     69 whose correctness can be certified in Isabelle/HOL. | 
|     67  |     70  | 
|     68  |     71  | 
|     69 \section*{Proposed Work} |     72 \section*{Proposed Work} | 
|     70  |     73  | 
|     71 A recent development in parsing is Parsing Expression Grammars (PEG), which |     74 A recent development in the field of parsing are Parsing Expression Grammars | 
|     72 are an extension of the weel-known Context Free Grammars |     75 (PEGs)~\cite{Ford04}, which | 
|     73 (CFG)~\cite{Ford04a}. The extension introduces new regular operators, such as |     76 are an extension of the well-known Context Free Grammars | 
|     74 negation and conjunction, on the right-hand sides of grammar rules, as well as |     77 (CFGs). This extension introduces new regular operators, such as | 
|     75 priority orderings. With these extensions, PEG parsing becomes much |     78 negation and conjunction, on the right-hand side of grammar rules, as well as | 
|     76 more powerful. For example disambiguation, formerly expressed by semantic |     79 priority orderings for rules. With these extensions, PEG parsing becomes much | 
|         |     80 more powerful and more useful in practise. For example disambiguation, formerly expressed by semantic | 
|     77 filters, can now be expressed directly using grammar rules.  |     81 filters, can now be expressed directly using grammar rules.  | 
|     78  |     82  | 
|     79 However, there is serious disadvantage of PEG for applications: is does not |     83 However, there is a serious limitation of PEGs, which affects potential | 
|     80 support grammrs involving left recursion~\cite{Ford02b}. Although a new PEG |     84 applications: grammars involving left recursion are not | 
|     81 parsing algorithm has been proposed that can deal with left |     85 allowed~\cite{Ford02}. Although one PEG parsing algorithm has already been | 
|     82 recursion~\cite{conf/pepm/WarthDM08}, there is no correctness proof, not even |     86 proposed that can deal with left recursion~\cite{WarthDM08}, there | 
|     83 in ``paper-and-pencil'' form. One aim of this research is to solve this sorry |     87 is no correctness proof for it, not even one with ``pencil and paper''. One aim of this | 
|     84 state-of-affairs by either certifying this algorithm or inventing a new |     88 research is to solve this sorry state-of-affairs by either certifying this | 
|     85 one. For this we will first formalize a fixed point semantics of PEG, based on |     89 algorithm or inventing a new one. For this we will first devise a | 
|     86 which an efficient, certified parsing algorithm can be given given. For this |     90 fixed-point semantics of PEGs, against which we can certify a parser. For this | 
|     87 we take as starting point the paper~\cite{Ford04a}, which does not treat |     91 semantics we take as starting point the paper~\cite{Ford04}, which does not | 
|     88 left-recursion, but gives an operational semantics for PEG parsing. For the |     92 treat left-recursion, but gives an operational semantics for PEG | 
|     89 semantics, it seems plausible that we can adapt work on Boolean |     93 parsing. There are also good indications that we can adapt work on Boolean | 
|     90 Grammars~\cite{Okhotin/04a}, which are similar to PEGs, and for which the |     94 Grammars~\cite{Okhotin04}, which are similar to PEGs, and for which the | 
|     91 paper ~\cite{journals/iandc/KountouriotisNR09} gives a semantics to negation |     95 paper~\cite{KountouriotisNR09} gives a fixed-point semantics  | 
|     92 operators, but not to Kleene's star operation. |     96 to negation operators, but not to the Kleene star. | 
|     93  |     97  | 
|     94 For the parsing algorithm, we might also be able to draw inspiration from |     98 For the parsing algorithm, we might be able to draw inspiration from parsers | 
|     95 parsers based on Cocke-Younger-Kasami (CYK) |     99 based on the classic Cocke-Younger-Kasami (CYK) | 
|     96 algorithms~\cite{journals/iandc/KountouriotisNR09} and |    100 algorithms~\cite{KountouriotisNR09} and | 
|     97 Early~\cite{Earley70,AycHor02} parsers. The defect CYK algorithms is that the |    101 Early~\cite{AycHor02, Earley70} parsers. The defect of CYK algorithms, however, | 
|     98 original grammar specification needs to be transformed into a normal |    102 is that the grammar specifications given by the user need to be transformed | 
|     99 form. This transformation may lead to grammar explosion and inefficient |    103 into a normal form. This transformation may potentially lead to rule explosion | 
|    100 parsing. We will investigate whether this transformation can be avoided. |    104 and hence inefficient parsing. We will investigate whether this transformation | 
|    101 Early style parsers, which have recently been certified by Ridge [???],  |    105 can be avoided.  The problem with Early-style parsers is that they need to be  | 
|    102 need to be extended to PEG parsing in order to be helpful for us. |    106 extended to PEG parsing in order to be helpful for us. | 
|         |    107  | 
|    103  |    108  | 
|    104 Finally, we want to investigate whether derivatives of regular expressions |    109 Finally, we want to investigate whether derivatives of regular expressions | 
|    105 ~\cite{Brzozowski64,Almeidaetal10,OwensReppyTuron09,journals/corr/abs-1010-5023} |    110 ~\cite{Brzozowski64,Might11,OwensReppyTuron09} | 
|    106 can be extended to parsing. Lexing based on derivatives gives rise to very |    111 can be generalised to PEG parsing. In earlier work, we showed that lexing based on | 
|    107 elegant regular expression matchers that can be certified in a theorem prover  |    112 derivatives gives rise to very elegant regular expression matchers that can be | 
|    108 with ease.  We will study whether the idea of taking a derivative of a regular |    113 certified in a theorem prover with ease.  We will study whether the idea of | 
|    109 expression can be extended to rules in grammars. The problem that needs to be |    114 taking a derivative of a regular expression can be extended to rules in | 
|    110 overcome again arises from possible left recursion in parsing.  |    115 grammars. The problem that needs to be overcome is again how to deal with possible | 
|    111  |    116 left-recursion in grammar rules. | 
|    112      |         | 
|    113 \newpage |         | 
|    114  |    117  | 
|    115  |    118  | 
|    116 \small |    119  | 
|    117 \bibliography{Journal/document/root} |    120 \bibliography{Journal/document/root} | 
|    118 \bibliographystyle{abbrv} |    121 \bibliographystyle{abbrv} | 
|    119 \end{multicols} |    122 \end{multicols} | 
|    120  |    123  | 
|    121 %  %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% |    124 %  %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% |