diff -r 896a5f91838d -r 780486571e38 handouts/ho05.tex --- a/handouts/ho05.tex Sat Oct 22 13:11:33 2016 +0100 +++ b/handouts/ho05.tex Sat Oct 22 15:18:11 2016 +0100 @@ -1,7 +1,7 @@ \documentclass{article} \usepackage{../style} \usepackage{../langs} - +\usepackage{../grammar} \begin{document} @@ -48,25 +48,27 @@ Context-free languages are usually specified by grammars. For example a grammar for well-parenthesised expressions is -\begin{center} -$P \;\;\rightarrow\;\; ( \cdot P \cdot ) \cdot P \;|\; \epsilon$ -\end{center} - +\begin{plstx}[margin=3cm] +: \meta{P} ::= ( \cdot \meta{P} \cdot ) \cdot \meta{P} + | \epsilon\\ +\end{plstx} + \noindent or a grammar for recognising strings consisting of ones is -\begin{center} -$O \;\;\rightarrow\;\; 1 \cdot O \;|\; 1$ -\end{center} +\begin{plstx}[margin=3cm] +: \meta{O} ::= 1 \cdot \meta{O} + | 1\\ +\end{plstx} In general grammars consist of finitely many rules built up from \emph{terminal symbols} (usually lower-case letters) and -\emph{non-terminal symbols} (upper-case letters). Rules have +\emph{non-terminal symbols} (upper-case letters inside \meta{\mbox{}}). Rules have the shape -\begin{center} -$NT \;\;\rightarrow\;\; \textit{rhs}$ -\end{center} +\begin{plstx}[margin=3cm] +: \meta{NT} ::= rhs\\ +\end{plstx} \noindent where on the left-hand side is a single non-terminal and on the right a string consisting of both terminals and @@ -77,29 +79,29 @@ convention to use $|$ as a shorthand notation for several rules. For example -\begin{center} -$NT \;\;\rightarrow\;\; \textit{rhs}_1 \;|\; \textit{rhs}_2$ -\end{center} +\begin{plstx}[margin=3cm] +: \meta{NT} ::= rhs_1 + | rhs_2\\ +\end{plstx} -\noindent means that the non-terminal $NT$ can be replaced by +\noindent means that the non-terminal \meta{NT} can be replaced by either $\textit{rhs}_1$ or $\textit{rhs}_2$. If there are more than one non-terminal on the left-hand side of the rules, then we need to indicate what is the \emph{starting} symbol of the grammar. For example the grammar for arithmetic expressions can be given as follows -\begin{center} -\begin{tabular}{lcl@{\hspace{2cm}}l} -$E$ & $\rightarrow$ & $N$ & (1)\\ -$E$ & $\rightarrow$ & $E \cdot + \cdot E$ & (2)\\ -$E$ & $\rightarrow$ & $E \cdot - \cdot E$ & (3)\\ -$E$ & $\rightarrow$ & $E \cdot * \cdot E$ & (4)\\ -$E$ & $\rightarrow$ & $( \cdot E \cdot )$ & (5)\\ -$N$ & $\rightarrow$ & $N \cdot N \;|\; 0 \;|\; 1 \;|\: \ldots \;|\; 9$ & (6\ldots) -\end{tabular} -\end{center} +\begin{plstx}[margin=3cm,one per line] +\mbox{\rm (1)}: \meta{E} ::= \meta{N}\\ +\mbox{\rm (2)}: \meta{E} ::= \meta{E} \cdot + \cdot \meta{E}\\ +\mbox{\rm (3)}: \meta{E} ::= \meta{E} \cdot - \cdot \meta{E}\\ +\mbox{\rm (4)}: \meta{E} ::= \meta{E} \cdot * \cdot \meta{E}\\ +\mbox{\rm (5)}: \meta{E} ::= ( \cdot \meta{E} \cdot )\\ +\mbox{\rm (6\ldots)}: \meta{N} ::= \meta{N} \cdot \meta{N} + \mid 0 \mid 1 \mid \ldots \mid 9\\ +\end{plstx} -\noindent where $E$ is the starting symbol. A +\noindent where \meta{E} is the starting symbol. A \emph{derivation} for a grammar starts with the starting symbol of the grammar and in each step replaces one non-terminal by a right-hand side of a rule. A derivation ends @@ -109,12 +111,12 @@ \begin{center} \begin{tabular}{lll@{\hspace{2cm}}l} -$E$ & $\rightarrow$ & $E+E$ & by (2)\\ - & $\rightarrow$ & $(E)+E$ & by (5)\\ - & $\rightarrow$ & $(E+E)+E$ & by (2)\\ - & $\rightarrow$ & $(E+E)+N$ & by (1)\\ - & $\rightarrow$ & $(E+E)+3$ & by (6\dots)\\ - & $\rightarrow$ & $(N+E)+3$ & by (1)\\ +\meta{E} & $\rightarrow$ & $\meta{E}+\meta{E}$ & by (2)\\ + & $\rightarrow$ & $(\meta{E})+\meta{E}$ & by (5)\\ + & $\rightarrow$ & $(\meta{E}+\meta{E})+\meta{E}$ & by (2)\\ + & $\rightarrow$ & $(\meta{E}+\meta{E})+\meta{N}$ & by (1)\\ + & $\rightarrow$ & $(\meta{E}+\meta{E})+3$ & by (6\dots)\\ + & $\rightarrow$ & $(\meta{N}+\meta{E})+3$ & by (1)\\ & $\rightarrow^+$ & $(1+2)+3$ & by (1, 6\ldots)\\ \end{tabular} \end{center} @@ -239,6 +241,9 @@ undecidable. But in simple instance (the ones we deal in this module) one can usually see when a grammar is ambiguous. + +\subsection*{Parser Combinators} + Let us now turn to the problem of generating a parse-tree for a grammar and string. In what follows we explain \emph{parser combinators}, because they are easy to implement and closely