afl-material: comparison handouts/ho05.tex

-:896a5f91838d
+:780486571e38
 \documentclass{article}
 \usepackage{../style}
 \usepackage{../langs}
+\usepackage{../grammar}
 \begin{document}
 \section*{Handout 5 (Grammars \& Parser)}
 \noindent Context-free languages play an important role in
 `day-to-day' text processing and in programming languages.
 Context-free languages are usually specified by grammars. For
 example, a grammar for well-parenthesised expressions is
-\begin{center}
+\begin{plstx}[margin=3cm]
-$P \;\;\rightarrow\;\; ( \cdot  P \cdot ) \cdot P \;|\; \epsilon$
+: \meta{P} ::=  ( \cdot  \meta{P} \cdot ) \cdot \meta{P}
-\end{center}
+| \epsilon\\
+\end{plstx}
 \noindent
 or a grammar for recognising strings consisting of ones is
-\begin{center}
+\begin{plstx}[margin=3cm]
-$O \;\;\rightarrow\;\; 1 \cdot  O \;|\; 1$
+: \meta{O} ::= 1 \cdot  \meta{O}
-\end{center}
+| 1\\
+\end{plstx}
 In general, grammars consist of finitely many rules built up
 from \emph{terminal symbols} (usually lower-case letters) and
-\emph{non-terminal symbols} (upper-case letters). Rules have
+\emph{non-terminal symbols} (upper-case letters inside \meta{\mbox{}}). Rules have
 the shape
-\begin{center}
+\begin{plstx}[margin=3cm]
-$NT \;\;\rightarrow\;\; \textit{rhs}$
+: \meta{NT} ::= rhs\\
-\end{center}
+\end{plstx}
 \noindent where the left-hand side is a single non-terminal
 and the right-hand side is a string of terminals and
 non-terminals, or the $\epsilon$-symbol indicating
 the empty string. We use the convention of separating components
 on the right-hand side with the $\cdot$ symbol, as in the
 grammar for well-parenthesised expressions. We also use
 $|$ as shorthand notation for several rules with the same
 left-hand side. For example
-\begin{center}
+\begin{plstx}[margin=3cm]
-$NT \;\;\rightarrow\;\; \textit{rhs}_1 \;|\; \textit{rhs}_2$
+: \meta{NT} ::= rhs_1
-\end{center}
+| rhs_2\\
+\end{plstx}
-\noindent means that the non-terminal $NT$ can be replaced by
+\noindent means that the non-terminal \meta{NT} can be replaced by
 either $\textit{rhs}_1$ or $\textit{rhs}_2$. If more than one
 non-terminal occurs on the left-hand side of the rules, then
 we need to indicate which is the \emph{starting} symbol of the
 grammar. For example, the grammar for arithmetic expressions
 can be given as follows
-\begin{center}
+\begin{plstx}[margin=3cm,one per line]
-\begin{tabular}{lcl@{\hspace{2cm}}l}
+\mbox{\rm (1)}: \meta{E} ::= \meta{N}\\
-$E$ & $\rightarrow$ &  $N$                 & (1)\\
+\mbox{\rm (2)}: \meta{E} ::= \meta{E} \cdot + \cdot \meta{E}\\
-$E$ & $\rightarrow$ &  $E \cdot + \cdot E$ & (2)\\
+\mbox{\rm (3)}: \meta{E} ::= \meta{E} \cdot - \cdot \meta{E}\\
-$E$ & $\rightarrow$ &  $E \cdot - \cdot E$ & (3)\\
+\mbox{\rm (4)}: \meta{E} ::= \meta{E} \cdot * \cdot \meta{E}\\
-$E$ & $\rightarrow$ &  $E \cdot * \cdot E$ & (4)\\
+\mbox{\rm (5)}: \meta{E} ::= ( \cdot \meta{E} \cdot )\\
-$E$ & $\rightarrow$ &  $( \cdot E \cdot )$ & (5)\\
+\mbox{\rm (6\ldots)}: \meta{N} ::= \meta{N} \cdot \meta{N}
-$N$ & $\rightarrow$ & $N \cdot N \;|\; 0 \;|\; 1 \;|\: \ldots \;|\; 9$ & (6\ldots)
+\mid 0 \mid 1 \mid \ldots \mid 9\\
-\end{tabular}
+\end{plstx}
-\end{center}
+\noindent where \meta{E} is the starting symbol. A
-\noindent where $E$ is the starting symbol. A
 \emph{derivation} for a grammar starts with the starting
 symbol of the grammar and in each step replaces one
 non-terminal by a right-hand side of a rule for that non-terminal. A derivation ends
 with a string in which only terminal symbols are left. For
 example, a derivation for the string $(1 + 2) + 3$ is as
 follows:
 \begin{center}
 \begin{tabular}{lll@{\hspace{2cm}}l}
-$E$ & $\rightarrow$ & $E+E$          & by (2)\\
+\meta{E} & $\rightarrow$ & $\meta{E}+\meta{E}$          & by (2)\\
-& $\rightarrow$ & $(E)+E$     & by (5)\\
+& $\rightarrow$ & $(\meta{E})+\meta{E}$     & by (5)\\
-& $\rightarrow$ & $(E+E)+E$   & by (2)\\
+& $\rightarrow$ & $(\meta{E}+\meta{E})+\meta{E}$   & by (2)\\
-& $\rightarrow$ & $(E+E)+N$   & by (1)\\
+& $\rightarrow$ & $(\meta{E}+\meta{E})+\meta{N}$   & by (1)\\
-& $\rightarrow$ & $(E+E)+3$   & by (6\dots)\\
+& $\rightarrow$ & $(\meta{E}+\meta{E})+3$   & by (6\dots)\\
-& $\rightarrow$ & $(N+E)+3$   & by (1)\\
+& $\rightarrow$ & $(\meta{N}+\meta{E})+3$   & by (1)\\
 & $\rightarrow^+$ & $(1+2)+3$ & by (1, 6\ldots)\\
 \end{tabular}
 \end{center}
 \noindent where on the right it is indicated which
 left-associative). Unfortunately, even the problem of
 deciding whether a grammar is ambiguous is in general
 undecidable. But in simple instances (the ones we deal with in this
 module) one can usually see when a grammar is ambiguous.
+\subsection*{Parser Combinators}
 Let us now turn to the problem of generating a parse-tree for
 a grammar and string. In what follows we explain \emph{parser
 combinators}, because they are easy to implement and closely
 resemble grammar rules. Imagine that a grammar describes the
 strings of natural numbers, such as the grammar $N$ shown

changeset 459	780486571e38
parent 385	7f8516ff408d
child 545	76a98ed71a2a
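The derivation of $(1+2)+3$ shown above can be replayed mechanically. The following Python sketch is not part of the handout. The rule table, the hypothetical labels such as `"6:3"` for the single-digit instances of rules (6...), the function names `step` and `derive`, and the encoding of sentential forms as lists of one-character symbols are all illustrative assumptions. Each step checks that the chosen position holds the rule's left-hand side, then replaces it with the right-hand side.

```python
# Hypothetical encoding of the arithmetic-expression grammar:
# each label maps to (left-hand side, right-hand side).
# "E" and "N" are non-terminals; every other symbol is a terminal.
RULES = {
    "1": ("E", ["N"]),
    "2": ("E", ["E", "+", "E"]),
    "3": ("E", ["E", "-", "E"]),
    "4": ("E", ["E", "*", "E"]),
    "5": ("E", ["(", "E", ")"]),
    "6": ("N", ["N", "N"]),
}
# Rules (6...): N -> 0 | 1 | ... | 9, labelled "6:<digit>" (assumed labels)
for d in "0123456789":
    RULES["6:" + d] = ("N", [d])

def step(form, pos, label):
    """Replace the non-terminal at index pos using the rule `label`."""
    lhs, rhs = RULES[label]
    if form[pos] != lhs:
        raise ValueError(f"rule {label} expects {lhs} at position {pos}")
    return form[:pos] + rhs + form[pos + 1:]

# The derivation from the handout as (position, rule) pairs,
# with the last (->+) step spelled out.
STEPS = [
    (0, "2"),    # E        -> E+E
    (0, "5"),    # E+E      -> (E)+E
    (1, "2"),    # (E)+E    -> (E+E)+E
    (6, "1"),    # (E+E)+E  -> (E+E)+N
    (6, "6:3"),  # (E+E)+N  -> (E+E)+3
    (1, "1"),    # (E+E)+3  -> (N+E)+3
    (1, "6:1"),  # (N+E)+3  -> (1+E)+3
    (3, "1"),    # (1+E)+3  -> (1+N)+3
    (3, "6:2"),  # (1+N)+3  -> (1+2)+3
]

def derive(steps):
    """Start from the starting symbol E and apply each step in turn."""
    form = ["E"]
    history = ["".join(form)]
    for pos, label in steps:
        form = step(form, pos, label)
        history.append("".join(form))
    return history

for line in derive(STEPS):
    print(line)
```

The derivation replaces non-terminals in no fixed order, so each step names the position explicitly. A leftmost derivation would always pick the first non-terminal instead.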
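The idea of parser combinators, which the handout introduces for turning grammar rules into parsers, can be sketched in a few lines of Python. This is not the handout's own implementation. The names `char`, `epsilon`, `seq`, `alt` and `parse_all` are hypothetical. The sketch assumes a "list of successes" style, in which a parser maps an input string to a list of pairs, each consisting of a result and the unconsumed rest of the input. With these combinators, the grammars for well-parenthesised expressions and for strings of ones translate rule by rule.

```python
from typing import Any, Callable, List, Tuple

# A parser returns one (result, rest-of-input) pair per way of
# parsing a prefix of the input; an empty list means failure.
Parser = Callable[[str], List[Tuple[Any, str]]]

def char(c: str) -> Parser:
    """Parse the single terminal symbol c."""
    def p(s: str):
        return [(c, s[1:])] if s.startswith(c) else []
    return p

def epsilon(s: str):
    """Parse the empty string: succeed without consuming input."""
    return [("", s)]

def seq(p: Parser, q: Parser) -> Parser:
    """Sequence p . q: run q on every remainder that p leaves."""
    def r(s: str):
        return [((v1, v2), s2) for v1, s1 in p(s) for v2, s2 in q(s1)]
    return r

def alt(p: Parser, q: Parser) -> Parser:
    """Alternative p | q: collect the results of both parsers."""
    def r(s: str):
        return p(s) + q(s)
    return r

def parse_all(p: Parser, s: str):
    """Keep only the parses that consume the whole input."""
    return [v for v, rest in p(s) if rest == ""]

# <P> ::= ( . <P> . ) . <P> | epsilon
def P(s: str):
    return alt(seq(seq(seq(char("("), P), char(")")), P), epsilon)(s)

# <O> ::= 1 . <O> | 1
def O(s: str):
    return alt(seq(char("1"), O), char("1"))(s)

print(len(parse_all(P, "(())()")))  # one complete parse
print(parse_all(P, "(()"))          # no complete parse
```

This naive encoding terminates only because every recursive call to `P` or `O` happens after a terminal has been consumed. A left-recursive rule such as (2) of the arithmetic grammar, $E ::= E \cdot + \cdot E$, would make the parser for $E$ call itself on the same input forever.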