afl-material: comparison handouts/ho06.tex

equal deleted inserted replaced

-:a5cc09c9e69c
+:5801e8c0e528
 $E$ & $\rightarrow$ &  $N$ \\
 $E$ & $\rightarrow$ &  $E \cdot + \cdot E$ \\
 $E$ & $\rightarrow$ &  $E \cdot - \cdot E$ \\
 $E$ & $\rightarrow$ &  $E \cdot * \cdot E$ \\
 $E$ & $\rightarrow$ &  $( \cdot E \cdot )$\\
-$N$ & $\rightarrow$ & $\epsilon \;|\; 0 \cdot N \;|\; 1 \cdot N \;|\: \ldots \;|\; 9 \cdot N$
+$N$ & $\rightarrow$ & $N \cdot N \;|\; 0 \;|\; 1 \;|\: \ldots \;|\; 9$
 \end{tabular}
 \end{center}
 \noindent
 where $E$ is the starting symbol. A \emph{derivation} for a grammar
 starts with the staring symbol of the grammar and in each step replaces one
-non-terminal by a right-hand side of a rule.
+non-terminal by a right-hand side of a rule. A derivation ends with a string
+in which only terminal symbols are left. For example a derivation for the
+string $(1 + 2) + 3$ is as follows:
+\begin{center}
+\begin{tabular}{lll}
+$E$ & $\rightarrow$ & $E+E$\\
+& $\rightarrow$ & $(E)+E$\\
+& $\rightarrow$ & $(E+E)+E$\\
+& $\rightarrow$ & $(E+E)+N$\\
+& $\rightarrow$ & $(E+E)+3$\\
+& $\rightarrow$ & $(N+E)+3$\\
+& $\rightarrow^+$ & $(1+2)+3$\\
+\end{tabular}
+\end{center}
+\noindent
+The \emph{language} of a context-free grammar $G$ with start symbol $S$
+is defined as the set of strings derivable by a derivation, that is
+\begin{center}
+$\{c_1\ldots c_n \;|\; S \rightarrow^* c_1\ldots c_n \;\;\text{with all} \; c_i \;\text{being non-terminals}\}$
+\end{center}
+\noindent
+A \emph{parse-tree} encodes how a string is derived with the starting symbol on
+top and each non-terminal containing a subtree for how it is replaced in a derivation.
+The parse tree for the string $(1 + 23)+4$ is as follows:
+\begin{center}
+\begin{tikzpicture}[level distance=8mm, black]
+\node {$E$}
+child {node {$E$}
+child {node {$($}}
+child {node {$E$}
+child {node {$E$} child {node {$N$} child {node {$1$}}}}
+child {node {$+$}}
+child {node {$E$}
+child {node {$N$} child {node {$2$}}}
+child {node {$N$} child {node {$3$}}}
+}
+}
+child {node {$)$}}
+}
+child {node {$+$}}
+child {node {$E$}
+child {node {$N$} child {node {$4$}}}
+};
+\end{tikzpicture}
+\end{center}
+\noindent
+We are often interested in these parse-trees since they encode the structure of
+how a string is derived by a grammar. Before we come to the problem of constructing
+such parse-trees, we need to consider the following two properties of grammars.
+A grammar is \emph{left-recursive} if there is a derivation starting from a non-terminal, say
+$NT$ which leads to a string which again starts with $NT$. This means a derivation of the
+form.
+\begin{center}
+$NT \rightarrow \ldots \rightarrow NT \cdot \ldots$
+\end{center}
+\noindent
+It can be easily seems that the grammar above for arithmetic expressions is left-recursive:
+for example the rules $E \rightarrow E\cdot + \cdot E$ and $N \rightarrow N\cdot N$
+show that this grammar is left-recursive. Some algorithms cannot cope with left-recursive
+grammars. Fortunately every left-recursive grammar can be transformed into one that is
+not left-recursive, although this transformation might make the grammar less human-readable.
+For example if we want to give a non-left-recursive grammar for numbers we might
+specify
+\begin{center}
+$N \;\;\rightarrow\;\; 0\;|\;\ldots\;|\;9\;|\;1\cdot N\;|\;2\cdot N\;|\;\ldots\;|\;9\cdot N$
+\end{center}
+\noindent
+Using this grammar we can still derive every number string, but we will never be able
+to derive a string of the form $\ldots \rightarrow N \cdot \ldots$.
+The other property we have to watch out is when a grammar is
+\emph{ambiguous}. A grammar is said to be ambiguous if there are two parse-trees
+for one string. Again the grammar for arithmetic expressions shown above is ambiguous.
+While the shown parse tree for the string $(1 + 23) + 4$ is unique, there are two parse
+trees for the string $1 + 2 + 3$, namely
+\begin{center}
+\begin{tabular}{c@{\hspace{10mm}}c}
+\begin{tikzpicture}[level distance=8mm, black]
+\node {$E$}
+child {node {$E$} child {node {$N$} child {node {$1$}}}}
+child {node {$+$}}
+child {node {$E$}
+child {node {$E$} child {node {$N$} child {node {$2$}}}}
+child {node {$+$}}
+child {node {$E$} child {node {$N$} child {node {$3$}}}}
+}
+;
+\end{tikzpicture}
+&
+\begin{tikzpicture}[level distance=8mm, black]
+\node {$E$}
+child {node {$E$}
+child {node {$E$} child {node {$N$} child {node {$1$}}}}
+child {node {$+$}}
+child {node {$E$} child {node {$N$} child {node {$2$}}}}
+}
+child {node {$+$}}
+child {node {$E$} child {node {$N$} child {node {$3$}}}}
+;
+\end{tikzpicture}
+\end{tabular}
+\end{center}
+\noindent
+In particular in programming languages we will try to avoid ambiguous
+grammars as two different parse-trees for a string means a program can
+be interpreted in two different ways. Then we have to somehow make sure
+the two different ways do not matter, or disambiguate the grammar in
+some way (for example making the $+$ left-associative). Unfortunately already
+the problem of deciding whether a grammar
+is ambiguous or not is in general undecidable. But in concrete instances we can
+often make a choice.
 \end{document}
 %%% Local Variables:
 %%% mode: latex

changeset 175	5801e8c0e528
parent 173	7cfb7a6f7c99
child 176	3c2653fc8b5a