--- a/handouts/ho06.tex Fri Nov 01 11:58:27 2013 +0000
+++ b/handouts/ho06.tex Fri Nov 01 13:16:43 2013 +0000
@@ -165,15 +165,136 @@
$E$ & $\rightarrow$ & $E \cdot - \cdot E$ \\
$E$ & $\rightarrow$ & $E \cdot * \cdot E$ \\
$E$ & $\rightarrow$ & $( \cdot E \cdot )$\\
-$N$ & $\rightarrow$ & $\epsilon \;|\; 0 \cdot N \;|\; 1 \cdot N \;|\: \ldots \;|\; 9 \cdot N$
+$N$ & $\rightarrow$ & $N \cdot N \;|\; 0 \;|\; 1 \;|\: \ldots \;|\; 9$
\end{tabular}
\end{center}
\noindent
where $E$ is the starting symbol. A \emph{derivation} for a grammar
starts with the staring symbol of the grammar and in each step replaces one
-non-terminal by a right-hand side of a rule.
+non-terminal by a right-hand side of a rule. A derivation ends with a string
+in which only terminal symbols are left. For example a derivation for the
+string $(1 + 2) + 3$ is as follows:
+
+\begin{center}
+\begin{tabular}{lll}
+$E$ & $\rightarrow$ & $E+E$\\
+ & $\rightarrow$ & $(E)+E$\\
+ & $\rightarrow$ & $(E+E)+E$\\
+ & $\rightarrow$ & $(E+E)+N$\\
+ & $\rightarrow$ & $(E+E)+3$\\
+ & $\rightarrow$ & $(N+E)+3$\\
+ & $\rightarrow^+$ & $(1+2)+3$\\
+\end{tabular}
+\end{center}
+
+\noindent
+The \emph{language} of a context-free grammar $G$ with start symbol $S$
+is defined as the set of strings derivable by a derivation, that is
+
+\begin{center}
+$\{c_1\ldots c_n \;|\; S \rightarrow^* c_1\ldots c_n \;\;\text{with all} \; c_i \;\text{being non-terminals}\}$
+\end{center}
+
+\noindent
+A \emph{parse-tree} encodes how a string is derived with the starting symbol on
+top and each non-terminal containing a subtree for how it is replaced in a derivation.
+The parse tree for the string $(1 + 23)+4$ is as follows:
+
+\begin{center}
+\begin{tikzpicture}[level distance=8mm, black]
+ \node {$E$}
+ child {node {$E$}
+ child {node {$($}}
+ child {node {$E$}
+ child {node {$E$} child {node {$N$} child {node {$1$}}}}
+ child {node {$+$}}
+ child {node {$E$}
+ child {node {$N$} child {node {$2$}}}
+ child {node {$N$} child {node {$3$}}}
+ }
+ }
+ child {node {$)$}}
+ }
+ child {node {$+$}}
+ child {node {$E$}
+ child {node {$N$} child {node {$4$}}}
+ };
+\end{tikzpicture}
+\end{center}
+
+\noindent
+We are often interested in these parse-trees since they encode the structure of
+how a string is derived by a grammar. Before we come to the problem of constructing
+such parse-trees, we need to consider the following two properties of grammars.
+A grammar is \emph{left-recursive} if there is a derivation starting from a non-terminal, say
+$NT$ which leads to a string which again starts with $NT$. This means a derivation of the
+form.
+\begin{center}
+$NT \rightarrow \ldots \rightarrow NT \cdot \ldots$
+\end{center}
+
+\noindent
+It can be easily seems that the grammar above for arithmetic expressions is left-recursive:
+for example the rules $E \rightarrow E\cdot + \cdot E$ and $N \rightarrow N\cdot N$
+show that this grammar is left-recursive. Some algorithms cannot cope with left-recursive
+grammars. Fortunately every left-recursive grammar can be transformed into one that is
+not left-recursive, although this transformation might make the grammar less human-readable.
+For example if we want to give a non-left-recursive grammar for numbers we might
+specify
+
+\begin{center}
+$N \;\;\rightarrow\;\; 0\;|\;\ldots\;|\;9\;|\;1\cdot N\;|\;2\cdot N\;|\;\ldots\;|\;9\cdot N$
+\end{center}
+
+\noindent
+Using this grammar we can still derive every number string, but we will never be able
+to derive a string of the form $\ldots \rightarrow N \cdot \ldots$.
+
+The other property we have to watch out is when a grammar is
+\emph{ambiguous}. A grammar is said to be ambiguous if there are two parse-trees
+for one string. Again the grammar for arithmetic expressions shown above is ambiguous.
+While the shown parse tree for the string $(1 + 23) + 4$ is unique, there are two parse
+trees for the string $1 + 2 + 3$, namely
+
+\begin{center}
+\begin{tabular}{c@{\hspace{10mm}}c}
+\begin{tikzpicture}[level distance=8mm, black]
+ \node {$E$}
+ child {node {$E$} child {node {$N$} child {node {$1$}}}}
+ child {node {$+$}}
+ child {node {$E$}
+ child {node {$E$} child {node {$N$} child {node {$2$}}}}
+ child {node {$+$}}
+ child {node {$E$} child {node {$N$} child {node {$3$}}}}
+ }
+ ;
+\end{tikzpicture}
+&
+\begin{tikzpicture}[level distance=8mm, black]
+ \node {$E$}
+ child {node {$E$}
+ child {node {$E$} child {node {$N$} child {node {$1$}}}}
+ child {node {$+$}}
+ child {node {$E$} child {node {$N$} child {node {$2$}}}}
+ }
+ child {node {$+$}}
+ child {node {$E$} child {node {$N$} child {node {$3$}}}}
+ ;
+\end{tikzpicture}
+\end{tabular}
+\end{center}
+
+\noindent
+In particular in programming languages we will try to avoid ambiguous
+grammars as two different parse-trees for a string means a program can
+be interpreted in two different ways. Then we have to somehow make sure
+the two different ways do not matter, or disambiguate the grammar in
+some way (for example making the $+$ left-associative). Unfortunately already
+the problem of deciding whether a grammar
+is ambiguous or not is in general undecidable. But in concrete instances we can
+often make a choice.
\end{document}