handouts/ho06.tex
changeset 175 5801e8c0e528
parent 173 7cfb7a6f7c99
child 176 3c2653fc8b5a
--- a/handouts/ho06.tex	Fri Nov 01 11:58:27 2013 +0000
+++ b/handouts/ho06.tex	Fri Nov 01 13:16:43 2013 +0000
@@ -165,15 +165,136 @@
 $E$ & $\rightarrow$ &  $E \cdot - \cdot E$ \\
 $E$ & $\rightarrow$ &  $E \cdot * \cdot E$ \\
 $E$ & $\rightarrow$ &  $( \cdot E \cdot )$\\
-$N$ & $\rightarrow$ & $\epsilon \;|\; 0 \cdot N \;|\; 1 \cdot N \;|\: \ldots \;|\; 9 \cdot N$ 
+$N$ & $\rightarrow$ & $N \cdot N \;|\; 0 \;|\; 1 \;|\: \ldots \;|\; 9$ 
 \end{tabular}
 \end{center}
 
 \noindent
 where $E$ is the starting symbol. A \emph{derivation} for a grammar
 starts with the staring symbol of the grammar and in each step replaces one
-non-terminal by a right-hand side of a rule.
+non-terminal by a right-hand side of a rule. A derivation ends with a string
+in which only terminal symbols are left. For example a derivation for the
+string $(1 + 2) + 3$ is as follows:
+
+\begin{center}
+\begin{tabular}{lll}
+$E$ & $\rightarrow$ & $E+E$\\
+       & $\rightarrow$ & $(E)+E$\\
+       & $\rightarrow$ & $(E+E)+E$\\
+       & $\rightarrow$ & $(E+E)+N$\\
+       & $\rightarrow$ & $(E+E)+3$\\
+       & $\rightarrow$ & $(N+E)+3$\\	
+       & $\rightarrow^+$ & $(1+2)+3$\\
+\end{tabular} 
+\end{center}
+
+\noindent
+The \emph{language} of a context-free grammar $G$ with start symbol $S$ 
+is defined as the set of strings derivable by a derivation, that is
+
+\begin{center}
+$\{c_1\ldots c_n \;|\; S \rightarrow^* c_1\ldots c_n \;\;\text{with all} \; c_i \;\text{being non-terminals}\}$
+\end{center}
+
+\noindent
+A \emph{parse-tree} encodes how a string is derived with the starting symbol on 
+top and each non-terminal containing a subtree for how it is replaced in a derivation.
+The parse tree for the string $(1 + 23)+4$ is as follows:
+
+\begin{center}
+\begin{tikzpicture}[level distance=8mm, black]
+  \node {$E$}
+    child {node {$E$} 
+       child {node {$($}}
+       child {node {$E$}       
+         child {node {$E$} child {node {$N$} child {node {$1$}}}}
+         child {node {$+$}}
+         child {node {$E$} 
+            child {node {$N$} child {node {$2$}}}
+            child {node {$N$} child {node {$3$}}}
+            } 
+        }
+       child {node {$)$}}
+     }
+     child {node {$+$}}
+     child {node {$E$}
+        child {node {$N$} child {node {$4$}}}
+     };
+\end{tikzpicture}
+\end{center}
+
+\noindent
+We are often interested in these parse-trees since they encode the structure of
+how a string is derived by a grammar. Before we come to the problem of constructing
+such parse-trees, we need to consider the following two properties of grammars.
+A grammar is \emph{left-recursive} if there is a derivation starting from a non-terminal, say
+$NT$ which leads to a string which again starts with $NT$. This means a derivation of the
+form.
 
+\begin{center}
+$NT \rightarrow \ldots \rightarrow NT \cdot \ldots$
+\end{center}
+
+\noindent
+It can be easily seems that the grammar above for arithmetic expressions is left-recursive:
+for example the rules $E \rightarrow E\cdot + \cdot E$ and $N \rightarrow N\cdot N$ 
+show that this grammar is left-recursive. Some algorithms cannot cope with left-recursive 
+grammars. Fortunately every left-recursive grammar can be transformed into one that is
+not left-recursive, although this transformation might make the grammar less human-readable.
+For example if we want to give a non-left-recursive grammar for numbers we might
+specify
+
+\begin{center}
+$N \;\;\rightarrow\;\; 0\;|\;\ldots\;|\;9\;|\;1\cdot N\;|\;2\cdot N\;|\;\ldots\;|\;9\cdot N$
+\end{center}
+
+\noindent
+Using this grammar we can still derive every number string, but we will never be able 
+to derive a string of the form $\ldots \rightarrow N \cdot \ldots$.
+
+The other property we have to watch out is when a grammar is
+\emph{ambiguous}. A grammar is said to be ambiguous if there are two parse-trees
+for one string. Again the grammar for arithmetic expressions shown above is ambiguous.
+While the shown parse tree for the string $(1 + 23) + 4$ is unique, there are two parse
+trees for the string $1 + 2 + 3$, namely
+
+\begin{center}
+\begin{tabular}{c@{\hspace{10mm}}c}
+\begin{tikzpicture}[level distance=8mm, black]
+  \node {$E$}
+    child {node {$E$} child {node {$N$} child {node {$1$}}}}
+    child {node {$+$}}
+    child {node {$E$}
+       child {node {$E$} child {node {$N$} child {node {$2$}}}}
+       child {node {$+$}}
+       child {node {$E$} child {node {$N$} child {node {$3$}}}}
+    }
+    ;
+\end{tikzpicture} 
+&
+\begin{tikzpicture}[level distance=8mm, black]
+  \node {$E$}
+    child {node {$E$}
+       child {node {$E$} child {node {$N$} child {node {$1$}}}}
+       child {node {$+$}}
+       child {node {$E$} child {node {$N$} child {node {$2$}}}} 
+    }
+    child {node {$+$}}
+    child {node {$E$} child {node {$N$} child {node {$3$}}}}
+    ;
+\end{tikzpicture}
+\end{tabular} 
+\end{center}
+
+\noindent
+In particular in programming languages we will try to avoid ambiguous
+grammars as two different parse-trees for a string means a program can
+be interpreted in two different ways. Then we have to somehow make sure
+the two different ways do not matter, or disambiguate the grammar in
+some way (for example making the $+$ left-associative). Unfortunately already 
+the problem of deciding whether a grammar
+is ambiguous or not is in general undecidable. But in concrete instances we can
+often make a choice.
 
 \end{document}