# HG changeset patch # User Christian Urban # Date 1383311803 0 # Node ID 5801e8c0e528f0758cb0bb9a1e737a9722045abe # Parent a5cc09c9e69cb1660ee23335de09d4255e6edf4d updated diff -r a5cc09c9e69c -r 5801e8c0e528 handouts/ho06.pdf Binary file handouts/ho06.pdf has changed diff -r a5cc09c9e69c -r 5801e8c0e528 handouts/ho06.tex --- a/handouts/ho06.tex Fri Nov 01 11:58:27 2013 +0000 +++ b/handouts/ho06.tex Fri Nov 01 13:16:43 2013 +0000 @@ -165,15 +165,136 @@ $E$ & $\rightarrow$ & $E \cdot - \cdot E$ \\ $E$ & $\rightarrow$ & $E \cdot * \cdot E$ \\ $E$ & $\rightarrow$ & $( \cdot E \cdot )$\\ -$N$ & $\rightarrow$ & $\epsilon \;|\; 0 \cdot N \;|\; 1 \cdot N \;|\: \ldots \;|\; 9 \cdot N$ +$N$ & $\rightarrow$ & $N \cdot N \;|\; 0 \;|\; 1 \;|\: \ldots \;|\; 9$ \end{tabular} \end{center} \noindent where $E$ is the starting symbol. A \emph{derivation} for a grammar starts with the staring symbol of the grammar and in each step replaces one -non-terminal by a right-hand side of a rule. +non-terminal by a right-hand side of a rule. A derivation ends with a string +in which only terminal symbols are left. For example a derivation for the +string $(1 + 2) + 3$ is as follows: + +\begin{center} +\begin{tabular}{lll} +$E$ & $\rightarrow$ & $E+E$\\ + & $\rightarrow$ & $(E)+E$\\ + & $\rightarrow$ & $(E+E)+E$\\ + & $\rightarrow$ & $(E+E)+N$\\ + & $\rightarrow$ & $(E+E)+3$\\ + & $\rightarrow$ & $(N+E)+3$\\ + & $\rightarrow^+$ & $(1+2)+3$\\ +\end{tabular} +\end{center} + +\noindent +The \emph{language} of a context-free grammar $G$ with start symbol $S$ +is defined as the set of strings derivable by a derivation, that is + +\begin{center} +$\{c_1\ldots c_n \;|\; S \rightarrow^* c_1\ldots c_n \;\;\text{with all} \; c_i \;\text{being non-terminals}\}$ +\end{center} + +\noindent +A \emph{parse-tree} encodes how a string is derived with the starting symbol on +top and each non-terminal containing a subtree for how it is replaced in a derivation. +The parse tree for the string $(1 + 23)+4$ is as follows: + +\begin{center} +\begin{tikzpicture}[level distance=8mm, black] + \node {$E$} + child {node {$E$} + child {node {$($}} + child {node {$E$} + child {node {$E$} child {node {$N$} child {node {$1$}}}} + child {node {$+$}} + child {node {$E$} + child {node {$N$} child {node {$2$}}} + child {node {$N$} child {node {$3$}}} + } + } + child {node {$)$}} + } + child {node {$+$}} + child {node {$E$} + child {node {$N$} child {node {$4$}}} + }; +\end{tikzpicture} +\end{center} + +\noindent +We are often interested in these parse-trees since they encode the structure of +how a string is derived by a grammar. Before we come to the problem of constructing +such parse-trees, we need to consider the following two properties of grammars. +A grammar is \emph{left-recursive} if there is a derivation starting from a non-terminal, say +$NT$ which leads to a string which again starts with $NT$. This means a derivation of the +form. +\begin{center} +$NT \rightarrow \ldots \rightarrow NT \cdot \ldots$ +\end{center} + +\noindent +It can be easily seems that the grammar above for arithmetic expressions is left-recursive: +for example the rules $E \rightarrow E\cdot + \cdot E$ and $N \rightarrow N\cdot N$ +show that this grammar is left-recursive. Some algorithms cannot cope with left-recursive +grammars. Fortunately every left-recursive grammar can be transformed into one that is +not left-recursive, although this transformation might make the grammar less human-readable. +For example if we want to give a non-left-recursive grammar for numbers we might +specify + +\begin{center} +$N \;\;\rightarrow\;\; 0\;|\;\ldots\;|\;9\;|\;1\cdot N\;|\;2\cdot N\;|\;\ldots\;|\;9\cdot N$ +\end{center} + +\noindent +Using this grammar we can still derive every number string, but we will never be able +to derive a string of the form $\ldots \rightarrow N \cdot \ldots$. + +The other property we have to watch out is when a grammar is +\emph{ambiguous}. A grammar is said to be ambiguous if there are two parse-trees +for one string. Again the grammar for arithmetic expressions shown above is ambiguous. +While the shown parse tree for the string $(1 + 23) + 4$ is unique, there are two parse +trees for the string $1 + 2 + 3$, namely + +\begin{center} +\begin{tabular}{c@{\hspace{10mm}}c} +\begin{tikzpicture}[level distance=8mm, black] + \node {$E$} + child {node {$E$} child {node {$N$} child {node {$1$}}}} + child {node {$+$}} + child {node {$E$} + child {node {$E$} child {node {$N$} child {node {$2$}}}} + child {node {$+$}} + child {node {$E$} child {node {$N$} child {node {$3$}}}} + } + ; +\end{tikzpicture} +& +\begin{tikzpicture}[level distance=8mm, black] + \node {$E$} + child {node {$E$} + child {node {$E$} child {node {$N$} child {node {$1$}}}} + child {node {$+$}} + child {node {$E$} child {node {$N$} child {node {$2$}}}} + } + child {node {$+$}} + child {node {$E$} child {node {$N$} child {node {$3$}}}} + ; +\end{tikzpicture} +\end{tabular} +\end{center} + +\noindent +In particular in programming languages we will try to avoid ambiguous +grammars as two different parse-trees for a string means a program can +be interpreted in two different ways. Then we have to somehow make sure +the two different ways do not matter, or disambiguate the grammar in +some way (for example making the $+$ left-associative). Unfortunately already +the problem of deciding whether a grammar +is ambiguous or not is in general undecidable. But in concrete instances we can +often make a choice. \end{document}