handouts/ho06.tex
changeset 175 5801e8c0e528
parent 173 7cfb7a6f7c99
child 176 3c2653fc8b5a
equal deleted inserted replaced
174:a5cc09c9e69c 175:5801e8c0e528
   163 $E$ & $\rightarrow$ &  $N$ \\
   163 $E$ & $\rightarrow$ &  $N$ \\
   164 $E$ & $\rightarrow$ &  $E \cdot + \cdot E$ \\
   164 $E$ & $\rightarrow$ &  $E \cdot + \cdot E$ \\
   165 $E$ & $\rightarrow$ &  $E \cdot - \cdot E$ \\
   165 $E$ & $\rightarrow$ &  $E \cdot - \cdot E$ \\
   166 $E$ & $\rightarrow$ &  $E \cdot * \cdot E$ \\
   166 $E$ & $\rightarrow$ &  $E \cdot * \cdot E$ \\
   167 $E$ & $\rightarrow$ &  $( \cdot E \cdot )$\\
   167 $E$ & $\rightarrow$ &  $( \cdot E \cdot )$\\
   168 $N$ & $\rightarrow$ & $\epsilon \;|\; 0 \cdot N \;|\; 1 \cdot N \;|\: \ldots \;|\; 9 \cdot N$ 
   168 $N$ & $\rightarrow$ & $N \cdot N \;|\; 0 \;|\; 1 \;|\: \ldots \;|\; 9$ 
   169 \end{tabular}
   169 \end{tabular}
   170 \end{center}
   170 \end{center}
   171 
   171 
   172 \noindent
   172 \noindent
   173 where $E$ is the starting symbol. A \emph{derivation} for a grammar
   173 where $E$ is the starting symbol. A \emph{derivation} for a grammar
   174 starts with the staring symbol of the grammar and in each step replaces one
   174 starts with the staring symbol of the grammar and in each step replaces one
   175 non-terminal by a right-hand side of a rule.
   175 non-terminal by a right-hand side of a rule. A derivation ends with a string
   176 
   176 in which only terminal symbols are left. For example a derivation for the
       
   177 string $(1 + 2) + 3$ is as follows:
       
   178 
       
   179 \begin{center}
       
   180 \begin{tabular}{lll}
       
   181 $E$ & $\rightarrow$ & $E+E$\\
       
   182        & $\rightarrow$ & $(E)+E$\\
       
   183        & $\rightarrow$ & $(E+E)+E$\\
       
   184        & $\rightarrow$ & $(E+E)+N$\\
       
   185        & $\rightarrow$ & $(E+E)+3$\\
       
   186        & $\rightarrow$ & $(N+E)+3$\\	
       
   187        & $\rightarrow^+$ & $(1+2)+3$\\
       
   188 \end{tabular} 
       
   189 \end{center}
       
   190 
       
   191 \noindent
       
   192 The \emph{language} of a context-free grammar $G$ with start symbol $S$ 
       
   193 is defined as the set of strings derivable by a derivation, that is
       
   194 
       
   195 \begin{center}
       
   196 $\{c_1\ldots c_n \;|\; S \rightarrow^* c_1\ldots c_n \;\;\text{with all} \; c_i \;\text{being non-terminals}\}$
       
   197 \end{center}
       
   198 
       
   199 \noindent
       
   200 A \emph{parse-tree} encodes how a string is derived with the starting symbol on 
       
   201 top and each non-terminal containing a subtree for how it is replaced in a derivation.
       
   202 The parse tree for the string $(1 + 23)+4$ is as follows:
       
   203 
       
   204 \begin{center}
       
   205 \begin{tikzpicture}[level distance=8mm, black]
       
   206   \node {$E$}
       
   207     child {node {$E$} 
       
   208        child {node {$($}}
       
   209        child {node {$E$}       
       
   210          child {node {$E$} child {node {$N$} child {node {$1$}}}}
       
   211          child {node {$+$}}
       
   212          child {node {$E$} 
       
   213             child {node {$N$} child {node {$2$}}}
       
   214             child {node {$N$} child {node {$3$}}}
       
   215             } 
       
   216         }
       
   217        child {node {$)$}}
       
   218      }
       
   219      child {node {$+$}}
       
   220      child {node {$E$}
       
   221         child {node {$N$} child {node {$4$}}}
       
   222      };
       
   223 \end{tikzpicture}
       
   224 \end{center}
       
   225 
       
   226 \noindent
       
   227 We are often interested in these parse-trees since they encode the structure of
       
   228 how a string is derived by a grammar. Before we come to the problem of constructing
       
   229 such parse-trees, we need to consider the following two properties of grammars.
       
   230 A grammar is \emph{left-recursive} if there is a derivation starting from a non-terminal, say
       
   231 $NT$ which leads to a string which again starts with $NT$. This means a derivation of the
       
   232 form.
       
   233 
       
   234 \begin{center}
       
   235 $NT \rightarrow \ldots \rightarrow NT \cdot \ldots$
       
   236 \end{center}
       
   237 
       
   238 \noindent
       
   239 It can be easily seems that the grammar above for arithmetic expressions is left-recursive:
       
   240 for example the rules $E \rightarrow E\cdot + \cdot E$ and $N \rightarrow N\cdot N$ 
       
   241 show that this grammar is left-recursive. Some algorithms cannot cope with left-recursive 
       
   242 grammars. Fortunately every left-recursive grammar can be transformed into one that is
       
   243 not left-recursive, although this transformation might make the grammar less human-readable.
       
   244 For example if we want to give a non-left-recursive grammar for numbers we might
       
   245 specify
       
   246 
       
   247 \begin{center}
       
   248 $N \;\;\rightarrow\;\; 0\;|\;\ldots\;|\;9\;|\;1\cdot N\;|\;2\cdot N\;|\;\ldots\;|\;9\cdot N$
       
   249 \end{center}
       
   250 
       
   251 \noindent
       
   252 Using this grammar we can still derive every number string, but we will never be able 
       
   253 to derive a string of the form $\ldots \rightarrow N \cdot \ldots$.
       
   254 
       
   255 The other property we have to watch out is when a grammar is
       
   256 \emph{ambiguous}. A grammar is said to be ambiguous if there are two parse-trees
       
   257 for one string. Again the grammar for arithmetic expressions shown above is ambiguous.
       
   258 While the shown parse tree for the string $(1 + 23) + 4$ is unique, there are two parse
       
   259 trees for the string $1 + 2 + 3$, namely
       
   260 
       
   261 \begin{center}
       
   262 \begin{tabular}{c@{\hspace{10mm}}c}
       
   263 \begin{tikzpicture}[level distance=8mm, black]
       
   264   \node {$E$}
       
   265     child {node {$E$} child {node {$N$} child {node {$1$}}}}
       
   266     child {node {$+$}}
       
   267     child {node {$E$}
       
   268        child {node {$E$} child {node {$N$} child {node {$2$}}}}
       
   269        child {node {$+$}}
       
   270        child {node {$E$} child {node {$N$} child {node {$3$}}}}
       
   271     }
       
   272     ;
       
   273 \end{tikzpicture} 
       
   274 &
       
   275 \begin{tikzpicture}[level distance=8mm, black]
       
   276   \node {$E$}
       
   277     child {node {$E$}
       
   278        child {node {$E$} child {node {$N$} child {node {$1$}}}}
       
   279        child {node {$+$}}
       
   280        child {node {$E$} child {node {$N$} child {node {$2$}}}} 
       
   281     }
       
   282     child {node {$+$}}
       
   283     child {node {$E$} child {node {$N$} child {node {$3$}}}}
       
   284     ;
       
   285 \end{tikzpicture}
       
   286 \end{tabular} 
       
   287 \end{center}
       
   288 
       
   289 \noindent
       
   290 In particular in programming languages we will try to avoid ambiguous
       
   291 grammars as two different parse-trees for a string means a program can
       
   292 be interpreted in two different ways. Then we have to somehow make sure
       
   293 the two different ways do not matter, or disambiguate the grammar in
       
   294 some way (for example making the $+$ left-associative). Unfortunately already 
       
   295 the problem of deciding whether a grammar
       
   296 is ambiguous or not is in general undecidable. But in concrete instances we can
       
   297 often make a choice.
   177 
   298 
   178 \end{document}
   299 \end{document}
   179 
   300 
   180 %%% Local Variables: 
   301 %%% Local Variables: 
   181 %%% mode: latex  
   302 %%% mode: latex