163 $E$ & $\rightarrow$ & $N$ \\ |
163 $E$ & $\rightarrow$ & $N$ \\ |
164 $E$ & $\rightarrow$ & $E \cdot + \cdot E$ \\ |
164 $E$ & $\rightarrow$ & $E \cdot + \cdot E$ \\ |
165 $E$ & $\rightarrow$ & $E \cdot - \cdot E$ \\ |
165 $E$ & $\rightarrow$ & $E \cdot - \cdot E$ \\ |
166 $E$ & $\rightarrow$ & $E \cdot * \cdot E$ \\ |
166 $E$ & $\rightarrow$ & $E \cdot * \cdot E$ \\ |
167 $E$ & $\rightarrow$ & $( \cdot E \cdot )$\\ |
167 $E$ & $\rightarrow$ & $( \cdot E \cdot )$\\ |
168 $N$ & $\rightarrow$ & $\epsilon \;|\; 0 \cdot N \;|\; 1 \cdot N \;|\: \ldots \;|\; 9 \cdot N$ |
168 $N$ & $\rightarrow$ & $N \cdot N \;|\; 0 \;|\; 1 \;|\: \ldots \;|\; 9$ |
169 \end{tabular} |
169 \end{tabular} |
170 \end{center} |
170 \end{center} |
171 |
171 |
172 \noindent |
172 \noindent |
173 where $E$ is the starting symbol. A \emph{derivation} for a grammar |
173 where $E$ is the starting symbol. A \emph{derivation} for a grammar |
174 starts with the staring symbol of the grammar and in each step replaces one |
174 starts with the staring symbol of the grammar and in each step replaces one |
175 non-terminal by a right-hand side of a rule. |
175 non-terminal by a right-hand side of a rule. A derivation ends with a string |
176 |
176 in which only terminal symbols are left. For example a derivation for the |
|
177 string $(1 + 2) + 3$ is as follows: |
|
178 |
|
179 \begin{center} |
|
180 \begin{tabular}{lll} |
|
181 $E$ & $\rightarrow$ & $E+E$\\ |
|
182 & $\rightarrow$ & $(E)+E$\\ |
|
183 & $\rightarrow$ & $(E+E)+E$\\ |
|
184 & $\rightarrow$ & $(E+E)+N$\\ |
|
185 & $\rightarrow$ & $(E+E)+3$\\ |
|
186 & $\rightarrow$ & $(N+E)+3$\\ |
|
187 & $\rightarrow^+$ & $(1+2)+3$\\ |
|
188 \end{tabular} |
|
189 \end{center} |
|
190 |
|
191 \noindent |
|
192 The \emph{language} of a context-free grammar $G$ with start symbol $S$ |
|
193 is defined as the set of strings derivable by a derivation, that is |
|
194 |
|
195 \begin{center} |
|
196 $\{c_1\ldots c_n \;|\; S \rightarrow^* c_1\ldots c_n \;\;\text{with all} \; c_i \;\text{being non-terminals}\}$ |
|
197 \end{center} |
|
198 |
|
199 \noindent |
|
200 A \emph{parse-tree} encodes how a string is derived with the starting symbol on |
|
201 top and each non-terminal containing a subtree for how it is replaced in a derivation. |
|
202 The parse tree for the string $(1 + 23)+4$ is as follows: |
|
203 |
|
204 \begin{center} |
|
205 \begin{tikzpicture}[level distance=8mm, black] |
|
206 \node {$E$} |
|
207 child {node {$E$} |
|
208 child {node {$($}} |
|
209 child {node {$E$} |
|
210 child {node {$E$} child {node {$N$} child {node {$1$}}}} |
|
211 child {node {$+$}} |
|
212 child {node {$E$} |
|
213 child {node {$N$} child {node {$2$}}} |
|
214 child {node {$N$} child {node {$3$}}} |
|
215 } |
|
216 } |
|
217 child {node {$)$}} |
|
218 } |
|
219 child {node {$+$}} |
|
220 child {node {$E$} |
|
221 child {node {$N$} child {node {$4$}}} |
|
222 }; |
|
223 \end{tikzpicture} |
|
224 \end{center} |
|
225 |
|
226 \noindent |
|
227 We are often interested in these parse-trees since they encode the structure of |
|
228 how a string is derived by a grammar. Before we come to the problem of constructing |
|
229 such parse-trees, we need to consider the following two properties of grammars. |
|
230 A grammar is \emph{left-recursive} if there is a derivation starting from a non-terminal, say |
|
231 $NT$ which leads to a string which again starts with $NT$. This means a derivation of the |
|
232 form. |
|
233 |
|
234 \begin{center} |
|
235 $NT \rightarrow \ldots \rightarrow NT \cdot \ldots$ |
|
236 \end{center} |
|
237 |
|
238 \noindent |
|
239 It can be easily seems that the grammar above for arithmetic expressions is left-recursive: |
|
240 for example the rules $E \rightarrow E\cdot + \cdot E$ and $N \rightarrow N\cdot N$ |
|
241 show that this grammar is left-recursive. Some algorithms cannot cope with left-recursive |
|
242 grammars. Fortunately every left-recursive grammar can be transformed into one that is |
|
243 not left-recursive, although this transformation might make the grammar less human-readable. |
|
244 For example if we want to give a non-left-recursive grammar for numbers we might |
|
245 specify |
|
246 |
|
247 \begin{center} |
|
248 $N \;\;\rightarrow\;\; 0\;|\;\ldots\;|\;9\;|\;1\cdot N\;|\;2\cdot N\;|\;\ldots\;|\;9\cdot N$ |
|
249 \end{center} |
|
250 |
|
251 \noindent |
|
252 Using this grammar we can still derive every number string, but we will never be able |
|
253 to derive a string of the form $\ldots \rightarrow N \cdot \ldots$. |
|
254 |
|
255 The other property we have to watch out is when a grammar is |
|
256 \emph{ambiguous}. A grammar is said to be ambiguous if there are two parse-trees |
|
257 for one string. Again the grammar for arithmetic expressions shown above is ambiguous. |
|
258 While the shown parse tree for the string $(1 + 23) + 4$ is unique, there are two parse |
|
259 trees for the string $1 + 2 + 3$, namely |
|
260 |
|
261 \begin{center} |
|
262 \begin{tabular}{c@{\hspace{10mm}}c} |
|
263 \begin{tikzpicture}[level distance=8mm, black] |
|
264 \node {$E$} |
|
265 child {node {$E$} child {node {$N$} child {node {$1$}}}} |
|
266 child {node {$+$}} |
|
267 child {node {$E$} |
|
268 child {node {$E$} child {node {$N$} child {node {$2$}}}} |
|
269 child {node {$+$}} |
|
270 child {node {$E$} child {node {$N$} child {node {$3$}}}} |
|
271 } |
|
272 ; |
|
273 \end{tikzpicture} |
|
274 & |
|
275 \begin{tikzpicture}[level distance=8mm, black] |
|
276 \node {$E$} |
|
277 child {node {$E$} |
|
278 child {node {$E$} child {node {$N$} child {node {$1$}}}} |
|
279 child {node {$+$}} |
|
280 child {node {$E$} child {node {$N$} child {node {$2$}}}} |
|
281 } |
|
282 child {node {$+$}} |
|
283 child {node {$E$} child {node {$N$} child {node {$3$}}}} |
|
284 ; |
|
285 \end{tikzpicture} |
|
286 \end{tabular} |
|
287 \end{center} |
|
288 |
|
289 \noindent |
|
290 In particular in programming languages we will try to avoid ambiguous |
|
291 grammars as two different parse-trees for a string means a program can |
|
292 be interpreted in two different ways. Then we have to somehow make sure |
|
293 the two different ways do not matter, or disambiguate the grammar in |
|
294 some way (for example making the $+$ left-associative). Unfortunately already |
|
295 the problem of deciding whether a grammar |
|
296 is ambiguous or not is in general undecidable. But in concrete instances we can |
|
297 often make a choice. |
177 |
298 |
178 \end{document} |
299 \end{document} |
179 |
300 |
180 %%% Local Variables: |
301 %%% Local Variables: |
181 %%% mode: latex |
302 %%% mode: latex |