51 |
52 |
52 \noindent Each ``bubble'' stands for sets of languages (remember |
53 \noindent Each ``bubble'' stands for sets of languages (remember |
53 languages are sets of strings). As indicated the set of regular |
54 languages are sets of strings). As indicated the set of regular |
54 languages are fully included inside the context-free languages, |
55 languages are fully included inside the context-free languages, |
55 meaning every regular language is also context-free, but not vice |
56 meaning every regular language is also context-free, but not vice |
56 versa. Below I will let you think for example what the context-free |
57 versa. Below I will let you think, for example, what the context-free |
57 grammar is for the language corresponding to the regular expression |
58 grammar is for the language corresponding to the regular expression |
58 $(aaa)^*a$. |
59 $(aaa)^*a$. |
59 |
60 |
60 Because of their convenience, context-free languages play an important |
61 Because of their convenience, context-free languages play an important |
61 role in `day-to-day' text processing and in programming |
62 role in `day-to-day' text processing and in programming |
62 languages. Context-free in this setting means that ``words'' have one |
63 languages. Context-free in this setting means that ``words'' have one |
63 meaning only ansd this meaning is independend from in which context |
64 meaning only and this meaning is independent from in which context |
64 the ``words'' appear. For example ambiguity issues like |
65 the ``words'' appear. For example ambiguity issues like |
65 |
66 |
66 \begin{center} |
67 \begin{center} |
67 \tt Time flies like an arrow; fruit flies like a banana. |
68 \tt Time flies like an arrow; fruit flies like bananas. |
68 \end{center} |
69 \end{center} |
69 |
70 |
70 \noindent |
71 \noindent |
71 from natural languages were the meaning of \emph{flies} depend on the |
72 from natural languages were the meaning of \emph{flies} depend on the |
72 surounding \emph{context} are avoided as much as possible. |
73 surrounding \emph{context} are avoided as much as possible. |
73 |
74 |
74 Context-free languages are usually specified by grammars. For example |
75 Context-free languages are usually specified by grammars. For example |
75 a grammar for well-parenthesised expressions can be given as follows: |
76 a grammar for well-parenthesised expressions can be given as follows: |
76 |
77 |
77 \begin{plstx}[margin=3cm] |
78 \begin{plstx}[margin=3cm] |
165 top and each non-terminal containing a subtree for how it is replaced in a derivation. |
166 top and each non-terminal containing a subtree for how it is replaced in a derivation. |
166 The parse tree for the string $(1 + 23)+4$ is as follows: |
167 The parse tree for the string $(1 + 23)+4$ is as follows: |
167 |
168 |
168 \begin{center} |
169 \begin{center} |
169 \begin{tikzpicture}[level distance=8mm, black] |
170 \begin{tikzpicture}[level distance=8mm, black] |
170 \node {$E$} |
171 \node {\meta{E}} |
171 child {node {$E$} |
172 child {node {\meta{E} } |
172 child {node {$($}} |
173 child {node {$($}} |
173 child {node {$E$} |
174 child {node {\meta{E} } |
174 child {node {$E$} child {node {$N$} child {node {$1$}}}} |
175 child {node {\meta{E} } child {node {\meta{N} } child {node {$1$}}}} |
175 child {node {$+$}} |
176 child {node {$+$}} |
176 child {node {$E$} |
177 child {node {\meta{E} } |
177 child {node {$N$} child {node {$2$}}} |
178 child {node {\meta{N} } child {node {$2$}}} |
178 child {node {$N$} child {node {$3$}}} |
179 child {node {\meta{N} } child {node {$3$}}} |
179 } |
180 } |
180 } |
181 } |
181 child {node {$)$}} |
182 child {node {$)$}} |
182 } |
183 } |
183 child {node {$+$}} |
184 child {node {$+$}} |
184 child {node {$E$} |
185 child {node {\meta{E} } |
185 child {node {$N$} child {node {$4$}}} |
186 child {node {\meta{N} } child {node {$4$}}} |
186 }; |
187 }; |
187 \end{tikzpicture} |
188 \end{tikzpicture} |
188 \end{center} |
189 \end{center} |
189 |
190 |
190 \noindent We are often interested in these parse-trees since |
191 \noindent We are often interested in these parse-trees since |
191 they encode the structure of how a string is derived by a |
192 they encode the structure of how a string is derived by a |
192 grammar. Before we come to the problem of constructing such |
193 grammar. Before we come to the problem of constructing such |
193 parse-trees, we need to consider the following two properties |
194 parse-trees, we need to consider the following two properties |
194 of grammars. A grammar is \emph{left-recursive} if there is a |
195 of grammars. A grammar is \emph{left-recursive} if there is a |
195 derivation starting from a non-terminal, say $NT$ which leads |
196 derivation starting from a non-terminal, say \meta{NT} which leads |
196 to a string which again starts with $NT$. This means a |
197 to a string which again starts with \meta{NT}. This means a |
197 derivation of the form. |
198 derivation of the form. |
198 |
199 |
199 \begin{center} |
200 \begin{center} |
200 $NT \rightarrow \ldots \rightarrow NT \cdot \ldots$ |
201 $\meta{NT} \rightarrow \ldots \rightarrow \meta{NT} \cdot \ldots$ |
201 \end{center} |
202 \end{center} |
202 |
203 |
203 \noindent It can be easily seen that the grammar above for |
204 \noindent It can be easily seen that the grammar above for |
204 arithmetic expressions is left-recursive: for example the |
205 arithmetic expressions is left-recursive: for example the |
205 rules $E \rightarrow E\cdot + \cdot E$ and $N \rightarrow |
206 rules $\meta{E} \rightarrow \meta{E}\cdot + \cdot \meta{E}$ and |
206 N\cdot N$ show that this grammar is left-recursive. But note |
207 $\meta{N} \rightarrow \meta{N}\cdot \meta{N}$ show that this |
|
208 grammar is left-recursive. But note |
207 that left-recursiveness can involve more than one step in the |
209 that left-recursiveness can involve more than one step in the |
208 derivation. The problem with left-recursive grammars is that |
210 derivation. The problem with left-recursive grammars is that |
209 some algorithms cannot cope with them: they fall into a loop. |
211 some algorithms cannot cope with them: they fall into a loop. |
210 Fortunately every left-recursive grammar can be transformed |
212 Fortunately every left-recursive grammar can be transformed |
211 into one that is not left-recursive, although this |
213 into one that is not left-recursive, although this |
212 transformation might make the grammar less ``human-readable''. |
214 transformation might make the grammar less ``human-readable''. |
213 For example if we want to give a non-left-recursive grammar |
215 For example if we want to give a non-left-recursive grammar |
214 for numbers we might specify |
216 for numbers we might specify |
215 |
217 |
216 \begin{center} |
218 \begin{center} |
217 $N \;\;\rightarrow\;\; 0\;|\;\ldots\;|\;9\;|\;1\cdot N\;|\;2\cdot N\;|\;\ldots\;|\;9\cdot N$ |
219 $\meta{N} \;\;\rightarrow\;\; 0\;|\;\ldots\;|\;9\;|\; |
|
220 1\cdot \meta{N}\;|\;2\cdot \meta{N}\;|\;\ldots\;|\;9\cdot \meta{N}$ |
218 \end{center} |
221 \end{center} |
219 |
222 |
220 \noindent Using this grammar we can still derive every number |
223 \noindent Using this grammar we can still derive every number |
221 string, but we will never be able to derive a string of the |
224 string, but we will never be able to derive a string of the |
222 form $N \to \ldots \to N \cdot \ldots$. |
225 form $\meta{N} \to \ldots \to \meta{N} \cdot \ldots$. |
223 |
226 |
224 The other property we have to watch out for is when a grammar |
227 The other property we have to watch out for is when a grammar |
225 is \emph{ambiguous}. A grammar is said to be ambiguous if |
228 is \emph{ambiguous}. A grammar is said to be ambiguous if |
226 there are two parse-trees for one string. Again the grammar |
229 there are two parse-trees for one string. Again the grammar |
227 for arithmetic expressions shown above is ambiguous. While the |
230 for arithmetic expressions shown above is ambiguous. While the |
230 trees for the string $1 + 2 + 3$, namely |
233 trees for the string $1 + 2 + 3$, namely |
231 |
234 |
232 \begin{center} |
235 \begin{center} |
233 \begin{tabular}{c@{\hspace{10mm}}c} |
236 \begin{tabular}{c@{\hspace{10mm}}c} |
234 \begin{tikzpicture}[level distance=8mm, black] |
237 \begin{tikzpicture}[level distance=8mm, black] |
235 \node {$E$} |
238 \node {\meta{E} } |
236 child {node {$E$} child {node {$N$} child {node {$1$}}}} |
239 child {node {\meta{E} } child {node {\meta{N} } child {node {$1$}}}} |
237 child {node {$+$}} |
240 child {node {$+$}} |
238 child {node {$E$} |
241 child {node {\meta{E} } |
239 child {node {$E$} child {node {$N$} child {node {$2$}}}} |
242 child {node {\meta{E} } child {node {\meta{N} } child {node {$2$}}}} |
240 child {node {$+$}} |
243 child {node {$+$}} |
241 child {node {$E$} child {node {$N$} child {node {$3$}}}} |
244 child {node {\meta{E} } child {node {\meta{N} } child {node {$3$}}}} |
242 } |
245 } |
243 ; |
246 ; |
244 \end{tikzpicture} |
247 \end{tikzpicture} |
245 & |
248 & |
246 \begin{tikzpicture}[level distance=8mm, black] |
249 \begin{tikzpicture}[level distance=8mm, black] |
247 \node {$E$} |
250 \node {\meta{E} } |
248 child {node {$E$} |
251 child {node {\meta{E} } |
249 child {node {$E$} child {node {$N$} child {node {$1$}}}} |
252 child {node {\meta{E} } child {node {\meta{N} } child {node {$1$}}}} |
250 child {node {$+$}} |
253 child {node {$+$}} |
251 child {node {$E$} child {node {$N$} child {node {$2$}}}} |
254 child {node {\meta{E} } child {node {\meta{N} } child {node {$2$}}}} |
252 } |
255 } |
253 child {node {$+$}} |
256 child {node {$+$}} |
254 child {node {$E$} child {node {$N$} child {node {$3$}}}} |
257 child {node {\meta{E} } child {node {\meta{N} } child {node {$3$}}}} |
255 ; |
258 ; |
256 \end{tikzpicture} |
259 \end{tikzpicture} |
257 \end{tabular} |
260 \end{tabular} |
258 \end{center} |
261 \end{center} |
259 |
262 |
273 |
276 |
274 Let us now turn to the problem of generating a parse-tree for |
277 Let us now turn to the problem of generating a parse-tree for |
275 a grammar and string. In what follows we explain \emph{parser |
278 a grammar and string. In what follows we explain \emph{parser |
276 combinators}, because they are easy to implement and closely |
279 combinators}, because they are easy to implement and closely |
277 resemble grammar rules. Imagine that a grammar describes the |
280 resemble grammar rules. Imagine that a grammar describes the |
278 strings of natural numbers, such as the grammar $N$ shown |
281 strings of natural numbers, such as the grammar \meta{N} shown |
279 above. For all such strings we want to generate the |
282 above. For all such strings we want to generate the |
280 parse-trees or later on we actually want to extract the |
283 parse-trees or later on we actually want to extract the |
281 meaning of these strings, that is the concrete integers |
284 meaning of these strings, that is the concrete integers |
282 ``behind'' these strings. In Scala the parser combinators will |
285 ``behind'' these strings. In Scala the parser combinators will |
283 be functions of type |
286 be functions of type |