handouts/ho05.tex
changeset 665 6d74d2a0a4b0
parent 618 f4818c95a32e
child 680 eecc4d5a2172
equal deleted inserted replaced
664:dc2f5eb33a9a 665:6d74d2a0a4b0
       
     1 % !TEX program = xelatex
     1 \documentclass{article}
     2 \documentclass{article}
     2 \usepackage{../style}
     3 \usepackage{../style}
     3 \usepackage{../langs}
     4 \usepackage{../langs}
     4 \usepackage{../grammar}
     5 \usepackage{../grammar}
     5 
     6 
    51 
    52 
    52 \noindent Each ``bubble'' stands for sets of languages (remember
    53 \noindent Each ``bubble'' stands for sets of languages (remember
    53 languages are sets of strings). As indicated the set of regular
    54 languages are sets of strings). As indicated the set of regular
    54 languages are fully included inside the context-free languages,
    55 languages are fully included inside the context-free languages,
    55 meaning every regular language is also context-free, but not vice
    56 meaning every regular language is also context-free, but not vice
    56 versa. Below I will let you think for example what the context-free
    57 versa. Below I will let you think, for example, what the context-free
    57 grammar is for the language corresponding to the regular expression
    58 grammar is for the language corresponding to the regular expression
    58 $(aaa)^*a$.
    59 $(aaa)^*a$.
    59 
    60 
    60 Because of their convenience, context-free languages play an important
    61 Because of their convenience, context-free languages play an important
    61 role in `day-to-day' text processing and in programming
    62 role in `day-to-day' text processing and in programming
    62 languages. Context-free in this setting means that ``words'' have one
    63 languages. Context-free in this setting means that ``words'' have one
    63 meaning only ansd this meaning is independend from in which context
    64 meaning only and this meaning is independent from in which context
    64 the ``words'' appear. For example ambiguity issues like
    65 the ``words'' appear. For example ambiguity issues like
    65 
    66 
    66 \begin{center}
    67 \begin{center}
    67 \tt Time flies like an arrow; fruit flies like a banana.
    68 \tt Time flies like an arrow; fruit flies like bananas.
    68 \end{center}  
    69 \end{center}  
    69 
    70 
    70 \noindent
    71 \noindent
    71 from natural languages were the meaning of \emph{flies} depend on the
    72 from natural languages were the meaning of \emph{flies} depend on the
    72 surounding \emph{context} are avoided as much as possible.
    73 surrounding \emph{context} are avoided as much as possible.
    73 
    74 
    74 Context-free languages are usually specified by grammars. For example
    75 Context-free languages are usually specified by grammars. For example
    75 a grammar for well-parenthesised expressions can be given as follows:
    76 a grammar for well-parenthesised expressions can be given as follows:
    76 
    77 
    77 \begin{plstx}[margin=3cm]
    78 \begin{plstx}[margin=3cm]
   165 top and each non-terminal containing a subtree for how it is replaced in a derivation.
   166 top and each non-terminal containing a subtree for how it is replaced in a derivation.
   166 The parse tree for the string $(1 + 23)+4$ is as follows:
   167 The parse tree for the string $(1 + 23)+4$ is as follows:
   167 
   168 
   168 \begin{center}
   169 \begin{center}
   169 \begin{tikzpicture}[level distance=8mm, black]
   170 \begin{tikzpicture}[level distance=8mm, black]
   170   \node {$E$}
   171   \node {\meta{E}}
   171     child {node {$E$} 
   172     child {node {\meta{E} } 
   172        child {node {$($}}
   173        child {node {$($}}
   173        child {node {$E$}       
   174        child {node {\meta{E} }       
   174          child {node {$E$} child {node {$N$} child {node {$1$}}}}
   175          child {node {\meta{E} } child {node {\meta{N} } child {node {$1$}}}}
   175          child {node {$+$}}
   176          child {node {$+$}}
   176          child {node {$E$} 
   177          child {node {\meta{E} } 
   177             child {node {$N$} child {node {$2$}}}
   178             child {node {\meta{N} } child {node {$2$}}}
   178             child {node {$N$} child {node {$3$}}}
   179             child {node {\meta{N} } child {node {$3$}}}
   179             } 
   180             } 
   180         }
   181         }
   181        child {node {$)$}}
   182        child {node {$)$}}
   182      }
   183      }
   183      child {node {$+$}}
   184      child {node {$+$}}
   184      child {node {$E$}
   185      child {node {\meta{E} }
   185         child {node {$N$} child {node {$4$}}}
   186         child {node {\meta{N} } child {node {$4$}}}
   186      };
   187      };
   187 \end{tikzpicture}
   188 \end{tikzpicture}
   188 \end{center}
   189 \end{center}
   189 
   190 
   190 \noindent We are often interested in these parse-trees since
   191 \noindent We are often interested in these parse-trees since
   191 they encode the structure of how a string is derived by a
   192 they encode the structure of how a string is derived by a
   192 grammar. Before we come to the problem of constructing such
   193 grammar. Before we come to the problem of constructing such
   193 parse-trees, we need to consider the following two properties
   194 parse-trees, we need to consider the following two properties
   194 of grammars. A grammar is \emph{left-recursive} if there is a
   195 of grammars. A grammar is \emph{left-recursive} if there is a
   195 derivation starting from a non-terminal, say $NT$ which leads
   196 derivation starting from a non-terminal, say \meta{NT} which leads
   196 to a string which again starts with $NT$. This means a
   197 to a string which again starts with \meta{NT}. This means a
   197 derivation of the form.
   198 derivation of the form.
   198 
   199 
   199 \begin{center}
   200 \begin{center}
   200 $NT \rightarrow \ldots \rightarrow NT \cdot \ldots$
   201 $\meta{NT} \rightarrow \ldots \rightarrow \meta{NT} \cdot \ldots$
   201 \end{center}
   202 \end{center}
   202 
   203 
   203 \noindent It can be easily seen that the grammar above for
   204 \noindent It can be easily seen that the grammar above for
   204 arithmetic expressions is left-recursive: for example the
   205 arithmetic expressions is left-recursive: for example the
   205 rules $E \rightarrow E\cdot + \cdot E$ and $N \rightarrow
   206 rules $\meta{E} \rightarrow \meta{E}\cdot + \cdot \meta{E}$ and 
   206 N\cdot N$ show that this grammar is left-recursive. But note
   207 $\meta{N} \rightarrow \meta{N}\cdot \meta{N}$ show that this 
       
   208 grammar is left-recursive. But note
   207 that left-recursiveness can involve more than one step in the
   209 that left-recursiveness can involve more than one step in the
   208 derivation. The problem with left-recursive grammars is that
   210 derivation. The problem with left-recursive grammars is that
   209 some algorithms cannot cope with them: they fall into a loop.
   211 some algorithms cannot cope with them: they fall into a loop.
   210 Fortunately every left-recursive grammar can be transformed
   212 Fortunately every left-recursive grammar can be transformed
   211 into one that is not left-recursive, although this
   213 into one that is not left-recursive, although this
   212 transformation might make the grammar less ``human-readable''.
   214 transformation might make the grammar less ``human-readable''.
   213 For example if we want to give a non-left-recursive grammar
   215 For example if we want to give a non-left-recursive grammar
   214 for numbers we might specify
   216 for numbers we might specify
   215 
   217 
   216 \begin{center}
   218 \begin{center}
   217 $N \;\;\rightarrow\;\; 0\;|\;\ldots\;|\;9\;|\;1\cdot N\;|\;2\cdot N\;|\;\ldots\;|\;9\cdot N$
   219 $\meta{N} \;\;\rightarrow\;\; 0\;|\;\ldots\;|\;9\;|\;
       
   220 1\cdot \meta{N}\;|\;2\cdot \meta{N}\;|\;\ldots\;|\;9\cdot \meta{N}$
   218 \end{center}
   221 \end{center}
   219 
   222 
   220 \noindent Using this grammar we can still derive every number
   223 \noindent Using this grammar we can still derive every number
   221 string, but we will never be able to derive a string of the
   224 string, but we will never be able to derive a string of the
   222 form $N \to \ldots \to N \cdot \ldots$.
   225 form $\meta{N} \to \ldots \to \meta{N} \cdot \ldots$.
   223 
   226 
   224 The other property we have to watch out for is when a grammar
   227 The other property we have to watch out for is when a grammar
   225 is \emph{ambiguous}. A grammar is said to be ambiguous if
   228 is \emph{ambiguous}. A grammar is said to be ambiguous if
   226 there are two parse-trees for one string. Again the grammar
   229 there are two parse-trees for one string. Again the grammar
   227 for arithmetic expressions shown above is ambiguous. While the
   230 for arithmetic expressions shown above is ambiguous. While the
   230 trees for the string $1 + 2 + 3$, namely
   233 trees for the string $1 + 2 + 3$, namely
   231 
   234 
   232 \begin{center}
   235 \begin{center}
   233 \begin{tabular}{c@{\hspace{10mm}}c}
   236 \begin{tabular}{c@{\hspace{10mm}}c}
   234 \begin{tikzpicture}[level distance=8mm, black]
   237 \begin{tikzpicture}[level distance=8mm, black]
   235   \node {$E$}
   238   \node {\meta{E} }
   236     child {node {$E$} child {node {$N$} child {node {$1$}}}}
   239     child {node {\meta{E} } child {node {\meta{N} } child {node {$1$}}}}
   237     child {node {$+$}}
   240     child {node {$+$}}
   238     child {node {$E$}
   241     child {node {\meta{E} }
   239        child {node {$E$} child {node {$N$} child {node {$2$}}}}
   242        child {node {\meta{E} } child {node {\meta{N} } child {node {$2$}}}}
   240        child {node {$+$}}
   243        child {node {$+$}}
   241        child {node {$E$} child {node {$N$} child {node {$3$}}}}
   244        child {node {\meta{E} } child {node {\meta{N} } child {node {$3$}}}}
   242     }
   245     }
   243     ;
   246     ;
   244 \end{tikzpicture} 
   247 \end{tikzpicture} 
   245 &
   248 &
   246 \begin{tikzpicture}[level distance=8mm, black]
   249 \begin{tikzpicture}[level distance=8mm, black]
   247   \node {$E$}
   250   \node {\meta{E} }
   248     child {node {$E$}
   251     child {node {\meta{E} }
   249        child {node {$E$} child {node {$N$} child {node {$1$}}}}
   252        child {node {\meta{E} } child {node {\meta{N} } child {node {$1$}}}}
   250        child {node {$+$}}
   253        child {node {$+$}}
   251        child {node {$E$} child {node {$N$} child {node {$2$}}}} 
   254        child {node {\meta{E} } child {node {\meta{N} } child {node {$2$}}}} 
   252     }
   255     }
   253     child {node {$+$}}
   256     child {node {$+$}}
   254     child {node {$E$} child {node {$N$} child {node {$3$}}}}
   257     child {node {\meta{E} } child {node {\meta{N} } child {node {$3$}}}}
   255     ;
   258     ;
   256 \end{tikzpicture}
   259 \end{tikzpicture}
   257 \end{tabular} 
   260 \end{tabular} 
   258 \end{center}
   261 \end{center}
   259 
   262 
   273 
   276 
   274 Let us now turn to the problem of generating a parse-tree for
   277 Let us now turn to the problem of generating a parse-tree for
   275 a grammar and string. In what follows we explain \emph{parser
   278 a grammar and string. In what follows we explain \emph{parser
   276 combinators}, because they are easy to implement and closely
   279 combinators}, because they are easy to implement and closely
   277 resemble grammar rules. Imagine that a grammar describes the
   280 resemble grammar rules. Imagine that a grammar describes the
   278 strings of natural numbers, such as the grammar $N$ shown
   281 strings of natural numbers, such as the grammar \meta{N}  shown
   279 above. For all such strings we want to generate the
   282 above. For all such strings we want to generate the
   280 parse-trees or later on we actually want to extract the
   283 parse-trees or later on we actually want to extract the
   281 meaning of these strings, that is the concrete integers
   284 meaning of these strings, that is the concrete integers
   282 ``behind'' these strings. In Scala the parser combinators will
   285 ``behind'' these strings. In Scala the parser combinators will
   283 be functions of type
   286 be functions of type