diff -r dc2f5eb33a9a -r 6d74d2a0a4b0 handouts/ho05.tex --- a/handouts/ho05.tex Wed Oct 23 21:39:11 2019 +0100 +++ b/handouts/ho05.tex Wed Oct 23 22:23:10 2019 +0100 @@ -1,3 +1,4 @@ +% !TEX program = xelatex \documentclass{article} \usepackage{../style} \usepackage{../langs} @@ -53,23 +54,23 @@ languages are sets of strings). As indicated the set of regular languages are fully included inside the context-free languages, meaning every regular language is also context-free, but not vice -versa. Below I will let you think for example what the context-free +versa. Below I will let you think, for example, what the context-free grammar is for the language corresponding to the regular expression $(aaa)^*a$. Because of their convenience, context-free languages play an important role in `day-to-day' text processing and in programming languages. Context-free in this setting means that ``words'' have one -meaning only ansd this meaning is independend from in which context +meaning only and this meaning is independent from in which context the ``words'' appear. For example ambiguity issues like \begin{center} -\tt Time flies like an arrow; fruit flies like a banana. +\tt Time flies like an arrow; fruit flies like bananas. \end{center} \noindent from natural languages were the meaning of \emph{flies} depend on the -surounding \emph{context} are avoided as much as possible. +surrounding \emph{context} are avoided as much as possible. Context-free languages are usually specified by grammars. For example a grammar for well-parenthesised expressions can be given as follows: @@ -167,22 +168,22 @@ \begin{center} \begin{tikzpicture}[level distance=8mm, black] - \node {$E$} - child {node {$E$} + \node {\meta{E}} + child {node {\meta{E} } child {node {$($}} - child {node {$E$} - child {node {$E$} child {node {$N$} child {node {$1$}}}} + child {node {\meta{E} } + child {node {\meta{E} } child {node {\meta{N} } child {node {$1$}}}} child {node {$+$}} - child {node {$E$} - child {node {$N$} child {node {$2$}}} - child {node {$N$} child {node {$3$}}} + child {node {\meta{E} } + child {node {\meta{N} } child {node {$2$}}} + child {node {\meta{N} } child {node {$3$}}} } } child {node {$)$}} } child {node {$+$}} - child {node {$E$} - child {node {$N$} child {node {$4$}}} + child {node {\meta{E} } + child {node {\meta{N} } child {node {$4$}}} }; \end{tikzpicture} \end{center} @@ -192,18 +193,19 @@ grammar. Before we come to the problem of constructing such parse-trees, we need to consider the following two properties of grammars. A grammar is \emph{left-recursive} if there is a -derivation starting from a non-terminal, say $NT$ which leads -to a string which again starts with $NT$. This means a +derivation starting from a non-terminal, say \meta{NT} which leads +to a string which again starts with \meta{NT}. This means a derivation of the form. \begin{center} -$NT \rightarrow \ldots \rightarrow NT \cdot \ldots$ +$\meta{NT} \rightarrow \ldots \rightarrow \meta{NT} \cdot \ldots$ \end{center} \noindent It can be easily seen that the grammar above for arithmetic expressions is left-recursive: for example the -rules $E \rightarrow E\cdot + \cdot E$ and $N \rightarrow -N\cdot N$ show that this grammar is left-recursive. But note +rules $\meta{E} \rightarrow \meta{E}\cdot + \cdot \meta{E}$ and +$\meta{N} \rightarrow \meta{N}\cdot \meta{N}$ show that this +grammar is left-recursive. But note that left-recursiveness can involve more than one step in the derivation. The problem with left-recursive grammars is that some algorithms cannot cope with them: they fall into a loop. @@ -214,12 +216,13 @@ for numbers we might specify \begin{center} -$N \;\;\rightarrow\;\; 0\;|\;\ldots\;|\;9\;|\;1\cdot N\;|\;2\cdot N\;|\;\ldots\;|\;9\cdot N$ +$\meta{N} \;\;\rightarrow\;\; 0\;|\;\ldots\;|\;9\;|\; +1\cdot \meta{N}\;|\;2\cdot \meta{N}\;|\;\ldots\;|\;9\cdot \meta{N}$ \end{center} \noindent Using this grammar we can still derive every number string, but we will never be able to derive a string of the -form $N \to \ldots \to N \cdot \ldots$. +form $\meta{N} \to \ldots \to \meta{N} \cdot \ldots$. The other property we have to watch out for is when a grammar is \emph{ambiguous}. A grammar is said to be ambiguous if @@ -232,26 +235,26 @@ \begin{center} \begin{tabular}{c@{\hspace{10mm}}c} \begin{tikzpicture}[level distance=8mm, black] - \node {$E$} - child {node {$E$} child {node {$N$} child {node {$1$}}}} + \node {\meta{E} } + child {node {\meta{E} } child {node {\meta{N} } child {node {$1$}}}} child {node {$+$}} - child {node {$E$} - child {node {$E$} child {node {$N$} child {node {$2$}}}} + child {node {\meta{E} } + child {node {\meta{E} } child {node {\meta{N} } child {node {$2$}}}} child {node {$+$}} - child {node {$E$} child {node {$N$} child {node {$3$}}}} + child {node {\meta{E} } child {node {\meta{N} } child {node {$3$}}}} } ; \end{tikzpicture} & \begin{tikzpicture}[level distance=8mm, black] - \node {$E$} - child {node {$E$} - child {node {$E$} child {node {$N$} child {node {$1$}}}} + \node {\meta{E} } + child {node {\meta{E} } + child {node {\meta{E} } child {node {\meta{N} } child {node {$1$}}}} child {node {$+$}} - child {node {$E$} child {node {$N$} child {node {$2$}}}} + child {node {\meta{E} } child {node {\meta{N} } child {node {$2$}}}} } child {node {$+$}} - child {node {$E$} child {node {$N$} child {node {$3$}}}} + child {node {\meta{E} } child {node {\meta{N} } child {node {$3$}}}} ; \end{tikzpicture} \end{tabular} @@ -275,7 +278,7 @@ a grammar and string. In what follows we explain \emph{parser combinators}, because they are easy to implement and closely resemble grammar rules. Imagine that a grammar describes the -strings of natural numbers, such as the grammar $N$ shown +strings of natural numbers, such as the grammar \meta{N} shown above. For all such strings we want to generate the parse-trees or later on we actually want to extract the meaning of these strings, that is the concrete integers