| author | Christian Urban <urbanc@in.tum.de> | 
| Wed, 06 Nov 2019 21:52:42 +0000 | |
| changeset 682 | 612976492d25 | 
| parent 681 | 9efdee02c95e | 
| child 686 | 5fe95ea0bad0 | 
| permissions | -rw-r--r-- | 
| 665 | 1 | % !TEX program = xelatex | 
| 173 
7cfb7a6f7c99
added slides
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: diff
changeset | 2 | \documentclass{article}
 | 
| 297 
5c51839c88fd
updated
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
292diff
changeset | 3 | \usepackage{../style}
 | 
| 217 
cd6066f1056a
updated handouts
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
183diff
changeset | 4 | \usepackage{../langs}
 | 
| 459 | 5 | \usepackage{../grammar}
 | 
| 173 
7cfb7a6f7c99
added slides
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: diff
changeset | 6 | |
| 545 | 7 | % epsilon and left-recursion elimination | 
| 8 | % http://www.mollypages.org/page/grammar/index.mp | |
| 9 | ||
| 618 | 10 | %% parsing scala files | 
| 11 | %%https://scalameta.org/ | |
| 12 | ||
| 173 
7cfb7a6f7c99
added slides
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: diff
changeset | 13 | \begin{document}
 | 
| 
7cfb7a6f7c99
added slides
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: diff
changeset | 14 | |
| 385 
7f8516ff408d
updated
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
362diff
changeset | 15 | \section*{Handout 5 (Grammars \& Parser)}
 | 
| 173 
7cfb7a6f7c99
added slides
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: diff
changeset | 16 | |
| 680 | 17 | While regular expressions are very useful for lexing and for recognising | 
| 18 | many patterns in strings (like email addresses), they have their | |
| 19 | limitations. For example there is no regular expression that can | |
| 682 | 20 | recognise the language $a^nb^n$ (where you have strings starting with $n$ $a$'s | 
| 680 | 21 | followed by the same amount of $b$'s). Another example for which there | 
| 22 | exists no regular expression is the language of well-parenthesised | |
| 23 | expressions. In languages like Lisp, which use parentheses rather | |
| 24 | extensively, it might be of interest to know whether the following two | |
| 25 | expressions are well-parenthesised or not (the left one is, the right | |
| 26 | one is not): | |
| 173 
7cfb7a6f7c99
added slides
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: diff
changeset | 27 | |
| 
7cfb7a6f7c99
added slides
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: diff
changeset | 28 | \begin{center}
 | 
| 
7cfb7a6f7c99
added slides
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: diff
changeset | 29 | $(((()()))())$  \hspace{10mm} $(((()()))()))$
 | 
| 
7cfb7a6f7c99
added slides
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: diff
changeset | 30 | \end{center}
 | 
| 
7cfb7a6f7c99
added slides
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: diff
changeset | 31 | |
| 362 
57ea439feaff
updated
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
360diff
changeset | 32 | \noindent Not being able to solve such recognition problems is | 
| 
57ea439feaff
updated
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
360diff
changeset | 33 | a serious limitation. In order to solve such recognition | 
| 
57ea439feaff
updated
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
360diff
changeset | 34 | problems, we need more powerful techniques than regular | 
| 
57ea439feaff
updated
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
360diff
changeset | 35 | expressions. We will in particular look at \emph{context-free
 | 
| 
57ea439feaff
updated
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
360diff
changeset | 36 | languages}. They include the regular languages as the picture | 
| 582 | 37 | below about language classes shows: | 
| 173 
7cfb7a6f7c99
added slides
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: diff
changeset | 38 | |
| 
7cfb7a6f7c99
added slides
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: diff
changeset | 39 | |
| 
7cfb7a6f7c99
added slides
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: diff
changeset | 40 | \begin{center}
 | 
| 
7cfb7a6f7c99
added slides
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: diff
changeset | 41 | \begin{tikzpicture}
 | 
| 297 
5c51839c88fd
updated
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
292diff
changeset | 42 | [rect/.style={draw=black!50, 
 | 
| 
5c51839c88fd
updated
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
292diff
changeset | 43 | top color=white,bottom color=black!20, | 
| 
5c51839c88fd
updated
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
292diff
changeset | 44 | rectangle, very thick, rounded corners}] | 
| 173 
7cfb7a6f7c99
added slides
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: diff
changeset | 45 | |
| 297 
5c51839c88fd
updated
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
292diff
changeset | 46 | \draw (0,0) node [rect, text depth=30mm, text width=46mm] {\small all languages};
 | 
| 
5c51839c88fd
updated
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
292diff
changeset | 47 | \draw (0,-0.4) node [rect, text depth=20mm, text width=44mm] {\small decidable languages};
 | 
| 
5c51839c88fd
updated
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
292diff
changeset | 48 | \draw (0,-0.65) node [rect, text depth=13mm] {\small context sensitive languages};
 | 
| 
5c51839c88fd
updated
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
292diff
changeset | 49 | \draw (0,-0.84) node [rect, text depth=7mm, text width=35mm] {\small context-free languages};
 | 
| 
5c51839c88fd
updated
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
292diff
changeset | 50 | \draw (0,-1.05) node [rect] {\small regular languages};
 | 
| 173 
7cfb7a6f7c99
added slides
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: diff
changeset | 51 | \end{tikzpicture}
 | 
| 
7cfb7a6f7c99
added slides
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: diff
changeset | 52 | \end{center}
 | 
| 
7cfb7a6f7c99
added slides
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: diff
changeset | 53 | |
| 582 | 54 | \noindent Each ``bubble'' stands for sets of languages (remember | 
| 55 | languages are sets of strings). As indicated the set of regular | |
| 680 | 56 | languages is fully included inside the context-free languages, | 
| 582 | 57 | meaning every regular language is also context-free, but not vice | 
| 665 | 58 | versa. Below I will let you think, for example, what the context-free | 
| 582 | 59 | grammar is for the language corresponding to the regular expression | 
| 60 | $(aaa)^*a$. | |
| 61 | ||
| 62 | Because of their convenience, context-free languages play an important | |
| 63 | role in `day-to-day' text processing and in programming | |
| 64 | languages. Context-free in this setting means that ``words'' have one | |
| 680 | 65 | meaning only and this meaning is independent from the context | 
| 66 | the ``words'' appear in. For example ambiguity issues like | |
| 582 | 67 | |
| 68 | \begin{center}
 | |
| 682 | 69 | \tt Time flies like an arrow. Fruit flies like bananas. | 
| 582 | 70 | \end{center}  
 | 
| 71 | ||
| 72 | \noindent | |
| 680 | 73 | from natural languages were the meaning of \emph{flies} depends on the
 | 
| 665 | 74 | surrounding \emph{context} are avoided as much as possible.
 | 
| 582 | 75 | |
| 76 | Context-free languages are usually specified by grammars. For example | |
| 77 | a grammar for well-parenthesised expressions can be given as follows: | |
| 173 
7cfb7a6f7c99
added slides
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: diff
changeset | 78 | |
| 459 | 79 | \begin{plstx}[margin=3cm]
 | 
| 80 | : \meta{P} ::=  ( \cdot  \meta{P} \cdot ) \cdot \meta{P}
 | |
| 81 | | \epsilon\\ | |
| 82 | \end{plstx}
 | |
| 83 | ||
| 362 
57ea439feaff
updated
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
360diff
changeset | 84 | \noindent | 
| 
57ea439feaff
updated
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
360diff
changeset | 85 | or a grammar for recognising strings consisting of ones is | 
| 
57ea439feaff
updated
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
360diff
changeset | 86 | |
| 459 | 87 | \begin{plstx}[margin=3cm]
 | 
| 88 | : \meta{O} ::= 1 \cdot  \meta{O} 
 | |
| 89 | | 1\\ | |
| 90 | \end{plstx}
 | |
| 362 
57ea439feaff
updated
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
360diff
changeset | 91 | |
| 
57ea439feaff
updated
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
360diff
changeset | 92 | In general grammars consist of finitely many rules built up | 
| 
57ea439feaff
updated
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
360diff
changeset | 93 | from \emph{terminal symbols} (usually lower-case letters) and
 | 
| 582 | 94 | \emph{non-terminal symbols} (upper-case letters written in
 | 
| 95 | bold like \meta{A}, \meta{N} and so on). Rules have
 | |
| 362 
57ea439feaff
updated
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
360diff
changeset | 96 | the shape | 
| 173 
7cfb7a6f7c99
added slides
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: diff
changeset | 97 | |
| 459 | 98 | \begin{plstx}[margin=3cm]
 | 
| 99 | : \meta{NT} ::= rhs\\
 | |
| 100 | \end{plstx}
 | |
| 173 
7cfb7a6f7c99
added slides
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: diff
changeset | 101 | |
| 362 
57ea439feaff
updated
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
360diff
changeset | 102 | \noindent where on the left-hand side is a single non-terminal | 
| 
57ea439feaff
updated
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
360diff
changeset | 103 | and on the right a string consisting of both terminals and | 
| 
57ea439feaff
updated
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
360diff
changeset | 104 | non-terminals including the $\epsilon$-symbol for indicating | 
| 
57ea439feaff
updated
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
360diff
changeset | 105 | the empty string. We use the convention to separate components | 
| 
57ea439feaff
updated
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
360diff
changeset | 106 | on the right hand-side by using the $\cdot$ symbol, as in the | 
| 
57ea439feaff
updated
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
360diff
changeset | 107 | grammar for well-parenthesised expressions. We also use the | 
| 
57ea439feaff
updated
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
360diff
changeset | 108 | convention to use $|$ as a shorthand notation for several | 
| 
57ea439feaff
updated
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
360diff
changeset | 109 | rules. For example | 
| 173 
7cfb7a6f7c99
added slides
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: diff
changeset | 110 | |
| 459 | 111 | \begin{plstx}[margin=3cm]
 | 
| 112 | : \meta{NT} ::= rhs_1
 | |
| 113 | | rhs_2\\ | |
| 114 | \end{plstx}
 | |
| 173 
7cfb7a6f7c99
added slides
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: diff
changeset | 115 | |
| 459 | 116 | \noindent means that the non-terminal \meta{NT} can be replaced by
 | 
| 362 
57ea439feaff
updated
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
360diff
changeset | 117 | either $\textit{rhs}_1$ or $\textit{rhs}_2$. If there are more
 | 
| 
57ea439feaff
updated
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
360diff
changeset | 118 | than one non-terminal on the left-hand side of the rules, then | 
| 
57ea439feaff
updated
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
360diff
changeset | 119 | we need to indicate what is the \emph{starting} symbol of the
 | 
| 
57ea439feaff
updated
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
360diff
changeset | 120 | grammar. For example the grammar for arithmetic expressions | 
| 173 
7cfb7a6f7c99
added slides
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: diff
changeset | 121 | can be given as follows | 
| 
7cfb7a6f7c99
added slides
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: diff
changeset | 122 | |
| 459 | 123 | \begin{plstx}[margin=3cm,one per line]
 | 
| 124 | \mbox{\rm (1)}: \meta{E} ::= \meta{N}\\
 | |
| 125 | \mbox{\rm (2)}: \meta{E} ::= \meta{E} \cdot + \cdot \meta{E}\\
 | |
| 126 | \mbox{\rm (3)}: \meta{E} ::= \meta{E} \cdot - \cdot \meta{E}\\
 | |
| 127 | \mbox{\rm (4)}: \meta{E} ::= \meta{E} \cdot * \cdot \meta{E}\\
 | |
| 128 | \mbox{\rm (5)}: \meta{E} ::= ( \cdot \meta{E} \cdot )\\
 | |
| 129 | \mbox{\rm (6\ldots)}: \meta{N} ::= \meta{N} \cdot \meta{N} 
 | |
| 130 | \mid 0 \mid 1 \mid \ldots \mid 9\\ | |
| 131 | \end{plstx}
 | |
| 173 
7cfb7a6f7c99
added slides
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: diff
changeset | 132 | |
| 459 | 133 | \noindent where \meta{E} is the starting symbol. A
 | 
| 362 
57ea439feaff
updated
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
360diff
changeset | 134 | \emph{derivation} for a grammar starts with the starting
 | 
| 
57ea439feaff
updated
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
360diff
changeset | 135 | symbol of the grammar and in each step replaces one | 
| 
57ea439feaff
updated
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
360diff
changeset | 136 | non-terminal by a right-hand side of a rule. A derivation ends | 
| 
57ea439feaff
updated
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
360diff
changeset | 137 | with a string in which only terminal symbols are left. For | 
| 
57ea439feaff
updated
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
360diff
changeset | 138 | example a derivation for the string $(1 + 2) + 3$ is as | 
| 
57ea439feaff
updated
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
360diff
changeset | 139 | follows: | 
| 175 
5801e8c0e528
updated
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
173diff
changeset | 140 | |
| 
5801e8c0e528
updated
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
173diff
changeset | 141 | \begin{center}
 | 
| 362 
57ea439feaff
updated
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
360diff
changeset | 142 | \begin{tabular}{lll@{\hspace{2cm}}l}
 | 
| 459 | 143 | \meta{E} & $\rightarrow$ & $\meta{E}+\meta{E}$          & by (2)\\
 | 
| 144 |        & $\rightarrow$ & $(\meta{E})+\meta{E}$     & by (5)\\
 | |
| 145 |        & $\rightarrow$ & $(\meta{E}+\meta{E})+\meta{E}$   & by (2)\\
 | |
| 146 |        & $\rightarrow$ & $(\meta{E}+\meta{E})+\meta{N}$   & by (1)\\
 | |
| 147 |        & $\rightarrow$ & $(\meta{E}+\meta{E})+3$   & by (6\dots)\\
 | |
| 148 |        & $\rightarrow$ & $(\meta{N}+\meta{E})+3$   & by (1)\\
 | |
| 362 
57ea439feaff
updated
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
360diff
changeset | 149 | & $\rightarrow^+$ & $(1+2)+3$ & by (1, 6\ldots)\\ | 
| 175 
5801e8c0e528
updated
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
173diff
changeset | 150 | \end{tabular} 
 | 
| 
5801e8c0e528
updated
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
173diff
changeset | 151 | \end{center}
 | 
| 
5801e8c0e528
updated
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
173diff
changeset | 152 | |
| 362 
57ea439feaff
updated
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
360diff
changeset | 153 | \noindent where on the right it is indicated which | 
| 
57ea439feaff
updated
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
360diff
changeset | 154 | grammar rule has been applied. In the last step we | 
| 
57ea439feaff
updated
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
360diff
changeset | 155 | merged several steps into one. | 
| 
57ea439feaff
updated
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
360diff
changeset | 156 | |
| 
57ea439feaff
updated
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
360diff
changeset | 157 | The \emph{language} of a context-free grammar $G$
 | 
| 
57ea439feaff
updated
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
360diff
changeset | 158 | with start symbol $S$ is defined as the set of strings | 
| 
57ea439feaff
updated
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
360diff
changeset | 159 | derivable by a derivation, that is | 
| 175 
5801e8c0e528
updated
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
173diff
changeset | 160 | |
| 
5801e8c0e528
updated
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
173diff
changeset | 161 | \begin{center}
 | 
| 
5801e8c0e528
updated
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
173diff
changeset | 162 | $\{c_1\ldots c_n \;|\; S \rightarrow^* c_1\ldots c_n \;\;\text{with all} \; c_i \;\text{being non-terminals}\}$
 | 
| 
5801e8c0e528
updated
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
173diff
changeset | 163 | \end{center}
 | 
| 
5801e8c0e528
updated
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
173diff
changeset | 164 | |
| 
5801e8c0e528
updated
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
173diff
changeset | 165 | \noindent | 
| 680 | 166 | A \emph{parse-tree} encodes how a string is derived with the starting
 | 
| 167 | symbol on top and each non-terminal containing a subtree for how it is | |
| 168 | replaced in a derivation. The parse tree for the string $(1 + 23)+4$ is | |
| 169 | as follows: | |
| 175 
5801e8c0e528
updated
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
173diff
changeset | 170 | |
| 
5801e8c0e528
updated
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
173diff
changeset | 171 | \begin{center}
 | 
| 
5801e8c0e528
updated
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
173diff
changeset | 172 | \begin{tikzpicture}[level distance=8mm, black]
 | 
| 665 | 173 |   \node {\meta{E}}
 | 
| 174 |     child {node {\meta{E} } 
 | |
| 175 
5801e8c0e528
updated
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
173diff
changeset | 175 |        child {node {$($}}
 | 
| 665 | 176 |        child {node {\meta{E} }       
 | 
| 177 |          child {node {\meta{E} } child {node {\meta{N} } child {node {$1$}}}}
 | |
| 175 
5801e8c0e528
updated
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
173diff
changeset | 178 |          child {node {$+$}}
 | 
| 665 | 179 |          child {node {\meta{E} } 
 | 
| 180 |             child {node {\meta{N} } child {node {$2$}}}
 | |
| 181 |             child {node {\meta{N} } child {node {$3$}}}
 | |
| 175 
5801e8c0e528
updated
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
173diff
changeset | 182 | } | 
| 
5801e8c0e528
updated
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
173diff
changeset | 183 | } | 
| 
5801e8c0e528
updated
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
173diff
changeset | 184 |        child {node {$)$}}
 | 
| 
5801e8c0e528
updated
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
173diff
changeset | 185 | } | 
| 
5801e8c0e528
updated
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
173diff
changeset | 186 |      child {node {$+$}}
 | 
| 665 | 187 |      child {node {\meta{E} }
 | 
| 188 |         child {node {\meta{N} } child {node {$4$}}}
 | |
| 175 
5801e8c0e528
updated
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
173diff
changeset | 189 | }; | 
| 
5801e8c0e528
updated
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
173diff
changeset | 190 | \end{tikzpicture}
 | 
| 
5801e8c0e528
updated
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
173diff
changeset | 191 | \end{center}
 | 
| 
5801e8c0e528
updated
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
173diff
changeset | 192 | |
| 362 
57ea439feaff
updated
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
360diff
changeset | 193 | \noindent We are often interested in these parse-trees since | 
| 
57ea439feaff
updated
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
360diff
changeset | 194 | they encode the structure of how a string is derived by a | 
| 680 | 195 | grammar. | 
| 196 | ||
| 197 | Before we come to the problem of constructing such parse-trees, we need | |
| 198 | to consider the following two properties of grammars. A grammar is | |
| 199 | \emph{left-recursive} if there is a derivation starting from a
 | |
| 200 | non-terminal, say \meta{NT} which leads to a string which again starts
 | |
| 201 | with \meta{NT}. This means a derivation of the form.
 | |
| 173 
7cfb7a6f7c99
added slides
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: diff
changeset | 202 | |
| 175 
5801e8c0e528
updated
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
173diff
changeset | 203 | \begin{center}
 | 
| 665 | 204 | $\meta{NT} \rightarrow \ldots \rightarrow \meta{NT} \cdot \ldots$
 | 
| 175 
5801e8c0e528
updated
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
173diff
changeset | 205 | \end{center}
 | 
| 
5801e8c0e528
updated
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
173diff
changeset | 206 | |
| 680 | 207 | \noindent It can be easily seen that the grammar above for arithmetic | 
| 208 | expressions is left-recursive: for example the rules $\meta{E}
 | |
| 209 | \rightarrow \meta{E}\cdot + \cdot \meta{E}$ and $\meta{N} \rightarrow
 | |
| 210 | \meta{N}\cdot \meta{N}$ show that this grammar is left-recursive. But
 | |
| 211 | note that left-recursiveness can involve more than one step in the | |
| 212 | derivation. The problem with left-recursive grammars is that some | |
| 213 | algorithms cannot cope with them: with left-recursive grammars they will | |
| 214 | fall into a loop. Fortunately every left-recursive grammar can be | |
| 215 | transformed into one that is not left-recursive, although this | |
| 216 | transformation might make the grammar less ``human-readable''. For | |
| 217 | example if we want to give a non-left-recursive grammar for numbers we | |
| 218 | might specify | |
| 175 
5801e8c0e528
updated
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
173diff
changeset | 219 | |
| 
5801e8c0e528
updated
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
173diff
changeset | 220 | \begin{center}
 | 
| 665 | 221 | $\meta{N} \;\;\rightarrow\;\; 0\;|\;\ldots\;|\;9\;|\;
 | 
| 222 | 1\cdot \meta{N}\;|\;2\cdot \meta{N}\;|\;\ldots\;|\;9\cdot \meta{N}$
 | |
| 175 
5801e8c0e528
updated
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
173diff
changeset | 223 | \end{center}
 | 
| 
5801e8c0e528
updated
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
173diff
changeset | 224 | |
| 362 
57ea439feaff
updated
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
360diff
changeset | 225 | \noindent Using this grammar we can still derive every number | 
| 
57ea439feaff
updated
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
360diff
changeset | 226 | string, but we will never be able to derive a string of the | 
| 665 | 227 | form $\meta{N} \to \ldots \to \meta{N} \cdot \ldots$.
 | 
| 175 
5801e8c0e528
updated
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
173diff
changeset | 228 | |
| 362 
57ea439feaff
updated
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
360diff
changeset | 229 | The other property we have to watch out for is when a grammar | 
| 
57ea439feaff
updated
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
360diff
changeset | 230 | is \emph{ambiguous}. A grammar is said to be ambiguous if
 | 
| 
57ea439feaff
updated
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
360diff
changeset | 231 | there are two parse-trees for one string. Again the grammar | 
| 
57ea439feaff
updated
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
360diff
changeset | 232 | for arithmetic expressions shown above is ambiguous. While the | 
| 
57ea439feaff
updated
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
360diff
changeset | 233 | shown parse tree for the string $(1 + 23) + 4$ is unique, this | 
| 
57ea439feaff
updated
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
360diff
changeset | 234 | is not the case in general. For example there are two parse | 
| 175 
5801e8c0e528
updated
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
173diff
changeset | 235 | trees for the string $1 + 2 + 3$, namely | 
| 
5801e8c0e528
updated
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
173diff
changeset | 236 | |
| 
5801e8c0e528
updated
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
173diff
changeset | 237 | \begin{center}
 | 
| 
5801e8c0e528
updated
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
173diff
changeset | 238 | \begin{tabular}{c@{\hspace{10mm}}c}
 | 
| 
5801e8c0e528
updated
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
173diff
changeset | 239 | \begin{tikzpicture}[level distance=8mm, black]
 | 
| 665 | 240 |   \node {\meta{E} }
 | 
| 241 |     child {node {\meta{E} } child {node {\meta{N} } child {node {$1$}}}}
 | |
| 175 
5801e8c0e528
updated
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
173diff
changeset | 242 |     child {node {$+$}}
 | 
| 665 | 243 |     child {node {\meta{E} }
 | 
| 244 |        child {node {\meta{E} } child {node {\meta{N} } child {node {$2$}}}}
 | |
| 175 
5801e8c0e528
updated
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
173diff
changeset | 245 |        child {node {$+$}}
 | 
| 665 | 246 |        child {node {\meta{E} } child {node {\meta{N} } child {node {$3$}}}}
 | 
| 175 
5801e8c0e528
updated
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
173diff
changeset | 247 | } | 
| 
5801e8c0e528
updated
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
173diff
changeset | 248 | ; | 
| 
5801e8c0e528
updated
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
173diff
changeset | 249 | \end{tikzpicture} 
 | 
| 
5801e8c0e528
updated
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
173diff
changeset | 250 | & | 
| 
5801e8c0e528
updated
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
173diff
changeset | 251 | \begin{tikzpicture}[level distance=8mm, black]
 | 
| 665 | 252 |   \node {\meta{E} }
 | 
| 253 |     child {node {\meta{E} }
 | |
| 254 |        child {node {\meta{E} } child {node {\meta{N} } child {node {$1$}}}}
 | |
| 175 
5801e8c0e528
updated
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
173diff
changeset | 255 |        child {node {$+$}}
 | 
| 665 | 256 |        child {node {\meta{E} } child {node {\meta{N} } child {node {$2$}}}} 
 | 
| 175 
5801e8c0e528
updated
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
173diff
changeset | 257 | } | 
| 
5801e8c0e528
updated
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
173diff
changeset | 258 |     child {node {$+$}}
 | 
| 665 | 259 |     child {node {\meta{E} } child {node {\meta{N} } child {node {$3$}}}}
 | 
| 175 
5801e8c0e528
updated
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
173diff
changeset | 260 | ; | 
| 
5801e8c0e528
updated
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
173diff
changeset | 261 | \end{tikzpicture}
 | 
| 
5801e8c0e528
updated
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
173diff
changeset | 262 | \end{tabular} 
 | 
| 
5801e8c0e528
updated
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
173diff
changeset | 263 | \end{center}
 | 
| 
5801e8c0e528
updated
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
173diff
changeset | 264 | |
| 680 | 265 | \noindent In particular in programming languages we will try to avoid | 
| 266 | ambiguous grammars because two different parse-trees for a string mean a | |
| 267 | program can be interpreted in two different ways. In such cases we have | |
| 268 | to somehow make sure the two different ways do not matter, or | |
| 269 | disambiguate the grammar in some other way (for example making the $+$ | |
| 270 | left-associative). Unfortunately already the problem of deciding whether | |
| 271 | a grammar is ambiguous or not is in general undecidable. But in simple | |
| 272 | instance (the ones we deal with in this module) one can usually see when | |
| 273 | a grammar is ambiguous. | |
| 274 | ||
| 275 | \subsection*{Removing Left-Recursion}
 | |
| 276 | ||
| 277 | Let us come back to the problem of left-recursion and consider the | |
| 278 | following grammar for binary numbers: | |
| 279 | ||
| 280 | \begin{plstx}[margin=1cm]
 | |
| 281 | : \meta{B} ::= \meta{B} \cdot \meta{B} | 0 | 1\\
 | |
| 282 | \end{plstx}
 | |
| 283 | ||
| 284 | \noindent | |
| 285 | It is clear that this grammar can create all binary numbers, but | |
| 286 | it is also clear that this grammar is left-recursive. Giving this | |
| 287 | grammar as is to parser combinators will result in an infinite | |
| 288 | loop. Fortunately, every left-recursive grammar can be translated | |
| 289 | into one that is not left-recursive with the help of some | |
| 290 | transformation rules. Suppose we identified the ``offensive'' | |
| 291 | rule, then we can separate the grammar into this offensive rule | |
| 292 | and the ``rest'': | |
| 293 | ||
| 294 | \begin{plstx}[margin=1cm]
 | |
| 295 |   : \meta{B} ::= \underbrace{\meta{B} \cdot \meta{B}}_{\textit{lft-rec}} 
 | |
| 296 |   | \underbrace{0 \;\;|\;\; 1}_{\textit{rest}}\\
 | |
| 297 | \end{plstx}
 | |
| 298 | ||
| 299 | \noindent | |
| 300 | To make the idea of the transformation clearer, suppose the left-recursive | |
| 301 | rule is of the form $\meta{B}\alpha$ (the left-recursive non-terminal 
 | |
| 302 | followed by something called $\alpha$) and the ``rest'' is called $\beta$. | |
| 303 | That means our grammar looks schematically as follows | |
| 304 | ||
| 305 | \begin{plstx}[margin=1cm]
 | |
| 306 |   : \meta{B} ::= \meta{B} \cdot \alpha | \beta\\
 | |
| 307 | \end{plstx}
 | |
| 308 | ||
| 309 | \noindent | |
| 310 | To get rid of the left-recursion, we are required to introduce | |
| 311 | a new non-terminal, say $\meta{B'}$ and transform the rule
 | |
| 312 | as follows: | |
| 313 | ||
| 314 | \begin{plstx}[margin=1cm]
 | |
| 315 |   : \meta{B} ::= \beta \cdot \meta{B'}\\
 | |
| 316 |   : \meta{B'} ::= \alpha \cdot \meta{B'} | \epsilon\\
 | |
| 317 | \end{plstx}
 | |
| 318 | ||
| 319 | \noindent | |
| 320 | In our example of binary numbers we would after the transformation | |
| 321 | end up with the rules | |
| 322 | ||
| 323 | \begin{plstx}[margin=1cm]
 | |
| 324 |   : \meta{B} ::= 0 \cdot \meta{B'} | 1 \cdot \meta{B'}\\
 | |
| 325 |   : \meta{B'} ::= \meta{B} \cdot \meta{B'} | \epsilon\\
 | |
| 326 | \end{plstx}
 | |
| 327 | ||
| 328 | \noindent | |
| 329 | A little thought should convince you that this grammar still derives | |
| 330 | all the binary numbers (for example 0 and 1 are derivable because $\meta{B'}$
 | |
| 331 | can be $\epsilon$). Less clear might be why this grammar is non-left recursive. | |
| 332 | For $\meta{B'}$ it is relatively clear because we will never be 
 | |
| 333 | able to derive things like | |
| 334 | ||
| 335 | \begin{center}
 | |
| 336 | $\meta{B'} \rightarrow\ldots\rightarrow \meta{B'}\cdot\ldots$
 | |
| 337 | \end{center}  
 | |
| 338 | ||
| 339 | \noindent | |
| 340 | because there will always be a $\meta{B}$ in front of a $\meta{B'}$, and
 | |
| 341 | $\meta{B}$ now has always a $0$ or $1$ in front, so a $\meta{B'}$ can
 | |
| 342 | never be in the first place. The reasoning is similar for $\meta{B}$:
 | |
| 343 | the $0$ and $1$ in the rule for $\meta{B}$ ``protect'' it from becoming
 | |
| 344 | left-recursive. This transformation does not mean the grammar is the | |
| 345 | simplest left-recursive grammar for binary numbers. For example the | |
| 346 | following grammar would do as well | |
| 347 | ||
| 348 | \begin{plstx}[margin=1cm]
 | |
| 349 |   : \meta{B} ::= 0 \cdot \meta{B} | 1 \cdot \meta{B} | 0 | 1\\
 | |
| 350 | \end{plstx}
 | |
| 351 | ||
| 352 | \noindent | |
| 353 | The point is that we can in principle transform every left-recursive | |
| 354 | grammar into one that is non-left-recursive one. This explains why often | |
| 355 | the following grammar is used for arithmetic expressions: | |
| 356 | ||
| 357 | \begin{plstx}[margin=1cm]
 | |
| 358 |   : \meta{E} ::= \meta{T} | \meta{T} \cdot + \cdot \meta{E} |  \meta{T} \cdot - \cdot \meta{E}\\
 | |
| 359 |   : \meta{T} ::= \meta{F} | \meta{F} \cdot * \cdot \meta{T}\\
 | |
| 360 |   : \meta{F} ::= num\_token | ( \cdot \meta{E} \cdot )\\
 | |
| 361 | \end{plstx}
 | |
| 176 
3c2653fc8b5a
updated
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
175diff
changeset | 362 | |
| 680 | 363 | \noindent | 
| 364 | In this grammar all $\meta{E}$xpressions, $\meta{T}$erms and $\meta{F}$actors
 | |
| 365 | are in some way protected from being left-recusive. For example if you | |
| 366 | start $\meta{E}$ you can derive another one by going through $\meta{T}$, then 
 | |
| 367 | $\meta{F}$, but then $\meta{E}$ is protected by the open-parenthesis.
 | |
| 368 | ||
| 369 | \subsection*{Removing $\epsilon$-Rules and CYK-Algorithm}
 | |
| 370 | ||
| 371 | I showed above that the non-left-recursive grammar for binary numbers is | |
| 372 | ||
| 373 | \begin{plstx}[margin=1cm]
 | |
| 374 |   : \meta{B} ::= 0 \cdot \meta{B'} | 1 \cdot \meta{B'}\\
 | |
| 375 |   : \meta{B'} ::= \meta{B} \cdot \meta{B'} | \epsilon\\
 | |
| 376 | \end{plstx}
 | |
| 377 | ||
| 378 | \noindent | |
| 379 | The transformation made the original grammar non-left-recursive, but at | |
| 380 | the expense of introducing an $\epsilon$ in the second rule. Having an | |
| 381 | explicit $\epsilon$-rule is annoying to, not in terms of looping, but in | |
| 382 | terms of efficiency. The reason is that the $\epsilon$-rule always | |
| 383 | applies but since it recognises the empty string, it does not make any | |
| 384 | progress with recognising a string. Better are rules like $( \cdot | |
| 385 | \meta{E} \cdot )$ where something of the input is consumed. Getting
 | |
| 386 | rid of $\epsilon$-rules is also important for the CYK parsing algorithm, | |
| 387 | which can give us an insight into the complexity class of parsing. | |
| 388 | ||
| 389 | It turns out we can also by some generic transformations eliminate | |
| 390 | $\epsilon$-rules from grammars. Consider again the grammar above for | |
| 391 | binary numbers where have a rule $\meta{B'} ::= \epsilon$. In this case
 | |
| 392 | we look for rules of the (generic) form \mbox{$\meta{A} :=
 | |
| 393 | \alpha\cdot\meta{B'}\cdot\beta$}. That is there are rules that use
 | |
| 394 | $\meta{B'}$ and something ($\alpha$) is in front of $\meta{B'}$ and
 | |
| 395 | something follows ($\beta$). Such rules need to be replaced by | |
| 396 | additional rules of the form \mbox{$\meta{A} := \alpha\cdot\beta$}.
 | |
| 397 | In our running example there are the two rules for $\meta{B}$ which
 | |
| 398 | fall into this category | |
| 399 | ||
| 400 | \begin{plstx}[margin=1cm]
 | |
| 401 |   : \meta{B} ::= 0 \cdot \meta{B'} | 1 \cdot \meta{B'}\\
 | |
| 402 | \end{plstx} 
 | |
| 403 | ||
| 404 | \noindent To follow the general scheme of the transfromation, | |
| 405 | the $\alpha$ is either is either $0$ or $1$, and the $\beta$ happens | |
| 406 | to be empty. SO we need to generate new rules for the form | |
| 407 | \mbox{$\meta{A} := \alpha\cdot\beta$}, which in our particular 
 | |
| 408 | example means we obtain | |
| 409 | ||
| 410 | \begin{plstx}[margin=1cm]
 | |
| 411 |   : \meta{B} ::= 0 \cdot \meta{B'} | 1 \cdot \meta{B'} | 0 | 1\\
 | |
| 412 | \end{plstx} 
 | |
| 413 | ||
| 414 | \noindent | |
| 415 | Unfortunately $\meta{B'}$ is also used in the rule 
 | |
| 416 | ||
| 417 | \begin{plstx}[margin=1cm]
 | |
| 418 |   : \meta{B'} ::= \meta{B} \cdot \meta{B'}\\
 | |
| 419 | \end{plstx}
 | |
| 420 | ||
| 421 | \noindent | |
| 422 | For this we repeat the transformation, giving | |
| 423 | ||
| 424 | \begin{plstx}[margin=1cm]
 | |
| 425 |   : \meta{B'} ::= \meta{B} \cdot \meta{B'} | \meta{B}\\
 | |
| 426 | \end{plstx}
 | |
| 427 | ||
| 428 | \noindent | |
| 429 | In this case $\alpha$ was substituted with $\meta{B}$ and $\beta$
 | |
| 430 | was again empty. Once no rule is left over, we can simply throw | |
| 431 | away the $\epsilon$ rule. This gives the grammar | |
| 432 | ||
| 433 | \begin{plstx}[margin=1cm]
 | |
| 434 |   : \meta{B} ::= 0 \cdot \meta{B'} | 1 \cdot \meta{B'} | 0 | 1\\
 | |
| 435 |   : \meta{B'} ::= \meta{B} \cdot \meta{B'} | \meta{B}\\
 | |
| 436 | \end{plstx}
 | |
| 437 | ||
| 438 | \noindent | |
| 439 | I let you think about whether this grammar can still recognise all | |
| 440 | binary numbers and whether this grammar is non-left-recursive. The | |
| 441 | precise statement for the transformation of removing $\epsilon$-rules is | |
| 442 | that if the original grammar was able to recognise only non-empty | |
| 443 | strings, then the transformed grammar will be equivalent (matching the | |
| 444 | same set of strings); if the original grammar was able to match the | |
| 445 | empty string, then the transformed grammar will be able to match the | |
| 446 | same strings, \emph{except} the empty string. So the  $\epsilon$-removal
 | |
| 447 | does not preserve equivalence of grammars, but the small defect with the | |
| 448 | empty string is not important for practical purposes. | |
| 449 | ||
| 450 | So why are these transformations all useful? Well apart from making the | |
| 451 | parser combinators work (remember they cannot deal with left-recursion and | |
| 452 | are inefficient with $\epsilon$-rules), a second reason is that they help | |
| 453 | with getting any insight into the complexity of the parsing problem. | |
| 454 | The parser combinators are very easy to implement, but are far from the | |
| 455 | most efficient way of processing input (they can blow up exponentially | |
| 456 | with ambiguous grammars). The question remains what is the best possible | |
| 457 | complexity for parsing? It turns out that this is $O(n^3)$ for context-free | |
| 458 | languages. | |
| 459 | ||
| 460 | To answer the question about complexity, let me describe next the CYK | |
| 461 | algorithm (named after the authors Cocke–Younger–Kasami). This algorithm | |
| 681 | 462 | works with grammars that are in \emph{Chomsky normalform}. In Chomsky
 | 
| 463 | normalform all rules must be of the form $\meta{A} ::= a$, where $a$ is 
 | |
| 464 | a terminal, or $\meta{A} ::= \meta{B}\cdot \meta{C}$, where $\meta{B}$ and
 | |
| 465 | $\meta{B}$ need to be non-terminals. And no rule can contain $\epsilon$.
 | |
| 466 | The following grammar is in Chomsky normalform: | |
| 467 | ||
| 468 | \begin{plstx}[margin=1cm]
 | |
| 682 | 469 |   : \meta{S} ::= \meta{N}\cdot \meta{P}\\
 | 
| 470 |   : \meta{P} ::= \meta{V}\cdot \meta{N}\\
 | |
| 471 |   : \meta{N} ::= \meta{N}\cdot \meta{N}\\
 | |
| 472 |   : \meta{N} ::= \meta{A}\cdot \meta{N}\\
 | |
| 473 |   : \meta{N} ::= \texttt{student} | \texttt{trainer} | \texttt{team} 
 | |
| 474 |                 | \texttt{trains}\\
 | |
| 475 |   : \meta{V} ::= \texttt{trains} | \texttt{team}\\
 | |
| 476 |   : \meta{A} ::= \texttt{The} | \texttt{the}\\
 | |
| 681 | 477 | \end{plstx}
 | 
| 478 | ||
| 479 | \noindent | |
| 480 | where $\meta{S}$ is the start symbol and $\meta{S}$, $\meta{P}$,
 | |
| 481 | $\meta{N}$, $\meta{V}$ and $\meta{A}$ are non-terminals. The ``words''
 | |
| 482 | are terminals. The rough idea behind this grammar is that $\meta{S}$
 | |
| 483 | stands for a sentence, $\meta{P}$ is a predicate, $\meta{N}$ is a noun
 | |
| 484 | and so on. For example the rule \mbox{$\meta{P} ::= \meta{V}\cdot
 | |
| 485 | \meta{N}$} states that a predicate can be a verb followed by a noun.
 | |
| 486 | Now the question is whether the string | |
| 487 | ||
| 488 | \begin{center}
 | |
| 489 |   \texttt{The trainer trains the student team}
 | |
| 490 | \end{center}
 | |
| 491 | ||
| 492 | \noindent | |
| 493 | is recognised by the grammar. The CYK algorithm starts with the | |
| 494 | following triangular data structure. | |
| 680 | 495 | |
| 682 | 496 | \begin{figure}[t]
 | 
| 497 | \begin{center}
 | |
| 498 |   \begin{tikzpicture}[scale=0.8,line width=0.8mm]
 | |
| 499 | \draw (-2,0) -- (4,0); | |
| 500 | \draw (-2,1) -- (4,1); | |
| 501 | \draw (-2,2) -- (3,2); | |
| 502 | \draw (-2,3) -- (2,3); | |
| 503 | \draw (-2,4) -- (1,4); | |
| 504 | \draw (-2,5) -- (0,5); | |
| 505 | \draw (-2,6) -- (-1,6); | |
| 506 | ||
| 507 | \draw (0,0) -- (0, 5); | |
| 508 | \draw (1,0) -- (1, 4); | |
| 509 | \draw (2,0) -- (2, 3); | |
| 510 | \draw (3,0) -- (3, 2); | |
| 511 | \draw (4,0) -- (4, 1); | |
| 512 | \draw (-1,0) -- (-1, 6); | |
| 513 | \draw (-2,0) -- (-2, 6); | |
| 514 | ||
| 515 |   \draw (-1.5,-0.5) node {\footnotesize{}\texttt{The}}; 
 | |
| 516 |   \draw (-0.5,-1.0) node {\footnotesize{}\texttt{trainer}}; 
 | |
| 517 |   \draw ( 0.5,-0.5) node {\footnotesize{}\texttt{trains}}; 
 | |
| 518 |   \draw ( 1.5,-1.0) node {\footnotesize{}\texttt{the}}; 
 | |
| 519 |   \draw ( 2.5,-0.5) node {\footnotesize{}\texttt{student}}; 
 | |
| 520 |   \draw ( 3.5,-1.0) node {\footnotesize{}\texttt{team}};
 | |
| 521 | ||
| 522 |   \draw (-1.5,0.5) node {$A$}; 
 | |
| 523 |   \draw (-0.5,0.5) node {$N$}; 
 | |
| 524 |   \draw ( 0.5,0.5) node {$N,V$}; 
 | |
| 525 |   \draw ( 1.5,0.5) node {$A$}; 
 | |
| 526 |   \draw ( 2.5,0.5) node {$N$}; 
 | |
| 527 |   \draw ( 3.5,0.5) node {$N,V$};
 | |
| 528 | ||
| 529 |   \draw (-2.4, 5.5) node {$1$}; 
 | |
| 530 |   \draw (-2.4, 4.5) node {$2$}; 
 | |
| 531 |   \draw (-2.4, 3.5) node {$3$}; 
 | |
| 532 |   \draw (-2.4, 2.5) node {$4$}; 
 | |
| 533 |   \draw (-2.4, 1.5) node {$5$}; 
 | |
| 534 |   \draw (-2.4, 0.5) node {$6$}; 
 | |
| 535 |   \end{tikzpicture}
 | |
| 536 |   \end{center}
 | |
| 537 | \end{figure}
 | |
| 680 | 538 | |
| 539 | \end{document}
 | |
| 540 | ||
| 541 | ||
| 542 | %%% Parser combinators are now part of handout 6 | |
| 459 | 543 | |
| 544 | \subsection*{Parser Combinators}
 | |
| 545 | ||
| 362 
57ea439feaff
updated
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
360diff
changeset | 546 | Let us now turn to the problem of generating a parse-tree for | 
| 
57ea439feaff
updated
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
360diff
changeset | 547 | a grammar and string. In what follows we explain \emph{parser
 | 
| 
57ea439feaff
updated
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
360diff
changeset | 548 | combinators}, because they are easy to implement and closely | 
| 
57ea439feaff
updated
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
360diff
changeset | 549 | resemble grammar rules. Imagine that a grammar describes the | 
| 665 | 550 | strings of natural numbers, such as the grammar \meta{N}  shown
 | 
| 362 
57ea439feaff
updated
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
360diff
changeset | 551 | above. For all such strings we want to generate the | 
| 
57ea439feaff
updated
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
360diff
changeset | 552 | parse-trees or later on we actually want to extract the | 
| 
57ea439feaff
updated
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
360diff
changeset | 553 | meaning of these strings, that is the concrete integers | 
| 
57ea439feaff
updated
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
360diff
changeset | 554 | ``behind'' these strings. In Scala the parser combinators will | 
| 
57ea439feaff
updated
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
360diff
changeset | 555 | be functions of type | 
| 176 
3c2653fc8b5a
updated
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
175diff
changeset | 556 | |
| 
3c2653fc8b5a
updated
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
175diff
changeset | 557 | \begin{center}
 | 
| 177 
53def1fbf472
updated
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
176diff
changeset | 558 | \texttt{I $\Rightarrow$ Set[(T, I)]}
 | 
| 176 
3c2653fc8b5a
updated
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
175diff
changeset | 559 | \end{center}
 | 
| 
3c2653fc8b5a
updated
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
175diff
changeset | 560 | |
| 362 
57ea439feaff
updated
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
360diff
changeset | 561 | \noindent that is they take as input something of type | 
| 
57ea439feaff
updated
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
360diff
changeset | 562 | \texttt{I}, typically a list of tokens or a string, and return
 | 
| 
57ea439feaff
updated
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
360diff
changeset | 563 | a set of pairs. The first component of these pairs corresponds | 
| 
57ea439feaff
updated
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
360diff
changeset | 564 | to what the parser combinator was able to process from the | 
| 
57ea439feaff
updated
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
360diff
changeset | 565 | input and the second is the unprocessed part of the input. As | 
| 
57ea439feaff
updated
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
360diff
changeset | 566 | we shall see shortly, a parser combinator might return more | 
| 
57ea439feaff
updated
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
360diff
changeset | 567 | than one such pair, with the idea that there are potentially | 
| 
57ea439feaff
updated
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
360diff
changeset | 568 | several ways how to interpret the input. As a concrete | 
| 
57ea439feaff
updated
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
360diff
changeset | 569 | example, consider the case where the input is of type string, | 
| 
57ea439feaff
updated
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
360diff
changeset | 570 | say the string | 
| 183 
b17eff695c7f
added new stuff
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
177diff
changeset | 571 | |
| 
b17eff695c7f
added new stuff
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
177diff
changeset | 572 | \begin{center}
 | 
| 
b17eff695c7f
added new stuff
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
177diff
changeset | 573 | \tt\Grid{iffoo\VS testbar}
 | 
| 
b17eff695c7f
added new stuff
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
177diff
changeset | 574 | \end{center}
 | 
| 
b17eff695c7f
added new stuff
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
177diff
changeset | 575 | |
| 362 
57ea439feaff
updated
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
360diff
changeset | 576 | \noindent We might have a parser combinator which tries to | 
| 
57ea439feaff
updated
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
360diff
changeset | 577 | interpret this string as a keyword (\texttt{if}) or an
 | 
| 
57ea439feaff
updated
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
360diff
changeset | 578 | identifier (\texttt{iffoo}). Then the output will be the set
 | 
| 177 
53def1fbf472
updated
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
176diff
changeset | 579 | |
| 183 
b17eff695c7f
added new stuff
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
177diff
changeset | 580 | \begin{center}
 | 
| 
b17eff695c7f
added new stuff
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
177diff
changeset | 581 | $\left\{ \left(\texttt{\Grid{if}}\,,\, \texttt{\Grid{foo\VS testbar}}\right), 
 | 
| 
b17eff695c7f
added new stuff
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
177diff
changeset | 582 |            \left(\texttt{\Grid{iffoo}}\,,\, \texttt{\Grid{\VS testbar}}\right) \right\}$
 | 
| 
b17eff695c7f
added new stuff
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
177diff
changeset | 583 | \end{center}
 | 
| 
b17eff695c7f
added new stuff
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
177diff
changeset | 584 | |
| 362 
57ea439feaff
updated
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
360diff
changeset | 585 | \noindent where the first pair means the parser could | 
| 
57ea439feaff
updated
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
360diff
changeset | 586 | recognise \texttt{if} from the input and leaves the rest as
 | 
| 
57ea439feaff
updated
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
360diff
changeset | 587 | `unprocessed' as the second component of the pair; in the | 
| 
57ea439feaff
updated
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
360diff
changeset | 588 | other case it could recognise \texttt{iffoo} and leaves
 | 
| 
57ea439feaff
updated
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
360diff
changeset | 589 | \texttt{\VS testbar} as unprocessed. If the parser cannot
 | 
| 
57ea439feaff
updated
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
360diff
changeset | 590 | recognise anything from the input then parser combinators just | 
| 
57ea439feaff
updated
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
360diff
changeset | 591 | return the empty set $\{\}$. This will indicate
 | 
| 
57ea439feaff
updated
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
360diff
changeset | 592 | something ``went wrong''. | 
| 183 
b17eff695c7f
added new stuff
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
177diff
changeset | 593 | |
| 
b17eff695c7f
added new stuff
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
177diff
changeset | 594 | The main attraction is that we can easily build parser combinators out of smaller components | 
| 
b17eff695c7f
added new stuff
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
177diff
changeset | 595 | following very closely the structure of a grammar. In order to implement this in an object | 
| 
b17eff695c7f
added new stuff
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
177diff
changeset | 596 | oriented programming language, like Scala, we need to specify an abstract class for parser | 
| 
b17eff695c7f
added new stuff
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
177diff
changeset | 597 | combinators. This abstract class requires the implementation of the function | 
| 177 
53def1fbf472
updated
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
176diff
changeset | 598 | \texttt{parse} taking an argument of type \texttt{I} and returns a set of type  
 | 
| 
53def1fbf472
updated
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
176diff
changeset | 599 | \mbox{\texttt{Set[(T, I)]}}.
 | 
| 
53def1fbf472
updated
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
176diff
changeset | 600 | |
| 
53def1fbf472
updated
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
176diff
changeset | 601 | \begin{center}
 | 
| 
53def1fbf472
updated
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
176diff
changeset | 602 | \begin{lstlisting}[language=Scala,basicstyle=\small\ttfamily, numbers=none]
 | 
| 
53def1fbf472
updated
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
176diff
changeset | 603 | abstract class Parser[I, T] {
 | 
| 
53def1fbf472
updated
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
176diff
changeset | 604 | def parse(ts: I): Set[(T, I)] | 
| 
53def1fbf472
updated
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
176diff
changeset | 605 | |
| 
53def1fbf472
updated
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
176diff
changeset | 606 | def parse_all(ts: I): Set[T] = | 
| 
53def1fbf472
updated
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
176diff
changeset | 607 | for ((head, tail) <- parse(ts); if (tail.isEmpty)) | 
| 
53def1fbf472
updated
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
176diff
changeset | 608 | yield head | 
| 
53def1fbf472
updated
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
176diff
changeset | 609 | } | 
| 
53def1fbf472
updated
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
176diff
changeset | 610 | \end{lstlisting}
 | 
| 
53def1fbf472
updated
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
176diff
changeset | 611 | \end{center}
 | 
| 176 
3c2653fc8b5a
updated
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
175diff
changeset | 612 | |
| 177 
53def1fbf472
updated
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
176diff
changeset | 613 | \noindent | 
| 183 
b17eff695c7f
added new stuff
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
177diff
changeset | 614 | From the function \texttt{parse} we can then ``centrally'' derive the function \texttt{parse\_all},
 | 
| 
b17eff695c7f
added new stuff
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
177diff
changeset | 615 | which just filters out all pairs whose second component is not empty (that is has still some | 
| 
b17eff695c7f
added new stuff
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
177diff
changeset | 616 | unprocessed part). The reason is that at the end of parsing we are only interested in the | 
| 
b17eff695c7f
added new stuff
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
177diff
changeset | 617 | results where all the input has been consumed and no unprocessed part is left. | 
| 
b17eff695c7f
added new stuff
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
177diff
changeset | 618 | |
| 177 
53def1fbf472
updated
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
176diff
changeset | 619 | One of the simplest parser combinators recognises just a character, say $c$, | 
| 
53def1fbf472
updated
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
176diff
changeset | 620 | from the beginning of strings. Its behaviour is as follows: | 
| 176 
3c2653fc8b5a
updated
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
175diff
changeset | 621 | |
| 177 
53def1fbf472
updated
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
176diff
changeset | 622 | \begin{itemize}
 | 
| 
53def1fbf472
updated
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
176diff
changeset | 623 | \item if the head of the input string starts with a $c$, it returns | 
| 
53def1fbf472
updated
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
176diff
changeset | 624 | 	the set $\{(c, \textit{tail of}\; s)\}$
 | 
| 
53def1fbf472
updated
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
176diff
changeset | 625 | \item otherwise it returns the empty set $\varnothing$ | 
| 
53def1fbf472
updated
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
176diff
changeset | 626 | \end{itemize}
 | 
| 
53def1fbf472
updated
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
176diff
changeset | 627 | |
| 
53def1fbf472
updated
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
176diff
changeset | 628 | \noindent | 
| 
53def1fbf472
updated
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
176diff
changeset | 629 | The input type of this simple parser combinator for characters is | 
| 
53def1fbf472
updated
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
176diff
changeset | 630 | \texttt{String} and the output type \mbox{\texttt{Set[(Char, String)]}}. 
 | 
| 
53def1fbf472
updated
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
176diff
changeset | 631 | The code in Scala is as follows: | 
| 
53def1fbf472
updated
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
176diff
changeset | 632 | |
| 
53def1fbf472
updated
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
176diff
changeset | 633 | \begin{center}
 | 
| 
53def1fbf472
updated
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
176diff
changeset | 634 | \begin{lstlisting}[language=Scala,basicstyle=\small\ttfamily, numbers=none]
 | 
| 
53def1fbf472
updated
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
176diff
changeset | 635 | case class CharParser(c: Char) extends Parser[String, Char] {
 | 
| 
53def1fbf472
updated
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
176diff
changeset | 636 | def parse(sb: String) = | 
| 
53def1fbf472
updated
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
176diff
changeset | 637 | if (sb.head == c) Set((c, sb.tail)) else Set() | 
| 
53def1fbf472
updated
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
176diff
changeset | 638 | } | 
| 
53def1fbf472
updated
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
176diff
changeset | 639 | \end{lstlisting}
 | 
| 
53def1fbf472
updated
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
176diff
changeset | 640 | \end{center}
 | 
| 176 
3c2653fc8b5a
updated
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
175diff
changeset | 641 | |
| 183 
b17eff695c7f
added new stuff
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
177diff
changeset | 642 | \noindent | 
| 
b17eff695c7f
added new stuff
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
177diff
changeset | 643 | The \texttt{parse} function tests whether the first character of the 
 | 
| 
b17eff695c7f
added new stuff
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
177diff
changeset | 644 | input string \texttt{sb} is equal to \texttt{c}. If yes, then it splits the
 | 
| 
b17eff695c7f
added new stuff
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
177diff
changeset | 645 | string into the recognised part \texttt{c} and the unprocessed part
 | 
| 
b17eff695c7f
added new stuff
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
177diff
changeset | 646 | \texttt{sb.tail}. In case \texttt{sb} does not start with \texttt{c} then
 | 
| 
b17eff695c7f
added new stuff
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
177diff
changeset | 647 | the parser returns the empty set (in Scala \texttt{Set()}).
 | 
| 
b17eff695c7f
added new stuff
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
177diff
changeset | 648 | |
| 
b17eff695c7f
added new stuff
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
177diff
changeset | 649 | More interesting are the parser combinators that build larger parsers | 
| 
b17eff695c7f
added new stuff
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
177diff
changeset | 650 | out of smaller component parsers. For example the alternative | 
| 
b17eff695c7f
added new stuff
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
177diff
changeset | 651 | parser combinator is as follows. | 
| 
b17eff695c7f
added new stuff
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
177diff
changeset | 652 | |
| 
b17eff695c7f
added new stuff
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
177diff
changeset | 653 | \begin{center}
 | 
| 
b17eff695c7f
added new stuff
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
177diff
changeset | 654 | \begin{lstlisting}[language=Scala,basicstyle=\small\ttfamily, numbers=none]
 | 
| 
b17eff695c7f
added new stuff
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
177diff
changeset | 655 | class AltParser[I, T] | 
| 
b17eff695c7f
added new stuff
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
177diff
changeset | 656 | (p: => Parser[I, T], | 
| 
b17eff695c7f
added new stuff
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
177diff
changeset | 657 |         q: => Parser[I, T]) extends Parser[I, T] {
 | 
| 
b17eff695c7f
added new stuff
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
177diff
changeset | 658 | def parse(sb: I) = p.parse(sb) ++ q.parse(sb) | 
| 
b17eff695c7f
added new stuff
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
177diff
changeset | 659 | } | 
| 
b17eff695c7f
added new stuff
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
177diff
changeset | 660 | \end{lstlisting}
 | 
| 
b17eff695c7f
added new stuff
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
177diff
changeset | 661 | \end{center}
 | 
| 
b17eff695c7f
added new stuff
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
177diff
changeset | 662 | |
| 
b17eff695c7f
added new stuff
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
177diff
changeset | 663 | \noindent | 
| 
b17eff695c7f
added new stuff
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
177diff
changeset | 664 | The types of this parser combinator are polymorphic (we just have \texttt{I}
 | 
| 
b17eff695c7f
added new stuff
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
177diff
changeset | 665 | for the input type, and \texttt{T} for the output type). The alternative parser
 | 
| 
b17eff695c7f
added new stuff
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
177diff
changeset | 666 | builds a new parser out of two existing parser combinator \texttt{p} and \texttt{q}.
 | 
| 
b17eff695c7f
added new stuff
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
177diff
changeset | 667 | Both need to be able to process input of type \texttt{I} and return the same
 | 
| 
b17eff695c7f
added new stuff
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
177diff
changeset | 668 | output type \texttt{Set[(T, I)]}. (There is an interesting detail of Scala, namely the 
 | 
| 
b17eff695c7f
added new stuff
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
177diff
changeset | 669 | \texttt{=>} in front of the types of \texttt{p} and \texttt{q}. They will prevent the
 | 
| 
b17eff695c7f
added new stuff
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
177diff
changeset | 670 | evaluation of the arguments before they are used. This is often called | 
| 
b17eff695c7f
added new stuff
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
177diff
changeset | 671 | \emph{lazy evaluation} of the arguments.) The alternative parser should run
 | 
| 
b17eff695c7f
added new stuff
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
177diff
changeset | 672 | the input with the first parser \texttt{p} (producing a set of outputs) and then
 | 
| 
b17eff695c7f
added new stuff
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
177diff
changeset | 673 | run the same input with \texttt{q}. The result should be then just the union
 | 
| 
b17eff695c7f
added new stuff
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
177diff
changeset | 674 | of both sets, which is the operation \texttt{++} in Scala.
 | 
| 
b17eff695c7f
added new stuff
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
177diff
changeset | 675 | |
| 
b17eff695c7f
added new stuff
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
177diff
changeset | 676 | This parser combinator already allows us to construct a parser that either | 
| 
b17eff695c7f
added new stuff
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
177diff
changeset | 677 | a character \texttt{a} or \texttt{b}, as
 | 
| 
b17eff695c7f
added new stuff
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
177diff
changeset | 678 | |
| 
b17eff695c7f
added new stuff
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
177diff
changeset | 679 | \begin{center}
 | 
| 
b17eff695c7f
added new stuff
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
177diff
changeset | 680 | \begin{lstlisting}[language=Scala,basicstyle=\small\ttfamily, numbers=none]
 | 
| 
b17eff695c7f
added new stuff
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
177diff
changeset | 681 | new AltParser(CharParser('a'), CharParser('b'))
 | 
| 
b17eff695c7f
added new stuff
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
177diff
changeset | 682 | \end{lstlisting}
 | 
| 
b17eff695c7f
added new stuff
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
177diff
changeset | 683 | \end{center}
 | 
| 
b17eff695c7f
added new stuff
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
177diff
changeset | 684 | |
| 
b17eff695c7f
added new stuff
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
177diff
changeset | 685 | \noindent | 
| 
b17eff695c7f
added new stuff
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
177diff
changeset | 686 | Scala allows us to introduce some more readable shorthand notation for this, like \texttt{'a' || 'b'}. 
 | 
| 
b17eff695c7f
added new stuff
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
177diff
changeset | 687 | We can call this parser combinator with the strings | 
| 
b17eff695c7f
added new stuff
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
177diff
changeset | 688 | |
| 
b17eff695c7f
added new stuff
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
177diff
changeset | 689 | \begin{center}
 | 
| 
b17eff695c7f
added new stuff
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
177diff
changeset | 690 | \begin{tabular}{rcl}
 | 
| 
b17eff695c7f
added new stuff
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
177diff
changeset | 691 | input string & & output\medskip\\ | 
| 
b17eff695c7f
added new stuff
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
177diff
changeset | 692 | \texttt{\Grid{ac}} & $\rightarrow$ & $\left\{(\texttt{\Grid{a}}, \texttt{\Grid{c}})\right\}$\\
 | 
| 
b17eff695c7f
added new stuff
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
177diff
changeset | 693 | \texttt{\Grid{bc}} & $\rightarrow$ & $\left\{(\texttt{\Grid{b}}, \texttt{\Grid{c}})\right\}$\\
 | 
| 
b17eff695c7f
added new stuff
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
177diff
changeset | 694 | \texttt{\Grid{cc}} & $\rightarrow$ & $\varnothing$
 | 
| 
b17eff695c7f
added new stuff
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
177diff
changeset | 695 | \end{tabular}
 | 
| 
b17eff695c7f
added new stuff
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
177diff
changeset | 696 | \end{center}
 | 
| 
b17eff695c7f
added new stuff
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
177diff
changeset | 697 | |
| 
b17eff695c7f
added new stuff
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
177diff
changeset | 698 | \noindent | 
| 
b17eff695c7f
added new stuff
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
177diff
changeset | 699 | We receive in the first two cases a successful output (that is a non-empty set). | 
| 
b17eff695c7f
added new stuff
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
177diff
changeset | 700 | |
| 
b17eff695c7f
added new stuff
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
177diff
changeset | 701 | A bit more interesting is the \emph{sequence parser combinator} implemented in
 | 
| 
b17eff695c7f
added new stuff
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
177diff
changeset | 702 | Scala as follows: | 
| 
b17eff695c7f
added new stuff
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
177diff
changeset | 703 | |
| 
b17eff695c7f
added new stuff
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
177diff
changeset | 704 | \begin{center}
 | 
| 
b17eff695c7f
added new stuff
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
177diff
changeset | 705 | \begin{lstlisting}[language=Scala,basicstyle=\small\ttfamily, numbers=none]
 | 
| 
b17eff695c7f
added new stuff
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
177diff
changeset | 706 | class SeqParser[I, T, S] | 
| 
b17eff695c7f
added new stuff
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
177diff
changeset | 707 | (p: => Parser[I, T], | 
| 
b17eff695c7f
added new stuff
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
177diff
changeset | 708 |         q: => Parser[I, S]) extends Parser[I, (T, S)] {
 | 
| 
b17eff695c7f
added new stuff
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
177diff
changeset | 709 | def parse(sb: I) = | 
| 
b17eff695c7f
added new stuff
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
177diff
changeset | 710 | for ((head1, tail1) <- p.parse(sb); | 
| 
b17eff695c7f
added new stuff
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
177diff
changeset | 711 | (head2, tail2) <- q.parse(tail1)) | 
| 
b17eff695c7f
added new stuff
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
177diff
changeset | 712 | yield ((head1, head2), tail2) | 
| 
b17eff695c7f
added new stuff
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
177diff
changeset | 713 | } | 
| 
b17eff695c7f
added new stuff
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
177diff
changeset | 714 | \end{lstlisting}
 | 
| 
b17eff695c7f
added new stuff
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
177diff
changeset | 715 | \end{center}
 | 
| 
b17eff695c7f
added new stuff
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
177diff
changeset | 716 | |
| 
b17eff695c7f
added new stuff
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
177diff
changeset | 717 | \noindent | 
| 
b17eff695c7f
added new stuff
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
177diff
changeset | 718 | This parser takes as input two parsers, \texttt{p} and \texttt{q}. It implements \texttt{parse} 
 | 
| 
b17eff695c7f
added new stuff
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
177diff
changeset | 719 | as follows: let first run the parser \texttt{p} on the input producing a set of pairs (\texttt{head1}, \texttt{tail1}).
 | 
| 
b17eff695c7f
added new stuff
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
177diff
changeset | 720 | The \texttt{tail1} stands for the unprocessed parts left over by \texttt{p}. 
 | 
| 
b17eff695c7f
added new stuff
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
177diff
changeset | 721 | Let \texttt{q} run on these unprocessed parts
 | 
| 
b17eff695c7f
added new stuff
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
177diff
changeset | 722 | producing again a set of pairs. The output of the sequence parser combinator is then a set | 
| 
b17eff695c7f
added new stuff
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
177diff
changeset | 723 | containing pairs where the first components are again pairs, namely what the first parser could parse | 
| 
b17eff695c7f
added new stuff
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
177diff
changeset | 724 | together with what the second parser could parse; the second component is the unprocessed | 
| 
b17eff695c7f
added new stuff
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
177diff
changeset | 725 | part left over after running the second parser \texttt{q}. Therefore the input type of
 | 
| 
b17eff695c7f
added new stuff
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
177diff
changeset | 726 | the sequence parser combinator is as usual \texttt{I}, but the output type is
 | 
| 
b17eff695c7f
added new stuff
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
177diff
changeset | 727 | |
| 
b17eff695c7f
added new stuff
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
177diff
changeset | 728 | \begin{center}
 | 
| 
b17eff695c7f
added new stuff
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
177diff
changeset | 729 | \texttt{Set[((T, S), I)]}
 | 
| 
b17eff695c7f
added new stuff
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
177diff
changeset | 730 | \end{center}
 | 
| 
b17eff695c7f
added new stuff
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
177diff
changeset | 731 | |
| 
b17eff695c7f
added new stuff
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
177diff
changeset | 732 | Scala allows us to provide some | 
| 
b17eff695c7f
added new stuff
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
177diff
changeset | 733 | shorthand notation for the sequence parser combinator. So we can write for | 
| 
b17eff695c7f
added new stuff
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
177diff
changeset | 734 | example \texttt{'a'  $\sim$ 'b'}, which is the
 | 
| 
b17eff695c7f
added new stuff
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
177diff
changeset | 735 | parser combinator that first consumes the character \texttt{a} from a string and then \texttt{b}.
 | 
| 
b17eff695c7f
added new stuff
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
177diff
changeset | 736 | Calling this parser combinator with the strings | 
| 
b17eff695c7f
added new stuff
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
177diff
changeset | 737 | |
| 
b17eff695c7f
added new stuff
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
177diff
changeset | 738 | \begin{center}
 | 
| 
b17eff695c7f
added new stuff
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
177diff
changeset | 739 | \begin{tabular}{rcl}
 | 
| 
b17eff695c7f
added new stuff
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
177diff
changeset | 740 | input string & & output\medskip\\ | 
| 
b17eff695c7f
added new stuff
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
177diff
changeset | 741 | \texttt{\Grid{abc}} & $\rightarrow$ & $\left\{((\texttt{\Grid{a}}, \texttt{\Grid{b}}), \texttt{\Grid{c}})\right\}$\\
 | 
| 
b17eff695c7f
added new stuff
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
177diff
changeset | 742 | \texttt{\Grid{bac}} & $\rightarrow$ & $\varnothing$\\
 | 
| 
b17eff695c7f
added new stuff
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
177diff
changeset | 743 | \texttt{\Grid{ccc}} & $\rightarrow$ & $\varnothing$
 | 
| 
b17eff695c7f
added new stuff
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
177diff
changeset | 744 | \end{tabular}
 | 
| 
b17eff695c7f
added new stuff
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
177diff
changeset | 745 | \end{center}
 | 
| 
b17eff695c7f
added new stuff
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
177diff
changeset | 746 | |
| 
b17eff695c7f
added new stuff
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
177diff
changeset | 747 | \noindent | 
| 
b17eff695c7f
added new stuff
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
177diff
changeset | 748 | A slightly more complicated parser is \texttt{('a'  || 'b') $\sim$ 'b'} which parses as first character either
 | 
| 
b17eff695c7f
added new stuff
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
177diff
changeset | 749 | an \texttt{a} or \texttt{b} followed by a \texttt{b}. This parser produces the following results.
 | 
| 
b17eff695c7f
added new stuff
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
177diff
changeset | 750 | |
| 
b17eff695c7f
added new stuff
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
177diff
changeset | 751 | \begin{center}
 | 
| 
b17eff695c7f
added new stuff
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
177diff
changeset | 752 | \begin{tabular}{rcl}
 | 
| 
b17eff695c7f
added new stuff
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
177diff
changeset | 753 | input string & & output\medskip\\ | 
| 
b17eff695c7f
added new stuff
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
177diff
changeset | 754 | \texttt{\Grid{abc}} & $\rightarrow$ & $\left\{((\texttt{\Grid{a}}, \texttt{\Grid{b}}), \texttt{\Grid{c}})\right\}$\\
 | 
| 
b17eff695c7f
added new stuff
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
177diff
changeset | 755 | \texttt{\Grid{bbc}} & $\rightarrow$ & $\left\{((\texttt{\Grid{b}}, \texttt{\Grid{b}}), \texttt{\Grid{c}})\right\}$\\
 | 
| 
b17eff695c7f
added new stuff
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
177diff
changeset | 756 | \texttt{\Grid{aac}} & $\rightarrow$ & $\varnothing$
 | 
| 
b17eff695c7f
added new stuff
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
177diff
changeset | 757 | \end{tabular}
 | 
| 
b17eff695c7f
added new stuff
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
177diff
changeset | 758 | \end{center}
 | 
| 
b17eff695c7f
added new stuff
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
177diff
changeset | 759 | |
| 
b17eff695c7f
added new stuff
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
177diff
changeset | 760 | Note carefully that constructing the parser \texttt{'a' || ('a' $\sim$ 'b')} will result in a tying error.
 | 
| 
b17eff695c7f
added new stuff
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
177diff
changeset | 761 | The first parser has as output type a single character (recall the type of \texttt{CharParser}),
 | 
| 
b17eff695c7f
added new stuff
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
177diff
changeset | 762 | but the second parser produces a pair of characters as output. The alternative parser is however | 
| 
b17eff695c7f
added new stuff
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
177diff
changeset | 763 | required to have both component parsers to have the same type. We will see later how we can | 
| 
b17eff695c7f
added new stuff
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
177diff
changeset | 764 | build this parser without the typing error. | 
| 
b17eff695c7f
added new stuff
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
177diff
changeset | 765 | |
| 
b17eff695c7f
added new stuff
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
177diff
changeset | 766 | The next parser combinator does not actually combine smaller parsers, but applies | 
| 
b17eff695c7f
added new stuff
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
177diff
changeset | 767 | a function to the result of the parser. It is implemented in Scala as follows | 
| 
b17eff695c7f
added new stuff
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
177diff
changeset | 768 | |
| 
b17eff695c7f
added new stuff
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
177diff
changeset | 769 | \begin{center}
 | 
| 
b17eff695c7f
added new stuff
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
177diff
changeset | 770 | \begin{lstlisting}[language=Scala,basicstyle=\small\ttfamily, numbers=none]
 | 
| 
b17eff695c7f
added new stuff
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
177diff
changeset | 771 | class FunParser[I, T, S] | 
| 
b17eff695c7f
added new stuff
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
177diff
changeset | 772 | (p: => Parser[I, T], | 
| 
b17eff695c7f
added new stuff
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
177diff
changeset | 773 |           f: T => S) extends Parser[I, S] {
 | 
| 
b17eff695c7f
added new stuff
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
177diff
changeset | 774 | def parse(sb: I) = | 
| 
b17eff695c7f
added new stuff
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
177diff
changeset | 775 | for ((head, tail) <- p.parse(sb)) yield (f(head), tail) | 
| 
b17eff695c7f
added new stuff
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
177diff
changeset | 776 | } | 
| 
b17eff695c7f
added new stuff
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
177diff
changeset | 777 | \end{lstlisting}
 | 
| 
b17eff695c7f
added new stuff
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
177diff
changeset | 778 | \end{center}
 | 
| 
b17eff695c7f
added new stuff
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
177diff
changeset | 779 | |
| 
b17eff695c7f
added new stuff
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
177diff
changeset | 780 | |
| 
b17eff695c7f
added new stuff
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
177diff
changeset | 781 | \noindent | 
| 
b17eff695c7f
added new stuff
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
177diff
changeset | 782 | This parser combinator takes a parser \texttt{p} with output type \texttt{T} as
 | 
| 
b17eff695c7f
added new stuff
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
177diff
changeset | 783 | input as well as a function \texttt{f} with type \texttt{T => S}. The parser \texttt{p}
 | 
| 
b17eff695c7f
added new stuff
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
177diff
changeset | 784 | produces sets of type \texttt{(T, I)}. The \texttt{FunParser} combinator then
 | 
| 
b17eff695c7f
added new stuff
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
177diff
changeset | 785 | applies the function \texttt{f} to all the parer outputs. Since this function
 | 
| 
b17eff695c7f
added new stuff
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
177diff
changeset | 786 | is of type \texttt{T => S}, we obtain a parser with output type \texttt{S}.
 | 
| 
b17eff695c7f
added new stuff
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
177diff
changeset | 787 | Again Scala lets us introduce some shorthand notation for this parser combinator. | 
| 
b17eff695c7f
added new stuff
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
177diff
changeset | 788 | Therefore we will write \texttt{p ==> f} for it.
 | 
| 
b17eff695c7f
added new stuff
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
177diff
changeset | 789 | |
| 
b17eff695c7f
added new stuff
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
177diff
changeset | 790 | %\bigskip | 
| 
b17eff695c7f
added new stuff
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
177diff
changeset | 791 | %takes advantage of the full generality---have a look | 
| 
b17eff695c7f
added new stuff
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
177diff
changeset | 792 | %what it produces if we call it with the string \texttt{abc}
 | 
| 
b17eff695c7f
added new stuff
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
177diff
changeset | 793 | % | 
| 
b17eff695c7f
added new stuff
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
177diff
changeset | 794 | %\begin{center}
 | 
| 
b17eff695c7f
added new stuff
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
177diff
changeset | 795 | %\begin{tabular}{rcl}
 | 
| 
b17eff695c7f
added new stuff
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
177diff
changeset | 796 | %input string & & output\medskip\\ | 
| 
b17eff695c7f
added new stuff
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
177diff
changeset | 797 | %\texttt{\Grid{abc}} & $\rightarrow$ & $\left\{((\texttt{\Grid{a}}, \texttt{\Grid{b}}), \texttt{\Grid{c}})\right\}$\\
 | 
| 
b17eff695c7f
added new stuff
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
177diff
changeset | 798 | %\texttt{\Grid{bbc}} & $\rightarrow$ & $\left\{((\texttt{\Grid{b}}, \texttt{\Grid{b}}), \texttt{\Grid{c}})\right\}$\\
 | 
| 
b17eff695c7f
added new stuff
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
177diff
changeset | 799 | %\texttt{\Grid{aac}} & $\rightarrow$ & $\varnothing$
 | 
| 
b17eff695c7f
added new stuff
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
177diff
changeset | 800 | %\end{tabular}
 | 
| 
b17eff695c7f
added new stuff
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
177diff
changeset | 801 | %\end{center}
 | 
| 
b17eff695c7f
added new stuff
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
177diff
changeset | 802 | |
| 
b17eff695c7f
added new stuff
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
177diff
changeset | 803 | |
| 173 
7cfb7a6f7c99
added slides
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: diff
changeset | 804 | |
| 680 | 805 | |
| 806 | ||
| 173 
7cfb7a6f7c99
added slides
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: diff
changeset | 807 | |
| 
7cfb7a6f7c99
added slides
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: diff
changeset | 808 | %%% Local Variables: | 
| 
7cfb7a6f7c99
added slides
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: diff
changeset | 809 | %%% mode: latex | 
| 
7cfb7a6f7c99
added slides
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: diff
changeset | 810 | %%% TeX-master: t | 
| 
7cfb7a6f7c99
added slides
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: diff
changeset | 811 | %%% End: | 
| 680 | 812 |