| author | cu | 
| Tue, 10 Oct 2017 18:53:30 +0100 | |
| changeset 516 | e4ed90b9948d | 
| parent 459 | 858f62bc930a | 
| child 545 | 56e054d84be2 | 
| permissions | -rw-r--r-- | 
| 173 
7cfb7a6f7c99
added slides
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: diff
changeset | 1 | \documentclass{article}
 | 
| 297 
5c51839c88fd
updated
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
292diff
changeset | 2 | \usepackage{../style}
 | 
| 217 
cd6066f1056a
updated handouts
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
183diff
changeset | 3 | \usepackage{../langs}
 | 
| 459 | 4 | \usepackage{../grammar}
 | 
| 173 
7cfb7a6f7c99
added slides
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: diff
changeset | 5 | |
| 
7cfb7a6f7c99
added slides
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: diff
changeset | 6 | \begin{document}
 | 
| 
7cfb7a6f7c99
added slides
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: diff
changeset | 7 | |
| 385 
7f8516ff408d
updated
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
362diff
changeset | 8 | \section*{Handout 5 (Grammars \& Parser)}
 | 
| 173 
7cfb7a6f7c99
added slides
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: diff
changeset | 9 | |
| 362 
57ea439feaff
updated
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
360diff
changeset | 10 | While regular expressions are very useful for lexing and for | 
| 
57ea439feaff
updated
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
360diff
changeset | 11 | recognising many patterns in strings (like email addresses), | 
| 
57ea439feaff
updated
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
360diff
changeset | 12 | they have their limitations. For example there is no regular | 
| 
57ea439feaff
updated
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
360diff
changeset | 13 | expression that can recognise the language $a^nb^n$. Another | 
| 
57ea439feaff
updated
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
360diff
changeset | 14 | example for which there exists no regular expression is the | 
| 
57ea439feaff
updated
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
360diff
changeset | 15 | language of well-parenthesised expressions. In languages like | 
| 
57ea439feaff
updated
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
360diff
changeset | 16 | Lisp, which use parentheses rather extensively, it might be of | 
| 
57ea439feaff
updated
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
360diff
changeset | 17 | interest whether the following two expressions are | 
| 
57ea439feaff
updated
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
360diff
changeset | 18 | well-parenthesised (the left one is, the right one is not): | 
| 173 
7cfb7a6f7c99
added slides
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: diff
changeset | 19 | |
| 
7cfb7a6f7c99
added slides
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: diff
changeset | 20 | \begin{center}
 | 
| 
7cfb7a6f7c99
added slides
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: diff
changeset | 21 | $(((()()))())$  \hspace{10mm} $(((()()))()))$
 | 
| 
7cfb7a6f7c99
added slides
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: diff
changeset | 22 | \end{center}
 | 
| 
7cfb7a6f7c99
added slides
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: diff
changeset | 23 | |
| 362 
57ea439feaff
updated
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
360diff
changeset | 24 | \noindent Not being able to solve such recognition problems is | 
| 
57ea439feaff
updated
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
360diff
changeset | 25 | a serious limitation. In order to solve such recognition | 
| 
57ea439feaff
updated
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
360diff
changeset | 26 | problems, we need more powerful techniques than regular | 
| 
57ea439feaff
updated
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
360diff
changeset | 27 | expressions. We will in particular look at \emph{context-free
 | 
| 
57ea439feaff
updated
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
360diff
changeset | 28 | languages}. They include the regular languages as the picture | 
| 
57ea439feaff
updated
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
360diff
changeset | 29 | below shows: | 
| 173 
7cfb7a6f7c99
added slides
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: diff
changeset | 30 | |
| 
7cfb7a6f7c99
added slides
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: diff
changeset | 31 | |
| 
7cfb7a6f7c99
added slides
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: diff
changeset | 32 | \begin{center}
 | 
| 
7cfb7a6f7c99
added slides
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: diff
changeset | 33 | \begin{tikzpicture}
 | 
| 297 
5c51839c88fd
updated
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
292diff
changeset | 34 | [rect/.style={draw=black!50, 
 | 
| 
5c51839c88fd
updated
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
292diff
changeset | 35 | top color=white,bottom color=black!20, | 
| 
5c51839c88fd
updated
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
292diff
changeset | 36 | rectangle, very thick, rounded corners}] | 
| 173 
7cfb7a6f7c99
added slides
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: diff
changeset | 37 | |
| 297 
5c51839c88fd
updated
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
292diff
changeset | 38 | \draw (0,0) node [rect, text depth=30mm, text width=46mm] {\small all languages};
 | 
| 
5c51839c88fd
updated
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
292diff
changeset | 39 | \draw (0,-0.4) node [rect, text depth=20mm, text width=44mm] {\small decidable languages};
 | 
| 
5c51839c88fd
updated
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
292diff
changeset | 40 | \draw (0,-0.65) node [rect, text depth=13mm] {\small context sensitive languages};
 | 
| 
5c51839c88fd
updated
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
292diff
changeset | 41 | \draw (0,-0.84) node [rect, text depth=7mm, text width=35mm] {\small context-free languages};
 | 
| 
5c51839c88fd
updated
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
292diff
changeset | 42 | \draw (0,-1.05) node [rect] {\small regular languages};
 | 
| 173 
7cfb7a6f7c99
added slides
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: diff
changeset | 43 | \end{tikzpicture}
 | 
| 
7cfb7a6f7c99
added slides
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: diff
changeset | 44 | \end{center}
 | 
| 
7cfb7a6f7c99
added slides
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: diff
changeset | 45 | |
| 362 
57ea439feaff
updated
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
360diff
changeset | 46 | \noindent Context-free languages play an important role in | 
| 
57ea439feaff
updated
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
360diff
changeset | 47 | `day-to-day' text processing and in programming languages. | 
| 
57ea439feaff
updated
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
360diff
changeset | 48 | Context-free languages are usually specified by grammars. For | 
| 
57ea439feaff
updated
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
360diff
changeset | 49 | example a grammar for well-parenthesised expressions is | 
| 173 
7cfb7a6f7c99
added slides
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: diff
changeset | 50 | |
| 459 | 51 | \begin{plstx}[margin=3cm]
 | 
| 52 | : \meta{P} ::=  ( \cdot  \meta{P} \cdot ) \cdot \meta{P}
 | |
| 53 | | \epsilon\\ | |
| 54 | \end{plstx}
 | |
| 55 | ||
| 362 
57ea439feaff
updated
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
360diff
changeset | 56 | \noindent | 
| 
57ea439feaff
updated
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
360diff
changeset | 57 | or a grammar for recognising strings consisting of ones is | 
| 
57ea439feaff
updated
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
360diff
changeset | 58 | |
| 459 | 59 | \begin{plstx}[margin=3cm]
 | 
| 60 | : \meta{O} ::= 1 \cdot  \meta{O} 
 | |
| 61 | | 1\\ | |
| 62 | \end{plstx}
 | |
| 362 
57ea439feaff
updated
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
360diff
changeset | 63 | |
| 
57ea439feaff
updated
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
360diff
changeset | 64 | In general grammars consist of finitely many rules built up | 
| 
57ea439feaff
updated
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
360diff
changeset | 65 | from \emph{terminal symbols} (usually lower-case letters) and
 | 
| 459 | 66 | \emph{non-terminal symbols} (upper-case letters inside \meta{\mbox{}}). Rules have
 | 
| 362 
57ea439feaff
updated
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
360diff
changeset | 67 | the shape | 
| 173 
7cfb7a6f7c99
added slides
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: diff
changeset | 68 | |
| 459 | 69 | \begin{plstx}[margin=3cm]
 | 
| 70 | : \meta{NT} ::= rhs\\
 | |
| 71 | \end{plstx}
 | |
| 173 
7cfb7a6f7c99
added slides
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: diff
changeset | 72 | |
| 362 
57ea439feaff
updated
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
360diff
changeset | 73 | \noindent where on the left-hand side is a single non-terminal | 
| 
57ea439feaff
updated
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
360diff
changeset | 74 | and on the right a string consisting of both terminals and | 
| 
57ea439feaff
updated
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
360diff
changeset | 75 | non-terminals including the $\epsilon$-symbol for indicating | 
| 
57ea439feaff
updated
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
360diff
changeset | 76 | the empty string. We use the convention to separate components | 
| 
57ea439feaff
updated
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
360diff
changeset | 77 | on the right hand-side by using the $\cdot$ symbol, as in the | 
| 
57ea439feaff
updated
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
360diff
changeset | 78 | grammar for well-parenthesised expressions. We also use the | 
| 
57ea439feaff
updated
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
360diff
changeset | 79 | convention to use $|$ as a shorthand notation for several | 
| 
57ea439feaff
updated
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
360diff
changeset | 80 | rules. For example | 
| 173 
7cfb7a6f7c99
added slides
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: diff
changeset | 81 | |
| 459 | 82 | \begin{plstx}[margin=3cm]
 | 
| 83 | : \meta{NT} ::= rhs_1
 | |
| 84 | | rhs_2\\ | |
| 85 | \end{plstx}
 | |
| 173 
7cfb7a6f7c99
added slides
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: diff
changeset | 86 | |
| 459 | 87 | \noindent means that the non-terminal \meta{NT} can be replaced by
 | 
| 362 
57ea439feaff
updated
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
360diff
changeset | 88 | either $\textit{rhs}_1$ or $\textit{rhs}_2$. If there are more
 | 
| 
57ea439feaff
updated
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
360diff
changeset | 89 | than one non-terminal on the left-hand side of the rules, then | 
| 
57ea439feaff
updated
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
360diff
changeset | 90 | we need to indicate what is the \emph{starting} symbol of the
 | 
| 
57ea439feaff
updated
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
360diff
changeset | 91 | grammar. For example the grammar for arithmetic expressions | 
| 173 
7cfb7a6f7c99
added slides
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: diff
changeset | 92 | can be given as follows | 
| 
7cfb7a6f7c99
added slides
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: diff
changeset | 93 | |
| 459 | 94 | \begin{plstx}[margin=3cm,one per line]
 | 
| 95 | \mbox{\rm (1)}: \meta{E} ::= \meta{N}\\
 | |
| 96 | \mbox{\rm (2)}: \meta{E} ::= \meta{E} \cdot + \cdot \meta{E}\\
 | |
| 97 | \mbox{\rm (3)}: \meta{E} ::= \meta{E} \cdot - \cdot \meta{E}\\
 | |
| 98 | \mbox{\rm (4)}: \meta{E} ::= \meta{E} \cdot * \cdot \meta{E}\\
 | |
| 99 | \mbox{\rm (5)}: \meta{E} ::= ( \cdot \meta{E} \cdot )\\
 | |
| 100 | \mbox{\rm (6\ldots)}: \meta{N} ::= \meta{N} \cdot \meta{N} 
 | |
| 101 | \mid 0 \mid 1 \mid \ldots \mid 9\\ | |
| 102 | \end{plstx}
 | |
| 173 
7cfb7a6f7c99
added slides
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: diff
changeset | 103 | |
| 459 | 104 | \noindent where \meta{E} is the starting symbol. A
 | 
| 362 
57ea439feaff
updated
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
360diff
changeset | 105 | \emph{derivation} for a grammar starts with the starting
 | 
| 
57ea439feaff
updated
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
360diff
changeset | 106 | symbol of the grammar and in each step replaces one | 
| 
57ea439feaff
updated
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
360diff
changeset | 107 | non-terminal by a right-hand side of a rule. A derivation ends | 
| 
57ea439feaff
updated
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
360diff
changeset | 108 | with a string in which only terminal symbols are left. For | 
| 
57ea439feaff
updated
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
360diff
changeset | 109 | example a derivation for the string $(1 + 2) + 3$ is as | 
| 
57ea439feaff
updated
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
360diff
changeset | 110 | follows: | 
| 175 
5801e8c0e528
updated
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
173diff
changeset | 111 | |
| 
5801e8c0e528
updated
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
173diff
changeset | 112 | \begin{center}
 | 
| 362 
57ea439feaff
updated
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
360diff
changeset | 113 | \begin{tabular}{lll@{\hspace{2cm}}l}
 | 
| 459 | 114 | \meta{E} & $\rightarrow$ & $\meta{E}+\meta{E}$          & by (2)\\
 | 
| 115 |        & $\rightarrow$ & $(\meta{E})+\meta{E}$     & by (5)\\
 | |
| 116 |        & $\rightarrow$ & $(\meta{E}+\meta{E})+\meta{E}$   & by (2)\\
 | |
| 117 |        & $\rightarrow$ & $(\meta{E}+\meta{E})+\meta{N}$   & by (1)\\
 | |
| 118 |        & $\rightarrow$ & $(\meta{E}+\meta{E})+3$   & by (6\dots)\\
 | |
| 119 |        & $\rightarrow$ & $(\meta{N}+\meta{E})+3$   & by (1)\\
 | |
| 362 
57ea439feaff
updated
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
360diff
changeset | 120 | & $\rightarrow^+$ & $(1+2)+3$ & by (1, 6\ldots)\\ | 
| 175 
5801e8c0e528
updated
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
173diff
changeset | 121 | \end{tabular} 
 | 
| 
5801e8c0e528
updated
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
173diff
changeset | 122 | \end{center}
 | 
| 
5801e8c0e528
updated
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
173diff
changeset | 123 | |
| 362 
57ea439feaff
updated
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
360diff
changeset | 124 | \noindent where on the right it is indicated which | 
| 
57ea439feaff
updated
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
360diff
changeset | 125 | grammar rule has been applied. In the last step we | 
| 
57ea439feaff
updated
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
360diff
changeset | 126 | merged several steps into one. | 
| 
57ea439feaff
updated
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
360diff
changeset | 127 | |
| 
57ea439feaff
updated
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
360diff
changeset | 128 | The \emph{language} of a context-free grammar $G$
 | 
| 
57ea439feaff
updated
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
360diff
changeset | 129 | with start symbol $S$ is defined as the set of strings | 
| 
57ea439feaff
updated
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
360diff
changeset | 130 | derivable by a derivation, that is | 
| 175 
5801e8c0e528
updated
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
173diff
changeset | 131 | |
| 
5801e8c0e528
updated
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
173diff
changeset | 132 | \begin{center}
 | 
| 
5801e8c0e528
updated
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
173diff
changeset | 133 | $\{c_1\ldots c_n \;|\; S \rightarrow^* c_1\ldots c_n \;\;\text{with all} \; c_i \;\text{being non-terminals}\}$
 | 
| 
5801e8c0e528
updated
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
173diff
changeset | 134 | \end{center}
 | 
| 
5801e8c0e528
updated
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
173diff
changeset | 135 | |
| 
5801e8c0e528
updated
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
173diff
changeset | 136 | \noindent | 
| 
5801e8c0e528
updated
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
173diff
changeset | 137 | A \emph{parse-tree} encodes how a string is derived with the starting symbol on 
 | 
| 
5801e8c0e528
updated
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
173diff
changeset | 138 | top and each non-terminal containing a subtree for how it is replaced in a derivation. | 
| 
5801e8c0e528
updated
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
173diff
changeset | 139 | The parse tree for the string $(1 + 23)+4$ is as follows: | 
| 
5801e8c0e528
updated
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
173diff
changeset | 140 | |
| 
5801e8c0e528
updated
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
173diff
changeset | 141 | \begin{center}
 | 
| 
5801e8c0e528
updated
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
173diff
changeset | 142 | \begin{tikzpicture}[level distance=8mm, black]
 | 
| 
5801e8c0e528
updated
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
173diff
changeset | 143 |   \node {$E$}
 | 
| 
5801e8c0e528
updated
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
173diff
changeset | 144 |     child {node {$E$} 
 | 
| 
5801e8c0e528
updated
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
173diff
changeset | 145 |        child {node {$($}}
 | 
| 
5801e8c0e528
updated
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
173diff
changeset | 146 |        child {node {$E$}       
 | 
| 
5801e8c0e528
updated
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
173diff
changeset | 147 |          child {node {$E$} child {node {$N$} child {node {$1$}}}}
 | 
| 
5801e8c0e528
updated
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
173diff
changeset | 148 |          child {node {$+$}}
 | 
| 
5801e8c0e528
updated
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
173diff
changeset | 149 |          child {node {$E$} 
 | 
| 
5801e8c0e528
updated
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
173diff
changeset | 150 |             child {node {$N$} child {node {$2$}}}
 | 
| 
5801e8c0e528
updated
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
173diff
changeset | 151 |             child {node {$N$} child {node {$3$}}}
 | 
| 
5801e8c0e528
updated
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
173diff
changeset | 152 | } | 
| 
5801e8c0e528
updated
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
173diff
changeset | 153 | } | 
| 
5801e8c0e528
updated
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
173diff
changeset | 154 |        child {node {$)$}}
 | 
| 
5801e8c0e528
updated
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
173diff
changeset | 155 | } | 
| 
5801e8c0e528
updated
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
173diff
changeset | 156 |      child {node {$+$}}
 | 
| 
5801e8c0e528
updated
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
173diff
changeset | 157 |      child {node {$E$}
 | 
| 
5801e8c0e528
updated
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
173diff
changeset | 158 |         child {node {$N$} child {node {$4$}}}
 | 
| 
5801e8c0e528
updated
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
173diff
changeset | 159 | }; | 
| 
5801e8c0e528
updated
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
173diff
changeset | 160 | \end{tikzpicture}
 | 
| 
5801e8c0e528
updated
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
173diff
changeset | 161 | \end{center}
 | 
| 
5801e8c0e528
updated
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
173diff
changeset | 162 | |
| 362 
57ea439feaff
updated
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
360diff
changeset | 163 | \noindent We are often interested in these parse-trees since | 
| 
57ea439feaff
updated
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
360diff
changeset | 164 | they encode the structure of how a string is derived by a | 
| 
57ea439feaff
updated
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
360diff
changeset | 165 | grammar. Before we come to the problem of constructing such | 
| 
57ea439feaff
updated
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
360diff
changeset | 166 | parse-trees, we need to consider the following two properties | 
| 
57ea439feaff
updated
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
360diff
changeset | 167 | of grammars. A grammar is \emph{left-recursive} if there is a
 | 
| 
57ea439feaff
updated
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
360diff
changeset | 168 | derivation starting from a non-terminal, say $NT$ which leads | 
| 
57ea439feaff
updated
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
360diff
changeset | 169 | to a string which again starts with $NT$. This means a | 
| 
57ea439feaff
updated
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
360diff
changeset | 170 | derivation of the form. | 
| 173 
7cfb7a6f7c99
added slides
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: diff
changeset | 171 | |
| 175 
5801e8c0e528
updated
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
173diff
changeset | 172 | \begin{center}
 | 
| 
5801e8c0e528
updated
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
173diff
changeset | 173 | $NT \rightarrow \ldots \rightarrow NT \cdot \ldots$ | 
| 
5801e8c0e528
updated
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
173diff
changeset | 174 | \end{center}
 | 
| 
5801e8c0e528
updated
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
173diff
changeset | 175 | |
| 362 
57ea439feaff
updated
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
360diff
changeset | 176 | \noindent It can be easily seen that the grammar above for | 
| 
57ea439feaff
updated
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
360diff
changeset | 177 | arithmetic expressions is left-recursive: for example the | 
| 
57ea439feaff
updated
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
360diff
changeset | 178 | rules $E \rightarrow E\cdot + \cdot E$ and $N \rightarrow | 
| 
57ea439feaff
updated
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
360diff
changeset | 179 | N\cdot N$ show that this grammar is left-recursive. But note | 
| 
57ea439feaff
updated
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
360diff
changeset | 180 | that left-recursiveness can involve more than one step in the | 
| 
57ea439feaff
updated
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
360diff
changeset | 181 | derivation. The problem with left-recursive grammars is that | 
| 
57ea439feaff
updated
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
360diff
changeset | 182 | some algorithms cannot cope with them: they fall into a loop. | 
| 
57ea439feaff
updated
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
360diff
changeset | 183 | Fortunately every left-recursive grammar can be transformed | 
| 
57ea439feaff
updated
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
360diff
changeset | 184 | into one that is not left-recursive, although this | 
| 
57ea439feaff
updated
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
360diff
changeset | 185 | transformation might make the grammar less ``human-readable''. | 
| 
57ea439feaff
updated
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
360diff
changeset | 186 | For example if we want to give a non-left-recursive grammar | 
| 
57ea439feaff
updated
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
360diff
changeset | 187 | for numbers we might specify | 
| 175 
5801e8c0e528
updated
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
173diff
changeset | 188 | |
| 
5801e8c0e528
updated
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
173diff
changeset | 189 | \begin{center}
 | 
| 
5801e8c0e528
updated
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
173diff
changeset | 190 | $N \;\;\rightarrow\;\; 0\;|\;\ldots\;|\;9\;|\;1\cdot N\;|\;2\cdot N\;|\;\ldots\;|\;9\cdot N$ | 
| 
5801e8c0e528
updated
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
173diff
changeset | 191 | \end{center}
 | 
| 
5801e8c0e528
updated
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
173diff
changeset | 192 | |
| 362 
57ea439feaff
updated
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
360diff
changeset | 193 | \noindent Using this grammar we can still derive every number | 
| 
57ea439feaff
updated
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
360diff
changeset | 194 | string, but we will never be able to derive a string of the | 
| 
57ea439feaff
updated
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
360diff
changeset | 195 | form $N \to \ldots \to N \cdot \ldots$. | 
| 175 
5801e8c0e528
updated
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
173diff
changeset | 196 | |
| 362 
57ea439feaff
updated
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
360diff
changeset | 197 | The other property we have to watch out for is when a grammar | 
| 
57ea439feaff
updated
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
360diff
changeset | 198 | is \emph{ambiguous}. A grammar is said to be ambiguous if
 | 
| 
57ea439feaff
updated
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
360diff
changeset | 199 | there are two parse-trees for one string. Again the grammar | 
| 
57ea439feaff
updated
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
360diff
changeset | 200 | for arithmetic expressions shown above is ambiguous. While the | 
| 
57ea439feaff
updated
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
360diff
changeset | 201 | shown parse tree for the string $(1 + 23) + 4$ is unique, this | 
| 
57ea439feaff
updated
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
360diff
changeset | 202 | is not the case in general. For example there are two parse | 
| 175 
5801e8c0e528
updated
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
173diff
changeset | 203 | trees for the string $1 + 2 + 3$, namely | 
| 
5801e8c0e528
updated
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
173diff
changeset | 204 | |
| 
5801e8c0e528
updated
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
173diff
changeset | 205 | \begin{center}
 | 
| 
5801e8c0e528
updated
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
173diff
changeset | 206 | \begin{tabular}{c@{\hspace{10mm}}c}
 | 
| 
5801e8c0e528
updated
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
173diff
changeset | 207 | \begin{tikzpicture}[level distance=8mm, black]
 | 
| 
5801e8c0e528
updated
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
173diff
changeset | 208 |   \node {$E$}
 | 
| 
5801e8c0e528
updated
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
173diff
changeset | 209 |     child {node {$E$} child {node {$N$} child {node {$1$}}}}
 | 
| 
5801e8c0e528
updated
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
173diff
changeset | 210 |     child {node {$+$}}
 | 
| 
5801e8c0e528
updated
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
173diff
changeset | 211 |     child {node {$E$}
 | 
| 
5801e8c0e528
updated
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
173diff
changeset | 212 |        child {node {$E$} child {node {$N$} child {node {$2$}}}}
 | 
| 
5801e8c0e528
updated
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
173diff
changeset | 213 |        child {node {$+$}}
 | 
| 
5801e8c0e528
updated
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
173diff
changeset | 214 |        child {node {$E$} child {node {$N$} child {node {$3$}}}}
 | 
| 
5801e8c0e528
updated
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
173diff
changeset | 215 | } | 
| 
5801e8c0e528
updated
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
173diff
changeset | 216 | ; | 
| 
5801e8c0e528
updated
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
173diff
changeset | 217 | \end{tikzpicture} 
 | 
| 
5801e8c0e528
updated
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
173diff
changeset | 218 | & | 
| 
5801e8c0e528
updated
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
173diff
changeset | 219 | \begin{tikzpicture}[level distance=8mm, black]
 | 
| 
5801e8c0e528
updated
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
173diff
changeset | 220 |   \node {$E$}
 | 
| 
5801e8c0e528
updated
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
173diff
changeset | 221 |     child {node {$E$}
 | 
| 
5801e8c0e528
updated
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
173diff
changeset | 222 |        child {node {$E$} child {node {$N$} child {node {$1$}}}}
 | 
| 
5801e8c0e528
updated
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
173diff
changeset | 223 |        child {node {$+$}}
 | 
| 
5801e8c0e528
updated
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
173diff
changeset | 224 |        child {node {$E$} child {node {$N$} child {node {$2$}}}} 
 | 
| 
5801e8c0e528
updated
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
173diff
changeset | 225 | } | 
| 
5801e8c0e528
updated
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
173diff
changeset | 226 |     child {node {$+$}}
 | 
| 
5801e8c0e528
updated
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
173diff
changeset | 227 |     child {node {$E$} child {node {$N$} child {node {$3$}}}}
 | 
| 
5801e8c0e528
updated
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
173diff
changeset | 228 | ; | 
| 
5801e8c0e528
updated
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
173diff
changeset | 229 | \end{tikzpicture}
 | 
| 
5801e8c0e528
updated
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
173diff
changeset | 230 | \end{tabular} 
 | 
| 
5801e8c0e528
updated
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
173diff
changeset | 231 | \end{center}
 | 
| 
5801e8c0e528
updated
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
173diff
changeset | 232 | |
| 362 
57ea439feaff
updated
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
360diff
changeset | 233 | \noindent In particular in programming languages we will try | 
| 
57ea439feaff
updated
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
360diff
changeset | 234 | to avoid ambiguous grammars because two different parse-trees | 
| 
57ea439feaff
updated
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
360diff
changeset | 235 | for a string mean a program can be interpreted in two | 
| 
57ea439feaff
updated
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
360diff
changeset | 236 | different ways. In such cases we have to somehow make sure the | 
| 
57ea439feaff
updated
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
360diff
changeset | 237 | two different ways do not matter, or disambiguate the grammar | 
| 
57ea439feaff
updated
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
360diff
changeset | 238 | in some other way (for example making the $+$ | 
| 
57ea439feaff
updated
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
360diff
changeset | 239 | left-associative). Unfortunately already the problem of | 
| 
57ea439feaff
updated
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
360diff
changeset | 240 | deciding whether a grammar is ambiguous or not is in general | 
| 
57ea439feaff
updated
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
360diff
changeset | 241 | undecidable. But in simple instance (the ones we deal in this | 
| 
57ea439feaff
updated
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
360diff
changeset | 242 | module) one can usually see when a grammar is ambiguous. | 
| 176 
3c2653fc8b5a
updated
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
175diff
changeset | 243 | |
| 459 | 244 | |
| 245 | \subsection*{Parser Combinators}
 | |
| 246 | ||
| 362 
57ea439feaff
updated
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
360diff
changeset | 247 | Let us now turn to the problem of generating a parse-tree for | 
| 
57ea439feaff
updated
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
360diff
changeset | 248 | a grammar and string. In what follows we explain \emph{parser
 | 
| 
57ea439feaff
updated
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
360diff
changeset | 249 | combinators}, because they are easy to implement and closely | 
| 
57ea439feaff
updated
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
360diff
changeset | 250 | resemble grammar rules. Imagine that a grammar describes the | 
| 
57ea439feaff
updated
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
360diff
changeset | 251 | strings of natural numbers, such as the grammar $N$ shown | 
| 
57ea439feaff
updated
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
360diff
changeset | 252 | above. For all such strings we want to generate the | 
| 
57ea439feaff
updated
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
360diff
changeset | 253 | parse-trees or later on we actually want to extract the | 
| 
57ea439feaff
updated
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
360diff
changeset | 254 | meaning of these strings, that is the concrete integers | 
| 
57ea439feaff
updated
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
360diff
changeset | 255 | ``behind'' these strings. In Scala the parser combinators will | 
| 
57ea439feaff
updated
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
360diff
changeset | 256 | be functions of type | 
| 176 
3c2653fc8b5a
updated
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
175diff
changeset | 257 | |
| 
3c2653fc8b5a
updated
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
175diff
changeset | 258 | \begin{center}
 | 
| 177 
53def1fbf472
updated
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
176diff
changeset | 259 | \texttt{I $\Rightarrow$ Set[(T, I)]}
 | 
| 176 
3c2653fc8b5a
updated
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
175diff
changeset | 260 | \end{center}
 | 
| 
3c2653fc8b5a
updated
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
175diff
changeset | 261 | |
| 362 
57ea439feaff
updated
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
360diff
changeset | 262 | \noindent that is they take as input something of type | 
| 
57ea439feaff
updated
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
360diff
changeset | 263 | \texttt{I}, typically a list of tokens or a string, and return
 | 
| 
57ea439feaff
updated
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
360diff
changeset | 264 | a set of pairs. The first component of these pairs corresponds | 
| 
57ea439feaff
updated
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
360diff
changeset | 265 | to what the parser combinator was able to process from the | 
| 
57ea439feaff
updated
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
360diff
changeset | 266 | input and the second is the unprocessed part of the input. As | 
| 
57ea439feaff
updated
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
360diff
changeset | 267 | we shall see shortly, a parser combinator might return more | 
| 
57ea439feaff
updated
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
360diff
changeset | 268 | than one such pair, with the idea that there are potentially | 
| 
57ea439feaff
updated
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
360diff
changeset | 269 | several ways how to interpret the input. As a concrete | 
| 
57ea439feaff
updated
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
360diff
changeset | 270 | example, consider the case where the input is of type string, | 
| 
57ea439feaff
updated
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
360diff
changeset | 271 | say the string | 
| 183 
b17eff695c7f
added new stuff
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
177diff
changeset | 272 | |
| 
b17eff695c7f
added new stuff
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
177diff
changeset | 273 | \begin{center}
 | 
| 
b17eff695c7f
added new stuff
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
177diff
changeset | 274 | \tt\Grid{iffoo\VS testbar}
 | 
| 
b17eff695c7f
added new stuff
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
177diff
changeset | 275 | \end{center}
 | 
| 
b17eff695c7f
added new stuff
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
177diff
changeset | 276 | |
| 362 
57ea439feaff
updated
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
360diff
changeset | 277 | \noindent We might have a parser combinator which tries to | 
| 
57ea439feaff
updated
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
360diff
changeset | 278 | interpret this string as a keyword (\texttt{if}) or an
 | 
| 
57ea439feaff
updated
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
360diff
changeset | 279 | identifier (\texttt{iffoo}). Then the output will be the set
 | 
| 177 
53def1fbf472
updated
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
176diff
changeset | 280 | |
| 183 
b17eff695c7f
added new stuff
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
177diff
changeset | 281 | \begin{center}
 | 
| 
b17eff695c7f
added new stuff
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
177diff
changeset | 282 | $\left\{ \left(\texttt{\Grid{if}}\,,\, \texttt{\Grid{foo\VS testbar}}\right), 
 | 
| 
b17eff695c7f
added new stuff
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
177diff
changeset | 283 |            \left(\texttt{\Grid{iffoo}}\,,\, \texttt{\Grid{\VS testbar}}\right) \right\}$
 | 
| 
b17eff695c7f
added new stuff
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
177diff
changeset | 284 | \end{center}
 | 
| 
b17eff695c7f
added new stuff
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
177diff
changeset | 285 | |
| 362 
57ea439feaff
updated
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
360diff
changeset | 286 | \noindent where the first pair means the parser could | 
| 
57ea439feaff
updated
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
360diff
changeset | 287 | recognise \texttt{if} from the input and leaves the rest as
 | 
| 
57ea439feaff
updated
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
360diff
changeset | 288 | `unprocessed' as the second component of the pair; in the | 
| 
57ea439feaff
updated
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
360diff
changeset | 289 | other case it could recognise \texttt{iffoo} and leaves
 | 
| 
57ea439feaff
updated
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
360diff
changeset | 290 | \texttt{\VS testbar} as unprocessed. If the parser cannot
 | 
| 
57ea439feaff
updated
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
360diff
changeset | 291 | recognise anything from the input then parser combinators just | 
| 
57ea439feaff
updated
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
360diff
changeset | 292 | return the empty set $\{\}$. This will indicate
 | 
| 
57ea439feaff
updated
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
360diff
changeset | 293 | something ``went wrong''. | 
| 183 
b17eff695c7f
added new stuff
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
177diff
changeset | 294 | |
| 
b17eff695c7f
added new stuff
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
177diff
changeset | 295 | The main attraction is that we can easily build parser combinators out of smaller components | 
| 
b17eff695c7f
added new stuff
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
177diff
changeset | 296 | following very closely the structure of a grammar. In order to implement this in an object | 
| 
b17eff695c7f
added new stuff
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
177diff
changeset | 297 | oriented programming language, like Scala, we need to specify an abstract class for parser | 
| 
b17eff695c7f
added new stuff
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
177diff
changeset | 298 | combinators. This abstract class requires the implementation of the function | 
| 177 
53def1fbf472
updated
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
176diff
changeset | 299 | \texttt{parse} taking an argument of type \texttt{I} and returns a set of type  
 | 
| 
53def1fbf472
updated
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
176diff
changeset | 300 | \mbox{\texttt{Set[(T, I)]}}.
 | 
| 
53def1fbf472
updated
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
176diff
changeset | 301 | |
| 
53def1fbf472
updated
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
176diff
changeset | 302 | \begin{center}
 | 
| 
53def1fbf472
updated
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
176diff
changeset | 303 | \begin{lstlisting}[language=Scala,basicstyle=\small\ttfamily, numbers=none]
 | 
| 
53def1fbf472
updated
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
176diff
changeset | 304 | abstract class Parser[I, T] {
 | 
| 
53def1fbf472
updated
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
176diff
changeset | 305 | def parse(ts: I): Set[(T, I)] | 
| 
53def1fbf472
updated
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
176diff
changeset | 306 | |
| 
53def1fbf472
updated
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
176diff
changeset | 307 | def parse_all(ts: I): Set[T] = | 
| 
53def1fbf472
updated
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
176diff
changeset | 308 | for ((head, tail) <- parse(ts); if (tail.isEmpty)) | 
| 
53def1fbf472
updated
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
176diff
changeset | 309 | yield head | 
| 
53def1fbf472
updated
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
176diff
changeset | 310 | } | 
| 
53def1fbf472
updated
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
176diff
changeset | 311 | \end{lstlisting}
 | 
| 
53def1fbf472
updated
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
176diff
changeset | 312 | \end{center}
 | 
| 176 
3c2653fc8b5a
updated
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
175diff
changeset | 313 | |
| 177 
53def1fbf472
updated
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
176diff
changeset | 314 | \noindent | 
| 183 
b17eff695c7f
added new stuff
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
177diff
changeset | 315 | From the function \texttt{parse} we can then ``centrally'' derive the function \texttt{parse\_all},
 | 
| 
b17eff695c7f
added new stuff
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
177diff
changeset | 316 | which just filters out all pairs whose second component is not empty (that is has still some | 
| 
b17eff695c7f
added new stuff
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
177diff
changeset | 317 | unprocessed part). The reason is that at the end of parsing we are only interested in the | 
| 
b17eff695c7f
added new stuff
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
177diff
changeset | 318 | results where all the input has been consumed and no unprocessed part is left. | 
| 
b17eff695c7f
added new stuff
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
177diff
changeset | 319 | |
| 177 
53def1fbf472
updated
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
176diff
changeset | 320 | One of the simplest parser combinators recognises just a character, say $c$, | 
| 
53def1fbf472
updated
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
176diff
changeset | 321 | from the beginning of strings. Its behaviour is as follows: | 
| 176 
3c2653fc8b5a
updated
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
175diff
changeset | 322 | |
| 177 
53def1fbf472
updated
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
176diff
changeset | 323 | \begin{itemize}
 | 
| 
53def1fbf472
updated
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
176diff
changeset | 324 | \item if the head of the input string starts with a $c$, it returns | 
| 
53def1fbf472
updated
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
176diff
changeset | 325 | 	the set $\{(c, \textit{tail of}\; s)\}$
 | 
| 
53def1fbf472
updated
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
176diff
changeset | 326 | \item otherwise it returns the empty set $\varnothing$ | 
| 
53def1fbf472
updated
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
176diff
changeset | 327 | \end{itemize}
 | 
| 
53def1fbf472
updated
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
176diff
changeset | 328 | |
| 
53def1fbf472
updated
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
176diff
changeset | 329 | \noindent | 
| 
53def1fbf472
updated
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
176diff
changeset | 330 | The input type of this simple parser combinator for characters is | 
| 
53def1fbf472
updated
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
176diff
changeset | 331 | \texttt{String} and the output type \mbox{\texttt{Set[(Char, String)]}}. 
 | 
| 
53def1fbf472
updated
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
176diff
changeset | 332 | The code in Scala is as follows: | 
| 
53def1fbf472
updated
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
176diff
changeset | 333 | |
| 
53def1fbf472
updated
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
176diff
changeset | 334 | \begin{center}
 | 
| 
53def1fbf472
updated
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
176diff
changeset | 335 | \begin{lstlisting}[language=Scala,basicstyle=\small\ttfamily, numbers=none]
 | 
| 
53def1fbf472
updated
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
176diff
changeset | 336 | case class CharParser(c: Char) extends Parser[String, Char] {
 | 
| 
53def1fbf472
updated
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
176diff
changeset | 337 | def parse(sb: String) = | 
| 
53def1fbf472
updated
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
176diff
changeset | 338 | if (sb.head == c) Set((c, sb.tail)) else Set() | 
| 
53def1fbf472
updated
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
176diff
changeset | 339 | } | 
| 
53def1fbf472
updated
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
176diff
changeset | 340 | \end{lstlisting}
 | 
| 
53def1fbf472
updated
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
176diff
changeset | 341 | \end{center}
 | 
| 176 
3c2653fc8b5a
updated
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
175diff
changeset | 342 | |
| 183 
b17eff695c7f
added new stuff
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
177diff
changeset | 343 | \noindent | 
| 
b17eff695c7f
added new stuff
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
177diff
changeset | 344 | The \texttt{parse} function tests whether the first character of the 
 | 
| 
b17eff695c7f
added new stuff
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
177diff
changeset | 345 | input string \texttt{sb} is equal to \texttt{c}. If yes, then it splits the
 | 
| 
b17eff695c7f
added new stuff
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
177diff
changeset | 346 | string into the recognised part \texttt{c} and the unprocessed part
 | 
| 
b17eff695c7f
added new stuff
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
177diff
changeset | 347 | \texttt{sb.tail}. In case \texttt{sb} does not start with \texttt{c} then
 | 
| 
b17eff695c7f
added new stuff
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
177diff
changeset | 348 | the parser returns the empty set (in Scala \texttt{Set()}).
 | 
| 
b17eff695c7f
added new stuff
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
177diff
changeset | 349 | |
| 
b17eff695c7f
added new stuff
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
177diff
changeset | 350 | More interesting are the parser combinators that build larger parsers | 
| 
b17eff695c7f
added new stuff
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
177diff
changeset | 351 | out of smaller component parsers. For example the alternative | 
| 
b17eff695c7f
added new stuff
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
177diff
changeset | 352 | parser combinator is as follows. | 
| 
b17eff695c7f
added new stuff
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
177diff
changeset | 353 | |
| 
b17eff695c7f
added new stuff
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
177diff
changeset | 354 | \begin{center}
 | 
| 
b17eff695c7f
added new stuff
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
177diff
changeset | 355 | \begin{lstlisting}[language=Scala,basicstyle=\small\ttfamily, numbers=none]
 | 
| 
b17eff695c7f
added new stuff
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
177diff
changeset | 356 | class AltParser[I, T] | 
| 
b17eff695c7f
added new stuff
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
177diff
changeset | 357 | (p: => Parser[I, T], | 
| 
b17eff695c7f
added new stuff
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
177diff
changeset | 358 |         q: => Parser[I, T]) extends Parser[I, T] {
 | 
| 
b17eff695c7f
added new stuff
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
177diff
changeset | 359 | def parse(sb: I) = p.parse(sb) ++ q.parse(sb) | 
| 
b17eff695c7f
added new stuff
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
177diff
changeset | 360 | } | 
| 
b17eff695c7f
added new stuff
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
177diff
changeset | 361 | \end{lstlisting}
 | 
| 
b17eff695c7f
added new stuff
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
177diff
changeset | 362 | \end{center}
 | 
| 
b17eff695c7f
added new stuff
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
177diff
changeset | 363 | |
| 
b17eff695c7f
added new stuff
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
177diff
changeset | 364 | \noindent | 
| 
b17eff695c7f
added new stuff
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
177diff
changeset | 365 | The types of this parser combinator are polymorphic (we just have \texttt{I}
 | 
| 
b17eff695c7f
added new stuff
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
177diff
changeset | 366 | for the input type, and \texttt{T} for the output type). The alternative parser
 | 
| 
b17eff695c7f
added new stuff
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
177diff
changeset | 367 | builds a new parser out of two existing parser combinator \texttt{p} and \texttt{q}.
 | 
| 
b17eff695c7f
added new stuff
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
177diff
changeset | 368 | Both need to be able to process input of type \texttt{I} and return the same
 | 
| 
b17eff695c7f
added new stuff
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
177diff
changeset | 369 | output type \texttt{Set[(T, I)]}. (There is an interesting detail of Scala, namely the 
 | 
| 
b17eff695c7f
added new stuff
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
177diff
changeset | 370 | \texttt{=>} in front of the types of \texttt{p} and \texttt{q}. They will prevent the
 | 
| 
b17eff695c7f
added new stuff
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
177diff
changeset | 371 | evaluation of the arguments before they are used. This is often called | 
| 
b17eff695c7f
added new stuff
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
177diff
changeset | 372 | \emph{lazy evaluation} of the arguments.) The alternative parser should run
 | 
| 
b17eff695c7f
added new stuff
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
177diff
changeset | 373 | the input with the first parser \texttt{p} (producing a set of outputs) and then
 | 
| 
b17eff695c7f
added new stuff
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
177diff
changeset | 374 | run the same input with \texttt{q}. The result should be then just the union
 | 
| 
b17eff695c7f
added new stuff
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
177diff
changeset | 375 | of both sets, which is the operation \texttt{++} in Scala.
 | 
| 
b17eff695c7f
added new stuff
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
177diff
changeset | 376 | |
| 
b17eff695c7f
added new stuff
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
177diff
changeset | 377 | This parser combinator already allows us to construct a parser that either | 
| 
b17eff695c7f
added new stuff
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
177diff
changeset | 378 | a character \texttt{a} or \texttt{b}, as
 | 
| 
b17eff695c7f
added new stuff
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
177diff
changeset | 379 | |
| 
b17eff695c7f
added new stuff
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
177diff
changeset | 380 | \begin{center}
 | 
| 
b17eff695c7f
added new stuff
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
177diff
changeset | 381 | \begin{lstlisting}[language=Scala,basicstyle=\small\ttfamily, numbers=none]
 | 
| 
b17eff695c7f
added new stuff
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
177diff
changeset | 382 | new AltParser(CharParser('a'), CharParser('b'))
 | 
| 
b17eff695c7f
added new stuff
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
177diff
changeset | 383 | \end{lstlisting}
 | 
| 
b17eff695c7f
added new stuff
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
177diff
changeset | 384 | \end{center}
 | 
| 
b17eff695c7f
added new stuff
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
177diff
changeset | 385 | |
| 
b17eff695c7f
added new stuff
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
177diff
changeset | 386 | \noindent | 
| 
b17eff695c7f
added new stuff
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
177diff
changeset | 387 | Scala allows us to introduce some more readable shorthand notation for this, like \texttt{'a' || 'b'}. 
 | 
| 
b17eff695c7f
added new stuff
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
177diff
changeset | 388 | We can call this parser combinator with the strings | 
| 
b17eff695c7f
added new stuff
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
177diff
changeset | 389 | |
| 
b17eff695c7f
added new stuff
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
177diff
changeset | 390 | \begin{center}
 | 
| 
b17eff695c7f
added new stuff
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
177diff
changeset | 391 | \begin{tabular}{rcl}
 | 
| 
b17eff695c7f
added new stuff
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
177diff
changeset | 392 | input string & & output\medskip\\ | 
| 
b17eff695c7f
added new stuff
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
177diff
changeset | 393 | \texttt{\Grid{ac}} & $\rightarrow$ & $\left\{(\texttt{\Grid{a}}, \texttt{\Grid{c}})\right\}$\\
 | 
| 
b17eff695c7f
added new stuff
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
177diff
changeset | 394 | \texttt{\Grid{bc}} & $\rightarrow$ & $\left\{(\texttt{\Grid{b}}, \texttt{\Grid{c}})\right\}$\\
 | 
| 
b17eff695c7f
added new stuff
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
177diff
changeset | 395 | \texttt{\Grid{cc}} & $\rightarrow$ & $\varnothing$
 | 
| 
b17eff695c7f
added new stuff
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
177diff
changeset | 396 | \end{tabular}
 | 
| 
b17eff695c7f
added new stuff
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
177diff
changeset | 397 | \end{center}
 | 
| 
b17eff695c7f
added new stuff
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
177diff
changeset | 398 | |
| 
b17eff695c7f
added new stuff
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
177diff
changeset | 399 | \noindent | 
| 
b17eff695c7f
added new stuff
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
177diff
changeset | 400 | We receive in the first two cases a successful output (that is a non-empty set). | 
| 
b17eff695c7f
added new stuff
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
177diff
changeset | 401 | |
| 
b17eff695c7f
added new stuff
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
177diff
changeset | 402 | A bit more interesting is the \emph{sequence parser combinator} implemented in
 | 
| 
b17eff695c7f
added new stuff
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
177diff
changeset | 403 | Scala as follows: | 
| 
b17eff695c7f
added new stuff
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
177diff
changeset | 404 | |
| 
b17eff695c7f
added new stuff
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
177diff
changeset | 405 | \begin{center}
 | 
| 
b17eff695c7f
added new stuff
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
177diff
changeset | 406 | \begin{lstlisting}[language=Scala,basicstyle=\small\ttfamily, numbers=none]
 | 
| 
b17eff695c7f
added new stuff
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
177diff
changeset | 407 | class SeqParser[I, T, S] | 
| 
b17eff695c7f
added new stuff
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
177diff
changeset | 408 | (p: => Parser[I, T], | 
| 
b17eff695c7f
added new stuff
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
177diff
changeset | 409 |         q: => Parser[I, S]) extends Parser[I, (T, S)] {
 | 
| 
b17eff695c7f
added new stuff
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
177diff
changeset | 410 | def parse(sb: I) = | 
| 
b17eff695c7f
added new stuff
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
177diff
changeset | 411 | for ((head1, tail1) <- p.parse(sb); | 
| 
b17eff695c7f
added new stuff
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
177diff
changeset | 412 | (head2, tail2) <- q.parse(tail1)) | 
| 
b17eff695c7f
added new stuff
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
177diff
changeset | 413 | yield ((head1, head2), tail2) | 
| 
b17eff695c7f
added new stuff
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
177diff
changeset | 414 | } | 
| 
b17eff695c7f
added new stuff
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
177diff
changeset | 415 | \end{lstlisting}
 | 
| 
b17eff695c7f
added new stuff
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
177diff
changeset | 416 | \end{center}
 | 
| 
b17eff695c7f
added new stuff
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
177diff
changeset | 417 | |
| 
b17eff695c7f
added new stuff
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
177diff
changeset | 418 | \noindent | 
| 
b17eff695c7f
added new stuff
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
177diff
changeset | 419 | This parser takes as input two parsers, \texttt{p} and \texttt{q}. It implements \texttt{parse} 
 | 
| 
b17eff695c7f
added new stuff
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
177diff
changeset | 420 | as follows: let first run the parser \texttt{p} on the input producing a set of pairs (\texttt{head1}, \texttt{tail1}).
 | 
| 
b17eff695c7f
added new stuff
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
177diff
changeset | 421 | The \texttt{tail1} stands for the unprocessed parts left over by \texttt{p}. 
 | 
| 
b17eff695c7f
added new stuff
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
177diff
changeset | 422 | Let \texttt{q} run on these unprocessed parts
 | 
| 
b17eff695c7f
added new stuff
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
177diff
changeset | 423 | producing again a set of pairs. The output of the sequence parser combinator is then a set | 
| 
b17eff695c7f
added new stuff
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
177diff
changeset | 424 | containing pairs where the first components are again pairs, namely what the first parser could parse | 
| 
b17eff695c7f
added new stuff
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
177diff
changeset | 425 | together with what the second parser could parse; the second component is the unprocessed | 
| 
b17eff695c7f
added new stuff
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
177diff
changeset | 426 | part left over after running the second parser \texttt{q}. Therefore the input type of
 | 
| 
b17eff695c7f
added new stuff
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
177diff
changeset | 427 | the sequence parser combinator is as usual \texttt{I}, but the output type is
 | 
| 
b17eff695c7f
added new stuff
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
177diff
changeset | 428 | |
| 
b17eff695c7f
added new stuff
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
177diff
changeset | 429 | \begin{center}
 | 
| 
b17eff695c7f
added new stuff
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
177diff
changeset | 430 | \texttt{Set[((T, S), I)]}
 | 
| 
b17eff695c7f
added new stuff
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
177diff
changeset | 431 | \end{center}
 | 
| 
b17eff695c7f
added new stuff
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
177diff
changeset | 432 | |
| 
b17eff695c7f
added new stuff
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
177diff
changeset | 433 | Scala allows us to provide some | 
| 
b17eff695c7f
added new stuff
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
177diff
changeset | 434 | shorthand notation for the sequence parser combinator. So we can write for | 
| 
b17eff695c7f
added new stuff
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
177diff
changeset | 435 | example \texttt{'a'  $\sim$ 'b'}, which is the
 | 
| 
b17eff695c7f
added new stuff
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
177diff
changeset | 436 | parser combinator that first consumes the character \texttt{a} from a string and then \texttt{b}.
 | 
| 
b17eff695c7f
added new stuff
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
177diff
changeset | 437 | Calling this parser combinator with the strings | 
| 
b17eff695c7f
added new stuff
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
177diff
changeset | 438 | |
| 
b17eff695c7f
added new stuff
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
177diff
changeset | 439 | \begin{center}
 | 
| 
b17eff695c7f
added new stuff
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
177diff
changeset | 440 | \begin{tabular}{rcl}
 | 
| 
b17eff695c7f
added new stuff
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
177diff
changeset | 441 | input string & & output\medskip\\ | 
| 
b17eff695c7f
added new stuff
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
177diff
changeset | 442 | \texttt{\Grid{abc}} & $\rightarrow$ & $\left\{((\texttt{\Grid{a}}, \texttt{\Grid{b}}), \texttt{\Grid{c}})\right\}$\\
 | 
| 
b17eff695c7f
added new stuff
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
177diff
changeset | 443 | \texttt{\Grid{bac}} & $\rightarrow$ & $\varnothing$\\
 | 
| 
b17eff695c7f
added new stuff
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
177diff
changeset | 444 | \texttt{\Grid{ccc}} & $\rightarrow$ & $\varnothing$
 | 
| 
b17eff695c7f
added new stuff
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
177diff
changeset | 445 | \end{tabular}
 | 
| 
b17eff695c7f
added new stuff
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
177diff
changeset | 446 | \end{center}
 | 
| 
b17eff695c7f
added new stuff
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
177diff
changeset | 447 | |
| 
b17eff695c7f
added new stuff
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
177diff
changeset | 448 | \noindent | 
| 
b17eff695c7f
added new stuff
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
177diff
changeset | 449 | A slightly more complicated parser is \texttt{('a'  || 'b') $\sim$ 'b'} which parses as first character either
 | 
| 
b17eff695c7f
added new stuff
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
177diff
changeset | 450 | an \texttt{a} or \texttt{b} followed by a \texttt{b}. This parser produces the following results.
 | 
| 
b17eff695c7f
added new stuff
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
177diff
changeset | 451 | |
| 
b17eff695c7f
added new stuff
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
177diff
changeset | 452 | \begin{center}
 | 
| 
b17eff695c7f
added new stuff
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
177diff
changeset | 453 | \begin{tabular}{rcl}
 | 
| 
b17eff695c7f
added new stuff
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
177diff
changeset | 454 | input string & & output\medskip\\ | 
| 
b17eff695c7f
added new stuff
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
177diff
changeset | 455 | \texttt{\Grid{abc}} & $\rightarrow$ & $\left\{((\texttt{\Grid{a}}, \texttt{\Grid{b}}), \texttt{\Grid{c}})\right\}$\\
 | 
| 
b17eff695c7f
added new stuff
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
177diff
changeset | 456 | \texttt{\Grid{bbc}} & $\rightarrow$ & $\left\{((\texttt{\Grid{b}}, \texttt{\Grid{b}}), \texttt{\Grid{c}})\right\}$\\
 | 
| 
b17eff695c7f
added new stuff
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
177diff
changeset | 457 | \texttt{\Grid{aac}} & $\rightarrow$ & $\varnothing$
 | 
| 
b17eff695c7f
added new stuff
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
177diff
changeset | 458 | \end{tabular}
 | 
| 
b17eff695c7f
added new stuff
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
177diff
changeset | 459 | \end{center}
 | 
| 
b17eff695c7f
added new stuff
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
177diff
changeset | 460 | |
| 
b17eff695c7f
added new stuff
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
177diff
changeset | 461 | Note carefully that constructing the parser \texttt{'a' || ('a' $\sim$ 'b')} will result in a tying error.
 | 
| 
b17eff695c7f
added new stuff
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
177diff
changeset | 462 | The first parser has as output type a single character (recall the type of \texttt{CharParser}),
 | 
| 
b17eff695c7f
added new stuff
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
177diff
changeset | 463 | but the second parser produces a pair of characters as output. The alternative parser is however | 
| 
b17eff695c7f
added new stuff
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
177diff
changeset | 464 | required to have both component parsers to have the same type. We will see later how we can | 
| 
b17eff695c7f
added new stuff
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
177diff
changeset | 465 | build this parser without the typing error. | 
| 
b17eff695c7f
added new stuff
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
177diff
changeset | 466 | |
| 
b17eff695c7f
added new stuff
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
177diff
changeset | 467 | The next parser combinator does not actually combine smaller parsers, but applies | 
| 
b17eff695c7f
added new stuff
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
177diff
changeset | 468 | a function to the result of the parser. It is implemented in Scala as follows | 
| 
b17eff695c7f
added new stuff
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
177diff
changeset | 469 | |
| 
b17eff695c7f
added new stuff
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
177diff
changeset | 470 | \begin{center}
 | 
| 
b17eff695c7f
added new stuff
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
177diff
changeset | 471 | \begin{lstlisting}[language=Scala,basicstyle=\small\ttfamily, numbers=none]
 | 
| 
b17eff695c7f
added new stuff
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
177diff
changeset | 472 | class FunParser[I, T, S] | 
| 
b17eff695c7f
added new stuff
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
177diff
changeset | 473 | (p: => Parser[I, T], | 
| 
b17eff695c7f
added new stuff
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
177diff
changeset | 474 |           f: T => S) extends Parser[I, S] {
 | 
| 
b17eff695c7f
added new stuff
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
177diff
changeset | 475 | def parse(sb: I) = | 
| 
b17eff695c7f
added new stuff
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
177diff
changeset | 476 | for ((head, tail) <- p.parse(sb)) yield (f(head), tail) | 
| 
b17eff695c7f
added new stuff
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
177diff
changeset | 477 | } | 
| 
b17eff695c7f
added new stuff
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
177diff
changeset | 478 | \end{lstlisting}
 | 
| 
b17eff695c7f
added new stuff
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
177diff
changeset | 479 | \end{center}
 | 
| 
b17eff695c7f
added new stuff
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
177diff
changeset | 480 | |
| 
b17eff695c7f
added new stuff
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
177diff
changeset | 481 | |
| 
b17eff695c7f
added new stuff
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
177diff
changeset | 482 | \noindent | 
| 
b17eff695c7f
added new stuff
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
177diff
changeset | 483 | This parser combinator takes a parser \texttt{p} with output type \texttt{T} as
 | 
| 
b17eff695c7f
added new stuff
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
177diff
changeset | 484 | input as well as a function \texttt{f} with type \texttt{T => S}. The parser \texttt{p}
 | 
| 
b17eff695c7f
added new stuff
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
177diff
changeset | 485 | produces sets of type \texttt{(T, I)}. The \texttt{FunParser} combinator then
 | 
| 
b17eff695c7f
added new stuff
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
177diff
changeset | 486 | applies the function \texttt{f} to all the parer outputs. Since this function
 | 
| 
b17eff695c7f
added new stuff
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
177diff
changeset | 487 | is of type \texttt{T => S}, we obtain a parser with output type \texttt{S}.
 | 
| 
b17eff695c7f
added new stuff
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
177diff
changeset | 488 | Again Scala lets us introduce some shorthand notation for this parser combinator. | 
| 
b17eff695c7f
added new stuff
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
177diff
changeset | 489 | Therefore we will write \texttt{p ==> f} for it.
 | 
| 
b17eff695c7f
added new stuff
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
177diff
changeset | 490 | |
| 
b17eff695c7f
added new stuff
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
177diff
changeset | 491 | %\bigskip | 
| 
b17eff695c7f
added new stuff
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
177diff
changeset | 492 | %takes advantage of the full generality---have a look | 
| 
b17eff695c7f
added new stuff
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
177diff
changeset | 493 | %what it produces if we call it with the string \texttt{abc}
 | 
| 
b17eff695c7f
added new stuff
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
177diff
changeset | 494 | % | 
| 
b17eff695c7f
added new stuff
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
177diff
changeset | 495 | %\begin{center}
 | 
| 
b17eff695c7f
added new stuff
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
177diff
changeset | 496 | %\begin{tabular}{rcl}
 | 
| 
b17eff695c7f
added new stuff
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
177diff
changeset | 497 | %input string & & output\medskip\\ | 
| 
b17eff695c7f
added new stuff
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
177diff
changeset | 498 | %\texttt{\Grid{abc}} & $\rightarrow$ & $\left\{((\texttt{\Grid{a}}, \texttt{\Grid{b}}), \texttt{\Grid{c}})\right\}$\\
 | 
| 
b17eff695c7f
added new stuff
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
177diff
changeset | 499 | %\texttt{\Grid{bbc}} & $\rightarrow$ & $\left\{((\texttt{\Grid{b}}, \texttt{\Grid{b}}), \texttt{\Grid{c}})\right\}$\\
 | 
| 
b17eff695c7f
added new stuff
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
177diff
changeset | 500 | %\texttt{\Grid{aac}} & $\rightarrow$ & $\varnothing$
 | 
| 
b17eff695c7f
added new stuff
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
177diff
changeset | 501 | %\end{tabular}
 | 
| 
b17eff695c7f
added new stuff
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
177diff
changeset | 502 | %\end{center}
 | 
| 
b17eff695c7f
added new stuff
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
177diff
changeset | 503 | |
| 
b17eff695c7f
added new stuff
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
177diff
changeset | 504 | |
| 173 
7cfb7a6f7c99
added slides
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: diff
changeset | 505 | |
| 
7cfb7a6f7c99
added slides
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: diff
changeset | 506 | \end{document}
 | 
| 
7cfb7a6f7c99
added slides
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: diff
changeset | 507 | |
| 
7cfb7a6f7c99
added slides
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: diff
changeset | 508 | %%% Local Variables: | 
| 
7cfb7a6f7c99
added slides
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: diff
changeset | 509 | %%% mode: latex | 
| 
7cfb7a6f7c99
added slides
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: diff
changeset | 510 | %%% TeX-master: t | 
| 
7cfb7a6f7c99
added slides
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: diff
changeset | 511 | %%% End: |