| author | Christian Urban <urbanc@in.tum.de> | 
| Thu, 17 Oct 2019 13:41:30 +0100 | |
| changeset 661 | 32fd8125f56e | 
| parent 618 | 1c7cca56fadf | 
| child 665 | 81bb2db5e658 | 
| permissions | -rw-r--r-- | 
| 
173
 
7cfb7a6f7c99
added slides
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents:  
diff
changeset
 | 
1  | 
\documentclass{article}
 | 
| 
297
 
5c51839c88fd
updated
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents: 
292 
diff
changeset
 | 
2  | 
\usepackage{../style}
 | 
| 
217
 
cd6066f1056a
updated handouts
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents: 
183 
diff
changeset
 | 
3  | 
\usepackage{../langs}
 | 
| 459 | 4  | 
\usepackage{../grammar}
 | 
| 
173
 
7cfb7a6f7c99
added slides
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents:  
diff
changeset
 | 
5  | 
|
| 545 | 6  | 
% epsilon and left-recursion elimination  | 
7  | 
% http://www.mollypages.org/page/grammar/index.mp  | 
|
8  | 
||
| 618 | 9  | 
%% parsing scala files  | 
10  | 
%%https://scalameta.org/  | 
|
11  | 
||
| 
173
 
7cfb7a6f7c99
added slides
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents:  
diff
changeset
 | 
12  | 
\begin{document}
 | 
| 
 
7cfb7a6f7c99
added slides
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents:  
diff
changeset
 | 
13  | 
|
| 
385
 
7f8516ff408d
updated
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents: 
362 
diff
changeset
 | 
14  | 
\section*{Handout 5 (Grammars \& Parser)}
 | 
| 
173
 
7cfb7a6f7c99
added slides
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents:  
diff
changeset
 | 
15  | 
|
| 
362
 
57ea439feaff
updated
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents: 
360 
diff
changeset
 | 
16  | 
While regular expressions are very useful for lexing and for  | 
| 
 
57ea439feaff
updated
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents: 
360 
diff
changeset
 | 
17  | 
recognising many patterns in strings (like email addresses),  | 
| 
 
57ea439feaff
updated
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents: 
360 
diff
changeset
 | 
18  | 
they have their limitations. For example there is no regular  | 
| 
 
57ea439feaff
updated
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents: 
360 
diff
changeset
 | 
19  | 
expression that can recognise the language $a^nb^n$. Another  | 
| 
 
57ea439feaff
updated
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents: 
360 
diff
changeset
 | 
20  | 
example for which there exists no regular expression is the  | 
| 
 
57ea439feaff
updated
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents: 
360 
diff
changeset
 | 
21  | 
language of well-parenthesised expressions. In languages like  | 
| 
 
57ea439feaff
updated
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents: 
360 
diff
changeset
 | 
22  | 
Lisp, which use parentheses rather extensively, it might be of  | 
| 582 | 23  | 
interest to know whether the following two expressions are  | 
24  | 
well-parenthesised or not (the left one is, the right one is not):  | 
|
| 
173
 
7cfb7a6f7c99
added slides
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents:  
diff
changeset
 | 
25  | 
|
| 
 
7cfb7a6f7c99
added slides
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents:  
diff
changeset
 | 
26  | 
\begin{center}
 | 
| 
 
7cfb7a6f7c99
added slides
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents:  
diff
changeset
 | 
27  | 
$(((()()))())$  \hspace{10mm} $(((()()))()))$
 | 
| 
 
7cfb7a6f7c99
added slides
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents:  
diff
changeset
 | 
28  | 
\end{center}
 | 
| 
 
7cfb7a6f7c99
added slides
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents:  
diff
changeset
 | 
29  | 
|
| 
362
 
57ea439feaff
updated
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents: 
360 
diff
changeset
 | 
30  | 
\noindent Not being able to solve such recognition problems is  | 
| 
 
57ea439feaff
updated
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents: 
360 
diff
changeset
 | 
31  | 
a serious limitation. In order to solve such recognition  | 
| 
 
57ea439feaff
updated
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents: 
360 
diff
changeset
 | 
32  | 
problems, we need more powerful techniques than regular  | 
| 
 
57ea439feaff
updated
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents: 
360 
diff
changeset
 | 
33  | 
expressions. We will in particular look at \emph{context-free
 | 
| 
 
57ea439feaff
updated
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents: 
360 
diff
changeset
 | 
34  | 
languages}. They include the regular languages as the picture  | 
| 582 | 35  | 
below about language classes shows:  | 
| 
173
 
7cfb7a6f7c99
added slides
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents:  
diff
changeset
 | 
36  | 
|
| 
 
7cfb7a6f7c99
added slides
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents:  
diff
changeset
 | 
37  | 
|
| 
 
7cfb7a6f7c99
added slides
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents:  
diff
changeset
 | 
38  | 
\begin{center}
 | 
| 
 
7cfb7a6f7c99
added slides
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents:  
diff
changeset
 | 
39  | 
\begin{tikzpicture}
 | 
| 
297
 
5c51839c88fd
updated
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents: 
292 
diff
changeset
 | 
40  | 
[rect/.style={draw=black!50, 
 | 
| 
 
5c51839c88fd
updated
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents: 
292 
diff
changeset
 | 
41  | 
top color=white,bottom color=black!20,  | 
| 
 
5c51839c88fd
updated
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents: 
292 
diff
changeset
 | 
42  | 
rectangle, very thick, rounded corners}]  | 
| 
173
 
7cfb7a6f7c99
added slides
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents:  
diff
changeset
 | 
43  | 
|
| 
297
 
5c51839c88fd
updated
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents: 
292 
diff
changeset
 | 
44  | 
\draw (0,0) node [rect, text depth=30mm, text width=46mm] {\small all languages};
 | 
| 
 
5c51839c88fd
updated
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents: 
292 
diff
changeset
 | 
45  | 
\draw (0,-0.4) node [rect, text depth=20mm, text width=44mm] {\small decidable languages};
 | 
| 
 
5c51839c88fd
updated
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents: 
292 
diff
changeset
 | 
46  | 
\draw (0,-0.65) node [rect, text depth=13mm] {\small context sensitive languages};
 | 
| 
 
5c51839c88fd
updated
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents: 
292 
diff
changeset
 | 
47  | 
\draw (0,-0.84) node [rect, text depth=7mm, text width=35mm] {\small context-free languages};
 | 
| 
 
5c51839c88fd
updated
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents: 
292 
diff
changeset
 | 
48  | 
\draw (0,-1.05) node [rect] {\small regular languages};
 | 
| 
173
 
7cfb7a6f7c99
added slides
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents:  
diff
changeset
 | 
49  | 
\end{tikzpicture}
 | 
| 
 
7cfb7a6f7c99
added slides
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents:  
diff
changeset
 | 
50  | 
\end{center}
 | 
| 
 
7cfb7a6f7c99
added slides
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents:  
diff
changeset
 | 
51  | 
|
| 582 | 52  | 
\noindent Each ``bubble'' stands for sets of languages (remember  | 
53  | 
languages are sets of strings). As indicated the set of regular  | 
|
54  | 
languages are fully included inside the context-free languages,  | 
|
55  | 
meaning every regular language is also context-free, but not vice  | 
|
56  | 
versa. Below I will let you think for example what the context-free  | 
|
57  | 
grammar is for the language corresponding to the regular expression  | 
|
58  | 
$(aaa)^*a$.  | 
|
59  | 
||
60  | 
Because of their convenience, context-free languages play an important  | 
|
61  | 
role in `day-to-day' text processing and in programming  | 
|
62  | 
languages. Context-free in this setting means that ``words'' have one  | 
|
63  | 
meaning only ansd this meaning is independend from in which context  | 
|
64  | 
the ``words'' appear. For example ambiguity issues like  | 
|
65  | 
||
66  | 
\begin{center}
 | 
|
67  | 
\tt Time flies like an arrow; fruit flies like a banana.  | 
|
68  | 
\end{center}  
 | 
|
69  | 
||
70  | 
\noindent  | 
|
71  | 
from natural languages were the meaning of \emph{flies} depend on the
 | 
|
72  | 
surounding \emph{context} are avoided as much as possible.
 | 
|
73  | 
||
74  | 
Context-free languages are usually specified by grammars. For example  | 
|
75  | 
a grammar for well-parenthesised expressions can be given as follows:  | 
|
| 
173
 
7cfb7a6f7c99
added slides
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents:  
diff
changeset
 | 
76  | 
|
| 459 | 77  | 
\begin{plstx}[margin=3cm]
 | 
78  | 
: \meta{P} ::=  ( \cdot  \meta{P} \cdot ) \cdot \meta{P}
 | 
|
79  | 
| \epsilon\\  | 
|
80  | 
\end{plstx}
 | 
|
81  | 
||
| 
362
 
57ea439feaff
updated
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents: 
360 
diff
changeset
 | 
82  | 
\noindent  | 
| 
 
57ea439feaff
updated
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents: 
360 
diff
changeset
 | 
83  | 
or a grammar for recognising strings consisting of ones is  | 
| 
 
57ea439feaff
updated
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents: 
360 
diff
changeset
 | 
84  | 
|
| 459 | 85  | 
\begin{plstx}[margin=3cm]
 | 
86  | 
: \meta{O} ::= 1 \cdot  \meta{O} 
 | 
|
87  | 
| 1\\  | 
|
88  | 
\end{plstx}
 | 
|
| 
362
 
57ea439feaff
updated
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents: 
360 
diff
changeset
 | 
89  | 
|
| 
 
57ea439feaff
updated
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents: 
360 
diff
changeset
 | 
90  | 
In general grammars consist of finitely many rules built up  | 
| 
 
57ea439feaff
updated
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents: 
360 
diff
changeset
 | 
91  | 
from \emph{terminal symbols} (usually lower-case letters) and
 | 
| 582 | 92  | 
\emph{non-terminal symbols} (upper-case letters written in
 | 
93  | 
bold like \meta{A}, \meta{N} and so on). Rules have
 | 
|
| 
362
 
57ea439feaff
updated
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents: 
360 
diff
changeset
 | 
94  | 
the shape  | 
| 
173
 
7cfb7a6f7c99
added slides
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents:  
diff
changeset
 | 
95  | 
|
| 459 | 96  | 
\begin{plstx}[margin=3cm]
 | 
97  | 
: \meta{NT} ::= rhs\\
 | 
|
98  | 
\end{plstx}
 | 
|
| 
173
 
7cfb7a6f7c99
added slides
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents:  
diff
changeset
 | 
99  | 
|
| 
362
 
57ea439feaff
updated
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents: 
360 
diff
changeset
 | 
100  | 
\noindent where on the left-hand side is a single non-terminal  | 
| 
 
57ea439feaff
updated
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents: 
360 
diff
changeset
 | 
101  | 
and on the right a string consisting of both terminals and  | 
| 
 
57ea439feaff
updated
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents: 
360 
diff
changeset
 | 
102  | 
non-terminals including the $\epsilon$-symbol for indicating  | 
| 
 
57ea439feaff
updated
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents: 
360 
diff
changeset
 | 
103  | 
the empty string. We use the convention to separate components  | 
| 
 
57ea439feaff
updated
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents: 
360 
diff
changeset
 | 
104  | 
on the right hand-side by using the $\cdot$ symbol, as in the  | 
| 
 
57ea439feaff
updated
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents: 
360 
diff
changeset
 | 
105  | 
grammar for well-parenthesised expressions. We also use the  | 
| 
 
57ea439feaff
updated
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents: 
360 
diff
changeset
 | 
106  | 
convention to use $|$ as a shorthand notation for several  | 
| 
 
57ea439feaff
updated
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents: 
360 
diff
changeset
 | 
107  | 
rules. For example  | 
| 
173
 
7cfb7a6f7c99
added slides
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents:  
diff
changeset
 | 
108  | 
|
| 459 | 109  | 
\begin{plstx}[margin=3cm]
 | 
110  | 
: \meta{NT} ::= rhs_1
 | 
|
111  | 
| rhs_2\\  | 
|
112  | 
\end{plstx}
 | 
|
| 
173
 
7cfb7a6f7c99
added slides
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents:  
diff
changeset
 | 
113  | 
|
| 459 | 114  | 
\noindent means that the non-terminal \meta{NT} can be replaced by
 | 
| 
362
 
57ea439feaff
updated
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents: 
360 
diff
changeset
 | 
115  | 
either $\textit{rhs}_1$ or $\textit{rhs}_2$. If there are more
 | 
| 
 
57ea439feaff
updated
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents: 
360 
diff
changeset
 | 
116  | 
than one non-terminal on the left-hand side of the rules, then  | 
| 
 
57ea439feaff
updated
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents: 
360 
diff
changeset
 | 
117  | 
we need to indicate what is the \emph{starting} symbol of the
 | 
| 
 
57ea439feaff
updated
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents: 
360 
diff
changeset
 | 
118  | 
grammar. For example the grammar for arithmetic expressions  | 
| 
173
 
7cfb7a6f7c99
added slides
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents:  
diff
changeset
 | 
119  | 
can be given as follows  | 
| 
 
7cfb7a6f7c99
added slides
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents:  
diff
changeset
 | 
120  | 
|
| 459 | 121  | 
\begin{plstx}[margin=3cm,one per line]
 | 
122  | 
\mbox{\rm (1)}: \meta{E} ::= \meta{N}\\
 | 
|
123  | 
\mbox{\rm (2)}: \meta{E} ::= \meta{E} \cdot + \cdot \meta{E}\\
 | 
|
124  | 
\mbox{\rm (3)}: \meta{E} ::= \meta{E} \cdot - \cdot \meta{E}\\
 | 
|
125  | 
\mbox{\rm (4)}: \meta{E} ::= \meta{E} \cdot * \cdot \meta{E}\\
 | 
|
126  | 
\mbox{\rm (5)}: \meta{E} ::= ( \cdot \meta{E} \cdot )\\
 | 
|
127  | 
\mbox{\rm (6\ldots)}: \meta{N} ::= \meta{N} \cdot \meta{N} 
 | 
|
128  | 
\mid 0 \mid 1 \mid \ldots \mid 9\\  | 
|
129  | 
\end{plstx}
 | 
|
| 
173
 
7cfb7a6f7c99
added slides
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents:  
diff
changeset
 | 
130  | 
|
| 459 | 131  | 
\noindent where \meta{E} is the starting symbol. A
 | 
| 
362
 
57ea439feaff
updated
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents: 
360 
diff
changeset
 | 
132  | 
\emph{derivation} for a grammar starts with the starting
 | 
| 
 
57ea439feaff
updated
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents: 
360 
diff
changeset
 | 
133  | 
symbol of the grammar and in each step replaces one  | 
| 
 
57ea439feaff
updated
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents: 
360 
diff
changeset
 | 
134  | 
non-terminal by a right-hand side of a rule. A derivation ends  | 
| 
 
57ea439feaff
updated
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents: 
360 
diff
changeset
 | 
135  | 
with a string in which only terminal symbols are left. For  | 
| 
 
57ea439feaff
updated
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents: 
360 
diff
changeset
 | 
136  | 
example a derivation for the string $(1 + 2) + 3$ is as  | 
| 
 
57ea439feaff
updated
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents: 
360 
diff
changeset
 | 
137  | 
follows:  | 
| 
175
 
5801e8c0e528
updated
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents: 
173 
diff
changeset
 | 
138  | 
|
| 
 
5801e8c0e528
updated
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents: 
173 
diff
changeset
 | 
139  | 
\begin{center}
 | 
| 
362
 
57ea439feaff
updated
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents: 
360 
diff
changeset
 | 
140  | 
\begin{tabular}{lll@{\hspace{2cm}}l}
 | 
| 459 | 141  | 
\meta{E} & $\rightarrow$ & $\meta{E}+\meta{E}$          & by (2)\\
 | 
142  | 
       & $\rightarrow$ & $(\meta{E})+\meta{E}$     & by (5)\\
 | 
|
143  | 
       & $\rightarrow$ & $(\meta{E}+\meta{E})+\meta{E}$   & by (2)\\
 | 
|
144  | 
       & $\rightarrow$ & $(\meta{E}+\meta{E})+\meta{N}$   & by (1)\\
 | 
|
145  | 
       & $\rightarrow$ & $(\meta{E}+\meta{E})+3$   & by (6\dots)\\
 | 
|
146  | 
       & $\rightarrow$ & $(\meta{N}+\meta{E})+3$   & by (1)\\
 | 
|
| 
362
 
57ea439feaff
updated
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents: 
360 
diff
changeset
 | 
147  | 
& $\rightarrow^+$ & $(1+2)+3$ & by (1, 6\ldots)\\  | 
| 
175
 
5801e8c0e528
updated
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents: 
173 
diff
changeset
 | 
148  | 
\end{tabular} 
 | 
| 
 
5801e8c0e528
updated
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents: 
173 
diff
changeset
 | 
149  | 
\end{center}
 | 
| 
 
5801e8c0e528
updated
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents: 
173 
diff
changeset
 | 
150  | 
|
| 
362
 
57ea439feaff
updated
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents: 
360 
diff
changeset
 | 
151  | 
\noindent where on the right it is indicated which  | 
| 
 
57ea439feaff
updated
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents: 
360 
diff
changeset
 | 
152  | 
grammar rule has been applied. In the last step we  | 
| 
 
57ea439feaff
updated
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents: 
360 
diff
changeset
 | 
153  | 
merged several steps into one.  | 
| 
 
57ea439feaff
updated
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents: 
360 
diff
changeset
 | 
154  | 
|
| 
 
57ea439feaff
updated
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents: 
360 
diff
changeset
 | 
155  | 
The \emph{language} of a context-free grammar $G$
 | 
| 
 
57ea439feaff
updated
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents: 
360 
diff
changeset
 | 
156  | 
with start symbol $S$ is defined as the set of strings  | 
| 
 
57ea439feaff
updated
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents: 
360 
diff
changeset
 | 
157  | 
derivable by a derivation, that is  | 
| 
175
 
5801e8c0e528
updated
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents: 
173 
diff
changeset
 | 
158  | 
|
| 
 
5801e8c0e528
updated
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents: 
173 
diff
changeset
 | 
159  | 
\begin{center}
 | 
| 
 
5801e8c0e528
updated
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents: 
173 
diff
changeset
 | 
160  | 
$\{c_1\ldots c_n \;|\; S \rightarrow^* c_1\ldots c_n \;\;\text{with all} \; c_i \;\text{being non-terminals}\}$
 | 
| 
 
5801e8c0e528
updated
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents: 
173 
diff
changeset
 | 
161  | 
\end{center}
 | 
| 
 
5801e8c0e528
updated
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents: 
173 
diff
changeset
 | 
162  | 
|
| 
 
5801e8c0e528
updated
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents: 
173 
diff
changeset
 | 
163  | 
\noindent  | 
| 
 
5801e8c0e528
updated
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents: 
173 
diff
changeset
 | 
164  | 
A \emph{parse-tree} encodes how a string is derived with the starting symbol on 
 | 
| 
 
5801e8c0e528
updated
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents: 
173 
diff
changeset
 | 
165  | 
top and each non-terminal containing a subtree for how it is replaced in a derivation.  | 
| 
 
5801e8c0e528
updated
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents: 
173 
diff
changeset
 | 
166  | 
The parse tree for the string $(1 + 23)+4$ is as follows:  | 
| 
 
5801e8c0e528
updated
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents: 
173 
diff
changeset
 | 
167  | 
|
| 
 
5801e8c0e528
updated
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents: 
173 
diff
changeset
 | 
168  | 
\begin{center}
 | 
| 
 
5801e8c0e528
updated
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents: 
173 
diff
changeset
 | 
169  | 
\begin{tikzpicture}[level distance=8mm, black]
 | 
| 
 
5801e8c0e528
updated
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents: 
173 
diff
changeset
 | 
170  | 
  \node {$E$}
 | 
| 
 
5801e8c0e528
updated
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents: 
173 
diff
changeset
 | 
171  | 
    child {node {$E$} 
 | 
| 
 
5801e8c0e528
updated
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents: 
173 
diff
changeset
 | 
172  | 
       child {node {$($}}
 | 
| 
 
5801e8c0e528
updated
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents: 
173 
diff
changeset
 | 
173  | 
       child {node {$E$}       
 | 
| 
 
5801e8c0e528
updated
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents: 
173 
diff
changeset
 | 
174  | 
         child {node {$E$} child {node {$N$} child {node {$1$}}}}
 | 
| 
 
5801e8c0e528
updated
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents: 
173 
diff
changeset
 | 
175  | 
         child {node {$+$}}
 | 
| 
 
5801e8c0e528
updated
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents: 
173 
diff
changeset
 | 
176  | 
         child {node {$E$} 
 | 
| 
 
5801e8c0e528
updated
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents: 
173 
diff
changeset
 | 
177  | 
            child {node {$N$} child {node {$2$}}}
 | 
| 
 
5801e8c0e528
updated
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents: 
173 
diff
changeset
 | 
178  | 
            child {node {$N$} child {node {$3$}}}
 | 
| 
 
5801e8c0e528
updated
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents: 
173 
diff
changeset
 | 
179  | 
}  | 
| 
 
5801e8c0e528
updated
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents: 
173 
diff
changeset
 | 
180  | 
}  | 
| 
 
5801e8c0e528
updated
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents: 
173 
diff
changeset
 | 
181  | 
       child {node {$)$}}
 | 
| 
 
5801e8c0e528
updated
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents: 
173 
diff
changeset
 | 
182  | 
}  | 
| 
 
5801e8c0e528
updated
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents: 
173 
diff
changeset
 | 
183  | 
     child {node {$+$}}
 | 
| 
 
5801e8c0e528
updated
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents: 
173 
diff
changeset
 | 
184  | 
     child {node {$E$}
 | 
| 
 
5801e8c0e528
updated
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents: 
173 
diff
changeset
 | 
185  | 
        child {node {$N$} child {node {$4$}}}
 | 
| 
 
5801e8c0e528
updated
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents: 
173 
diff
changeset
 | 
186  | 
};  | 
| 
 
5801e8c0e528
updated
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents: 
173 
diff
changeset
 | 
187  | 
\end{tikzpicture}
 | 
| 
 
5801e8c0e528
updated
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents: 
173 
diff
changeset
 | 
188  | 
\end{center}
 | 
| 
 
5801e8c0e528
updated
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents: 
173 
diff
changeset
 | 
189  | 
|
| 
362
 
57ea439feaff
updated
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents: 
360 
diff
changeset
 | 
190  | 
\noindent We are often interested in these parse-trees since  | 
| 
 
57ea439feaff
updated
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents: 
360 
diff
changeset
 | 
191  | 
they encode the structure of how a string is derived by a  | 
| 
 
57ea439feaff
updated
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents: 
360 
diff
changeset
 | 
192  | 
grammar. Before we come to the problem of constructing such  | 
| 
 
57ea439feaff
updated
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents: 
360 
diff
changeset
 | 
193  | 
parse-trees, we need to consider the following two properties  | 
| 
 
57ea439feaff
updated
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents: 
360 
diff
changeset
 | 
194  | 
of grammars. A grammar is \emph{left-recursive} if there is a
 | 
| 
 
57ea439feaff
updated
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents: 
360 
diff
changeset
 | 
195  | 
derivation starting from a non-terminal, say $NT$ which leads  | 
| 
 
57ea439feaff
updated
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents: 
360 
diff
changeset
 | 
196  | 
to a string which again starts with $NT$. This means a  | 
| 
 
57ea439feaff
updated
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents: 
360 
diff
changeset
 | 
197  | 
derivation of the form.  | 
| 
173
 
7cfb7a6f7c99
added slides
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents:  
diff
changeset
 | 
198  | 
|
| 
175
 
5801e8c0e528
updated
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents: 
173 
diff
changeset
 | 
199  | 
\begin{center}
 | 
| 
 
5801e8c0e528
updated
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents: 
173 
diff
changeset
 | 
200  | 
$NT \rightarrow \ldots \rightarrow NT \cdot \ldots$  | 
| 
 
5801e8c0e528
updated
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents: 
173 
diff
changeset
 | 
201  | 
\end{center}
 | 
| 
 
5801e8c0e528
updated
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents: 
173 
diff
changeset
 | 
202  | 
|
| 
362
 
57ea439feaff
updated
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents: 
360 
diff
changeset
 | 
203  | 
\noindent It can be easily seen that the grammar above for  | 
| 
 
57ea439feaff
updated
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents: 
360 
diff
changeset
 | 
204  | 
arithmetic expressions is left-recursive: for example the  | 
| 
 
57ea439feaff
updated
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents: 
360 
diff
changeset
 | 
205  | 
rules $E \rightarrow E\cdot + \cdot E$ and $N \rightarrow  | 
| 
 
57ea439feaff
updated
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents: 
360 
diff
changeset
 | 
206  | 
N\cdot N$ show that this grammar is left-recursive. But note  | 
| 
 
57ea439feaff
updated
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents: 
360 
diff
changeset
 | 
207  | 
that left-recursiveness can involve more than one step in the  | 
| 
 
57ea439feaff
updated
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents: 
360 
diff
changeset
 | 
208  | 
derivation. The problem with left-recursive grammars is that  | 
| 
 
57ea439feaff
updated
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents: 
360 
diff
changeset
 | 
209  | 
some algorithms cannot cope with them: they fall into a loop.  | 
| 
 
57ea439feaff
updated
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents: 
360 
diff
changeset
 | 
210  | 
Fortunately every left-recursive grammar can be transformed  | 
| 
 
57ea439feaff
updated
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents: 
360 
diff
changeset
 | 
211  | 
into one that is not left-recursive, although this  | 
| 
 
57ea439feaff
updated
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents: 
360 
diff
changeset
 | 
212  | 
transformation might make the grammar less ``human-readable''.  | 
| 
 
57ea439feaff
updated
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents: 
360 
diff
changeset
 | 
213  | 
For example if we want to give a non-left-recursive grammar  | 
| 
 
57ea439feaff
updated
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents: 
360 
diff
changeset
 | 
214  | 
for numbers we might specify  | 
| 
175
 
5801e8c0e528
updated
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents: 
173 
diff
changeset
 | 
215  | 
|
| 
 
5801e8c0e528
updated
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents: 
173 
diff
changeset
 | 
216  | 
\begin{center}
 | 
| 
 
5801e8c0e528
updated
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents: 
173 
diff
changeset
 | 
217  | 
$N \;\;\rightarrow\;\; 0\;|\;\ldots\;|\;9\;|\;1\cdot N\;|\;2\cdot N\;|\;\ldots\;|\;9\cdot N$  | 
| 
 
5801e8c0e528
updated
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents: 
173 
diff
changeset
 | 
218  | 
\end{center}
 | 
| 
 
5801e8c0e528
updated
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents: 
173 
diff
changeset
 | 
219  | 
|
| 
362
 
57ea439feaff
updated
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents: 
360 
diff
changeset
 | 
220  | 
\noindent Using this grammar we can still derive every number  | 
| 
 
57ea439feaff
updated
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents: 
360 
diff
changeset
 | 
221  | 
string, but we will never be able to derive a string of the  | 
| 
 
57ea439feaff
updated
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents: 
360 
diff
changeset
 | 
222  | 
form $N \to \ldots \to N \cdot \ldots$.  | 
| 
175
 
5801e8c0e528
updated
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents: 
173 
diff
changeset
 | 
223  | 
|
| 
362
 
57ea439feaff
updated
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents: 
360 
diff
changeset
 | 
224  | 
The other property we have to watch out for is when a grammar  | 
| 
 
57ea439feaff
updated
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents: 
360 
diff
changeset
 | 
225  | 
is \emph{ambiguous}. A grammar is said to be ambiguous if
 | 
| 
 
57ea439feaff
updated
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents: 
360 
diff
changeset
 | 
226  | 
there are two parse-trees for one string. Again the grammar  | 
| 
 
57ea439feaff
updated
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents: 
360 
diff
changeset
 | 
227  | 
for arithmetic expressions shown above is ambiguous. While the  | 
| 
 
57ea439feaff
updated
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents: 
360 
diff
changeset
 | 
228  | 
shown parse tree for the string $(1 + 23) + 4$ is unique, this  | 
| 
 
57ea439feaff
updated
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents: 
360 
diff
changeset
 | 
229  | 
is not the case in general. For example there are two parse  | 
| 
175
 
5801e8c0e528
updated
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents: 
173 
diff
changeset
 | 
230  | 
trees for the string $1 + 2 + 3$, namely  | 
| 
 
5801e8c0e528
updated
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents: 
173 
diff
changeset
 | 
231  | 
|
| 
 
5801e8c0e528
updated
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents: 
173 
diff
changeset
 | 
232  | 
\begin{center}
 | 
| 
 
5801e8c0e528
updated
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents: 
173 
diff
changeset
 | 
233  | 
\begin{tabular}{c@{\hspace{10mm}}c}
 | 
| 
 
5801e8c0e528
updated
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents: 
173 
diff
changeset
 | 
234  | 
\begin{tikzpicture}[level distance=8mm, black]
 | 
| 
 
5801e8c0e528
updated
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents: 
173 
diff
changeset
 | 
235  | 
  \node {$E$}
 | 
| 
 
5801e8c0e528
updated
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents: 
173 
diff
changeset
 | 
236  | 
    child {node {$E$} child {node {$N$} child {node {$1$}}}}
 | 
| 
 
5801e8c0e528
updated
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents: 
173 
diff
changeset
 | 
237  | 
    child {node {$+$}}
 | 
| 
 
5801e8c0e528
updated
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents: 
173 
diff
changeset
 | 
238  | 
    child {node {$E$}
 | 
| 
 
5801e8c0e528
updated
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents: 
173 
diff
changeset
 | 
239  | 
       child {node {$E$} child {node {$N$} child {node {$2$}}}}
 | 
| 
 
5801e8c0e528
updated
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents: 
173 
diff
changeset
 | 
240  | 
       child {node {$+$}}
 | 
| 
 
5801e8c0e528
updated
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents: 
173 
diff
changeset
 | 
241  | 
       child {node {$E$} child {node {$N$} child {node {$3$}}}}
 | 
| 
 
5801e8c0e528
updated
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents: 
173 
diff
changeset
 | 
242  | 
}  | 
| 
 
5801e8c0e528
updated
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents: 
173 
diff
changeset
 | 
243  | 
;  | 
| 
 
5801e8c0e528
updated
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents: 
173 
diff
changeset
 | 
244  | 
\end{tikzpicture} 
 | 
| 
 
5801e8c0e528
updated
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents: 
173 
diff
changeset
 | 
245  | 
&  | 
| 
 
5801e8c0e528
updated
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents: 
173 
diff
changeset
 | 
246  | 
\begin{tikzpicture}[level distance=8mm, black]
 | 
| 
 
5801e8c0e528
updated
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents: 
173 
diff
changeset
 | 
247  | 
  \node {$E$}
 | 
| 
 
5801e8c0e528
updated
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents: 
173 
diff
changeset
 | 
248  | 
    child {node {$E$}
 | 
| 
 
5801e8c0e528
updated
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents: 
173 
diff
changeset
 | 
249  | 
       child {node {$E$} child {node {$N$} child {node {$1$}}}}
 | 
| 
 
5801e8c0e528
updated
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents: 
173 
diff
changeset
 | 
250  | 
       child {node {$+$}}
 | 
| 
 
5801e8c0e528
updated
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents: 
173 
diff
changeset
 | 
251  | 
       child {node {$E$} child {node {$N$} child {node {$2$}}}} 
 | 
| 
 
5801e8c0e528
updated
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents: 
173 
diff
changeset
 | 
252  | 
}  | 
| 
 
5801e8c0e528
updated
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents: 
173 
diff
changeset
 | 
253  | 
    child {node {$+$}}
 | 
| 
 
5801e8c0e528
updated
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents: 
173 
diff
changeset
 | 
254  | 
    child {node {$E$} child {node {$N$} child {node {$3$}}}}
 | 
| 
 
5801e8c0e528
updated
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents: 
173 
diff
changeset
 | 
255  | 
;  | 
| 
 
5801e8c0e528
updated
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents: 
173 
diff
changeset
 | 
256  | 
\end{tikzpicture}
 | 
| 
 
5801e8c0e528
updated
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents: 
173 
diff
changeset
 | 
257  | 
\end{tabular} 
 | 
| 
 
5801e8c0e528
updated
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents: 
173 
diff
changeset
 | 
258  | 
\end{center}
 | 
| 
 
5801e8c0e528
updated
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents: 
173 
diff
changeset
 | 
259  | 
|
| 
362
 
57ea439feaff
updated
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents: 
360 
diff
changeset
 | 
260  | 
\noindent In particular in programming languages we will try  | 
| 
 
57ea439feaff
updated
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents: 
360 
diff
changeset
 | 
261  | 
to avoid ambiguous grammars because two different parse-trees  | 
| 
 
57ea439feaff
updated
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents: 
360 
diff
changeset
 | 
262  | 
for a string mean a program can be interpreted in two  | 
| 
 
57ea439feaff
updated
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents: 
360 
diff
changeset
 | 
263  | 
different ways. In such cases we have to somehow make sure the  | 
| 
 
57ea439feaff
updated
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents: 
360 
diff
changeset
 | 
264  | 
two different ways do not matter, or disambiguate the grammar  | 
| 
 
57ea439feaff
updated
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents: 
360 
diff
changeset
 | 
265  | 
in some other way (for example making the $+$  | 
| 
 
57ea439feaff
updated
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents: 
360 
diff
changeset
 | 
266  | 
left-associative). Unfortunately already the problem of  | 
| 
 
57ea439feaff
updated
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents: 
360 
diff
changeset
 | 
267  | 
deciding whether a grammar is ambiguous or not is in general  | 
| 
 
57ea439feaff
updated
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents: 
360 
diff
changeset
 | 
268  | 
undecidable. But in simple instance (the ones we deal in this  | 
| 
 
57ea439feaff
updated
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents: 
360 
diff
changeset
 | 
269  | 
module) one can usually see when a grammar is ambiguous.  | 
| 
176
 
3c2653fc8b5a
updated
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents: 
175 
diff
changeset
 | 
270  | 
|
| 459 | 271  | 
|
272  | 
\subsection*{Parser Combinators}
 | 
|
273  | 
||
| 
362
 
57ea439feaff
updated
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents: 
360 
diff
changeset
 | 
274  | 
Let us now turn to the problem of generating a parse-tree for  | 
| 
 
57ea439feaff
updated
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents: 
360 
diff
changeset
 | 
275  | 
a grammar and string. In what follows we explain \emph{parser
 | 
| 
 
57ea439feaff
updated
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents: 
360 
diff
changeset
 | 
276  | 
combinators}, because they are easy to implement and closely  | 
| 
 
57ea439feaff
updated
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents: 
360 
diff
changeset
 | 
277  | 
resemble grammar rules. Imagine that a grammar describes the  | 
| 
 
57ea439feaff
updated
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents: 
360 
diff
changeset
 | 
278  | 
strings of natural numbers, such as the grammar $N$ shown  | 
| 
 
57ea439feaff
updated
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents: 
360 
diff
changeset
 | 
279  | 
above. For all such strings we want to generate the  | 
| 
 
57ea439feaff
updated
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents: 
360 
diff
changeset
 | 
280  | 
parse-trees or later on we actually want to extract the  | 
| 
 
57ea439feaff
updated
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents: 
360 
diff
changeset
 | 
281  | 
meaning of these strings, that is the concrete integers  | 
| 
 
57ea439feaff
updated
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents: 
360 
diff
changeset
 | 
282  | 
``behind'' these strings. In Scala the parser combinators will  | 
| 
 
57ea439feaff
updated
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents: 
360 
diff
changeset
 | 
283  | 
be functions of type  | 
| 
176
 
3c2653fc8b5a
updated
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents: 
175 
diff
changeset
 | 
284  | 
|
| 
 
3c2653fc8b5a
updated
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents: 
175 
diff
changeset
 | 
285  | 
\begin{center}
 | 
| 
177
 
53def1fbf472
updated
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents: 
176 
diff
changeset
 | 
286  | 
\texttt{I $\Rightarrow$ Set[(T, I)]}
 | 
| 
176
 
3c2653fc8b5a
updated
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents: 
175 
diff
changeset
 | 
287  | 
\end{center}
 | 
| 
 
3c2653fc8b5a
updated
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents: 
175 
diff
changeset
 | 
288  | 
|
| 
362
 
57ea439feaff
updated
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents: 
360 
diff
changeset
 | 
289  | 
\noindent that is they take as input something of type  | 
| 
 
57ea439feaff
updated
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents: 
360 
diff
changeset
 | 
290  | 
\texttt{I}, typically a list of tokens or a string, and return
 | 
| 
 
57ea439feaff
updated
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents: 
360 
diff
changeset
 | 
291  | 
a set of pairs. The first component of these pairs corresponds  | 
| 
 
57ea439feaff
updated
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents: 
360 
diff
changeset
 | 
292  | 
to what the parser combinator was able to process from the  | 
| 
 
57ea439feaff
updated
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents: 
360 
diff
changeset
 | 
293  | 
input and the second is the unprocessed part of the input. As  | 
| 
 
57ea439feaff
updated
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents: 
360 
diff
changeset
 | 
294  | 
we shall see shortly, a parser combinator might return more  | 
| 
 
57ea439feaff
updated
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents: 
360 
diff
changeset
 | 
295  | 
than one such pair, with the idea that there are potentially  | 
| 
 
57ea439feaff
updated
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents: 
360 
diff
changeset
 | 
296  | 
several ways how to interpret the input. As a concrete  | 
| 
 
57ea439feaff
updated
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents: 
360 
diff
changeset
 | 
297  | 
example, consider the case where the input is of type string,  | 
| 
 
57ea439feaff
updated
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents: 
360 
diff
changeset
 | 
298  | 
say the string  | 
| 
183
 
b17eff695c7f
added new stuff
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents: 
177 
diff
changeset
 | 
299  | 
|
| 
 
b17eff695c7f
added new stuff
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents: 
177 
diff
changeset
 | 
300  | 
\begin{center}
 | 
| 
 
b17eff695c7f
added new stuff
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents: 
177 
diff
changeset
 | 
301  | 
\tt\Grid{iffoo\VS testbar}
 | 
| 
 
b17eff695c7f
added new stuff
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents: 
177 
diff
changeset
 | 
302  | 
\end{center}
 | 
| 
 
b17eff695c7f
added new stuff
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents: 
177 
diff
changeset
 | 
303  | 
|
| 
362
 
57ea439feaff
updated
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents: 
360 
diff
changeset
 | 
304  | 
\noindent We might have a parser combinator which tries to  | 
| 
 
57ea439feaff
updated
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents: 
360 
diff
changeset
 | 
305  | 
interpret this string as a keyword (\texttt{if}) or an
 | 
| 
 
57ea439feaff
updated
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents: 
360 
diff
changeset
 | 
306  | 
identifier (\texttt{iffoo}). Then the output will be the set
 | 
| 
177
 
53def1fbf472
updated
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents: 
176 
diff
changeset
 | 
307  | 
|
| 
183
 
b17eff695c7f
added new stuff
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents: 
177 
diff
changeset
 | 
308  | 
\begin{center}
 | 
| 
 
b17eff695c7f
added new stuff
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents: 
177 
diff
changeset
 | 
309  | 
$\left\{ \left(\texttt{\Grid{if}}\,,\, \texttt{\Grid{foo\VS testbar}}\right), 
 | 
| 
 
b17eff695c7f
added new stuff
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents: 
177 
diff
changeset
 | 
310  | 
           \left(\texttt{\Grid{iffoo}}\,,\, \texttt{\Grid{\VS testbar}}\right) \right\}$
 | 
| 
 
b17eff695c7f
added new stuff
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents: 
177 
diff
changeset
 | 
311  | 
\end{center}
 | 
| 
 
b17eff695c7f
added new stuff
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents: 
177 
diff
changeset
 | 
312  | 
|
| 
362
 
57ea439feaff
updated
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents: 
360 
diff
changeset
 | 
313  | 
\noindent where the first pair means the parser could  | 
| 
 
57ea439feaff
updated
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents: 
360 
diff
changeset
 | 
314  | 
recognise \texttt{if} from the input and leaves the rest as
 | 
| 
 
57ea439feaff
updated
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents: 
360 
diff
changeset
 | 
315  | 
`unprocessed' as the second component of the pair; in the  | 
| 
 
57ea439feaff
updated
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents: 
360 
diff
changeset
 | 
316  | 
other case it could recognise \texttt{iffoo} and leaves
 | 
| 
 
57ea439feaff
updated
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents: 
360 
diff
changeset
 | 
317  | 
\texttt{\VS testbar} as unprocessed. If the parser cannot
 | 
| 
 
57ea439feaff
updated
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents: 
360 
diff
changeset
 | 
318  | 
recognise anything from the input then parser combinators just  | 
| 
 
57ea439feaff
updated
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents: 
360 
diff
changeset
 | 
319  | 
return the empty set $\{\}$. This will indicate
 | 
| 
 
57ea439feaff
updated
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents: 
360 
diff
changeset
 | 
320  | 
something ``went wrong''.  | 
| 
183
 
b17eff695c7f
added new stuff
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents: 
177 
diff
changeset
 | 
321  | 
|
| 
 
b17eff695c7f
added new stuff
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents: 
177 
diff
changeset
 | 
322  | 
The main attraction is that we can easily build parser combinators out of smaller components  | 
| 
 
b17eff695c7f
added new stuff
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents: 
177 
diff
changeset
 | 
323  | 
following very closely the structure of a grammar. In order to implement this in an object  | 
| 
 
b17eff695c7f
added new stuff
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents: 
177 
diff
changeset
 | 
324  | 
oriented programming language, like Scala, we need to specify an abstract class for parser  | 
| 
 
b17eff695c7f
added new stuff
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents: 
177 
diff
changeset
 | 
325  | 
combinators. This abstract class requires the implementation of the function  | 
| 
177
 
53def1fbf472
updated
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents: 
176 
diff
changeset
 | 
326  | 
\texttt{parse} taking an argument of type \texttt{I} and returns a set of type  
 | 
| 
 
53def1fbf472
updated
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents: 
176 
diff
changeset
 | 
327  | 
\mbox{\texttt{Set[(T, I)]}}.
 | 
| 
 
53def1fbf472
updated
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents: 
176 
diff
changeset
 | 
328  | 
|
| 
 
53def1fbf472
updated
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents: 
176 
diff
changeset
 | 
329  | 
\begin{center}
 | 
| 
 
53def1fbf472
updated
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents: 
176 
diff
changeset
 | 
330  | 
\begin{lstlisting}[language=Scala,basicstyle=\small\ttfamily, numbers=none]
 | 
| 
 
53def1fbf472
updated
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents: 
176 
diff
changeset
 | 
331  | 
abstract class Parser[I, T] {
 | 
| 
 
53def1fbf472
updated
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents: 
176 
diff
changeset
 | 
332  | 
def parse(ts: I): Set[(T, I)]  | 
| 
 
53def1fbf472
updated
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents: 
176 
diff
changeset
 | 
333  | 
|
| 
 
53def1fbf472
updated
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents: 
176 
diff
changeset
 | 
334  | 
def parse_all(ts: I): Set[T] =  | 
| 
 
53def1fbf472
updated
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents: 
176 
diff
changeset
 | 
335  | 
for ((head, tail) <- parse(ts); if (tail.isEmpty))  | 
| 
 
53def1fbf472
updated
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents: 
176 
diff
changeset
 | 
336  | 
yield head  | 
| 
 
53def1fbf472
updated
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents: 
176 
diff
changeset
 | 
337  | 
}  | 
| 
 
53def1fbf472
updated
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents: 
176 
diff
changeset
 | 
338  | 
\end{lstlisting}
 | 
| 
 
53def1fbf472
updated
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents: 
176 
diff
changeset
 | 
339  | 
\end{center}
 | 
| 
176
 
3c2653fc8b5a
updated
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents: 
175 
diff
changeset
 | 
340  | 
|
| 
177
 
53def1fbf472
updated
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents: 
176 
diff
changeset
 | 
341  | 
\noindent  | 
| 
183
 
b17eff695c7f
added new stuff
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents: 
177 
diff
changeset
 | 
342  | 
From the function \texttt{parse} we can then ``centrally'' derive the function \texttt{parse\_all},
 | 
| 
 
b17eff695c7f
added new stuff
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents: 
177 
diff
changeset
 | 
343  | 
which just filters out all pairs whose second component is not empty (that is has still some  | 
| 
 
b17eff695c7f
added new stuff
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents: 
177 
diff
changeset
 | 
344  | 
unprocessed part). The reason is that at the end of parsing we are only interested in the  | 
| 
 
b17eff695c7f
added new stuff
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents: 
177 
diff
changeset
 | 
345  | 
results where all the input has been consumed and no unprocessed part is left.  | 
| 
 
b17eff695c7f
added new stuff
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents: 
177 
diff
changeset
 | 
346  | 
|
| 
177
 
53def1fbf472
updated
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents: 
176 
diff
changeset
 | 
347  | 
One of the simplest parser combinators recognises just a character, say $c$,  | 
| 
 
53def1fbf472
updated
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents: 
176 
diff
changeset
 | 
348  | 
from the beginning of strings. Its behaviour is as follows:  | 
| 
176
 
3c2653fc8b5a
updated
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents: 
175 
diff
changeset
 | 
349  | 
|
| 
177
 
53def1fbf472
updated
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents: 
176 
diff
changeset
 | 
350  | 
\begin{itemize}
 | 
| 
 
53def1fbf472
updated
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents: 
176 
diff
changeset
 | 
351  | 
\item if the head of the input string starts with a $c$, it returns  | 
| 
 
53def1fbf472
updated
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents: 
176 
diff
changeset
 | 
352  | 
	the set $\{(c, \textit{tail of}\; s)\}$
 | 
| 
 
53def1fbf472
updated
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents: 
176 
diff
changeset
 | 
353  | 
\item otherwise it returns the empty set $\varnothing$  | 
| 
 
53def1fbf472
updated
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents: 
176 
diff
changeset
 | 
354  | 
\end{itemize}
 | 
| 
 
53def1fbf472
updated
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents: 
176 
diff
changeset
 | 
355  | 
|
| 
 
53def1fbf472
updated
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents: 
176 
diff
changeset
 | 
356  | 
\noindent  | 
| 
 
53def1fbf472
updated
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents: 
176 
diff
changeset
 | 
357  | 
The input type of this simple parser combinator for characters is  | 
| 
 
53def1fbf472
updated
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents: 
176 
diff
changeset
 | 
358  | 
\texttt{String} and the output type \mbox{\texttt{Set[(Char, String)]}}. 
 | 
| 
 
53def1fbf472
updated
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents: 
176 
diff
changeset
 | 
359  | 
The code in Scala is as follows:  | 
| 
 
53def1fbf472
updated
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents: 
176 
diff
changeset
 | 
360  | 
|
| 
 
53def1fbf472
updated
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents: 
176 
diff
changeset
 | 
361  | 
\begin{center}
 | 
| 
 
53def1fbf472
updated
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents: 
176 
diff
changeset
 | 
362  | 
\begin{lstlisting}[language=Scala,basicstyle=\small\ttfamily, numbers=none]
 | 
| 
 
53def1fbf472
updated
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents: 
176 
diff
changeset
 | 
363  | 
case class CharParser(c: Char) extends Parser[String, Char] {
 | 
| 
 
53def1fbf472
updated
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents: 
176 
diff
changeset
 | 
364  | 
def parse(sb: String) =  | 
| 
 
53def1fbf472
updated
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents: 
176 
diff
changeset
 | 
365  | 
if (sb.head == c) Set((c, sb.tail)) else Set()  | 
| 
 
53def1fbf472
updated
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents: 
176 
diff
changeset
 | 
366  | 
}  | 
| 
 
53def1fbf472
updated
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents: 
176 
diff
changeset
 | 
367  | 
\end{lstlisting}
 | 
| 
 
53def1fbf472
updated
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents: 
176 
diff
changeset
 | 
368  | 
\end{center}
 | 
| 
176
 
3c2653fc8b5a
updated
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents: 
175 
diff
changeset
 | 
369  | 
|
| 
183
 
b17eff695c7f
added new stuff
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents: 
177 
diff
changeset
 | 
370  | 
\noindent  | 
| 
 
b17eff695c7f
added new stuff
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents: 
177 
diff
changeset
 | 
371  | 
The \texttt{parse} function tests whether the first character of the 
 | 
| 
 
b17eff695c7f
added new stuff
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents: 
177 
diff
changeset
 | 
372  | 
input string \texttt{sb} is equal to \texttt{c}. If yes, then it splits the
 | 
| 
 
b17eff695c7f
added new stuff
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents: 
177 
diff
changeset
 | 
373  | 
string into the recognised part \texttt{c} and the unprocessed part
 | 
| 
 
b17eff695c7f
added new stuff
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents: 
177 
diff
changeset
 | 
374  | 
\texttt{sb.tail}. In case \texttt{sb} does not start with \texttt{c} then
 | 
| 
 
b17eff695c7f
added new stuff
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents: 
177 
diff
changeset
 | 
375  | 
the parser returns the empty set (in Scala \texttt{Set()}).
 | 
| 
 
b17eff695c7f
added new stuff
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents: 
177 
diff
changeset
 | 
376  | 
|
| 
 
b17eff695c7f
added new stuff
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents: 
177 
diff
changeset
 | 
377  | 
More interesting are the parser combinators that build larger parsers  | 
| 
 
b17eff695c7f
added new stuff
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents: 
177 
diff
changeset
 | 
378  | 
out of smaller component parsers. For example the alternative  | 
| 
 
b17eff695c7f
added new stuff
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents: 
177 
diff
changeset
 | 
379  | 
parser combinator is as follows.  | 
| 
 
b17eff695c7f
added new stuff
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents: 
177 
diff
changeset
 | 
380  | 
|
| 
 
b17eff695c7f
added new stuff
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents: 
177 
diff
changeset
 | 
381  | 
\begin{center}
 | 
| 
 
b17eff695c7f
added new stuff
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents: 
177 
diff
changeset
 | 
382  | 
\begin{lstlisting}[language=Scala,basicstyle=\small\ttfamily, numbers=none]
 | 
| 
 
b17eff695c7f
added new stuff
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents: 
177 
diff
changeset
 | 
383  | 
class AltParser[I, T]  | 
| 
 
b17eff695c7f
added new stuff
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents: 
177 
diff
changeset
 | 
384  | 
(p: => Parser[I, T],  | 
| 
 
b17eff695c7f
added new stuff
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents: 
177 
diff
changeset
 | 
385  | 
        q: => Parser[I, T]) extends Parser[I, T] {
 | 
| 
 
b17eff695c7f
added new stuff
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents: 
177 
diff
changeset
 | 
386  | 
def parse(sb: I) = p.parse(sb) ++ q.parse(sb)  | 
| 
 
b17eff695c7f
added new stuff
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents: 
177 
diff
changeset
 | 
387  | 
}  | 
| 
 
b17eff695c7f
added new stuff
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents: 
177 
diff
changeset
 | 
388  | 
\end{lstlisting}
 | 
| 
 
b17eff695c7f
added new stuff
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents: 
177 
diff
changeset
 | 
389  | 
\end{center}
 | 
| 
 
b17eff695c7f
added new stuff
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents: 
177 
diff
changeset
 | 
390  | 
|
| 
 
b17eff695c7f
added new stuff
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents: 
177 
diff
changeset
 | 
391  | 
\noindent  | 
| 
 
b17eff695c7f
added new stuff
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents: 
177 
diff
changeset
 | 
392  | 
The types of this parser combinator are polymorphic (we just have \texttt{I}
 | 
| 
 
b17eff695c7f
added new stuff
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents: 
177 
diff
changeset
 | 
393  | 
for the input type, and \texttt{T} for the output type). The alternative parser
 | 
| 
 
b17eff695c7f
added new stuff
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents: 
177 
diff
changeset
 | 
394  | 
builds a new parser out of two existing parser combinator \texttt{p} and \texttt{q}.
 | 
| 
 
b17eff695c7f
added new stuff
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents: 
177 
diff
changeset
 | 
395  | 
Both need to be able to process input of type \texttt{I} and return the same
 | 
| 
 
b17eff695c7f
added new stuff
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents: 
177 
diff
changeset
 | 
396  | 
output type \texttt{Set[(T, I)]}. (There is an interesting detail of Scala, namely the 
 | 
| 
 
b17eff695c7f
added new stuff
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents: 
177 
diff
changeset
 | 
397  | 
\texttt{=>} in front of the types of \texttt{p} and \texttt{q}. They will prevent the
 | 
| 
 
b17eff695c7f
added new stuff
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents: 
177 
diff
changeset
 | 
398  | 
evaluation of the arguments before they are used. This is often called  | 
| 
 
b17eff695c7f
added new stuff
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents: 
177 
diff
changeset
 | 
399  | 
\emph{lazy evaluation} of the arguments.) The alternative parser should run
 | 
| 
 
b17eff695c7f
added new stuff
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents: 
177 
diff
changeset
 | 
400  | 
the input with the first parser \texttt{p} (producing a set of outputs) and then
 | 
| 
 
b17eff695c7f
added new stuff
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents: 
177 
diff
changeset
 | 
401  | 
run the same input with \texttt{q}. The result should be then just the union
 | 
| 
 
b17eff695c7f
added new stuff
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents: 
177 
diff
changeset
 | 
402  | 
of both sets, which is the operation \texttt{++} in Scala.
 | 
| 
 
b17eff695c7f
added new stuff
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents: 
177 
diff
changeset
 | 
403  | 
|
| 
 
b17eff695c7f
added new stuff
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents: 
177 
diff
changeset
 | 
404  | 
This parser combinator already allows us to construct a parser that either  | 
| 
 
b17eff695c7f
added new stuff
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents: 
177 
diff
changeset
 | 
405  | 
a character \texttt{a} or \texttt{b}, as
 | 
| 
 
b17eff695c7f
added new stuff
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents: 
177 
diff
changeset
 | 
406  | 
|
| 
 
b17eff695c7f
added new stuff
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents: 
177 
diff
changeset
 | 
407  | 
\begin{center}
 | 
| 
 
b17eff695c7f
added new stuff
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents: 
177 
diff
changeset
 | 
408  | 
\begin{lstlisting}[language=Scala,basicstyle=\small\ttfamily, numbers=none]
 | 
| 
 
b17eff695c7f
added new stuff
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents: 
177 
diff
changeset
 | 
409  | 
new AltParser(CharParser('a'), CharParser('b'))
 | 
| 
 
b17eff695c7f
added new stuff
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents: 
177 
diff
changeset
 | 
410  | 
\end{lstlisting}
 | 
| 
 
b17eff695c7f
added new stuff
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents: 
177 
diff
changeset
 | 
411  | 
\end{center}
 | 
| 
 
b17eff695c7f
added new stuff
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents: 
177 
diff
changeset
 | 
412  | 
|
| 
 
b17eff695c7f
added new stuff
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents: 
177 
diff
changeset
 | 
413  | 
\noindent  | 
| 
 
b17eff695c7f
added new stuff
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents: 
177 
diff
changeset
 | 
414  | 
Scala allows us to introduce some more readable shorthand notation for this, like \texttt{'a' || 'b'}. 
 | 
| 
 
b17eff695c7f
added new stuff
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents: 
177 
diff
changeset
 | 
415  | 
We can call this parser combinator with the strings  | 
| 
 
b17eff695c7f
added new stuff
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents: 
177 
diff
changeset
 | 
416  | 
|
| 
 
b17eff695c7f
added new stuff
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents: 
177 
diff
changeset
 | 
417  | 
\begin{center}
 | 
| 
 
b17eff695c7f
added new stuff
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents: 
177 
diff
changeset
 | 
418  | 
\begin{tabular}{rcl}
 | 
| 
 
b17eff695c7f
added new stuff
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents: 
177 
diff
changeset
 | 
419  | 
input string & & output\medskip\\  | 
| 
 
b17eff695c7f
added new stuff
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents: 
177 
diff
changeset
 | 
420  | 
\texttt{\Grid{ac}} & $\rightarrow$ & $\left\{(\texttt{\Grid{a}}, \texttt{\Grid{c}})\right\}$\\
 | 
| 
 
b17eff695c7f
added new stuff
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents: 
177 
diff
changeset
 | 
421  | 
\texttt{\Grid{bc}} & $\rightarrow$ & $\left\{(\texttt{\Grid{b}}, \texttt{\Grid{c}})\right\}$\\
 | 
| 
 
b17eff695c7f
added new stuff
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents: 
177 
diff
changeset
 | 
422  | 
\texttt{\Grid{cc}} & $\rightarrow$ & $\varnothing$
 | 
| 
 
b17eff695c7f
added new stuff
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents: 
177 
diff
changeset
 | 
423  | 
\end{tabular}
 | 
| 
 
b17eff695c7f
added new stuff
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents: 
177 
diff
changeset
 | 
424  | 
\end{center}
 | 
| 
 
b17eff695c7f
added new stuff
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents: 
177 
diff
changeset
 | 
425  | 
|
| 
 
b17eff695c7f
added new stuff
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents: 
177 
diff
changeset
 | 
426  | 
\noindent  | 
| 
 
b17eff695c7f
added new stuff
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents: 
177 
diff
changeset
 | 
427  | 
We receive in the first two cases a successful output (that is a non-empty set).  | 
| 
 
b17eff695c7f
added new stuff
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents: 
177 
diff
changeset
 | 
428  | 
|
| 
 
b17eff695c7f
added new stuff
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents: 
177 
diff
changeset
 | 
429  | 
A bit more interesting is the \emph{sequence parser combinator} implemented in
 | 
| 
 
b17eff695c7f
added new stuff
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents: 
177 
diff
changeset
 | 
430  | 
Scala as follows:  | 
| 
 
b17eff695c7f
added new stuff
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents: 
177 
diff
changeset
 | 
431  | 
|
| 
 
b17eff695c7f
added new stuff
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents: 
177 
diff
changeset
 | 
432  | 
\begin{center}
 | 
| 
 
b17eff695c7f
added new stuff
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents: 
177 
diff
changeset
 | 
433  | 
\begin{lstlisting}[language=Scala,basicstyle=\small\ttfamily, numbers=none]
 | 
| 
 
b17eff695c7f
added new stuff
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents: 
177 
diff
changeset
 | 
434  | 
class SeqParser[I, T, S]  | 
| 
 
b17eff695c7f
added new stuff
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents: 
177 
diff
changeset
 | 
435  | 
(p: => Parser[I, T],  | 
| 
 
b17eff695c7f
added new stuff
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents: 
177 
diff
changeset
 | 
436  | 
        q: => Parser[I, S]) extends Parser[I, (T, S)] {
 | 
| 
 
b17eff695c7f
added new stuff
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents: 
177 
diff
changeset
 | 
437  | 
def parse(sb: I) =  | 
| 
 
b17eff695c7f
added new stuff
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents: 
177 
diff
changeset
 | 
438  | 
for ((head1, tail1) <- p.parse(sb);  | 
| 
 
b17eff695c7f
added new stuff
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents: 
177 
diff
changeset
 | 
439  | 
(head2, tail2) <- q.parse(tail1))  | 
| 
 
b17eff695c7f
added new stuff
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents: 
177 
diff
changeset
 | 
440  | 
yield ((head1, head2), tail2)  | 
| 
 
b17eff695c7f
added new stuff
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents: 
177 
diff
changeset
 | 
441  | 
}  | 
| 
 
b17eff695c7f
added new stuff
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents: 
177 
diff
changeset
 | 
442  | 
\end{lstlisting}
 | 
| 
 
b17eff695c7f
added new stuff
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents: 
177 
diff
changeset
 | 
443  | 
\end{center}
 | 
| 
 
b17eff695c7f
added new stuff
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents: 
177 
diff
changeset
 | 
444  | 
|
| 
 
b17eff695c7f
added new stuff
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents: 
177 
diff
changeset
 | 
445  | 
\noindent  | 
| 
 
b17eff695c7f
added new stuff
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents: 
177 
diff
changeset
 | 
446  | 
This parser takes as input two parsers, \texttt{p} and \texttt{q}. It implements \texttt{parse} 
 | 
| 
 
b17eff695c7f
added new stuff
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents: 
177 
diff
changeset
 | 
447  | 
as follows: let first run the parser \texttt{p} on the input producing a set of pairs (\texttt{head1}, \texttt{tail1}).
 | 
| 
 
b17eff695c7f
added new stuff
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents: 
177 
diff
changeset
 | 
448  | 
The \texttt{tail1} stands for the unprocessed parts left over by \texttt{p}. 
 | 
| 
 
b17eff695c7f
added new stuff
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents: 
177 
diff
changeset
 | 
449  | 
Let \texttt{q} run on these unprocessed parts
 | 
| 
 
b17eff695c7f
added new stuff
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents: 
177 
diff
changeset
 | 
450  | 
producing again a set of pairs. The output of the sequence parser combinator is then a set  | 
| 
 
b17eff695c7f
added new stuff
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents: 
177 
diff
changeset
 | 
451  | 
containing pairs where the first components are again pairs, namely what the first parser could parse  | 
| 
 
b17eff695c7f
added new stuff
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents: 
177 
diff
changeset
 | 
452  | 
together with what the second parser could parse; the second component is the unprocessed  | 
| 
 
b17eff695c7f
added new stuff
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents: 
177 
diff
changeset
 | 
453  | 
part left over after running the second parser \texttt{q}. Therefore the input type of
 | 
| 
 
b17eff695c7f
added new stuff
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents: 
177 
diff
changeset
 | 
454  | 
the sequence parser combinator is as usual \texttt{I}, but the output type is
 | 
| 
 
b17eff695c7f
added new stuff
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents: 
177 
diff
changeset
 | 
455  | 
|
| 
 
b17eff695c7f
added new stuff
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents: 
177 
diff
changeset
 | 
456  | 
\begin{center}
 | 
| 
 
b17eff695c7f
added new stuff
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents: 
177 
diff
changeset
 | 
457  | 
\texttt{Set[((T, S), I)]}
 | 
| 
 
b17eff695c7f
added new stuff
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents: 
177 
diff
changeset
 | 
458  | 
\end{center}
 | 
| 
 
b17eff695c7f
added new stuff
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents: 
177 
diff
changeset
 | 
459  | 
|
| 
 
b17eff695c7f
added new stuff
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents: 
177 
diff
changeset
 | 
460  | 
Scala allows us to provide some  | 
| 
 
b17eff695c7f
added new stuff
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents: 
177 
diff
changeset
 | 
461  | 
shorthand notation for the sequence parser combinator. So we can write for  | 
| 
 
b17eff695c7f
added new stuff
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents: 
177 
diff
changeset
 | 
462  | 
example \texttt{'a'  $\sim$ 'b'}, which is the
 | 
| 
 
b17eff695c7f
added new stuff
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents: 
177 
diff
changeset
 | 
463  | 
parser combinator that first consumes the character \texttt{a} from a string and then \texttt{b}.
 | 
| 
 
b17eff695c7f
added new stuff
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents: 
177 
diff
changeset
 | 
464  | 
Calling this parser combinator with the strings  | 
| 
 
b17eff695c7f
added new stuff
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents: 
177 
diff
changeset
 | 
465  | 
|
| 
 
b17eff695c7f
added new stuff
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents: 
177 
diff
changeset
 | 
466  | 
\begin{center}
 | 
| 
 
b17eff695c7f
added new stuff
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents: 
177 
diff
changeset
 | 
467  | 
\begin{tabular}{rcl}
 | 
| 
 
b17eff695c7f
added new stuff
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents: 
177 
diff
changeset
 | 
468  | 
input string & & output\medskip\\  | 
| 
 
b17eff695c7f
added new stuff
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents: 
177 
diff
changeset
 | 
469  | 
\texttt{\Grid{abc}} & $\rightarrow$ & $\left\{((\texttt{\Grid{a}}, \texttt{\Grid{b}}), \texttt{\Grid{c}})\right\}$\\
 | 
| 
 
b17eff695c7f
added new stuff
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents: 
177 
diff
changeset
 | 
470  | 
\texttt{\Grid{bac}} & $\rightarrow$ & $\varnothing$\\
 | 
| 
 
b17eff695c7f
added new stuff
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents: 
177 
diff
changeset
 | 
471  | 
\texttt{\Grid{ccc}} & $\rightarrow$ & $\varnothing$
 | 
| 
 
b17eff695c7f
added new stuff
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents: 
177 
diff
changeset
 | 
472  | 
\end{tabular}
 | 
| 
 
b17eff695c7f
added new stuff
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents: 
177 
diff
changeset
 | 
473  | 
\end{center}
 | 
| 
 
b17eff695c7f
added new stuff
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents: 
177 
diff
changeset
 | 
474  | 
|
| 
 
b17eff695c7f
added new stuff
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents: 
177 
diff
changeset
 | 
475  | 
\noindent  | 
| 
 
b17eff695c7f
added new stuff
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents: 
177 
diff
changeset
 | 
476  | 
A slightly more complicated parser is \texttt{('a'  || 'b') $\sim$ 'b'} which parses as first character either
 | 
| 
 
b17eff695c7f
added new stuff
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents: 
177 
diff
changeset
 | 
477  | 
an \texttt{a} or \texttt{b} followed by a \texttt{b}. This parser produces the following results.
 | 
| 
 
b17eff695c7f
added new stuff
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents: 
177 
diff
changeset
 | 
478  | 
|
| 
 
b17eff695c7f
added new stuff
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents: 
177 
diff
changeset
 | 
479  | 
\begin{center}
 | 
| 
 
b17eff695c7f
added new stuff
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents: 
177 
diff
changeset
 | 
480  | 
\begin{tabular}{rcl}
 | 
| 
 
b17eff695c7f
added new stuff
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents: 
177 
diff
changeset
 | 
481  | 
input string & & output\medskip\\  | 
| 
 
b17eff695c7f
added new stuff
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents: 
177 
diff
changeset
 | 
482  | 
\texttt{\Grid{abc}} & $\rightarrow$ & $\left\{((\texttt{\Grid{a}}, \texttt{\Grid{b}}), \texttt{\Grid{c}})\right\}$\\
 | 
| 
 
b17eff695c7f
added new stuff
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents: 
177 
diff
changeset
 | 
483  | 
\texttt{\Grid{bbc}} & $\rightarrow$ & $\left\{((\texttt{\Grid{b}}, \texttt{\Grid{b}}), \texttt{\Grid{c}})\right\}$\\
 | 
| 
 
b17eff695c7f
added new stuff
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents: 
177 
diff
changeset
 | 
484  | 
\texttt{\Grid{aac}} & $\rightarrow$ & $\varnothing$
 | 
| 
 
b17eff695c7f
added new stuff
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents: 
177 
diff
changeset
 | 
485  | 
\end{tabular}
 | 
| 
 
b17eff695c7f
added new stuff
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents: 
177 
diff
changeset
 | 
486  | 
\end{center}
 | 
| 
 
b17eff695c7f
added new stuff
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents: 
177 
diff
changeset
 | 
487  | 
|
| 
 
b17eff695c7f
added new stuff
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents: 
177 
diff
changeset
 | 
488  | 
Note carefully that constructing the parser \texttt{'a' || ('a' $\sim$ 'b')} will result in a tying error.
 | 
| 
 
b17eff695c7f
added new stuff
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents: 
177 
diff
changeset
 | 
489  | 
The first parser has as output type a single character (recall the type of \texttt{CharParser}),
 | 
| 
 
b17eff695c7f
added new stuff
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents: 
177 
diff
changeset
 | 
490  | 
but the second parser produces a pair of characters as output. The alternative parser is however  | 
| 
 
b17eff695c7f
added new stuff
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents: 
177 
diff
changeset
 | 
491  | 
required to have both component parsers to have the same type. We will see later how we can  | 
| 
 
b17eff695c7f
added new stuff
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents: 
177 
diff
changeset
 | 
492  | 
build this parser without the typing error.  | 
| 
 
b17eff695c7f
added new stuff
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents: 
177 
diff
changeset
 | 
493  | 
|
| 
 
b17eff695c7f
added new stuff
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents: 
177 
diff
changeset
 | 
494  | 
The next parser combinator does not actually combine smaller parsers, but applies  | 
| 
 
b17eff695c7f
added new stuff
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents: 
177 
diff
changeset
 | 
495  | 
a function to the result of the parser. It is implemented in Scala as follows  | 
| 
 
b17eff695c7f
added new stuff
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents: 
177 
diff
changeset
 | 
496  | 
|
| 
 
b17eff695c7f
added new stuff
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents: 
177 
diff
changeset
 | 
497  | 
\begin{center}
 | 
| 
 
b17eff695c7f
added new stuff
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents: 
177 
diff
changeset
 | 
498  | 
\begin{lstlisting}[language=Scala,basicstyle=\small\ttfamily, numbers=none]
 | 
| 
 
b17eff695c7f
added new stuff
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents: 
177 
diff
changeset
 | 
499  | 
class FunParser[I, T, S]  | 
| 
 
b17eff695c7f
added new stuff
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents: 
177 
diff
changeset
 | 
500  | 
(p: => Parser[I, T],  | 
| 
 
b17eff695c7f
added new stuff
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents: 
177 
diff
changeset
 | 
501  | 
          f: T => S) extends Parser[I, S] {
 | 
| 
 
b17eff695c7f
added new stuff
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents: 
177 
diff
changeset
 | 
502  | 
def parse(sb: I) =  | 
| 
 
b17eff695c7f
added new stuff
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents: 
177 
diff
changeset
 | 
503  | 
for ((head, tail) <- p.parse(sb)) yield (f(head), tail)  | 
| 
 
b17eff695c7f
added new stuff
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents: 
177 
diff
changeset
 | 
504  | 
}  | 
| 
 
b17eff695c7f
added new stuff
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents: 
177 
diff
changeset
 | 
505  | 
\end{lstlisting}
 | 
| 
 
b17eff695c7f
added new stuff
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents: 
177 
diff
changeset
 | 
506  | 
\end{center}
 | 
| 
 
b17eff695c7f
added new stuff
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents: 
177 
diff
changeset
 | 
507  | 
|
| 
 
b17eff695c7f
added new stuff
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents: 
177 
diff
changeset
 | 
508  | 
|
| 
 
b17eff695c7f
added new stuff
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents: 
177 
diff
changeset
 | 
509  | 
\noindent  | 
| 
 
b17eff695c7f
added new stuff
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents: 
177 
diff
changeset
 | 
510  | 
This parser combinator takes a parser \texttt{p} with output type \texttt{T} as
 | 
| 
 
b17eff695c7f
added new stuff
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents: 
177 
diff
changeset
 | 
511  | 
input as well as a function \texttt{f} with type \texttt{T => S}. The parser \texttt{p}
 | 
| 
 
b17eff695c7f
added new stuff
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents: 
177 
diff
changeset
 | 
512  | 
produces sets of type \texttt{(T, I)}. The \texttt{FunParser} combinator then
 | 
| 
 
b17eff695c7f
added new stuff
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents: 
177 
diff
changeset
 | 
513  | 
applies the function \texttt{f} to all the parer outputs. Since this function
 | 
| 
 
b17eff695c7f
added new stuff
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents: 
177 
diff
changeset
 | 
514  | 
is of type \texttt{T => S}, we obtain a parser with output type \texttt{S}.
 | 
| 
 
b17eff695c7f
added new stuff
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents: 
177 
diff
changeset
 | 
515  | 
Again Scala lets us introduce some shorthand notation for this parser combinator.  | 
| 
 
b17eff695c7f
added new stuff
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents: 
177 
diff
changeset
 | 
516  | 
Therefore we will write \texttt{p ==> f} for it.
 | 
| 
 
b17eff695c7f
added new stuff
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents: 
177 
diff
changeset
 | 
517  | 
|
| 
 
b17eff695c7f
added new stuff
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents: 
177 
diff
changeset
 | 
518  | 
%\bigskip  | 
| 
 
b17eff695c7f
added new stuff
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents: 
177 
diff
changeset
 | 
519  | 
%takes advantage of the full generality---have a look  | 
| 
 
b17eff695c7f
added new stuff
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents: 
177 
diff
changeset
 | 
520  | 
%what it produces if we call it with the string \texttt{abc}
 | 
| 
 
b17eff695c7f
added new stuff
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents: 
177 
diff
changeset
 | 
521  | 
%  | 
| 
 
b17eff695c7f
added new stuff
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents: 
177 
diff
changeset
 | 
522  | 
%\begin{center}
 | 
| 
 
b17eff695c7f
added new stuff
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents: 
177 
diff
changeset
 | 
523  | 
%\begin{tabular}{rcl}
 | 
| 
 
b17eff695c7f
added new stuff
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents: 
177 
diff
changeset
 | 
524  | 
%input string & & output\medskip\\  | 
| 
 
b17eff695c7f
added new stuff
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents: 
177 
diff
changeset
 | 
525  | 
%\texttt{\Grid{abc}} & $\rightarrow$ & $\left\{((\texttt{\Grid{a}}, \texttt{\Grid{b}}), \texttt{\Grid{c}})\right\}$\\
 | 
| 
 
b17eff695c7f
added new stuff
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents: 
177 
diff
changeset
 | 
526  | 
%\texttt{\Grid{bbc}} & $\rightarrow$ & $\left\{((\texttt{\Grid{b}}, \texttt{\Grid{b}}), \texttt{\Grid{c}})\right\}$\\
 | 
| 
 
b17eff695c7f
added new stuff
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents: 
177 
diff
changeset
 | 
527  | 
%\texttt{\Grid{aac}} & $\rightarrow$ & $\varnothing$
 | 
| 
 
b17eff695c7f
added new stuff
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents: 
177 
diff
changeset
 | 
528  | 
%\end{tabular}
 | 
| 
 
b17eff695c7f
added new stuff
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents: 
177 
diff
changeset
 | 
529  | 
%\end{center}
 | 
| 
 
b17eff695c7f
added new stuff
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents: 
177 
diff
changeset
 | 
530  | 
|
| 
 
b17eff695c7f
added new stuff
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents: 
177 
diff
changeset
 | 
531  | 
|
| 
173
 
7cfb7a6f7c99
added slides
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents:  
diff
changeset
 | 
532  | 
|
| 
 
7cfb7a6f7c99
added slides
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents:  
diff
changeset
 | 
533  | 
\end{document}
 | 
| 
 
7cfb7a6f7c99
added slides
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents:  
diff
changeset
 | 
534  | 
|
| 
 
7cfb7a6f7c99
added slides
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents:  
diff
changeset
 | 
535  | 
%%% Local Variables:  | 
| 
 
7cfb7a6f7c99
added slides
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents:  
diff
changeset
 | 
536  | 
%%% mode: latex  | 
| 
 
7cfb7a6f7c99
added slides
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents:  
diff
changeset
 | 
537  | 
%%% TeX-master: t  | 
| 
 
7cfb7a6f7c99
added slides
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents:  
diff
changeset
 | 
538  | 
%%% End:  |