46 \noindent Context-free languages play an important role in |
46 \noindent Context-free languages play an important role in |
47 `day-to-day' text processing and in programming languages. |
47 `day-to-day' text processing and in programming languages. |
48 Context-free languages are usually specified by grammars. For |
48 Context-free languages are usually specified by grammars. For |
49 example a grammar for well-parenthesised expressions is |
49 example a grammar for well-parenthesised expressions is |
50 |
50 |
51 \begin{center} |
51 \begin{plstx}[margin=3cm] |
52 $P \;\;\rightarrow\;\; ( \cdot P \cdot ) \cdot P \;|\; \epsilon$ |
52 : \meta{P} ::= ( \cdot \meta{P} \cdot ) \cdot \meta{P} |
53 \end{center} |
53 | \epsilon\\ |
54 |
54 \end{plstx} |
|
55 |
55 \noindent |
56 \noindent |
56 or a grammar for recognising strings consisting of ones is |
57 or a grammar for recognising strings consisting of ones is |
57 |
58 |
58 \begin{center} |
59 \begin{plstx}[margin=3cm] |
59 $O \;\;\rightarrow\;\; 1 \cdot O \;|\; 1$ |
60 : \meta{O} ::= 1 \cdot \meta{O} |
60 \end{center} |
61 | 1\\ |
|
62 \end{plstx} |
61 |
63 |
62 In general grammars consist of finitely many rules built up |
64 In general grammars consist of finitely many rules built up |
63 from \emph{terminal symbols} (usually lower-case letters) and |
65 from \emph{terminal symbols} (usually lower-case letters) and |
64 \emph{non-terminal symbols} (upper-case letters). Rules have |
66 \emph{non-terminal symbols} (upper-case letters inside \meta{\mbox{}}). Rules have |
65 the shape |
67 the shape |
66 |
68 |
67 \begin{center} |
69 \begin{plstx}[margin=3cm] |
68 $NT \;\;\rightarrow\;\; \textit{rhs}$ |
70 : \meta{NT} ::= rhs\\ |
69 \end{center} |
71 \end{plstx} |
70 |
72 |
71 \noindent where on the left-hand side is a single non-terminal |
73 \noindent where on the left-hand side is a single non-terminal |
72 and on the right a string consisting of both terminals and |
74 and on the right a string consisting of both terminals and |
73 non-terminals including the $\epsilon$-symbol for indicating |
75 non-terminals including the $\epsilon$-symbol for indicating |
74 the empty string. We use the convention to separate components |
76 the empty string. We use the convention to separate components |
75 on the right hand-side by using the $\cdot$ symbol, as in the |
77 on the right hand-side by using the $\cdot$ symbol, as in the |
76 grammar for well-parenthesised expressions. We also use the |
78 grammar for well-parenthesised expressions. We also use the |
77 convention to use $|$ as a shorthand notation for several |
79 convention to use $|$ as a shorthand notation for several |
78 rules. For example |
80 rules. For example |
79 |
81 |
80 \begin{center} |
82 \begin{plstx}[margin=3cm] |
81 $NT \;\;\rightarrow\;\; \textit{rhs}_1 \;|\; \textit{rhs}_2$ |
83 : \meta{NT} ::= rhs_1 |
82 \end{center} |
84 | rhs_2\\ |
83 |
85 \end{plstx} |
84 \noindent means that the non-terminal $NT$ can be replaced by |
86 |
|
87 \noindent means that the non-terminal \meta{NT} can be replaced by |
85 either $\textit{rhs}_1$ or $\textit{rhs}_2$. If there are more |
88 either $\textit{rhs}_1$ or $\textit{rhs}_2$. If there are more |
86 than one non-terminal on the left-hand side of the rules, then |
89 than one non-terminal on the left-hand side of the rules, then |
87 we need to indicate what is the \emph{starting} symbol of the |
90 we need to indicate what is the \emph{starting} symbol of the |
88 grammar. For example the grammar for arithmetic expressions |
91 grammar. For example the grammar for arithmetic expressions |
89 can be given as follows |
92 can be given as follows |
90 |
93 |
91 \begin{center} |
94 \begin{plstx}[margin=3cm,one per line] |
92 \begin{tabular}{lcl@{\hspace{2cm}}l} |
95 \mbox{\rm (1)}: \meta{E} ::= \meta{N}\\ |
93 $E$ & $\rightarrow$ & $N$ & (1)\\ |
96 \mbox{\rm (2)}: \meta{E} ::= \meta{E} \cdot + \cdot \meta{E}\\ |
94 $E$ & $\rightarrow$ & $E \cdot + \cdot E$ & (2)\\ |
97 \mbox{\rm (3)}: \meta{E} ::= \meta{E} \cdot - \cdot \meta{E}\\ |
95 $E$ & $\rightarrow$ & $E \cdot - \cdot E$ & (3)\\ |
98 \mbox{\rm (4)}: \meta{E} ::= \meta{E} \cdot * \cdot \meta{E}\\ |
96 $E$ & $\rightarrow$ & $E \cdot * \cdot E$ & (4)\\ |
99 \mbox{\rm (5)}: \meta{E} ::= ( \cdot \meta{E} \cdot )\\ |
97 $E$ & $\rightarrow$ & $( \cdot E \cdot )$ & (5)\\ |
100 \mbox{\rm (6\ldots)}: \meta{N} ::= \meta{N} \cdot \meta{N} |
98 $N$ & $\rightarrow$ & $N \cdot N \;|\; 0 \;|\; 1 \;|\: \ldots \;|\; 9$ & (6\ldots) |
101 \mid 0 \mid 1 \mid \ldots \mid 9\\ |
99 \end{tabular} |
102 \end{plstx} |
100 \end{center} |
103 |
101 |
104 \noindent where \meta{E} is the starting symbol. A |
102 \noindent where $E$ is the starting symbol. A |
|
103 \emph{derivation} for a grammar starts with the starting |
105 \emph{derivation} for a grammar starts with the starting |
104 symbol of the grammar and in each step replaces one |
106 symbol of the grammar and in each step replaces one |
105 non-terminal by a right-hand side of a rule. A derivation ends |
107 non-terminal by a right-hand side of a rule. A derivation ends |
106 with a string in which only terminal symbols are left. For |
108 with a string in which only terminal symbols are left. For |
107 example a derivation for the string $(1 + 2) + 3$ is as |
109 example a derivation for the string $(1 + 2) + 3$ is as |
108 follows: |
110 follows: |
109 |
111 |
110 \begin{center} |
112 \begin{center} |
111 \begin{tabular}{lll@{\hspace{2cm}}l} |
113 \begin{tabular}{lll@{\hspace{2cm}}l} |
112 $E$ & $\rightarrow$ & $E+E$ & by (2)\\ |
114 \meta{E} & $\rightarrow$ & $\meta{E}+\meta{E}$ & by (2)\\ |
113 & $\rightarrow$ & $(E)+E$ & by (5)\\ |
115 & $\rightarrow$ & $(\meta{E})+\meta{E}$ & by (5)\\ |
114 & $\rightarrow$ & $(E+E)+E$ & by (2)\\ |
116 & $\rightarrow$ & $(\meta{E}+\meta{E})+\meta{E}$ & by (2)\\ |
115 & $\rightarrow$ & $(E+E)+N$ & by (1)\\ |
117 & $\rightarrow$ & $(\meta{E}+\meta{E})+\meta{N}$ & by (1)\\ |
116 & $\rightarrow$ & $(E+E)+3$ & by (6\dots)\\ |
118 & $\rightarrow$ & $(\meta{E}+\meta{E})+3$ & by (6\dots)\\ |
117 & $\rightarrow$ & $(N+E)+3$ & by (1)\\ |
119 & $\rightarrow$ & $(\meta{N}+\meta{E})+3$ & by (1)\\ |
118 & $\rightarrow^+$ & $(1+2)+3$ & by (1, 6\ldots)\\ |
120 & $\rightarrow^+$ & $(1+2)+3$ & by (1, 6\ldots)\\ |
119 \end{tabular} |
121 \end{tabular} |
120 \end{center} |
122 \end{center} |
121 |
123 |
122 \noindent where on the right it is indicated which |
124 \noindent where on the right it is indicated which |
237 left-associative). Unfortunately already the problem of |
239 left-associative). Unfortunately already the problem of |
238 deciding whether a grammar is ambiguous or not is in general |
240 deciding whether a grammar is ambiguous or not is in general |
239 undecidable. But in simple instance (the ones we deal in this |
241 undecidable. But in simple instance (the ones we deal in this |
240 module) one can usually see when a grammar is ambiguous. |
242 module) one can usually see when a grammar is ambiguous. |
241 |
243 |
|
244 |
|
245 \subsection*{Parser Combinators} |
|
246 |
242 Let us now turn to the problem of generating a parse-tree for |
247 Let us now turn to the problem of generating a parse-tree for |
243 a grammar and string. In what follows we explain \emph{parser |
248 a grammar and string. In what follows we explain \emph{parser |
244 combinators}, because they are easy to implement and closely |
249 combinators}, because they are easy to implement and closely |
245 resemble grammar rules. Imagine that a grammar describes the |
250 resemble grammar rules. Imagine that a grammar describes the |
246 strings of natural numbers, such as the grammar $N$ shown |
251 strings of natural numbers, such as the grammar $N$ shown |