15 they have their limitations. For example there is no regular |
15 they have their limitations. For example there is no regular |
16 expression that can recognise the language $a^nb^n$. Another |
16 expression that can recognise the language $a^nb^n$. Another |
17 example for which there exists no regular expression is the |
17 example for which there exists no regular expression is the |
18 language of well-parenthesised expressions. In languages like |
18 language of well-parenthesised expressions. In languages like |
19 Lisp, which use parentheses rather extensively, it might be of |
19 Lisp, which use parentheses rather extensively, it might be of |
20 interest whether the following two expressions are |
20 interest to know whether the following two expressions are |
21 well-parenthesised (the left one is, the right one is not): |
21 well-parenthesised or not (the left one is, the right one is not): |
22 |
22 |
23 \begin{center} |
23 \begin{center} |
24 $(((()()))())$ \hspace{10mm} $(((()()))()))$ |
24 $(((()()))())$ \hspace{10mm} $(((()()))()))$ |
25 \end{center} |
25 \end{center} |
26 |
26 |
27 \noindent Not being able to solve such recognition problems is |
27 \noindent Not being able to solve such recognition problems is |
28 a serious limitation. In order to solve such recognition |
28 a serious limitation. In order to solve such recognition |
29 problems, we need more powerful techniques than regular |
29 problems, we need more powerful techniques than regular |
30 expressions. We will in particular look at \emph{context-free |
30 expressions. We will in particular look at \emph{context-free |
31 languages}. They include the regular languages as the picture |
31 languages}. They include the regular languages as the picture |
32 below shows: |
32 below about language classes shows: |
33 |
33 |
34 |
34 |
35 \begin{center} |
35 \begin{center} |
36 \begin{tikzpicture} |
36 \begin{tikzpicture} |
37 [rect/.style={draw=black!50, |
37 [rect/.style={draw=black!50, |
44 \draw (0,-0.84) node [rect, text depth=7mm, text width=35mm] {\small context-free languages}; |
44 \draw (0,-0.84) node [rect, text depth=7mm, text width=35mm] {\small context-free languages}; |
45 \draw (0,-1.05) node [rect] {\small regular languages}; |
45 \draw (0,-1.05) node [rect] {\small regular languages}; |
46 \end{tikzpicture} |
46 \end{tikzpicture} |
47 \end{center} |
47 \end{center} |
48 |
48 |
49 \noindent Context-free languages play an important role in |
49 \noindent Each ``bubble'' stands for sets of languages (remember |
50 `day-to-day' text processing and in programming languages. |
50 languages are sets of strings). As indicated the set of regular |
51 Context-free languages are usually specified by grammars. For |
51 languages are fully included inside the context-free languages, |
52 example a grammar for well-parenthesised expressions is |
52 meaning every regular language is also context-free, but not vice |
|
53 versa. Below I will let you think for example what the context-free |
|
54 grammar is for the language corresponding to the regular expression |
|
55 $(aaa)^*a$. |
|
56 |
|
57 Because of their convenience, context-free languages play an important |
|
58 role in `day-to-day' text processing and in programming |
|
59 languages. Context-free in this setting means that ``words'' have one |
|
60 meaning only ansd this meaning is independend from in which context |
|
61 the ``words'' appear. For example ambiguity issues like |
|
62 |
|
63 \begin{center} |
|
64 \tt Time flies like an arrow; fruit flies like a banana. |
|
65 \end{center} |
|
66 |
|
67 \noindent |
|
68 from natural languages were the meaning of \emph{flies} depend on the |
|
69 surounding \emph{context} are avoided as much as possible. |
|
70 |
|
71 Context-free languages are usually specified by grammars. For example |
|
72 a grammar for well-parenthesised expressions can be given as follows: |
53 |
73 |
54 \begin{plstx}[margin=3cm] |
74 \begin{plstx}[margin=3cm] |
55 : \meta{P} ::= ( \cdot \meta{P} \cdot ) \cdot \meta{P} |
75 : \meta{P} ::= ( \cdot \meta{P} \cdot ) \cdot \meta{P} |
56 | \epsilon\\ |
76 | \epsilon\\ |
57 \end{plstx} |
77 \end{plstx} |
64 | 1\\ |
84 | 1\\ |
65 \end{plstx} |
85 \end{plstx} |
66 |
86 |
67 In general grammars consist of finitely many rules built up |
87 In general grammars consist of finitely many rules built up |
68 from \emph{terminal symbols} (usually lower-case letters) and |
88 from \emph{terminal symbols} (usually lower-case letters) and |
69 \emph{non-terminal symbols} (upper-case letters inside \meta{\mbox{}}). Rules have |
89 \emph{non-terminal symbols} (upper-case letters written in |
|
90 bold like \meta{A}, \meta{N} and so on). Rules have |
70 the shape |
91 the shape |
71 |
92 |
72 \begin{plstx}[margin=3cm] |
93 \begin{plstx}[margin=3cm] |
73 : \meta{NT} ::= rhs\\ |
94 : \meta{NT} ::= rhs\\ |
74 \end{plstx} |
95 \end{plstx} |