10 |
18 |
11 \begin{enumerate} |
19 \begin{enumerate} |
12 \item The regular expression matchers in Java, Python and Ruby can be |
20 \item The regular expression matchers in Java, Python and Ruby can be |
13 very slow with some (basic) regular expressions. What is the main |
21 very slow with some (basic) regular expressions. What is the main |
14 reason for this inefficient computation? |
22 reason for this inefficient computation? |
|
23 |
|
24 \solution{Many matchers employ DFS type of algorithms to check |
|
25 if a string is matched by the regex or not. Such algorithms |
|
26 require backtracking if have gone down the wrong path which |
|
27 can be very slow. There are also problems with bounded regular |
|
28 expressions and backreferences.} |
15 |
29 |
16 \item What is a regular language? Are there alternative ways |
30 \item What is a regular language? Are there alternative ways |
17 to define this notion? If yes, give an explanation why |
31 to define this notion? If yes, give an explanation why |
18 they define the same notion. |
32 they define the same notion. |
19 |
33 |
|
34 \solution{A regular language is a language for which every string |
|
35 can be recognized by some regular expression. Another definition is |
|
36 that it is a language for which a finite automaton can be |
|
37 constructed. Both define the same set of languages.} |
|
38 |
20 \item Why is every finite set of strings a regular language? |
39 \item Why is every finite set of strings a regular language? |
21 |
40 |
|
41 \solution{Take a regex composed of all strings (works for finite languages)} |
|
42 |
22 \item Assume you have an alphabet consisting of the letters |
43 \item Assume you have an alphabet consisting of the letters |
23 $a$, $b$ and $c$ only. (1) Find a regular expression |
44 $a$, $b$ and $c$ only. (1) Find a regular expression |
24 that recognises the two strings $ab$ and $ac$. (2) Find |
45 that recognises the two strings $ab$ and $ac$. (2) Find |
25 a regular expression that matches all strings |
46 a regular expression that matches all strings |
26 \emph{except} these two strings. Note, you can only use |
47 \emph{except} these two strings. Note, you can only use |
59 |
82 |
60 \item Given a deterministic finite automaton $A(\varSigma, Q, Q_0, F, |
83 \item Given a deterministic finite automaton $A(\varSigma, Q, Q_0, F, |
61 \delta)$, define which language is recognised by this |
84 \delta)$, define which language is recognised by this |
62 automaton. Can you define also the language defined by a |
85 automaton. Can you define also the language defined by a |
63 non-deterministic automaton? |
86 non-deterministic automaton? |
|
87 |
|
88 |
|
89 \solution{ |
|
90 A formula for DFAs is |
|
91 |
|
92 \[L(A) \dn \{s \;|\; \hat{\delta}(start_q, s) \in F\}\] |
|
93 |
|
94 For NFAs you need to first define what $\hat{\rho}$ means. If |
|
95 $\rho$ is given as a relation, you can define: |
|
96 |
|
97 \[ |
|
98 \hat{\rho}(qs, []) \dn qs \qquad |
|
99 \hat{\rho}(qs, c::s) \dn \bigcup_{q\in qs} \{ q' \; | \; \rho(q, c, q')\} |
|
100 \] |
|
101 |
|
102 This ``collects'' all the states reachable in a breadth-first |
|
103 manner. Once you have all the states reachable by an NFA, you can define |
|
104 the language as |
|
105 |
|
106 \[ |
|
107 L(N) \dn \{s \;|\; \hat{\rho}(qs_{start}, s) \cap F \not= \emptyset\} |
|
108 \] |
|
109 |
|
110 Here you test whether the all states reachable (for $s$) contain at least |
|
111 a single accepting state. |
|
112 |
|
113 } |
64 |
114 |
65 \item Given the following deterministic finite automaton over |
115 \item Given the following deterministic finite automaton over |
66 the alphabet $\{a, b\}$, find an automaton that |
116 the alphabet $\{a, b\}$, find an automaton that |
67 recognises the complement language. (Hint: Recall that |
117 recognises the complement language. (Hint: Recall that |
68 for the algorithm from the lectures, the automaton needs |
118 for the algorithm from the lectures, the automaton needs |
69 to be in completed form, that is have a transition for |
119 to be in completed form, that is have a transition for |
70 every letter from the alphabet.) |
120 every letter from the alphabet.) |
|
121 |
|
122 \solution{ |
|
123 Before exchanging accepting and non-accepting states, it is important that |
|
124 the automaton is completed (meamning has a transition for every letter |
|
125 of the alphabet). If not completed, you have to introduce a sink state. |
|
126 |
|
127 For fun you can try out the example with |
|
128 out completion: Then the original automaton can recognise |
|
129 strings of the form $a$, $ab...b$; but the ``uncompleted'' automaton would |
|
130 recognise only the empty string. |
|
131 } |
71 |
132 |
72 \begin{center} |
133 \begin{center} |
73 \begin{tikzpicture}[>=stealth',very thick,auto, |
134 \begin{tikzpicture}[>=stealth',very thick,auto, |
74 every state/.style={minimum size=0pt, |
135 every state/.style={minimum size=0pt, |
75 inner sep=2pt,draw=blue!50,very thick, |
136 inner sep=2pt,draw=blue!50,very thick, |
145 (q3) edge [bend left=95, looseness = 2.2] node [left=2mm] {$1$} (q0) |
206 (q3) edge [bend left=95, looseness = 2.2] node [left=2mm] {$1$} (q0) |
146 (q4) edge [loop right] node {$0, 1$} (); |
207 (q4) edge [loop right] node {$0, 1$} (); |
147 \end{tikzpicture} |
208 \end{tikzpicture} |
148 \end{center} |
209 \end{center} |
149 |
210 |
|
211 \solution{Q0 and Q2 can be merged; and Q1 and Q3 as well} |
|
212 |
150 \item Given the following finite deterministic automaton over the alphabet $\{a, b\}$: |
213 \item Given the following finite deterministic automaton over the alphabet $\{a, b\}$: |
151 |
214 |
152 \begin{center} |
215 \begin{center} |
153 \begin{tikzpicture}[scale=2,>=stealth',very thick,auto, |
216 \begin{tikzpicture}[scale=2,>=stealth',very thick,auto, |
154 every state/.style={minimum size=0pt, |
217 every state/.style={minimum size=0pt, |
175 \item If a non-deterministic finite automaton (NFA) has |
238 \item If a non-deterministic finite automaton (NFA) has |
176 $n$ states. How many states does a deterministic |
239 $n$ states. How many states does a deterministic |
177 automaton (DFA) that can recognise the same language |
240 automaton (DFA) that can recognise the same language |
178 as the NFA maximal need? |
241 as the NFA maximal need? |
179 |
242 |
|
243 \solution{ $2^n$ in the worst-case and for some regexes the worst case |
|
244 cannot be avoided. |
|
245 |
|
246 Other comments: $r^{\{n\}}$ can only be represented as $n$ |
|
247 copies of the automaton for $r$, which can explode the automaton for bounded |
|
248 regular expressions. Similarly, we have no idea how backreferences can be |
|
249 represented as automaton. |
|
250 } |
|
251 |
180 \item Prove that for all regular expressions $r$ we have |
252 \item Prove that for all regular expressions $r$ we have |
181 |
253 |
182 \begin{center} |
254 \begin{center} |
183 $\textit{nullable}(r) \quad \text{if and only if} |
255 $\textit{nullable}(r) \quad \text{if and only if} |
184 \quad [] \in L(r)$ |
256 \quad [] \in L(r)$ |