67 \end{tabular} |
67 \end{tabular} |
68 \end{center} |
68 \end{center} |
69 |
69 |
70 What is the language recognised by this automaton? |
70 What is the language recognised by this automaton? |
71 |
71 |
|
72 |
|
73 |
72 \item Give a non-deterministic finite automaton that can |
74 \item Give a non-deterministic finite automaton that can |
73 recognise the language $L(a\cdot (a + b)^* \cdot c)$. |
75 recognise the language $L(a\cdot (a + b)^* \cdot c)$. |
|
76 |
|
77 \solution{It is already possible to just read off the automaton without |
|
78 going through Thompson.} |
74 |
79 |
75 \item Given a deterministic finite automaton $A(\varSigma, Q, Q_0, F, |
80 \item Given a deterministic finite automaton $A(\varSigma, Q, Q_0, F, |
76 \delta)$, define which language is recognised by this |
81 \delta)$, define which language is recognised by this |
77 automaton. Can you define also the language defined by a |
82 automaton. Can you define also the language defined by a |
78 non-deterministic automaton? |
83 non-deterministic automaton? |
230 \item If a non-deterministic finite automaton (NFA) has |
235 \item If a non-deterministic finite automaton (NFA) has |
231 $n$ states. How many states does a deterministic |
236 $n$ states. How many states does a deterministic |
232 automaton (DFA) that can recognise the same language |
237 automaton (DFA) that can recognise the same language |
233 as the NFA maximal need? |
238 as the NFA maximal need? |
234 |
239 |
235 \solution{ $2^n$ in the worst-case and for some regexes the worst case |
240 \solution{$2^n$ in the worst-case and for some regexes the worst case |
236 cannot be avoided. |
241 cannot be avoided. |
237 |
242 |
238 Other comments: $r^{\{n\}}$ can only be represented as $n$ |
243 Other comments: $r^{\{n\}}$ can only be represented as $n$ |
239 copies of the automaton for $r$, which can explode the automaton for bounded |
244 copies of the automaton for $r$, which can explode the automaton for bounded |
240 regular expressions. Similarly, we have no idea how backreferences can be |
245 regular expressions. Similarly, we have no idea how backreferences can be |
241 represented as automaton. |
246 represented as automaton. |
242 } |
247 } |
|
248 |
|
249 \item Rust implements a non-backtracking regular expression matcher |
|
250 based on the classic idea of DFAs. Still, some regular expressions |
|
251 take a surprising amount of time for matching problems. Explain the |
|
252 problem? |
|
253 |
|
254 \solution{The problem has to do with bounded regular expressions, |
|
255 such as $r^{\{n\}}$. They are represented as $n$-copies of some |
|
256 automaton for $r$. If $n$ is large, then this can result in a |
|
257 large memory-footprint and slow runtime.} |
243 |
258 |
244 \item Prove that for all regular expressions $r$ we have |
259 \item Prove that for all regular expressions $r$ we have |
245 |
260 |
246 \begin{center} |
261 \begin{center} |
247 $\textit{nullable}(r) \quad \text{if and only if} |
262 $\textit{nullable}(r) \quad \text{if and only if} |