author | Christian Urban <christian.urban@kcl.ac.uk> |
Sat, 09 Oct 2021 12:29:15 +0100 | |
changeset 843 | 97b622202547 |
parent 768 | 34f77b976b88 |
child 892 | f4df090a84d0 |
permissions | -rw-r--r-- |
31 | 1 |
\documentclass{article} |
264
4deef8ac5d72
uodated hws
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents:
166
diff
changeset
|
2 |
\usepackage{../style} |
292
7ed2a25dd115
updated
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents:
267
diff
changeset
|
3 |
\usepackage{../graphics} |
146
9da175d5eb63
added new hws
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents:
102
diff
changeset
|
4 |
|
31 | 5 |
\begin{document} |
6 |
||
7 |
\section*{Homework 4} |
|
8 |
||
347
22b5294daa2a
updated hws
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents:
294
diff
changeset
|
9 |
\HEADER |
22b5294daa2a
updated hws
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents:
294
diff
changeset
|
10 |
|
31 | 11 |
\begin{enumerate} |
34 | 12 |
|
726 | 13 |
\item Given the regular expressions |
14 |
||
15 |
\begin{center} |
|
16 |
\begin{tabular}{ll} |
|
17 |
1) & $(ab + a)\cdot (\ONE + b)$\\ |
|
18 |
2) & $(aa + a)^*$\\ |
|
19 |
\end{tabular} |
|
20 |
\end{center} |
|
21 |
||
22 |
there are several values for how these regular expressions can |
|
23 |
recognise the strings (for 1) $ab$ and (for 2) $aaa$. Give in each case |
|
24 |
\emph{all} the values and indicate which one is the POSIX value. |
|
25 |
||
26 |
||
444
3056a4c071b0
updated
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents:
401
diff
changeset
|
27 |
\item If a regular expression $r$ does not contain any occurrence of $\ZERO$, |
525 | 28 |
is it possible for $L(r)$ to be empty? Explain why, or give a proof. |
32 | 29 |
|
264
4deef8ac5d72
uodated hws
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents:
166
diff
changeset
|
30 |
\item Define the tokens and regular expressions for a language |
267
a1544b804d1e
updated homeworks
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents:
264
diff
changeset
|
31 |
consisting of numbers, left-parenthesis $($, right-parenthesis $)$, |
a1544b804d1e
updated homeworks
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents:
264
diff
changeset
|
32 |
identifiers and the operations $+$, $-$ and $*$. Can the following |
a1544b804d1e
updated homeworks
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents:
264
diff
changeset
|
33 |
strings in this language be lexed? |
264
4deef8ac5d72
uodated hws
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents:
166
diff
changeset
|
34 |
|
267
a1544b804d1e
updated homeworks
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents:
264
diff
changeset
|
35 |
\begin{itemize} |
264
4deef8ac5d72
uodated hws
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents:
166
diff
changeset
|
36 |
\item $(a + 3) * b$ |
4deef8ac5d72
uodated hws
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents:
166
diff
changeset
|
37 |
\item $)()++ -33$ |
4deef8ac5d72
uodated hws
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents:
166
diff
changeset
|
38 |
\item $(a / 3) * 3$ |
267
a1544b804d1e
updated homeworks
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents:
264
diff
changeset
|
39 |
\end{itemize} |
264
4deef8ac5d72
uodated hws
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents:
166
diff
changeset
|
40 |
|
267
a1544b804d1e
updated homeworks
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents:
264
diff
changeset
|
41 |
In case they can, can you give the corresponding token |
a1544b804d1e
updated homeworks
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents:
264
diff
changeset
|
42 |
sequences. |
264
4deef8ac5d72
uodated hws
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents:
166
diff
changeset
|
43 |
|
768 | 44 |
\item Assume $r$ is nullable. Show that |
45 |
\[ 1 + r + r\cdot r \;\equiv\; r\cdot r |
|
46 |
\] |
|
47 |
||
48 |
holds. |
|
49 |
||
50 |
\item \textbf{(Deleted)} Assume that $s^{-1}$ stands for the operation of reversing a |
|
267
a1544b804d1e
updated homeworks
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents:
264
diff
changeset
|
51 |
string $s$. Given the following \emph{reversing} function on regular |
a1544b804d1e
updated homeworks
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents:
264
diff
changeset
|
52 |
expressions |
32 | 53 |
|
267
a1544b804d1e
updated homeworks
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents:
264
diff
changeset
|
54 |
\begin{center} |
a1544b804d1e
updated homeworks
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents:
264
diff
changeset
|
55 |
\begin{tabular}{r@{\hspace{1mm}}c@{\hspace{1mm}}l} |
401
5d85dc9779b1
updated
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents:
359
diff
changeset
|
56 |
$rev(\ZERO)$ & $\dn$ & $\ZERO$\\ |
5d85dc9779b1
updated
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents:
359
diff
changeset
|
57 |
$rev(\ONE)$ & $\dn$ & $\ONE$\\ |
267
a1544b804d1e
updated homeworks
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents:
264
diff
changeset
|
58 |
$rev(c)$ & $\dn$ & $c$\\ |
a1544b804d1e
updated homeworks
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents:
264
diff
changeset
|
59 |
$rev(r_1 + r_2)$ & $\dn$ & $rev(r_1) + rev(r_2)$\\ |
a1544b804d1e
updated homeworks
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents:
264
diff
changeset
|
60 |
$rev(r_1 \cdot r_2)$ & $\dn$ & $rev(r_2) \cdot rev(r_1)$\\ |
a1544b804d1e
updated homeworks
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents:
264
diff
changeset
|
61 |
$rev(r^*)$ & $\dn$ & $rev(r)^*$\\ |
a1544b804d1e
updated homeworks
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents:
264
diff
changeset
|
62 |
\end{tabular} |
a1544b804d1e
updated homeworks
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents:
264
diff
changeset
|
63 |
\end{center} |
34 | 64 |
|
267
a1544b804d1e
updated homeworks
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents:
264
diff
changeset
|
65 |
and the set |
32 | 66 |
|
267
a1544b804d1e
updated homeworks
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents:
264
diff
changeset
|
67 |
\begin{center} |
a1544b804d1e
updated homeworks
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents:
264
diff
changeset
|
68 |
$Rev\,A \dn \{s^{-1} \;|\; s \in A\}$ |
a1544b804d1e
updated homeworks
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents:
264
diff
changeset
|
69 |
\end{center} |
31 | 70 |
|
267
a1544b804d1e
updated homeworks
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents:
264
diff
changeset
|
71 |
prove whether |
32 | 72 |
|
267
a1544b804d1e
updated homeworks
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents:
264
diff
changeset
|
73 |
\begin{center} |
a1544b804d1e
updated homeworks
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents:
264
diff
changeset
|
74 |
$L(rev(r)) = Rev (L(r))$ |
a1544b804d1e
updated homeworks
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents:
264
diff
changeset
|
75 |
\end{center} |
31 | 76 |
|
267
a1544b804d1e
updated homeworks
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents:
264
diff
changeset
|
77 |
holds. |
42 | 78 |
|
401
5d85dc9779b1
updated
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents:
359
diff
changeset
|
79 |
\item Assume the delimiters for comments are |
5d85dc9779b1
updated
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents:
359
diff
changeset
|
80 |
\texttt{$\slash$*} and \texttt{*$\slash$}. Give a |
5d85dc9779b1
updated
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents:
359
diff
changeset
|
81 |
regular expression that can recognise comments of the |
5d85dc9779b1
updated
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents:
359
diff
changeset
|
82 |
form |
42 | 83 |
|
267
a1544b804d1e
updated homeworks
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents:
264
diff
changeset
|
84 |
\begin{center} |
a1544b804d1e
updated homeworks
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents:
264
diff
changeset
|
85 |
\texttt{$\slash$*~\ldots{}~*$\slash$} |
a1544b804d1e
updated homeworks
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents:
264
diff
changeset
|
86 |
\end{center} |
42 | 87 |
|
401
5d85dc9779b1
updated
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents:
359
diff
changeset
|
88 |
where the three dots stand for arbitrary characters, but |
5d85dc9779b1
updated
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents:
359
diff
changeset
|
89 |
not comment delimiters. (Hint: You can assume you are |
5d85dc9779b1
updated
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents:
359
diff
changeset
|
90 |
already given a regular expression written \texttt{ALL}, |
5d85dc9779b1
updated
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents:
359
diff
changeset
|
91 |
that can recognise any character, and a regular |
5d85dc9779b1
updated
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents:
359
diff
changeset
|
92 |
expression \texttt{NOT} that recognises the complement |
5d85dc9779b1
updated
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents:
359
diff
changeset
|
93 |
of a regular expression.) |
355
a259eec25156
updated
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents:
347
diff
changeset
|
94 |
|
a259eec25156
updated
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents:
347
diff
changeset
|
95 |
\item Simplify the regular expression |
42 | 96 |
|
355
a259eec25156
updated
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents:
347
diff
changeset
|
97 |
\[ |
401
5d85dc9779b1
updated
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents:
359
diff
changeset
|
98 |
(\ZERO \cdot (b \cdot c)) + |
5d85dc9779b1
updated
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents:
359
diff
changeset
|
99 |
((\ZERO \cdot c) + \ONE) |
355
a259eec25156
updated
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents:
347
diff
changeset
|
100 |
\] |
a259eec25156
updated
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents:
347
diff
changeset
|
101 |
|
401
5d85dc9779b1
updated
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents:
359
diff
changeset
|
102 |
Does simplification always preserve the meaning of a |
5d85dc9779b1
updated
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents:
359
diff
changeset
|
103 |
regular expression? |
355
a259eec25156
updated
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents:
347
diff
changeset
|
104 |
|
401
5d85dc9779b1
updated
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents:
359
diff
changeset
|
105 |
\item The Sulzmann \& Lu algorithm contains the function |
5d85dc9779b1
updated
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents:
359
diff
changeset
|
106 |
$mkeps$ which answers how a regular expression can match |
5d85dc9779b1
updated
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents:
359
diff
changeset
|
107 |
the empty string. What is the answer of $mkeps$ for the |
355
a259eec25156
updated
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents:
347
diff
changeset
|
108 |
regular expressions: |
146
9da175d5eb63
added new hws
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents:
102
diff
changeset
|
109 |
|
355
a259eec25156
updated
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents:
347
diff
changeset
|
110 |
\[ |
a259eec25156
updated
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents:
347
diff
changeset
|
111 |
\begin{array}{l} |
401
5d85dc9779b1
updated
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents:
359
diff
changeset
|
112 |
(\ZERO \cdot (b \cdot c)) + |
5d85dc9779b1
updated
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents:
359
diff
changeset
|
113 |
((\ZERO \cdot c) + \ONE)\\ |
577 | 114 |
(a + \ONE) \cdot (\ONE + \ONE)\\ |
115 |
a^* |
|
355
a259eec25156
updated
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents:
347
diff
changeset
|
116 |
\end{array} |
a259eec25156
updated
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents:
347
diff
changeset
|
117 |
\] |
a259eec25156
updated
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents:
347
diff
changeset
|
118 |
|
401
5d85dc9779b1
updated
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents:
359
diff
changeset
|
119 |
\item What is the purpose of the record regular expression in |
5d85dc9779b1
updated
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents:
359
diff
changeset
|
120 |
the Sulzmann \& Lu algorithm? |
498 | 121 |
|
843 | 122 |
\item Recall the functions \textit{nullable} and |
123 |
\textit{zeroable}. Define recursive functions |
|
124 |
\textit{atmostempty} (for regular expressions that match no |
|
125 |
string or only the empty string), \textit{somechars} (for |
|
126 |
regular expressions that match some non-empty string), |
|
127 |
\textit{infinitestrings} (for regular expressions that can match |
|
128 |
infinitely many strings). |
|
498 | 129 |
|
843 | 130 |
\item There are two kinds of automata that are generate for |
131 |
regular expression matching---DFAs and NFAs. (1) Regular expression engines like |
|
132 |
the one in Python generate NFAs. Explain what is the problem with such |
|
133 |
NFAs and what is the reason why they use NFAs. (2) Regular expression |
|
134 |
engines like the one in Rust generate DFAs. Explain what is the |
|
135 |
problem with these regex engines and also what is the problem with $a^{\{1000\}}$ |
|
136 |
in these engines. |
|
137 |
||
146
9da175d5eb63
added new hws
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents:
102
diff
changeset
|
138 |
%\item (Optional) The tokenizer in \texttt{regexp3.scala} takes as |
9da175d5eb63
added new hws
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents:
102
diff
changeset
|
139 |
%argument a string and a list of rules. The result is a list of tokens. Improve this tokenizer so |
9da175d5eb63
added new hws
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents:
102
diff
changeset
|
140 |
%that it filters out all comments and whitespace from the result. |
9da175d5eb63
added new hws
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents:
102
diff
changeset
|
141 |
|
9da175d5eb63
added new hws
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents:
102
diff
changeset
|
142 |
%\item (Optional) Modify the tokenizer in \texttt{regexp2.scala} so that it |
9da175d5eb63
added new hws
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents:
102
diff
changeset
|
143 |
%implements the \texttt{findAll} function. This function takes a regular |
9da175d5eb63
added new hws
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents:
102
diff
changeset
|
144 |
%expressions and a string, and returns all substrings in this string that |
444
3056a4c071b0
updated
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents:
401
diff
changeset
|
145 |
%match the regular expression. |
3056a4c071b0
updated
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents:
401
diff
changeset
|
146 |
|
3056a4c071b0
updated
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents:
401
diff
changeset
|
147 |
\item \POSTSCRIPT |
31 | 148 |
\end{enumerate} |
149 |
||
150 |
||
151 |
\end{document} |
|
152 |
||
153 |
%%% Local Variables: |
|
154 |
%%% mode: latex |
|
155 |
%%% TeX-master: t |
|
156 |
%%% End: |