1
00:00:06,410 --> 00:00:09,390
The final video for this week.
2
00:00:09,390 --> 00:00:12,465
And let me say something
about the coursework.
3
00:00:12,465 --> 00:00:15,300
First. You can solve
4
00:00:15,300 --> 00:00:17,745
the coursework in any programming
language you're like,
5
00:00:17,745 --> 00:00:22,440
I already said that. You
have to submit a PDF file.
6
00:00:22,440 --> 00:00:23,850
There will be some questions,
7
00:00:23,850 --> 00:00:26,250
so you have to write down the
solution to this questions.
8
00:00:26,250 --> 00:00:30,315
Please use a PDF and you have
to submit your source code.
9
00:00:30,315 --> 00:00:35,580
And yes, if you do use a
language which isn't Scala,
10
00:00:35,580 --> 00:00:39,450
it might help to tell me
how I can run your code.
11
00:00:39,450 --> 00:00:42,550
If I can't run your code,
I will ask you anyway.
12
00:00:42,550 --> 00:00:50,044
Also, please submit both the
PDF and the code in a file,
13
00:00:50,044 --> 00:00:52,730
in a zip file, which generates
14
00:00:52,730 --> 00:00:55,835
a subdirectory with your
name and your family name.
15
00:00:55,835 --> 00:00:58,970
That we'll just save a
lot of time for me to try
16
00:00:58,970 --> 00:01:02,030
to figure out who
has submitted what.
17
00:01:02,030 --> 00:01:04,445
Please do that.
18
00:01:04,445 --> 00:01:07,789
So what is the task
into coursework?
19
00:01:07,789 --> 00:01:09,885
I essentially showed you how
20
00:01:09,885 --> 00:01:11,995
the Brzozowski matcher
21
00:01:11,995 --> 00:01:14,645
works for the basic
regular expressions.
22
00:01:14,645 --> 00:01:16,295
I also showed you actually how it
23
00:01:16,295 --> 00:01:18,110
works for the n-times.
24
00:01:18,110 --> 00:01:20,300
And the task in coursework
25
00:01:20,300 --> 00:01:22,970
is that you extend the
Brzozowski matcher to
26
00:01:22,970 --> 00:01:25,820
the other regular expressions
27
00:01:25,820 --> 00:01:27,800
from the extended
regular expressions.
28
00:01:27,800 --> 00:01:30,245
So you're supposed
to extended with
29
00:01:30,245 --> 00:01:34,490
r-plus, optional r, for n-times.
30
00:01:34,490 --> 00:01:38,420
You've already seen that.
For 0 or more times,
31
00:01:38,420 --> 00:01:41,120
but not more than m
times regular expression.
32
00:01:41,120 --> 00:01:45,890
For at least n-times regular
expression and for between
33
00:01:45,890 --> 00:01:47,480
n times and m times
34
00:01:47,480 --> 00:01:50,810
regular expression and also this
NOT regular expression.
35
00:01:50,810 --> 00:01:52,790
So your task is to
36
00:01:52,790 --> 00:01:55,310
essentially find the definition
37
00:01:55,310 --> 00:01:57,155
for nullable in these cases.
38
00:01:57,155 --> 00:02:00,215
Find a definition for derivative,
39
00:02:00,215 --> 00:02:02,480
implement them,
write them down in
40
00:02:02,480 --> 00:02:06,065
a PDF and then do some
experiments with them.
41
00:02:06,065 --> 00:02:08,875
Well, how can you do that?
42
00:02:08,875 --> 00:02:10,370
Well I've given you
43
00:02:10,370 --> 00:02:13,565
the meaning of these additional
regular expressions.
44
00:02:13,565 --> 00:02:16,730
So here's, for example, the
meaning of this r-plus.
45
00:02:16,730 --> 00:02:18,290
So that would be, I have
46
00:02:18,290 --> 00:02:22,115
at least one copy up to infinity.
47
00:02:22,115 --> 00:02:25,070
And the optional-r would be it's
48
00:02:25,070 --> 00:02:28,370
the language of r plus
the empty string.
49
00:02:28,370 --> 00:02:30,440
If I have it exactly n times,
50
00:02:30,440 --> 00:02:31,985
then it would be
just the language
51
00:02:31,985 --> 00:02:34,010
of r exactly n-times.
52
00:02:34,010 --> 00:02:38,105
And here you have union
from 0 to m and so on.
53
00:02:38,105 --> 00:02:42,560
This might be a slightly
interesting regular expression.
54
00:02:42,560 --> 00:02:46,580
So that's supposed to be the
character set that is very
55
00:02:46,580 --> 00:02:48,410
similar to character ranges like
56
00:02:48,410 --> 00:02:51,005
in the existing regular
expression matchers.
57
00:02:51,005 --> 00:02:52,820
So this just says
58
00:02:52,820 --> 00:02:55,565
...this regular
expression can match
59
00:02:55,565 --> 00:03:00,860
either the character c1 or
the character c2 up to cn.
60
00:03:00,860 --> 00:03:03,620
Why do I include
that regular expression?
61
00:03:03,620 --> 00:03:09,050
I could also just say
c1 plus c2 plus c3...
62
00:03:09,050 --> 00:03:12,889
a big alternative.
Well that's possible.
63
00:03:12,889 --> 00:03:16,595
But remember the Achilles'
heel of this algorithm is,
64
00:03:16,595 --> 00:03:18,425
if the regular expression is large,
65
00:03:18,425 --> 00:03:20,825
then the computation
take always a long time.
66
00:03:20,825 --> 00:03:23,840
So I'm trying to compress
that as much as I can.
67
00:03:23,840 --> 00:03:25,370
So instead of giving a big
68
00:03:25,370 --> 00:03:29,134
alternative, I am trying to
give one regular expression,
69
00:03:29,134 --> 00:03:31,340
which can not just match
a single character,
70
00:03:31,340 --> 00:03:34,230
but a set of characters.
71
00:03:34,630 --> 00:03:36,980
How can you be sure that
72
00:03:36,980 --> 00:03:39,215
definition you come
up with are correct?
73
00:03:39,215 --> 00:03:41,450
Well, I showed you which are
74
00:03:41,450 --> 00:03:45,170
the properties these two
functions need to satisfy.
75
00:03:45,170 --> 00:03:47,060
I've given you here what
76
00:03:47,060 --> 00:03:49,325
the meaning of these
expressions are.
77
00:03:49,325 --> 00:03:52,700
So you will always know what's
on the right-hand side.
78
00:03:52,700 --> 00:03:54,515
This is completely determined.
79
00:03:54,515 --> 00:03:56,720
You essentially
have to fill something
80
00:03:56,720 --> 00:03:58,910
equivalent on the left-hand side.
81
00:03:58,910 --> 00:04:02,105
That's the main task
of the coursework.
82
00:04:02,105 --> 00:04:08,090
And you can start from the
files I provided on KEATS.
83
00:04:08,090 --> 00:04:12,125
So you would just have to
add these additional cases.
84
00:04:12,125 --> 00:04:15,020
When you add these
additional cases
85
00:04:15,020 --> 00:04:17,330
and you're using the Scala language,
86
00:04:17,330 --> 00:04:18,980
you can do me a favour.
87
00:04:18,980 --> 00:04:22,550
You can call these
constructors for
88
00:04:22,550 --> 00:04:25,400
these different regular
expressions as RANGE,
89
00:04:25,400 --> 00:04:28,985
PLUS, OPTIONAL and NTIMES,
UPTO, FROM and BETWEEN.
90
00:04:28,985 --> 00:04:31,370
Remember I have this
convention to always
91
00:04:31,370 --> 00:04:34,025
use capital letters
for regular expressions.
92
00:04:34,025 --> 00:04:36,680
It would be nice if you could use
93
00:04:36,680 --> 00:04:39,260
these names because
then it will be
94
00:04:39,260 --> 00:04:42,335
very easy for me
to test your code.
95
00:04:42,335 --> 00:04:44,690
If you're using a different
programming language
96
00:04:44,690 --> 00:04:46,370
like let's say Rust,
97
00:04:46,370 --> 00:04:48,860
I expect some people will use that, where I
98
00:04:48,860 --> 00:04:51,380
have no idea what are the
conventions in your language,
99
00:04:51,380 --> 00:04:53,420
how you have to call
these constructors,
100
00:04:53,420 --> 00:04:56,420
we will see whatever you
submit. I'll have a look.
101
00:04:56,420 --> 00:04:59,360
There's one more
constraint I have to
102
00:04:59,360 --> 00:05:02,194
impose to make this
coursework interesting.
103
00:05:02,194 --> 00:05:04,730
I do not want you
that you just take
104
00:05:04,730 --> 00:05:07,370
these extended regular
expressions and that you
105
00:05:07,370 --> 00:05:10,550
expand them using
their definition.
106
00:05:10,550 --> 00:05:12,320
Because that would make the regular
107
00:05:12,320 --> 00:05:13,955
expression matcher very slow.
108
00:05:13,955 --> 00:05:16,160
So for example, if
you want to define
109
00:05:16,160 --> 00:05:18,785
what's the derivative for r-plus,
110
00:05:18,785 --> 00:05:21,560
then what does not
count as a solution:
111
00:05:21,560 --> 00:05:24,770
if you just expand that
to the definition that r
112
00:05:24,770 --> 00:05:28,935
plus is equivalent to
r followed by r-star.
113
00:05:28,935 --> 00:05:32,845
So in code, what I
would like to not see,
114
00:05:32,845 --> 00:05:35,680
I would not give any
marks for that is,
115
00:05:35,680 --> 00:05:37,780
if you add the plus as follows,
116
00:05:37,780 --> 00:05:39,910
so that is still perfectly fine.
117
00:05:39,910 --> 00:05:42,160
There's nothing you
can do differently.
118
00:05:42,160 --> 00:05:44,065
That would be the constructor.
119
00:05:44,065 --> 00:05:46,480
But when you define nullable,
120
00:05:46,480 --> 00:05:49,180
I do not want to see
that you defined
121
00:05:49,180 --> 00:05:51,790
this plus r as nullable
122
00:05:51,790 --> 00:05:54,385
of the sequence of r and star-r,
123
00:05:54,385 --> 00:05:58,075
just unfolding
the definition.
124
00:05:58,075 --> 00:06:00,415
As you can see, when you come
125
00:06:00,415 --> 00:06:02,815
up with a much better
solution for that.
126
00:06:02,815 --> 00:06:05,110
This is a very inefficient way
127
00:06:05,110 --> 00:06:07,090
for how to calculate nullable.
128
00:06:07,090 --> 00:06:10,825
And so the same for derivative
would not be allowed.
129
00:06:10,825 --> 00:06:13,895
If you, I have the plus r here,
130
00:06:13,895 --> 00:06:16,685
you can't just unfold
the definition,
131
00:06:16,685 --> 00:06:19,790
with r-plus
being defined as r
132
00:06:19,790 --> 00:06:21,350
followed by r star and
133
00:06:21,350 --> 00:06:23,375
then just build the
derivative of that.
134
00:06:23,375 --> 00:06:25,280
You have to find something more
135
00:06:25,280 --> 00:06:28,730
primitive that involves
only the derivative of r,
136
00:06:28,730 --> 00:06:31,790
essentially, nothing
more involved. The same
137
00:06:31,790 --> 00:06:33,815
if you have n-times, for example,
138
00:06:33,815 --> 00:06:39,935
you can't just unfold this
n-times in this sequence
139
00:06:39,935 --> 00:06:43,310
of .... n-copies
140
00:06:43,310 --> 00:06:47,285
of this r. You have to get
something more primitive.
141
00:06:47,285 --> 00:06:49,760
Otherwise, as you've
seen earlier,
142
00:06:49,760 --> 00:06:53,135
your regular expression matcher
would be totally slow.
143
00:06:53,135 --> 00:06:55,475
When you submit your solution,
144
00:06:55,475 --> 00:06:58,445
please submit this
solution in the PDF.
145
00:06:58,445 --> 00:07:01,580
So give the cases of your definition
146
00:07:01,580 --> 00:07:06,004
in a form like that for
nullable and derivative.
147
00:07:06,004 --> 00:07:08,405
And also implement it in code.
148
00:07:08,405 --> 00:07:10,910
That is just helping me to
149
00:07:10,910 --> 00:07:13,400
find out whether you have
the correct solution or not.
150
00:07:13,400 --> 00:07:15,710
So you needed a kind of
mathematical notation of
151
00:07:15,710 --> 00:07:18,695
your solution, and
an actual code.
152
00:07:18,695 --> 00:07:22,414
And then once you
have your definition,
153
00:07:22,414 --> 00:07:25,699
also make sure you try
it out on some examples.
154
00:07:25,699 --> 00:07:28,970
These will be the examples
I will definitely try out,
155
00:07:28,970 --> 00:07:30,560
but probably also more
156
00:07:30,560 --> 00:07:33,215
depending on what
definitions you give me.
157
00:07:33,215 --> 00:07:38,300
There's one more question I
ask you to do. You've seen
158
00:07:38,300 --> 00:07:40,130
there's some regular
expressions that
159
00:07:40,130 --> 00:07:42,215
are involved for characters,
160
00:07:42,215 --> 00:07:46,925
for character ranges or
character sets essentially.
161
00:07:46,925 --> 00:07:48,665
And sometimes I also want to have
162
00:07:48,665 --> 00:07:51,665
just a reg expression which
can match any character!!
163
00:07:51,665 --> 00:07:56,195
And I could have for all of
them separate definitions.
164
00:07:56,195 --> 00:07:57,710
And that would mean I also need
165
00:07:57,710 --> 00:07:59,645
separate definitions for nullable,
166
00:07:59,645 --> 00:08:02,405
separate definitions
for derivative.
167
00:08:02,405 --> 00:08:05,825
But actually they can be
generalized to just one constructor.
168
00:08:05,825 --> 00:08:08,210
And that is if we introduce
169
00:08:08,210 --> 00:08:11,630
a constructor for regular expressions,
170
00:08:11,630 --> 00:08:13,760
which not just takes
a single character
171
00:08:13,760 --> 00:08:15,469
or set of characters.
172
00:08:15,469 --> 00:08:18,245
But, which I call here CFUN.
173
00:08:18,245 --> 00:08:23,165
And it takes a function from
characters to booleans.
174
00:08:23,165 --> 00:08:24,470
So if I want to match
175
00:08:24,470 --> 00:08:26,900
just a single character
then this function f
176
00:08:26,900 --> 00:08:29,060
would only say true
177
00:08:29,060 --> 00:08:32,225
if it gets as argument
the single character.
178
00:08:32,225 --> 00:08:34,850
If I have a character set,
179
00:08:34,850 --> 00:08:36,515
then I would say, well,
180
00:08:36,515 --> 00:08:38,300
I need a function
which says true
181
00:08:38,300 --> 00:08:40,970
for all the characters
in this set.
182
00:08:40,970 --> 00:08:43,460
And otherwise it's false.
183
00:08:43,460 --> 00:08:46,205
And if I want to
match any character!!,
184
00:08:46,205 --> 00:08:48,500
then they should get a function
185
00:08:48,500 --> 00:08:53,450
which says true for
all characters.
186
00:08:53,450 --> 00:08:56,630
Okay? If I have
such a constructor
187
00:08:56,630 --> 00:09:00,140
that actually I can save
myself a bit of work.
188
00:09:00,140 --> 00:09:03,200
And I can just have one case
189
00:09:03,200 --> 00:09:06,920
for nullable and one
case for CFUNS.
190
00:09:06,920 --> 00:09:09,800
So that would then subsume
the character ranges and
191
00:09:09,800 --> 00:09:13,385
the character and also
this ALL regular expression.
192
00:09:13,385 --> 00:09:15,380
Ok, this was not explicitly
included at the beginning,
193
00:09:15,380 --> 00:09:17,510
but that's what I can now define.
194
00:09:17,510 --> 00:09:21,740
And then I can define
this for characters,
195
00:09:21,740 --> 00:09:23,885
just as special cases
for these functions.
196
00:09:23,885 --> 00:09:25,610
And I told you already
what this function
197
00:09:25,610 --> 00:09:28,025
should look like in
these three cases.
198
00:09:28,025 --> 00:09:30,350
So I let ...
199
00:09:30,350 --> 00:09:33,515
you decide how you're
going to implement that.
200
00:09:33,515 --> 00:09:35,450
If you first define
201
00:09:35,450 --> 00:09:38,495
your implementation is
all these explicit cases.
202
00:09:38,495 --> 00:09:41,900
Because in these cases it's
actually more intuitive,
203
00:09:41,900 --> 00:09:45,980
what nullable and
derivative should be.
204
00:09:45,980 --> 00:09:48,035
And then in a second step,
205
00:09:48,035 --> 00:09:51,140
you implement these
more general cases.
206
00:09:51,140 --> 00:09:53,360
And just keep the original ones
207
00:09:53,360 --> 00:09:54,500
in your implementation if you
208
00:09:54,500 --> 00:09:57,710
want, or if you feel
adventurous enough,
209
00:09:57,710 --> 00:09:59,225
and want to be lazy,
210
00:09:59,225 --> 00:10:01,115
that you just implement that.
211
00:10:01,115 --> 00:10:02,660
And then you have already done
212
00:10:02,660 --> 00:10:06,530
at least two constructors
by just implementing one.
213
00:10:06,530 --> 00:10:08,915
But as said that
requires a bit skill
214
00:10:08,915 --> 00:10:11,450
of generalizing how
that should be.
215
00:10:11,450 --> 00:10:14,180
And the other questions
are just then
216
00:10:14,180 --> 00:10:16,745
trying out your
expression matcher on
217
00:10:16,745 --> 00:10:19,580
some examples, more
power examples,
218
00:10:19,580 --> 00:10:22,400
and they should be
very easy to solve.
219
00:10:22,400 --> 00:10:24,605
So I hope it's not
too much asked for
220
00:10:24,605 --> 00:10:26,615
and I hope you have fun.
221
00:10:26,615 --> 00:10:29,675
It is not too much ask
because you can,
222
00:10:29,675 --> 00:10:32,420
I hope it's not too much
because you can start from
223
00:10:32,420 --> 00:10:35,615
my definitions or
my implementation.
224
00:10:35,615 --> 00:10:39,425
It's really essentially
mostly is brain work,
225
00:10:39,425 --> 00:10:42,170
how these nullable and
derivative should work.
226
00:10:42,170 --> 00:10:44,510
If you're in a
language like Scala,
227
00:10:44,510 --> 00:10:48,875
the implementation should
then be easy-peasy.
228
00:10:48,875 --> 00:10:51,365
If you are in a different language
229
00:10:51,365 --> 00:10:52,610
I assume you also
230
00:10:52,610 --> 00:10:54,890
dedicated to that
language to start with,
231
00:10:54,890 --> 00:10:58,475
so it should be also very
easy for you to translate
232
00:10:58,475 --> 00:11:01,100
my Scala code into whatever
language you want to
233
00:11:01,100 --> 00:11:04,280
do, first and then
start from there.
234
00:11:04,280 --> 00:11:06,230
If there are any questions,
235
00:11:06,230 --> 00:11:08,360
please ask me or the TAs.
236
00:11:08,360 --> 00:11:10,040
We are trying to be as helpful
237
00:11:10,040 --> 00:11:12,900
as possible with the coursework.
238
00:11:13,240 --> 00:11:17,885
Bear in mind, this is the
first step in our compiler.
239
00:11:17,885 --> 00:11:21,005
The coursework 2 will then
build on top of that.
240
00:11:21,005 --> 00:11:25,770
So it's better to get this
correct first. Thanks.