1+ −
00:00:09,990 --> 00:00:13,465+ −
Welcome back to this+ −
week's lecture.+ −
+ −
2+ −
00:00:13,465 --> 00:00:16,450+ −
The task this week is to actually+ −
+ −
3+ −
00:00:16,450 --> 00:00:20,320+ −
implement a regular+ −
expression matcher.+ −
+ −
4+ −
00:00:20,320 --> 00:00:22,510+ −
And we want to be a bit+ −
+ −
5+ −
00:00:22,510 --> 00:00:25,315+ −
faster than the regular+ −
expression matcher+ −
+ −
6+ −
00:00:25,315 --> 00:00:29,380+ −
in Python, Ruby,+ −
Javascript, and Java.+ −
+ −
7+ −
00:00:29,380 --> 00:00:31,960+ −
Remember this regular expression+ −
+ −
8+ −
00:00:31,960 --> 00:00:35,785+ −
and strings which are+ −
just a number of a's,+ −
+ −
9+ −
00:00:35,785 --> 00:00:39,670+ −
these regular expression matchers+ −
where abysmally slow.+ −
+ −
10+ −
00:00:39,670 --> 00:00:43,170+ −
They could only match+ −
approximately 30 a's in+ −
+ −
11+ −
00:00:43,170 --> 00:00:45,665+ −
something like half a minute.+ −
+ −
12+ −
00:00:45,665 --> 00:00:49,460+ −
What we like to do with+ −
our regular expression matcher.+ −
+ −
13+ −
00:00:49,460 --> 00:00:51,890+ −
We will try a few times,+ −
+ −
14+ −
00:00:51,890 --> 00:00:55,505+ −
but our second trial will already+ −
be much, much better.+ −
+ −
15+ −
00:00:55,505 --> 00:00:58,400+ −
It will probably+ −
match around maybe+ −
+ −
16+ −
00:00:58,400 --> 00:01:02,030+ −
thousand a's in something+ −
in half a minute.+ −
+ −
17+ −
00:01:02,030 --> 00:01:03,920+ −
But our third version in+ −
+ −
18+ −
00:01:03,920 --> 00:01:06,230+ −
comparison will be+ −
blindingly fast.+ −
+ −
19+ −
00:01:06,230 --> 00:01:08,180+ −
And we'll be able to match+ −
+ −
20+ −
00:01:08,180 --> 00:01:10,145+ −
strings of length 10,000 a's+ −
+ −
21+ −
00:01:10,145 --> 00:01:12,230+ −
and will not require+ −
+ −
22+ −
00:01:12,230 --> 00:01:14,975+ −
more than five seconds.+ −
So let's go ahead with that.+ −
+ −
23+ −
00:01:14,975 --> 00:01:18,095+ −
We will first implement+ −
+ −
24+ −
00:01:18,095 --> 00:01:19,880+ −
our regular expression+ −
matcher only+ −
+ −
25+ −
00:01:19,880 --> 00:01:22,069+ −
for the basic+ −
regular expressions.+ −
+ −
26+ −
00:01:22,069 --> 00:01:25,430+ −
So we will have only six+ −
cases to think about.+ −
+ −
27+ −
00:01:25,430 --> 00:01:27,620+ −
This will simplify matters at first.+ −
+ −
28+ −
00:01:27,620 --> 00:01:30,350+ −
Later we can+ −
think about how to+ −
+ −
29+ −
00:01:30,350 --> 00:01:34,100+ −
extend that to the extended+ −
regular expressions.+ −
+ −
30+ −
00:01:34,100 --> 00:01:37,625+ −
Unfortunately, before we can+ −
start our implementation,+ −
+ −
31+ −
00:01:37,625 --> 00:01:39,290+ −
we have to discuss+ −
+ −
32+ −
00:01:39,290 --> 00:01:42,470+ −
when two regular+ −
expressions are equivalent.+ −
+ −
33+ −
00:01:42,470 --> 00:01:46,595+ −
And we use here this notation+ −
of the triple equals.+ −
+ −
34+ −
00:01:46,595 --> 00:01:48,215+ −
We say a regular expression+ −
+ −
35+ −
00:01:48,215 --> 00:01:50,180+ −
r1 and r2 are+ −
+ −
36+ −
00:01:50,180 --> 00:01:56,059+ −
equivalent if and only+ −
if the language of r1 is+ −
+ −
37+ −
00:01:56,059 --> 00:01:59,360+ −
equal to the language of r2.+ −
+ −
38+ −
00:01:59,360 --> 00:02:02,915+ −
Sounds complicated,+ −
but essentially means+ −
+ −
39+ −
00:02:02,915 --> 00:02:04,700+ −
if the two regular expressions can+ −
+ −
40+ −
00:02:04,700 --> 00:02:06,950+ −
match exactly the same strings,+ −
+ −
41+ −
00:02:06,950 --> 00:02:09,800+ −
then we want to regard+ −
them as equivalent.+ −
+ −
42+ −
00:02:09,800 --> 00:02:14,300+ −
This equivalence justifies+ −
why we can be sloppy+ −
+ −
43+ −
00:02:14,300 --> 00:02:16,040+ −
with parentheses.+ −
+ −
44+ −
00:02:16,040 --> 00:02:23,870+ −
For example, if we have+ −
(r1 + r2) + r3,+ −
+ −
45+ −
00:02:23,870 --> 00:02:32,270+ −
then this will be equivalent+ −
to r1 + (r2 + r3).+ −
+ −
46+ −
00:02:32,270 --> 00:02:34,910+ −
Remember, regular+ −
expressions are trees,+ −
+ −
47+ −
00:02:34,910 --> 00:02:37,940+ −
so these are two different+ −
regular expressions,+ −
+ −
48+ −
00:02:37,940 --> 00:02:41,540+ −
but they can match+ −
the same strings.+ −
+ −
49+ −
00:02:41,540 --> 00:02:45,695+ −
Because, well, if we take+ −
here the meaning of that,+ −
+ −
50+ −
00:02:45,695 --> 00:02:54,350+ −
that would be just L(r1 + r2 + r3),+ −
+ −
+ −
51+ −
00:02:54,350 --> 00:03:04,100+ −
which is equal to L(r1 + r2) u L(r3).+ −
+ −
+ −
52+ −
00:03:04,100 --> 00:03:10,595+ −
And that is equal to of + −
+ −
53+ −
00:03:10,595 --> 00:03:15,710+ −
L(r1) u L(r2) u L(r3).+ −
+ −
+ −
54+ −
00:03:15,710 --> 00:03:17,930+ −
And if you do the+ −
same calculation+ −
+ −
55+ −
00:03:17,930 --> 00:03:19,445+ −
for that regular expression,+ −
+ −
56+ −
00:03:19,445 --> 00:03:22,940+ −
you will find out the+ −
two sets are the same.+ −
+ −
57+ −
00:03:22,940 --> 00:03:26,195+ −
So both regular expressions+ −
can match the same strings.+ −
+ −
58+ −
00:03:26,195 --> 00:03:28,805+ −
So even if they're different+ −
regular expressions,+ −
+ −
59+ −
00:03:28,805 --> 00:03:31,220+ −
we can regard them as eqivalent.+ −
+ −
60+ −
00:03:31,220 --> 00:03:33,290+ −
And so that's why we can forget+ −
+ −
61+ −
00:03:33,290 --> 00:03:35,270+ −
about writing these parentheses.+ −
+ −
62+ −
00:03:35,270 --> 00:03:40,205+ −
Let's have a look+ −
at some more examples.+ −
+ −
63+ −
00:03:40,205 --> 00:03:41,870+ −
So the first one here,+ −
+ −
64+ −
00:03:41,870 --> 00:03:43,205+ −
that is now clear.+ −
+ −
65+ −
00:03:43,205 --> 00:03:45,200+ −
We did this calculation already+ −
+ −
66+ −
00:03:45,200 --> 00:03:47,480+ −
for arbitrary regular expressions.+ −
+ −
67+ −
00:03:47,480 --> 00:03:49,520+ −
Here it is for+ −
concrete ones a, b and c.+ −
+ −
68+ −
00:03:49,520 --> 00:03:52,690+ −
Here on the left-hand side,+ −
+ −
69+ −
00:03:52,690 --> 00:03:54,895+ −
the regular expression+ −
has the same meaning+ −
+ −
70+ −
00:03:54,895 --> 00:03:56,620+ −
as on the right-hand side.+ −
+ −
71+ −
00:03:56,620 --> 00:03:58,420+ −
So they are equivalent.+ −
+ −
72+ −
00:03:58,420 --> 00:04:01,375+ −
We can ignore the parentheses.+ −
+ −
73+ −
00:04:01,375 --> 00:04:03,220+ −
If I have a choice,+ −
+ −
74+ −
00:04:03,220 --> 00:04:06,610+ −
but the choice is+ −
the same a or a,+ −
+ −
75+ −
00:04:06,610 --> 00:04:09,265+ −
then this is+ −
equivalent to just a.+ −
+ −
76+ −
00:04:09,265 --> 00:04:13,075+ −
So the same choice doesn't+ −
introduce anything more.+ −
+ −
77+ −
00:04:13,075 --> 00:04:15,370+ −
In the next case, if I have+ −
+ −
78+ −
00:04:15,370 --> 00:04:19,450+ −
a regular expression+ −
which can match a or b,+ −
+ −
79+ −
00:04:19,450 --> 00:04:23,860+ −
that can match the same+ −
strings as b or a.+ −
+ −
80+ −
00:04:23,860 --> 00:04:27,325+ −
So you have a kind of+ −
commutativity for the plus,+ −
+ −
81+ −
00:04:27,325 --> 00:04:29,485+ −
which is like on natural numbers.+ −
+ −
82+ −
00:04:29,485 --> 00:04:32,880+ −
But you would not have a+ −
communitivity for the sequence:+ −
+ −
83+ −
00:04:32,880 --> 00:04:37,070+ −
if you have+ −
first a and then b,+ −
+ −
84+ −
00:04:37,070 --> 00:04:40,850+ −
that's not the same as+ −
matching first b and then a.+ −
+ −
85+ −
00:04:40,850 --> 00:04:42,800+ −
So for the sequence ...+ −
+ −
86+ −
00:04:42,800 --> 00:04:44,615+ −
they are not equivalent.+ −
+ −
87+ −
00:04:44,615 --> 00:04:49,475+ −
However, for the sequence I+ −
can, like for the plus, ignore+ −
+ −
88+ −
00:04:49,475 --> 00:04:51,245+ −
prarentheses.+ −
+ −
89+ −
00:04:51,245 --> 00:04:55,070+ −
If I have the parentheses+ −
and this way,+ −
+ −
90+ −
00:04:55,070 --> 00:04:57,785+ −
or I have it in this way.+ −
+ −
91+ −
00:04:57,785 --> 00:05:00,605+ −
These are two different+ −
regular expressions,+ −
+ −
92+ −
00:05:00,605 --> 00:05:02,120+ −
but they have the same meaning.+ −
+ −
93+ −
00:05:02,120 --> 00:05:03,815+ −
So they are equivalent.+ −
+ −
94+ −
00:05:03,815 --> 00:05:05,780+ −
And so I can leave out parentheses+ −
+ −
95+ −
00:05:05,780 --> 00:05:09,170+ −
for times as well.+ −
+ −
96+ −
00:05:09,170 --> 00:05:12,185+ −
The next one is a slightly+ −
more interesting one.+ −
+ −
97+ −
00:05:12,185 --> 00:05:15,020+ −
If I have a regular+ −
expression which can match+ −
+ −
98+ −
00:05:15,020 --> 00:05:19,655+ −
c first followed by (a + b),+ −
+ −
99+ −
00:05:19,655 --> 00:05:25,355+ −
then this is the same as+ −
first c followed by a,+ −
+ −
100+ −
00:05:25,355 --> 00:05:28,640+ −
or c followed by b.+ −
+ −
101+ −
00:05:28,640 --> 00:05:31,265+ −
So that's the kind of+ −
distributivity law+ −
+ −
102+ −
00:05:31,265 --> 00:05:33,545+ −
on regular expressions.+ −
+ −
103+ −
00:05:33,545 --> 00:05:36,350+ −
But they're also regular expressions+ −
which are not equivalent.+ −
+ −
104+ −
00:05:36,350 --> 00:05:38,990+ −
If I have the regular+ −
expression which can+ −
+ −
105+ −
00:05:38,990 --> 00:05:42,800+ −
match the string containing+ −
exactly two a's.+ −
+ −
106+ −
00:05:42,800 --> 00:05:44,240+ −
That is not equivalent+ −
+ −
107+ −
00:05:44,240 --> 00:05:46,730+ −
to the regular+ −
expression which can just match+ −
+ −
108+ −
00:05:46,730 --> 00:05:49,250+ −
a single a. And similarly+ −
+ −
109+ −
00:05:49,250 --> 00:05:51,680+ −
in this case, if I can match+ −
+ −
110+ −
00:05:51,680 --> 00:05:56,075+ −
a or the string b followed by c,+ −
+ −
111+ −
00:05:56,075 --> 00:05:59,825+ −
that is not the same as a or b,+ −
+ −
112+ −
00:05:59,825 --> 00:06:02,000+ −
followed by a or c.+ −
+ −
113+ −
00:06:02,000 --> 00:06:05,990+ −
I'll let you think about+ −
why is that not equivalent.+ −
+ −
114+ −
00:06:05,990 --> 00:06:08,060+ −
Essentially you+ −
have to find out what's+ −
+ −
115+ −
00:06:08,060 --> 00:06:10,475+ −
the meaning of both+ −
regular expressions.+ −
+ −
116+ −
00:06:10,475 --> 00:06:14,090+ −
And are they the+ −
same sets or not?+ −
+ −
117+ −
00:06:14,090 --> 00:06:17,435+ −
There are again some+ −
interesting corner cases.+ −
+ −
118+ −
00:06:17,435 --> 00:06:20,540+ −
If I have a regular expression+ −
that can match a,+ −
+ −
119+ −
00:06:20,540 --> 00:06:23,450+ −
but followed by the regular+ −
expression which cannot match+ −
+ −
120+ −
00:06:23,450 --> 00:06:25,670+ −
anything, that is not equivalent+ −
+ −
121+ −
00:06:25,670 --> 00:06:29,255+ −
to the regular+ −
expression which can match a.+ −
+ −
122+ −
00:06:29,255 --> 00:06:31,340+ −
Again here you have+ −
to do the calculation+ −
+ −
123+ −
00:06:31,340 --> 00:06:32,915+ −
to convince you of that.+ −
+ −
124+ −
00:06:32,915 --> 00:06:35,840+ −
Similarly, if I have a regular+ −
expression which can+ −
+ −
125+ −
00:06:35,840 --> 00:06:38,750+ −
match a or the empty string,+ −
+ −
126+ −
00:06:38,750 --> 00:06:40,640+ −
this is not equivalent to+ −
+ −
127+ −
00:06:40,640 --> 00:06:43,355+ −
the regular expression+ −
which can just match a.+ −
+ −
128+ −
00:06:43,355 --> 00:06:46,925+ −
Here are some interesting+ −
ones with zeros and ones.+ −
+ −
129+ −
00:06:46,925 --> 00:06:48,860+ −
So if I have the regular expression+ −
+ −
130+ −
00:06:48,860 --> 00:06:50,975+ −
that can match the empty string,+ −
+ −
131+ −
00:06:50,975 --> 00:06:53,540+ −
this will be actually equivalent to+ −
+ −
132+ −
00:06:53,540 --> 00:06:56,435+ −
the regular expression+ −
which can match nothing,+ −
+ −
133+ −
00:06:56,435 --> 00:06:59,405+ −
but star of that.+ −
+ −
134+ −
00:06:59,405 --> 00:07:01,970+ −
Remember the star+ −
always introduces,+ −
+ −
135+ −
00:07:01,970 --> 00:07:04,370+ −
no matter what, the empty string.+ −
+ −
136+ −
00:07:04,370 --> 00:07:06,170+ −
So this regular expression can match+ −
+ −
137+ −
00:07:06,170 --> 00:07:08,930+ −
the empty string,+ −
just like the 1.+ −
+ −
138+ −
00:07:08,930 --> 00:07:12,125+ −
So these two expressions+ −
are not the same,+ −
+ −
139+ −
00:07:12,125 --> 00:07:14,210+ −
but they are equivalent.+ −
+ −
140+ −
00:07:14,210 --> 00:07:16,700+ −
And it doesn't matter+ −
whether you take+ −
+ −
141+ −
00:07:16,700 --> 00:07:20,090+ −
the empty string to the star,+ −
+ −
142+ −
00:07:20,090 --> 00:07:23,855+ −
because if I can match 0 or+ −
more times the empty string,+ −
+ −
143+ −
00:07:23,855 --> 00:07:27,450+ −
that will be equivalent to + −
just the empty string.+ −
+ −
144+ −
00:07:27,520 --> 00:07:32,510+ −
Slightly similar to the+ −
third case is this one.+ −
+ −
145+ −
00:07:32,510 --> 00:07:35,720+ −
Zero star is not equivalent to 0+ −
+ −
146+ −
00:07:35,720 --> 00:07:40,025+ −
because that can match+ −
exactly the empty string.+ −
+ −
147+ −
00:07:40,025 --> 00:07:42,740+ −
This cannot match anything.+ −
+ −
148+ −
00:07:42,740 --> 00:07:44,839+ −
While the previous+ −
+ −
149+ −
00:07:44,839 --> 00:07:47,540+ −
equivalences are all very+ −
instructive to look at,+ −
+ −
150+ −
00:07:47,540 --> 00:07:49,910+ −
these are the+ −
equivalences we need+ −
+ −
151+ −
00:07:49,910 --> 00:07:52,685+ −
in our regular expression matchers.+ −
+ −
152+ −
00:07:52,685 --> 00:07:54,350+ −
And they are defined for+ −
+ −
153+ −
00:07:54,350 --> 00:07:58,160+ −
all regular expressions r.+ −
So r plus 0,+ −
+ −
154+ −
00:07:58,160 --> 00:08:00,470+ −
no matter what r looks+ −
like is equivalent+ −
+ −
155+ −
00:08:00,470 --> 00:08:02,945+ −
to just r. Similarly+ −
+ −
156+ −
00:08:02,945 --> 00:08:05,930+ −
0 plus r is also+ −
equivalent to r.+ −
+ −
157+ −
00:08:05,930 --> 00:08:08,525+ −
The order of this+ −
choice doesn't matter;+ −
+ −
158+ −
00:08:08,525 --> 00:08:11,495+ −
r followed by 1,+ −
+ −
159+ −
00:08:11,495 --> 00:08:14,030+ −
that's the sequence+ −
regular expression, is+ −
+ −
160+ −
00:08:14,030 --> 00:08:16,760+ −
equivalent to r. The+ −
sam, r followed by+ −
+ −
161+ −
00:08:16,760 --> 00:08:19,010+ −
r is equivalent to r.+ −
+ −
162+ −
00:08:19,010 --> 00:08:20,990+ −
And r followed by+ −
+ −
163+ −
00:08:20,990 --> 00:08:23,390+ −
the regular expression which+ −
cannot match anything,+ −
+ −
164+ −
00:08:23,390 --> 00:08:27,455+ −
is equivalent to just 0.+ −
+ −
165+ −
00:08:27,455 --> 00:08:30,740+ −
And 0 followed by r is also equivalent to 0.+ −
+ −
166+ −
00:08:30,740 --> 00:08:33,615+ −
And if you have the+ −
choice of r plus r,+ −
+ −
167+ −
00:08:33,615 --> 00:08:37,210+ −
that is also+ −
equivalent to just r.+ −
+ −
168+ −
00:08:37,210 --> 00:08:40,270+ −
What we're going to do+ −
with these equivalences is+ −
+ −
169+ −
00:08:40,270 --> 00:08:42,820+ −
that we regard them as+ −
simplification rules.+ −
+ −
170+ −
00:08:42,820 --> 00:08:43,930+ −
So whenever we see+ −
+ −
171+ −
00:08:43,930 --> 00:08:46,465+ −
this regular expression+ −
in our algorithm,+ −
+ −
172+ −
00:08:46,465 --> 00:08:48,580+ −
we will replace it by+ −
the right-hand side.+ −
+ −
173+ −
00:08:48,580 --> 00:08:51,715+ −
So we will orient+ −
these equivalences.+ −
+ −
174+ −
00:08:51,715 --> 00:08:53,650+ −
If we see, this regular expression,+ −
+ −
175+ −
00:08:53,650 --> 00:08:55,225+ −
we will replace it by that one.+ −
+ −
176+ −
00:08:55,225 --> 00:08:58,945+ −
And it will not matter+ −
because the left-hand sides+ −
+ −
177+ −
00:08:58,945 --> 00:09:01,255+ −
can match exactly+ −
the same strings+ −
+ −
178+ −
00:09:01,255 --> 00:09:03,475+ −
as the right-hand sides.+ −
+ −
179+ −
00:09:03,475 --> 00:09:06,370+ −
Here two quick examples.+ −
+ −
180+ −
00:09:06,370 --> 00:09:08,680+ −
The first one, let's+ −
assume you have+ −
+ −
181+ −
00:09:08,680 --> 00:09:10,270+ −
a regular expression r and+ −
+ −
182+ −
00:09:10,270 --> 00:09:11,905+ −
there is something+ −
in front of it.+ −
+ −
183+ −
00:09:11,905 --> 00:09:13,720+ −
This "something in front of it"+ −
+ −
184+ −
00:09:13,720 --> 00:09:15,870+ −
can be simplified as follows.+ −
+ −
185+ −
00:09:15,870 --> 00:09:20,000+ −
This 1 times b+ −
can be simplified to b.+ −
+ −
186+ −
00:09:20,000 --> 00:09:23,555+ −
Then this b plus 0 can+ −
be simplified to b.+ −
+ −
187+ −
00:09:23,555 --> 00:09:25,490+ −
And assuming that nothing can+ −
+ −
188+ −
00:09:25,490 --> 00:09:27,470+ −
be simplified inside this r,+ −
+ −
189+ −
00:09:27,470 --> 00:09:28,700+ −
then this would be+ −
+ −
190+ −
00:09:28,700 --> 00:09:33,050+ −
the simplified version+ −
of this regular expression.+ −
+ −
191+ −
00:09:33,050 --> 00:09:34,820+ −
And because the simplification+ −
+ −
192+ −
00:09:34,820 --> 00:09:36,965+ −
rules preserve what can be matched,+ −
+ −
193+ −
00:09:36,965 --> 00:09:39,080+ −
we can be sure that+ −
this regular expression+ −
+ −
194+ −
00:09:39,080 --> 00:09:41,255+ −
can match exactly the strings+ −
+ −
195+ −
00:09:41,255 --> 00:09:43,940+ −
this regular expression can match.+ −
+ −
196+ −
00:09:43,940 --> 00:09:45,740+ −
Here is the other example.+ −
+ −
197+ −
00:09:45,740 --> 00:09:49,550+ −
This has just a tiny change+ −
that this becomes here as 0.+ −
+ −
198+ −
00:09:49,550 --> 00:09:54,485+ −
Then 0 times b can be+ −
replaced by just 0.+ −
+ −
199+ −
00:09:54,485 --> 00:09:56,705+ −
Then they are actually two+ −
rules which could apply+ −
+ −
200+ −
00:09:56,705 --> 00:09:59,014+ −
either that we have+ −
the same choice+ −
+ −
201+ −
00:09:59,014 --> 00:10:01,160+ −
or we can just say something plus+ −
+ −
202+ −
00:10:01,160 --> 00:10:04,100+ −
0 will always go to something.+ −
+ −
203+ −
00:10:04,100 --> 00:10:06,245+ −
So we can simplify it to that.+ −
+ −
204+ −
00:10:06,245 --> 00:10:08,510+ −
And then we have+ −
0 times r again,+ −
+ −
205+ −
00:10:08,510 --> 00:10:10,460+ −
and that can be simplified to 0.+ −
+ −
206+ −
00:10:10,460 --> 00:10:12,080+ −
So actually what we find out with+ −
+ −
207+ −
00:10:12,080 --> 00:10:14,645+ −
this calculation is that+ −
this regular expression,+ −
+ −
208+ −
00:10:14,645 --> 00:10:16,835+ −
even if it looks+ −
quite complicated,+ −
+ −
209+ −
00:10:16,835 --> 00:10:19,295+ −
actually doesn't+ −
match any string.+ −
+ −
210+ −
00:10:19,295 --> 00:10:21,290+ −
Because it matches exactly+ −
+ −
211+ −
00:10:21,290 --> 00:10:23,420+ −
those string this regular+ −
expression can match.+ −
+ −
212+ −
00:10:23,420 --> 00:10:26,285+ −
And this reg expression+ −
cannot match any.+ −
+ −
213+ −
00:10:26,285 --> 00:10:29,930+ −
We need one more+ −
operation on languages.+ −
+ −
214+ −
00:10:29,930 --> 00:10:31,700+ −
I call this operation+ −
+ −
215+ −
00:10:31,700 --> 00:10:35,165+ −
the semantic derivative+ −
of a language.+ −
+ −
216+ −
00:10:35,165 --> 00:10:37,775+ −
This operation takes+ −
two arguments.+ −
+ −
217+ −
00:10:37,775 --> 00:10:40,670+ −
It takes a character+ −
which we take+ −
+ −
218+ −
00:10:40,670 --> 00:10:43,925+ −
the semantic derivative+ −
according to,+ −
+ −
219+ −
00:10:43,925 --> 00:10:45,980+ −
and it takes a language.+ −
+ −
220+ −
00:10:45,980 --> 00:10:49,579+ −
And what it does is it+ −
looks into this language+ −
+ −
221+ −
00:10:49,579 --> 00:10:51,365+ −
and looks for all the strings+ −
+ −
222+ −
00:10:51,365 --> 00:10:53,735+ −
that start with this character,+ −
+ −
223+ −
00:10:53,735 --> 00:10:56,405+ −
let's say c, and then+ −
+ −
224+ −
00:10:56,405 --> 00:11:00,920+ −
just strips off that c.+ −
So here's an example.+ −
+ −
225+ −
00:11:00,920 --> 00:11:02,975+ −
Suppose you have a language A,+ −
+ −
226+ −
00:11:02,975 --> 00:11:04,460+ −
with the strings+ −
+ −
227+ −
00:11:04,460 --> 00:11:06,965+ −
foo, bar and frak.+ −
+ −
228+ −
00:11:06,965 --> 00:11:10,835+ −
And you take the semantic+ −
derivative according to f,+ −
+ −
229+ −
00:11:10,835 --> 00:11:14,450+ −
then we discard all the+ −
strings which do not+ −
+ −
230+ −
00:11:14,450 --> 00:11:18,320+ −
start with an f. So+ −
bar will be discarded.+ −
+ −
231+ −
00:11:18,320 --> 00:11:22,850+ −
Foo and frac start with+ −
an f. So we keep them+ −
+ −
232+ −
00:11:22,850 --> 00:11:25,265+ −
but we strip off the first f.+ −
+ −
233+ −
00:11:25,265 --> 00:11:29,045+ −
So here we have only oo and rak.+ −
+ −
234+ −
00:11:29,045 --> 00:11:31,609+ −
If you take the+ −
semantic derivative+ −
+ −
235+ −
00:11:31,609 --> 00:11:33,830+ −
of that language according to b,+ −
+ −
236+ −
00:11:33,830 --> 00:11:37,190+ −
then we will discard foo and+ −
frak because they don't+ −
+ −
237+ −
00:11:37,190 --> 00:11:40,940+ −
start with b, and just keep bar.+ −
+ −
238+ −
00:11:40,940 --> 00:11:44,585+ −
But again, we have to+ −
strip off this first b.+ −
+ −
239+ −
00:11:44,585 --> 00:11:49,305+ −
And if you take the semantic+ −
derivative according to a,+ −
+ −
240+ −
00:11:49,305 --> 00:11:52,585+ −
then none of these+ −
strings start with a.+ −
+ −
241+ −
00:11:52,585 --> 00:11:56,800+ −
So this will be defined+ −
as just the empty set.+ −
+ −
242+ −
00:11:56,800 --> 00:11:59,695+ −
You can slightly + −
generalized this+ −
+ −
243+ −
00:11:59,695 --> 00:12:02,560+ −
semantic derivative+ −
to also strings.+ −
+ −
244+ −
00:12:02,560 --> 00:12:05,170+ −
So imagine you have+ −
a language A and you+ −
+ −
245+ −
00:12:05,170 --> 00:12:08,274+ −
build the semantic derivative+ −
according to a string s.+ −
+ −
246+ −
00:12:08,274 --> 00:12:10,990+ −
Then you look in+ −
this language and+ −
+ −
247+ −
00:12:10,990 --> 00:12:13,420+ −
you look for all the+ −
strings that start with+ −
+ −
248+ −
00:12:13,420 --> 00:12:19,555+ −
this s. But you strip+ −
off that s. Ok?+ −
+ −
249+ −
00:12:19,555 --> 00:12:23,830+ −
So if you have a string+ −
starting with the s,+ −
+ −
250+ −
00:12:23,830 --> 00:12:26,065+ −
then you keep that string,+ −
+ −
251+ −
00:12:26,065 --> 00:12:27,490+ −
but you keep only the rest...+ −
+ −
252+ −
00:12:27,490 --> 00:12:28,810+ −
what's coming after this s.+ −
+ −
253+ −
00:12:28,810 --> 00:12:32,010+ −
That is here indicated+ −
with this s'.+ −
+ −
254+ −
00:12:32,010 --> 00:12:35,330+ −
I also have this convention,+ −
this personal convention+ −
+ −
255+ −
00:12:35,330 --> 00:12:40,055+ −
I have to say, that everything+ −
which works on lists,+ −
+ −
256+ −
00:12:40,055 --> 00:12:42,185+ −
remember strings are+ −
lists of characters.+ −
+ −
257+ −
00:12:42,185 --> 00:12:46,655+ −
I just put there an 's'. So+ −
here's the one for characters.+ −
+ −
258+ −
00:12:46,655 --> 00:12:48,680+ −
I just call it Der with a capital+ −
+ −
259+ −
00:12:48,680 --> 00:12:51,740+ −
D. And I call that Ders,+ −
+ −
260+ −
00:12:51,740 --> 00:12:54,350+ −
because that works over strings.+ −
+ −
261+ −
00:12:54,350 --> 00:12:55,865+ −
And you can see how it would+ −
+ −
262+ −
00:12:55,865 --> 00:12:59,750+ −
be defined if you only take this+ −
as a primitive operation.+ −
+ −
263+ −
00:12:59,750 --> 00:13:01,340+ −
We would just need to iterate+ −
+ −
264+ −
00:13:01,340 --> 00:13:04,050+ −
that essentially for this Ders.+ −
+ −
265+ −
00:13:04,060 --> 00:13:07,970+ −
Ok, we now have all+ −
the theory in place.+ −
+ −
266+ −
00:13:07,970 --> 00:13:11,345+ −
And we can finally start+ −
implementing our algorithm.+ −
+ −
267+ −
00:13:11,345 --> 00:13:14,580+ −
That's when we'll do+ −
in the next video.+ −