Binary file cws/cw01.pdf has changed
Binary file cws/cw02.pdf has changed
Binary file cws/cw03.pdf has changed
Binary file cws/cw04.pdf has changed
Binary file cws/cw05.pdf has changed
--- a/style.sty Tue Sep 29 00:51:43 2020 +0100
+++ b/style.sty Tue Sep 29 12:52:07 2020 +0100
@@ -99,9 +99,9 @@
% CW deadlines
-\def\cwONE{11 October}
-\def\cwTWO{4 November}
-\def\cwTHREE{22 November}
+\def\cwONE{16 October}
+\def\cwTWO{23 November}
+\def\cwTHREE{20 November}
\def\cwFOUR{11 December}
\def\cwFIVE{15 January}
--- /dev/null Thu Jan 01 00:00:00 1970 +0000
+++ b/videos/01-basics2.srt Tue Sep 29 12:52:07 2020 +0100
@@ -0,0 +1,1627 @@
+1
+00:00:05,810 --> 00:00:11,039
+Okay, so we have strings.
+The empty string and string "abc".
+
+2
+00:00:11,039 --> 00:00:13,200
+And we can take the head
+
+3
+00:00:13,200 --> 00:00:15,615
+of a string and the
+tail of the string,
+
+4
+00:00:15,615 --> 00:00:18,480
+since we regard them as
+lists of characters.
+
+5
+00:00:18,480 --> 00:00:22,230
+We also introduced the
+notion of a language.
+
+6
+00:00:22,230 --> 00:00:25,305
+A language is just
+a set of strings.
+
+7
+00:00:25,305 --> 00:00:26,700
+So here's a language.
+
+8
+00:00:26,700 --> 00:00:28,275
+It contains the empty string,
+
+9
+00:00:28,275 --> 00:00:30,360
+the string hello, string foobar,
+
+10
+00:00:30,360 --> 00:00:31,965
+the string, which is
+
+11
+00:00:31,965 --> 00:00:34,874
+just a character a and
+the string abc.
+
+12
+00:00:34,874 --> 00:00:37,800
+There will be also other
+languages, which for example,
+
+13
+00:00:37,800 --> 00:00:39,780
+contain infinitely many strings.
+
+14
+00:00:39,780 --> 00:00:42,190
+Actually that will
+happen quite often.
+
+15
+00:00:42,190 --> 00:00:45,560
+Now, you've seen this operation
+
+16
+00:00:45,560 --> 00:00:47,660
+of concatenation of two strings.
+
+17
+00:00:47,660 --> 00:00:50,840
+So if you have the string
+foo and a string bar,
+
+18
+00:00:50,840 --> 00:00:53,255
+we can just put them
+next to each other.
+
+19
+00:00:53,255 --> 00:00:57,725
+We will now extend that
+notion also told languages.
+
+20
+00:00:57,725 --> 00:01:02,270
+So assume you have a
+language A and a language B.
+
+21
+00:01:02,270 --> 00:01:05,825
+That means we take every
+string from the language
+
+22
+00:01:05,825 --> 00:01:11,195
+A and concatenate it with every
+string of the language B.
+
+23
+00:01:11,195 --> 00:01:13,160
+Here's an example.
+
+24
+00:01:13,160 --> 00:01:14,750
+If you have the language A
+
+25
+00:01:14,750 --> 00:01:17,030
+containing foo and bar as strings,
+
+26
+00:01:17,030 --> 00:01:20,585
+and the language B containing
+a and b as strings.
+
+27
+00:01:20,585 --> 00:01:23,000
+Then concatenating
+these two languages
+
+28
+00:01:23,000 --> 00:01:27,110
+means I take foo with this a and b,
+
+29
+00:01:27,110 --> 00:01:28,625
+giving fooa and foob,
+
+30
+00:01:28,625 --> 00:01:30,905
+and I take bar and
+concatenated with
+
+31
+00:01:30,905 --> 00:01:34,160
+a and b, giving bara and barb.
+
+32
+00:01:34,160 --> 00:01:36,185
+So the pointer is we're
+
+33
+00:01:36,185 --> 00:01:39,110
+overloading this notion
+of concatenating
+
+34
+00:01:39,110 --> 00:01:45,020
+two strings also to
+concatenating two languages.
+
+35
+00:01:45,020 --> 00:01:48,890
+Okay, here are two corner
+cases of this definition.
+
+36
+00:01:48,890 --> 00:01:51,395
+What happens if I
+have any language,
+
+37
+00:01:51,395 --> 00:01:54,470
+say A, and I concatenate it with
+
+38
+00:01:54,470 --> 00:01:58,640
+the language which contains
+only the empty string.
+
+39
+00:01:58,640 --> 00:02:02,270
+And secondly, if I
+have any language,
+
+40
+00:02:02,270 --> 00:02:04,220
+say again A, and I
+
+41
+00:02:04,220 --> 00:02:05,870
+concatenate it with the language
+
+42
+00:02:05,870 --> 00:02:08,345
+which doesn't contain any string.
+
+43
+00:02:08,345 --> 00:02:11,720
+How would these two
+cases be defined?
+
+44
+00:02:11,720 --> 00:02:16,055
+Remember, this language here
+contains a single element,
+
+45
+00:02:16,055 --> 00:02:17,645
+namely the empty string,
+
+46
+00:02:17,645 --> 00:02:20,210
+while this language
+
+47
+00:02:20,210 --> 00:02:23,585
+does not contain any
+string. It is empty.
+
+48
+00:02:23,585 --> 00:02:26,930
+So these are two
+different languages.
+
+49
+00:02:26,930 --> 00:02:29,960
+Think about how this
+would be defined.
+
+50
+00:02:29,960 --> 00:02:33,455
+So what's the point of
+all these definitions?
+
+51
+00:02:33,455 --> 00:02:36,110
+Well, we want to make precise
+
+52
+00:02:36,110 --> 00:02:38,735
+what is the meaning of
+a regular expression.
+
+53
+00:02:38,735 --> 00:02:41,510
+For this, we use
+this function L. It
+
+54
+00:02:41,510 --> 00:02:44,180
+will take a regular expression as
+
+55
+00:02:44,180 --> 00:02:47,150
+argument and generates a
+
+56
+00:02:47,150 --> 00:02:49,970
+set of strings or a language.
+
+57
+00:02:49,970 --> 00:02:52,670
+And it's supposed to be the
+language which contains
+
+58
+00:02:52,670 --> 00:02:56,330
+all the strings this regular
+expression can match.
+
+59
+00:02:56,330 --> 00:02:58,670
+So remember the 0
+regular expression,
+
+60
+00:02:58,670 --> 00:03:01,115
+it's not supposed to
+match any string.
+
+61
+00:03:01,115 --> 00:03:05,105
+So we give it the empty
+language or empty set.
+
+62
+00:03:05,105 --> 00:03:06,740
+This regular expression,
+
+63
+00:03:06,740 --> 00:03:09,380
+the one regular expression
+is supposed to match
+
+64
+00:03:09,380 --> 00:03:13,445
+exactly one string,
+namely the empty string.
+
+65
+00:03:13,445 --> 00:03:15,710
+the regular expression
+
+66
+00:03:15,710 --> 00:03:18,875
+recognizing just the
+character c, in this case.
+
+67
+00:03:18,875 --> 00:03:22,820
+Well, that will be the
+language which contains
+
+68
+00:03:22,820 --> 00:03:28,175
+the string only containing
+the single character c.
+
+69
+00:03:28,175 --> 00:03:31,295
+Now, what is this regular
+expression supposed to match?
+
+70
+00:03:31,295 --> 00:03:34,835
+Well, a string is matched
+by this regular expression
+
+71
+00:03:34,835 --> 00:03:37,355
+if it is matched by
+the first component, r1,
+
+72
+00:03:37,355 --> 00:03:40,970
+or by the second component, r2.
+
+73
+00:03:40,970 --> 00:03:43,190
+That's why this plus
+is the alternative.
+
+74
+00:03:43,190 --> 00:03:45,605
+So which are the strings
+
+75
+00:03:45,605 --> 00:03:48,320
+this regular expression
+can match? Well,
+
+76
+00:03:48,320 --> 00:03:51,275
+all the strings
+r1 one can match,
+
+77
+00:03:51,275 --> 00:03:54,170
+given by L(r1) union...
+
+78
+00:03:54,170 --> 00:03:57,380
+all the strings
+R2 can match.
+
+79
+00:03:57,380 --> 00:04:01,325
+And we use here the union
+because this alternative,
+
+80
+00:04:01,325 --> 00:04:05,945
+even if it says this string
+is matched by r1 or by r2.
+
+81
+00:04:05,945 --> 00:04:08,465
+The meaning of this
+reg expression
+
+82
+00:04:08,465 --> 00:04:11,209
+is the union of both languages.
+
+83
+00:04:11,209 --> 00:04:14,315
+Now the sequence case.
+
+84
+00:04:14,315 --> 00:04:17,960
+This means a string is
+matched by this regular expression
+
+85
+00:04:17,960 --> 00:04:20,510
+if the first part of this
+string is matched by r1
+
+86
+00:04:20,510 --> 00:04:24,935
+and the second part of
+the string is matched by r2.
+
+87
+00:04:24,935 --> 00:04:28,490
+Now we have to say, which
+are all the strings
+
+88
+00:04:28,490 --> 00:04:31,645
+this regular expression
+can match. Well, it's
+
+89
+00:04:31,645 --> 00:04:34,495
+all the strings
+r1 can match,
+
+90
+00:04:34,495 --> 00:04:39,205
+concatenated with all the
+strings r2 can match.
+
+91
+00:04:39,205 --> 00:04:42,895
+So for this, we have to use
+this concatenation operation.
+
+92
+00:04:42,895 --> 00:04:47,440
+So if r1 can match a string
+and r2 can match a string,
+
+93
+00:04:47,440 --> 00:04:51,220
+then in the meaning will be
+the concatenation of that.
+
+94
+00:04:51,220 --> 00:04:53,170
+So with this function L
+
+95
+00:04:53,170 --> 00:04:56,995
+we can make precise
+what are the strings,
+
+96
+00:04:56,995 --> 00:05:00,205
+*all* the strings a regular
+expression can match.
+
+97
+00:05:00,205 --> 00:05:04,689
+The only case we haven't
+specified yet is the r*.
+
+98
+00:05:04,689 --> 00:05:07,750
+For this, we need
+another operation
+
+99
+00:05:07,750 --> 00:05:12,470
+on languages or sets of
+strings, which we do next.
+
+100
+00:05:12,490 --> 00:05:17,285
+We introduce a power
+operation for languages.
+
+101
+00:05:17,285 --> 00:05:19,100
+The power operation or
+
+102
+00:05:19,100 --> 00:05:22,835
+the nth power is
+defined recursively.
+
+103
+00:05:22,835 --> 00:05:26,615
+So the A to the power of 0
+
+104
+00:05:26,615 --> 00:05:31,205
+would be defined as the set
+containing the empty string.
+
+105
+00:05:31,205 --> 00:05:36,200
+And A to the power of
+n + 1 would be A
+
+106
+00:05:36,200 --> 00:05:38,705
+concatenated with
+A to the power of
+
+107
+00:05:38,705 --> 00:05:42,160
+n. Here are a few examples.
+
+108
+00:05:42,160 --> 00:05:45,380
+A to the power of four
+would be essentially
+
+109
+00:05:45,380 --> 00:05:47,660
+A concatenated with
+A concatenated with
+
+110
+00:05:47,660 --> 00:05:49,640
+A concatenated with A,
+
+111
+00:05:49,640 --> 00:05:51,650
+and also concatenated with
+
+112
+00:05:51,650 --> 00:05:55,580
+the language which
+contains the empty string.
+
+113
+00:05:55,580 --> 00:05:57,275
+Because that's how we define
+
+114
+00:05:57,275 --> 00:05:59,975
+the base case, A
+to the power of 0.
+
+115
+00:05:59,975 --> 00:06:03,410
+And A to the power
+one would be just A,
+
+116
+00:06:03,410 --> 00:06:05,330
+again followed by that one,
+
+117
+00:06:05,330 --> 00:06:07,790
+but this would be just A.
+
+118
+00:06:07,790 --> 00:06:10,385
+And A to the power 0
+
+119
+00:06:10,385 --> 00:06:14,270
+is this language which
+contains the empty string.
+
+120
+00:06:14,270 --> 00:06:16,670
+With this power operation,
+
+121
+00:06:16,670 --> 00:06:19,505
+I can also fill in this case.
+
+122
+00:06:19,505 --> 00:06:23,210
+Remember this r* operation
+is supposed to match
+
+123
+00:06:23,210 --> 00:06:27,680
+a string with either
+0 or more copies of r.
+
+124
+00:06:27,680 --> 00:06:31,940
+So the meaning of this
+regular expression would be
+
+125
+00:06:31,940 --> 00:06:37,475
+now the Union of all the
+powers greater or equal than 0,
+
+126
+00:06:37,475 --> 00:06:40,835
+of the language that r can match.
+
+127
+00:06:40,835 --> 00:06:44,945
+And we take now at the
+union of all these powers,
+
+128
+00:06:44,945 --> 00:06:47,150
+that means essentially what
+
+129
+00:06:47,150 --> 00:06:48,530
+can the regular expression match
+
+130
+00:06:48,530 --> 00:06:51,440
+if I take L of r
+to the 0th power,
+
+131
+00:06:51,440 --> 00:06:53,030
+what can I match with
+
+132
+00:06:53,030 --> 00:06:57,305
+the first power, the
+second power, and so on.
+
+133
+00:06:57,305 --> 00:07:00,544
+And I take the union
+of all these powers.
+
+134
+00:07:00,544 --> 00:07:03,830
+That's the definition
+of which strings
+
+135
+00:07:03,830 --> 00:07:05,510
+this regular expression
+is supposed to
+
+136
+00:07:05,510 --> 00:07:08,510
+match. The meaning of
+this regular expression.
+
+137
+00:07:08,510 --> 00:07:11,300
+This is such an
+important definition,
+
+138
+00:07:11,300 --> 00:07:13,250
+that there's actually
+a name for it.
+
+139
+00:07:13,250 --> 00:07:15,020
+It's called the Kleene star.
+
+140
+00:07:15,020 --> 00:07:16,400
+And it's written like this.
+
+141
+00:07:16,400 --> 00:07:18,335
+If I have a language A,
+
+142
+00:07:18,335 --> 00:07:21,785
+I can build the star
+of this language
+
+143
+00:07:21,785 --> 00:07:26,255
+by union up all the powers
+greater or equal than 0.
+
+144
+00:07:26,255 --> 00:07:28,580
+The A-star of
+
+145
+00:07:28,580 --> 00:07:31,580
+a language would expand
+to a to the power 0,
+
+146
+00:07:31,580 --> 00:07:33,770
+union A to the power of one,
+
+147
+00:07:33,770 --> 00:07:36,665
+A to the power of two, and so on.
+
+148
+00:07:36,665 --> 00:07:39,230
+And it would
+essentially mean, well,
+
+149
+00:07:39,230 --> 00:07:43,460
+it's the language which contains
+the empty string because
+
+150
+00:07:43,460 --> 00:07:48,290
+of the A to the 0, one copy of A,
+
+151
+00:07:48,290 --> 00:07:51,020
+two concatenated copies of A,
+
+152
+00:07:51,020 --> 00:07:55,070
+three concatenated
+copies of A, and so on.
+
+153
+00:07:55,070 --> 00:07:59,060
+So that's what this A-star
+is defined as.
+
+154
+00:07:59,060 --> 00:08:03,725
+Essentially as the Union
+of all the powers.
+
+155
+00:08:03,725 --> 00:08:05,990
+And it's essentially
+each power is how many
+
+156
+00:08:05,990 --> 00:08:08,750
+times I have to
+concatenate this language A.
+
+157
+00:08:08,750 --> 00:08:13,549
+And note that this language
+A-star will always contain
+
+158
+00:08:13,549 --> 00:08:21,240
+the empty string because that
+is how the A^0 is defined.
+
+159
+00:08:21,310 --> 00:08:23,540
+So in this definition,
+
+160
+00:08:23,540 --> 00:08:25,969
+I could have also written star
+
+161
+00:08:25,969 --> 00:08:29,180
+here because the
+meaning of this r*,
+
+162
+00:08:29,180 --> 00:08:34,055
+regular expression
+is the Kleene star of
+
+163
+00:08:34,055 --> 00:08:37,130
+the language of r.
+Remember that's
+
+164
+00:08:37,130 --> 00:08:41,525
+the union of all powers
+greater or equal than 0.
+
+165
+00:08:41,525 --> 00:08:43,640
+It's behind.
+I hope you don't get
+
+166
+00:08:43,640 --> 00:08:45,800
+confused by the use of the stars.
+
+167
+00:08:45,800 --> 00:08:48,845
+This star is for
+the regular expressions.
+
+168
+00:08:48,845 --> 00:08:52,205
+That star is for languages.
+
+169
+00:08:52,205 --> 00:08:54,274
+They are two
+different operations.
+
+170
+00:08:54,274 --> 00:08:56,555
+They don't even
+have the same type.
+
+171
+00:08:56,555 --> 00:08:58,954
+Here might be
+something interesting.
+
+172
+00:08:58,954 --> 00:09:00,560
+Remember I asked you to think
+
+173
+00:09:00,560 --> 00:09:03,035
+about these two corner cases.
+
+174
+00:09:03,035 --> 00:09:04,730
+I hope you have done so,
+
+175
+00:09:04,730 --> 00:09:07,070
+but let's also have look
+at this together.
+
+176
+00:09:07,070 --> 00:09:09,785
+According to the definition
+
+177
+00:09:09,785 --> 00:09:11,930
+for this append of languages,
+
+178
+00:09:11,930 --> 00:09:13,670
+I have to take every string in
+
+179
+00:09:13,670 --> 00:09:16,130
+this set and have
+
+180
+00:09:16,130 --> 00:09:18,695
+two concatenated with
+every string in that set.
+
+181
+00:09:18,695 --> 00:09:22,115
+So this set contains only
+the empty string as element.
+
+182
+00:09:22,115 --> 00:09:24,820
+So if I concatenate anything in
+
+183
+00:09:24,820 --> 00:09:27,745
+there with the empty string,
+that will be left untouched.
+
+184
+00:09:27,745 --> 00:09:31,855
+So this one will be
+actually A.
+
+185
+00:09:31,855 --> 00:09:34,600
+This one I have to
+take every string in
+
+186
+00:09:34,600 --> 00:09:36,190
+this language and I have to
+
+187
+00:09:36,190 --> 00:09:39,115
+concatenate with every
+string in that language.
+
+188
+00:09:39,115 --> 00:09:41,110
+But here isn't any string.
+
+189
+00:09:41,110 --> 00:09:43,300
+So I can't concatenate that.
+
+190
+00:09:43,300 --> 00:09:46,900
+That will be actually
+the empty set (not empty string).
+
+191
+00:09:46,900 --> 00:09:49,570
+So now let's have
+
+192
+00:09:49,570 --> 00:09:51,670
+a look at regular expressions.
+
+193
+00:09:51,670 --> 00:09:53,230
+That was with languages.
+
+194
+00:09:53,230 --> 00:09:56,035
+But let's have a look at
+regular expressions now.
+
+195
+00:09:56,035 --> 00:09:58,660
+What would be the
+meaning, for example,
+
+196
+00:09:58,660 --> 00:10:01,945
+of r followed by 1?
+
+197
+00:10:01,945 --> 00:10:04,149
+That is a regular expression.
+
+198
+00:10:04,149 --> 00:10:06,255
+And it essentially says:
+
+199
+00:10:06,255 --> 00:10:07,850
+this regular expression matches
+
+200
+00:10:07,850 --> 00:10:10,685
+some strings and this matches
+the empty string.
+
+201
+00:10:10,685 --> 00:10:13,730
+So if you find out
+what the meaning is,
+
+202
+00:10:13,730 --> 00:10:16,955
+we would apply this
+L-function to it.
+
+203
+00:10:16,955 --> 00:10:19,430
+And if we now look
+up the definition,
+
+204
+00:10:19,430 --> 00:10:27,360
+that would be L of r
+appended L of 1.
+
+205
+00:10:27,850 --> 00:10:32,255
+If you look up what
+this is defined,
+
+206
+00:10:32,255 --> 00:10:41,640
+then this would be L of r
+appended with this language.
+
+207
+00:10:41,950 --> 00:10:46,325
+And if you now look up
+our definition earlier,
+
+208
+00:10:46,325 --> 00:10:50,455
+that will be just L of r.
+
+209
+00:10:50,455 --> 00:10:52,690
+What that essentially
+
+210
+00:10:52,690 --> 00:10:55,765
+means is if you have
+this regular expression,
+
+211
+00:10:55,765 --> 00:10:58,885
+this will match
+exactly those strings
+
+212
+00:10:58,885 --> 00:11:01,000
+which this regular
+expression can match.
+
+213
+00:11:01,000 --> 00:11:04,794
+And that pretty much
+coincides with our intuition.
+
+214
+00:11:04,794 --> 00:11:05,950
+This is supposed to match
+
+215
+00:11:05,950 --> 00:11:08,410
+the empty string at
+the end of the string,
+
+216
+00:11:08,410 --> 00:11:11,005
+so it doesn't really
+make any difference.
+
+217
+00:11:11,005 --> 00:11:13,480
+And the meaning
+really tells us that.
+
+218
+00:11:13,480 --> 00:11:15,880
+But here's the
+interesting thing.
+
+219
+00:11:15,880 --> 00:11:17,979
+When you were in school,
+
+220
+00:11:17,979 --> 00:11:23,124
+how would you think
+about this expression?
+
+221
+00:11:23,124 --> 00:11:29,515
+Well, r times 1
+equals just our, okay?
+
+222
+00:11:29,515 --> 00:11:33,634
+Furthermore, if you had r times 0.
+
+223
+00:11:33,634 --> 00:11:35,045
+What would that be equal?
+
+224
+00:11:35,045 --> 00:11:37,205
+Well, it would be 0.
+
+225
+00:11:37,205 --> 00:11:39,650
+Ok, let's have
+
+226
+00:11:39,650 --> 00:11:42,605
+a look how that works out
+with regular expressions.
+
+227
+00:11:42,605 --> 00:11:46,355
+So if you take r followed by 0,
+
+228
+00:11:46,355 --> 00:11:48,260
+remember this is
+the regular expression
+
+229
+00:11:48,260 --> 00:11:49,895
+that cannot match anything.
+
+230
+00:11:49,895 --> 00:11:52,415
+By unfolding the definition,
+
+231
+00:11:52,415 --> 00:11:59,104
+I would get L of r
+appended with L of 0.
+
+232
+00:11:59,104 --> 00:12:01,775
+This is of course defined as L of r
+
+233
+00:12:01,775 --> 00:12:05,915
+appended with the empty set.
+
+234
+00:12:05,915 --> 00:12:10,760
+And this would, according
+to the definition earlier,
+
+235
+00:12:10,760 --> 00:12:13,830
+well just the empty set.
+
+236
+00:12:14,160 --> 00:12:20,260
+And this would be
+of course L of 0.
+
+237
+00:12:20,260 --> 00:12:24,580
+So it seems like these laws,
+
+238
+00:12:24,580 --> 00:12:27,175
+at least for the times,
+
+239
+00:12:27,175 --> 00:12:29,785
+seem to be valid from math,
+
+240
+00:12:29,785 --> 00:12:31,420
+are also valid with regular expressions,
+
+241
+00:12:31,420 --> 00:12:33,370
+if we look at their meaning.
+
+242
+00:12:33,370 --> 00:12:36,670
+These laws on natural numbers
+
+243
+00:12:36,670 --> 00:12:40,105
+and regular expressions and
+their close correspondance
+
+244
+00:12:40,105 --> 00:12:42,460
+probably explain why I use
+
+245
+00:12:42,460 --> 00:12:46,975
+a times for the sequence
+regular expression.
+
+246
+00:12:46,975 --> 00:12:51,040
+So r followed by the
+regular expression 1,
+
+247
+00:12:51,040 --> 00:12:54,505
+will have the same meaning
+as that regular expression.
+
+248
+00:12:54,505 --> 00:12:59,120
+And r followed by the
+0 regular expression
+
+249
+00:12:59,120 --> 00:13:01,295
+will have this meaning.
+
+250
+00:13:01,295 --> 00:13:03,590
+You might now think, but I also
+
+251
+00:13:03,590 --> 00:13:06,605
+wrote the alternative with a plus.
+
+252
+00:13:06,605 --> 00:13:10,370
+Does a similar law
+holds for plus?
+
+253
+00:13:10,370 --> 00:13:15,965
+Of course, if I take r
+plus 0 on natural numbers,
+
+254
+00:13:15,965 --> 00:13:20,060
+that would be just r. Does
+hold for regular expressions?
+
+255
+00:13:20,060 --> 00:13:22,085
+Yes, indeed it holds.
+
+256
+00:13:22,085 --> 00:13:26,735
+If I have this
+regular expression and
+
+257
+00:13:26,735 --> 00:13:33,245
+unfold the definition that
+would be L(r) union L(0).
+
+258
+00:13:33,245 --> 00:13:38,060
+This would be equal
+to L(r) union...
+
+259
+00:13:38,060 --> 00:13:41,150
+What is this? Well, that's
+just the empty set.
+
+260
+00:13:41,150 --> 00:13:43,760
+And if I union something
+with the empty set,
+
+261
+00:13:43,760 --> 00:13:47,180
+then this will be
+just of L(r). So yes,
+
+262
+00:13:47,180 --> 00:13:50,120
+indeed, plus is also very much
+
+263
+00:13:50,120 --> 00:13:54,184
+inspired by the laws
+on natural numbers.
+
+264
+00:13:54,184 --> 00:13:57,065
+There exists other notations too,
+
+265
+00:13:57,065 --> 00:14:01,310
+but I prefer this
+using the plus for
+
+266
+00:14:01,310 --> 00:14:03,844
+the alternatives and the times
+
+267
+00:14:03,844 --> 00:14:05,765
+for the sequence expression.
+
+268
+00:14:05,765 --> 00:14:07,460
+It's also the reason why I call
+
+269
+00:14:07,460 --> 00:14:10,325
+this regular expression for
+the empty string 1.
+
+270
+00:14:10,325 --> 00:14:14,915
+And for matching
+nothing at all 0.
+
+271
+00:14:14,915 --> 00:14:18,665
+This correspondence to
+natural numbers doesn't quite
+
+272
+00:14:18,665 --> 00:14:22,055
+extend to the star
+regular expression.
+
+273
+00:14:22,055 --> 00:14:25,055
+But there's still a
+connection. There too.
+
+274
+00:14:25,055 --> 00:14:26,510
+Can you remember how
+
+275
+00:14:26,510 --> 00:14:29,345
+the power operation on
+languages was defined?
+
+276
+00:14:29,345 --> 00:14:31,370
+The 0 case was defined
+
+277
+00:14:31,370 --> 00:14:34,520
+as the set containing
+the empty string.
+
+278
+00:14:34,520 --> 00:14:37,039
+Why is that? It looks
+a bit arbitrary.
+
+279
+00:14:37,039 --> 00:14:41,150
+Why is it not the empty set
+or why is it not defined as A?
+
+280
+00:14:41,150 --> 00:14:43,745
+Why is defined in this
+particular way?
+
+281
+00:14:43,745 --> 00:14:46,880
+Well, can you remember how
+
+282
+00:14:46,880 --> 00:14:48,950
+the power operation r to the
+
+283
+00:14:48,950 --> 00:14:51,440
+0 is defined on natural numbers?
+
+284
+00:14:51,440 --> 00:14:53,930
+Yes, that's defined as 1.
+
+285
+00:14:53,930 --> 00:14:57,125
+Does that also apply to
+regular expressions?
+
+286
+00:14:57,125 --> 00:15:00,725
+Well, if we take the meaning
+of a regular expression,
+
+287
+00:15:00,725 --> 00:15:04,730
+let's say r, and raise
+it to the 0th power,
+
+288
+00:15:04,730 --> 00:15:07,685
+then this will be, of
+course, by definition,
+
+289
+00:15:07,685 --> 00:15:12,245
+be defined as the set
+containing the empty string.
+
+290
+00:15:12,245 --> 00:15:17,160
+And that is of course
+equal to L(1).
+
+291
+00:15:17,830 --> 00:15:20,570
+Again, you can see that
+
+292
+00:15:20,570 --> 00:15:23,300
+this law on natural numbers also
+
+293
+00:15:23,300 --> 00:15:29,000
+holds on the laws of regular
+expressions - on heir meaning.
+
+294
+00:15:29,000 --> 00:15:32,810
+What I find really beautiful
+here is that of course,
+
+295
+00:15:32,810 --> 00:15:36,244
+the meaning is defined on
+sets, sets of strings.
+
+296
+00:15:36,244 --> 00:15:38,975
+This of course, on natural numbers.
+
+297
+00:15:38,975 --> 00:15:41,060
+You can probably
+see now where I'm
+
+298
+00:15:41,060 --> 00:15:43,085
+coming from with my notation.
+
+299
+00:15:43,085 --> 00:15:46,205
+That is actually a slight
+problem in this field,
+
+300
+00:15:46,205 --> 00:15:48,590
+since it's so old.
+
+301
+00:15:48,590 --> 00:15:52,205
+Pretty much everybody
+introduced their own notation.
+
+302
+00:15:52,205 --> 00:15:53,870
+And you now have heaps of
+
+303
+00:15:53,870 --> 00:15:55,550
+different notations
+for the same thing.
+
+304
+00:15:55,550 --> 00:15:57,379
+Okay, that is slightly
+exaggerating.
+
+305
+00:15:57,379 --> 00:16:00,830
+And also the notation
+I used for times and
+
+306
+00:16:00,830 --> 00:16:04,295
+plus and 0 and 1s...definitely
+I'm not the only one.
+
+307
+00:16:04,295 --> 00:16:05,435
+There are many people
+
+308
+00:16:05,435 --> 00:16:07,625
+who have used this
+notation as well.
+
+309
+00:16:07,625 --> 00:16:10,190
+It's just, it's not universal.
+
+310
+00:16:10,190 --> 00:16:12,290
+Well, here's a question now.
+
+311
+00:16:12,290 --> 00:16:15,635
+Why did we go through this
+kerfuffle in the first place?
+
+312
+00:16:15,635 --> 00:16:19,370
+Why did we introduce these
+operations on languages?
+
+313
+00:16:19,370 --> 00:16:21,260
+And why did we introduce
+
+314
+00:16:21,260 --> 00:16:23,255
+the meaning of a
+regular expression?
+
+315
+00:16:23,255 --> 00:16:26,300
+Well, remember at the beginning,
+
+316
+00:16:26,300 --> 00:16:28,265
+we wanted to specify
+
+317
+00:16:28,265 --> 00:16:31,880
+what problem our algorithms
+actually supposed to solve.
+
+318
+00:16:31,880 --> 00:16:35,960
+And we want to make that
+very precise so that we can
+
+319
+00:16:35,960 --> 00:16:38,555
+be sure that our implementation
+
+320
+00:16:38,555 --> 00:16:40,295
+really solves the problem.
+
+321
+00:16:40,295 --> 00:16:41,540
+And that's what you can do now.
+
+322
+00:16:41,540 --> 00:16:45,605
+We can say a regular
+expression, r say,
+
+323
+00:16:45,605 --> 00:16:50,015
+is matching a string
+s if and only if
+
+324
+00:16:50,015 --> 00:16:55,474
+the string s is in the language
+of the regular expression.
+
+325
+00:16:55,474 --> 00:17:00,770
+So that is the problem our
+matcher is supposed to solve.
+
+326
+00:17:00,770 --> 00:17:03,320
+And it's supposed to
+solve that a bit faster
+
+327
+00:17:03,320 --> 00:17:06,860
+than in Python, Ruby and Java.
+
+328
+00:17:06,860 --> 00:17:09,585
+And unfortunately we cannot use
+
+329
+00:17:09,585 --> 00:17:12,260
+the definition of L
+directly for that.
+
+330
+00:17:12,260 --> 00:17:15,815
+Because remember, in
+the case of the star,
+
+331
+00:17:15,815 --> 00:17:17,690
+sometimes the meaning of
+
+332
+00:17:17,690 --> 00:17:18,830
+a regular expression is actually
+
+333
+00:17:18,830 --> 00:17:20,925
+an infinite set of strings.
+
+334
+00:17:20,925 --> 00:17:23,575
+And it's a tiny bit difficult
+
+335
+00:17:23,575 --> 00:17:27,040
+to decide whether a string
+is an infinite set.
+
+336
+00:17:27,040 --> 00:17:30,790
+So we have to do something
+more clever in our algorithm.
+
+337
+00:17:30,790 --> 00:17:33,535
+But that's for next week.
+So see you next week. Bye.
+
+338
+00:17:38,535 --> 00:17:41,680
+Okay, just to troll you.
+Here's one more slide.
+
+339
+00:17:41,680 --> 00:17:45,850
+So when you go over the
+handouts and the videos,
+
+340
+00:17:45,850 --> 00:17:47,875
+think about this example.
+
+341
+00:17:47,875 --> 00:17:49,840
+Imagine you have a language A,
+
+342
+00:17:49,840 --> 00:17:51,970
+which contains four strings.
+
+343
+00:17:51,970 --> 00:17:53,725
+First string is just the character a,
+
+344
+00:17:53,725 --> 00:17:57,445
+second string is just
+the character b, and so on.
+
+345
+00:17:57,445 --> 00:18:02,335
+How many strings are there
+in A to the power four.
+
+346
+00:18:02,335 --> 00:18:05,210
+Okay, that should be
+relatively simple.
+
+347
+00:18:05,210 --> 00:18:07,310
+If not, just try it out by
+
+348
+00:18:07,310 --> 00:18:11,165
+implementing it in Scala and see
+how many strings there are.
+
+349
+00:18:11,165 --> 00:18:13,850
+But now the more
+interesting question,
+
+350
+00:18:13,850 --> 00:18:16,670
+imagine the same language,
+
+351
+00:18:16,670 --> 00:18:19,400
+except that instead
+of the character d,
+
+352
+00:18:19,400 --> 00:18:22,250
+we have here, the empty string.
+
+353
+00:18:22,250 --> 00:18:26,630
+How many strings are in
+A to the power four?
+
+354
+00:18:26,630 --> 00:18:31,320
+Okay, I'll let you think
+about this. Bye now.
--- /dev/null Thu Jan 01 00:00:00 1970 +0000
+++ b/videos/02-algo1.srt Tue Sep 29 12:52:07 2020 +0100
@@ -0,0 +1,2617 @@
+1
+00:00:05,880 --> 00:00:09,700
+Welcome back.
+Remember this slide.
+
+2
+00:00:09,700 --> 00:00:11,500
+This slide said, What is
+
+3
+00:00:11,500 --> 00:00:14,500
+all wreck expression Metro
+actually supposed to do?
+
+4
+00:00:14,500 --> 00:00:16,570
+It will take two arguments and
+
+5
+00:00:16,570 --> 00:00:18,670
+reg expression R and a string
+
+6
+00:00:18,670 --> 00:00:21,580
+S. And it's supposed to say yes,
+
+7
+00:00:21,580 --> 00:00:23,440
+the wreck expression matches
+
+8
+00:00:23,440 --> 00:00:26,920
+the string if and only
+if the string is in
+
+9
+00:00:26,920 --> 00:00:28,855
+the language of R.
+
+10
+00:00:28,855 --> 00:00:32,410
+And if the string is not
+in the language of our,
+
+11
+00:00:32,410 --> 00:00:35,515
+then our algorithm has to say no.
+
+12
+00:00:35,515 --> 00:00:37,210
+And we can't use
+
+13
+00:00:37,210 --> 00:00:39,565
+this specification
+directly because remember,
+
+14
+00:00:39,565 --> 00:00:43,305
+this l Sometimes
+produces infinite sets.
+
+15
+00:00:43,305 --> 00:00:47,585
+And so we can test whether a
+string is an infinite set,
+
+16
+00:00:47,585 --> 00:00:50,090
+at least not easily.
+
+17
+00:00:50,090 --> 00:00:52,340
+And so what we have to do
+
+18
+00:00:52,340 --> 00:00:54,260
+instead is we have
+to be a bit more
+
+19
+00:00:54,260 --> 00:00:57,050
+clever and implement
+some operations
+
+20
+00:00:57,050 --> 00:00:59,284
+on Rekha expressions instead.
+
+21
+00:00:59,284 --> 00:01:03,275
+Because Weka expressions
+are always finite trees.
+
+22
+00:01:03,275 --> 00:01:05,870
+I should say the
+person who has been
+
+23
+00:01:05,870 --> 00:01:08,150
+clever is called brush-off ski.
+
+24
+00:01:08,150 --> 00:01:11,435
+It's his, I've written,
+I'm introducing here.
+
+25
+00:01:11,435 --> 00:01:15,515
+And his algorithm consists
+of two functions.
+
+26
+00:01:15,515 --> 00:01:17,840
+One is called
+nullable and it takes
+
+27
+00:01:17,840 --> 00:01:20,104
+a regular expression as argument.
+
+28
+00:01:20,104 --> 00:01:24,155
+And the idea is that
+this function nullable.
+
+29
+00:01:24,155 --> 00:01:26,450
+Testing whether
+the reg expression
+
+30
+00:01:26,450 --> 00:01:27,950
+can match the empty string.
+
+31
+00:01:27,950 --> 00:01:30,305
+So 0 cannot match any string.
+
+32
+00:01:30,305 --> 00:01:33,275
+So it cannot match the
+empty string either.
+
+33
+00:01:33,275 --> 00:01:35,465
+So that's defined as false.
+
+34
+00:01:35,465 --> 00:01:37,775
+This reg expression,
+the whole purpose
+
+35
+00:01:37,775 --> 00:01:39,680
+is that it can match
+the empty string.
+
+36
+00:01:39,680 --> 00:01:41,645
+So it's defined as true.
+
+37
+00:01:41,645 --> 00:01:44,599
+If a reg expression
+can match a character,
+
+38
+00:01:44,599 --> 00:01:47,045
+then it cannot match
+the empty string.
+
+39
+00:01:47,045 --> 00:01:49,445
+So that is defined as false.
+
+40
+00:01:49,445 --> 00:01:53,180
+If an alternative can
+match the empty string,
+
+41
+00:01:53,180 --> 00:01:56,960
+then either or one can
+match the empty string.
+
+42
+00:01:56,960 --> 00:01:59,720
+Or R2 can match the empty string.
+
+43
+00:01:59,720 --> 00:02:03,110
+So either nullable
+of R1 has to hold,
+
+44
+00:02:03,110 --> 00:02:06,945
+or nullable of R2 has to hold.
+
+45
+00:02:06,945 --> 00:02:09,925
+In this sequence, it's
+the other way around.
+
+46
+00:02:09,925 --> 00:02:12,790
+If this reg expression can
+match the empty string,
+
+47
+00:02:12,790 --> 00:02:16,615
+then R1 has to be able to
+match the empty string,
+
+48
+00:02:16,615 --> 00:02:20,290
+and R2 has to be able to
+match the empty string.
+
+49
+00:02:20,290 --> 00:02:22,555
+So here it's just the opposite.
+
+50
+00:02:22,555 --> 00:02:25,660
+In one case it is o and
+the other case it's end.
+
+51
+00:02:25,660 --> 00:02:27,970
+In the store record
+expression can match
+
+52
+00:02:27,970 --> 00:02:30,445
+always the empty string.
+So that is true.
+
+53
+00:02:30,445 --> 00:02:33,340
+So this is a simple
+recursive function
+
+54
+00:02:33,340 --> 00:02:37,179
+and should not be too
+difficult to implement.
+
+55
+00:02:37,179 --> 00:02:42,025
+Okay, this nullable function
+that is easy-peasy.
+
+56
+00:02:42,025 --> 00:02:44,604
+The second function, however,
+
+57
+00:02:44,604 --> 00:02:49,155
+is a bit more involved and
+that's just to be expected.
+
+58
+00:02:49,155 --> 00:02:53,075
+Remember people working in
+this area already for decades.
+
+59
+00:02:53,075 --> 00:02:57,305
+If they have some problems
+with runtime, for example,
+
+60
+00:02:57,305 --> 00:02:58,940
+we can't expect that as
+
+61
+00:02:58,940 --> 00:03:03,260
+simple fix will solve all
+the problems in the world.
+
+62
+00:03:03,260 --> 00:03:06,530
+So I admit the second
+function is a bit more
+
+63
+00:03:06,530 --> 00:03:10,085
+complicated and make sure
+that you understand it.
+
+64
+00:03:10,085 --> 00:03:12,140
+And it also just
+chose this preserves
+
+65
+00:03:12,140 --> 00:03:14,345
+the is a very clever guy.
+
+66
+00:03:14,345 --> 00:03:15,800
+Actually, I have to say he was
+
+67
+00:03:15,800 --> 00:03:17,720
+a clever guy because
+I think he either
+
+68
+00:03:17,720 --> 00:03:21,650
+died last year or
+the year before.
+
+69
+00:03:21,650 --> 00:03:25,505
+And he came up with this
+algorithm already in 1964.
+
+70
+00:03:25,505 --> 00:03:27,260
+It somehow got lost,
+
+71
+00:03:27,260 --> 00:03:30,305
+but has been rediscovered
+in the last ten years.
+
+72
+00:03:30,305 --> 00:03:34,685
+So the idea of the second
+function is the following.
+
+73
+00:03:34,685 --> 00:03:38,120
+Imagine you have a
+reexpression and it can
+
+74
+00:03:38,120 --> 00:03:41,930
+match a string of the
+form C followed by as.
+
+75
+00:03:41,930 --> 00:03:44,810
+So the C is the first
+character of that string.
+
+76
+00:03:44,810 --> 00:03:48,605
+So I mentioned that can
+match this kind of string.
+
+77
+00:03:48,605 --> 00:03:50,330
+The question is now,
+
+78
+00:03:50,330 --> 00:03:52,400
+what would a wrecker
+expression look
+
+79
+00:03:52,400 --> 00:03:54,695
+like that can match chest
+
+80
+00:03:54,695 --> 00:03:57,140
+s. You probably remember
+
+81
+00:03:57,140 --> 00:03:59,300
+that from the semantic
+that every relative,
+
+82
+00:03:59,300 --> 00:04:00,860
+there was also
+something of chopping
+
+83
+00:04:00,860 --> 00:04:02,210
+off the first character.
+
+84
+00:04:02,210 --> 00:04:04,940
+Here it's the same.
+If a reg expression
+
+85
+00:04:04,940 --> 00:04:07,835
+can match a string
+starting with a C,
+
+86
+00:04:07,835 --> 00:04:11,090
+we're looking for a wreck
+expression which can match
+
+87
+00:04:11,090 --> 00:04:15,215
+the rest of the string where
+the c has been chopped off.
+
+88
+00:04:15,215 --> 00:04:17,690
+And this operation will be called
+
+89
+00:04:17,690 --> 00:04:21,049
+the derivative of a
+wreck expression.
+
+90
+00:04:21,049 --> 00:04:22,205
+And it will also take
+
+91
+00:04:22,205 --> 00:04:25,460
+a character as argument
+and the rec expression.
+
+92
+00:04:25,460 --> 00:04:28,730
+And in contrast to
+the semantic records,
+
+93
+00:04:28,730 --> 00:04:31,310
+semantic derivative, which works
+
+94
+00:04:31,310 --> 00:04:34,430
+over languages or
+sets of strings.
+
+95
+00:04:34,430 --> 00:04:39,620
+This derivative works
+over regular expressions.
+
+96
+00:04:39,620 --> 00:04:41,330
+Okay, here's this function.
+
+97
+00:04:41,330 --> 00:04:43,970
+It's defined recursively over
+
+98
+00:04:43,970 --> 00:04:46,370
+the structure of rec expressions.
+
+99
+00:04:46,370 --> 00:04:48,814
+The first argument
+is the character,
+
+100
+00:04:48,814 --> 00:04:52,700
+and the second one is
+a wreck expression.
+
+101
+00:04:52,700 --> 00:04:56,510
+And remember the idea
+is we're looking for
+
+102
+00:04:56,510 --> 00:05:00,680
+a wreck expression that
+can match everything.
+
+103
+00:05:00,680 --> 00:05:03,125
+This reg expression
+as argument can match
+
+104
+00:05:03,125 --> 00:05:07,040
+except for the C. So now let's
+look at this first case.
+
+105
+00:05:07,040 --> 00:05:10,550
+So this reg expression
+cannot match any string.
+
+106
+00:05:10,550 --> 00:05:14,270
+So it certainly cannot
+match any string starting
+
+107
+00:05:14,270 --> 00:05:16,910
+a C. So we have to look
+
+108
+00:05:16,910 --> 00:05:20,090
+for and reg expression which
+can not much anything.
+
+109
+00:05:20,090 --> 00:05:22,700
+Well, that's the purpose
+of this record expression,
+
+110
+00:05:22,700 --> 00:05:24,815
+so we define it as 0.
+
+111
+00:05:24,815 --> 00:05:28,130
+In the next case,
+this reg expression
+
+112
+00:05:28,130 --> 00:05:30,440
+can match the empty string,
+
+113
+00:05:30,440 --> 00:05:33,440
+but it cannot match
+any string that starts
+
+114
+00:05:33,440 --> 00:05:36,350
+with C. So also in this case,
+
+115
+00:05:36,350 --> 00:05:39,560
+we just define it as
+the bracket question,
+
+116
+00:05:39,560 --> 00:05:41,615
+which cannot match anything.
+
+117
+00:05:41,615 --> 00:05:45,170
+In the next case, this
+C as the argument to
+
+118
+00:05:45,170 --> 00:05:48,335
+the derivative and this d
+is the Rekha expression.
+
+119
+00:05:48,335 --> 00:05:50,225
+So now there are two cases.
+
+120
+00:05:50,225 --> 00:05:53,525
+If this C and this D
+is actually equal.
+
+121
+00:05:53,525 --> 00:05:55,970
+That means this record
+expression can match
+
+122
+00:05:55,970 --> 00:05:59,240
+a string which just contains C0.
+
+123
+00:05:59,240 --> 00:06:01,505
+So if we strip that off,
+
+124
+00:06:01,505 --> 00:06:04,790
+motor remains is
+the empty string.
+
+125
+00:06:04,790 --> 00:06:06,890
+What is a regular expression
+
+126
+00:06:06,890 --> 00:06:09,170
+look like that can
+match the empty string.
+
+127
+00:06:09,170 --> 00:06:11,915
+Well, that's the
+purpose of the warm.
+
+128
+00:06:11,915 --> 00:06:15,440
+And if c is not equal to d,
+
+129
+00:06:15,440 --> 00:06:17,630
+then this reg expression
+
+130
+00:06:17,630 --> 00:06:19,220
+cannot match anything
+which starts with
+
+131
+00:06:19,220 --> 00:06:23,780
+a C. So again it will
+be defined as just 0.
+
+132
+00:06:23,780 --> 00:06:29,390
+Okay? Now, the alternative case,
+
+133
+00:06:29,390 --> 00:06:31,970
+if this reg expression can
+
+134
+00:06:31,970 --> 00:06:36,050
+match the strings
+starting with C,
+
+135
+00:06:36,050 --> 00:06:40,820
+then it can either be
+matched by the Ongwen.
+
+136
+00:06:40,820 --> 00:06:44,495
+Or it can be much by the R2.
+
+137
+00:06:44,495 --> 00:06:50,090
+If they are one can match C
+and then following a string,
+
+138
+00:06:50,090 --> 00:06:53,975
+then we just have to recursively
+call the derivative for
+
+139
+00:06:53,975 --> 00:06:56,570
+R to get a reg expression
+
+140
+00:06:56,570 --> 00:06:59,135
+that can match the
+rest of the string.
+
+141
+00:06:59,135 --> 00:07:02,690
+And the same we can do with R2.
+
+142
+00:07:02,690 --> 00:07:06,110
+You can find a reg expression
+which can match everything.
+
+143
+00:07:06,110 --> 00:07:07,850
+This R2 can match,
+
+144
+00:07:07,850 --> 00:07:09,710
+starting with C, bad,
+
+145
+00:07:09,710 --> 00:07:12,590
+which chopping off this C. Okay?
+
+146
+00:07:12,590 --> 00:07:16,370
+So now if you have to find
+the break expression,
+
+147
+00:07:16,370 --> 00:07:20,030
+which can match all the strings
+where C is tripped off.
+
+148
+00:07:20,030 --> 00:07:22,295
+Then we just have to
+take the alternatives
+
+149
+00:07:22,295 --> 00:07:24,965
+of these two derivatives.
+
+150
+00:07:24,965 --> 00:07:29,390
+Okay? Now to sequence case,
+
+151
+00:07:29,390 --> 00:07:33,005
+this sequence case is the
+most complicated one.
+
+152
+00:07:33,005 --> 00:07:35,180
+And let's look first at
+
+153
+00:07:35,180 --> 00:07:39,335
+the second case where
+the Earth's brush.
+
+154
+00:07:39,335 --> 00:07:42,830
+Okay? So if this
+regular expression
+
+155
+00:07:42,830 --> 00:07:46,145
+can match a string
+starting with C,
+
+156
+00:07:46,145 --> 00:07:48,155
+then the following
+must have happened.
+
+157
+00:07:48,155 --> 00:07:51,905
+First, the R1 must have matched
+a string starting with C
+
+158
+00:07:51,905 --> 00:07:56,420
+and then anything coming
+afterwards with r2.
+
+159
+00:07:56,420 --> 00:07:59,660
+Okay? So in this case I only
+
+160
+00:07:59,660 --> 00:08:03,260
+have to call this
+derivative for this r one.
+
+161
+00:08:03,260 --> 00:08:06,830
+And find the reg expression
+which can match everything
+
+162
+00:08:06,830 --> 00:08:11,555
+this R one can match except
+for this C chopped off.
+
+163
+00:08:11,555 --> 00:08:15,830
+And I have to build that
+sequence derivative of that.
+
+164
+00:08:15,830 --> 00:08:18,260
+So that's what the As per se.
+
+165
+00:08:18,260 --> 00:08:21,860
+So I take the
+derivative of R1 and I
+
+166
+00:08:21,860 --> 00:08:23,480
+put the R2 on
+
+167
+00:08:23,480 --> 00:08:25,865
+the back because that's
+the rest of the string.
+
+168
+00:08:25,865 --> 00:08:29,240
+Ok? So that's the only
+case we have to consider
+
+169
+00:08:29,240 --> 00:08:32,750
+this sequence case
+except if not the,
+
+170
+00:08:32,750 --> 00:08:37,895
+how one can match the
+sea and something else.
+
+171
+00:08:37,895 --> 00:08:42,965
+But if on mismatching
+actually the empty string,
+
+172
+00:08:42,965 --> 00:08:48,890
+this case actually the R two
+is in charge of matching
+
+173
+00:08:48,890 --> 00:08:51,590
+the string starting with C. So in
+
+174
+00:08:51,590 --> 00:08:55,490
+this case we have to call
+the derivative for R2.
+
+175
+00:08:55,490 --> 00:08:57,875
+So that's why we have
+these two cases.
+
+176
+00:08:57,875 --> 00:09:00,455
+So if R1 is nullable,
+
+177
+00:09:00,455 --> 00:09:03,245
+then it can match
+the empty string.
+
+178
+00:09:03,245 --> 00:09:05,330
+And we have to
+consider the case that
+
+179
+00:09:05,330 --> 00:09:08,045
+this R2 is matching a string
+
+180
+00:09:08,045 --> 00:09:10,700
+starting with C. And so we have
+
+181
+00:09:10,700 --> 00:09:14,210
+to call the derivative
+for this R2 in this case.
+
+182
+00:09:14,210 --> 00:09:18,680
+Otherwise, the R1 will
+be in charge of matching
+
+183
+00:09:18,680 --> 00:09:20,840
+a part of that
+string starting with
+
+184
+00:09:20,840 --> 00:09:24,695
+C. And I have to call the
+derivative on this R1.
+
+185
+00:09:24,695 --> 00:09:30,670
+Okay? I hope this makes sense.
+
+186
+00:09:30,670 --> 00:09:34,150
+Go over that and
+also the handouts.
+
+187
+00:09:34,150 --> 00:09:37,465
+Again. With that, that's
+it's really subtle.
+
+188
+00:09:37,465 --> 00:09:40,945
+And how do I remember this case?
+
+189
+00:09:40,945 --> 00:09:42,430
+Well, I know it's
+
+190
+00:09:42,430 --> 00:09:45,310
+an if condition and the
+condition is nullable,
+
+191
+00:09:45,310 --> 00:09:48,160
+then I will always remember
+
+192
+00:09:48,160 --> 00:09:53,275
+the else branch where pushes
+NG derivative over the R1.
+
+193
+00:09:53,275 --> 00:09:55,780
+So are usually write
+down the S punch
+
+194
+00:09:55,780 --> 00:09:59,650
+first and then construct
+the thin branch by saying,
+
+195
+00:09:59,650 --> 00:10:01,525
+well, I just repeat this.
+
+196
+00:10:01,525 --> 00:10:03,760
+And I have to remember
+in that case I
+
+197
+00:10:03,760 --> 00:10:06,580
+have to build also
+derivative of R2.
+
+198
+00:10:06,580 --> 00:10:12,695
+Okay? Finally, the star case.
+
+199
+00:10:12,695 --> 00:10:15,665
+Ok. So here again
+we're looking for
+
+200
+00:10:15,665 --> 00:10:17,300
+a reg expression which
+
+201
+00:10:17,300 --> 00:10:19,745
+can match the string
+starting with C,
+
+202
+00:10:19,745 --> 00:10:22,355
+except that the c has
+been chopped off.
+
+203
+00:10:22,355 --> 00:10:28,640
+So if r star has to match
+a string starting with C,
+
+204
+00:10:28,640 --> 00:10:32,735
+then at least we need one
+copy of this r. Okay?
+
+205
+00:10:32,735 --> 00:10:34,310
+So at least one copy of
+
+206
+00:10:34,310 --> 00:10:37,010
+this r has matched
+something which starts with
+
+207
+00:10:37,010 --> 00:10:38,870
+a C and then afterwards
+
+208
+00:10:38,870 --> 00:10:41,570
+come 0 more copies
+of this OnStar.
+
+209
+00:10:41,570 --> 00:10:45,530
+Okay? What we do there
+is we trying to find
+
+210
+00:10:45,530 --> 00:10:47,960
+the Rekha expression
+which can match
+
+211
+00:10:47,960 --> 00:10:50,915
+this first part of the string.
+
+212
+00:10:50,915 --> 00:10:53,255
+However, where the
+sea is chopped off.
+
+213
+00:10:53,255 --> 00:10:55,130
+And then we just say, well,
+
+214
+00:10:55,130 --> 00:10:57,050
+the rest has to be
+matched again with
+
+215
+00:10:57,050 --> 00:10:59,030
+0 or more copies of
+
+216
+00:10:59,030 --> 00:11:02,600
+this R. So that's why
+it's defined like this.
+
+217
+00:11:02,600 --> 00:11:09,050
+Okay? S8. Please take care
+with this definition.
+
+218
+00:11:09,050 --> 00:11:11,435
+That's not so simple.
+
+219
+00:11:11,435 --> 00:11:13,250
+Once you get a hang of it,
+
+220
+00:11:13,250 --> 00:11:15,170
+however, it makes perfect sense.
+
+221
+00:11:15,170 --> 00:11:17,210
+So let me explain it in
+
+222
+00:11:17,210 --> 00:11:20,825
+different ways in
+the next slides.
+
+223
+00:11:20,825 --> 00:11:24,695
+Okay, let's look
+first some examples.
+
+224
+00:11:24,695 --> 00:11:27,140
+So here is a regular expression,
+
+225
+00:11:27,140 --> 00:11:29,390
+R. And let's have
+
+226
+00:11:29,390 --> 00:11:32,450
+a look at these three
+derivatives according to a,
+
+227
+00:11:32,450 --> 00:11:38,405
+B, and C. And Vishal do
+with d1 for the a. Ok.
+
+228
+00:11:38,405 --> 00:11:42,379
+So here is our reg expression
+
+229
+00:11:42,379 --> 00:11:45,334
+and was very generous
+with dependent a thesis.
+
+230
+00:11:45,334 --> 00:11:48,140
+And the outermost is a star.
+
+231
+00:11:48,140 --> 00:11:52,550
+So if people now the
+derivative according to a,
+
+232
+00:11:52,550 --> 00:11:55,474
+the character a of
+that wreck expression.
+
+233
+00:11:55,474 --> 00:11:57,380
+Okay? So the first thing we
+
+234
+00:11:57,380 --> 00:11:59,555
+have to analyze is the K star.
+
+235
+00:11:59,555 --> 00:12:04,370
+Ok? So here's direct expression,
+which we are looking at.
+
+236
+00:12:04,370 --> 00:12:09,170
+This are the outermost
+constructor is this star.
+
+237
+00:12:09,170 --> 00:12:11,510
+If you go back to the definition,
+
+238
+00:12:11,510 --> 00:12:13,625
+I hope you have it next to you,
+
+239
+00:12:13,625 --> 00:12:16,340
+then this star case is defined
+
+240
+00:12:16,340 --> 00:12:20,000
+as u taking just the
+inside of the star
+
+241
+00:12:20,000 --> 00:12:23,030
+and apply this derivative and
+
+242
+00:12:23,030 --> 00:12:26,765
+leave the are on the
+outside at the end.
+
+243
+00:12:26,765 --> 00:12:29,990
+Ok. So that's the first step.
+
+244
+00:12:29,990 --> 00:12:32,030
+Now we have to analyze
+
+245
+00:12:32,030 --> 00:12:36,035
+the derivative according to
+a of this record expression,
+
+246
+00:12:36,035 --> 00:12:38,000
+which is an alternative.
+
+247
+00:12:38,000 --> 00:12:39,665
+So the outermost is a plus.
+
+248
+00:12:39,665 --> 00:12:41,375
+Ok, that's very easy again,
+
+249
+00:12:41,375 --> 00:12:45,185
+we just have to push the
+derivative into each component,
+
+250
+00:12:45,185 --> 00:12:47,705
+into the a, followed by b.
+
+251
+00:12:47,705 --> 00:12:49,145
+And in this segment,
+
+252
+00:12:49,145 --> 00:12:51,185
+alternative into b. Ok,
+
+253
+00:12:51,185 --> 00:12:56,030
+so we take the derivative
+of each according to a way.
+
+254
+00:12:56,030 --> 00:13:00,635
+Now this one is a sequence
+break expression.
+
+255
+00:13:00,635 --> 00:13:02,210
+This most complicated case.
+
+256
+00:13:02,210 --> 00:13:04,160
+So the first of all
+you have to test is,
+
+257
+00:13:04,160 --> 00:13:07,910
+is the first component
+nullable of this sequence?
+
+258
+00:13:07,910 --> 00:13:09,200
+Well, that is a,
+
+259
+00:13:09,200 --> 00:13:12,740
+in this case, a on its
+own is not nullable.
+
+260
+00:13:12,740 --> 00:13:14,210
+So vn, the easy case,
+
+261
+00:13:14,210 --> 00:13:17,000
+we only have a single derivative
+
+262
+00:13:17,000 --> 00:13:19,370
+pushed in the first component.
+
+263
+00:13:19,370 --> 00:13:25,160
+So we have the derivative
+of a with the character a.
+
+264
+00:13:25,160 --> 00:13:27,920
+Okay, that's now
+the character case.
+
+265
+00:13:27,920 --> 00:13:29,720
+And in this case the character
+
+266
+00:13:29,720 --> 00:13:31,715
+in the regular expression agree.
+
+267
+00:13:31,715 --> 00:13:33,890
+So it's defined as one.
+
+268
+00:13:33,890 --> 00:13:37,550
+Ok? In the other case,
+
+269
+00:13:37,550 --> 00:13:39,710
+the break expression is P,
+
+270
+00:13:39,710 --> 00:13:41,675
+But the characters a.
+
+271
+00:13:41,675 --> 00:13:46,385
+So in this case
+it's defined as 0.
+
+272
+00:13:46,385 --> 00:13:50,630
+Okay? So that's what the
+derivative would be.
+
+273
+00:13:50,630 --> 00:13:52,160
+This r is there
+
+274
+00:13:52,160 --> 00:13:55,280
+because originally we
+started with a star.
+
+275
+00:13:55,280 --> 00:13:58,295
+This expression is that
+star at expression.
+
+276
+00:13:58,295 --> 00:14:02,780
+Ok? So the derivative
+according to a
+
+277
+00:14:02,780 --> 00:14:07,610
+up that reg expression
+is this expression.
+
+278
+00:14:07,610 --> 00:14:10,970
+We just have to
+substitute this back in.
+
+279
+00:14:10,970 --> 00:14:13,745
+Just coming back to this slide.
+
+280
+00:14:13,745 --> 00:14:16,160
+So far, they're only analyze
+
+281
+00:14:16,160 --> 00:14:19,505
+the derivative according
+to a single character.
+
+282
+00:14:19,505 --> 00:14:23,960
+But we can also very easily
+extend that to whole strings.
+
+283
+00:14:23,960 --> 00:14:26,360
+So if you build the
+derivative according
+
+284
+00:14:26,360 --> 00:14:27,905
+to the empty string,
+
+285
+00:14:27,905 --> 00:14:30,875
+we just return the
+Rekha expression.
+
+286
+00:14:30,875 --> 00:14:35,585
+If we have a string
+starting with character c,
+
+287
+00:14:35,585 --> 00:14:37,850
+remember that can
+be any character.
+
+288
+00:14:37,850 --> 00:14:42,170
+Then we build first the simple
+derivative according to
+
+289
+00:14:42,170 --> 00:14:44,360
+that first character and
+
+290
+00:14:44,360 --> 00:14:46,925
+continue with the
+rest of the string.
+
+291
+00:14:46,925 --> 00:14:50,615
+So here you see again,
+my personal convention.
+
+292
+00:14:50,615 --> 00:14:54,365
+Everything which works on
+lists has this S at the end.
+
+293
+00:14:54,365 --> 00:14:57,125
+So this function is
+for single characters.
+
+294
+00:14:57,125 --> 00:14:59,179
+This one is for strings,
+
+295
+00:14:59,179 --> 00:15:02,450
+but it uses the one
+for the character.
+
+296
+00:15:02,450 --> 00:15:04,025
+Essentially what it does is
+
+297
+00:15:04,025 --> 00:15:06,185
+it chops off the first character,
+
+298
+00:15:06,185 --> 00:15:09,800
+builds the derivative, then
+chops off the next character,
+
+299
+00:15:09,800 --> 00:15:13,760
+builds the derivative of
+the result, and so on.
+
+300
+00:15:13,760 --> 00:15:17,000
+Having this function,
+we can actually now
+
+301
+00:15:17,000 --> 00:15:20,600
+state what the algorithm
+is, the complete algorithm.
+
+302
+00:15:20,600 --> 00:15:23,465
+So the pro shops ki mantra
+
+303
+00:15:23,465 --> 00:15:24,860
+takes a regular expression as
+
+304
+00:15:24,860 --> 00:15:26,915
+argument and a
+string as argument.
+
+305
+00:15:26,915 --> 00:15:30,920
+And is supposed to say yes if
+the reg expression matches
+
+306
+00:15:30,920 --> 00:15:33,560
+the string or No
+
+307
+00:15:33,560 --> 00:15:36,065
+if the reg expression does
+not match the string.
+
+308
+00:15:36,065 --> 00:15:37,715
+And how does it do that?
+
+309
+00:15:37,715 --> 00:15:42,560
+Well, it takes this string
+s And this reg expression,
+
+310
+00:15:42,560 --> 00:15:43,925
+and it first built
+
+311
+00:15:43,925 --> 00:15:48,845
+successive derivatives until
+that string is exhaust that.
+
+312
+00:15:48,845 --> 00:15:52,115
+Okay? Then you have
+a final derivative,
+
+313
+00:15:52,115 --> 00:15:53,839
+a final reg expression.
+
+314
+00:15:53,839 --> 00:15:55,370
+And you test whether
+
+315
+00:15:55,370 --> 00:15:57,920
+this reg expression can
+match the empty string.
+
+316
+00:15:57,920 --> 00:16:01,370
+If yes, then the
+original reg expression
+
+317
+00:16:01,370 --> 00:16:03,245
+is r can match the string.
+
+318
+00:16:03,245 --> 00:16:05,210
+If no, if it cannot match
+
+319
+00:16:05,210 --> 00:16:08,030
+the final derivative
+with the empty string,
+
+320
+00:16:08,030 --> 00:16:10,280
+then know this
+regular expression,
+
+321
+00:16:10,280 --> 00:16:12,905
+R cannot match that string.
+
+322
+00:16:12,905 --> 00:16:16,010
+I know it looks
+very anticlimactic,
+
+323
+00:16:16,010 --> 00:16:19,625
+but that's actually the
+beauty of this algorithm,
+
+324
+00:16:19,625 --> 00:16:22,760
+that it's not that complicated.
+
+325
+00:16:22,760 --> 00:16:25,340
+So how does the algorithm work?
+
+326
+00:16:25,340 --> 00:16:27,634
+In a concrete example?
+
+327
+00:16:27,634 --> 00:16:31,580
+Imagine you have a string, abc
+
+328
+00:16:31,580 --> 00:16:34,370
+and you have a break
+expression, say R1.
+
+329
+00:16:34,370 --> 00:16:37,520
+And you want to find out
+whether this or one can match
+
+330
+00:16:37,520 --> 00:16:41,300
+that string abc or not.
+How do you do that?
+
+331
+00:16:41,300 --> 00:16:45,140
+Well, you would first
+take this R1 and you
+
+332
+00:16:45,140 --> 00:16:47,150
+build the derivative according
+
+333
+00:16:47,150 --> 00:16:49,880
+to the first character, D-A.
+
+334
+00:16:49,880 --> 00:16:53,015
+Okay? You get the derivative,
+
+335
+00:16:53,015 --> 00:16:55,294
+which I call here R2.
+
+336
+00:16:55,294 --> 00:16:58,640
+Then you take the next
+character, the B.
+
+337
+00:16:58,640 --> 00:17:04,535
+You now build the derivative
+according to b of this R2.
+
+338
+00:17:04,535 --> 00:17:07,460
+Okay? So you take the
+result of the first step,
+
+339
+00:17:07,460 --> 00:17:09,530
+you feed it into the second step,
+
+340
+00:17:09,530 --> 00:17:11,810
+and you take the
+second character.
+
+341
+00:17:11,810 --> 00:17:17,075
+Then you do this also with c.
+So you get a derivative R3,
+
+342
+00:17:17,075 --> 00:17:22,460
+and you build the derivative
+of R three according to c,
+
+343
+00:17:22,460 --> 00:17:24,185
+you get an R four.
+
+344
+00:17:24,185 --> 00:17:26,300
+Okay, so that's the
+final derivative.
+
+345
+00:17:26,300 --> 00:17:27,530
+The string is exhausted.
+
+346
+00:17:27,530 --> 00:17:29,570
+We build derivatives
+according to a, B,
+
+347
+00:17:29,570 --> 00:17:34,610
+and C. Now we just test whether
+this r four is nullable.
+
+348
+00:17:34,610 --> 00:17:37,175
+If it says yes,
+
+349
+00:17:37,175 --> 00:17:41,510
+then df break expression metro
+will just say true, yes,
+
+350
+00:17:41,510 --> 00:17:43,340
+this original reg expression,
+
+351
+00:17:43,340 --> 00:17:47,270
+the R1, will be able to
+match that string abc.
+
+352
+00:17:47,270 --> 00:17:50,585
+And if this test returns false,
+
+353
+00:17:50,585 --> 00:17:53,015
+then the algorithm says false.
+
+354
+00:17:53,015 --> 00:17:56,975
+This reg expression will
+not match the string.
+
+355
+00:17:56,975 --> 00:18:00,260
+Ok, you might ask
+why on earth does
+
+356
+00:18:00,260 --> 00:18:02,960
+that algorithm
+actually work away?
+
+357
+00:18:02,960 --> 00:18:06,515
+Here's an explanation
+for why it works.
+
+358
+00:18:06,515 --> 00:18:10,190
+Imagine you have a wreck
+expression R1, okay?
+
+359
+00:18:10,190 --> 00:18:13,220
+And you have a string abc,
+
+360
+00:18:13,220 --> 00:18:14,270
+and you want to find out
+
+361
+00:18:14,270 --> 00:18:17,180
+whether one can
+match that string.
+
+362
+00:18:17,180 --> 00:18:18,799
+And for the moment,
+
+363
+00:18:18,799 --> 00:18:22,610
+let's assume that it
+can match that string.
+
+364
+00:18:22,610 --> 00:18:26,315
+Ok? So the language L of
+
+365
+00:18:26,315 --> 00:18:30,185
+R will actually
+contain that string,
+
+366
+00:18:30,185 --> 00:18:31,805
+otherwise it wouldn't match that.
+
+367
+00:18:31,805 --> 00:18:36,710
+Okay? So ABC is in
+this language, okay?
+
+368
+00:18:36,710 --> 00:18:39,785
+If I now take the
+semantic derivative,
+
+369
+00:18:39,785 --> 00:18:43,145
+that means I look at all
+the strings in this f,
+
+370
+00:18:43,145 --> 00:18:46,820
+R1, and further out
+
+371
+00:18:46,820 --> 00:18:48,740
+all the ones which
+do not start with
+
+372
+00:18:48,740 --> 00:18:51,260
+an a, I discharge them.
+
+373
+00:18:51,260 --> 00:18:54,545
+And I only look the one
+which start with an a.
+
+374
+00:18:54,545 --> 00:18:56,465
+And of those strings,
+
+375
+00:18:56,465 --> 00:18:58,475
+I chop off this a.
+
+376
+00:18:58,475 --> 00:19:01,025
+So after this
+romantic derivative,
+
+377
+00:19:01,025 --> 00:19:05,735
+this set of strings will
+contain just B and C.
+
+378
+00:19:05,735 --> 00:19:12,830
+Ok. Now if I build the next
+semantic derivative of that,
+
+379
+00:19:12,830 --> 00:19:14,345
+then I would look at
+
+380
+00:19:14,345 --> 00:19:16,850
+all the strings which
+start with a P,
+
+381
+00:19:16,850 --> 00:19:21,350
+and forget about everything
+else of the ones.
+
+382
+00:19:21,350 --> 00:19:27,905
+I know they start with B.
+I just chop of the B. Ok.
+
+383
+00:19:27,905 --> 00:19:31,655
+So in this whole set here,
+
+384
+00:19:31,655 --> 00:19:33,785
+in this whole set here,
+
+385
+00:19:33,785 --> 00:19:39,030
+there will be now a string
+which is just c. Okay?
+
+386
+00:19:39,190 --> 00:19:44,420
+Then I built the third
+semantic derivative
+
+387
+00:19:44,420 --> 00:19:47,300
+because I want to find out
+whether ABC is involved.
+
+388
+00:19:47,300 --> 00:19:50,540
+Okay? So now I look
+at all the strings in
+
+389
+00:19:50,540 --> 00:19:52,820
+here and look at
+
+390
+00:19:52,820 --> 00:19:55,340
+them whether they start
+with a C. If yes,
+
+391
+00:19:55,340 --> 00:19:56,885
+I chop off the sea.
+
+392
+00:19:56,885 --> 00:19:59,120
+And put in markets remaining.
+
+393
+00:19:59,120 --> 00:20:00,425
+So in this case,
+
+394
+00:20:00,425 --> 00:20:02,510
+if I have the string c
+
+395
+00:20:02,510 --> 00:20:04,550
+in this language and
+I chop off this,
+
+396
+00:20:04,550 --> 00:20:07,700
+see what is remaining
+is the empty string.
+
+397
+00:20:07,700 --> 00:20:09,695
+So we have to check of
+
+398
+00:20:09,695 --> 00:20:14,510
+that language whether it
+contains the empty string.
+
+399
+00:20:14,510 --> 00:20:18,800
+If yes, then the
+original R1 can match
+
+400
+00:20:18,800 --> 00:20:21,050
+this ABC because this ABC
+
+401
+00:20:21,050 --> 00:20:24,119
+must have been in this language.
+
+402
+00:20:24,130 --> 00:20:28,565
+And if in the end there wasn't
+the empty string, then,
+
+403
+00:20:28,565 --> 00:20:33,575
+then this ABC Watson in
+this language of one.
+
+404
+00:20:33,575 --> 00:20:36,260
+And so the electron must have,
+
+405
+00:20:36,260 --> 00:20:38,880
+or the metro must have failed.
+
+406
+00:20:39,040 --> 00:20:42,530
+The clever bit is that here
+
+407
+00:20:42,530 --> 00:20:45,530
+the explanation is for languages.
+
+408
+00:20:45,530 --> 00:20:49,835
+Remember, this
+semantic derivative
+
+409
+00:20:49,835 --> 00:20:53,450
+works over languages and they
+sometimes can be in finite.
+
+410
+00:20:53,450 --> 00:20:55,730
+So that's not really
+an algorithm.
+
+411
+00:20:55,730 --> 00:20:58,880
+Yeah, that's just
+explaining the idea with
+
+412
+00:20:58,880 --> 00:21:02,525
+preserves key
+achieved was that he
+
+413
+00:21:02,525 --> 00:21:06,440
+now works with this derivative
+America expressions and
+
+414
+00:21:06,440 --> 00:21:10,715
+somehow imitates what
+happens on these languages.
+
+415
+00:21:10,715 --> 00:21:14,135
+Because remember if you
+have an wreck expression,
+
+416
+00:21:14,135 --> 00:21:17,405
+are you want to test
+whether can match APC,
+
+417
+00:21:17,405 --> 00:21:22,550
+then you take first
+derivative according to a.
+
+418
+00:21:22,550 --> 00:21:25,760
+So you will get a wreck
+expression which can match b
+
+419
+00:21:25,760 --> 00:21:29,464
+and c If R could match abc.
+
+420
+00:21:29,464 --> 00:21:31,430
+So after the first derivative,
+
+421
+00:21:31,430 --> 00:21:33,620
+you will get a wreck expression
+which can match B and
+
+422
+00:21:33,620 --> 00:21:37,070
+C. If you take the
+second derivative,
+
+423
+00:21:37,070 --> 00:21:41,225
+you will get a reexpression
+which can match c alone.
+
+424
+00:21:41,225 --> 00:21:44,180
+And if you take the
+final derivative,
+
+425
+00:21:44,180 --> 00:21:46,070
+then you will get
+
+426
+00:21:46,070 --> 00:21:48,260
+rec expression which hopefully
+
+427
+00:21:48,260 --> 00:21:49,715
+can match the empty string.
+
+428
+00:21:49,715 --> 00:21:53,780
+If it does, then this
+R can match the ABC.
+
+429
+00:21:53,780 --> 00:21:55,655
+And if it doesn't, then
+
+430
+00:21:55,655 --> 00:21:58,680
+ABC couldn't be
+matched by this on.
+
+431
+00:21:58,900 --> 00:22:02,990
+Okay, let's have a look
+how this pans out in code.
+
+432
+00:22:02,990 --> 00:22:06,050
+Here's defile RE1.
+
+433
+00:22:06,050 --> 00:22:07,940
+It's also uploaded on Keith,
+
+434
+00:22:07,940 --> 00:22:10,625
+so you can see exactly
+what I'm doing.
+
+435
+00:22:10,625 --> 00:22:13,970
+And actually I already saw
+that file because I showed you
+
+436
+00:22:13,970 --> 00:22:15,710
+how my wreck expressions are
+
+437
+00:22:15,710 --> 00:22:17,960
+defined with the
+abstract classes here.
+
+438
+00:22:17,960 --> 00:22:21,155
+And here, the six cases
+for 0-1 character,
+
+439
+00:22:21,155 --> 00:22:23,540
+I turn a TIF in
+sequence and star.
+
+440
+00:22:23,540 --> 00:22:26,705
+Ok. So the first
+function nullable,
+
+441
+00:22:26,705 --> 00:22:28,760
+the simple one, takes
+
+442
+00:22:28,760 --> 00:22:32,120
+a regular expression as
+argument and returns a boolean.
+
+443
+00:22:32,120 --> 00:22:34,280
+And then with this
+pattern matching,
+
+444
+00:22:34,280 --> 00:22:37,040
+we just go through
+all these six cases
+
+445
+00:22:37,040 --> 00:22:38,900
+are serious defined as false.
+
+446
+00:22:38,900 --> 00:22:43,234
+One is defined as true
+character for any character,
+
+447
+00:22:43,234 --> 00:22:45,455
+this null return false.
+
+448
+00:22:45,455 --> 00:22:47,540
+The alternative is to find here,
+
+449
+00:22:47,540 --> 00:22:50,000
+so that's the or in Scala.
+
+450
+00:22:50,000 --> 00:22:52,700
+And for the sequence,
+that's the end.
+
+451
+00:22:52,700 --> 00:22:56,180
+And this star, no matter
+what the reg expression is,
+
+452
+00:22:56,180 --> 00:22:59,540
+it will always match the
+empty string, so true.
+
+453
+00:22:59,540 --> 00:23:02,225
+So nanobots, very easy.
+
+454
+00:23:02,225 --> 00:23:07,430
+The derivative is also not
+so much more complicated.
+
+455
+00:23:07,430 --> 00:23:08,974
+It takes two arguments,
+
+456
+00:23:08,974 --> 00:23:11,810
+a character and the
+rec expression,
+
+457
+00:23:11,810 --> 00:23:14,405
+and returns a wreck expression.
+
+458
+00:23:14,405 --> 00:23:16,340
+So now we just have to analyze
+
+459
+00:23:16,340 --> 00:23:18,890
+what's that reg
+expression looks like.
+
+460
+00:23:18,890 --> 00:23:23,870
+If it's 0, we return
+01, we return 0.
+
+461
+00:23:23,870 --> 00:23:26,990
+If the character is the
+wreck expressions character
+
+462
+00:23:26,990 --> 00:23:30,260
+d. We test whether it's
+equal to this character.
+
+463
+00:23:30,260 --> 00:23:32,225
+We want to take the
+derivative off.
+
+464
+00:23:32,225 --> 00:23:36,185
+If yes, return one, otherwise 0.
+
+465
+00:23:36,185 --> 00:23:38,600
+The alternative which has pushed
+
+466
+00:23:38,600 --> 00:23:39,860
+the derivative under
+
+467
+00:23:39,860 --> 00:23:42,710
+this alternative,
+that's the easy one.
+
+468
+00:23:42,710 --> 00:23:44,690
+Here's the sequence case where we
+
+469
+00:23:44,690 --> 00:23:47,015
+first have to test
+if it's nullable,
+
+470
+00:23:47,015 --> 00:23:49,040
+If it's not the have to push
+
+471
+00:23:49,040 --> 00:23:52,160
+the derivative over
+the first DR1.
+
+472
+00:23:52,160 --> 00:23:56,135
+And otherwise if it is null
+above we have two cases.
+
+473
+00:23:56,135 --> 00:24:00,605
+Either you have to push
+it over the R1 or R2.
+
+474
+00:24:00,605 --> 00:24:03,860
+And a star case, this one.
+
+475
+00:24:03,860 --> 00:24:07,160
+We just build the sequence
+of the derivative of
+
+476
+00:24:07,160 --> 00:24:11,600
+R1 and append the
+or as a sequence,
+
+477
+00:24:11,600 --> 00:24:14,195
+put the star at the end.
+
+478
+00:24:14,195 --> 00:24:19,430
+Okay, so here's this
+function for strings.
+
+479
+00:24:19,430 --> 00:24:21,740
+So a list of characters.
+
+480
+00:24:21,740 --> 00:24:23,870
+This function takes N,
+
+481
+00:24:23,870 --> 00:24:26,480
+S list of characters argument
+and a reg expression
+
+482
+00:24:26,480 --> 00:24:29,360
+as argument and returns
+a wreck expression.
+
+483
+00:24:29,360 --> 00:24:31,460
+And we just have to
+pattern match what
+
+484
+00:24:31,460 --> 00:24:34,055
+that string looks like
+or this list looks like.
+
+485
+00:24:34,055 --> 00:24:35,360
+If it's the empty list,
+
+486
+00:24:35,360 --> 00:24:38,030
+it just immediately
+return the R. If
+
+487
+00:24:38,030 --> 00:24:42,020
+it's something of the
+form C followed by S,
+
+488
+00:24:42,020 --> 00:24:45,050
+then we build first the
+derivative according
+
+489
+00:24:45,050 --> 00:24:48,345
+to a C. And then
+the result of that,
+
+490
+00:24:48,345 --> 00:24:50,090
+people recursively call
+
+491
+00:24:50,090 --> 00:24:53,675
+the derivative
+according to s. Okay?
+
+492
+00:24:53,675 --> 00:24:56,060
+And now the main mantra,
+
+493
+00:24:56,060 --> 00:24:59,360
+it takes a reg
+expression and takes
+
+494
+00:24:59,360 --> 00:25:02,870
+a string and returns a
+boolean, true or false.
+
+495
+00:25:02,870 --> 00:25:05,705
+And it first builds
+the derivative.
+
+496
+00:25:05,705 --> 00:25:07,430
+The only thing I have to do here
+
+497
+00:25:07,430 --> 00:25:09,410
+because I'm implementing
+it and scholar,
+
+498
+00:25:09,410 --> 00:25:15,064
+I have to coerce between strings
+and lists of characters.
+
+499
+00:25:15,064 --> 00:25:16,580
+So he, I take first
+
+500
+00:25:16,580 --> 00:25:20,810
+the two list of the S that
+gives me a list of characters.
+
+501
+00:25:20,810 --> 00:25:23,135
+Then I build this derivative
+
+502
+00:25:23,135 --> 00:25:26,045
+is ds because I'm
+looking at strings.
+
+503
+00:25:26,045 --> 00:25:28,160
+And then at the end,
+
+504
+00:25:28,160 --> 00:25:33,050
+built-in nullable of
+the final derivative.
+
+505
+00:25:33,050 --> 00:25:37,310
+The beauty of all this
+is that in Scala,
+
+506
+00:25:37,310 --> 00:25:40,085
+these definition which
+I had on the slides
+
+507
+00:25:40,085 --> 00:25:43,835
+go almost one-to-one into code.
+
+508
+00:25:43,835 --> 00:25:45,605
+And that's of course,
+
+509
+00:25:45,605 --> 00:25:47,480
+makes it very easy
+to implement in
+
+510
+00:25:47,480 --> 00:25:49,730
+a functional language like Scala.
+
+511
+00:25:49,730 --> 00:25:52,865
+We can also now try
+out some examples.
+
+512
+00:25:52,865 --> 00:25:56,960
+This was the example
+I had on the slide.
+
+513
+00:25:56,960 --> 00:25:58,370
+And we could now calculate
+
+514
+00:25:58,370 --> 00:26:00,305
+what's the derivative
+according to a,
+
+515
+00:26:00,305 --> 00:26:02,720
+B, and C of this
+record expression.
+
+516
+00:26:02,720 --> 00:26:07,040
+And I hope you didn't
+make any mistake.
+
+517
+00:26:07,040 --> 00:26:09,260
+Now of course we want
+to see whether B
+
+518
+00:26:09,260 --> 00:26:11,390
+do any better with
+this algorithm.
+
+519
+00:26:11,390 --> 00:26:13,715
+Then Python and Ruby.
+
+520
+00:26:13,715 --> 00:26:16,070
+And let's first have a
+look at this example.
+
+521
+00:26:16,070 --> 00:26:18,079
+So this reg expression.
+
+522
+00:26:18,079 --> 00:26:19,880
+The problem is that in
+
+523
+00:26:19,880 --> 00:26:22,070
+our reg expression
+metro so far we have
+
+524
+00:26:22,070 --> 00:26:24,530
+the sequence right
+expression where we
+
+525
+00:26:24,530 --> 00:26:27,200
+don't have this optional
+wreck expression.
+
+526
+00:26:27,200 --> 00:26:30,800
+And we don't have this n
+times Rekha expression,
+
+527
+00:26:30,800 --> 00:26:36,605
+but we can quickly implement
+that in our implementation.
+
+528
+00:26:36,605 --> 00:26:40,549
+So if you want to build the
+optional reg expression,
+
+529
+00:26:40,549 --> 00:26:41,870
+which is defined here,
+
+530
+00:26:41,870 --> 00:26:44,570
+a little function which
+takes a reg expression and
+
+531
+00:26:44,570 --> 00:26:47,360
+returns the alternative of R one.
+
+532
+00:26:47,360 --> 00:26:49,624
+So it essentially
+takes the definition
+
+533
+00:26:49,624 --> 00:26:53,240
+of optional R and
+replaces it by that.
+
+534
+00:26:53,240 --> 00:26:56,150
+The end times what we
+essentially do it,
+
+535
+00:26:56,150 --> 00:27:01,535
+There's beaches built n
+copies of this r. Ok?
+
+536
+00:27:01,535 --> 00:27:04,745
+So if this n times was 0,
+
+537
+00:27:04,745 --> 00:27:06,245
+we just return one.
+
+538
+00:27:06,245 --> 00:27:11,570
+If it's one, then we return
+just the reg expression.
+
+539
+00:27:11,570 --> 00:27:15,575
+And if it's a, something
+bigger than one,
+
+540
+00:27:15,575 --> 00:27:18,560
+then we just built the
+sequence of one of
+
+541
+00:27:18,560 --> 00:27:20,270
+these copies and call
+
+542
+00:27:20,270 --> 00:27:22,925
+this function again
+with n minus one.
+
+543
+00:27:22,925 --> 00:27:26,330
+So we just now n copies of that.
+
+544
+00:27:26,330 --> 00:27:30,710
+Okay? Okay, so we can look
+at our first example.
+
+545
+00:27:30,710 --> 00:27:32,629
+This is the work expression,
+
+546
+00:27:32,629 --> 00:27:36,035
+and I call that here
+even rec expression1.
+
+547
+00:27:36,035 --> 00:27:37,670
+It doesn't look as beautiful
+
+548
+00:27:37,670 --> 00:27:39,140
+as what we write down on paper.
+
+549
+00:27:39,140 --> 00:27:41,240
+We will actually look
+at this later on
+
+550
+00:27:41,240 --> 00:27:43,640
+if this can be improved.
+But here it is.
+
+551
+00:27:43,640 --> 00:27:45,724
+Here's the reg expression,
+
+552
+00:27:45,724 --> 00:27:49,520
+and here's a little function
+which measures the time.
+
+553
+00:27:49,520 --> 00:27:53,180
+And we can now start testing
+
+554
+00:27:53,180 --> 00:27:57,845
+this reg expression with
+strings of just containing A's.
+
+555
+00:27:57,845 --> 00:28:00,020
+And we first a bit cautious,
+
+556
+00:28:00,020 --> 00:28:04,985
+be tested between 020
+and be count by two.
+
+557
+00:28:04,985 --> 00:28:08,330
+And I actually will not
+start that with Scala,
+
+558
+00:28:08,330 --> 00:28:12,965
+but actually the orbital online.
+
+559
+00:28:12,965 --> 00:28:15,305
+And that's out.
+
+560
+00:28:15,305 --> 00:28:17,269
+And that correlates.
+
+561
+00:28:17,269 --> 00:28:20,675
+So for 18 days it's pretty fast.
+
+562
+00:28:20,675 --> 00:28:24,815
+But for 20 it already
+needs to seconds.
+
+563
+00:28:24,815 --> 00:28:28,265
+And you can see
+actually this jump from
+
+564
+00:28:28,265 --> 00:28:32,825
+18 to 20 in the runtime
+is prepared to.
+
+565
+00:28:32,825 --> 00:28:37,460
+And if we actually measure
+it more accurately,
+
+566
+00:28:37,460 --> 00:28:39,770
+then we get this picture.
+
+567
+00:28:39,770 --> 00:28:41,255
+And what turns out,
+
+568
+00:28:41,255 --> 00:28:45,830
+we actually inverse as Python
+and Ruby in this example.
+
+569
+00:28:45,830 --> 00:28:49,230
+So I guess that's a fail.
--- /dev/null Thu Jan 01 00:00:00 1970 +0000
+++ b/videos/02-prelims.srt Tue Sep 29 12:52:07 2020 +0100
@@ -0,0 +1,1217 @@
+1
+00:00:09,990 --> 00:00:13,465
+Welcome back to this
+week's lecture.
+
+2
+00:00:13,465 --> 00:00:16,450
+The task this week is to actually
+
+3
+00:00:16,450 --> 00:00:20,320
+implement a regular
+expression matcher.
+
+4
+00:00:20,320 --> 00:00:22,510
+And we want to be a bit
+
+5
+00:00:22,510 --> 00:00:25,315
+faster than the regular
+expression matcher
+
+6
+00:00:25,315 --> 00:00:29,380
+in Python, Ruby,
+Javascript, and Java.
+
+7
+00:00:29,380 --> 00:00:31,960
+Remember this regular expression
+
+8
+00:00:31,960 --> 00:00:35,785
+and strings which are
+just a number of a's,
+
+9
+00:00:35,785 --> 00:00:39,670
+these regular expression matchers
+where abysmally slow.
+
+10
+00:00:39,670 --> 00:00:43,170
+They could only match
+approximately 30 a's in
+
+11
+00:00:43,170 --> 00:00:45,665
+something like half a minute.
+
+12
+00:00:45,665 --> 00:00:49,460
+What we like to do with
+our regular expression matcher.
+
+13
+00:00:49,460 --> 00:00:51,890
+We will try a few times,
+
+14
+00:00:51,890 --> 00:00:55,505
+but our second trial will already
+be much, much better.
+
+15
+00:00:55,505 --> 00:00:58,400
+It will probably
+match around maybe
+
+16
+00:00:58,400 --> 00:01:02,030
+thousand a's in something
+in half a minute.
+
+17
+00:01:02,030 --> 00:01:03,920
+But our third version in
+
+18
+00:01:03,920 --> 00:01:06,230
+comparison will be
+blindingly fast.
+
+19
+00:01:06,230 --> 00:01:08,180
+And we'll be able to match
+
+20
+00:01:08,180 --> 00:01:10,145
+strings of length 10,000 a's
+
+21
+00:01:10,145 --> 00:01:12,230
+and will not require
+
+22
+00:01:12,230 --> 00:01:14,975
+more than five seconds.
+So let's go ahead with that.
+
+23
+00:01:14,975 --> 00:01:18,095
+We will first implement
+
+24
+00:01:18,095 --> 00:01:19,880
+our regular expression
+matcher only
+
+25
+00:01:19,880 --> 00:01:22,069
+for the basic
+regular expressions.
+
+26
+00:01:22,069 --> 00:01:25,430
+So we will have only six
+cases to think about.
+
+27
+00:01:25,430 --> 00:01:27,620
+This will simplify matters at first.
+
+28
+00:01:27,620 --> 00:01:30,350
+Later we can
+think about how to
+
+29
+00:01:30,350 --> 00:01:34,100
+extend that to the extended
+regular expressions.
+
+30
+00:01:34,100 --> 00:01:37,625
+Unfortunately, before we can
+start our implementation,
+
+31
+00:01:37,625 --> 00:01:39,290
+we have to discuss
+
+32
+00:01:39,290 --> 00:01:42,470
+when two regular
+expressions are equivalent.
+
+33
+00:01:42,470 --> 00:01:46,595
+And we use here this notation
+of the triple equals.
+
+34
+00:01:46,595 --> 00:01:48,215
+We say a regular expression
+
+35
+00:01:48,215 --> 00:01:50,180
+r1 and r2 are
+
+36
+00:01:50,180 --> 00:01:56,059
+equivalent if and only
+if the language of r1 is
+
+37
+00:01:56,059 --> 00:01:59,360
+equal to the language of r2.
+
+38
+00:01:59,360 --> 00:02:02,915
+Sounds complicated,
+but essentially means
+
+39
+00:02:02,915 --> 00:02:04,700
+if the two regular expressions can
+
+40
+00:02:04,700 --> 00:02:06,950
+match exactly the same strings,
+
+41
+00:02:06,950 --> 00:02:09,800
+then we want to regard
+them as equivalent.
+
+42
+00:02:09,800 --> 00:02:14,300
+This equivalence justifies
+why we can be sloppy
+
+43
+00:02:14,300 --> 00:02:16,040
+with parentheses.
+
+44
+00:02:16,040 --> 00:02:23,870
+For example, if we have
+(r1 + r2) + r3,
+
+45
+00:02:23,870 --> 00:02:32,270
+then this will be equivalent
+to r1 + (r2 + r3).
+
+46
+00:02:32,270 --> 00:02:34,910
+Remember, regular
+expressions are trees,
+
+47
+00:02:34,910 --> 00:02:37,940
+so these are two different
+regular expressions,
+
+48
+00:02:37,940 --> 00:02:41,540
+but they can match
+the same strings.
+
+49
+00:02:41,540 --> 00:02:45,695
+Because, well, if we take
+here the meaning of that,
+
+50
+00:02:45,695 --> 00:02:54,350
+that would be just L(r1 + r2 + r3),
+
+
+51
+00:02:54,350 --> 00:03:04,100
+which is equal to L(r1 + r2) u L(r3).
+
+
+52
+00:03:04,100 --> 00:03:10,595
+And that is equal to of
+
+53
+00:03:10,595 --> 00:03:15,710
+L(r1) u L(r2) u L(r3).
+
+
+54
+00:03:15,710 --> 00:03:17,930
+And if you do the
+same calculation
+
+55
+00:03:17,930 --> 00:03:19,445
+for that regular expression,
+
+56
+00:03:19,445 --> 00:03:22,940
+you will find out the
+two sets are the same.
+
+57
+00:03:22,940 --> 00:03:26,195
+So both regular expressions
+can match the same strings.
+
+58
+00:03:26,195 --> 00:03:28,805
+So even if they're different
+regular expressions,
+
+59
+00:03:28,805 --> 00:03:31,220
+we can regard them as eqivalent.
+
+60
+00:03:31,220 --> 00:03:33,290
+And so that's why we can forget
+
+61
+00:03:33,290 --> 00:03:35,270
+about writing these parentheses.
+
+62
+00:03:35,270 --> 00:03:40,205
+Let's have a look
+at some more examples.
+
+63
+00:03:40,205 --> 00:03:41,870
+So the first one here,
+
+64
+00:03:41,870 --> 00:03:43,205
+that is now clear.
+
+65
+00:03:43,205 --> 00:03:45,200
+We did this calculation already
+
+66
+00:03:45,200 --> 00:03:47,480
+for arbitrary regular expressions.
+
+67
+00:03:47,480 --> 00:03:49,520
+Here it is for
+concrete ones a, b and c.
+
+68
+00:03:49,520 --> 00:03:52,690
+Here on the left-hand side,
+
+69
+00:03:52,690 --> 00:03:54,895
+the regular expression
+has the same meaning
+
+70
+00:03:54,895 --> 00:03:56,620
+as on the right-hand side.
+
+71
+00:03:56,620 --> 00:03:58,420
+So they are equivalent.
+
+72
+00:03:58,420 --> 00:04:01,375
+We can ignore the parentheses.
+
+73
+00:04:01,375 --> 00:04:03,220
+If I have a choice,
+
+74
+00:04:03,220 --> 00:04:06,610
+but the choice is
+the same a or a,
+
+75
+00:04:06,610 --> 00:04:09,265
+then this is
+equivalent to just a.
+
+76
+00:04:09,265 --> 00:04:13,075
+So the same choice doesn't
+introduce anything more.
+
+77
+00:04:13,075 --> 00:04:15,370
+In the next case, if I have
+
+78
+00:04:15,370 --> 00:04:19,450
+a regular expression
+which can match a or b,
+
+79
+00:04:19,450 --> 00:04:23,860
+that can match the same
+strings as b or a.
+
+80
+00:04:23,860 --> 00:04:27,325
+So you have a kind of
+commutativity for the plus,
+
+81
+00:04:27,325 --> 00:04:29,485
+which is like on natural numbers.
+
+82
+00:04:29,485 --> 00:04:32,880
+But you would not have a
+communitivity for the sequence:
+
+83
+00:04:32,880 --> 00:04:37,070
+if you have
+first a and then b,
+
+84
+00:04:37,070 --> 00:04:40,850
+that's not the same as
+matching first b and then a.
+
+85
+00:04:40,850 --> 00:04:42,800
+So for the sequence ...
+
+86
+00:04:42,800 --> 00:04:44,615
+they are not equivalent.
+
+87
+00:04:44,615 --> 00:04:49,475
+However, for the sequence I
+can, like for the plus, ignore
+
+88
+00:04:49,475 --> 00:04:51,245
+prarentheses.
+
+89
+00:04:51,245 --> 00:04:55,070
+If I have the parentheses
+and this way,
+
+90
+00:04:55,070 --> 00:04:57,785
+or I have it in this way.
+
+91
+00:04:57,785 --> 00:05:00,605
+These are two different
+regular expressions,
+
+92
+00:05:00,605 --> 00:05:02,120
+but they have the same meaning.
+
+93
+00:05:02,120 --> 00:05:03,815
+So they are equivalent.
+
+94
+00:05:03,815 --> 00:05:05,780
+And so I can leave out parentheses
+
+95
+00:05:05,780 --> 00:05:09,170
+for times as well.
+
+96
+00:05:09,170 --> 00:05:12,185
+The next one is a slightly
+more interesting one.
+
+97
+00:05:12,185 --> 00:05:15,020
+If I have a regular
+expression which can match
+
+98
+00:05:15,020 --> 00:05:19,655
+c first followed by (a + b),
+
+99
+00:05:19,655 --> 00:05:25,355
+then this is the same as
+first c followed by a,
+
+100
+00:05:25,355 --> 00:05:28,640
+or c followed by b.
+
+101
+00:05:28,640 --> 00:05:31,265
+So that's the kind of
+distributivity law
+
+102
+00:05:31,265 --> 00:05:33,545
+on regular expressions.
+
+103
+00:05:33,545 --> 00:05:36,350
+But they're also regular expressions
+which are not equivalent.
+
+104
+00:05:36,350 --> 00:05:38,990
+If I have the regular
+expression which can
+
+105
+00:05:38,990 --> 00:05:42,800
+match the string containing
+exactly two a's.
+
+106
+00:05:42,800 --> 00:05:44,240
+That is not equivalent
+
+107
+00:05:44,240 --> 00:05:46,730
+to the regular
+expression which can just match
+
+108
+00:05:46,730 --> 00:05:49,250
+a single a. And similarly
+
+109
+00:05:49,250 --> 00:05:51,680
+in this case, if I can match
+
+110
+00:05:51,680 --> 00:05:56,075
+a or the string b followed by c,
+
+111
+00:05:56,075 --> 00:05:59,825
+that is not the same as a or b,
+
+112
+00:05:59,825 --> 00:06:02,000
+followed by a or c.
+
+113
+00:06:02,000 --> 00:06:05,990
+I'll let you think about
+why is that not equivalent.
+
+114
+00:06:05,990 --> 00:06:08,060
+Essentially you
+have to find out what's
+
+115
+00:06:08,060 --> 00:06:10,475
+the meaning of both
+regular expressions.
+
+116
+00:06:10,475 --> 00:06:14,090
+And are they the
+same sets or not?
+
+117
+00:06:14,090 --> 00:06:17,435
+There are again some
+interesting corner cases.
+
+118
+00:06:17,435 --> 00:06:20,540
+If I have a regular expression
+that can match a,
+
+119
+00:06:20,540 --> 00:06:23,450
+but followed by the regular
+expression which cannot match
+
+120
+00:06:23,450 --> 00:06:25,670
+anything, that is not equivalent
+
+121
+00:06:25,670 --> 00:06:29,255
+to the regular
+expression which can match a.
+
+122
+00:06:29,255 --> 00:06:31,340
+Again here you have
+to do the calculation
+
+123
+00:06:31,340 --> 00:06:32,915
+to convince you of that.
+
+124
+00:06:32,915 --> 00:06:35,840
+Similarly, if I have a regular
+expression which can
+
+125
+00:06:35,840 --> 00:06:38,750
+match a or the empty string,
+
+126
+00:06:38,750 --> 00:06:40,640
+this is not equivalent to
+
+127
+00:06:40,640 --> 00:06:43,355
+the regular expression
+which can just match a.
+
+128
+00:06:43,355 --> 00:06:46,925
+Here are some interesting
+ones with zeros and ones.
+
+129
+00:06:46,925 --> 00:06:48,860
+So if I have the regular expression
+
+130
+00:06:48,860 --> 00:06:50,975
+that can match the empty string,
+
+131
+00:06:50,975 --> 00:06:53,540
+this will be actually equivalent to
+
+132
+00:06:53,540 --> 00:06:56,435
+the regular expression
+which can match nothing,
+
+133
+00:06:56,435 --> 00:06:59,405
+but star of that.
+
+134
+00:06:59,405 --> 00:07:01,970
+Remember the star
+always introduces,
+
+135
+00:07:01,970 --> 00:07:04,370
+no matter what, the empty string.
+
+136
+00:07:04,370 --> 00:07:06,170
+So this regular expression can match
+
+137
+00:07:06,170 --> 00:07:08,930
+the empty string,
+just like the 1.
+
+138
+00:07:08,930 --> 00:07:12,125
+So these two expressions
+are not the same,
+
+139
+00:07:12,125 --> 00:07:14,210
+but they are equivalent.
+
+140
+00:07:14,210 --> 00:07:16,700
+And it doesn't matter
+whether you take
+
+141
+00:07:16,700 --> 00:07:20,090
+the empty string to the star,
+
+142
+00:07:20,090 --> 00:07:23,855
+because if I can match 0 or
+more times the empty string,
+
+143
+00:07:23,855 --> 00:07:27,450
+that will be equivalent to
+just the empty string.
+
+144
+00:07:27,520 --> 00:07:32,510
+Slightly similar to the
+third case is this one.
+
+145
+00:07:32,510 --> 00:07:35,720
+Zero star is not equivalent to 0
+
+146
+00:07:35,720 --> 00:07:40,025
+because that can match
+exactly the empty string.
+
+147
+00:07:40,025 --> 00:07:42,740
+This cannot match anything.
+
+148
+00:07:42,740 --> 00:07:44,839
+While the previous
+
+149
+00:07:44,839 --> 00:07:47,540
+equivalences are all very
+instructive to look at,
+
+150
+00:07:47,540 --> 00:07:49,910
+these are the
+equivalences we need
+
+151
+00:07:49,910 --> 00:07:52,685
+in our regular expression matchers.
+
+152
+00:07:52,685 --> 00:07:54,350
+And they are defined for
+
+153
+00:07:54,350 --> 00:07:58,160
+all regular expressions r.
+So r plus 0,
+
+154
+00:07:58,160 --> 00:08:00,470
+no matter what r looks
+like is equivalent
+
+155
+00:08:00,470 --> 00:08:02,945
+to just r. Similarly
+
+156
+00:08:02,945 --> 00:08:05,930
+0 plus r is also
+equivalent to r.
+
+157
+00:08:05,930 --> 00:08:08,525
+The order of this
+choice doesn't matter;
+
+158
+00:08:08,525 --> 00:08:11,495
+r followed by 1,
+
+159
+00:08:11,495 --> 00:08:14,030
+that's the sequence
+regular expression, is
+
+160
+00:08:14,030 --> 00:08:16,760
+equivalent to r. The
+sam, r followed by
+
+161
+00:08:16,760 --> 00:08:19,010
+r is equivalent to r.
+
+162
+00:08:19,010 --> 00:08:20,990
+And r followed by
+
+163
+00:08:20,990 --> 00:08:23,390
+the regular expression which
+cannot match anything,
+
+164
+00:08:23,390 --> 00:08:27,455
+is equivalent to just 0.
+
+165
+00:08:27,455 --> 00:08:30,740
+And 0 followed by r is also equivalent to 0.
+
+166
+00:08:30,740 --> 00:08:33,615
+And if you have the
+choice of r plus r,
+
+167
+00:08:33,615 --> 00:08:37,210
+that is also
+equivalent to just r.
+
+168
+00:08:37,210 --> 00:08:40,270
+What we're going to do
+with these equivalences is
+
+169
+00:08:40,270 --> 00:08:42,820
+that we regard them as
+simplification rules.
+
+170
+00:08:42,820 --> 00:08:43,930
+So whenever we see
+
+171
+00:08:43,930 --> 00:08:46,465
+this regular expression
+in our algorithm,
+
+172
+00:08:46,465 --> 00:08:48,580
+we will replace it by
+the right-hand side.
+
+173
+00:08:48,580 --> 00:08:51,715
+So we will orient
+these equivalences.
+
+174
+00:08:51,715 --> 00:08:53,650
+If we see, this regular expression,
+
+175
+00:08:53,650 --> 00:08:55,225
+we will replace it by that one.
+
+176
+00:08:55,225 --> 00:08:58,945
+And it will not matter
+because the left-hand sides
+
+177
+00:08:58,945 --> 00:09:01,255
+can match exactly
+the same strings
+
+178
+00:09:01,255 --> 00:09:03,475
+as the right-hand sides.
+
+179
+00:09:03,475 --> 00:09:06,370
+Here two quick examples.
+
+180
+00:09:06,370 --> 00:09:08,680
+The first one, let's
+assume you have
+
+181
+00:09:08,680 --> 00:09:10,270
+a regular expression r and
+
+182
+00:09:10,270 --> 00:09:11,905
+there is something
+in front of it.
+
+183
+00:09:11,905 --> 00:09:13,720
+This "something in front of it"
+
+184
+00:09:13,720 --> 00:09:15,870
+can be simplified as follows.
+
+185
+00:09:15,870 --> 00:09:20,000
+This 1 times b
+can be simplified to b.
+
+186
+00:09:20,000 --> 00:09:23,555
+Then this b plus 0 can
+be simplified to b.
+
+187
+00:09:23,555 --> 00:09:25,490
+And assuming that nothing can
+
+188
+00:09:25,490 --> 00:09:27,470
+be simplified inside this r,
+
+189
+00:09:27,470 --> 00:09:28,700
+then this would be
+
+190
+00:09:28,700 --> 00:09:33,050
+the simplified version
+of this regular expression.
+
+191
+00:09:33,050 --> 00:09:34,820
+And because the simplification
+
+192
+00:09:34,820 --> 00:09:36,965
+rules preserve what can be matched,
+
+193
+00:09:36,965 --> 00:09:39,080
+we can be sure that
+this regular expression
+
+194
+00:09:39,080 --> 00:09:41,255
+can match exactly the strings
+
+195
+00:09:41,255 --> 00:09:43,940
+this regular expression can match.
+
+196
+00:09:43,940 --> 00:09:45,740
+Here is the other example.
+
+197
+00:09:45,740 --> 00:09:49,550
+This has just a tiny change
+that this becomes here as 0.
+
+198
+00:09:49,550 --> 00:09:54,485
+Then 0 times b can be
+replaced by just 0.
+
+199
+00:09:54,485 --> 00:09:56,705
+Then they are actually two
+rules which could apply
+
+200
+00:09:56,705 --> 00:09:59,014
+either that we have
+the same choice
+
+201
+00:09:59,014 --> 00:10:01,160
+or we can just say something plus
+
+202
+00:10:01,160 --> 00:10:04,100
+0 will always go to something.
+
+203
+00:10:04,100 --> 00:10:06,245
+So we can simplify it to that.
+
+204
+00:10:06,245 --> 00:10:08,510
+And then we have
+0 times r again,
+
+205
+00:10:08,510 --> 00:10:10,460
+and that can be simplified to 0.
+
+206
+00:10:10,460 --> 00:10:12,080
+So actually what we find out with
+
+207
+00:10:12,080 --> 00:10:14,645
+this calculation is that
+this regular expression,
+
+208
+00:10:14,645 --> 00:10:16,835
+even if it looks
+quite complicated,
+
+209
+00:10:16,835 --> 00:10:19,295
+actually doesn't
+match any string.
+
+210
+00:10:19,295 --> 00:10:21,290
+Because it matches exactly
+
+211
+00:10:21,290 --> 00:10:23,420
+those string this regular
+expression can match.
+
+212
+00:10:23,420 --> 00:10:26,285
+And this reg expression
+cannot match any.
+
+213
+00:10:26,285 --> 00:10:29,930
+We need one more
+operation on languages.
+
+214
+00:10:29,930 --> 00:10:31,700
+I call this operation
+
+215
+00:10:31,700 --> 00:10:35,165
+the semantic derivative
+of a language.
+
+216
+00:10:35,165 --> 00:10:37,775
+This operation takes
+two arguments.
+
+217
+00:10:37,775 --> 00:10:40,670
+It takes a character
+which we take
+
+218
+00:10:40,670 --> 00:10:43,925
+the semantic derivative
+according to,
+
+219
+00:10:43,925 --> 00:10:45,980
+and it takes a language.
+
+220
+00:10:45,980 --> 00:10:49,579
+And what it does is it
+looks into this language
+
+221
+00:10:49,579 --> 00:10:51,365
+and looks for all the strings
+
+222
+00:10:51,365 --> 00:10:53,735
+that start with this character,
+
+223
+00:10:53,735 --> 00:10:56,405
+let's say c, and then
+
+224
+00:10:56,405 --> 00:11:00,920
+just strips off that c.
+So here's an example.
+
+225
+00:11:00,920 --> 00:11:02,975
+Suppose you have a language A,
+
+226
+00:11:02,975 --> 00:11:04,460
+with the strings
+
+227
+00:11:04,460 --> 00:11:06,965
+foo, bar and frak.
+
+228
+00:11:06,965 --> 00:11:10,835
+And you take the semantic
+derivative according to f,
+
+229
+00:11:10,835 --> 00:11:14,450
+then we discard all the
+strings which do not
+
+230
+00:11:14,450 --> 00:11:18,320
+start with an f. So
+bar will be discarded.
+
+231
+00:11:18,320 --> 00:11:22,850
+Foo and frac start with
+an f. So we keep them
+
+232
+00:11:22,850 --> 00:11:25,265
+but we strip off the first f.
+
+233
+00:11:25,265 --> 00:11:29,045
+So here we have only oo and rak.
+
+234
+00:11:29,045 --> 00:11:31,609
+If you take the
+semantic derivative
+
+235
+00:11:31,609 --> 00:11:33,830
+of that language according to b,
+
+236
+00:11:33,830 --> 00:11:37,190
+then we will discard foo and
+frak because they don't
+
+237
+00:11:37,190 --> 00:11:40,940
+start with b, and just keep bar.
+
+238
+00:11:40,940 --> 00:11:44,585
+But again, we have to
+strip off this first b.
+
+239
+00:11:44,585 --> 00:11:49,305
+And if you take the semantic
+derivative according to a,
+
+240
+00:11:49,305 --> 00:11:52,585
+then none of these
+strings start with a.
+
+241
+00:11:52,585 --> 00:11:56,800
+So this will be defined
+as just the empty set.
+
+242
+00:11:56,800 --> 00:11:59,695
+You can slightly
+generalized this
+
+243
+00:11:59,695 --> 00:12:02,560
+semantic derivative
+to also strings.
+
+244
+00:12:02,560 --> 00:12:05,170
+So imagine you have
+a language A and you
+
+245
+00:12:05,170 --> 00:12:08,274
+build the semantic derivative
+according to a string s.
+
+246
+00:12:08,274 --> 00:12:10,990
+Then you look in
+this language and
+
+247
+00:12:10,990 --> 00:12:13,420
+you look for all the
+strings that start with
+
+248
+00:12:13,420 --> 00:12:19,555
+this s. But you strip
+off that s. Ok?
+
+249
+00:12:19,555 --> 00:12:23,830
+So if you have a string
+starting with the s,
+
+250
+00:12:23,830 --> 00:12:26,065
+then you keep that string,
+
+251
+00:12:26,065 --> 00:12:27,490
+but you keep only the rest...
+
+252
+00:12:27,490 --> 00:12:28,810
+what's coming after this s.
+
+253
+00:12:28,810 --> 00:12:32,010
+That is here indicated
+with this s'.
+
+254
+00:12:32,010 --> 00:12:35,330
+I also have this convention,
+this personal convention
+
+255
+00:12:35,330 --> 00:12:40,055
+I have to say, that everything
+which works on lists,
+
+256
+00:12:40,055 --> 00:12:42,185
+remember strings are
+lists of characters.
+
+257
+00:12:42,185 --> 00:12:46,655
+I just put there an 's'. So
+here's the one for characters.
+
+258
+00:12:46,655 --> 00:12:48,680
+I just call it Der with a capital
+
+259
+00:12:48,680 --> 00:12:51,740
+D. And I call that Ders,
+
+260
+00:12:51,740 --> 00:12:54,350
+because that works over strings.
+
+261
+00:12:54,350 --> 00:12:55,865
+And you can see how it would
+
+262
+00:12:55,865 --> 00:12:59,750
+be defined if you only take this
+as a primitive operation.
+
+263
+00:12:59,750 --> 00:13:01,340
+We would just need to iterate
+
+264
+00:13:01,340 --> 00:13:04,050
+that essentially for this Ders.
+
+265
+00:13:04,060 --> 00:13:07,970
+Ok, we now have all
+the theory in place.
+
+266
+00:13:07,970 --> 00:13:11,345
+And we can finally start
+implementing our algorithm.
+
+267
+00:13:11,345 --> 00:13:14,580
+That's when we'll do
+in the next video.