# HG changeset patch # User Christian Urban # Date 1601380327 -3600 # Node ID e8402d8ec8e6392e9a7a6af7a05926d08f3e200f # Parent b294cfbb5c01536c670099533a8d56c4acf19357 updated diff -r b294cfbb5c01 -r e8402d8ec8e6 cws/cw01.pdf Binary file cws/cw01.pdf has changed diff -r b294cfbb5c01 -r e8402d8ec8e6 cws/cw02.pdf Binary file cws/cw02.pdf has changed diff -r b294cfbb5c01 -r e8402d8ec8e6 cws/cw03.pdf Binary file cws/cw03.pdf has changed diff -r b294cfbb5c01 -r e8402d8ec8e6 cws/cw04.pdf Binary file cws/cw04.pdf has changed diff -r b294cfbb5c01 -r e8402d8ec8e6 cws/cw05.pdf Binary file cws/cw05.pdf has changed diff -r b294cfbb5c01 -r e8402d8ec8e6 style.sty --- a/style.sty Tue Sep 29 00:51:43 2020 +0100 +++ b/style.sty Tue Sep 29 12:52:07 2020 +0100 @@ -99,9 +99,9 @@ % CW deadlines -\def\cwONE{11 October} -\def\cwTWO{4 November} -\def\cwTHREE{22 November} +\def\cwONE{16 October} +\def\cwTWO{23 November} +\def\cwTHREE{20 November} \def\cwFOUR{11 December} \def\cwFIVE{15 January} diff -r b294cfbb5c01 -r e8402d8ec8e6 videos/01-basics2.srt --- /dev/null Thu Jan 01 00:00:00 1970 +0000 +++ b/videos/01-basics2.srt Tue Sep 29 12:52:07 2020 +0100 @@ -0,0 +1,1627 @@ +1 +00:00:05,810 --> 00:00:11,039 +Okay, so we have strings. +The empty string and string "abc". + +2 +00:00:11,039 --> 00:00:13,200 +And we can take the head + +3 +00:00:13,200 --> 00:00:15,615 +of a string and the +tail of the string, + +4 +00:00:15,615 --> 00:00:18,480 +since we regard them as +lists of characters. + +5 +00:00:18,480 --> 00:00:22,230 +We also introduced the +notion of a language. + +6 +00:00:22,230 --> 00:00:25,305 +A language is just +a set of strings. + +7 +00:00:25,305 --> 00:00:26,700 +So here's a language. + +8 +00:00:26,700 --> 00:00:28,275 +It contains the empty string, + +9 +00:00:28,275 --> 00:00:30,360 +the string hello, string foobar, + +10 +00:00:30,360 --> 00:00:31,965 +the string, which is + +11 +00:00:31,965 --> 00:00:34,874 +just a character a and +the string abc. + +12 +00:00:34,874 --> 00:00:37,800 +There will be also other +languages, which for example, + +13 +00:00:37,800 --> 00:00:39,780 +contain infinitely many strings. + +14 +00:00:39,780 --> 00:00:42,190 +Actually that will +happen quite often. + +15 +00:00:42,190 --> 00:00:45,560 +Now, you've seen this operation + +16 +00:00:45,560 --> 00:00:47,660 +of concatenation of two strings. + +17 +00:00:47,660 --> 00:00:50,840 +So if you have the string +foo and a string bar, + +18 +00:00:50,840 --> 00:00:53,255 +we can just put them +next to each other. + +19 +00:00:53,255 --> 00:00:57,725 +We will now extend that +notion also told languages. + +20 +00:00:57,725 --> 00:01:02,270 +So assume you have a +language A and a language B. + +21 +00:01:02,270 --> 00:01:05,825 +That means we take every +string from the language + +22 +00:01:05,825 --> 00:01:11,195 +A and concatenate it with every +string of the language B. + +23 +00:01:11,195 --> 00:01:13,160 +Here's an example. + +24 +00:01:13,160 --> 00:01:14,750 +If you have the language A + +25 +00:01:14,750 --> 00:01:17,030 +containing foo and bar as strings, + +26 +00:01:17,030 --> 00:01:20,585 +and the language B containing +a and b as strings. + +27 +00:01:20,585 --> 00:01:23,000 +Then concatenating +these two languages + +28 +00:01:23,000 --> 00:01:27,110 +means I take foo with this a and b, + +29 +00:01:27,110 --> 00:01:28,625 +giving fooa and foob, + +30 +00:01:28,625 --> 00:01:30,905 +and I take bar and +concatenated with + +31 +00:01:30,905 --> 00:01:34,160 +a and b, giving bara and barb. + +32 +00:01:34,160 --> 00:01:36,185 +So the pointer is we're + +33 +00:01:36,185 --> 00:01:39,110 +overloading this notion +of concatenating + +34 +00:01:39,110 --> 00:01:45,020 +two strings also to +concatenating two languages. + +35 +00:01:45,020 --> 00:01:48,890 +Okay, here are two corner +cases of this definition. + +36 +00:01:48,890 --> 00:01:51,395 +What happens if I +have any language, + +37 +00:01:51,395 --> 00:01:54,470 +say A, and I concatenate it with + +38 +00:01:54,470 --> 00:01:58,640 +the language which contains +only the empty string. + +39 +00:01:58,640 --> 00:02:02,270 +And secondly, if I +have any language, + +40 +00:02:02,270 --> 00:02:04,220 +say again A, and I + +41 +00:02:04,220 --> 00:02:05,870 +concatenate it with the language + +42 +00:02:05,870 --> 00:02:08,345 +which doesn't contain any string. + +43 +00:02:08,345 --> 00:02:11,720 +How would these two +cases be defined? + +44 +00:02:11,720 --> 00:02:16,055 +Remember, this language here +contains a single element, + +45 +00:02:16,055 --> 00:02:17,645 +namely the empty string, + +46 +00:02:17,645 --> 00:02:20,210 +while this language + +47 +00:02:20,210 --> 00:02:23,585 +does not contain any +string. It is empty. + +48 +00:02:23,585 --> 00:02:26,930 +So these are two +different languages. + +49 +00:02:26,930 --> 00:02:29,960 +Think about how this +would be defined. + +50 +00:02:29,960 --> 00:02:33,455 +So what's the point of +all these definitions? + +51 +00:02:33,455 --> 00:02:36,110 +Well, we want to make precise + +52 +00:02:36,110 --> 00:02:38,735 +what is the meaning of +a regular expression. + +53 +00:02:38,735 --> 00:02:41,510 +For this, we use +this function L. It + +54 +00:02:41,510 --> 00:02:44,180 +will take a regular expression as + +55 +00:02:44,180 --> 00:02:47,150 +argument and generates a + +56 +00:02:47,150 --> 00:02:49,970 +set of strings or a language. + +57 +00:02:49,970 --> 00:02:52,670 +And it's supposed to be the +language which contains + +58 +00:02:52,670 --> 00:02:56,330 +all the strings this regular +expression can match. + +59 +00:02:56,330 --> 00:02:58,670 +So remember the 0 +regular expression, + +60 +00:02:58,670 --> 00:03:01,115 +it's not supposed to +match any string. + +61 +00:03:01,115 --> 00:03:05,105 +So we give it the empty +language or empty set. + +62 +00:03:05,105 --> 00:03:06,740 +This regular expression, + +63 +00:03:06,740 --> 00:03:09,380 +the one regular expression +is supposed to match + +64 +00:03:09,380 --> 00:03:13,445 +exactly one string, +namely the empty string. + +65 +00:03:13,445 --> 00:03:15,710 +the regular expression + +66 +00:03:15,710 --> 00:03:18,875 +recognizing just the +character c, in this case. + +67 +00:03:18,875 --> 00:03:22,820 +Well, that will be the +language which contains + +68 +00:03:22,820 --> 00:03:28,175 +the string only containing +the single character c. + +69 +00:03:28,175 --> 00:03:31,295 +Now, what is this regular +expression supposed to match? + +70 +00:03:31,295 --> 00:03:34,835 +Well, a string is matched +by this regular expression + +71 +00:03:34,835 --> 00:03:37,355 +if it is matched by +the first component, r1, + +72 +00:03:37,355 --> 00:03:40,970 +or by the second component, r2. + +73 +00:03:40,970 --> 00:03:43,190 +That's why this plus +is the alternative. + +74 +00:03:43,190 --> 00:03:45,605 +So which are the strings + +75 +00:03:45,605 --> 00:03:48,320 +this regular expression +can match? Well, + +76 +00:03:48,320 --> 00:03:51,275 +all the strings +r1 one can match, + +77 +00:03:51,275 --> 00:03:54,170 +given by L(r1) union... + +78 +00:03:54,170 --> 00:03:57,380 +all the strings +R2 can match. + +79 +00:03:57,380 --> 00:04:01,325 +And we use here the union +because this alternative, + +80 +00:04:01,325 --> 00:04:05,945 +even if it says this string +is matched by r1 or by r2. + +81 +00:04:05,945 --> 00:04:08,465 +The meaning of this +reg expression + +82 +00:04:08,465 --> 00:04:11,209 +is the union of both languages. + +83 +00:04:11,209 --> 00:04:14,315 +Now the sequence case. + +84 +00:04:14,315 --> 00:04:17,960 +This means a string is +matched by this regular expression + +85 +00:04:17,960 --> 00:04:20,510 +if the first part of this +string is matched by r1 + +86 +00:04:20,510 --> 00:04:24,935 +and the second part of +the string is matched by r2. + +87 +00:04:24,935 --> 00:04:28,490 +Now we have to say, which +are all the strings + +88 +00:04:28,490 --> 00:04:31,645 +this regular expression +can match. Well, it's + +89 +00:04:31,645 --> 00:04:34,495 +all the strings +r1 can match, + +90 +00:04:34,495 --> 00:04:39,205 +concatenated with all the +strings r2 can match. + +91 +00:04:39,205 --> 00:04:42,895 +So for this, we have to use +this concatenation operation. + +92 +00:04:42,895 --> 00:04:47,440 +So if r1 can match a string +and r2 can match a string, + +93 +00:04:47,440 --> 00:04:51,220 +then in the meaning will be +the concatenation of that. + +94 +00:04:51,220 --> 00:04:53,170 +So with this function L + +95 +00:04:53,170 --> 00:04:56,995 +we can make precise +what are the strings, + +96 +00:04:56,995 --> 00:05:00,205 +*all* the strings a regular +expression can match. + +97 +00:05:00,205 --> 00:05:04,689 +The only case we haven't +specified yet is the r*. + +98 +00:05:04,689 --> 00:05:07,750 +For this, we need +another operation + +99 +00:05:07,750 --> 00:05:12,470 +on languages or sets of +strings, which we do next. + +100 +00:05:12,490 --> 00:05:17,285 +We introduce a power +operation for languages. + +101 +00:05:17,285 --> 00:05:19,100 +The power operation or + +102 +00:05:19,100 --> 00:05:22,835 +the nth power is +defined recursively. + +103 +00:05:22,835 --> 00:05:26,615 +So the A to the power of 0 + +104 +00:05:26,615 --> 00:05:31,205 +would be defined as the set +containing the empty string. + +105 +00:05:31,205 --> 00:05:36,200 +And A to the power of +n + 1 would be A + +106 +00:05:36,200 --> 00:05:38,705 +concatenated with +A to the power of + +107 +00:05:38,705 --> 00:05:42,160 +n. Here are a few examples. + +108 +00:05:42,160 --> 00:05:45,380 +A to the power of four +would be essentially + +109 +00:05:45,380 --> 00:05:47,660 +A concatenated with +A concatenated with + +110 +00:05:47,660 --> 00:05:49,640 +A concatenated with A, + +111 +00:05:49,640 --> 00:05:51,650 +and also concatenated with + +112 +00:05:51,650 --> 00:05:55,580 +the language which +contains the empty string. + +113 +00:05:55,580 --> 00:05:57,275 +Because that's how we define + +114 +00:05:57,275 --> 00:05:59,975 +the base case, A +to the power of 0. + +115 +00:05:59,975 --> 00:06:03,410 +And A to the power +one would be just A, + +116 +00:06:03,410 --> 00:06:05,330 +again followed by that one, + +117 +00:06:05,330 --> 00:06:07,790 +but this would be just A. + +118 +00:06:07,790 --> 00:06:10,385 +And A to the power 0 + +119 +00:06:10,385 --> 00:06:14,270 +is this language which +contains the empty string. + +120 +00:06:14,270 --> 00:06:16,670 +With this power operation, + +121 +00:06:16,670 --> 00:06:19,505 +I can also fill in this case. + +122 +00:06:19,505 --> 00:06:23,210 +Remember this r* operation +is supposed to match + +123 +00:06:23,210 --> 00:06:27,680 +a string with either +0 or more copies of r. + +124 +00:06:27,680 --> 00:06:31,940 +So the meaning of this +regular expression would be + +125 +00:06:31,940 --> 00:06:37,475 +now the Union of all the +powers greater or equal than 0, + +126 +00:06:37,475 --> 00:06:40,835 +of the language that r can match. + +127 +00:06:40,835 --> 00:06:44,945 +And we take now at the +union of all these powers, + +128 +00:06:44,945 --> 00:06:47,150 +that means essentially what + +129 +00:06:47,150 --> 00:06:48,530 +can the regular expression match + +130 +00:06:48,530 --> 00:06:51,440 +if I take L of r +to the 0th power, + +131 +00:06:51,440 --> 00:06:53,030 +what can I match with + +132 +00:06:53,030 --> 00:06:57,305 +the first power, the +second power, and so on. + +133 +00:06:57,305 --> 00:07:00,544 +And I take the union +of all these powers. + +134 +00:07:00,544 --> 00:07:03,830 +That's the definition +of which strings + +135 +00:07:03,830 --> 00:07:05,510 +this regular expression +is supposed to + +136 +00:07:05,510 --> 00:07:08,510 +match. The meaning of +this regular expression. + +137 +00:07:08,510 --> 00:07:11,300 +This is such an +important definition, + +138 +00:07:11,300 --> 00:07:13,250 +that there's actually +a name for it. + +139 +00:07:13,250 --> 00:07:15,020 +It's called the Kleene star. + +140 +00:07:15,020 --> 00:07:16,400 +And it's written like this. + +141 +00:07:16,400 --> 00:07:18,335 +If I have a language A, + +142 +00:07:18,335 --> 00:07:21,785 +I can build the star +of this language + +143 +00:07:21,785 --> 00:07:26,255 +by union up all the powers +greater or equal than 0. + +144 +00:07:26,255 --> 00:07:28,580 +The A-star of + +145 +00:07:28,580 --> 00:07:31,580 +a language would expand +to a to the power 0, + +146 +00:07:31,580 --> 00:07:33,770 +union A to the power of one, + +147 +00:07:33,770 --> 00:07:36,665 +A to the power of two, and so on. + +148 +00:07:36,665 --> 00:07:39,230 +And it would +essentially mean, well, + +149 +00:07:39,230 --> 00:07:43,460 +it's the language which contains +the empty string because + +150 +00:07:43,460 --> 00:07:48,290 +of the A to the 0, one copy of A, + +151 +00:07:48,290 --> 00:07:51,020 +two concatenated copies of A, + +152 +00:07:51,020 --> 00:07:55,070 +three concatenated +copies of A, and so on. + +153 +00:07:55,070 --> 00:07:59,060 +So that's what this A-star +is defined as. + +154 +00:07:59,060 --> 00:08:03,725 +Essentially as the Union +of all the powers. + +155 +00:08:03,725 --> 00:08:05,990 +And it's essentially +each power is how many + +156 +00:08:05,990 --> 00:08:08,750 +times I have to +concatenate this language A. + +157 +00:08:08,750 --> 00:08:13,549 +And note that this language +A-star will always contain + +158 +00:08:13,549 --> 00:08:21,240 +the empty string because that +is how the A^0 is defined. + +159 +00:08:21,310 --> 00:08:23,540 +So in this definition, + +160 +00:08:23,540 --> 00:08:25,969 +I could have also written star + +161 +00:08:25,969 --> 00:08:29,180 +here because the +meaning of this r*, + +162 +00:08:29,180 --> 00:08:34,055 +regular expression +is the Kleene star of + +163 +00:08:34,055 --> 00:08:37,130 +the language of r. +Remember that's + +164 +00:08:37,130 --> 00:08:41,525 +the union of all powers +greater or equal than 0. + +165 +00:08:41,525 --> 00:08:43,640 +It's behind. +I hope you don't get + +166 +00:08:43,640 --> 00:08:45,800 +confused by the use of the stars. + +167 +00:08:45,800 --> 00:08:48,845 +This star is for +the regular expressions. + +168 +00:08:48,845 --> 00:08:52,205 +That star is for languages. + +169 +00:08:52,205 --> 00:08:54,274 +They are two +different operations. + +170 +00:08:54,274 --> 00:08:56,555 +They don't even +have the same type. + +171 +00:08:56,555 --> 00:08:58,954 +Here might be +something interesting. + +172 +00:08:58,954 --> 00:09:00,560 +Remember I asked you to think + +173 +00:09:00,560 --> 00:09:03,035 +about these two corner cases. + +174 +00:09:03,035 --> 00:09:04,730 +I hope you have done so, + +175 +00:09:04,730 --> 00:09:07,070 +but let's also have look +at this together. + +176 +00:09:07,070 --> 00:09:09,785 +According to the definition + +177 +00:09:09,785 --> 00:09:11,930 +for this append of languages, + +178 +00:09:11,930 --> 00:09:13,670 +I have to take every string in + +179 +00:09:13,670 --> 00:09:16,130 +this set and have + +180 +00:09:16,130 --> 00:09:18,695 +two concatenated with +every string in that set. + +181 +00:09:18,695 --> 00:09:22,115 +So this set contains only +the empty string as element. + +182 +00:09:22,115 --> 00:09:24,820 +So if I concatenate anything in + +183 +00:09:24,820 --> 00:09:27,745 +there with the empty string, +that will be left untouched. + +184 +00:09:27,745 --> 00:09:31,855 +So this one will be +actually A. + +185 +00:09:31,855 --> 00:09:34,600 +This one I have to +take every string in + +186 +00:09:34,600 --> 00:09:36,190 +this language and I have to + +187 +00:09:36,190 --> 00:09:39,115 +concatenate with every +string in that language. + +188 +00:09:39,115 --> 00:09:41,110 +But here isn't any string. + +189 +00:09:41,110 --> 00:09:43,300 +So I can't concatenate that. + +190 +00:09:43,300 --> 00:09:46,900 +That will be actually +the empty set (not empty string). + +191 +00:09:46,900 --> 00:09:49,570 +So now let's have + +192 +00:09:49,570 --> 00:09:51,670 +a look at regular expressions. + +193 +00:09:51,670 --> 00:09:53,230 +That was with languages. + +194 +00:09:53,230 --> 00:09:56,035 +But let's have a look at +regular expressions now. + +195 +00:09:56,035 --> 00:09:58,660 +What would be the +meaning, for example, + +196 +00:09:58,660 --> 00:10:01,945 +of r followed by 1? + +197 +00:10:01,945 --> 00:10:04,149 +That is a regular expression. + +198 +00:10:04,149 --> 00:10:06,255 +And it essentially says: + +199 +00:10:06,255 --> 00:10:07,850 +this regular expression matches + +200 +00:10:07,850 --> 00:10:10,685 +some strings and this matches +the empty string. + +201 +00:10:10,685 --> 00:10:13,730 +So if you find out +what the meaning is, + +202 +00:10:13,730 --> 00:10:16,955 +we would apply this +L-function to it. + +203 +00:10:16,955 --> 00:10:19,430 +And if we now look +up the definition, + +204 +00:10:19,430 --> 00:10:27,360 +that would be L of r +appended L of 1. + +205 +00:10:27,850 --> 00:10:32,255 +If you look up what +this is defined, + +206 +00:10:32,255 --> 00:10:41,640 +then this would be L of r +appended with this language. + +207 +00:10:41,950 --> 00:10:46,325 +And if you now look up +our definition earlier, + +208 +00:10:46,325 --> 00:10:50,455 +that will be just L of r. + +209 +00:10:50,455 --> 00:10:52,690 +What that essentially + +210 +00:10:52,690 --> 00:10:55,765 +means is if you have +this regular expression, + +211 +00:10:55,765 --> 00:10:58,885 +this will match +exactly those strings + +212 +00:10:58,885 --> 00:11:01,000 +which this regular +expression can match. + +213 +00:11:01,000 --> 00:11:04,794 +And that pretty much +coincides with our intuition. + +214 +00:11:04,794 --> 00:11:05,950 +This is supposed to match + +215 +00:11:05,950 --> 00:11:08,410 +the empty string at +the end of the string, + +216 +00:11:08,410 --> 00:11:11,005 +so it doesn't really +make any difference. + +217 +00:11:11,005 --> 00:11:13,480 +And the meaning +really tells us that. + +218 +00:11:13,480 --> 00:11:15,880 +But here's the +interesting thing. + +219 +00:11:15,880 --> 00:11:17,979 +When you were in school, + +220 +00:11:17,979 --> 00:11:23,124 +how would you think +about this expression? + +221 +00:11:23,124 --> 00:11:29,515 +Well, r times 1 +equals just our, okay? + +222 +00:11:29,515 --> 00:11:33,634 +Furthermore, if you had r times 0. + +223 +00:11:33,634 --> 00:11:35,045 +What would that be equal? + +224 +00:11:35,045 --> 00:11:37,205 +Well, it would be 0. + +225 +00:11:37,205 --> 00:11:39,650 +Ok, let's have + +226 +00:11:39,650 --> 00:11:42,605 +a look how that works out +with regular expressions. + +227 +00:11:42,605 --> 00:11:46,355 +So if you take r followed by 0, + +228 +00:11:46,355 --> 00:11:48,260 +remember this is +the regular expression + +229 +00:11:48,260 --> 00:11:49,895 +that cannot match anything. + +230 +00:11:49,895 --> 00:11:52,415 +By unfolding the definition, + +231 +00:11:52,415 --> 00:11:59,104 +I would get L of r +appended with L of 0. + +232 +00:11:59,104 --> 00:12:01,775 +This is of course defined as L of r + +233 +00:12:01,775 --> 00:12:05,915 +appended with the empty set. + +234 +00:12:05,915 --> 00:12:10,760 +And this would, according +to the definition earlier, + +235 +00:12:10,760 --> 00:12:13,830 +well just the empty set. + +236 +00:12:14,160 --> 00:12:20,260 +And this would be +of course L of 0. + +237 +00:12:20,260 --> 00:12:24,580 +So it seems like these laws, + +238 +00:12:24,580 --> 00:12:27,175 +at least for the times, + +239 +00:12:27,175 --> 00:12:29,785 +seem to be valid from math, + +240 +00:12:29,785 --> 00:12:31,420 +are also valid with regular expressions, + +241 +00:12:31,420 --> 00:12:33,370 +if we look at their meaning. + +242 +00:12:33,370 --> 00:12:36,670 +These laws on natural numbers + +243 +00:12:36,670 --> 00:12:40,105 +and regular expressions and +their close correspondance + +244 +00:12:40,105 --> 00:12:42,460 +probably explain why I use + +245 +00:12:42,460 --> 00:12:46,975 +a times for the sequence +regular expression. + +246 +00:12:46,975 --> 00:12:51,040 +So r followed by the +regular expression 1, + +247 +00:12:51,040 --> 00:12:54,505 +will have the same meaning +as that regular expression. + +248 +00:12:54,505 --> 00:12:59,120 +And r followed by the +0 regular expression + +249 +00:12:59,120 --> 00:13:01,295 +will have this meaning. + +250 +00:13:01,295 --> 00:13:03,590 +You might now think, but I also + +251 +00:13:03,590 --> 00:13:06,605 +wrote the alternative with a plus. + +252 +00:13:06,605 --> 00:13:10,370 +Does a similar law +holds for plus? + +253 +00:13:10,370 --> 00:13:15,965 +Of course, if I take r +plus 0 on natural numbers, + +254 +00:13:15,965 --> 00:13:20,060 +that would be just r. Does +hold for regular expressions? + +255 +00:13:20,060 --> 00:13:22,085 +Yes, indeed it holds. + +256 +00:13:22,085 --> 00:13:26,735 +If I have this +regular expression and + +257 +00:13:26,735 --> 00:13:33,245 +unfold the definition that +would be L(r) union L(0). + +258 +00:13:33,245 --> 00:13:38,060 +This would be equal +to L(r) union... + +259 +00:13:38,060 --> 00:13:41,150 +What is this? Well, that's +just the empty set. + +260 +00:13:41,150 --> 00:13:43,760 +And if I union something +with the empty set, + +261 +00:13:43,760 --> 00:13:47,180 +then this will be +just of L(r). So yes, + +262 +00:13:47,180 --> 00:13:50,120 +indeed, plus is also very much + +263 +00:13:50,120 --> 00:13:54,184 +inspired by the laws +on natural numbers. + +264 +00:13:54,184 --> 00:13:57,065 +There exists other notations too, + +265 +00:13:57,065 --> 00:14:01,310 +but I prefer this +using the plus for + +266 +00:14:01,310 --> 00:14:03,844 +the alternatives and the times + +267 +00:14:03,844 --> 00:14:05,765 +for the sequence expression. + +268 +00:14:05,765 --> 00:14:07,460 +It's also the reason why I call + +269 +00:14:07,460 --> 00:14:10,325 +this regular expression for +the empty string 1. + +270 +00:14:10,325 --> 00:14:14,915 +And for matching +nothing at all 0. + +271 +00:14:14,915 --> 00:14:18,665 +This correspondence to +natural numbers doesn't quite + +272 +00:14:18,665 --> 00:14:22,055 +extend to the star +regular expression. + +273 +00:14:22,055 --> 00:14:25,055 +But there's still a +connection. There too. + +274 +00:14:25,055 --> 00:14:26,510 +Can you remember how + +275 +00:14:26,510 --> 00:14:29,345 +the power operation on +languages was defined? + +276 +00:14:29,345 --> 00:14:31,370 +The 0 case was defined + +277 +00:14:31,370 --> 00:14:34,520 +as the set containing +the empty string. + +278 +00:14:34,520 --> 00:14:37,039 +Why is that? It looks +a bit arbitrary. + +279 +00:14:37,039 --> 00:14:41,150 +Why is it not the empty set +or why is it not defined as A? + +280 +00:14:41,150 --> 00:14:43,745 +Why is defined in this +particular way? + +281 +00:14:43,745 --> 00:14:46,880 +Well, can you remember how + +282 +00:14:46,880 --> 00:14:48,950 +the power operation r to the + +283 +00:14:48,950 --> 00:14:51,440 +0 is defined on natural numbers? + +284 +00:14:51,440 --> 00:14:53,930 +Yes, that's defined as 1. + +285 +00:14:53,930 --> 00:14:57,125 +Does that also apply to +regular expressions? + +286 +00:14:57,125 --> 00:15:00,725 +Well, if we take the meaning +of a regular expression, + +287 +00:15:00,725 --> 00:15:04,730 +let's say r, and raise +it to the 0th power, + +288 +00:15:04,730 --> 00:15:07,685 +then this will be, of +course, by definition, + +289 +00:15:07,685 --> 00:15:12,245 +be defined as the set +containing the empty string. + +290 +00:15:12,245 --> 00:15:17,160 +And that is of course +equal to L(1). + +291 +00:15:17,830 --> 00:15:20,570 +Again, you can see that + +292 +00:15:20,570 --> 00:15:23,300 +this law on natural numbers also + +293 +00:15:23,300 --> 00:15:29,000 +holds on the laws of regular +expressions - on heir meaning. + +294 +00:15:29,000 --> 00:15:32,810 +What I find really beautiful +here is that of course, + +295 +00:15:32,810 --> 00:15:36,244 +the meaning is defined on +sets, sets of strings. + +296 +00:15:36,244 --> 00:15:38,975 +This of course, on natural numbers. + +297 +00:15:38,975 --> 00:15:41,060 +You can probably +see now where I'm + +298 +00:15:41,060 --> 00:15:43,085 +coming from with my notation. + +299 +00:15:43,085 --> 00:15:46,205 +That is actually a slight +problem in this field, + +300 +00:15:46,205 --> 00:15:48,590 +since it's so old. + +301 +00:15:48,590 --> 00:15:52,205 +Pretty much everybody +introduced their own notation. + +302 +00:15:52,205 --> 00:15:53,870 +And you now have heaps of + +303 +00:15:53,870 --> 00:15:55,550 +different notations +for the same thing. + +304 +00:15:55,550 --> 00:15:57,379 +Okay, that is slightly +exaggerating. + +305 +00:15:57,379 --> 00:16:00,830 +And also the notation +I used for times and + +306 +00:16:00,830 --> 00:16:04,295 +plus and 0 and 1s...definitely +I'm not the only one. + +307 +00:16:04,295 --> 00:16:05,435 +There are many people + +308 +00:16:05,435 --> 00:16:07,625 +who have used this +notation as well. + +309 +00:16:07,625 --> 00:16:10,190 +It's just, it's not universal. + +310 +00:16:10,190 --> 00:16:12,290 +Well, here's a question now. + +311 +00:16:12,290 --> 00:16:15,635 +Why did we go through this +kerfuffle in the first place? + +312 +00:16:15,635 --> 00:16:19,370 +Why did we introduce these +operations on languages? + +313 +00:16:19,370 --> 00:16:21,260 +And why did we introduce + +314 +00:16:21,260 --> 00:16:23,255 +the meaning of a +regular expression? + +315 +00:16:23,255 --> 00:16:26,300 +Well, remember at the beginning, + +316 +00:16:26,300 --> 00:16:28,265 +we wanted to specify + +317 +00:16:28,265 --> 00:16:31,880 +what problem our algorithms +actually supposed to solve. + +318 +00:16:31,880 --> 00:16:35,960 +And we want to make that +very precise so that we can + +319 +00:16:35,960 --> 00:16:38,555 +be sure that our implementation + +320 +00:16:38,555 --> 00:16:40,295 +really solves the problem. + +321 +00:16:40,295 --> 00:16:41,540 +And that's what you can do now. + +322 +00:16:41,540 --> 00:16:45,605 +We can say a regular +expression, r say, + +323 +00:16:45,605 --> 00:16:50,015 +is matching a string +s if and only if + +324 +00:16:50,015 --> 00:16:55,474 +the string s is in the language +of the regular expression. + +325 +00:16:55,474 --> 00:17:00,770 +So that is the problem our +matcher is supposed to solve. + +326 +00:17:00,770 --> 00:17:03,320 +And it's supposed to +solve that a bit faster + +327 +00:17:03,320 --> 00:17:06,860 +than in Python, Ruby and Java. + +328 +00:17:06,860 --> 00:17:09,585 +And unfortunately we cannot use + +329 +00:17:09,585 --> 00:17:12,260 +the definition of L +directly for that. + +330 +00:17:12,260 --> 00:17:15,815 +Because remember, in +the case of the star, + +331 +00:17:15,815 --> 00:17:17,690 +sometimes the meaning of + +332 +00:17:17,690 --> 00:17:18,830 +a regular expression is actually + +333 +00:17:18,830 --> 00:17:20,925 +an infinite set of strings. + +334 +00:17:20,925 --> 00:17:23,575 +And it's a tiny bit difficult + +335 +00:17:23,575 --> 00:17:27,040 +to decide whether a string +is an infinite set. + +336 +00:17:27,040 --> 00:17:30,790 +So we have to do something +more clever in our algorithm. + +337 +00:17:30,790 --> 00:17:33,535 +But that's for next week. +So see you next week. Bye. + +338 +00:17:38,535 --> 00:17:41,680 +Okay, just to troll you. +Here's one more slide. + +339 +00:17:41,680 --> 00:17:45,850 +So when you go over the +handouts and the videos, + +340 +00:17:45,850 --> 00:17:47,875 +think about this example. + +341 +00:17:47,875 --> 00:17:49,840 +Imagine you have a language A, + +342 +00:17:49,840 --> 00:17:51,970 +which contains four strings. + +343 +00:17:51,970 --> 00:17:53,725 +First string is just the character a, + +344 +00:17:53,725 --> 00:17:57,445 +second string is just +the character b, and so on. + +345 +00:17:57,445 --> 00:18:02,335 +How many strings are there +in A to the power four. + +346 +00:18:02,335 --> 00:18:05,210 +Okay, that should be +relatively simple. + +347 +00:18:05,210 --> 00:18:07,310 +If not, just try it out by + +348 +00:18:07,310 --> 00:18:11,165 +implementing it in Scala and see +how many strings there are. + +349 +00:18:11,165 --> 00:18:13,850 +But now the more +interesting question, + +350 +00:18:13,850 --> 00:18:16,670 +imagine the same language, + +351 +00:18:16,670 --> 00:18:19,400 +except that instead +of the character d, + +352 +00:18:19,400 --> 00:18:22,250 +we have here, the empty string. + +353 +00:18:22,250 --> 00:18:26,630 +How many strings are in +A to the power four? + +354 +00:18:26,630 --> 00:18:31,320 +Okay, I'll let you think +about this. Bye now. diff -r b294cfbb5c01 -r e8402d8ec8e6 videos/02-algo1.srt --- /dev/null Thu Jan 01 00:00:00 1970 +0000 +++ b/videos/02-algo1.srt Tue Sep 29 12:52:07 2020 +0100 @@ -0,0 +1,2617 @@ +1 +00:00:05,880 --> 00:00:09,700 +Welcome back. +Remember this slide. + +2 +00:00:09,700 --> 00:00:11,500 +This slide said, What is + +3 +00:00:11,500 --> 00:00:14,500 +all wreck expression Metro +actually supposed to do? + +4 +00:00:14,500 --> 00:00:16,570 +It will take two arguments and + +5 +00:00:16,570 --> 00:00:18,670 +reg expression R and a string + +6 +00:00:18,670 --> 00:00:21,580 +S. And it's supposed to say yes, + +7 +00:00:21,580 --> 00:00:23,440 +the wreck expression matches + +8 +00:00:23,440 --> 00:00:26,920 +the string if and only +if the string is in + +9 +00:00:26,920 --> 00:00:28,855 +the language of R. + +10 +00:00:28,855 --> 00:00:32,410 +And if the string is not +in the language of our, + +11 +00:00:32,410 --> 00:00:35,515 +then our algorithm has to say no. + +12 +00:00:35,515 --> 00:00:37,210 +And we can't use + +13 +00:00:37,210 --> 00:00:39,565 +this specification +directly because remember, + +14 +00:00:39,565 --> 00:00:43,305 +this l Sometimes +produces infinite sets. + +15 +00:00:43,305 --> 00:00:47,585 +And so we can test whether a +string is an infinite set, + +16 +00:00:47,585 --> 00:00:50,090 +at least not easily. + +17 +00:00:50,090 --> 00:00:52,340 +And so what we have to do + +18 +00:00:52,340 --> 00:00:54,260 +instead is we have +to be a bit more + +19 +00:00:54,260 --> 00:00:57,050 +clever and implement +some operations + +20 +00:00:57,050 --> 00:00:59,284 +on Rekha expressions instead. + +21 +00:00:59,284 --> 00:01:03,275 +Because Weka expressions +are always finite trees. + +22 +00:01:03,275 --> 00:01:05,870 +I should say the +person who has been + +23 +00:01:05,870 --> 00:01:08,150 +clever is called brush-off ski. + +24 +00:01:08,150 --> 00:01:11,435 +It's his, I've written, +I'm introducing here. + +25 +00:01:11,435 --> 00:01:15,515 +And his algorithm consists +of two functions. + +26 +00:01:15,515 --> 00:01:17,840 +One is called +nullable and it takes + +27 +00:01:17,840 --> 00:01:20,104 +a regular expression as argument. + +28 +00:01:20,104 --> 00:01:24,155 +And the idea is that +this function nullable. + +29 +00:01:24,155 --> 00:01:26,450 +Testing whether +the reg expression + +30 +00:01:26,450 --> 00:01:27,950 +can match the empty string. + +31 +00:01:27,950 --> 00:01:30,305 +So 0 cannot match any string. + +32 +00:01:30,305 --> 00:01:33,275 +So it cannot match the +empty string either. + +33 +00:01:33,275 --> 00:01:35,465 +So that's defined as false. + +34 +00:01:35,465 --> 00:01:37,775 +This reg expression, +the whole purpose + +35 +00:01:37,775 --> 00:01:39,680 +is that it can match +the empty string. + +36 +00:01:39,680 --> 00:01:41,645 +So it's defined as true. + +37 +00:01:41,645 --> 00:01:44,599 +If a reg expression +can match a character, + +38 +00:01:44,599 --> 00:01:47,045 +then it cannot match +the empty string. + +39 +00:01:47,045 --> 00:01:49,445 +So that is defined as false. + +40 +00:01:49,445 --> 00:01:53,180 +If an alternative can +match the empty string, + +41 +00:01:53,180 --> 00:01:56,960 +then either or one can +match the empty string. + +42 +00:01:56,960 --> 00:01:59,720 +Or R2 can match the empty string. + +43 +00:01:59,720 --> 00:02:03,110 +So either nullable +of R1 has to hold, + +44 +00:02:03,110 --> 00:02:06,945 +or nullable of R2 has to hold. + +45 +00:02:06,945 --> 00:02:09,925 +In this sequence, it's +the other way around. + +46 +00:02:09,925 --> 00:02:12,790 +If this reg expression can +match the empty string, + +47 +00:02:12,790 --> 00:02:16,615 +then R1 has to be able to +match the empty string, + +48 +00:02:16,615 --> 00:02:20,290 +and R2 has to be able to +match the empty string. + +49 +00:02:20,290 --> 00:02:22,555 +So here it's just the opposite. + +50 +00:02:22,555 --> 00:02:25,660 +In one case it is o and +the other case it's end. + +51 +00:02:25,660 --> 00:02:27,970 +In the store record +expression can match + +52 +00:02:27,970 --> 00:02:30,445 +always the empty string. +So that is true. + +53 +00:02:30,445 --> 00:02:33,340 +So this is a simple +recursive function + +54 +00:02:33,340 --> 00:02:37,179 +and should not be too +difficult to implement. + +55 +00:02:37,179 --> 00:02:42,025 +Okay, this nullable function +that is easy-peasy. + +56 +00:02:42,025 --> 00:02:44,604 +The second function, however, + +57 +00:02:44,604 --> 00:02:49,155 +is a bit more involved and +that's just to be expected. + +58 +00:02:49,155 --> 00:02:53,075 +Remember people working in +this area already for decades. + +59 +00:02:53,075 --> 00:02:57,305 +If they have some problems +with runtime, for example, + +60 +00:02:57,305 --> 00:02:58,940 +we can't expect that as + +61 +00:02:58,940 --> 00:03:03,260 +simple fix will solve all +the problems in the world. + +62 +00:03:03,260 --> 00:03:06,530 +So I admit the second +function is a bit more + +63 +00:03:06,530 --> 00:03:10,085 +complicated and make sure +that you understand it. + +64 +00:03:10,085 --> 00:03:12,140 +And it also just +chose this preserves + +65 +00:03:12,140 --> 00:03:14,345 +the is a very clever guy. + +66 +00:03:14,345 --> 00:03:15,800 +Actually, I have to say he was + +67 +00:03:15,800 --> 00:03:17,720 +a clever guy because +I think he either + +68 +00:03:17,720 --> 00:03:21,650 +died last year or +the year before. + +69 +00:03:21,650 --> 00:03:25,505 +And he came up with this +algorithm already in 1964. + +70 +00:03:25,505 --> 00:03:27,260 +It somehow got lost, + +71 +00:03:27,260 --> 00:03:30,305 +but has been rediscovered +in the last ten years. + +72 +00:03:30,305 --> 00:03:34,685 +So the idea of the second +function is the following. + +73 +00:03:34,685 --> 00:03:38,120 +Imagine you have a +reexpression and it can + +74 +00:03:38,120 --> 00:03:41,930 +match a string of the +form C followed by as. + +75 +00:03:41,930 --> 00:03:44,810 +So the C is the first +character of that string. + +76 +00:03:44,810 --> 00:03:48,605 +So I mentioned that can +match this kind of string. + +77 +00:03:48,605 --> 00:03:50,330 +The question is now, + +78 +00:03:50,330 --> 00:03:52,400 +what would a wrecker +expression look + +79 +00:03:52,400 --> 00:03:54,695 +like that can match chest + +80 +00:03:54,695 --> 00:03:57,140 +s. You probably remember + +81 +00:03:57,140 --> 00:03:59,300 +that from the semantic +that every relative, + +82 +00:03:59,300 --> 00:04:00,860 +there was also +something of chopping + +83 +00:04:00,860 --> 00:04:02,210 +off the first character. + +84 +00:04:02,210 --> 00:04:04,940 +Here it's the same. +If a reg expression + +85 +00:04:04,940 --> 00:04:07,835 +can match a string +starting with a C, + +86 +00:04:07,835 --> 00:04:11,090 +we're looking for a wreck +expression which can match + +87 +00:04:11,090 --> 00:04:15,215 +the rest of the string where +the c has been chopped off. + +88 +00:04:15,215 --> 00:04:17,690 +And this operation will be called + +89 +00:04:17,690 --> 00:04:21,049 +the derivative of a +wreck expression. + +90 +00:04:21,049 --> 00:04:22,205 +And it will also take + +91 +00:04:22,205 --> 00:04:25,460 +a character as argument +and the rec expression. + +92 +00:04:25,460 --> 00:04:28,730 +And in contrast to +the semantic records, + +93 +00:04:28,730 --> 00:04:31,310 +semantic derivative, which works + +94 +00:04:31,310 --> 00:04:34,430 +over languages or +sets of strings. + +95 +00:04:34,430 --> 00:04:39,620 +This derivative works +over regular expressions. + +96 +00:04:39,620 --> 00:04:41,330 +Okay, here's this function. + +97 +00:04:41,330 --> 00:04:43,970 +It's defined recursively over + +98 +00:04:43,970 --> 00:04:46,370 +the structure of rec expressions. + +99 +00:04:46,370 --> 00:04:48,814 +The first argument +is the character, + +100 +00:04:48,814 --> 00:04:52,700 +and the second one is +a wreck expression. + +101 +00:04:52,700 --> 00:04:56,510 +And remember the idea +is we're looking for + +102 +00:04:56,510 --> 00:05:00,680 +a wreck expression that +can match everything. + +103 +00:05:00,680 --> 00:05:03,125 +This reg expression +as argument can match + +104 +00:05:03,125 --> 00:05:07,040 +except for the C. So now let's +look at this first case. + +105 +00:05:07,040 --> 00:05:10,550 +So this reg expression +cannot match any string. + +106 +00:05:10,550 --> 00:05:14,270 +So it certainly cannot +match any string starting + +107 +00:05:14,270 --> 00:05:16,910 +a C. So we have to look + +108 +00:05:16,910 --> 00:05:20,090 +for and reg expression which +can not much anything. + +109 +00:05:20,090 --> 00:05:22,700 +Well, that's the purpose +of this record expression, + +110 +00:05:22,700 --> 00:05:24,815 +so we define it as 0. + +111 +00:05:24,815 --> 00:05:28,130 +In the next case, +this reg expression + +112 +00:05:28,130 --> 00:05:30,440 +can match the empty string, + +113 +00:05:30,440 --> 00:05:33,440 +but it cannot match +any string that starts + +114 +00:05:33,440 --> 00:05:36,350 +with C. So also in this case, + +115 +00:05:36,350 --> 00:05:39,560 +we just define it as +the bracket question, + +116 +00:05:39,560 --> 00:05:41,615 +which cannot match anything. + +117 +00:05:41,615 --> 00:05:45,170 +In the next case, this +C as the argument to + +118 +00:05:45,170 --> 00:05:48,335 +the derivative and this d +is the Rekha expression. + +119 +00:05:48,335 --> 00:05:50,225 +So now there are two cases. + +120 +00:05:50,225 --> 00:05:53,525 +If this C and this D +is actually equal. + +121 +00:05:53,525 --> 00:05:55,970 +That means this record +expression can match + +122 +00:05:55,970 --> 00:05:59,240 +a string which just contains C0. + +123 +00:05:59,240 --> 00:06:01,505 +So if we strip that off, + +124 +00:06:01,505 --> 00:06:04,790 +motor remains is +the empty string. + +125 +00:06:04,790 --> 00:06:06,890 +What is a regular expression + +126 +00:06:06,890 --> 00:06:09,170 +look like that can +match the empty string. + +127 +00:06:09,170 --> 00:06:11,915 +Well, that's the +purpose of the warm. + +128 +00:06:11,915 --> 00:06:15,440 +And if c is not equal to d, + +129 +00:06:15,440 --> 00:06:17,630 +then this reg expression + +130 +00:06:17,630 --> 00:06:19,220 +cannot match anything +which starts with + +131 +00:06:19,220 --> 00:06:23,780 +a C. So again it will +be defined as just 0. + +132 +00:06:23,780 --> 00:06:29,390 +Okay? Now, the alternative case, + +133 +00:06:29,390 --> 00:06:31,970 +if this reg expression can + +134 +00:06:31,970 --> 00:06:36,050 +match the strings +starting with C, + +135 +00:06:36,050 --> 00:06:40,820 +then it can either be +matched by the Ongwen. + +136 +00:06:40,820 --> 00:06:44,495 +Or it can be much by the R2. + +137 +00:06:44,495 --> 00:06:50,090 +If they are one can match C +and then following a string, + +138 +00:06:50,090 --> 00:06:53,975 +then we just have to recursively +call the derivative for + +139 +00:06:53,975 --> 00:06:56,570 +R to get a reg expression + +140 +00:06:56,570 --> 00:06:59,135 +that can match the +rest of the string. + +141 +00:06:59,135 --> 00:07:02,690 +And the same we can do with R2. + +142 +00:07:02,690 --> 00:07:06,110 +You can find a reg expression +which can match everything. + +143 +00:07:06,110 --> 00:07:07,850 +This R2 can match, + +144 +00:07:07,850 --> 00:07:09,710 +starting with C, bad, + +145 +00:07:09,710 --> 00:07:12,590 +which chopping off this C. Okay? + +146 +00:07:12,590 --> 00:07:16,370 +So now if you have to find +the break expression, + +147 +00:07:16,370 --> 00:07:20,030 +which can match all the strings +where C is tripped off. + +148 +00:07:20,030 --> 00:07:22,295 +Then we just have to +take the alternatives + +149 +00:07:22,295 --> 00:07:24,965 +of these two derivatives. + +150 +00:07:24,965 --> 00:07:29,390 +Okay? Now to sequence case, + +151 +00:07:29,390 --> 00:07:33,005 +this sequence case is the +most complicated one. + +152 +00:07:33,005 --> 00:07:35,180 +And let's look first at + +153 +00:07:35,180 --> 00:07:39,335 +the second case where +the Earth's brush. + +154 +00:07:39,335 --> 00:07:42,830 +Okay? So if this +regular expression + +155 +00:07:42,830 --> 00:07:46,145 +can match a string +starting with C, + +156 +00:07:46,145 --> 00:07:48,155 +then the following +must have happened. + +157 +00:07:48,155 --> 00:07:51,905 +First, the R1 must have matched +a string starting with C + +158 +00:07:51,905 --> 00:07:56,420 +and then anything coming +afterwards with r2. + +159 +00:07:56,420 --> 00:07:59,660 +Okay? So in this case I only + +160 +00:07:59,660 --> 00:08:03,260 +have to call this +derivative for this r one. + +161 +00:08:03,260 --> 00:08:06,830 +And find the reg expression +which can match everything + +162 +00:08:06,830 --> 00:08:11,555 +this R one can match except +for this C chopped off. + +163 +00:08:11,555 --> 00:08:15,830 +And I have to build that +sequence derivative of that. + +164 +00:08:15,830 --> 00:08:18,260 +So that's what the As per se. + +165 +00:08:18,260 --> 00:08:21,860 +So I take the +derivative of R1 and I + +166 +00:08:21,860 --> 00:08:23,480 +put the R2 on + +167 +00:08:23,480 --> 00:08:25,865 +the back because that's +the rest of the string. + +168 +00:08:25,865 --> 00:08:29,240 +Ok? So that's the only +case we have to consider + +169 +00:08:29,240 --> 00:08:32,750 +this sequence case +except if not the, + +170 +00:08:32,750 --> 00:08:37,895 +how one can match the +sea and something else. + +171 +00:08:37,895 --> 00:08:42,965 +But if on mismatching +actually the empty string, + +172 +00:08:42,965 --> 00:08:48,890 +this case actually the R two +is in charge of matching + +173 +00:08:48,890 --> 00:08:51,590 +the string starting with C. So in + +174 +00:08:51,590 --> 00:08:55,490 +this case we have to call +the derivative for R2. + +175 +00:08:55,490 --> 00:08:57,875 +So that's why we have +these two cases. + +176 +00:08:57,875 --> 00:09:00,455 +So if R1 is nullable, + +177 +00:09:00,455 --> 00:09:03,245 +then it can match +the empty string. + +178 +00:09:03,245 --> 00:09:05,330 +And we have to +consider the case that + +179 +00:09:05,330 --> 00:09:08,045 +this R2 is matching a string + +180 +00:09:08,045 --> 00:09:10,700 +starting with C. And so we have + +181 +00:09:10,700 --> 00:09:14,210 +to call the derivative +for this R2 in this case. + +182 +00:09:14,210 --> 00:09:18,680 +Otherwise, the R1 will +be in charge of matching + +183 +00:09:18,680 --> 00:09:20,840 +a part of that +string starting with + +184 +00:09:20,840 --> 00:09:24,695 +C. And I have to call the +derivative on this R1. + +185 +00:09:24,695 --> 00:09:30,670 +Okay? I hope this makes sense. + +186 +00:09:30,670 --> 00:09:34,150 +Go over that and +also the handouts. + +187 +00:09:34,150 --> 00:09:37,465 +Again. With that, that's +it's really subtle. + +188 +00:09:37,465 --> 00:09:40,945 +And how do I remember this case? + +189 +00:09:40,945 --> 00:09:42,430 +Well, I know it's + +190 +00:09:42,430 --> 00:09:45,310 +an if condition and the +condition is nullable, + +191 +00:09:45,310 --> 00:09:48,160 +then I will always remember + +192 +00:09:48,160 --> 00:09:53,275 +the else branch where pushes +NG derivative over the R1. + +193 +00:09:53,275 --> 00:09:55,780 +So are usually write +down the S punch + +194 +00:09:55,780 --> 00:09:59,650 +first and then construct +the thin branch by saying, + +195 +00:09:59,650 --> 00:10:01,525 +well, I just repeat this. + +196 +00:10:01,525 --> 00:10:03,760 +And I have to remember +in that case I + +197 +00:10:03,760 --> 00:10:06,580 +have to build also +derivative of R2. + +198 +00:10:06,580 --> 00:10:12,695 +Okay? Finally, the star case. + +199 +00:10:12,695 --> 00:10:15,665 +Ok. So here again +we're looking for + +200 +00:10:15,665 --> 00:10:17,300 +a reg expression which + +201 +00:10:17,300 --> 00:10:19,745 +can match the string +starting with C, + +202 +00:10:19,745 --> 00:10:22,355 +except that the c has +been chopped off. + +203 +00:10:22,355 --> 00:10:28,640 +So if r star has to match +a string starting with C, + +204 +00:10:28,640 --> 00:10:32,735 +then at least we need one +copy of this r. Okay? + +205 +00:10:32,735 --> 00:10:34,310 +So at least one copy of + +206 +00:10:34,310 --> 00:10:37,010 +this r has matched +something which starts with + +207 +00:10:37,010 --> 00:10:38,870 +a C and then afterwards + +208 +00:10:38,870 --> 00:10:41,570 +come 0 more copies +of this OnStar. + +209 +00:10:41,570 --> 00:10:45,530 +Okay? What we do there +is we trying to find + +210 +00:10:45,530 --> 00:10:47,960 +the Rekha expression +which can match + +211 +00:10:47,960 --> 00:10:50,915 +this first part of the string. + +212 +00:10:50,915 --> 00:10:53,255 +However, where the +sea is chopped off. + +213 +00:10:53,255 --> 00:10:55,130 +And then we just say, well, + +214 +00:10:55,130 --> 00:10:57,050 +the rest has to be +matched again with + +215 +00:10:57,050 --> 00:10:59,030 +0 or more copies of + +216 +00:10:59,030 --> 00:11:02,600 +this R. So that's why +it's defined like this. + +217 +00:11:02,600 --> 00:11:09,050 +Okay? S8. Please take care +with this definition. + +218 +00:11:09,050 --> 00:11:11,435 +That's not so simple. + +219 +00:11:11,435 --> 00:11:13,250 +Once you get a hang of it, + +220 +00:11:13,250 --> 00:11:15,170 +however, it makes perfect sense. + +221 +00:11:15,170 --> 00:11:17,210 +So let me explain it in + +222 +00:11:17,210 --> 00:11:20,825 +different ways in +the next slides. + +223 +00:11:20,825 --> 00:11:24,695 +Okay, let's look +first some examples. + +224 +00:11:24,695 --> 00:11:27,140 +So here is a regular expression, + +225 +00:11:27,140 --> 00:11:29,390 +R. And let's have + +226 +00:11:29,390 --> 00:11:32,450 +a look at these three +derivatives according to a, + +227 +00:11:32,450 --> 00:11:38,405 +B, and C. And Vishal do +with d1 for the a. Ok. + +228 +00:11:38,405 --> 00:11:42,379 +So here is our reg expression + +229 +00:11:42,379 --> 00:11:45,334 +and was very generous +with dependent a thesis. + +230 +00:11:45,334 --> 00:11:48,140 +And the outermost is a star. + +231 +00:11:48,140 --> 00:11:52,550 +So if people now the +derivative according to a, + +232 +00:11:52,550 --> 00:11:55,474 +the character a of +that wreck expression. + +233 +00:11:55,474 --> 00:11:57,380 +Okay? So the first thing we + +234 +00:11:57,380 --> 00:11:59,555 +have to analyze is the K star. + +235 +00:11:59,555 --> 00:12:04,370 +Ok? So here's direct expression, +which we are looking at. + +236 +00:12:04,370 --> 00:12:09,170 +This are the outermost +constructor is this star. + +237 +00:12:09,170 --> 00:12:11,510 +If you go back to the definition, + +238 +00:12:11,510 --> 00:12:13,625 +I hope you have it next to you, + +239 +00:12:13,625 --> 00:12:16,340 +then this star case is defined + +240 +00:12:16,340 --> 00:12:20,000 +as u taking just the +inside of the star + +241 +00:12:20,000 --> 00:12:23,030 +and apply this derivative and + +242 +00:12:23,030 --> 00:12:26,765 +leave the are on the +outside at the end. + +243 +00:12:26,765 --> 00:12:29,990 +Ok. So that's the first step. + +244 +00:12:29,990 --> 00:12:32,030 +Now we have to analyze + +245 +00:12:32,030 --> 00:12:36,035 +the derivative according to +a of this record expression, + +246 +00:12:36,035 --> 00:12:38,000 +which is an alternative. + +247 +00:12:38,000 --> 00:12:39,665 +So the outermost is a plus. + +248 +00:12:39,665 --> 00:12:41,375 +Ok, that's very easy again, + +249 +00:12:41,375 --> 00:12:45,185 +we just have to push the +derivative into each component, + +250 +00:12:45,185 --> 00:12:47,705 +into the a, followed by b. + +251 +00:12:47,705 --> 00:12:49,145 +And in this segment, + +252 +00:12:49,145 --> 00:12:51,185 +alternative into b. Ok, + +253 +00:12:51,185 --> 00:12:56,030 +so we take the derivative +of each according to a way. + +254 +00:12:56,030 --> 00:13:00,635 +Now this one is a sequence +break expression. + +255 +00:13:00,635 --> 00:13:02,210 +This most complicated case. + +256 +00:13:02,210 --> 00:13:04,160 +So the first of all +you have to test is, + +257 +00:13:04,160 --> 00:13:07,910 +is the first component +nullable of this sequence? + +258 +00:13:07,910 --> 00:13:09,200 +Well, that is a, + +259 +00:13:09,200 --> 00:13:12,740 +in this case, a on its +own is not nullable. + +260 +00:13:12,740 --> 00:13:14,210 +So vn, the easy case, + +261 +00:13:14,210 --> 00:13:17,000 +we only have a single derivative + +262 +00:13:17,000 --> 00:13:19,370 +pushed in the first component. + +263 +00:13:19,370 --> 00:13:25,160 +So we have the derivative +of a with the character a. + +264 +00:13:25,160 --> 00:13:27,920 +Okay, that's now +the character case. + +265 +00:13:27,920 --> 00:13:29,720 +And in this case the character + +266 +00:13:29,720 --> 00:13:31,715 +in the regular expression agree. + +267 +00:13:31,715 --> 00:13:33,890 +So it's defined as one. + +268 +00:13:33,890 --> 00:13:37,550 +Ok? In the other case, + +269 +00:13:37,550 --> 00:13:39,710 +the break expression is P, + +270 +00:13:39,710 --> 00:13:41,675 +But the characters a. + +271 +00:13:41,675 --> 00:13:46,385 +So in this case +it's defined as 0. + +272 +00:13:46,385 --> 00:13:50,630 +Okay? So that's what the +derivative would be. + +273 +00:13:50,630 --> 00:13:52,160 +This r is there + +274 +00:13:52,160 --> 00:13:55,280 +because originally we +started with a star. + +275 +00:13:55,280 --> 00:13:58,295 +This expression is that +star at expression. + +276 +00:13:58,295 --> 00:14:02,780 +Ok? So the derivative +according to a + +277 +00:14:02,780 --> 00:14:07,610 +up that reg expression +is this expression. + +278 +00:14:07,610 --> 00:14:10,970 +We just have to +substitute this back in. + +279 +00:14:10,970 --> 00:14:13,745 +Just coming back to this slide. + +280 +00:14:13,745 --> 00:14:16,160 +So far, they're only analyze + +281 +00:14:16,160 --> 00:14:19,505 +the derivative according +to a single character. + +282 +00:14:19,505 --> 00:14:23,960 +But we can also very easily +extend that to whole strings. + +283 +00:14:23,960 --> 00:14:26,360 +So if you build the +derivative according + +284 +00:14:26,360 --> 00:14:27,905 +to the empty string, + +285 +00:14:27,905 --> 00:14:30,875 +we just return the +Rekha expression. + +286 +00:14:30,875 --> 00:14:35,585 +If we have a string +starting with character c, + +287 +00:14:35,585 --> 00:14:37,850 +remember that can +be any character. + +288 +00:14:37,850 --> 00:14:42,170 +Then we build first the simple +derivative according to + +289 +00:14:42,170 --> 00:14:44,360 +that first character and + +290 +00:14:44,360 --> 00:14:46,925 +continue with the +rest of the string. + +291 +00:14:46,925 --> 00:14:50,615 +So here you see again, +my personal convention. + +292 +00:14:50,615 --> 00:14:54,365 +Everything which works on +lists has this S at the end. + +293 +00:14:54,365 --> 00:14:57,125 +So this function is +for single characters. + +294 +00:14:57,125 --> 00:14:59,179 +This one is for strings, + +295 +00:14:59,179 --> 00:15:02,450 +but it uses the one +for the character. + +296 +00:15:02,450 --> 00:15:04,025 +Essentially what it does is + +297 +00:15:04,025 --> 00:15:06,185 +it chops off the first character, + +298 +00:15:06,185 --> 00:15:09,800 +builds the derivative, then +chops off the next character, + +299 +00:15:09,800 --> 00:15:13,760 +builds the derivative of +the result, and so on. + +300 +00:15:13,760 --> 00:15:17,000 +Having this function, +we can actually now + +301 +00:15:17,000 --> 00:15:20,600 +state what the algorithm +is, the complete algorithm. + +302 +00:15:20,600 --> 00:15:23,465 +So the pro shops ki mantra + +303 +00:15:23,465 --> 00:15:24,860 +takes a regular expression as + +304 +00:15:24,860 --> 00:15:26,915 +argument and a +string as argument. + +305 +00:15:26,915 --> 00:15:30,920 +And is supposed to say yes if +the reg expression matches + +306 +00:15:30,920 --> 00:15:33,560 +the string or No + +307 +00:15:33,560 --> 00:15:36,065 +if the reg expression does +not match the string. + +308 +00:15:36,065 --> 00:15:37,715 +And how does it do that? + +309 +00:15:37,715 --> 00:15:42,560 +Well, it takes this string +s And this reg expression, + +310 +00:15:42,560 --> 00:15:43,925 +and it first built + +311 +00:15:43,925 --> 00:15:48,845 +successive derivatives until +that string is exhaust that. + +312 +00:15:48,845 --> 00:15:52,115 +Okay? Then you have +a final derivative, + +313 +00:15:52,115 --> 00:15:53,839 +a final reg expression. + +314 +00:15:53,839 --> 00:15:55,370 +And you test whether + +315 +00:15:55,370 --> 00:15:57,920 +this reg expression can +match the empty string. + +316 +00:15:57,920 --> 00:16:01,370 +If yes, then the +original reg expression + +317 +00:16:01,370 --> 00:16:03,245 +is r can match the string. + +318 +00:16:03,245 --> 00:16:05,210 +If no, if it cannot match + +319 +00:16:05,210 --> 00:16:08,030 +the final derivative +with the empty string, + +320 +00:16:08,030 --> 00:16:10,280 +then know this +regular expression, + +321 +00:16:10,280 --> 00:16:12,905 +R cannot match that string. + +322 +00:16:12,905 --> 00:16:16,010 +I know it looks +very anticlimactic, + +323 +00:16:16,010 --> 00:16:19,625 +but that's actually the +beauty of this algorithm, + +324 +00:16:19,625 --> 00:16:22,760 +that it's not that complicated. + +325 +00:16:22,760 --> 00:16:25,340 +So how does the algorithm work? + +326 +00:16:25,340 --> 00:16:27,634 +In a concrete example? + +327 +00:16:27,634 --> 00:16:31,580 +Imagine you have a string, abc + +328 +00:16:31,580 --> 00:16:34,370 +and you have a break +expression, say R1. + +329 +00:16:34,370 --> 00:16:37,520 +And you want to find out +whether this or one can match + +330 +00:16:37,520 --> 00:16:41,300 +that string abc or not. +How do you do that? + +331 +00:16:41,300 --> 00:16:45,140 +Well, you would first +take this R1 and you + +332 +00:16:45,140 --> 00:16:47,150 +build the derivative according + +333 +00:16:47,150 --> 00:16:49,880 +to the first character, D-A. + +334 +00:16:49,880 --> 00:16:53,015 +Okay? You get the derivative, + +335 +00:16:53,015 --> 00:16:55,294 +which I call here R2. + +336 +00:16:55,294 --> 00:16:58,640 +Then you take the next +character, the B. + +337 +00:16:58,640 --> 00:17:04,535 +You now build the derivative +according to b of this R2. + +338 +00:17:04,535 --> 00:17:07,460 +Okay? So you take the +result of the first step, + +339 +00:17:07,460 --> 00:17:09,530 +you feed it into the second step, + +340 +00:17:09,530 --> 00:17:11,810 +and you take the +second character. + +341 +00:17:11,810 --> 00:17:17,075 +Then you do this also with c. +So you get a derivative R3, + +342 +00:17:17,075 --> 00:17:22,460 +and you build the derivative +of R three according to c, + +343 +00:17:22,460 --> 00:17:24,185 +you get an R four. + +344 +00:17:24,185 --> 00:17:26,300 +Okay, so that's the +final derivative. + +345 +00:17:26,300 --> 00:17:27,530 +The string is exhausted. + +346 +00:17:27,530 --> 00:17:29,570 +We build derivatives +according to a, B, + +347 +00:17:29,570 --> 00:17:34,610 +and C. Now we just test whether +this r four is nullable. + +348 +00:17:34,610 --> 00:17:37,175 +If it says yes, + +349 +00:17:37,175 --> 00:17:41,510 +then df break expression metro +will just say true, yes, + +350 +00:17:41,510 --> 00:17:43,340 +this original reg expression, + +351 +00:17:43,340 --> 00:17:47,270 +the R1, will be able to +match that string abc. + +352 +00:17:47,270 --> 00:17:50,585 +And if this test returns false, + +353 +00:17:50,585 --> 00:17:53,015 +then the algorithm says false. + +354 +00:17:53,015 --> 00:17:56,975 +This reg expression will +not match the string. + +355 +00:17:56,975 --> 00:18:00,260 +Ok, you might ask +why on earth does + +356 +00:18:00,260 --> 00:18:02,960 +that algorithm +actually work away? + +357 +00:18:02,960 --> 00:18:06,515 +Here's an explanation +for why it works. + +358 +00:18:06,515 --> 00:18:10,190 +Imagine you have a wreck +expression R1, okay? + +359 +00:18:10,190 --> 00:18:13,220 +And you have a string abc, + +360 +00:18:13,220 --> 00:18:14,270 +and you want to find out + +361 +00:18:14,270 --> 00:18:17,180 +whether one can +match that string. + +362 +00:18:17,180 --> 00:18:18,799 +And for the moment, + +363 +00:18:18,799 --> 00:18:22,610 +let's assume that it +can match that string. + +364 +00:18:22,610 --> 00:18:26,315 +Ok? So the language L of + +365 +00:18:26,315 --> 00:18:30,185 +R will actually +contain that string, + +366 +00:18:30,185 --> 00:18:31,805 +otherwise it wouldn't match that. + +367 +00:18:31,805 --> 00:18:36,710 +Okay? So ABC is in +this language, okay? + +368 +00:18:36,710 --> 00:18:39,785 +If I now take the +semantic derivative, + +369 +00:18:39,785 --> 00:18:43,145 +that means I look at all +the strings in this f, + +370 +00:18:43,145 --> 00:18:46,820 +R1, and further out + +371 +00:18:46,820 --> 00:18:48,740 +all the ones which +do not start with + +372 +00:18:48,740 --> 00:18:51,260 +an a, I discharge them. + +373 +00:18:51,260 --> 00:18:54,545 +And I only look the one +which start with an a. + +374 +00:18:54,545 --> 00:18:56,465 +And of those strings, + +375 +00:18:56,465 --> 00:18:58,475 +I chop off this a. + +376 +00:18:58,475 --> 00:19:01,025 +So after this +romantic derivative, + +377 +00:19:01,025 --> 00:19:05,735 +this set of strings will +contain just B and C. + +378 +00:19:05,735 --> 00:19:12,830 +Ok. Now if I build the next +semantic derivative of that, + +379 +00:19:12,830 --> 00:19:14,345 +then I would look at + +380 +00:19:14,345 --> 00:19:16,850 +all the strings which +start with a P, + +381 +00:19:16,850 --> 00:19:21,350 +and forget about everything +else of the ones. + +382 +00:19:21,350 --> 00:19:27,905 +I know they start with B. +I just chop of the B. Ok. + +383 +00:19:27,905 --> 00:19:31,655 +So in this whole set here, + +384 +00:19:31,655 --> 00:19:33,785 +in this whole set here, + +385 +00:19:33,785 --> 00:19:39,030 +there will be now a string +which is just c. Okay? + +386 +00:19:39,190 --> 00:19:44,420 +Then I built the third +semantic derivative + +387 +00:19:44,420 --> 00:19:47,300 +because I want to find out +whether ABC is involved. + +388 +00:19:47,300 --> 00:19:50,540 +Okay? So now I look +at all the strings in + +389 +00:19:50,540 --> 00:19:52,820 +here and look at + +390 +00:19:52,820 --> 00:19:55,340 +them whether they start +with a C. If yes, + +391 +00:19:55,340 --> 00:19:56,885 +I chop off the sea. + +392 +00:19:56,885 --> 00:19:59,120 +And put in markets remaining. + +393 +00:19:59,120 --> 00:20:00,425 +So in this case, + +394 +00:20:00,425 --> 00:20:02,510 +if I have the string c + +395 +00:20:02,510 --> 00:20:04,550 +in this language and +I chop off this, + +396 +00:20:04,550 --> 00:20:07,700 +see what is remaining +is the empty string. + +397 +00:20:07,700 --> 00:20:09,695 +So we have to check of + +398 +00:20:09,695 --> 00:20:14,510 +that language whether it +contains the empty string. + +399 +00:20:14,510 --> 00:20:18,800 +If yes, then the +original R1 can match + +400 +00:20:18,800 --> 00:20:21,050 +this ABC because this ABC + +401 +00:20:21,050 --> 00:20:24,119 +must have been in this language. + +402 +00:20:24,130 --> 00:20:28,565 +And if in the end there wasn't +the empty string, then, + +403 +00:20:28,565 --> 00:20:33,575 +then this ABC Watson in +this language of one. + +404 +00:20:33,575 --> 00:20:36,260 +And so the electron must have, + +405 +00:20:36,260 --> 00:20:38,880 +or the metro must have failed. + +406 +00:20:39,040 --> 00:20:42,530 +The clever bit is that here + +407 +00:20:42,530 --> 00:20:45,530 +the explanation is for languages. + +408 +00:20:45,530 --> 00:20:49,835 +Remember, this +semantic derivative + +409 +00:20:49,835 --> 00:20:53,450 +works over languages and they +sometimes can be in finite. + +410 +00:20:53,450 --> 00:20:55,730 +So that's not really +an algorithm. + +411 +00:20:55,730 --> 00:20:58,880 +Yeah, that's just +explaining the idea with + +412 +00:20:58,880 --> 00:21:02,525 +preserves key +achieved was that he + +413 +00:21:02,525 --> 00:21:06,440 +now works with this derivative +America expressions and + +414 +00:21:06,440 --> 00:21:10,715 +somehow imitates what +happens on these languages. + +415 +00:21:10,715 --> 00:21:14,135 +Because remember if you +have an wreck expression, + +416 +00:21:14,135 --> 00:21:17,405 +are you want to test +whether can match APC, + +417 +00:21:17,405 --> 00:21:22,550 +then you take first +derivative according to a. + +418 +00:21:22,550 --> 00:21:25,760 +So you will get a wreck +expression which can match b + +419 +00:21:25,760 --> 00:21:29,464 +and c If R could match abc. + +420 +00:21:29,464 --> 00:21:31,430 +So after the first derivative, + +421 +00:21:31,430 --> 00:21:33,620 +you will get a wreck expression +which can match B and + +422 +00:21:33,620 --> 00:21:37,070 +C. If you take the +second derivative, + +423 +00:21:37,070 --> 00:21:41,225 +you will get a reexpression +which can match c alone. + +424 +00:21:41,225 --> 00:21:44,180 +And if you take the +final derivative, + +425 +00:21:44,180 --> 00:21:46,070 +then you will get + +426 +00:21:46,070 --> 00:21:48,260 +rec expression which hopefully + +427 +00:21:48,260 --> 00:21:49,715 +can match the empty string. + +428 +00:21:49,715 --> 00:21:53,780 +If it does, then this +R can match the ABC. + +429 +00:21:53,780 --> 00:21:55,655 +And if it doesn't, then + +430 +00:21:55,655 --> 00:21:58,680 +ABC couldn't be +matched by this on. + +431 +00:21:58,900 --> 00:22:02,990 +Okay, let's have a look +how this pans out in code. + +432 +00:22:02,990 --> 00:22:06,050 +Here's defile RE1. + +433 +00:22:06,050 --> 00:22:07,940 +It's also uploaded on Keith, + +434 +00:22:07,940 --> 00:22:10,625 +so you can see exactly +what I'm doing. + +435 +00:22:10,625 --> 00:22:13,970 +And actually I already saw +that file because I showed you + +436 +00:22:13,970 --> 00:22:15,710 +how my wreck expressions are + +437 +00:22:15,710 --> 00:22:17,960 +defined with the +abstract classes here. + +438 +00:22:17,960 --> 00:22:21,155 +And here, the six cases +for 0-1 character, + +439 +00:22:21,155 --> 00:22:23,540 +I turn a TIF in +sequence and star. + +440 +00:22:23,540 --> 00:22:26,705 +Ok. So the first +function nullable, + +441 +00:22:26,705 --> 00:22:28,760 +the simple one, takes + +442 +00:22:28,760 --> 00:22:32,120 +a regular expression as +argument and returns a boolean. + +443 +00:22:32,120 --> 00:22:34,280 +And then with this +pattern matching, + +444 +00:22:34,280 --> 00:22:37,040 +we just go through +all these six cases + +445 +00:22:37,040 --> 00:22:38,900 +are serious defined as false. + +446 +00:22:38,900 --> 00:22:43,234 +One is defined as true +character for any character, + +447 +00:22:43,234 --> 00:22:45,455 +this null return false. + +448 +00:22:45,455 --> 00:22:47,540 +The alternative is to find here, + +449 +00:22:47,540 --> 00:22:50,000 +so that's the or in Scala. + +450 +00:22:50,000 --> 00:22:52,700 +And for the sequence, +that's the end. + +451 +00:22:52,700 --> 00:22:56,180 +And this star, no matter +what the reg expression is, + +452 +00:22:56,180 --> 00:22:59,540 +it will always match the +empty string, so true. + +453 +00:22:59,540 --> 00:23:02,225 +So nanobots, very easy. + +454 +00:23:02,225 --> 00:23:07,430 +The derivative is also not +so much more complicated. + +455 +00:23:07,430 --> 00:23:08,974 +It takes two arguments, + +456 +00:23:08,974 --> 00:23:11,810 +a character and the +rec expression, + +457 +00:23:11,810 --> 00:23:14,405 +and returns a wreck expression. + +458 +00:23:14,405 --> 00:23:16,340 +So now we just have to analyze + +459 +00:23:16,340 --> 00:23:18,890 +what's that reg +expression looks like. + +460 +00:23:18,890 --> 00:23:23,870 +If it's 0, we return +01, we return 0. + +461 +00:23:23,870 --> 00:23:26,990 +If the character is the +wreck expressions character + +462 +00:23:26,990 --> 00:23:30,260 +d. We test whether it's +equal to this character. + +463 +00:23:30,260 --> 00:23:32,225 +We want to take the +derivative off. + +464 +00:23:32,225 --> 00:23:36,185 +If yes, return one, otherwise 0. + +465 +00:23:36,185 --> 00:23:38,600 +The alternative which has pushed + +466 +00:23:38,600 --> 00:23:39,860 +the derivative under + +467 +00:23:39,860 --> 00:23:42,710 +this alternative, +that's the easy one. + +468 +00:23:42,710 --> 00:23:44,690 +Here's the sequence case where we + +469 +00:23:44,690 --> 00:23:47,015 +first have to test +if it's nullable, + +470 +00:23:47,015 --> 00:23:49,040 +If it's not the have to push + +471 +00:23:49,040 --> 00:23:52,160 +the derivative over +the first DR1. + +472 +00:23:52,160 --> 00:23:56,135 +And otherwise if it is null +above we have two cases. + +473 +00:23:56,135 --> 00:24:00,605 +Either you have to push +it over the R1 or R2. + +474 +00:24:00,605 --> 00:24:03,860 +And a star case, this one. + +475 +00:24:03,860 --> 00:24:07,160 +We just build the sequence +of the derivative of + +476 +00:24:07,160 --> 00:24:11,600 +R1 and append the +or as a sequence, + +477 +00:24:11,600 --> 00:24:14,195 +put the star at the end. + +478 +00:24:14,195 --> 00:24:19,430 +Okay, so here's this +function for strings. + +479 +00:24:19,430 --> 00:24:21,740 +So a list of characters. + +480 +00:24:21,740 --> 00:24:23,870 +This function takes N, + +481 +00:24:23,870 --> 00:24:26,480 +S list of characters argument +and a reg expression + +482 +00:24:26,480 --> 00:24:29,360 +as argument and returns +a wreck expression. + +483 +00:24:29,360 --> 00:24:31,460 +And we just have to +pattern match what + +484 +00:24:31,460 --> 00:24:34,055 +that string looks like +or this list looks like. + +485 +00:24:34,055 --> 00:24:35,360 +If it's the empty list, + +486 +00:24:35,360 --> 00:24:38,030 +it just immediately +return the R. If + +487 +00:24:38,030 --> 00:24:42,020 +it's something of the +form C followed by S, + +488 +00:24:42,020 --> 00:24:45,050 +then we build first the +derivative according + +489 +00:24:45,050 --> 00:24:48,345 +to a C. And then +the result of that, + +490 +00:24:48,345 --> 00:24:50,090 +people recursively call + +491 +00:24:50,090 --> 00:24:53,675 +the derivative +according to s. Okay? + +492 +00:24:53,675 --> 00:24:56,060 +And now the main mantra, + +493 +00:24:56,060 --> 00:24:59,360 +it takes a reg +expression and takes + +494 +00:24:59,360 --> 00:25:02,870 +a string and returns a +boolean, true or false. + +495 +00:25:02,870 --> 00:25:05,705 +And it first builds +the derivative. + +496 +00:25:05,705 --> 00:25:07,430 +The only thing I have to do here + +497 +00:25:07,430 --> 00:25:09,410 +because I'm implementing +it and scholar, + +498 +00:25:09,410 --> 00:25:15,064 +I have to coerce between strings +and lists of characters. + +499 +00:25:15,064 --> 00:25:16,580 +So he, I take first + +500 +00:25:16,580 --> 00:25:20,810 +the two list of the S that +gives me a list of characters. + +501 +00:25:20,810 --> 00:25:23,135 +Then I build this derivative + +502 +00:25:23,135 --> 00:25:26,045 +is ds because I'm +looking at strings. + +503 +00:25:26,045 --> 00:25:28,160 +And then at the end, + +504 +00:25:28,160 --> 00:25:33,050 +built-in nullable of +the final derivative. + +505 +00:25:33,050 --> 00:25:37,310 +The beauty of all this +is that in Scala, + +506 +00:25:37,310 --> 00:25:40,085 +these definition which +I had on the slides + +507 +00:25:40,085 --> 00:25:43,835 +go almost one-to-one into code. + +508 +00:25:43,835 --> 00:25:45,605 +And that's of course, + +509 +00:25:45,605 --> 00:25:47,480 +makes it very easy +to implement in + +510 +00:25:47,480 --> 00:25:49,730 +a functional language like Scala. + +511 +00:25:49,730 --> 00:25:52,865 +We can also now try +out some examples. + +512 +00:25:52,865 --> 00:25:56,960 +This was the example +I had on the slide. + +513 +00:25:56,960 --> 00:25:58,370 +And we could now calculate + +514 +00:25:58,370 --> 00:26:00,305 +what's the derivative +according to a, + +515 +00:26:00,305 --> 00:26:02,720 +B, and C of this +record expression. + +516 +00:26:02,720 --> 00:26:07,040 +And I hope you didn't +make any mistake. + +517 +00:26:07,040 --> 00:26:09,260 +Now of course we want +to see whether B + +518 +00:26:09,260 --> 00:26:11,390 +do any better with +this algorithm. + +519 +00:26:11,390 --> 00:26:13,715 +Then Python and Ruby. + +520 +00:26:13,715 --> 00:26:16,070 +And let's first have a +look at this example. + +521 +00:26:16,070 --> 00:26:18,079 +So this reg expression. + +522 +00:26:18,079 --> 00:26:19,880 +The problem is that in + +523 +00:26:19,880 --> 00:26:22,070 +our reg expression +metro so far we have + +524 +00:26:22,070 --> 00:26:24,530 +the sequence right +expression where we + +525 +00:26:24,530 --> 00:26:27,200 +don't have this optional +wreck expression. + +526 +00:26:27,200 --> 00:26:30,800 +And we don't have this n +times Rekha expression, + +527 +00:26:30,800 --> 00:26:36,605 +but we can quickly implement +that in our implementation. + +528 +00:26:36,605 --> 00:26:40,549 +So if you want to build the +optional reg expression, + +529 +00:26:40,549 --> 00:26:41,870 +which is defined here, + +530 +00:26:41,870 --> 00:26:44,570 +a little function which +takes a reg expression and + +531 +00:26:44,570 --> 00:26:47,360 +returns the alternative of R one. + +532 +00:26:47,360 --> 00:26:49,624 +So it essentially +takes the definition + +533 +00:26:49,624 --> 00:26:53,240 +of optional R and +replaces it by that. + +534 +00:26:53,240 --> 00:26:56,150 +The end times what we +essentially do it, + +535 +00:26:56,150 --> 00:27:01,535 +There's beaches built n +copies of this r. Ok? + +536 +00:27:01,535 --> 00:27:04,745 +So if this n times was 0, + +537 +00:27:04,745 --> 00:27:06,245 +we just return one. + +538 +00:27:06,245 --> 00:27:11,570 +If it's one, then we return +just the reg expression. + +539 +00:27:11,570 --> 00:27:15,575 +And if it's a, something +bigger than one, + +540 +00:27:15,575 --> 00:27:18,560 +then we just built the +sequence of one of + +541 +00:27:18,560 --> 00:27:20,270 +these copies and call + +542 +00:27:20,270 --> 00:27:22,925 +this function again +with n minus one. + +543 +00:27:22,925 --> 00:27:26,330 +So we just now n copies of that. + +544 +00:27:26,330 --> 00:27:30,710 +Okay? Okay, so we can look +at our first example. + +545 +00:27:30,710 --> 00:27:32,629 +This is the work expression, + +546 +00:27:32,629 --> 00:27:36,035 +and I call that here +even rec expression1. + +547 +00:27:36,035 --> 00:27:37,670 +It doesn't look as beautiful + +548 +00:27:37,670 --> 00:27:39,140 +as what we write down on paper. + +549 +00:27:39,140 --> 00:27:41,240 +We will actually look +at this later on + +550 +00:27:41,240 --> 00:27:43,640 +if this can be improved. +But here it is. + +551 +00:27:43,640 --> 00:27:45,724 +Here's the reg expression, + +552 +00:27:45,724 --> 00:27:49,520 +and here's a little function +which measures the time. + +553 +00:27:49,520 --> 00:27:53,180 +And we can now start testing + +554 +00:27:53,180 --> 00:27:57,845 +this reg expression with +strings of just containing A's. + +555 +00:27:57,845 --> 00:28:00,020 +And we first a bit cautious, + +556 +00:28:00,020 --> 00:28:04,985 +be tested between 020 +and be count by two. + +557 +00:28:04,985 --> 00:28:08,330 +And I actually will not +start that with Scala, + +558 +00:28:08,330 --> 00:28:12,965 +but actually the orbital online. + +559 +00:28:12,965 --> 00:28:15,305 +And that's out. + +560 +00:28:15,305 --> 00:28:17,269 +And that correlates. + +561 +00:28:17,269 --> 00:28:20,675 +So for 18 days it's pretty fast. + +562 +00:28:20,675 --> 00:28:24,815 +But for 20 it already +needs to seconds. + +563 +00:28:24,815 --> 00:28:28,265 +And you can see +actually this jump from + +564 +00:28:28,265 --> 00:28:32,825 +18 to 20 in the runtime +is prepared to. + +565 +00:28:32,825 --> 00:28:37,460 +And if we actually measure +it more accurately, + +566 +00:28:37,460 --> 00:28:39,770 +then we get this picture. + +567 +00:28:39,770 --> 00:28:41,255 +And what turns out, + +568 +00:28:41,255 --> 00:28:45,830 +we actually inverse as Python +and Ruby in this example. + +569 +00:28:45,830 --> 00:28:49,230 +So I guess that's a fail. diff -r b294cfbb5c01 -r e8402d8ec8e6 videos/02-prelims.srt --- /dev/null Thu Jan 01 00:00:00 1970 +0000 +++ b/videos/02-prelims.srt Tue Sep 29 12:52:07 2020 +0100 @@ -0,0 +1,1217 @@ +1 +00:00:09,990 --> 00:00:13,465 +Welcome back to this +week's lecture. + +2 +00:00:13,465 --> 00:00:16,450 +The task this week is to actually + +3 +00:00:16,450 --> 00:00:20,320 +implement a regular +expression matcher. + +4 +00:00:20,320 --> 00:00:22,510 +And we want to be a bit + +5 +00:00:22,510 --> 00:00:25,315 +faster than the regular +expression matcher + +6 +00:00:25,315 --> 00:00:29,380 +in Python, Ruby, +Javascript, and Java. + +7 +00:00:29,380 --> 00:00:31,960 +Remember this regular expression + +8 +00:00:31,960 --> 00:00:35,785 +and strings which are +just a number of a's, + +9 +00:00:35,785 --> 00:00:39,670 +these regular expression matchers +where abysmally slow. + +10 +00:00:39,670 --> 00:00:43,170 +They could only match +approximately 30 a's in + +11 +00:00:43,170 --> 00:00:45,665 +something like half a minute. + +12 +00:00:45,665 --> 00:00:49,460 +What we like to do with +our regular expression matcher. + +13 +00:00:49,460 --> 00:00:51,890 +We will try a few times, + +14 +00:00:51,890 --> 00:00:55,505 +but our second trial will already +be much, much better. + +15 +00:00:55,505 --> 00:00:58,400 +It will probably +match around maybe + +16 +00:00:58,400 --> 00:01:02,030 +thousand a's in something +in half a minute. + +17 +00:01:02,030 --> 00:01:03,920 +But our third version in + +18 +00:01:03,920 --> 00:01:06,230 +comparison will be +blindingly fast. + +19 +00:01:06,230 --> 00:01:08,180 +And we'll be able to match + +20 +00:01:08,180 --> 00:01:10,145 +strings of length 10,000 a's + +21 +00:01:10,145 --> 00:01:12,230 +and will not require + +22 +00:01:12,230 --> 00:01:14,975 +more than five seconds. +So let's go ahead with that. + +23 +00:01:14,975 --> 00:01:18,095 +We will first implement + +24 +00:01:18,095 --> 00:01:19,880 +our regular expression +matcher only + +25 +00:01:19,880 --> 00:01:22,069 +for the basic +regular expressions. + +26 +00:01:22,069 --> 00:01:25,430 +So we will have only six +cases to think about. + +27 +00:01:25,430 --> 00:01:27,620 +This will simplify matters at first. + +28 +00:01:27,620 --> 00:01:30,350 +Later we can +think about how to + +29 +00:01:30,350 --> 00:01:34,100 +extend that to the extended +regular expressions. + +30 +00:01:34,100 --> 00:01:37,625 +Unfortunately, before we can +start our implementation, + +31 +00:01:37,625 --> 00:01:39,290 +we have to discuss + +32 +00:01:39,290 --> 00:01:42,470 +when two regular +expressions are equivalent. + +33 +00:01:42,470 --> 00:01:46,595 +And we use here this notation +of the triple equals. + +34 +00:01:46,595 --> 00:01:48,215 +We say a regular expression + +35 +00:01:48,215 --> 00:01:50,180 +r1 and r2 are + +36 +00:01:50,180 --> 00:01:56,059 +equivalent if and only +if the language of r1 is + +37 +00:01:56,059 --> 00:01:59,360 +equal to the language of r2. + +38 +00:01:59,360 --> 00:02:02,915 +Sounds complicated, +but essentially means + +39 +00:02:02,915 --> 00:02:04,700 +if the two regular expressions can + +40 +00:02:04,700 --> 00:02:06,950 +match exactly the same strings, + +41 +00:02:06,950 --> 00:02:09,800 +then we want to regard +them as equivalent. + +42 +00:02:09,800 --> 00:02:14,300 +This equivalence justifies +why we can be sloppy + +43 +00:02:14,300 --> 00:02:16,040 +with parentheses. + +44 +00:02:16,040 --> 00:02:23,870 +For example, if we have +(r1 + r2) + r3, + +45 +00:02:23,870 --> 00:02:32,270 +then this will be equivalent +to r1 + (r2 + r3). + +46 +00:02:32,270 --> 00:02:34,910 +Remember, regular +expressions are trees, + +47 +00:02:34,910 --> 00:02:37,940 +so these are two different +regular expressions, + +48 +00:02:37,940 --> 00:02:41,540 +but they can match +the same strings. + +49 +00:02:41,540 --> 00:02:45,695 +Because, well, if we take +here the meaning of that, + +50 +00:02:45,695 --> 00:02:54,350 +that would be just L(r1 + r2 + r3), + + +51 +00:02:54,350 --> 00:03:04,100 +which is equal to L(r1 + r2) u L(r3). + + +52 +00:03:04,100 --> 00:03:10,595 +And that is equal to of + +53 +00:03:10,595 --> 00:03:15,710 +L(r1) u L(r2) u L(r3). + + +54 +00:03:15,710 --> 00:03:17,930 +And if you do the +same calculation + +55 +00:03:17,930 --> 00:03:19,445 +for that regular expression, + +56 +00:03:19,445 --> 00:03:22,940 +you will find out the +two sets are the same. + +57 +00:03:22,940 --> 00:03:26,195 +So both regular expressions +can match the same strings. + +58 +00:03:26,195 --> 00:03:28,805 +So even if they're different +regular expressions, + +59 +00:03:28,805 --> 00:03:31,220 +we can regard them as eqivalent. + +60 +00:03:31,220 --> 00:03:33,290 +And so that's why we can forget + +61 +00:03:33,290 --> 00:03:35,270 +about writing these parentheses. + +62 +00:03:35,270 --> 00:03:40,205 +Let's have a look +at some more examples. + +63 +00:03:40,205 --> 00:03:41,870 +So the first one here, + +64 +00:03:41,870 --> 00:03:43,205 +that is now clear. + +65 +00:03:43,205 --> 00:03:45,200 +We did this calculation already + +66 +00:03:45,200 --> 00:03:47,480 +for arbitrary regular expressions. + +67 +00:03:47,480 --> 00:03:49,520 +Here it is for +concrete ones a, b and c. + +68 +00:03:49,520 --> 00:03:52,690 +Here on the left-hand side, + +69 +00:03:52,690 --> 00:03:54,895 +the regular expression +has the same meaning + +70 +00:03:54,895 --> 00:03:56,620 +as on the right-hand side. + +71 +00:03:56,620 --> 00:03:58,420 +So they are equivalent. + +72 +00:03:58,420 --> 00:04:01,375 +We can ignore the parentheses. + +73 +00:04:01,375 --> 00:04:03,220 +If I have a choice, + +74 +00:04:03,220 --> 00:04:06,610 +but the choice is +the same a or a, + +75 +00:04:06,610 --> 00:04:09,265 +then this is +equivalent to just a. + +76 +00:04:09,265 --> 00:04:13,075 +So the same choice doesn't +introduce anything more. + +77 +00:04:13,075 --> 00:04:15,370 +In the next case, if I have + +78 +00:04:15,370 --> 00:04:19,450 +a regular expression +which can match a or b, + +79 +00:04:19,450 --> 00:04:23,860 +that can match the same +strings as b or a. + +80 +00:04:23,860 --> 00:04:27,325 +So you have a kind of +commutativity for the plus, + +81 +00:04:27,325 --> 00:04:29,485 +which is like on natural numbers. + +82 +00:04:29,485 --> 00:04:32,880 +But you would not have a +communitivity for the sequence: + +83 +00:04:32,880 --> 00:04:37,070 +if you have +first a and then b, + +84 +00:04:37,070 --> 00:04:40,850 +that's not the same as +matching first b and then a. + +85 +00:04:40,850 --> 00:04:42,800 +So for the sequence ... + +86 +00:04:42,800 --> 00:04:44,615 +they are not equivalent. + +87 +00:04:44,615 --> 00:04:49,475 +However, for the sequence I +can, like for the plus, ignore + +88 +00:04:49,475 --> 00:04:51,245 +prarentheses. + +89 +00:04:51,245 --> 00:04:55,070 +If I have the parentheses +and this way, + +90 +00:04:55,070 --> 00:04:57,785 +or I have it in this way. + +91 +00:04:57,785 --> 00:05:00,605 +These are two different +regular expressions, + +92 +00:05:00,605 --> 00:05:02,120 +but they have the same meaning. + +93 +00:05:02,120 --> 00:05:03,815 +So they are equivalent. + +94 +00:05:03,815 --> 00:05:05,780 +And so I can leave out parentheses + +95 +00:05:05,780 --> 00:05:09,170 +for times as well. + +96 +00:05:09,170 --> 00:05:12,185 +The next one is a slightly +more interesting one. + +97 +00:05:12,185 --> 00:05:15,020 +If I have a regular +expression which can match + +98 +00:05:15,020 --> 00:05:19,655 +c first followed by (a + b), + +99 +00:05:19,655 --> 00:05:25,355 +then this is the same as +first c followed by a, + +100 +00:05:25,355 --> 00:05:28,640 +or c followed by b. + +101 +00:05:28,640 --> 00:05:31,265 +So that's the kind of +distributivity law + +102 +00:05:31,265 --> 00:05:33,545 +on regular expressions. + +103 +00:05:33,545 --> 00:05:36,350 +But they're also regular expressions +which are not equivalent. + +104 +00:05:36,350 --> 00:05:38,990 +If I have the regular +expression which can + +105 +00:05:38,990 --> 00:05:42,800 +match the string containing +exactly two a's. + +106 +00:05:42,800 --> 00:05:44,240 +That is not equivalent + +107 +00:05:44,240 --> 00:05:46,730 +to the regular +expression which can just match + +108 +00:05:46,730 --> 00:05:49,250 +a single a. And similarly + +109 +00:05:49,250 --> 00:05:51,680 +in this case, if I can match + +110 +00:05:51,680 --> 00:05:56,075 +a or the string b followed by c, + +111 +00:05:56,075 --> 00:05:59,825 +that is not the same as a or b, + +112 +00:05:59,825 --> 00:06:02,000 +followed by a or c. + +113 +00:06:02,000 --> 00:06:05,990 +I'll let you think about +why is that not equivalent. + +114 +00:06:05,990 --> 00:06:08,060 +Essentially you +have to find out what's + +115 +00:06:08,060 --> 00:06:10,475 +the meaning of both +regular expressions. + +116 +00:06:10,475 --> 00:06:14,090 +And are they the +same sets or not? + +117 +00:06:14,090 --> 00:06:17,435 +There are again some +interesting corner cases. + +118 +00:06:17,435 --> 00:06:20,540 +If I have a regular expression +that can match a, + +119 +00:06:20,540 --> 00:06:23,450 +but followed by the regular +expression which cannot match + +120 +00:06:23,450 --> 00:06:25,670 +anything, that is not equivalent + +121 +00:06:25,670 --> 00:06:29,255 +to the regular +expression which can match a. + +122 +00:06:29,255 --> 00:06:31,340 +Again here you have +to do the calculation + +123 +00:06:31,340 --> 00:06:32,915 +to convince you of that. + +124 +00:06:32,915 --> 00:06:35,840 +Similarly, if I have a regular +expression which can + +125 +00:06:35,840 --> 00:06:38,750 +match a or the empty string, + +126 +00:06:38,750 --> 00:06:40,640 +this is not equivalent to + +127 +00:06:40,640 --> 00:06:43,355 +the regular expression +which can just match a. + +128 +00:06:43,355 --> 00:06:46,925 +Here are some interesting +ones with zeros and ones. + +129 +00:06:46,925 --> 00:06:48,860 +So if I have the regular expression + +130 +00:06:48,860 --> 00:06:50,975 +that can match the empty string, + +131 +00:06:50,975 --> 00:06:53,540 +this will be actually equivalent to + +132 +00:06:53,540 --> 00:06:56,435 +the regular expression +which can match nothing, + +133 +00:06:56,435 --> 00:06:59,405 +but star of that. + +134 +00:06:59,405 --> 00:07:01,970 +Remember the star +always introduces, + +135 +00:07:01,970 --> 00:07:04,370 +no matter what, the empty string. + +136 +00:07:04,370 --> 00:07:06,170 +So this regular expression can match + +137 +00:07:06,170 --> 00:07:08,930 +the empty string, +just like the 1. + +138 +00:07:08,930 --> 00:07:12,125 +So these two expressions +are not the same, + +139 +00:07:12,125 --> 00:07:14,210 +but they are equivalent. + +140 +00:07:14,210 --> 00:07:16,700 +And it doesn't matter +whether you take + +141 +00:07:16,700 --> 00:07:20,090 +the empty string to the star, + +142 +00:07:20,090 --> 00:07:23,855 +because if I can match 0 or +more times the empty string, + +143 +00:07:23,855 --> 00:07:27,450 +that will be equivalent to +just the empty string. + +144 +00:07:27,520 --> 00:07:32,510 +Slightly similar to the +third case is this one. + +145 +00:07:32,510 --> 00:07:35,720 +Zero star is not equivalent to 0 + +146 +00:07:35,720 --> 00:07:40,025 +because that can match +exactly the empty string. + +147 +00:07:40,025 --> 00:07:42,740 +This cannot match anything. + +148 +00:07:42,740 --> 00:07:44,839 +While the previous + +149 +00:07:44,839 --> 00:07:47,540 +equivalences are all very +instructive to look at, + +150 +00:07:47,540 --> 00:07:49,910 +these are the +equivalences we need + +151 +00:07:49,910 --> 00:07:52,685 +in our regular expression matchers. + +152 +00:07:52,685 --> 00:07:54,350 +And they are defined for + +153 +00:07:54,350 --> 00:07:58,160 +all regular expressions r. +So r plus 0, + +154 +00:07:58,160 --> 00:08:00,470 +no matter what r looks +like is equivalent + +155 +00:08:00,470 --> 00:08:02,945 +to just r. Similarly + +156 +00:08:02,945 --> 00:08:05,930 +0 plus r is also +equivalent to r. + +157 +00:08:05,930 --> 00:08:08,525 +The order of this +choice doesn't matter; + +158 +00:08:08,525 --> 00:08:11,495 +r followed by 1, + +159 +00:08:11,495 --> 00:08:14,030 +that's the sequence +regular expression, is + +160 +00:08:14,030 --> 00:08:16,760 +equivalent to r. The +sam, r followed by + +161 +00:08:16,760 --> 00:08:19,010 +r is equivalent to r. + +162 +00:08:19,010 --> 00:08:20,990 +And r followed by + +163 +00:08:20,990 --> 00:08:23,390 +the regular expression which +cannot match anything, + +164 +00:08:23,390 --> 00:08:27,455 +is equivalent to just 0. + +165 +00:08:27,455 --> 00:08:30,740 +And 0 followed by r is also equivalent to 0. + +166 +00:08:30,740 --> 00:08:33,615 +And if you have the +choice of r plus r, + +167 +00:08:33,615 --> 00:08:37,210 +that is also +equivalent to just r. + +168 +00:08:37,210 --> 00:08:40,270 +What we're going to do +with these equivalences is + +169 +00:08:40,270 --> 00:08:42,820 +that we regard them as +simplification rules. + +170 +00:08:42,820 --> 00:08:43,930 +So whenever we see + +171 +00:08:43,930 --> 00:08:46,465 +this regular expression +in our algorithm, + +172 +00:08:46,465 --> 00:08:48,580 +we will replace it by +the right-hand side. + +173 +00:08:48,580 --> 00:08:51,715 +So we will orient +these equivalences. + +174 +00:08:51,715 --> 00:08:53,650 +If we see, this regular expression, + +175 +00:08:53,650 --> 00:08:55,225 +we will replace it by that one. + +176 +00:08:55,225 --> 00:08:58,945 +And it will not matter +because the left-hand sides + +177 +00:08:58,945 --> 00:09:01,255 +can match exactly +the same strings + +178 +00:09:01,255 --> 00:09:03,475 +as the right-hand sides. + +179 +00:09:03,475 --> 00:09:06,370 +Here two quick examples. + +180 +00:09:06,370 --> 00:09:08,680 +The first one, let's +assume you have + +181 +00:09:08,680 --> 00:09:10,270 +a regular expression r and + +182 +00:09:10,270 --> 00:09:11,905 +there is something +in front of it. + +183 +00:09:11,905 --> 00:09:13,720 +This "something in front of it" + +184 +00:09:13,720 --> 00:09:15,870 +can be simplified as follows. + +185 +00:09:15,870 --> 00:09:20,000 +This 1 times b +can be simplified to b. + +186 +00:09:20,000 --> 00:09:23,555 +Then this b plus 0 can +be simplified to b. + +187 +00:09:23,555 --> 00:09:25,490 +And assuming that nothing can + +188 +00:09:25,490 --> 00:09:27,470 +be simplified inside this r, + +189 +00:09:27,470 --> 00:09:28,700 +then this would be + +190 +00:09:28,700 --> 00:09:33,050 +the simplified version +of this regular expression. + +191 +00:09:33,050 --> 00:09:34,820 +And because the simplification + +192 +00:09:34,820 --> 00:09:36,965 +rules preserve what can be matched, + +193 +00:09:36,965 --> 00:09:39,080 +we can be sure that +this regular expression + +194 +00:09:39,080 --> 00:09:41,255 +can match exactly the strings + +195 +00:09:41,255 --> 00:09:43,940 +this regular expression can match. + +196 +00:09:43,940 --> 00:09:45,740 +Here is the other example. + +197 +00:09:45,740 --> 00:09:49,550 +This has just a tiny change +that this becomes here as 0. + +198 +00:09:49,550 --> 00:09:54,485 +Then 0 times b can be +replaced by just 0. + +199 +00:09:54,485 --> 00:09:56,705 +Then they are actually two +rules which could apply + +200 +00:09:56,705 --> 00:09:59,014 +either that we have +the same choice + +201 +00:09:59,014 --> 00:10:01,160 +or we can just say something plus + +202 +00:10:01,160 --> 00:10:04,100 +0 will always go to something. + +203 +00:10:04,100 --> 00:10:06,245 +So we can simplify it to that. + +204 +00:10:06,245 --> 00:10:08,510 +And then we have +0 times r again, + +205 +00:10:08,510 --> 00:10:10,460 +and that can be simplified to 0. + +206 +00:10:10,460 --> 00:10:12,080 +So actually what we find out with + +207 +00:10:12,080 --> 00:10:14,645 +this calculation is that +this regular expression, + +208 +00:10:14,645 --> 00:10:16,835 +even if it looks +quite complicated, + +209 +00:10:16,835 --> 00:10:19,295 +actually doesn't +match any string. + +210 +00:10:19,295 --> 00:10:21,290 +Because it matches exactly + +211 +00:10:21,290 --> 00:10:23,420 +those string this regular +expression can match. + +212 +00:10:23,420 --> 00:10:26,285 +And this reg expression +cannot match any. + +213 +00:10:26,285 --> 00:10:29,930 +We need one more +operation on languages. + +214 +00:10:29,930 --> 00:10:31,700 +I call this operation + +215 +00:10:31,700 --> 00:10:35,165 +the semantic derivative +of a language. + +216 +00:10:35,165 --> 00:10:37,775 +This operation takes +two arguments. + +217 +00:10:37,775 --> 00:10:40,670 +It takes a character +which we take + +218 +00:10:40,670 --> 00:10:43,925 +the semantic derivative +according to, + +219 +00:10:43,925 --> 00:10:45,980 +and it takes a language. + +220 +00:10:45,980 --> 00:10:49,579 +And what it does is it +looks into this language + +221 +00:10:49,579 --> 00:10:51,365 +and looks for all the strings + +222 +00:10:51,365 --> 00:10:53,735 +that start with this character, + +223 +00:10:53,735 --> 00:10:56,405 +let's say c, and then + +224 +00:10:56,405 --> 00:11:00,920 +just strips off that c. +So here's an example. + +225 +00:11:00,920 --> 00:11:02,975 +Suppose you have a language A, + +226 +00:11:02,975 --> 00:11:04,460 +with the strings + +227 +00:11:04,460 --> 00:11:06,965 +foo, bar and frak. + +228 +00:11:06,965 --> 00:11:10,835 +And you take the semantic +derivative according to f, + +229 +00:11:10,835 --> 00:11:14,450 +then we discard all the +strings which do not + +230 +00:11:14,450 --> 00:11:18,320 +start with an f. So +bar will be discarded. + +231 +00:11:18,320 --> 00:11:22,850 +Foo and frac start with +an f. So we keep them + +232 +00:11:22,850 --> 00:11:25,265 +but we strip off the first f. + +233 +00:11:25,265 --> 00:11:29,045 +So here we have only oo and rak. + +234 +00:11:29,045 --> 00:11:31,609 +If you take the +semantic derivative + +235 +00:11:31,609 --> 00:11:33,830 +of that language according to b, + +236 +00:11:33,830 --> 00:11:37,190 +then we will discard foo and +frak because they don't + +237 +00:11:37,190 --> 00:11:40,940 +start with b, and just keep bar. + +238 +00:11:40,940 --> 00:11:44,585 +But again, we have to +strip off this first b. + +239 +00:11:44,585 --> 00:11:49,305 +And if you take the semantic +derivative according to a, + +240 +00:11:49,305 --> 00:11:52,585 +then none of these +strings start with a. + +241 +00:11:52,585 --> 00:11:56,800 +So this will be defined +as just the empty set. + +242 +00:11:56,800 --> 00:11:59,695 +You can slightly +generalized this + +243 +00:11:59,695 --> 00:12:02,560 +semantic derivative +to also strings. + +244 +00:12:02,560 --> 00:12:05,170 +So imagine you have +a language A and you + +245 +00:12:05,170 --> 00:12:08,274 +build the semantic derivative +according to a string s. + +246 +00:12:08,274 --> 00:12:10,990 +Then you look in +this language and + +247 +00:12:10,990 --> 00:12:13,420 +you look for all the +strings that start with + +248 +00:12:13,420 --> 00:12:19,555 +this s. But you strip +off that s. Ok? + +249 +00:12:19,555 --> 00:12:23,830 +So if you have a string +starting with the s, + +250 +00:12:23,830 --> 00:12:26,065 +then you keep that string, + +251 +00:12:26,065 --> 00:12:27,490 +but you keep only the rest... + +252 +00:12:27,490 --> 00:12:28,810 +what's coming after this s. + +253 +00:12:28,810 --> 00:12:32,010 +That is here indicated +with this s'. + +254 +00:12:32,010 --> 00:12:35,330 +I also have this convention, +this personal convention + +255 +00:12:35,330 --> 00:12:40,055 +I have to say, that everything +which works on lists, + +256 +00:12:40,055 --> 00:12:42,185 +remember strings are +lists of characters. + +257 +00:12:42,185 --> 00:12:46,655 +I just put there an 's'. So +here's the one for characters. + +258 +00:12:46,655 --> 00:12:48,680 +I just call it Der with a capital + +259 +00:12:48,680 --> 00:12:51,740 +D. And I call that Ders, + +260 +00:12:51,740 --> 00:12:54,350 +because that works over strings. + +261 +00:12:54,350 --> 00:12:55,865 +And you can see how it would + +262 +00:12:55,865 --> 00:12:59,750 +be defined if you only take this +as a primitive operation. + +263 +00:12:59,750 --> 00:13:01,340 +We would just need to iterate + +264 +00:13:01,340 --> 00:13:04,050 +that essentially for this Ders. + +265 +00:13:04,060 --> 00:13:07,970 +Ok, we now have all +the theory in place. + +266 +00:13:07,970 --> 00:13:11,345 +And we can finally start +implementing our algorithm. + +267 +00:13:11,345 --> 00:13:14,580 +That's when we'll do +in the next video.