diff -r b294cfbb5c01 -r e8402d8ec8e6 videos/01-basics2.srt --- /dev/null Thu Jan 01 00:00:00 1970 +0000 +++ b/videos/01-basics2.srt Tue Sep 29 12:52:07 2020 +0100 @@ -0,0 +1,1627 @@ +1 +00:00:05,810 --> 00:00:11,039 +Okay, so we have strings. +The empty string and string "abc". + +2 +00:00:11,039 --> 00:00:13,200 +And we can take the head + +3 +00:00:13,200 --> 00:00:15,615 +of a string and the +tail of the string, + +4 +00:00:15,615 --> 00:00:18,480 +since we regard them as +lists of characters. + +5 +00:00:18,480 --> 00:00:22,230 +We also introduced the +notion of a language. + +6 +00:00:22,230 --> 00:00:25,305 +A language is just +a set of strings. + +7 +00:00:25,305 --> 00:00:26,700 +So here's a language. + +8 +00:00:26,700 --> 00:00:28,275 +It contains the empty string, + +9 +00:00:28,275 --> 00:00:30,360 +the string hello, string foobar, + +10 +00:00:30,360 --> 00:00:31,965 +the string, which is + +11 +00:00:31,965 --> 00:00:34,874 +just a character a and +the string abc. + +12 +00:00:34,874 --> 00:00:37,800 +There will be also other +languages, which for example, + +13 +00:00:37,800 --> 00:00:39,780 +contain infinitely many strings. + +14 +00:00:39,780 --> 00:00:42,190 +Actually that will +happen quite often. + +15 +00:00:42,190 --> 00:00:45,560 +Now, you've seen this operation + +16 +00:00:45,560 --> 00:00:47,660 +of concatenation of two strings. + +17 +00:00:47,660 --> 00:00:50,840 +So if you have the string +foo and a string bar, + +18 +00:00:50,840 --> 00:00:53,255 +we can just put them +next to each other. + +19 +00:00:53,255 --> 00:00:57,725 +We will now extend that +notion also told languages. + +20 +00:00:57,725 --> 00:01:02,270 +So assume you have a +language A and a language B. + +21 +00:01:02,270 --> 00:01:05,825 +That means we take every +string from the language + +22 +00:01:05,825 --> 00:01:11,195 +A and concatenate it with every +string of the language B. + +23 +00:01:11,195 --> 00:01:13,160 +Here's an example. + +24 +00:01:13,160 --> 00:01:14,750 +If you have the language A + +25 +00:01:14,750 --> 00:01:17,030 +containing foo and bar as strings, + +26 +00:01:17,030 --> 00:01:20,585 +and the language B containing +a and b as strings. + +27 +00:01:20,585 --> 00:01:23,000 +Then concatenating +these two languages + +28 +00:01:23,000 --> 00:01:27,110 +means I take foo with this a and b, + +29 +00:01:27,110 --> 00:01:28,625 +giving fooa and foob, + +30 +00:01:28,625 --> 00:01:30,905 +and I take bar and +concatenated with + +31 +00:01:30,905 --> 00:01:34,160 +a and b, giving bara and barb. + +32 +00:01:34,160 --> 00:01:36,185 +So the pointer is we're + +33 +00:01:36,185 --> 00:01:39,110 +overloading this notion +of concatenating + +34 +00:01:39,110 --> 00:01:45,020 +two strings also to +concatenating two languages. + +35 +00:01:45,020 --> 00:01:48,890 +Okay, here are two corner +cases of this definition. + +36 +00:01:48,890 --> 00:01:51,395 +What happens if I +have any language, + +37 +00:01:51,395 --> 00:01:54,470 +say A, and I concatenate it with + +38 +00:01:54,470 --> 00:01:58,640 +the language which contains +only the empty string. + +39 +00:01:58,640 --> 00:02:02,270 +And secondly, if I +have any language, + +40 +00:02:02,270 --> 00:02:04,220 +say again A, and I + +41 +00:02:04,220 --> 00:02:05,870 +concatenate it with the language + +42 +00:02:05,870 --> 00:02:08,345 +which doesn't contain any string. + +43 +00:02:08,345 --> 00:02:11,720 +How would these two +cases be defined? + +44 +00:02:11,720 --> 00:02:16,055 +Remember, this language here +contains a single element, + +45 +00:02:16,055 --> 00:02:17,645 +namely the empty string, + +46 +00:02:17,645 --> 00:02:20,210 +while this language + +47 +00:02:20,210 --> 00:02:23,585 +does not contain any +string. It is empty. + +48 +00:02:23,585 --> 00:02:26,930 +So these are two +different languages. + +49 +00:02:26,930 --> 00:02:29,960 +Think about how this +would be defined. + +50 +00:02:29,960 --> 00:02:33,455 +So what's the point of +all these definitions? + +51 +00:02:33,455 --> 00:02:36,110 +Well, we want to make precise + +52 +00:02:36,110 --> 00:02:38,735 +what is the meaning of +a regular expression. + +53 +00:02:38,735 --> 00:02:41,510 +For this, we use +this function L. It + +54 +00:02:41,510 --> 00:02:44,180 +will take a regular expression as + +55 +00:02:44,180 --> 00:02:47,150 +argument and generates a + +56 +00:02:47,150 --> 00:02:49,970 +set of strings or a language. + +57 +00:02:49,970 --> 00:02:52,670 +And it's supposed to be the +language which contains + +58 +00:02:52,670 --> 00:02:56,330 +all the strings this regular +expression can match. + +59 +00:02:56,330 --> 00:02:58,670 +So remember the 0 +regular expression, + +60 +00:02:58,670 --> 00:03:01,115 +it's not supposed to +match any string. + +61 +00:03:01,115 --> 00:03:05,105 +So we give it the empty +language or empty set. + +62 +00:03:05,105 --> 00:03:06,740 +This regular expression, + +63 +00:03:06,740 --> 00:03:09,380 +the one regular expression +is supposed to match + +64 +00:03:09,380 --> 00:03:13,445 +exactly one string, +namely the empty string. + +65 +00:03:13,445 --> 00:03:15,710 +the regular expression + +66 +00:03:15,710 --> 00:03:18,875 +recognizing just the +character c, in this case. + +67 +00:03:18,875 --> 00:03:22,820 +Well, that will be the +language which contains + +68 +00:03:22,820 --> 00:03:28,175 +the string only containing +the single character c. + +69 +00:03:28,175 --> 00:03:31,295 +Now, what is this regular +expression supposed to match? + +70 +00:03:31,295 --> 00:03:34,835 +Well, a string is matched +by this regular expression + +71 +00:03:34,835 --> 00:03:37,355 +if it is matched by +the first component, r1, + +72 +00:03:37,355 --> 00:03:40,970 +or by the second component, r2. + +73 +00:03:40,970 --> 00:03:43,190 +That's why this plus +is the alternative. + +74 +00:03:43,190 --> 00:03:45,605 +So which are the strings + +75 +00:03:45,605 --> 00:03:48,320 +this regular expression +can match? Well, + +76 +00:03:48,320 --> 00:03:51,275 +all the strings +r1 one can match, + +77 +00:03:51,275 --> 00:03:54,170 +given by L(r1) union... + +78 +00:03:54,170 --> 00:03:57,380 +all the strings +R2 can match. + +79 +00:03:57,380 --> 00:04:01,325 +And we use here the union +because this alternative, + +80 +00:04:01,325 --> 00:04:05,945 +even if it says this string +is matched by r1 or by r2. + +81 +00:04:05,945 --> 00:04:08,465 +The meaning of this +reg expression + +82 +00:04:08,465 --> 00:04:11,209 +is the union of both languages. + +83 +00:04:11,209 --> 00:04:14,315 +Now the sequence case. + +84 +00:04:14,315 --> 00:04:17,960 +This means a string is +matched by this regular expression + +85 +00:04:17,960 --> 00:04:20,510 +if the first part of this +string is matched by r1 + +86 +00:04:20,510 --> 00:04:24,935 +and the second part of +the string is matched by r2. + +87 +00:04:24,935 --> 00:04:28,490 +Now we have to say, which +are all the strings + +88 +00:04:28,490 --> 00:04:31,645 +this regular expression +can match. Well, it's + +89 +00:04:31,645 --> 00:04:34,495 +all the strings +r1 can match, + +90 +00:04:34,495 --> 00:04:39,205 +concatenated with all the +strings r2 can match. + +91 +00:04:39,205 --> 00:04:42,895 +So for this, we have to use +this concatenation operation. + +92 +00:04:42,895 --> 00:04:47,440 +So if r1 can match a string +and r2 can match a string, + +93 +00:04:47,440 --> 00:04:51,220 +then in the meaning will be +the concatenation of that. + +94 +00:04:51,220 --> 00:04:53,170 +So with this function L + +95 +00:04:53,170 --> 00:04:56,995 +we can make precise +what are the strings, + +96 +00:04:56,995 --> 00:05:00,205 +*all* the strings a regular +expression can match. + +97 +00:05:00,205 --> 00:05:04,689 +The only case we haven't +specified yet is the r*. + +98 +00:05:04,689 --> 00:05:07,750 +For this, we need +another operation + +99 +00:05:07,750 --> 00:05:12,470 +on languages or sets of +strings, which we do next. + +100 +00:05:12,490 --> 00:05:17,285 +We introduce a power +operation for languages. + +101 +00:05:17,285 --> 00:05:19,100 +The power operation or + +102 +00:05:19,100 --> 00:05:22,835 +the nth power is +defined recursively. + +103 +00:05:22,835 --> 00:05:26,615 +So the A to the power of 0 + +104 +00:05:26,615 --> 00:05:31,205 +would be defined as the set +containing the empty string. + +105 +00:05:31,205 --> 00:05:36,200 +And A to the power of +n + 1 would be A + +106 +00:05:36,200 --> 00:05:38,705 +concatenated with +A to the power of + +107 +00:05:38,705 --> 00:05:42,160 +n. Here are a few examples. + +108 +00:05:42,160 --> 00:05:45,380 +A to the power of four +would be essentially + +109 +00:05:45,380 --> 00:05:47,660 +A concatenated with +A concatenated with + +110 +00:05:47,660 --> 00:05:49,640 +A concatenated with A, + +111 +00:05:49,640 --> 00:05:51,650 +and also concatenated with + +112 +00:05:51,650 --> 00:05:55,580 +the language which +contains the empty string. + +113 +00:05:55,580 --> 00:05:57,275 +Because that's how we define + +114 +00:05:57,275 --> 00:05:59,975 +the base case, A +to the power of 0. + +115 +00:05:59,975 --> 00:06:03,410 +And A to the power +one would be just A, + +116 +00:06:03,410 --> 00:06:05,330 +again followed by that one, + +117 +00:06:05,330 --> 00:06:07,790 +but this would be just A. + +118 +00:06:07,790 --> 00:06:10,385 +And A to the power 0 + +119 +00:06:10,385 --> 00:06:14,270 +is this language which +contains the empty string. + +120 +00:06:14,270 --> 00:06:16,670 +With this power operation, + +121 +00:06:16,670 --> 00:06:19,505 +I can also fill in this case. + +122 +00:06:19,505 --> 00:06:23,210 +Remember this r* operation +is supposed to match + +123 +00:06:23,210 --> 00:06:27,680 +a string with either +0 or more copies of r. + +124 +00:06:27,680 --> 00:06:31,940 +So the meaning of this +regular expression would be + +125 +00:06:31,940 --> 00:06:37,475 +now the Union of all the +powers greater or equal than 0, + +126 +00:06:37,475 --> 00:06:40,835 +of the language that r can match. + +127 +00:06:40,835 --> 00:06:44,945 +And we take now at the +union of all these powers, + +128 +00:06:44,945 --> 00:06:47,150 +that means essentially what + +129 +00:06:47,150 --> 00:06:48,530 +can the regular expression match + +130 +00:06:48,530 --> 00:06:51,440 +if I take L of r +to the 0th power, + +131 +00:06:51,440 --> 00:06:53,030 +what can I match with + +132 +00:06:53,030 --> 00:06:57,305 +the first power, the +second power, and so on. + +133 +00:06:57,305 --> 00:07:00,544 +And I take the union +of all these powers. + +134 +00:07:00,544 --> 00:07:03,830 +That's the definition +of which strings + +135 +00:07:03,830 --> 00:07:05,510 +this regular expression +is supposed to + +136 +00:07:05,510 --> 00:07:08,510 +match. The meaning of +this regular expression. + +137 +00:07:08,510 --> 00:07:11,300 +This is such an +important definition, + +138 +00:07:11,300 --> 00:07:13,250 +that there's actually +a name for it. + +139 +00:07:13,250 --> 00:07:15,020 +It's called the Kleene star. + +140 +00:07:15,020 --> 00:07:16,400 +And it's written like this. + +141 +00:07:16,400 --> 00:07:18,335 +If I have a language A, + +142 +00:07:18,335 --> 00:07:21,785 +I can build the star +of this language + +143 +00:07:21,785 --> 00:07:26,255 +by union up all the powers +greater or equal than 0. + +144 +00:07:26,255 --> 00:07:28,580 +The A-star of + +145 +00:07:28,580 --> 00:07:31,580 +a language would expand +to a to the power 0, + +146 +00:07:31,580 --> 00:07:33,770 +union A to the power of one, + +147 +00:07:33,770 --> 00:07:36,665 +A to the power of two, and so on. + +148 +00:07:36,665 --> 00:07:39,230 +And it would +essentially mean, well, + +149 +00:07:39,230 --> 00:07:43,460 +it's the language which contains +the empty string because + +150 +00:07:43,460 --> 00:07:48,290 +of the A to the 0, one copy of A, + +151 +00:07:48,290 --> 00:07:51,020 +two concatenated copies of A, + +152 +00:07:51,020 --> 00:07:55,070 +three concatenated +copies of A, and so on. + +153 +00:07:55,070 --> 00:07:59,060 +So that's what this A-star +is defined as. + +154 +00:07:59,060 --> 00:08:03,725 +Essentially as the Union +of all the powers. + +155 +00:08:03,725 --> 00:08:05,990 +And it's essentially +each power is how many + +156 +00:08:05,990 --> 00:08:08,750 +times I have to +concatenate this language A. + +157 +00:08:08,750 --> 00:08:13,549 +And note that this language +A-star will always contain + +158 +00:08:13,549 --> 00:08:21,240 +the empty string because that +is how the A^0 is defined. + +159 +00:08:21,310 --> 00:08:23,540 +So in this definition, + +160 +00:08:23,540 --> 00:08:25,969 +I could have also written star + +161 +00:08:25,969 --> 00:08:29,180 +here because the +meaning of this r*, + +162 +00:08:29,180 --> 00:08:34,055 +regular expression +is the Kleene star of + +163 +00:08:34,055 --> 00:08:37,130 +the language of r. +Remember that's + +164 +00:08:37,130 --> 00:08:41,525 +the union of all powers +greater or equal than 0. + +165 +00:08:41,525 --> 00:08:43,640 +It's behind. +I hope you don't get + +166 +00:08:43,640 --> 00:08:45,800 +confused by the use of the stars. + +167 +00:08:45,800 --> 00:08:48,845 +This star is for +the regular expressions. + +168 +00:08:48,845 --> 00:08:52,205 +That star is for languages. + +169 +00:08:52,205 --> 00:08:54,274 +They are two +different operations. + +170 +00:08:54,274 --> 00:08:56,555 +They don't even +have the same type. + +171 +00:08:56,555 --> 00:08:58,954 +Here might be +something interesting. + +172 +00:08:58,954 --> 00:09:00,560 +Remember I asked you to think + +173 +00:09:00,560 --> 00:09:03,035 +about these two corner cases. + +174 +00:09:03,035 --> 00:09:04,730 +I hope you have done so, + +175 +00:09:04,730 --> 00:09:07,070 +but let's also have look +at this together. + +176 +00:09:07,070 --> 00:09:09,785 +According to the definition + +177 +00:09:09,785 --> 00:09:11,930 +for this append of languages, + +178 +00:09:11,930 --> 00:09:13,670 +I have to take every string in + +179 +00:09:13,670 --> 00:09:16,130 +this set and have + +180 +00:09:16,130 --> 00:09:18,695 +two concatenated with +every string in that set. + +181 +00:09:18,695 --> 00:09:22,115 +So this set contains only +the empty string as element. + +182 +00:09:22,115 --> 00:09:24,820 +So if I concatenate anything in + +183 +00:09:24,820 --> 00:09:27,745 +there with the empty string, +that will be left untouched. + +184 +00:09:27,745 --> 00:09:31,855 +So this one will be +actually A. + +185 +00:09:31,855 --> 00:09:34,600 +This one I have to +take every string in + +186 +00:09:34,600 --> 00:09:36,190 +this language and I have to + +187 +00:09:36,190 --> 00:09:39,115 +concatenate with every +string in that language. + +188 +00:09:39,115 --> 00:09:41,110 +But here isn't any string. + +189 +00:09:41,110 --> 00:09:43,300 +So I can't concatenate that. + +190 +00:09:43,300 --> 00:09:46,900 +That will be actually +the empty set (not empty string). + +191 +00:09:46,900 --> 00:09:49,570 +So now let's have + +192 +00:09:49,570 --> 00:09:51,670 +a look at regular expressions. + +193 +00:09:51,670 --> 00:09:53,230 +That was with languages. + +194 +00:09:53,230 --> 00:09:56,035 +But let's have a look at +regular expressions now. + +195 +00:09:56,035 --> 00:09:58,660 +What would be the +meaning, for example, + +196 +00:09:58,660 --> 00:10:01,945 +of r followed by 1? + +197 +00:10:01,945 --> 00:10:04,149 +That is a regular expression. + +198 +00:10:04,149 --> 00:10:06,255 +And it essentially says: + +199 +00:10:06,255 --> 00:10:07,850 +this regular expression matches + +200 +00:10:07,850 --> 00:10:10,685 +some strings and this matches +the empty string. + +201 +00:10:10,685 --> 00:10:13,730 +So if you find out +what the meaning is, + +202 +00:10:13,730 --> 00:10:16,955 +we would apply this +L-function to it. + +203 +00:10:16,955 --> 00:10:19,430 +And if we now look +up the definition, + +204 +00:10:19,430 --> 00:10:27,360 +that would be L of r +appended L of 1. + +205 +00:10:27,850 --> 00:10:32,255 +If you look up what +this is defined, + +206 +00:10:32,255 --> 00:10:41,640 +then this would be L of r +appended with this language. + +207 +00:10:41,950 --> 00:10:46,325 +And if you now look up +our definition earlier, + +208 +00:10:46,325 --> 00:10:50,455 +that will be just L of r. + +209 +00:10:50,455 --> 00:10:52,690 +What that essentially + +210 +00:10:52,690 --> 00:10:55,765 +means is if you have +this regular expression, + +211 +00:10:55,765 --> 00:10:58,885 +this will match +exactly those strings + +212 +00:10:58,885 --> 00:11:01,000 +which this regular +expression can match. + +213 +00:11:01,000 --> 00:11:04,794 +And that pretty much +coincides with our intuition. + +214 +00:11:04,794 --> 00:11:05,950 +This is supposed to match + +215 +00:11:05,950 --> 00:11:08,410 +the empty string at +the end of the string, + +216 +00:11:08,410 --> 00:11:11,005 +so it doesn't really +make any difference. + +217 +00:11:11,005 --> 00:11:13,480 +And the meaning +really tells us that. + +218 +00:11:13,480 --> 00:11:15,880 +But here's the +interesting thing. + +219 +00:11:15,880 --> 00:11:17,979 +When you were in school, + +220 +00:11:17,979 --> 00:11:23,124 +how would you think +about this expression? + +221 +00:11:23,124 --> 00:11:29,515 +Well, r times 1 +equals just our, okay? + +222 +00:11:29,515 --> 00:11:33,634 +Furthermore, if you had r times 0. + +223 +00:11:33,634 --> 00:11:35,045 +What would that be equal? + +224 +00:11:35,045 --> 00:11:37,205 +Well, it would be 0. + +225 +00:11:37,205 --> 00:11:39,650 +Ok, let's have + +226 +00:11:39,650 --> 00:11:42,605 +a look how that works out +with regular expressions. + +227 +00:11:42,605 --> 00:11:46,355 +So if you take r followed by 0, + +228 +00:11:46,355 --> 00:11:48,260 +remember this is +the regular expression + +229 +00:11:48,260 --> 00:11:49,895 +that cannot match anything. + +230 +00:11:49,895 --> 00:11:52,415 +By unfolding the definition, + +231 +00:11:52,415 --> 00:11:59,104 +I would get L of r +appended with L of 0. + +232 +00:11:59,104 --> 00:12:01,775 +This is of course defined as L of r + +233 +00:12:01,775 --> 00:12:05,915 +appended with the empty set. + +234 +00:12:05,915 --> 00:12:10,760 +And this would, according +to the definition earlier, + +235 +00:12:10,760 --> 00:12:13,830 +well just the empty set. + +236 +00:12:14,160 --> 00:12:20,260 +And this would be +of course L of 0. + +237 +00:12:20,260 --> 00:12:24,580 +So it seems like these laws, + +238 +00:12:24,580 --> 00:12:27,175 +at least for the times, + +239 +00:12:27,175 --> 00:12:29,785 +seem to be valid from math, + +240 +00:12:29,785 --> 00:12:31,420 +are also valid with regular expressions, + +241 +00:12:31,420 --> 00:12:33,370 +if we look at their meaning. + +242 +00:12:33,370 --> 00:12:36,670 +These laws on natural numbers + +243 +00:12:36,670 --> 00:12:40,105 +and regular expressions and +their close correspondance + +244 +00:12:40,105 --> 00:12:42,460 +probably explain why I use + +245 +00:12:42,460 --> 00:12:46,975 +a times for the sequence +regular expression. + +246 +00:12:46,975 --> 00:12:51,040 +So r followed by the +regular expression 1, + +247 +00:12:51,040 --> 00:12:54,505 +will have the same meaning +as that regular expression. + +248 +00:12:54,505 --> 00:12:59,120 +And r followed by the +0 regular expression + +249 +00:12:59,120 --> 00:13:01,295 +will have this meaning. + +250 +00:13:01,295 --> 00:13:03,590 +You might now think, but I also + +251 +00:13:03,590 --> 00:13:06,605 +wrote the alternative with a plus. + +252 +00:13:06,605 --> 00:13:10,370 +Does a similar law +holds for plus? + +253 +00:13:10,370 --> 00:13:15,965 +Of course, if I take r +plus 0 on natural numbers, + +254 +00:13:15,965 --> 00:13:20,060 +that would be just r. Does +hold for regular expressions? + +255 +00:13:20,060 --> 00:13:22,085 +Yes, indeed it holds. + +256 +00:13:22,085 --> 00:13:26,735 +If I have this +regular expression and + +257 +00:13:26,735 --> 00:13:33,245 +unfold the definition that +would be L(r) union L(0). + +258 +00:13:33,245 --> 00:13:38,060 +This would be equal +to L(r) union... + +259 +00:13:38,060 --> 00:13:41,150 +What is this? Well, that's +just the empty set. + +260 +00:13:41,150 --> 00:13:43,760 +And if I union something +with the empty set, + +261 +00:13:43,760 --> 00:13:47,180 +then this will be +just of L(r). So yes, + +262 +00:13:47,180 --> 00:13:50,120 +indeed, plus is also very much + +263 +00:13:50,120 --> 00:13:54,184 +inspired by the laws +on natural numbers. + +264 +00:13:54,184 --> 00:13:57,065 +There exists other notations too, + +265 +00:13:57,065 --> 00:14:01,310 +but I prefer this +using the plus for + +266 +00:14:01,310 --> 00:14:03,844 +the alternatives and the times + +267 +00:14:03,844 --> 00:14:05,765 +for the sequence expression. + +268 +00:14:05,765 --> 00:14:07,460 +It's also the reason why I call + +269 +00:14:07,460 --> 00:14:10,325 +this regular expression for +the empty string 1. + +270 +00:14:10,325 --> 00:14:14,915 +And for matching +nothing at all 0. + +271 +00:14:14,915 --> 00:14:18,665 +This correspondence to +natural numbers doesn't quite + +272 +00:14:18,665 --> 00:14:22,055 +extend to the star +regular expression. + +273 +00:14:22,055 --> 00:14:25,055 +But there's still a +connection. There too. + +274 +00:14:25,055 --> 00:14:26,510 +Can you remember how + +275 +00:14:26,510 --> 00:14:29,345 +the power operation on +languages was defined? + +276 +00:14:29,345 --> 00:14:31,370 +The 0 case was defined + +277 +00:14:31,370 --> 00:14:34,520 +as the set containing +the empty string. + +278 +00:14:34,520 --> 00:14:37,039 +Why is that? It looks +a bit arbitrary. + +279 +00:14:37,039 --> 00:14:41,150 +Why is it not the empty set +or why is it not defined as A? + +280 +00:14:41,150 --> 00:14:43,745 +Why is defined in this +particular way? + +281 +00:14:43,745 --> 00:14:46,880 +Well, can you remember how + +282 +00:14:46,880 --> 00:14:48,950 +the power operation r to the + +283 +00:14:48,950 --> 00:14:51,440 +0 is defined on natural numbers? + +284 +00:14:51,440 --> 00:14:53,930 +Yes, that's defined as 1. + +285 +00:14:53,930 --> 00:14:57,125 +Does that also apply to +regular expressions? + +286 +00:14:57,125 --> 00:15:00,725 +Well, if we take the meaning +of a regular expression, + +287 +00:15:00,725 --> 00:15:04,730 +let's say r, and raise +it to the 0th power, + +288 +00:15:04,730 --> 00:15:07,685 +then this will be, of +course, by definition, + +289 +00:15:07,685 --> 00:15:12,245 +be defined as the set +containing the empty string. + +290 +00:15:12,245 --> 00:15:17,160 +And that is of course +equal to L(1). + +291 +00:15:17,830 --> 00:15:20,570 +Again, you can see that + +292 +00:15:20,570 --> 00:15:23,300 +this law on natural numbers also + +293 +00:15:23,300 --> 00:15:29,000 +holds on the laws of regular +expressions - on heir meaning. + +294 +00:15:29,000 --> 00:15:32,810 +What I find really beautiful +here is that of course, + +295 +00:15:32,810 --> 00:15:36,244 +the meaning is defined on +sets, sets of strings. + +296 +00:15:36,244 --> 00:15:38,975 +This of course, on natural numbers. + +297 +00:15:38,975 --> 00:15:41,060 +You can probably +see now where I'm + +298 +00:15:41,060 --> 00:15:43,085 +coming from with my notation. + +299 +00:15:43,085 --> 00:15:46,205 +That is actually a slight +problem in this field, + +300 +00:15:46,205 --> 00:15:48,590 +since it's so old. + +301 +00:15:48,590 --> 00:15:52,205 +Pretty much everybody +introduced their own notation. + +302 +00:15:52,205 --> 00:15:53,870 +And you now have heaps of + +303 +00:15:53,870 --> 00:15:55,550 +different notations +for the same thing. + +304 +00:15:55,550 --> 00:15:57,379 +Okay, that is slightly +exaggerating. + +305 +00:15:57,379 --> 00:16:00,830 +And also the notation +I used for times and + +306 +00:16:00,830 --> 00:16:04,295 +plus and 0 and 1s...definitely +I'm not the only one. + +307 +00:16:04,295 --> 00:16:05,435 +There are many people + +308 +00:16:05,435 --> 00:16:07,625 +who have used this +notation as well. + +309 +00:16:07,625 --> 00:16:10,190 +It's just, it's not universal. + +310 +00:16:10,190 --> 00:16:12,290 +Well, here's a question now. + +311 +00:16:12,290 --> 00:16:15,635 +Why did we go through this +kerfuffle in the first place? + +312 +00:16:15,635 --> 00:16:19,370 +Why did we introduce these +operations on languages? + +313 +00:16:19,370 --> 00:16:21,260 +And why did we introduce + +314 +00:16:21,260 --> 00:16:23,255 +the meaning of a +regular expression? + +315 +00:16:23,255 --> 00:16:26,300 +Well, remember at the beginning, + +316 +00:16:26,300 --> 00:16:28,265 +we wanted to specify + +317 +00:16:28,265 --> 00:16:31,880 +what problem our algorithms +actually supposed to solve. + +318 +00:16:31,880 --> 00:16:35,960 +And we want to make that +very precise so that we can + +319 +00:16:35,960 --> 00:16:38,555 +be sure that our implementation + +320 +00:16:38,555 --> 00:16:40,295 +really solves the problem. + +321 +00:16:40,295 --> 00:16:41,540 +And that's what you can do now. + +322 +00:16:41,540 --> 00:16:45,605 +We can say a regular +expression, r say, + +323 +00:16:45,605 --> 00:16:50,015 +is matching a string +s if and only if + +324 +00:16:50,015 --> 00:16:55,474 +the string s is in the language +of the regular expression. + +325 +00:16:55,474 --> 00:17:00,770 +So that is the problem our +matcher is supposed to solve. + +326 +00:17:00,770 --> 00:17:03,320 +And it's supposed to +solve that a bit faster + +327 +00:17:03,320 --> 00:17:06,860 +than in Python, Ruby and Java. + +328 +00:17:06,860 --> 00:17:09,585 +And unfortunately we cannot use + +329 +00:17:09,585 --> 00:17:12,260 +the definition of L +directly for that. + +330 +00:17:12,260 --> 00:17:15,815 +Because remember, in +the case of the star, + +331 +00:17:15,815 --> 00:17:17,690 +sometimes the meaning of + +332 +00:17:17,690 --> 00:17:18,830 +a regular expression is actually + +333 +00:17:18,830 --> 00:17:20,925 +an infinite set of strings. + +334 +00:17:20,925 --> 00:17:23,575 +And it's a tiny bit difficult + +335 +00:17:23,575 --> 00:17:27,040 +to decide whether a string +is an infinite set. + +336 +00:17:27,040 --> 00:17:30,790 +So we have to do something +more clever in our algorithm. + +337 +00:17:30,790 --> 00:17:33,535 +But that's for next week. +So see you next week. Bye. + +338 +00:17:38,535 --> 00:17:41,680 +Okay, just to troll you. +Here's one more slide. + +339 +00:17:41,680 --> 00:17:45,850 +So when you go over the +handouts and the videos, + +340 +00:17:45,850 --> 00:17:47,875 +think about this example. + +341 +00:17:47,875 --> 00:17:49,840 +Imagine you have a language A, + +342 +00:17:49,840 --> 00:17:51,970 +which contains four strings. + +343 +00:17:51,970 --> 00:17:53,725 +First string is just the character a, + +344 +00:17:53,725 --> 00:17:57,445 +second string is just +the character b, and so on. + +345 +00:17:57,445 --> 00:18:02,335 +How many strings are there +in A to the power four. + +346 +00:18:02,335 --> 00:18:05,210 +Okay, that should be +relatively simple. + +347 +00:18:05,210 --> 00:18:07,310 +If not, just try it out by + +348 +00:18:07,310 --> 00:18:11,165 +implementing it in Scala and see +how many strings there are. + +349 +00:18:11,165 --> 00:18:13,850 +But now the more +interesting question, + +350 +00:18:13,850 --> 00:18:16,670 +imagine the same language, + +351 +00:18:16,670 --> 00:18:19,400 +except that instead +of the character d, + +352 +00:18:19,400 --> 00:18:22,250 +we have here, the empty string. + +353 +00:18:22,250 --> 00:18:26,630 +How many strings are in +A to the power four? + +354 +00:18:26,630 --> 00:18:31,320 +Okay, I'll let you think +about this. Bye now.