# HG changeset patch # User Christian Urban # Date 1601987520 -3600 # Node ID 939c10745a3af7706a9f5654ba72b210a0df6c89 # Parent 16aafc7691d83cd0ea76325eaf7a8c6eb71bb8c2 updated diff -r 16aafc7691d8 -r 939c10745a3a hws/hw01.pdf Binary file hws/hw01.pdf has changed diff -r 16aafc7691d8 -r 939c10745a3a hws/hw01.tex --- a/hws/hw01.tex Tue Oct 06 00:39:34 2020 +0100 +++ b/hws/hw01.tex Tue Oct 06 13:32:00 2020 +0100 @@ -72,7 +72,7 @@ \{[a], [b], [c], []\}$. \item (1) How many basic regular expressions are there to match - the string $abcd$? (2) How many if they cannot include + \textbf{only} the string $abcd$? (2) How many if they cannot include $\ONE$ and $\ZERO$? (3) How many if they are also not allowed to contain stars? (4) How many if they are also not allowed to contain $\_ + \_$? diff -r 16aafc7691d8 -r 939c10745a3a slides/slides03.pdf Binary file slides/slides03.pdf has changed diff -r 16aafc7691d8 -r 939c10745a3a slides/slides03.tex --- a/slides/slides03.tex Tue Oct 06 00:39:34 2020 +0100 +++ b/slides/slides03.tex Tue Oct 06 13:32:00 2020 +0100 @@ -55,17 +55,17 @@ %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% -\begin{frame}[c] -\frametitle{Scala Book, Exams} +%\begin{frame}[c] +%\frametitle{Scala Book, Exams} +% +%\begin{itemize} +%\item \small https://nms.kcl.ac.uk/christian.urban/ProgInScala2ed.pdf\normalsize +%\item homework (written exam 80\%) +%\item coursework (20\%)\bigskip +%\item short survey at KEATS; to be answered until Sunday +%\end{itemize} -\begin{itemize} -\item \small https://nms.kcl.ac.uk/christian.urban/ProgInScala2ed.pdf\normalsize -\item homework (written exam 80\%) -\item coursework (20\%)\bigskip -\item short survey at KEATS; to be answered until Sunday -\end{itemize} - -\end{frame} +%\end{frame} %%%%%%%%%%% @@ -92,7 +92,7 @@ %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% -\begin{frame}[c] +\begin{frame}[t] \frametitle{Negation} Assume you have an alphabet consisting of the letters \bl{$a$}, \bl{$b$} and \bl{$c$} only. diff -r 16aafc7691d8 -r 939c10745a3a videos/02-proof.srt --- /dev/null Thu Jan 01 00:00:00 1970 +0000 +++ b/videos/02-proof.srt Tue Oct 06 13:32:00 2020 +0100 @@ -0,0 +1,2110 @@ +1 +00:00:06,260 --> 00:00:10,275 +I come back, I know +the topic of proofs + +2 +00:00:10,275 --> 00:00:13,950 +is usually not the most +favorite topics with students. + +3 +00:00:13,950 --> 00:00:17,220 +But I'm really passionate +about proofs in + +4 +00:00:17,220 --> 00:00:19,320 +this module because they really + +5 +00:00:19,320 --> 00:00:21,885 +help you with understanding +what's going on. + +6 +00:00:21,885 --> 00:00:25,170 +And very often they help +you with preventing errors. + +7 +00:00:25,170 --> 00:00:28,695 +You seen me earlier doing +these calculations. + +8 +00:00:28,695 --> 00:00:30,210 +And if I hadn't had + +9 +00:00:30,210 --> 00:00:33,330 +the safety net of +knowing what I'm doing, + +10 +00:00:33,330 --> 00:00:35,130 +I could have easily gone wrong. + +11 +00:00:35,130 --> 00:00:36,390 +And that's actually a problem is + +12 +00:00:36,390 --> 00:00:38,935 +existing wreck Expression +Matching engines. + +13 +00:00:38,935 --> 00:00:41,330 +Sometimes in corner +cases as we seen, + +14 +00:00:41,330 --> 00:00:43,130 +they produce slow results, + +15 +00:00:43,130 --> 00:00:45,455 +which is actually not so bad. + +16 +00:00:45,455 --> 00:00:48,020 +Much worse is that +we also know they + +17 +00:00:48,020 --> 00:00:50,570 +sometimes produce +incorrect results. + +18 +00:00:50,570 --> 00:00:51,980 +So for us here, + +19 +00:00:51,980 --> 00:00:53,390 +the proofs will be + +20 +00:00:53,390 --> 00:00:55,100 +essentially the safety nets + +21 +00:00:55,100 --> 00:00:58,295 +of making sure that our +algorithms actually correct. + +22 +00:00:58,295 --> 00:01:00,095 +And that's actually +the beauty of that. + +23 +00:01:00,095 --> 00:01:03,170 +It doesn't happen very often +in real life that we can + +24 +00:01:03,170 --> 00:01:06,500 +actually prove the +correctness of our algorithm. + +25 +00:01:06,500 --> 00:01:08,405 +But we can do this here. + +26 +00:01:08,405 --> 00:01:11,000 +So we will start +a bit slow first, + +27 +00:01:11,000 --> 00:01:14,330 +I will show you how +properties are stated. + +28 +00:01:14,330 --> 00:01:17,255 +And then along the way +we will see how proofs + +29 +00:01:17,255 --> 00:01:20,940 +actually can be performed +and how they can help you. + +30 +00:01:20,980 --> 00:01:25,205 +You can probably remember this +slide from the beginning. + +31 +00:01:25,205 --> 00:01:29,300 +Now we would probably stayed +the same property that + +32 +00:01:29,300 --> 00:01:32,420 +our algorithm matches +the specification + +33 +00:01:32,420 --> 00:01:33,695 +of what that means. + +34 +00:01:33,695 --> 00:01:35,450 +Maybe slightly more mathematical, + +35 +00:01:35,450 --> 00:01:37,130 +we would write +something like this. + +36 +00:01:37,130 --> 00:01:39,545 +If our Marcia is given as + +37 +00:01:39,545 --> 00:01:42,740 +a string direct expression +R and says yes, + +38 +00:01:42,740 --> 00:01:45,110 +then this string +really needs to be in + +39 +00:01:45,110 --> 00:01:48,350 +the language of R. And +if the matcher says no, + +40 +00:01:48,350 --> 00:01:50,120 +then this string cannot be in + +41 +00:01:50,120 --> 00:01:52,325 +the language of r. And we spent + +42 +00:01:52,325 --> 00:01:54,590 +quite some effort to make precise + +43 +00:01:54,590 --> 00:01:57,065 +what the language is +of a wreck expression. + +44 +00:01:57,065 --> 00:01:59,090 +And we spend some effort to + +45 +00:01:59,090 --> 00:02:01,505 +make precise what +our algorithm is. + +46 +00:02:01,505 --> 00:02:02,840 +So we could now take + +47 +00:02:02,840 --> 00:02:06,875 +this property and we could +generate some test cases, + +48 +00:02:06,875 --> 00:02:09,410 +and that is all fine and good. + +49 +00:02:09,410 --> 00:02:13,340 +Testing is a very important +thing for our software. + +50 +00:02:13,340 --> 00:02:16,715 +But we can only do +this testing so far. + +51 +00:02:16,715 --> 00:02:19,385 +Remember in the homework, + +52 +00:02:19,385 --> 00:02:21,275 +there was a single string, + +53 +00:02:21,275 --> 00:02:23,750 +abcd, and there were +actually infinitely + +54 +00:02:23,750 --> 00:02:27,530 +many wreck expressions which +can match only that string. + +55 +00:02:27,530 --> 00:02:30,470 +So testing something for +infinitely many things, + +56 +00:02:30,470 --> 00:02:32,000 +that's a bit hard. + +57 +00:02:32,000 --> 00:02:36,710 +And Summa, as soon as we have +an expression with a star, + +58 +00:02:36,710 --> 00:02:39,289 +this l function returns, + +59 +00:02:39,289 --> 00:02:41,360 +in the general case +an infinite set. + +60 +00:02:41,360 --> 00:02:43,940 +And testing whether a +string is an infinite set, + +61 +00:02:43,940 --> 00:02:45,500 +that's also a bit hard. + +62 +00:02:45,500 --> 00:02:47,510 +So testing can only do so much. + +63 +00:02:47,510 --> 00:02:49,820 +We could generate some +wreck expressions + +64 +00:02:49,820 --> 00:02:51,560 +and we can generate some strings. + +65 +00:02:51,560 --> 00:02:53,990 +And we can test +whether that is true. + +66 +00:02:53,990 --> 00:02:57,470 +So B would be still +in the quandary that + +67 +00:02:57,470 --> 00:03:00,950 +maybe there's a corner +case which doesn't work. + +68 +00:03:00,950 --> 00:03:03,110 +So what we want to +achieve instead + +69 +00:03:03,110 --> 00:03:05,615 +is you want to really make sure + +70 +00:03:05,615 --> 00:03:07,325 +that this algorithm works for + +71 +00:03:07,325 --> 00:03:10,850 +all reg expression +and for all strings. + +72 +00:03:10,850 --> 00:03:16,400 +And that ONE proving will +help us to establish. + +73 +00:03:16,400 --> 00:03:19,040 +Testing cannot do that. + +74 +00:03:19,040 --> 00:03:22,310 +Remember, metro +mainly consisted of + +75 +00:03:22,310 --> 00:03:25,444 +these functions, +nullable and derivative. + +76 +00:03:25,444 --> 00:03:27,755 +So if you want to prove +anything about Metro. + +77 +00:03:27,755 --> 00:03:30,200 +Bet on, make sure that +we know what actually + +78 +00:03:30,200 --> 00:03:33,890 +the specification of +nullable and derivative is. + +79 +00:03:33,890 --> 00:03:35,750 +That's actually quite +instructive to also + +80 +00:03:35,750 --> 00:03:37,850 +understand how they work. + +81 +00:03:37,850 --> 00:03:39,695 +So let's do that next. + +82 +00:03:39,695 --> 00:03:41,975 +The central property of nullable, + +83 +00:03:41,975 --> 00:03:43,445 +Actually that's not so hard. + +84 +00:03:43,445 --> 00:03:48,740 +It's just says a function or +a reg expression is nullable + +85 +00:03:48,740 --> 00:03:50,360 +if and only if + +86 +00:03:50,360 --> 00:03:52,160 +the empty string is in + +87 +00:03:52,160 --> 00:03:54,410 +the language that was the +purpose of this novel, + +88 +00:03:54,410 --> 00:03:55,670 +it takes work, expression and + +89 +00:03:55,670 --> 00:03:58,490 +test whether can match +the empty string. + +90 +00:03:58,490 --> 00:04:01,010 +Meaning this empty string needs + +91 +00:04:01,010 --> 00:04:03,230 +to be in the language +of R. And again, + +92 +00:04:03,230 --> 00:04:04,385 +we have this if an only, + +93 +00:04:04,385 --> 00:04:05,885 +if this one says yes, + +94 +00:04:05,885 --> 00:04:08,465 +then the empty string needs +to be in this language. + +95 +00:04:08,465 --> 00:04:10,385 +And if this nullable says no, + +96 +00:04:10,385 --> 00:04:13,085 +the empty string CoMP +unless language. + +97 +00:04:13,085 --> 00:04:15,290 +So that's relatively easy. + +98 +00:04:15,290 --> 00:04:18,110 +The next one might be also +very instructive to look + +99 +00:04:18,110 --> 00:04:22,775 +at what's the property the +derivative should satisfy. + +100 +00:04:22,775 --> 00:04:26,329 +And remember, the derivative +was a kind of operation. + +101 +00:04:26,329 --> 00:04:28,850 +We taking a wrecker +expression and we're looking + +102 +00:04:28,850 --> 00:04:31,955 +at all the strings this +reg expression can match, + +103 +00:04:31,955 --> 00:04:35,510 +but which starts with +C. And then me somehow + +104 +00:04:35,510 --> 00:04:37,250 +looking for reg expression +which can match + +105 +00:04:37,250 --> 00:04:39,335 +the same string starting with C, + +106 +00:04:39,335 --> 00:04:42,440 +except that this +c is chopped off. + +107 +00:04:42,440 --> 00:04:47,150 +And that actually saves +what this property states. + +108 +00:04:47,150 --> 00:04:50,300 +Take the, take a +break expression are, + +109 +00:04:50,300 --> 00:04:53,045 +and build the derivative +according to see, + +110 +00:04:53,045 --> 00:04:56,150 +okay, so this gives +us a reg expression, + +111 +00:04:56,150 --> 00:04:58,025 +the derivative of that + +112 +00:04:58,025 --> 00:05:01,775 +r. Now we take the +a language of that, + +113 +00:05:01,775 --> 00:05:06,620 +are that supposed to be +the same set as if I take + +114 +00:05:06,620 --> 00:05:12,845 +the language of R and then +build the semantic derivative. + +115 +00:05:12,845 --> 00:05:16,055 +Remember the semantic derivative +was doing exactly that. + +116 +00:05:16,055 --> 00:05:18,380 +It goes through all +the strings in this + +117 +00:05:18,380 --> 00:05:21,350 +that filters out everything which + +118 +00:05:21,350 --> 00:05:24,770 +doesn't start with C. +So throws them away of + +119 +00:05:24,770 --> 00:05:26,510 +the remaining strings +which start with + +120 +00:05:26,510 --> 00:05:28,670 +C. It chops off so that C, + +121 +00:05:28,670 --> 00:05:31,370 +So this will be exactly + +122 +00:05:31,370 --> 00:05:34,205 +what our derivative +is supposed to do. + +123 +00:05:34,205 --> 00:05:36,605 +And yes, if you can +establish that, + +124 +00:05:36,605 --> 00:05:38,450 +then the language of + +125 +00:05:38,450 --> 00:05:43,355 +this derivative operation +will produce the same sat. + +126 +00:05:43,355 --> 00:05:45,815 +This actually also +how I'm going to + +127 +00:05:45,815 --> 00:05:49,760 +test in cases icons +yet at the beginning, + +128 +00:05:49,760 --> 00:05:52,910 +how your coursework is correct. + +129 +00:05:52,910 --> 00:05:55,625 +If you give me a +definition for derivative + +130 +00:05:55,625 --> 00:05:58,730 +of one of these bounded +record expressions, + +131 +00:05:58,730 --> 00:06:01,085 +and I can't see +whether it's correct. + +132 +00:06:01,085 --> 00:06:03,380 +Then what it will do +is I will just try to + +133 +00:06:03,380 --> 00:06:05,675 +establish that this +property holds. + +134 +00:06:05,675 --> 00:06:07,265 +And if I can establish that, + +135 +00:06:07,265 --> 00:06:08,870 +then you coursework is find. + +136 +00:06:08,870 --> 00:06:10,550 +And if I can't establish that, + +137 +00:06:10,550 --> 00:06:12,410 +then usually it shows + +138 +00:06:12,410 --> 00:06:14,360 +there is a counterexample +somewhere. + +139 +00:06:14,360 --> 00:06:19,220 +Because this derivative, +as you've seen desk times, + +140 +00:06:19,220 --> 00:06:21,725 +more than one solution +you can give. + +141 +00:06:21,725 --> 00:06:23,915 +Some of them are more +efficient than others. + +142 +00:06:23,915 --> 00:06:25,685 +But in terms of correctness, + +143 +00:06:25,685 --> 00:06:30,680 +there are multiple solutions +for the coursework. + +144 +00:06:30,680 --> 00:06:34,145 +And in the past I really had +to scratch my head and say, + +145 +00:06:34,145 --> 00:06:36,995 +Is that definition +really correct? + +146 +00:06:36,995 --> 00:06:39,620 +And so what I did is +I just went ahead and + +147 +00:06:39,620 --> 00:06:42,785 +tried to prove that or +establish this property. + +148 +00:06:42,785 --> 00:06:45,110 +The reason why we writing + +149 +00:06:45,110 --> 00:06:47,300 +these properties +down so precisely, + +150 +00:06:47,300 --> 00:06:49,880 +remember we know here +what the derivative is, + +151 +00:06:49,880 --> 00:06:51,290 +we know what the +languages we know + +152 +00:06:51,290 --> 00:06:53,495 +what this semantic derivative is. + +153 +00:06:53,495 --> 00:06:56,960 +This is usually when you are +trying to prove something. + +154 +00:06:56,960 --> 00:06:58,715 +And very important step. + +155 +00:06:58,715 --> 00:07:01,385 +If you know what you're +actually trying to prove, + +156 +00:07:01,385 --> 00:07:05,510 +then you have already +done half of the proof. + +157 +00:07:05,510 --> 00:07:08,300 +Remember, we want to +establish these properties + +158 +00:07:08,300 --> 00:07:10,460 +not just for some reg expression, + +159 +00:07:10,460 --> 00:07:13,940 +but for all wrecker expression +and for all characters. + +160 +00:07:13,940 --> 00:07:17,225 +So proving is the only +method to achieve that. + +161 +00:07:17,225 --> 00:07:18,590 +And I said at the beginning of + +162 +00:07:18,590 --> 00:07:21,170 +this video's proving is + +163 +00:07:21,170 --> 00:07:23,810 +not the most favorite +subject of students. + +164 +00:07:23,810 --> 00:07:25,730 +But I can promise you, + +165 +00:07:25,730 --> 00:07:29,630 +if you just follow a very +simple minded recipe, + +166 +00:07:29,630 --> 00:07:33,695 +that brings, you're already +very far in these proofs. + +167 +00:07:33,695 --> 00:07:35,720 +Of course, there's +always some creativity + +168 +00:07:35,720 --> 00:07:37,070 +involved in these proofs. + +169 +00:07:37,070 --> 00:07:39,380 +But a simple-minded +recipe you can + +170 +00:07:39,380 --> 00:07:42,620 +follow brings you're already +very close to the end. + +171 +00:07:42,620 --> 00:07:46,610 +So let's have a look how +this recipe can be derived. + +172 +00:07:46,610 --> 00:07:48,740 +Establishing a property is for + +173 +00:07:48,740 --> 00:07:50,900 +all Rekha expression nearly + +174 +00:07:50,900 --> 00:07:55,805 +always requires an induction +over wreck expressions. + +175 +00:07:55,805 --> 00:07:57,350 +And so we have to look at what's + +176 +00:07:57,350 --> 00:07:59,150 +the recipe for doing inductions + +177 +00:07:59,150 --> 00:08:02,390 +over wreck expressions +and inductions. + +178 +00:08:02,390 --> 00:08:03,590 +You probably remember from + +179 +00:08:03,590 --> 00:08:07,100 +school time you did induction +over natural numbers. + +180 +00:08:07,100 --> 00:08:11,120 +You had to establish that +some property holds for 0. + +181 +00:08:11,120 --> 00:08:13,775 +And you had to establish it + +182 +00:08:13,775 --> 00:08:14,960 +that if you assume that + +183 +00:08:14,960 --> 00:08:16,970 +the property holds +for the N case, + +184 +00:08:16,970 --> 00:08:21,365 +it has to also hold for +the n plus one case. + +185 +00:08:21,365 --> 00:08:24,350 +And induction over wreck +expressions is very, + +186 +00:08:24,350 --> 00:08:26,810 +very similar to this +kind of induction. + +187 +00:08:26,810 --> 00:08:30,754 +Just that in case of +break expressions, + +188 +00:08:30,754 --> 00:08:33,200 +we have essentially +three base cases. + +189 +00:08:33,200 --> 00:08:35,360 +So in our grandma +or we say it rec, + +190 +00:08:35,360 --> 00:08:38,705 +expressions are 01 +and characters. + +191 +00:08:38,705 --> 00:08:42,260 +And these will be the base +cases in our induction. + +192 +00:08:42,260 --> 00:08:43,895 +So we have to establish + +193 +00:08:43,895 --> 00:08:46,475 +any property we want +to prove by induction, + +194 +00:08:46,475 --> 00:08:49,055 +you have to show +that it holds for 0, + +195 +00:08:49,055 --> 00:08:50,495 +it holds for one, + +196 +00:08:50,495 --> 00:08:52,505 +and it holds for the character. + +197 +00:08:52,505 --> 00:08:57,350 +So there's nothing else we +can assume in these cases. + +198 +00:08:57,350 --> 00:08:59,300 +We just have to prove +that this property + +199 +00:08:59,300 --> 00:09:01,835 +holds in these three base cases. + +200 +00:09:01,835 --> 00:09:04,490 +It is very much the +same as if you proved + +201 +00:09:04,490 --> 00:09:06,050 +something over natural numbers + +202 +00:09:06,050 --> 00:09:08,270 +and you had to prove it for 0. + +203 +00:09:08,270 --> 00:09:10,880 +Now, the sequence case, + +204 +00:09:10,880 --> 00:09:12,755 +the alternative in a star. + +205 +00:09:12,755 --> 00:09:15,815 +These are the kind +of inductive cases. + +206 +00:09:15,815 --> 00:09:17,750 +Remember in natural numbers, + +207 +00:09:17,750 --> 00:09:20,450 +you could assume that +a property holds for + +208 +00:09:20,450 --> 00:09:24,770 +n and then you had to +establish it for n plus one. + +209 +00:09:24,770 --> 00:09:26,945 +This is here very simmer. + +210 +00:09:26,945 --> 00:09:29,000 +You can assume that + +211 +00:09:29,000 --> 00:09:30,920 +these properties hold for + +212 +00:09:30,920 --> 00:09:33,140 +the components of +these reg expression. + +213 +00:09:33,140 --> 00:09:35,840 +So you can assume that it +holds for this on one, + +214 +00:09:35,840 --> 00:09:38,960 +and you can assume that +it holds for this R2. + +215 +00:09:38,960 --> 00:09:41,090 +And your task is now to + +216 +00:09:41,090 --> 00:09:44,240 +establish this property +also for sequences. + +217 +00:09:44,240 --> 00:09:46,255 +The same for alternative soup. + +218 +00:09:46,255 --> 00:09:48,710 +Assume that it holds +already for R1. + +219 +00:09:48,710 --> 00:09:51,005 +You can assume that +it holds for the R2. + +220 +00:09:51,005 --> 00:09:53,930 +You have to establish that +this property holds also for + +221 +00:09:53,930 --> 00:09:58,460 +the entire alternative, +the same four star. + +222 +00:09:58,460 --> 00:10:00,440 +So there you can assume it for r, + +223 +00:10:00,440 --> 00:10:02,330 +but you have to now +establish it that + +224 +00:10:02,330 --> 00:10:04,760 +it also holds for r star. + +225 +00:10:04,760 --> 00:10:07,910 +Ok, that leads to this +simple-minded recipe + +226 +00:10:07,910 --> 00:10:09,560 +which you can always follow. + +227 +00:10:09,560 --> 00:10:10,970 +So first you have to know + +228 +00:10:10,970 --> 00:10:13,085 +what's the property +you want to show. + +229 +00:10:13,085 --> 00:10:16,790 +Luckily for you, that's +almost always given by me. + +230 +00:10:16,790 --> 00:10:19,100 +So there's nothing +to think about it. + +231 +00:10:19,100 --> 00:10:22,100 +But then what you have to +do in an induction proof, + +232 +00:10:22,100 --> 00:10:23,855 +you have to show that +this property holds + +233 +00:10:23,855 --> 00:10:26,345 +for 041 and characters. + +234 +00:10:26,345 --> 00:10:28,790 +These are the base +cases in the induction. + +235 +00:10:28,790 --> 00:10:31,280 +And then in the induction cases, + +236 +00:10:31,280 --> 00:10:32,555 +you just have to say, well, + +237 +00:10:32,555 --> 00:10:34,535 +I know that this +property holds for one, + +238 +00:10:34,535 --> 00:10:36,980 +I know that the +property holds for R2. + +239 +00:10:36,980 --> 00:10:38,720 +What does it look like that + +240 +00:10:38,720 --> 00:10:40,895 +this property holds +for the alternative, + +241 +00:10:40,895 --> 00:10:44,480 +for the sequence +and for the r-star, + +242 +00:10:44,480 --> 00:10:46,955 +that will then requires +some reasoning. + +243 +00:10:46,955 --> 00:10:48,230 +But if you know what you can + +244 +00:10:48,230 --> 00:10:50,359 +assume and what do you +have to establish? + +245 +00:10:50,359 --> 00:10:52,940 +Often can already see bought + +246 +00:10:52,940 --> 00:10:55,490 +needs to be proved and +what's the argument like? + +247 +00:10:55,490 --> 00:10:58,325 +So let's try that and +then concrete example. + +248 +00:10:58,325 --> 00:11:01,265 +So how does approve +work in real life? + +249 +00:11:01,265 --> 00:11:03,530 +And say that the beginning +first you have to know + +250 +00:11:03,530 --> 00:11:05,630 +what's the property +you want to prove. + +251 +00:11:05,630 --> 00:11:10,280 +So in our case, the P of R is if + +252 +00:11:10,280 --> 00:11:14,900 +nullable of R holds if + +253 +00:11:14,900 --> 00:11:19,760 +and only if the empty string +is in the language of. + +254 +00:11:19,760 --> 00:11:23,690 +So we want to do an induction +of rec expressions. + +255 +00:11:23,690 --> 00:11:28,400 +So we better have Fan property +which depends on this + +256 +00:11:28,400 --> 00:11:31,130 +R. And so now in + +257 +00:11:31,130 --> 00:11:34,985 +each case I would start with +a blank piece of paper. + +258 +00:11:34,985 --> 00:11:38,030 +And I would write on the top, + +259 +00:11:38,030 --> 00:11:40,190 +what are my assumptions? + +260 +00:11:40,190 --> 00:11:42,470 +And on the bottom, I would write + +261 +00:11:42,470 --> 00:11:45,485 +down what I need to prove. + +262 +00:11:45,485 --> 00:11:49,280 +And then I would try +to somehow navigate + +263 +00:11:49,280 --> 00:11:53,540 +my way from these assumptions +to what I want to prove, + +264 +00:11:53,540 --> 00:11:57,740 +to somehow infer +small steps so that I + +265 +00:11:57,740 --> 00:11:59,960 +can assure that what I'm trying + +266 +00:11:59,960 --> 00:12:02,780 +to prove holds for +my assumptions. + +267 +00:12:02,780 --> 00:12:06,905 +Obviously. That depends +on how hard that case is. + +268 +00:12:06,905 --> 00:12:08,930 +And if you look at +our first case, + +269 +00:12:08,930 --> 00:12:10,835 +what we have to show is if to + +270 +00:12:10,835 --> 00:12:13,655 +prove that this P of 0 holds. + +271 +00:12:13,655 --> 00:12:15,410 +And actually no assumption in + +272 +00:12:15,410 --> 00:12:17,450 +this case because it's +one of the base cases. + +273 +00:12:17,450 --> 00:12:23,015 +So what I have to show is I +have to show that nullable of + +274 +00:12:23,015 --> 00:12:29,570 +0 if and only if the empty +string is in L of 0. + +275 +00:12:29,570 --> 00:12:31,340 +Okay? And I have to + +276 +00:12:31,340 --> 00:12:34,190 +somehow show that this +implementation of nullable in + +277 +00:12:34,190 --> 00:12:35,600 +this case produces + +278 +00:12:35,600 --> 00:12:39,185 +exactly the same answer +as that specification. + +279 +00:12:39,185 --> 00:12:41,660 +Okay? So what is nullable of 0 is + +280 +00:12:41,660 --> 00:12:45,605 +defined where debt is +just false by definition. + +281 +00:12:45,605 --> 00:12:50,225 +And what is this +language of LCA defined? + +282 +00:12:50,225 --> 00:12:52,535 +Well, that is just +the empty string. + +283 +00:12:52,535 --> 00:12:55,010 +In the empty language. + +284 +00:12:55,010 --> 00:12:57,110 +Is the empty string and +the empty language, + +285 +00:12:57,110 --> 00:12:59,240 +no, that is false too. + +286 +00:12:59,240 --> 00:13:01,610 +So I have shown that + +287 +00:13:01,610 --> 00:13:03,890 +both of them produce +the same result. + +288 +00:13:03,890 --> 00:13:08,765 +So I would already be done +in the first base case. + +289 +00:13:08,765 --> 00:13:12,125 +The second base case is not +so much more difficult. + +290 +00:13:12,125 --> 00:13:15,455 +We want to prove +that P of one holds. + +291 +00:13:15,455 --> 00:13:21,680 +And that would say +if nullable of one, + +292 +00:13:21,680 --> 00:13:23,825 +if and only if + +293 +00:13:23,825 --> 00:13:27,665 +the empty string is in +the language of one. + +294 +00:13:27,665 --> 00:13:31,235 +Okay? I can again look +at what's that defined, + +295 +00:13:31,235 --> 00:13:33,275 +that is defined as true. + +296 +00:13:33,275 --> 00:13:35,450 +And what is this language to + +297 +00:13:35,450 --> 00:13:37,640 +find where that is defined as + +298 +00:13:37,640 --> 00:13:39,320 +the empty string in + +299 +00:13:39,320 --> 00:13:42,230 +the language containing +the empty string. + +300 +00:13:42,230 --> 00:13:44,150 +And is the empty string in + +301 +00:13:44,150 --> 00:13:46,055 +the language containing +the empty string? + +302 +00:13:46,055 --> 00:13:47,975 +Yes, that's true as well. + +303 +00:13:47,975 --> 00:13:50,735 +So again, I've shown that +the actual implementation + +304 +00:13:50,735 --> 00:13:55,250 +produces the same result +as the specification. + +305 +00:13:55,250 --> 00:13:57,635 +I leave the character base case + +306 +00:13:57,635 --> 00:14:00,690 +for you to fill in the details. + +307 +00:14:01,150 --> 00:14:03,320 +More interesting are these + +308 +00:14:03,320 --> 00:14:05,300 +inductive cases as +you can imagine. + +309 +00:14:05,300 --> 00:14:08,360 +So let's try to prove +it for the alternative. + +310 +00:14:08,360 --> 00:14:10,639 +So I want to show a property + +311 +00:14:10,639 --> 00:14:13,790 +for this record expression way. + +312 +00:14:13,790 --> 00:14:15,410 +What does our property save + +313 +00:14:15,410 --> 00:14:16,970 +for that reg expression +where we have to + +314 +00:14:16,970 --> 00:14:20,915 +say that nullable of + +315 +00:14:20,915 --> 00:14:25,490 +R1 plus R2 if and only if + +316 +00:14:25,490 --> 00:14:31,519 +the empty string is in the +language if R1 plus R2. + +317 +00:14:31,519 --> 00:14:35,090 +Ok? So I would now go ahead with + +318 +00:14:35,090 --> 00:14:39,695 +my plank piece of paper and +I would write down on top. + +319 +00:14:39,695 --> 00:14:41,475 +Now what can I assume? + +320 +00:14:41,475 --> 00:14:45,170 +Well, I can assume +that P of R1 holds and + +321 +00:14:45,170 --> 00:14:49,175 +I can assume what +p of r two holds. + +322 +00:14:49,175 --> 00:14:50,750 +And let's just write that down. + +323 +00:14:50,750 --> 00:14:56,210 +We can say that nullable of R1 if + +324 +00:14:56,210 --> 00:14:58,655 +and only if the empty string + +325 +00:14:58,655 --> 00:15:02,000 +is in the language of +R1, I can assume that. + +326 +00:15:02,000 --> 00:15:03,500 +And I can assume + +327 +00:15:03,500 --> 00:15:10,355 +that none above our two if +and only if the empty string. + +328 +00:15:10,355 --> 00:15:14,675 +Or to Hawaii. What's now, + +329 +00:15:14,675 --> 00:15:15,830 +what do I want to prove? + +330 +00:15:15,830 --> 00:15:18,455 +Where I want to +prove this property? + +331 +00:15:18,455 --> 00:15:21,935 +So let me write this here +again on the bottom. + +332 +00:15:21,935 --> 00:15:30,935 +So p of R1 plus R2 +is narrow form of R1 + +333 +00:15:30,935 --> 00:15:34,595 +plus R2 if and only if + +334 +00:15:34,595 --> 00:15:40,610 +the empty string is in the +language of R1 plus R2. + +335 +00:15:40,610 --> 00:15:43,280 +Well, the first thing I can do is + +336 +00:15:43,280 --> 00:15:46,760 +I can infer what's the +definition of that? + +337 +00:15:46,760 --> 00:15:51,920 +Where that would +be nullable of R1 + +338 +00:15:51,920 --> 00:15:57,530 +or nullable of our two. + +339 +00:15:57,530 --> 00:15:59,960 +That's how it was defined, okay? + +340 +00:15:59,960 --> 00:16:01,730 +And what can I do with this? + +341 +00:16:01,730 --> 00:16:04,700 +I can also look at how is + +342 +00:16:04,700 --> 00:16:08,510 +this language to find +where that is defined as + +343 +00:16:08,510 --> 00:16:16,730 +the empty string in the +language of R1 union R2. + +344 +00:16:16,730 --> 00:16:19,910 +Okay? So what does that mean? + +345 +00:16:19,910 --> 00:16:23,585 +Well, Venice, the empty +string in a union. + +346 +00:16:23,585 --> 00:16:27,095 +Well, it has to be +the case that either + +347 +00:16:27,095 --> 00:16:31,190 +the empty string +is in L of R one, + +348 +00:16:31,190 --> 00:16:37,670 +all the empty string +is o. L of R two. + +349 +00:16:37,670 --> 00:16:43,715 +Ok, so that's, I just +applied all the definitions. + +350 +00:16:43,715 --> 00:16:47,930 +And now I have to look better +my induction hypothesis. + +351 +00:16:47,930 --> 00:16:50,930 +These somehow help. + +352 +00:16:50,930 --> 00:16:55,820 +Well, I know nullable +of R1 holds if + +353 +00:16:55,820 --> 00:17:00,815 +and only if the empty +string is in L of one. + +354 +00:17:00,815 --> 00:17:05,330 +So that is by the induction +hypothesis, a hypothesis. + +355 +00:17:05,330 --> 00:17:07,670 +And I also know by this, + +356 +00:17:07,670 --> 00:17:10,040 +that this holds only if + +357 +00:17:10,040 --> 00:17:17,465 +the empty string is in R2 +and there's an in-between. + +358 +00:17:17,465 --> 00:17:19,580 +Now you can see again, + +359 +00:17:19,580 --> 00:17:23,690 +this is what my specification +say It must hold. + +360 +00:17:23,690 --> 00:17:25,640 +And this is essentially + +361 +00:17:25,640 --> 00:17:28,685 +what I could infer from +my implementation. + +362 +00:17:28,685 --> 00:17:30,140 +And they are the same. + +363 +00:17:30,140 --> 00:17:32,270 +So also this case, + +364 +00:17:32,270 --> 00:17:35,520 +I've shown my property holds. + +365 +00:17:35,740 --> 00:17:39,275 +Just for fun. Let's do this +also for the sequence. + +366 +00:17:39,275 --> 00:17:43,835 +So we want to show +P of 014 by two. + +367 +00:17:43,835 --> 00:17:51,050 +And that means I have to +prove that nullable of R1 by + +368 +00:17:51,050 --> 00:17:54,095 +R2 if and only if + +369 +00:17:54,095 --> 00:18:00,440 +the empty string is +one followed by two. + +370 +00:18:00,440 --> 00:18:04,880 +Ok? And in this case +I'm just writing + +371 +00:18:04,880 --> 00:18:09,695 +my induction +hypothesis on the top. + +372 +00:18:09,695 --> 00:18:11,060 +And that would be, + +373 +00:18:11,060 --> 00:18:13,460 +I know that p of r one holds. + +374 +00:18:13,460 --> 00:18:18,290 +So another ball of R one + +375 +00:18:18,290 --> 00:18:24,860 +if and only if the empty +string is in L one. + +376 +00:18:24,860 --> 00:18:29,525 +And the P of R to say +is that I know already + +377 +00:18:29,525 --> 00:18:36,815 +nullable of R2 if and only +if the empty string then R2. + +378 +00:18:36,815 --> 00:18:39,230 +Okay? And I know some hard to + +379 +00:18:39,230 --> 00:18:42,285 +infer that under +these assumptions, + +380 +00:18:42,285 --> 00:18:45,970 +this property, like +in the earlier case, + +381 +00:18:45,970 --> 00:18:48,070 +I just have a look how + +382 +00:18:48,070 --> 00:18:51,145 +everything is defined and +see if I can go from there. + +383 +00:18:51,145 --> 00:18:56,485 +So nullable of the +sequence would be nullable + +384 +00:18:56,485 --> 00:19:04,555 +of R1 and nullable of R2. + +385 +00:19:04,555 --> 00:19:08,559 +Okay? By the +induction hypothesis, + +386 +00:19:08,559 --> 00:19:16,255 +I can now infer that +the empty string is + +387 +00:19:16,255 --> 00:19:24,750 +in L one and the empty +string is in L two. + +388 +00:19:24,750 --> 00:19:28,175 +Ok? Not much as I +can do on this side. + +389 +00:19:28,175 --> 00:19:31,520 +But I can now try +to infer something + +390 +00:19:31,520 --> 00:19:33,110 +here because I know how + +391 +00:19:33,110 --> 00:19:35,735 +this L of this +sequence is defined. + +392 +00:19:35,735 --> 00:19:39,665 +That says the empty +string has to be n l, + +393 +00:19:39,665 --> 00:19:45,110 +one append of L of R, two. + +394 +00:19:45,110 --> 00:19:48,365 +Ok? So they are not yet equal. + +395 +00:19:48,365 --> 00:19:50,150 +But now I have to +essentially scratch + +396 +00:19:50,150 --> 00:19:52,430 +my head. What does that mean? + +397 +00:19:52,430 --> 00:19:54,965 +So this is the language +concatenation, + +398 +00:19:54,965 --> 00:19:57,035 +Venice, the empty string. + +399 +00:19:57,035 --> 00:20:01,265 +In this concatenation of +these two language as well. + +400 +00:20:01,265 --> 00:20:05,990 +It will be there if +L of our wormholes + +401 +00:20:05,990 --> 00:20:12,560 +and The empty string +of L R2 holds. + +402 +00:20:12,560 --> 00:20:14,600 +And so again, I have shown that + +403 +00:20:14,600 --> 00:20:17,780 +this is equivalent to that. + +404 +00:20:17,780 --> 00:20:20,480 +Which means that this must be + +405 +00:20:20,480 --> 00:20:24,540 +equal to that o holds +if and only if. + +406 +00:20:25,600 --> 00:20:28,940 +Finally, the star case, + +407 +00:20:28,940 --> 00:20:31,235 +p of r star. + +408 +00:20:31,235 --> 00:20:35,450 +Ok? So I know my +induction hypothesis, + +409 +00:20:35,450 --> 00:20:41,075 +let me write it again on +the top that P of R holds, + +410 +00:20:41,075 --> 00:20:46,835 +which means nullable of R if and + +411 +00:20:46,835 --> 00:20:53,060 +only if the empty +string is in L. Okay? + +412 +00:20:53,060 --> 00:20:58,580 +And I have to prove +that nullable of + +413 +00:20:58,580 --> 00:21:08,930 +r star if and only if the +empty string is n l of r star. + +414 +00:21:08,930 --> 00:21:15,020 +Ok? So again, I just follow +how are things define? + +415 +00:21:15,020 --> 00:21:18,125 +So this is defined +actually as true. + +416 +00:21:18,125 --> 00:21:21,365 +So a star's always nullable. + +417 +00:21:21,365 --> 00:21:23,090 +What is this defined where, + +418 +00:21:23,090 --> 00:21:26,225 +remember that wants to find +using the Kleene star. + +419 +00:21:26,225 --> 00:21:28,685 +So that is essentially L of + +420 +00:21:28,685 --> 00:21:33,680 +R. And then we had this +star on languages. + +421 +00:21:33,680 --> 00:21:36,260 +And does the star on + +422 +00:21:36,260 --> 00:21:39,380 +languages always contain +the empty string? + +423 +00:21:39,380 --> 00:21:42,980 +Yes, that is true as well. + +424 +00:21:42,980 --> 00:21:45,320 +And so again, I've +shown their clone. + +425 +00:21:45,320 --> 00:21:47,420 +And here you can see, even if I + +426 +00:21:47,420 --> 00:21:49,730 +have here in this case +an induction hypothesis, + +427 +00:21:49,730 --> 00:21:51,860 +I actually didn't need +that in my reasoning. + +428 +00:21:51,860 --> 00:21:53,960 +Well, that might happen. + +429 +00:21:53,960 --> 00:21:56,840 +This might have been the +first induction proof + +430 +00:21:56,840 --> 00:21:58,445 +for regular expressions for you. + +431 +00:21:58,445 --> 00:22:00,680 +But you've seen at +Watson so difficult, + +432 +00:22:00,680 --> 00:22:02,480 +you just follow this recipe. + +433 +00:22:02,480 --> 00:22:04,280 +It told us we can + +434 +00:22:04,280 --> 00:22:06,725 +assume in each case +what we have to prove. + +435 +00:22:06,725 --> 00:22:09,860 +And then we essentially just +followed what are things to + +436 +00:22:09,860 --> 00:22:13,595 +find and try to infer +whether in each case, + +437 +00:22:13,595 --> 00:22:15,560 +both of them produce +the same result, + +438 +00:22:15,560 --> 00:22:18,245 +the specification and +the implementation. + +439 +00:22:18,245 --> 00:22:20,300 +And the great thing about this, + +440 +00:22:20,300 --> 00:22:22,250 +if you've done all these cases, + +441 +00:22:22,250 --> 00:22:25,490 +we not just know that it holds +for some reg expression, + +442 +00:22:25,490 --> 00:22:28,280 +but actually it holds for +all wreck expressions be + +443 +00:22:28,280 --> 00:22:30,890 +couldn't achieve the same +result by looking just + +444 +00:22:30,890 --> 00:22:33,710 +at testing because +we would not be + +445 +00:22:33,710 --> 00:22:37,280 +able to test for all +record expressions. + +446 +00:22:37,280 --> 00:22:39,560 +Ok, of course, there will be + +447 +00:22:39,560 --> 00:22:43,010 +also induction proofs which +are more complicated. + +448 +00:22:43,010 --> 00:22:46,170 +But let's leave that +for another time. + +449 +00:22:49,210 --> 00:22:53,405 +You know, I always like to +talk about the subject. + +450 +00:22:53,405 --> 00:22:55,700 +So here's one last thought. + +451 +00:22:55,700 --> 00:22:58,040 +You might think these proofs for + +452 +00:22:58,040 --> 00:22:59,930 +this little property +about nullable are + +453 +00:22:59,930 --> 00:23:04,295 +totally overkill and +total waste of your time. + +454 +00:23:04,295 --> 00:23:07,310 +Please rest assured disproves + +455 +00:23:07,310 --> 00:23:10,160 +have saved me from so +much embarrassment. + +456 +00:23:10,160 --> 00:23:13,745 +You cannot imagine, +once I have a proof, + +457 +00:23:13,745 --> 00:23:15,425 +I can sleep much better. + +458 +00:23:15,425 --> 00:23:16,820 +If I don't have a proof, + +459 +00:23:16,820 --> 00:23:19,560 +I don't believe what I'm doing.