videos/02-cw1.srt
changeset 774 42733adf2a48
--- /dev/null	Thu Jan 01 00:00:00 1970 +0000
+++ b/videos/02-cw1.srt	Mon Oct 05 17:46:12 2020 +0100
@@ -0,0 +1,1107 @@
+1
+00:00:06,410 --> 00:00:09,390
+The final video for this week.
+
+2
+00:00:09,390 --> 00:00:12,465
+And let me say something
+about the coursework.
+
+3
+00:00:12,465 --> 00:00:15,300
+First. You can solve
+
+4
+00:00:15,300 --> 00:00:17,745
+the coursework in any programming
+language you're like,
+
+5
+00:00:17,745 --> 00:00:22,440
+I already said that. You
+have to submit a PDF file.
+
+6
+00:00:22,440 --> 00:00:23,850
+There will be some questions,
+
+7
+00:00:23,850 --> 00:00:26,250
+so you have to write down the
+solution to this questions.
+
+8
+00:00:26,250 --> 00:00:30,315
+Please use a PDF and you have
+to submit your source code.
+
+9
+00:00:30,315 --> 00:00:35,580
+And yes, if you do use a
+language which isn't Scala,
+
+10
+00:00:35,580 --> 00:00:39,450
+it might help to tell me
+how I can run your code.
+
+11
+00:00:39,450 --> 00:00:42,550
+If I can't run your code,
+I will ask you anyway.
+
+12
+00:00:42,550 --> 00:00:50,044
+Also, please submit both the
+PDF and the code in a file,
+
+13
+00:00:50,044 --> 00:00:52,730
+in a zip file, which generates
+
+14
+00:00:52,730 --> 00:00:55,835
+a subdirectory with your
+name and your family name.
+
+15
+00:00:55,835 --> 00:00:58,970
+That we'll just save a
+lot of time for me to try
+
+16
+00:00:58,970 --> 00:01:02,030
+to figure out who
+has submitted what.
+
+17
+00:01:02,030 --> 00:01:04,445
+Please do that.
+
+18
+00:01:04,445 --> 00:01:07,789
+So what is the task
+into coursework?
+
+19
+00:01:07,789 --> 00:01:09,885
+I essentially showed you how
+
+20
+00:01:09,885 --> 00:01:11,995
+the Brzozowski matcher
+
+21
+00:01:11,995 --> 00:01:14,645
+works for the basic
+regular expressions.
+
+22
+00:01:14,645 --> 00:01:16,295
+I also showed you actually how it
+
+23
+00:01:16,295 --> 00:01:18,110
+works for the n-times.
+
+24
+00:01:18,110 --> 00:01:20,300
+And the task in coursework
+
+25
+00:01:20,300 --> 00:01:22,970
+is that you extend the
+Brzozowski matcher to
+
+26
+00:01:22,970 --> 00:01:25,820
+the other regular expressions
+
+27
+00:01:25,820 --> 00:01:27,800
+from the extended
+regular expressions.
+
+28
+00:01:27,800 --> 00:01:30,245
+So you're supposed
+to extended with
+
+29
+00:01:30,245 --> 00:01:34,490
+r-plus, optional r, for n-times.
+
+30
+00:01:34,490 --> 00:01:38,420
+You've already seen that.
+For 0 or more times,
+
+31
+00:01:38,420 --> 00:01:41,120
+but not more than m
+times regular expression.
+
+32
+00:01:41,120 --> 00:01:45,890
+For at least n-times regular
+expression and for between
+
+33
+00:01:45,890 --> 00:01:47,480
+n times and m times
+
+34
+00:01:47,480 --> 00:01:50,810
+regular expression and also this
+NOT regular expression.
+
+35
+00:01:50,810 --> 00:01:52,790
+So your task is to
+
+36
+00:01:52,790 --> 00:01:55,310
+essentially find the definition
+
+37
+00:01:55,310 --> 00:01:57,155
+for nullable in these cases.
+
+38
+00:01:57,155 --> 00:02:00,215
+Find a definition for derivative,
+
+39
+00:02:00,215 --> 00:02:02,480
+implement them,
+write them down in
+
+40
+00:02:02,480 --> 00:02:06,065
+a PDF and then do some
+experiments with them.
+
+41
+00:02:06,065 --> 00:02:08,875
+Well, how can you do that?
+
+42
+00:02:08,875 --> 00:02:10,370
+Well I've given you
+
+43
+00:02:10,370 --> 00:02:13,565
+the meaning of these additional
+regular expressions.
+
+44
+00:02:13,565 --> 00:02:16,730
+So here's, for example, the
+meaning of this r-plus.
+
+45
+00:02:16,730 --> 00:02:18,290
+So that would be, I have
+
+46
+00:02:18,290 --> 00:02:22,115
+at least one copy up to infinity.
+
+47
+00:02:22,115 --> 00:02:25,070
+And the optional-r would be it's
+
+48
+00:02:25,070 --> 00:02:28,370
+the language of r plus
+the empty string.
+
+49
+00:02:28,370 --> 00:02:30,440
+If I have it exactly n times,
+
+50
+00:02:30,440 --> 00:02:31,985
+then it would be
+just the language
+
+51
+00:02:31,985 --> 00:02:34,010
+of r exactly n-times.
+
+52
+00:02:34,010 --> 00:02:38,105
+And here you have union
+from 0 to m and so on.
+
+53
+00:02:38,105 --> 00:02:42,560
+This might be a slightly
+interesting regular expression.
+
+54
+00:02:42,560 --> 00:02:46,580
+So that's supposed to be the
+character set that is very
+
+55
+00:02:46,580 --> 00:02:48,410
+similar to character ranges like
+
+56
+00:02:48,410 --> 00:02:51,005
+in the existing regular
+expression matchers.
+
+57
+00:02:51,005 --> 00:02:52,820
+So this just says
+
+58
+00:02:52,820 --> 00:02:55,565
+...this regular
+expression can match
+
+59
+00:02:55,565 --> 00:03:00,860
+either the character c1 or
+the character c2 up to cn.
+
+60
+00:03:00,860 --> 00:03:03,620
+Why do I include
+that regular expression?
+
+61
+00:03:03,620 --> 00:03:09,050
+I could also just say
+c1 plus c2 plus c3...
+
+62
+00:03:09,050 --> 00:03:12,889
+a big alternative.
+Well that's possible.
+
+63
+00:03:12,889 --> 00:03:16,595
+But remember the Achilles'
+heel of this algorithm is,
+
+64
+00:03:16,595 --> 00:03:18,425
+if the regular expression is large,
+
+65
+00:03:18,425 --> 00:03:20,825
+then the computation
+take always a long time.
+
+66
+00:03:20,825 --> 00:03:23,840
+So I'm trying to compress
+that as much as I can.
+
+67
+00:03:23,840 --> 00:03:25,370
+So instead of giving a big
+
+68
+00:03:25,370 --> 00:03:29,134
+alternative, I am trying to
+give one regular expression,
+
+69
+00:03:29,134 --> 00:03:31,340
+which can not just match
+a single character,
+
+70
+00:03:31,340 --> 00:03:34,230
+but a set of characters.
+
+71
+00:03:34,630 --> 00:03:36,980
+How can you be sure that
+
+72
+00:03:36,980 --> 00:03:39,215
+definition you come
+up with are correct?
+
+73
+00:03:39,215 --> 00:03:41,450
+Well, I showed you which are
+
+74
+00:03:41,450 --> 00:03:45,170
+the properties these two
+functions need to satisfy.
+
+75
+00:03:45,170 --> 00:03:47,060
+I've given you here what
+
+76
+00:03:47,060 --> 00:03:49,325
+the meaning of these
+expressions are.
+
+77
+00:03:49,325 --> 00:03:52,700
+So you will always know what's
+on the right-hand side.
+
+78
+00:03:52,700 --> 00:03:54,515
+This is completely determined.
+
+79
+00:03:54,515 --> 00:03:56,720
+You essentially
+have to fill something
+
+80
+00:03:56,720 --> 00:03:58,910
+equivalent on the left-hand side.
+
+81
+00:03:58,910 --> 00:04:02,105
+That's the main task
+of the coursework.
+
+82
+00:04:02,105 --> 00:04:08,090
+And you can start from the
+files I provided on KEATS.
+
+83
+00:04:08,090 --> 00:04:12,125
+So you would just have to
+add these additional cases.
+
+84
+00:04:12,125 --> 00:04:15,020
+When you add these
+additional cases
+
+85
+00:04:15,020 --> 00:04:17,330
+and you're using the Scala language,
+
+86
+00:04:17,330 --> 00:04:18,980
+you can do me a favour.
+
+87
+00:04:18,980 --> 00:04:22,550
+You can call these
+constructors for
+
+88
+00:04:22,550 --> 00:04:25,400
+these different regular
+expressions as RANGE,
+
+89
+00:04:25,400 --> 00:04:28,985
+PLUS, OPTIONAL and NTIMES,
+UPTO, FROM and BETWEEN.
+
+90
+00:04:28,985 --> 00:04:31,370
+Remember I have this
+convention to always
+
+91
+00:04:31,370 --> 00:04:34,025
+use capital letters
+for regular expressions.
+
+92
+00:04:34,025 --> 00:04:36,680
+It would be nice if you could use
+
+93
+00:04:36,680 --> 00:04:39,260
+these names because
+then it will be
+
+94
+00:04:39,260 --> 00:04:42,335
+very easy for me
+to test your code.
+
+95
+00:04:42,335 --> 00:04:44,690
+If you're using a different
+programming language
+
+96
+00:04:44,690 --> 00:04:46,370
+like let's say Rust,
+
+97
+00:04:46,370 --> 00:04:48,860
+I expect some people will use that, where I
+
+98
+00:04:48,860 --> 00:04:51,380
+have no idea what are the
+conventions in your language,
+
+99
+00:04:51,380 --> 00:04:53,420
+how you have to call
+these constructors,
+
+100
+00:04:53,420 --> 00:04:56,420
+we will see whatever you
+submit. I'll have a look.
+
+101
+00:04:56,420 --> 00:04:59,360
+There's one more
+constraint I have to
+
+102
+00:04:59,360 --> 00:05:02,194
+impose to make this
+coursework interesting.
+
+103
+00:05:02,194 --> 00:05:04,730
+I do not want you 
+that you just take
+
+104
+00:05:04,730 --> 00:05:07,370
+these extended regular
+expressions and that you
+
+105
+00:05:07,370 --> 00:05:10,550
+expand them using
+their definition.
+
+106
+00:05:10,550 --> 00:05:12,320
+Because that would make the regular
+
+107
+00:05:12,320 --> 00:05:13,955
+expression matcher very slow.
+
+108
+00:05:13,955 --> 00:05:16,160
+So for example, if
+you want to define
+
+109
+00:05:16,160 --> 00:05:18,785
+what's the derivative for r-plus,
+
+110
+00:05:18,785 --> 00:05:21,560
+then what does not
+count as a solution:
+
+111
+00:05:21,560 --> 00:05:24,770
+if you just expand that
+to the definition that r
+
+112
+00:05:24,770 --> 00:05:28,935
+plus is equivalent to
+r followed by r-star.
+
+113
+00:05:28,935 --> 00:05:32,845
+So in code, what I
+would like to not see,
+
+114
+00:05:32,845 --> 00:05:35,680
+I would not give any
+marks for that is,
+
+115
+00:05:35,680 --> 00:05:37,780
+if you add the plus as follows,
+
+116
+00:05:37,780 --> 00:05:39,910
+so that is still perfectly fine.
+
+117
+00:05:39,910 --> 00:05:42,160
+There's nothing you
+can do differently.
+
+118
+00:05:42,160 --> 00:05:44,065
+That would be the constructor.
+
+119
+00:05:44,065 --> 00:05:46,480
+But when you define nullable,
+
+120
+00:05:46,480 --> 00:05:49,180
+I do not want to see
+that you defined
+
+121
+00:05:49,180 --> 00:05:51,790
+this plus r as nullable
+
+122
+00:05:51,790 --> 00:05:54,385
+of the sequence of r and star-r,
+
+123
+00:05:54,385 --> 00:05:58,075
+just unfolding
+the definition.
+
+124
+00:05:58,075 --> 00:06:00,415
+As you can see, when you come
+
+125
+00:06:00,415 --> 00:06:02,815
+up with a much better
+solution for that.
+
+126
+00:06:02,815 --> 00:06:05,110
+This is a very inefficient way
+
+127
+00:06:05,110 --> 00:06:07,090
+for how to calculate nullable.
+
+128
+00:06:07,090 --> 00:06:10,825
+And so the same for derivative
+would not be allowed.
+
+129
+00:06:10,825 --> 00:06:13,895
+If you, I have the plus r here,
+
+130
+00:06:13,895 --> 00:06:16,685
+you can't just unfold
+the definition,
+
+131
+00:06:16,685 --> 00:06:19,790
+with r-plus
+being defined as r
+
+132
+00:06:19,790 --> 00:06:21,350
+followed by r star and
+
+133
+00:06:21,350 --> 00:06:23,375
+then just build the
+derivative of that.
+
+134
+00:06:23,375 --> 00:06:25,280
+You have to find something more
+
+135
+00:06:25,280 --> 00:06:28,730
+primitive that involves
+only the derivative of r,
+
+136
+00:06:28,730 --> 00:06:31,790
+essentially, nothing
+more involved. The same
+
+137
+00:06:31,790 --> 00:06:33,815
+if you have n-times, for example,
+
+138
+00:06:33,815 --> 00:06:39,935
+you can't just unfold this
+n-times in this sequence
+
+139
+00:06:39,935 --> 00:06:43,310
+of .... n-copies
+
+140
+00:06:43,310 --> 00:06:47,285
+of this r. You have to get
+something more primitive.
+
+141
+00:06:47,285 --> 00:06:49,760
+Otherwise, as you've
+seen earlier,
+
+142
+00:06:49,760 --> 00:06:53,135
+your regular expression matcher
+would be totally slow.
+
+143
+00:06:53,135 --> 00:06:55,475
+When you submit your solution,
+
+144
+00:06:55,475 --> 00:06:58,445
+please submit this
+solution in the PDF.
+
+145
+00:06:58,445 --> 00:07:01,580
+So give the cases of your definition
+
+146
+00:07:01,580 --> 00:07:06,004
+in a form like that for
+nullable and derivative.
+
+147
+00:07:06,004 --> 00:07:08,405
+And also implement it in code.
+
+148
+00:07:08,405 --> 00:07:10,910
+That is just helping me to
+
+149
+00:07:10,910 --> 00:07:13,400
+find out whether you have
+the correct solution or not.
+
+150
+00:07:13,400 --> 00:07:15,710
+So you needed a kind of
+mathematical notation of
+
+151
+00:07:15,710 --> 00:07:18,695
+your solution, and
+an actual code.
+
+152
+00:07:18,695 --> 00:07:22,414
+And then once you
+have your definition,
+
+153
+00:07:22,414 --> 00:07:25,699
+also make sure you try
+it out on some examples.
+
+154
+00:07:25,699 --> 00:07:28,970
+These will be the examples
+I will definitely try out,
+
+155
+00:07:28,970 --> 00:07:30,560
+but probably also more
+
+156
+00:07:30,560 --> 00:07:33,215
+depending on what
+definitions you give me.
+
+157
+00:07:33,215 --> 00:07:38,300
+There's one more question I
+ask you to do. You've seen
+
+158
+00:07:38,300 --> 00:07:40,130
+there's some regular
+expressions that
+
+159
+00:07:40,130 --> 00:07:42,215
+are involved for characters,
+
+160
+00:07:42,215 --> 00:07:46,925
+for character ranges or
+character sets essentially.
+
+161
+00:07:46,925 --> 00:07:48,665
+And sometimes I also want to have
+
+162
+00:07:48,665 --> 00:07:51,665
+just a reg expression which
+can match any character!!
+
+163
+00:07:51,665 --> 00:07:56,195
+And I could have for all of
+them separate definitions.
+
+164
+00:07:56,195 --> 00:07:57,710
+And that would mean I also need
+
+165
+00:07:57,710 --> 00:07:59,645
+separate definitions for nullable,
+
+166
+00:07:59,645 --> 00:08:02,405
+separate definitions
+for derivative.
+
+167
+00:08:02,405 --> 00:08:05,825
+But actually they can be
+generalized to just one constructor.
+
+168
+00:08:05,825 --> 00:08:08,210
+And that is if we introduce
+
+169
+00:08:08,210 --> 00:08:11,630
+a constructor for regular expressions,
+
+170
+00:08:11,630 --> 00:08:13,760
+which not just takes
+a single character
+
+171
+00:08:13,760 --> 00:08:15,469
+or set of characters.
+
+172
+00:08:15,469 --> 00:08:18,245
+But, which I call here CFUN.
+
+173
+00:08:18,245 --> 00:08:23,165
+And it takes a function from
+characters to booleans.
+
+174
+00:08:23,165 --> 00:08:24,470
+So if I want to match
+
+175
+00:08:24,470 --> 00:08:26,900
+just a single character
+then this function f
+
+176
+00:08:26,900 --> 00:08:29,060
+would only say true
+
+177
+00:08:29,060 --> 00:08:32,225
+if it gets as argument
+the single character.
+
+178
+00:08:32,225 --> 00:08:34,850
+If I have a character set,
+
+179
+00:08:34,850 --> 00:08:36,515
+then I would say, well,
+
+180
+00:08:36,515 --> 00:08:38,300
+I need a function
+which says true
+
+181
+00:08:38,300 --> 00:08:40,970
+for all the characters
+in this set.
+
+182
+00:08:40,970 --> 00:08:43,460
+And otherwise it's false.
+
+183
+00:08:43,460 --> 00:08:46,205
+And if I want to
+match any character!!,
+
+184
+00:08:46,205 --> 00:08:48,500
+then they should get a function
+
+185
+00:08:48,500 --> 00:08:53,450
+which says true for
+all characters.
+
+186
+00:08:53,450 --> 00:08:56,630
+Okay? If I have
+such a constructor
+
+187
+00:08:56,630 --> 00:09:00,140
+that actually I can save
+myself a bit of work.
+
+188
+00:09:00,140 --> 00:09:03,200
+And I can just have one case
+
+189
+00:09:03,200 --> 00:09:06,920
+for nullable and one
+case for CFUNS.
+
+190
+00:09:06,920 --> 00:09:09,800
+So that would then subsume
+the character ranges and
+
+191
+00:09:09,800 --> 00:09:13,385
+the character and also
+this ALL regular expression.
+
+192
+00:09:13,385 --> 00:09:15,380
+Ok, this was not explicitly
+included at the beginning,
+
+193
+00:09:15,380 --> 00:09:17,510
+but that's what I can now define.
+
+194
+00:09:17,510 --> 00:09:21,740
+And then I can define
+this for characters,
+
+195
+00:09:21,740 --> 00:09:23,885
+just as special cases
+for these functions.
+
+196
+00:09:23,885 --> 00:09:25,610
+And I told you already
+what this function
+
+197
+00:09:25,610 --> 00:09:28,025
+should look like in
+these three cases.
+
+198
+00:09:28,025 --> 00:09:30,350
+So I let ...
+
+199
+00:09:30,350 --> 00:09:33,515
+you decide how you're
+going to implement that.
+
+200
+00:09:33,515 --> 00:09:35,450
+If you first define
+
+201
+00:09:35,450 --> 00:09:38,495
+your implementation is
+all these explicit cases.
+
+202
+00:09:38,495 --> 00:09:41,900
+Because in these cases it's
+actually more intuitive,
+
+203
+00:09:41,900 --> 00:09:45,980
+what nullable and
+derivative should be.
+
+204
+00:09:45,980 --> 00:09:48,035
+And then in a second step,
+
+205
+00:09:48,035 --> 00:09:51,140
+you implement these
+more general cases.
+
+206
+00:09:51,140 --> 00:09:53,360
+And just keep the original ones
+
+207
+00:09:53,360 --> 00:09:54,500
+in your implementation if you
+
+208
+00:09:54,500 --> 00:09:57,710
+want, or if you feel
+adventurous enough,
+
+209
+00:09:57,710 --> 00:09:59,225
+and want to be lazy,
+
+210
+00:09:59,225 --> 00:10:01,115
+that you just implement that.
+
+211
+00:10:01,115 --> 00:10:02,660
+And then you have already done
+
+212
+00:10:02,660 --> 00:10:06,530
+at least two constructors
+by just implementing one.
+
+213
+00:10:06,530 --> 00:10:08,915
+But as said that 
+requires a bit skill
+
+214
+00:10:08,915 --> 00:10:11,450
+of generalizing how
+that should be.
+
+215
+00:10:11,450 --> 00:10:14,180
+And the other questions
+are just then
+
+216
+00:10:14,180 --> 00:10:16,745
+trying out your
+expression matcher on
+
+217
+00:10:16,745 --> 00:10:19,580
+some examples, more
+power examples,
+
+218
+00:10:19,580 --> 00:10:22,400
+and they should be
+very easy to solve.
+
+219
+00:10:22,400 --> 00:10:24,605
+So I hope it's not
+too much asked for
+
+220
+00:10:24,605 --> 00:10:26,615
+and I hope you have fun.
+
+221
+00:10:26,615 --> 00:10:29,675
+It is not too much ask
+because you can,
+
+222
+00:10:29,675 --> 00:10:32,420
+I hope it's not too much
+because you can start from
+
+223
+00:10:32,420 --> 00:10:35,615
+my definitions or
+my implementation.
+
+224
+00:10:35,615 --> 00:10:39,425
+It's really essentially
+mostly is brain work,
+
+225
+00:10:39,425 --> 00:10:42,170
+how these nullable and
+derivative should work.
+
+226
+00:10:42,170 --> 00:10:44,510
+If you're in a
+language like Scala,
+
+227
+00:10:44,510 --> 00:10:48,875
+the implementation should
+then be easy-peasy.
+
+228
+00:10:48,875 --> 00:10:51,365
+If you are in a different language
+
+229
+00:10:51,365 --> 00:10:52,610
+I assume you also
+
+230
+00:10:52,610 --> 00:10:54,890
+dedicated to that
+language to start with,
+
+231
+00:10:54,890 --> 00:10:58,475
+so it should be also very
+easy for you to translate
+
+232
+00:10:58,475 --> 00:11:01,100
+my Scala code into whatever
+language you want to
+
+233
+00:11:01,100 --> 00:11:04,280
+do, first and then
+start from there.
+
+234
+00:11:04,280 --> 00:11:06,230
+If there are any questions,
+
+235
+00:11:06,230 --> 00:11:08,360
+please ask me or the TAs.
+
+236
+00:11:08,360 --> 00:11:10,040
+We are trying to be as helpful
+
+237
+00:11:10,040 --> 00:11:12,900
+as possible with the coursework.
+
+238
+00:11:13,240 --> 00:11:17,885
+Bear in mind, this is the
+first step in our compiler.
+
+239
+00:11:17,885 --> 00:11:21,005
+The coursework 2 will then
+build on top of that.
+
+240
+00:11:21,005 --> 00:11:25,770
+So it's better to get this
+correct first. Thanks.