videos/01-evilregexes.srt
changeset 837 499405058cfd
parent 765 b294cfbb5c01
equal deleted inserted replaced
836:a3418ee8c404 837:499405058cfd
   167 question mark, which says you
   167 question mark, which says you
   168 
   168 
   169 38
   169 38
   170 00:01:55,790 --> 00:01:59,105
   170 00:01:55,790 --> 00:01:59,105
   171 match either it is there
   171 match either it is there
   172 or it ss not there.
   172 or it is not there.
   173 
   173 
   174 39
   174 39
   175 00:01:59,105 --> 00:02:01,340
   175 00:01:59,105 --> 00:02:01,340
   176 You are also regular
   176 You are also regular
   177 expressions which
   177 expressions which
   227 and z and 0 to 9 and so on.
   227 and z and 0 to 9 and so on.
   228 
   228 
   229 51
   229 51
   230 00:02:33,635 --> 00:02:36,545
   230 00:02:33,635 --> 00:02:36,545
   231 Here you also have regular
   231 Here you also have regular
   232 expression which can
   232 expressions which can
   233 
   233 
   234 52
   234 52
   235 00:02:36,545 --> 00:02:40,070
   235 00:02:36,545 --> 00:02:40,070
   236 match something which
   236 match something which
   237 isn't in this range.
   237 isn't in this range.
   302 Okay, so these are the
   302 Okay, so these are the
   303 typical regular expressions.
   303 typical regular expressions.
   304 
   304 
   305 67
   305 67
   306 00:03:19,925 --> 00:03:23,075
   306 00:03:19,925 --> 00:03:23,075
   307 And here's a particular one.
   307 And here's a particular one
   308 
   308 
   309 68
   309 68
   310 00:03:23,075 --> 00:03:25,820
   310 00:03:23,075 --> 00:03:25,820
   311 Trying to match something
   311 trying to match something
   312 
   312 
   313 69
   313 69
   314 00:03:25,820 --> 00:03:28,770
   314 00:03:25,820 --> 00:03:28,770
   315 which resembles
   315 which resembles
   316 an email address.
   316 an email address.
   519 match the regular expressions of
   519 match the regular expressions of
   520 a star star b.
   520 a star star b.
   521 
   521 
   522 114
   522 114
   523 00:05:42,935 --> 00:05:46,805
   523 00:05:42,935 --> 00:05:46,805
   524 And we measure time with longer
   524 And we measure the time with longer
   525 and longer strings of a.
   525 and longer strings of a's.
   526 
   526 
   527 115
   527 115
   528 00:05:46,805 --> 00:05:48,770
   528 00:05:46,805 --> 00:05:48,770
   529 And so conveniently we can give
   529 And so conveniently we can give
   530 
   530 
   562 00:06:07,640 --> 00:06:09,110
   562 00:06:07,640 --> 00:06:09,110
   563 did not match the string.
   563 did not match the string.
   564 
   564 
   565 123
   565 123
   566 00:06:09,110 --> 00:06:11,255
   566 00:06:09,110 --> 00:06:11,255
   567 That's indicated by this none.
   567 That's indicated by this None.
   568 
   568 
   569 124
   569 124
   570 00:06:11,255 --> 00:06:13,925
   570 00:06:11,255 --> 00:06:13,925
   571 Let's take ten a's.
   571 Let's take ten a's.
   572 
   572 
   591 They are not 100
   591 They are not 100
   592 percent accurate.
   592 percent accurate.
   593 
   593 
   594 130
   594 130
   595 00:06:28,040 --> 00:06:31,490
   595 00:06:28,040 --> 00:06:31,490
   596 So 15 is also a let's take
   596 So 15 is also OK.
       
   597 Let's take 20.
   597 
   598 
   598 131
   599 131
   599 00:06:31,490 --> 00:06:36,965
   600 00:06:31,490 --> 00:06:36,965
   600 28th notes already
   601 Hmmm this already takes
   601 double the time.
   602 double the time.
   602 
   603 
   603 132
   604 132
   604 00:06:36,965 --> 00:06:42,440
   605 00:06:36,965 --> 00:06:42,440
   605 Twenty-five longer.
   606 Twenty-five. Then even longer.
   606 
   607 
   607 133
   608 133
   608 00:06:42,440 --> 00:06:45,680
   609 00:06:42,440 --> 00:06:45,680
   609 Okay, that suddenly
   610 Okay, then suddenly
   610 from 02 seconds,
   611 from 0.2 seconds,
   611 
   612 
   612 134
   613 134
   613 00:06:45,680 --> 00:06:48,960
   614 00:06:45,680 --> 00:06:48,960
   614 it takes almost four seconds.
   615 it now takes almost four seconds.
   615 
   616 
   616 135
   617 135
   617 00:06:49,600 --> 00:06:54,890
   618 00:06:49,600 --> 00:06:54,890
   618 Six this
   619 Twenty-Six, this
   619 
   620 
   620 136
   621 136
   621 00:06:54,890 --> 00:07:01,415
   622 00:06:54,890 --> 00:07:01,415
   622 takes six seconds
   623 takes six seconds...
   623 already Double, okay?
   624 already double. 
   624 
   625 
   625 137
   626 137
   626 00:07:01,415 --> 00:07:07,229
   627 00:07:01,415 --> 00:07:07,229
   627 Go to 28. That would be now.
   628 Let's go to 28. That would be
       
   629 ...hmmm....hmmm
   628 
   630 
   629 138
   631 138
   630 00:07:08,890 --> 00:07:11,840
   632 00:07:08,890 --> 00:07:11,840
   631 You see the string
   633 You see the string
   632 isn't very long,
   634 isn't very long,
   655 to find out that
   657 to find out that
   656 this string of 28
   658 this string of 28
   657 
   659 
   658 144
   660 144
   659 00:07:24,710 --> 00:07:26,570
   661 00:07:24,710 --> 00:07:26,570
   660 AES is actually not much
   662 a's is actually not matched
   661 
   663 
   662 145
   664 145
   663 00:07:26,570 --> 00:07:28,490
   665 00:07:26,570 --> 00:07:28,490
   664 by that you see it's
   666 by that. You see it's
   665 still not finished.
   667 still not finished.
   666 
   668 
   667 146
   669 146
   668 00:07:28,490 --> 00:07:32,900
   670 00:07:28,490 --> 00:07:32,900
   669 I think it should take
   671 I think it should take
   677 00:07:34,400 --> 00:07:36,530
   679 00:07:34,400 --> 00:07:36,530
   678 And if we would try
   680 And if we would try
   679 
   681 
   680 149
   682 149
   681 00:07:36,530 --> 00:07:40,805
   683 00:07:36,530 --> 00:07:40,805
   682 30 would be already
   684 30, we would be already here
   683 more than a minute.
   685 for more than a minute.
   684 
   686 
   685 150
   687 150
   686 00:07:40,805 --> 00:07:43,940
   688 00:07:40,805 --> 00:07:43,940
   687 And if I could read
   689 And if I could use
   688 something like hundreds,
   690 something like 100,
   689 
   691 
   690 151
   692 151
   691 00:07:43,940 --> 00:07:46,220
   693 00:07:43,940 --> 00:07:46,220
   692 you remember if a doubling in
   694 you remember if a doubling in
   693 
   695 
   754 00:08:21,830 --> 00:08:23,930
   756 00:08:21,830 --> 00:08:23,930
   755 So there's not any difference
   757 So there's not any difference
   756 
   758 
   757 166
   759 166
   758 00:08:23,930 --> 00:08:26,150
   760 00:08:23,930 --> 00:08:26,150
   759 in the strings this work
   761 in the strings this regular
   760 expression matches.
   762 expression matches.
   761 
   763 
   762 167
   764 167
   763 00:08:26,150 --> 00:08:28,610
   765 00:08:26,150 --> 00:08:28,610
   764 We'll just start at the
   766 We'll just start at the
   770 end of the string.
   772 end of the string.
   771 
   773 
   772 169
   774 169
   773 00:08:31,460 --> 00:08:35,285
   775 00:08:31,460 --> 00:08:35,285
   774 And we again, we just use
   776 And we again, we just use
   775 repeated A's for that.
   777 repeated a's for that.
   776 
   778 
   777 170
   779 170
   778 00:08:35,285 --> 00:08:38,195
   780 00:08:35,285 --> 00:08:38,195
   779 And similarly, we can
   781 And similarly, we can
   780 
   782 
   783 call it on the command line
   785 call it on the command line
   784 and can do some timing.
   786 and can do some timing.
   785 
   787 
   786 172
   788 172
   787 00:08:41,930 --> 00:08:44,540
   789 00:08:41,930 --> 00:08:44,540
   788 So ten SBA, good.
   790 So ten a's is very good.
   789 
   791 
   790 173
   792 173
   791 00:08:44,540 --> 00:08:46,340
   793 00:08:44,540 --> 00:08:46,340
   792 Here's the string.
   794 Here's the string.
   793 
   795 
   799 00:08:48,320 --> 00:08:50,525
   801 00:08:48,320 --> 00:08:50,525
   800 And it's pretty fast.
   802 And it's pretty fast.
   801 
   803 
   802 176
   804 176
   803 00:08:50,525 --> 00:08:54,725
   805 00:08:50,525 --> 00:08:54,725
   804 Friendly. Although pretty fast.
   806 Twenty...also pretty fast.
   805 
   807 
   806 177
   808 177
   807 00:08:54,725 --> 00:08:59,120
   809 00:08:54,725 --> 00:08:59,120
   808 Five, again,
   810 Twenty-five... Again,
   809 
   811 
   810 178
   812 178
   811 00:08:59,120 --> 00:09:06,650
   813 00:08:59,120 --> 00:09:06,650
   812 somehow is kind of
   814 somehow is a kind of
   813 threshold that is 25, 26.
   815 threshold that is 25, 26.
   814 
   816 
   815 179
   817 179
   816 00:09:06,650 --> 00:09:09,485
   818 00:09:06,650 --> 00:09:09,485
   817 Suddenly it takes much longer.
   819 Suddenly it takes much longer.
   825 00:09:14,360 --> 00:09:17,165
   827 00:09:14,360 --> 00:09:17,165
   826 So you'll see in now from 26 on,
   828 So you'll see in now from 26 on,
   827 
   829 
   828 182
   830 182
   829 00:09:17,165 --> 00:09:19,250
   831 00:09:17,165 --> 00:09:19,250
   830 the Times has always
   832 the times always
   831 doubling from
   833 double from
   832 
   834 
   833 183
   835 183
   834 00:09:19,250 --> 00:09:21,860
   836 00:09:19,250 --> 00:09:21,860
   835 three seconds to seven seconds.
   837 three seconds to seven seconds.
   836 
   838 
   847 27 and you see the
   849 27 and you see the
   848 string isn't very long.
   850 string isn't very long.
   849 
   851 
   850 187
   852 187
   851 00:09:30,230 --> 00:09:32,165
   853 00:09:30,230 --> 00:09:32,165
   852 Let's choose twenties or maize.
   854 It is just twenty-or-something a's.
   853 
   855 
   854 188
   856 188
   855 00:09:32,165 --> 00:09:35,419
   857 00:09:32,165 --> 00:09:35,419
   856 Imagine you have to
   858 Imagine you have to
   857 search a database
   859 search a database
   858 
   860 
   859 189
   861 189
   860 00:09:35,419 --> 00:09:38,720
   862 00:09:35,419 --> 00:09:38,720
   861 with kilobytes of data.
   863 with Gigabytes of data
   862 
   864 
   863 190
   865 190
   864 00:09:38,720 --> 00:09:42,260
   866 00:09:38,720 --> 00:09:42,260
   865 This, these regular
   867 with these regular
   866 expressions that would years
   868 expressions that would 
   867 
   869 
   868 191
   870 191
   869 00:09:42,260 --> 00:09:48,150
   871 00:09:42,260 --> 00:09:48,150
   870 need years to go through with
   872 need years to go through with
   871 these regular expressions.
   873 these regular expressions.
   891 00:10:01,045 --> 00:10:03,415
   893 00:10:01,045 --> 00:10:03,415
   892 You can see this again
   894 You can see this again
   893 
   895 
   894 197
   896 197
   895 00:10:03,415 --> 00:10:05,980
   897 00:10:03,415 --> 00:10:05,980
   896 is the reg expression
   898 is the regular expression
   897 and we just having
   899 and we just having
   898 
   900 
   899 198
   901 198
   900 00:10:05,980 --> 00:10:08,320
   902 00:10:05,980 --> 00:10:08,320
   901 some scaffolding to generate
   903 some scaffolding to generate
   902 
   904 
   903 199
   905 199
   904 00:10:08,320 --> 00:10:11,905
   906 00:10:08,320 --> 00:10:11,905
   905 strings from five up till 28.
   907 strings from 5 up till 28.
   906 
   908 
   907 200
   909 200
   908 00:10:11,905 --> 00:10:14,305
   910 00:10:11,905 --> 00:10:14,305
   909 And if we run that,
   911 And if we run that,
   910 
   912 
   917 So uphill 19, pretty fast,
   919 So uphill 19, pretty fast,
   918 
   920 
   919 203
   921 203
   920 00:10:19,900 --> 00:10:24,925
   922 00:10:19,900 --> 00:10:24,925
   921 but then starting from
   923 but then starting from
   922 23, skidding pretty slow.
   924 23, it is getting pretty slow.
   923 
   925 
   924 204
   926 204
   925 00:10:24,925 --> 00:10:27,445
   927 00:10:24,925 --> 00:10:27,445
   926 So the question is
   928 So the question is
   927 what's going on?
   929 what's going on?
   928 
   930 
   929 205
   931 205
   930 00:10:27,445 --> 00:10:29,230
   932 00:10:27,445 --> 00:10:29,230
   931 By the way, I'm not quoting here.
   933 By the way, I'm not gloating here.
   932 
   934 
   933 206
   935 206
   934 00:10:29,230 --> 00:10:33,755
   936 00:10:29,230 --> 00:10:33,755
   935 Scala, using internally
   937 Scala uses internally
   936 the regular expression
   938 the regular expression
   937 
   939 
   938 207
   940 207
   939 00:10:33,755 --> 00:10:36,665
   941 00:10:33,755 --> 00:10:36,665
   940 matching engine from Java.
   942 matching engine from Java.
   967 We will see we can out-compete
   969 We will see we can out-compete
   968 them by magnitudes.
   970 them by magnitudes.
   969 
   971 
   970 214
   972 214
   971 00:10:55,490 --> 00:10:57,605
   973 00:10:55,490 --> 00:10:57,605
   972 So I think I can that.
   974 So I think I can 
   973 
   975 
   974 215
   976 215
   975 00:10:57,605 --> 00:10:59,165
   977 00:10:57,605 --> 00:10:59,165
   976 Now, just finish here.
   978 now, just finish this here.
   977 
   979 
   978 216
   980 216
   979 00:10:59,165 --> 00:11:04,025
   981 00:10:59,165 --> 00:11:04,025
   980 You see the problem. Just
   982 You see the problem.
   981 for completeness sake.
   983 Just for completeness sake.
   982 
   984 
   983 217
   985 217
   984 00:11:04,025 --> 00:11:07,010
   986 00:11:04,025 --> 00:11:07,010
   985 Here is a Ruby program.
   987 Here is a Ruby program.
   986 
   988 
   995 string should match.
   997 string should match.
   996 
   998 
   997 220
   999 220
   998 00:11:12,935 --> 00:11:20,300
  1000 00:11:12,935 --> 00:11:20,300
   999 And again it tries out
  1001 And again it tries out
  1000 strings between 130 here.
  1002 strings between 1 and 30 here.
  1001 
  1003 
  1002 221
  1004 221
  1003 00:11:20,300 --> 00:11:23,450
  1005 00:11:20,300 --> 00:11:23,450
  1004 That's a program actually
  1006 That's a program actually
  1005 a former student produced.
  1007 a former student produced.
  1009 And you can see four a's
  1011 And you can see four a's
  1010 
  1012 
  1011 223
  1013 223
  1012 00:11:25,565 --> 00:11:29,780
  1014 00:11:25,565 --> 00:11:29,780
  1013 of links up till 20
  1015 of links up till 20
  1014 AES is pretty fast.
  1016 a's is pretty fast.
  1015 
  1017 
  1016 224
  1018 224
  1017 00:11:29,780 --> 00:11:32,495
  1019 00:11:29,780 --> 00:11:32,495
  1018 But then starting at 26,
  1020 But then starting at 26,
  1019 
  1021 
  1058 00:11:49,610 --> 00:11:52,250
  1060 00:11:49,610 --> 00:11:52,250
  1059 as you will see later on.
  1061 as you will see later on.
  1060 
  1062 
  1061 234
  1063 234
  1062 00:11:52,250 --> 00:11:55,620
  1064 00:11:52,250 --> 00:11:55,620
  1063 Hey, I also just stop that here.
  1065 I also just stop that here.
  1064 
  1066 
  1065 235
  1067 235
  1066 00:11:55,710 --> 00:12:00,985
  1068 00:11:55,710 --> 00:12:00,985
  1067 Okay, this slight collect
  1069 Okay, this slide collects
  1068 this information about times.
  1070 the information about times.
  1069 
  1071 
  1070 236
  1072 236
  1071 00:12:00,985 --> 00:12:03,400
  1073 00:12:00,985 --> 00:12:03,400
  1072 On the right hand side will
  1074 On the right-hand side will
  1073 
  1075 
  1074 237
  1076 237
  1075 00:12:03,400 --> 00:12:05,860
  1077 00:12:03,400 --> 00:12:05,860
  1076 be our regular expression mantra,
  1078 be our regular expression matcher,
  1077 
  1079 
  1078 238
  1080 238
  1079 00:12:05,860 --> 00:12:08,290
  1081 00:12:05,860 --> 00:12:08,290
  1080 which we implement next week.
  1082 which we implement next week.
  1081 
  1083 
  1084 On the left-hand side,
  1086 On the left-hand side,
  1085 are these times by
  1087 are these times by
  1086 
  1088 
  1087 240
  1089 240
  1088 00:12:10,795 --> 00:12:14,260
  1090 00:12:10,795 --> 00:12:14,260
  1089 barriers than regular
  1091 various other regular
  1090 expression matching engines?
  1092 expression matching engines?
  1091 
  1093 
  1092 241
  1094 241
  1093 00:12:14,260 --> 00:12:17,809
  1095 00:12:14,260 --> 00:12:17,809
  1094 On the top is this
  1096 On the top is this
  1095 regular expression.
  1097 regular expression.
  1096 
  1098 
  1097 242
  1099 242
  1098 00:12:19,080 --> 00:12:23,335
  1100 00:12:19,080 --> 00:12:23,335
  1099 Possible a n times a n times.
  1101 Possible a n-times a n-times.
  1100 
  1102 
  1101 243
  1103 243
  1102 00:12:23,335 --> 00:12:26,890
  1104 00:12:23,335 --> 00:12:26,890
  1103 And on the lowest
  1105 And on the lower
  1104 is a star, star b.
  1106 is (a*)* b.
  1105 
  1107 
  1106 244
  1108 244
  1107 00:12:26,890 --> 00:12:30,370
  1109 00:12:26,890 --> 00:12:30,370
  1108 And the x-axis show here
  1110 And the x-axis show here
  1109 
  1111 
  1112 the length of the
  1114 the length of the
  1113 string. How many a's.
  1115 string. How many a's.
  1114 
  1116 
  1115 246
  1117 246
  1116 00:12:35,335 --> 00:12:38,925
  1118 00:12:35,335 --> 00:12:38,925
  1117 And on the y axis is the time.
  1119 And on the y-axis is the time
  1118 
  1120 
  1119 247
  1121 247
  1120 00:12:38,925 --> 00:12:41,660
  1122 00:12:38,925 --> 00:12:41,660
  1121 They need to decide whether
  1123 they need to decide whether
  1122 
  1124 
  1123 248
  1125 248
  1124 00:12:41,660 --> 00:12:44,615
  1126 00:12:41,660 --> 00:12:44,615
  1125 the string is matched by
  1127 the string is matched by
  1126 the rate expression or not.
  1128 the regular expression or not.
  1127 
  1129 
  1128 249
  1130 249
  1129 00:12:44,615 --> 00:12:46,415
  1131 00:12:44,615 --> 00:12:46,415
  1130 So you can see here, Python,
  1132 So you can see here, Python,
  1131 
  1133 
  1132 250
  1134 250
  1133 00:12:46,415 --> 00:12:47,945
  1135 00:12:46,415 --> 00:12:47,945
  1134 Java eight in JavaScript,
  1136 Java 8 and JavaScript,
  1135 
  1137 
  1136 251
  1138 251
  1137 00:12:47,945 --> 00:12:52,250
  1139 00:12:47,945 --> 00:12:52,250
  1138 they max out approximately
  1140 they max out approximately
  1139 at between 2530.
  1141 at between 25 and 30.
  1140 
  1142 
  1141 252
  1143 252
  1142 00:12:52,250 --> 00:12:53,900
  1144 00:12:52,250 --> 00:12:53,900
  1143 The kristin, it takes already
  1145 Because then it takes already
  1144 
  1146 
  1145 253
  1147 253
  1146 00:12:53,900 --> 00:12:55,160
  1148 00:12:53,900 --> 00:12:55,160
  1147 a half a minute to
  1149 a half a minute to
  1148 
  1150 
  1156 And similarly, in
  1158 And similarly, in
  1157 the other example,
  1159 the other example,
  1158 
  1160 
  1159 256
  1161 256
  1160 00:13:00,815 --> 00:13:03,830
  1162 00:13:00,815 --> 00:13:03,830
  1161 Python and derived Ruby max out
  1163 Python and Ruby max out
  1162 
  1164 
  1163 257
  1165 257
  1164 00:13:03,830 --> 00:13:07,220
  1166 00:13:03,830 --> 00:13:07,220
  1165 at a similar kind of
  1167 at a similar kind of
  1166 length of the strings.
  1168 length of the strings.
  1170 Because then they use also
  1172 Because then they use also
  1171 half a minute to decide
  1173 half a minute to decide
  1172 
  1174 
  1173 259
  1175 259
  1174 00:13:10,400 --> 00:13:13,940
  1176 00:13:10,400 --> 00:13:13,940
  1175 whether this rec expression
  1177 whether this regular expression
  1176 actually matches the string.
  1178 actually matches the string.
  1177 
  1179 
  1178 260
  1180 260
  1179 00:13:13,940 --> 00:13:16,790
  1181 00:13:13,940 --> 00:13:16,790
  1180 Contrast that with
  1182 Contrast that with
  1181 the reg expression
  1183 
  1182 
  1184 
  1183 261
  1185 261
  1184 00:13:16,790 --> 00:13:19,235
  1186 00:13:16,790 --> 00:13:19,235
  1185 which we are regular
  1187 the regular expression matcher
  1186 expression mantra,
       
  1187 
  1188 
  1188 262
  1189 262
  1189 00:13:19,235 --> 00:13:21,470
  1190 00:13:19,235 --> 00:13:21,470
  1190 which we're going to implement.
  1191 which we're going to implement.
  1191 
  1192 
  1204 Actually, there will be
  1205 Actually, there will be
  1205 two versions of that.
  1206 two versions of that.
  1206 
  1207 
  1207 266
  1208 266
  1208 00:13:32,285 --> 00:13:34,850
  1209 00:13:32,285 --> 00:13:34,850
  1209 First version may be
  1210 The first version will be
  1210 also relatively slow.
  1211 also relatively slow.
  1211 
  1212 
  1212 267
  1213 267
  1213 00:13:34,850 --> 00:13:36,410
  1214 00:13:34,850 --> 00:13:36,410
  1214 But the second version,
  1215 But the second version,
  1226 And in the second example,
  1227 And in the second example,
  1227 
  1228 
  1228 271
  1229 271
  1229 00:13:42,380 --> 00:13:45,740
  1230 00:13:42,380 --> 00:13:45,740
  1230 you have to be careful
  1231 you have to be careful
  1231 about the x axis because
  1232 about the x-axis because
  1232 
  1233 
  1233 272
  1234 272
  1234 00:13:45,740 --> 00:13:49,385
  1235 00:13:45,740 --> 00:13:49,385
  1235 that means four times
  1236 that means four times
  1236 ten to the power six.
  1237 ten to the power six.
  1237 
  1238 
  1238 273
  1239 273
  1239 00:13:49,385 --> 00:13:51,695
  1240 00:13:49,385 --> 00:13:51,695
  1240 It's actually 4 million A's.
  1241 It's actually 4 million a's.
  1241 
  1242 
  1242 274
  1243 274
  1243 00:13:51,695 --> 00:13:55,100
  1244 00:13:51,695 --> 00:13:55,100
  1244 So our regular
  1245 So our regular
  1245 expression match or need
  1246 expression matcher needs
  1246 
  1247 
  1247 275
  1248 275
  1248 00:13:55,100 --> 00:13:57,635
  1249 00:13:55,100 --> 00:13:57,635
  1249 less than ten seconds to
  1250 less than ten seconds to
  1250 
  1251 
  1251 276
  1252 276
  1252 00:13:57,635 --> 00:14:00,725
  1253 00:13:57,635 --> 00:14:00,725
  1253 match a string of length
  1254 match a string of length
  1254 of 4 million A's.
  1255 of 4 million a's.
  1255 
  1256 
  1256 277
  1257 277
  1257 00:14:00,725 --> 00:14:04,430
  1258 00:14:00,725 --> 00:14:04,430
  1258 Contrast that Python, Java eight,
  1259 Contrast that Python, Java 8,
  1259 
  1260 
  1260 278
  1261 278
  1261 00:14:04,430 --> 00:14:06,770
  1262 00:14:04,430 --> 00:14:06,770
  1262 and JavaScript need half a minute
  1263 and JavaScript need half a minute
  1263 
  1264 
  1264 279
  1265 279
  1265 00:14:06,770 --> 00:14:09,905
  1266 00:14:06,770 --> 00:14:09,905
  1266 already for a string
  1267 already for a string
  1267 of length just 30,
  1268 of length just 30.
  1268 
  1269 
  1269 280
  1270 280
  1270 00:14:09,905 --> 00:14:12,365
  1271 00:14:09,905 --> 00:14:12,365
  1271 unless you're very
  1272 I was very careful with Java 8.
  1272 careful with Java eight.
       
  1273 
  1273 
  1274 281
  1274 281
  1275 00:14:12,365 --> 00:14:15,725
  1275 00:14:12,365 --> 00:14:15,725
  1276 Yes, Java nine and above,
  1276 Yes, Java 9 and above,
  1277 
  1277 
  1278 282
  1278 282
  1279 00:14:15,725 --> 00:14:17,180
  1279 00:14:15,725 --> 00:14:17,180
  1280 they already have
  1280 they already have
  1281 
  1281 
  1289 but still we will be running
  1289 but still we will be running
  1290 circles around them.
  1290 circles around them.
  1291 
  1291 
  1292 285
  1292 285
  1293 00:14:22,805 --> 00:14:27,050
  1293 00:14:22,805 --> 00:14:27,050
  1294 It's this data. I
  1294 with this data.
  1295 call this slide.
  1295 I call this slide:
  1296 
  1296 
  1297 286
  1297 286
  1298 00:14:27,050 --> 00:14:29,675
  1298 00:14:27,050 --> 00:14:29,675
  1299 Why bother with
  1299 Why bother with
  1300 regular expressions?
  1300 regular expressions?
  1304 But you can probably
  1304 But you can probably
  1305 see these are
  1305 see these are
  1306 
  1306 
  1307 288
  1307 288
  1308 00:14:33,515 --> 00:14:34,910
  1308 00:14:33,515 --> 00:14:34,910
  1309 at least more times by
  1309 abysmal times by
  1310 
  1310 
  1311 289
  1311 289
  1312 00:14:34,910 --> 00:14:38,015
  1312 00:14:34,910 --> 00:14:38,015
  1313 the existing regular
  1313 the existing regular
  1314 expression matching engines.
  1314 expression matching engines.
  1324 do substantially better.
  1324 do substantially better.
  1325 
  1325 
  1326 292
  1326 292
  1327 00:14:42,695 --> 00:14:47,495
  1327 00:14:42,695 --> 00:14:47,495
  1328 And if you don't believe
  1328 And if you don't believe
  1329 in D times, I gave here,
  1329 in the times, I gave here,
  1330 
  1330 
  1331 293
  1331 293
  1332 00:14:47,495 --> 00:14:50,090
  1332 00:14:47,495 --> 00:14:50,090
  1333 please feel free to
  1333 please feel free to
  1334 play on your own
  1334 play on your own
  1335 
  1335 
  1336 294
  1336 294
  1337 00:14:50,090 --> 00:14:52,865
  1337 00:14:50,090 --> 00:14:52,865
  1338 with the examples
  1338 with the examples
  1339 I uploaded, Keats.
  1339 I uploaded on KEATS.
  1340 
  1340 
  1341 295
  1341 295
  1342 00:14:52,865 --> 00:14:55,235
  1342 00:14:52,865 --> 00:14:55,235
  1343 These are exactly the programs
  1343 These are exactly the programs
  1344 
  1344 
  1345 296
  1345 296
  1346 00:14:55,235 --> 00:14:57,470
  1346 00:14:55,235 --> 00:14:57,470
  1347 are used here in the examples.
  1347 I used here in the examples.
  1348 
  1348 
  1349 297
  1349 297
  1350 00:14:57,470 --> 00:14:59,255
  1350 00:14:57,470 --> 00:14:59,255
  1351 So feel free.
  1351 So feel free.
  1352 
  1352 
  1355 You might however now think, hmm.
  1355 You might however now think, hmm.
  1356 
  1356 
  1357 299
  1357 299
  1358 00:15:01,970 --> 00:15:05,449
  1358 00:15:01,970 --> 00:15:05,449
  1359 These are two very
  1359 These are two very
  1360 well chosen examples.
  1360 well chosen examples,
  1361 
  1361 
  1362 300
  1362 300
  1363 00:15:05,449 --> 00:15:07,145
  1363 00:15:05,449 --> 00:15:07,145
  1364 And I admit that's true.
  1364 and I admit that's true,
  1365 
  1365 
  1366 301
  1366 301
  1367 00:15:07,145 --> 00:15:09,410
  1367 00:15:07,145 --> 00:15:09,410
  1368 And such problem there never
  1368 and such problems never
  1369 
  1369 
  1370 302
  1370 302
  1371 00:15:09,410 --> 00:15:12,540
  1371 00:15:09,410 --> 00:15:12,540
  1372 causing any problems
  1372 cause any problems
  1373 in real life.
  1373 in real life.
  1374 
  1374 
  1375 303
  1375 303
  1376 00:15:13,300 --> 00:15:15,980
  1376 00:15:13,300 --> 00:15:15,980
  1377 Regular expressions are used very
  1377 Regular expressions are used very
  1385 00:15:19,415 --> 00:15:21,410
  1385 00:15:19,415 --> 00:15:21,410
  1386 So here's my first example from
  1386 So here's my first example from
  1387 
  1387 
  1388 306
  1388 306
  1389 00:15:21,410 --> 00:15:23,885
  1389 00:15:21,410 --> 00:15:23,885
  1390 a company called cloudflare.
  1390 a company called Cloudflare.
  1391 
  1391 
  1392 307
  1392 307
  1393 00:15:23,885 --> 00:15:27,560
  1393 00:15:23,885 --> 00:15:27,560
  1394 This is a huge hosting company
  1394 This is a huge hosting company
  1395 
  1395 
  1396 308
  1396 308
  1397 00:15:27,560 --> 00:15:30,935
  1397 00:15:27,560 --> 00:15:30,935
  1398 which host very
  1398 which hosts very
  1399 well-known web pages.
  1399 well-known web pages.
  1400 
  1400 
  1401 309
  1401 309
  1402 00:15:30,935 --> 00:15:34,970
  1402 00:15:30,935 --> 00:15:34,970
  1403 And they really try hard
  1403 And they really try hard
  1408 And they manage
  1408 And they manage
  1409 that for six years.
  1409 that for six years.
  1410 
  1410 
  1411 311
  1411 311
  1412 00:15:37,340 --> 00:15:39,320
  1412 00:15:37,340 --> 00:15:39,320
  1413 But then a Rekha expression,
  1413 But then a regular expression,
  1414 
  1414 
  1415 312
  1415 312
  1416 00:15:39,320 --> 00:15:41,180
  1416 00:15:39,320 --> 00:15:41,180
  1417 actually this one caused
  1417 actually this one, caused
  1418 a problem and you
  1418 a problem and you
  1419 
  1419 
  1420 313
  1420 313
  1421 00:15:41,180 --> 00:15:43,265
  1421 00:15:41,180 --> 00:15:43,265
  1422 can see they're also
  1422 can see they're also
  1423 like two stars.
  1423 two stars
  1424 
  1424 
  1425 314
  1425 314
  1426 00:15:43,265 --> 00:15:44,630
  1426 00:15:43,265 --> 00:15:44,630
  1427 They are at the end.
  1427 at the end.
  1428 
  1428 
  1429 315
  1429 315
  1430 00:15:44,630 --> 00:15:46,955
  1430 00:15:44,630 --> 00:15:46,955
  1431 And because of that string needed
  1431 And because of that string needed
  1432 
  1432 
  1461 where Stack Exchange,
  1461 where Stack Exchange,
  1462 I guess you know
  1462 I guess you know
  1463 
  1463 
  1464 323
  1464 323
  1465 00:16:04,040 --> 00:16:06,650
  1465 00:16:04,040 --> 00:16:06,650
  1466 this webpage had
  1466 this webpage, had
  1467 also an outage from,
  1467 also an outage for
  1468 
  1468 
  1469 324
  1469 324
  1470 00:16:06,650 --> 00:16:08,390
  1470 00:16:06,650 --> 00:16:08,390
  1471 I think at least an hour.
  1471 I think at least an hour.
  1472 
  1472 
  1473 325
  1473 325
  1474 00:16:08,390 --> 00:16:13,070
  1474 00:16:08,390 --> 00:16:13,070
  1475 Because a regular expression
  1475 Because a regular expression,
  1476 then needed to format posts,
  1476 needed to format posts,
  1477 
  1477 
  1478 326
  1478 326
  1479 00:16:13,070 --> 00:16:15,575
  1479 00:16:13,070 --> 00:16:15,575
  1480 needed too much time to
  1480 needed too much time to
  1481 
  1481 
  1485 should be accepted or not.
  1485 should be accepted or not.
  1486 
  1486 
  1487 328
  1487 328
  1488 00:16:19,010 --> 00:16:23,390
  1488 00:16:19,010 --> 00:16:23,390
  1489 And again, there was a
  1489 And again, there was a
  1490 semi kind of problem.
  1490 similar kind of problem.
  1491 
  1491 
  1492 329
  1492 329
  1493 00:16:23,390 --> 00:16:24,950
  1493 00:16:23,390 --> 00:16:24,950
  1494 And you can read
  1494 And you can read
  1495 the stories behind
  1495 the stories behind
  1504 this the first time,
  1504 this the first time,
  1505 
  1505 
  1506 332
  1506 332
  1507 00:16:31,730 --> 00:16:34,175
  1507 00:16:31,730 --> 00:16:34,175
  1508 what surprised me is
  1508 what surprised me is
  1509 that theoretician
  1509 that theoreticians,
  1510 
  1510 
  1511 333
  1511 333
  1512 00:16:34,175 --> 00:16:37,520
  1512 00:16:34,175 --> 00:16:37,520
  1513 who sometimes dedicate their
  1513 who sometimes dedicate their
  1514 life to regular expression.
  1514 life to regular expressions
  1515 
  1515 
  1516 334
  1516 334
  1517 00:16:37,520 --> 00:16:39,440
  1517 00:16:37,520 --> 00:16:39,440
  1518 And no really a lot about
  1518 and know really a lot about
  1519 
  1519 
  1520 335
  1520 335
  1521 00:16:39,440 --> 00:16:41,690
  1521 00:16:39,440 --> 00:16:41,690
  1522 them didn't know
  1522 them, didn't know
  1523 anything about this.
  1523 anything about this.
  1524 
  1524 
  1525 336
  1525 336
  1526 00:16:41,690 --> 00:16:43,610
  1526 00:16:41,690 --> 00:16:43,610
  1527 But engineers, they
  1527 But engineers, they
  1528 already created
  1528 already created
  1529 
  1529 
  1530 337
  1530 337
  1531 00:16:43,610 --> 00:16:46,160
  1531 00:16:43,610 --> 00:16:46,160
  1532 a name for that
  1532 a name for that:
  1533 regular expression,
  1533 Regular Expression
  1534 
  1534 
  1535 338
  1535 338
  1536 00:16:46,160 --> 00:16:47,975
  1536 00:16:46,160 --> 00:16:47,975
  1537 denial of service attack.
  1537 Denial of Service Attack.
  1538 
  1538 
  1539 339
  1539 339
  1540 00:16:47,975 --> 00:16:49,745
  1540 00:16:47,975 --> 00:16:49,745
  1541 Because what you can,
  1541 Because what you can,
  1542 
  1542 
  1545 what can happen now is that
  1545 what can happen now is that
  1546 
  1546 
  1547 341
  1547 341
  1548 00:16:51,230 --> 00:16:54,920
  1548 00:16:51,230 --> 00:16:54,920
  1549 attackers look for
  1549 attackers look for
  1550 certain strings.
  1550 certain strings
  1551 
  1551 
  1552 342
  1552 342
  1553 00:16:54,920 --> 00:16:56,780
  1553 00:16:54,920 --> 00:16:56,780
  1554 You make your regular expression
  1554 that make your regular expression
  1555 
  1555 
  1556 343
  1556 343
  1557 00:16:56,780 --> 00:16:59,105
  1557 00:16:56,780 --> 00:16:59,105
  1558 matching engine topple over.
  1558 matching engine topple over.
  1559 
  1559 
  1560 344
  1560 344
  1561 00:16:59,105 --> 00:17:01,370
  1561 00:16:59,105 --> 00:17:01,370
  1562 And these kind of expressions,
  1562 And these kind of 
  1563 
  1563 
  1564 345
  1564 345
  1565 00:17:01,370 --> 00:17:04,160
  1565 00:17:01,370 --> 00:17:04,160
  1566 regular expressions called
  1566 regular expressions are called
  1567 Eve of reg expression.
  1567 Evil Regular Expressions.
  1568 
  1568 
  1569 346
  1569 346
  1570 00:17:04,160 --> 00:17:06,350
  1570 00:17:04,160 --> 00:17:06,350
  1571 And actually there are
  1571 And actually there are
  1572 quite a number of them.
  1572 quite a number of them.
  1589 And you can easily have in
  1589 And you can easily have in
  1590 
  1590 
  1591 351
  1591 351
  1592 00:17:15,620 --> 00:17:18,560
  1592 00:17:15,620 --> 00:17:18,560
  1593 your program one of
  1593 your program one of
  1594 these reg expression.
  1594 these regular expressions.
  1595 
  1595 
  1596 352
  1596 352
  1597 00:17:18,560 --> 00:17:21,830
  1597 00:17:18,560 --> 00:17:21,830
  1598 And then you have the
  1598 And then you have the
  1599 problem that if you do have
  1599 problem that if you do have
  1607 somebody finds the
  1607 somebody finds the
  1608 corresponding string,
  1608 corresponding string,
  1609 
  1609 
  1610 355
  1610 355
  1611 00:17:25,640 --> 00:17:29,945
  1611 00:17:25,640 --> 00:17:29,945
  1612 which make the records
  1612 which make the regular
  1613 matching engine topple over,
  1613 matching engine topple over,
  1614 
  1614 
  1615 356
  1615 356
  1616 00:17:29,945 --> 00:17:31,820
  1616 00:17:29,945 --> 00:17:31,820
  1617 then you have a problem
  1617 then you have a problem
  1618 
  1618 
  1619 357
  1619 357
  1620 00:17:31,820 --> 00:17:34,295
  1620 00:17:31,820 --> 00:17:34,295
  1621 because your webpage is
  1621 because your webpage is
  1622 probably not variable.
  1622 probably not available.
  1623 
  1623 
  1624 358
  1624 358
  1625 00:17:34,295 --> 00:17:36,140
  1625 00:17:34,295 --> 00:17:36,140
  1626 This is also sometimes called
  1626 This phenomenon is also sometimes 
  1627 
  1627 
  1628 359
  1628 359
  1629 00:17:36,140 --> 00:17:39,350
  1629 00:17:36,140 --> 00:17:39,350
  1630 this phenomenon,
  1630 called
  1631 catastrophic backtracking.
  1631 catastrophic backtracking.
  1632 
  1632 
  1633 360
  1633 360
  1634 00:17:39,350 --> 00:17:43,595
  1634 00:17:39,350 --> 00:17:43,595
  1635 In lecture three, we will
  1635 In lecture three, we will
  1641 is such a problem in
  1641 is such a problem in
  1642 
  1642 
  1643 362
  1643 362
  1644 00:17:46,910 --> 00:17:50,795
  1644 00:17:46,910 --> 00:17:50,795
  1645 real life is actually
  1645 real life is actually
  1646 not to do with Lexus.
  1646 not to do with lexers.
  1647 
  1647 
  1648 363
  1648 363
  1649 00:17:50,795 --> 00:17:53,180
  1649 00:17:50,795 --> 00:17:53,180
  1650 Yes, regular
  1650 Yes, regular
  1651 expressions are used as
  1651 expressions are used as
  1654 00:17:53,180 --> 00:17:55,040
  1654 00:17:53,180 --> 00:17:55,040
  1655 the basic tool for implementing
  1655 the basic tool for implementing
  1656 
  1656 
  1657 365
  1657 365
  1658 00:17:55,040 --> 00:17:57,185
  1658 00:17:55,040 --> 00:17:57,185
  1659 like source bad reg expressions,
  1659 lexers. But regular expressions,
  1660 
  1660 
  1661 366
  1661 366
  1662 00:17:57,185 --> 00:18:00,065
  1662 00:17:57,185 --> 00:18:00,065
  1663 of course, used in
  1663 of course, used in
  1664 a much wider area.
  1664 a much wider area.
  1668 And they especially used for
  1668 And they especially used for
  1669 network intrusion detection.
  1669 network intrusion detection.
  1670 
  1670 
  1671 368
  1671 368
  1672 00:18:03,770 --> 00:18:06,590
  1672 00:18:03,770 --> 00:18:06,590
  1673 Remember, you having to
  1673 Remember, say you're having to
  1674 
  1674 
  1675 369
  1675 369
  1676 00:18:06,590 --> 00:18:10,130
  1676 00:18:06,590 --> 00:18:10,130
  1677 administer a big network
  1677 administer a big network
  1678 and you only want to let
  1678 and you only want to let
  1679 
  1679 
  1680 370
  1680 370
  1681 00:18:10,130 --> 00:18:13,640
  1681 00:18:10,130 --> 00:18:13,640
  1682 in packets which you think are K
  1682 in packets which you think are OK
  1683 
  1683 
  1684 371
  1684 371
  1685 00:18:13,640 --> 00:18:14,930
  1685 00:18:13,640 --> 00:18:14,930
  1686 and you want to keep out
  1686 and you want to keep out
  1687 
  1687 
  1725 the problem is that our
  1725 the problem is that our
  1726 hardware is already
  1726 hardware is already
  1727 
  1727 
  1728 381
  1728 381
  1729 00:18:39,080 --> 00:18:43,190
  1729 00:18:39,080 --> 00:18:43,190
  1730 so fast that the reg expressions
  1730 so fast that the regular
       
  1731 expressions
  1731 
  1732 
  1732 382
  1733 382
  1733 00:18:43,190 --> 00:18:45,169
  1734 00:18:43,190 --> 00:18:45,169
  1734 really become a bottleneck.
  1735 really become a bottleneck.
  1735 
  1736 
  1737 00:18:45,169 --> 00:18:47,060
  1738 00:18:45,169 --> 00:18:47,060
  1738 Because what do you do if now is
  1739 Because what do you do if now is
  1739 
  1740 
  1740 384
  1741 384
  1741 00:18:47,060 --> 00:18:49,880
  1742 00:18:47,060 --> 00:18:49,880
  1742 suddenly a reg expression
  1743 suddenly a regular expression
  1743 takes too much time
  1744 takes too much time?
  1744 
  1745 
  1745 385
  1746 385
  1746 00:18:49,880 --> 00:18:52,670
  1747 00:18:49,880 --> 00:18:52,670
  1747 to just stop the matching
  1748 Do you just stop the matching
  1748 
  1749 
  1749 386
  1750 386
  1750 00:18:52,670 --> 00:18:55,100
  1751 00:18:52,670 --> 00:18:55,100
  1751 and let the package
  1752 and let the package
  1752 in regardless?
  1753 in regardless?
  1776 by this engineer.
  1777 by this engineer.
  1777 
  1778 
  1778 392
  1779 392
  1779 00:19:09,965 --> 00:19:13,820
  1780 00:19:09,965 --> 00:19:13,820
  1780 And it's always say that
  1781 And it's always say that
  1781 Germans don't have any Yammer.
  1782 Germans don't have any humor.
  1782 
  1783 
  1783 393
  1784 393
  1784 00:19:13,820 --> 00:19:16,985
  1785 00:19:13,820 --> 00:19:16,985
  1785 But I found that
  1786 But I found that
  1786 video quite funny.
  1787 video quite funny.
  1791 different opinion,
  1792 different opinion,
  1792 
  1793 
  1793 395
  1794 395
  1794 00:19:19,145 --> 00:19:21,095
  1795 00:19:19,145 --> 00:19:21,095
  1795 but feel free to
  1796 but feel free to
  1796 have a look which
  1797 have a look. 
  1797 
  1798 
  1798 396
  1799 396
  1799 00:19:21,095 --> 00:19:23,705
  1800 00:19:21,095 --> 00:19:23,705
  1800 explains exactly that problem.
  1801 It explains exactly that problem.
  1801 
  1802 
  1802 397
  1803 397
  1803 00:19:23,705 --> 00:19:25,610
  1804 00:19:23,705 --> 00:19:25,610
  1804 So in the next video,
  1805 So in the next video,
  1805 
  1806