302 Okay, so these are the |
302 Okay, so these are the |
303 typical regular expressions. |
303 typical regular expressions. |
304 |
304 |
305 67 |
305 67 |
306 00:03:19,925 --> 00:03:23,075 |
306 00:03:19,925 --> 00:03:23,075 |
307 And here's a particular one. |
307 And here's a particular one |
308 |
308 |
309 68 |
309 68 |
310 00:03:23,075 --> 00:03:25,820 |
310 00:03:23,075 --> 00:03:25,820 |
311 Trying to match something |
311 trying to match something |
312 |
312 |
313 69 |
313 69 |
314 00:03:25,820 --> 00:03:28,770 |
314 00:03:25,820 --> 00:03:28,770 |
315 which resembles |
315 which resembles |
316 an email address. |
316 an email address. |
519 match the regular expressions of |
519 match the regular expressions of |
520 a star star b. |
520 a star star b. |
521 |
521 |
522 114 |
522 114 |
523 00:05:42,935 --> 00:05:46,805 |
523 00:05:42,935 --> 00:05:46,805 |
524 And we measure time with longer |
524 And we measure the time with longer |
525 and longer strings of a. |
525 and longer strings of a's. |
526 |
526 |
527 115 |
527 115 |
528 00:05:46,805 --> 00:05:48,770 |
528 00:05:46,805 --> 00:05:48,770 |
529 And so conveniently we can give |
529 And so conveniently we can give |
530 |
530 |
562 00:06:07,640 --> 00:06:09,110 |
562 00:06:07,640 --> 00:06:09,110 |
563 did not match the string. |
563 did not match the string. |
564 |
564 |
565 123 |
565 123 |
566 00:06:09,110 --> 00:06:11,255 |
566 00:06:09,110 --> 00:06:11,255 |
567 That's indicated by this none. |
567 That's indicated by this None. |
568 |
568 |
569 124 |
569 124 |
570 00:06:11,255 --> 00:06:13,925 |
570 00:06:11,255 --> 00:06:13,925 |
571 Let's take ten a's. |
571 Let's take ten a's. |
572 |
572 |
591 They are not 100 |
591 They are not 100 |
592 percent accurate. |
592 percent accurate. |
593 |
593 |
594 130 |
594 130 |
595 00:06:28,040 --> 00:06:31,490 |
595 00:06:28,040 --> 00:06:31,490 |
596 So 15 is also a let's take |
596 So 15 is also OK. |
|
597 Let's take 20. |
597 |
598 |
598 131 |
599 131 |
599 00:06:31,490 --> 00:06:36,965 |
600 00:06:31,490 --> 00:06:36,965 |
600 28th notes already |
601 Hmmm this already takes |
601 double the time. |
602 double the time. |
602 |
603 |
603 132 |
604 132 |
604 00:06:36,965 --> 00:06:42,440 |
605 00:06:36,965 --> 00:06:42,440 |
605 Twenty-five longer. |
606 Twenty-five. Then even longer. |
606 |
607 |
607 133 |
608 133 |
608 00:06:42,440 --> 00:06:45,680 |
609 00:06:42,440 --> 00:06:45,680 |
609 Okay, that suddenly |
610 Okay, then suddenly |
610 from 02 seconds, |
611 from 0.2 seconds, |
611 |
612 |
612 134 |
613 134 |
613 00:06:45,680 --> 00:06:48,960 |
614 00:06:45,680 --> 00:06:48,960 |
614 it takes almost four seconds. |
615 it now takes almost four seconds. |
615 |
616 |
616 135 |
617 135 |
617 00:06:49,600 --> 00:06:54,890 |
618 00:06:49,600 --> 00:06:54,890 |
618 Six this |
619 Twenty-Six, this |
619 |
620 |
620 136 |
621 136 |
621 00:06:54,890 --> 00:07:01,415 |
622 00:06:54,890 --> 00:07:01,415 |
622 takes six seconds |
623 takes six seconds... |
623 already Double, okay? |
624 already double. |
624 |
625 |
625 137 |
626 137 |
626 00:07:01,415 --> 00:07:07,229 |
627 00:07:01,415 --> 00:07:07,229 |
627 Go to 28. That would be now. |
628 Let's go to 28. That would be |
|
629 ...hmmm....hmmm |
628 |
630 |
629 138 |
631 138 |
630 00:07:08,890 --> 00:07:11,840 |
632 00:07:08,890 --> 00:07:11,840 |
631 You see the string |
633 You see the string |
632 isn't very long, |
634 isn't very long, |
655 to find out that |
657 to find out that |
656 this string of 28 |
658 this string of 28 |
657 |
659 |
658 144 |
660 144 |
659 00:07:24,710 --> 00:07:26,570 |
661 00:07:24,710 --> 00:07:26,570 |
660 AES is actually not much |
662 a's is actually not matched |
661 |
663 |
662 145 |
664 145 |
663 00:07:26,570 --> 00:07:28,490 |
665 00:07:26,570 --> 00:07:28,490 |
664 by that you see it's |
666 by that. You see it's |
665 still not finished. |
667 still not finished. |
666 |
668 |
667 146 |
669 146 |
668 00:07:28,490 --> 00:07:32,900 |
670 00:07:28,490 --> 00:07:32,900 |
669 I think it should take |
671 I think it should take |
677 00:07:34,400 --> 00:07:36,530 |
679 00:07:34,400 --> 00:07:36,530 |
678 And if we would try |
680 And if we would try |
679 |
681 |
680 149 |
682 149 |
681 00:07:36,530 --> 00:07:40,805 |
683 00:07:36,530 --> 00:07:40,805 |
682 30 would be already |
684 30, we would be already here |
683 more than a minute. |
685 for more than a minute. |
684 |
686 |
685 150 |
687 150 |
686 00:07:40,805 --> 00:07:43,940 |
688 00:07:40,805 --> 00:07:43,940 |
687 And if I could read |
689 And if I could use |
688 something like hundreds, |
690 something like 100, |
689 |
691 |
690 151 |
692 151 |
691 00:07:43,940 --> 00:07:46,220 |
693 00:07:43,940 --> 00:07:46,220 |
692 you remember if a doubling in |
694 you remember if a doubling in |
693 |
695 |
754 00:08:21,830 --> 00:08:23,930 |
756 00:08:21,830 --> 00:08:23,930 |
755 So there's not any difference |
757 So there's not any difference |
756 |
758 |
757 166 |
759 166 |
758 00:08:23,930 --> 00:08:26,150 |
760 00:08:23,930 --> 00:08:26,150 |
759 in the strings this work |
761 in the strings this regular |
760 expression matches. |
762 expression matches. |
761 |
763 |
762 167 |
764 167 |
763 00:08:26,150 --> 00:08:28,610 |
765 00:08:26,150 --> 00:08:28,610 |
764 We'll just start at the |
766 We'll just start at the |
770 end of the string. |
772 end of the string. |
771 |
773 |
772 169 |
774 169 |
773 00:08:31,460 --> 00:08:35,285 |
775 00:08:31,460 --> 00:08:35,285 |
774 And we again, we just use |
776 And we again, we just use |
775 repeated A's for that. |
777 repeated a's for that. |
776 |
778 |
777 170 |
779 170 |
778 00:08:35,285 --> 00:08:38,195 |
780 00:08:35,285 --> 00:08:38,195 |
779 And similarly, we can |
781 And similarly, we can |
780 |
782 |
799 00:08:48,320 --> 00:08:50,525 |
801 00:08:48,320 --> 00:08:50,525 |
800 And it's pretty fast. |
802 And it's pretty fast. |
801 |
803 |
802 176 |
804 176 |
803 00:08:50,525 --> 00:08:54,725 |
805 00:08:50,525 --> 00:08:54,725 |
804 Friendly. Although pretty fast. |
806 Twenty...also pretty fast. |
805 |
807 |
806 177 |
808 177 |
807 00:08:54,725 --> 00:08:59,120 |
809 00:08:54,725 --> 00:08:59,120 |
808 Five, again, |
810 Twenty-five... Again, |
809 |
811 |
810 178 |
812 178 |
811 00:08:59,120 --> 00:09:06,650 |
813 00:08:59,120 --> 00:09:06,650 |
812 somehow is kind of |
814 somehow is a kind of |
813 threshold that is 25, 26. |
815 threshold that is 25, 26. |
814 |
816 |
815 179 |
817 179 |
816 00:09:06,650 --> 00:09:09,485 |
818 00:09:06,650 --> 00:09:09,485 |
817 Suddenly it takes much longer. |
819 Suddenly it takes much longer. |
825 00:09:14,360 --> 00:09:17,165 |
827 00:09:14,360 --> 00:09:17,165 |
826 So you'll see in now from 26 on, |
828 So you'll see in now from 26 on, |
827 |
829 |
828 182 |
830 182 |
829 00:09:17,165 --> 00:09:19,250 |
831 00:09:17,165 --> 00:09:19,250 |
830 the Times has always |
832 the times always |
831 doubling from |
833 double from |
832 |
834 |
833 183 |
835 183 |
834 00:09:19,250 --> 00:09:21,860 |
836 00:09:19,250 --> 00:09:21,860 |
835 three seconds to seven seconds. |
837 three seconds to seven seconds. |
836 |
838 |
847 27 and you see the |
849 27 and you see the |
848 string isn't very long. |
850 string isn't very long. |
849 |
851 |
850 187 |
852 187 |
851 00:09:30,230 --> 00:09:32,165 |
853 00:09:30,230 --> 00:09:32,165 |
852 Let's choose twenties or maize. |
854 It is just twenty-or-something a's. |
853 |
855 |
854 188 |
856 188 |
855 00:09:32,165 --> 00:09:35,419 |
857 00:09:32,165 --> 00:09:35,419 |
856 Imagine you have to |
858 Imagine you have to |
857 search a database |
859 search a database |
858 |
860 |
859 189 |
861 189 |
860 00:09:35,419 --> 00:09:38,720 |
862 00:09:35,419 --> 00:09:38,720 |
861 with kilobytes of data. |
863 with Gigabytes of data |
862 |
864 |
863 190 |
865 190 |
864 00:09:38,720 --> 00:09:42,260 |
866 00:09:38,720 --> 00:09:42,260 |
865 This, these regular |
867 with these regular |
866 expressions that would years |
868 expressions that would |
867 |
869 |
868 191 |
870 191 |
869 00:09:42,260 --> 00:09:48,150 |
871 00:09:42,260 --> 00:09:48,150 |
870 need years to go through with |
872 need years to go through with |
871 these regular expressions. |
873 these regular expressions. |
891 00:10:01,045 --> 00:10:03,415 |
893 00:10:01,045 --> 00:10:03,415 |
892 You can see this again |
894 You can see this again |
893 |
895 |
894 197 |
896 197 |
895 00:10:03,415 --> 00:10:05,980 |
897 00:10:03,415 --> 00:10:05,980 |
896 is the reg expression |
898 is the regular expression |
897 and we just having |
899 and we just having |
898 |
900 |
899 198 |
901 198 |
900 00:10:05,980 --> 00:10:08,320 |
902 00:10:05,980 --> 00:10:08,320 |
901 some scaffolding to generate |
903 some scaffolding to generate |
902 |
904 |
903 199 |
905 199 |
904 00:10:08,320 --> 00:10:11,905 |
906 00:10:08,320 --> 00:10:11,905 |
905 strings from five up till 28. |
907 strings from 5 up till 28. |
906 |
908 |
907 200 |
909 200 |
908 00:10:11,905 --> 00:10:14,305 |
910 00:10:11,905 --> 00:10:14,305 |
909 And if we run that, |
911 And if we run that, |
910 |
912 |
917 So uphill 19, pretty fast, |
919 So uphill 19, pretty fast, |
918 |
920 |
919 203 |
921 203 |
920 00:10:19,900 --> 00:10:24,925 |
922 00:10:19,900 --> 00:10:24,925 |
921 but then starting from |
923 but then starting from |
922 23, skidding pretty slow. |
924 23, it is getting pretty slow. |
923 |
925 |
924 204 |
926 204 |
925 00:10:24,925 --> 00:10:27,445 |
927 00:10:24,925 --> 00:10:27,445 |
926 So the question is |
928 So the question is |
927 what's going on? |
929 what's going on? |
928 |
930 |
929 205 |
931 205 |
930 00:10:27,445 --> 00:10:29,230 |
932 00:10:27,445 --> 00:10:29,230 |
931 By the way, I'm not quoting here. |
933 By the way, I'm not gloating here. |
932 |
934 |
933 206 |
935 206 |
934 00:10:29,230 --> 00:10:33,755 |
936 00:10:29,230 --> 00:10:33,755 |
935 Scala, using internally |
937 Scala uses internally |
936 the regular expression |
938 the regular expression |
937 |
939 |
938 207 |
940 207 |
939 00:10:33,755 --> 00:10:36,665 |
941 00:10:33,755 --> 00:10:36,665 |
940 matching engine from Java. |
942 matching engine from Java. |
967 We will see we can out-compete |
969 We will see we can out-compete |
968 them by magnitudes. |
970 them by magnitudes. |
969 |
971 |
970 214 |
972 214 |
971 00:10:55,490 --> 00:10:57,605 |
973 00:10:55,490 --> 00:10:57,605 |
972 So I think I can that. |
974 So I think I can |
973 |
975 |
974 215 |
976 215 |
975 00:10:57,605 --> 00:10:59,165 |
977 00:10:57,605 --> 00:10:59,165 |
976 Now, just finish here. |
978 now, just finish this here. |
977 |
979 |
978 216 |
980 216 |
979 00:10:59,165 --> 00:11:04,025 |
981 00:10:59,165 --> 00:11:04,025 |
980 You see the problem. Just |
982 You see the problem. |
981 for completeness sake. |
983 Just for completeness sake. |
982 |
984 |
983 217 |
985 217 |
984 00:11:04,025 --> 00:11:07,010 |
986 00:11:04,025 --> 00:11:07,010 |
985 Here is a Ruby program. |
987 Here is a Ruby program. |
986 |
988 |
995 string should match. |
997 string should match. |
996 |
998 |
997 220 |
999 220 |
998 00:11:12,935 --> 00:11:20,300 |
1000 00:11:12,935 --> 00:11:20,300 |
999 And again it tries out |
1001 And again it tries out |
1000 strings between 130 here. |
1002 strings between 1 and 30 here. |
1001 |
1003 |
1002 221 |
1004 221 |
1003 00:11:20,300 --> 00:11:23,450 |
1005 00:11:20,300 --> 00:11:23,450 |
1004 That's a program actually |
1006 That's a program actually |
1005 a former student produced. |
1007 a former student produced. |
1058 00:11:49,610 --> 00:11:52,250 |
1060 00:11:49,610 --> 00:11:52,250 |
1059 as you will see later on. |
1061 as you will see later on. |
1060 |
1062 |
1061 234 |
1063 234 |
1062 00:11:52,250 --> 00:11:55,620 |
1064 00:11:52,250 --> 00:11:55,620 |
1063 Hey, I also just stop that here. |
1065 I also just stop that here. |
1064 |
1066 |
1065 235 |
1067 235 |
1066 00:11:55,710 --> 00:12:00,985 |
1068 00:11:55,710 --> 00:12:00,985 |
1067 Okay, this slight collect |
1069 Okay, this slide collects |
1068 this information about times. |
1070 the information about times. |
1069 |
1071 |
1070 236 |
1072 236 |
1071 00:12:00,985 --> 00:12:03,400 |
1073 00:12:00,985 --> 00:12:03,400 |
1072 On the right hand side will |
1074 On the right-hand side will |
1073 |
1075 |
1074 237 |
1076 237 |
1075 00:12:03,400 --> 00:12:05,860 |
1077 00:12:03,400 --> 00:12:05,860 |
1076 be our regular expression mantra, |
1078 be our regular expression matcher, |
1077 |
1079 |
1078 238 |
1080 238 |
1079 00:12:05,860 --> 00:12:08,290 |
1081 00:12:05,860 --> 00:12:08,290 |
1080 which we implement next week. |
1082 which we implement next week. |
1081 |
1083 |
1084 On the left-hand side, |
1086 On the left-hand side, |
1085 are these times by |
1087 are these times by |
1086 |
1088 |
1087 240 |
1089 240 |
1088 00:12:10,795 --> 00:12:14,260 |
1090 00:12:10,795 --> 00:12:14,260 |
1089 barriers than regular |
1091 various other regular |
1090 expression matching engines? |
1092 expression matching engines? |
1091 |
1093 |
1092 241 |
1094 241 |
1093 00:12:14,260 --> 00:12:17,809 |
1095 00:12:14,260 --> 00:12:17,809 |
1094 On the top is this |
1096 On the top is this |
1095 regular expression. |
1097 regular expression. |
1096 |
1098 |
1097 242 |
1099 242 |
1098 00:12:19,080 --> 00:12:23,335 |
1100 00:12:19,080 --> 00:12:23,335 |
1099 Possible a n times a n times. |
1101 Possible a n-times a n-times. |
1100 |
1102 |
1101 243 |
1103 243 |
1102 00:12:23,335 --> 00:12:26,890 |
1104 00:12:23,335 --> 00:12:26,890 |
1103 And on the lowest |
1105 And on the lower |
1104 is a star, star b. |
1106 is (a*)* b. |
1105 |
1107 |
1106 244 |
1108 244 |
1107 00:12:26,890 --> 00:12:30,370 |
1109 00:12:26,890 --> 00:12:30,370 |
1108 And the x-axis show here |
1110 And the x-axis show here |
1109 |
1111 |
1112 the length of the |
1114 the length of the |
1113 string. How many a's. |
1115 string. How many a's. |
1114 |
1116 |
1115 246 |
1117 246 |
1116 00:12:35,335 --> 00:12:38,925 |
1118 00:12:35,335 --> 00:12:38,925 |
1117 And on the y axis is the time. |
1119 And on the y-axis is the time |
1118 |
1120 |
1119 247 |
1121 247 |
1120 00:12:38,925 --> 00:12:41,660 |
1122 00:12:38,925 --> 00:12:41,660 |
1121 They need to decide whether |
1123 they need to decide whether |
1122 |
1124 |
1123 248 |
1125 248 |
1124 00:12:41,660 --> 00:12:44,615 |
1126 00:12:41,660 --> 00:12:44,615 |
1125 the string is matched by |
1127 the string is matched by |
1126 the rate expression or not. |
1128 the regular expression or not. |
1127 |
1129 |
1128 249 |
1130 249 |
1129 00:12:44,615 --> 00:12:46,415 |
1131 00:12:44,615 --> 00:12:46,415 |
1130 So you can see here, Python, |
1132 So you can see here, Python, |
1131 |
1133 |
1132 250 |
1134 250 |
1133 00:12:46,415 --> 00:12:47,945 |
1135 00:12:46,415 --> 00:12:47,945 |
1134 Java eight in JavaScript, |
1136 Java 8 and JavaScript, |
1135 |
1137 |
1136 251 |
1138 251 |
1137 00:12:47,945 --> 00:12:52,250 |
1139 00:12:47,945 --> 00:12:52,250 |
1138 they max out approximately |
1140 they max out approximately |
1139 at between 2530. |
1141 at between 25 and 30. |
1140 |
1142 |
1141 252 |
1143 252 |
1142 00:12:52,250 --> 00:12:53,900 |
1144 00:12:52,250 --> 00:12:53,900 |
1143 The kristin, it takes already |
1145 Because then it takes already |
1144 |
1146 |
1145 253 |
1147 253 |
1146 00:12:53,900 --> 00:12:55,160 |
1148 00:12:53,900 --> 00:12:55,160 |
1147 a half a minute to |
1149 a half a minute to |
1148 |
1150 |
1170 Because then they use also |
1172 Because then they use also |
1171 half a minute to decide |
1173 half a minute to decide |
1172 |
1174 |
1173 259 |
1175 259 |
1174 00:13:10,400 --> 00:13:13,940 |
1176 00:13:10,400 --> 00:13:13,940 |
1175 whether this rec expression |
1177 whether this regular expression |
1176 actually matches the string. |
1178 actually matches the string. |
1177 |
1179 |
1178 260 |
1180 260 |
1179 00:13:13,940 --> 00:13:16,790 |
1181 00:13:13,940 --> 00:13:16,790 |
1180 Contrast that with |
1182 Contrast that with |
1181 the reg expression |
1183 |
1182 |
1184 |
1183 261 |
1185 261 |
1184 00:13:16,790 --> 00:13:19,235 |
1186 00:13:16,790 --> 00:13:19,235 |
1185 which we are regular |
1187 the regular expression matcher |
1186 expression mantra, |
|
1187 |
1188 |
1188 262 |
1189 262 |
1189 00:13:19,235 --> 00:13:21,470 |
1190 00:13:19,235 --> 00:13:21,470 |
1190 which we're going to implement. |
1191 which we're going to implement. |
1191 |
1192 |
1204 Actually, there will be |
1205 Actually, there will be |
1205 two versions of that. |
1206 two versions of that. |
1206 |
1207 |
1207 266 |
1208 266 |
1208 00:13:32,285 --> 00:13:34,850 |
1209 00:13:32,285 --> 00:13:34,850 |
1209 First version may be |
1210 The first version will be |
1210 also relatively slow. |
1211 also relatively slow. |
1211 |
1212 |
1212 267 |
1213 267 |
1213 00:13:34,850 --> 00:13:36,410 |
1214 00:13:34,850 --> 00:13:36,410 |
1214 But the second version, |
1215 But the second version, |
1226 And in the second example, |
1227 And in the second example, |
1227 |
1228 |
1228 271 |
1229 271 |
1229 00:13:42,380 --> 00:13:45,740 |
1230 00:13:42,380 --> 00:13:45,740 |
1230 you have to be careful |
1231 you have to be careful |
1231 about the x axis because |
1232 about the x-axis because |
1232 |
1233 |
1233 272 |
1234 272 |
1234 00:13:45,740 --> 00:13:49,385 |
1235 00:13:45,740 --> 00:13:49,385 |
1235 that means four times |
1236 that means four times |
1236 ten to the power six. |
1237 ten to the power six. |
1237 |
1238 |
1238 273 |
1239 273 |
1239 00:13:49,385 --> 00:13:51,695 |
1240 00:13:49,385 --> 00:13:51,695 |
1240 It's actually 4 million A's. |
1241 It's actually 4 million a's. |
1241 |
1242 |
1242 274 |
1243 274 |
1243 00:13:51,695 --> 00:13:55,100 |
1244 00:13:51,695 --> 00:13:55,100 |
1244 So our regular |
1245 So our regular |
1245 expression match or need |
1246 expression matcher needs |
1246 |
1247 |
1247 275 |
1248 275 |
1248 00:13:55,100 --> 00:13:57,635 |
1249 00:13:55,100 --> 00:13:57,635 |
1249 less than ten seconds to |
1250 less than ten seconds to |
1250 |
1251 |
1251 276 |
1252 276 |
1252 00:13:57,635 --> 00:14:00,725 |
1253 00:13:57,635 --> 00:14:00,725 |
1253 match a string of length |
1254 match a string of length |
1254 of 4 million A's. |
1255 of 4 million a's. |
1255 |
1256 |
1256 277 |
1257 277 |
1257 00:14:00,725 --> 00:14:04,430 |
1258 00:14:00,725 --> 00:14:04,430 |
1258 Contrast that Python, Java eight, |
1259 Contrast that Python, Java 8, |
1259 |
1260 |
1260 278 |
1261 278 |
1261 00:14:04,430 --> 00:14:06,770 |
1262 00:14:04,430 --> 00:14:06,770 |
1262 and JavaScript need half a minute |
1263 and JavaScript need half a minute |
1263 |
1264 |
1264 279 |
1265 279 |
1265 00:14:06,770 --> 00:14:09,905 |
1266 00:14:06,770 --> 00:14:09,905 |
1266 already for a string |
1267 already for a string |
1267 of length just 30, |
1268 of length just 30. |
1268 |
1269 |
1269 280 |
1270 280 |
1270 00:14:09,905 --> 00:14:12,365 |
1271 00:14:09,905 --> 00:14:12,365 |
1271 unless you're very |
1272 I was very careful with Java 8. |
1272 careful with Java eight. |
|
1273 |
1273 |
1274 281 |
1274 281 |
1275 00:14:12,365 --> 00:14:15,725 |
1275 00:14:12,365 --> 00:14:15,725 |
1276 Yes, Java nine and above, |
1276 Yes, Java 9 and above, |
1277 |
1277 |
1278 282 |
1278 282 |
1279 00:14:15,725 --> 00:14:17,180 |
1279 00:14:15,725 --> 00:14:17,180 |
1280 they already have |
1280 they already have |
1281 |
1281 |
1324 do substantially better. |
1324 do substantially better. |
1325 |
1325 |
1326 292 |
1326 292 |
1327 00:14:42,695 --> 00:14:47,495 |
1327 00:14:42,695 --> 00:14:47,495 |
1328 And if you don't believe |
1328 And if you don't believe |
1329 in D times, I gave here, |
1329 in the times, I gave here, |
1330 |
1330 |
1331 293 |
1331 293 |
1332 00:14:47,495 --> 00:14:50,090 |
1332 00:14:47,495 --> 00:14:50,090 |
1333 please feel free to |
1333 please feel free to |
1334 play on your own |
1334 play on your own |
1335 |
1335 |
1336 294 |
1336 294 |
1337 00:14:50,090 --> 00:14:52,865 |
1337 00:14:50,090 --> 00:14:52,865 |
1338 with the examples |
1338 with the examples |
1339 I uploaded, Keats. |
1339 I uploaded on KEATS. |
1340 |
1340 |
1341 295 |
1341 295 |
1342 00:14:52,865 --> 00:14:55,235 |
1342 00:14:52,865 --> 00:14:55,235 |
1343 These are exactly the programs |
1343 These are exactly the programs |
1344 |
1344 |
1345 296 |
1345 296 |
1346 00:14:55,235 --> 00:14:57,470 |
1346 00:14:55,235 --> 00:14:57,470 |
1347 are used here in the examples. |
1347 I used here in the examples. |
1348 |
1348 |
1349 297 |
1349 297 |
1350 00:14:57,470 --> 00:14:59,255 |
1350 00:14:57,470 --> 00:14:59,255 |
1351 So feel free. |
1351 So feel free. |
1352 |
1352 |
1355 You might however now think, hmm. |
1355 You might however now think, hmm. |
1356 |
1356 |
1357 299 |
1357 299 |
1358 00:15:01,970 --> 00:15:05,449 |
1358 00:15:01,970 --> 00:15:05,449 |
1359 These are two very |
1359 These are two very |
1360 well chosen examples. |
1360 well chosen examples, |
1361 |
1361 |
1362 300 |
1362 300 |
1363 00:15:05,449 --> 00:15:07,145 |
1363 00:15:05,449 --> 00:15:07,145 |
1364 And I admit that's true. |
1364 and I admit that's true, |
1365 |
1365 |
1366 301 |
1366 301 |
1367 00:15:07,145 --> 00:15:09,410 |
1367 00:15:07,145 --> 00:15:09,410 |
1368 And such problem there never |
1368 and such problems never |
1369 |
1369 |
1370 302 |
1370 302 |
1371 00:15:09,410 --> 00:15:12,540 |
1371 00:15:09,410 --> 00:15:12,540 |
1372 causing any problems |
1372 cause any problems |
1373 in real life. |
1373 in real life. |
1374 |
1374 |
1375 303 |
1375 303 |
1376 00:15:13,300 --> 00:15:15,980 |
1376 00:15:13,300 --> 00:15:15,980 |
1377 Regular expressions are used very |
1377 Regular expressions are used very |
1385 00:15:19,415 --> 00:15:21,410 |
1385 00:15:19,415 --> 00:15:21,410 |
1386 So here's my first example from |
1386 So here's my first example from |
1387 |
1387 |
1388 306 |
1388 306 |
1389 00:15:21,410 --> 00:15:23,885 |
1389 00:15:21,410 --> 00:15:23,885 |
1390 a company called cloudflare. |
1390 a company called Cloudflare. |
1391 |
1391 |
1392 307 |
1392 307 |
1393 00:15:23,885 --> 00:15:27,560 |
1393 00:15:23,885 --> 00:15:27,560 |
1394 This is a huge hosting company |
1394 This is a huge hosting company |
1395 |
1395 |
1396 308 |
1396 308 |
1397 00:15:27,560 --> 00:15:30,935 |
1397 00:15:27,560 --> 00:15:30,935 |
1398 which host very |
1398 which hosts very |
1399 well-known web pages. |
1399 well-known web pages. |
1400 |
1400 |
1401 309 |
1401 309 |
1402 00:15:30,935 --> 00:15:34,970 |
1402 00:15:30,935 --> 00:15:34,970 |
1403 And they really try hard |
1403 And they really try hard |
1408 And they manage |
1408 And they manage |
1409 that for six years. |
1409 that for six years. |
1410 |
1410 |
1411 311 |
1411 311 |
1412 00:15:37,340 --> 00:15:39,320 |
1412 00:15:37,340 --> 00:15:39,320 |
1413 But then a Rekha expression, |
1413 But then a regular expression, |
1414 |
1414 |
1415 312 |
1415 312 |
1416 00:15:39,320 --> 00:15:41,180 |
1416 00:15:39,320 --> 00:15:41,180 |
1417 actually this one caused |
1417 actually this one, caused |
1418 a problem and you |
1418 a problem and you |
1419 |
1419 |
1420 313 |
1420 313 |
1421 00:15:41,180 --> 00:15:43,265 |
1421 00:15:41,180 --> 00:15:43,265 |
1422 can see they're also |
1422 can see they're also |
1423 like two stars. |
1423 two stars |
1424 |
1424 |
1425 314 |
1425 314 |
1426 00:15:43,265 --> 00:15:44,630 |
1426 00:15:43,265 --> 00:15:44,630 |
1427 They are at the end. |
1427 at the end. |
1428 |
1428 |
1429 315 |
1429 315 |
1430 00:15:44,630 --> 00:15:46,955 |
1430 00:15:44,630 --> 00:15:46,955 |
1431 And because of that string needed |
1431 And because of that string needed |
1432 |
1432 |
1461 where Stack Exchange, |
1461 where Stack Exchange, |
1462 I guess you know |
1462 I guess you know |
1463 |
1463 |
1464 323 |
1464 323 |
1465 00:16:04,040 --> 00:16:06,650 |
1465 00:16:04,040 --> 00:16:06,650 |
1466 this webpage had |
1466 this webpage, had |
1467 also an outage from, |
1467 also an outage for |
1468 |
1468 |
1469 324 |
1469 324 |
1470 00:16:06,650 --> 00:16:08,390 |
1470 00:16:06,650 --> 00:16:08,390 |
1471 I think at least an hour. |
1471 I think at least an hour. |
1472 |
1472 |
1473 325 |
1473 325 |
1474 00:16:08,390 --> 00:16:13,070 |
1474 00:16:08,390 --> 00:16:13,070 |
1475 Because a regular expression |
1475 Because a regular expression, |
1476 then needed to format posts, |
1476 needed to format posts, |
1477 |
1477 |
1478 326 |
1478 326 |
1479 00:16:13,070 --> 00:16:15,575 |
1479 00:16:13,070 --> 00:16:15,575 |
1480 needed too much time to |
1480 needed too much time to |
1481 |
1481 |
1504 this the first time, |
1504 this the first time, |
1505 |
1505 |
1506 332 |
1506 332 |
1507 00:16:31,730 --> 00:16:34,175 |
1507 00:16:31,730 --> 00:16:34,175 |
1508 what surprised me is |
1508 what surprised me is |
1509 that theoretician |
1509 that theoreticians, |
1510 |
1510 |
1511 333 |
1511 333 |
1512 00:16:34,175 --> 00:16:37,520 |
1512 00:16:34,175 --> 00:16:37,520 |
1513 who sometimes dedicate their |
1513 who sometimes dedicate their |
1514 life to regular expression. |
1514 life to regular expressions |
1515 |
1515 |
1516 334 |
1516 334 |
1517 00:16:37,520 --> 00:16:39,440 |
1517 00:16:37,520 --> 00:16:39,440 |
1518 And no really a lot about |
1518 and know really a lot about |
1519 |
1519 |
1520 335 |
1520 335 |
1521 00:16:39,440 --> 00:16:41,690 |
1521 00:16:39,440 --> 00:16:41,690 |
1522 them didn't know |
1522 them, didn't know |
1523 anything about this. |
1523 anything about this. |
1524 |
1524 |
1525 336 |
1525 336 |
1526 00:16:41,690 --> 00:16:43,610 |
1526 00:16:41,690 --> 00:16:43,610 |
1527 But engineers, they |
1527 But engineers, they |
1528 already created |
1528 already created |
1529 |
1529 |
1530 337 |
1530 337 |
1531 00:16:43,610 --> 00:16:46,160 |
1531 00:16:43,610 --> 00:16:46,160 |
1532 a name for that |
1532 a name for that: |
1533 regular expression, |
1533 Regular Expression |
1534 |
1534 |
1535 338 |
1535 338 |
1536 00:16:46,160 --> 00:16:47,975 |
1536 00:16:46,160 --> 00:16:47,975 |
1537 denial of service attack. |
1537 Denial of Service Attack. |
1538 |
1538 |
1539 339 |
1539 339 |
1540 00:16:47,975 --> 00:16:49,745 |
1540 00:16:47,975 --> 00:16:49,745 |
1541 Because what you can, |
1541 Because what you can, |
1542 |
1542 |
1545 what can happen now is that |
1545 what can happen now is that |
1546 |
1546 |
1547 341 |
1547 341 |
1548 00:16:51,230 --> 00:16:54,920 |
1548 00:16:51,230 --> 00:16:54,920 |
1549 attackers look for |
1549 attackers look for |
1550 certain strings. |
1550 certain strings |
1551 |
1551 |
1552 342 |
1552 342 |
1553 00:16:54,920 --> 00:16:56,780 |
1553 00:16:54,920 --> 00:16:56,780 |
1554 You make your regular expression |
1554 that make your regular expression |
1555 |
1555 |
1556 343 |
1556 343 |
1557 00:16:56,780 --> 00:16:59,105 |
1557 00:16:56,780 --> 00:16:59,105 |
1558 matching engine topple over. |
1558 matching engine topple over. |
1559 |
1559 |
1560 344 |
1560 344 |
1561 00:16:59,105 --> 00:17:01,370 |
1561 00:16:59,105 --> 00:17:01,370 |
1562 And these kind of expressions, |
1562 And these kind of |
1563 |
1563 |
1564 345 |
1564 345 |
1565 00:17:01,370 --> 00:17:04,160 |
1565 00:17:01,370 --> 00:17:04,160 |
1566 regular expressions called |
1566 regular expressions are called |
1567 Eve of reg expression. |
1567 Evil Regular Expressions. |
1568 |
1568 |
1569 346 |
1569 346 |
1570 00:17:04,160 --> 00:17:06,350 |
1570 00:17:04,160 --> 00:17:06,350 |
1571 And actually there are |
1571 And actually there are |
1572 quite a number of them. |
1572 quite a number of them. |
1589 And you can easily have in |
1589 And you can easily have in |
1590 |
1590 |
1591 351 |
1591 351 |
1592 00:17:15,620 --> 00:17:18,560 |
1592 00:17:15,620 --> 00:17:18,560 |
1593 your program one of |
1593 your program one of |
1594 these reg expression. |
1594 these regular expressions. |
1595 |
1595 |
1596 352 |
1596 352 |
1597 00:17:18,560 --> 00:17:21,830 |
1597 00:17:18,560 --> 00:17:21,830 |
1598 And then you have the |
1598 And then you have the |
1599 problem that if you do have |
1599 problem that if you do have |
1607 somebody finds the |
1607 somebody finds the |
1608 corresponding string, |
1608 corresponding string, |
1609 |
1609 |
1610 355 |
1610 355 |
1611 00:17:25,640 --> 00:17:29,945 |
1611 00:17:25,640 --> 00:17:29,945 |
1612 which make the records |
1612 which make the regular |
1613 matching engine topple over, |
1613 matching engine topple over, |
1614 |
1614 |
1615 356 |
1615 356 |
1616 00:17:29,945 --> 00:17:31,820 |
1616 00:17:29,945 --> 00:17:31,820 |
1617 then you have a problem |
1617 then you have a problem |
1618 |
1618 |
1619 357 |
1619 357 |
1620 00:17:31,820 --> 00:17:34,295 |
1620 00:17:31,820 --> 00:17:34,295 |
1621 because your webpage is |
1621 because your webpage is |
1622 probably not variable. |
1622 probably not available. |
1623 |
1623 |
1624 358 |
1624 358 |
1625 00:17:34,295 --> 00:17:36,140 |
1625 00:17:34,295 --> 00:17:36,140 |
1626 This is also sometimes called |
1626 This phenomenon is also sometimes |
1627 |
1627 |
1628 359 |
1628 359 |
1629 00:17:36,140 --> 00:17:39,350 |
1629 00:17:36,140 --> 00:17:39,350 |
1630 this phenomenon, |
1630 called |
1631 catastrophic backtracking. |
1631 catastrophic backtracking. |
1632 |
1632 |
1633 360 |
1633 360 |
1634 00:17:39,350 --> 00:17:43,595 |
1634 00:17:39,350 --> 00:17:43,595 |
1635 In lecture three, we will |
1635 In lecture three, we will |
1654 00:17:53,180 --> 00:17:55,040 |
1654 00:17:53,180 --> 00:17:55,040 |
1655 the basic tool for implementing |
1655 the basic tool for implementing |
1656 |
1656 |
1657 365 |
1657 365 |
1658 00:17:55,040 --> 00:17:57,185 |
1658 00:17:55,040 --> 00:17:57,185 |
1659 like source bad reg expressions, |
1659 lexers. But regular expressions, |
1660 |
1660 |
1661 366 |
1661 366 |
1662 00:17:57,185 --> 00:18:00,065 |
1662 00:17:57,185 --> 00:18:00,065 |
1663 of course, used in |
1663 of course, used in |
1664 a much wider area. |
1664 a much wider area. |
1668 And they especially used for |
1668 And they especially used for |
1669 network intrusion detection. |
1669 network intrusion detection. |
1670 |
1670 |
1671 368 |
1671 368 |
1672 00:18:03,770 --> 00:18:06,590 |
1672 00:18:03,770 --> 00:18:06,590 |
1673 Remember, you having to |
1673 Remember, say you're having to |
1674 |
1674 |
1675 369 |
1675 369 |
1676 00:18:06,590 --> 00:18:10,130 |
1676 00:18:06,590 --> 00:18:10,130 |
1677 administer a big network |
1677 administer a big network |
1678 and you only want to let |
1678 and you only want to let |
1679 |
1679 |
1680 370 |
1680 370 |
1681 00:18:10,130 --> 00:18:13,640 |
1681 00:18:10,130 --> 00:18:13,640 |
1682 in packets which you think are K |
1682 in packets which you think are OK |
1683 |
1683 |
1684 371 |
1684 371 |
1685 00:18:13,640 --> 00:18:14,930 |
1685 00:18:13,640 --> 00:18:14,930 |
1686 and you want to keep out |
1686 and you want to keep out |
1687 |
1687 |
1737 00:18:45,169 --> 00:18:47,060 |
1738 00:18:45,169 --> 00:18:47,060 |
1738 Because what do you do if now is |
1739 Because what do you do if now is |
1739 |
1740 |
1740 384 |
1741 384 |
1741 00:18:47,060 --> 00:18:49,880 |
1742 00:18:47,060 --> 00:18:49,880 |
1742 suddenly a reg expression |
1743 suddenly a regular expression |
1743 takes too much time |
1744 takes too much time? |
1744 |
1745 |
1745 385 |
1746 385 |
1746 00:18:49,880 --> 00:18:52,670 |
1747 00:18:49,880 --> 00:18:52,670 |
1747 to just stop the matching |
1748 Do you just stop the matching |
1748 |
1749 |
1749 386 |
1750 386 |
1750 00:18:52,670 --> 00:18:55,100 |
1751 00:18:52,670 --> 00:18:55,100 |
1751 and let the package |
1752 and let the package |
1752 in regardless? |
1753 in regardless? |
1791 different opinion, |
1792 different opinion, |
1792 |
1793 |
1793 395 |
1794 395 |
1794 00:19:19,145 --> 00:19:21,095 |
1795 00:19:19,145 --> 00:19:21,095 |
1795 but feel free to |
1796 but feel free to |
1796 have a look which |
1797 have a look. |
1797 |
1798 |
1798 396 |
1799 396 |
1799 00:19:21,095 --> 00:19:23,705 |
1800 00:19:21,095 --> 00:19:23,705 |
1800 explains exactly that problem. |
1801 It explains exactly that problem. |
1801 |
1802 |
1802 397 |
1803 397 |
1803 00:19:23,705 --> 00:19:25,610 |
1804 00:19:23,705 --> 00:19:25,610 |
1804 So in the next video, |
1805 So in the next video, |
1805 |
1806 |