afl-material: comparison handouts/ho01.tex

-:f7de0915fff2
+:f4818c95a32e
 % compiler explorer
 % https://gcc.godbolt.org
 %https://www.youtube.com/watch?v=gmhMQfFQu20
 \begin{document}
-\fnote{\copyright{} Christian Urban, King's College London, 2014, 2015, 2016, 2017, 2018}
+\fnote{\copyright{} Christian Urban, King's College London, 2014, 2015, 2016, 2017, 2018, 2019}
 \section*{Handout 1}
 This module is about text processing, be it for web-crawlers,
-compilers, dictionaries, DNA-data, ad filters and so on.  When looking for a
+compilers, dictionaries, DNA-data, ad filters and so on.  When looking
-particular string, like $abc$ in a large text we can use the
+for a particular string, like \pcode{"foobar"} in a large text we can
-Knuth-Morris-Pratt algorithm, which is currently the most efficient
+use the Knuth-Morris-Pratt algorithm, which is currently the most
-general string search algorithm. But often we do \emph{not} just look
+efficient general string search algorithm. But often we do \emph{not}
-for a particular string, but for string patterns. For example, in
+just look for a particular string, but for string patterns. For
-program code we need to identify what are the keywords (\texttt{if}, \texttt{then},
+example, in program code we need to identify what are the keywords
-\texttt{while}, \texttt{for}, etc), what are the identifiers (variable names). A pattern for
+(\texttt{if}, \texttt{then}, \texttt{while}, \texttt{for}, etc), what
-identifiers could be stated as: they start with a letter, followed by
+are the identifiers (variable names). A pattern for identifiers could
-zero or more letters, numbers and underscores.  Often we also face the
+be stated as: they start with a letter, followed by zero or more
-problem that we are given a string (for example some user input) and
+letters, numbers and underscores. You might also be surprised what
-want to know whether it matches a particular pattern---be it an email
+constraints programming languages impose on numbers: for example
-address, for example. In this way we can exclude user input that would
+123 in JSON is OK, but 0123 is not.
-otherwise have nasty effects on our program (crashing it or making it
-go into an infinite loop, if not worse). In tools like Snort, scanning
+Often we also face the problem that we are given a string (for example
-for computer viruses or filtering out spam usually involves scanning
+some user input) and want to know whether it matches a particular
-for some signature (essentially a string pattern).  The point is that
+pattern---be it an email address, for example. In this way we can
-the fast Knuth-Morris-Pratt algorithm for strings is not good enough
+exclude user input that would otherwise have nasty effects on our
-for such string \emph{patterns}.\smallskip
+program (crashing it or making it go into an infinite loop, if not
+worse). In tools like Snort, scanning for computer viruses or
+filtering out spam usually involves scanning for some signature
+(essentially a string pattern).  The point is that the fast
+Knuth-Morris-Pratt algorithm for strings is not good enough for such
+string \emph{patterns}.\smallskip
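The two patterns mentioned above (identifiers, and numbers without leading zeros) can be written down as regular expressions. The following is only a minimal sketch in Python, assuming ASCII letters and digits. The names \texttt{IDENTIFIER}, \texttt{JSON\_NUMBER}, \texttt{is\_identifier} and \texttt{is\_json\_number} are made up for this illustration, and the number pattern follows the JSON grammar (optional minus sign, fraction and exponent), which goes slightly beyond the plain integers used in the text.

```python
import re

# Identifier as described in the text: a letter, followed by zero or
# more letters, digits and underscores (ASCII only; an assumption).
IDENTIFIER = re.compile(r"[A-Za-z][A-Za-z0-9_]*")

# JSON number: the integer part is either a single 0 or starts with a
# non-zero digit, so 123 is accepted and 0123 is rejected.
JSON_NUMBER = re.compile(r"-?(0|[1-9][0-9]*)(\.[0-9]+)?([eE][+-]?[0-9]+)?")


def is_identifier(s):
    """Return True if the whole string s is an identifier."""
    return IDENTIFIER.fullmatch(s) is not None


def is_json_number(s):
    """Return True if the whole string s is a JSON number."""
    return JSON_NUMBER.fullmatch(s) is not None


if __name__ == "__main__":
    for s in ["foo_bar1", "1foo", "123", "0123"]:
        print(s, is_identifier(s), is_json_number(s))
```

Using \texttt{fullmatch} rather than \texttt{search} matters here: the question is whether the \emph{whole} input fits the pattern, not whether some substring does.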
 \defn{Regular expressions} help with conveniently specifying
 such patterns. The idea behind regular expressions is that
 they are a simple method for describing languages (or sets of
 strings)\ldots at least languages we are interested in in
 much ubiquitous in computer science. There are many libraries
 implementing regular expressions. I am sure you have come across them
 before (remember the PRA module?). Why on earth then is there any
 interest in studying them again in depth in this module? Well, one
 answer is in the following two graphs about regular expression
-matching in Python, Ruby and Java (Version 8).
+matching in Python, Ruby, JavaScript and Java (Version 8).
 \begin{center}
 \begin{tabular}{@{\hspace{-1mm}}c@{\hspace{1mm}}c@{}}
 \begin{tikzpicture}
 \begin{axis}[
 ytick={0,5,...,30},
 scaled ticks=false,
 axis lines=left,
 width=5.5cm,
 height=4.5cm,
-legend entries={Python, Java 8},
+legend entries={Python, Java 8, JavaScript},
 legend pos=north west,
 legend cell align=left]
 \addplot[blue,mark=*, mark options={fill=white}] table {re-python2.data};
 \addplot[cyan,mark=*, mark options={fill=white}] table {re-java.data};
+\addplot[red,mark=*, mark options={fill=white}] table {re-js.data};
 \end{axis}
 \end{tikzpicture}
 &
 \begin{tikzpicture}
 \begin{axis}[
 \end{axis}
 \end{tikzpicture}
 \end{tabular}
 \end{center}
-\noindent This first graph shows that Python and Java 8 need
+\noindent This first graph shows that Python, JavaScript and Java 8 need
 approximately 30 seconds to find out that the regular expression
 $\texttt{(a*)*\,b}$ does not match strings of 28 \texttt{a}s.
 Similarly, the second shows that Python needs approximately 29 seconds
 for finding out whether a string of 28 \texttt{a}s matches the regular
 expression \texttt{a?\{28\}\,a\{28\}}.  Ruby is even slightly
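The slowdown for \texttt{(a*)*\,b} can be observed with a few lines of Python. This is a hedged sketch, not the script used to produce the graphs: the names \texttt{EVIL} and \texttt{time\_nonmatch} are invented here, and the string lengths are kept well below 28 so that the loop finishes quickly. Absolute timings depend on the machine and the Python version; the point is only how quickly they grow with each additional \texttt{a}.

```python
import re
import time

# The regular expression (a*)*b from the first graph.
EVIL = re.compile(r"(a*)*b")


def time_nonmatch(n):
    """Try to match a string of n a's; return (match result, seconds)."""
    text = "a" * n
    start = time.perf_counter()
    result = EVIL.fullmatch(text)  # cannot succeed: there is no b
    elapsed = time.perf_counter() - start
    return result, elapsed


if __name__ == "__main__":
    # Deliberately small sizes; increase with care.
    for n in range(10, 21, 2):
        result, elapsed = time_nonmatch(n)
        print(f"n={n:2d}  matched={result is not None}  {elapsed:.4f}s")
```

The matcher always answers correctly that there is no match; what varies is only the time it takes to reach that answer.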

changeset 618	f4818c95a32e
parent 570	617c3b0e0a81
child 621	cf287db8dc15