updated
author    Christian Urban <christian.urban@kcl.ac.uk>
date      Sat, 02 Apr 2022 01:52:43 +0100
changeset 872 5f5e165c9a57
parent 871 94b84d880c2b
child 873 a25da86f7c8c
handouts/ho01.pdf
handouts/ho01.tex
Binary file handouts/ho01.pdf has changed
--- a/handouts/ho01.tex	Tue Mar 22 00:36:18 2022 +0000
+++ b/handouts/ho01.tex	Sat Apr 02 01:52:43 2022 +0100
@@ -29,7 +29,7 @@
 %% emacs regexes
 %% https://www.gnu.org/software/emacs/manual/html_node/elisp/Regular-Expressions.html
 
-%%  reasons for a new programming language
+%% reasons for a new programming language
 %% http://beautifulracket.com
 
 % compiler explorer
@@ -75,9 +75,9 @@
 \end{quote}  
 
 \noindent
-So the goal of this module is to become a solid (beginner) compiler
-hacker and as part of the coursework to implement a small
-compiler for a very small programming language.
+Given this, the goal of this module is to become a solid (beginner) compiler
+hacker and as part of the coursework to implement two small
+compilers for two very small programming languages.
 
 The first part of the module is about the problem of text processing,
 which is needed for compilers, but also for dictionaries, DNA-data,
@@ -105,9 +105,9 @@
 pattern---is it an email address, for example. In this way we can
 exclude user input that would otherwise have nasty effects on our
 program (crashing it or making it go into an infinite loop, if not
-worse). This kind of ``inspecting'' mechanism is also implemented in
+worse). This kind of ``vetting'' mechanism is also implemented in
 popular network security tools such as Snort and
-Bro.\here{www.snort.org}\here{www.bro.org} They scan incoming
+Zeek.\here{www.snort.org}\here{www.zeek.org} They scan incoming
 network traffic for computer viruses or malicious packets. Similarly
 filtering out spam usually involves looking for some signature
 (essentially a string pattern). The point is that the fast
@@ -300,7 +300,7 @@
 \end{tabular}
 \end{center}
 
-\noindent This first graph shows that Python, JavaScript and Java 8 need
+\noindent The first graph shows that Python, JavaScript and Java 8 need
 approximately 30 seconds to find out that the regular expression
 $\texttt{(a*)*\,b}$ does not match strings of 28 \texttt{a}s. Similarly,
 the second shows that Python and Ruby need approximately 29 seconds for finding
@@ -570,7 +570,7 @@
 do or which problems they are supposed to solve.
 
 To define the meaning of a regular expression we will
-associate with every regular expression a language, or set of
+associate with every regular expression a language---a set of
 strings. This language contains all the strings the regular
 expression is supposed to match. To understand what is going
 on here it is crucial that you have read the handout
@@ -626,13 +626,13 @@
 The point of the definition of $L$ is that we can use it to
 precisely specify when a string $s$ is matched by a regular
 expression $r$, namely if and only if $s \in L(r)$. In fact we
-will write a program \pcode{match} that takes any string $s$
-and any regular expression $r$ as arguments and returns
+will write a program \pcode{match} that takes a string $s$
+and a regular expression $r$ as arguments and returns
 \emph{yes}, if $s \in L(r)$ and \emph{no}, if $s \not\in
 L(r)$. We leave this for the next lecture.
 
 There is one more feature of regular expressions that is worth
-mentioning. Given some strings, there are in general many
+mentioning here. Given some strings, there are in general many
 different regular expressions that can recognise these
 strings. This is obvious with the regular expression $a + b$
 which can match the strings $a$ and $b$. But also the regular
@@ -649,7 +649,7 @@
 
 \noindent That means two regular expressions are said to be
 equivalent if they match the same set of strings. That is
-their meanings is the same. Therefore we
+their meanings are the same. Therefore we
 do not really distinguish between the different regular
 expression $(r_1 + r_2) + r_3$ and $r_1 + (r_2 + r_3)$,
 because they are equivalent. I leave you with the question
@@ -730,12 +730,12 @@
 (remember the shocking runtime of the regular expression matchers in
 Python, Ruby and Java in some instances and the problems in Stack
 Exchange and the Atom editor). They are used in tools like Snort and
-Bro in order to monitor network traffic. They have a beautiful mathematical
+Zeek in order to monitor network traffic. They have a beautiful mathematical
 theory behind them, which can be sometimes quite deep and which
 sometimes contains hidden snares.  People who are not very familiar
 with the mathematical background of regular expressions get them
 consistently wrong (this is surprising given they are supposed to be
-core skill for computer scientists). The hope is that we can do better
+a core skill for computer scientists). The hope is that we can do better
 in the future---for example by proving that the algorithms actually
 satisfy their specification and that the corresponding implementations
 do not contain any bugs. We are close, but not yet quite there.
@@ -763,7 +763,7 @@
 they wilfully violate the RFC 5322 standard, which specifies
 the syntax of email addresses. With their proposed regular
 expression they are too strict in some cases and too lax in
-others. Not a good situation to be in. A regular expression
+others\ldots{}not a good situation to be in. A regular expression
 that is claimed to be closer to the standard is shown in
 Figure~\ref{monster}. Whether this claim is true or not, I
 would not know---the only thing I can say about this regular