updated
authorChristian Urban <christian dot urban at kcl dot ac dot uk>
Mon, 15 Sep 2014 04:42:48 +0100
changeset 248 ce767ca23244
parent 247 84b4bc6e8554
child 249 377c59df7297
updated
handouts/ho01.pdf
handouts/ho01.tex
Binary file handouts/ho01.pdf has changed
--- a/handouts/ho01.tex	Mon Sep 15 04:16:28 2014 +0100
+++ b/handouts/ho01.tex	Mon Sep 15 04:42:48 2014 +0100
@@ -82,7 +82,7 @@
 
 Many programming language offer libraries that can be used to
 validate such strings against regular expressions. Also there
-are some common, and I am sure very familiar, ways how to
+are some common, and I am sure very familiar, ways of how to
 construct regular expressions. For example in Scala we have: 
 
 \begin{center}
@@ -261,9 +261,9 @@
 careful with our overloading of the star: assuming you have
 read the handout about our basic mathematical notation, you
 will see that in the context of languages (sets of strings)
-the star stands for an operation on languages. While here
-$r^*$ stands for a regular expression, which is different from
-the operation on sets is defined as
+the star stands for an operation on languages. Here $r^*$
+stands for a regular expression, which is different from the
+operation on sets is defined as
 
 \[
 A^* \dn \bigcup_{0\le n} A^n
@@ -277,8 +277,8 @@
 will be helpful. For example we will write $(r_1 + r_2)^*$,
 which is different from, say $r_1 + (r_2)^*$. The former means
 roughly zero or more times $r_1$ or $r_2$, while the latter
-means $r_1$ or zero or more times $r_2$. This will turn out
-are two different patterns, which match in general different
+means $r_1$ or zero or more times $r_2$. This will turn out to
+be two different patterns, which match in general different
 strings. We should also write $(r_1 + r_2) + r_3$, which is
 different from the regular expression $r_1 + (r_2 + r_3)$, but
 in case of $+$ and $\cdot$ we actually do not care about the
@@ -304,7 +304,7 @@
 should write
 
 \[
-{\it h} \cdot {\it e} \cdot {\it l} \cdot {\it l} \cdot {\it o}
+h \cdot e \cdot l \cdot l \cdot o
 \]
 
 \noindent
@@ -313,7 +313,7 @@
 If you prefer to think in terms of the implementation
 of regular expressions in Scala, the constructors and
 classes relate as follows\footnote{More about Scala is 
-in the handout about a crash-course on Scala.}
+in the handout about A Crash-Course on Scala.}
 
 \begin{center}
 \begin{tabular}{rcl}
@@ -378,8 +378,7 @@
 e \cdot l \cdot l \cdot o$ is, namely
 
 \[
-L({\it h} \cdot {\it e} \cdot {\it l} \cdot {\it l} \cdot
-{\it o}) = \{"hello"\}
+L(h \cdot e \cdot  l \cdot l \cdot o) = \{"hello"\}
 \]
 
 \noindent This is expected because this regular expression 
@@ -393,7 +392,7 @@
 \noindent You can now also see why we do not make a difference
 between the different regular expressions $(r_1 + r_2) + r_3$
 and $r_1 + (r_2 + r_3)$\ldots they are not the same regular
-expression, but have the same meaning. 
+expression, but they have the same meaning. For example
 
 \begin{eqnarray*}
 L((r_1 + r_2) + r_3) & = & L(r_1 + r_2) \cup L(r_3)\\
@@ -426,17 +425,18 @@
 r_1 \equiv r_2 \;\dn\; L(r_1) = L(r_2)
 \]
 
-\noindent That means two regular expressions are equivalent if
-they match the same set of strings. Therefore we do not really
-distinguish between the different regular expression $(r_1 +
-r_2) + r_3$ and $r_1 + (r_2 + r_3)$, because they are
-equivalent. I leave you to the question whether
+\noindent That means two regular expressions are said to be
+equivalent if they match the same set of strings. Therefore we
+do not really distinguish between the different regular
+expression $(r_1 + r_2) + r_3$ and $r_1 + (r_2 + r_3)$,
+because they are equivalent. I leave you to the question
+whether
 
 \[
 \varnothing^* \equiv \epsilon^*
 \]
 
-\noindent holds. Such equivalences will be important for out
+\noindent holds. Such equivalences will be important for our
 matching algorithm, because we can use them to simplify 
 regular expressions. 
 
@@ -458,7 +458,7 @@
 
 To understand my fascination nowadays with regular
 expressions, you need to know that my main scientific interest
-for the last 14 years has been with in theorem provers. I am a
+for the last 14 years has been with theorem provers. I am a
 core developer of one of
 them.\footnote{\url{http://isabelle.in.tum.de}} Theorem
 provers are systems in which you can formally reason about
@@ -480,7 +480,7 @@
 expression matchers are
 buggy.\footnote{\url{http://www.haskell.org/haskellwiki/Regex_Posix}}
 With my PhD student Fahad Ausaf I am currently working on
-proving the correctness for one such algorithm that was
+proving the correctness for one such matcher that was
 proposed by Sulzmann and Lu in
 2014.\footnote{\url{http://goo.gl/bz0eHp}} This would be an
 attractive results since we will be able to prove that the
@@ -502,8 +502,8 @@
 shown via automata. Even sombody who has written a 700+-page
 book\footnote{\url{http://goo.gl/fD0eHx}} on regular
 exprssions did not know better. Well, we showed it can also be
-done with regular expressions only. What a feeling if you
-are an outsider to the subject!
+done with regular expressions only.\footnote{\url{http://www.inf.kcl.ac.uk/staff/urbanc/Publications/rexp.pdf}} 
+What a feeling if you are an outsider to the subject!
 
 To conclude: Despite my early ignorance about regular
 expressions, I find them now quite interesting. They have a
@@ -511,15 +511,15 @@
 importance (remember the shocking runtime of the regular
 expression matchers in Python and Ruby in some instances).
 People who are not very familiar with the mathematical
-background get them consistently wrong. The hope is that we
-can do better in the future---for example by proving that the
-algorithms actually satisfy their specification and that the
-corresponding implementations do not contain any bugs. We are
-close, but not yet quite there.
+background of regular expressions get them consistently wrong.
+The hope is that we can do better in the future---for example
+by proving that the algorithms actually satisfy their
+specification and that the corresponding implementations do
+not contain any bugs. We are close, but not yet quite there.
 
 Despite my fascination, I am also happy to admit that regular
 expressions have their shortcomings. There are some well-known
-``theoretical shortcomings'', for example recognising strings
+``theoretical'' shortcomings, for example recognising strings
 of the form $a^{n}b^{n}$. I am not so bothered by them. What I
 am bothered about is when regular expressions are in the way
 of practical programming. For example, it turns out that the
@@ -528,7 +528,7 @@
 being touted as something every computer scientist should know
 about). The W3 Consortium (which standardises the Web)
 proposes to use the following, already more complicated
-regular expressions:
+regular expressions for email addresses:
 
 {\small\begin{lstlisting}[language={},keywordstyle=\color{black},numbers=none]
 [a-zA-Z0-9.!#$%&'*+/=?^_`{|}~-]+@[a-zA-Z0-9-]+(?:\.[a-zA-Z0-9-]+)*
@@ -541,11 +541,12 @@
 others. Not a good situation to be in. A regular expression
 that is claimed to be closer to the standard is shown in
 Figure~\ref{monster}. Whether this claim is true or not, I
-would not know---the only thing I can say it is a monstrosity. 
-However, this might actually be an argument against the 
-standard, rather than against regular expressions. Still it
-is good to know that some tasks in text processing just 
-cannot be achieved by using regular expressions.
+would not know---the only thing I can say to this regular
+expression is it is a monstrosity. However, this might
+actually be an argument against the RFC standard, rather than
+against regular expressions. Still it is good to know that
+some tasks in text processing just cannot be achieved by using
+regular expressions.
 
 
 \begin{figure}[p]