--- a/ecp/ecoop_paper.tex Wed Jun 26 16:08:49 2019 +0100
+++ b/ecp/ecoop_paper.tex Wed Jun 26 17:15:48 2019 +0100
@@ -11,7 +11,12 @@
%\usepackage{pmboxdraw}
\title{POSIX Regular Expression Matching and Lexing}
-\author[1]{Chengsong Tan \\ King's College London\\chengsong.tan@kcl.ac.uk}
+\author[1]{Chengsong Tan}
+\affil[1]{\\ Department of Informatics, King's College London\\
+London, UK\\
+\texttt{chengsong.tan@kcl.ac.uk}}
+\authorrunning{Chengsong Tan}
+\Copyright{Chengsong Tan}
\newcommand{\dn}{\stackrel{\mbox{\scriptsize def}}{=}}%
\newcommand{\ZERO}{\mbox{\bf 0}}
@@ -42,6 +47,7 @@
\begin{document}
\maketitle
+
\begin{abstract}
Brzozowski introduced in 1964 a beautifully simple algorithm for
regular expression matching based on the notion of derivatives of
@@ -158,8 +164,8 @@
report that they have found thousands of such evil regular expressions
in the JavaScript and Python ecosystems \cite{Davis18}.
-This exponential blowup sometimes causes real pain in ``real life'':
-for example one evil regular expression brought on 20 July 2016 the
+This exponential blowup sometimes causes real pain in real life:
+for example, on 20 July 2016, one evil regular expression brought the
webpage \href{http://stackexchange.com}{Stack Exchange} to its knees~\cite{SE16}.
In this instance, a regular expression intended to just trim white
spaces from the beginning and the end of a line actually consumed
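
The backtracking blowup above is exactly what derivative-based matching avoids. A minimal Python sketch of Brzozowski's construction (illustrative only; the constructor and helper names are hypothetical and differ from the paper's Isabelle formalisation):

```python
# Regular expressions as tagged tuples:
#   ('zero',) | ('one',) | ('chr', c) | ('alt', r1, r2)
#   | ('seq', r1, r2) | ('star', r)

def nullable(r):
    """Does r match the empty string?"""
    tag = r[0]
    if tag in ('one', 'star'):
        return True
    if tag == 'alt':
        return nullable(r[1]) or nullable(r[2])
    if tag == 'seq':
        return nullable(r[1]) and nullable(r[2])
    return False                      # 'zero' and 'chr'

def der(c, r):
    """Brzozowski derivative of r with respect to character c."""
    tag = r[0]
    if tag in ('zero', 'one'):
        return ('zero',)
    if tag == 'chr':
        return ('one',) if r[1] == c else ('zero',)
    if tag == 'alt':
        return ('alt', der(c, r[1]), der(c, r[2]))
    if tag == 'seq':
        left = ('seq', der(c, r[1]), r[2])
        return ('alt', left, der(c, r[2])) if nullable(r[1]) else left
    return ('seq', der(c, r[1]), r)   # 'star'

def matches(r, s):
    """Match by taking successive derivatives; no backtracking."""
    for c in s:
        r = der(c, r)
    return nullable(r)
```

An evil pattern such as $(a \mid aa)^*$ is processed in one left-to-right pass; the catch, and the subject of this paper, is that without simplification the derivative terms themselves can grow quickly.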
@@ -196,8 +202,8 @@
matching, only relatively recently precise definitions of what POSIX
matching actually means have been formalised
\cite{AusafDyckhoffUrban2016,OkuiSuzuki2010,Vansummeren2006}. Roughly,
-POSIX matching means to match the longest initial substring and
-possible ties are solved according to some priorities attached to the
+POSIX matching means matching the longest initial substring and
+in the case of a tie, the initial submatch is chosen according to some priorities attached to the
regular expressions (e.g.~keywords have a higher priority than
identifiers). This sounds rather simple, but according to Grathwohl et
al.~\cite[Page 36]{CrashCourse2014} this is not the case. They wrote:
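
The POSIX rule just described — take the longest initial submatch and break ties by rule priority, e.g. keywords over identifiers — can be sketched as a tiny lexer step. Python's `re` is used here only to express the individual token patterns; the rule names are hypothetical:

```python
import re

# Rules in decreasing priority: a keyword outranks an identifier
# when both match a prefix of the same length.
RULES = [
    ('KEYWORD', re.compile(r'if|then|else')),
    ('IDENT',   re.compile(r'[a-z]+')),
]

def longest_match(s):
    """Return (rule name, lexeme) for the POSIX-preferred initial match."""
    best = None
    for prio, (name, rx) in enumerate(RULES):
        m = rx.match(s)               # anchored at the start of s
        if m:
            # Compare first by length (longest wins), then by
            # priority (smaller prio index, i.e. larger -prio, wins).
            cand = (len(m.group()), -prio, name, m.group())
            if best is None or cand > best:
                best = cand
    return (best[2], best[3]) if best else None
```

On `"if "` both rules match the two-character prefix `if`, so priority decides in favour of the keyword; on `"ifx"` the identifier rule matches a strictly longer prefix and wins outright.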
@@ -438,7 +444,7 @@
simplification rules in order to keep derivatives small. We have
developed such ``aggressive'' simplification rules and generated test
data that show that the expected bound can be achieved. Obviously we
-could only cover partially the search space as there are infinitely
+could only partially cover the search space as there are infinitely
many regular expressions and strings. One modification we introduced
is to allow a list of annotated regular expressions in the
\textit{ALTS} constructor. This allows us to not just delete