ecp/ecoop_paper.tex
changeset 24 bffa240d5b7a
parent 22 feffec3af1a1
child 25 5ca7bf724474
--- a/ecp/ecoop_paper.tex	Wed Jun 26 16:08:49 2019 +0100
+++ b/ecp/ecoop_paper.tex	Wed Jun 26 17:15:48 2019 +0100
@@ -11,7 +11,12 @@
 %\usepackage{pmboxdraw}
  
 \title{POSIX Regular Expression Matching and Lexing}
-\author[1]{Chengsong Tan \\ King's College London\\chengsong.tan@kcl.ac.uk}
+\author[1]{Chengsong Tan}
+\affil[1]{\\ Department of Informatics, King's College London\\
+London, UK\\
+\texttt{chengsong.tan@kcl.ac.uk}}
+\authorrunning{Chengsong Tan}
+\Copyright{Chengsong Tan}
 
 \newcommand{\dn}{\stackrel{\mbox{\scriptsize def}}{=}}%
 \newcommand{\ZERO}{\mbox{\bf 0}}
@@ -42,6 +47,7 @@
 \begin{document}
 
 \maketitle
+
 \begin{abstract}
   Brzozowski introduced in 1964 a beautifully simple algorithm for
   regular expression matching based on the notion of derivatives of
@@ -158,8 +164,8 @@
 report that they have found thousands of such evil regular expressions
 in the JavaScript and Python ecosystems \cite{Davis18}.
 
-This exponential blowup sometimes causes real pain in ``real life'':
-for example one evil regular expression brought on 20 July 2016 the
+This exponential blowup sometimes causes real pain in real life:
+for example on 20 July 2016 one evil regular expression brought the
 webpage \href{http://stackexchange.com}{Stack Exchange} to its knees~\cite{SE16}.
 In this instance, a regular expression intended to just trim white
 spaces from the beginning and the end of a line actually consumed
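The exponential behaviour of such "evil" regular expressions under a backtracking matcher can be reproduced with a small sketch. The following is a hypothetical illustration (not code from this paper): a naive backtracking matcher for the classic evil pattern $(a?)^{\{n\}}a^{\{n\}}$ run on the input $a^n$, counting recursive calls to make the blowup visible.

```python
def match_evil(n, s):
    """Match (a?){n} a{n} against s by naive backtracking.

    Returns (matched, steps), where steps counts recursive calls and
    grows exponentially in n even though the match ultimately succeeds.
    """
    steps = 0

    def go(i, k):
        # i: current position in s; k: index into the 2n pattern elements
        nonlocal steps
        steps += 1
        if k == 2 * n:
            return i == len(s)
        if k < n:
            # optional a? -- try consuming a character first, then skipping;
            # this greedy choice is what forces the exponential search
            if i < len(s) and s[i] == "a" and go(i + 1, k + 1):
                return True
            return go(i, k + 1)
        # mandatory a
        return i < len(s) and s[i] == "a" and go(i + 1, k + 1)

    ok = go(0, 0)
    return ok, steps
```

For example, `match_evil(6, "a" * 6)` succeeds after a few hundred steps, while `match_evil(10, "a" * 10)` already needs thousands; each increment of $n$ roughly doubles the work, which is why moderate inputs can hang a backtracking engine.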
@@ -196,8 +202,8 @@
 matching, only relatively recently precise definitions of what POSIX
 matching actually means have been formalised
 \cite{AusafDyckhoffUrban2016,OkuiSuzuki2010,Vansummeren2006}. Roughly,
-POSIX matching means to match the longest initial substring and
-possible ties are solved according to some priorities attached to the
+POSIX matching means matching the longest initial substring and
+in the case of a tie, the initial submatch is chosen according to some priorities attached to the
 regular expressions (e.g.~keywords have a higher priority than
 identifiers). This sounds rather simple, but according to Grathwohl et
 al.\ \cite[Page 36]{CrashCourse2014} this is not the case. They wrote:
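The longest-match-plus-priorities rule described above can be sketched in a few lines. The following hypothetical Python lexer (not the paper's formalisation, and using Python's non-POSIX \texttt{re} engine only rule-by-rule) takes, at each position, the longest initial match among all rules, breaking ties of equal length by the rule's position in the list, so keywords listed first beat identifiers.

```python
import re

# Rules in priority order: earlier entries win length ties.
RULES = [("KEYWORD", r"if|then|else"),   # higher priority
         ("IDENT",   r"[a-z]+"),
         ("SPACE",   r"\s+")]

def lex(s):
    """POSIX-style lexing sketch: longest initial match, ties by priority."""
    tokens, i = [], 0
    while i < len(s):
        best = None  # (-length, priority, name): min = longest, then highest priority
        for prio, (name, pat) in enumerate(RULES):
            m = re.match(pat, s[i:])
            if m:
                cand = (-len(m.group()), prio, name)
                if best is None or cand < best:
                    best = cand
        if best is None:
            raise ValueError(f"no rule matches at position {i}")
        length = -best[0]
        tokens.append((best[2], s[i:i + length]))
        i += length
    return tokens
```

On the input \texttt{"if iffy"} this yields a \texttt{KEYWORD} for \texttt{if} (a length tie with \texttt{IDENT}, resolved by priority) but an \texttt{IDENT} for \texttt{iffy}, since the longer identifier match beats the shorter keyword prefix.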
@@ -438,7 +444,7 @@
 simplification rules in order to keep derivatives small.  We have
 developed such ``aggressive'' simplification rules and generated test
 data that show that the expected bound can be achieved. Obviously we
-could only cover partially the search space as there are infinitely
+could only partially cover the search space as there are infinitely
 many regular expressions and strings. One modification we introduced
 is to allow a list of annotated regular expressions in the
 \textit{ALTS} constructor. This allows us to not just delete
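The idea of a list-based \textit{ALTS} constructor with simplification can be sketched as follows. This is a hypothetical, stripped-down Python version, using plain rather than annotated regular expressions and only a few of the simplification rules (flattening nested alternatives, dropping \ZERO{}, and removing duplicates), not the paper's actual development.

```python
# Regular expressions as tagged tuples; ALTS carries a *list* of
# alternatives and simplifies on construction.
ZERO, ONE = ("zero",), ("one",)

def CHAR(c):     return ("char", c)
def SEQ(r1, r2): return ("seq", r1, r2)
def STAR(r):     return ("star", r)

def ALTS(rs):
    flat = []
    for r in rs:
        if r[0] == "alts":
            flat.extend(r[1])      # flatten nested alternatives
        elif r != ZERO:
            flat.append(r)         # drop ZERO alternatives
    dedup = []
    for r in flat:                 # remove duplicates, keeping order
        if r not in dedup:
            dedup.append(r)
    if not dedup:
        return ZERO
    if len(dedup) == 1:
        return dedup[0]
    return ("alts", tuple(dedup))

def nullable(r):
    tag = r[0]
    if tag in ("one", "star"):  return True
    if tag in ("zero", "char"): return False
    if tag == "seq":            return nullable(r[1]) and nullable(r[2])
    return any(nullable(x) for x in r[1])          # alts

def der(c, r):
    """Brzozowski derivative of r w.r.t. character c, simplified via ALTS."""
    tag = r[0]
    if tag in ("zero", "one"): return ZERO
    if tag == "char":          return ONE if r[1] == c else ZERO
    if tag == "alts":          return ALTS([der(c, x) for x in r[1]])
    if tag == "seq":
        r1, r2 = r[1], r[2]
        d = SEQ(der(c, r1), r2)
        return ALTS([d, der(c, r2)]) if nullable(r1) else d
    return SEQ(der(c, r[1]), r)                    # star

def matches(r, s):
    for c in s:
        r = der(c, r)
    return nullable(r)
```

Because \textit{ALTS} simplifies at construction time, duplicate alternatives produced by repeated derivatives are merged immediately, which is what keeps the derivatives from growing without bound in this sketch.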