cst_tests: comparison ecp/ecoop

-:fc1597145975
+:bffa240d5b7a
 % \usepackage{hyperref}
 % \usepackage[margin=0.5in]{geometry}
 %\usepackage{pmboxdraw}
 \title{POSIX Regular Expression Matching and Lexing}
-\author[1]{Chengsong Tan \\ King's College London\\chengsong.tan@kcl.ac.uk}
+\author[1]{Chengsong Tan}
+\affil[1]{\\ Department of Informatics, King's College London\\
+London, UK\\
+\texttt{chengsong.tan@kcl.ac.uk}}
+\authorrunning{Chengsong Tan}
+\Copyright{Chengsong Tan}
 \newcommand{\dn}{\stackrel{\mbox{\scriptsize def}}{=}}%
 \newcommand{\ZERO}{\mbox{\bf 0}}
 \newcommand{\ONE}{\mbox{\bf 1}}
 \def\lexer{\mathit{lexer}}
 \begin{document}
 \maketitle
 \begin{abstract}
 Brzozowski introduced in 1964 a beautifully simple algorithm for
 regular expression matching based on the notion of derivatives of
 regular expressions. In 2014, Sulzmann and Lu extended this
 algorithm to not just give a YES/NO answer for whether or not a regular
 frequent enough that a separate name has been created for
 them---\emph{evil regular expressions}. In empiric work, Davis et al
 report that they have found thousands of such evil regular expressions
 in the JavaScript and Python ecosystems \cite{Davis18}.
-This exponential blowup sometimes causes real pain in ``real life'':
+This exponential blowup sometimes causes real pain in real life:
-for example one evil regular expression brought on 20 July 2016 the
+for example on 20 July 2016 one evil regular expression brought the
 webpage \href{http://stackexchange.com}{Stack Exchange} to its knees\cite{SE16}.
 In this instance, a regular expression intended to just trim white
 spaces from the beginning and the end of a line actually consumed
 massive amounts of CPU-resources and because of this the web servers
 ground to a halt. This happened when a post with 20,000 white spaces
 ``fire''---so is it an identifier or a keyword?  While in applications
 there is a well-known strategy to decide these questions, called POSIX
 matching, only relatively recently precise definitions of what POSIX
 matching actually means have been formalised
 \cite{AusafDyckhoffUrban2016,OkuiSuzuki2010,Vansummeren2006}. Roughly,
-POSIX matching means to match the longest initial substring and
+POSIX matching means matching the longest initial substring and
-possible ties are solved according to some priorities attached to the
+in the case of a tie, the initial submatch is chosen according to some priorities attached to the
 regular expressions (e.g.~keywords have a higher priority than
 identifiers). This sounds rather simple, but according to Grathwohl et
 al \cite[Page 36]{CrashCourse2014} this is not the case. They wrote:
 \begin{quote}
 The main point of the bitsequences and annotated regular expressions
 is that we can apply rather aggressive (in terms of size)
 simplification rules in order to keep derivatives small.  We have
 developed such ``aggressive'' simplification rules and generated test
 data that show that the expected bound can be achieved. Obviously we
-could only cover partially the search space as there are infinitely
+could only partially cover the search space as there are infinitely
 many regular expressions and strings. One modification we introduced
 is to allow a list of annotated regular expressions in the
 \textit{ALTS} constructor. This allows us to not just delete
 unnecessary $\ZERO$s and $\ONE$s from regular expressions, but also
 unnecessary ``copies'' of regular expressions (very similar to

changeset 24	bffa240d5b7a
parent 22	feffec3af1a1
child 25	5ca7bf724474