# HG changeset patch # User Chengsong # Date 1561565748 -3600 # Node ID bffa240d5b7a8c280aa1ffd1e371f1e6f7b8d42b # Parent fc15971459755d49028c61fbcbef10af37ad67c1 easy changes, url to misc, author info format changing, mysterious bug on line 49 diff -r fc1597145975 -r bffa240d5b7a ecp/ecoop_paper.tex --- a/ecp/ecoop_paper.tex Wed Jun 26 16:08:49 2019 +0100 +++ b/ecp/ecoop_paper.tex Wed Jun 26 17:15:48 2019 +0100 @@ -11,7 +11,12 @@ %\usepackage{pmboxdraw} \title{POSIX Regular Expression Matching and Lexing} -\author[1]{Chengsong Tan \\ King's College London\\chengsong.tan@kcl.ac.uk} +\author[1]{Chengsong Tan} +\affil[1]{\\ Department of Informatics, King's College London\\ +London, UK\\ +\texttt{chengsong.tan@kcl.ac.uk}} +\authorrunning{Chengsong Tan} +\Copyright{Chengsong Tan} \newcommand{\dn}{\stackrel{\mbox{\scriptsize def}}{=}}% \newcommand{\ZERO}{\mbox{\bf 0}} @@ -42,6 +47,7 @@ \begin{document} \maketitle + \begin{abstract} Brzozowski introduced in 1964 a beautifully simple algorithm for regular expression matching based on the notion of derivatives of @@ -158,8 +164,8 @@ report that they have found thousands of such evil regular expressions in the JavaScript and Python ecosystems \cite{Davis18}. -This exponential blowup sometimes causes real pain in ``real life'': -for example one evil regular expression brought on 20 July 2016 the +This exponential blowup sometimes causes real pain in real life: +for example on 20 July 2016 one evil regular expression brought the webpage \href{http://stackexchange.com}{Stack Exchange} to its knees\cite{SE16}. In this instance, a regular expression intended to just trim white spaces from the beginning and the end of a line actually consumed @@ -196,8 +202,8 @@ matching, only relatively recently precise definitions of what POSIX matching actually means have been formalised \cite{AusafDyckhoffUrban2016,OkuiSuzuki2010,Vansummeren2006}. Roughly, -POSIX matching means to match the longest initial substring and -possible ties are solved according to some priorities attached to the +POSIX matching means matching the longest initial substring and +in the case of a tie, the initial submatch is chosen according to some priorities attached to the regular expressions (e.g.~keywords have a higher priority than identifiers). This sounds rather simple, but according to Grathwohl et al \cite[Page 36]{CrashCourse2014} this is not the case. They wrote: @@ -438,7 +444,7 @@ simplification rules in order to keep derivatives small. We have developed such ``aggressive'' simplification rules and generated test data that show that the expected bound can be achieved. Obviously we -could only cover partially the search space as there are infinitely +could only partially cover the search space as there are infinitely many regular expressions and strings. One modification we introduced is to allow a list of annotated regular expressions in the \textit{ALTS} constructor. This allows us to not just delete diff -r fc1597145975 -r bffa240d5b7a ecp/lipics-sample-article.pdf Binary file ecp/lipics-sample-article.pdf has changed diff -r fc1597145975 -r bffa240d5b7a ecp/root.bib --- a/ecp/root.bib Wed Jun 26 16:08:49 2019 +0100 +++ b/ecp/root.bib Wed Jun 26 17:15:48 2019 +0100 @@ -1,24 +1,23 @@ %% This BibTeX bibliography file was created using BibDesk. %% https://bibdesk.sourceforge.io/ -%% Created for CS TAN at 2019-06-26 12:42:11 +0100 +%% Created for CS TAN at 2019-06-26 17:07:31 +0100 %% Saved with string encoding Unicode (UTF-8) -@url{SE16, +@misc{SE16, Author = {StackStatus}, Date-Added = {2019-06-26 11:28:41 +0000}, - Date-Modified = {2019-06-26 11:42:11 +0000}, + Date-Modified = {2019-06-26 16:07:31 +0000}, Keywords = {ReDos Attack}, - Month = {07}, + Month = {July}, Rating = {5}, Read = {1}, Title = {Stack Overflow Outage Postmortem}, Url = {https://stackstatus.net/post/147710624694/outage-postmortem-july-20-2016}, - Urldate = {2016.7.20}, Year = {2016}, Bdsk-Url-1 = {https://stackstatus.net/post/147710624694/outage-postmortem-july-20-2016}}