lexing: ChengsongTanPhdThesis/Chapters/RelatedWork.tex@4b1149fb5aec


% Chapter Template

\chapter{Related Work} % Main chapter title

\label{RelatedWork} 

In this chapter, we introduce
the work relevant to this thesis.


\section{Work on Back-References}
We introduced back-references
in chapter \ref{Introduction}.
This is a quite deep problem,
with theoretical work on them being
fairly recent.

Campaneu gave 


See \cite{AHO1990255} for a survey
of these works and comparison between different
flavours (e.g. whether references can be circular, 
can labels be repeated etc.) of back-references syntax.


\subsection{Matchers and Lexers with Mechanised Proofs}
We are aware
of a mechanised correctness proof of Brzozowski's derivative-based matcher in HOL4 by
Owens and Slind~\parencite{Owens2008}. Another one in Isabelle/HOL is part
of the work by Krauss and Nipkow \parencite{Krauss2011}.  And another one
in Coq is given by Coquand and Siles \parencite{Coquand2012}.
Also Ribeiro and Du Bois give one in Agda \parencite{RibeiroAgda2017}.
 
\subsection{Static Analysis of Evil Regex Patterns} 
 When a regular expression does not behave as intended,
people usually try to rewrite the regex to some equivalent form
or they try to avoid the possibly problematic patterns completely,
for which many false positives exist\parencite{Davis18}.
Animated tools to "debug" regular expressions such as
 \parencite{regexploit2021} \parencite{regex101} are also popular.
We are also aware of static analysis work on regular expressions that
aims to detect potentially expoential regex patterns. Rathnayake and Thielecke 
\parencite{Rathnayake2014StaticAF} proposed an algorithm
that detects regular expressions triggering exponential
behavious on backtracking matchers.
Weideman \parencite{Weideman2017Static} came up with 
non-linear polynomial worst-time estimates
for regexes, attack string that exploit the worst-time 
scenario, and "attack automata" that generates
attack strings.






Thanks to our theorem-prover-friendly approach,
we believe that 
this finiteness bound can be improved to a bound
linear to input and
cubic to the regular expression size using a technique by
Antimirov\cite{Antimirov95}.
Once formalised, this would be a guarantee for the absence of all super-linear behavious.
We are working out the
details.

 
To our best knowledge, no lexing libraries using Brzozowski derivatives
have similar complexity-related bounds, 
and claims about running time are usually speculative and backed by empirical
evidence on a few test cases.
If a matching or lexing algorithm
does not come with certain basic complexity related 
guarantees (for examaple the internal data structure size
does not grow indefinitely), 
then they cannot claim with confidence having solved the problem
of catastrophic backtracking.
author	Chengsong
	Fri, 11 Nov 2022 00:23:53 +0000
changeset 622	4b1149fb5aec
parent 609	61139fdddae0
child 624	8ffa28fce271
permissions	-rw-r--r--