lexing: ChengsongTanPhdThesis/Chapters/RelatedWork.tex@4b1149fb5aec (annotated)

608 37b6fd310a16 added related work chap Chengsong parents: diff changeset	1	% Chapter Template
37b6fd310a16 added related work chap Chengsong parents: diff changeset	2
37b6fd310a16 added related work chap Chengsong parents: diff changeset	3	\chapter{Related Work} % Main chapter title
37b6fd310a16 added related work chap Chengsong parents: diff changeset	4
37b6fd310a16 added related work chap Chengsong parents: diff changeset	5	\label{RelatedWork}
37b6fd310a16 added related work chap Chengsong parents: diff changeset	6
37b6fd310a16 added related work chap Chengsong parents: diff changeset	7	In this chapter, we introduce
37b6fd310a16 added related work chap Chengsong parents: diff changeset	8	the work relevant to this thesis.
37b6fd310a16 added related work chap Chengsong parents: diff changeset	9
37b6fd310a16 added related work chap Chengsong parents: diff changeset	10
37b6fd310a16 added related work chap Chengsong parents: diff changeset	11	\section{Work on Back-References}
37b6fd310a16 added related work chap Chengsong parents: diff changeset	12	We introduced back-references
37b6fd310a16 added related work chap Chengsong parents: diff changeset	13	in chapter \ref{Introduction}.
37b6fd310a16 added related work chap Chengsong parents: diff changeset	14	This is a quite deep problem,
37b6fd310a16 added related work chap Chengsong parents: diff changeset	15	with theoretical work on them being
37b6fd310a16 added related work chap Chengsong parents: diff changeset	16	fairly recent.
37b6fd310a16 added related work chap Chengsong parents: diff changeset	17
37b6fd310a16 added related work chap Chengsong parents: diff changeset	18	Campaneu gave
37b6fd310a16 added related work chap Chengsong parents: diff changeset	19
37b6fd310a16 added related work chap Chengsong parents: diff changeset	20
37b6fd310a16 added related work chap Chengsong parents: diff changeset	21	See \cite{AHO1990255} for a survey
37b6fd310a16 added related work chap Chengsong parents: diff changeset	22	of these works and comparison between different
37b6fd310a16 added related work chap Chengsong parents: diff changeset	23	flavours (e.g. whether references can be circular,
37b6fd310a16 added related work chap Chengsong parents: diff changeset	24	can labels be repeated etc.) of back-references syntax.
37b6fd310a16 added related work chap Chengsong parents: diff changeset	25
609 61139fdddae0 chap1 totally done Chengsong parents: 608 diff changeset	26
61139fdddae0 chap1 totally done Chengsong parents: 608 diff changeset	27	\subsection{Matchers and Lexers with Mechanised Proofs}
61139fdddae0 chap1 totally done Chengsong parents: 608 diff changeset	28	We are aware
61139fdddae0 chap1 totally done Chengsong parents: 608 diff changeset	29	of a mechanised correctness proof of Brzozowski's derivative-based matcher in HOL4 by
61139fdddae0 chap1 totally done Chengsong parents: 608 diff changeset	30	Owens and Slind~\parencite{Owens2008}. Another one in Isabelle/HOL is part
61139fdddae0 chap1 totally done Chengsong parents: 608 diff changeset	31	of the work by Krauss and Nipkow \parencite{Krauss2011}. And another one
61139fdddae0 chap1 totally done Chengsong parents: 608 diff changeset	32	in Coq is given by Coquand and Siles \parencite{Coquand2012}.
61139fdddae0 chap1 totally done Chengsong parents: 608 diff changeset	33	Also Ribeiro and Du Bois give one in Agda \parencite{RibeiroAgda2017}.
61139fdddae0 chap1 totally done Chengsong parents: 608 diff changeset	34
61139fdddae0 chap1 totally done Chengsong parents: 608 diff changeset	35	\subsection{Static Analysis of Evil Regex Patterns}
61139fdddae0 chap1 totally done Chengsong parents: 608 diff changeset	36	When a regular expression does not behave as intended,
61139fdddae0 chap1 totally done Chengsong parents: 608 diff changeset	37	people usually try to rewrite the regex to some equivalent form
61139fdddae0 chap1 totally done Chengsong parents: 608 diff changeset	38	or they try to avoid the possibly problematic patterns completely,
61139fdddae0 chap1 totally done Chengsong parents: 608 diff changeset	39	for which many false positives exist\parencite{Davis18}.
61139fdddae0 chap1 totally done Chengsong parents: 608 diff changeset	40	Animated tools to "debug" regular expressions such as
61139fdddae0 chap1 totally done Chengsong parents: 608 diff changeset	41	\parencite{regexploit2021} \parencite{regex101} are also popular.
61139fdddae0 chap1 totally done Chengsong parents: 608 diff changeset	42	We are also aware of static analysis work on regular expressions that
61139fdddae0 chap1 totally done Chengsong parents: 608 diff changeset	43	aims to detect potentially expoential regex patterns. Rathnayake and Thielecke
61139fdddae0 chap1 totally done Chengsong parents: 608 diff changeset	44	\parencite{Rathnayake2014StaticAF} proposed an algorithm
61139fdddae0 chap1 totally done Chengsong parents: 608 diff changeset	45	that detects regular expressions triggering exponential
61139fdddae0 chap1 totally done Chengsong parents: 608 diff changeset	46	behavious on backtracking matchers.
61139fdddae0 chap1 totally done Chengsong parents: 608 diff changeset	47	Weideman \parencite{Weideman2017Static} came up with
61139fdddae0 chap1 totally done Chengsong parents: 608 diff changeset	48	non-linear polynomial worst-time estimates
61139fdddae0 chap1 totally done Chengsong parents: 608 diff changeset	49	for regexes, attack string that exploit the worst-time
61139fdddae0 chap1 totally done Chengsong parents: 608 diff changeset	50	scenario, and "attack automata" that generates
61139fdddae0 chap1 totally done Chengsong parents: 608 diff changeset	51	attack strings.
622 4b1149fb5aec incorporated more comments, bib Chengsong parents: 609 diff changeset	52
4b1149fb5aec incorporated more comments, bib Chengsong parents: 609 diff changeset	53
4b1149fb5aec incorporated more comments, bib Chengsong parents: 609 diff changeset	54
4b1149fb5aec incorporated more comments, bib Chengsong parents: 609 diff changeset	55
4b1149fb5aec incorporated more comments, bib Chengsong parents: 609 diff changeset	56
4b1149fb5aec incorporated more comments, bib Chengsong parents: 609 diff changeset	57
4b1149fb5aec incorporated more comments, bib Chengsong parents: 609 diff changeset	58	Thanks to our theorem-prover-friendly approach,
4b1149fb5aec incorporated more comments, bib Chengsong parents: 609 diff changeset	59	we believe that
4b1149fb5aec incorporated more comments, bib Chengsong parents: 609 diff changeset	60	this finiteness bound can be improved to a bound
4b1149fb5aec incorporated more comments, bib Chengsong parents: 609 diff changeset	61	linear to input and
4b1149fb5aec incorporated more comments, bib Chengsong parents: 609 diff changeset	62	cubic to the regular expression size using a technique by
4b1149fb5aec incorporated more comments, bib Chengsong parents: 609 diff changeset	63	Antimirov\cite{Antimirov95}.
4b1149fb5aec incorporated more comments, bib Chengsong parents: 609 diff changeset	64	Once formalised, this would be a guarantee for the absence of all super-linear behavious.
4b1149fb5aec incorporated more comments, bib Chengsong parents: 609 diff changeset	65	We are working out the
4b1149fb5aec incorporated more comments, bib Chengsong parents: 609 diff changeset	66	details.
4b1149fb5aec incorporated more comments, bib Chengsong parents: 609 diff changeset	67
4b1149fb5aec incorporated more comments, bib Chengsong parents: 609 diff changeset	68
4b1149fb5aec incorporated more comments, bib Chengsong parents: 609 diff changeset	69	To our best knowledge, no lexing libraries using Brzozowski derivatives
4b1149fb5aec incorporated more comments, bib Chengsong parents: 609 diff changeset	70	have similar complexity-related bounds,
4b1149fb5aec incorporated more comments, bib Chengsong parents: 609 diff changeset	71	and claims about running time are usually speculative and backed by empirical
4b1149fb5aec incorporated more comments, bib Chengsong parents: 609 diff changeset	72	evidence on a few test cases.
4b1149fb5aec incorporated more comments, bib Chengsong parents: 609 diff changeset	73	If a matching or lexing algorithm
4b1149fb5aec incorporated more comments, bib Chengsong parents: 609 diff changeset	74	does not come with certain basic complexity related
4b1149fb5aec incorporated more comments, bib Chengsong parents: 609 diff changeset	75	guarantees (for examaple the internal data structure size
4b1149fb5aec incorporated more comments, bib Chengsong parents: 609 diff changeset	76	does not grow indefinitely),
4b1149fb5aec incorporated more comments, bib Chengsong parents: 609 diff changeset	77	then they cannot claim with confidence having solved the problem
4b1149fb5aec incorporated more comments, bib Chengsong parents: 609 diff changeset	78	of catastrophic backtracking.

author	Chengsong
	Fri, 11 Nov 2022 00:23:53 +0000
changeset 622	4b1149fb5aec
parent 609	61139fdddae0
child 624	8ffa28fce271
permissions	-rw-r--r--