608
|
1 |
% Chapter Template
|
|
2 |
|
|
3 |
\chapter{Related Work} % Main chapter title
|
|
4 |
|
|
5 |
\label{RelatedWork}
|
|
6 |
|
|
7 |
In this chapter, we introduce
|
|
8 |
the work relevant to this thesis.
|
|
9 |
|
|
10 |
|
|
11 |
\section{Work on Back-References}
|
|
12 |
We introduced back-references
|
|
13 |
in chapter \ref{Introduction}.
|
|
14 |
This is a quite deep problem,
|
|
15 |
with theoretical work on them being
|
|
16 |
fairly recent.
|
|
17 |
|
624
|
18 |
Campaneu et al. studied regexes with back-references
|
|
19 |
in the context of formal languages theory in
|
|
20 |
their 2003 work \cite{campeanu2003formal}.
|
|
21 |
They devised a pumping lemma for Perl-like regexes,
|
|
22 |
proving that the langugages denoted by them
|
|
23 |
is properly contained in context-sensitive
|
|
24 |
languages.
|
|
25 |
More interesting questions such as
|
|
26 |
whether the language denoted by Perl-like regexes
|
|
27 |
can express palindromes ($\{w \mid w = w^R\}$)
|
|
28 |
are discussed in \cite{campeanu2009patterns}
|
|
29 |
and \cite{CAMPEANU2009Intersect}.
|
|
30 |
Freydenberger \cite{Frey2013} investigated the
|
|
31 |
expressive power of back-references. He showed several
|
|
32 |
undecidability and decriptional complexity results
|
|
33 |
of back-references, concluding that they add
|
|
34 |
great power to certain programming tasks but are difficult to harness.
|
|
35 |
An interesting question would then be
|
|
36 |
whether we can add restrictions to them,
|
|
37 |
so that they become expressive enough, but not too powerful.
|
|
38 |
Freydenberger and Schmid \cite{FREYDENBERGER20191}
|
|
39 |
introduced the notion of deterministic
|
|
40 |
regular expressions with back-references to achieve
|
|
41 |
a better balance between expressiveness and tractability.
|
608
|
42 |
|
|
43 |
|
624
|
44 |
Fernau and Schmid \cite{FERNAU2015287} and Schmid \cite{Schmid2012}
|
|
45 |
investigated the time complexity of different variants
|
|
46 |
of back-references.
|
|
47 |
|
|
48 |
See \cite{BERGLUND2022} for a survey
|
608
|
49 |
of these works and comparison between different
|
624
|
50 |
flavours of back-references syntax (e.g. whether references can be circular,
|
|
51 |
can labels be repeated etc.).
|
608
|
52 |
|
609
|
53 |
|
|
54 |
\subsection{Matchers and Lexers with Mechanised Proofs}
|
|
55 |
We are aware
|
|
56 |
of a mechanised correctness proof of Brzozowski's derivative-based matcher in HOL4 by
|
|
57 |
Owens and Slind~\parencite{Owens2008}. Another one in Isabelle/HOL is part
|
|
58 |
of the work by Krauss and Nipkow \parencite{Krauss2011}. And another one
|
|
59 |
in Coq is given by Coquand and Siles \parencite{Coquand2012}.
|
|
60 |
Also Ribeiro and Du Bois give one in Agda \parencite{RibeiroAgda2017}.
|
|
61 |
|
|
62 |
\subsection{Static Analysis of Evil Regex Patterns}
|
|
63 |
When a regular expression does not behave as intended,
|
|
64 |
people usually try to rewrite the regex to some equivalent form
|
|
65 |
or they try to avoid the possibly problematic patterns completely,
|
|
66 |
for which many false positives exist\parencite{Davis18}.
|
|
67 |
Animated tools to "debug" regular expressions such as
|
|
68 |
\parencite{regexploit2021} \parencite{regex101} are also popular.
|
|
69 |
We are also aware of static analysis work on regular expressions that
|
|
70 |
aims to detect potentially expoential regex patterns. Rathnayake and Thielecke
|
|
71 |
\parencite{Rathnayake2014StaticAF} proposed an algorithm
|
|
72 |
that detects regular expressions triggering exponential
|
|
73 |
behavious on backtracking matchers.
|
|
74 |
Weideman \parencite{Weideman2017Static} came up with
|
|
75 |
non-linear polynomial worst-time estimates
|
|
76 |
for regexes, attack string that exploit the worst-time
|
|
77 |
scenario, and "attack automata" that generates
|
|
78 |
attack strings.
|
622
|
79 |
|
|
80 |
|
|
81 |
|
|
82 |
|
|
83 |
|
|
84 |
|
|
85 |
Thanks to our theorem-prover-friendly approach,
|
|
86 |
we believe that
|
|
87 |
this finiteness bound can be improved to a bound
|
|
88 |
linear to input and
|
|
89 |
cubic to the regular expression size using a technique by
|
|
90 |
Antimirov\cite{Antimirov95}.
|
|
91 |
Once formalised, this would be a guarantee for the absence of all super-linear behavious.
|
|
92 |
We are working out the
|
|
93 |
details.
|
|
94 |
|
|
95 |
|
|
96 |
To our best knowledge, no lexing libraries using Brzozowski derivatives
|
|
97 |
have similar complexity-related bounds,
|
|
98 |
and claims about running time are usually speculative and backed by empirical
|
|
99 |
evidence on a few test cases.
|
|
100 |
If a matching or lexing algorithm
|
|
101 |
does not come with certain basic complexity related
|
|
102 |
guarantees (for examaple the internal data structure size
|
|
103 |
does not grow indefinitely),
|
|
104 |
then they cannot claim with confidence having solved the problem
|
|
105 |
of catastrophic backtracking.
|