ChengsongTanPhdThesis/Chapters/RelatedWork.tex
author Chengsong
Sat, 12 Nov 2022 21:34:40 +0000
changeset 624 8ffa28fce271
parent 622 4b1149fb5aec
child 625 b797c9a709d9
permissions -rw-r--r--
all comments incorporated!!+related work
Ignore whitespace changes - Everywhere: Within whitespace: At end of lines:
608
37b6fd310a16 added related work chap
Chengsong
parents:
diff changeset
     1
% Chapter Template
37b6fd310a16 added related work chap
Chengsong
parents:
diff changeset
     2
37b6fd310a16 added related work chap
Chengsong
parents:
diff changeset
     3
\chapter{Related Work} % Main chapter title
37b6fd310a16 added related work chap
Chengsong
parents:
diff changeset
     4
37b6fd310a16 added related work chap
Chengsong
parents:
diff changeset
     5
\label{RelatedWork} 
37b6fd310a16 added related work chap
Chengsong
parents:
diff changeset
     6
37b6fd310a16 added related work chap
Chengsong
parents:
diff changeset
     7
In this chapter, we introduce
37b6fd310a16 added related work chap
Chengsong
parents:
diff changeset
     8
the work relevant to this thesis.
37b6fd310a16 added related work chap
Chengsong
parents:
diff changeset
     9
37b6fd310a16 added related work chap
Chengsong
parents:
diff changeset
    10
37b6fd310a16 added related work chap
Chengsong
parents:
diff changeset
    11
\section{Work on Back-References}
37b6fd310a16 added related work chap
Chengsong
parents:
diff changeset
    12
We introduced back-references
37b6fd310a16 added related work chap
Chengsong
parents:
diff changeset
    13
in chapter \ref{Introduction}.
37b6fd310a16 added related work chap
Chengsong
parents:
diff changeset
    14
This is a quite deep problem,
37b6fd310a16 added related work chap
Chengsong
parents:
diff changeset
    15
with theoretical work on them being
37b6fd310a16 added related work chap
Chengsong
parents:
diff changeset
    16
fairly recent.
37b6fd310a16 added related work chap
Chengsong
parents:
diff changeset
    17
624
8ffa28fce271 all comments incorporated!!+related work
Chengsong
parents: 622
diff changeset
    18
Campaneu et al. studied regexes with back-references
8ffa28fce271 all comments incorporated!!+related work
Chengsong
parents: 622
diff changeset
    19
in the context of formal languages theory in 
8ffa28fce271 all comments incorporated!!+related work
Chengsong
parents: 622
diff changeset
    20
their 2003 work \cite{campeanu2003formal}.
8ffa28fce271 all comments incorporated!!+related work
Chengsong
parents: 622
diff changeset
    21
They devised a pumping lemma for Perl-like regexes, 
8ffa28fce271 all comments incorporated!!+related work
Chengsong
parents: 622
diff changeset
    22
proving that the langugages denoted by them
8ffa28fce271 all comments incorporated!!+related work
Chengsong
parents: 622
diff changeset
    23
is  properly  contained in context-sensitive
8ffa28fce271 all comments incorporated!!+related work
Chengsong
parents: 622
diff changeset
    24
languages.
8ffa28fce271 all comments incorporated!!+related work
Chengsong
parents: 622
diff changeset
    25
More interesting questions such as 
8ffa28fce271 all comments incorporated!!+related work
Chengsong
parents: 622
diff changeset
    26
whether the language denoted by Perl-like regexes
8ffa28fce271 all comments incorporated!!+related work
Chengsong
parents: 622
diff changeset
    27
can express palindromes ($\{w \mid w = w^R\}$)
8ffa28fce271 all comments incorporated!!+related work
Chengsong
parents: 622
diff changeset
    28
are discussed in \cite{campeanu2009patterns} 
8ffa28fce271 all comments incorporated!!+related work
Chengsong
parents: 622
diff changeset
    29
and \cite{CAMPEANU2009Intersect}.
8ffa28fce271 all comments incorporated!!+related work
Chengsong
parents: 622
diff changeset
    30
Freydenberger \cite{Frey2013} investigated the 
8ffa28fce271 all comments incorporated!!+related work
Chengsong
parents: 622
diff changeset
    31
expressive power of back-references. He showed several 
8ffa28fce271 all comments incorporated!!+related work
Chengsong
parents: 622
diff changeset
    32
undecidability and decriptional complexity results 
8ffa28fce271 all comments incorporated!!+related work
Chengsong
parents: 622
diff changeset
    33
of back-references, concluding that they add
8ffa28fce271 all comments incorporated!!+related work
Chengsong
parents: 622
diff changeset
    34
great power to certain programming tasks but are difficult to harness.
8ffa28fce271 all comments incorporated!!+related work
Chengsong
parents: 622
diff changeset
    35
An interesting question would then be
8ffa28fce271 all comments incorporated!!+related work
Chengsong
parents: 622
diff changeset
    36
whether we can add restrictions to them, 
8ffa28fce271 all comments incorporated!!+related work
Chengsong
parents: 622
diff changeset
    37
so that they become expressive enough, but not too powerful.
8ffa28fce271 all comments incorporated!!+related work
Chengsong
parents: 622
diff changeset
    38
Freydenberger and Schmid \cite{FREYDENBERGER20191}
8ffa28fce271 all comments incorporated!!+related work
Chengsong
parents: 622
diff changeset
    39
introduced the notion of deterministic
8ffa28fce271 all comments incorporated!!+related work
Chengsong
parents: 622
diff changeset
    40
regular expressions with back-references to achieve
8ffa28fce271 all comments incorporated!!+related work
Chengsong
parents: 622
diff changeset
    41
a better balance between expressiveness and tractability.
608
37b6fd310a16 added related work chap
Chengsong
parents:
diff changeset
    42
37b6fd310a16 added related work chap
Chengsong
parents:
diff changeset
    43
624
8ffa28fce271 all comments incorporated!!+related work
Chengsong
parents: 622
diff changeset
    44
Fernau and Schmid \cite{FERNAU2015287} and Schmid \cite{Schmid2012}
8ffa28fce271 all comments incorporated!!+related work
Chengsong
parents: 622
diff changeset
    45
investigated the time complexity of different variants
8ffa28fce271 all comments incorporated!!+related work
Chengsong
parents: 622
diff changeset
    46
of back-references.
8ffa28fce271 all comments incorporated!!+related work
Chengsong
parents: 622
diff changeset
    47
8ffa28fce271 all comments incorporated!!+related work
Chengsong
parents: 622
diff changeset
    48
See \cite{BERGLUND2022} for a survey
608
37b6fd310a16 added related work chap
Chengsong
parents:
diff changeset
    49
of these works and comparison between different
624
8ffa28fce271 all comments incorporated!!+related work
Chengsong
parents: 622
diff changeset
    50
flavours  of back-references syntax (e.g. whether references can be circular, 
8ffa28fce271 all comments incorporated!!+related work
Chengsong
parents: 622
diff changeset
    51
can labels be repeated etc.).
608
37b6fd310a16 added related work chap
Chengsong
parents:
diff changeset
    52
609
61139fdddae0 chap1 totally done
Chengsong
parents: 608
diff changeset
    53
61139fdddae0 chap1 totally done
Chengsong
parents: 608
diff changeset
    54
\subsection{Matchers and Lexers with Mechanised Proofs}
61139fdddae0 chap1 totally done
Chengsong
parents: 608
diff changeset
    55
We are aware
61139fdddae0 chap1 totally done
Chengsong
parents: 608
diff changeset
    56
of a mechanised correctness proof of Brzozowski's derivative-based matcher in HOL4 by
61139fdddae0 chap1 totally done
Chengsong
parents: 608
diff changeset
    57
Owens and Slind~\parencite{Owens2008}. Another one in Isabelle/HOL is part
61139fdddae0 chap1 totally done
Chengsong
parents: 608
diff changeset
    58
of the work by Krauss and Nipkow \parencite{Krauss2011}.  And another one
61139fdddae0 chap1 totally done
Chengsong
parents: 608
diff changeset
    59
in Coq is given by Coquand and Siles \parencite{Coquand2012}.
61139fdddae0 chap1 totally done
Chengsong
parents: 608
diff changeset
    60
Also Ribeiro and Du Bois give one in Agda \parencite{RibeiroAgda2017}.
61139fdddae0 chap1 totally done
Chengsong
parents: 608
diff changeset
    61
 
61139fdddae0 chap1 totally done
Chengsong
parents: 608
diff changeset
    62
\subsection{Static Analysis of Evil Regex Patterns} 
61139fdddae0 chap1 totally done
Chengsong
parents: 608
diff changeset
    63
 When a regular expression does not behave as intended,
61139fdddae0 chap1 totally done
Chengsong
parents: 608
diff changeset
    64
people usually try to rewrite the regex to some equivalent form
61139fdddae0 chap1 totally done
Chengsong
parents: 608
diff changeset
    65
or they try to avoid the possibly problematic patterns completely,
61139fdddae0 chap1 totally done
Chengsong
parents: 608
diff changeset
    66
for which many false positives exist\parencite{Davis18}.
61139fdddae0 chap1 totally done
Chengsong
parents: 608
diff changeset
    67
Animated tools to "debug" regular expressions such as
61139fdddae0 chap1 totally done
Chengsong
parents: 608
diff changeset
    68
 \parencite{regexploit2021} \parencite{regex101} are also popular.
61139fdddae0 chap1 totally done
Chengsong
parents: 608
diff changeset
    69
We are also aware of static analysis work on regular expressions that
61139fdddae0 chap1 totally done
Chengsong
parents: 608
diff changeset
    70
aims to detect potentially expoential regex patterns. Rathnayake and Thielecke 
61139fdddae0 chap1 totally done
Chengsong
parents: 608
diff changeset
    71
\parencite{Rathnayake2014StaticAF} proposed an algorithm
61139fdddae0 chap1 totally done
Chengsong
parents: 608
diff changeset
    72
that detects regular expressions triggering exponential
61139fdddae0 chap1 totally done
Chengsong
parents: 608
diff changeset
    73
behavious on backtracking matchers.
61139fdddae0 chap1 totally done
Chengsong
parents: 608
diff changeset
    74
Weideman \parencite{Weideman2017Static} came up with 
61139fdddae0 chap1 totally done
Chengsong
parents: 608
diff changeset
    75
non-linear polynomial worst-time estimates
61139fdddae0 chap1 totally done
Chengsong
parents: 608
diff changeset
    76
for regexes, attack string that exploit the worst-time 
61139fdddae0 chap1 totally done
Chengsong
parents: 608
diff changeset
    77
scenario, and "attack automata" that generates
61139fdddae0 chap1 totally done
Chengsong
parents: 608
diff changeset
    78
attack strings.
622
4b1149fb5aec incorporated more comments, bib
Chengsong
parents: 609
diff changeset
    79
4b1149fb5aec incorporated more comments, bib
Chengsong
parents: 609
diff changeset
    80
4b1149fb5aec incorporated more comments, bib
Chengsong
parents: 609
diff changeset
    81
4b1149fb5aec incorporated more comments, bib
Chengsong
parents: 609
diff changeset
    82
4b1149fb5aec incorporated more comments, bib
Chengsong
parents: 609
diff changeset
    83
4b1149fb5aec incorporated more comments, bib
Chengsong
parents: 609
diff changeset
    84
4b1149fb5aec incorporated more comments, bib
Chengsong
parents: 609
diff changeset
    85
Thanks to our theorem-prover-friendly approach,
4b1149fb5aec incorporated more comments, bib
Chengsong
parents: 609
diff changeset
    86
we believe that 
4b1149fb5aec incorporated more comments, bib
Chengsong
parents: 609
diff changeset
    87
this finiteness bound can be improved to a bound
4b1149fb5aec incorporated more comments, bib
Chengsong
parents: 609
diff changeset
    88
linear to input and
4b1149fb5aec incorporated more comments, bib
Chengsong
parents: 609
diff changeset
    89
cubic to the regular expression size using a technique by
4b1149fb5aec incorporated more comments, bib
Chengsong
parents: 609
diff changeset
    90
Antimirov\cite{Antimirov95}.
4b1149fb5aec incorporated more comments, bib
Chengsong
parents: 609
diff changeset
    91
Once formalised, this would be a guarantee for the absence of all super-linear behavious.
4b1149fb5aec incorporated more comments, bib
Chengsong
parents: 609
diff changeset
    92
We are working out the
4b1149fb5aec incorporated more comments, bib
Chengsong
parents: 609
diff changeset
    93
details.
4b1149fb5aec incorporated more comments, bib
Chengsong
parents: 609
diff changeset
    94
4b1149fb5aec incorporated more comments, bib
Chengsong
parents: 609
diff changeset
    95
 
4b1149fb5aec incorporated more comments, bib
Chengsong
parents: 609
diff changeset
    96
To our best knowledge, no lexing libraries using Brzozowski derivatives
4b1149fb5aec incorporated more comments, bib
Chengsong
parents: 609
diff changeset
    97
have similar complexity-related bounds, 
4b1149fb5aec incorporated more comments, bib
Chengsong
parents: 609
diff changeset
    98
and claims about running time are usually speculative and backed by empirical
4b1149fb5aec incorporated more comments, bib
Chengsong
parents: 609
diff changeset
    99
evidence on a few test cases.
4b1149fb5aec incorporated more comments, bib
Chengsong
parents: 609
diff changeset
   100
If a matching or lexing algorithm
4b1149fb5aec incorporated more comments, bib
Chengsong
parents: 609
diff changeset
   101
does not come with certain basic complexity related 
4b1149fb5aec incorporated more comments, bib
Chengsong
parents: 609
diff changeset
   102
guarantees (for examaple the internal data structure size
4b1149fb5aec incorporated more comments, bib
Chengsong
parents: 609
diff changeset
   103
does not grow indefinitely), 
4b1149fb5aec incorporated more comments, bib
Chengsong
parents: 609
diff changeset
   104
then they cannot claim with confidence having solved the problem
4b1149fb5aec incorporated more comments, bib
Chengsong
parents: 609
diff changeset
   105
of catastrophic backtracking.