PhdThesisRealOne/LaTeXTemplates_masters-doctoral-thesis_v2/Chapters/Chapter1.tex
author Chengsong
Sun, 20 Mar 2022 23:32:08 +0000
changeset 456 26a5e640cdd7
child 465 2e7c7111c0be
permissions -rwxr-xr-x
realPhdThesis
Ignore whitespace changes - Everywhere: Within whitespace: At end of lines:
456
26a5e640cdd7 realPhdThesis
Chengsong
parents:
diff changeset
     1
% Chapter 1
26a5e640cdd7 realPhdThesis
Chengsong
parents:
diff changeset
     2
26a5e640cdd7 realPhdThesis
Chengsong
parents:
diff changeset
     3
\chapter{Introduction} % Main chapter title
26a5e640cdd7 realPhdThesis
Chengsong
parents:
diff changeset
     4
26a5e640cdd7 realPhdThesis
Chengsong
parents:
diff changeset
     5
\label{Chapter1} % For referencing the chapter elsewhere, use \ref{Chapter1} 
26a5e640cdd7 realPhdThesis
Chengsong
parents:
diff changeset
     6
26a5e640cdd7 realPhdThesis
Chengsong
parents:
diff changeset
     7
%----------------------------------------------------------------------------------------
26a5e640cdd7 realPhdThesis
Chengsong
parents:
diff changeset
     8
26a5e640cdd7 realPhdThesis
Chengsong
parents:
diff changeset
     9
% Define some commands to keep the formatting separated from the content 
26a5e640cdd7 realPhdThesis
Chengsong
parents:
diff changeset
    10
\newcommand{\keyword}[1]{\textbf{#1}}
26a5e640cdd7 realPhdThesis
Chengsong
parents:
diff changeset
    11
\newcommand{\tabhead}[1]{\textbf{#1}}
26a5e640cdd7 realPhdThesis
Chengsong
parents:
diff changeset
    12
\newcommand{\code}[1]{\texttt{#1}}
26a5e640cdd7 realPhdThesis
Chengsong
parents:
diff changeset
    13
\newcommand{\file}[1]{\texttt{\bfseries#1}}
26a5e640cdd7 realPhdThesis
Chengsong
parents:
diff changeset
    14
\newcommand{\option}[1]{\texttt{\itshape#1}}
26a5e640cdd7 realPhdThesis
Chengsong
parents:
diff changeset
    15
26a5e640cdd7 realPhdThesis
Chengsong
parents:
diff changeset
    16
26a5e640cdd7 realPhdThesis
Chengsong
parents:
diff changeset
    17
\newcommand{\dn}{\stackrel{\mbox{\scriptsize def}}{=}}%
26a5e640cdd7 realPhdThesis
Chengsong
parents:
diff changeset
    18
\newcommand{\ZERO}{\mbox{\bf 0}}
26a5e640cdd7 realPhdThesis
Chengsong
parents:
diff changeset
    19
\newcommand{\ONE}{\mbox{\bf 1}}
26a5e640cdd7 realPhdThesis
Chengsong
parents:
diff changeset
    20
\def\lexer{\mathit{lexer}}
26a5e640cdd7 realPhdThesis
Chengsong
parents:
diff changeset
    21
\def\mkeps{\mathit{mkeps}}
26a5e640cdd7 realPhdThesis
Chengsong
parents:
diff changeset
    22
26a5e640cdd7 realPhdThesis
Chengsong
parents:
diff changeset
    23
\def\DFA{\textit{DFA}}
26a5e640cdd7 realPhdThesis
Chengsong
parents:
diff changeset
    24
\def\bmkeps{\textit{bmkeps}}
26a5e640cdd7 realPhdThesis
Chengsong
parents:
diff changeset
    25
\def\retrieve{\textit{retrieve}}
26a5e640cdd7 realPhdThesis
Chengsong
parents:
diff changeset
    26
\def\blexer{\textit{blexer}}
26a5e640cdd7 realPhdThesis
Chengsong
parents:
diff changeset
    27
\def\flex{\textit{flex}}
26a5e640cdd7 realPhdThesis
Chengsong
parents:
diff changeset
    28
\def\inj{\mathit{inj}}
26a5e640cdd7 realPhdThesis
Chengsong
parents:
diff changeset
    29
\def\Empty{\mathit{Empty}}
26a5e640cdd7 realPhdThesis
Chengsong
parents:
diff changeset
    30
\def\Left{\mathit{Left}}
26a5e640cdd7 realPhdThesis
Chengsong
parents:
diff changeset
    31
\def\Right{\mathit{Right}}
26a5e640cdd7 realPhdThesis
Chengsong
parents:
diff changeset
    32
\def\Stars{\mathit{Stars}}
26a5e640cdd7 realPhdThesis
Chengsong
parents:
diff changeset
    33
\def\Char{\mathit{Char}}
26a5e640cdd7 realPhdThesis
Chengsong
parents:
diff changeset
    34
\def\Seq{\mathit{Seq}}
26a5e640cdd7 realPhdThesis
Chengsong
parents:
diff changeset
    35
\def\Der{\mathit{Der}}
26a5e640cdd7 realPhdThesis
Chengsong
parents:
diff changeset
    36
\def\nullable{\mathit{nullable}}
26a5e640cdd7 realPhdThesis
Chengsong
parents:
diff changeset
    37
\def\Z{\mathit{Z}}
26a5e640cdd7 realPhdThesis
Chengsong
parents:
diff changeset
    38
\def\S{\mathit{S}}
26a5e640cdd7 realPhdThesis
Chengsong
parents:
diff changeset
    39
\def\rup{r^\uparrow}
26a5e640cdd7 realPhdThesis
Chengsong
parents:
diff changeset
    40
\def\simp{\mathit{simp}}
26a5e640cdd7 realPhdThesis
Chengsong
parents:
diff changeset
    41
\def\simpALTs{\mathit{simp}\_\mathit{ALTs}}
26a5e640cdd7 realPhdThesis
Chengsong
parents:
diff changeset
    42
\def\map{\mathit{map}}
26a5e640cdd7 realPhdThesis
Chengsong
parents:
diff changeset
    43
\def\distinct{\mathit{distinct}}
26a5e640cdd7 realPhdThesis
Chengsong
parents:
diff changeset
    44
\def\blexersimp{\mathit{blexer}\_\mathit{simp}}
26a5e640cdd7 realPhdThesis
Chengsong
parents:
diff changeset
    45
%----------------------------------------------------------------------------------------
26a5e640cdd7 realPhdThesis
Chengsong
parents:
diff changeset
    46
26a5e640cdd7 realPhdThesis
Chengsong
parents:
diff changeset
    47
26a5e640cdd7 realPhdThesis
Chengsong
parents:
diff changeset
    48
26a5e640cdd7 realPhdThesis
Chengsong
parents:
diff changeset
    49
26a5e640cdd7 realPhdThesis
Chengsong
parents:
diff changeset
    50
Regular expression matching and lexing has been 
26a5e640cdd7 realPhdThesis
Chengsong
parents:
diff changeset
    51
 widely-used and well-implemented
26a5e640cdd7 realPhdThesis
Chengsong
parents:
diff changeset
    52
in software industry. 
26a5e640cdd7 realPhdThesis
Chengsong
parents:
diff changeset
    53
If you go to a popular programming language's
26a5e640cdd7 realPhdThesis
Chengsong
parents:
diff changeset
    54
regex engine,
26a5e640cdd7 realPhdThesis
Chengsong
parents:
diff changeset
    55
you can supply it with regex and strings of your own,
26a5e640cdd7 realPhdThesis
Chengsong
parents:
diff changeset
    56
and get matching/lexing  information such as how a 
26a5e640cdd7 realPhdThesis
Chengsong
parents:
diff changeset
    57
sub-part of the regex matches a sub-part of the string.
26a5e640cdd7 realPhdThesis
Chengsong
parents:
diff changeset
    58
These lexing libraries are on average quite fast.
26a5e640cdd7 realPhdThesis
Chengsong
parents:
diff changeset
    59
For example, the regex engines some network intrusion detection
26a5e640cdd7 realPhdThesis
Chengsong
parents:
diff changeset
    60
systems use are able to process
26a5e640cdd7 realPhdThesis
Chengsong
parents:
diff changeset
    61
 megabytes or even gigabytes of network traffic per second.
26a5e640cdd7 realPhdThesis
Chengsong
parents:
diff changeset
    62
 
26a5e640cdd7 realPhdThesis
Chengsong
parents:
diff changeset
    63
Why do we need to have our version, if the algorithms work well on 
26a5e640cdd7 realPhdThesis
Chengsong
parents:
diff changeset
    64
average?
26a5e640cdd7 realPhdThesis
Chengsong
parents:
diff changeset
    65
26a5e640cdd7 realPhdThesis
Chengsong
parents:
diff changeset
    66
26a5e640cdd7 realPhdThesis
Chengsong
parents:
diff changeset
    67
Take $(a^*)^*\,b$ and ask whether
26a5e640cdd7 realPhdThesis
Chengsong
parents:
diff changeset
    68
strings of the form $aa..a$ match this regular
26a5e640cdd7 realPhdThesis
Chengsong
parents:
diff changeset
    69
expression. Obviously this is not the case---the expected $b$ in the last
26a5e640cdd7 realPhdThesis
Chengsong
parents:
diff changeset
    70
position is missing. One would expect that modern regular expression
26a5e640cdd7 realPhdThesis
Chengsong
parents:
diff changeset
    71
matching engines can find this out very quickly. Alas, if one tries
26a5e640cdd7 realPhdThesis
Chengsong
parents:
diff changeset
    72
this example in JavaScript, Python or Java 8 with strings like 28
26a5e640cdd7 realPhdThesis
Chengsong
parents:
diff changeset
    73
$a$'s, one discovers that this decision takes around 30 seconds and
26a5e640cdd7 realPhdThesis
Chengsong
parents:
diff changeset
    74
takes considerably longer when adding a few more $a$'s, as the graphs
26a5e640cdd7 realPhdThesis
Chengsong
parents:
diff changeset
    75
below show:
26a5e640cdd7 realPhdThesis
Chengsong
parents:
diff changeset
    76
26a5e640cdd7 realPhdThesis
Chengsong
parents:
diff changeset
    77
\begin{center}
26a5e640cdd7 realPhdThesis
Chengsong
parents:
diff changeset
    78
\begin{tabular}{@{}c@{\hspace{0mm}}c@{\hspace{0mm}}c@{}}
26a5e640cdd7 realPhdThesis
Chengsong
parents:
diff changeset
    79
\begin{tikzpicture}
26a5e640cdd7 realPhdThesis
Chengsong
parents:
diff changeset
    80
\begin{axis}[
26a5e640cdd7 realPhdThesis
Chengsong
parents:
diff changeset
    81
    xlabel={$n$},
26a5e640cdd7 realPhdThesis
Chengsong
parents:
diff changeset
    82
    x label style={at={(1.05,-0.05)}},
26a5e640cdd7 realPhdThesis
Chengsong
parents:
diff changeset
    83
    ylabel={time in secs},
26a5e640cdd7 realPhdThesis
Chengsong
parents:
diff changeset
    84
    enlargelimits=false,
26a5e640cdd7 realPhdThesis
Chengsong
parents:
diff changeset
    85
    xtick={0,5,...,30},
26a5e640cdd7 realPhdThesis
Chengsong
parents:
diff changeset
    86
    xmax=33,
26a5e640cdd7 realPhdThesis
Chengsong
parents:
diff changeset
    87
    ymax=35,
26a5e640cdd7 realPhdThesis
Chengsong
parents:
diff changeset
    88
    ytick={0,5,...,30},
26a5e640cdd7 realPhdThesis
Chengsong
parents:
diff changeset
    89
    scaled ticks=false,
26a5e640cdd7 realPhdThesis
Chengsong
parents:
diff changeset
    90
    axis lines=left,
26a5e640cdd7 realPhdThesis
Chengsong
parents:
diff changeset
    91
    width=5cm,
26a5e640cdd7 realPhdThesis
Chengsong
parents:
diff changeset
    92
    height=4cm, 
26a5e640cdd7 realPhdThesis
Chengsong
parents:
diff changeset
    93
    legend entries={JavaScript},  
26a5e640cdd7 realPhdThesis
Chengsong
parents:
diff changeset
    94
    legend pos=north west,
26a5e640cdd7 realPhdThesis
Chengsong
parents:
diff changeset
    95
    legend cell align=left]
26a5e640cdd7 realPhdThesis
Chengsong
parents:
diff changeset
    96
\addplot[red,mark=*, mark options={fill=white}] table {re-js.data};
26a5e640cdd7 realPhdThesis
Chengsong
parents:
diff changeset
    97
\end{axis}
26a5e640cdd7 realPhdThesis
Chengsong
parents:
diff changeset
    98
\end{tikzpicture}
26a5e640cdd7 realPhdThesis
Chengsong
parents:
diff changeset
    99
  &
26a5e640cdd7 realPhdThesis
Chengsong
parents:
diff changeset
   100
\begin{tikzpicture}
26a5e640cdd7 realPhdThesis
Chengsong
parents:
diff changeset
   101
\begin{axis}[
26a5e640cdd7 realPhdThesis
Chengsong
parents:
diff changeset
   102
    xlabel={$n$},
26a5e640cdd7 realPhdThesis
Chengsong
parents:
diff changeset
   103
    x label style={at={(1.05,-0.05)}},
26a5e640cdd7 realPhdThesis
Chengsong
parents:
diff changeset
   104
    %ylabel={time in secs},
26a5e640cdd7 realPhdThesis
Chengsong
parents:
diff changeset
   105
    enlargelimits=false,
26a5e640cdd7 realPhdThesis
Chengsong
parents:
diff changeset
   106
    xtick={0,5,...,30},
26a5e640cdd7 realPhdThesis
Chengsong
parents:
diff changeset
   107
    xmax=33,
26a5e640cdd7 realPhdThesis
Chengsong
parents:
diff changeset
   108
    ymax=35,
26a5e640cdd7 realPhdThesis
Chengsong
parents:
diff changeset
   109
    ytick={0,5,...,30},
26a5e640cdd7 realPhdThesis
Chengsong
parents:
diff changeset
   110
    scaled ticks=false,
26a5e640cdd7 realPhdThesis
Chengsong
parents:
diff changeset
   111
    axis lines=left,
26a5e640cdd7 realPhdThesis
Chengsong
parents:
diff changeset
   112
    width=5cm,
26a5e640cdd7 realPhdThesis
Chengsong
parents:
diff changeset
   113
    height=4cm, 
26a5e640cdd7 realPhdThesis
Chengsong
parents:
diff changeset
   114
    legend entries={Python},  
26a5e640cdd7 realPhdThesis
Chengsong
parents:
diff changeset
   115
    legend pos=north west,
26a5e640cdd7 realPhdThesis
Chengsong
parents:
diff changeset
   116
    legend cell align=left]
26a5e640cdd7 realPhdThesis
Chengsong
parents:
diff changeset
   117
\addplot[blue,mark=*, mark options={fill=white}] table {re-python2.data};
26a5e640cdd7 realPhdThesis
Chengsong
parents:
diff changeset
   118
\end{axis}
26a5e640cdd7 realPhdThesis
Chengsong
parents:
diff changeset
   119
\end{tikzpicture}
26a5e640cdd7 realPhdThesis
Chengsong
parents:
diff changeset
   120
  &
26a5e640cdd7 realPhdThesis
Chengsong
parents:
diff changeset
   121
\begin{tikzpicture}
26a5e640cdd7 realPhdThesis
Chengsong
parents:
diff changeset
   122
\begin{axis}[
26a5e640cdd7 realPhdThesis
Chengsong
parents:
diff changeset
   123
    xlabel={$n$},
26a5e640cdd7 realPhdThesis
Chengsong
parents:
diff changeset
   124
    x label style={at={(1.05,-0.05)}},
26a5e640cdd7 realPhdThesis
Chengsong
parents:
diff changeset
   125
    %ylabel={time in secs},
26a5e640cdd7 realPhdThesis
Chengsong
parents:
diff changeset
   126
    enlargelimits=false,
26a5e640cdd7 realPhdThesis
Chengsong
parents:
diff changeset
   127
    xtick={0,5,...,30},
26a5e640cdd7 realPhdThesis
Chengsong
parents:
diff changeset
   128
    xmax=33,
26a5e640cdd7 realPhdThesis
Chengsong
parents:
diff changeset
   129
    ymax=35,
26a5e640cdd7 realPhdThesis
Chengsong
parents:
diff changeset
   130
    ytick={0,5,...,30},
26a5e640cdd7 realPhdThesis
Chengsong
parents:
diff changeset
   131
    scaled ticks=false,
26a5e640cdd7 realPhdThesis
Chengsong
parents:
diff changeset
   132
    axis lines=left,
26a5e640cdd7 realPhdThesis
Chengsong
parents:
diff changeset
   133
    width=5cm,
26a5e640cdd7 realPhdThesis
Chengsong
parents:
diff changeset
   134
    height=4cm, 
26a5e640cdd7 realPhdThesis
Chengsong
parents:
diff changeset
   135
    legend entries={Java 8},  
26a5e640cdd7 realPhdThesis
Chengsong
parents:
diff changeset
   136
    legend pos=north west,
26a5e640cdd7 realPhdThesis
Chengsong
parents:
diff changeset
   137
    legend cell align=left]
26a5e640cdd7 realPhdThesis
Chengsong
parents:
diff changeset
   138
\addplot[cyan,mark=*, mark options={fill=white}] table {re-java.data};
26a5e640cdd7 realPhdThesis
Chengsong
parents:
diff changeset
   139
\end{axis}
26a5e640cdd7 realPhdThesis
Chengsong
parents:
diff changeset
   140
\end{tikzpicture}\\
26a5e640cdd7 realPhdThesis
Chengsong
parents:
diff changeset
   141
\multicolumn{3}{c}{Graphs: Runtime for matching $(a^*)^*\,b$ with strings 
26a5e640cdd7 realPhdThesis
Chengsong
parents:
diff changeset
   142
           of the form $\underbrace{aa..a}_{n}$.}
26a5e640cdd7 realPhdThesis
Chengsong
parents:
diff changeset
   143
\end{tabular}    
26a5e640cdd7 realPhdThesis
Chengsong
parents:
diff changeset
   144
\end{center}  
26a5e640cdd7 realPhdThesis
Chengsong
parents:
diff changeset
   145
26a5e640cdd7 realPhdThesis
Chengsong
parents:
diff changeset
   146
26a5e640cdd7 realPhdThesis
Chengsong
parents:
diff changeset
   147
This is clearly exponential behaviour, and 
26a5e640cdd7 realPhdThesis
Chengsong
parents:
diff changeset
   148
is triggered by some relatively simple regex patterns.
26a5e640cdd7 realPhdThesis
Chengsong
parents:
diff changeset
   149
26a5e640cdd7 realPhdThesis
Chengsong
parents:
diff changeset
   150
26a5e640cdd7 realPhdThesis
Chengsong
parents:
diff changeset
   151
26a5e640cdd7 realPhdThesis
Chengsong
parents:
diff changeset
   152
The opens up the possibility of
26a5e640cdd7 realPhdThesis
Chengsong
parents:
diff changeset
   153
 a ReDoS (regular expression denial-of-service ) attack.
26a5e640cdd7 realPhdThesis
Chengsong
parents:
diff changeset
   154
26a5e640cdd7 realPhdThesis
Chengsong
parents:
diff changeset
   155
Theoretical results say that regular expression matching
26a5e640cdd7 realPhdThesis
Chengsong
parents:
diff changeset
   156
should be linear with respect to the input. You could construct
26a5e640cdd7 realPhdThesis
Chengsong
parents:
diff changeset
   157
an NFA via Thompson construction, and simulate  running it.
26a5e640cdd7 realPhdThesis
Chengsong
parents:
diff changeset
   158
This would be linear.
26a5e640cdd7 realPhdThesis
Chengsong
parents:
diff changeset
   159
Or you could determinize the NFA into a DFA, and minimize that
26a5e640cdd7 realPhdThesis
Chengsong
parents:
diff changeset
   160
DFA. Once you have the DFA, the running time is also linear, requiring just 
26a5e640cdd7 realPhdThesis
Chengsong
parents:
diff changeset
   161
one scanning pass of the input.
26a5e640cdd7 realPhdThesis
Chengsong
parents:
diff changeset
   162
26a5e640cdd7 realPhdThesis
Chengsong
parents:
diff changeset
   163
But modern  regex libraries in popular language engines
26a5e640cdd7 realPhdThesis
Chengsong
parents:
diff changeset
   164
 often want to support richer constructs
26a5e640cdd7 realPhdThesis
Chengsong
parents:
diff changeset
   165
than  the most basic regular expressions such as bounded repetitions
26a5e640cdd7 realPhdThesis
Chengsong
parents:
diff changeset
   166
and back references. 
26a5e640cdd7 realPhdThesis
Chengsong
parents:
diff changeset
   167
%put in illustrations
26a5e640cdd7 realPhdThesis
Chengsong
parents:
diff changeset
   168
%a{1}{3}
26a5e640cdd7 realPhdThesis
Chengsong
parents:
diff changeset
   169
These make a DFA construction impossible because
26a5e640cdd7 realPhdThesis
Chengsong
parents:
diff changeset
   170
of an exponential states explosion.
26a5e640cdd7 realPhdThesis
Chengsong
parents:
diff changeset
   171
 They also want to support lexing rather than just matching
26a5e640cdd7 realPhdThesis
Chengsong
parents:
diff changeset
   172
 for tasks that involves text processing.
26a5e640cdd7 realPhdThesis
Chengsong
parents:
diff changeset
   173
 
26a5e640cdd7 realPhdThesis
Chengsong
parents:
diff changeset
   174
 Existing regex libraries either pose restrictions on the user input, or does 
26a5e640cdd7 realPhdThesis
Chengsong
parents:
diff changeset
   175
 not give linear running time guarantee.
26a5e640cdd7 realPhdThesis
Chengsong
parents:
diff changeset
   176
 %TODO: give examples such as RE2 GOLANG 1000 restriction, rust no repetitions 
26a5e640cdd7 realPhdThesis
Chengsong
parents:
diff changeset
   177
 For example, the Rust regex engine claims to be linear, 
26a5e640cdd7 realPhdThesis
Chengsong
parents:
diff changeset
   178
 but does not support lookarounds and back-references.
26a5e640cdd7 realPhdThesis
Chengsong
parents:
diff changeset
   179
 The GoLang regex library does not support over 1000 repetitions.  
26a5e640cdd7 realPhdThesis
Chengsong
parents:
diff changeset
   180
 Java and Python both support back-references, but shows
26a5e640cdd7 realPhdThesis
Chengsong
parents:
diff changeset
   181
catastrophic backtracking behaviours on inputs without back-references(
26a5e640cdd7 realPhdThesis
Chengsong
parents:
diff changeset
   182
when the language is still regular).
26a5e640cdd7 realPhdThesis
Chengsong
parents:
diff changeset
   183
 %TODO: test performance of Rust on (((((a*a*)b*)b){20})*)c  baabaabababaabaaaaaaaaababaaaababababaaaabaaabaaaaaabaabaabababaababaaaaaaaaababaaaababababaaaaaaaaaaaaac
26a5e640cdd7 realPhdThesis
Chengsong
parents:
diff changeset
   184
 %TODO: verify the fact Rust does not allow 1000+ reps
26a5e640cdd7 realPhdThesis
Chengsong
parents:
diff changeset
   185
 %TODO: Java 17 updated graphs? Is it ok to still use Java 8 graphs?
26a5e640cdd7 realPhdThesis
Chengsong
parents:
diff changeset
   186
 Another thing about the these libraries is that there
26a5e640cdd7 realPhdThesis
Chengsong
parents:
diff changeset
   187
 is no correctness guarantee.
26a5e640cdd7 realPhdThesis
Chengsong
parents:
diff changeset
   188
 In some cases they either fails to generate a lexing result when there is a match,
26a5e640cdd7 realPhdThesis
Chengsong
parents:
diff changeset
   189
 or gives the wrong way of matching.
26a5e640cdd7 realPhdThesis
Chengsong
parents:
diff changeset
   190
 
26a5e640cdd7 realPhdThesis
Chengsong
parents:
diff changeset
   191
26a5e640cdd7 realPhdThesis
Chengsong
parents:
diff changeset
   192
This superlinear blowup in matching algorithms sometimes causes
26a5e640cdd7 realPhdThesis
Chengsong
parents:
diff changeset
   193
considerable grief in real life: for example on 20 July 2016 one evil
26a5e640cdd7 realPhdThesis
Chengsong
parents:
diff changeset
   194
regular expression brought the webpage
26a5e640cdd7 realPhdThesis
Chengsong
parents:
diff changeset
   195
\href{http://stackexchange.com}{Stack Exchange} to its
26a5e640cdd7 realPhdThesis
Chengsong
parents:
diff changeset
   196
%knees.\footnote{\url{https://stackstatus.net/post/147710624694/outage-postmortem-july-20-2016}}
26a5e640cdd7 realPhdThesis
Chengsong
parents:
diff changeset
   197
In this instance, a regular expression intended to just trim white
26a5e640cdd7 realPhdThesis
Chengsong
parents:
diff changeset
   198
spaces from the beginning and the end of a line actually consumed
26a5e640cdd7 realPhdThesis
Chengsong
parents:
diff changeset
   199
massive amounts of CPU-resources---causing web servers to grind to a
26a5e640cdd7 realPhdThesis
Chengsong
parents:
diff changeset
   200
halt. This happened when a post with 20,000 white spaces was submitted,
26a5e640cdd7 realPhdThesis
Chengsong
parents:
diff changeset
   201
but importantly the white spaces were neither at the beginning nor at
26a5e640cdd7 realPhdThesis
Chengsong
parents:
diff changeset
   202
the end. As a result, the regular expression matching engine needed to
26a5e640cdd7 realPhdThesis
Chengsong
parents:
diff changeset
   203
backtrack over many choices. In this example, the time needed to process
26a5e640cdd7 realPhdThesis
Chengsong
parents:
diff changeset
   204
the string was $O(n^2)$ with respect to the string length. This
26a5e640cdd7 realPhdThesis
Chengsong
parents:
diff changeset
   205
quadratic overhead was enough for the homepage of Stack Exchange to
26a5e640cdd7 realPhdThesis
Chengsong
parents:
diff changeset
   206
respond so slowly that the load balancer assumed there must be some
26a5e640cdd7 realPhdThesis
Chengsong
parents:
diff changeset
   207
attack and therefore stopped the servers from responding to any
26a5e640cdd7 realPhdThesis
Chengsong
parents:
diff changeset
   208
requests. This made the whole site become unavailable. Another very
26a5e640cdd7 realPhdThesis
Chengsong
parents:
diff changeset
   209
recent example is a global outage of all Cloudflare servers on 2 July
26a5e640cdd7 realPhdThesis
Chengsong
parents:
diff changeset
   210
2019. A poorly written regular expression exhibited exponential
26a5e640cdd7 realPhdThesis
Chengsong
parents:
diff changeset
   211
behaviour and exhausted CPUs that serve HTTP traffic. Although the
26a5e640cdd7 realPhdThesis
Chengsong
parents:
diff changeset
   212
outage had several causes, at the heart was a regular expression that
26a5e640cdd7 realPhdThesis
Chengsong
parents:
diff changeset
   213
was used to monitor network
26a5e640cdd7 realPhdThesis
Chengsong
parents:
diff changeset
   214
%traffic.\footnote{\url{https://blog.cloudflare.com/details-of-the-cloudflare-outage-on-july-2-2019/}}
26a5e640cdd7 realPhdThesis
Chengsong
parents:
diff changeset
   215
 
26a5e640cdd7 realPhdThesis
Chengsong
parents:
diff changeset
   216
 Is it possible to have a regex lexing algorithm with proven correctness and 
26a5e640cdd7 realPhdThesis
Chengsong
parents:
diff changeset
   217
 time complexity, which allows easy extensions to
26a5e640cdd7 realPhdThesis
Chengsong
parents:
diff changeset
   218
  constructs like 
26a5e640cdd7 realPhdThesis
Chengsong
parents:
diff changeset
   219
 bounded repetitions, negation,  lookarounds, and even back-references? 
26a5e640cdd7 realPhdThesis
Chengsong
parents:
diff changeset
   220
 
26a5e640cdd7 realPhdThesis
Chengsong
parents:
diff changeset
   221
 We propose Brzozowski's derivatives as a solution to this problem.
26a5e640cdd7 realPhdThesis
Chengsong
parents:
diff changeset
   222
 
26a5e640cdd7 realPhdThesis
Chengsong
parents:
diff changeset
   223
 
26a5e640cdd7 realPhdThesis
Chengsong
parents:
diff changeset
   224
 \section{Why Brzozowski}
26a5e640cdd7 realPhdThesis
Chengsong
parents:
diff changeset
   225
 In the last fifteen or so years, Brzozowski's derivatives of regular
26a5e640cdd7 realPhdThesis
Chengsong
parents:
diff changeset
   226
expressions have sparked quite a bit of interest in the functional
26a5e640cdd7 realPhdThesis
Chengsong
parents:
diff changeset
   227
programming and theorem prover communities.  
26a5e640cdd7 realPhdThesis
Chengsong
parents:
diff changeset
   228
The beauty of
26a5e640cdd7 realPhdThesis
Chengsong
parents:
diff changeset
   229
Brzozowski's derivatives \parencite{Brzozowski1964} is that they are neatly
26a5e640cdd7 realPhdThesis
Chengsong
parents:
diff changeset
   230
expressible in any functional language, and easily definable and
26a5e640cdd7 realPhdThesis
Chengsong
parents:
diff changeset
   231
reasoned about in theorem provers---the definitions just consist of
26a5e640cdd7 realPhdThesis
Chengsong
parents:
diff changeset
   232
inductive datatypes and simple recursive functions. 
26a5e640cdd7 realPhdThesis
Chengsong
parents:
diff changeset
   233
26a5e640cdd7 realPhdThesis
Chengsong
parents:
diff changeset
   234
Suppose we have an alphabet $\Sigma$, the strings  whose characters
26a5e640cdd7 realPhdThesis
Chengsong
parents:
diff changeset
   235
are from $\Sigma$
26a5e640cdd7 realPhdThesis
Chengsong
parents:
diff changeset
   236
can be expressed as $\Sigma^*$.
26a5e640cdd7 realPhdThesis
Chengsong
parents:
diff changeset
   237
26a5e640cdd7 realPhdThesis
Chengsong
parents:
diff changeset
   238
We use patterns to define a set of strings concisely. Regular expressions
26a5e640cdd7 realPhdThesis
Chengsong
parents:
diff changeset
   239
are one of such patterns systems:
26a5e640cdd7 realPhdThesis
Chengsong
parents:
diff changeset
   240
The basic regular expressions  are defined inductively
26a5e640cdd7 realPhdThesis
Chengsong
parents:
diff changeset
   241
 by the following grammar:
26a5e640cdd7 realPhdThesis
Chengsong
parents:
diff changeset
   242
\[			r ::=   \ZERO \mid  \ONE
26a5e640cdd7 realPhdThesis
Chengsong
parents:
diff changeset
   243
			 \mid  c  
26a5e640cdd7 realPhdThesis
Chengsong
parents:
diff changeset
   244
			 \mid  r_1 \cdot r_2
26a5e640cdd7 realPhdThesis
Chengsong
parents:
diff changeset
   245
			 \mid  r_1 + r_2   
26a5e640cdd7 realPhdThesis
Chengsong
parents:
diff changeset
   246
			 \mid r^*         
26a5e640cdd7 realPhdThesis
Chengsong
parents:
diff changeset
   247
\]
26a5e640cdd7 realPhdThesis
Chengsong
parents:
diff changeset
   248
26a5e640cdd7 realPhdThesis
Chengsong
parents:
diff changeset
   249
The language or set of strings defined by regular expressions are defined as
26a5e640cdd7 realPhdThesis
Chengsong
parents:
diff changeset
   250
%TODO: FILL in the other defs
26a5e640cdd7 realPhdThesis
Chengsong
parents:
diff changeset
   251
\begin{center}
26a5e640cdd7 realPhdThesis
Chengsong
parents:
diff changeset
   252
\begin{tabular}{lcl}
26a5e640cdd7 realPhdThesis
Chengsong
parents:
diff changeset
   253
$L \; r_1 + r_2$ & $\dn$ & $ L \; r_1 \cup L \; r_2$\\
26a5e640cdd7 realPhdThesis
Chengsong
parents:
diff changeset
   254
$L \; r_1 \cdot r_2$ & $\dn$ & $ L \; r_1 \cap L \; r_2$\\
26a5e640cdd7 realPhdThesis
Chengsong
parents:
diff changeset
   255
\end{tabular}
26a5e640cdd7 realPhdThesis
Chengsong
parents:
diff changeset
   256
\end{center}
26a5e640cdd7 realPhdThesis
Chengsong
parents:
diff changeset
   257
Which are also called the "language interpretation".
26a5e640cdd7 realPhdThesis
Chengsong
parents:
diff changeset
   258
26a5e640cdd7 realPhdThesis
Chengsong
parents:
diff changeset
   259
26a5e640cdd7 realPhdThesis
Chengsong
parents:
diff changeset
   260
26a5e640cdd7 realPhdThesis
Chengsong
parents:
diff changeset
   261
The Brzozowski derivative w.r.t character $c$ is an operation on the regex,
26a5e640cdd7 realPhdThesis
Chengsong
parents:
diff changeset
   262
where the operation transforms the regex to a new one containing
26a5e640cdd7 realPhdThesis
Chengsong
parents:
diff changeset
   263
strings without the head character $c$.
26a5e640cdd7 realPhdThesis
Chengsong
parents:
diff changeset
   264
26a5e640cdd7 realPhdThesis
Chengsong
parents:
diff changeset
   265
Formally, we define first such a transformation on any string set, which
26a5e640cdd7 realPhdThesis
Chengsong
parents:
diff changeset
   266
we call semantic derivative:
26a5e640cdd7 realPhdThesis
Chengsong
parents:
diff changeset
   267
\begin{center}
26a5e640cdd7 realPhdThesis
Chengsong
parents:
diff changeset
   268
$\Der \; c\; \textit{StringSet} = \{s \mid c :: s \in StringSet\}$
26a5e640cdd7 realPhdThesis
Chengsong
parents:
diff changeset
   269
\end{center}
26a5e640cdd7 realPhdThesis
Chengsong
parents:
diff changeset
   270
Mathematically, it can be expressed as the 
26a5e640cdd7 realPhdThesis
Chengsong
parents:
diff changeset
   271
26a5e640cdd7 realPhdThesis
Chengsong
parents:
diff changeset
   272
If the $\textit{StringSet}$ happen to have some structure, for example,
26a5e640cdd7 realPhdThesis
Chengsong
parents:
diff changeset
   273
if it is regular, then we have that it
26a5e640cdd7 realPhdThesis
Chengsong
parents:
diff changeset
   274
26a5e640cdd7 realPhdThesis
Chengsong
parents:
diff changeset
   275
The the derivative of regular expression, denoted as
26a5e640cdd7 realPhdThesis
Chengsong
parents:
diff changeset
   276
$r \backslash c$, is a function that takes parameters
26a5e640cdd7 realPhdThesis
Chengsong
parents:
diff changeset
   277
$r$ and $c$, and returns another regular expression $r'$,
26a5e640cdd7 realPhdThesis
Chengsong
parents:
diff changeset
   278
which is computed by the following recursive function:
26a5e640cdd7 realPhdThesis
Chengsong
parents:
diff changeset
   279
26a5e640cdd7 realPhdThesis
Chengsong
parents:
diff changeset
   280
\begin{center}
26a5e640cdd7 realPhdThesis
Chengsong
parents:
diff changeset
   281
\begin{tabular}{lcl}
26a5e640cdd7 realPhdThesis
Chengsong
parents:
diff changeset
   282
		$\ZERO \backslash c$ & $\dn$ & $\ZERO$\\  
26a5e640cdd7 realPhdThesis
Chengsong
parents:
diff changeset
   283
		$\ONE \backslash c$  & $\dn$ & $\ZERO$\\
26a5e640cdd7 realPhdThesis
Chengsong
parents:
diff changeset
   284
		$d \backslash c$     & $\dn$ & 
26a5e640cdd7 realPhdThesis
Chengsong
parents:
diff changeset
   285
		$\mathit{if} \;c = d\;\mathit{then}\;\ONE\;\mathit{else}\;\ZERO$\\
26a5e640cdd7 realPhdThesis
Chengsong
parents:
diff changeset
   286
$(r_1 + r_2)\backslash c$     & $\dn$ & $r_1 \backslash c \,+\, r_2 \backslash c$\\
26a5e640cdd7 realPhdThesis
Chengsong
parents:
diff changeset
   287
$(r_1 \cdot r_2)\backslash c$ & $\dn$ & $\mathit{if} \, nullable(r_1)$\\
26a5e640cdd7 realPhdThesis
Chengsong
parents:
diff changeset
   288
	&   & $\mathit{then}\;(r_1\backslash c) \cdot r_2 \,+\, r_2\backslash c$\\
26a5e640cdd7 realPhdThesis
Chengsong
parents:
diff changeset
   289
	&   & $\mathit{else}\;(r_1\backslash c) \cdot r_2$\\
26a5e640cdd7 realPhdThesis
Chengsong
parents:
diff changeset
   290
	$(r^*)\backslash c$           & $\dn$ & $(r\backslash c) \cdot r^*$\\
26a5e640cdd7 realPhdThesis
Chengsong
parents:
diff changeset
   291
\end{tabular}
26a5e640cdd7 realPhdThesis
Chengsong
parents:
diff changeset
   292
\end{center}
26a5e640cdd7 realPhdThesis
Chengsong
parents:
diff changeset
   293
\noindent
26a5e640cdd7 realPhdThesis
Chengsong
parents:
diff changeset
   294
\noindent
26a5e640cdd7 realPhdThesis
Chengsong
parents:
diff changeset
   295
26a5e640cdd7 realPhdThesis
Chengsong
parents:
diff changeset
   296
The $\nullable$ function tests whether the empty string $""$ 
26a5e640cdd7 realPhdThesis
Chengsong
parents:
diff changeset
   297
is in the language of $r$:
26a5e640cdd7 realPhdThesis
Chengsong
parents:
diff changeset
   298
26a5e640cdd7 realPhdThesis
Chengsong
parents:
diff changeset
   299
26a5e640cdd7 realPhdThesis
Chengsong
parents:
diff changeset
   300
\begin{center}
26a5e640cdd7 realPhdThesis
Chengsong
parents:
diff changeset
   301
		\begin{tabular}{lcl}
26a5e640cdd7 realPhdThesis
Chengsong
parents:
diff changeset
   302
			$\nullable(\ZERO)$     & $\dn$ & $\mathit{false}$ \\  
26a5e640cdd7 realPhdThesis
Chengsong
parents:
diff changeset
   303
			$\nullable(\ONE)$      & $\dn$ & $\mathit{true}$ \\
26a5e640cdd7 realPhdThesis
Chengsong
parents:
diff changeset
   304
			$\nullable(c)$ 	       & $\dn$ & $\mathit{false}$ \\
26a5e640cdd7 realPhdThesis
Chengsong
parents:
diff changeset
   305
			$\nullable(r_1 + r_2)$ & $\dn$ & $\nullable(r_1) \vee \nullable(r_2)$ \\
26a5e640cdd7 realPhdThesis
Chengsong
parents:
diff changeset
   306
			$\nullable(r_1\cdot r_2)$  & $\dn$ & $\nullable(r_1) \wedge \nullable(r_2)$ \\
26a5e640cdd7 realPhdThesis
Chengsong
parents:
diff changeset
   307
			$\nullable(r^*)$       & $\dn$ & $\mathit{true}$ \\
26a5e640cdd7 realPhdThesis
Chengsong
parents:
diff changeset
   308
		\end{tabular}
26a5e640cdd7 realPhdThesis
Chengsong
parents:
diff changeset
   309
\end{center}
26a5e640cdd7 realPhdThesis
Chengsong
parents:
diff changeset
   310
\noindent
26a5e640cdd7 realPhdThesis
Chengsong
parents:
diff changeset
   311
The empty set does not contain any string and
26a5e640cdd7 realPhdThesis
Chengsong
parents:
diff changeset
   312
therefore not the empty string, the empty string 
26a5e640cdd7 realPhdThesis
Chengsong
parents:
diff changeset
   313
regular expression contains the empty string
26a5e640cdd7 realPhdThesis
Chengsong
parents:
diff changeset
   314
by definition, the character regular expression
26a5e640cdd7 realPhdThesis
Chengsong
parents:
diff changeset
   315
is the singleton that contains character only,
26a5e640cdd7 realPhdThesis
Chengsong
parents:
diff changeset
   316
and therefore does not contain the empty string,
26a5e640cdd7 realPhdThesis
Chengsong
parents:
diff changeset
   317
the alternative regular expression(or "or" expression)
26a5e640cdd7 realPhdThesis
Chengsong
parents:
diff changeset
   318
might have one of its children regular expressions
26a5e640cdd7 realPhdThesis
Chengsong
parents:
diff changeset
   319
being nullable and any one of its children being nullable
26a5e640cdd7 realPhdThesis
Chengsong
parents:
diff changeset
   320
would suffice. The sequence regular expression
26a5e640cdd7 realPhdThesis
Chengsong
parents:
diff changeset
   321
would require both children to have the empty string
26a5e640cdd7 realPhdThesis
Chengsong
parents:
diff changeset
   322
to compose an empty string and the Kleene star
26a5e640cdd7 realPhdThesis
Chengsong
parents:
diff changeset
   323
operation naturally introduced the empty string.
26a5e640cdd7 realPhdThesis
Chengsong
parents:
diff changeset
   324
26a5e640cdd7 realPhdThesis
Chengsong
parents:
diff changeset
   325
We can give the meaning of regular expressions derivatives
26a5e640cdd7 realPhdThesis
Chengsong
parents:
diff changeset
   326
by language interpretation:
26a5e640cdd7 realPhdThesis
Chengsong
parents:
diff changeset
   327
26a5e640cdd7 realPhdThesis
Chengsong
parents:
diff changeset
   328
26a5e640cdd7 realPhdThesis
Chengsong
parents:
diff changeset
   329
 
26a5e640cdd7 realPhdThesis
Chengsong
parents:
diff changeset
   330
 
26a5e640cdd7 realPhdThesis
Chengsong
parents:
diff changeset
   331
 
26a5e640cdd7 realPhdThesis
Chengsong
parents:
diff changeset
   332
Derivatives give a simple solution
26a5e640cdd7 realPhdThesis
Chengsong
parents:
diff changeset
   333
to the problem of matching a string $s$ with a regular
26a5e640cdd7 realPhdThesis
Chengsong
parents:
diff changeset
   334
expression $r$: if the derivative of $r$ w.r.t.\ (in
26a5e640cdd7 realPhdThesis
Chengsong
parents:
diff changeset
   335
succession) all the characters of the string matches the empty string,
26a5e640cdd7 realPhdThesis
Chengsong
parents:
diff changeset
   336
then $r$ matches $s$ (and {\em vice versa}).  
26a5e640cdd7 realPhdThesis
Chengsong
parents:
diff changeset
   337
26a5e640cdd7 realPhdThesis
Chengsong
parents:
diff changeset
   338
26a5e640cdd7 realPhdThesis
Chengsong
parents:
diff changeset
   339
26a5e640cdd7 realPhdThesis
Chengsong
parents:
diff changeset
   340
However, there are two difficulties with derivative-based matchers:
26a5e640cdd7 realPhdThesis
Chengsong
parents:
diff changeset
   341
First, Brzozowski's original matcher only generates a yes/no answer
26a5e640cdd7 realPhdThesis
Chengsong
parents:
diff changeset
   342
for whether a regular expression matches a string or not.  This is too
26a5e640cdd7 realPhdThesis
Chengsong
parents:
diff changeset
   343
little information in the context of lexing where separate tokens must
26a5e640cdd7 realPhdThesis
Chengsong
parents:
diff changeset
   344
be identified and also classified (for example as keywords
26a5e640cdd7 realPhdThesis
Chengsong
parents:
diff changeset
   345
or identifiers).  Sulzmann and Lu~\cite{Sulzmann2014} overcome this
26a5e640cdd7 realPhdThesis
Chengsong
parents:
diff changeset
   346
difficulty by cleverly extending Brzozowski's matching
26a5e640cdd7 realPhdThesis
Chengsong
parents:
diff changeset
   347
algorithm. Their extended version generates additional information on
26a5e640cdd7 realPhdThesis
Chengsong
parents:
diff changeset
   348
\emph{how} a regular expression matches a string following the POSIX
26a5e640cdd7 realPhdThesis
Chengsong
parents:
diff changeset
   349
rules for regular expression matching. They achieve this by adding a
26a5e640cdd7 realPhdThesis
Chengsong
parents:
diff changeset
   350
second ``phase'' to Brzozowski's algorithm involving an injection
26a5e640cdd7 realPhdThesis
Chengsong
parents:
diff changeset
   351
function.  In our own earlier work we provided the formal
26a5e640cdd7 realPhdThesis
Chengsong
parents:
diff changeset
   352
specification of what POSIX matching means and proved in Isabelle/HOL
26a5e640cdd7 realPhdThesis
Chengsong
parents:
diff changeset
   353
the correctness
26a5e640cdd7 realPhdThesis
Chengsong
parents:
diff changeset
   354
of Sulzmann and Lu's extended algorithm accordingly
26a5e640cdd7 realPhdThesis
Chengsong
parents:
diff changeset
   355
\cite{AusafDyckhoffUrban2016}.
26a5e640cdd7 realPhdThesis
Chengsong
parents:
diff changeset
   356
26a5e640cdd7 realPhdThesis
Chengsong
parents:
diff changeset
   357
The second difficulty is that Brzozowski's derivatives can 
26a5e640cdd7 realPhdThesis
Chengsong
parents:
diff changeset
   358
grow to arbitrarily big sizes. For example if we start with the
26a5e640cdd7 realPhdThesis
Chengsong
parents:
diff changeset
   359
regular expression $(a+aa)^*$ and take
26a5e640cdd7 realPhdThesis
Chengsong
parents:
diff changeset
   360
successive derivatives according to the character $a$, we end up with
26a5e640cdd7 realPhdThesis
Chengsong
parents:
diff changeset
   361
a sequence of ever-growing derivatives like 
26a5e640cdd7 realPhdThesis
Chengsong
parents:
diff changeset
   362
26a5e640cdd7 realPhdThesis
Chengsong
parents:
diff changeset
   363
\def\ll{\stackrel{\_\backslash{} a}{\longrightarrow}}
26a5e640cdd7 realPhdThesis
Chengsong
parents:
diff changeset
   364
\begin{center}
26a5e640cdd7 realPhdThesis
Chengsong
parents:
diff changeset
   365
\begin{tabular}{rll}
26a5e640cdd7 realPhdThesis
Chengsong
parents:
diff changeset
   366
$(a + aa)^*$ & $\ll$ & $(\ONE + \ONE{}a) \cdot (a + aa)^*$\\
26a5e640cdd7 realPhdThesis
Chengsong
parents:
diff changeset
   367
& $\ll$ & $(\ZERO + \ZERO{}a + \ONE) \cdot (a + aa)^* \;+\; (\ONE + \ONE{}a) \cdot (a + aa)^*$\\
26a5e640cdd7 realPhdThesis
Chengsong
parents:
diff changeset
   368
& $\ll$ & $(\ZERO + \ZERO{}a + \ZERO) \cdot (a + aa)^* + (\ONE + \ONE{}a) \cdot (a + aa)^* \;+\; $\\
26a5e640cdd7 realPhdThesis
Chengsong
parents:
diff changeset
   369
& & $\qquad(\ZERO + \ZERO{}a + \ONE) \cdot (a + aa)^* + (\ONE + \ONE{}a) \cdot (a + aa)^*$\\
26a5e640cdd7 realPhdThesis
Chengsong
parents:
diff changeset
   370
& $\ll$ & \ldots \hspace{15mm}(regular expressions of sizes 98, 169, 283, 468, 767, \ldots)
26a5e640cdd7 realPhdThesis
Chengsong
parents:
diff changeset
   371
\end{tabular}
26a5e640cdd7 realPhdThesis
Chengsong
parents:
diff changeset
   372
\end{center}
26a5e640cdd7 realPhdThesis
Chengsong
parents:
diff changeset
   373
 
26a5e640cdd7 realPhdThesis
Chengsong
parents:
diff changeset
   374
\noindent where after around 35 steps we run out of memory on a
26a5e640cdd7 realPhdThesis
Chengsong
parents:
diff changeset
   375
typical computer (we shall define shortly the precise details of our
26a5e640cdd7 realPhdThesis
Chengsong
parents:
diff changeset
   376
regular expressions and the derivative operation).  Clearly, the
26a5e640cdd7 realPhdThesis
Chengsong
parents:
diff changeset
   377
notation involving $\ZERO$s and $\ONE$s already suggests
26a5e640cdd7 realPhdThesis
Chengsong
parents:
diff changeset
   378
simplification rules that can be applied to regular regular
26a5e640cdd7 realPhdThesis
Chengsong
parents:
diff changeset
   379
expressions, for example $\ZERO{}\,r \Rightarrow \ZERO$, $\ONE{}\,r
26a5e640cdd7 realPhdThesis
Chengsong
parents:
diff changeset
   380
\Rightarrow r$, $\ZERO{} + r \Rightarrow r$ and $r + r \Rightarrow
26a5e640cdd7 realPhdThesis
Chengsong
parents:
diff changeset
   381
r$. While such simple-minded simplifications have been proved in our
26a5e640cdd7 realPhdThesis
Chengsong
parents:
diff changeset
   382
earlier work to preserve the correctness of Sulzmann and Lu's
26a5e640cdd7 realPhdThesis
Chengsong
parents:
diff changeset
   383
algorithm \cite{AusafDyckhoffUrban2016}, they unfortunately do
26a5e640cdd7 realPhdThesis
Chengsong
parents:
diff changeset
   384
\emph{not} help with limiting the growth of the derivatives shown
26a5e640cdd7 realPhdThesis
Chengsong
parents:
diff changeset
   385
above: the growth is slowed, but the derivatives can still grow rather
26a5e640cdd7 realPhdThesis
Chengsong
parents:
diff changeset
   386
quickly beyond any finite bound.
26a5e640cdd7 realPhdThesis
Chengsong
parents:
diff changeset
   387
26a5e640cdd7 realPhdThesis
Chengsong
parents:
diff changeset
   388
26a5e640cdd7 realPhdThesis
Chengsong
parents:
diff changeset
   389
Sulzmann and Lu overcome this ``growth problem'' in a second algorithm
26a5e640cdd7 realPhdThesis
Chengsong
parents:
diff changeset
   390
\cite{Sulzmann2014} where they introduce bitcoded
26a5e640cdd7 realPhdThesis
Chengsong
parents:
diff changeset
   391
regular expressions. In this version, POSIX values are
26a5e640cdd7 realPhdThesis
Chengsong
parents:
diff changeset
   392
represented as bitsequences and such sequences are incrementally generated
26a5e640cdd7 realPhdThesis
Chengsong
parents:
diff changeset
   393
when derivatives are calculated. The compact representation
26a5e640cdd7 realPhdThesis
Chengsong
parents:
diff changeset
   394
of bitsequences and regular expressions allows them to define a more
26a5e640cdd7 realPhdThesis
Chengsong
parents:
diff changeset
   395
``aggressive'' simplification method that keeps the size of the
26a5e640cdd7 realPhdThesis
Chengsong
parents:
diff changeset
   396
derivatives finite no matter what the length of the string is.
26a5e640cdd7 realPhdThesis
Chengsong
parents:
diff changeset
   397
They make some informal claims about the correctness and linear behaviour
26a5e640cdd7 realPhdThesis
Chengsong
parents:
diff changeset
   398
of this version, but do not provide any supporting proof arguments, not
26a5e640cdd7 realPhdThesis
Chengsong
parents:
diff changeset
   399
even ``pencil-and-paper'' arguments. They write about their bitcoded
26a5e640cdd7 realPhdThesis
Chengsong
parents:
diff changeset
   400
\emph{incremental parsing method} (that is the algorithm to be formalised
26a5e640cdd7 realPhdThesis
Chengsong
parents:
diff changeset
   401
in this paper):
26a5e640cdd7 realPhdThesis
Chengsong
parents:
diff changeset
   402
26a5e640cdd7 realPhdThesis
Chengsong
parents:
diff changeset
   403
26a5e640cdd7 realPhdThesis
Chengsong
parents:
diff changeset
   404
\begin{quote}\it
26a5e640cdd7 realPhdThesis
Chengsong
parents:
diff changeset
   405
  ``Correctness Claim: We further claim that the incremental parsing
26a5e640cdd7 realPhdThesis
Chengsong
parents:
diff changeset
   406
  method [..] in combination with the simplification steps [..]
26a5e640cdd7 realPhdThesis
Chengsong
parents:
diff changeset
   407
  yields POSIX parse trees. We have tested this claim
26a5e640cdd7 realPhdThesis
Chengsong
parents:
diff changeset
   408
  extensively [..] but yet
26a5e640cdd7 realPhdThesis
Chengsong
parents:
diff changeset
   409
  have to work out all proof details.'' \cite[Page 14]{Sulzmann2014}
26a5e640cdd7 realPhdThesis
Chengsong
parents:
diff changeset
   410
\end{quote}  
26a5e640cdd7 realPhdThesis
Chengsong
parents:
diff changeset
   411
26a5e640cdd7 realPhdThesis
Chengsong
parents:
diff changeset
   412
26a5e640cdd7 realPhdThesis
Chengsong
parents:
diff changeset
   413
26a5e640cdd7 realPhdThesis
Chengsong
parents:
diff changeset
   414
26a5e640cdd7 realPhdThesis
Chengsong
parents:
diff changeset
   415
\section{Backgound}
26a5e640cdd7 realPhdThesis
Chengsong
parents:
diff changeset
   416
%Regular expression matching and lexing has been 
26a5e640cdd7 realPhdThesis
Chengsong
parents:
diff changeset
   417
% widely-used and well-implemented
26a5e640cdd7 realPhdThesis
Chengsong
parents:
diff changeset
   418
%in software industry. 
26a5e640cdd7 realPhdThesis
Chengsong
parents:
diff changeset
   419
%TODO: expand the above into a full paragraph
26a5e640cdd7 realPhdThesis
Chengsong
parents:
diff changeset
   420
%TODO: look up snort rules to use here--give readers idea of what regexes look like
26a5e640cdd7 realPhdThesis
Chengsong
parents:
diff changeset
   421
26a5e640cdd7 realPhdThesis
Chengsong
parents:
diff changeset
   422
26a5e640cdd7 realPhdThesis
Chengsong
parents:
diff changeset
   423
Theoretical results say that regular expression matching
26a5e640cdd7 realPhdThesis
Chengsong
parents:
diff changeset
   424
should be linear with respect to the input.
26a5e640cdd7 realPhdThesis
Chengsong
parents:
diff changeset
   425
Under a certain class of regular expressions and inputs though,
26a5e640cdd7 realPhdThesis
Chengsong
parents:
diff changeset
   426
practical implementations  suffer from non-linear or even 
26a5e640cdd7 realPhdThesis
Chengsong
parents:
diff changeset
   427
exponential running time,
26a5e640cdd7 realPhdThesis
Chengsong
parents:
diff changeset
   428
allowing a ReDoS (regular expression denial-of-service ) attack.
26a5e640cdd7 realPhdThesis
Chengsong
parents:
diff changeset
   429
26a5e640cdd7 realPhdThesis
Chengsong
parents:
diff changeset
   430
26a5e640cdd7 realPhdThesis
Chengsong
parents:
diff changeset
   431
%----------------------------------------------------------------------------------------
26a5e640cdd7 realPhdThesis
Chengsong
parents:
diff changeset
   432
26a5e640cdd7 realPhdThesis
Chengsong
parents:
diff changeset
   433
\section{Existing Practical Approaches}
26a5e640cdd7 realPhdThesis
Chengsong
parents:
diff changeset
   434
26a5e640cdd7 realPhdThesis
Chengsong
parents:
diff changeset
   435
The reason behind is that regex libraries in popular language engines
26a5e640cdd7 realPhdThesis
Chengsong
parents:
diff changeset
   436
 often want to support richer constructs
26a5e640cdd7 realPhdThesis
Chengsong
parents:
diff changeset
   437
than  the most basic regular expressions, and lexing rather than matching
26a5e640cdd7 realPhdThesis
Chengsong
parents:
diff changeset
   438
is needed for sub-match extraction.
26a5e640cdd7 realPhdThesis
Chengsong
parents:
diff changeset
   439
26a5e640cdd7 realPhdThesis
Chengsong
parents:
diff changeset
   440
26a5e640cdd7 realPhdThesis
Chengsong
parents:
diff changeset
   441
26a5e640cdd7 realPhdThesis
Chengsong
parents:
diff changeset
   442
\subsection{DFA Approach}
26a5e640cdd7 realPhdThesis
Chengsong
parents:
diff changeset
   443
26a5e640cdd7 realPhdThesis
Chengsong
parents:
diff changeset
   444
Exponential states.
26a5e640cdd7 realPhdThesis
Chengsong
parents:
diff changeset
   445
26a5e640cdd7 realPhdThesis
Chengsong
parents:
diff changeset
   446
\subsection{NFA Approach}
26a5e640cdd7 realPhdThesis
Chengsong
parents:
diff changeset
   447
Backtracking.
26a5e640cdd7 realPhdThesis
Chengsong
parents:
diff changeset
   448
26a5e640cdd7 realPhdThesis
Chengsong
parents:
diff changeset
   449
26a5e640cdd7 realPhdThesis
Chengsong
parents:
diff changeset
   450
26a5e640cdd7 realPhdThesis
Chengsong
parents:
diff changeset
   451
%----------------------------------------------------------------------------------------
26a5e640cdd7 realPhdThesis
Chengsong
parents:
diff changeset
   452
26a5e640cdd7 realPhdThesis
Chengsong
parents:
diff changeset
   453
\section{Our Approach}
26a5e640cdd7 realPhdThesis
Chengsong
parents:
diff changeset
   454
In the last fifteen or so years, Brzozowski's derivatives of regular
26a5e640cdd7 realPhdThesis
Chengsong
parents:
diff changeset
   455
expressions have sparked quite a bit of interest in the functional
26a5e640cdd7 realPhdThesis
Chengsong
parents:
diff changeset
   456
programming and theorem prover communities.  The beauty of
26a5e640cdd7 realPhdThesis
Chengsong
parents:
diff changeset
   457
Brzozowski's derivatives \parencite{Brzozowski1964} is that they are neatly
26a5e640cdd7 realPhdThesis
Chengsong
parents:
diff changeset
   458
expressible in any functional language, and easily definable and
26a5e640cdd7 realPhdThesis
Chengsong
parents:
diff changeset
   459
reasoned about in theorem provers---the definitions just consist of
26a5e640cdd7 realPhdThesis
Chengsong
parents:
diff changeset
   460
inductive datatypes and simple recursive functions.  Derivatives of a
26a5e640cdd7 realPhdThesis
Chengsong
parents:
diff changeset
   461
regular expression, written $r \backslash c$, give a simple solution
26a5e640cdd7 realPhdThesis
Chengsong
parents:
diff changeset
   462
to the problem of matching a string $s$ with a regular
26a5e640cdd7 realPhdThesis
Chengsong
parents:
diff changeset
   463
expression $r$: if the derivative of $r$ w.r.t.\ (in
26a5e640cdd7 realPhdThesis
Chengsong
parents:
diff changeset
   464
succession) all the characters of the string matches the empty string,
26a5e640cdd7 realPhdThesis
Chengsong
parents:
diff changeset
   465
then $r$ matches $s$ (and {\em vice versa}).  
26a5e640cdd7 realPhdThesis
Chengsong
parents:
diff changeset
   466
26a5e640cdd7 realPhdThesis
Chengsong
parents:
diff changeset
   467
26a5e640cdd7 realPhdThesis
Chengsong
parents:
diff changeset
   468
This work aims to address the above vulnerability by the combination
26a5e640cdd7 realPhdThesis
Chengsong
parents:
diff changeset
   469
of Brzozowski's derivatives and interactive theorem proving. We give an 
26a5e640cdd7 realPhdThesis
Chengsong
parents:
diff changeset
   470
improved version of  Sulzmann and Lu's bit-coded algorithm using 
26a5e640cdd7 realPhdThesis
Chengsong
parents:
diff changeset
   471
derivatives, which come with a formal guarantee in terms of correctness and 
26a5e640cdd7 realPhdThesis
Chengsong
parents:
diff changeset
   472
running time as an Isabelle/HOL proof.
26a5e640cdd7 realPhdThesis
Chengsong
parents:
diff changeset
   473
Then we improve the algorithm with an even stronger version of 
26a5e640cdd7 realPhdThesis
Chengsong
parents:
diff changeset
   474
simplification, and prove a time bound linear to input and
26a5e640cdd7 realPhdThesis
Chengsong
parents:
diff changeset
   475
cubic to regular expression size using a technique by
26a5e640cdd7 realPhdThesis
Chengsong
parents:
diff changeset
   476
Antimirov.
26a5e640cdd7 realPhdThesis
Chengsong
parents:
diff changeset
   477
26a5e640cdd7 realPhdThesis
Chengsong
parents:
diff changeset
   478
\subsection{Existing Work}
26a5e640cdd7 realPhdThesis
Chengsong
parents:
diff changeset
   479
We are aware
26a5e640cdd7 realPhdThesis
Chengsong
parents:
diff changeset
   480
of a mechanised correctness proof of Brzozowski's derivative-based matcher in HOL4 by
26a5e640cdd7 realPhdThesis
Chengsong
parents:
diff changeset
   481
Owens and Slind~\parencite{Owens2008}. Another one in Isabelle/HOL is part
26a5e640cdd7 realPhdThesis
Chengsong
parents:
diff changeset
   482
of the work by Krauss and Nipkow \parencite{Krauss2011}.  And another one
26a5e640cdd7 realPhdThesis
Chengsong
parents:
diff changeset
   483
in Coq is given by Coquand and Siles \parencite{Coquand2012}.
26a5e640cdd7 realPhdThesis
Chengsong
parents:
diff changeset
   484
Also Ribeiro and Du Bois give one in Agda \parencite{RibeiroAgda2017}.
26a5e640cdd7 realPhdThesis
Chengsong
parents:
diff changeset
   485
26a5e640cdd7 realPhdThesis
Chengsong
parents:
diff changeset
   486
%----------------------------------------------------------------------------------------
26a5e640cdd7 realPhdThesis
Chengsong
parents:
diff changeset
   487
26a5e640cdd7 realPhdThesis
Chengsong
parents:
diff changeset
   488
\section{What this Template Includes}
26a5e640cdd7 realPhdThesis
Chengsong
parents:
diff changeset
   489
26a5e640cdd7 realPhdThesis
Chengsong
parents:
diff changeset
   490
\subsection{Folders}
26a5e640cdd7 realPhdThesis
Chengsong
parents:
diff changeset
   491
26a5e640cdd7 realPhdThesis
Chengsong
parents:
diff changeset
   492
This template comes as a single zip file that expands out to several files and folders. The folder names are mostly self-explanatory:
26a5e640cdd7 realPhdThesis
Chengsong
parents:
diff changeset
   493
26a5e640cdd7 realPhdThesis
Chengsong
parents:
diff changeset
   494
\keyword{Appendices} -- this is the folder where you put the appendices. Each appendix should go into its own separate \file{.tex} file. An example and template are included in the directory.
26a5e640cdd7 realPhdThesis
Chengsong
parents:
diff changeset
   495
26a5e640cdd7 realPhdThesis
Chengsong
parents:
diff changeset
   496
\keyword{Chapters} -- this is the folder where you put the thesis chapters. A thesis usually has about six chapters, though there is no hard rule on this. Each chapter should go in its own separate \file{.tex} file and they can be split as:
26a5e640cdd7 realPhdThesis
Chengsong
parents:
diff changeset
   497
\begin{itemize}
26a5e640cdd7 realPhdThesis
Chengsong
parents:
diff changeset
   498
\item Chapter 1: Introduction to the thesis topic
26a5e640cdd7 realPhdThesis
Chengsong
parents:
diff changeset
   499
\item Chapter 2: Background information and theory
26a5e640cdd7 realPhdThesis
Chengsong
parents:
diff changeset
   500
\item Chapter 3: (Laboratory) experimental setup
26a5e640cdd7 realPhdThesis
Chengsong
parents:
diff changeset
   501
\item Chapter 4: Details of experiment 1
26a5e640cdd7 realPhdThesis
Chengsong
parents:
diff changeset
   502
\item Chapter 5: Details of experiment 2
26a5e640cdd7 realPhdThesis
Chengsong
parents:
diff changeset
   503
\item Chapter 6: Discussion of the experimental results
26a5e640cdd7 realPhdThesis
Chengsong
parents:
diff changeset
   504
\item Chapter 7: Conclusion and future directions
26a5e640cdd7 realPhdThesis
Chengsong
parents:
diff changeset
   505
\end{itemize}
26a5e640cdd7 realPhdThesis
Chengsong
parents:
diff changeset
   506
This chapter layout is specialised for the experimental sciences, your discipline may be different.
26a5e640cdd7 realPhdThesis
Chengsong
parents:
diff changeset
   507
26a5e640cdd7 realPhdThesis
Chengsong
parents:
diff changeset
   508
\keyword{Figures} -- this folder contains all figures for the thesis. These are the final images that will go into the thesis document.
26a5e640cdd7 realPhdThesis
Chengsong
parents:
diff changeset
   509
26a5e640cdd7 realPhdThesis
Chengsong
parents:
diff changeset
   510
\subsection{Files}
26a5e640cdd7 realPhdThesis
Chengsong
parents:
diff changeset
   511
26a5e640cdd7 realPhdThesis
Chengsong
parents:
diff changeset
   512
Included are also several files, most of them are plain text and you can see their contents in a text editor. After initial compilation, you will see that more auxiliary files are created by \LaTeX{} or BibTeX and which you don't need to delete or worry about:
26a5e640cdd7 realPhdThesis
Chengsong
parents:
diff changeset
   513
26a5e640cdd7 realPhdThesis
Chengsong
parents:
diff changeset
   514
\keyword{example.bib} -- this is an important file that contains all the bibliographic information and references that you will be citing in the thesis for use with BibTeX. You can write it manually, but there are reference manager programs available that will create and manage it for you. Bibliographies in \LaTeX{} are a large subject and you may need to read about BibTeX before starting with this. Many modern reference managers will allow you to export your references in BibTeX format which greatly eases the amount of work you have to do.
26a5e640cdd7 realPhdThesis
Chengsong
parents:
diff changeset
   515
26a5e640cdd7 realPhdThesis
Chengsong
parents:
diff changeset
   516
\keyword{MastersDoctoralThesis.cls} -- this is an important file. It is the class file that tells \LaTeX{} how to format the thesis. 
26a5e640cdd7 realPhdThesis
Chengsong
parents:
diff changeset
   517
26a5e640cdd7 realPhdThesis
Chengsong
parents:
diff changeset
   518
\keyword{main.pdf} -- this is your beautifully typeset thesis (in the PDF file format) created by \LaTeX{}. It is supplied in the PDF with the template and after you compile the template you should get an identical version.
26a5e640cdd7 realPhdThesis
Chengsong
parents:
diff changeset
   519
26a5e640cdd7 realPhdThesis
Chengsong
parents:
diff changeset
   520
\keyword{main.tex} -- this is an important file. This is the file that you tell \LaTeX{} to compile to produce your thesis as a PDF file. It contains the framework and constructs that tell \LaTeX{} how to layout the thesis. It is heavily commented so you can read exactly what each line of code does and why it is there. After you put your own information into the \emph{THESIS INFORMATION} block -- you have now started your thesis!
26a5e640cdd7 realPhdThesis
Chengsong
parents:
diff changeset
   521
26a5e640cdd7 realPhdThesis
Chengsong
parents:
diff changeset
   522
Files that are \emph{not} included, but are created by \LaTeX{} as auxiliary files include:
26a5e640cdd7 realPhdThesis
Chengsong
parents:
diff changeset
   523
26a5e640cdd7 realPhdThesis
Chengsong
parents:
diff changeset
   524
\keyword{main.aux} -- this is an auxiliary file generated by \LaTeX{}, if it is deleted \LaTeX{} simply regenerates it when you run the main \file{.tex} file.
26a5e640cdd7 realPhdThesis
Chengsong
parents:
diff changeset
   525
26a5e640cdd7 realPhdThesis
Chengsong
parents:
diff changeset
   526
\keyword{main.bbl} -- this is an auxiliary file generated by BibTeX, if it is deleted, BibTeX simply regenerates it when you run the \file{main.aux} file. Whereas the \file{.bib} file contains all the references you have, this \file{.bbl} file contains the references you have actually cited in the thesis and is used to build the bibliography section of the thesis.
26a5e640cdd7 realPhdThesis
Chengsong
parents:
diff changeset
   527
26a5e640cdd7 realPhdThesis
Chengsong
parents:
diff changeset
   528
\keyword{main.blg} -- this is an auxiliary file generated by BibTeX, if it is deleted BibTeX simply regenerates it when you run the main \file{.aux} file.
26a5e640cdd7 realPhdThesis
Chengsong
parents:
diff changeset
   529
26a5e640cdd7 realPhdThesis
Chengsong
parents:
diff changeset
   530
\keyword{main.lof} -- this is an auxiliary file generated by \LaTeX{}, if it is deleted \LaTeX{} simply regenerates it when you run the main \file{.tex} file. It tells \LaTeX{} how to build the \emph{List of Figures} section.
26a5e640cdd7 realPhdThesis
Chengsong
parents:
diff changeset
   531
26a5e640cdd7 realPhdThesis
Chengsong
parents:
diff changeset
   532
\keyword{main.log} -- this is an auxiliary file generated by \LaTeX{}, if it is deleted \LaTeX{} simply regenerates it when you run the main \file{.tex} file. It contains messages from \LaTeX{}, if you receive errors and warnings from \LaTeX{}, they will be in this \file{.log} file.
26a5e640cdd7 realPhdThesis
Chengsong
parents:
diff changeset
   533
26a5e640cdd7 realPhdThesis
Chengsong
parents:
diff changeset
   534
\keyword{main.lot} -- this is an auxiliary file generated by \LaTeX{}, if it is deleted \LaTeX{} simply regenerates it when you run the main \file{.tex} file. It tells \LaTeX{} how to build the \emph{List of Tables} section.
26a5e640cdd7 realPhdThesis
Chengsong
parents:
diff changeset
   535
26a5e640cdd7 realPhdThesis
Chengsong
parents:
diff changeset
   536
\keyword{main.out} -- this is an auxiliary file generated by \LaTeX{}, if it is deleted \LaTeX{} simply regenerates it when you run the main \file{.tex} file.
26a5e640cdd7 realPhdThesis
Chengsong
parents:
diff changeset
   537
26a5e640cdd7 realPhdThesis
Chengsong
parents:
diff changeset
   538
So from this long list, only the files with the \file{.bib}, \file{.cls} and \file{.tex} extensions are the most important ones. The other auxiliary files can be ignored or deleted as \LaTeX{} and BibTeX will regenerate them.
26a5e640cdd7 realPhdThesis
Chengsong
parents:
diff changeset
   539
26a5e640cdd7 realPhdThesis
Chengsong
parents:
diff changeset
   540
%----------------------------------------------------------------------------------------
26a5e640cdd7 realPhdThesis
Chengsong
parents:
diff changeset
   541
26a5e640cdd7 realPhdThesis
Chengsong
parents:
diff changeset
   542
\section{Filling in Your Information in the \file{main.tex} File}\label{FillingFile}
26a5e640cdd7 realPhdThesis
Chengsong
parents:
diff changeset
   543
26a5e640cdd7 realPhdThesis
Chengsong
parents:
diff changeset
   544
You will need to personalise the thesis template and make it your own by filling in your own information. This is done by editing the \file{main.tex} file in a text editor or your favourite LaTeX environment.
26a5e640cdd7 realPhdThesis
Chengsong
parents:
diff changeset
   545
26a5e640cdd7 realPhdThesis
Chengsong
parents:
diff changeset
   546
Open the file and scroll down to the third large block titled \emph{THESIS INFORMATION} where you can see the entries for \emph{University Name}, \emph{Department Name}, etc \ldots
26a5e640cdd7 realPhdThesis
Chengsong
parents:
diff changeset
   547
26a5e640cdd7 realPhdThesis
Chengsong
parents:
diff changeset
   548
Fill out the information about yourself, your group and institution. You can also insert web links, if you do, make sure you use the full URL, including the \code{http://} for this. If you don't want these to be linked, simply remove the \verb|\href{url}{name}| and only leave the name.
26a5e640cdd7 realPhdThesis
Chengsong
parents:
diff changeset
   549
26a5e640cdd7 realPhdThesis
Chengsong
parents:
diff changeset
   550
When you have done this, save the file and recompile \code{main.tex}. All the information you filled in should now be in the PDF, complete with web links. You can now begin your thesis proper!
26a5e640cdd7 realPhdThesis
Chengsong
parents:
diff changeset
   551
26a5e640cdd7 realPhdThesis
Chengsong
parents:
diff changeset
   552
%----------------------------------------------------------------------------------------
26a5e640cdd7 realPhdThesis
Chengsong
parents:
diff changeset
   553
26a5e640cdd7 realPhdThesis
Chengsong
parents:
diff changeset
   554
\section{The \code{main.tex} File Explained}
26a5e640cdd7 realPhdThesis
Chengsong
parents:
diff changeset
   555
26a5e640cdd7 realPhdThesis
Chengsong
parents:
diff changeset
   556
The \file{main.tex} file contains the structure of the thesis. There are plenty of written comments that explain what pages, sections and formatting the \LaTeX{} code is creating. Each major document element is divided into commented blocks with titles in all capitals to make it obvious what the following bit of code is doing. Initially there seems to be a lot of \LaTeX{} code, but this is all formatting, and it has all been taken care of so you don't have to do it.
26a5e640cdd7 realPhdThesis
Chengsong
parents:
diff changeset
   557
26a5e640cdd7 realPhdThesis
Chengsong
parents:
diff changeset
   558
Begin by checking that your information on the title page is correct. For the thesis declaration, your institution may insist on something different than the text given. If this is the case, just replace what you see with what is required in the \emph{DECLARATION PAGE} block.
26a5e640cdd7 realPhdThesis
Chengsong
parents:
diff changeset
   559
26a5e640cdd7 realPhdThesis
Chengsong
parents:
diff changeset
   560
Then comes a page which contains a funny quote. You can put your own, or quote your favourite scientist, author, person, and so on. Make sure to put the name of the person who you took the quote from.
26a5e640cdd7 realPhdThesis
Chengsong
parents:
diff changeset
   561
26a5e640cdd7 realPhdThesis
Chengsong
parents:
diff changeset
   562
Following this is the abstract page which summarises your work in a condensed way and can almost be used as a standalone document to describe what you have done. The text you write will cause the heading to move up so don't worry about running out of space.
26a5e640cdd7 realPhdThesis
Chengsong
parents:
diff changeset
   563
26a5e640cdd7 realPhdThesis
Chengsong
parents:
diff changeset
   564
Next come the acknowledgements. On this page, write about all the people who you wish to thank (not forgetting parents, partners and your advisor/supervisor).
26a5e640cdd7 realPhdThesis
Chengsong
parents:
diff changeset
   565
26a5e640cdd7 realPhdThesis
Chengsong
parents:
diff changeset
   566
The contents pages, list of figures and tables are all taken care of for you and do not need to be manually created or edited. The next set of pages are more likely to be optional and can be deleted since they are for a more technical thesis: insert a list of abbreviations you have used in the thesis, then a list of the physical constants and numbers you refer to and finally, a list of mathematical symbols used in any formulae. Making the effort to fill these tables means the reader has a one-stop place to refer to instead of searching the internet and references to try and find out what you meant by certain abbreviations or symbols.
26a5e640cdd7 realPhdThesis
Chengsong
parents:
diff changeset
   567
26a5e640cdd7 realPhdThesis
Chengsong
parents:
diff changeset
   568
The list of symbols is split into the Roman and Greek alphabets. Whereas the abbreviations and symbols ought to be listed in alphabetical order (and this is \emph{not} done automatically for you) the list of physical constants should be grouped into similar themes.
26a5e640cdd7 realPhdThesis
Chengsong
parents:
diff changeset
   569
26a5e640cdd7 realPhdThesis
Chengsong
parents:
diff changeset
   570
The next page contains a one line dedication. Who will you dedicate your thesis to?
26a5e640cdd7 realPhdThesis
Chengsong
parents:
diff changeset
   571
26a5e640cdd7 realPhdThesis
Chengsong
parents:
diff changeset
   572
Finally, there is the block where the chapters are included. Uncomment the lines (delete the \code{\%} character) as you write the chapters. Each chapter should be written in its own file and put into the \emph{Chapters} folder and named \file{Chapter1}, \file{Chapter2}, etc\ldots Similarly for the appendices, uncomment the lines as you need them. Each appendix should go into its own file and placed in the \emph{Appendices} folder.
26a5e640cdd7 realPhdThesis
Chengsong
parents:
diff changeset
   573
26a5e640cdd7 realPhdThesis
Chengsong
parents:
diff changeset
   574
After the preamble, chapters and appendices finally comes the bibliography. The bibliography style (called \option{authoryear}) is used for the bibliography and is a fully featured style that will even include links to where the referenced paper can be found online. Do not underestimate how grateful your reader will be to find that a reference to a paper is just a click away. Of course, this relies on you putting the URL information into the BibTeX file in the first place.
26a5e640cdd7 realPhdThesis
Chengsong
parents:
diff changeset
   575
26a5e640cdd7 realPhdThesis
Chengsong
parents:
diff changeset
   576
%----------------------------------------------------------------------------------------
26a5e640cdd7 realPhdThesis
Chengsong
parents:
diff changeset
   577
26a5e640cdd7 realPhdThesis
Chengsong
parents:
diff changeset
   578
\section{Thesis Features and Conventions}\label{ThesisConventions}
26a5e640cdd7 realPhdThesis
Chengsong
parents:
diff changeset
   579
26a5e640cdd7 realPhdThesis
Chengsong
parents:
diff changeset
   580
To get the best out of this template, there are a few conventions that you may want to follow.
26a5e640cdd7 realPhdThesis
Chengsong
parents:
diff changeset
   581
26a5e640cdd7 realPhdThesis
Chengsong
parents:
diff changeset
   582
One of the most important (and most difficult) things to keep track of in such a long document as a thesis is consistency. Using certain conventions and ways of doing things (such as using a Todo list) makes the job easier. Of course, all of these are optional and you can adopt your own method.
26a5e640cdd7 realPhdThesis
Chengsong
parents:
diff changeset
   583
26a5e640cdd7 realPhdThesis
Chengsong
parents:
diff changeset
   584
\subsection{Printing Format}
26a5e640cdd7 realPhdThesis
Chengsong
parents:
diff changeset
   585
26a5e640cdd7 realPhdThesis
Chengsong
parents:
diff changeset
   586
This thesis template is designed for double sided printing (i.e. content on the front and back of pages) as most theses are printed and bound this way. Switching to one sided printing is as simple as uncommenting the \option{oneside} option of the \code{documentclass} command at the top of the \file{main.tex} file. You may then wish to adjust the margins to suit specifications from your institution.
26a5e640cdd7 realPhdThesis
Chengsong
parents:
diff changeset
   587
26a5e640cdd7 realPhdThesis
Chengsong
parents:
diff changeset
   588
The headers for the pages contain the page number on the outer side (so it is easy to flick through to the page you want) and the chapter name on the inner side.
26a5e640cdd7 realPhdThesis
Chengsong
parents:
diff changeset
   589
26a5e640cdd7 realPhdThesis
Chengsong
parents:
diff changeset
   590
The text is set to 11 point by default with single line spacing, again, you can tune the text size and spacing should you want or need to using the options at the very start of \file{main.tex}. The spacing can be changed similarly by replacing the \option{singlespacing} with \option{onehalfspacing} or \option{doublespacing}.
26a5e640cdd7 realPhdThesis
Chengsong
parents:
diff changeset
   591
26a5e640cdd7 realPhdThesis
Chengsong
parents:
diff changeset
   592
\subsection{Using US Letter Paper}
26a5e640cdd7 realPhdThesis
Chengsong
parents:
diff changeset
   593
26a5e640cdd7 realPhdThesis
Chengsong
parents:
diff changeset
   594
The paper size used in the template is A4, which is the standard size in Europe. If you are using this thesis template elsewhere and particularly in the United States, then you may have to change the A4 paper size to the US Letter size. This can be done in the margins settings section in \file{main.tex}.
26a5e640cdd7 realPhdThesis
Chengsong
parents:
diff changeset
   595
26a5e640cdd7 realPhdThesis
Chengsong
parents:
diff changeset
   596
Due to the differences in the paper size, the resulting margins may be different to what you like or require (as it is common for institutions to dictate certain margin sizes). If this is the case, then the margin sizes can be tweaked by modifying the values in the same block as where you set the paper size. Now your document should be set up for US Letter paper size with suitable margins.
26a5e640cdd7 realPhdThesis
Chengsong
parents:
diff changeset
   597
26a5e640cdd7 realPhdThesis
Chengsong
parents:
diff changeset
   598
\subsection{References}
26a5e640cdd7 realPhdThesis
Chengsong
parents:
diff changeset
   599
26a5e640cdd7 realPhdThesis
Chengsong
parents:
diff changeset
   600
The \code{biblatex} package is used to format the bibliography and inserts references such as this one \parencite{Reference1}. The options used in the \file{main.tex} file mean that the in-text citations of references are formatted with the author(s) listed with the date of the publication. Multiple references are separated by semicolons (e.g. \parencite{Reference2, Reference1}) and references with more than three authors only show the first author with \emph{et al.} indicating there are more authors (e.g. \parencite{Reference3}). This is done automatically for you. To see how you use references, have a look at the \file{Chapter1.tex} source file. Many reference managers allow you to simply drag the reference into the document as you type.
26a5e640cdd7 realPhdThesis
Chengsong
parents:
diff changeset
   601
26a5e640cdd7 realPhdThesis
Chengsong
parents:
diff changeset
   602
Scientific references should come \emph{before} the punctuation mark if there is one (such as a comma or period). The same goes for footnotes\footnote{Such as this footnote, here down at the bottom of the page.}. You can change this but the most important thing is to keep the convention consistent throughout the thesis. Footnotes themselves should be full, descriptive sentences (beginning with a capital letter and ending with a full stop). The APA6 states: \enquote{Footnote numbers should be superscripted, [...], following any punctuation mark except a dash.} The Chicago manual of style states: \enquote{A note number should be placed at the end of a sentence or clause. The number follows any punctuation mark except the dash, which it precedes. It follows a closing parenthesis.}
26a5e640cdd7 realPhdThesis
Chengsong
parents:
diff changeset
   603
26a5e640cdd7 realPhdThesis
Chengsong
parents:
diff changeset
   604
The bibliography is typeset with references listed in alphabetical order by the first author's last name. This is similar to the APA referencing style. To see how \LaTeX{} typesets the bibliography, have a look at the very end of this document (or just click on the reference number links in in-text citations).
26a5e640cdd7 realPhdThesis
Chengsong
parents:
diff changeset
   605
26a5e640cdd7 realPhdThesis
Chengsong
parents:
diff changeset
   606
\subsubsection{A Note on bibtex}
26a5e640cdd7 realPhdThesis
Chengsong
parents:
diff changeset
   607
26a5e640cdd7 realPhdThesis
Chengsong
parents:
diff changeset
   608
The bibtex backend used in the template by default does not correctly handle unicode character encoding (i.e. "international" characters). You may see a warning about this in the compilation log and, if your references contain unicode characters, they may not show up correctly or at all. The solution to this is to use the biber backend instead of the outdated bibtex backend. This is done by finding this in \file{main.tex}: \option{backend=bibtex} and changing it to \option{backend=biber}. You will then need to delete all auxiliary BibTeX files and navigate to the template directory in your terminal (command prompt). Once there, simply type \code{biber main} and biber will compile your bibliography. You can then compile \file{main.tex} as normal and your bibliography will be updated. An alternative is to set up your LaTeX editor to compile with biber instead of bibtex, see \href{http://tex.stackexchange.com/questions/154751/biblatex-with-biber-configuring-my-editor-to-avoid-undefined-citations/}{here} for how to do this for various editors.
26a5e640cdd7 realPhdThesis
Chengsong
parents:
diff changeset
   609
26a5e640cdd7 realPhdThesis
Chengsong
parents:
diff changeset
   610
\subsection{Tables}
26a5e640cdd7 realPhdThesis
Chengsong
parents:
diff changeset
   611
26a5e640cdd7 realPhdThesis
Chengsong
parents:
diff changeset
   612
Tables are an important way of displaying your results, below is an example table which was generated with this code:
26a5e640cdd7 realPhdThesis
Chengsong
parents:
diff changeset
   613
26a5e640cdd7 realPhdThesis
Chengsong
parents:
diff changeset
   614
{\small
26a5e640cdd7 realPhdThesis
Chengsong
parents:
diff changeset
   615
\begin{verbatim}
26a5e640cdd7 realPhdThesis
Chengsong
parents:
diff changeset
   616
\begin{table}
26a5e640cdd7 realPhdThesis
Chengsong
parents:
diff changeset
   617
\caption{The effects of treatments X and Y on the four groups studied.}
26a5e640cdd7 realPhdThesis
Chengsong
parents:
diff changeset
   618
\label{tab:treatments}
26a5e640cdd7 realPhdThesis
Chengsong
parents:
diff changeset
   619
\centering
26a5e640cdd7 realPhdThesis
Chengsong
parents:
diff changeset
   620
\begin{tabular}{l l l}
26a5e640cdd7 realPhdThesis
Chengsong
parents:
diff changeset
   621
\toprule
26a5e640cdd7 realPhdThesis
Chengsong
parents:
diff changeset
   622
\tabhead{Groups} & \tabhead{Treatment X} & \tabhead{Treatment Y} \\
26a5e640cdd7 realPhdThesis
Chengsong
parents:
diff changeset
   623
\midrule
26a5e640cdd7 realPhdThesis
Chengsong
parents:
diff changeset
   624
1 & 0.2 & 0.8\\
26a5e640cdd7 realPhdThesis
Chengsong
parents:
diff changeset
   625
2 & 0.17 & 0.7\\
26a5e640cdd7 realPhdThesis
Chengsong
parents:
diff changeset
   626
3 & 0.24 & 0.75\\
26a5e640cdd7 realPhdThesis
Chengsong
parents:
diff changeset
   627
4 & 0.68 & 0.3\\
26a5e640cdd7 realPhdThesis
Chengsong
parents:
diff changeset
   628
\bottomrule\\
26a5e640cdd7 realPhdThesis
Chengsong
parents:
diff changeset
   629
\end{tabular}
26a5e640cdd7 realPhdThesis
Chengsong
parents:
diff changeset
   630
\end{table}
26a5e640cdd7 realPhdThesis
Chengsong
parents:
diff changeset
   631
\end{verbatim}
26a5e640cdd7 realPhdThesis
Chengsong
parents:
diff changeset
   632
}
26a5e640cdd7 realPhdThesis
Chengsong
parents:
diff changeset
   633
26a5e640cdd7 realPhdThesis
Chengsong
parents:
diff changeset
   634
\begin{table}
26a5e640cdd7 realPhdThesis
Chengsong
parents:
diff changeset
   635
\caption{The effects of treatments X and Y on the four groups studied.}
26a5e640cdd7 realPhdThesis
Chengsong
parents:
diff changeset
   636
\label{tab:treatments}
26a5e640cdd7 realPhdThesis
Chengsong
parents:
diff changeset
   637
\centering
26a5e640cdd7 realPhdThesis
Chengsong
parents:
diff changeset
   638
\begin{tabular}{l l l}
26a5e640cdd7 realPhdThesis
Chengsong
parents:
diff changeset
   639
\toprule
26a5e640cdd7 realPhdThesis
Chengsong
parents:
diff changeset
   640
\tabhead{Groups} & \tabhead{Treatment X} & \tabhead{Treatment Y} \\
26a5e640cdd7 realPhdThesis
Chengsong
parents:
diff changeset
   641
\midrule
26a5e640cdd7 realPhdThesis
Chengsong
parents:
diff changeset
   642
1 & 0.2 & 0.8\\
26a5e640cdd7 realPhdThesis
Chengsong
parents:
diff changeset
   643
2 & 0.17 & 0.7\\
26a5e640cdd7 realPhdThesis
Chengsong
parents:
diff changeset
   644
3 & 0.24 & 0.75\\
26a5e640cdd7 realPhdThesis
Chengsong
parents:
diff changeset
   645
4 & 0.68 & 0.3\\
26a5e640cdd7 realPhdThesis
Chengsong
parents:
diff changeset
   646
\bottomrule\\
26a5e640cdd7 realPhdThesis
Chengsong
parents:
diff changeset
   647
\end{tabular}
26a5e640cdd7 realPhdThesis
Chengsong
parents:
diff changeset
   648
\end{table}
26a5e640cdd7 realPhdThesis
Chengsong
parents:
diff changeset
   649
26a5e640cdd7 realPhdThesis
Chengsong
parents:
diff changeset
   650
You can reference tables with \verb|\ref{<label>}| where the label is defined within the table environment. See \file{Chapter1.tex} for an example of the label and citation (e.g. Table~\ref{tab:treatments}).
26a5e640cdd7 realPhdThesis
Chengsong
parents:
diff changeset
   651
26a5e640cdd7 realPhdThesis
Chengsong
parents:
diff changeset
   652
\subsection{Figures}
26a5e640cdd7 realPhdThesis
Chengsong
parents:
diff changeset
   653
26a5e640cdd7 realPhdThesis
Chengsong
parents:
diff changeset
   654
There will hopefully be many figures in your thesis (that should be placed in the \emph{Figures} folder). The way to insert figures into your thesis is to use a code template like this:
26a5e640cdd7 realPhdThesis
Chengsong
parents:
diff changeset
   655
\begin{verbatim}
26a5e640cdd7 realPhdThesis
Chengsong
parents:
diff changeset
   656
\begin{figure}
26a5e640cdd7 realPhdThesis
Chengsong
parents:
diff changeset
   657
\centering
26a5e640cdd7 realPhdThesis
Chengsong
parents:
diff changeset
   658
\includegraphics{Figures/Electron}
26a5e640cdd7 realPhdThesis
Chengsong
parents:
diff changeset
   659
\decoRule
26a5e640cdd7 realPhdThesis
Chengsong
parents:
diff changeset
   660
\caption[An Electron]{An electron (artist's impression).}
26a5e640cdd7 realPhdThesis
Chengsong
parents:
diff changeset
   661
\label{fig:Electron}
26a5e640cdd7 realPhdThesis
Chengsong
parents:
diff changeset
   662
\end{figure}
26a5e640cdd7 realPhdThesis
Chengsong
parents:
diff changeset
   663
\end{verbatim}
26a5e640cdd7 realPhdThesis
Chengsong
parents:
diff changeset
   664
Also look in the source file. Putting this code into the source file produces the picture of the electron that you can see in the figure below.
26a5e640cdd7 realPhdThesis
Chengsong
parents:
diff changeset
   665
26a5e640cdd7 realPhdThesis
Chengsong
parents:
diff changeset
   666
\begin{figure}[th]
26a5e640cdd7 realPhdThesis
Chengsong
parents:
diff changeset
   667
\centering
26a5e640cdd7 realPhdThesis
Chengsong
parents:
diff changeset
   668
\includegraphics{Figures/Electron}
26a5e640cdd7 realPhdThesis
Chengsong
parents:
diff changeset
   669
\decoRule
26a5e640cdd7 realPhdThesis
Chengsong
parents:
diff changeset
   670
\caption[An Electron]{An electron (artist's impression).}
26a5e640cdd7 realPhdThesis
Chengsong
parents:
diff changeset
   671
\label{fig:Electron}
26a5e640cdd7 realPhdThesis
Chengsong
parents:
diff changeset
   672
\end{figure}
26a5e640cdd7 realPhdThesis
Chengsong
parents:
diff changeset
   673
26a5e640cdd7 realPhdThesis
Chengsong
parents:
diff changeset
   674
Sometimes figures don't always appear where you write them in the source. The placement depends on how much space there is on the page for the figure. Sometimes there is not enough room to fit a figure directly where it should go (in relation to the text) and so \LaTeX{} puts it at the top of the next page. Positioning figures is the job of \LaTeX{} and so you should only worry about making them look good!
26a5e640cdd7 realPhdThesis
Chengsong
parents:
diff changeset
   675
26a5e640cdd7 realPhdThesis
Chengsong
parents:
diff changeset
   676
Figures usually should have captions just in case you need to refer to them (such as in Figure~\ref{fig:Electron}). The \verb|\caption| command contains two parts, the first part, inside the square brackets is the title that will appear in the \emph{List of Figures}, and so should be short. The second part in the curly brackets should contain the longer and more descriptive caption text.
26a5e640cdd7 realPhdThesis
Chengsong
parents:
diff changeset
   677
26a5e640cdd7 realPhdThesis
Chengsong
parents:
diff changeset
   678
The \verb|\decoRule| command is optional and simply puts an aesthetic horizontal line below the image. If you do this for one image, do it for all of them.
26a5e640cdd7 realPhdThesis
Chengsong
parents:
diff changeset
   679
26a5e640cdd7 realPhdThesis
Chengsong
parents:
diff changeset
   680
\LaTeX{} is capable of using images in pdf, jpg and png format.
26a5e640cdd7 realPhdThesis
Chengsong
parents:
diff changeset
   681
26a5e640cdd7 realPhdThesis
Chengsong
parents:
diff changeset
   682
\subsection{Typesetting mathematics}
26a5e640cdd7 realPhdThesis
Chengsong
parents:
diff changeset
   683
26a5e640cdd7 realPhdThesis
Chengsong
parents:
diff changeset
   684
If your thesis is going to contain heavy mathematical content, be sure that \LaTeX{} will make it look beautiful, even though it won't be able to solve the equations for you.
26a5e640cdd7 realPhdThesis
Chengsong
parents:
diff changeset
   685
26a5e640cdd7 realPhdThesis
Chengsong
parents:
diff changeset
   686
The \enquote{Not So Short Introduction to \LaTeX} (available on \href{http://www.ctan.org/tex-archive/info/lshort/english/lshort.pdf}{CTAN}) should tell you everything you need to know for most cases of typesetting mathematics. If you need more information, a much more thorough mathematical guide is available from the AMS called, \enquote{A Short Math Guide to \LaTeX} and can be downloaded from:
26a5e640cdd7 realPhdThesis
Chengsong
parents:
diff changeset
   687
\url{ftp://ftp.ams.org/pub/tex/doc/amsmath/short-math-guide.pdf}
26a5e640cdd7 realPhdThesis
Chengsong
parents:
diff changeset
   688
26a5e640cdd7 realPhdThesis
Chengsong
parents:
diff changeset
   689
There are many different \LaTeX{} symbols to remember, luckily you can find the most common symbols in \href{http://ctan.org/pkg/comprehensive}{The Comprehensive \LaTeX~Symbol List}.
26a5e640cdd7 realPhdThesis
Chengsong
parents:
diff changeset
   690
26a5e640cdd7 realPhdThesis
Chengsong
parents:
diff changeset
   691
You can write an equation, which is automatically given an equation number by \LaTeX{} like this:
26a5e640cdd7 realPhdThesis
Chengsong
parents:
diff changeset
   692
\begin{verbatim}
26a5e640cdd7 realPhdThesis
Chengsong
parents:
diff changeset
   693
\begin{equation}
26a5e640cdd7 realPhdThesis
Chengsong
parents:
diff changeset
   694
E = mc^{2}
26a5e640cdd7 realPhdThesis
Chengsong
parents:
diff changeset
   695
\label{eqn:Einstein}
26a5e640cdd7 realPhdThesis
Chengsong
parents:
diff changeset
   696
\end{equation}
26a5e640cdd7 realPhdThesis
Chengsong
parents:
diff changeset
   697
\end{verbatim}
26a5e640cdd7 realPhdThesis
Chengsong
parents:
diff changeset
   698
26a5e640cdd7 realPhdThesis
Chengsong
parents:
diff changeset
   699
This will produce Einstein's famous energy-matter equivalence equation:
26a5e640cdd7 realPhdThesis
Chengsong
parents:
diff changeset
   700
\begin{equation}
26a5e640cdd7 realPhdThesis
Chengsong
parents:
diff changeset
   701
E = mc^{2}
26a5e640cdd7 realPhdThesis
Chengsong
parents:
diff changeset
   702
\label{eqn:Einstein}
26a5e640cdd7 realPhdThesis
Chengsong
parents:
diff changeset
   703
\end{equation}
26a5e640cdd7 realPhdThesis
Chengsong
parents:
diff changeset
   704
26a5e640cdd7 realPhdThesis
Chengsong
parents:
diff changeset
   705
All equations you write (which are not in the middle of paragraph text) are automatically given equation numbers by \LaTeX{}. If you don't want a particular equation numbered, use the unnumbered form:
26a5e640cdd7 realPhdThesis
Chengsong
parents:
diff changeset
   706
\begin{verbatim}
26a5e640cdd7 realPhdThesis
Chengsong
parents:
diff changeset
   707
\[ a^{2}=4 \]
26a5e640cdd7 realPhdThesis
Chengsong
parents:
diff changeset
   708
\end{verbatim}
26a5e640cdd7 realPhdThesis
Chengsong
parents:
diff changeset
   709
26a5e640cdd7 realPhdThesis
Chengsong
parents:
diff changeset
   710
%----------------------------------------------------------------------------------------
26a5e640cdd7 realPhdThesis
Chengsong
parents:
diff changeset
   711
26a5e640cdd7 realPhdThesis
Chengsong
parents:
diff changeset
   712
\section{Sectioning and Subsectioning}
26a5e640cdd7 realPhdThesis
Chengsong
parents:
diff changeset
   713
26a5e640cdd7 realPhdThesis
Chengsong
parents:
diff changeset
   714
You should break your thesis up into nice, bite-sized sections and subsections. \LaTeX{} automatically builds a table of Contents by looking at all the \verb|\chapter{}|, \verb|\section{}|  and \verb|\subsection{}| commands you write in the source.
26a5e640cdd7 realPhdThesis
Chengsong
parents:
diff changeset
   715
26a5e640cdd7 realPhdThesis
Chengsong
parents:
diff changeset
   716
The Table of Contents should only list the sections to three (3) levels. A \verb|chapter{}| is level zero (0). A \verb|\section{}| is level one (1) and so a \verb|\subsection{}| is level two (2). In your thesis it is likely that you will even use a \verb|subsubsection{}|, which is level three (3). The depth to which the Table of Contents is formatted is set within \file{MastersDoctoralThesis.cls}. If you need this changed, you can do it in \file{main.tex}.
26a5e640cdd7 realPhdThesis
Chengsong
parents:
diff changeset
   717
26a5e640cdd7 realPhdThesis
Chengsong
parents:
diff changeset
   718
%----------------------------------------------------------------------------------------
26a5e640cdd7 realPhdThesis
Chengsong
parents:
diff changeset
   719
26a5e640cdd7 realPhdThesis
Chengsong
parents:
diff changeset
   720
\section{In Closing}
26a5e640cdd7 realPhdThesis
Chengsong
parents:
diff changeset
   721
26a5e640cdd7 realPhdThesis
Chengsong
parents:
diff changeset
   722
You have reached the end of this mini-guide. You can now rename or overwrite this pdf file and begin writing your own \file{Chapter1.tex} and the rest of your thesis. The easy work of setting up the structure and framework has been taken care of for you. It's now your job to fill it out!
26a5e640cdd7 realPhdThesis
Chengsong
parents:
diff changeset
   723
26a5e640cdd7 realPhdThesis
Chengsong
parents:
diff changeset
   724
Good luck and have lots of fun!
26a5e640cdd7 realPhdThesis
Chengsong
parents:
diff changeset
   725
26a5e640cdd7 realPhdThesis
Chengsong
parents:
diff changeset
   726
\begin{flushright}
26a5e640cdd7 realPhdThesis
Chengsong
parents:
diff changeset
   727
Guide written by ---\\
26a5e640cdd7 realPhdThesis
Chengsong
parents:
diff changeset
   728
Sunil Patel: \href{http://www.sunilpatel.co.uk}{www.sunilpatel.co.uk}\\
26a5e640cdd7 realPhdThesis
Chengsong
parents:
diff changeset
   729
Vel: \href{http://www.LaTeXTemplates.com}{LaTeXTemplates.com}
26a5e640cdd7 realPhdThesis
Chengsong
parents:
diff changeset
   730
\end{flushright}