ChengsongTanPhdThesis/Chapters/Chapter1.tex
changeset 472 6953d2786e7c
parent 471 23818853a710
child 500 4d9eecfc936a
471:23818853a710 472:6953d2786e7c
    43 \def\distinct{\mathit{distinct}}
    43 \def\distinct{\mathit{distinct}}
    44 \def\blexersimp{\mathit{blexer}\_\mathit{simp}}
    44 \def\blexersimp{\mathit{blexer}\_\mathit{simp}}
    45 %----------------------------------------------------------------------------------------
    45 %----------------------------------------------------------------------------------------
    46 %This part is about regular expressions, Brzozowski derivatives,
    46 %This part is about regular expressions, Brzozowski derivatives,
    47 %and a bit-coded lexing algorithm with proven correctness and time bounds.
    47 %and a bit-coded lexing algorithm with proven correctness and time bounds.
       
    48 
       
    49 %TODO: look up snort rules to use here--give readers idea of what regexes look like
       
    50 
       
    51 
    48 Regular expressions are widely used in computer science: 
    52 Regular expressions are widely used in computer science: 
    49 be it in IDEs with syntax highlighting and auto completion, 
    53 be it in IDEs with syntax highlighting and auto completion, 
    50 command line tools like $\mathit{grep}$ that facilitate easy 
    54 command line tools like $\mathit{grep}$ that facilitate easy 
    51 text processing, network intrusion
    55 text processing, network intrusion
    52 detection systems that reject suspicious traffic, or compiler
    56 detection systems that reject suspicious traffic, or compiler
   448 As Grathwohl\parencite{grathwohl2014crash} commented,
   452 As Grathwohl\parencite{grathwohl2014crash} commented,
   449 \begin{center}
   453 \begin{center}
   450 	``The POSIX strategy is more complicated than the greedy because of the dependence on information about the length of matched strings in the various subexpressions.''
   454 	``The POSIX strategy is more complicated than the greedy because of the dependence on information about the length of matched strings in the various subexpressions.''
   451 \end{center}
   455 \end{center}
   452 
   456 
   453 \section{Engineering and Academic Approaches to Deal with Catastrophic Backtracking}
   457 %\section{How people solve problems with regexes}
   454 
   458 
   455 
   459 
   456 There is also static analysis work on regular expressions that
   460 When a regular expression does not behave as intended,
   457 have potentially exponential behaviours. Rathnayake and Thielecke 
   461 people usually try to rewrite the regex to some equivalent form
       
   462 or they try to avoid the possibly problematic patterns completely \parencite{Davis18},
       
   463 although flagging patterns in this way produces many false positives.
       
   464 Animated tools to ``debug'' regular expressions
       
   465 are also quite popular: regexploit \parencite{regexploit2021} and regex101 \parencite{regex101}, 
       
   466 to name a few.
       
   467 There is also static analysis work on regular expressions that
       
   468 aims to detect potentially exponential regex patterns. Rathnayake and Thielecke 
   458 \parencite{Rathnayake2014StaticAF} proposed an algorithm
   469 \parencite{Rathnayake2014StaticAF} proposed an algorithm
   459 that detects regular expressions triggering exponential
   470 that detects regular expressions triggering exponential
   460 behaviour on backtracking matchers.
   471 behaviour on backtracking matchers.
   461 People also developed static analysis methods for
   472 Weideman \parencite{Weideman2017Static} came up with 
   462 generating non-linear polynomial worst-time estimates
   473 non-linear polynomial worst-time estimates
   463 for regexes, attack strings that exploit the worst-time 
   474 for regexes, attack strings that exploit the worst-time 
   464 scenario, and ``attack automata'' that generate
   475 scenario, and ``attack automata'' that generate
   465 attack strings \parencite{Weideman2017Static}.
   476 attack strings.
   466 There are also tools to "debug" regular expressions
   477 %Arguably these methods limits the programmers' freedom
   467 that allows people to see why a match failed or was especially slow
   478 %or productivity when all they want is to come up with a regex
   468 by showing the steps a back-tracking regex engine took\parencite{regexploit2021}.
   479 %that solves the text processing problem.
       
   480 
   469 %TODO:also the regex101 debugger
   481 %TODO:also the regex101 debugger
   470 \section{Our Solution--Formal Specification of POSIX and Brzozowski Derivatives}
   482 \section{Our Solution--Formal Specification of POSIX and Brzozowski Derivatives}
   471  Is it possible to have a regex lexing algorithm with proven correctness and 
   483  Is it possible to have a regex lexing algorithm with proven correctness and 
   472  time complexity, which allows easy extensions to
   484  time complexity, which allows easy extensions to
   473   constructs like 
   485   constructs like 
   474  bounded repetitions, negation,  lookarounds, and even back-references? 
   486  bounded repetitions, negation,  lookarounds, and even back-references? 
   475 Building on top of Sulzmann and Lu's attempt to formalize the 
   487   
   476 notion of POSIX lexing rules \parencite{Sulzmann2014}, 
   488   We propose Brzozowski derivatives on regular expressions as
   477 Ausaf and Urban\parencite{AusafDyckhoffUrban2016} modelled
   489   a solution to this.
   478 POSIX matching as a ternary relation recursively defined in a
   490   
   479 natural deduction style.
   491   In the last fifteen or so years, Brzozowski's derivatives of regular
   480 With the formally-specified rules for what a POSIX matching is,
       
   481 they designed a regex matching algorithm based on Brzozowski derivatives, and 
       
   482 proved in Isabelle/HOL that the algorithm gives correct results.
       
   483 
       
   484  
       
   485 
       
   486 
       
   487 
       
   488 %----------------------------------------------------------------------------------------
       
   489 
       
   490 \section{Our Approach}
       
   491 In the last fifteen or so years, Brzozowski's derivatives of regular
       
   492 expressions have sparked quite a bit of interest in the functional
   492 expressions have sparked quite a bit of interest in the functional
   493 programming and theorem prover communities.  The beauty of
   493 programming and theorem prover communities.  The beauty of
   494 Brzozowski's derivatives \parencite{Brzozowski1964} is that they are neatly
   494 Brzozowski's derivatives \parencite{Brzozowski1964} is that they are neatly
   495 expressible in any functional language, and easily definable and
   495 expressible in any functional language, and easily definable and
   496 reasoned about in theorem provers---the definitions just consist of
   496 reasoned about in theorem provers---the definitions just consist of
   497 inductive datatypes and simple recursive functions.  Derivatives of a
   497 inductive datatypes and simple recursive functions. 
   498 regular expression, written $r \backslash c$, give a simple solution
   498 An algorithm based on them by 
   499 to the problem of matching a string $s$ with a regular
   499 Sulzmann and Lu \parencite{Sulzmann2014} allows easy extension
   500 expression $r$: if the derivative of $r$ w.r.t.\ (in
   500 to extended regular expressions and 
   501 succession) all the characters of the string matches the empty string,
   501  simplification of internal data structures, 
   502 then $r$ matches $s$ (and {\em vice versa}).  
   502  eliminating the exponential behaviours.
   503 
   503  
   504 
   504 
   505 This work aims to address the above vulnerability by the combination
   505   
   506 of Brzozowski's derivatives and interactive theorem proving. We give an 
   506 
       
   507  
       
   508 
       
   509 
       
   510 
       
   511 %----------------------------------------------------------------------------------------
       
   512 
       
   513 \section{Our Contribution}
       
   514 
       
   515 
       
   516 
       
   517 This work addresses the vulnerability of super-linear and
       
   518 buggy regex implementations by the combination
       
   519 of Brzozowski's derivatives and interactive theorem proving. 
       
   520 We give an 
   507 improved version of  Sulzmann and Lu's bit-coded algorithm using 
   521 improved version of  Sulzmann and Lu's bit-coded algorithm using 
   508 derivatives, which comes with a formal guarantee in terms of correctness and 
   522 derivatives, which comes with a formal guarantee in terms of correctness and 
   509 running time as an Isabelle/HOL proof.
   523 running time as an Isabelle/HOL proof.
   510 Then we improve the algorithm with an even stronger version of 
   524 Then we improve the algorithm with an even stronger version of 
   511 simplification, and prove a time bound linear in the input size and
   525 simplification, and prove a time bound linear in the input size and
   512 cubic in the regular expression size using a technique by
   526 cubic in the regular expression size using a technique by
   513 Antimirov.
   527 Antimirov.
   514 
   528 
   515 \subsection{Existing Work}
       
   516 We are aware
       
   517 of a mechanised correctness proof of Brzozowski's derivative-based matcher in HOL4 by
       
   518 Owens and Slind~\parencite{Owens2008}. Another one in Isabelle/HOL is part
       
   519 of the work by Krauss and Nipkow \parencite{Krauss2011}.  And another one
       
   520 in Coq is given by Coquand and Siles \parencite{Coquand2012}.
       
   521 Also Ribeiro and Du Bois give one in Agda \parencite{RibeiroAgda2017}.
       
   522  
       
   523  %We propose Brzozowski's derivatives as a solution to this problem.
       
   524  
   529  
   525 The main contribution of this thesis is a proven correct lexing algorithm
   530 The main contribution of this thesis is a proven correct lexing algorithm
   526 with formalized time bounds.
   531 with formalized time bounds.
   527 To the best of our knowledge, there are no lexing libraries using Brzozowski derivatives
   532 To the best of our knowledge, there are no lexing libraries using Brzozowski derivatives
   528 that have a provable time guarantee, 
   533 that have a provable time guarantee, 
   540 In the performance evaluation section, they simply analyzed the run time
   545 In the performance evaluation section, they simply analyzed the run time
   541 of matching $a$ with the string $\underbrace{a \ldots a}_{\text{n a's}}$
   546 of matching $a$ with the string $\underbrace{a \ldots a}_{\text{n a's}}$
   542 and concluded that the algorithm is quadratic in terms of input length.
   547 and concluded that the algorithm is quadratic in terms of input length.
   543 When we tried out their extracted OCaml code with our example $(a+aa)^*$,
   548 When we tried out their extracted OCaml code with our example $(a+aa)^*$,
   544 the time it took to lex only 40 $a$'s was 5 minutes.
   549 the time it took to lex only 40 $a$'s was 5 minutes.
   545 We therefore believe that our proof of performance on general
   550 
       
   551 We believe that our proof of performance on general
   546 inputs, rather than on specific examples, is a novel contribution.\\
   552 inputs, rather than on specific examples, is a novel contribution.\\
   547 
   553 
   548  \section{Preliminaries about Lexing Using Brzozowski derivatives}
   554 
   549  In the last fifteen or so years, Brzozowski's derivatives of regular
   555 \subsection{Related Work}
   550 expressions have sparked quite a bit of interest in the functional
   556 We are aware
   551 programming and theorem prover communities.  
   557 of a mechanised correctness proof of Brzozowski's derivative-based matcher in HOL4 by
   552 The beauty of
   558 Owens and Slind~\parencite{Owens2008}. Another one in Isabelle/HOL is part
   553 Brzozowski's derivatives \parencite{Brzozowski1964} is that they are neatly
   559 of the work by Krauss and Nipkow \parencite{Krauss2011}.  And another one
   554 expressible in any functional language, and easily definable and
   560 in Coq is given by Coquand and Siles \parencite{Coquand2012}.
   555 reasoned about in theorem provers---the definitions just consist of
   561 Also Ribeiro and Du Bois give one in Agda \parencite{RibeiroAgda2017}.
   556 inductive datatypes and simple recursive functions. 
   562  
       
   563  %We propose Brzozowski's derivatives as a solution to this problem.
       
   564 % about Lexing Using Brzozowski derivatives
       
   565  \section{Preliminaries}
   557 
   566 
   558 Suppose we have an alphabet $\Sigma$; the set of strings whose characters
   567 Suppose we have an alphabet $\Sigma$; the set of strings whose characters
   559 are drawn from $\Sigma$
   568 are drawn from $\Sigma$
   560 is written $\Sigma^*$.
   569 is written $\Sigma^*$.
   561 
   570 
   593 \end{center}
   602 \end{center}
   594 Mathematically, it can be expressed as the 
   603 Mathematically, it can be expressed as the 
   595 
   604 
   596 If the $\textit{StringSet}$ happens to have some structure, for example,
   605 If the $\textit{StringSet}$ happens to have some structure, for example,
   597 if it is regular, then we have that it
   606 if it is regular, then we have that it
       
   607 
       
   608 % Derivatives of a
       
   609 %regular expression, written $r \backslash c$, give a simple solution
       
   610 %to the problem of matching a string $s$ with a regular
       
   611 %expression $r$: if the derivative of $r$ w.r.t.\ (in
       
   612 %succession) all the characters of the string matches the empty string,
       
   613 %then $r$ matches $s$ (and {\em vice versa}).  
   598 
   614 
   599 The derivative of a regular expression, denoted as
   615 The derivative of a regular expression, denoted as
   600 $r \backslash c$, is a function that takes parameters
   616 $r \backslash c$, is a function that takes parameters
   601 $r$ and $c$, and returns another regular expression $r'$,
   617 $r$ and $c$, and returns another regular expression $r'$,
   602 which is computed by the following recursive function:
   618 which is computed by the following recursive function:
   807 		\end{tabular}
   823 		\end{tabular}
   808 	\end{tabular}
   824 	\end{tabular}
   809 \end{center}
   825 \end{center}
   810 
   826 
   811 \noindent
   827 \noindent
       
   828 
       
   829 Building on top of Sulzmann and Lu's attempt to formalize the 
       
   830 notion of POSIX lexing rules \parencite{Sulzmann2014}, 
       
   831 Ausaf and Urban\parencite{AusafDyckhoffUrban2016} modelled
       
   832 POSIX matching as a ternary relation recursively defined in a
       
   833 natural deduction style.
       
   834 With the formally-specified rules for what a POSIX matching is,
       
   835 they proved in Isabelle/HOL that the algorithm gives correct results.
       
   836 
       
   837 But a correct result alone is not enough: we also want \textbf{efficiency}.
       
   838 
       
   839 
       
   840 
   812 One regular expression can have multiple lexical values. For example,
   841 One regular expression can have multiple lexical values. For example,
   813 the regular expression $(a+b)^*$ has an infinite list of
   842 the regular expression $(a+b)^*$ has an infinite list of
   814 values corresponding to it: $\Stars\,[]$, $\Stars\,[\Left(Char(a))]$,
   843 values corresponding to it: $\Stars\,[]$, $\Stars\,[\Left(Char(a))]$,
   815 $\Stars\,[\Right(Char(b))]$, $\Stars\,[\Left(Char(a)),\,\Right(Char(b))]$,
   844 $\Stars\,[\Right(Char(b))]$, $\Stars\,[\Left(Char(a)),\,\Right(Char(b))]$,
   816 $\ldots$, and vice versa.
   845 $\ldots$, and vice versa.
   827 and a closed form formula can be calculated to be
   856 and a closed form formula can be calculated to be
   828 \begin{equation}
   857 \begin{equation}
   829 	C_n =\frac{(2+\sqrt{2})^n - (2-\sqrt{2})^n}{4\sqrt{2}}
   858 	C_n =\frac{(2+\sqrt{2})^n - (2-\sqrt{2})^n}{4\sqrt{2}}
   830 \end{equation}
   859 \end{equation}
   831 which is clearly in exponential order.
   860 which is clearly in exponential order.
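To make the growth rate explicit (a short check that uses only the closed form above): since $0 < 2-\sqrt{2} < 1$, the term $(2-\sqrt{2})^n$ tends to $0$ as $n$ grows, so
\begin{equation*}
	C_n \sim \frac{(2+\sqrt{2})^n}{4\sqrt{2}},
	\qquad
	\lim_{n \to \infty} \frac{C_{n+1}}{C_n} = 2+\sqrt{2} \approx 3.41,
\end{equation*}
that is, $C_n$ grows by a factor of roughly $3.41$ each time $n$ increases by one.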
       
   861 
   832 A lexer aimed at getting all the possible values has an exponential
   862 A lexer aimed at getting all the possible values has an exponential
   833 worst case runtime. Therefore it is impractical to try to generate
   863 worst case runtime. Therefore it is impractical to try to generate
   834 all possible matches in a run. In practice, we are usually 
   864 all possible matches in a run. In practice, we are usually 
   835 interested in POSIX values, which by intuition always
   865 interested in POSIX values, which by intuition always
   836 match the leftmost regular expression when there is a choice
   866 match the leftmost regular expression when there is a choice
   939 It can be achieved by recording some extra rectification functions
   969 It can be achieved by recording some extra rectification functions
   940 during the derivatives step, and applying these rectifications in 
   970 during the derivatives step, and applying these rectifications in 
   941 each run during the injection phase.
   971 each run during the injection phase.
   942 And we can prove that the POSIX value of how
   972 And we can prove that the POSIX value of how
   943 regular expressions match strings will not be affected---although this is much harder
   973 regular expressions match strings will not be affected---although this is much harder
   944 to establish. Some initial results in this regard have been
   974 to establish. 
       
   975 Some initial results in this regard have been
   945 obtained in \cite{AusafDyckhoffUrban2016}. 
   976 obtained in \cite{AusafDyckhoffUrban2016}. 
       
   977 
       
   978 
   946 
   979 
   947 %Brzozowski, after giving the derivatives and simplification,
   980 %Brzozowski, after giving the derivatives and simplification,
   948 %did not explore lexing with simplification or he may well be 
   981 %did not explore lexing with simplification or he may well be 
   949 %stuck on an efficient simplification with a proof.
   982 %stuck on an efficient simplification with a proof.
   950 %He went on to explore the use of derivatives together with 
   983 %He went on to explore the use of derivatives together with 
  1432 \end{quote}  
  1465 \end{quote}  
  1433 
  1466 
  1434 
  1467 
  1435 
  1468 
  1436 
  1469 
  1437 \section{Background}
       
  1438 %Regular expression matching and lexing has been 
       
  1439 % widely-used and well-implemented
       
  1440 %in software industry. 
       
  1441 %TODO: expand the above into a full paragraph
       
  1442 %TODO: look up snort rules to use here--give readers idea of what regexes look like
       
  1443 
       
  1444 
       
  1445 Theoretical results say that regular expression matching
       
  1446 should be linear with respect to the input.
       
  1447 Under a certain class of regular expressions and inputs though,
       
  1448 practical implementations  suffer from non-linear or even 
       
  1449 exponential running time,
       
  1450 allowing a ReDoS (regular expression denial-of-service) attack.
       
  1451 
  1470 
  1452 
  1471 
  1453 %----------------------------------------------------------------------------------------
  1472 %----------------------------------------------------------------------------------------
  1454 
  1473 
  1455 
  1474 
  1456 %----------------------------------------------------------------------------------------
  1475 %----------------------------------------------------------------------------------------
  1457 
  1476 
  1458 \section{What this Template Includes}
       
  1459 
       
  1460 \subsection{Folders}
       
  1461 
       
  1462 This template comes as a single zip file that expands out to several files and folders. The folder names are mostly self-explanatory:
       
  1463 
       
  1464 \keyword{Appendices} -- this is the folder where you put the appendices. Each appendix should go into its own separate \file{.tex} file. An example and template are included in the directory.
       
  1465 
       
  1466 \keyword{Chapters} -- this is the folder where you put the thesis chapters. A thesis usually has about six chapters, though there is no hard rule on this. Each chapter should go in its own separate \file{.tex} file and they can be split as:
       
  1467 \begin{itemize}
       
  1468 \item Chapter 1: Introduction to the thesis topic
       
  1469 \item Chapter 2: Background information and theory
       
  1470 \item Chapter 3: (Laboratory) experimental setup
       
  1471 \item Chapter 4: Details of experiment 1
       
  1472 \item Chapter 5: Details of experiment 2
       
  1473 \item Chapter 6: Discussion of the experimental results
       
  1474 \item Chapter 7: Conclusion and future directions
       
  1475 \end{itemize}
       
  1476 This chapter layout is specialised for the experimental sciences, your discipline may be different.
       
  1477 
       
  1478 \keyword{Figures} -- this folder contains all figures for the thesis. These are the final images that will go into the thesis document.
       
  1479 
       
  1480 \subsection{Files}
       
  1481 
       
  1482 Included are also several files, most of them are plain text and you can see their contents in a text editor. After initial compilation, you will see that more auxiliary files are created by \LaTeX{} or BibTeX and which you don't need to delete or worry about:
       
  1483 
       
  1484 \keyword{example.bib} -- this is an important file that contains all the bibliographic information and references that you will be citing in the thesis for use with BibTeX. You can write it manually, but there are reference manager programs available that will create and manage it for you. Bibliographies in \LaTeX{} are a large subject and you may need to read about BibTeX before starting with this. Many modern reference managers will allow you to export your references in BibTeX format which greatly eases the amount of work you have to do.
       
  1485 
       
  1486 \keyword{MastersDoctoralThesis.cls} -- this is an important file. It is the class file that tells \LaTeX{} how to format the thesis. 
       
  1487 
       
  1488 \keyword{main.pdf} -- this is your beautifully typeset thesis (in the PDF file format) created by \LaTeX{}. It is supplied in the PDF with the template and after you compile the template you should get an identical version.
       
  1489 
       
  1490 \keyword{main.tex} -- this is an important file. This is the file that you tell \LaTeX{} to compile to produce your thesis as a PDF file. It contains the framework and constructs that tell \LaTeX{} how to layout the thesis. It is heavily commented so you can read exactly what each line of code does and why it is there. After you put your own information into the \emph{THESIS INFORMATION} block -- you have now started your thesis!
       
  1491 
       
  1492 Files that are \emph{not} included, but are created by \LaTeX{} as auxiliary files include:
       
  1493 
       
  1494 \keyword{main.aux} -- this is an auxiliary file generated by \LaTeX{}, if it is deleted \LaTeX{} simply regenerates it when you run the main \file{.tex} file.
       
  1495 
       
  1496 \keyword{main.bbl} -- this is an auxiliary file generated by BibTeX, if it is deleted, BibTeX simply regenerates it when you run the \file{main.aux} file. Whereas the \file{.bib} file contains all the references you have, this \file{.bbl} file contains the references you have actually cited in the thesis and is used to build the bibliography section of the thesis.
       
  1497 
       
  1498 \keyword{main.blg} -- this is an auxiliary file generated by BibTeX, if it is deleted BibTeX simply regenerates it when you run the main \file{.aux} file.
       
  1499 
       
  1500 \keyword{main.lof} -- this is an auxiliary file generated by \LaTeX{}, if it is deleted \LaTeX{} simply regenerates it when you run the main \file{.tex} file. It tells \LaTeX{} how to build the \emph{List of Figures} section.
       
  1501 
       
  1502 \keyword{main.log} -- this is an auxiliary file generated by \LaTeX{}, if it is deleted \LaTeX{} simply regenerates it when you run the main \file{.tex} file. It contains messages from \LaTeX{}, if you receive errors and warnings from \LaTeX{}, they will be in this \file{.log} file.
       
  1503 
       
  1504 \keyword{main.lot} -- this is an auxiliary file generated by \LaTeX{}, if it is deleted \LaTeX{} simply regenerates it when you run the main \file{.tex} file. It tells \LaTeX{} how to build the \emph{List of Tables} section.
       
  1505 
       
  1506 \keyword{main.out} -- this is an auxiliary file generated by \LaTeX{}, if it is deleted \LaTeX{} simply regenerates it when you run the main \file{.tex} file.
       
  1507 
       
  1508 So from this long list, only the files with the \file{.bib}, \file{.cls} and \file{.tex} extensions are the most important ones. The other auxiliary files can be ignored or deleted as \LaTeX{} and BibTeX will regenerate them.
       
  1509 
       
  1510 %----------------------------------------------------------------------------------------
  1477 %----------------------------------------------------------------------------------------
  1511 
  1478 
  1512 \section{Filling in Your Information in the \file{main.tex} File}\label{FillingFile}
       
  1513 
       
  1514 You will need to personalise the thesis template and make it your own by filling in your own information. This is done by editing the \file{main.tex} file in a text editor or your favourite LaTeX environment.
       
  1515 
       
  1516 Open the file and scroll down to the third large block titled \emph{THESIS INFORMATION} where you can see the entries for \emph{University Name}, \emph{Department Name}, etc \ldots
       
  1517 
       
  1518 Fill out the information about yourself, your group and institution. You can also insert web links, if you do, make sure you use the full URL, including the \code{http://} for this. If you don't want these to be linked, simply remove the \verb|\href{url}{name}| and only leave the name.
       
  1519 
       
  1520 When you have done this, save the file and recompile \code{main.tex}. All the information you filled in should now be in the PDF, complete with web links. You can now begin your thesis proper!
       
  1521 
       
  1522 %----------------------------------------------------------------------------------------
  1479 %----------------------------------------------------------------------------------------
  1523 
  1480 
  1524 \section{The \code{main.tex} File Explained}
  1481 
  1525 
       
  1526 The \file{main.tex} file contains the structure of the thesis. There are plenty of written comments that explain what pages, sections and formatting the \LaTeX{} code is creating. Each major document element is divided into commented blocks with titles in all capitals to make it obvious what the following bit of code is doing. Initially there seems to be a lot of \LaTeX{} code, but this is all formatting, and it has all been taken care of so you don't have to do it.
       
  1527 
       
  1528 Begin by checking that your information on the title page is correct. For the thesis declaration, your institution may insist on something different than the text given. If this is the case, just replace what you see with what is required in the \emph{DECLARATION PAGE} block.
       
  1529 
       
  1530 Then comes a page which contains a funny quote. You can put your own, or quote your favourite scientist, author, person, and so on. Make sure to put the name of the person who you took the quote from.
       
  1531 
       
  1532 Following this is the abstract page which summarises your work in a condensed way and can almost be used as a standalone document to describe what you have done. The text you write will cause the heading to move up so don't worry about running out of space.
       
  1533 
       
  1534 Next come the acknowledgements. On this page, write about all the people who you wish to thank (not forgetting parents, partners and your advisor/supervisor).
       
  1535 
       
  1536 The contents pages, list of figures and tables are all taken care of for you and do not need to be manually created or edited. The next set of pages are more likely to be optional and can be deleted since they are for a more technical thesis: insert a list of abbreviations you have used in the thesis, then a list of the physical constants and numbers you refer to and finally, a list of mathematical symbols used in any formulae. Making the effort to fill these tables means the reader has a one-stop place to refer to instead of searching the internet and references to try and find out what you meant by certain abbreviations or symbols.
       
  1537 
       
  1538 The list of symbols is split into the Roman and Greek alphabets. Whereas the abbreviations and symbols ought to be listed in alphabetical order (and this is \emph{not} done automatically for you) the list of physical constants should be grouped into similar themes.
       
  1539 
       
  1540 The next page contains a one line dedication. Who will you dedicate your thesis to?
       
  1541 
       
  1542 Finally, there is the block where the chapters are included. Uncomment the lines (delete the \code{\%} character) as you write the chapters. Each chapter should be written in its own file and put into the \emph{Chapters} folder and named \file{Chapter1}, \file{Chapter2}, etc\ldots Similarly for the appendices, uncomment the lines as you need them. Each appendix should go into its own file and placed in the \emph{Appendices} folder.
       
  1543 
       
  1544 After the preamble, chapters and appendices finally comes the bibliography. The bibliography style (called \option{authoryear}) is used for the bibliography and is a fully featured style that will even include links to where the referenced paper can be found online. Do not underestimate how grateful your reader will be to find that a reference to a paper is just a click away. Of course, this relies on you putting the URL information into the BibTeX file in the first place.
       
  1545 
       
  1546 %----------------------------------------------------------------------------------------
       
  1547 
       
  1548 \section{Thesis Features and Conventions}\label{ThesisConventions}
       
  1549 
       
  1550 To get the best out of this template, there are a few conventions that you may want to follow.
       
  1551 
       
  1552 One of the most important (and most difficult) things to keep track of in such a long document as a thesis is consistency. Using certain conventions and ways of doing things (such as using a Todo list) makes the job easier. Of course, all of these are optional and you can adopt your own method.
       
  1553 
       
  1554 \subsection{Printing Format}
       
  1555 
       
  1556 This thesis template is designed for double sided printing (i.e. content on the front and back of pages) as most theses are printed and bound this way. Switching to one sided printing is as simple as uncommenting the \option{oneside} option of the \code{documentclass} command at the top of the \file{main.tex} file. You may then wish to adjust the margins to suit specifications from your institution.
       
  1557 
       
  1558 The headers for the pages contain the page number on the outer side (so it is easy to flick through to the page you want) and the chapter name on the inner side.
       
  1559 
       
  1560 The text is set to 11 point by default with single line spacing. Again, you can tune the text size and spacing, should you want or need to, using the options at the very start of \file{main.tex}. The spacing can be changed by replacing the \option{singlespacing} option with \option{onehalfspacing} or \option{doublespacing}.
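
As a rough sketch, the options block at the top of \file{main.tex} might look like the listing below after switching to one-sided printing with one-and-a-half line spacing. The exact set and order of options in your copy of the template may differ, so treat this as illustrative rather than as the template's literal contents:

\begin{verbatim}
\documentclass[
  11pt,           % base font size
  oneside,        % uncommented: one-sided printing
  onehalfspacing, % used in place of singlespacing
]{MastersDoctoralThesis}
\end{verbatim}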
       
  1561 
       
  1562 \subsection{Using US Letter Paper}
       
  1563 
       
  1564 The paper size used in the template is A4, which is the standard size in Europe. If you are using this thesis template elsewhere and particularly in the United States, then you may have to change the A4 paper size to the US Letter size. This can be done in the margins settings section in \file{main.tex}.
       
  1565 
       
  1566 Due to the differences in the paper size, the resulting margins may be different to what you like or require (as it is common for institutions to dictate certain margin sizes). If this is the case, then the margin sizes can be tweaked by modifying the values in the same block as where you set the paper size. Now your document should be set up for US Letter paper size with suitable margins.
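
A minimal sketch of such a margins block is given below, assuming the template sets the page layout through the \code{geometry} package. The margin values are placeholders chosen for illustration, not the template's defaults or any institution's requirements:

\begin{verbatim}
\geometry{
  paper=letterpaper, % US Letter instead of a4paper
  inner=2.5cm,       % inner (binding-side) margin
  outer=3.8cm,       % outer margin
  top=1.5cm,         % top margin
  bottom=1.5cm,      % bottom margin
}
\end{verbatim}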
       
  1567 
       
  1568 \subsection{References}
       
  1569 
       
  1570 The \code{biblatex} package is used to format the bibliography and to insert references such as this one \parencite{Reference1}. The options used in the \file{main.tex} file mean that the in-text citations of references are formatted with the author(s) listed with the date of the publication. Multiple references are separated by semicolons (e.g. \parencite{Reference2, Reference1}) and references with more than three authors only show the first author with \emph{et al.} indicating there are more authors (e.g. \parencite{Reference3}). This is done automatically for you. To see how you use references, have a look at the \file{Chapter1.tex} source file. Many reference managers allow you to simply drag the reference into the document as you type.
       
  1571 
       
  1572 Scientific references should come \emph{before} the punctuation mark if there is one (such as a comma or period). The same goes for footnotes\footnote{Such as this footnote, here down at the bottom of the page.}. You can change this but the most important thing is to keep the convention consistent throughout the thesis. Footnotes themselves should be full, descriptive sentences (beginning with a capital letter and ending with a full stop). The APA6 states: \enquote{Footnote numbers should be superscripted, [...], following any punctuation mark except a dash.} The Chicago manual of style states: \enquote{A note number should be placed at the end of a sentence or clause. The number follows any punctuation mark except the dash, which it precedes. It follows a closing parenthesis.}
       
  1573 
       
  1574 The bibliography is typeset with references listed in alphabetical order by the first author's last name. This is similar to the APA referencing style. To see how \LaTeX{} typesets the bibliography, have a look at the very end of this document (or just click on the reference number links in in-text citations).
       
  1575 
       
  1576 \subsubsection{A Note on bibtex}
       
  1577 
       
  1578 The bibtex backend used in the template by default does not correctly handle Unicode character encoding (i.e. ``international'' characters). You may see a warning about this in the compilation log and, if your references contain Unicode characters, they may not show up correctly or at all. The solution to this is to use the biber backend instead of the outdated bibtex backend. This is done by finding \option{backend=bibtex} in \file{main.tex} and changing it to \option{backend=biber}. You will then need to delete all auxiliary BibTeX files and navigate to the template directory in your terminal (command prompt). Once there, simply type \code{biber main} and biber will compile your bibliography. You can then compile \file{main.tex} as normal and your bibliography will be updated. An alternative is to set up your \LaTeX{} editor to compile with biber instead of bibtex, see \href{http://tex.stackexchange.com/questions/154751/biblatex-with-biber-configuring-my-editor-to-avoid-undefined-citations/}{here} for how to do this for various editors.
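
The switch can be sketched as follows. The first listing shows only the two options mentioned in this guide; the \code{biblatex} line in your \file{main.tex} may carry further options, which should be left as they are. The second listing assumes you compile with \code{pdflatex}, so substitute whichever engine you normally use:

\begin{verbatim}
% in main.tex: change backend=bibtex to backend=biber
\usepackage[backend=biber, style=authoryear]{biblatex}
\end{verbatim}

\begin{verbatim}
pdflatex main   # writes the citation data biber needs
biber main      # builds the bibliography
pdflatex main   # pulls the bibliography into the document
pdflatex main   # resolves remaining cross-references
\end{verbatim}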
       
  1579 
       
  1580 \subsection{Tables}
       
  1581 
       
  1582 Tables are an important way of displaying your results. Below is an example table, which was generated with this code:
       
  1583 
       
  1584 {\small
       
  1585 \begin{verbatim}
       
  1586 \begin{table}
       
  1587 \caption{The effects of treatments X and Y on the four groups studied.}
       
  1588 \label{tab:treatments}
       
  1589 \centering
       
  1590 \begin{tabular}{l l l}
       
  1591 \toprule
       
  1592 \tabhead{Groups} & \tabhead{Treatment X} & \tabhead{Treatment Y} \\
       
  1593 \midrule
       
  1594 1 & 0.2 & 0.8\\
       
  1595 2 & 0.17 & 0.7\\
       
  1596 3 & 0.24 & 0.75\\
       
  1597 4 & 0.68 & 0.3\\
       
  1598 \bottomrule\\
       
  1599 \end{tabular}
       
  1600 \end{table}
       
  1601 \end{verbatim}
       
  1602 }
       
  1603 
       
  1604 \begin{table}
       
  1605 \caption{The effects of treatments X and Y on the four groups studied.}
       
  1606 \label{tab:treatments}
       
  1607 \centering
       
  1608 \begin{tabular}{l l l}
       
  1609 \toprule
       
  1610 \tabhead{Groups} & \tabhead{Treatment X} & \tabhead{Treatment Y} \\
       
  1611 \midrule
       
  1612 1 & 0.2 & 0.8\\
       
  1613 2 & 0.17 & 0.7\\
       
  1614 3 & 0.24 & 0.75\\
       
  1615 4 & 0.68 & 0.3\\
       
  1616 \bottomrule\\
       
  1617 \end{tabular}
       
  1618 \end{table}
       
  1619 
       
  1620 You can reference tables with \verb|\ref{<label>}| where the label is defined within the table environment. See \file{Chapter1.tex} for an example of the label and citation (e.g. Table~\ref{tab:treatments}).
       
  1621 
       
  1622 \subsection{Figures}
       
  1623 
       
  1624 There will hopefully be many figures in your thesis (that should be placed in the \emph{Figures} folder). The way to insert figures into your thesis is to use a code template like this:
       
  1625 %\begin{verbatim}
       
  1626 %\begin{figure}
       
  1627 %\centering
       
  1628 %\includegraphics{Figures/Electron}
       
  1629 %\decoRule
       
  1630 %\caption[An Electron]{An electron (artist's impression).}
       
  1631 %\label{fig:Electron}
       
  1632 %\end{figure}
       
  1633 %\end{verbatim}
       
  1634 %Also look in the source file. Putting this code into the source file produces the picture of the electron that you can see in the figure below.
       
  1635 %
       
  1636 %\begin{figure}[th]
       
  1637 %\centering
       
  1638 %\includegraphics{Figures/Electron}
       
  1639 %\decoRule
       
  1640 %\caption[An Electron]{An electron (artist's impression).}
       
  1641 %\label{fig:Electron}
       
  1642 %\end{figure}
       
  1643 
       
  1644 %Sometimes figures don't always appear where you write them in the source. The placement depends on how much space there is on the page for the figure. Sometimes there is not enough room to fit a figure directly where it should go (in relation to the text) and so \LaTeX{} puts it at the top of the next page. Positioning figures is the job of \LaTeX{} and so you should only worry about making them look good!
       
  1645 %
       
  1646 %Figures usually should have captions just in case you need to refer to them (such as in Figure~\ref{fig:Electron}). The \verb|\caption| command contains two parts, the first part, inside the square brackets is the title that will appear in the \emph{List of Figures}, and so should be short. The second part in the curly brackets should contain the longer and more descriptive caption text.
       
  1647 %
       
  1648 %The \verb|\decoRule| command is optional and simply puts an aesthetic horizontal line below the image. If you do this for one image, do it for all of them.
       
  1649 %
       
  1650 %\LaTeX{} is capable of using images in pdf, jpg and png format.
       
  1651 %
       
  1652 %\subsection{Typesetting mathematics}
       
  1653 %
       
  1654 %If your thesis is going to contain heavy mathematical content, be sure that \LaTeX{} will make it look beautiful, even though it won't be able to solve the equations for you.
       
  1655 %
       
  1656 %The \enquote{Not So Short Introduction to \LaTeX} (available on \href{http://www.ctan.org/tex-archive/info/lshort/english/lshort.pdf}{CTAN}) should tell you everything you need to know for most cases of typesetting mathematics. If you need more information, a much more thorough mathematical guide is available from the AMS called, \enquote{A Short Math Guide to \LaTeX} and can be downloaded from:
       
  1657 %\url{ftp://ftp.ams.org/pub/tex/doc/amsmath/short-math-guide.pdf}
       
  1658 %
       
  1659 %There are many different \LaTeX{} symbols to remember, luckily you can find the most common symbols in \href{http://ctan.org/pkg/comprehensive}{The Comprehensive \LaTeX~Symbol List}.
       
  1660 %
       
  1661 You can write an equation, which is automatically given an equation number by \LaTeX{} like this:
       
  1662 \begin{verbatim}
       
  1663 \begin{equation}
       
  1664 E = mc^{2}
       
  1665 \label{eqn:Einstein}
       
  1666 \end{equation}
       
  1667 \end{verbatim}
       
  1668 
       
  1669 This will produce Einstein's famous mass-energy equivalence equation:
       
  1670 \begin{equation}
       
  1671 E = mc^{2}
       
  1672 \label{eqn:Einstein}
       
  1673 \end{equation}
       
  1674 
       
  1675 All equations you write (which are not in the middle of paragraph text) are automatically given equation numbers by \LaTeX{}. If you don't want a particular equation numbered, use the unnumbered form:
       
  1676 \begin{verbatim}
       
  1677 \[ a^{2}=4 \]
       
  1678 \end{verbatim}
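
Two related idioms, sketched here on the assumption that the \code{amsmath} package is loaded in your preamble: the starred \code{equation*} environment gives an unnumbered display equivalent to the bracket form, and a numbered equation can be cited in the text through its label:

\begin{verbatim}
% unnumbered display equation (requires amsmath)
\begin{equation*}
a^{2} = 4
\end{equation*}

% refer back to a numbered equation by its label
as shown in Equation~\ref{eqn:Einstein}
\end{verbatim}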
       
  1679 
       
  1680 %----------------------------------------------------------------------------------------
       
  1681 
       
  1682 \section{Sectioning and Subsectioning}
       
  1683 
       
  1684 You should break your thesis up into nice, bite-sized sections and subsections. \LaTeX{} automatically builds a Table of Contents by looking at all the \verb|\chapter{}|, \verb|\section{}| and \verb|\subsection{}| commands you write in the source.
       
  1685 
       
  1686 The Table of Contents should only list the sections to three (3) levels. A \verb|\chapter{}| is level zero (0). A \verb|\section{}| is level one (1) and so a \verb|\subsection{}| is level two (2). In your thesis it is likely that you will even use a \verb|\subsubsection{}|, which is level three (3). The depth to which the Table of Contents is formatted is set within \file{MastersDoctoralThesis.cls}. If you need this changed, you can do it in \file{main.tex}.
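
For instance, the standard \LaTeX{} counter \code{tocdepth} controls how many levels are listed. A minimal sketch, assuming the line is placed in the preamble of \file{main.tex} after the class has been loaded:

\begin{verbatim}
% list entries down to \subsection (level 2) only
\setcounter{tocdepth}{2}
\end{verbatim}

Setting the value to 3 would include \verb|\subsubsection{}| entries as well.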
       
  1687 
       
  1688 %----------------------------------------------------------------------------------------
       
  1689 
       
  1690 \section{In Closing}
       
  1691 
       
  1692 You have reached the end of this mini-guide. You can now rename or overwrite this PDF file and begin writing your own \file{Chapter1.tex} and the rest of your thesis. The easy work of setting up the structure and framework has been taken care of for you. It's now your job to fill it out!
       
  1693 
       
  1694 Good luck and have lots of fun!
       
  1695 
       
  1696 \begin{flushright}
       
  1697 Guide written by ---\\
       
  1698 Sunil Patel: \href{http://www.sunilpatel.co.uk}{www.sunilpatel.co.uk}\\
       
  1699 Vel: \href{http://www.LaTeXTemplates.com}{LaTeXTemplates.com}
       
  1700 \end{flushright}