ChengsongTanPhdThesis/Chapters/Introduction.tex
changeset 654 2ad20ba5b178
parent 653 bc5571c38d1f
child 664 ba44144875b1
653:bc5571c38d1f 654:2ad20ba5b178
   202 %This part is about regular expressions, Brzozowski derivatives,
   202 %This part is about regular expressions, Brzozowski derivatives,
   203 %and a bit-coded lexing algorithm with proven correctness and time bounds.
   203 %and a bit-coded lexing algorithm with proven correctness and time bounds.
   204 
   204 
   205 %TODO: look up snort rules to use here--give readers idea of what regexes look like
   205 %TODO: look up snort rules to use here--give readers idea of what regexes look like
   206 
   206 
   207 
   207 \marginpar{rephrasing using "imprecise words"}
   208 Regular expressions, since their inception in the 1940s, 
   208 Regular expressions, since their inception in the 1940s, 
   209 have been subject to extensive study and implementation. 
   209 have been subject to extensive study and implementation. 
   210 Their primary application lies in text processing--finding
   210 Their primary application lies in text processing--finding
   211 matches and identifying patterns in a string.
   211 matches and identifying patterns in a string.
   212 %It is often used to match strings that comprises of numerous fields, 
   212 %It is often used to match strings that comprises of numerous fields, 
   213 %where certain fields may recur or be omitted. 
   213 %where certain fields may recur or be omitted. 
   214 For example, a simple regular expression that tries 
   214 For example, a simple regular expression that tries 
   215 to recognise email addresses is
   215 to recognise email addresses is
   216 \marginpar{rephrased from "the regex for recognising" to "a simple regex that tries to match email"}
   216 \marginpar{rephrased from "the regex for recognising" to "a simple regex that tries to match email"}
   217 \begin{center}
   217 \begin{center}
   218 $[a-z0-9.\_]^\backslash+@[a-z0-9.-]^\backslash+\.\{a-z\}\{2,6\}$
   218 \verb|[a-z0-9._]^+@[a-z0-9.-]^+\.[a-z]{2,6}|
   219 %$[a-z0-9._]^+@[a-z0-9.-]^+\.[a-z]{2,6}$.
   219 %$[a-z0-9._]^+@[a-z0-9.-]^+\.[a-z]{2,6}$.
   220 \end{center}
   220 \end{center}
   221 \marginpar{Simplified example, but the distinction between . and escaped . is correct
   221 \marginpar{Simplified example, but the distinction between . and escaped . is correct
   222 and therefore left unchanged.}
   222 and therefore left unchanged. Also, verbatim text does not straightforwardly support superscripts, so the + signs are kept as they are.}
   223 
       
   224 %Using this, regular expression matchers and lexers are able to extract 
   223 %Using this, regular expression matchers and lexers are able to extract 
   225 %the domain names by the use of \verb|[a-zA-Z0-9.-]+|. 
   224 %the domain names by the use of \verb|[a-zA-Z0-9.-]+|. 
   226 \marginpar{Rewrote explanation for the expression.}
   225 \marginpar{Rewrote explanation for the expression.}
   227 The bracketed sub-expressions are used to extract specific parts of an email address.
   226 The bracketed sub-expressions are used to extract specific parts of an email address.
   228 The local part is recognised by the expression enclosed in 
   227 The local part is recognised by the expression enclosed in