handouts/ho03.tex
changeset 229 ea921d6a1819
parent 228 4f7c7997b68b
child 230 603cbd28e988
228:4f7c7997b68b 229:ea921d6a1819
     5 \begin{document}
     5 \begin{document}
     6 
     6 
     7 \section*{Handout 3 (Buffer Overflow Attacks)}
     7 \section*{Handout 3 (Buffer Overflow Attacks)}
     8 
     8 
     9 By far the most popular attack method on computers are buffer
     9 By far the most popular attack method on computers are buffer
    10 overflow attacks or simple variations thereof. The popularity is
    10 overflow attacks or variations thereof. The popularity is
    11 unfortunate because we nowadays have technology in place to prevent them
    11 unfortunate because we nowadays have technology in place to
    12 effectively. But these kind of attacks are still very relevant
    12 prevent them effectively. But these kinds of attacks are still
    13 even today since there are many legacy systems out there and
    13 very relevant even today since there are many legacy systems
    14 also many modern embedded systems do not take any precautions
    14 out there and also many modern embedded systems often do not
    15 to prevent such attacks.
    15 take any precautions to prevent such attacks.
    16 
    16 
    17 To understand how buffer overflow attacks work, we have to have
    17 To understand how buffer overflow attacks work, we have to have
    18 a look at how computers work ``under the hood'' (on the
    18 a look at how computers work ``under the hood'' (on the
    19 machine level) and also understand some aspects of the C/C++
    19 machine level) and also understand some aspects of the C/C++
    20 programming language. This might not be everyday fare for
    20 programming language. This might not be everyday fare for
    21 computer science students, but who said that criminal hackers
    21 computer science students, but who said that criminal hackers
    22 restrict themselves to everyday fare? Not to mention the
    22 restrict themselves to everyday fare? Not to mention the
    23 free-riding script-kiddies who use this technology without
    23 free-riding script-kiddies who use this technology without
    24 even knowing what the underlying ideas are. If you want to be
    24 even knowing what the underlying ideas are. If you want to be
    25 a good security engineer who needs to defend such attacks, 
    25 a good security engineer who needs to defend such attacks, 
    26 then better you know the details.
    26 then you had better get to know the details.
    27  
    27  
    28 For buffer overflow attacks to work, a number of innocent
    28 For buffer overflow attacks to work, a number of innocent
    29 design decisions, which are really benign on their own, need
    29 design decisions, which are really benign on their own, need
    30 to conspire against you. All these decisions were pretty much
    30 to conspire against you. All these decisions were taken at a
    31 taken at a time when there was no Internet: C was introduced
    31 time when there was no Internet: C was introduced around 1973;
    32 around 1973; the Internet TCP/IP protocol was standardised in
    32 the Internet TCP/IP protocol was standardised in 1982 by which
    33 1982 by which time there were maybe 500 servers connected (and
    33 time there were maybe 500 servers connected (and all users
    34 all users were well-behaved, mostly academics); Intel's first
    34 were well-behaved, mostly academics); Intel's first 8086 CPUs
    35 8086 CPUs arrived around 1977. So nobody of the
    35 arrived in 1978. So none of the ``forefathers'' can
    36 ``forefathers'' can really be blamed, but as mentioned above
    36 really be blamed, but as mentioned above we should already be
    37 we should already be way beyond the point that buffer overflow
    37 way beyond the point that buffer overflow attacks are worth a
    38 attacks are worth a thought. Unfortunately, this is far from
    38 thought. Unfortunately, this is far from the truth. I let you
    39 the truth. I let you ponder why?
    39 ponder why.
    40 
    40 
    41 One such ``benign'' design decision is how the memory is laid
    41 One such ``benign'' design decision is how the memory is laid
    42 out into different regions for each process. 
    42 out into different regions for each process. 
    43  
    43  
    44 \begin{center}
    44 \begin{center}
    73 stack when a program is running. Consider the following simple
    73 stack when a program is running. Consider the following simple
    74 C program.
    74 C program.
    75  
    75  
    76 \lstinputlisting[language=C]{../progs/example1.c} 
    76 \lstinputlisting[language=C]{../progs/example1.c} 
    77  
    77  
    78 \noindent The \code{main} function calls \code{foo} with three
    78 \noindent The \code{main} function calls in Line 7 the
    79 arguments. \code{Foo} contains two (local) buffers. The
    79 function \code{foo} with three arguments. \code{Foo} creates
    80 interesting point for us will be what will the stack loke
    80 two (local) buffers, but does not do anything interesting with
    81 like after Line 3 has been executed? The answer is as follows:
    81 them. The only purpose of this program is to illustrate what
       
    82 happens behind the scenes with the stack. The interesting
       
    83 question is what the stack will look like after Line 3 has been
       
    84 executed. The answer can be illustrated as follows:
    82  
    85  
    83 \begin{center} 
    86 \begin{center} 
    84  \begin{tikzpicture}[scale=0.65]
    87  \begin{tikzpicture}[scale=0.65]
    85   \draw[gray!20,fill=gray!20] (-5, 0) rectangle (-3,-1);
    88   \draw[gray!20,fill=gray!20] (-5, 0) rectangle (-3,-1);
    86   \draw[line width=1mm] (-5,-1.2) -- (-5,0.2);
    89   \draw[line width=1mm] (-5,-1.2) -- (-5,0.2);
   124 \noindent On the left is the stack before \code{foo} is
   127 \noindent On the left is the stack before \code{foo} is
   125 called; on the right is the stack after \code{foo} finishes.
   128 called; on the right is the stack after \code{foo} finishes.
   126 The function call to \code{foo} in Line 7 pushes the arguments
   129 The function call to \code{foo} in Line 7 pushes the arguments
   127 onto the stack in reverse order---shown in the middle.
   130 onto the stack in reverse order---shown in the middle.
   128 Therefore first 3 then 2 and finally 1. Then it pushes the
   131 Therefore first 3 then 2 and finally 1. Then it pushes the
   129 return address to the stack where execution should resume once
   132 return address onto the stack where execution should resume
   130 \code{foo} has finished. The last stack pointer (\code{sp}) is
   133 once \code{foo} has finished. The last stack pointer
   131 needed in order to clean up the stack to the last level---in
   134 (\code{sp}) is needed in order to clean up the stack to the
   132 fact there is no cleaning involved, but just the top of the
   135 last level---in fact there is no cleaning involved, but just
   133 stack will be set back. The two buffers are also on the stack,
   136 the top of the stack will be set back. So the last stack
   134 because they are local data within \code{foo}. So in the
   137 pointer also needs to be stored. The two buffers inside
   135 middle is a snapshot of the stack after Line 3 has been 
   138 \pcode{foo} are on the stack too, because they are local data
   136 executed. In case you are familiar with assembly instructions
   139 within \code{foo}. Consequently the stack in the middle is a
   137 you can also read off this behaviour from the machine
   140 snapshot after Line 3 has been executed. In case you are
   138 code that the \code{gcc} compiler generates for the program
   141 familiar with assembly instructions you can also read off this
   139 above:\footnote{You can make \pcode{gcc} generate assembly 
   142 behaviour from the machine code that the \code{gcc} compiler
   140 instructions if you call it with the \pcode{-S} option, 
   143 generates for the program above:\footnote{You can make
   141 for example \pcode{gcc -S out in.c}\;. Or you can look
   144 \pcode{gcc} generate assembly instructions if you call it with
   142 at this code by using the debugger. This will be explained
   145 the \pcode{-S} option, for example \pcode{gcc -S -o out in.c}\;.
   143 later.}.
   146 Or you can look at this code by using the debugger. How to do
       
   147 this will be explained later.}
   144 
   148 
   145 \begin{center}\small
   149 \begin{center}\small
   146 \begin{tabular}[t]{@{}c@{\hspace{8mm}}c@{}}
   150 \begin{tabular}[t]{@{}c@{\hspace{8mm}}c@{}}
   147 {\lstinputlisting[language={[x86masm]Assembler},
   151 {\lstinputlisting[language={[x86masm]Assembler},
   148   morekeywords={movl},xleftmargin=5mm]
   152   morekeywords={movl},xleftmargin=5mm]
   152   {../progs/example1b.s}}  
   156   {../progs/example1b.s}}  
   153 \end{tabular}
   157 \end{tabular}
   154 \end{center}
   158 \end{center}
   155 
   159 
   156 \noindent On the left you can see how the function
   160 \noindent On the left you can see how the function
   157 \pcode{main} prepares in Lines 2 to 7 the stack, before
   161 \pcode{main} prepares in Lines 2 to 7 the stack before calling
   158 calling the function \pcode{foo}. You can see that the
   162 the function \pcode{foo}. You can see that the numbers 3, 2, 1
   159 numbers 3, 2, 1 are stored on the stack (the register
   163 are stored on the stack (the register \code{$esp} refers to
   160 \code{$esp} refers to the top of the stack). On the right
   164 the top of the stack). On the right you can see how the
   161 you can see how the function \pcode{foo} stores the two local
   165 function \pcode{foo} stores the two local buffers onto the
   162 buffers onto the stack and initialises them with the given
   166 stack and initialises them with the given data (Lines 2 to 9).
   163 data (Lines 2 to 9). Since there is no real computation
   167 Since there is no real computation going on inside
   164 going on inside \pcode{foo} the function then just restores
   168 \pcode{foo}, the function then just restores the stack to its
   165 the stack to its old state and crucially sets the return
   169 old state and crucially sets the return address where the
   166 address where the computation should resume (Line 9 in the
   170 computation should resume (Line 9 in the code on the left-hand
   167 code on the left hand side). The instruction \code{ret} then
   171 side). The instruction \code{ret} then transfers control back
   168 transfers control back to the function \pcode{main} to the
   172 to the function \pcode{main}, to the instruction just after
   169 teh instruction just after the call, namely Line 9.
   173 the call to \pcode{foo}, that is Line 9.
   170  
   174  
   171 Another part of the ``conspiracy'' is that library functions
   175 Another part of the ``conspiracy'' is that library functions
   172 in C look typically as follows:
   176 in C look typically as follows:
   173  
   177  
   174 \begin{center}
   178 \begin{center}
   178 \noindent This function copies data from a source \pcode{src}
   182 \noindent This function copies data from a source \pcode{src}
   179 to a destination \pcode{dst}. The important point is that it
   183 to a destination \pcode{dst}. The important point is that it
   180 copies the data until it reaches a zero-byte (\code{"\\0"}). 
   184 copies the data until it reaches a zero-byte (\code{"\\0"}). 
   181 
   185 
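\noindent The copying behaviour described above can be sketched in a few lines of C. This is only an illustration, not the library's actual implementation: the names \pcode{unchecked_copy} and \pcode{bounded_copy} are made up for this handout. The first function stops only at the zero-byte and never looks at the size of the destination; the second shows, for contrast, what a copy with an explicit size limit might look like.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: copies src to dst the way strcpy-like functions do.
   It stops only at the zero byte in src and never checks the size of dst.
   Returns the number of bytes written, including the terminating zero. */
size_t unchecked_copy(char *dst, const char *src)
{
    size_t n = 0;
    assert(dst != NULL && src != NULL);
    while ((dst[n] = src[n]) != '\0')
        n++;
    return n + 1;
}

/* Hypothetical bounded variant: writes at most cap bytes (cap >= 1) and
   always terminates dst. Returns the number of bytes written. */
size_t bounded_copy(char *dst, size_t cap, const char *src)
{
    size_t n = 0;
    assert(dst != NULL && src != NULL && cap > 0);
    while (n + 1 < cap && src[n] != '\0') {
        dst[n] = src[n];
        n++;
    }
    dst[n] = '\0';
    return n + 1;
}
```

\noindent For a source string of twelve characters, \pcode{unchecked_copy} writes 13 bytes no matter how small the destination is, whereas \pcode{bounded_copy} with a limit of 8 writes exactly 8 bytes.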
   182 The central idea of the buffer overflow attack is to overwrite
   186 The central idea of the buffer overflow attack is to overwrite
   183 the return address on the stack which states where the control
   187 the return address on the stack which designates where the
   184 flow of the program should resume once the function at hand
   188 control flow of the program should resume once the function at
   185 has finished its computation. So if we have somewhere in a 
   189 hand has finished its computation. So if we have somewhere in
   186 function a local a buffer, say
   190 a function a local buffer, say
   187 
   191 
   188 \begin{center}
   192 \begin{center}
   189 \code{char buf[8];}
   193 \code{char buf[8];}
   190 \end{center}
   194 \end{center}
   191 
   195 
   208   \draw[line width=1mm] (-1,3) -- (1,3);
   212   \draw[line width=1mm] (-1,3) -- (1,3);
   209   \draw (0,3.1) node[anchor=south] {\tt ret};
   213   \draw (0,3.1) node[anchor=south] {\tt ret};
   210   \draw[line width=1mm] (-1,4) -- (1,4);
   214   \draw[line width=1mm] (-1,4) -- (1,4);
   211   \draw (0,4) node[anchor=south] {\small\tt last sp};
   215   \draw (0,4) node[anchor=south] {\small\tt last sp};
   212   \draw[line width=1mm] (-1,5) -- (1,5);
   216   \draw[line width=1mm] (-1,5) -- (1,5);
   213   \draw (0,5) node[anchor=south] {\tt buf};
   217   \draw (0,5.1) node[anchor=south] {\tt buf};
   214   \draw[line width=1mm] (-1,6) -- (1,6);
   218   \draw[line width=1mm] (-1,6) -- (1,6);
   215   \draw (2,5.1) node[anchor=south] {\code{$esp}};
   219   \draw (2,5.1) node[anchor=south] {\code{$esp}};
   216   \draw[<-,line width=0.5mm] (1.1,6) -- (2.5,6);
   220   \draw[<-,line width=0.5mm] (1.1,6) -- (2.5,6);
   217 
   221 
   218   \draw[->,line width=0.5mm] (1,4.5) -- (2.5,4.5);
   222   \draw[->,line width=0.5mm] (1,4.5) -- (2.5,4.5);
   221   \draw[->,line width=0.5mm] (1,3.5) -- (2.5,3.5);
   225   \draw[->,line width=0.5mm] (1,3.5) -- (2.5,3.5);
   222   \draw (2.6,3.1) node[anchor=south west] {\tt jump to \code{\\x080483f4}};
   226   \draw (2.6,3.1) node[anchor=south west] {\tt jump to \code{\\x080483f4}};
   223 \end{tikzpicture}
   227 \end{tikzpicture}
   224 \end{center}
   228 \end{center}
   225 
   229 
   226 \noindent We need to fill this over its limit of
   230 \noindent We need to fill this buffer over its limit of 8
   227 8 characters so that it overwrites the stack pointer
   231 characters so that it overwrites the stack pointer and then
   228 and then overwrites the return address. If, for example, 
   232 also overwrites the return address. If, for example, we want
   229 we want to jump to a specific address in memory, say,
   233 to jump to a specific address in memory, say,
   230 \pcode{\\x080483f4} then we need to fill the 
   234 \pcode{\\x080483f4} then we can fill the buffer with the data
   231 buffer for example as follows
       
   232 
   235 
   233 \begin{center}
   236 \begin{center}
   234 \code{char buf[8] = "AAAAAAAABBBB\\xf4\\x83\\x04\\x08";}
   237 \code{char buf[8] = "AAAAAAAABBBB\\xf4\\x83\\x04\\x08";}
   235 \end{center}
   238 \end{center}
   236  
   239  
   237 \noindent The first 8 \pcode{A}s fill the buffer to the rim;
   240 \noindent The first eight \pcode{A}s fill the buffer to the
   238 the next four \pcode{B}s overwrite the stack pointer (with
   241 rim; the next four \pcode{B}s overwrite the stack pointer
   239 what data we overwrite this part is usually not important);
   242 (with what data we overwrite this part is usually not
   240 then comes the address we want to jump to. Notice that we have
   243 important); then comes the address we want to jump to. Notice
   241 to give the address in the reverse order. All addresses on
   244 that we have to give the address in the reverse order. All
   242 Intel CPUs need to be given in this way. Since the string is
   245 addresses on Intel CPUs need to be given in this way (least significant byte first). Since
   243 enclosed in double quotes, the C convention is that the string
   246 the string is enclosed in double quotes, the C convention is
   244 internally will automatically be terminated by a zero-byte. If
   247 that the string internally will automatically be terminated by
   245 the programmer uses functions like \pcode{strcpy} for filling
   248 a zero-byte. If the programmer uses functions like
   246 the buffer \pcode{buf}, then we can be sure it will overwrite
   249 \pcode{strcpy} for filling the buffer \pcode{buf}, then we can
   247 the stack in this manner---since it will copy everything up
   250 be sure it will overwrite the stack in this manner---since it
   248 to the zero-byte.
   251 will copy everything up to the zero-byte. Notice that this
       
   252 overwriting of the buffer only works since the newer item, the
       
   253 buffer, is stored on the stack at lower addresses than the older items, like
       
   254 the return address and arguments. If it had been the other way
       
   255 around, then such an overwriting by overflowing a local buffer
       
   256 would just not work.
   249 
   257 
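\noindent The ``reverse order'' of the address bytes can be made concrete with a small sketch. The helper name \pcode{address_to_bytes} is an assumption made for illustration; it splits a 32-bit address into the four bytes in the order in which they have to appear in the string, lowest byte first.

```c
#include <assert.h>
#include <stdint.h>
#include <stddef.h>

/* Hypothetical helper: writes a 32-bit address into out[0..3] in the
   byte order an Intel (little-endian) CPU expects: lowest byte first. */
void address_to_bytes(uint32_t addr, unsigned char out[4])
{
    assert(out != NULL);
    for (int i = 0; i < 4; i++)
        out[i] = (unsigned char)((addr >> (8 * i)) & 0xff);
}
```

\noindent Applied to the address \pcode{0x080483f4} from the example, the four bytes come out as \pcode{f4}, \pcode{83}, \pcode{04}, \pcode{08}, which is the sequence used at the end of the string above.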
   250 What the outcome of such an attack is can be illustrated with
   258 What the outcome of such an attack is can be illustrated with
   251 the code shown in Figure~\ref{C2}. Under ``normal operation''
   259 the code shown in Figure~\ref{C2}. Under ``normal operation''
   252 this program ask for a login-name and a password (both are
   260 this program asks for a login-name and a password. Both of
   253 represented as strings). Both of which are stored in buffers
   261 which are stored in \code{char} buffers of length 8. The
   254 of length 8. The function \pcode{match} tests whether two such 
   262 function \pcode{match} tests whether two such buffers contain
   255 strings are equal. If yes, then the function lets you in
   263 the same string. If yes, then the function lets you ``in'' (by
   256 (by printing \pcode{Welcome}). If not, it denies access
   264 printing \pcode{Welcome}). If not, it denies access (by
   257 (by printing \pcode{Wrong identity}). The vulnerable function
   265 printing \pcode{Wrong identity}). The vulnerable function is
   258 is \code{get_line} in Lines 11 to 19. This function does not
   266 \code{get_line} in Lines 11 to 19. This function does not take
   259 take any precautions about the buffer of 8 characters being
   267 any precautions about the buffer of 8 characters being filled
   260 filled beyond this 8-character-limit. The buffer overflow
   268 beyond this 8-character-limit. Let us suppose the login name
   261 can be triggered by inputing something, like \pcode{foo}, for 
   269 is \pcode{test}. Then the buffer overflow can be triggered
   262 the login name and then the specially crafted string as 
   270 with a specially crafted string as password:
   263 password:
       
   264 
   271 
   265 \begin{center}
   272 \begin{center}
   266 \code{AAAAAAAABBBB\\x2c\\x85\\x04\\x08\\n}
   273 \code{AAAAAAAABBBB\\x2c\\x85\\x04\\x08\\n}
   267 \end{center}
   274 \end{center}
   268 
   275 
   269 \noindent The address happens to be the one for the function
   276 \noindent The address at the end happens to be the one for the
   270 \pcode{welcome()}. This means even with this input (where the
   277 function \pcode{welcome()}. This means even with this input
   271 login name and password clearly do not match) the program will
   278 (where the login name and password clearly do not match) the
   272 still print out \pcode{Welcome}. The only information we need
   279 program will still print out \pcode{Welcome}. The only
   273 for this attack is to know where the function
   280 information we need for this attack is to know where the
   274 \pcode{welcome()} starts in memory. This information can be
   281 function \pcode{welcome()} starts in memory. This information
   275 easily obtained by starting the program inside the debugger
   282 can be easily obtained by starting the program inside the
   276 and disassembling this function. 
   283 debugger and disassembling this function. 
   277 
   284 
   278 \begin{lstlisting}[numbers=none,language={[x86masm]Assembler},
   285 \begin{lstlisting}[numbers=none,language={[x86masm]Assembler},
   279   morekeywords={movl,movw}]
   286   morekeywords={movl,movw}]
   280 $ gdb C2
   287 $ gdb C2
   281 GNU gdb (GDB) 7.2-ubuntu
   288 GNU gdb (GDB) 7.2-ubuntu
   282 (gdb) disassemble welcome
   289 (gdb) disassemble welcome
   283 \end{lstlisting}
   290 \end{lstlisting}
   284 
   291 
   285 \noindent 
   292 \noindent \pcode{C2} is the name of the program and
   286 The output will be something like this
   293 \pcode{gdb} is the name of the debugger. The output will be
       
   294 something like this
   287 
   295 
   288 \begin{lstlisting}[numbers=none,language={[x86masm]Assembler},
   296 \begin{lstlisting}[numbers=none,language={[x86masm]Assembler},
   289   morekeywords={movl,movw}]
   297   morekeywords={movl,movw}]
   290 0x0804852c <+0>:     push   %ebp
   298 0x0804852c <+0>:     push   %ebp
   291 0x0804852d <+1>:     mov    %esp,%ebp
   299 0x0804852d <+1>:     mov    %esp,%ebp
   295 0x0804853e <+18>:    movl   $0x0,(%esp)
   303 0x0804853e <+18>:    movl   $0x0,(%esp)
   296 0x08048545 <+25>:    call   0x80483b4 <exit@plt>
   304 0x08048545 <+25>:    call   0x80483b4 <exit@plt>
   297 \end{lstlisting}
   305 \end{lstlisting}
   298 
   306 
   299 \noindent indicating that the function \pcode{welcome()}
   307 \noindent indicating that the function \pcode{welcome()}
   300 starts at address \pcode{0x0804852c}.
   308 starts at address \pcode{0x0804852c} (top address in the 
       
   309 left column).
   301 
   310 
   302 \begin{figure}[p]
   311 \begin{figure}[p]
   303 \lstinputlisting[language=C]{../progs/C2.c}
   312 \lstinputlisting[language=C]{../progs/C2.c}
   304 \caption{A suspicious login implementation.\label{C2}}
   313 \caption{A suspicious login implementation.\label{C2}}
   305 \end{figure}
   314 \end{figure}
   308 that needed a key to be unlocked. Historically, hackers first 
   317 that needed a key to be unlocked. Historically, hackers first 
   309 broke the rather weak encryption of these locking mechanisms.
   318 broke the rather weak encryption of these locking mechanisms.
   310 After the encryption had been made stronger, hackers used
   319 After the encryption had been made stronger, hackers used
   311 buffer overflow attacks as shown above to jump directly to
   320 buffer overflow attacks as shown above to jump directly to
   312 the part of the program that was intended to be only available
   321 the part of the program that was intended to be only available
   313 after the correct key was typed in by the user. 
   322 after the correct key was typed in. 
   314 
   323 
   315 \subsection*{Payloads}
   324 \subsection*{Payloads}
   316 
   325 
   317 Unfortunately, much more harm can be caused by buffer overflow
   326 Unfortunately, much more harm can be caused by buffer overflow
   318 attacks. This is achieved by injecting code that will be run
   327 attacks. This is achieved by injecting code that will be run
   319 once the return address is appropriately modified. Typically
   328 once the return address is appropriately modified. Typically
   320 the code that will be injected is for running a shell. In
   329 the code that will be injected is for running a shell. This
   321 order to be send as part of the string that is overflowing the
   330 gives the attacker the ability to run programs on the target
   322 buffer, we need the code to be encoded as a sequence of 
   331 machine and have a good look around, provided the attacked
   323 characters
   332 process was not already running as root.\footnote{In that case
       
   333 the attacker could already congratulate him- or herself on
       
   334 another computer under full control.} In order to be sent as
       
   335 part of the string that is overflowing the buffer, we need the
       
   336 code to be represented as a sequence of characters. For
       
   337 example
   324 
   338 
   325 \lstinputlisting[language=C,numbers=none]{../progs/o1.c}
   339 \lstinputlisting[language=C,numbers=none]{../progs/o1.c}
   326 
   340 
   327 \noindent These characters represent the machine code
   341 \noindent These characters represent the machine code for
   328 for opening a shell. It seems obtaining such a string
   342 opening a shell. It seems obtaining such a string requires
   329 requires higher-education in the architecture of the
   343 higher education in the architecture of the target system. But
   330 target system. But it is actually relatively simple: First
   344 it is actually relatively simple: First there are many such
   331 there are many ready-made strings available---just a quick
   345 strings ready-made---just a quick Google query away. Second,
   332 Google query away. Second, tools like the debugger can help 
   346 tools like the debugger can help us again. We can just write
   333 us again. We can just write the code we want in C, for 
   347 the code we want in C, for example this would be the program
   334 example this would be the program to start a shell
   348 for starting a shell
   335 
   349 
   336 \lstinputlisting[language=C,numbers=none]{../progs/shell.c} 
   350 \lstinputlisting[language=C,numbers=none]{../progs/shell.c} 
   337 
   351 
   338 \noindent Once compiled, we can use the debugger to obtain 
   352 \noindent Once compiled, we can use the debugger to obtain 
   339 the machine code, or even the ready made encoding as character
   353 the machine code, or even the ready-made encoding as character
   340 sequence. 
   354 sequence. 
   341 
   355 
   342 While easy, obtaining this string is not entirely trivial.
   356 While easy, obtaining this string is not entirely trivial.
   343 Remember the functions in C that copy or fill buffers work
   357 Remember the functions in C that copy or fill buffers work
   344 such that they copy everything until the zero byte is reached.
   358 such that they copy everything until the zero byte is reached.
   345 Unfortunately the ``vanilla'' output from the debugger for the
   359 Unfortunately the ``vanilla'' output from the debugger for the
   346 shell-program will contain such zero bytes. So a
   360 shell-program above will contain such zero bytes. So a
   347 post-processing phase is needed to rewrite the machine code
   361 post-processing phase is needed to rewrite the machine code in
   348 such that it does not contain any zero bytes. This is like
   362 a way that it does not contain any zero bytes. This is like
   349 some works of literature that have been rewritten so that the
   363 some works of literature that have been written so that the
   350 letter 'i', for example, is avoided. For rewriting the machine
   364 letter 'i', for example, is avoided. For rewriting the machine
   351 code you might need to use clever tricks like
   365 code, you might need to use clever tricks like
   352 
   366 
   353 \begin{lstlisting}[numbers=none,language={[x86masm]Assembler}]
   367 \begin{lstlisting}[numbers=none,language={[x86masm]Assembler}]
   354 xor %eax, %eax
   368 xor %eax, %eax
   355 \end{lstlisting}
   369 \end{lstlisting}
   356 
   370 
   357 \noindent This instruction does not contain any zero byte when
   371 \noindent This instruction does not contain any zero byte when
   358 encoded, but produces a zero byte on the stack. 
   372 encoded, but when run it sets the register \code{$eax} to zero, which can then be stored on the stack.
   359 
   373 
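\noindent Whether a piece of machine code survives the copying can be checked mechanically. The sketch below is an illustration only; the helper \pcode{contains_zero_byte} and the two array names are made up for this handout. It compares two 32-bit x86 encodings that both set \pcode{eax} to zero: the obvious \pcode{mov} with an immediate zero, and the \pcode{xor} trick.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: returns 1 if code[0..len-1] contains a zero byte,
   and 0 otherwise. */
int contains_zero_byte(const unsigned char *code, size_t len)
{
    assert(code != NULL);
    for (size_t i = 0; i < len; i++)
        if (code[i] == 0x00)
            return 1;
    return 0;
}

/* Two x86 (32-bit) encodings that both set the register eax to zero. */
const unsigned char mov_zero_eax[] = { 0xb8, 0x00, 0x00, 0x00, 0x00 }; /* mov $0x0,%eax */
const unsigned char xor_eax_eax[]  = { 0x31, 0xc0 };                   /* xor %eax,%eax */
```

\noindent The \pcode{mov} variant carries its zero as four literal zero bytes, so a \pcode{strcpy}-like function would stop copying in the middle of it. The \pcode{xor} variant consists of two non-zero bytes and passes the check.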
   360 Having removed the zero bytes we can craft the string that 
   374 Having removed the zero bytes we can craft the string that 
   361 will be sent to the target computer. It is typically of the 
   375 will be sent to the target computer. It is typically of the 
   362 form
   376 form
   363 
   377 
   379 run. But as you can see this is only 47 bytes, which is a very
   393 run. But as you can see this is only 47 bytes, which is a very
   380 low bar to jump over. More formidable is the choice of finding
   394 low bar to jump over. More formidable is the choice of finding
   381 the right address to jump to. As indicated in the picture we
   395 the right address to jump to. As indicated in the picture we
   382 need to be very precise with the address with which we will
   396 need to be very precise with the address with which we will
   383 overwrite the buffer. It has to be precisely the first byte of
   397 overwrite the buffer. It has to be precisely the first byte of
   384 the shellcode. While this is easy withe the help of a
   398 the shellcode. While this is easy with the help of a debugger
   385 debugger, we typically cannot run anything on the machine yet
   399 (as seen before), we typically cannot yet run anything on the
   386 we target. And the address is very specific to the setup of
   400 machine we target. And the address is very specific to the
   387 the target machine. One way of finding out what the right
   401 setup of the target machine. One way of finding out what the
   388 address is to try out one by one until we get lucky. With
   402 right address is, is to try out candidates one by one until we get lucky.
   389 large memories available today, however, the odds are long.
   403 With the large memories available today, however, the odds are
   390 And if we try out too many possible candidates to quickly, we
   404 long. And if we try out too many possible candidates too
   391 might be detected by the system administrator of the target
   405 quickly, we might be detected by the system administrator of
   392 system.
   406 the target system.
   393 
   407 
   394 We can improve our odds considerably, by the following clever 
   408 We can improve our odds considerably by following a clever 
   395 trick. Instead of adding the shellcode at the beginning of the
   409 trick. Instead of adding the shellcode at the beginning of the
   396 string, we should add it at the end, just before we overflow 
   410 string, we should add it at the end, just before we overflow 
   397 the buffer, like
   411 the buffer, for example
   398 
   412 
   399 \begin{center}
   413 \begin{center}
   400   \begin{tikzpicture}[scale=0.7]
   414   \begin{tikzpicture}[scale=0.7]
   401   \draw[line width=1mm] (-2, -1) rectangle (2,3);
   415   \draw[line width=1mm] (-2, -1) rectangle (2,3);
   402   \draw[line width=1mm] (-2,1.9) -- (2,1.9);
   416   \draw[line width=1mm] (-2,1.9) -- (2,1.9);
   404   \draw[line width=1mm,fill=black] (0.3, -1) rectangle (2,-0.7);
   418   \draw[line width=1mm,fill=black] (0.3, -1) rectangle (2,-0.7);
   405   \draw (-2, 3) node[anchor=north east] {\LARGE \color{codegreen}{``}};
   419   \draw (-2, 3) node[anchor=north east] {\LARGE \color{codegreen}{``}};
   406   \draw ( 2,-0.9) node[anchor=west] {\LARGE\color{codegreen}{''}};
   420   \draw ( 2,-0.9) node[anchor=west] {\LARGE\color{codegreen}{''}};
   407   \end{tikzpicture}
   421   \end{tikzpicture}
   408 \end{center}
   422 \end{center}
       
   423 
       
   424 \noindent Then we can fill up the gray part of the string with
       
   425 a \pcode{NOP} operation. The code for this operation is
       
   426 \code{\\x90} on Intel CPUs. Such an operation is available on practically every architecture and its
       
   427 purpose is to do nothing apart from waiting a small amount of
       
   428 time. If we now use an address that lets us jump to any
       
   429 address in the gray area we are done. The target machine will 
       
   430 execute these \pcode{NOP} operations until it reaches the
       
   431 shellcode. A moment of thought can convince you that this
       
   432 trick can hugely improve our odds of finding the right 
       
   433 address---depending on the size of the buffer, it might
       
   434 only take a few tries to get the shellcode to run.
   409 
   435 
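\noindent How much the \pcode{NOP} padding improves the odds is simple arithmetic, sketched below. The function name \pcode{good_targets} is an assumption for illustration, and so is the premise that the padded region is filled with one-byte \pcode{NOP}s followed directly by the shellcode.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: number of jump targets that end up running the
   shellcode when a region of region_len bytes is filled with one-byte
   NOPs followed by shellcode of code_len bytes. Every NOP is a good
   target, and so is the first byte of the shellcode itself. */
size_t good_targets(size_t region_len, size_t code_len)
{
    assert(code_len > 0 && code_len <= region_len);
    return (region_len - code_len) + 1;
}
```

\noindent Without padding there is exactly one good address: \pcode{good_targets(47, 47)} gives 1 for the 47-byte shellcode. If the overflowed region were, say, 512 bytes long (a made-up figure), \pcode{good_targets(512, 47)} gives 466 addresses that all lead to the shellcode.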
   410 \bigskip\bigskip
   436 \bigskip\bigskip
   411 \subsubsection*{A Crash-Course for GDB}
   437 \subsubsection*{A Crash-Course for GDB}
   412 
   438 
   413 \begin{itemize}
   439 \begin{itemize}