--- a/handouts/ho03.tex	Thu Oct 09 15:49:21 2014 +0100
+++ b/handouts/ho03.tex	Thu Oct 09 23:12:10 2014 +0100
@@ -7,12 +7,12 @@
 \section*{Handout 3 (Buffer Overflow Attacks)}
 
 By far the most popular attack method on computers are buffer
-overflow attacks or simple variations thereof. The popularity is
-unfortunate because we nowadays have technology in place to prevent them
-effectively. But these kind of attacks are still very relevant
-even today since there are many legacy systems out there and
-also many modern embedded systems do not take any precautions
-to prevent such attacks.
+overflow attacks or variations thereof. The popularity is
+unfortunate because we nowadays have technology in place to
+prevent them effectively. But these kinds of attacks are still
+very relevant even today since there are many legacy systems
+out there and also many modern embedded systems do not take
+any precautions to prevent such attacks.
 
 To understand how buffer overflow attacks work, we have to have
 a look at how computers work ``under the hood'' (on the
@@ -23,20 +23,20 @@
 free-riding script-kiddies who use this technology without
 even knowing what the underlying ideas are. If you want to be
 a good security engineer who needs to defend such attacks, 
-then better you know the details.
+then you had better get to know the details.
  
 For buffer overflow attacks to work, a number of innocent
 design decisions, which are really benign on their own, need
-to conspire against you. All these decisions were pretty much
-taken at a time when there was no Internet: C was introduced
-around 1973; the Internet TCP/IP protocol was standardised in
-1982 by which time there were maybe 500 servers connected (and
-all users were well-behaved, mostly academics); Intel's first
-8086 CPUs arrived around 1977. So nobody of the
-``forefathers'' can really be blamed, but as mentioned above
-we should already be way beyond the point that buffer overflow
-attacks are worth a thought. Unfortunately, this is far from
-the truth. I let you ponder why?
+to conspire against you. All these decisions were taken at a
+time when there was no Internet: C was introduced around 1973;
+the Internet TCP/IP protocol was standardised in 1982 by which
+time there were maybe 500 servers connected (and all users
+were well-behaved, mostly academics); Intel's first 8086 CPUs
+arrived in 1978. So none of the ``forefathers'' can
+really be blamed, but as mentioned above we should already be
+way beyond the point that buffer overflow attacks are worth a
+thought. Unfortunately, this is far from the truth. I let you
+ponder why.
 
 One such ``benign'' design decision is how the memory is laid
 out into different regions for each process. 
@@ -75,10 +75,13 @@
  
 \lstinputlisting[language=C]{../progs/example1.c} 
  
-\noindent The \code{main} function calls \code{foo} with three
-arguments. \code{Foo} contains two (local) buffers. The
-interesting point for us will be what will the stack loke
-like after Line 3 has been executed? The answer is as follows:
+\noindent In Line 7 the \code{main} function calls the
+function \code{foo} with three arguments. The function
+\code{foo} creates two (local) buffers, but does not do
+anything interesting with them. The only purpose of this
+program is to illustrate what happens behind the scenes with
+the stack. The interesting question is what the stack looks
+like after Line 3 has been executed. The answer is as follows:
  
 \begin{center} 
  \begin{tikzpicture}[scale=0.65]
@@ -126,21 +129,22 @@
 The function call to \code{foo} in Line 7 pushes the arguments
 onto the stack in reverse order---shown in the middle.
 Therefore first 3 then 2 and finally 1. Then it pushes the
-return address to the stack where execution should resume once
-\code{foo} has finished. The last stack pointer (\code{sp}) is
-needed in order to clean up the stack to the last level---in
-fact there is no cleaning involved, but just the top of the
-stack will be set back. The two buffers are also on the stack,
-because they are local data within \code{foo}. So in the
-middle is a snapshot of the stack after Line 3 has been 
-executed. In case you are familiar with assembly instructions
-you can also read off this behaviour from the machine
-code that the \code{gcc} compiler generates for the program
-above:\footnote{You can make \pcode{gcc} generate assembly 
-instructions if you call it with the \pcode{-S} option, 
-for example \pcode{gcc -S out in.c}\;. Or you can look
-at this code by using the debugger. This will be explained
-later.}.
+return address onto the stack where execution should resume
+once \code{foo} has finished. The last stack pointer
+(\code{sp}) is needed in order to clean up the stack to the
+last level---in fact there is no cleaning involved, but just
+the top of the stack will be set back. So the last stack
+pointer also needs to be stored. The two buffers inside
+\pcode{foo} are on the stack too, because they are local data
+within \code{foo}. Consequently the stack in the middle is a
+snapshot after Line 3 has been executed. In case you are
+familiar with assembly instructions you can also read off this
+behaviour from the machine code that the \code{gcc} compiler
+generates for the program above:\footnote{You can make
+\pcode{gcc} generate assembly instructions if you call it with
+the \pcode{-S} option, for example \pcode{gcc -S in.c}\;.
+Or you can look at this code by using the debugger. How to do
+this will be explained later.}
 
 \begin{center}\small
 \begin{tabular}[t]{@{}c@{\hspace{8mm}}c@{}}
@@ -154,19 +158,19 @@
 \end{center}
 
 \noindent On the left you can see how the function
-\pcode{main} prepares in Lines 2 to 7 the stack, before
-calling the function \pcode{foo}. You can see that the
-numbers 3, 2, 1 are stored on the stack (the register
-\code{$esp} refers to the top of the stack). On the right
-you can see how the function \pcode{foo} stores the two local
-buffers onto the stack and initialises them with the given
-data (Lines 2 to 9). Since there is no real computation
-going on inside \pcode{foo} the function then just restores
-the stack to its old state and crucially sets the return
-address where the computation should resume (Line 9 in the
-code on the left hand side). The instruction \code{ret} then
-transfers control back to the function \pcode{main} to the
-teh instruction just after the call, namely Line 9.
+\pcode{main} prepares the stack in Lines 2 to 7 before calling
+the function \pcode{foo}. You can see that the numbers 3, 2, 1
+are stored on the stack (the register \code{$esp} refers to
+the top of the stack). On the right you can see how the
+function \pcode{foo} stores the two local buffers onto the
+stack and initialises them with the given data (Lines 2 to 9).
+Since there is no real computation going on inside
+\pcode{foo}, the function then just restores the stack to its
+old state and crucially sets the return address where the
+computation should resume (Line 9 in the code on the left-hand
+side). The instruction \code{ret} then transfers control back
+to the function \pcode{main}, to the instruction just after
+the call to \pcode{foo}, that is Line 9.
  
 Another part of the ``conspiracy'' is that library functions
 in C look typically as follows:
@@ -180,10 +184,10 @@
 copies the data until it reaches a zero-byte (\code{"\\0"}). 
 
 The central idea of the buffer overflow attack is to overwrite
-the return address on the stack which states where the control
-flow of the program should resume once the function at hand
-has finished its computation. So if we have somewhere in a 
-function a local a buffer, say
+the return address on the stack which designates where the
+control flow of the program should resume once the function at
+hand has finished its computation. So if we have somewhere in
+a function a local buffer, say
 
 \begin{center}
 \code{char buf[8];}
@@ -210,7 +214,7 @@
   \draw[line width=1mm] (-1,4) -- (1,4);
   \draw (0,4) node[anchor=south] {\small\tt last sp};
   \draw[line width=1mm] (-1,5) -- (1,5);
-  \draw (0,5) node[anchor=south] {\tt buf};
+  \draw (0,5.1) node[anchor=south] {\tt buf};
   \draw[line width=1mm] (-1,6) -- (1,6);
   \draw (2,5.1) node[anchor=south] {\code{$esp}};
   \draw[<-,line width=0.5mm] (1.1,6) -- (2.5,6);
@@ -223,57 +227,60 @@
 \end{tikzpicture}
 \end{center}
 
-\noindent We need to fill this over its limit of
-8 characters so that it overwrites the stack pointer
-and then overwrites the return address. If, for example, 
-we want to jump to a specific address in memory, say,
-\pcode{\\x080483f4} then we need to fill the 
-buffer for example as follows
+\noindent We need to fill this buffer beyond its limit of 8
+characters so that it overwrites the saved stack pointer and
+then also overwrites the return address. If, for example, we
+want to jump to a specific address in memory, say,
+\pcode{0x080483f4}, then we can fill the buffer with the data
 
 \begin{center}
 \code{char buf[8] = "AAAAAAAABBBB\\xf4\\x83\\x04\\x08";}
 \end{center}
  
-\noindent The first 8 \pcode{A}s fill the buffer to the rim;
-the next four \pcode{B}s overwrite the stack pointer (with
-what data we overwrite this part is usually not important);
-then comes the address we want to jump to. Notice that we have
-to give the address in the reverse order. All addresses on
-Intel CPUs need to be given in this way. Since the string is
-enclosed in double quotes, the C convention is that the string
-internally will automatically be terminated by a zero-byte. If
-the programmer uses functions like \pcode{strcpy} for filling
-the buffer \pcode{buf}, then we can be sure it will overwrite
-the stack in this manner---since it will copy everything up
-to the zero-byte.
+\noindent The first eight \pcode{A}s fill the buffer to the
+rim; the next four \pcode{B}s overwrite the stack pointer
+(with what data we overwrite this part is usually not
+important); then comes the address we want to jump to. Notice
+that we have to give the address in reverse byte order: Intel
+CPUs store addresses least significant byte first. Since
+the string is enclosed in double quotes, the C convention is
+that the string internally will automatically be terminated by
+a zero-byte. If the programmer uses functions like
+\pcode{strcpy} for filling the buffer \pcode{buf}, then we can
+be sure it will overwrite the stack in this manner---since it
+will copy everything up to the zero-byte. Notice that this
+overwriting of the buffer only works since the newer item, the
+buffer, is stored on the stack before the older items, like
+return address and arguments. If it had been the other way
+around, then such an overwriting by overflowing a local buffer
+would just not work.
 
 What the outcome of such an attack is can be illustrated with
 the code shown in Figure~\ref{C2}. Under ``normal operation''
-this program ask for a login-name and a password (both are
-represented as strings). Both of which are stored in buffers
-of length 8. The function \pcode{match} tests whether two such 
-strings are equal. If yes, then the function lets you in
-(by printing \pcode{Welcome}). If not, it denies access
-(by printing \pcode{Wrong identity}). The vulnerable function
-is \code{get_line} in Lines 11 to 19. This function does not
-take any precautions about the buffer of 8 characters being
-filled beyond this 8-character-limit. The buffer overflow
-can be triggered by inputing something, like \pcode{foo}, for 
-the login name and then the specially crafted string as 
-password:
+this program asks for a login-name and a password. Both are
+stored in \code{char} buffers of length 8. The function
+\pcode{match} tests whether two such buffers contain the same
+string. If yes, then the function lets you ``in'' (by
+printing \pcode{Welcome}). If not, it denies access (by
+printing \pcode{Wrong identity}). The vulnerable function is
+\code{get_line} in Lines 11 to 19. This function does not take
+any precautions against the buffer of 8 characters being filled
+beyond this 8-character limit. Let us suppose the login name
+is \pcode{test}. Then the buffer overflow can be triggered
+with a specially crafted string as password:
 
 \begin{center}
 \code{AAAAAAAABBBB\\x2c\\x85\\x04\\x08\\n}
 \end{center}
 
-\noindent The address happens to be the one for the function
-\pcode{welcome()}. This means even with this input (where the
-login name and password clearly do not match) the program will
-still print out \pcode{Welcome}. The only information we need
-for this attack is to know where the function
-\pcode{welcome()} starts in memory. This information can be
-easily obtained by starting the program inside the debugger
-and disassembling this function. 
+\noindent The address at the end happens to be the one for the
+function \pcode{welcome()}. This means even with this input
+(where the login name and password clearly do not match) the
+program will still print out \pcode{Welcome}. The only
+information we need for this attack is to know where the
+function \pcode{welcome()} starts in memory. This information
+can be easily obtained by starting the program inside the
+debugger and disassembling this function. 
 
 \begin{lstlisting}[numbers=none,language={[x86masm]Assembler},
   morekeywords={movl,movw}]
@@ -282,8 +289,9 @@
 (gdb) disassemble welcome
 \end{lstlisting}
 
-\noindent 
-The output will be something like this
+\noindent \pcode{C2} is the name of the program and
+\pcode{gdb} is the name of the debugger. The output will be
+something like this
 
 \begin{lstlisting}[numbers=none,language={[x86masm]Assembler},
   morekeywords={movl,movw}]
@@ -297,7 +305,8 @@
 \end{lstlisting}
 
 \noindent indicating that the function \pcode{welcome()}
-starts at address \pcode{0x0804852c}.
+starts at address \pcode{0x0804852c} (top address in the 
+left column).
 
 \begin{figure}[p]
 \lstinputlisting[language=C]{../progs/C2.c}
@@ -310,52 +319,57 @@
 After the encryption had been made stronger, hackers used
 buffer overflow attacks as shown above to jump directly to
 the part of the program that was intended to be only available
-after the correct key was typed in by the user. 
+after the correct key was typed in. 
 
 \subsection*{Paylods}
 
 Unfortunately, much more harm can be caused by buffer overflow
 attacks. This is achieved by injecting code that will be run
 once the return address is appropriately modified. Typically
-the code that will be injected is for running a shell. In
-order to be send as part of the string that is overflowing the
-buffer, we need the code to be encoded as a sequence of 
-characters
+the code that will be injected is for running a shell. This
+gives the attacker the ability to run programs on the target
+machine and have a good look around, provided the attacked
+process was not already running as root.\footnote{In that case
+the attacker could already congratulate himself or herself on
+another computer under full control.} In order to be sent as
+part of the string that is overflowing the buffer, we need the
+code to be represented as a sequence of characters. For
+example
 
 \lstinputlisting[language=C,numbers=none]{../progs/o1.c}
 
-\noindent These characters represent the machine code
-for opening a shell. It seems obtaining such a string
-requires higher-education in the architecture of the
-target system. But it is actually relatively simple: First
-there are many ready-made strings available---just a quick
-Google query away. Second, tools like the debugger can help 
-us again. We can just write the code we want in C, for 
-example this would be the program to start a shell
+\noindent These characters represent the machine code for
+opening a shell. It seems obtaining such a string requires
+a higher education in the architecture of the target system.
+But it is actually relatively simple: First, there are many
+such strings ready-made---just a quick Google query away. Second,
+tools like the debugger can help us again. We can just write
+the code we want in C, for example this would be the program
+for starting a shell
 
 \lstinputlisting[language=C,numbers=none]{../progs/shell.c} 
 
 \noindent Once compiled, we can use the debugger to obtain 
-the machine code, or even the ready made encoding as character
+the machine code, or even the ready-made encoding as character
 sequence. 
 
 While easy, obtaining this string is not entirely trivial.
 Remember the functions in C that copy or fill buffers work
 such that they copy everything until the zero byte is reached.
 Unfortunately the ``vanilla'' output from the debugger for the
-shell-program will contain such zero bytes. So a
-post-processing phase is needed to rewrite the machine code
-such that it does not contain any zero bytes. This is like
-some works of literature that have been rewritten so that the
+shell-program above will contain such zero bytes. So a
+post-processing phase is needed to rewrite the machine code so
+that it does not contain any zero bytes. This is like
+some works of literature that have been written so that the
 letter 'i', for example, is avoided. For rewriting the machine
-code you might need to use clever tricks like
+code, you might need to use clever tricks like
 
 \begin{lstlisting}[numbers=none,language={[x86masm]Assembler}]
 xor %eax, %eax
 \end{lstlisting}
 
 \noindent This instruction does not contain any zero byte when
-encoded, but produces a zero byte on the stack. 
+encoded, but sets the register \pcode{eax} to zero when run. 
 
 Having removed the zero bytes we can craft the string that 
 will be send to the target computer. It is typically of the 
@@ -381,20 +395,20 @@
 the right address to jump to. As indicated in the picture we
 need to be very precise with the address with which we will
 overwrite the buffer. It has to be precisely the first byte of
-the shellcode. While this is easy withe the help of a
-debugger, we typically cannot run anything on the machine yet
-we target. And the address is very specific to the setup of
-the target machine. One way of finding out what the right
-address is to try out one by one until we get lucky. With
-large memories available today, however, the odds are long.
-And if we try out too many possible candidates to quickly, we
-might be detected by the system administrator of the target
-system.
+the shellcode. While this is easy with the help of a debugger
+(as seen before), we typically cannot yet run anything on the
+machine we target. And the address is very specific to the
+setup of the target machine. One way of finding the right
+address is to try out candidates one by one until we get lucky.
+With the large memories available today, however, the odds are
+long. And if we try out too many possible candidates too
+quickly, we might be detected by the system administrator of
+the target system.
 
-We can improve our odds considerably, by the following clever 
+We can improve our odds considerably by following a clever 
 trick. Instead of adding the shellcode at the beginning of the
 string, we should add it at the end, just before we overflow 
-the buffer, like
+the buffer, for example
 
 \begin{center}
   \begin{tikzpicture}[scale=0.7]
@@ -407,6 +421,18 @@
   \end{tikzpicture}
 \end{center}
 
+\noindent Then we can fill up the gray part of the string with
+\pcode{NOP} operations. On Intel CPUs the code for this operation
+is \code{\\x90}. Nearly every architecture has such an operation;
+its purpose is to do nothing apart from waiting a small amount of
+time. If we now use an address that lets us jump to any address
+in the gray area we are done. The target machine will execute
+these \pcode{NOP} operations until it reaches the shellcode. A
+moment of thought can convince you that this trick can hugely
+improve our odds of finding the right address---depending on the
+size of the buffer, it might only take a few tries to get the
+shellcode to run.
+
 \bigskip\bigskip
 \subsubsection*{A Crash-Course for GDB}
changeset 229	ea921d6a1819
parent 228	4f7c7997b68b
child 230	603cbd28e988