handouts/ho03.tex
changeset 263 8a42736cce27
parent 259 f96d3e48ed3d
child 283 40511897fcc4
--- a/handouts/ho03.tex	Tue Oct 28 16:33:53 2014 +0000
+++ b/handouts/ho03.tex	Wed Oct 29 13:08:11 2014 +0000
@@ -32,7 +32,7 @@
     xmin=1996.5,
     xmax=2015,
     ymax=21,
-    ytick={0,2,...,20},
+    ytick={0,5,...,20},
     scaled ticks=false,
     axis lines=left,
     width=12cm,
@@ -48,10 +48,10 @@
 \end{tikzpicture}
 \end{center}
 
-\noindent
-This statistics seems to indicate that in the last five years the
-number of buffer overflow attacks is around 10\% of all attacks
-(whereby the absolute numbers of attacks seem to grow each year).
+\noindent These statistics indicate that in the last
+five years or so the number of buffer overflow attacks has been
+around 10\% of all attacks (while the absolute number of
+attacks grows each year).
 
 
 To understand how buffer overflow attacks work, we have to have
@@ -173,18 +173,18 @@
 once \code{foo} has finished. The last stack pointer
 (\code{sp}) is needed in order to clean up the stack to the
 last level---in fact there is no cleaning involved, but just
-the top of the stack will be set back. So the last stack
-pointer also needs to be stored. The two buffers inside
-\pcode{foo} are on the stack too, because they are local data
-within \code{foo}. Consequently the stack in the middle is a
-snapshot after Line 3 has been executed. In case you are
-familiar with assembly instructions you can also read off this
-behaviour from the machine code that the \code{gcc} compiler
-generates for the program above:\footnote{You can make
-\pcode{gcc} generate assembly instructions if you call it with
-the \pcode{-S} option, for example \pcode{gcc -S out in.c}\;.
-Or you can look at this code by using the debugger. How to do
-this will be explained later.}
+the top of the stack will be set back to this address. So the
+last stack pointer also needs to be stored. The two buffers
+inside \pcode{foo} are on the stack too, because they are
+local data within \code{foo}. Consequently the stack in the
+middle is a snapshot after Line 3 has been executed. In case
+you are familiar with assembly instructions you can also read
+off this behaviour from the machine code that the \code{gcc}
+compiler generates for the program above:\footnote{You can
+make \pcode{gcc} generate assembly instructions if you call it
+with the \pcode{-S} option, for example
+\pcode{gcc -S -o out.s in.c}\;. Or you can look at this code
+by using the debugger. How to do this will be explained later.}
 
 \begin{center}\small
 \begin{tabular}[t]{@{}c@{\hspace{8mm}}c@{}}
@@ -201,16 +201,18 @@
 \pcode{main} prepares in Lines 2 to 7 the stack before calling
 the function \pcode{foo}. You can see that the numbers 3, 2, 1
 are stored on the stack (the register \code{$esp} refers to
-the top of the stack). On the right you can see how the
-function \pcode{foo} stores the two local buffers onto the
-stack and initialises them with the given data (Lines 2 to 9).
-Since there is no real computation going on inside
-\pcode{foo}, the function then just restores the stack to its
-old state and crucially sets the return address where the
-computation should resume (Line 9 in the code on the left-hand
-side). The instruction \code{ret} then transfers control back
-to the function \pcode{main} to the the instruction just after
-the call to \pcode{foo}, that is Line 9.
+the top of the stack; \pcode{$0x1}, \pcode{$0x2} and \pcode{$0x3}
+are the encodings for \pcode{1} to \pcode{3}). On the right
+you can see how the function \pcode{foo} stores the two local
+buffers onto the stack and initialises them with the given
+data (Lines 2 to 9). Since there is no real computation going
+on inside \pcode{foo}, the function then just restores the
+stack to its old state and crucially sets the return address
+where the computation should resume (Line 9 in the code on the
+left-hand side). The instruction \code{ret} then transfers
+control back to the function \pcode{main}, to the
+instruction just after the call to \pcode{foo}, that is Line
+9.
  
 Another part of the ``conspiracy'' of buffer overflow attacks
 is that library functions in C look typically as follows:
@@ -297,7 +299,7 @@
 buffer, is stored on the stack before the older items, like
 return address and arguments. If it had be the other way
 around, then such an overwriting by overflowing a local buffer
-would just not work. If the designers of C had just been able
+would just not work. If only the designers of C had been able
 to foresee what headaches their way of arranging the stack
 caused in the time where computers are accessible from
 everywhere. 
@@ -386,7 +388,7 @@
 
 \noindent These characters represent the machine code for
 opening a shell. It seems obtaining such a string requires
-higher-education in the architecture of the target system. But
+``higher education'' in the architecture of the target system. But
 it is actually relatively simple: First there are many such
 string ready-made---just a quick Google query away. Second,
 tools like the debugger can help us again. We can just write
@@ -399,20 +401,21 @@
 the machine code, or even the ready-made encoding as character
 sequence. 
 
-While easy, obtaining this string is not entirely trivial.
-Remember the functions in C that copy or fill buffers work
-such that they copy everything until the zero byte is reached.
-Unfortunately the ``vanilla'' output from the debugger for the
-shell-program above will contain such zero bytes. So a
-post-processing phase is needed to rewrite the machine code in
-a way that it does not contain any zero bytes. This is like
-some works of literature that have been written so that the
-letter e, for example, is avoided. The technical term for such
-a literature work is \emph{lipogram}.\footnote{The most
-famous example of a lipogram is a 50,000 words novel titled
-Gadsby, see \url{https://archive.org/details/Gadsby}.} For
-rewriting the machine code, you might need to use clever
-tricks like
+While easy, obtaining this string is not entirely trivial
+using \pcode{gdb}. Remember the functions in C that copy or
+fill buffers work such that they copy everything until the
+zero byte is reached. Unfortunately the ``vanilla'' output
+from the debugger for the shell-program above will contain
+such zero bytes. So a post-processing phase is needed to
+rewrite the machine code in a way that it does not contain any
+zero bytes. This is like some works of literature that have
+been written so that the letter e, for example, is avoided.
+The technical term for such a literary work is
+\emph{lipogram}.\footnote{The most famous example of a
+lipogram is a 50,000-word novel titled Gadsby, see
+\url{https://archive.org/details/Gadsby}, which avoids the
+letter `e' throughout.} For rewriting the
+machine code, you might need to use clever tricks like
 
 \begin{lstlisting}[numbers=none,language={[x86masm]Assembler}]
 xor %eax, %eax
@@ -485,7 +488,7 @@
 amount of time. If we now use an address that lets us jump to
 any address in the grey area we are done. The target machine
 will execute these \pcode{NOP} operations until it reaches the
-shellcode. A moment of thought can convince you that this
+shellcode. A moment of thought should convince you that this
 trick can hugely improve our odds of finding the right
 address---depending on the size of the buffer, it might only
 take a few tries to get the shellcode to run. And then we are
@@ -558,32 +561,32 @@
 
 \lstinputlisting[language=C]{../progs/C5.c}
 
-\noindent Here the programmer actually to take extra care to
-not fall pray to a buffer overflow attack, but in the process
-made the program susceptible to a format string attack.
-Clearly the \pcode{printf} function in Line 7 contains now
-an explicit format string, but because the commandline
-input is copied using the function \pcode{snprintf} the
-result will be the same---the string can be exploited 
-by embedding format strings into the user input. Here the
-programmer really cannot be blamed (much) because by using
-\pcode{snprintf} he or she tried to make sure only 10
-characters get copied into the local buffer---in this way
-avoiding the obvious buffer overflow attack.
+\noindent Here the programmer actually tried to take extra
+care not to fall prey to a buffer overflow attack, but in the
+process made the program susceptible to a format string
+attack. Clearly the \pcode{printf} function in Line 7 now
+contains an explicit format string, but because the command-line
+input is copied using the function \pcode{snprintf} the result
+will be the same---the string can be exploited by embedding
+format strings into the user input. Here the programmer really
+cannot be blamed (much) because by using \pcode{snprintf} he
+or she tried to make sure only 10 characters get copied into
+the local buffer---in this way avoiding the obvious buffer
+overflow attack.
 
 \subsubsection*{Caveats and Defences}
 
-How can we defend against these attacks? Well, a reflex could 
-be to blame programmers. Precautions should be taken so that 
-buffers cannot been overfilled and format strings should not
-be forgotten. This might actually be slightly simpler nowadays 
-since safe versions of the library functions exists, which
-always specify the precise number of bytes that should be 
-copied. Compilers also nowadays provide warnings when format
-strings are omitted. So proper education of programmers is 
-definitely a part of a defence against such attacks. However,
-if we leave it at that, then we have the mess we have today
-with new attacks discovered almost daily. 
+How can we defend against these attacks? Well, a reflex could
+be to blame programmers. Precautions should be taken by them
+so that buffers cannot be overfilled and format strings
+should not be forgotten. This might actually be slightly
+simpler nowadays since safe versions of the library functions
+exist, which always specify the precise number of bytes that
+should be copied. Compilers also nowadays provide warnings
+when format strings are omitted. So proper education of
+programmers is definitely a part of a defence against such
+attacks. However, if we leave it at that, then we have the
+mess we have today with new attacks discovered almost daily. 
 
 There is actually a quite long record of publications
 proposing defences against buffer overflow attacks. One method
@@ -711,9 +714,11 @@
 attacks.
 \bigskip
 
-\noindent If you want to know more about buffer overflow
-attacks, the original Phrack article ``Smashing The Stack For
-Fun And Profit'' by Elias Levy (also known as Aleph One) is an
+\subsubsection*{Further Reading}
+
+If you want to know more about buffer overflow attacks, the
+original Phrack article ``Smashing The Stack For Fun And
+Profit'' by Elias Levy (also known as Aleph One) is an
 engaging read:
 
 \begin{center}