diff -r 57269d9931da -r 8a42736cce27 handouts/ho03.tex --- a/handouts/ho03.tex Tue Oct 28 16:33:53 2014 +0000 +++ b/handouts/ho03.tex Wed Oct 29 13:08:11 2014 +0000 @@ -32,7 +32,7 @@ xmin=1996.5, xmax=2015, ymax=21, - ytick={0,2,...,20}, + ytick={0,5,...,20}, scaled ticks=false, axis lines=left, width=12cm, @@ -48,10 +48,10 @@ \end{tikzpicture} \end{center} -\noindent -This statistics seems to indicate that in the last five years the -number of buffer overflow attacks is around 10\% of all attacks -(whereby the absolute numbers of attacks seem to grow each year). +\noindent These statistics indicate that in the last +five years or so the number of buffer overflow attacks is +around 10\% of all attacks (while the absolute number of +attacks grows each year). To understand how buffer overflow attacks work, we have to have @@ -173,18 +173,18 @@ once \code{foo} has finished. The last stack pointer (\code{sp}) is needed in order to clean up the stack to the last level---in fact there is no cleaning involved, but just -the top of the stack will be set back. So the last stack -pointer also needs to be stored. The two buffers inside -\pcode{foo} are on the stack too, because they are local data -within \code{foo}. Consequently the stack in the middle is a -snapshot after Line 3 has been executed. In case you are -familiar with assembly instructions you can also read off this -behaviour from the machine code that the \code{gcc} compiler -generates for the program above:\footnote{You can make -\pcode{gcc} generate assembly instructions if you call it with -the \pcode{-S} option, for example \pcode{gcc -S out in.c}\;. -Or you can look at this code by using the debugger. How to do -this will be explained later.} +the top of the stack will be set back to this address. So the +last stack pointer also needs to be stored. The two buffers +inside \pcode{foo} are on the stack too, because they are +local data within \code{foo}. 
Consequently the stack in the +middle is a snapshot after Line 3 has been executed. In case +you are familiar with assembly instructions you can also read +off this behaviour from the machine code that the \code{gcc} +compiler generates for the program above:\footnote{You can +make \pcode{gcc} generate assembly instructions if you call it +with the \pcode{-S} option, for example \pcode{gcc -S -o out +in.c}\;. Or you can look at this code by using the debugger. +How to do this will be explained later.} \begin{center}\small \begin{tabular}[t]{@{}c@{\hspace{8mm}}c@{}} @@ -201,16 +201,18 @@ \pcode{main} prepares in Lines 2 to 7 the stack before calling the function \pcode{foo}. You can see that the numbers 3, 2, 1 are stored on the stack (the register \code{$esp} refers to -the top of the stack). On the right you can see how the -function \pcode{foo} stores the two local buffers onto the -stack and initialises them with the given data (Lines 2 to 9). -Since there is no real computation going on inside -\pcode{foo}, the function then just restores the stack to its -old state and crucially sets the return address where the -computation should resume (Line 9 in the code on the left-hand -side). The instruction \code{ret} then transfers control back -to the function \pcode{main} to the the instruction just after -the call to \pcode{foo}, that is Line 9. +the top of the stack; \pcode{$0x1}, \pcode{$0x2}, \pcode{$0x3} +are the encodings for \pcode{1} to \pcode{3}). On the right +you can see how the function \pcode{foo} stores the two local +buffers onto the stack and initialises them with the given +data (Lines 2 to 9). Since there is no real computation going +on inside \pcode{foo}, the function then just restores the +stack to its old state and crucially sets the return address +where the computation should resume (Line 9 in the code on the +left-hand side). 
The instruction \code{ret} then transfers +control back to the function \pcode{main} to the +instruction just after the call to \pcode{foo}, that is Line +9. Another part of the ``conspiracy'' of buffer overflow attacks is that library functions in C look typically as follows: @@ -297,7 +299,7 @@ buffer, is stored on the stack before the older items, like return address and arguments. If it had be the other way around, then such an overwriting by overflowing a local buffer -would just not work. If the designers of C had just been able +would just not work. Had the designers of C just been able to foresee what headaches their way of arranging the stack caused in the time where computers are accessible from everywhere. @@ -386,7 +388,7 @@ \noindent These characters represent the machine code for opening a shell. It seems obtaining such a string requires -higher-education in the architecture of the target system. But +``higher education'' in the architecture of the target system. But it is actually relatively simple: First there are many such string ready-made---just a quick Google query away. Second, tools like the debugger can help us again. We can just write @@ -399,20 +401,21 @@ the machine code, or even the ready-made encoding as character sequence. -While easy, obtaining this string is not entirely trivial. -Remember the functions in C that copy or fill buffers work -such that they copy everything until the zero byte is reached. -Unfortunately the ``vanilla'' output from the debugger for the -shell-program above will contain such zero bytes. So a -post-processing phase is needed to rewrite the machine code in -a way that it does not contain any zero bytes. This is like -some works of literature that have been written so that the -letter e, for example, is avoided. 
The technical term for such -a literature work is \emph{lipogram}.\footnote{The most -famous example of a lipogram is a 50,000 words novel titled -Gadsby, see \url{https://archive.org/details/Gadsby}.} For -rewriting the machine code, you might need to use clever -tricks like +While easy, obtaining this string is not entirely trivial +using \pcode{gdb}. Remember the functions in C that copy or +fill buffers work such that they copy everything until the +zero byte is reached. Unfortunately the ``vanilla'' output +from the debugger for the shell-program above will contain +such zero bytes. So a post-processing phase is needed to +rewrite the machine code in a way that it does not contain any +zero bytes. This is like some works of literature that have +been written so that the letter e, for example, is avoided. +The technical term for such a literary work is +\emph{lipogram}.\footnote{The most famous example of a +lipogram is a 50,000-word novel titled Gadsby, see +\url{https://archive.org/details/Gadsby}, which avoids the +letter `e' throughout.} For rewriting the +machine code, you might need to use clever tricks like \begin{lstlisting}[numbers=none,language={[x86masm]Assembler}] xor %eax, %eax @@ -485,7 +488,7 @@ amount of time. If we now use an address that lets us jump to any address in the grey area we are done. The target machine will execute these \pcode{NOP} operations until it reaches the -shellcode. A moment of thought can convince you that this +shellcode. A moment of thought should convince you that this trick can hugely improve our odds of finding the right address---depending on the size of the buffer, it might only take a few tries to get the shellcode to run. And then we are @@ -558,32 +561,32 @@ \lstinputlisting[language=C]{../progs/C5.c} -\noindent Here the programmer actually to take extra care to -not fall pray to a buffer overflow attack, but in the process -made the program susceptible to a format string attack. 
-Clearly the \pcode{printf} function in Line 7 contains now -an explicit format string, but because the commandline -input is copied using the function \pcode{snprintf} the -result will be the same---the string can be exploited -by embedding format strings into the user input. Here the -programmer really cannot be blamed (much) because by using -\pcode{snprintf} he or she tried to make sure only 10 -characters get copied into the local buffer---in this way -avoiding the obvious buffer overflow attack. +\noindent Here the programmer actually tried to take extra +care to not fall prey to a buffer overflow attack, but in the +process made the program susceptible to a format string +attack. Clearly the \pcode{printf} function in Line 7 now +contains an explicit format string, but because the command-line +input is copied using the function \pcode{snprintf} the result +will be the same---the string can be exploited by embedding +format strings into the user input. Here the programmer really +cannot be blamed (much) because by using \pcode{snprintf} he +or she tried to make sure only 10 characters get copied into +the local buffer---in this way avoiding the obvious buffer +overflow attack. \subsubsection*{Caveats and Defences} -How can we defend against these attacks? Well, a reflex could -be to blame programmers. Precautions should be taken so that -buffers cannot been overfilled and format strings should not -be forgotten. This might actually be slightly simpler nowadays -since safe versions of the library functions exists, which -always specify the precise number of bytes that should be -copied. Compilers also nowadays provide warnings when format -strings are omitted. So proper education of programmers is -definitely a part of a defence against such attacks. However, -if we leave it at that, then we have the mess we have today -with new attacks discovered almost daily. +How can we defend against these attacks? Well, a reflex could +be to blame programmers. 
Precautions should be taken by them +so that buffers cannot be overfilled and format strings +should not be forgotten. This might actually be slightly +simpler nowadays since safe versions of the library functions +exist, which always specify the precise number of bytes that +should be copied. Compilers also nowadays provide warnings +when format strings are omitted. So proper education of +programmers is definitely a part of a defence against such +attacks. However, if we leave it at that, then we have the +mess we have today with new attacks discovered almost daily. There is actually a quite long record of publications proposing defences against buffer overflow attacks. One method @@ -711,9 +714,11 @@ attacks. \bigskip -\noindent If you want to know more about buffer overflow -attacks, the original Phrack article ``Smashing The Stack For -Fun And Profit'' by Elias Levy (also known as Aleph One) is an +\subsubsection*{Further Reading} + +If you want to know more about buffer overflow attacks, the +original Phrack article ``Smashing The Stack For Fun And +Profit'' by Elias Levy (also known as Aleph One) is an engaging read: \begin{center}