updated
author Christian Urban <christian dot urban at kcl dot ac dot uk>
Wed, 07 Oct 2015 00:44:12 +0100
changeset 397 93affa1ebd6f
parent 396 2f4296a0ab21
child 398 b183036ba675
handouts/ho03.pdf
handouts/ho03.tex
Binary file handouts/ho03.pdf has changed
--- a/handouts/ho03.tex	Tue Oct 06 09:42:58 2015 +0100
+++ b/handouts/ho03.tex	Wed Oct 07 00:44:12 2015 +0100
@@ -123,8 +123,9 @@
 them. The only purpose of this program is to illustrate what
 happens behind the scenes with the stack. The interesting
 question is what will the stack look like after Line 3 has
-been executed? The answer can be illustrated as follows:
+been executed? The answer is illustrated in Figure~\ref{stack}.
  
+\begin{figure} 
 \begin{center} 
  \begin{tikzpicture}[scale=0.65]
   \draw[gray!20,fill=gray!20] (-5, 0) rectangle (-3,-1);
@@ -164,29 +165,38 @@
   \draw[->,line width=0.5mm] (1,3.5) -- (2.5,3.5);
   \draw (2.6,3.1) node[anchor=south west] {\tt back to main()};
 \end{tikzpicture}
-\end{center} 
+\end{center}
+\caption{The stack layout for a program where the main
+function calls an auxiliary function with three arguments
+(1, 2 and 3). The auxiliary function has two local
+buffer variables {\tt buf}$_1$ and {\tt buf}$_2$.\label{stack}} 
+\end{figure}
 
-\noindent On the left is the stack before \code{foo} is
-called; on the right is the stack after \code{foo} finishes.
-The function call to \code{foo} in Line 7 pushes the arguments
-onto the stack in reverse order---shown in the middle.
-Therefore first 3 then 2 and finally 1. Then it pushes the
-return address onto the stack where execution should resume
-once \code{foo} has finished. The last stack pointer
+On the left is the stack before \code{foo} is called; on the
+right is the stack after \code{foo} finishes. The function
+call to \code{foo} in Line 7 (in the C program above) pushes
+the arguments onto the stack in reverse order---shown in the
+middle. Therefore first 3 then 2 and finally 1. Then it pushes
+the return address onto the stack where execution should
+resume once \code{foo} has finished. The last stack pointer
 (\code{sp}) is needed in order to clean up the stack to the
 last level---in fact there is no cleaning involved, but just
 the top of the stack will be set back to this address. So the
 last stack pointer also needs to be stored. The two buffers
 inside \pcode{foo} are on the stack too, because they are
 local data within \code{foo}. Consequently the stack in the
-middle is a snapshot after Line 3 has been executed. In case
-you are familiar with assembly instructions you can also read
-off this behaviour from the machine code that the \code{gcc}
-compiler generates for the program above:\footnote{You can
-make \pcode{gcc} generate assembly instructions if you call it
-with the \pcode{-S} option, for example \pcode{gcc -S out
-in.c}\;. Or you can look at this code by using the debugger.
-How to do this will be explained in the last section.}
+middle of Figure~\ref{stack} is a snapshot after Line 3 has
+been executed. 
+
+In case you are familiar with assembly instructions you can
+also read off this behaviour from the machine code that the
+\code{gcc} compiler generates for the program
+above:\footnote{You can make \pcode{gcc} generate assembly
+instructions if you call it with the \pcode{-S} option, for
+example \pcode{gcc -S -o out in.c}\;. Or you can look at this
+code by using the debugger. How to do this will be explained
+in the last section.} It generates the following code for the
+\pcode{main} and \pcode{foo} functions.
 
 \begin{center}\small
 \begin{tabular}[t]{p{11cm}}
@@ -195,6 +205,15 @@
   {../progs/example1a.s}}
 \end{tabular}
 \end{center}
+
+\noindent Again you can see how the function \pcode{main}
+prepares the stack in Lines 2 to 7 before calling the function
+\pcode{foo}. You can see that the numbers 3, 2, 1 are stored
+on the stack (the register \code{\%esp} refers to the top of
+the stack; \pcode{$0x1}, \pcode{$0x2} and \pcode{$0x3} are the
+hexadecimal encodings for \pcode{1} to \pcode{3}). The code
+for the \pcode{foo} function is as follows:
+
 \begin{center}\small
 \begin{tabular}[t]{p{11cm}}
 {\lstinputlisting[language={[x86masm]Assembler},
@@ -203,22 +222,18 @@
 \end{tabular}
 \end{center}
 
-\noindent On the left you can see how the function
-\pcode{main} prepares in Lines 2 to 7 the stack before calling
-the function \pcode{foo}. You can see that the numbers 3, 2, 1
-are stored on the stack (the register \code{$esp} refers to
-the top of the stack; \pcode{$0x1}, \pcode{$0x2} \pcode{$0x3}
-are the encodings for \pcode{1} to \pcode{3}). On the right
-you can see how the function \pcode{foo} stores the two local
+\noindent You can see how the function \pcode{foo} first
+stores the last stack pointer on the stack and then
+calculates the new stack pointer to have enough space for the
+two local buffers (Lines 2 to 4). Then it puts the two local
 buffers onto the stack and initialises them with the given
-data (Lines 2 to 9). Since there is no real computation going
+data (Lines 5 to 9). Since there is no real computation going
 on inside \pcode{foo}, the function then just restores the
-stack to its old state and crucially sets the return address
-where the computation should resume (Line 9 in the code on the
-right-hand side). The instruction \code{ret} then transfers
-control back to the function \pcode{main} to the
-instruction just after the call to \pcode{foo}, that is Line
-9.
+stack to its old state (Line 10) and crucially sets the return
+address where the computation should resume (Line 10). The
+instruction \code{ret} then transfers control back to the
+function \pcode{main} to the instruction just after the call
+to \pcode{foo}, that is Line 10.
  
 Another part of the ``conspiracy'' of buffer overflow attacks
 is that library functions in C look typically as follows:
@@ -251,16 +266,9 @@
 \begin{center}
  \begin{tikzpicture}[scale=0.65]
   %\draw[step=1cm] (-3,-1) grid (3,8);
-  \draw[gray!20,fill=gray!20] (-1, 0) rectangle (1,-1);
-  \draw[line width=1mm] (-1,-1.2) -- (-1,6.4);
-  \draw[line width=1mm] ( 1,-1.2) -- ( 1,6.4);
-  \draw (0,-1) node[anchor=south] {\tt main};
-  \draw[line width=1mm] (-1,0) -- (1,0);
-  \draw (0,0) node[anchor=south] {\tt arg$_3$=3};
-  \draw[line width=1mm] (-1,1) -- (1,1);
-  \draw (0,1) node[anchor=south] {\tt arg$_2$=2};
-  \draw[line width=1mm] (-1,2) -- (1,2);
-  \draw (0,2) node[anchor=south] {\tt arg$_1$=1};
+  \draw[line width=1mm] (-1,1.2) -- (-1,6.4);
+  \draw[line width=1mm] ( 1,1.2) -- ( 1,6.4);
+  \draw (0,2) node[anchor=south] {\ldots};
   \draw[line width=1mm] (-1,3) -- (1,3);
   \draw (0,3.1) node[anchor=south] {\tt ret};
   \draw[line width=1mm] (-1,4) -- (1,4);
@@ -301,14 +309,14 @@
 \pcode{strcpy} for filling the buffer \pcode{buf}, then we can
 be sure it will overwrite the stack in this manner---since it
 will copy everything up to the zero-byte. Notice that this
-overwriting of the buffer only works since the newer item, the
-buffer, is stored on the stack before the older items, like
-return address and arguments. If it had be the other way
-around, then such an overwriting by overflowing a local buffer
-would just not work. Had the designers of C had just been able
-to foresee what headaches their way of arranging the stack
-caused in the time where computers are accessible from
-everywhere?
+overwriting of the buffer only works since the newer
+item---the buffer---is stored on the stack before the older
+items, like return address and arguments. If it had been the
+other way around, then such an overwriting by overflowing a
+local buffer would just not work. Had the designers of C
+been able to foresee what headaches their way of
+arranging the stack would cause, how different might
+the IT world be today?
 
 What the outcome of such an attack is can be illustrated with
 the code shown in Figure~\ref{C2}. Under ``normal operation''
@@ -320,9 +328,11 @@
 printing \pcode{Wrong identity}). The vulnerable function is
 \code{get_line} in Lines 11 to 19. This function does not take
 any precautions about the buffer of 8 characters being filled
-beyond its 8-character-limit. Let us suppose the login name
-is \pcode{test}. Then the buffer overflow can be triggered
-with a specially crafted string as password:
+beyond its 8-character-limit. Let us suppose the login name is
+\pcode{test}. Then the buffer overflow can be triggered with a
+specially crafted string as password (remember
+\pcode{get\_line} requires a \pcode{\\n} at the end of the
+input):
 
 \begin{center}
 \code{AAAAAAAABBBB\\x2c\\x85\\x04\\x08\\n}
@@ -383,23 +393,28 @@
 once the return address is appropriately modified. Typically
 the code that will be injected starts a shell. This gives the
 attacker the ability to run programs on the target machine and
-to have a good look around, provided the attacked process was not
-already running as root.\footnote{In that case the attacker
-would already congratulate him or herself to another
-computer under full control.} In order to be send as part of
-the string that is overflowing the buffer, we need the code to
-be represented as a sequence of characters. For example
+to have a good look around in order to also obtain full root
+access (normally the program that is attacked would run with
+lesser rights and any shell injected would also only run with
+these lesser access rights). If the attacked program was
+already running as root, then the attacker can congratulate
+him or herself on another computer under full control\ldots
+no more work to be done.
+
+In order to be sent as part of the string that is overflowing
+the buffer, we need the code for starting the shell to be
+represented as a sequence of characters. For example
 
 \lstinputlisting[language=C,numbers=none]{../progs/o1.c}
 
 \noindent These characters represent the machine code for
 opening a shell. It seems obtaining such a string requires
-``higher-education'' in the architecture of the target system. But
-it is actually relatively simple: First there are many such
-string ready-made---just a quick Google query away. Second,
-tools like the debugger can help us again. We can just write
-the code we want in C, for example this would be the program
-for starting a shell:
+``higher-education'' in the architecture of the target system.
+But it is actually relatively simple: First there are many
+such strings ready-made---just a quick Google query away.
+Second, tools like the debugger can help us again. We can just
+write the code we want in C, for example this would be the
+program for starting a shell:
 
 \lstinputlisting[language=C,numbers=none]{../progs/shell.c} 
 
@@ -407,10 +422,10 @@
 the machine code, or even the ready-made encoding as character
 sequence. 
 
-While easy, obtaining this string is not entirely trivial
-using \pcode{gdb}. Remember the functions in C that copy or
-fill buffers work such that they copy everything until the
-zero byte is reached. Unfortunately the ``vanilla'' output
+While not too difficult, obtaining this string is not entirely
+trivial using \pcode{gdb}. Remember the functions in C that
+copy or fill buffers work such that they copy everything until
+the zero byte is reached. Unfortunately the ``vanilla'' output
 from the debugger for the shell-program above will contain
 such zero bytes. So a post-processing phase is needed to
 rewrite the machine code in a way that it does not contain any
@@ -419,9 +434,9 @@
 The technical term for such a literary work is
 \emph{lipogram}.\footnote{The most famous example of a
 lipogram is a 50,000-word novel titled Gadsby, see
-\url{https://archive.org/details/Gadsby}, which avoids the 
-letter `e' throughout.} For rewriting the
-machine code, you might need to use clever tricks like
+\url{https://archive.org/details/Gadsby}, which avoids the
+letter `e' throughout.} For rewriting the machine code, you
+might need to use clever tricks like
 
 \begin{lstlisting}[numbers=none,language={[x86masm]Assembler}]
 xor %eax, %eax
@@ -453,22 +468,22 @@
 \end{center}
 
 \noindent where we need to be very precise with the address
-with which we will overwrite the buffer. It has to be
-precisely the first byte of the shellcode. While this is easy
-with the help of a debugger (as seen before), we typically
-cannot run anything, including a debugger, on the machine yet
-we target. And the address is very specific to the setup of
-the target machine. One way of finding out what the right
-address is is to try out one by one every possible
-address until we get lucky. With the large memories available
-today, however, the odds are long. And if we try out too many
-possible candidates too quickly, we might be detected by the
-system administrator of the target system.
+with which we will overwrite the buffer (indicated as a black
+rectangle). It has to be precisely the first byte of the
+shellcode. While this is easy with the help of a debugger (as
+seen before), we typically cannot yet run anything, including
+a debugger, on the machine we target. And the address is
+very specific to the setup of the target machine. One way of
+finding the right address is to try out every possible
+address one by one until we get lucky. With the large
+memories available today, however, the odds are long. And if
+we try out too many possible candidates too quickly, we might
+be detected by the system administrator of the target system.
 
-We can improve our odds considerably by following a clever 
-trick. Instead of adding the shellcode at the beginning of the
-string, we should add it at the end, just before we overflow 
-the buffer, for example
+We can improve our odds considerably by making use of a very
+clever trick. Instead of adding the shellcode at the beginning
+of the string, we should add it at the end, just before we
+overflow the buffer, for example
 
 \begin{center}
   \begin{tikzpicture}[scale=0.6]
@@ -489,20 +504,20 @@
 
 \noindent Then we can fill up the grey part of the string with
 \pcode{NOP} operations. The code for this operation is
-\code{\\0x90}. It is available on every architecture and its
-purpose in a CPU is to do nothing apart from waiting a small
-amount of time. If we now use an address that lets us jump to
-any address in the grey area we are done. The target machine
-will execute these \pcode{NOP} operations until it reaches the
-shellcode. That is why this NOP-part is often called
-\emph{NOP-sledge}. A moment of thought should convince you
-that this trick can hugely improve our odds of finding the
-right address---depending on the size of the buffer, it might
-only take a few tries to get the shellcode to run. And then we
-are in. The code for such an attack is shown in
-Figure~\ref{C3}. It is directly taken from the original paper
-about ``Smashing the Stack for Fun and Profit'' (see pointer
-given at the end).
+\code{\\0x90} on Intel CPUs. A \pcode{NOP} is available on
+practically every architecture and its purpose in a CPU is to
+do nothing apart from waiting a small amount of time. If we
+now use an address that lands anywhere in the grey area we are
+done. The target machine will execute these \pcode{NOP}
+operations until it reaches the shellcode. That is why this
+NOP-part is often called \emph{NOP-sled}. A moment of thought should
+convince you that this trick can hugely improve our odds of
+finding the right address---depending on the size of the
+buffer, it might only take a few tries to get the shellcode to
+run. And then we are in. The code for such an attack is shown
+in Figure~\ref{C3}. It is directly taken from the original
+paper about ``Smashing the Stack for Fun and Profit'' (see
+pointer given at the end).
 
 \begin{figure}[p]
 \lstinputlisting[language=C]{../progs/C3.c}
@@ -510,13 +525,13 @@
 payload.\label{C3}}
 \end{figure}
 
-By the way you might have the question how do attackers find
-out about vulnerable systems? Well, the automated version uses
-\emph{fuzzers}, which throw randomly generated user input at
-applications and observe the behaviour. If an application
-seg-faults (throws a segmentation error) then this is a good
-indication that a buffer overflow vulnerability can be
-exploited.
+By the way, you might now wonder how attackers find out
+about vulnerable systems in the first place. Well,
+the automated version uses \emph{fuzzers}, which throw
+randomly generated user input at applications and observe the
+behaviour. If an application segfaults (throws a segmentation
+error) then this is a good indication that a buffer overflow
+vulnerability can be exploited.
 
 
 \subsubsection*{Format String Attacks}
@@ -532,12 +547,12 @@
 
 \lstinputlisting[language=C]{../progs/C4.c}
 
-\noindent The intention is to print out the first argument
-given on the command line. The ``secret string'' is never to
-be printed. The problem is that the C function \pcode{printf}
-normally expects a format string---a schema that directs how a
-string should be printed. This would be for example a proper
-invocation of this function:
+\noindent The intention of this program is to print out the
+first argument given on the command line. The ``secret
+string'' is never to be printed. The problem is that the C
+function \pcode{printf} normally expects a format string---a
+schema that directs how a string should be printed. This would
+be for example a proper invocation of this function:
 
 \begin{lstlisting}[numbers=none,language=C]
 long n = 123456789;
@@ -588,13 +603,14 @@
 be to blame programmers. Precautions should be taken by them
 so that buffers cannot be overfilled and format strings
 should not be forgotten. This might actually be slightly
-simpler nowadays since safe versions of the library functions
-exist, which always specify the precise number of bytes that
-should be copied. Compilers also nowadays provide warnings
-when format strings are omitted. So proper education of
-programmers is definitely a part of a defence against such
-attacks. However, if we leave it at that, then we have the
-mess we have today with new attacks discovered almost daily. 
+simpler for programmers to achieve nowadays since safe versions
+of the library functions exist, which always specify the
+precise number of bytes that should be copied. Compilers also
+nowadays provide warnings when format strings are omitted. So
+proper education of programmers is definitely a part of a
+defence against such attacks. However, if we leave it at that,
+then we have the mess we have today with new attacks
+discovered almost daily. 
 
 There is actually a quite long record of publications
 proposing defences against buffer overflow attacks. One method
@@ -614,17 +630,16 @@
 target computer. The lib-C library, for example, already
 contains the code for spawning a shell. With
 \emph{return-to-lib-C} one just has to find out where this
-code is located. But attackers can make good guesses. In my
-examples I took a shortcut and always made the stack
-executable. 
+code is located. But attackers can make good guesses. 
 
-Another defence is called \emph{stack canaries}. The advantage 
+Another defence is called \emph{stack canaries}. The advantage
 is that they can be automatically inserted into compiled code
 and do not need any hardware support. Though they will make
 your program run slightly slower. The idea behind \emph{stack
-canaries} is to push a random number onto the stack just 
-before local data is stored. For our very first function the
-stack would with a \emph{stack canary} look as follows
+canaries} is to push a random number onto the stack just
+before local data is stored. For our very first function
+\pcode{foo} the stack would with a \emph{stack canary} look as
+follows
 
 \begin{center}
 \begin{tikzpicture}[scale=0.65]
@@ -654,7 +669,7 @@
 \noindent The idea behind this random number is that when the
 function finishes, it is checked that this random number is
 still intact on the stack. If not, then a buffer overflow has
-occurred. Although this is quite effective, but requires 
+occurred. Although this is quite effective, it requires 
 suitable support for generating random numbers. This is always
 hard to get right and attackers are happy to exploit the 
 resulting weaknesses.
@@ -668,14 +683,18 @@
 
 As mentioned before, modern operating systems have these
 defences enabled by default and make buffer overflow attacks
-harder, but not impossible. Indeed, I as an amateur attacker
-had to explicitly switch off these defences. I run my example
-under an Ubuntu version ``Maverick Meerkat'' from October 
-2010 and the gcc 4.4.5. I have not tried whether newer versions
-would work as well. I tested all examples inside a virtual 
-box\footnote{\url{https://www.virtualbox.org}} insulating my main 
-system from any harm. When compiling the programs I called 
-the compiler with the following options:
+harder, but not impossible. Indeed, I---as an amateur
+attacker---had to explicitly switch off these defences. 
+A real attacker would be more knowledgeable and not need this
+shortcut.
+
+To get the attacks to work, I ran my examples under the Ubuntu
+version ``Maverick Meerkat'' from October 2010 with gcc 4.4.5.
+I have not tried whether newer versions would work as well. I
+tested all examples inside a virtual
+box\footnote{\url{https://www.virtualbox.org}} isolating my
+main system from any harm. When compiling the programs I
+called the compiler with the following options:
 
 \begin{center}
 \begin{tabular}{l@{\hspace{1mm}}l}
@@ -690,20 +709,22 @@
 compiler to include debugging information and also produce
 non-optimised code (the latter makes the output of the code a
 bit more predictable). The third is important as it switches
-off defences like the stack canaries. The fourth again makes it
-a bit easier to read the code. The final option makes the
-stack executable, thus the example in Figure~\ref{C3}
-works as intended. While this might be considered
-cheating....since I explicitly switched off all defences, I
-hope I was able convey the point that this is actually not too far from
-realistic scenarios. I have shown you the classic version of
-the buffer overflow attacks. Updated variants do exist. Also
-one might argue buffer-overflow attacks have been solved on
-computers (desktops or servers) but the computing landscape of today 
-is much wider than that. The main problem today are
-embedded systems against which attacker can equally cause a
-lot of harm and which are much less defended. Anthony Bonkoski
-makes a similar argument in his security blog:
+off defences like the stack canaries. The fourth again makes
+it a bit easier to read the code. The final option makes the
+stack executable, thus the example in Figure~\ref{C3} works as
+intended. While this might be considered cheating, since I
+explicitly switched off all defences, I hope I was able to convey
+the point that this is actually not too far from realistic
+scenarios. I have shown you the classic version of the buffer
+overflow attacks. Updated and more advanced variants do exist.
+
+With the standard defences switched on, you might want to
+argue buffer-overflow attacks have been solved on computers
+(desktops and servers) but the computing landscape of today is
+much wider than that. The main problem today is embedded
+systems, against which attackers can equally cause a lot of harm
+and which are much less defended. Anthony Bonkoski makes a
+similar argument in his security blog:
 
 \begin{center}
 \url{http://jabsoft.io/2013/09/25/are-buffer-overflows-solved-yet-a-historical-tale/}