handouts/ho03.tex
author Christian Urban <christian dot urban at kcl dot ac dot uk>
Thu, 09 Oct 2014 23:12:10 +0100
changeset 229 ea921d6a1819
parent 228 4f7c7997b68b
child 230 603cbd28e988
permissions -rw-r--r--
updated

\documentclass{article}
\usepackage{../style}
\usepackage{../langs}

\begin{document}

\section*{Handout 3 (Buffer Overflow Attacks)}

By far the most popular attack method on computers are buffer
overflow attacks or variations thereof. The popularity is
unfortunate because we nowadays have technology in place to
prevent them effectively. But these kind of attacks are still
very relevant even today since there are many legacy systems
out there and also many modern embedded systems often do not
take any precautions to prevent such attacks.

To understand how buffer overflow attacks work, we have to have
a look at how computers work ``under the hood'' (on the
machine level) and also understand some aspects of the C/C++
programming language. This might not be everyday fare for
computer science students, but who said that criminal hackers
restrict themselves to everyday fare? Not to mention the
free-riding script-kiddies who use this technology without
even knowing what the underlying ideas are. If you want to be
a good security engineer who needs to defend such attacks, 
then better you get to know the details.
 
For buffer overflow attacks to work, a number of innocent
design decisions, which are really benign on their own, need
to conspire against you. All these decisions were taken at a
time when there was no Internet: C was introduced around 1973;
the Internet TCP/IP protocol was standardised in 1982 by which
time there were maybe 500 servers connected (and all users
were well-behaved, mostly academics); Intel's first 8086 CPUs
arrived around 1977. So nobody of the ``forefathers'' can
really be blamed, but as mentioned above we should already be
way beyond the point that buffer overflow attacks are worth a
thought. Unfortunately, this is far from the truth. I let you
ponder why?

One such ``benign'' design decision is how the memory is laid
out into different regions for each process. 
 
\begin{center}
  \begin{tikzpicture}[scale=0.7]
  %\draw[step=1cm] (-3,-3) grid (3,3);
  \draw[line width=1mm] (-2, -3) rectangle (2,3);
  \draw[line width=1mm] (-2,1) -- (2,1);
  \draw[line width=1mm] (-2,-1) -- (2,-1);
  \draw (0,2) node {\large\tt text};
  \draw (0,0) node {\large\tt heap};
  \draw (0,-2) node {\large\tt stack};

  \draw (-2.7,3) node[anchor=north east] {\tt\begin{tabular}{@{}l@{}}lower\\ address\end{tabular}};
  \draw (-2.7,-3) node[anchor=south east] {\tt\begin{tabular}{@{}l@{}}higher\\ address\end{tabular}};
  \draw[->, line width=1mm] (-2.5,3) -- (-2.5,-3);

  \draw (2.7,-2) node[anchor=west] {\tt grows};
  \draw (2.7,-3) node[anchor=south west] {\tt\footnotesize older};
  \draw (2.7,-1) node[anchor=north west] {\tt\footnotesize newer};
  \draw[|->, line width=1mm] (2.5,-3) -- (2.5,-1);
  \end{tikzpicture}
\end{center}

\noindent The text region contains the program code (usually
this region is read-only). The heap stores all data the
programmer explicitly allocates. For us the most interesting
region is the stack, which contains data mostly associated
with the control flow of the program. Notice that the stack
grows from a higher addresses to lower addresses. That means
that older items on the stack will be stored behind, or after,
newer items. Let's look a bit closer what happens with the
stack when a program is running. Consider the following simple
C program.
 
\lstinputlisting[language=C]{../progs/example1.c} 
 
\noindent The \code{main} function calls in Line 7 the
function \code{foo} with three arguments. \code{Foo} creates
two (local) buffers, but does not do anything interesting with
them. The only purpose of this program is to illustrate what
happens behind the scenes with the stack. The interesting
question is what will the stack be after Line 3 has been
executed? The answer can be illustrated as follows:
 
\begin{center} 
 \begin{tikzpicture}[scale=0.65]
  \draw[gray!20,fill=gray!20] (-5, 0) rectangle (-3,-1);
  \draw[line width=1mm] (-5,-1.2) -- (-5,0.2);
  \draw[line width=1mm] (-3,-1.2) -- (-3,0.2);
  \draw (-4,-1) node[anchor=south] {\tt main};
  \draw[line width=1mm] (-5,0) -- (-3,0);

  \draw[gray!20,fill=gray!20] (3, 0) rectangle (5,-1);
  \draw[line width=1mm] (3,-1.2) -- (3,0.2);
  \draw[line width=1mm] (5,-1.2) -- (5,0.2);
  \draw (4,-1) node[anchor=south] {\tt main};
  \draw[line width=1mm] (3,0) -- (5,0);
 
   %\draw[step=1cm] (-3,-1) grid (3,8);
  \draw[gray!20,fill=gray!20] (-1, 0) rectangle (1,-1);
  \draw[line width=1mm] (-1,-1.2) -- (-1,7.4);
  \draw[line width=1mm] ( 1,-1.2) -- ( 1,7.4);
  \draw (0,-1) node[anchor=south] {\tt main};
  \draw[line width=1mm] (-1,0) -- (1,0);
  \draw (0,0) node[anchor=south] {\tt arg$_3$=3};
  \draw[line width=1mm] (-1,1) -- (1,1);
  \draw (0,1) node[anchor=south] {\tt arg$_2$=2};
  \draw[line width=1mm] (-1,2) -- (1,2);
  \draw (0,2) node[anchor=south] {\tt arg$_1$=1};
  \draw[line width=1mm] (-1,3) -- (1,3);
  \draw (0,3.1) node[anchor=south] {\tt ret};
  \draw[line width=1mm] (-1,4) -- (1,4);
  \draw (0,4) node[anchor=south] {\small\tt last sp};
  \draw[line width=1mm] (-1,5) -- (1,5);
  \draw (0,5) node[anchor=south] {\tt buf$_1$};
  \draw[line width=1mm] (-1,6) -- (1,6);
  \draw (0,6) node[anchor=south] {\tt buf$_2$};
  \draw[line width=1mm] (-1,7) -- (1,7);

  \draw[->,line width=0.5mm] (1,4.5) -- (1.8,4.5) -- (1.8, 0) -- (1.1,0); 
  \draw[->,line width=0.5mm] (1,3.5) -- (2.5,3.5);
  \draw (2.6,3.1) node[anchor=south west] {\tt back to main()};
\end{tikzpicture}
\end{center} 

\noindent On the left is the stack before \code{foo} is
called; on the right is the stack after \code{foo} finishes.
The function call to \code{foo} in Line 7 pushes the arguments
onto the stack in reverse order---shown in the middle.
Therefore first 3 then 2 and finally 1. Then it pushes the
return address onto the stack where execution should resume
once \code{foo} has finished. The last stack pointer
(\code{sp}) is needed in order to clean up the stack to the
last level---in fact there is no cleaning involved, but just
the top of the stack will be set back. So the last stack
pointer also needs to be stored. The two buffers inside
\pcode{foo} are on the stack too, because they are local data
within \code{foo}. Consequently the stack in the middle is a
snapshot after Line 3 has been executed. In case you are
familiar with assembly instructions you can also read off this
behaviour from the machine code that the \code{gcc} compiler
generates for the program above:\footnote{You can make
\pcode{gcc} generate assembly instructions if you call it with
the \pcode{-S} option, for example \pcode{gcc -S out in.c}\;.
Or you can look at this code by using the debugger. How to do
this will be explained later.}.

\begin{center}\small
\begin{tabular}[t]{@{}c@{\hspace{8mm}}c@{}}
{\lstinputlisting[language={[x86masm]Assembler},
  morekeywords={movl},xleftmargin=5mm]
  {../progs/example1a.s}} &
{\lstinputlisting[language={[x86masm]Assembler},
  morekeywords={movl,movw},xleftmargin=5mm]
  {../progs/example1b.s}}  
\end{tabular}
\end{center}

\noindent On the left you can see how the function
\pcode{main} prepares in Lines 2 to 7 the stack before calling
the function \pcode{foo}. You can see that the numbers 3, 2, 1
are stored on the stack (the register \code{$esp} refers to
the top of the stack). On the right you can see how the
function \pcode{foo} stores the two local buffers onto the
stack and initialises them with the given data (Lines 2 to 9).
Since there is no real computation going on inside
\pcode{foo}, the function then just restores the stack to its
old state and crucially sets the return address where the
computation should resume (Line 9 in the code on the left-hand
side). The instruction \code{ret} then transfers control back
to the function \pcode{main} to the the instruction just after
the call to \pcode{foo}, that is Line 9.
 
Another part of the ``conspiracy'' is that library functions
in C look typically as follows:
 
\begin{center}
\lstinputlisting[language=C,numbers=none]{../progs/app5.c}
\end{center} 

\noindent This function copies data from a source \pcode{src}
to a destination \pcode{dst}. The important point is that it
copies the data until it reaches a zero-byte (\code{"\\0"}). 

The central idea of the buffer overflow attack is to overwrite
the return address on the stack which designates where the
control flow of the program should resume once the function at
hand has finished its computation. So if we have somewhere in
a function a local a buffer, say

\begin{center}
\code{char buf[8];}
\end{center}

\noindent
then the corresponding stack will look as follows

\begin{center}
 \begin{tikzpicture}[scale=0.65]
  %\draw[step=1cm] (-3,-1) grid (3,8);
  \draw[gray!20,fill=gray!20] (-1, 0) rectangle (1,-1);
  \draw[line width=1mm] (-1,-1.2) -- (-1,6.4);
  \draw[line width=1mm] ( 1,-1.2) -- ( 1,6.4);
  \draw (0,-1) node[anchor=south] {\tt main};
  \draw[line width=1mm] (-1,0) -- (1,0);
  \draw (0,0) node[anchor=south] {\tt arg$_3$=3};
  \draw[line width=1mm] (-1,1) -- (1,1);
  \draw (0,1) node[anchor=south] {\tt arg$_2$=2};
  \draw[line width=1mm] (-1,2) -- (1,2);
  \draw (0,2) node[anchor=south] {\tt arg$_1$=1};
  \draw[line width=1mm] (-1,3) -- (1,3);
  \draw (0,3.1) node[anchor=south] {\tt ret};
  \draw[line width=1mm] (-1,4) -- (1,4);
  \draw (0,4) node[anchor=south] {\small\tt last sp};
  \draw[line width=1mm] (-1,5) -- (1,5);
  \draw (0,5.1) node[anchor=south] {\tt buf};
  \draw[line width=1mm] (-1,6) -- (1,6);
  \draw (2,5.1) node[anchor=south] {\code{$esp}};
  \draw[<-,line width=0.5mm] (1.1,6) -- (2.5,6);

  \draw[->,line width=0.5mm] (1,4.5) -- (2.5,4.5);
  \draw (2.6,4.1) node[anchor=south west] {\code{??}};
  
  \draw[->,line width=0.5mm] (1,3.5) -- (2.5,3.5);
  \draw (2.6,3.1) node[anchor=south west] {\tt jump to \code{\\x080483f4}};
\end{tikzpicture}
\end{center}

\noindent We need to fill this buffer over its limit of 8
characters so that it overwrites the stack pointer and then
also overwrites the return address. If, for example, we want
to jump to a specific address in memory, say,
\pcode{\\x080483f4} then we can fill the buffer with the data

\begin{center}
\code{char buf[8] = "AAAAAAAABBBB\\xf4\\x83\\x04\\x08";}
\end{center}
 
\noindent The first eight \pcode{A}s fill the buffer to the
rim; the next four \pcode{B}s overwrite the stack pointer
(with what data we overwrite this part is usually not
important); then comes the address we want to jump to. Notice
that we have to give the address in the reverse order. All
addresses on Intel CPUs need to be given in this way. Since
the string is enclosed in double quotes, the C convention is
that the string internally will automatically be terminated by
a zero-byte. If the programmer uses functions like
\pcode{strcpy} for filling the buffer \pcode{buf}, then we can
be sure it will overwrite the stack in this manner---since it
will copy everything up to the zero-byte. Notice that this
overwriting of the buffer only works since the newer item, the
buffer, is stored on the stack before the older items, like
return address and arguments. If it had be the other way
around, then such an overwriting by overflowing a local buffer
would just not work.

What the outcome of such an attack is can be illustrated with
the code shown in Figure~\ref{C2}. Under ``normal operation''
this program ask for a login-name and a password. Both of
which are stored in \code{char} buffers of length 8. The
function \pcode{match} tests whether two such buffers contain
the same. If yes, then the function lets you ``in'' (by
printing \pcode{Welcome}). If not, it denies access (by
printing \pcode{Wrong identity}). The vulnerable function is
\code{get_line} in Lines 11 to 19. This function does not take
any precautions about the buffer of 8 characters being filled
beyond this 8-character-limit. Let us suppose the login name
is \pcode{test}. Then the buffer overflow can be triggered
with a specially crafted string as password:

\begin{center}
\code{AAAAAAAABBBB\\x2c\\x85\\x04\\x08\\n}
\end{center}

\noindent The address at the end happens to be the one for the
function \pcode{welcome()}. This means even with this input
(where the login name and password clearly do not match) the
program will still print out \pcode{Welcome}. The only
information we need for this attack is to know where the
function \pcode{welcome()} starts in memory. This information
can be easily obtained by starting the program inside the
debugger and disassembling this function. 

\begin{lstlisting}[numbers=none,language={[x86masm]Assembler},
  morekeywords={movl,movw}]
$ gdb C2
GNU gdb (GDB) 7.2-ubuntu
(gdb) disassemble welcome
\end{lstlisting}

\noindent \pcode{C2} is the name of the program and
\pcode{gdb} is the name of the debugger. The output will be
something like this

\begin{lstlisting}[numbers=none,language={[x86masm]Assembler},
  morekeywords={movl,movw}]
0x0804852c <+0>:     push   %ebp
0x0804852d <+1>:     mov    %esp,%ebp
0x0804852f <+3>:     sub    $0x4,%esp
0x08048532 <+6>:     movl   $0x8048690,(%esp)
0x08048539 <+13>:    call   0x80483a4 <puts@plt>
0x0804853e <+18>:    movl   $0x0,(%esp)
0x08048545 <+25>:    call   0x80483b4 <exit@plt>
\end{lstlisting}

\noindent indicating that the function \pcode{welcome()}
starts at address \pcode{0x0804852c} (top address in the 
left column).

\begin{figure}[p]
\lstinputlisting[language=C]{../progs/C2.c}
\caption{A suspicious login implementation.\label{C2}}
\end{figure}

This kind of attack was very popular with commercial programs
that needed a key to be unlocked. Historically, hackers first 
broke the rather weak encryption of these locking mechanisms.
After the encryption had been made stronger, hackers used
buffer overflow attacks as shown above to jump directly to
the part of the program that was intended to be only available
after the correct key was typed in. 

\subsection*{Paylods}

Unfortunately, much more harm can be caused by buffer overflow
attacks. This is achieved by injecting code that will be run
once the return address is appropriately modified. Typically
the code that will be injected is for running a shell. This
gives the attacker the ability to run programs on the target
machine and have a good look around, provided the attacked
process was not already running as root.\footnote{In that case
the attacker would do already congratulate him or herself to
another computer under full control.} In order to be send as
part of the string that is overflowing the buffer, we need the
code to be represented as a sequence of characters. For
example

\lstinputlisting[language=C,numbers=none]{../progs/o1.c}

\noindent These characters represent the machine code for
opening a shell. It seems obtaining such a string requires
higher-education in the architecture of the target system. But
it is actually relatively simple: First there are many such
string ready-made---just a quick Google query away. Second,
tools like the debugger can help us again. We can just write
the code we want in C, for example this would be the program
for starting a shell

\lstinputlisting[language=C,numbers=none]{../progs/shell.c} 

\noindent Once compiled, we can use the debugger to obtain 
the machine code, or even the ready-made encoding as character
sequence. 

While easy, obtaining this string is not entirely trivial.
Remember the functions in C that copy or fill buffers work
such that they copy everything until the zero byte is reached.
Unfortunately the ``vanilla'' output from the debugger for the
shell-program above will contain such zero bytes. So a
post-processing phase is needed to rewrite the machine code in
a way that it does not contain any zero bytes. This is like
some works of literature that have been written so that the
letter 'i', for example, is avoided. For rewriting the machine
code, you might need to use clever tricks like

\begin{lstlisting}[numbers=none,language={[x86masm]Assembler}]
xor %eax, %eax
\end{lstlisting}

\noindent This instruction does not contain any zero byte when
encoded, but produces a zero byte on the stack when run. 

Having removed the zero bytes we can craft the string that 
will be send to the target computer. It is typically of the 
form

\begin{center}
  \begin{tikzpicture}[scale=0.7]
  \draw[line width=1mm] (-2, -1) rectangle (2,3);
  \draw[line width=1mm] (-2,1.9) -- (2,1.9);
  \draw (0,2.5) node {\large\tt shell code};
  \draw[line width=1mm,fill=black] (0.3, -1) rectangle (2,-0.7);
  \draw[->,line width=0.3mm] (1.05, -1) -- (1.05,-1.7) --
  (-3,-1.7) -- (-3, 3.7) -- (-1.9, 3.7) -- (-1.9, 3.1);
  \draw (-2, 3) node[anchor=north east] {\LARGE \color{codegreen}{``}};
  \draw ( 2,-0.9) node[anchor=west] {\LARGE\color{codegreen}{''}};
  \end{tikzpicture}
\end{center}

\noindent This of course requires that the buffer we are
trying to attack can at least contain the shellcode we want to
run. But as you can see this is only 47 bytes, which is a very
low bar to jump over. More formidable is the choice of finding
the right address to jump to. As indicated in the picture we
need to be very precise with the address with which we will
overwrite the buffer. It has to be precisely the first byte of
the shellcode. While this is easy with the help of a debugger
(as seen before), we typically cannot run anything on the
machine yet we target. And the address is very specific to the
setup of the target machine. One way of finding out what the
right address is is to try out one by one until we get lucky.
With the large memories available today, however, the odds are
long. And if we try out too many possible candidates too
quickly, we might be detected by the system administrator of
the target system.

We can improve our odds considerably by following a clever 
trick. Instead of adding the shellcode at the beginning of the
string, we should add it at the end, just before we overflow 
the buffer, for example

\begin{center}
  \begin{tikzpicture}[scale=0.7]
  \draw[line width=1mm] (-2, -1) rectangle (2,3);
  \draw[line width=1mm] (-2,1.9) -- (2,1.9);
  \draw (0,2.5) node {\large\tt shell code};
  \draw[line width=1mm,fill=black] (0.3, -1) rectangle (2,-0.7);
  \draw (-2, 3) node[anchor=north east] {\LARGE \color{codegreen}{``}};
  \draw ( 2,-0.9) node[anchor=west] {\LARGE\color{codegreen}{''}};
  \end{tikzpicture}
\end{center}

\noindent Then we can fill up the gray part of the string with
a \pcode{NOP} operation. The code for this operation is
\code{\\0x90}. It is available on every architecture and its
purpose it to to nothing apart from waiting a small amount of
time. If we now use an address that lets us jump to any
address in the gray area we are done. The target machine will 
execute these \pcode{NOP} operations until it reaches the
shellcode. A moment of thought can convince you that this
trick can hugely improve our odds of finding the right 
address---depending on the size of the buffer, it might
only take a few tries to get the shellcode to run.

\bigskip\bigskip
\subsubsection*{A Crash-Course for GDB}

\begin{itemize}
\item \texttt{(l)ist n} -- listing the source file from line 
\texttt{n}
\item \texttt{disassemble fun-name}
\item \texttt{run args} -- starts the program, potential 
arguments can be given
\item \texttt{(b)reak line-number} -- set break point
\item \texttt{(c)ontinue} -- continue execution until next 
breakpoint in a line number

\item \texttt{x/nxw addr} -- print out \texttt{n} words starting 
from address \pcode{addr}, the address could be \code{$esp} 
for looking at the content of the stack
\item \texttt{x/nxb addr} -- print out \texttt{n} bytes 
\end{itemize}

 
\bigskip\bigskip \noindent If you want to know more about
buffer overflow attacks, the original Phrack article
``Smashing The Stack For Fun And Profit'' by Elias Levy (also
known as Aleph One) is an engaging read:

\begin{center}
\url{http://phrack.org/issues/49/14.html}
\end{center} 

\noindent This is an article from 1996 and some parts are
not up-to-date anymore. The article called
``Smashing the Stack in 2010''

\begin{center}
\url{http://www.mgraziano.info/docs/stsi2010.pdf}
\end{center}

\noindent updates, as the name says, most information to 2010.
 
\end{document}

%%% Local Variables: 
%%% mode: latex
%%% TeX-master: t
%%% End: