\documentclass{article}
\usepackage{../style}
\usepackage{../langs}
\begin{document}
\section*{Handout 3 (Buffer Overflow Attacks)}
By far the most popular attack method on computers are buffer
overflow attacks or simple variations thereof. The popularity is
unfortunate because we nowadays have technology in place to prevent them
effectively. But these kind of attacks are still very relevant
even today since there are many legacy systems out there and
also many modern embedded systems do not take any precautions
to prevent such attacks.
To understand how buffer overflow attacks work, we have to have
a look at how computers work ``under the hood'' (on the
machine level) and also understand some aspects of the C/C++
programming language. This might not be everyday fare for
computer science students, but who said that criminal hackers
restrict themselves to everyday fare? Not to mention the
free-riding script-kiddies who use this technology without
even knowing what the underlying ideas are. If you want to be
a good security engineer who needs to defend such attacks,
then better you know the details.
For buffer overflow attacks to work, a number of innocent
design decisions, which are really benign on their own, need
to conspire against you. All these decisions were pretty much
taken at a time when there was no Internet: C was introduced
around 1973; the Internet TCP/IP protocol was standardised in
1982 by which time there were maybe 500 servers connected (and
all users were well-behaved, mostly academics); Intel's first
8086 CPUs arrived around 1977. So nobody of the
``forefathers'' can really be blamed, but as mentioned above
we should already be way beyond the point that buffer overflow
attacks are worth a thought. Unfortunately, this is far from
the truth. I let you ponder why?
One such ``benign'' design decision is how the memory is laid
out into different regions for each process.
\begin{center}
\begin{tikzpicture}[scale=0.7]
%\draw[step=1cm] (-3,-3) grid (3,3);
\draw[line width=1mm] (-2, -3) rectangle (2,3);
\draw[line width=1mm] (-2,1) -- (2,1);
\draw[line width=1mm] (-2,-1) -- (2,-1);
\draw (0,2) node {\large\tt text};
\draw (0,0) node {\large\tt heap};
\draw (0,-2) node {\large\tt stack};
\draw (-2.7,3) node[anchor=north east] {\tt\begin{tabular}{@{}l@{}}lower\\ address\end{tabular}};
\draw (-2.7,-3) node[anchor=south east] {\tt\begin{tabular}{@{}l@{}}higher\\ address\end{tabular}};
\draw[->, line width=1mm] (-2.5,3) -- (-2.5,-3);
\draw (2.7,-2) node[anchor=west] {\tt grows};
\draw (2.7,-3) node[anchor=south west] {\tt\footnotesize older};
\draw (2.7,-1) node[anchor=north west] {\tt\footnotesize newer};
\draw[|->, line width=1mm] (2.5,-3) -- (2.5,-1);
\end{tikzpicture}
\end{center}
\noindent The text region contains the program code (usually
this region is read-only). The heap stores all data the
programmer explicitly allocates. For us the most interesting
region is the stack, which contains data mostly associated
with the control flow of the program. Notice that the stack
grows from a higher addresses to lower addresses. That means
that older items on the stack will be stored behind, or after,
newer items. Let's look a bit closer what happens with the
stack when a program is running. Consider the following simple
C program.
\lstinputlisting[language=C]{../progs/example1.c}
\noindent The \code{main} function calls \code{foo} with three
arguments. \code{Foo} contains two (local) buffers. The
interesting point for us will be what will the stack loke
like after Line 3 has been executed? The answer is as follows:
\begin{center}
\begin{tikzpicture}[scale=0.65]
\draw[gray!20,fill=gray!20] (-5, 0) rectangle (-3,-1);
\draw[line width=1mm] (-5,-1.2) -- (-5,0.2);
\draw[line width=1mm] (-3,-1.2) -- (-3,0.2);
\draw (-4,-1) node[anchor=south] {\tt main};
\draw[line width=1mm] (-5,0) -- (-3,0);
\draw[gray!20,fill=gray!20] (3, 0) rectangle (5,-1);
\draw[line width=1mm] (3,-1.2) -- (3,0.2);
\draw[line width=1mm] (5,-1.2) -- (5,0.2);
\draw (4,-1) node[anchor=south] {\tt main};
\draw[line width=1mm] (3,0) -- (5,0);
%\draw[step=1cm] (-3,-1) grid (3,8);
\draw[gray!20,fill=gray!20] (-1, 0) rectangle (1,-1);
\draw[line width=1mm] (-1,-1.2) -- (-1,7.4);
\draw[line width=1mm] ( 1,-1.2) -- ( 1,7.4);
\draw (0,-1) node[anchor=south] {\tt main};
\draw[line width=1mm] (-1,0) -- (1,0);
\draw (0,0) node[anchor=south] {\tt arg$_3$=3};
\draw[line width=1mm] (-1,1) -- (1,1);
\draw (0,1) node[anchor=south] {\tt arg$_2$=2};
\draw[line width=1mm] (-1,2) -- (1,2);
\draw (0,2) node[anchor=south] {\tt arg$_1$=1};
\draw[line width=1mm] (-1,3) -- (1,3);
\draw (0,3.1) node[anchor=south] {\tt ret};
\draw[line width=1mm] (-1,4) -- (1,4);
\draw (0,4) node[anchor=south] {\small\tt last sp};
\draw[line width=1mm] (-1,5) -- (1,5);
\draw (0,5) node[anchor=south] {\tt buf$_1$};
\draw[line width=1mm] (-1,6) -- (1,6);
\draw (0,6) node[anchor=south] {\tt buf$_2$};
\draw[line width=1mm] (-1,7) -- (1,7);
\draw[->,line width=0.5mm] (1,4.5) -- (1.8,4.5) -- (1.8, 0) -- (1.1,0);
\draw[->,line width=0.5mm] (1,3.5) -- (2.5,3.5);
\draw (2.6,3.1) node[anchor=south west] {\tt back to main()};
\end{tikzpicture}
\end{center}
\noindent On the left is the stack before \code{foo} is
called; on the right is the stack after \code{foo} finishes.
The function call to \code{foo} in Line 7 pushes the arguments
onto the stack in reverse order---shown in the middle.
Therefore first 3 then 2 and finally 1. Then it pushes the
return address to the stack where execution should resume once
\code{foo} has finished. The last stack pointer (\code{sp}) is
needed in order to clean up the stack to the last level---in
fact there is no cleaning involved, but just the top of the
stack will be set back. The two buffers are also on the stack,
because they are local data within \code{foo}. So in the
middle is a snapshot of the stack after Line 3 has been
executed. In case you are familiar with assembly instructions
you can also read off this behaviour from the machine
code that the \code{gcc} compiler generates for the program
above:\footnote{You can make \pcode{gcc} generate assembly
instructions if you call it with the \pcode{-S} option,
for example \pcode{gcc -S out in.c}\;. Or you can look
at this code by using the debugger. This will be explained
later.}.
\begin{center}\small
\begin{tabular}[t]{@{}c@{\hspace{8mm}}c@{}}
{\lstinputlisting[language={[x86masm]Assembler},
morekeywords={movl},xleftmargin=5mm]
{../progs/example1a.s}} &
{\lstinputlisting[language={[x86masm]Assembler},
morekeywords={movl,movw},xleftmargin=5mm]
{../progs/example1b.s}}
\end{tabular}
\end{center}
\noindent On the left you can see how the function
\pcode{main} prepares in Lines 2 to 7 the stack, before
calling the function \pcode{foo}. You can see that the
numbers 3, 2, 1 are stored on the stack (the register
\code{$esp} refers to the top of the stack). On the right
you can see how the function \pcode{foo} stores the two local
buffers onto the stack and initialises them with the given
data (Lines 2 to 9). Since there is no real computation
going on inside \pcode{foo} the function then just restores
the stack to its old state and crucially sets the return
address where the computation should resume (Line 9 in the
code on the left hand side). The instruction \code{ret} then
transfers control back to the function \pcode{main} to the
teh instruction just after the call, namely Line 9.
Another part of the ``conspiracy'' is that library functions
in C look typically as follows:
\begin{center}
\lstinputlisting[language=C,numbers=none]{../progs/app5.c}
\end{center}
\noindent This function copies data from a source \pcode{src}
to a destination \pcode{dst}. The important point is that it
copies the data until it reaches a zero-byte (\code{"\\0"}).
The central idea of the buffer overflow attack is to overwrite
the return address on the stack which states where the control
flow of the program should resume once the function at hand
has finished its computation. So if we have somewhere in a
function a local a buffer, say
\begin{center}
\code{char buf[8];}
\end{center}
\noindent
then the corresponding stack will look as follows
\begin{center}
\begin{tikzpicture}[scale=0.65]
%\draw[step=1cm] (-3,-1) grid (3,8);
\draw[gray!20,fill=gray!20] (-1, 0) rectangle (1,-1);
\draw[line width=1mm] (-1,-1.2) -- (-1,6.4);
\draw[line width=1mm] ( 1,-1.2) -- ( 1,6.4);
\draw (0,-1) node[anchor=south] {\tt main};
\draw[line width=1mm] (-1,0) -- (1,0);
\draw (0,0) node[anchor=south] {\tt arg$_3$=3};
\draw[line width=1mm] (-1,1) -- (1,1);
\draw (0,1) node[anchor=south] {\tt arg$_2$=2};
\draw[line width=1mm] (-1,2) -- (1,2);
\draw (0,2) node[anchor=south] {\tt arg$_1$=1};
\draw[line width=1mm] (-1,3) -- (1,3);
\draw (0,3.1) node[anchor=south] {\tt ret};
\draw[line width=1mm] (-1,4) -- (1,4);
\draw (0,4) node[anchor=south] {\small\tt last sp};
\draw[line width=1mm] (-1,5) -- (1,5);
\draw (0,5) node[anchor=south] {\tt buf};
\draw[line width=1mm] (-1,6) -- (1,6);
\draw (2,5.1) node[anchor=south] {\code{$esp}};
\draw[<-,line width=0.5mm] (1.1,6) -- (2.5,6);
\draw[->,line width=0.5mm] (1,4.5) -- (2.5,4.5);
\draw (2.6,4.1) node[anchor=south west] {\code{??}};
\draw[->,line width=0.5mm] (1,3.5) -- (2.5,3.5);
\draw (2.6,3.1) node[anchor=south west] {\tt jump to \code{\\x080483f4}};
\end{tikzpicture}
\end{center}
\noindent We need to fill this over its limit of
8 characters so that it overwrites the stack pointer
and then overwrites the return address. If, for example,
we want to jump to a specific address in memory, say,
\pcode{\\x080483f4} then we need to fill the
buffer for example as follows
\begin{center}
\code{char buf[8] = "AAAAAAAABBBB\\xf4\\x83\\x04\\x08";}
\end{center}
\noindent The first 8 \pcode{A}s fill the buffer to the rim;
the next four \pcode{B}s overwrite the stack pointer (with
what data we overwrite this part is usually not important);
then comes the address we want to jump to. Notice that we have
to give the address in the reverse order. All addresses on
Intel CPUs need to be given in this way. Since the string is
enclosed in double quotes, the C convention is that the string
internally will automatically be terminated by a zero-byte. If
the programmer uses functions like \pcode{strcpy} for filling
the buffer \pcode{buf}, then we can be sure it will overwrite
the stack in this manner---since it will copy everything up
to the zero-byte.
What the outcome of such an attack is can be illustrated with
the code shown in Figure~\ref{C2}. Under ``normal operation''
this program ask for a login-name and a password (both are
represented as strings). Both of which are stored in buffers
of length 8. The function \pcode{match} tests whether two such
strings are equal. If yes, then the function lets you in
(by printing \pcode{Welcome}). If not, it denies access
(by printing \pcode{Wrong identity}). The vulnerable function
is \code{get_line} in Lines 11 to 19. This function does not
take any precautions about the buffer of 8 characters being
filled beyond this 8-character-limit. The buffer overflow
can be triggered by inputing something, like \pcode{foo}, for
the login name and then the specially crafted string as
password:
\begin{center}
\code{AAAAAAAABBBB\\x2c\\x85\\x04\\x08\\n}
\end{center}
\noindent The address happens to be the one for the function
\pcode{welcome()}. This means even with this input (where the
login name and password clearly do not match) the program will
still print out \pcode{Welcome}. The only information we need
for this attack is to know where the function
\pcode{welcome()} starts in memory. This information can be
easily obtained by starting the program inside the debugger
and disassembling this function.
\begin{lstlisting}[numbers=none,language={[x86masm]Assembler},
morekeywords={movl,movw}]
$ gdb C2
GNU gdb (GDB) 7.2-ubuntu
(gdb) disassemble welcome
\end{lstlisting}
\noindent
The output will be something like this
\begin{lstlisting}[numbers=none,language={[x86masm]Assembler},
morekeywords={movl,movw}]
0x0804852c <+0>: push %ebp
0x0804852d <+1>: mov %esp,%ebp
0x0804852f <+3>: sub $0x4,%esp
0x08048532 <+6>: movl $0x8048690,(%esp)
0x08048539 <+13>: call 0x80483a4 <puts@plt>
0x0804853e <+18>: movl $0x0,(%esp)
0x08048545 <+25>: call 0x80483b4 <exit@plt>
\end{lstlisting}
\noindent indicating that the function \pcode{welcome()}
starts at address \pcode{0x0804852c}.
\begin{figure}[p]
\lstinputlisting[language=C]{../progs/C2.c}
\caption{A suspicious login implementation.\label{C2}}
\end{figure}
This kind of attack was very popular with commercial programs
that needed a key to be unlocked. Historically, hackers first
broke the rather weak encryption of these locking mechanisms.
After the encryption had been made stronger, hackers used
buffer overflow attacks as shown above to jump directly to
the part of the program that was intended to be only available
after the correct key was typed in by the user.
\subsection*{Paylods}
Unfortunately, much more harm can be caused by buffer overflow
attacks. This is achieved by injecting code that will be run
once the return address is appropriately modified. Typically
the code that will be injected is for running a shell. In
order to be send as part of the string that is overflowing the
buffer, we need the code to be encoded as a sequence of
characters
\lstinputlisting[language=C,numbers=none]{../progs/o1.c}
\noindent These characters represent the machine code
for opening a shell. It seems obtaining such a string
requires higher-education in the architecture of the
target system. But it is actually relatively simple: First
there are many ready-made strings available---just a quick
Google query away. Second, tools like the debugger can help
us again. We can just write the code we want in C, for
example this would be the program to start a shell
\lstinputlisting[language=C,numbers=none]{../progs/shell.c}
\noindent Once compiled, we can use the debugger to obtain
the machine code, or even the ready made encoding as character
sequence.
While easy, obtaining this string is not entirely trivial.
Remember the functions in C that copy or fill buffers work
such that they copy everything until the zero byte is reached.
Unfortunately the ``vanilla'' output from the debugger for the
shell-program will contain such zero bytes. So a
post-processing phase is needed to rewrite the machine code
such that it does not contain any zero bytes. This is like
some works of literature that have been rewritten so that the
letter 'i', for example, is avoided. For rewriting the machine
code you might need to use clever tricks like
\begin{lstlisting}[numbers=none,language={[x86masm]Assembler}]
xor %eax, %eax
\end{lstlisting}
\noindent This instruction does not contain any zero byte when
encoded, but produces a zero byte on the stack.
Having removed the zero bytes we can craft the string that
will be send to the target computer. It is typically of the
form
\begin{center}
\begin{tikzpicture}[scale=0.7]
\draw[line width=1mm] (-2, -1) rectangle (2,3);
\draw[line width=1mm] (-2,1.9) -- (2,1.9);
\draw (0,2.5) node {\large\tt shell code};
\draw[line width=1mm,fill=black] (0.3, -1) rectangle (2,-0.7);
\draw[->,line width=0.3mm] (1.05, -1) -- (1.05,-1.7) --
(-3,-1.7) -- (-3, 3.7) -- (-1.9, 3.7) -- (-1.9, 3.1);
\draw (-2, 3) node[anchor=north east] {\LARGE \color{codegreen}{``}};
\draw ( 2,-0.9) node[anchor=west] {\LARGE\color{codegreen}{''}};
\end{tikzpicture}
\end{center}
\noindent This of course requires that the buffer we are
trying to attack can at least contain the shellcode we want to
run. But as you can see this is only 47 bytes, which is a very
low bar to jump over. More formidable is the choice of finding
the right address to jump to. As indicated in the picture we
need to be very precise with the address with which we will
overwrite the buffer. It has to be precisely the first byte of
the shellcode. While this is easy withe the help of a
debugger, we typically cannot run anything on the machine yet
we target. And the address is very specific to the setup of
the target machine. One way of finding out what the right
address is to try out one by one until we get lucky. With
large memories available today, however, the odds are long.
And if we try out too many possible candidates to quickly, we
might be detected by the system administrator of the target
system.
We can improve our odds considerably, by the following clever
trick. Instead of adding the shellcode at the beginning of the
string, we should add it at the end, just before we overflow
the buffer, like
\begin{center}
\begin{tikzpicture}[scale=0.7]
\draw[line width=1mm] (-2, -1) rectangle (2,3);
\draw[line width=1mm] (-2,1.9) -- (2,1.9);
\draw (0,2.5) node {\large\tt shell code};
\draw[line width=1mm,fill=black] (0.3, -1) rectangle (2,-0.7);
\draw (-2, 3) node[anchor=north east] {\LARGE \color{codegreen}{``}};
\draw ( 2,-0.9) node[anchor=west] {\LARGE\color{codegreen}{''}};
\end{tikzpicture}
\end{center}
\bigskip\bigskip
\subsubsection*{A Crash-Course for GDB}
\begin{itemize}
\item \texttt{(l)ist n} -- listing the source file from line
\texttt{n}
\item \texttt{disassemble fun-name}
\item \texttt{run args} -- starts the program, potential
arguments can be given
\item \texttt{(b)reak line-number} -- set break point
\item \texttt{(c)ontinue} -- continue execution until next
breakpoint in a line number
\item \texttt{x/nxw addr} -- print out \texttt{n} words starting
from address \pcode{addr}, the address could be \code{$esp}
for looking at the content of the stack
\item \texttt{x/nxb addr} -- print out \texttt{n} bytes
\end{itemize}
\bigskip\bigskip \noindent If you want to know more about
buffer overflow attacks, the original Phrack article
``Smashing The Stack For Fun And Profit'' by Elias Levy (also
known as Aleph One) is an engaging read:
\begin{center}
\url{http://phrack.org/issues/49/14.html}
\end{center}
\noindent This is an article from 1996 and some parts are
not up-to-date anymore. The article called
``Smashing the Stack in 2010''
\begin{center}
\url{http://www.mgraziano.info/docs/stsi2010.pdf}
\end{center}
\noindent updates, as the name says, most information to 2010.
\end{document}
%%% Local Variables:
%%% mode: latex
%%% TeX-master: t
%%% End: