handouts/ho03.tex
changeset 237 b784175a69dc
parent 236 40efc28963af
child 238 6ba55ba5b588
--- a/handouts/ho03.tex	Fri Oct 10 13:09:06 2014 +0100
+++ b/handouts/ho03.tex	Fri Oct 10 14:22:41 2014 +0100
@@ -8,8 +8,9 @@
 \section*{Handout 3 (Buffer Overflow Attacks)}
 
 By far the most popular attack method on computers are buffer
-overflow attacks or variations thereof. The popularity is
-unfortunate because we nowadays have technology in place to
+overflow attacks or variations thereof. The first Internet
+worm (Morris) exploited exactly such an attack. The popularity
+is unfortunate because we nowadays have technology in place to
 prevent them effectively. But these kind of attacks are still
 very relevant even today since there are many legacy systems
 out there and also many modern embedded systems often do not
@@ -460,16 +461,25 @@
 payload.\label{C3}}
 \end{figure}
 
+You might also ask how attackers find out which systems are
+vulnerable in the first place. The automated approach uses
+\emph{fuzzers}, which throw randomly generated input at
+applications and observe their behaviour. If an application
+seg-faults (crashes with a segmentation fault), then this is a
+good indication that there is a buffer overflow vulnerability
+which can be exploited.
+
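+The generate-run-observe loop of a fuzzer can be sketched in a
+few lines of C. This is only a minimal sketch under assumptions
+that are not part of the handout: the names \pcode{target} and
+\pcode{fuzz} are made up, the target is a stand-in that calls
+\pcode{abort()} where a real vulnerable program would smash its
+stack, and a POSIX system is assumed for \pcode{fork} and
+\pcode{waitpid}. Real fuzzers are considerably more clever
+about how they generate inputs.
+

```c
#include <stdlib.h>
#include <string.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

#define BUF_SIZE  16   /* size of the buffer inside the target (assumption) */
#define MAX_INPUT 64   /* longest input the fuzzer will generate */

/* Hypothetical target standing in for a vulnerable program. Where a
   real program would overflow buf, this one calls abort() so that
   the crash is reproducible. */
static void target(const char *input, size_t len)
{
    char buf[BUF_SIZE];
    if (len >= BUF_SIZE)
        abort();                 /* simulated crash */
    memcpy(buf, input, len);
    buf[len] = '\0';
}

/* Throws random inputs at the target, each in its own child process.
   Returns the length of the first input that killed the target with
   a signal, or -1 if no attempt did (or fork failed). */
int fuzz(unsigned int seed, int attempts)
{
    char input[MAX_INPUT];
    srand(seed);
    for (int i = 0; i < attempts; i++) {
        size_t len = (size_t)(rand() % MAX_INPUT);
        for (size_t j = 0; j < len; j++)
            input[j] = (char)(rand() % 256);

        pid_t pid = fork();
        if (pid < 0)
            return -1;
        if (pid == 0) {          /* child: run the target and exit */
            target(input, len);
            _exit(0);
        }
        int status;              /* parent: observe how the child ended */
        if (waitpid(pid, &status, 0) == pid && WIFSIGNALED(status))
            return (int)len;
    }
    return -1;
}
```

+
+\noindent Running every attempt in a child process is a design
+choice of this sketch: a crash then kills only the child, and
+the parent can tell from the exit status that the input is worth
+a closer look.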
+
 \subsubsection*{Format String Attacks}
 
-A question might arise, where do we get all this information
-about addresses necessary for mounting a buffer overflow
-attack without having yet access to the system? The answer are
-\emph{format string attacks}. While technically they are
-programming mistakes (and they are pointed out as warning by
-modern compilers), they can be easily made and therefore an
-easy target. Let us look at the simplest version of a 
-vulnerable program.
+Another question might arise: where do we get all the
+information about addresses that is necessary for mounting a
+buffer overflow attack without yet having access to the
+system? The answer is \emph{format string attacks}. While
+technically they are programming mistakes (and modern
+compilers flag them with warnings), they are easily made and
+are therefore an easy target. Let us look at the simplest
+version of a vulnerable program.
 
 \lstinputlisting[language=C]{../progs/C4.c}
 
@@ -526,34 +536,147 @@
 \subsubsection*{Caveats and Defences}
 
 How can we defend against these attacks? Well, a reflex could 
-be to blame programmers. Precautions should be taken that 
+be to blame programmers. Precautions should be taken so that 
 buffers cannot been overfilled and format strings should not
-be forgotten. 
+be forgotten. This might actually be slightly simpler nowadays
+since safe versions of the library functions exist, which
+require the caller to specify the maximum number of bytes to
+be copied. Compilers nowadays also give warnings when format
+strings are omitted. So proper education of programmers is
+definitely part of a defence against such attacks. However,
+if we leave it at that, then we end up with the mess we have
+today, with new attacks discovered almost daily.
+
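+As an illustration of such a bounded library function, the
+following sketch copies a string with \pcode{snprintf}, which
+takes the size of the destination buffer and an explicit format
+string. The helper name \pcode{bounded_copy} is made up for this
+example and is not one of the programs of this handout.
+

```c
#include <stdio.h>
#include <string.h>

/* Copies src into dst, which has room for dst_size bytes including
   the terminating NUL. Returns 1 if the whole string fitted and 0
   if it had to be truncated. */
int bounded_copy(char *dst, size_t dst_size, const char *src)
{
    if (dst_size == 0)
        return 0;
    /* snprintf writes at most dst_size bytes and always terminates
       the result; the fixed "%s" format means src is never
       interpreted as a format string */
    int needed = snprintf(dst, dst_size, "%s", src);
    return needed >= 0 && (size_t)needed < dst_size;
}
```

+
+\noindent Input that is too long is cut off instead of
+overwriting whatever lies behind the buffer. The caller still
+has to check the return value, since silently truncated data
+can cause problems of its own.
+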
+There is actually quite a long record of publications
+proposing defences against buffer overflow attacks. One method
+is to declare the stack data as not executable. In this way it
+is impossible to inject a payload as shown above which is then
+executed once the stack is smashed. But this needs hardware
+support which allows one to declare certain memory regions to
+be not executable. Such a feature was not introduced before
+the Intel 386, for example. Also if you have a JIT
+(just-in-time) compiler it might be advantageous to have
+the stack containing executable data. So it is definitely a 
+trade-off.
+
+Anyway, attackers have found ways around this defence: they
+developed \emph{return-to-lib-C} attacks. The idea is not to
+inject code, but to use code that is already present on the
+target computer. The lib-C library, for example, already
+contains the code for spawning a shell. With
+\emph{return-to-lib-C} one just has to find out where this
+code is located. But attackers can make good guesses. In my
+examples I took a shortcut and always made the stack
+executable. 
 
-\bigskip\bigskip
-\subsubsection*{A Crash-Course for GDB}
+Another defence is called \emph{stack canaries}. The advantage 
+is that they can be automatically inserted into compiled code
+and do not need any hardware support, though they will make
+your program run slightly slower. The idea behind \emph{stack
+canaries} is to push a random number onto the stack just
+before local data is stored. For our very first function the
+stack with a \emph{stack canary} would look as follows:
+
+\begin{center}
+\begin{tikzpicture}[scale=0.65]
+  %\draw[step=1cm] (-3,-1) grid (3,8);
+  \draw[gray!20,fill=gray!20] (-1, 0) rectangle (1,-1);
+  \draw[line width=1mm] (-1,-1.2) -- (-1,7.4);
+  \draw[line width=1mm] ( 1,-1.2) -- ( 1,7.4);
+  \draw (0,-1) node[anchor=south] {\tt main};
+  \draw[line width=1mm] (-1,0) -- (1,0);
+  \draw (0,0) node[anchor=south] {\tt arg$_3$=3};
+  \draw[line width=1mm] (-1,1) -- (1,1);
+  \draw (0,1) node[anchor=south] {\tt arg$_2$=2};
+  \draw[line width=1mm] (-1,2) -- (1,2);
+  \draw (0,2) node[anchor=south] {\tt arg$_1$=1};
+  \draw[line width=1mm] (-1,3) -- (1,3);
+  \draw (0,3.1) node[anchor=south] {\tt ret};
+  \draw[line width=1mm] (-1,4) -- (1,4);
+  \draw (0,4) node[anchor=south] {\small\tt last sp};
+  \draw[line width=1mm] (-1,5) -- (1,5);
+  \draw (0,5.1) node[anchor=south] {\tt\small\textcolor{red}{\textbf{random}}};
+  \draw[line width=1mm] (-1,6) -- (1,6);
+  \draw (0,6) node[anchor=south] {\tt buf};
+  \draw[line width=1mm] (-1,7) -- (1,7);
+  \end{tikzpicture}
+\end{center}
 
-\begin{itemize}
-\item \texttt{(l)ist n} -- listing the source file from line 
-\texttt{n}
-\item \texttt{disassemble fun-name}
-\item \texttt{run args} -- starts the program, potential 
-arguments can be given
-\item \texttt{(b)reak line-number} -- set break point
-\item \texttt{(c)ontinue} -- continue execution until next 
-breakpoint in a line number
+\noindent The idea behind this random number is that when the
+function finishes, it is checked that this random number is
+still intact on the stack. If not, then a buffer overflow has
+occurred. Although this is quite effective, it requires
+suitable support for generating random numbers. This is always
+hard to get right and attackers are happy to exploit the 
+resulting weaknesses.
+
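+The check can be imitated by hand in C. The following is only a
+sketch of the idea and not what a compiler actually emits: real
+canaries are inserted automatically (for example by gcc's
+\pcode{-fstack-protector}) and sit on the stack in front of the
+saved frame pointer and return address. Here a struct stands in
+for the stack frame, and the names \pcode{frame} and
+\pcode{copy_and_check} are made up.
+

```c
#include <stddef.h>
#include <string.h>

/* Stand-in for a stack frame: a local buffer with a guard value
   placed directly behind it. */
struct frame {
    char buf[8];
    unsigned int canary;
};

/* Copies len bytes of input into the frame without respecting the
   size of buf, then checks the guard. Returns 1 if the canary is
   intact and 0 if it has been overwritten. */
int copy_and_check(const char *input, size_t len, unsigned int secret)
{
    struct frame f;
    f.canary = secret;            /* "push" the canary */

    if (len > sizeof f)           /* keep the sketch itself well defined */
        len = sizeof f;
    memcpy(&f, input, len);       /* deliberately ignores sizeof f.buf */

    return f.canary == secret;    /* the check on function exit */
}
```

+
+\noindent Assuming a 4-byte \pcode{unsigned int} stored directly
+behind the buffer, an input of up to 8 bytes leaves the canary
+alone, while a 12-byte input of \pcode{A}s overwrites it and the
+check fails. The sketch also shows why the value must be
+unpredictable: an attacker who knows the secret can simply write
+it back as part of the payload.
+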
+Another defence is \emph{address space randomisation}. This
+defence tries to make it harder for an attacker to guess 
+addresses where code is stored. It turns out that the addresses
+where code is stored are rather predictable. Randomising the
+place where programs are stored mitigates this problem 
+somewhat.
+
+As mentioned before, modern operating systems have these
+defences enabled by default and make buffer overflow attacks
+harder, but not impossible. Indeed, I as an amateur attacker
+had to explicitly switch off these defences. I ran my examples
+under the Ubuntu version ``Maverick Meerkat'' from October
+2010 with gcc 4.4.5. I have not tried whether newer versions
+would work as well. I tested all examples inside a VirtualBox
+virtual machine\footnote{\url{https://www.virtualbox.org}},
+insulating my main system from any harm. When compiling the
+programs I called the compiler with the following options:
+
+\begin{center}
+\begin{tabular}{l@{\hspace{1mm}}l}
+\pcode{/usr/bin/gcc} & \pcode{-ggdb -O0}\\
+                     & \pcode{-fno-stack-protector}\\
+                     & \pcode{-mpreferred-stack-boundary=2}\\
+                     & \pcode{-z execstack} 
+\end{tabular}
+\end{center}
 
-\item \texttt{x/nxw addr} -- print out \texttt{n} words starting 
-from address \pcode{addr}, the address could be \code{$esp} 
-for looking at the content of the stack
-\item \texttt{x/nxb addr} -- print out \texttt{n} bytes 
-\end{itemize}
+\noindent The first two are innocent as they instruct the
+compiler to include debugging information and also produce
+non-optimised code (the latter makes the output of the code a
+bit more predictable). The third is important as it switches
+off defences like the stack canaries. The fourth again makes it
+a bit easier to read the code. The final option makes the
+stack executable, so that the example in Figure~\ref{C3}
+works as intended. While this might be considered
+cheating, since I explicitly switched off all defences, I
+hope I was able to convey that this is actually not too far
+from realistic scenarios. I have shown you the classic version
+of the buffer overflow attacks. Updated variants do exist.
+Also, one might argue that buffer overflow attacks have been
+solved on computers (desktops or servers), but today's
+computing landscape is wider than ever. The main problem
+nowadays are embedded systems, against which attackers can
+equally cause a lot of harm and which are much less well
+defended. Anthony Bonkoski makes a similar argument in his
+security blog:
 
- 
-\bigskip\bigskip \noindent If you want to know more about
-buffer overflow attacks, the original Phrack article
-``Smashing The Stack For Fun And Profit'' by Elias Levy (also
-known as Aleph One) is an engaging read:
+\begin{center}
+\url{http://jabsoft.io/2013/09/25/are-buffer-overflows-solved-yet-a-historical-tale/}
+\end{center}
+
+
+There is one more rather effective defence against buffer 
+overflow attacks: why not use a safe language? Java at its
+inception was touted as a safe language because it hides
+all explicit memory management from the user. This definitely
+incurs a runtime penalty, but for bog-standard user-input
+processing applications, speed is no longer of the essence.
+There are of course also many other programming
+languages that are safe, i.e.~immune to buffer overflow
+attacks.
+\bigskip
+
+\noindent If you want to know more about buffer overflow
+attacks, the original Phrack article ``Smashing The Stack For
+Fun And Profit'' by Elias Levy (also known as Aleph One) is an
+engaging read:
 
 \begin{center}
 \url{http://phrack.org/issues/49/14.html}
@@ -568,8 +691,33 @@
 \end{center}
 
 \noindent updates, as the name says, most information to 2010.
+There are also sources for buffer overflow attack in  
+
+
+\subsubsection*{A Crash-Course for GDB}
+
+If you want to try out the examples from KEATS it might be
+helpful to know about the following commands of the GNU 
+Debugger:
+
+\begin{itemize}
+\item \texttt{(l)ist n} -- lists the source file from line 
+\texttt{n}, the number can be omitted 
+\item \texttt{disassemble fun-name} -- shows the assembly code 
+of a function
+\item \texttt{run args} -- starts the program, potential 
+arguments can be given
+\item \texttt{(b)reak line-number} -- sets a breakpoint
+\item \texttt{(c)ontinue} -- continues execution until the next 
+breakpoint
+\item \texttt{x/nxw addr} -- prints out \texttt{n} words starting 
+from address \pcode{addr}, the address could be \code{$esp} 
+for looking at the content of the stack
+\item \texttt{x/nxb addr} -- prints out \texttt{n} bytes 
+starting from address \pcode{addr}
+\end{itemize}
+
  
-\end{document}
+\bigskip\bigskip \noindent \end{document}
 
 %%% Local Variables: 
 %%% mode: latex