handouts/ho03.tex
changeset 237 b784175a69dc
parent 236 40efc28963af
child 238 6ba55ba5b588
236:40efc28963af 237:b784175a69dc
     6 \begin{document}
     6 \begin{document}
     7 
     7 
     8 \section*{Handout 3 (Buffer Overflow Attacks)}
     8 \section*{Handout 3 (Buffer Overflow Attacks)}
     9 
     9 
     10 By far the most popular attack methods on computers are buffer
     10 By far the most popular attack methods on computers are buffer
    11 overflow attacks or variations thereof. The popularity is
    11 overflow attacks or variations thereof. The first Internet
    12 unfortunate because we nowadays have technology in place to
    12 worm (Morris) exploited exactly such an attack. The popularity
       
    13 is unfortunate because we nowadays have technology in place to
     13 prevent them effectively. But these kinds of attacks are still
     14 prevent them effectively. But these kinds of attacks are still
    14 very relevant even today since there are many legacy systems
    15 very relevant even today since there are many legacy systems
    15 out there and also many modern embedded systems often do not
    16 out there and also many modern embedded systems often do not
    16 take any precautions to prevent such attacks.
    17 take any precautions to prevent such attacks.
    17 
    18 
   458 \lstinputlisting[language=C]{../progs/C3.c}
   459 \lstinputlisting[language=C]{../progs/C3.c}
   459 \caption{Overwriting a buffer with a string containing a
   460 \caption{Overwriting a buffer with a string containing a
   460 payload.\label{C3}}
   461 payload.\label{C3}}
   461 \end{figure}
   462 \end{figure}
   462 
   463 
       
    464 By the way, you might ask how attackers find
       
   465 out about vulnerable systems? Well, the automated version uses
       
   466 \emph{fuzzers}, which throw randomly generated user input at
       
   467 applications and observe the behaviour. If an application
       
    468 seg-faults (raises a segmentation fault) then this is a good
       
   469 indication that a buffer overflow vulnerability can be
       
   470 exploited.
       
   471 
       
   472 
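\noindent To make the idea of a fuzzer more concrete, here is a minimal sketch in C, assuming a POSIX system. The names \texttt{target}, \texttt{crashes} and \texttt{fuzz} are made up for this illustration and do not refer to any real tool. The target does not actually overflow its buffer: it raises the seg-fault signal itself whenever the input would not fit, which keeps the sketch deterministic. Each random input is run in a child process so that the parent can observe whether the child was killed by a signal.

```c
#include <signal.h>
#include <stdlib.h>
#include <string.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical program under test. A really vulnerable program would
   overflow buf; this stand-in raises SIGSEGV instead, so the sketch
   never corrupts memory. */
void target(const char *input) {
    char buf[16];
    if (strlen(input) >= sizeof buf) {
        raise(SIGSEGV);
        return;
    }
    strcpy(buf, input);
}

/* Runs the target on one input in a child process.
   Returns 1 if the child was killed by a signal, 0 if it exited
   normally, and -1 if fork or waitpid failed. */
int crashes(const char *input) {
    pid_t pid = fork();
    if (pid < 0) return -1;
    if (pid == 0) {              /* child: run the target and leave */
        target(input);
        _exit(0);
    }
    int status = 0;
    if (waitpid(pid, &status, 0) < 0) return -1;
    return WIFSIGNALED(status) ? 1 : 0;
}

/* Throws up to `rounds` random lowercase strings of length < max_len
   at the target. Returns the length of the first crashing input, or
   -1 if no input crashed the target. */
int fuzz(unsigned seed, int rounds, int max_len) {
    char input[256];
    if (max_len < 1) return -1;
    if (max_len > (int)sizeof input) max_len = (int)sizeof input;
    srand(seed);
    for (int i = 0; i < rounds; i++) {
        int len = rand() % max_len;
        for (int j = 0; j < len; j++)
            input[j] = (char)('a' + rand() % 26);
        input[len] = '\0';
        if (crashes(input) == 1) return len;
    }
    return -1;
}
```

\noindent A call such as \texttt{fuzz(1, 200, 64)} would report the length of the first input that killed the child. Real fuzzers are far more refined in how they generate inputs, but the observe-the-crash loop is the same idea.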
   463 \subsubsection*{Format String Attacks}
   473 \subsubsection*{Format String Attacks}
   464 
   474 
   465 A question might arise, where do we get all this information
   475 Another question might arise, where do we get all this
   466 about addresses necessary for mounting a buffer overflow
   476 information about addresses necessary for mounting a buffer
    467 attack without yet having access to the system? The answer is
    477 overflow attack without yet having access to the system? The
    468 \emph{format string attacks}. While technically they are
    478 answer is \emph{format string attacks}. While technically
    469 programming mistakes (and they are pointed out as warnings by
    479 they are programming mistakes (and they are pointed out as
    470 modern compilers), they are easily made and therefore an
    480 warnings by modern compilers), they are easily made and
   471 easy target. Let us look at the simplest version of a 
   481 therefore an easy target. Let us look at the simplest version
   472 vulnerable program.
   482 of a vulnerable program.
   473 
   483 
   474 \lstinputlisting[language=C]{../progs/C4.c}
   484 \lstinputlisting[language=C]{../progs/C4.c}
   475 
   485 
   476 \noindent The intention is to print out the first argument
   486 \noindent The intention is to print out the first argument
   477 given on the command line. The ``secret string'' is never to
   487 given on the command line. The ``secret string'' is never to
   524 avoiding the obvious buffer overflow attack.
   534 avoiding the obvious buffer overflow attack.
   525 
   535 
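\noindent The mistake behind format string attacks can be summarised in a few lines of C. The following is only a sketch and is not taken from the handout's example programs; the function name \texttt{echo\_safely} is made up for this illustration. The vulnerable pattern is shown as a comment only, because executing it with directives like \texttt{\%x} in the input has undefined behaviour.

```c
#include <stdio.h>

/* Vulnerable pattern (shown as a comment only, do not use):
       printf(user);
   The user-supplied string is interpreted as the format, so
   directives such as %x or %s inside it make printf read values
   from the stack that were never passed as arguments. */

/* Safer pattern: the format string is a constant and the user input
   is passed as data. Writes at most out_size - 1 characters plus the
   terminating zero into out and returns the number of characters
   written, or -1 on error. */
int echo_safely(char *out, size_t out_size, const char *user) {
    int n = snprintf(out, out_size, "%s", user);
    if (n < 0) return -1;
    if ((size_t)n >= out_size) return (int)out_size - 1;  /* truncated */
    return n;
}
```

\noindent With the constant format, an input like \texttt{\%x \%x \%s} is copied literally instead of being interpreted, and the explicit size keeps the copy within the bounds of the output buffer.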
   526 \subsubsection*{Caveats and Defences}
   536 \subsubsection*{Caveats and Defences}
   527 
   537 
   528 How can we defend against these attacks? Well, a reflex could 
   538 How can we defend against these attacks? Well, a reflex could 
   529 be to blame programmers. Precautions should be taken that 
   539 be to blame programmers. Precautions should be taken so that 
    530 buffers cannot be overfilled and format strings should not
    540 buffers cannot be overfilled and format strings should not
   531 be forgotten. 
   541 be forgotten. This might actually be slightly simpler nowadays 
   532 
    542 since safe versions of the library functions exist, which
   533 \bigskip\bigskip
   543 always specify the precise number of bytes that should be 
   534 \subsubsection*{A Crash-Course for GDB}
   544 copied. Compilers also nowadays provide warnings when format
   535 
   545 strings are omitted. So proper education of programmers is 
   536 \begin{itemize}
   546 definitely a part of a defence against such attacks. However,
   537 \item \texttt{(l)ist n} -- listing the source file from line 
   547 if we leave it at that, then we have the mess we have today
   538 \texttt{n}
   548 with new attacks discovered almost daily. 
   539 \item \texttt{disassemble fun-name}
   549 
   540 \item \texttt{run args} -- starts the program, potential 
   550 There is actually a quite long record of publications
   541 arguments can be given
   551 proposing defences against buffer overflow attacks. One method
   542 \item \texttt{(b)reak line-number} -- set break point
   552 is to declare the stack data as not executable. In this way it
   543 \item \texttt{(c)ontinue} -- continue execution until next 
   553 is impossible to inject a payload as shown above which is then
   544 breakpoint in a line number
   554 executed once the stack is smashed. But this needs hardware
   545 
   555 support which allows one to declare certain memory regions to
   546 \item \texttt{x/nxw addr} -- print out \texttt{n} words starting 
    556 be not executable. On x86, for example, such a feature (the
    547 from address \pcode{addr}, the address could be \code{$esp} 
    557 NX bit) only arrived with AMD64. Also if you have a JIT
   548 for looking at the content of the stack
   558 (just-in-time) compiler it might be advantageous to have
   549 \item \texttt{x/nxb addr} -- print out \texttt{n} bytes 
   559 the stack containing executable data. So it is definitely a 
   550 \end{itemize}
   560 trade-off.
   551 
   561 
   552  
   562 Anyway attackers have found ways around this defence: they
   553 \bigskip\bigskip \noindent If you want to know more about
   563 developed \emph{return-to-lib-C} attacks. The idea is to not
   554 buffer overflow attacks, the original Phrack article
    564 inject code, but to use the code that is already present on the
   555 ``Smashing The Stack For Fun And Profit'' by Elias Levy (also
   565 target computer. The lib-C library, for example, already
   556 known as Aleph One) is an engaging read:
   566 contains the code for spawning a shell. With
       
   567 \emph{return-to-lib-C} one just has to find out where this
       
   568 code is located. But attackers can make good guesses. In my
       
   569 examples I took a shortcut and always made the stack
       
   570 executable. 
       
   571 
       
   572 Another defence is called \emph{stack canaries}. The advantage 
       
   573 is that they can be automatically inserted into compiled code
       
    574 and do not need any hardware support, though they will make
       
   575 your program run slightly slower. The idea behind \emph{stack
       
   576 canaries} is to push a random number onto the stack just 
       
   577 before local data is stored. For our very first function the
       
    578 stack would, with a \emph{stack canary}, look as follows:
       
   579 
       
   580 \begin{center}
       
   581 \begin{tikzpicture}[scale=0.65]
       
   582   %\draw[step=1cm] (-3,-1) grid (3,8);
       
   583   \draw[gray!20,fill=gray!20] (-1, 0) rectangle (1,-1);
       
   584   \draw[line width=1mm] (-1,-1.2) -- (-1,7.4);
       
   585   \draw[line width=1mm] ( 1,-1.2) -- ( 1,7.4);
       
   586   \draw (0,-1) node[anchor=south] {\tt main};
       
   587   \draw[line width=1mm] (-1,0) -- (1,0);
       
   588   \draw (0,0) node[anchor=south] {\tt arg$_3$=3};
       
   589   \draw[line width=1mm] (-1,1) -- (1,1);
       
   590   \draw (0,1) node[anchor=south] {\tt arg$_2$=2};
       
   591   \draw[line width=1mm] (-1,2) -- (1,2);
       
   592   \draw (0,2) node[anchor=south] {\tt arg$_1$=1};
       
   593   \draw[line width=1mm] (-1,3) -- (1,3);
       
   594   \draw (0,3.1) node[anchor=south] {\tt ret};
       
   595   \draw[line width=1mm] (-1,4) -- (1,4);
       
   596   \draw (0,4) node[anchor=south] {\small\tt last sp};
       
   597   \draw[line width=1mm] (-1,5) -- (1,5);
       
   598   \draw (0,5.1) node[anchor=south] {\tt\small\textcolor{red}{\textbf{random}}};
       
   599   \draw[line width=1mm] (-1,6) -- (1,6);
       
   600   \draw (0,6) node[anchor=south] {\tt buf};
       
   601   \draw[line width=1mm] (-1,7) -- (1,7);
       
   602   \end{tikzpicture}
       
   603 \end{center}
       
   604 
       
   605 \noindent The idea behind this random number is that when the
       
   606 function finishes, it is checked that this random number is
       
   607 still intact on the stack. If not, then a buffer overflow has
       
    608 occurred. Although this is quite effective, it requires 
       
   609 suitable support for generating random numbers. This is always
       
   610 hard to get right and attackers are happy to exploit the 
       
   611 resulting weaknesses.
       
   612 
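\noindent The check a canary-protected function performs can be modelled in plain C. This is only a sketch under simplifying assumptions: the struct \texttt{frame} and the function \texttt{copy\_and\_check} are made up for this illustration, the frame is an ordinary struct rather than the real stack, and the canary is passed in by the caller instead of being generated randomly. A compiler inserts the equivalent check automatically.

```c
#include <stddef.h>
#include <string.h>

/* Simplified model of the stack picture above: the buffer sits next
   to the canary, which in turn guards the return address. All
   members are byte arrays, so there is no padding between them. */
struct frame {
    char buf[8];
    unsigned char canary[4];
    unsigned char ret[4];   /* stands in for the saved return address */
};

/* Copies n bytes into the frame starting at buf without checking the
   size of buf, as an unchecked copy would do. Afterwards it performs
   the check a protected function does before returning.
   Returns 0 if the canary is intact and 1 if it was overwritten. */
int copy_and_check(const unsigned char *input, size_t n,
                   const unsigned char canary[4]) {
    struct frame f;
    memset(&f, 0, sizeof f);
    memcpy(f.canary, canary, sizeof f.canary);

    /* The model clamps the copy at the end of the frame so that the
       sketch itself stays within defined behaviour. */
    if (n > sizeof f) n = sizeof f;
    memcpy((unsigned char *)&f, input, n);

    return memcmp(f.canary, canary, sizeof f.canary) != 0;
}
```

\noindent Writing up to 8 bytes leaves the canary intact, while any longer input changes it before it can reach the return address. An input that happens to contain the right canary bytes at the right position passes the check, which is why the number has to be unpredictable.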
       
   613 Another defence is \emph{address space randomisation}. This
       
   614 defence tries to make it harder for an attacker to guess 
       
   615 addresses where code is stored. It turns out that addresses
       
    616 where code is stored are rather predictable. Randomising the
       
   617 place where programs are stored mitigates this problem 
       
   618 somewhat.
       
   619 
       
   620 As mentioned before, modern operating systems have these
       
   621 defences enabled by default and make buffer overflow attacks
       
   622 harder, but not impossible. Indeed, I as an amateur attacker
       
    623 had to explicitly switch off these defences. I ran my examples
       
   624 under an Ubuntu version ``Maverick Meerkat'' from October 
       
    625 2010 with gcc 4.4.5. I have not tried whether newer versions
       
   626 would work as well. I tested all examples inside a virtual 
       
   627 box\footnote{https://www.virtualbox.org} insulating my main 
       
   628 system from any harm. When compiling the programs I called 
       
   629 the compiler with the following options:
       
   630 
       
   631 \begin{center}
       
   632 \begin{tabular}{l@{\hspace{1mm}}l}
       
   633 \pcode{/usr/bin/gcc} & \pcode{-ggdb -O0}\\
       
   634                      & \pcode{-fno-stack-protector}\\
       
   635                      & \pcode{-mpreferred-stack-boundary=2}\\
       
   636                      & \pcode{-z execstack} 
       
   637 \end{tabular}
       
   638 \end{center}
       
   639 
       
   640 \noindent The first two are innocent as they instruct the
       
   641 compiler to include debugging information and also produce
       
   642 non-optimised code (the latter makes the output of the code a
       
   643 bit more predictable). The third is important as it switches
       
    644 off defences like the stack canaries. The fourth again makes it
       
   645 a bit easier to read the code. The final option makes the
       
    646 stack executable, thus the example in Figure~\ref{C3}
       
   647 works as intended. While this might be considered
       
    648 cheating, since I explicitly switched off all defences, I
       
    649 hope I was able to convey that this is actually not too far
       
   650 from realistic scenarios. I have shown you the classic version
       
   651 of the buffer overflow attacks. Updated variants do exist.
       
   652 Also one might argue buffer-overflow attacks have been
       
   653 solved on computers (desktops or servers) but the computing
       
    654 landscape today is wider than ever. The main problem
       
    655 nowadays are embedded systems against which attackers can 
       
   656 equally cause a lot of harm and which are much less defended
       
   657 against. Anthony Bonkoski makes a similar argument in his 
       
   658 security blog:
       
   659 
       
   660 \begin{center}
       
   661 \url{http://jabsoft.io/2013/09/25/are-buffer-overflows-solved-yet-a-historical-tale/}
       
   662 \end{center}
       
   663 
       
   664 
       
   665 There is one more rather effective defence against buffer 
       
    666 overflow attacks: Why not use a safe language? Java at its 
       
   667 inception was touted as a safe language because it hides
       
   668 all explicit memory management from the user. This definitely
       
   669 incurs a runtime penalty, but for bog-standard user-input
       
    670 processing applications, speed is not that essential 
       
   671 anymore. There are of course also many other programming 
       
   672 languages that are safe, i.e.~immune to buffer overflow
       
   673 attacks.
       
   674 \bigskip
       
   675 
       
   676 \noindent If you want to know more about buffer overflow
       
   677 attacks, the original Phrack article ``Smashing The Stack For
       
   678 Fun And Profit'' by Elias Levy (also known as Aleph One) is an
       
   679 engaging read:
   557 
   680 
   558 \begin{center}
   681 \begin{center}
   559 \url{http://phrack.org/issues/49/14.html}
   682 \url{http://phrack.org/issues/49/14.html}
   560 \end{center} 
   683 \end{center} 
   561 
   684 
   566 \begin{center}
   689 \begin{center}
   567 \url{http://www.mgraziano.info/docs/stsi2010.pdf}
   690 \url{http://www.mgraziano.info/docs/stsi2010.pdf}
   568 \end{center}
   691 \end{center}
   569 
   692 
   570 \noindent updates, as the name says, most information to 2010.
   693 \noindent updates, as the name says, most information to 2010.
   571  
   694 There are also sources for buffer overflow attack in  
   572 \end{document}
   695 
       
   696 
       
   697 \subsubsection*{A Crash-Course for GDB}
       
   698 
       
   699 If you want to try out the examples from KEATS it might be
       
   700 helpful to know about the following commands of the GNU 
       
   701 Debugger:
       
   702 
       
   703 \begin{itemize}
       
   704 \item \texttt{(l)ist n} -- lists the source file from line 
       
   705 \texttt{n}, the number can be omitted 
       
   706 \item \texttt{disassemble fun-name} -- show the assembly code 
       
   707 of a function
       
   708 \item \texttt{run args} -- starts the program, potential 
       
   709 arguments can be given
       
    710 \item \texttt{(b)reak line-number} -- sets a breakpoint
       
    711 \item \texttt{(c)ontinue} -- continues execution until the next 
       
   712 breakpoint
       
   713 \item \texttt{x/nxw addr} -- prints out \texttt{n} words starting 
       
   714 from address \pcode{addr}, the address could be \code{$esp} 
       
   715 for looking at the content of the stack
       
   716 \item \texttt{x/nxb addr} -- prints out \texttt{n} bytes 
       
   717 \end{itemize}
       
   718 
       
   719  
       
   720 \bigskip\bigskip \noindent \end{document}
   573 
   721 
   574 %%% Local Variables: 
   722 %%% Local Variables: 
   575 %%% mode: latex
   723 %%% mode: latex
   576 %%% TeX-master: t
   724 %%% TeX-master: t
   577 %%% End: 
   725 %%% End: