\documentclass{article}
\usepackage{../style}
\usepackage{../langs}

\begin{document}

\section*{Handout 3 (Buffer Overflow Attacks)}

By far the most popular method of attacking computers is the
buffer overflow attack, or a variation thereof. The popularity
is unfortunate because we now have technology to prevent such
attacks effectively. But these kinds of attacks are still very
relevant today, since there are many legacy systems out there
and many modern embedded systems do not take any precautions
against them.

To understand how buffer overflow attacks work we have to have
a look at how computers work ``under the hood'' (on the
machine level) and also understand some aspects of the C/C++
programming language. This might not be everyday fare for
computer science students, but who said that criminal hackers
restrict themselves to everyday fare? Not to mention the
free-riding script-kiddies who use this technology without
knowing what the underlying ideas are.

For buffer overflow attacks to work, a number of innocent
design decisions, which are really benign on their own, need
to conspire against you. All these decisions were pretty much
taken at a time when there was no Internet: C was introduced
around 1973, the Internet TCP/IP protocol was standardised in
1982, by which time there were maybe 500 servers connected
worldwide (all users were well-behaved), and Intel's first 8086
CPUs arrived around 1978. So none of the creators can really
be blamed, but as mentioned above we should already be way
beyond the point where buffer overflow attacks are worth a
thought. Unfortunately this is far from the truth. I will let
you think about why.

One such ``benign'' design decision is how the memory is laid
out into different regions for each process.
\begin{center}
  \begin{tikzpicture}[scale=0.7]
  %\draw[step=1cm] (-3,-3) grid (3,3);
  \draw[line width=1mm] (-2, -3) rectangle (2,3);
  \draw[line width=1mm] (-2,1) -- (2,1);
  \draw[line width=1mm] (-2,-1) -- (2,-1);
  \draw (0,2) node {\large\tt text};
  \draw (0,0) node {\large\tt heap};
  \draw (0,-2) node {\large\tt stack};
  \draw (-2.7,3) node[anchor=north east] {\tt\begin{tabular}{@{}l@{}}lower\\ address\end{tabular}};
  \draw (-2.7,-3) node[anchor=south east] {\tt\begin{tabular}{@{}l@{}}higher\\ address\end{tabular}};
  \draw[->, line width=1mm] (-2.5,3) -- (-2.5,-3);
  \draw (2.7,-2) node[anchor=west] {\tt grows};
  \draw (2.7,-3) node[anchor=south west] {\tt\footnotesize older};
  \draw (2.7,-1) node[anchor=north west] {\tt\footnotesize newer};
  \draw[|->, line width=1mm] (2.5,-3) -- (2.5,-1);
  \end{tikzpicture}
\end{center}

\noindent The text region contains the program code (usually
this region is read-only). The heap stores all data the
programmer explicitly allocates. For us the most interesting
region is the stack, which contains data mostly associated
with the ``control flow'' of the program. Notice that the stack
grows from higher addresses to lower addresses. That means
that older items on the stack are stored at higher addresses
than newer items. Let's look a bit closer at what happens with
the stack. Consider the following trivial C program.

\lstinputlisting[language=C]{../progs/example1.c}

\noindent The main function calls \code{foo} with three
arguments. The function \code{foo} contains two (local)
buffers. The interesting question is what the stack will look
like after Line 3 has been executed.
The answer is as follows:

\begin{center}
  \begin{tikzpicture}[scale=0.65]
  \draw[gray!20,fill=gray!20] (-5, 0) rectangle (-3,-1);
  \draw[line width=1mm] (-5,-1.2) -- (-5,0.2);
  \draw[line width=1mm] (-3,-1.2) -- (-3,0.2);
  \draw (-4,-1) node[anchor=south] {\tt main};
  \draw[line width=1mm] (-5,0) -- (-3,0);

  \draw[gray!20,fill=gray!20] (3, 0) rectangle (5,-1);
  \draw[line width=1mm] (3,-1.2) -- (3,0.2);
  \draw[line width=1mm] (5,-1.2) -- (5,0.2);
  \draw (4,-1) node[anchor=south] {\tt main};
  \draw[line width=1mm] (3,0) -- (5,0);

  %\draw[step=1cm] (-3,-1) grid (3,8);
  \draw[gray!20,fill=gray!20] (-1, 0) rectangle (1,-1);
  \draw[line width=1mm] (-1,-1.2) -- (-1,7.4);
  \draw[line width=1mm] ( 1,-1.2) -- ( 1,7.4);
  \draw (0,-1) node[anchor=south] {\tt main};
  \draw[line width=1mm] (-1,0) -- (1,0);
  \draw (0,0) node[anchor=south] {\tt arg$_3$=3};
  \draw[line width=1mm] (-1,1) -- (1,1);
  \draw (0,1) node[anchor=south] {\tt arg$_2$=2};
  \draw[line width=1mm] (-1,2) -- (1,2);
  \draw (0,2) node[anchor=south] {\tt arg$_1$=1};
  \draw[line width=1mm] (-1,3) -- (1,3);
  \draw (0,3.1) node[anchor=south] {\tt ret};
  \draw[line width=1mm] (-1,4) -- (1,4);
  \draw (0,4) node[anchor=south] {\small\tt last sp};
  \draw[line width=1mm] (-1,5) -- (1,5);
  \draw (0,5) node[anchor=south] {\tt buf$_1$};
  \draw[line width=1mm] (-1,6) -- (1,6);
  \draw (0,6) node[anchor=south] {\tt buf$_2$};
  \draw[line width=1mm] (-1,7) -- (1,7);

  \draw[->,line width=0.5mm] (1,4.5) -- (1.8,4.5) -- (1.8, 0) -- (1.1,0);
  \draw[->,line width=0.5mm] (1,3.5) -- (2.5,3.5);
  \draw (2.6,3.1) node[anchor=south west] {\tt back to main()};
  \end{tikzpicture}
\end{center}

\noindent On the left is the stack before \code{foo} is
called; on the right is the stack after \code{foo} finishes.
The function call to \code{foo} in Line 7 pushes the arguments
onto the stack in reverse order, as shown in the middle: first
3, then 2 and finally 1. Then it pushes the return address
onto the stack, that is the address where execution should
resume once \code{foo} has finished.
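
\noindent The push order just described can be mimicked with a
toy model. The following is only a sketch under stated
assumptions: it is not the code from the handout's example
program, and the array \texttt{mem}, the index \texttt{sp}, the
helper \texttt{push} and the value \texttt{0xCAFE} are all made
up for illustration. An array stands in for the stack region,
with higher indices playing the role of higher addresses. A real
compiler and calling convention may place things differently,
for example by passing arguments in registers.

\begin{lstlisting}[language=C,numbers=none]
#include <assert.h>
#include <stdio.h>

#define WORDS 16

/* Toy model only: an array stands in for the stack region.
   Index WORDS-1 plays the role of the highest address. */
static unsigned int mem[WORDS];
static int sp = WORDS;          /* index of the newest item; WORDS = empty */

/* Push one word: the stack grows towards lower indices. */
static void push(unsigned int word) {
    assert(sp > 0);             /* refuse to run past the toy region */
    sp = sp - 1;
    mem[sp] = word;
}

int main(void) {
    unsigned int ret = 0xCAFE;  /* made-up return address */

    push(3);                    /* arguments in reverse order ... */
    push(2);
    push(1);
    push(ret);                  /* ... then the return address */

    /* Print from the newest item (lowest index) to the oldest. */
    for (int i = sp; i < WORDS; i++)
        printf("slot %2d: 0x%x\n", i, mem[i]);
    return 0;
}
\end{lstlisting}

\noindent Running this sketch lists the made-up return address
in slot 12, followed by 1, 2 and 3 in slots 13 to 15. The item
pushed last ends up at the lowest index, just as newer items on
the real stack end up at lower addresses.
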
The last stack pointer (\code{sp}) is
needed in order to clean up the stack to the last level. In
fact there is no cleaning involved: the top of the stack is
simply set back to where it was. The two buffers are also on
the stack, because they are local data within \code{foo}.

Another part of the ``conspiracy'' is that library functions
in C typically look as follows:

\begin{center}
\lstinputlisting[language=C,numbers=none]{../progs/app5.c}
\end{center}

\noindent This function copies data from a source \pcode{src}
to a destination \pcode{dst}. It copies the data until it
reaches a zero-byte (\code{"\\0"}). How much is copied is
therefore determined by the source data alone, not by the size
of the destination.

\bigskip\bigskip
\subsubsection*{A Crash-Course on GDB}

\begin{itemize}
\item \texttt{(l)ist n} -- lists the source code around line \texttt{n}
\item \texttt{disassemble fun-name} -- shows the machine code of
  the function \texttt{fun-name}
\item \texttt{run} -- starts the program
\item \texttt{(b)reak line-number} -- sets a breakpoint at the given line
\item \texttt{(c)ontinue} -- continues execution until the next
  breakpoint
\item \texttt{x/nxw addr} -- prints out \texttt{n} words in
  hexadecimal starting from address \pcode{addr}; the address
  could be \code{$esp} for looking at the content of the stack
\item \texttt{x/nxb addr} -- prints out \texttt{n} bytes in
  hexadecimal starting from address \pcode{addr}
\end{itemize}

\bigskip\bigskip
\noindent If you want to know more about buffer overflow
attacks, the original Phrack article ``Smashing The Stack For
Fun And Profit'' by Elias Levy (also known as Aleph One) is an
engaging read:

\begin{center}
\url{http://phrack.org/issues/49/14.html}
\end{center}

\noindent This is an article from 1996 and some parts are no
longer up to date. The article called ``Smashing the Stack in
2010''

\begin{center}
\url{http://www.mgraziano.info/docs/stsi2010.pdf}
\end{center}

\noindent updates, as the name says, most of the information
to 2010.

\end{document}

%%% Local Variables:
%%% mode: latex
%%% TeX-master: t
%%% End: