sen-material: comparison handouts/ho03.tex

equal deleted inserted replaced

-:88416b7df38c
+:0105257429f3
 \documentclass{article}
 \usepackage{../style}
+\usepackage{../langs}
 \begin{document}
 \section*{Handout 3 (Buffer Overflow Attacks)}
 computer science students, but who said that criminal hackers
 restrict themselves to everyday fare? Not to mention the
 free-riding script-kiddies who use this technology without
 knowing what are the underlying ideas.
+For buffer overflow attacks to work, a number of innocent
+design decisions, which are really benign on their own, need
+to conspire against you. All these decisions were pretty much
+taken in a time when there was no Internet: C was introduced
+around 1973, the Internet TCP/IP protocol was standardised in
+1982 by which time there were maybe 500 servers connected
+worldwide (all users were well-behaved), Intel's first 8086
+CPUs arrived around 1977. So nobody of the creators can
+really be blamed, but as mentioned above we should already
+be way beyond the point that buffer overflow attacks are
+worth a thought. Unfortunately this is far from the truth. I
+let you think why?
+One such ``benign'' design decision is how the memory is laid
+out into different regions for each process.
-\bigskip
+\begin{center}
-For buffer overflow attacks to work a number of innocent
+\begin{tikzpicture}[scale=0.7]
-design decisions, which are benign on their own, need to
+%\draw[step=1cm] (-3,-3) grid (3,3);
-conspire against you. One such design decision is how the
+\draw[line width=1mm] (-2, -3) rectangle (2,3);
-memory is laid out for each process.
+\draw[line width=1mm] (-2,1) -- (2,1);
+\draw[line width=1mm] (-2,-1) -- (2,-1);
+\draw (0,2) node {\large\tt text};
+\draw (0,0) node {\large\tt heap};
+\draw (0,-2) node {\large\tt stack};
+\draw (-2.7,3) node[anchor=north east] {\tt\begin{tabular}{@{}l@{}}lower\\ address\end{tabular}};
+\draw (-2.7,-3) node[anchor=south east] {\tt\begin{tabular}{@{}l@{}}higher\\ address\end{tabular}};
+\draw[->, line width=1mm] (-2.5,3) -- (-2.5,-3);
+\draw (2.7,-2) node[anchor=west] {\tt grows};
+\draw (2.7,-3) node[anchor=south west] {\tt\footnotesize older};
+\draw (2.7,-1) node[anchor=north west] {\tt\footnotesize newer};
+\draw[|->, line width=1mm] (2.5,-3) -- (2.5,-1);
+\end{tikzpicture}
+\end{center}
+\noindent The text region contains the program code (usually
+this region is read-only). The heap stores all data the
+programmer explicitly allocates. For us the most interesting
+region is the stack, which contains data mostly associated
+with the ``control flow'' of the program. Notice that the stack
+grows from a higher addresses to lower addresses. That means
+that older items on the stack will be stored behind newer
+items. Let's look a bit closer what happens with the stack.
+Consider the the trivial C program.
+\lstinputlisting[language=C]{../progs/example1.c}
+\noindent The main function calls \code{foo} with three
+argument. Foo contains two (local) buffers. The interesting
+point is what will the stack looks like after Line 3 has been
+executed? The answer is as follows:
+\begin{center}
+\begin{tikzpicture}[scale=0.65]
+\draw[gray!20,fill=gray!20] (-5, 0) rectangle (-3,-1);
+\draw[line width=1mm] (-5,-1.2) -- (-5,0.2);
+\draw[line width=1mm] (-3,-1.2) -- (-3,0.2);
+\draw (-4,-1) node[anchor=south] {\tt main};
+\draw[line width=1mm] (-5,0) -- (-3,0);
+\draw[gray!20,fill=gray!20] (3, 0) rectangle (5,-1);
+\draw[line width=1mm] (3,-1.2) -- (3,0.2);
+\draw[line width=1mm] (5,-1.2) -- (5,0.2);
+\draw (4,-1) node[anchor=south] {\tt main};
+\draw[line width=1mm] (3,0) -- (5,0);
+%\draw[step=1cm] (-3,-1) grid (3,8);
+\draw[gray!20,fill=gray!20] (-1, 0) rectangle (1,-1);
+\draw[line width=1mm] (-1,-1.2) -- (-1,7.4);
+\draw[line width=1mm] ( 1,-1.2) -- ( 1,7.4);
+\draw (0,-1) node[anchor=south] {\tt main};
+\draw[line width=1mm] (-1,0) -- (1,0);
+\draw (0,0) node[anchor=south] {\tt arg$_3$=3};
+\draw[line width=1mm] (-1,1) -- (1,1);
+\draw (0,1) node[anchor=south] {\tt arg$_2$=2};
+\draw[line width=1mm] (-1,2) -- (1,2);
+\draw (0,2) node[anchor=south] {\tt arg$_1$=1};
+\draw[line width=1mm] (-1,3) -- (1,3);
+\draw (0,3.1) node[anchor=south] {\tt ret};
+\draw[line width=1mm] (-1,4) -- (1,4);
+\draw (0,4) node[anchor=south] {\small\tt last sp};
+\draw[line width=1mm] (-1,5) -- (1,5);
+\draw (0,5) node[anchor=south] {\tt buf$_1$};
+\draw[line width=1mm] (-1,6) -- (1,6);
+\draw (0,6) node[anchor=south] {\tt buf$_2$};
+\draw[line width=1mm] (-1,7) -- (1,7);
+\draw[->,line width=0.5mm] (1,4.5) -- (1.8,4.5) -- (1.8, 0) -- (1.1,0);
+\draw[->,line width=0.5mm] (1,3.5) -- (2.5,3.5);
+\draw (2.6,3.1) node[anchor=south west] {\tt back to main()};
+\end{tikzpicture}
+\end{center}
+\noindent On the left is the stack before \code{foo} is
+called; on the right is the stack after \code{foo} finishes.
+The function call to \code{foo} in Line 7 pushes the arguments
+onto the stack in reverse order---shown in the middle.
+Therefore first 3 then 2 and finally 1. Then it pushes the
+return address to the stack where execution should resume once
+\code{foo} has finished. The last stack pointer (\code{sp}) is
+needed in order to clean up the stack to the last level---in
+fact there is no cleaning involved, but just the top of the
+stack will be set back. The two buffers are also on the stack,
+because they are local data within \code{foo}.
+Another part of the ``conspiracy'' is that library functions
+in C look typically as follows:
+\begin{center}
+\lstinputlisting[language=C,numbers=none]{../progs/app5.c}
+\end{center}
+\noindent This function copies data from a source \pcode{src}
+to a destination \pcode{dst}. It copies the data until it
+reaches a zero-byte (\code{"\\0"}).
+\bigskip\bigskip
+\subsubsection*{A Crash-Course on GDB}
+\begin{itemize}
+\item \texttt{(l)ist n} -- listing the source file from line
+\texttt{n}
+\item \texttt{disassemble fun-name}
+\item \texttt{run} -- starts the program
+\item \texttt{(b)reak line-number} -- set break point
+\item \texttt{(c)ontinue} -- continue execution until next
+breakpoint in a line number
+\item \texttt{x/nxw addr} -- print out \texttt{n} words starting
+from address \pcode{addr}, the address could be \code{$esp}
+for looking at the content of the stack
+\item \texttt{x/nxb addr} -- print out \texttt{n} bytes
+\end{itemize}
 \bigskip\bigskip \noindent If you want to know more about
 buffer overflow attacks, the original Phrack article
 ``Smashing The Stack For Fun And Profit'' by Elias Levy (also
 known as Aleph One) is an engaging read:
 \begin{center}
 \url{http://phrack.org/issues/49/14.html}
 \end{center}
+\noindent This is an article from 1996 and some parts are
+not up-to-date anymore. The article called
+``Smashing the Stack in 2010''
+\begin{center}
+\url{http://www.mgraziano.info/docs/stsi2010.pdf}
+\end{center}
+\noindent updates, as the name says, most information to 2010.
 \end{document}
 %%% Local Variables:
 %%% mode: latex

changeset 206	0105257429f3
parent 204	8fe0dc898c73
child 209	fd43a9cd9c07