sen-material: comparison handouts/ho03.tex

-:ea921d6a1819
+:603cbd28e988
 \noindent The text region contains the program code (usually
 this region is read-only). The heap stores all data the
 programmer explicitly allocates. For us the most interesting
 region is the stack, which contains data mostly associated
 with the control flow of the program. Notice that the stack
-grows from a higher addresses to lower addresses. That means
+grows from higher addresses to lower addresses (i.e.~from the
-that older items on the stack will be stored behind, or after,
+back to the front). That means that older items on the stack
-newer items. Let's look a bit closer what happens with the
+will be stored behind, or after, newer items. Let's look a bit
-stack when a program is running. Consider the following simple
+closer at what happens with the stack when a program is running.
-C program.
+Consider the following simple C program.
 \lstinputlisting[language=C]{../progs/example1.c}
 \noindent In Line 7, the \code{main} function calls the
 function \code{foo} with three arguments. \code{Foo} creates
 behaviour from the machine code that the \code{gcc} compiler
 generates for the program above:\footnote{You can make
 \pcode{gcc} generate assembly instructions if you call it with
 the \pcode{-S} option, for example \pcode{gcc -S -o out.s in.c}\;.
 Or you can look at this code by using the debugger. How to do
-this will be explained later.}.
+this will be explained later.}
 \begin{center}\small
 \begin{tabular}[t]{@{}c@{\hspace{8mm}}c@{}}
 {\lstinputlisting[language={[x86masm]Assembler},
 morekeywords={movl},xleftmargin=5mm]
 computation should resume (Line 9 in the code on the left-hand
 side). The instruction \code{ret} then transfers control back
 to the function \pcode{main}, to the instruction just after
 the call to \pcode{foo}, that is Line 9.
-Another part of the ``conspiracy'' is that library functions
+Another part of the ``conspiracy'' of buffer overflow attacks
-in C look typically as follows:
+is that library functions in C typically look as follows:
 \begin{center}
 \lstinputlisting[language=C,numbers=none]{../progs/app5.c}
 \end{center}
 \noindent This function copies data from a source \pcode{src}
 to a destination \pcode{dst}. The important point is that it
 copies the data until it reaches a zero-byte (\code{"\\0"}).
+This is a convention of the C language which assumes all
+strings are terminated by such a zero-byte.
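\noindent As a minimal sketch of this convention (the function name \pcode{copy_until_zero} is hypothetical and the code is not taken from \pcode{app5.c}), a copy routine of this kind could look as follows. Note that nothing in it checks how much room the destination has.

```c
#include <stddef.h>

/* Hypothetical illustration of a zero-terminated copy; the name and
   the exact shape are assumptions, not the contents of app5.c. */
char *copy_until_zero(char *dst, const char *src)
{
    size_t i = 0;

    /* Copy one byte at a time and stop only after the terminating
       zero-byte itself has been copied. The size of dst is never
       consulted, so the caller must guarantee that it is big enough. */
    while ((dst[i] = src[i]) != '\0') {
        i++;
    }
    return dst;
}
```

\noindent The only thing that ends the loop is the content of \pcode{src}. Whoever controls the source string therefore also controls how many bytes are written to the destination.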
 The central idea of the buffer overflow attack is to overwrite
-the return address on the stack which designates where the
+the return address on the stack. This address decides where
-control flow of the program should resume once the function at
+the control flow of the program should resume once the
-hand has finished its computation. So if we have somewhere in
+function at hand has finished its computation. So if we
-a function a local a buffer, say
+can control this address, then we can modify the control
+flow of a program. To launch an attack we need
+somewhere in a function a local buffer, say
 \begin{center}
 \code{char buf[8];}
 \end{center}
-\noindent
+\noindent which is filled by some user input. The
-then the corresponding stack will look as follows
+corresponding stack of such a function will look as follows
 \begin{center}
 \begin{tikzpicture}[scale=0.65]
 %\draw[step=1cm] (-3,-1) grid (3,8);
 \draw[gray!20,fill=gray!20] (-1, 0) rectangle (1,-1);
 will copy everything up to the zero-byte. Notice that this
 overwriting of the buffer only works since the newer item, the
 buffer, is stored on the stack before the older items, like
 return address and arguments. If it had been the other way
 around, then such an overwriting by overflowing a local buffer
-would just not work.
+would just not work. If only the designers of C had been able
+to foresee what headaches their way of arranging the stack
+would cause in a time when computers are accessible from
+everywhere.
 What the outcome of such an attack is can be illustrated with
 the code shown in Figure~\ref{C2}. Under ``normal operation''
 this program asks for a login-name and a password, both of
 which are stored in \code{char} buffers of length 8. The
 function \pcode{match} tests whether two such buffers contain
-the same. If yes, then the function lets you ``in'' (by
+the same content. If yes, then the function lets you ``in''
-printing \pcode{Welcome}). If not, it denies access (by
+(by printing \pcode{Welcome}). If not, it denies access (by
 printing \pcode{Wrong identity}). The vulnerable function is
 \code{get_line} in Lines 11 to 19. This function does not take
 any precautions about the buffer of 8 characters being filled
-beyond this 8-character-limit. Let us suppose the login name
+beyond its 8-character-limit. Let us suppose the login name
 is \pcode{test}. Then the buffer overflow can be triggered
 with a specially crafted string as password:
 \begin{center}
 \code{AAAAAAAABBBB\\x2c\\x85\\x04\\x08\\n}
 \noindent The address at the end happens to be the one for the
 function \pcode{welcome()}. This means even with this input
 (where the login name and password clearly do not match) the
 program will still print out \pcode{Welcome}. The only
-information we need for this attack is to know where the
+information we need for this attack to work is to know where
-function \pcode{welcome()} starts in memory. This information
+the function \pcode{welcome()} starts in memory. This
-can be easily obtained by starting the program inside the
+information can be easily obtained by starting the program
-debugger and disassembling this function.
+inside the debugger and disassembling this function.
 \begin{lstlisting}[numbers=none,language={[x86masm]Assembler},
 morekeywords={movl,movw}]
 $ gdb C2
 GNU gdb (GDB) 7.2-ubuntu
 starts at address \pcode{0x0804852c} (top address in the
 left column).
 \begin{figure}[p]
 \lstinputlisting[language=C]{../progs/C2.c}
-\caption{A suspicious login implementation.\label{C2}}
+\caption{A vulnerable login implementation.\label{C2}}
 \end{figure}
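\noindent The last four bytes of the crafted password are the address \pcode{0x0804852c} written in reverse order, because x86 stores multi-byte values with the least significant byte first. The following sketch simulates this on an ordinary byte array instead of a real stack. The layout (an 8-byte buffer, 4 bytes that we assume hold the saved frame pointer, then a 4-byte return address) and all names are assumptions made for illustration only.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Assumed layout of the simulated frame: buffer, 4 bytes in between
   (assumed to be the saved frame pointer), then the return address. */
enum { BUF_LEN = 8, GAP_LEN = 4, ADDR_LEN = 4,
       FRAME_LEN = BUF_LEN + GAP_LEN + ADDR_LEN };

/* Store a 32-bit value least significant byte first, as x86 does. */
static void write_le32(unsigned char *p, uint32_t v)
{
    p[0] = (unsigned char)(v & 0xff);
    p[1] = (unsigned char)((v >> 8) & 0xff);
    p[2] = (unsigned char)((v >> 16) & 0xff);
    p[3] = (unsigned char)((v >> 24) & 0xff);
}

/* Read a 32-bit value that was stored least significant byte first. */
static uint32_t read_le32(const unsigned char *p)
{
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

/* Copy len input bytes to the start of a simulated frame and report
   what the return-address slot contains afterwards. The copy is
   clamped to the array, so the simulation itself stays in bounds. */
uint32_t simulated_return_address(const unsigned char *input, size_t len,
                                  uint32_t original)
{
    unsigned char frame[FRAME_LEN] = {0};

    write_le32(frame + BUF_LEN + GAP_LEN, original);
    if (len > FRAME_LEN) {
        len = FRAME_LEN;
    }
    memcpy(frame, input, len);
    return read_le32(frame + BUF_LEN + GAP_LEN);
}
```

\noindent Called with the 16 bytes \pcode{AAAAAAAABBBB\\x2c\\x85\\x04\\x08}, the function reports \pcode{0x0804852c} regardless of the original value, while an input of at most 8 bytes leaves the original value untouched.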
 This kind of attack was very popular with commercial programs
 that needed a key to be unlocked. Historically, hackers first
 broke the rather weak encryption of these locking mechanisms.
 \subsection*{Payloads}
 Unfortunately, much more harm can be caused by buffer overflow
 attacks. This is achieved by injecting code that will be run
 once the return address is appropriately modified. Typically
-the code that will be injected is for running a shell. This
+the code that will be injected starts a shell. This gives the
-gives the attacker the ability to run programs on the target
+attacker the ability to run programs on the target machine and
-machine and have a good look around, provided the attacked
+to have a good look around, provided the attacked process was not
-process was not already running as root.\footnote{In that case
+already running as root.\footnote{In that case the attacker
-the attacker would do already congratulate him or herself to
+would already congratulate him or herself on another
-another computer under full control.} In order to be send as
+computer under full control.} In order to be sent as part of
-part of the string that is overflowing the buffer, we need the
+the string that is overflowing the buffer, we need the code to
-code to be represented as a sequence of characters. For
+be represented as a sequence of characters. For example
-example
 \lstinputlisting[language=C,numbers=none]{../progs/o1.c}
 \noindent These characters represent the machine code for
 opening a shell. It seems obtaining such a string requires
 higher education in the architecture of the target system. But
 it is actually relatively simple: First there are many such
 strings ready-made---just a quick Google query away. Second,
 tools like the debugger can help us again. We can just write
 the code we want in C; for example, this would be the program
-for starting a shell
+for starting a shell:
 \lstinputlisting[language=C,numbers=none]{../progs/shell.c}
 \noindent Once compiled, we can use the debugger to obtain
 the machine code, or even the ready-made encoding as character
 Unfortunately the ``vanilla'' output from the debugger for the
 shell-program above will contain such zero bytes. So a
 post-processing phase is needed to rewrite the machine code in
 a way that it does not contain any zero bytes. This is like
 some works of literature that have been written so that the
-letter 'i', for example, is avoided. For rewriting the machine
+letter e, for example, is avoided. The technical term for such
-code, you might need to use clever tricks like
+a literary work is \emph{lipogram}.\footnote{The most
+famous example of a lipogram is a 50,000-word novel titled
+Gadsby, see \url{https://archive.org/details/Gadsby}.} For
+rewriting the machine code, you might need to use clever
+tricks like
 \begin{lstlisting}[numbers=none,language={[x86masm]Assembler}]
 xor %eax, %eax
 \end{lstlisting}
-\noindent This instruction does not contain any zero byte when
+\noindent This instruction does not contain any zero-byte when
-encoded, but produces a zero byte on the stack when run.
+encoded as a string, but produces a zero-byte on the stack when
+run.
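\noindent To make the difference concrete, here is a small sketch that scans an encoded instruction for zero-bytes. The two byte sequences are the usual 32-bit x86 encodings of \pcode{movl \$0, \%eax} and \pcode{xor \%eax, \%eax}. The helper name \pcode{first_zero_byte} and the array names are hypothetical.

```c
#include <stddef.h>
#include <string.h>

/* movl $0, %eax: opcode b8 followed by a 32-bit immediate of zero */
const unsigned char mov_zero[] = { 0xb8, 0x00, 0x00, 0x00, 0x00 };

/* xor %eax, %eax: same effect on the register, no zero-byte */
const unsigned char xor_self[] = { 0x31, 0xc0 };

/* Return the index of the first zero-byte in code[0..len-1],
   or len if the sequence contains none. */
size_t first_zero_byte(const unsigned char *code, size_t len)
{
    const unsigned char *hit = memchr(code, 0, len);

    return hit != NULL ? (size_t)(hit - code) : len;
}
```

\noindent For \pcode{mov_zero} the function returns index 1, which is where a zero-terminated copy would stop. For \pcode{xor_self} it returns the length 2, meaning the whole instruction would survive the copy.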
-Having removed the zero bytes we can craft the string that
-will be send to the target computer. It is typically of the
+Having removed the zero-bytes we can craft the string that
-form
+will be sent to the target computer. This of course requires
+that the buffer we are trying to attack can at least contain
-\begin{center}
+the shellcode we want to run. But as you can see this is only
-\begin{tikzpicture}[scale=0.7]
+47 bytes, which is a very low bar to jump over. More
+formidable is the challenge of finding the right address to jump
+to. The string is typically of the form
+\begin{center}
+\begin{tikzpicture}[scale=0.6]
 \draw[line width=1mm] (-2, -1) rectangle (2,3);
 \draw[line width=1mm] (-2,1.9) -- (2,1.9);
 \draw (0,2.5) node {\large\tt shell code};
 \draw[line width=1mm,fill=black] (0.3, -1) rectangle (2,-0.7);
 \draw[->,line width=0.3mm] (1.05, -1) -- (1.05,-1.7) --
 \draw (-2, 3) node[anchor=north east] {\LARGE \color{codegreen}{``}};
 \draw ( 2,-0.9) node[anchor=west] {\LARGE\color{codegreen}{''}};
 \end{tikzpicture}
 \end{center}
-\noindent This of course requires that the buffer we are
+\noindent where we need to be very precise with the address
-trying to attack can at least contain the shellcode we want to
+with which we will overwrite the buffer. It has to be
-run. But as you can see this is only 47 bytes, which is a very
+precisely the first byte of the shellcode. While this is easy
-low bar to jump over. More formidable is the choice of finding
+with the help of a debugger (as seen before), we typically
-the right address to jump to. As indicated in the picture we
+cannot run anything, including a debugger, on the machine we
-need to be very precise with the address with which we will
+target yet. And the address is very specific to the setup of
-overwrite the buffer. It has to be precisely the first byte of
+the target machine. One way of finding the right address
-the shellcode. While this is easy with the help of a debugger
+is to try out, one by one, every possible
-(as seen before), we typically cannot run anything on the
+address until we get lucky. With the large memories available
-machine yet we target. And the address is very specific to the
+today, however, the odds are long. And if we try out too many
-setup of the target machine. One way of finding out what the
+possible candidates too quickly, we might be detected by the
-right address is is to try out one by one until we get lucky.
+system administrator of the target system.
-With the large memories available today, however, the odds are
-long. And if we try out too many possible candidates too
-quickly, we might be detected by the system administrator of
-the target system.
 We can improve our odds considerably by following a clever
 trick. Instead of adding the shellcode at the beginning of the
 string, we should add it at the end, just before we overflow
 the buffer, for example
 \begin{center}
-\begin{tikzpicture}[scale=0.7]
+\begin{tikzpicture}[scale=0.6]
+\draw[gray!50,fill=gray!50] (-2,0.3) rectangle (2,3);
 \draw[line width=1mm] (-2, -1) rectangle (2,3);
-\draw[line width=1mm] (-2,1.9) -- (2,1.9);
+\draw[line width=1mm] (-2,0.3) -- (2,0.3);
-\draw (0,2.5) node {\large\tt shell code};
+\draw[line width=1mm] (-2,-0.7) -- (2,-0.7);
+\draw (0,-0.2) node {\large\tt shell code};
 \draw[line width=1mm,fill=black] (0.3, -1) rectangle (2,-0.7);
 \draw (-2, 3) node[anchor=north east] {\LARGE \color{codegreen}{``}};
 \draw ( 2,-0.9) node[anchor=west] {\LARGE\color{codegreen}{''}};
 \end{tikzpicture}
 \end{center}
 \noindent Then we can fill up the gray part of the string with
-a \pcode{NOP} operation. The code for this operation is
+\pcode{NOP} operations. The code for this operation is
 \code{\\x90}. It is available on every architecture and its
-purpose it to to nothing apart from waiting a small amount of
+purpose in a CPU is to do nothing apart from waiting a small
-time. If we now use an address that lets us jump to any
+amount of time. If we now use an address that lets us jump to
-address in the gray area we are done. The target machine will
+any address in the gray area we are done. The target machine
-execute these \pcode{NOP} operations until it reaches the
+will execute these \pcode{NOP} operations until it reaches the
 shellcode. A moment of thought can convince you that this
 trick can hugely improve our odds of finding the right
-address---depending on the size of the buffer, it might
+address---depending on the size of the buffer, it might only
-only take a few tries to get the shellcode to run.
+take a few tries to get the shellcode to run. And then
+we are in. The code for such an attack is shown in
+Figure~\ref{overflow}.
+\begin{figure}[p]
+\lstinputlisting[language=C]{../progs/overflow.c}
+\caption{Overwriting a buffer with a payload.\label{overflow}}
+\end{figure}
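\noindent How much the \pcode{NOP} padding improves the odds can be checked with a toy model that never executes anything: a byte array stands in for the string, and we count how many starting positions end up at the first byte of the payload. This is only a sketch under simplifying assumptions. The names are hypothetical, and the model knows nothing about real instructions apart from treating \pcode{0x90} as ``move on to the next byte''.

```c
#include <stddef.h>

enum { NOP = 0x90 };

/* Return 1 if a run that starts at index start walks over NOP bytes
   only and arrives exactly at payload_start, otherwise 0. */
int reaches_payload(const unsigned char *mem, size_t len,
                    size_t payload_start, size_t start)
{
    size_t pc = start;

    while (pc < len && pc < payload_start && mem[pc] == NOP) {
        pc++;
    }
    return pc == payload_start && pc < len;
}

/* Count the starting positions from which the payload is reached. */
size_t count_working_starts(const unsigned char *mem, size_t len,
                            size_t payload_start)
{
    size_t count = 0;

    for (size_t start = 0; start < len; start++) {
        count += (size_t)reaches_payload(mem, len, payload_start, start);
    }
    return count;
}
```

\noindent With a 16-byte string whose payload sits at the very beginning, exactly one starting position works. If the first 12 bytes are \pcode{NOP}s and the payload starts at index 12, then 13 positions work: the 12 padding bytes plus the first byte of the payload itself.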
 \bigskip\bigskip
 \subsubsection*{A Crash-Course for GDB}
 \begin{itemize}

changeset 230	603cbd28e988
parent 229	ea921d6a1819
child 232	abc45724b267