sen-material: comparison handouts/ho03.tex

-:57269d9931da
+:8a42736cce27
 enlargelimits=false,
 xtick={1997,1998,2000,...,2014},
 xmin=1996.5,
 xmax=2015,
 ymax=21,
-ytick={0,2,...,20},
+ytick={0,5,...,20},
 scaled ticks=false,
 axis lines=left,
 width=12cm,
 height=5cm,
 ybar,
 table [x=Year,y=Percentage] {bufferoverflows.data};
 \end{axis}
 \end{tikzpicture}
 \end{center}
-\noindent
+\noindent These statistics indicate that in the last
-This statistics seems to indicate that in the last five years the
+five years or so the number of buffer overflow attacks is
-number of buffer overflow attacks is around 10\% of all attacks
+around 10\% of all attacks (whereby the absolute numbers of
-(whereby the absolute numbers of attacks seem to grow each year).
+attacks grow each year).
 To understand how buffer overflow attacks work, we have to have
 a look at how computers work ``under the hood'' (on the
 machine level) and also understand some aspects of the C/C++
 Therefore first 3 then 2 and finally 1. Then it pushes the
 return address onto the stack where execution should resume
 once \code{foo} has finished. The last stack pointer
 (\code{sp}) is needed in order to clean up the stack to the
 last level---in fact there is no cleaning involved, but just
-the top of the stack will be set back. So the last stack
+the top of the stack will be set back to this address. So the
-pointer also needs to be stored. The two buffers inside
+last stack pointer also needs to be stored. The two buffers
-\pcode{foo} are on the stack too, because they are local data
+inside \pcode{foo} are on the stack too, because they are
-within \code{foo}. Consequently the stack in the middle is a
+local data within \code{foo}. Consequently the stack in the
-snapshot after Line 3 has been executed. In case you are
+middle is a snapshot after Line 3 has been executed. In case
-familiar with assembly instructions you can also read off this
+you are familiar with assembly instructions you can also read
-behaviour from the machine code that the \code{gcc} compiler
+off this behaviour from the machine code that the \code{gcc}
-generates for the program above:\footnote{You can make
+compiler generates for the program above:\footnote{You can
-\pcode{gcc} generate assembly instructions if you call it with
+make \pcode{gcc} generate assembly instructions if you call it
-the \pcode{-S} option, for example \pcode{gcc -S out in.c}\;.
+with the \pcode{-S} option, for example \pcode{gcc -S
-Or you can look at this code by using the debugger. How to do
+in.c}\;. Or you can look at this code by using the debugger.
-this will be explained later.}
+How to do this will be explained later.}
 \begin{center}\small
 \begin{tabular}[t]{@{}c@{\hspace{8mm}}c@{}}
 {\lstinputlisting[language={[x86masm]Assembler},
 morekeywords={movl},xleftmargin=5mm]
 \noindent On the left you can see how the function
 \pcode{main} prepares in Lines 2 to 7 the stack before calling
 the function \pcode{foo}. You can see that the numbers 3, 2, 1
 are stored on the stack (the register \code{$esp} refers to
-the top of the stack). On the right you can see how the
+the top of the stack; \pcode{$0x1}, \pcode{$0x2}, \pcode{$0x3}
-function \pcode{foo} stores the two local buffers onto the
+are the encodings for \pcode{1} to \pcode{3}). On the right
-stack and initialises them with the given data (Lines 2 to 9).
+you can see how the function \pcode{foo} stores the two local
-Since there is no real computation going on inside
+buffers onto the stack and initialises them with the given
-\pcode{foo}, the function then just restores the stack to its
+data (Lines 2 to 9). Since there is no real computation going
-old state and crucially sets the return address where the
+on inside \pcode{foo}, the function then just restores the
-computation should resume (Line 9 in the code on the left-hand
+stack to its old state and crucially sets the return address
-side). The instruction \code{ret} then transfers control back
+where the computation should resume (Line 9 in the code on the
-to the function \pcode{main} to the the instruction just after
+left-hand side). The instruction \code{ret} then transfers
-the call to \pcode{foo}, that is Line 9.
+control back to the function \pcode{main}, to the
+instruction just after the call to \pcode{foo}, that is Line
+9.
 Another part of the ``conspiracy'' of buffer overflow attacks
 is that library functions in C look typically as follows:
 \begin{center}
 will copy everything up to the zero-byte. Notice that this
 overwriting of the buffer only works since the newer item, the
 buffer, is stored on the stack before the older items, like
 return address and arguments. If it had been the other way
 around, then such an overwriting by overflowing a local buffer
-would just not work. If the designers of C had just been able
+would just not work. If only the designers of C had been able
 to foresee what headaches their way of arranging the stack
 would cause in a time when computers are accessible from
 everywhere.
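
\noindent The copy-until-the-zero-byte behaviour described above can be made concrete with a minimal sketch. The helper names \pcode{bytes_written_by_strcpy} and \pcode{copy_fits} are hypothetical and only serve as illustration. They compute how many bytes \pcode{strcpy} would write for a given input, without performing the unsafe copy itself.

```c
#include <stddef.h>
#include <string.h>

/* Number of bytes strcpy would write for src: every character up to
   and including the terminating zero byte. (Hypothetical helper.) */
size_t bytes_written_by_strcpy(const char *src) {
    return strlen(src) + 1;
}

/* Returns 1 if strcpy(buf, src) would stay inside a buffer of
   buf_size bytes, and 0 if it would write past the end of it. */
int copy_fits(const char *src, size_t buf_size) {
    return bytes_written_by_strcpy(src) <= buf_size;
}
```

\noindent For a buffer of 8 bytes, an input of 7 characters just fits, while an input of 8 characters already writes one byte beyond the buffer, because the zero byte is copied too.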
 What the outcome of such an attack is can be illustrated with
 \lstinputlisting[language=C,numbers=none]{../progs/o1.c}
 \noindent These characters represent the machine code for
 opening a shell. It seems obtaining such a string requires
-higher-education in the architecture of the target system. But
+``higher-education'' in the architecture of the target system. But
 it is actually relatively simple: First there are many such
 strings ready-made---just a quick Google query away. Second,
 tools like the debugger can help us again. We can just write
 the code we want in C, for example this would be the program
 for starting a shell:
 \noindent Once compiled, we can use the debugger to obtain
 the machine code, or even the ready-made encoding as a character
 sequence.
-While easy, obtaining this string is not entirely trivial.
+While easy, obtaining this string is not entirely trivial
-Remember the functions in C that copy or fill buffers work
+using \pcode{gdb}. Remember the functions in C that copy or
-such that they copy everything until the zero byte is reached.
+fill buffers work such that they copy everything until the
-Unfortunately the ``vanilla'' output from the debugger for the
+zero byte is reached. Unfortunately the ``vanilla'' output
-shell-program above will contain such zero bytes. So a
+from the debugger for the shell-program above will contain
-post-processing phase is needed to rewrite the machine code in
+such zero bytes. So a post-processing phase is needed to
-a way that it does not contain any zero bytes. This is like
+rewrite the machine code in a way that it does not contain any
-some works of literature that have been written so that the
+zero bytes. This is like some works of literature that have
-letter e, for example, is avoided. The technical term for such
+been written so that the letter e, for example, is avoided.
-a literature work is \emph{lipogram}.\footnote{The most
+The technical term for such a literary work is
-famous example of a lipogram is a 50,000 words novel titled
+\emph{lipogram}.\footnote{The most famous example of a
-Gadsby, see \url{https://archive.org/details/Gadsby}.} For
+lipogram is a 50,000-word novel titled \emph{Gadsby}, see
-rewriting the machine code, you might need to use clever
+\url{https://archive.org/details/Gadsby}, which avoids the
-tricks like
+letter `e' throughout.} For rewriting the
+machine code, you might need to use clever tricks like
 \begin{lstlisting}[numbers=none,language={[x86masm]Assembler}]
 xor %eax, %eax
 \end{lstlisting}
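
\noindent Why such a trick helps can be checked by counting zero bytes in the machine code. The sketch below assumes 32-bit x86 and compares two ways of setting \pcode{eax} to zero: loading the constant 0, and xor-ing the register with itself. The array names and the function \pcode{count_zero_bytes} are made up for this illustration.

```c
#include <stddef.h>

/* x86 encoding of  mov $0x0, %eax : opcode B8 followed by a 32-bit
   immediate, which here consists of four zero bytes. */
static const unsigned char mov_zero[] = { 0xB8, 0x00, 0x00, 0x00, 0x00 };

/* x86 encoding of  xor %eax, %eax : opcode 31 followed by the
   ModRM byte C0. */
static const unsigned char xor_self[] = { 0x31, 0xC0 };

/* Counts how many bytes of the given machine code are zero.
   (Hypothetical helper.) */
size_t count_zero_bytes(const unsigned char *code, size_t len) {
    size_t zeros = 0;
    for (size_t i = 0; i < len; i++) {
        if (code[i] == 0x00) {
            zeros++;
        }
    }
    return zeros;
}
```

\noindent The \pcode{mov} variant contains four zero bytes, so a string-copying function would stop right after its first byte. The \pcode{xor} variant has the same effect on the register and contains none.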
 \code{\\0x90}. It is available on every architecture and its
 purpose in a CPU is to do nothing apart from waiting a small
 amount of time. If we now use an address that lets us jump to
 any address in the grey area we are done. The target machine
 will execute these \pcode{NOP} operations until it reaches the
-shellcode. A moment of thought can convince you that this
+shellcode. A moment of thought should convince you that this
 trick can hugely improve our odds of finding the right
 address---depending on the size of the buffer, it might only
 take a few tries to get the shellcode to run. And then we are
 in. The code for such an attack is shown in Figure~\ref{C3}.
 It is directly taken from the original paper about ``Smashing
 responses containing the user input. Consider the slight
 variant of the program above
 \lstinputlisting[language=C]{../progs/C5.c}
-\noindent Here the programmer actually to take extra care to
+\noindent Here the programmer actually tried to take extra
-not fall pray to a buffer overflow attack, but in the process
+care to not fall prey to a buffer overflow attack, but in the
-made the program susceptible to a format string attack.
+process made the program susceptible to a format string
-Clearly the \pcode{printf} function in Line 7 contains now
+attack. Clearly the \pcode{printf} function in Line 7 now
-an explicit format string, but because the commandline
+contains an explicit format string, but because the command-line
-input is copied using the function \pcode{snprintf} the
+input is copied using the function \pcode{snprintf} the result
-result will be the same---the string can be exploited
+will be the same---the string can be exploited by embedding
-by embedding format strings into the user input. Here the
+format strings into the user input. Here the programmer really
-programmer really cannot be blamed (much) because by using
+cannot be blamed (much) because by using \pcode{snprintf} he
-\pcode{snprintf} he or she tried to make sure only 10
+or she tried to make sure only 10 characters get copied into
-characters get copied into the local buffer---in this way
+the local buffer---in this way avoiding the obvious buffer
-avoiding the obvious buffer overflow attack.
+overflow attack.
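
\noindent A possible repair can be sketched as follows, assuming the problem is that the user input is handed to \pcode{snprintf} as its format argument. The function name \pcode{copy_as_data} is hypothetical. The point is that the format string is a constant and the input only ever appears as an argument to it.

```c
#include <stddef.h>
#include <stdio.h>

/* Copies untrusted input into buf and treats it purely as data:
   the format string is the constant "%s", so format directives
   inside input are not interpreted. At most size-1 characters are
   stored and the result is always zero-terminated. Returns the
   length the complete output would have had, as snprintf does.
   (Hypothetical helper.) */
int copy_as_data(char *buf, size_t size, const char *input) {
    return snprintf(buf, size, "%s", input);
}
```

\noindent With this version an input such as \pcode{%x%x%x} ends up in the buffer character by character instead of being expanded, and the size bound still prevents an overflow.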
 \subsubsection*{Caveats and Defences}
 How can we defend against these attacks? Well, a reflex could
-be to blame programmers. Precautions should be taken so that
+be to blame programmers. Precautions should be taken by them
-buffers cannot been overfilled and format strings should not
+so that buffers cannot be overfilled and format strings
-be forgotten. This might actually be slightly simpler nowadays
+should not be forgotten. This might actually be slightly
-since safe versions of the library functions exists, which
+simpler nowadays since safe versions of the library functions
-always specify the precise number of bytes that should be
+exist, which always specify the precise number of bytes that
-copied. Compilers also nowadays provide warnings when format
+should be copied. Compilers also nowadays provide warnings
-strings are omitted. So proper education of programmers is
+when format strings are omitted. So proper education of
-definitely a part of a defence against such attacks. However,
+programmers is definitely a part of a defence against such
-if we leave it at that, then we have the mess we have today
+attacks. However, if we leave it at that, then we have the
-with new attacks discovered almost daily.
+mess we have today with new attacks discovered almost daily.
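
\noindent What ``specifying the precise number of bytes'' can look like is sketched below. The function \pcode{bounded_copy} is a hypothetical helper, not one of the library functions. It is written out by hand here because \pcode{strncpy} does not zero-terminate the destination when the input is too long.

```c
#include <stddef.h>
#include <string.h>

/* Copies src into dst without ever writing more than dst_size
   bytes, and always zero-terminates dst (if dst_size > 0).
   Returns 1 if the whole string fitted and 0 if it had to be
   truncated. (Hypothetical helper.) */
int bounded_copy(char *dst, size_t dst_size, const char *src) {
    if (dst_size == 0) {
        return 0;                 /* no room, not even for the zero byte */
    }
    size_t len = strlen(src);
    size_t n = (len < dst_size - 1) ? len : dst_size - 1;
    memcpy(dst, src, n);          /* copy only what fits */
    dst[n] = '\0';                /* terminate explicitly */
    return len == n;
}
```

\noindent The return value lets the caller notice a truncation and react to it, rather than silently working with a shortened string.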
 There is actually quite a long record of publications
 proposing defences against buffer overflow attacks. One method
 is to declare the stack data as not executable. In this way it
 is impossible to inject a payload as shown above which is then
 anymore. There are of course also many other programming
 languages that are safe, i.e.~immune to buffer overflow
 attacks.
 \bigskip
-\noindent If you want to know more about buffer overflow
+\subsubsection*{Further Reading}
-attacks, the original Phrack article ``Smashing The Stack For
-Fun And Profit'' by Elias Levy (also known as Aleph One) is an
+If you want to know more about buffer overflow attacks, the
+original Phrack article ``Smashing The Stack For Fun And
+Profit'' by Elias Levy (also known as Aleph One) is an
 engaging read:
 \begin{center}
 \url{http://phrack.org/issues/49/14.html}
 \end{center}

changeset 263	8a42736cce27
parent 259	f96d3e48ed3d
child 283	40511897fcc4