handouts/ho03.tex
changeset 263 8a42736cce27
parent 259 f96d3e48ed3d
child 283 40511897fcc4
262:57269d9931da 263:8a42736cce27
    30     enlargelimits=false,
    30     enlargelimits=false,
    31     xtick={1997,1998,2000,...,2014},
    31     xtick={1997,1998,2000,...,2014},
    32     xmin=1996.5,
    32     xmin=1996.5,
    33     xmax=2015,
    33     xmax=2015,
    34     ymax=21,
    34     ymax=21,
    35     ytick={0,2,...,20},
    35     ytick={0,5,...,20},
    36     scaled ticks=false,
    36     scaled ticks=false,
    37     axis lines=left,
    37     axis lines=left,
    38     width=12cm,
    38     width=12cm,
    39     height=5cm,
    39     height=5cm,
    40     ybar,
    40     ybar,
    46   table [x=Year,y=Percentage] {bufferoverflows.data};
    46   table [x=Year,y=Percentage] {bufferoverflows.data};
    47 \end{axis}
    47 \end{axis}
    48 \end{tikzpicture}
    48 \end{tikzpicture}
    49 \end{center}
    49 \end{center}
    50 
    50 
    51 \noindent
    51 \noindent This statistic indicates that in the last
    52 This statistic seems to indicate that in the last five years the
    52 five years or so the number of buffer overflow attacks is
    53 number of buffer overflow attacks is around 10\% of all attacks
    53 around 10\% of all attacks (whereby the absolute numbers of
    54 (whereby the absolute numbers of attacks seem to grow each year).
    54 attacks grow each year).
    55 
    55 
    56 
    56 
    57 To understand how buffer overflow attacks work, we have to have
    57 To understand how buffer overflow attacks work, we have to have
    58 a look at how computers work ``under the hood'' (on the
    58 a look at how computers work ``under the hood'' (on the
    59 machine level) and also understand some aspects of the C/C++
    59 machine level) and also understand some aspects of the C/C++
   171 Therefore first 3 then 2 and finally 1. Then it pushes the
   171 Therefore first 3 then 2 and finally 1. Then it pushes the
   172 return address onto the stack where execution should resume
   172 return address onto the stack where execution should resume
   173 once \code{foo} has finished. The last stack pointer
   173 once \code{foo} has finished. The last stack pointer
   174 (\code{sp}) is needed in order to clean up the stack to the
   174 (\code{sp}) is needed in order to clean up the stack to the
   175 last level---in fact there is no cleaning involved, but just
   175 last level---in fact there is no cleaning involved, but just
   176 the top of the stack will be set back. So the last stack
   176 the top of the stack will be set back to this address. So the
   177 pointer also needs to be stored. The two buffers inside
   177 last stack pointer also needs to be stored. The two buffers
   178 \pcode{foo} are on the stack too, because they are local data
   178 inside \pcode{foo} are on the stack too, because they are
   179 within \code{foo}. Consequently the stack in the middle is a
   179 local data within \code{foo}. Consequently the stack in the
   180 snapshot after Line 3 has been executed. In case you are
   180 middle is a snapshot after Line 3 has been executed. In case
   181 familiar with assembly instructions you can also read off this
   181 you are familiar with assembly instructions you can also read
   182 behaviour from the machine code that the \code{gcc} compiler
   182 off this behaviour from the machine code that the \code{gcc}
   183 generates for the program above:\footnote{You can make
   183 compiler generates for the program above:\footnote{You can
   184 \pcode{gcc} generate assembly instructions if you call it with
   184 make \pcode{gcc} generate assembly instructions if you call it
   185 the \pcode{-S} option, for example \pcode{gcc -S -o out in.c}\;.
   185 with the \pcode{-S} option, for example \pcode{gcc -S -o out
   186 Or you can look at this code by using the debugger. How to do
   186 in.c}\;. Or you can look at this code by using the debugger.
   187 this will be explained later.}
   187 How to do this will be explained later.}
   188 
   188 
   189 \begin{center}\small
   189 \begin{center}\small
   190 \begin{tabular}[t]{@{}c@{\hspace{8mm}}c@{}}
   190 \begin{tabular}[t]{@{}c@{\hspace{8mm}}c@{}}
   191 {\lstinputlisting[language={[x86masm]Assembler},
   191 {\lstinputlisting[language={[x86masm]Assembler},
   192   morekeywords={movl},xleftmargin=5mm]
   192   morekeywords={movl},xleftmargin=5mm]
   199 
   199 
   200 \noindent On the left you can see how the function
   200 \noindent On the left you can see how the function
   201 \pcode{main} prepares in Lines 2 to 7 the stack before calling
   201 \pcode{main} prepares in Lines 2 to 7 the stack before calling
   202 the function \pcode{foo}. You can see that the numbers 3, 2, 1
   202 the function \pcode{foo}. You can see that the numbers 3, 2, 1
   203 are stored on the stack (the register \code{$esp} refers to
   203 are stored on the stack (the register \code{$esp} refers to
   204 the top of the stack). On the right you can see how the
   204 the top of the stack; \pcode{$0x1}, \pcode{$0x2}, \pcode{$0x3}
   205 function \pcode{foo} stores the two local buffers onto the
   205 are the encodings for \pcode{1} to \pcode{3}). On the right
   206 stack and initialises them with the given data (Lines 2 to 9).
   206 you can see how the function \pcode{foo} stores the two local
   207 Since there is no real computation going on inside
   207 buffers onto the stack and initialises them with the given
   208 \pcode{foo}, the function then just restores the stack to its
   208 data (Lines 2 to 9). Since there is no real computation going
   209 old state and crucially sets the return address where the
   209 on inside \pcode{foo}, the function then just restores the
   210 computation should resume (Line 9 in the code on the left-hand
   210 stack to its old state and crucially sets the return address
   211 side). The instruction \code{ret} then transfers control back
   211 where the computation should resume (Line 9 in the code on the
   212 to the function \pcode{main} to the instruction just after
   212 left-hand side). The instruction \code{ret} then transfers
   213 the call to \pcode{foo}, that is Line 9.
   213 control back to the function \pcode{main} to the
       
   214 instruction just after the call to \pcode{foo}, that is Line
       
   215 9.
   214  
   216  
   215 Another part of the ``conspiracy'' of buffer overflow attacks
   217 Another part of the ``conspiracy'' of buffer overflow attacks
   216 is that library functions in C look typically as follows:
   218 is that library functions in C look typically as follows:
   217  
   219  
   218 \begin{center}
   220 \begin{center}
   295 will copy everything up to the zero-byte. Notice that this
   297 will copy everything up to the zero-byte. Notice that this
   296 overwriting of the buffer only works since the newer item, the
   298 overwriting of the buffer only works since the newer item, the
   297 buffer, is stored on the stack before the older items, like
   299 buffer, is stored on the stack before the older items, like
   298 return address and arguments. If it had been the other way
   300 return address and arguments. If it had been the other way
   299 around, then such an overwriting by overflowing a local buffer
   301 around, then such an overwriting by overflowing a local buffer
   300 would just not work. If the designers of C had just been able
   302 would just not work. Had the designers of C just been able
   301 to foresee what headaches their way of arranging the stack
   303 to foresee what headaches their way of arranging the stack
   302 would cause in a time when computers are accessible from
   304 would cause in a time when computers are accessible from
   303 everywhere. 
   305 everywhere. 
   304 
   306 
   305 What the outcome of such an attack is can be illustrated with
   307 What the outcome of such an attack is can be illustrated with
   384 
   386 
   385 \lstinputlisting[language=C,numbers=none]{../progs/o1.c}
   387 \lstinputlisting[language=C,numbers=none]{../progs/o1.c}
   386 
   388 
   387 \noindent These characters represent the machine code for
   389 \noindent These characters represent the machine code for
   388 opening a shell. It seems obtaining such a string requires
   390 opening a shell. It seems obtaining such a string requires
   389 higher-education in the architecture of the target system. But
   391 ``higher-education'' in the architecture of the target system. But
   390 it is actually relatively simple: First there are many such
   392 it is actually relatively simple: First there are many such
   391 strings ready-made---just a quick Google query away. Second,
   393 strings ready-made---just a quick Google query away. Second,
   392 tools like the debugger can help us again. We can just write
   394 tools like the debugger can help us again. We can just write
   393 the code we want in C, for example this would be the program
   395 the code we want in C, for example this would be the program
   394 for starting a shell:
   396 for starting a shell:
   397 
   399 
   398 \noindent Once compiled, we can use the debugger to obtain 
   400 \noindent Once compiled, we can use the debugger to obtain 
   399 the machine code, or even the ready-made encoding as character
   401 the machine code, or even the ready-made encoding as character
   400 sequence. 
   402 sequence. 
   401 
   403 
   402 While easy, obtaining this string is not entirely trivial.
   404 While easy, obtaining this string is not entirely trivial
   403 Remember the functions in C that copy or fill buffers work
   405 using \pcode{gdb}. Remember the functions in C that copy or
   404 such that they copy everything until the zero byte is reached.
   406 fill buffers work such that they copy everything until the
   405 Unfortunately the ``vanilla'' output from the debugger for the
   407 zero byte is reached. Unfortunately the ``vanilla'' output
   406 shell-program above will contain such zero bytes. So a
   408 from the debugger for the shell-program above will contain
   407 post-processing phase is needed to rewrite the machine code in
   409 such zero bytes. So a post-processing phase is needed to
   408 a way that it does not contain any zero bytes. This is like
   410 rewrite the machine code in a way that it does not contain any
   409 some works of literature that have been written so that the
   411 zero bytes. This is like some works of literature that have
   410 letter e, for example, is avoided. The technical term for such
   412 been written so that the letter e, for example, is avoided.
   411 a literary work is \emph{lipogram}.\footnote{The most
   413 The technical term for such a literary work is
   412 famous example of a lipogram is a 50,000-word novel titled
   414 \emph{lipogram}.\footnote{The most famous example of a
   413 Gadsby, see \url{https://archive.org/details/Gadsby}.} For
   415 lipogram is a 50,000-word novel titled Gadsby, see
   414 rewriting the machine code, you might need to use clever
   416 \url{https://archive.org/details/Gadsby}, which avoids the 
   415 tricks like
   417 letter `e' throughout.} For rewriting the
       
   418 machine code, you might need to use clever tricks like
   416 
   419 
   417 \begin{lstlisting}[numbers=none,language={[x86masm]Assembler}]
   420 \begin{lstlisting}[numbers=none,language={[x86masm]Assembler}]
   418 xor %eax, %eax
   421 xor %eax, %eax
   419 \end{lstlisting}
   422 \end{lstlisting}
   420 
   423 
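The effect of this trick on the byte encoding can be checked with a small C program. This is only a sketch: the helper \pcode{count_zero_bytes} is a hypothetical name introduced here, and the byte sequences are the usual 32-bit x86 encodings of a \pcode{mov} of the constant 0 into \pcode{eax} and of the \pcode{xor} instruction above.

\begin{lstlisting}[numbers=none,language=C]
#include <stdio.h>
#include <stddef.h>

/* Count the zero bytes in a sequence of machine-code bytes
   (hypothetical helper, for illustration only). */
static size_t count_zero_bytes(const unsigned char *code, size_t len) {
    size_t zeros = 0;
    for (size_t i = 0; i < len; i++) {
        if (code[i] == 0x00) zeros++;
    }
    return zeros;
}

int main(void) {
    /* movl $0x0, %eax  (assumed encoding: b8 00 00 00 00) */
    const unsigned char mov_zero[] = { 0xb8, 0x00, 0x00, 0x00, 0x00 };
    /* xor %eax, %eax   (assumed encoding: 31 c0) */
    const unsigned char xor_self[] = { 0x31, 0xc0 };

    printf("mov: %zu zero bytes\n",
           count_zero_bytes(mov_zero, sizeof mov_zero));
    printf("xor: %zu zero bytes\n",
           count_zero_bytes(xor_self, sizeof xor_self));
    return 0;
}
\end{lstlisting}

\noindent Both instructions set the register to zero, but only the \pcode{xor} variant survives a copy that stops at the first zero byte.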
   483 \code{\\0x90}. It is available on every architecture and its
   486 \code{\\0x90}. It is available on every architecture and its
   484 purpose in a CPU is to do nothing apart from waiting a small
   487 purpose in a CPU is to do nothing apart from waiting a small
   485 amount of time. If we now use an address that lets us jump to
   488 amount of time. If we now use an address that lets us jump to
   486 any address in the grey area we are done. The target machine
   489 any address in the grey area we are done. The target machine
   487 will execute these \pcode{NOP} operations until it reaches the
   490 will execute these \pcode{NOP} operations until it reaches the
   488 shellcode. A moment of thought can convince you that this
   491 shellcode. A moment of thought should convince you that this
   489 trick can hugely improve our odds of finding the right
   492 trick can hugely improve our odds of finding the right
   490 address---depending on the size of the buffer, it might only
   493 address---depending on the size of the buffer, it might only
   491 take a few tries to get the shellcode to run. And then we are
   494 take a few tries to get the shellcode to run. And then we are
   492 in. The code for such an attack is shown in Figure~\ref{C3}.
   495 in. The code for such an attack is shown in Figure~\ref{C3}.
   493 It is directly taken from the original paper about ``Smashing
   496 It is directly taken from the original paper about ``Smashing
   556 responses containing the user input. Consider the slight
   559 responses containing the user input. Consider the slight
   557 variant of the program above
   560 variant of the program above
   558 
   561 
   559 \lstinputlisting[language=C]{../progs/C5.c}
   562 \lstinputlisting[language=C]{../progs/C5.c}
   560 
   563 
   561 \noindent Here the programmer actually to take extra care to
   564 \noindent Here the programmer actually tried to take extra
   562 not fall prey to a buffer overflow attack, but in the process
   565 care to not fall prey to a buffer overflow attack, but in the
   563 made the program susceptible to a format string attack.
   566 process made the program susceptible to a format string
   564 Clearly the \pcode{printf} function in Line 7 contains now
   567 attack. Clearly the \pcode{printf} function in Line 7 contains
   565 an explicit format string, but because the commandline
   568 now an explicit format string, but because the commandline
   566 input is copied using the function \pcode{snprintf} the
   569 input is copied using the function \pcode{snprintf} the result
   567 result will be the same---the string can be exploited 
   570 will be the same---the string can be exploited by embedding
   568 by embedding format strings into the user input. Here the
   571 format strings into the user input. Here the programmer really
   569 programmer really cannot be blamed (much) because by using
   572 cannot be blamed (much) because by using \pcode{snprintf} he
   570 \pcode{snprintf} he or she tried to make sure only 10
   573 or she tried to make sure only 10 characters get copied into
   571 characters get copied into the local buffer---in this way
   574 the local buffer---in this way avoiding the obvious buffer
   572 avoiding the obvious buffer overflow attack.
   575 overflow attack.
   573 
   576 
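A possible repair can be sketched as follows, assuming the weakness is that the user input reaches \pcode{snprintf} in the position of the format argument. The function \pcode{copy_as_data} and the default input are hypothetical and not taken from the program above; the point is only that the format string is a constant and the input is passed as an ordinary argument.

\begin{lstlisting}[numbers=none,language=C]
#include <stdio.h>
#include <string.h>

/* Copy untrusted input into dst (capacity cap) as data,
   never as a format string (hypothetical helper). */
static void copy_as_data(char *dst, size_t cap, const char *input) {
    /* The format is the constant "%s"; input is only an argument. */
    snprintf(dst, cap, "%s", input);
}

int main(int argc, char *argv[]) {
    char buffer[10];
    /* Default input contains format directives on purpose. */
    const char *input = (argc > 1) ? argv[1] : "%x%x%x%x";
    copy_as_data(buffer, sizeof buffer, input);
    printf("%s\n", buffer);
    return 0;
}
\end{lstlisting}

\noindent With this arrangement the directives in the input are copied literally and are not interpreted, while the size argument still limits how much is written into the buffer.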
   574 \subsubsection*{Caveats and Defences}
   577 \subsubsection*{Caveats and Defences}
   575 
   578 
   576 How can we defend against these attacks? Well, a reflex could 
   579 How can we defend against these attacks? Well, a reflex could
   577 be to blame programmers. Precautions should be taken so that 
   580 be to blame programmers. Precautions should be taken by them
   578 buffers cannot be overfilled and format strings should not
   581 so that buffers cannot be overfilled and format strings
   579 be forgotten. This might actually be slightly simpler nowadays 
   582 should not be forgotten. This might actually be slightly
   580 since safe versions of the library functions exists, which
   583 simpler nowadays since safe versions of the library functions
   581 always specify the precise number of bytes that should be 
   584 exist, which always specify the precise number of bytes that
   582 copied. Compilers also nowadays provide warnings when format
   585 should be copied. Compilers also nowadays provide warnings
   583 strings are omitted. So proper education of programmers is 
   586 when format strings are omitted. So proper education of
   584 definitely a part of a defence against such attacks. However,
   587 programmers is definitely a part of a defence against such
   585 if we leave it at that, then we have the mess we have today
   588 attacks. However, if we leave it at that, then we have the
   586 with new attacks discovered almost daily. 
   589 mess we have today with new attacks discovered almost daily. 
   587 
   590 
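As an illustration of such a length-limited library function, the following sketch wraps \pcode{strncpy} so that the destination is always terminated. The wrapper name \pcode{bounded_copy} and the buffer size are assumptions made for this example only.

\begin{lstlisting}[numbers=none,language=C]
#include <stdio.h>
#include <string.h>

/* Copy at most cap-1 bytes of src into dst and always
   terminate dst (hypothetical wrapper around strncpy). */
static void bounded_copy(char *dst, size_t cap, const char *src) {
    if (cap == 0) return;
    strncpy(dst, src, cap - 1);
    dst[cap - 1] = '\0';  /* strncpy alone may not terminate */
}

int main(void) {
    char buffer[8];
    /* The input is longer than the buffer, but it gets truncated. */
    bounded_copy(buffer, sizeof buffer, "AAAAAAAAAAAAAAAAAAAA");
    printf("%s (%zu bytes)\n", buffer, strlen(buffer));
    return 0;
}
\end{lstlisting}

\noindent The explicit terminator matters because \pcode{strncpy} does not add a zero byte when the source is at least as long as the given limit.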
   588 There is actually a quite long record of publications
   591 There is actually a quite long record of publications
   589 proposing defences against buffer overflow attacks. One method
   592 proposing defences against buffer overflow attacks. One method
   590 is to declare the stack data as not executable. In this way it
   593 is to declare the stack data as not executable. In this way it
   591 is impossible to inject a payload as shown above which is then
   594 is impossible to inject a payload as shown above which is then
   709 anymore. There are of course also many other programming 
   712 anymore. There are of course also many other programming 
   710 languages that are safe, i.e.~immune to buffer overflow
   713 languages that are safe, i.e.~immune to buffer overflow
   711 attacks.
   714 attacks.
   712 \bigskip
   715 \bigskip
   713 
   716 
   714 \noindent If you want to know more about buffer overflow
   717 \subsubsection*{Further Reading}
   715 attacks, the original Phrack article ``Smashing The Stack For
   718 
   716 Fun And Profit'' by Elias Levy (also known as Aleph One) is an
   719 If you want to know more about buffer overflow attacks, the
       
   720 original Phrack article ``Smashing The Stack For Fun And
       
   721 Profit'' by Elias Levy (also known as Aleph One) is an
   717 engaging read:
   722 engaging read:
   718 
   723 
   719 \begin{center}
   724 \begin{center}
   720 \url{http://phrack.org/issues/49/14.html}
   725 \url{http://phrack.org/issues/49/14.html}
   721 \end{center} 
   726 \end{center}