46 table [x=Year,y=Percentage] {bufferoverflows.data}; |
46 table [x=Year,y=Percentage] {bufferoverflows.data}; |
47 \end{axis} |
47 \end{axis} |
48 \end{tikzpicture} |
48 \end{tikzpicture} |
49 \end{center} |
49 \end{center} |
50 |
50 |
51 \noindent |
51 \noindent These statistics indicate that in the last |
52 These statistics seem to indicate that in the last five years the |
52 five years or so the number of buffer overflow attacks is |
53 number of buffer overflow attacks is around 10\% of all attacks |
53 around 10\% of all attacks (whereby the absolute numbers of |
54 (whereby the absolute numbers of attacks seem to grow each year). |
54 attacks grow each year). |
55 |
55 |
56 |
56 |
57 To understand how buffer overflow attacks work, we have to have |
57 To understand how buffer overflow attacks work, we have to have |
58 a look at how computers work ``under the hood'' (on the |
58 a look at how computers work ``under the hood'' (on the |
59 machine level) and also understand some aspects of the C/C++ |
59 machine level) and also understand some aspects of the C/C++ |
171 Therefore first 3 then 2 and finally 1. Then it pushes the |
171 Therefore first 3 then 2 and finally 1. Then it pushes the |
172 return address onto the stack where execution should resume |
172 return address onto the stack where execution should resume |
173 once \code{foo} has finished. The last stack pointer |
173 once \code{foo} has finished. The last stack pointer |
174 (\code{sp}) is needed in order to clean up the stack to the |
174 (\code{sp}) is needed in order to clean up the stack to the |
175 last level---in fact there is no cleaning involved, but just |
175 last level---in fact there is no cleaning involved, but just |
176 the top of the stack will be set back. So the last stack |
176 the top of the stack will be set back to this address. So the |
177 pointer also needs to be stored. The two buffers inside |
177 last stack pointer also needs to be stored. The two buffers |
178 \pcode{foo} are on the stack too, because they are local data |
178 inside \pcode{foo} are on the stack too, because they are |
179 within \code{foo}. Consequently the stack in the middle is a |
179 local data within \code{foo}. Consequently the stack in the |
180 snapshot after Line 3 has been executed. In case you are |
180 middle is a snapshot after Line 3 has been executed. In case |
181 familiar with assembly instructions you can also read off this |
181 you are familiar with assembly instructions you can also read |
182 behaviour from the machine code that the \code{gcc} compiler |
182 off this behaviour from the machine code that the \code{gcc} |
183 generates for the program above:\footnote{You can make |
183 compiler generates for the program above:\footnote{You can |
184 \pcode{gcc} generate assembly instructions if you call it with |
184 make \pcode{gcc} generate assembly instructions if you call it |
185 the \pcode{-S} option, for example \pcode{gcc -S -o out in.c}\;. |
185 with the \pcode{-S} option, for example \pcode{gcc -S -o out |
186 Or you can look at this code by using the debugger. How to do |
186 in.c}\;. Or you can look at this code by using the debugger. |
187 this will be explained later.} |
187 How to do this will be explained later.} |
188 |
188 |
189 \begin{center}\small |
189 \begin{center}\small |
190 \begin{tabular}[t]{@{}c@{\hspace{8mm}}c@{}} |
190 \begin{tabular}[t]{@{}c@{\hspace{8mm}}c@{}} |
191 {\lstinputlisting[language={[x86masm]Assembler}, |
191 {\lstinputlisting[language={[x86masm]Assembler}, |
192 morekeywords={movl},xleftmargin=5mm] |
192 morekeywords={movl},xleftmargin=5mm] |
199 |
199 |
200 \noindent On the left you can see how the function |
200 \noindent On the left you can see how the function |
201 \pcode{main} prepares in Lines 2 to 7 the stack before calling |
201 \pcode{main} prepares in Lines 2 to 7 the stack before calling |
202 the function \pcode{foo}. You can see that the numbers 3, 2, 1 |
202 the function \pcode{foo}. You can see that the numbers 3, 2, 1 |
203 are stored on the stack (the register \code{$esp} refers to |
203 are stored on the stack (the register \code{$esp} refers to |
204 the top of the stack). On the right you can see how the |
204 the top of the stack; \pcode{$0x1}, \pcode{$0x2}, \pcode{$0x3} |
205 function \pcode{foo} stores the two local buffers onto the |
205 are the encodings for \pcode{1} to \pcode{3}). On the right |
206 stack and initialises them with the given data (Lines 2 to 9). |
206 you can see how the function \pcode{foo} stores the two local |
207 Since there is no real computation going on inside |
207 buffers onto the stack and initialises them with the given |
208 \pcode{foo}, the function then just restores the stack to its |
208 data (Lines 2 to 9). Since there is no real computation going |
209 old state and crucially sets the return address where the |
209 on inside \pcode{foo}, the function then just restores the |
210 computation should resume (Line 9 in the code on the left-hand |
210 stack to its old state and crucially sets the return address |
211 side). The instruction \code{ret} then transfers control back |
211 where the computation should resume (Line 9 in the code on the |
212 to the function \pcode{main} to the instruction just after |
212 left-hand side). The instruction \code{ret} then transfers |
213 the call to \pcode{foo}, that is Line 9. |
213 control back to the function \pcode{main} to the |
|
214 instruction just after the call to \pcode{foo}, that is Line |
|
215 9. |
214 |
216 |
215 Another part of the ``conspiracy'' of buffer overflow attacks |
217 Another part of the ``conspiracy'' of buffer overflow attacks |
216 is that library functions in C typically look as follows: |
218 is that library functions in C typically look as follows: |
217 |
219 |
218 \begin{center} |
220 \begin{center} |
295 will copy everything up to the zero-byte. Notice that this |
297 will copy everything up to the zero-byte. Notice that this |
296 overwriting of the buffer only works since the newer item, the |
298 overwriting of the buffer only works since the newer item, the |
297 buffer, is stored on the stack before the older items, like |
299 buffer, is stored on the stack before the older items, like |
298 return address and arguments. If it had been the other way |
300 return address and arguments. If it had been the other way |
299 around, then such an overwriting by overflowing a local buffer |
301 around, then such an overwriting by overflowing a local buffer |
300 would just not work. If only the designers of C had been able |
302 would just not work. If only the designers of C had been able |
301 to foresee what headaches their way of arranging the stack |
303 to foresee what headaches their way of arranging the stack |
302 would cause in a time when computers are accessible from |
304 would cause in a time when computers are accessible from |
303 everywhere. |
305 everywhere. |
304 |
306 |
305 What the outcome of such an attack is can be illustrated with |
307 What the outcome of such an attack is can be illustrated with |
384 |
386 |
385 \lstinputlisting[language=C,numbers=none]{../progs/o1.c} |
387 \lstinputlisting[language=C,numbers=none]{../progs/o1.c} |
386 |
388 |
387 \noindent These characters represent the machine code for |
389 \noindent These characters represent the machine code for |
388 opening a shell. It seems obtaining such a string requires |
390 opening a shell. It seems obtaining such a string requires |
389 higher-education in the architecture of the target system. But |
391 ``higher-education'' in the architecture of the target system. But |
390 it is actually relatively simple: First there are many such |
392 it is actually relatively simple: First there are many such |
391 strings ready-made---just a quick Google query away. Second, |
393 strings ready-made---just a quick Google query away. Second, |
392 tools like the debugger can help us again. We can just write |
394 tools like the debugger can help us again. We can just write |
393 the code we want in C, for example this would be the program |
395 the code we want in C, for example this would be the program |
394 for starting a shell: |
396 for starting a shell: |
397 |
399 |
398 \noindent Once compiled, we can use the debugger to obtain |
400 \noindent Once compiled, we can use the debugger to obtain |
399 the machine code, or even the ready-made encoding as character |
401 the machine code, or even the ready-made encoding as character |
400 sequence. |
402 sequence. |
401 |
403 |
402 While easy, obtaining this string is not entirely trivial. |
404 While easy, obtaining this string is not entirely trivial |
403 Remember the functions in C that copy or fill buffers work |
405 using \pcode{gdb}. Remember the functions in C that copy or |
404 such that they copy everything until the zero byte is reached. |
406 fill buffers work such that they copy everything until the |
405 Unfortunately the ``vanilla'' output from the debugger for the |
407 zero byte is reached. Unfortunately the ``vanilla'' output |
406 shell-program above will contain such zero bytes. So a |
408 from the debugger for the shell-program above will contain |
407 post-processing phase is needed to rewrite the machine code in |
409 such zero bytes. So a post-processing phase is needed to |
408 a way that it does not contain any zero bytes. This is like |
410 rewrite the machine code in a way that it does not contain any |
409 some works of literature that have been written so that the |
411 zero bytes. This is like some works of literature that have |
410 letter e, for example, is avoided. The technical term for such |
412 been written so that the letter e, for example, is avoided. |
411 a literary work is \emph{lipogram}.\footnote{The most |
413 The technical term for such a literary work is |
412 famous example of a lipogram is a 50,000-word novel titled |
414 \emph{lipogram}.\footnote{The most famous example of a |
413 Gadsby, see \url{https://archive.org/details/Gadsby}.} For |
415 lipogram is a 50,000-word novel titled Gadsby, see |
414 rewriting the machine code, you might need to use clever |
416 \url{https://archive.org/details/Gadsby}, which avoids the |
415 tricks like |
417 letter `e' throughout.} For rewriting the |
|
418 machine code, you might need to use clever tricks like |
416 |
419 |
417 \begin{lstlisting}[numbers=none,language={[x86masm]Assembler}] |
420 \begin{lstlisting}[numbers=none,language={[x86masm]Assembler}] |
418 xor %eax, %eax |
421 xor %eax, %eax |
419 \end{lstlisting} |
422 \end{lstlisting} |
420 |
423 |
483 \code{\\0x90}. It is available on every architecture and its |
486 \code{\\0x90}. It is available on every architecture and its |
484 purpose in a CPU is to do nothing apart from waiting a small |
487 purpose in a CPU is to do nothing apart from waiting a small |
485 amount of time. If we now use an address that lets us jump to |
488 amount of time. If we now use an address that lets us jump to |
486 any address in the grey area we are done. The target machine |
489 any address in the grey area we are done. The target machine |
487 will execute these \pcode{NOP} operations until it reaches the |
490 will execute these \pcode{NOP} operations until it reaches the |
488 shellcode. A moment of thought can convince you that this |
491 shellcode. A moment of thought should convince you that this |
489 trick can hugely improve our odds of finding the right |
492 trick can hugely improve our odds of finding the right |
490 address---depending on the size of the buffer, it might only |
493 address---depending on the size of the buffer, it might only |
491 take a few tries to get the shellcode to run. And then we are |
494 take a few tries to get the shellcode to run. And then we are |
492 in. The code for such an attack is shown in Figure~\ref{C3}. |
495 in. The code for such an attack is shown in Figure~\ref{C3}. |
493 It is directly taken from the original paper about ``Smashing |
496 It is directly taken from the original paper about ``Smashing |
556 responses containing the user input. Consider the slight |
559 responses containing the user input. Consider the slight |
557 variant of the program above |
560 variant of the program above |
558 |
561 |
559 \lstinputlisting[language=C]{../progs/C5.c} |
562 \lstinputlisting[language=C]{../progs/C5.c} |
560 |
563 |
561 \noindent Here the programmer actually to take extra care to |
564 \noindent Here the programmer actually tried to take extra |
562 not fall prey to a buffer overflow attack, but in the process |
565 care to not fall prey to a buffer overflow attack, but in the |
563 made the program susceptible to a format string attack. |
566 process made the program susceptible to a format string |
564 Clearly the \pcode{printf} function in Line 7 now contains |
567 attack. Clearly the \pcode{printf} function in Line 7 now |
565 an explicit format string, but because the command-line |
568 contains an explicit format string, but because the command-line |
566 input is copied using the function \pcode{snprintf} the |
569 input is copied using the function \pcode{snprintf} the result |
567 result will be the same---the string can be exploited |
570 will be the same---the string can be exploited by embedding |
568 by embedding format strings into the user input. Here the |
571 format strings into the user input. Here the programmer really |
569 programmer really cannot be blamed (much) because by using |
572 cannot be blamed (much) because by using \pcode{snprintf} he |
570 \pcode{snprintf} he or she tried to make sure only 10 |
573 or she tried to make sure only 10 characters get copied into |
571 characters get copied into the local buffer---in this way |
574 the local buffer---in this way avoiding the obvious buffer |
572 avoiding the obvious buffer overflow attack. |
575 overflow attack. |
573 |
576 |
574 \subsubsection*{Caveats and Defences} |
577 \subsubsection*{Caveats and Defences} |
575 |
578 |
576 How can we defend against these attacks? Well, a reflex could |
579 How can we defend against these attacks? Well, a reflex could |
577 be to blame programmers. Precautions should be taken so that |
580 be to blame programmers. Precautions should be taken by them |
578 buffers cannot be overfilled and format strings should not |
581 so that buffers cannot be overfilled and format strings |
579 be forgotten. This might actually be slightly simpler nowadays |
582 should not be forgotten. This might actually be slightly |
580 since safe versions of the library functions exists, which |
583 simpler nowadays since safe versions of the library functions |
581 always specify the precise number of bytes that should be |
584 exist, which always specify the precise number of bytes that |
582 copied. Compilers also nowadays provide warnings when format |
585 should be copied. Compilers also nowadays provide warnings |
583 strings are omitted. So proper education of programmers is |
586 when format strings are omitted. So proper education of |
584 definitely a part of a defence against such attacks. However, |
587 programmers is definitely a part of a defence against such |
585 if we leave it at that, then we have the mess we have today |
588 attacks. However, if we leave it at that, then we have the |
586 with new attacks discovered almost daily. |
589 mess we have today with new attacks discovered almost daily. |
587 |
590 |
588 There is actually a quite long record of publications |
591 There is actually a quite long record of publications |
589 proposing defences against buffer overflow attacks. One method |
592 proposing defences against buffer overflow attacks. One method |
590 is to declare the stack data as not executable. In this way it |
593 is to declare the stack data as not executable. In this way it |
591 is impossible to inject a payload as shown above which is then |
594 is impossible to inject a payload as shown above which is then |
709 anymore. There are of course also many other programming |
712 anymore. There are of course also many other programming |
710 languages that are safe, i.e.~immune to buffer overflow |
713 languages that are safe, i.e.~immune to buffer overflow |
711 attacks. |
714 attacks. |
712 \bigskip |
715 \bigskip |
713 |
716 |
714 \noindent If you want to know more about buffer overflow |
717 \subsubsection*{Further Reading} |
715 attacks, the original Phrack article ``Smashing The Stack For |
718 |
716 Fun And Profit'' by Elias Levy (also known as Aleph One) is an |
719 If you want to know more about buffer overflow attacks, the |
|
720 original Phrack article ``Smashing The Stack For Fun And |
|
721 Profit'' by Elias Levy (also known as Aleph One) is an |
717 engaging read: |
722 engaging read: |
718 |
723 |
719 \begin{center} |
724 \begin{center} |
720 \url{http://phrack.org/issues/49/14.html} |
725 \url{http://phrack.org/issues/49/14.html} |
721 \end{center} |
726 \end{center} |