62 |
64 |
63 \noindent The text region contains the program code (usually |
65 \noindent The text region contains the program code (usually |
64 this region is read-only). The heap stores all data the |
66 this region is read-only). The heap stores all data the |
65 programmer explicitly allocates. For us the most interesting |
67 programmer explicitly allocates. For us the most interesting |
66 region is the stack, which contains data mostly associated |
68 region is the stack, which contains data mostly associated |
67 with the ``control flow'' of the program. Notice that the stack |
69 with the control flow of the program. Notice that the stack |
68 grows from a higher addresses to lower addresses. That means |
70 grows from a higher addresses to lower addresses. That means |
69 that older items on the stack will be stored behind newer |
71 that older items on the stack will be stored behind, or after, |
70 items. Let's look a bit closer what happens with the stack. |
72 newer items. Let's look a bit closer what happens with the |
71 Consider the the trivial C program. |
73 stack when a program is running. Consider the following simple |
|
74 C program. |
72 |
75 |
73 \lstinputlisting[language=C]{../progs/example1.c} |
76 \lstinputlisting[language=C]{../progs/example1.c} |
74 |
77 |
75 \noindent The main function calls \code{foo} with three |
78 \noindent The \code{main} function calls \code{foo} with three |
76 argument. Foo contains two (local) buffers. The interesting |
79 arguments. \code{Foo} contains two (local) buffers. The |
77 point is what will the stack looks like after Line 3 has been |
80 interesting point for us will be what will the stack loke |
78 executed? The answer is as follows: |
81 like after Line 3 has been executed? The answer is as follows: |
79 |
82 |
80 \begin{center} |
83 \begin{center} |
81 \begin{tikzpicture}[scale=0.65] |
84 \begin{tikzpicture}[scale=0.65] |
82 \draw[gray!20,fill=gray!20] (-5, 0) rectangle (-3,-1); |
85 \draw[gray!20,fill=gray!20] (-5, 0) rectangle (-3,-1); |
83 \draw[line width=1mm] (-5,-1.2) -- (-5,0.2); |
86 \draw[line width=1mm] (-5,-1.2) -- (-5,0.2); |
126 return address to the stack where execution should resume once |
129 return address to the stack where execution should resume once |
127 \code{foo} has finished. The last stack pointer (\code{sp}) is |
130 \code{foo} has finished. The last stack pointer (\code{sp}) is |
128 needed in order to clean up the stack to the last level---in |
131 needed in order to clean up the stack to the last level---in |
129 fact there is no cleaning involved, but just the top of the |
132 fact there is no cleaning involved, but just the top of the |
130 stack will be set back. The two buffers are also on the stack, |
133 stack will be set back. The two buffers are also on the stack, |
131 because they are local data within \code{foo}. |
134 because they are local data within \code{foo}. So in the |
132 |
135 middle is a snapshot of the stack after Line 3 has been |
|
136 executed. In case you are familiar with assembly instructions |
|
137 you can also read off this behaviour from the machine |
|
138 code that the \code{gcc} compiler generates for the program |
|
139 above:\footnote{You can make \pcode{gcc} generate assembly |
|
140 instructions if you call it with the \pcode{-S} option, |
|
141 for example \pcode{gcc -S out in.c}\;. Or you can look |
|
142 at this code by using the debugger. This will be explained |
|
143 later.}. |
|
144 |
|
145 \begin{center}\small |
|
146 \begin{tabular}[t]{@{}c@{\hspace{8mm}}c@{}} |
|
147 {\lstinputlisting[language={[x86masm]Assembler}, |
|
148 morekeywords={movl},xleftmargin=5mm] |
|
149 {../progs/example1a.s}} & |
|
150 {\lstinputlisting[language={[x86masm]Assembler}, |
|
151 morekeywords={movl,movw},xleftmargin=5mm] |
|
152 {../progs/example1b.s}} |
|
153 \end{tabular} |
|
154 \end{center} |
|
155 |
|
156 \noindent On the left you can see how the function |
|
157 \pcode{main} prepares in Lines 2 to 7 the stack, before |
|
158 calling the function \pcode{foo}. You can see that the |
|
159 numbers 3, 2, 1 are stored on the stack (the register |
|
160 \code{$esp} refers to the top of the stack). On the right |
|
161 you can see how the function \pcode{foo} stores the two local |
|
162 buffers onto the stack and initialises them with the given |
|
163 data (Lines 2 to 9). Since there is no real computation |
|
164 going on inside \pcode{foo} the function then just restores |
|
165 the stack to its old state and crucially sets the return |
|
166 address where the computation should resume (Line 9 in the |
|
167 code on the left hand side). The instruction \code{ret} then |
|
168 transfers control back to the function \pcode{main} to the |
|
169 teh instruction just after the call, namely Line 9. |
133 |
170 |
134 Another part of the ``conspiracy'' is that library functions |
171 Another part of the ``conspiracy'' is that library functions |
135 in C look typically as follows: |
172 in C look typically as follows: |
136 |
173 |
137 \begin{center} |
174 \begin{center} |
139 \end{center} |
176 \end{center} |
140 |
177 |
141 \noindent This function copies data from a source \pcode{src} |
178 \noindent This function copies data from a source \pcode{src} |
142 to a destination \pcode{dst}. The important point is that it |
179 to a destination \pcode{dst}. The important point is that it |
143 copies the data until it reaches a zero-byte (\code{"\\0"}). |
180 copies the data until it reaches a zero-byte (\code{"\\0"}). |
|
181 |
|
182 The central idea of the buffer overflow attack is to overwrite |
|
183 the return address on the stack which states where the control |
|
184 flow of the program should resume once the function at hand |
|
185 has finished its computation. So if we have somewhere in a |
|
186 function a local a buffer, say |
|
187 |
|
188 \begin{center} |
|
189 \code{char buf[8];} |
|
190 \end{center} |
|
191 |
|
192 \noindent |
|
193 then the corresponding stack will look as follows |
|
194 |
|
195 \begin{center} |
|
196 \begin{tikzpicture}[scale=0.65] |
|
197 %\draw[step=1cm] (-3,-1) grid (3,8); |
|
198 \draw[gray!20,fill=gray!20] (-1, 0) rectangle (1,-1); |
|
199 \draw[line width=1mm] (-1,-1.2) -- (-1,6.4); |
|
200 \draw[line width=1mm] ( 1,-1.2) -- ( 1,6.4); |
|
201 \draw (0,-1) node[anchor=south] {\tt main}; |
|
202 \draw[line width=1mm] (-1,0) -- (1,0); |
|
203 \draw (0,0) node[anchor=south] {\tt arg$_3$=3}; |
|
204 \draw[line width=1mm] (-1,1) -- (1,1); |
|
205 \draw (0,1) node[anchor=south] {\tt arg$_2$=2}; |
|
206 \draw[line width=1mm] (-1,2) -- (1,2); |
|
207 \draw (0,2) node[anchor=south] {\tt arg$_1$=1}; |
|
208 \draw[line width=1mm] (-1,3) -- (1,3); |
|
209 \draw (0,3.1) node[anchor=south] {\tt ret}; |
|
210 \draw[line width=1mm] (-1,4) -- (1,4); |
|
211 \draw (0,4) node[anchor=south] {\small\tt last sp}; |
|
212 \draw[line width=1mm] (-1,5) -- (1,5); |
|
213 \draw (0,5) node[anchor=south] {\tt buf}; |
|
214 \draw[line width=1mm] (-1,6) -- (1,6); |
|
215 \draw (2,5.1) node[anchor=south] {\code{$esp}}; |
|
216 \draw[<-,line width=0.5mm] (1.1,6) -- (2.5,6); |
|
217 |
|
218 \draw[->,line width=0.5mm] (1,4.5) -- (2.5,4.5); |
|
219 \draw (2.6,4.1) node[anchor=south west] {\code{??}}; |
|
220 |
|
221 \draw[->,line width=0.5mm] (1,3.5) -- (2.5,3.5); |
|
222 \draw (2.6,3.1) node[anchor=south west] {\tt jump to \code{\\x080483f4}}; |
|
223 \end{tikzpicture} |
|
224 \end{center} |
|
225 |
|
226 \noindent We need to fill this over its limit of |
|
227 8 characters so that it overwrites the stack pointer |
|
228 and then overwrites the return address. If, for example, |
|
229 we want to jump to a specific address in memory, say, |
|
230 \pcode{\\x080483f4} then we need to fill the |
|
231 buffer for example as follows |
|
232 |
|
233 \begin{center} |
|
234 \code{char buf[8] = "AAAAAAAABBBB\\xf4\\x83\\x04\\x08";} |
|
235 \end{center} |
|
236 |
|
237 \noindent The first 8 \pcode{A}s fill the buffer to the rim; |
|
238 the next four \pcode{B}s overwrite the stack pointer (with |
|
239 what data we overwrite this part is usually not important); |
|
240 then comes the address we want to jump to. Notice that we have |
|
241 to give the address in the reverse order. All addresses on |
|
242 Intel CPUs need to be given in this way. Since the string is |
|
243 enclosed in double quotes, the C convention is that the string |
|
244 internally will automatically be terminated by a zero-byte. If |
|
245 the programmer uses functions like \pcode{strcpy} for filling |
|
246 the buffer \pcode{buf}, then we can be sure it will overwrite |
|
247 the stack in this manner---since it will copy everything up |
|
248 to the zero-byte. |
|
249 |
|
250 What the outcome of such an attack is can be illustrated with |
|
251 the code shown in Figure~\ref{C2}. Under ``normal operation'' |
|
252 this program ask for a login-name and a password (both are |
|
253 represented as strings). Both of which are stored in buffers |
|
254 of length 8. The function \pcode{match} tests whether two such |
|
255 strings are equal. If yes, then the function lets you in |
|
256 (by printing \pcode{Welcome}). If not, it denies access |
|
257 (by printing \pcode{Wrong identity}). The vulnerable function |
|
258 is \code{get_line} in Lines 11 to 19. This function does not |
|
259 take any precautions about the buffer of 8 characters being |
|
260 filled beyond this 8-character-limit. The buffer overflow |
|
261 can be triggered by inputing something, like \pcode{foo}, for |
|
262 the login name and then the specially crafted string as |
|
263 password: |
|
264 |
|
265 \begin{center} |
|
266 \code{AAAAAAAABBBB\\x2c\\x85\\x04\\x08\\n} |
|
267 \end{center} |
|
268 |
|
269 \noindent The address happens to be the one for the function |
|
270 \pcode{welcome()}. This means even with this input (where the |
|
271 login name and password clearly do not match) the program will |
|
272 still print out \pcode{Welcome}. The only information we need |
|
273 for this attack is to know where the function |
|
274 \pcode{welcome()} starts in memory. This information can be |
|
275 easily obtained by starting the program inside the debugger |
|
276 and disassembling this function. |
|
277 |
|
278 \begin{lstlisting}[numbers=none,language={[x86masm]Assembler}, |
|
279 morekeywords={movl,movw}] |
|
280 $ gdb C2 |
|
281 GNU gdb (GDB) 7.2-ubuntu |
|
282 (gdb) disassemble welcome |
|
283 \end{lstlisting} |
|
284 |
|
285 \noindent |
|
286 The output will be something like this |
|
287 |
|
288 \begin{lstlisting}[numbers=none,language={[x86masm]Assembler}, |
|
289 morekeywords={movl,movw}] |
|
290 0x0804852c <+0>: push %ebp |
|
291 0x0804852d <+1>: mov %esp,%ebp |
|
292 0x0804852f <+3>: sub $0x4,%esp |
|
293 0x08048532 <+6>: movl $0x8048690,(%esp) |
|
294 0x08048539 <+13>: call 0x80483a4 <puts@plt> |
|
295 0x0804853e <+18>: movl $0x0,(%esp) |
|
296 0x08048545 <+25>: call 0x80483b4 <exit@plt> |
|
297 \end{lstlisting} |
|
298 |
|
299 \noindent indicating that the function \pcode{welcome()} |
|
300 starts at address \pcode{0x0804852c}. |
144 |
301 |
145 \begin{figure}[p] |
302 \begin{figure}[p] |
146 \lstinputlisting[language=C]{../progs/C2.c} |
303 \lstinputlisting[language=C]{../progs/C2.c} |
147 \caption{A suspicious login implementation.\label{C2}} |
304 \caption{A suspicious login implementation.\label{C2}} |
148 \end{figure} |
305 \end{figure} |
149 |
306 |
|
307 This kind of attack was very popular with commercial programs |
|
308 that needed a key to be unlocked. Historically, hackers first |
|
309 broke the rather weak encryption of these locking mechanisms. |
|
310 After the encryption had been made stronger, hackers used |
|
311 buffer overflow attacks as shown above to jump directly to |
|
312 the part of the program that was intended to be only available |
|
313 after the correct key was typed in by the user. |
|
314 |
|
315 \subsection*{Paylods} |
|
316 |
|
317 Unfortunately, much more harm can be caused by buffer overflow |
|
318 attacks. This is achieved by injecting code that will be run |
|
319 once the return address is appropriately modified. Typically |
|
320 the code that will be injected is for running a shell. In |
|
321 order to be send as part of the string that is overflowing the |
|
322 buffer, we need the code to be encoded as a sequence of |
|
323 characters |
|
324 |
|
325 \lstinputlisting[language=C,numbers=none]{../progs/o1.c} |
|
326 |
|
327 \noindent These characters represent the machine code |
|
328 for opening a shell. It seems obtaining such a string |
|
329 requires higher-education in the architecture of the |
|
330 target system. But it is actually relatively simple: First |
|
331 there are many ready-made strings available---just a quick |
|
332 Google query away. Second, tools like the debugger can help |
|
333 us again. We can just write the code we want in C, for |
|
334 example this would be the program to start a shell |
|
335 |
|
336 \lstinputlisting[language=C,numbers=none]{../progs/shell.c} |
|
337 |
|
338 \noindent Once compiled, we can use the debugger to obtain |
|
339 the machine code, or even the ready made encoding as character |
|
340 sequence. |
|
341 |
|
342 While easy, obtaining this string is not entirely trivial. |
|
343 Remember the functions in C that copy or fill buffers work |
|
344 such that they copy everything until the zero byte is reached. |
|
345 Unfortunately the ``vanilla'' output from the debugger for the |
|
346 shell-program will contain such zero bytes. So a |
|
347 post-processing phase is needed to rewrite the machine code |
|
348 such that it does not contain any zero bytes. This is like |
|
349 some works of literature that have been rewritten so that the |
|
350 letter 'i', for example, is avoided. For rewriting the machine |
|
351 code you might need to use clever tricks like |
|
352 |
|
353 \begin{lstlisting}[numbers=none,language={[x86masm]Assembler}] |
|
354 xor %eax, %eax |
|
355 \end{lstlisting} |
|
356 |
|
357 \noindent This instruction does not contain any zero byte when |
|
358 encoded, but produces a zero byte on the stack. |
|
359 |
|
360 Having removed the zero bytes we can craft the string that |
|
361 will be send to our target computer. It is typically of the |
|
362 form |
|
363 |
|
364 \begin{center} |
|
365 \begin{tikzpicture}[scale=0.7] |
|
366 \draw[line width=1mm] (-2, -1) rectangle (2,3); |
|
367 \draw[line width=1mm] (-2,1.9) -- (2,1.9); |
|
368 \draw (0,2.5) node {\large\tt shell code}; |
|
369 \draw[line width=1mm,fill=black] (0.3, -1) rectangle (2,-0.7); |
|
370 \draw[->,line width=0.3mm] (1.05, -1) -- (1.05,-1.7) -- |
|
371 (-3,-1.7) -- (-3, 3.7) -- (-1.9, 3.7) -- (-1.9, 3.1); |
|
372 \draw (-2, 3) node[anchor=north east] {\LARGE\tt "}; |
|
373 \draw ( 2,-0.9) node[anchor=west] {\LARGE\tt "}; |
|
374 \end{tikzpicture} |
|
375 \end{center} |
|
376 |
|
377 |
150 \bigskip\bigskip |
378 \bigskip\bigskip |
151 \subsubsection*{A Crash-Course on GDB} |
379 \subsubsection*{A Crash-Course for GDB} |
152 |
380 |
153 \begin{itemize} |
381 \begin{itemize} |
154 \item \texttt{(l)ist n} -- listing the source file from line |
382 \item \texttt{(l)ist n} -- listing the source file from line |
155 \texttt{n} |
383 \texttt{n} |
156 \item \texttt{disassemble fun-name} |
384 \item \texttt{disassemble fun-name} |