180 Lets look at another example which will help with |
180 Lets look at another example which will help with |
181 understanding how passwords should be verified and stored. |
181 understanding how passwords should be verified and stored. |
182 Imagine you need to develop a web-application that has the |
182 Imagine you need to develop a web-application that has the |
183 feature of recording how many times a customer visits a page. |
183 feature of recording how many times a customer visits a page. |
184 For example in order to give a discount whenever the customer |
184 For example in order to give a discount whenever the customer |
185 visited a webpage some $x$ number of times (say $x$ equal |
185 has visited a webpage some $x$ number of times (say $x$ equal |
186 $5$). There is one more constraint: we want to store the |
186 $5$). There is one more constraint: we want to store the |
187 information about the number of visits as a cookie on the |
187 information about the number of visits as a cookie on the |
188 browser. I think, for a number of years the webpage of the New |
188 browser. I think, for a number of years the webpage of the New |
189 York Times operated in this way: it allowed you to read ten |
189 York Times operated in this way: it allowed you to read ten |
190 articles per month for free; if you wanted to read more, you |
190 articles per month for free; if you wanted to read more, you |
273 other hand, we use many ``private'' keys for the clients, then |
274 other hand, we use many ``private'' keys for the clients, then |
274 we have to solve the problem of having to securely store this |
275 we have to solve the problem of having to securely store this |
275 key on our server side (obviously we cannot store the key with |
276 key on our server side (obviously we cannot store the key with |
276 the client because then the client again has all data to |
277 the client because then the client again has all data to |
277 tamper with the counter; and obviously we also cannot encrypt |
278 tamper with the counter; and obviously we also cannot encrypt |
278 the key, lest we can solve a chicken-and-egg problem). So |
279 the key, lest we can solve an impossible chicken-and-egg |
279 encryption seems to not solve the problem we face with the |
280 problem). So encryption seems to not solve the problem we face |
280 integrity of our counter. |
281 with the integrity of our counter. |
281 |
282 |
282 Fortunately, \emph{hash functions} seem to be more suitable |
283 Fortunately, \emph{hash functions} seem to be more suitable |
283 for our purpose. Like encryption, hash functions scramble data |
284 for our purpose. Like encryption, hash functions scramble data |
284 in such a way that it is easy to calculate the output of a |
285 in such a way that it is easy to calculate the output of a |
285 hash function from the input. But it is hard (i.e.~practically |
286 hash function from the input. But it is hard (i.e.~practically |
286 impossible) to calculate the input from knowing the output. |
287 impossible) to calculate the input from knowing the output. |
287 Therefore hash functions are often called \emph{one-way |
288 Therefore hash functions are often called \emph{one-way |
288 functions}. There are several such hashing function. For |
289 functions}\ldots you cannot go back from the output to the |
289 example SHA-1 would hash the string \pcode{"hello world"} to |
290 input (without some tricks, see below). There are several such |
290 produce the hash-value |
291 hashing function. For example SHA-1 would hash the string |
|
292 \pcode{"hello world"} to produce the hash-value |
291 |
293 |
292 \begin{center} |
294 \begin{center} |
293 \pcode{2aae6c35c94fcfb415dbe95f408b9ce91ee846ed} |
295 \pcode{2aae6c35c94fcfb415dbe95f408b9ce91ee846ed} |
294 \end{center} |
296 \end{center} |
295 |
297 |
306 will be from just looking at input that is ``close by''. |
308 will be from just looking at input that is ``close by''. |
307 |
309 |
308 We can use hashes in our web-application and store in the |
310 We can use hashes in our web-application and store in the |
309 cookie the value of the counter in plain text but together |
311 cookie the value of the counter in plain text but together |
310 with its hash. We need to store both pieces of data in such a |
312 with its hash. We need to store both pieces of data in such a |
311 way that we can extract both components (below I will just |
313 way that we can extract them again later on (in the code below |
312 separate them using a \pcode{"-"}). If we now read back the |
314 I will just separate them using a \pcode{"-"}). If we now read |
313 cookie when the client visits our webpage, we can extract the |
315 back the cookie when the client visits our webpage, we can |
314 counter, hash it again and compare the result to the stored |
316 extract the counter, hash it again and compare the result to |
315 hash value inside the cookie. If these hashes disagree, then |
317 the stored hash value inside the cookie. If these hashes |
316 we can deduce that the cookie has been tampered with. |
318 disagree, then we can deduce that the cookie has been tampered |
317 Unfortunately, if they agree, we can still not be entirely |
319 with. Unfortunately, if they agree, we can still not be |
318 sure that not a clever hacker has tampered with the cookie. |
320 entirely sure that not a clever hacker has tampered with the |
319 The reason is that the hacker can see the clear text part of |
321 cookie. The reason is that the hacker can see the clear text |
320 the cookie, say \pcode{3}, and also its hash. It does not take |
322 part of the cookie, say \pcode{3}, and also its hash. It does |
321 much trial and error to find out that we used the SHA-1 |
323 not take much trial and error to find out that we used the |
322 hashing functions and then the hacker can graft a cookie |
324 SHA-1 hashing function and then the hacker can graft a cookie |
323 accordingly. This is eased by the fact that for SHA-1 many |
325 accordingly. This is eased by the fact that for SHA-1 many |
324 strings and corresponding hash-values are precalculated. Type, |
326 strings and corresponding hash-values are precalculated. Type, |
325 for example, into Google the hash value for \pcode{"hello |
327 for example, into Google the hash value for \pcode{"hello |
326 world"} and you will actually pretty quickly find that it was |
328 world"} and you will actually pretty quickly find that it was |
327 generated by input string \pcode{"hello wolrd"}. This defeats |
329 generated by input string \pcode{"hello world"}. This defeats |
328 the purpose of a hashing functions and thus would not help us |
330 the purpose of a hashing function and thus would not help us |
329 with our web-applications and later with how to store |
331 with our web-applications and later also not with how to store |
330 passwords properly. |
332 passwords properly. |
331 |
333 |
332 |
334 |
333 There is one ingredient missing, which happens to be called |
335 There is one ingredient missing, which happens to be called |
334 \emph{salts}. Salts are random keys, which are added to the |
336 \emph{salts}. Salts are random keys, which are added to the |
357 \end{center} |
359 \end{center} |
358 |
360 |
359 \noindent These hashes allow us to read and set the value of |
361 \noindent These hashes allow us to read and set the value of |
360 the counter, and also give us confidence that the counter has |
362 the counter, and also give us confidence that the counter has |
361 not been tampered with. This of course depends on being able |
363 not been tampered with. This of course depends on being able |
362 to keep the salt secret. Once this is out, we better ignore |
364 to keep the salt secret. Once the salt is public, we better |
363 all cookies and start setting them again with a new salt. |
365 ignore all cookies and start setting them again with a new |
|
366 salt. |
364 |
367 |
365 There is an interesting and very subtle point to note with |
368 There is an interesting and very subtle point to note with |
366 respect to the New York Times' way of checking the number |
369 respect to the New York Times' way of checking the number |
367 visits. Essentially they have their `resource' unlocked at the |
370 visits. Essentially they have their `resource' unlocked at the |
368 beginning and lock it only when the data in the cookie states |
371 beginning and lock it only when the data in the cookie states |
369 that the allowed free number of visits are up. As said before, |
372 that the allowed free number of visits are up. As said before, |
370 this can be easily circumvented by just deleting the cookie or |
373 this can be easily circumvented by just deleting the cookie or |
371 by switching the browser. This would mean the New York Times |
374 by switching the browser. This would mean the New York Times |
372 will lose revenue whenever this kind of tampering occurs. In |
375 will lose revenue whenever this kind of tampering occurs. The |
373 contrast, our web-application has the resource (discount) |
376 quick fix to require that a cookie must always be present does |
374 locked at the beginning and only unlocks it if the cookie data |
377 not work, because then this newspaper will cut off any new |
375 says so. If the cookie is deleted, well then the resource just |
378 readers, or anyone who gets a new computer. In contrast, our |
376 does not get unlocked. No mayor harm will result to us. You |
379 web-application has the resource (discount) locked at the |
377 can see: the same security mechanism behaves rather |
380 beginning and only unlocks it if the cookie data says so. If |
378 differently depending on whether the ``resource'' needs to be |
381 the cookie is deleted, well then the resource just does not |
379 locked or unlocked. Apart from think about the difference very |
382 get unlocked. No mayor harm will result to us. You can see: |
380 carefully, I do not know of any ``theory'' that could help |
383 the same security mechanism behaves rather differently |
381 with solving such security intricacies in any automatic way. |
384 depending on whether the ``resource'' needs to be locked or |
382 |
385 unlocked. Apart from thinking about the difference very |
383 \subsection*{How to Store Passwords} |
386 carefully, I do not know of any good ``theory'' that could |
|
387 help with solving such security intricacies in any other way. |
|
388 |
|
389 \subsection*{How to Store Passwords Properly?} |
384 |
390 |
385 While admittedly quite silly, the simple web-application in |
391 While admittedly quite silly, the simple web-application in |
386 the previous section should help with the more important |
392 the previous section should help with the more important |
387 question of how passwords should be verified and stored. It is |
393 question of how passwords should be verified and stored. It is |
388 unbelievable that nowadays systems still do this with |
394 unbelievable that nowadays systems still do this with |
389 passwords in plain text. The idea behind such plain-text |
395 passwords in plain text. The idea behind such plain-text |
390 passwords is of course that if the user typed in \emph{foobar} |
396 passwords is of course that if the user typed in |
391 as password, we need to verify whether it matches with the |
397 \pcode{foobar} as password, we need to verify whether it |
392 password that is already stored for this user in the system. |
398 matches with the password that is already stored for this user |
393 But doing this verification in plain text is really a bad |
399 in the system. But doing this verification in plain text is |
394 idea. Unfortunately, evidence suggests, however, it is still a |
400 really a bad idea. Unfortunately, evidence suggests, however, |
395 widespread practice. I leave you to think about why verifying |
401 it is still a widespread practice. I leave you to think about |
396 passwords in plain text is a bad idea. |
402 why verifying passwords in plain text is a bad idea. |
397 |
403 |
398 Using hash functions we can do better. It is not really |
404 Using hash functions, like in our web-application, we can do |
399 necessary to store passwords in plain text in order to verify |
405 better. They allow us to not having to store passwords in |
400 whether a password matches. We can just hash the password in |
406 plain text for verification whether a password matches or not. |
401 order to be stored. And whenever the user types in a new |
407 We can just hash the password and store the hash-value. And |
402 password, well then we hash it again and check whether the |
408 whenever the user types in a new password, well then we hash |
403 hash-values agree. Lets analyse what happens when a hacker |
409 it again and check whether the hash-values agree. Just like |
404 gets hold of our password database. In case the passwords are |
410 in the web-application before. |
405 hashed, the hacker has a list of user names and associated |
411 |
406 hash-values, like |
412 Lets analyse what happens when a hacker gets hold of such a |
|
413 hashed password database. The hacker has then a list of user |
|
414 names and associated hash-values, like |
407 |
415 |
408 \begin{center} |
416 \begin{center} |
409 \pcode{urbanc:2aae6c35c94fcfb415dbe95f408b9ce91ee846ed} |
417 \pcode{urbanc:2aae6c35c94fcfb415dbe95f408b9ce91ee846ed} |
410 \end{center} |
418 \end{center} |
411 |
419 |
412 \noindent For a beginner-level hacker this information |
420 \noindent For a beginner-level hacker this information is of |
413 is of no use. It would not work to type in the hash value |
421 no use. It would not work to type in the hash value instead of |
414 instead of the password, because it will go through the |
422 the password, because it will go through the hashing function |
415 hashing function again and then the resulting two |
423 again and then the resulting two hash-values will not match. |
416 hash-values will not match. One attack a hacker can try is |
424 One attack a hacker can try, however, is called a \emph{brute |
417 called a \emph{brute force attack}. Essentially this means |
425 force attack}. Essentially this means trying out exhaustively |
418 trying out exhaustively all strings |
426 all strings |
419 |
427 |
420 \begin{center} |
428 \begin{center} |
421 \pcode{a}, |
429 \pcode{a}, |
422 \pcode{aa}, |
430 \pcode{aa}, |
423 \pcode{...}, |
431 \pcode{...}, |
425 \pcode{...}, |
433 \pcode{...}, |
426 \pcode{zzz}, |
434 \pcode{zzz}, |
427 \pcode{...} |
435 \pcode{...} |
428 \end{center} |
436 \end{center} |
429 |
437 |
430 \noindent hash them and check whether they match with the |
438 \noindent and so on, hash them and check whether they match |
431 hash-values in the database. Such brute force attacks are |
439 with the hash-values in the database. Such brute force attacks |
432 surprisingly effective. With modern technology (usually GPU |
440 are surprisingly effective. With modern technology (usually |
433 graphic cards), passwords of moderate length |
441 GPU graphic cards), passwords of moderate length only needs |
434 |
442 seconds or hours to be cracked. Well the only defence we have |
435 %The corresponding attack is called \emph{dictionary |
443 is to make passwords longer and force users to use the whole |
436 %attack}\ldots hashes are not reversed by brute force |
444 spectrum of letters and keys for passwords in order to make |
437 %calculations, that is trying out all possible combinations. |
445 the search space to big for an effective brute force attack. |
438 |
446 |
439 |
447 Unfortunately, clever hackers have another ace up their |
440 %We have to make sure the salt does not get known. |
448 sleeves. These are called \emph{dictionary attacks}. The idea |
441 |
449 behind dictionary attack is the observation that only few |
442 |
450 people are competent enough to use sufficiently strong |
443 %Note ....NYT |
451 passwords. Most users (at least too many) use passwords like |
|
452 |
|
453 \begin{center} |
|
454 \pcode{123456}, |
|
455 \pcode{password}, |
|
456 \pcode{qwerty}, |
|
457 \pcode{letmein}, |
|
458 \pcode{...} |
|
459 \end{center} |
|
460 |
|
461 \noindent So an attacker just needs to compile a list |
|
462 as large as possible of such likely candidates of passwords |
|
463 and also compute their hash-values. Now if the attacker |
|
464 knows the hash-value of a password is |
|
465 |
|
466 \begin{center} |
|
467 \pcode{5baa61e4c9b93f3f0682250b6cf8331b7ee68fd8} |
|
468 \end{center} |
|
469 |
|
470 \noindent then just a lookup in the dictionary will reveal |
|
471 that the plain-text password was \pcode{password}. What is |
|
472 good about this attack is that the dictionary can be |
|
473 precompiled in the ``comfort of the hacker's home'' before an |
|
474 actual attack is launched. It just needs sufficient storage |
|
475 space, which nowadays is pretty cheap. A hacker might in this |
|
476 way not be able to crack all passwords in our database, but |
|
477 even being able to crack 50\% can be serious damage for a |
|
478 large company (because then you have to think how to make |
|
479 users to change their old passwords). And hackers are very |
|
480 industrious in compiling these dictionaries: for example they |
|
481 definitely include variations like \pcode{passw0rd} and also |
|
482 includes rules that cover cases like \pcode{passwordpassword} |
|
483 or \pcode{drowssap} (password reversed). Historically, |
|
484 compiling a list for a dictionary attack is not as simple as |
|
485 it might seem. At the beginning only ``real'' dictionaries |
|
486 were available (like the Oxford English Dictionary), but such |
|
487 dictionary are not ``optimised'' for the purpose of passwords. |
|
488 The first real hard date was obtained when a company called |
|
489 RockYou ``lost'' 32 Million plain-text password. With this |
|
490 data of real-life passwords, dictionary attacks took off. |
|
491 |
|
492 These dictionary attacks can be prevented by using salts. |
|
493 Remember a hacker needs to use the most likely candidates |
|
494 of passwords and calculate their has-value. If we add before |
|
495 hashing a password with a random salt, like \pcode{mPX2aq}, |
|
496 then the string \pcode{passwordmPX2aq} will almost certainly |
|
497 not be in the dictionary. Like in the web-application in the |
|
498 previous section a salt does not prevent us from verifying a |
|
499 password. We just need to add the salt whenever the password |
|
500 is typed in again. |
|
501 |
|
502 There is a question whether we should us a single random salt |
|
503 for every password in our database. A single salt would |
|
504 already make dictionary attacks considerably more difficult. |
|
505 It turns out, however, that in case of password databases |
|
506 every password should get their own salt. This salt is |
|
507 generated at the time when the password is first set. |
|
508 If you look at a Unix password file you will find entries like |
|
509 |
|
510 \begin{center} |
|
511 \pcode{urbanc:$6$3WWbKfr1$4vblknvGr6FcDeF92R5xFn3mskfdnEn...:...} |
|
512 \end{center} |
|
513 |
|
514 \noindent where the first part is the login-name, followed |
|
515 by a field \pcode{$6$} which specifies which hash-function |
|
516 is used. After that follows the salt \pcode{3WWbKfr1} and |
|
517 after that the hash-value that is stored for the password plus |
|
518 salt. I leave it to you to figure out how the password |
|
519 verification would need to work based on this data. |
|
520 |
|
521 There is a non-obvious benefit of using a separate salt for |
|
522 each password. Recall that \pcode{123456} is a popular |
|
523 password that is most likely used by several of your users |
|
524 (especially if the database contains millions of entries). If |
|
525 we use no salt or one global salt, all hash-values will be the |
|
526 same for this password. So if a hacker is in the business of |
|
527 cracking as much passwords as possible, then it is a good idea |
|
528 to concentrate on those very popular passwords. This is not |
|
529 possible if each password gets its own salt: since we assume |
|
530 the salt is generated randomly, each version of \pcode{123456} |
|
531 will be associated with a different hash-value. |
|
532 |
|
533 Note another interesting point. The web-application from the |
|
534 previous section was only secure when the salt was secret. In |
|
535 the password case, this is not needed. The salt can be public |
|
536 as shown above and is actually stored as part of the password |
|
537 entry. Knowing the salt does not give the attacker any |
|
538 advantage, but prevents that dictionaries can be precompiled. |
|
539 |
|
540 |
444 \end{document} |
541 \end{document} |
445 |
542 |
446 %%% Local Variables: |
543 %%% Local Variables: |
447 %%% mode: latex |
544 %%% mode: latex |
448 %%% TeX-master: t |
545 %%% TeX-master: t |