handouts/ho01.tex
changeset 182 681e35f6b0e4
parent 181 a736a0c324a3
child 183 6ed7c9b8b291
equal deleted inserted replaced
181:a736a0c324a3 182:681e35f6b0e4
   180 Lets look at another example which will help with
   180 Lets look at another example which will help with
   181 understanding how passwords should be verified and stored.
   181 understanding how passwords should be verified and stored.
   182 Imagine you need to develop a web-application that has the
   182 Imagine you need to develop a web-application that has the
   183 feature of recording how many times a customer visits a page.
   183 feature of recording how many times a customer visits a page.
   184 For example in order to give a discount whenever the customer
   184 For example in order to give a discount whenever the customer
   185 visited a webpage some $x$ number of times (say $x$ equal
   185 has visited a webpage some $x$ number of times (say $x$ equal
   186 $5$). There is one more constraint: we want to store the
   186 $5$). There is one more constraint: we want to store the
   187 information about the number of visits as a cookie on the
   187 information about the number of visits as a cookie on the
   188 browser. I think, for a number of years the webpage of the New
   188 browser. I think, for a number of years the webpage of the New
   189 York Times operated in this way: it allowed you to read ten
   189 York Times operated in this way: it allowed you to read ten
   190 articles per month for free; if you wanted to read more, you
   190 articles per month for free; if you wanted to read more, you
   195 
   195 
   196 To implement our web-application it is good to look under the
   196 To implement our web-application it is good to look under the
   197 hood what happens when a webpage is displayed in a browser. A
   197 hood what happens when a webpage is displayed in a browser. A
   198 typical web-application works as follows: The browser sends a
   198 typical web-application works as follows: The browser sends a
   199 GET request for a particular page to a server. The server
   199 GET request for a particular page to a server. The server
   200 answers this request with a webpage in HTML (we can here
   200 answers this request with a webpage in HTML (for our purposes
   201 ignore these details). A simple JavaScript program that
   201 we can ignore the details about HTML). A simple JavaScript
   202 realises a ``hello world'' webpage is as follows:
   202 program that realises a server answering with a ``hello
       
   203 world'' webpage is as follows:
   203 
   204 
   204 \begin{center}
   205 \begin{center}
   205 \lstinputlisting{../progs/ap0.js}
   206 \lstinputlisting{../progs/ap0.js}
   206 \end{center}
   207 \end{center}
   207 
   208 
   256 web-page. This is actually a trivial task for a knowledgeable
   257 web-page. This is actually a trivial task for a knowledgeable
   257 person, since there are convenient tools that allow one to set
   258 person, since there are convenient tools that allow one to set
   258 a cookie to an arbitrary value, for example above our
   259 a cookie to an arbitrary value, for example above our
   259 threshold for the discount. 
   260 threshold for the discount. 
   260 
   261 
   261 There seems to be no real way to prevent this kind of
   262 There seems to be no simple way to prevent this kind of
   262 tampering with cookies, because the whole purpose of cookies
   263 tampering with cookies, because the whole purpose of cookies
   263 is that they are stored on the client's side, which from the
   264 is that they are stored on the client's side, which from the
   264 the server's perspective is a potentially hostile environment.
   265 the server's perspective is a potentially hostile environment.
   265 What we need to ensure is the integrity of this counter in
   266 What we need to ensure is the integrity of this counter in
   266 this hostile environment. We could think of encrypting the
   267 this hostile environment. We could think of encrypting the
   273 other hand, we use many ``private'' keys for the clients, then
   274 other hand, we use many ``private'' keys for the clients, then
   274 we have to solve the problem of having to securely store this
   275 we have to solve the problem of having to securely store this
   275 key on our server side (obviously we cannot store the key with
   276 key on our server side (obviously we cannot store the key with
   276 the client because then the client again has all data to
   277 the client because then the client again has all data to
   277 tamper with the counter; and obviously we also cannot encrypt
   278 tamper with the counter; and obviously we also cannot encrypt
   278 the key, lest we can solve a chicken-and-egg problem). So
   279 the key, lest we can solve an impossible chicken-and-egg
   279 encryption seems to not solve the problem we face with the
   280 problem). So encryption seems to not solve the problem we face
   280 integrity of our counter.
   281 with the integrity of our counter.
   281 
   282 
   282 Fortunately, \emph{hash functions} seem to be more suitable
   283 Fortunately, \emph{hash functions} seem to be more suitable
   283 for our purpose. Like encryption, hash functions scramble data
   284 for our purpose. Like encryption, hash functions scramble data
   284 in such a way that it is easy to calculate the output of a
   285 in such a way that it is easy to calculate the output of a
   285 hash function from the input. But it is hard (i.e.~practically
   286 hash function from the input. But it is hard (i.e.~practically
   286 impossible) to calculate the input from knowing the output.
   287 impossible) to calculate the input from knowing the output.
   287 Therefore hash functions are often called \emph{one-way
   288 Therefore hash functions are often called \emph{one-way
   288 functions}. There are several such hashing function. For
   289 functions}\ldots you cannot go back from the output to the
   289 example SHA-1 would hash the string \pcode{"hello world"} to
   290 input (without some tricks, see below). There are several such
   290 produce the hash-value
   291 hashing function. For example SHA-1 would hash the string
       
   292 \pcode{"hello world"} to produce the hash-value
   291 
   293 
   292 \begin{center}
   294 \begin{center}
   293 \pcode{2aae6c35c94fcfb415dbe95f408b9ce91ee846ed}
   295 \pcode{2aae6c35c94fcfb415dbe95f408b9ce91ee846ed}
   294 \end{center}
   296 \end{center}
   295 
   297 
   306 will be from just looking at input that is ``close by''. 
   308 will be from just looking at input that is ``close by''. 
   307 
   309 
   308 We can use hashes in our web-application and store in the
   310 We can use hashes in our web-application and store in the
   309 cookie the value of the counter in plain text but together
   311 cookie the value of the counter in plain text but together
   310 with its hash. We need to store both pieces of data in such a
   312 with its hash. We need to store both pieces of data in such a
   311 way that we can extract both components (below I will just
   313 way that we can extract them again later on (in the code below
   312 separate them using a \pcode{"-"}). If we now read back the
   314 I will just separate them using a \pcode{"-"}). If we now read
   313 cookie when the client visits our webpage, we can extract the
   315 back the cookie when the client visits our webpage, we can
   314 counter, hash it again and compare the result to the stored
   316 extract the counter, hash it again and compare the result to
   315 hash value inside the cookie. If these hashes disagree, then
   317 the stored hash value inside the cookie. If these hashes
   316 we can deduce that the cookie has been tampered with.
   318 disagree, then we can deduce that the cookie has been tampered
   317 Unfortunately, if they agree, we can still not be entirely
   319 with. Unfortunately, if they agree, we can still not be
   318 sure that not a clever hacker has tampered with the cookie.
   320 entirely sure that not a clever hacker has tampered with the
   319 The reason is that the hacker can see the clear text part of
   321 cookie. The reason is that the hacker can see the clear text
   320 the cookie, say \pcode{3}, and also its hash. It does not take
   322 part of the cookie, say \pcode{3}, and also its hash. It does
   321 much trial and error to find out that we used the SHA-1
   323 not take much trial and error to find out that we used the
   322 hashing functions and then the hacker can graft a cookie
   324 SHA-1 hashing function and then the hacker can graft a cookie
   323 accordingly. This is eased by the fact that for SHA-1 many
   325 accordingly. This is eased by the fact that for SHA-1 many
   324 strings and corresponding hash-values are precalculated. Type,
   326 strings and corresponding hash-values are precalculated. Type,
   325 for example, into Google the hash value for \pcode{"hello
   327 for example, into Google the hash value for \pcode{"hello
   326 world"} and you will actually pretty quickly find that it was
   328 world"} and you will actually pretty quickly find that it was
   327 generated by input string \pcode{"hello wolrd"}. This defeats
   329 generated by input string \pcode{"hello world"}. This defeats
   328 the purpose of a hashing functions and thus would not help us
   330 the purpose of a hashing function and thus would not help us
   329 with our web-applications and later with how to store
   331 with our web-applications and later also not with how to store
   330 passwords properly. 
   332 passwords properly. 
   331 
   333 
   332 
   334 
   333 There is one ingredient missing, which happens to be called
   335 There is one ingredient missing, which happens to be called
   334 \emph{salts}. Salts are random keys, which are added to the
   336 \emph{salts}. Salts are random keys, which are added to the
   357 \end{center}
   359 \end{center}
   358 
   360 
   359 \noindent These hashes allow us to read and set the value of
   361 \noindent These hashes allow us to read and set the value of
   360 the counter, and also give us confidence that the counter has
   362 the counter, and also give us confidence that the counter has
   361 not been tampered with. This of course depends on being able
   363 not been tampered with. This of course depends on being able
   362 to keep the salt secret. Once this is out, we better ignore
   364 to keep the salt secret. Once the salt is public, we better
   363 all cookies and start setting them again with a new salt.
   365 ignore all cookies and start setting them again with a new
       
   366 salt.
   364 
   367 
   365 There is an interesting and very subtle point to note with
   368 There is an interesting and very subtle point to note with
   366 respect to the New York Times' way of checking the number
   369 respect to the New York Times' way of checking the number
   367 visits. Essentially they have their `resource' unlocked at the
   370 visits. Essentially they have their `resource' unlocked at the
   368 beginning and lock it only when the data in the cookie states
   371 beginning and lock it only when the data in the cookie states
   369 that the allowed free number of visits are up. As said before,
   372 that the allowed free number of visits are up. As said before,
   370 this can be easily circumvented by just deleting the cookie or
   373 this can be easily circumvented by just deleting the cookie or
   371 by switching the browser. This would mean the New York Times
   374 by switching the browser. This would mean the New York Times
   372 will lose revenue whenever this kind of tampering occurs. In
   375 will lose revenue whenever this kind of tampering occurs. The
   373 contrast, our web-application has the resource (discount)
   376 quick fix to require that a cookie must always be present does
   374 locked at the beginning and only unlocks it if the cookie data
   377 not work, because then this newspaper will cut off any new
   375 says so. If the cookie is deleted, well then the resource just
   378 readers, or anyone who gets a new computer. In contrast, our
   376 does not get unlocked. No mayor harm will result to us. You
   379 web-application has the resource (discount) locked at the
   377 can see: the same security mechanism behaves rather
   380 beginning and only unlocks it if the cookie data says so. If
   378 differently depending on whether the ``resource'' needs to be
   381 the cookie is deleted, well then the resource just does not
   379 locked or unlocked. Apart from think about the difference very
   382 get unlocked. No mayor harm will result to us. You can see:
   380 carefully, I do not know of any ``theory'' that could help
   383 the same security mechanism behaves rather differently
   381 with solving such security intricacies in any automatic way.  
   384 depending on whether the ``resource'' needs to be locked or
   382 
   385 unlocked. Apart from thinking about the difference very
   383 \subsection*{How to Store Passwords}
   386 carefully, I do not know of any good ``theory'' that could
       
   387 help with solving such security intricacies in any other way.  
       
   388 
       
   389 \subsection*{How to Store Passwords Properly?}
   384 
   390 
   385 While admittedly quite silly, the simple web-application in
   391 While admittedly quite silly, the simple web-application in
   386 the previous section should help with the more important
   392 the previous section should help with the more important
   387 question of how passwords should be verified and stored. It is
   393 question of how passwords should be verified and stored. It is
   388 unbelievable that nowadays systems still do this with
   394 unbelievable that nowadays systems still do this with
   389 passwords in plain text. The idea behind such plain-text
   395 passwords in plain text. The idea behind such plain-text
   390 passwords is of course that if the user typed in \emph{foobar}
   396 passwords is of course that if the user typed in
   391 as password, we need to verify whether it matches with the
   397 \pcode{foobar} as password, we need to verify whether it
   392 password that is already stored for this user in the system.
   398 matches with the password that is already stored for this user
   393 But doing this verification in plain text is really a bad
   399 in the system. But doing this verification in plain text is
   394 idea. Unfortunately, evidence suggests, however, it is still a
   400 really a bad idea. Unfortunately, evidence suggests, however,
   395 widespread practice. I leave you to think about why verifying
   401 it is still a widespread practice. I leave you to think about
   396 passwords in plain text is a bad idea.
   402 why verifying passwords in plain text is a bad idea.
   397 
   403 
   398 Using hash functions we can do better. It is not really
   404 Using hash functions, like in our web-application, we can do
   399 necessary to store passwords in plain text in order to verify
   405 better. They allow us to not having to store passwords in
   400 whether a password matches. We can just hash the password in
   406 plain text for verification whether a password matches or not.
   401 order to be stored. And whenever the user types in a new
   407 We can just hash the password and store the hash-value. And
   402 password, well then we hash it again and check whether the
   408 whenever the user types in a new password, well then we hash
   403 hash-values agree. Lets analyse what happens when a hacker
   409 it again and check whether the hash-values agree. Just like
   404 gets hold of our password database. In case the passwords are
   410 in the web-application before.
   405 hashed, the hacker has a list of user names and associated
   411 
   406 hash-values, like 
   412 Lets analyse what happens when a hacker gets hold of such a
       
   413 hashed password database. The hacker has then a list of user
       
   414 names and associated hash-values, like 
   407 
   415 
   408 \begin{center}
   416 \begin{center}
   409 \pcode{urbanc:2aae6c35c94fcfb415dbe95f408b9ce91ee846ed}
   417 \pcode{urbanc:2aae6c35c94fcfb415dbe95f408b9ce91ee846ed}
   410 \end{center}
   418 \end{center}
   411 
   419 
   412 \noindent For a beginner-level hacker this information
   420 \noindent For a beginner-level hacker this information is of
   413 is of no use. It would not work to type in the hash value
   421 no use. It would not work to type in the hash value instead of
   414 instead of the password, because it will go through the
   422 the password, because it will go through the hashing function
   415 hashing function again and then the resulting two
   423 again and then the resulting two hash-values will not match.
   416 hash-values will not match. One attack a hacker can try is
   424 One attack a hacker can try, however, is called a \emph{brute
   417 called a \emph{brute force attack}. Essentially this means
   425 force attack}. Essentially this means trying out exhaustively
   418 trying out exhaustively all strings
   426 all strings
   419 
   427 
   420 \begin{center}
   428 \begin{center}
   421 \pcode{a},
   429 \pcode{a},
   422 \pcode{aa},
   430 \pcode{aa},
   423 \pcode{...},
   431 \pcode{...},
   425 \pcode{...},
   433 \pcode{...},
   426 \pcode{zzz},
   434 \pcode{zzz},
   427 \pcode{...}
   435 \pcode{...}
   428 \end{center}   
   436 \end{center}   
   429 
   437 
   430 \noindent hash them and check whether they match with the
   438 \noindent and so on, hash them and check whether they match
   431 hash-values in the database. Such brute force attacks are
   439 with the hash-values in the database. Such brute force attacks
   432 surprisingly effective. With modern technology (usually GPU
   440 are surprisingly effective. With modern technology (usually
   433 graphic cards), passwords of moderate length
   441 GPU graphic cards), passwords of moderate length only needs
   434 
   442 seconds or hours to be cracked. Well the only defence we have
   435 %The corresponding attack is called \emph{dictionary
   443 is to make passwords longer and force users to use the whole
   436 %attack}\ldots hashes are not reversed by brute force
   444 spectrum of letters and keys for passwords in order to make
   437 %calculations, that is trying out all possible combinations.
   445 the search space to big for an effective brute force attack.
   438 
   446 
   439 
   447 Unfortunately, clever hackers have another ace up their
   440 %We have to make sure the salt does not get known.
   448 sleeves. These are called \emph{dictionary attacks}. The idea
   441 
   449 behind dictionary attack is the observation that only few
   442 
   450 people are competent enough to use sufficiently strong
   443 %Note ....NYT 
   451 passwords. Most users (at least too many) use passwords like
       
   452 
       
   453 \begin{center}
       
   454 \pcode{123456},
       
   455 \pcode{password},
       
   456 \pcode{qwerty},
       
   457 \pcode{letmein},
       
   458 \pcode{...}
       
   459 \end{center}
       
   460 
       
   461 \noindent So an attacker just needs to compile a list
       
   462 as large as possible of such likely candidates of passwords
       
   463 and also compute their hash-values. Now if the attacker
       
   464 knows the hash-value of a password is
       
   465 
       
   466 \begin{center}
       
   467 \pcode{5baa61e4c9b93f3f0682250b6cf8331b7ee68fd8}
       
   468 \end{center}
       
   469 
       
   470 \noindent then just a lookup in the dictionary will reveal
       
   471 that the plain-text password was \pcode{password}. What is
       
   472 good about this attack is that the dictionary can be
       
   473 precompiled in the ``comfort of the hacker's home'' before an
       
   474 actual attack is launched. It just needs sufficient storage
       
   475 space, which nowadays is pretty cheap. A hacker might in this
       
   476 way not be able to crack all passwords in our database, but
       
   477 even being able to crack 50\% can be serious damage for a
       
   478 large company (because then you have to think how to make
       
   479 users to change their old passwords). And hackers are very
       
   480 industrious in compiling these dictionaries: for example they
       
   481 definitely include variations like \pcode{passw0rd} and also
       
   482 includes rules that cover cases like \pcode{passwordpassword}
       
   483 or \pcode{drowssap} (password reversed). Historically,
       
   484 compiling a list for a dictionary attack is not as simple as
       
   485 it might seem. At the beginning only ``real'' dictionaries
       
   486 were available (like the Oxford English Dictionary), but such
       
   487 dictionary are not ``optimised'' for the purpose of passwords.
       
   488 The first real hard date was obtained when a company called
       
   489 RockYou ``lost'' 32 Million plain-text password. With this
       
   490 data of real-life passwords, dictionary attacks took off.
       
   491 
       
   492 These dictionary attacks can be prevented by using salts.
       
   493 Remember a hacker needs to use the most likely candidates 
       
   494 of passwords and calculate their has-value. If we add before
       
   495 hashing a password with a random salt, like \pcode{mPX2aq},
       
   496 then the string \pcode{passwordmPX2aq} will almost certainly 
       
   497 not be in the dictionary. Like in the web-application in the
       
   498 previous section a salt does not prevent us from verifying a 
       
   499 password. We just need to add the salt whenever the password 
       
   500 is typed in again. 
       
   501 
       
   502 There is a question whether we should us a single random salt
       
   503 for every password in our database. A single salt would
       
   504 already make dictionary attacks considerably more difficult.
       
   505 It turns out, however, that in case of password databases
       
   506 every password should get their own salt. This salt is
       
   507 generated at the time when the password is first set. 
       
   508 If you look at a Unix password file you will find entries like
       
   509 
       
   510 \begin{center}
       
   511 \pcode{urbanc:$6$3WWbKfr1$4vblknvGr6FcDeF92R5xFn3mskfdnEn...:...}
       
   512 \end{center}
       
   513 
       
   514 \noindent where the first part is the login-name, followed
       
   515 by a field \pcode{$6$} which specifies which hash-function
       
   516 is used. After that follows the salt \pcode{3WWbKfr1} and 
       
   517 after that the hash-value that is stored for the password plus 
       
   518 salt. I leave it to you to figure out how the password 
       
   519 verification would need to work based on this data.
       
   520 
       
   521 There is a non-obvious benefit of using a separate salt for
       
   522 each password. Recall that \pcode{123456} is a popular
       
   523 password that is most likely used by several of your users
       
   524 (especially if the database contains millions of entries). If
       
   525 we use no salt or one global salt, all hash-values will be the
       
   526 same for this password. So if a hacker is in the business of
       
   527 cracking as much passwords as possible, then it is a good idea
       
   528 to concentrate on those very popular passwords. This is not
       
   529 possible if each password gets its own salt: since we assume
       
   530 the salt is generated randomly, each version of \pcode{123456}
       
   531 will be associated with a different hash-value.  
       
   532 
       
   533 Note another interesting point. The web-application from the
       
   534 previous section was only secure when the salt was secret. In
       
   535 the password case, this is not needed. The salt can be public
       
   536 as shown above and is actually stored as part of the password
       
   537 entry. Knowing the salt does not give the attacker any
       
   538 advantage, but prevents that dictionaries can be precompiled.
       
   539 
       
   540 
   444 \end{document}
   541 \end{document}
   445 
   542 
   446 %%% Local Variables: 
   543 %%% Local Variables: 
   447 %%% mode: latex
   544 %%% mode: latex
   448 %%% TeX-master: t
   545 %%% TeX-master: t