updated
authorChristian Urban <christian dot urban at kcl dot ac dot uk>
Thu, 09 Oct 2014 14:41:36 +0100
changeset 227 7807863c4196
parent 226 01fe5aba8781
child 228 4f7c7997b68b
updated
handouts/ho01.pdf
handouts/ho01.tex
handouts/ho02.pdf
handouts/ho02.tex
handouts/ho03.pdf
handouts/ho03.tex
Binary file handouts/ho01.pdf has changed
--- a/handouts/ho01.tex	Tue Oct 07 12:48:07 2014 +0100
+++ b/handouts/ho01.tex	Thu Oct 09 14:41:36 2014 +0100
@@ -42,7 +42,7 @@
 hack into things. I beg to differ: You have this mindset
 already when in school you were thinking, at least
 hypothetically, about ways in which you can cheat in an exam
-(whether it is about hiding notes or looking over the
+(whether it is by hiding notes or by looking over the
 shoulders of your fellow pupils). Right? To defend a system,
 you need to have this kind of mindset and be able to think like
 an attacker. This will include understanding techniques that
@@ -108,7 +108,7 @@
 authorisation. Even though the banks involved trumpeted their
 system as being absolutely secure and indeed fraud rates
 initially went down, security researchers were not convinced
-(especially the group around Ross Anderson). To begin with,
+(especially not the group around Ross Anderson). To begin with,
 the Chip-and-PIN system introduced a ``new player'' into the
 system that needed to be trusted: the PIN terminals and their
 manufacturers. It was claimed that these terminals were
@@ -122,7 +122,7 @@
 mostly beyond the control of customers who need to use these
 terminals.
 
-To make matters worse for Chip-and-PIN, in around 2009 Ross
+To make matters worse for Chip-and-PIN, around 2009 Ross
 Anderson and his group were able to perform man-in-the-middle
 attacks against Chip-and-PIN. Essentially they made the
 terminal think the correct PIN was entered and the card think
@@ -148,15 +148,15 @@
 Since banks managed to successfully claim that their
 Chip-and-PIN system is secure, they were able under the new
 system to point the finger at the customer when fraud occurred:
-customers must have been negligent losing their PIN and they
-had almost no way of defending themselves in such situations.
-That is why the work of \emph{ethical} hackers like Ross
-Anderson's group was so important, because they and others
-established that the bank's claim that their system is secure
-and it must have been the customer's fault, was bogus. In 2009
-the law changed and the burden of proof went back to the
-banks. They need to prove whether it was really the customer
-who used a card or not.
+customers must have been negligent in losing their PIN and
+customers had almost no way of defending themselves in such
+situations. That is why the work of \emph{ethical} hackers
+like Ross Anderson's group was so important, because they and
+others established that the banks' claim that their system is
+secure and it must have been the customer's fault, was bogus.
+In 2009 the law changed and the burden of proof went back to
+the banks. They need to prove whether it was really the
+customer who used a card or not.
 
 This is a classic example where a security design principle
 was violated: Namely, the one who is in the position to
@@ -183,7 +183,7 @@
 
 \subsection*{Of Cookies and Salts}
 
-Lets look at another example which will help with
+Let us look at another example which will help with
 understanding how passwords should be verified and stored.
 Imagine you need to develop a web-application that has the
 feature of recording how many times a customer visits a page.
@@ -271,7 +271,7 @@
 the server's perspective is a potentially hostile environment.
 What we need to ensure is the integrity of this counter in
 this hostile environment. We could think of encrypting the
-counter. But this has two drawbacks to do with the key for
+counter. But this has two drawbacks to do with the keys for
 encryption. If you use a single, global key for all the
 clients that visit our site, then we risk that our whole
 ``business'' might collapse in the event this key gets known
@@ -410,8 +410,8 @@
 \pcode{foobar} as password, we need to verify whether it
 matches with the password that is already stored for this user
 in the system. Why not do this with plain-text passwords?
-But doing this verification in plain text is really a bad
-idea. Unfortunately, evidence suggests it is still a
+Unfortunately doing this verification in plain text is really
+a bad idea. Alas, evidence suggests it is still a
 widespread practice. I leave you to think about why verifying
 passwords in plain text is a bad idea.
 
@@ -481,43 +481,46 @@
 as possible of such likely candidates of passwords and also
 compute their hash-values. The difference between a brute
 force attack, where maybe $2^{80}$ many strings need to be
-considered, a dictionary attack might get away witch checking
-only 10 Million (remember the language English ``only''
-contains 600,000 words). This is a drastic simplification for
-attackers. Now if the attacker knows the hash-value of a
-password is
+considered, is that a dictionary attack might get away with
+checking only 10 million words (remember the English language
+``only'' contains 600,000 words). This is a drastic
+simplification for attackers. Now, if the attacker knows the
+hash-value of a password is
 
 \begin{center}
 \pcode{5baa61e4c9b93f3f0682250b6cf8331b7ee68fd8}
 \end{center}
 
-\noindent then just a lookup in the dictionary will reveal that the
-plain-text password was \pcode{password}. What is good about this
-attack is that the dictionary can be precompiled in the ``comfort of
-the hacker's home'' before an actual attack is launched. It just needs
-sufficient storage space, which nowadays is pretty cheap. A hacker
-might in this way not be able to crack all passwords in our database,
-but even being able to crack 50\% can be serious damage for a large
-company (because then you have to think about how to make users to
-change their old passwords---a major hassle).  And hackers are very
-industrious in compiling these dictionaries: for example they
-definitely include variations like \pcode{passw0rd} and also include
-rules that cover cases like \pcode{passwordpassword} or
-\pcode{drowssap} (password reversed).\footnote{Some entertaining rules
-  for creating effective dictionaries are described in the book
-  ``Applied Cryptography'' by Bruce Schneier (in case you can find it
-  in the library), and also in the original research literature which
-  can be accessed for free from
-  \url{http://www.klein.com/dvk/publications/passwd.pdf}.}
-Historically, compiling a list for a dictionary attack is not as
-simple as it might seem. At the beginning only ``real'' dictionaries
-were available (like the Oxford English Dictionary), but such
-dictionaries are not ``optimised'' for the purpose of passwords. The
-first real hard data about actually used passwords was obtained when a
-company called RockYou ``lost'' 32 Million plain-text passwords. With
-this data of real-life passwords, dictionary attacks took
-off. Compiling such dictionaries is nowadays very easy with the help
-of off-the-shelf tools.
+\noindent then just a lookup in the dictionary will reveal
+that the plain-text password was \pcode{password}. What is
+good about this attack is that the dictionary can be
+precompiled in the ``comfort of the hacker's home'' before an
+actual attack is launched. It just needs sufficient storage
+space, which nowadays is pretty cheap. A hacker might in this
+way not be able to crack all passwords in our database, but
+even being able to crack 50\% can be serious damage for a
+large company (because then you have to think about how to
+make users change their old passwords---a major hassle).
+And hackers are very industrious in compiling these
+dictionaries: for example they definitely include variations
+like \pcode{passw0rd} and also include rules that cover cases
+like \pcode{passwordpassword} or \pcode{drowssap} (password
+reversed).\footnote{Some entertaining rules for creating
+effective dictionaries are described in the book ``Applied
+Cryptography'' by Bruce Schneier (in case you can find it in
+the library), and also in the original research literature
+which can be accessed for free from
+\url{http://www.klein.com/dvk/publications/passwd.pdf}.}
+Historically, compiling a list for a dictionary attack is not
+as simple as it might seem. At the beginning only ``real''
+dictionaries were available (like the Oxford English
+Dictionary), but such dictionaries are not optimised for the
+purpose of cracking passwords. The first real hard data about actually
+used passwords was obtained when a company called RockYou
+``lost'' 32 million plain-text passwords. With this data of
+real-life passwords, dictionary attacks took off. Compiling
+such dictionaries is nowadays very easy with the help of
+off-the-shelf tools.
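To make this concrete, here is a small Python sketch of the dictionary attack (the word list is of course hypothetical and vastly shorter than a real cracking dictionary): the attacker precompiles SHA-1 hash-values for likely candidate passwords, and cracking a stolen hash-value then needs only a lookup.

```python
import hashlib

# A tiny "precompiled dictionary": hash-value -> candidate password.
# A real cracking dictionary would contain millions of entries plus
# rule-generated variations like passw0rd or drowssap.
candidates = ["letmein", "passw0rd", "password", "drowssap"]
table = {hashlib.sha1(w.encode()).hexdigest(): w for w in candidates}

# The stolen hash-value quoted in the text above:
stolen = "5baa61e4c9b93f3f0682250b6cf8331b7ee68fd8"
print(table.get(stolen))   # prints: password
```

The expensive part (hashing all the candidates) happens once, offline, in the ``comfort of the hacker's home''; each stolen hash-value afterwards costs only a dictionary lookup.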
 
 These dictionary attacks can be prevented by using salts.
 Remember a hacker needs to use the most likely candidates 
@@ -544,7 +547,7 @@
 \noindent where the first part is the login-name, followed by
 a field \pcode{$6$} which specifies which hash-function is
 used. After that follows the salt \pcode{3WWbKfr1} and after
-that the hash-value that is stored for the password ( which
+that the hash-value that is stored for the password (which
 includes the salt). I leave it to you to figure out how the
 password verification would need to work based on this data.
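One plausible sketch of such a verification in Python is below. Note that this is a simplification for illustration only: the real \pcode{$6$} scheme (SHA-512-crypt) applies many thousands of hashing rounds, not a single SHA-512 application, and the entry format here omits the login-name field.

```python
import hashlib

def hash_password(salt, password):
    # Simplified stand-in for SHA-512-crypt: hash salt and password
    # together (the real $6$ scheme iterates this many thousand times).
    return hashlib.sha512((salt + password).encode()).hexdigest()

def verify(entry, candidate):
    # entry has the (simplified) form  $6$salt$hash
    _, _, salt, stored = entry.split("$")
    # re-hash the candidate with the stored salt and compare
    return hash_password(salt, candidate) == stored

entry = "$6$3WWbKfr1$" + hash_password("3WWbKfr1", "secret")
assert verify(entry, "secret")
assert not verify(entry, "wrong")
```

The essential point survives the simplification: the salt is read off from the stored entry, the candidate password is hashed together with it, and only the two hash-values are compared.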
 
@@ -561,18 +564,20 @@
 will be associated with a different hash-value. This will
 make the life harder for an attacker.
 
-Note another interesting point. The web-application from the previous
-section was only secure when the salt was secret. In the password
-case, this is not needed. The salt can be public as shown above in the
-Unix password file where is actually stored as part of the password
-entry. Knowing the salt does not give the attacker any advantage, but
-prevents that dictionaries can be precompiled. While salts do not
-solve every problem, they help with protecting against dictionary
-attacks on password files. It protects people who have the same
-passwords on multiple machines. But it does not protect against a
-focused attack against a single password and also does not make poorly
-chosen passwords any better. Still the moral is that you should never
-store passwords in plain text. Never ever.\medskip
+Note another interesting point. The web-application from the
+previous section was only secure when the salt was secret. In
+the password case, this is not needed. The salt can be public
+as shown above in the Unix password file where it is actually
+stored as part of the password entry. Knowing the salt does
+not give the attacker any advantage, but prevents
+dictionaries from being precompiled. While salts do not solve
+every problem, they help with protecting against dictionary
+attacks on password files. They protect people who have the
+same passwords on multiple machines. But they do not protect
+against a focused attack against a single password and also
+do not make poorly chosen passwords any better. Still, the
+moral is that you should never store passwords in plain text.
+Never ever.\medskip
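The point about precompilation can be demonstrated in a few lines of Python (again only a sketch, with a single SHA-512 application standing in for a proper password-hashing function): storing the same password twice with fresh salts yields two different hash-values, so one precompiled dictionary cannot cover both entries.

```python
import hashlib, os

def store(password):
    # fresh, public, per-entry salt
    salt = os.urandom(8).hex()
    return salt, hashlib.sha512((salt + password).encode()).hexdigest()

s1, h1 = store("password")
s2, h2 = store("password")
# same password, different salts -> different stored hash-values
assert s1 != s2 and h1 != h2
```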
 
 \noindent
 If you want to know more about passwords I recommend viewing some
Binary file handouts/ho02.pdf has changed
--- a/handouts/ho02.tex	Tue Oct 07 12:48:07 2014 +0100
+++ b/handouts/ho02.tex	Thu Oct 09 14:41:36 2014 +0100
@@ -10,7 +10,7 @@
 phenomena: for example I am happy (more or less) to use online
 banking every day, where if something goes wrong, I can
 potentially lose a lot of money, but I am staunchly against
-using electronic voting (lets call it e-voting for short).
+using electronic voting (let's call it e-voting for short).
 E-voting is an idea that is nowadays often promoted in order
 to counter low turnouts in elections\footnote{In my last local
 election where I was eligible to vote only 48\% of the
@@ -35,8 +35,8 @@
 unsolvable with current technology. This is not just my
 opinion, but also shared by many security researchers amongst
 them Alex Halderman, who is a world expert on this subject
-and from whose course on Securing Digital Democracy I have
-most of my information and inspiration. It is also a
+and from whose Coursera course on Securing Digital Democracy I
+have most of my information and inspiration. It is also a
 controversial topic in many countries:
 
 \begin{itemize}
@@ -56,6 +56,9 @@
       
 \item The US used mechanical machines from the 1930s, later
       punch cards, now DREs and optical scan voting machines.
+      But there is a lot of evidence that DREs and optical 
+      scan voting machines are not as secure as they should
+      be.
 
 \item Estonia has used the Internet since 2007 for national
       elections. There were earlier pilot studies for voting
@@ -89,13 +92,13 @@
         what should be ensured is that the error rate does not
         change the outcome of the election. Of course if
        elections continue to be on a knife's edge we need to
-        ensure that we have a rather small error rate. 
+        strive for rather small error rates. 
           
  \item There might be gigantic sums at stake that need to be
         defended against. The problem with this is that if
         the incentives are great and enough resources are
         available, then maybe it is feasible to mount a DoS
-        attack against voting server and by bringing the
+        attack against the voting server and by bringing the
         system to its knees, change the outcome of an
        election. Not to mention hacking the complete
        system with malware and changing votes undetectably.
@@ -132,7 +135,7 @@
         one reason or another just do not have driving
         licenses. They are now excluded. Also if you insist on
         paper ballots you have to have special provisions for
-        blind people. Otherwise they cannot vote.
+        blind people. Otherwise they too cannot vote.
  \end{itemize}
   
 \item {\bf Availability}
@@ -155,12 +158,12 @@
 \noindent If we had ballots with complete voter
 identification, then we can improve integrity because we can
 trace back the votes to the voters. This would be good when
-verifying the results or recounting. But such an
+verifying the results or when recounting. But such an
 identification would violate ballot secrecy (you can prove to
 somebody else how you voted). In contrast, if we remove all
 identification for ensuring ballot secrecy, then we have to
 ensure that no ``vote-stuffing'' occurs. Similarly, if we
-improve authentication by requiring a to be present at the
+improve authentication by requiring voters to be present at the
 polling station with an ID card, then we exclude absentee
 voting.
 
@@ -170,7 +173,7 @@
 is not entirely trivial and immune from being hacked. We know
 for sure that elections were held in Athens as early as 600
 BC, but might even date to the time of Mesopotamia and also in
-India some kind of ``republics'' might have existed before the
+India some kind of republics might have existed before
 Alexander the Great invaded it. Have a look at Wikipedia about
 the history of democracy for more information. These elections
 were mainly based on voting by show of hands. While this
@@ -180,21 +183,22 @@
 Romans did not perceive this as a problem, but the result was
 that their elections favoured rich, famous people who had
 enough resources to swing votes. Even using small coloured
-stones did not really mitigate the problem with ballot
-secrecy. The problem of authorisation was solved by friends or
-neighbours vouching for you to prove you are eligible to vote
-(there were no ID cards in ancient Greece and Rome).
+stones, which were also used at that time, did not really
+mitigate the problem with ballot secrecy. The problem of
+authorisation was solved by friends or neighbours vouching for
+you to prove you are eligible to vote (there were no ID cards
+in ancient Greece and Rome).
 
 Starting with the French Revolution and the US constitution,
-people started to value a more egalitarian approach to voting
+people began to value a more egalitarian approach to voting
 and electing officials. This was also the time where paper
 ballots started to become the prevailing form of casting
 votes. While more resistant against voter intimidation, paper
 ballots need a number of security mechanisms to avoid fraud.
-For example you need voting booths to fill out the ballot in
-secret. Also transparent ballot boxes are often used in order
-to easily detect and prevent vote stuffing (prefilling the
-ballot box with false votes). 
+For example you need voting booths so that voters can fill out
+the ballot in secret. Also transparent ballot boxes are often
+used in order to easily detect and prevent vote stuffing
+(prefilling the ballot box with false votes). 
 
 \begin{center}
 \includegraphics[scale=2.5]{../pics/ballotbox.jpg}
@@ -203,16 +207,18 @@
 \noindent Another security mechanism is to guard the ballot
 box against any tampering during the election until counting.
 The counting needs to be done by a team potentially involving
-also independent observers. One interesting attack against
-completely anonymous paper ballots is called \emph{chain vote
-attack}. It works if the paper ballots are given out to each
-voter at the polling station. Then an attacker can give the
-prefilled ballot to a voter. The voter uses this prefilled
-ballot to cast the vote, and then returns the empty ballot
-back to the attacker who now compensates the voter. The blank
-ballot can be reused for the next voter. 
+also independent observers. 
 
-The point is that paper ballots have evolved over some time 
+One interesting attack against completely anonymous paper
+ballots is called \emph{chain vote attack}. It works if the
+paper ballots are given out to each voter at the polling
+station. Then an attacker can give the prefilled ballot to a
+voter. The voter uses this prefilled ballot to cast the vote,
+and then returns the empty ballot paper to the attacker who now
+compensates the voter. The blank ballot can be reused for the
+next voter. 
+
+To sum up, paper ballots have evolved over some time 
 and no single best method has emerged for preventing fraud.
 But the involved technology is sufficiently well understood
 to provide good enough security with paper ballots.
@@ -229,25 +235,30 @@
 \end{quote}
 
 \noindent Whenever people argue in favour of e-voting they
-seem to be ignore this basic premise.\bigskip
+seem to be ignoring this basic premise.\bigskip
 
 \noindent After the debacle of the Florida presidential
-election in 2000, many counties used Direct-Recording
-Electronic voting machines (DREs) or optical scan machines.
-One popular model of DRE was sold by the company called
-Diebold. In hindsight they were a complete disaster: the
-products were inferior and the company incompetent. Direct
-recording meant that there was no paper trail, the votes were
-directly recorded on memory cards. Thus the voters had no
-visible assurance whether the votes were correctly cast. The
-machines behind these DREs were ``normal'' windows computers,
-which could be used for anything, for example for changing
-votes. Why did nobody at Diebold think of that? That this was
-eventually done undetectably is the result of the
-determination of ethical hackers like Alex Halderman. His
-group thoroughly hacked them showing that election fraud is
-easily possible. They managed to write a virus that infected
-the whole system by having only access to a single machine.
+election in 2000, many voting precincts in the US used
+Direct-Recording Electronic voting machines (DREs) or optical
+scan machines. One popular model of DREs was sold by a
+company called Diebold. In hindsight they were a complete
+disaster: the products were inadequate and the company
+incompetent. Direct recording meant that there was no paper
+trail, the votes were directly recorded on memory cards. Thus
+the voters had no visible assurance whether the votes were
+correctly cast. Even if a printout is provided, it does not
+give any guarantee about what is actually recorded on the
+memory card.
+
+The machines behind these DREs were ``normal'' Windows
+computers, which could be used for anything, for example for
+changing votes. Why did nobody at Diebold think of that? I
+have no idea. But that this was eventually done undetectably
+is the result of the determination of ethical hackers like
+Alex Halderman. His group thoroughly hacked Diebold's DREs
+showing that election fraud with them is easily possible. They
+even managed to write a virus that infected the whole system
+by having only access to a single machine.
 
 \begin{figure}[t]
 \begin{center}
@@ -262,35 +273,35 @@
 \end{figure}
 
 What made matters worse was that Diebold tried to hide their
-incompetency and inferiority of their products, by requiring
-that election counties must not give the machines up for
-independent review. They also kept their source secret. 
-This meant Halderman and his group had to obtain a machine
-not in the official channels. Then they had to reverse 
-engineer the source code in order to design their attack. 
-What this all showed is that a shady security design is no 
-match to a determined hacker. 
+incompetency and the inferiority of their products, by
+requiring that election counties must not give the machines up
+for independent review. They also kept their source code secret.
+This meant Halderman and his group had to obtain a machine not
+through the official channels. They then had to reverse
+engineer the source code in order to design their attack. What
+this all showed is that a shady security design is no match
+for a determined hacker. 
 
 Apart from the obvious failings (for example no paper trail),
 this story also has another side. While a paper ballot box
 needs to be kept secure from the beginning of the election
 (when it needs to be ensured it is empty) until the end of the
 day, electronic voting machines need to be kept secure the
-whole year. The reason is of course one cannot see whether
-somebody has tampered with the program a computer is running.
-Such a 24/7 security costly and often even even impossible,
-because voting machines need to be distributed usually the day
-before to the polling station. These are often schools where
-the voting machines are kept unsecured overnight. The obvious
-solution of putting seals on computers also does not work: in
-the process of getting these DREs discredited (involving court
-cases) it was shown that seals can easily be circumvented. The
-moral of this story is that election officials were 
-incentivised with money by the central government to obtain
-new  voting equipment and in the process fell prey to pariahs
-which sold them a substandard product. Diebold was not the
-only pariah in this project, but one of the more notorious
-one.
+whole year. The reason is of course that one cannot see
+whether somebody has tampered with the program a computer is
+running. Such 24/7 security is costly and often even
+impossible, because voting machines usually need to be
+distributed to the polling stations the day before the election.
+These are often schools where the voting machines are kept
+unsecured overnight. The obvious solution of putting seals on
+computers did not work: in the process of getting these DREs
+discredited (involving court cases) it was shown that seals
+can easily be circumvented. The moral of this story is that
+election officials were incentivised with money by the central
+government to obtain new voting equipment and in the process
+fell prey to pariahs who sold them a substandard product.
+Diebold was not the only pariah in this area, but one of the
+more notorious ones.
 
 Optical scan machines are slightly better from a security
 point of view but by no means good enough. Their main idea
@@ -307,36 +318,38 @@
 India. Essentially they designed a bespoke voting device,
 which could not be used for anything else. Having a bespoke
 device is a good security engineering decision because it
-makes the attack surface smaller. If you have a full-fledged
-computer behind your system, then you can do everything a
-computer can do\ldots{}that is a lot, including a lot of
-abuse. What was bad that these machines did not have the
-important paper trail: that means if an election was tampered
-with, nobody would find out. Even if they had by their bespoke
-design a very small attack surface, ethical hackers were still
-able to tamper with them. The moral with Indian's voting
-machines is that even if very good security design decisions
-are taken, e-voting is very hard to get right.\bigskip 
+makes the attack surface much smaller. If you have a
+full-fledged computer behind your system, then you can do
+everything a computer can do\ldots{}and that is a lot,
+including a lot of abuse. What was bad about the devices in
+India was that these machines did not have the important paper
+trail: that means if an election was tampered with, nobody
+would find out. Even though they had by their bespoke design a
+very small attack surface, ethical hackers were still able to
+tamper with them. The moral with India's voting machines is
+that even if very good security design decisions are taken,
+e-voting is very hard to get right.\bigskip 
 
 
 \noindent This brings us to the case of Estonia, which held in
 2007 the world's first general election that used the Internet.
-Again their solution made some good choices: for example
-voter authentication is done via the Estonian ID card,
-which contains a chip like credit cards. They also made most
-of their source code public for independent scrutiny. Of
-this openness means that people (hacker) will look at your 
-fingers and find code such as
+Again their solution made some good choices: for example voter
+authentication is done via the Estonian ID card, which
+contains a chip like the ones on credit cards. They also made most of
+their source code public for independent scrutiny. Of course
+this openness means that people (hackers) will look over your
+shoulder and find code such as this snippet.
 
 {\footnotesize\lstinputlisting[language=Python,numbers=none]
 {../progs/estonia.py}}
 
-\noindent which can be downloaded from their github
+\noindent If you want to have a look, their code can be
+downloaded from their github
 repository.\footnote{\url{https://github.com/vvk-ehk/evalimine/}}
 Also their system is designed such that Internet voting is
 used before the election: votes can be changed an unlimited
-amount of times, the last vote is tabulated, you can even
-change your vote on the polling day in person. This is an
+amount of times, only the last vote is tabulated, you can
+even change your vote on the polling day in person. This is an
 important security mechanism guarding against vote coercion,
 which of course is an important problem if you are allowed to
 vote via Internet.
@@ -345,7 +358,7 @@
 voters' computers and the central server. Unfortunately, their
 system is designed such that it needs to trust the integrity
 of voters' computers, central server components and also the
-election staff. In 2014, group of independent observers around
+election staff. In 2014, a group of independent observers around
 Alex Halderman were able to scrutinise the election process in
 Estonia. They found many weaknesses, for example careless
 handling of software updates on the servers. They also
@@ -365,7 +378,7 @@
 \noindent This brings us to the question, what could be a
 viable electronic voting process in
 \underline{\textbf{\emph{theory}}} with current technology?
-In the literature one can find proposals such as
+In the literature one can find proposals such as this one:
 
 \begin{enumerate}
 \item Alice prepares and audits some ballots, then casts an
@@ -379,11 +392,11 @@
 
 \item When the election closes, all votes are shuffled and the
       system produces a non-interactive proof of a correct
-      shuffling. Correct in the sense that one cannot determine
+      shuffling---correct in the sense that one cannot determine
        anymore who has voted for what. This will require a 
-       zero-knowledge-proof based shuffling procedure.
+       shuffling procedure based on zero-knowledge-proofs.
 
-\item After a reasonable complaint period to let auditors
+\item After a reasonable complaint period, during which auditors
      can check the shuffling, all shuffled ballots are decrypted,
       and the system provides a decryption proof for each
       decrypted ballot. Again this will need a 
@@ -397,15 +410,15 @@
 
 \noindent As you can see the whole process is not trivial at
 all and leaves out a number of crucial details (such as how to
-best distribute public keys). It even depends on a highly
-sophisticated process called \emph{zero-knowledge-proofs}.
-They essentially allow one to convince somebody else to know
-a secret without revealing what the secret is. This is a kind
-of cryptographic ``magic'', like the Hellman-Diffie protocol
-which can be used to establish a secret even if you can only
-exchange postcards with your communication partner. We will
-look at zero-knowledge-proofs in a later lecture in more
-detail. 
+best distribute public keys for encryption). It even depends
+on a highly sophisticated process called
+\emph{zero-knowledge-proofs}. They essentially allow one to
+convince somebody else that one knows a secret without actually
+revealing what the secret is. This is a kind of cryptographic
+``magic'', like the Diffie-Hellman protocol which can be used
+to establish a secret even if you can only exchange postcards
+with your communication partner. We will look at
+zero-knowledge-proofs in a later lecture in more detail. 
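To give a first taste, below is a toy Python sketch of one round of a Schnorr-style zero-knowledge identification protocol. The parameters are tiny demo values chosen purely for illustration and are completely insecure: Peggy convinces Victor that she knows $x$ with $y = g^x \bmod p$ without revealing $x$.

```python
import random

# Toy Schnorr-style zero-knowledge identification (one round).
# Tiny, INSECURE demo parameters -- for illustration only.
p, g = 467, 2                   # public: a small prime and a base
x = 153                         # Peggy's secret
y = pow(g, x, p)                # public key y = g^x mod p

r = random.randrange(1, p - 1)  # Peggy picks a random nonce
t = pow(g, r, p)                # ... and sends the commitment t = g^r
c = random.randrange(2)         # Victor replies with a challenge bit
s = (r + c * x) % (p - 1)       # Peggy answers with s = r + c*x

# Victor checks g^s == t * y^c (mod p) without ever learning x
assert pow(g, s, p) == (t * pow(y, c, p)) % p
```

The check works because $g^s = g^{r + cx} = g^r \cdot (g^x)^c = t \cdot y^c \bmod p$; a single round only convinces Victor with probability $1/2$, so the round is repeated many times in practice.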
 
 The point of these theoretical/hot-air musings is to show that
 such an e-voting procedure is far from convenient: it takes
@@ -421,7 +434,7 @@
 secrecy. This is different from online banking where the whole
 process is designed around authentication. If fraud occurs,
 you try to identify who did what (somebody's account went to zero;
-somewhere the money went). Even if there might be even more 
+somewhere the money went). Even if there might be more 
 gigantic sums at stake in online banking than with voting,
 it can be solved. That does not mean there are no problems
 with online banking. But with enough thought, they can
@@ -431,7 +444,7 @@
 
 
 This conclusion does not imply that in some special cases
-Internet voting cannot be made to work securely. Just in a
+Internet voting cannot be made to work securely. Just in a
 general election where stakes are very high, it does not work.
 For example a good-enough and workable in-lecture online
 voting system where students' votes are anonymous and students
Binary file handouts/ho03.pdf has changed
--- a/handouts/ho03.tex	Tue Oct 07 12:48:07 2014 +0100
+++ b/handouts/ho03.tex	Thu Oct 09 14:41:36 2014 +0100
@@ -21,7 +21,9 @@
 computer science students, but who said that criminal hackers
 restrict themselves to everyday fare? Not to mention the
 free-riding script-kiddies who use this technology without
-even knowing what the underlying ideas are.
+even knowing what the underlying ideas are. If you want to be
+a good security engineer who needs to defend against such
+attacks, then you had better know the details.
  
 For buffer overflow attacks to work, a number of innocent
 design decisions, which are really benign on their own, need
@@ -34,7 +36,7 @@
 ``forefathers'' can really be blamed, but as mentioned above
 we should already be way beyond the point that buffer overflow
 attacks are worth a thought. Unfortunately, this is far from
-the truth. I let you think why?
+the truth. I leave it to you to ponder why.
 
 One such ``benign'' design decision is how the memory is laid
 out into different regions for each process. 
@@ -64,18 +66,19 @@
 this region is read-only). The heap stores all data the
 programmer explicitly allocates. For us the most interesting
 region is the stack, which contains data mostly associated
-with the ``control flow'' of the program. Notice that the stack
-grows from a higher addresses to lower addresses. That means 
-that older items on the stack will be stored behind newer 
-items. Let's look a bit closer what happens with the stack.
-Consider the the trivial C program.
+with the control flow of the program. Notice that the stack
+grows from higher addresses to lower addresses. That means
+that older items on the stack will be stored behind, or after,
+newer items. Let's look a bit closer at what happens with the
+stack when a program is running. Consider the following simple
+C program.
  
 \lstinputlisting[language=C]{../progs/example1.c} 
  
-\noindent The main function calls \code{foo} with three
-argument. Foo contains two (local) buffers. The interesting
-point is what will the stack looks like after Line 3 has been
-executed? The answer is as follows:
+\noindent The \code{main} function calls \code{foo} with three
+arguments. \code{Foo} contains two (local) buffers. The
+interesting point for us is what the stack will look
+like after Line 3 has been executed. The answer is as follows:
  
 \begin{center} 
  \begin{tikzpicture}[scale=0.65]
@@ -128,8 +131,42 @@
 needed in order to clean up the stack to the last level---in
 fact there is no cleaning involved, but just the top of the
 stack will be set back. The two buffers are also on the stack,
-because they are local data within \code{foo}.
+because they are local data within \code{foo}. So in the
+middle is a snapshot of the stack after Line 3 has been 
+executed. In case you are familiar with assembly instructions
+you can also read off this behaviour from the machine
+code that the \code{gcc} compiler generates for the program
+above:\footnote{You can make \pcode{gcc} generate assembly
+instructions if you call it with the \pcode{-S} option,
+for example \pcode{gcc -S -o out.s in.c}\;. Or you can look
+at this code by using the debugger. This will be explained
+later.}
 
+\begin{center}\small
+\begin{tabular}[t]{@{}c@{\hspace{8mm}}c@{}}
+{\lstinputlisting[language={[x86masm]Assembler},
+  morekeywords={movl},xleftmargin=5mm]
+  {../progs/example1a.s}} &
+{\lstinputlisting[language={[x86masm]Assembler},
+  morekeywords={movl,movw},xleftmargin=5mm]
+  {../progs/example1b.s}}  
+\end{tabular}
+\end{center}
+
+\noindent On the left you can see how the function
+\pcode{main} prepares the stack in Lines 2 to 7, before
+calling the function \pcode{foo}. You can see that the
+numbers 3, 2, 1 are stored on the stack (the register
+\code{$esp} refers to the top of the stack). On the right
+you can see how the function \pcode{foo} stores the two local
+buffers onto the stack and initialises them with the given
+data (Lines 2 to 9). Since there is no real computation
+going on inside \pcode{foo}, the function then just restores
+the stack to its old state and crucially sets the return
+address where the computation should resume (Line 9 in the
+code on the left-hand side). The instruction \code{ret} then
+transfers control back to the function \pcode{main}, to the
+instruction just after the call, namely Line 9.
  
 Another part of the ``conspiracy'' is that library functions
 in C look typically as follows:
@@ -142,13 +179,204 @@
 to a destination \pcode{dst}. The important point is that it
 copies the data until it reaches a zero-byte (\code{"\\0"}). 
 
+The central idea of the buffer overflow attack is to overwrite
+the return address on the stack which states where the control
+flow of the program should resume once the function at hand
+has finished its computation. So if we have somewhere in a
+function a local buffer, say
+
+\begin{center}
+\code{char buf[8];}
+\end{center}
+
+\noindent
+then the corresponding stack will look as follows
+
+\begin{center}
+ \begin{tikzpicture}[scale=0.65]
+  %\draw[step=1cm] (-3,-1) grid (3,8);
+  \draw[gray!20,fill=gray!20] (-1, 0) rectangle (1,-1);
+  \draw[line width=1mm] (-1,-1.2) -- (-1,6.4);
+  \draw[line width=1mm] ( 1,-1.2) -- ( 1,6.4);
+  \draw (0,-1) node[anchor=south] {\tt main};
+  \draw[line width=1mm] (-1,0) -- (1,0);
+  \draw (0,0) node[anchor=south] {\tt arg$_3$=3};
+  \draw[line width=1mm] (-1,1) -- (1,1);
+  \draw (0,1) node[anchor=south] {\tt arg$_2$=2};
+  \draw[line width=1mm] (-1,2) -- (1,2);
+  \draw (0,2) node[anchor=south] {\tt arg$_1$=1};
+  \draw[line width=1mm] (-1,3) -- (1,3);
+  \draw (0,3.1) node[anchor=south] {\tt ret};
+  \draw[line width=1mm] (-1,4) -- (1,4);
+  \draw (0,4) node[anchor=south] {\small\tt last sp};
+  \draw[line width=1mm] (-1,5) -- (1,5);
+  \draw (0,5) node[anchor=south] {\tt buf};
+  \draw[line width=1mm] (-1,6) -- (1,6);
+  \draw (2,5.1) node[anchor=south] {\code{$esp}};
+  \draw[<-,line width=0.5mm] (1.1,6) -- (2.5,6);
+
+  \draw[->,line width=0.5mm] (1,4.5) -- (2.5,4.5);
+  \draw (2.6,4.1) node[anchor=south west] {\code{??}};
+  
+  \draw[->,line width=0.5mm] (1,3.5) -- (2.5,3.5);
+  \draw (2.6,3.1) node[anchor=south west] {\tt jump to \code{\\x080483f4}};
+\end{tikzpicture}
+\end{center}
+
+\noindent We need to fill this buffer over its limit of
+8 characters so that it overwrites the stack pointer
+and then the return address. If, for example,
+we want to jump to a specific address in memory, say,
+\pcode{\\x080483f4}, then we need to fill the
+buffer as follows
+
+\begin{center}
+\code{char buf[8] = "AAAAAAAABBBB\\xf4\\x83\\x04\\x08";}
+\end{center}
+ 
+\noindent The first 8 \pcode{A}s fill the buffer to the rim;
+the next four \pcode{B}s overwrite the stack pointer (with
+what data we overwrite this part is usually not important);
+then comes the address we want to jump to. Notice that we have
+to give the address in reverse byte order; Intel CPUs are
+little-endian and store addresses this way. Since the string is
+enclosed in double quotes, the C convention is that the string
+internally will automatically be terminated by a zero-byte. If
+the programmer uses functions like \pcode{strcpy} for filling
+the buffer \pcode{buf}, then we can be sure it will overwrite
+the stack in this manner---since it will copy everything up
+to the zero-byte.
+
+What the outcome of such an attack is can be illustrated with
+the code shown in Figure~\ref{C2}. Under ``normal operation''
+this program asks for a login name and a password (both are
+represented as strings and stored in buffers of length 8).
+The function \pcode{match} tests whether two such
+strings are equal. If yes, then the function lets you in
+(by printing \pcode{Welcome}). If not, it denies access
+(by printing \pcode{Wrong identity}). The vulnerable function
+is \code{get_line} in Lines 11 to 19. This function does not
+take any precautions about the buffer of 8 characters being
+filled beyond this 8-character limit. The buffer overflow
+can be triggered by inputting something, like \pcode{foo}, for
+the login name and then the specially crafted string as
+password:
+
+\begin{center}
+\code{AAAAAAAABBBB\\x2c\\x85\\x04\\x08\\n}
+\end{center}
+
+\noindent The address happens to be the one for the function
+\pcode{welcome()}. This means even with this input (where the
+login name and password clearly do not match) the program will
+still print out \pcode{Welcome}. The only information we need
+for this attack is to know where the function
+\pcode{welcome()} starts in memory. This information can be
+easily obtained by starting the program inside the debugger
+and disassembling this function. 
+
+\begin{lstlisting}[numbers=none,language={[x86masm]Assembler},
+  morekeywords={movl,movw}]
+$ gdb C2
+GNU gdb (GDB) 7.2-ubuntu
+(gdb) disassemble welcome
+\end{lstlisting}
+
+\noindent 
+The output will be something like this:
+
+\begin{lstlisting}[numbers=none,language={[x86masm]Assembler},
+  morekeywords={movl,movw}]
+0x0804852c <+0>:     push   %ebp
+0x0804852d <+1>:     mov    %esp,%ebp
+0x0804852f <+3>:     sub    $0x4,%esp
+0x08048532 <+6>:     movl   $0x8048690,(%esp)
+0x08048539 <+13>:    call   0x80483a4 <puts@plt>
+0x0804853e <+18>:    movl   $0x0,(%esp)
+0x08048545 <+25>:    call   0x80483b4 <exit@plt>
+\end{lstlisting}
+
+\noindent indicating that the function \pcode{welcome()}
+starts at address \pcode{0x0804852c}.
+
 \begin{figure}[p]
 \lstinputlisting[language=C]{../progs/C2.c}
 \caption{A suspicious login implementation.\label{C2}}
 \end{figure}
 
+This kind of attack was very popular with commercial programs
+that needed a key to be unlocked. Historically, hackers first 
+broke the rather weak encryption of these locking mechanisms.
+After the encryption had been made stronger, hackers used
+buffer overflow attacks as shown above to jump directly to
+the part of the program that was intended to be only available
+after the correct key was typed in by the user. 
+
+\subsection*{Payloads}
+
+Unfortunately, much more harm can be caused by buffer overflow
+attacks. This is achieved by injecting code that will be run
+once the return address is appropriately modified. Typically
+the code that will be injected starts a shell. In
+order to be sent as part of the string that is overflowing the
+buffer, this code needs to be encoded as a sequence of
+characters
+
+\lstinputlisting[language=C,numbers=none]{../progs/o1.c}
+
+\noindent These characters represent the machine code
+for opening a shell. It seems obtaining such a string
+requires a higher education in the architecture of the
+target system. But it is actually relatively simple: First,
+there are many ready-made strings available---just a quick
+Google query away. Second, tools like the debugger can help
+us again. We can just write the code we want in C; for
+example, the following program starts a shell
+
+\lstinputlisting[language=C,numbers=none]{../progs/shell.c} 
+
+\noindent Once compiled, we can use the debugger to obtain
+the machine code, or even the ready-made encoding as a
+character sequence.
+
+While easy in principle, obtaining this string is not entirely
+trivial. Remember that the functions in C that copy or fill
+buffers work such that they copy everything until a zero byte
+is reached. Unfortunately the ``vanilla'' output from the
+debugger for the shell-program will contain such zero bytes. So
+a post-processing phase is needed to rewrite the machine code
+such that it does not contain any zero bytes. This is like
+some works of literature that have been rewritten so that the
+letter `i', for example, is avoided. For rewriting the machine
+code you might need to use clever tricks like
+
+\begin{lstlisting}[numbers=none,language={[x86masm]Assembler}]
+xor %eax, %eax
+\end{lstlisting}
+
+\noindent This instruction does not contain any zero byte in
+its encoding, but it sets the register \code{\%eax} to zero,
+which can then, for example, be pushed onto the stack.
+
+Having removed the zero bytes we can craft the string that
+will be sent to our target computer. It is typically of the
+form
+
+\begin{center}
+  \begin{tikzpicture}[scale=0.7]
+  \draw[line width=1mm] (-2, -1) rectangle (2,3);
+  \draw[line width=1mm] (-2,1.9) -- (2,1.9);
+  \draw (0,2.5) node {\large\tt shell code};
+  \draw[line width=1mm,fill=black] (0.3, -1) rectangle (2,-0.7);
+  \draw[->,line width=0.3mm] (1.05, -1) -- (1.05,-1.7) --
+  (-3,-1.7) -- (-3, 3.7) -- (-1.9, 3.7) -- (-1.9, 3.1);
+  \draw (-2, 3) node[anchor=north east] {\LARGE\tt "};
+  \draw ( 2,-0.9) node[anchor=west] {\LARGE\tt "};
+  \end{tikzpicture}
+\end{center}
+
+
 \bigskip\bigskip
-\subsubsection*{A Crash-Course on GDB}
+\subsubsection*{A Crash-Course for GDB}
 
 \begin{itemize}
 \item \texttt{(l)ist n} -- listing the source file from line