# HG changeset patch # User Christian Urban # Date 1506192747 -3600 # Node ID 7a6e8f603e084c79ed083eb68dae10fb95d8af4c # Parent 280e057558b87a77e8ecb51de588adcef7fe6517 updated diff -r 280e057558b8 -r 7a6e8f603e08 handouts/ho01.pdf Binary file handouts/ho01.pdf has changed diff -r 280e057558b8 -r 7a6e8f603e08 handouts/ho02.pdf Binary file handouts/ho02.pdf has changed diff -r 280e057558b8 -r 7a6e8f603e08 handouts/ho03.pdf Binary file handouts/ho03.pdf has changed diff -r 280e057558b8 -r 7a6e8f603e08 handouts/ho04.pdf Binary file handouts/ho04.pdf has changed diff -r 280e057558b8 -r 7a6e8f603e08 handouts/ho05.pdf Binary file handouts/ho05.pdf has changed diff -r 280e057558b8 -r 7a6e8f603e08 handouts/ho06.pdf Binary file handouts/ho06.pdf has changed diff -r 280e057558b8 -r 7a6e8f603e08 handouts/ho06.tex --- a/handouts/ho06.tex Sat Sep 23 19:39:53 2017 +0100 +++ b/handouts/ho06.tex Sat Sep 23 19:52:27 2017 +0100 @@ -1,620 +1,559 @@ \documentclass{article} \usepackage{../style} +\usepackage{../graphics} \begin{document} \fnote{\copyright{} Christian Urban, King's College London, 2014, 2015} -\section*{Handout 6 (Zero-Knowledge Proofs)} +%https://www.theguardian.com/technology/2016/oct/04/yahoo-secret-email-program-nsa-fbi +%https://nakedsecurity.sophos.com/2015/11/12/california-collects-owns-and-sells-infants-dna-samples/ +%http://randomwalker.info/teaching/fall-2012-privacy-technologies/? +%https://josephhall.org/papers/NYU-MCC-1303-S2012_privacy_syllabus.pdf +%http://www.jetlaw.org/wp-content/uploads/2014/06/Bambauer_Final.pdf +%http://www.cs.cmu.edu/~yuxiangw/docs/Differential%20Privacy.pdf +%https://www.youtube.com/watch?v=Gx13lgEudtU +%https://fpf.org/wp-content/uploads/Differential-Privacy-as-a-Response-to-the-Reidentification-Threat-Klinefelter-and-Chin.pdf +%http://research.neustar.biz/2014/09/08/differential-privacy-the-basics/ + +%===== +%Tim Greene, Network World, 17 Dec 2015 (via ACM TechNews, 18 Dec 2015) +% +%Massachusetts Institute of Technology (MIT) researchers' experimental +%Vuvuzela messaging system offers more privacy than The Onion Router (Tor) by +%rendering text messages sent through it untraceable. MIT Ph.D. student +%David Lazar says Vuvuzela resists traffic analysis attacks, while Tor +%cannot. The researchers say the system functions no matter how many parties +%are using it to communicate, and it employs encryption and a set of servers +%to conceal whether or not parties are participating in text-based dialogues. +%"Vuvuzela prevents an adversary from learning which pairs of users are +%communicating, as long as just one out of [the] servers is not compromised, +%even for users who continue to use Vuvuzela for years," they note. Vuvuzela +%can support millions of users hosted on commodity servers deployed by a +%single group of users. Instead of anonymizing users, Vuvuzela prevents +%outside observers from differentiating between people sending messages, +%receiving messages, or neither, according to Lazar. The system imposes +%noise on the client-server traffic which cannot be distinguished from actual +%messages, and all communications are triple-wrapped in encryption by three +%servers. "Vuvuzela guarantees privacy as long as one of the servers is +%uncompromised, so using more servers increases security at the cost of +%increased message latency," Lazar notes. +%http://orange.hosting.lsoft.com/trk/click?ref=znwrbbrs9_5-e70bx2d991x066779& -Zero-knowledge proofs (short ZKP) solve a paradoxical puzzle: -How to convince somebody else that one knows a secret, without -revealing what the secret actually is? This sounds like a -problem the Mad Hatter from Alice in Wonderland would occupy -himself with, but actually there some serious and not so -serious applications of it. For example, if you solve -crosswords with your friend, say Bob, you might want to -convince him that you found a solution for one question, but -of course you do not want to reveal the solution, as this -might give Bob an advantage somewhere else in the crossword. +%%%% +%% canvas tracking +%%https://freedom-to-tinker.com/blog/englehardt/the-princeton-web-census-a-1-million-site-measurement-and-analysis-of-web-privacy/ + +%%% +%% cupit re-identification attack +%% https://nakedsecurity.sophos.com/2016/05/20/published-personal-data-on-70000-okcupid-users-taken-down-after-dmca-order/?utm_source=feedburner&utm_medium=feed&utm_campaign=Feed%3A+nakedsecurity+%28Naked+Security+-+Sophos%29 + +%Differential privacy +%===================== +%https://www.wired.com/2016/06/apples-differential-privacy-collecting-data/ + +%Differential privacy, translated from Apple-speak, is the +%statistical science of trying to learn as much as possible +%about a group while learning as little as possible about any +%individual in it. -So how to convince Bob that you know the answer (or a secret)? -One way would be to come up with the following protocol: -Suppose the answer is \textit{folio}. Then look up the -definition of \textit{folio} in a dictionary. Say you find: +%As Roth notes when he refers to a “mathematical proof,” +%differential privacy doesn’t merely try to obfuscate or +%“anonymize” users’ data. That anonymization approach, he +%argues, tends to fail. In 2007, for instance, Netflix released +%a large collection of its viewers’ film ratings as part of a +%competition to optimize its recommendations, removing people’s +%names and other identifying details and publishing only their +%Netflix ratings. But researchers soon cross-referenced the +%Netflix data with public review data on IMDB to match up +%similar patterns of recommendations between the sites and add +%names back into Netflix’s supposedly anonymous database. + +%As an example of that last method, Microsoft’s Dwork points to +%the technique in which a survey asks if the respondent has +%ever, say, broken a law. But first, the survey asks them to +%flip a coin. If the result is tails, they should answer +%honestly. If the result is heads, they’re instructed to flip +%the coin again and then answer “yes” for heads or “no” for +%tails. The resulting random noise can be subtracted from the +%results with a bit of algebra, and every respondent is +%protected from punishment if they admitted to lawbreaking. -\begin{quote} -``an \textit{individual} leaf of paper or parchment, either -loose as one of a series or forming part of a bound volume, -which is numbered on the recto or front side only.'' -\end{quote} +%https://www.cis.upenn.edu/~aaroth/Papers/privacybook.pdf + +% Windows 10 data send back to Microsoft (Cortana) +%Here’s a non-exhaustive list of data sent back: location data, text +%input, voice input, touch input, webpages you visit, and telemetry +%data regarding your general usage of your computer, including which +%programs you run and for how long. -\noindent -Take the first non-small word\footnote{Let's say the, a, an, -or, and \ldots fall into the category of small words.} in this definition, -in this case \textit{individual}, and look up the definition -of this word, say +% Businesses are already using customised pricing online based on +% information they can glean about you. It is hard to know how +% widespread the practice is; companies keep their pricing strategies +% closely guarded and are wary of the bad PR price discrimination +% could pose. However, it is clear that a number of large retailers +% are experimenting with it. Staples, for example, has offered +% discounted prices based on whether rival stores are within 20 miles +% of its customers’ location. Office Depot has admitted to using its +% customers’ browsing history and location to vary its range of offers +% and products. A 2014 study from Northeastern University found +% evidence of “steering” or differential pricing at four out of 10 +% general merchandise websites and five out of five travel +% websites. (Steering is when a company doesn’t give you a customised +% price, but points you towards more expensive options if it thinks +% you will pay more.) The online travel company Orbitz raised +% headlines in 2012 when it emerged that the firm was pointing Mac +% users towards higher-priced hotel rooms than PC users. + + +%%% government will overwrite your wishes if it is annoymous +%% https://www.lightbluetouchpaper.org/2016/12/05/government-u-turn-on-health-privacy/ + +%% corporate surveilance / privacy - report and CC3C talk +%% http://crackedlabs.org/en/networksofcontrol +%% https://media.ccc.de/v/33c3-8414-corporate_surveillance_digital_tracking_big_data_privacy#video&t=2933 + +\section*{Handout 6 (Privacy)} -\begin{quote} -``a single \textit{human} being as distinct from a group'' -\end{quote} +The first motor car was invented around 1886. For ten years, +until 1896, the law in the UK (and elsewhere) required a +person to walk in front of any moving car waving a red flag. +Cars were such a novelty that most people did not know what to +make of them. The person with the red flag was intended to +warn the public, for example horse owners, about the impending +novelty---a car. In my humble opinion, we are at the same +stage of development with privacy. Nobody really knows what it +is about or what it is good for. All seems very hazy. There +are a few laws (e.g.~cookie law, right-to-be-forgotten law) +which address problems with privacy, but even if they are well +intentioned, they either back-fire or are already obsolete +because of newer technologies. The result is that the world of +``privacy'' looks a little bit like the old Wild +West---lawless and mythical. -\noindent -In this definition take the second non-small word, that -is \textit{human}, and again look up the definition of this -word. This will yield +We would have hoped that after Snowden, Western governments +would be a bit more sensitive and enlightned about the topic +of privacy, but this is far from the truth. Ross Anderson +wrote the following in his blog\footnote{\url{https://www.lightbluetouchpaper.org/2016/02/11/report-on-the-ip-bill/}} about the approach taken in +the US to lessons learned from the Snowden leaks and contrasts +this with the new snooping bill that is considered in the UK +parliament: -\begin{quote} -``relating to or \textit{characteristic} of humankind'' +\begin{quote}\it +``The comparison with the USA is stark. There, all three +branches of government realised they'd gone too far after +Snowden. President Obama set up the NSA review group, and +implemented most of its recommendations by executive order; +the judiciary made changes to the procedures of the FISA +Court; and Congress failed to renew the data retention +provisions in the Patriot Act (aided by the judiciary). Yet +here in Britain the response is just to take Henry VIII powers +to legalise all the illegal things that GCHQ had been up to, +and hope that the European courts won't strike the law down +yet again.'' \end{quote} -\noindent -You could go on looking up the definition of the third -non-small word in this definition and so on. But let us assume -you agreed with Bob to stop after three iterations with the -third non-article word in the last definition, that is -\textit{or}. Now, instead of sending to Bob the solution -\textit{folio}, you send to him \textit{characteristic}. +\noindent Unfortunately, also big organisations besides +governments seem to take an unenlightened approach to privacy. +For example, UCAS, a charity set up to help students with +applying to universities in the UK, has a commercial unit that +happily sells your email addresses to anybody who forks out +enough money for bombarding you with spam. Yes, you can opt +out very often from such ``schemes'', but in case of UCAS any +opt-out will limit also legit emails you might actually be +interested in.\footnote{The main objectionable point, in my +opinion, is that the \emph{charity} everybody has to use for +HE applications has actually very honourable goals +(e.g.~assist applicants in gaining access to universities), +but the small print (or better the link ``About us'') reveals +they set up their organisation so that they can also +shamelessly sell the email addresses they ``harvest''. +Everything is of course very legal\ldots{}ethical?\ldots{}well +that is in the eye of the beholder. See: -How can Bob verify that you know the solution? Well, once he -solved it himself, he can use the dictionary and follow the -same ``trail'' as you did. If the final word agrees with what -you had sent him, he must infer you knew the solution earlier -than him. This protocol works like a one-way hash function -because \textit{characteristic} does not give any hint as to -what was the first word was. I leave you to think why this -protocol avoids small words? +\url{http://www.ucas.com/about-us/inside-ucas/advertising-opportunities} +or +\url{http://www.theguardian.com/uk-news/2014/mar/12/ucas-sells-marketing-access-student-data-advertisers}} -After Bob found his solution and verified that according to -the protocol it ``maps'' also to \textit{characteristic}, can -he be entirely sure no cheating is going on? Not with 100\% -certainty. It could have been possible that he was given -\textit{characteristic} as the word, but it derived from a -different word. This might seem very unlikely, but at least -theoretical it is a possibility. Protocols based on -zero-knowledge proofs will produce a similar result---they -give an answer that might be erroneous in a very small number -of cases. The point is to iterate them long enough so that the -theoretical possibility of cheating is negligibly small. +Another example: Verizon, an ISP who is supposed to provide +you just with connectivity, has found a ``nice'' side-business +too: When you have enabled all privacy guards in your browser +(the few you have at your disposal), Verizon happily adds a +kind of cookie to your +HTTP-requests.\footnote{\url{http://webpolicy.org/2014/10/24/how-verizons-advertising-header-works/}} +As shown in the picture below, this cookie will be sent to +every web-site you visit. The web-sites then can forward the +cookie to advertisers who in turn pay Verizon to tell them +everything they want to know about the person who just made +this request, that is you. + +\begin{center} +\includegraphics[scale=0.16]{../pics/verizon.png} +\end{center} -By the way, the authors of the paper ``Dismantling Megamos -Crypto: Wirelessly Lockpicking a Vehicle Immobilizer'' who -were barred from publishing their results used also a hash to -prove they did the work and (presumably) managed to get into -cars without a key; see Figure~\ref{paper}. This is very -similar to the method above about crosswords: They like to -prove that they did the work, but not giving out the -``solution''. But this also shows what the problem with such a -method is: yes, we can hide the secret temporarily, but if -somebody else wants to verify it, then the secret has to be -made public. Bob needs to know that \textit{folio} is the -solution before he can verify the claim of Alice that she had -the solution first. Similarly with the car-crypto paper: we -needed to wait until September 2015 when the authors were -finally able to publish their findings in order to verify the -hash. Zero-knowledge proofs, in contrast, can be immediately -checked, even if the secret is not public yet and perhaps -never will be. - -\begin{figure} -\begin{center} -\addtolength{\fboxsep}{4mm} -\fbox{\includegraphics[scale=0.4]{../pics/Dismantling_Megamos_Crypto.png}} -\end{center} -\caption{The authors of this paper used a hash in order to prove -later that they have managed to break into cars.\label{paper}} -\end{figure} +\noindent How disgusting! Even worse, Verizon is not known for +being the cheapest ISP on the planet (completely the +contrary), and also not known for providing the fastest +possible speeds, but rather for being among the few ISPs in +the US with a quasi-monopolistic ``market distribution''. -\subsubsection*{ZKP: An Illustrative Example} +Well, we could go on and on\ldots{}and that has not even +started us yet with all the naughty things NSA \& Friends are +up to. Why does privacy actually matter? Nobody, I think, has +a conclusive answer to this question yet. Maybe the following +four notions help with clarifying the overall picture +somewhat: -The idea behind zero-knowledge proofs is not very obvious and -will surely take some time for you to digest. Therefore let us -start with a simple illustrative example. The example will not -be perfect, but hopefully explain the gist of the idea. The -example is called Alibaba's cave, which graphically looks as -follows:\footnote{The example is taken from an -article titled ``How to Explain Zero-Knowledge Protocols -to Your Children'' available from -\url{http://pages.cs.wisc.edu/~mkowalcz/628.pdf}.} +\begin{itemize} +\item \textbf{Secrecy} is the mechanism used to limit the + number of principals with access to information (e.g., + cryptography or access controls). For example I better + keep my password secret, otherwise people from the wrong + side of the law might impersonate me. + +\item \textbf{Confidentiality} is the obligation to protect + the secrets of other people or organisations (secrecy + for the benefit of an organisation). For example as a + staff member at King's I have access to data, even + private data, I am allowed to use in my work but not + allowed to disclose to anyone else. -\begin{center} -\begin{tabular}{c@{\hspace{8mm}}c@{\hspace{8mm}}c} -\includegraphics[scale=0.1]{../pics/alibaba1.png} & -\includegraphics[scale=0.1]{../pics/alibaba2.png} & -\includegraphics[scale=0.1]{../pics/alibaba3.png} \\ -Step 1 & Step 2 & Step 3 -\end{tabular} -\end{center} +\item \textbf{Anonymity} is the ability to leave no evidence of + an activity (e.g., sharing a secret). This is not equal + with privacy---anonymity is required in many + circumstances, for example for whistle-blowers, + voting, exam marking and so on. -\noindent Let us take a closer look at the picture in Step 1: -The cave has a tunnel which forks at some point. Both forks -are connected in a loop. At the deep end of the loop is a -magic wand. The point of the magic wand is that Alice knows -the secret word for how to open it. She wants to keep the word -secret, but wants to convince Bob that she knows it. +\item \textbf{Privacy} is the ability or right to protect your + personal secrets (secrecy for the benefit of an + individual). For example, in a job interview, I might + not like to disclose that I am pregnant, if I were a + woman, or that I am a father. Lest they might not hire + me. Similarly, I might not like to disclose my location + data, because thieves might break into my house if they + know I am away at work. Privacy is essentially + everything which ``shouldn't be anybody's business''. + +\end{itemize} + +\noindent While this might provide us with some rough +definitions, the problem with privacy is that it is an +extremely fine line what should stay private and what should +not. For example, since I am working in academia, I am every +so often very happy to be a digital exhibitionist: I am very +happy to disclose all `trivia' related to my work on my +personal web-page. This is a kind of bragging that is normal +in academia (at least in the field of CS), even expected if +you look for a job. I am even happy that Google maintains a +profile about all my academic papers and their citations. -One way of course would be to let Bob follow her, but then he -would also find out the secret. So this does not work. To find -a solution, let us first fix the rules: At the beginning Alice -and Bob are outside the cave. Alice goes in alone and takes -either tunnel labelled $A$ in the picture, or the other tunnel -labelled $B$. She waits at the magic wand for the instructions -from Bob, who also goes into the gave and observes what -happens at the fork. He has no knowledge which tunnel Alice -took and calls out (in Step 2) that she should emerge from tunnel -$A$, for example. If she knows the secret for opening the -wand, this will not be a problem for Alice. If she was already -in the $A$-segment of the tunnel, then she just comes back. If -she was in the $B$-segment of the tunnel she will say the magic -word and goes through the wand to emerge from $A$ as requested -by Bob. +On the other hand I would be very irritated if anybody I do +not know had a too close look on my private live---it +shouldn't be anybody's business. The reason is that knowledge +about my private life can often be used against me. As mentioned +above, public location data might mean I get robbed. If +supermarkets build a profile of my shopping habits, they will +use it to \emph{their} advantage---surely not to \emph{my} +advantage. Also whatever might be collected about my life will +always be an incomplete, or even misleading, picture. For +example I am pretty sure my creditworthiness score was +temporarily(?) destroyed by not having a regular income in +this country (before coming to King's I worked in Munich for +five years). To correct such incomplete or flawed credit +history data there is, since recently, a law that allows you +to check what information is held about you for determining +your creditworthiness. But this concerns only a very small +part of the data that is held about me/you. Also +what about cases where data is wrong or outdated (but do we +need a right-to be forgotten). + + +To see how private matter can lead really to the wrong +conclusions, take the example of Stephen Hawking: When he was +diagnosed with his disease, he was given a life expectancy of +two years. If employers would know about such problems, would +they have employed Hawking? Now, he is enjoying his 70+ +birthday. Clearly personal medical data needs to stay private. + +To cut a long story short, I let you ponder about the two +statements which are often voiced in discussions about privacy: -Let us have a look at the case where Alice cheats, that is not -knows the secret. She would still go into the cave and use, -for example the $B$-segment of the tunnel. If now Bob says she -should emerge from $B$, she is lucky. But if he says she -should emerge from $A$ then Alice is in trouble: Bob will find -out she does not actually know the secret. So in order to fool -Bob she needs to anticipate his call, and already go into the -corresponding tunnel. This of course also does not work, since -Bob can make any call he wants. Consequently in order to find -out whether Alice cheats, Bob just needs to repeat this -protocol many times. Each time Alice has a chance of -$\frac{1}{2}$ to be lucky or being found out. Iterating this -$n$ times means she must be right every time and when -cheating: the probability for this is $\frac{1}{2}^n$, number -that for already relatively small $n$, say 10, is incredibly -small. +\begin{itemize} +\item \textit{``You have zero privacy anyway. Get over +it.''}\\ +\mbox{}\hfill{}{\small{}(by Scott Mcnealy, former CEO of Sun)} + +\item \textit{``If you have nothing to hide, you have nothing +to fear.''} +\end{itemize} + +\noindent If you like to watch a movie which has this topic as +its main focus I recommend \emph{Gattaca} from +1997.\footnote{\url{http://www.imdb.com/title/tt0119177/}} If +you want to read up on this topic, I can recommend the +following article that appeared in 2011 in the Chronicle of +Higher Education: + +\begin{center} +\url{http://chronicle.com/article/Why-Privacy-Matters-Even-if/127461/} +\end{center} + +\noindent Funnily, or maybe not so funnily, the author of this +article carefully tries to construct an argument that does not +only attack the nothing-to-hide statement in cases where +governments \& co collect people's deepest secrets, or +pictures of people's naked bodies, but an argument that +applies also in cases where governments ``only'' collect data +relevant to, say, preventing terrorism. The fun is of course +that in 2011 we could just not imagine that respected +governments would do such infantile things as intercepting +people's nude photos. Well, since Snowden we know some people +at the NSA did exactly that and then shared such photos among +colleagues as ``fringe benefit''. -There are some interesting observations we can make about -Alibaba's cave and the ZKP protocol between Alice and Bob: +\subsubsection*{Re-Identification Attacks} + +Apart from philosophical musings, there are fortunately also +some real technical problems with privacy. The problem I want +to focus on in this handout is how to safely disclose datasets +containing potentially very private data, say health records. +What can go wrong with such disclosures can be illustrated +with four well-known examples: \begin{itemize} -\item {\bf Completeness} If Alice knows the secret, Bob - accepts Alice ``proof'' for sure. There is no error - possible in that Bob thinks Alice cheats when she - actually knows the secret. -\item {\bf Soundness} If Alice does not know the secret, - Bob accepts her ``proof'' with a very small probability. - If, as in the example above, the probability of being - able to hide cheating is $\frac{1}{2}$ in each round - it will be $\frac{1}{2}^n$ after $n$-rounds, which even - for small $n$ say $> 10$ is very small indeed. -\item {\bf Zero-Knowledge} Even if Bob accepts - the proof by Alice, he cannot convince anybody else. -\end{itemize} +\item In 2006, a then young company called Netflix offered a 1 + Mio \$ prize to anybody who could improve their movie + rating algorithm. For this they disclosed a dataset + containing 10\% of all Netflix users at the time + (appr.~500K). They removed names, but included numerical + ratings of movies as well as times when ratings were + uploaded. Though some information was perturbed (i.e., + slightly modified). + + Two researchers had a closer look at this anonymised + data and compared it with public data available from the + International Movie Database (IMDb). They found that + 98\% of the entries could be re-identified in the + Netflix dataset: either by their ratings or by the dates + the ratings were uploaded. The result was a class-action + suit against Netflix, which was only recently resolved + involving a lot of money. -\noindent The last property is the most interesting one. -Assume Alice has convinced Bob that she knows the secret and -Bob filmed the whole protocol with a camera. Can he use the -video to convince anybody else? After a moment of thought, you -will agree that this is not the case. Alice and Bob might have -just made it all up and colluded by Bob telling Alice -beforehand which tunnel he will call. In this way it appears -as if all iterations of the protocol were successful, but they -prove nothing. If another person wants to find out whether -Alice knows the secret, he or she would have to conduct the -protocol again. This is actually the formal definition of a -zero-knowledge proof: an independent observer cannot -distinguish between a real protocol (where Alice knows the -secret) and a fake one (where Bob and Alice colluded). +\item In the 1990ies, medical datasets were often made public + for research purposes. This was done in anonymised form + with names removed, but birth dates, gender and ZIP-code + were retained. In one case where such data about + hospital visits of state employees in Massachusetts was + made public, the then governor assured the public that + the released dataset protected patient privacy by + deleting identifiers. + + A graduate student could not resist cross-referencing + public voter data with the released data that still + included birth dates, gender and ZIP-code. The result + was that she could send the governor his own hospital + record. It turns out that birth dates, gender and + ZIP-code uniquely identify 87\% of people in the US. + This work resulted in a number of laws prescribing which + private data cannot be released in such datasets. + +\item In 2006, AOL published 20 million Web search queries + collected from 650,000 users (names had been deleted). + This was again done for research purposes. However, + within days an old lady, Thelma Arnold, from Lilburn, + Georgia, (11,596 inhabitants) was identified as user + No.~4417749 in this dataset. It turned out that search + engine queries are deep windows into people's private + lives. + +\item Genome-Wide Association Studies (GWAS) was a public + database of gene-frequency studies linked to diseases. + It would essentially record that people who have a + disease, say diabetes, have also certain genes. In order + to maintain privacy, the dataset would only include + aggregate information. In case of DNA data this + aggregation was achieved by mixing the DNA of many + individuals (having a disease) into a single solution. + Then this mixture was sequenced and included in the + dataset. The idea was that the aggregate information + would still be helpful to researchers, but would protect + the DNA data of individuals. + + In 2007 a forensic computer scientist showed that + individuals can still be identified. For this he used + the DNA data from a comparison group (people from the + general public) and ``subtracted'' this data from the + published data. He was left with data that included all + ``special'' DNA-markers of the individuals present in + the original mixture. He essentially deleted the + ``background noise'' in the published data. The problem + with DNA data is that it is of such a high resolution + that even if the mixture contained maybe 100 + individuals, you can with current technology detect + whether an individual was included in the mixture or + not. + + This result changed completely how DNA data is nowadays + published for research purposes. After the success of + the human-genome project with a very open culture of + exchanging data, it became much more difficult to + anonymise data so that patient's privacy is preserved. + The public GWAS database was taken offline in 2008. + +\end{itemize} + +\noindent There are many lessons that can be learned from +these examples. One is that when making datasets public in +anonymised form, you want to achieve \emph{forward privacy}. +This means, no matter what other data that is also available +or will be released later, the data in the original dataset +does not compromise an individual's privacy. This principle +was violated by the availability of ``outside data'' in the +Netflix and governor of Massachusetts cases. The additional +data permitted a re-identification of individuals in the +dataset. In case of GWAS a new technique of re-identification +compromised the privacy of people in the dataset. The case of +the AOL dataset shows clearly how incomplete such data can be: +Although the queries uniquely identified the older lady, she +also looked up diseases that her friends had, which had +nothing to do with her. Any rational analysis of her query +data must therefore have concluded, the lady is on her +death bed, while she was actually very much alive and kicking. + +In 2016, Yahoo released the so far largest machine learning +dataset to the research community. It includes approximately +13.5 TByte of data representing around 100 Billion events from +anonymized user-news items, collected by recording +interactions of about 20M users from February 2015 to May +2015. Yahoo's gracious goal is to promote independent research +in the fields of large-scale machine learning and recommender +systems. It remains to be seen whether this data will really +only be used for that purpose. + +\subsubsection*{Differential Privacy} + +Differential privacy is one of the few methods that tries to +achieve forward privacy. The basic idea is to add appropriate +noise, or errors, to any query of the dataset. The intention +is to make the result of a query insensitive to individual +entries in the database. That means the results are +approximately the same no matter if a particular individual is +in the dataset or not. The hope is that the added error does +not eliminate the ``signal'' one is looking for in the +dataset. + +%\begin{center} +%User\;\;\;\; +%\begin{tabular}{c} +%tell me $f(x)$ $\Rightarrow$\\ +%$\Leftarrow$ $f(x) + \text{noise}$ +%\end{tabular} +%\;\;\;\;\begin{tabular}{@{}c} +%Database\\ +%$x_1, \ldots, x_n$ +%\end{tabular} +%\end{center} +% +%\begin{center} +%\begin{tabular}{l|l} +%Staff & Salary\\\hline +%$PM$ & \pounds{107}\\ +%$PF$ & \pounds{102}\\ +%$LM_1$ & \pounds{101}\\ +%$LF_2$ & \pounds{97}\\ +%$LM_3$ & \pounds{100}\\ +%$LM_4$ & \pounds{99}\\ +%$LF_5$ & \pounds{98} +%\end{tabular} +%\end{center} +% +% +%\begin{center} +%\begin{tikzpicture} +%\begin{axis}[symbolic y coords={salary}, +% ytick=data, +% height=3cm] %\addplot+[jump mark mid] coordinates %{(0,salary) (0.1,salary) % (0.4,salary) (0.5,salary) % (0.8,salary) (0.9,salary)}; %\end{axis} %\end{tikzpicture} +%\end{center} +% +%\begin{tikzpicture}[outline/.style={draw=#1,fill=#1!20}] % \node [outline=red] {red box}; % \node [outline=blue] at (0,-1) {blue box}; %\end{tikzpicture} + +\ldots -\subsubsection*{Using an Graph-Isomorphism Problem for ZKPs} - -Now the question is how can we translate Alibaba's cave into a -computer science solution? It turns out we need an NP problem -for that. The main feature of an NP problem is that it is -computational very hard to generate a solution, but it is very -easy to check whether a given solution indeed solves the -problem at hand.\footnote{The question whether $P = NP$ or not, -we leave to others to speculate about.} - -One NP problem is the \emph{graph isomorphism problem}. It -essentially determines whether the following two graphs, say -$G_1$ and $G_2$, can be moved and stretched so that they look -exactly the same. +\subsubsection*{Further Reading} -\begin{center} -\begin{tabular}{c@{\hspace{8mm}}c@{\hspace{8mm}}c} -$G_1$ & $G_2$\\ -\raisebox{-13mm}{\includegraphics[scale=0.4]{../pics/simple.png}} & -\raisebox{-13mm}{\includegraphics[scale=0.4]{../pics/simple-b.png}}& +Two cool articles about how somebody obtained via the Freedom +of Information Law the taxicab dataset of New York and someone +else showed how easy it is to mine for private information: -\footnotesize -\begin{tabular}{rl} -Graph $G_1$ & Graph $G_2$\\ -a & 1\\ -b & 6\\ -c & 8\\ -d & 3\\ -g & 5\\ -h & 2\\ -i & 4\\ -j & 7\\ -\end{tabular} +\begin{center}\small +\begin{tabular}{p{0.78\textwidth}} +\url{http://chriswhong.com/open-data/foil_nyc_taxi/}\smallskip\\ +\url{http://research.neustar.biz/2014/09/15/riding-with-the-stars-passenger-privacy-in-the-nyc-taxicab-dataset} \end{tabular} \end{center} -\noindent The table on the right gives a mapping of the nodes -of the first graph to the nodes of the second. With this -mapping we can check: node $a$ is connected in $G_1$ with $g$, -$h$ and $i$. Node $a$ maps to node $1$ in $G_2$, which is -connected to $2$, $4$ and $5$, which again correspond via the -mapping to $h$, $i$ and $g$ respectively. Let us write -$\sigma$ for such a table and let us write - -\[G_1 = \sigma(G_2)\] - -\noindent for two isomorphic graphs ($\sigma$ being the -isomorphism). It is actually very easy to construct two -isomorphic graphs: Start with an arbitrary graph, re-label the -nodes consistently. Alice will need to do this frequently -for the protocol below. In order to be useful, we therefore -would need to consider substantially larger graphs, like -with thousand nodes. - -Now the secret which Alice tries to hide is the knowledge of -such an isomorphism $\sigma$ between two such graphs. But she -can convince Bob whether she knows such an isomorphism for two -graphs, say $G_1$ and $G_2$, using the same idea as in the -example with Alibaba's cave. For this Alice and Bob must -follow the following protocol: - -\begin{enumerate} -\item Alice generates an isomorphic graph $H$ which she sends - to Bob (in each iteration she needs to generate a - different $H$). -\item - After receiving $H$, Bob asks Alice either for an - isomorphism between $G_1$ and $H$, or $G_2$ and $H$. -\item Alice and Bob repeat this procedure $n$ times. -\end{enumerate} - -\noindent In Step 1 it is important that Alice always -generates a fresh isomorphic graph. I let you think what -would happen if Alice sends out twice the same graph $H$. - -As said before, this is relatively easy to generate by -consistently relabelling nodes. If she started from $G_1$, -Alice will have generated - -\begin{equation} -H = \sigma'(G_1)\label{hiso} -\end{equation} - -\noindent where $\sigma'$ is the isomorphism between $H$ and -$G_1$. But she could equally have started from $G_2$. In the -case where $G_1$ and $G_2$ are isomorphic, if $H$ is -isomorphic with $G_1$, it will also be isomorphic with -$G_2$, and vice versa. - -After generating $H$, Alice sends it to Bob. The important -point is that she needs to ``commit'' to this $H$, therefore -this kind of zero-knowledge protocols are called -\emph{commitment protocols}. Only after receiving $H$, Bob -will make up his mind about which isomorphism he asks -for---whether between $H$ and $G_1$ or $H$ and $G_2$. For this -he could flip a coin, since the choice should be as -unpredictable for Alice as possible. Once Alice receives the -request, she has to produce an isomorphism. If she generated -$H$ as shown in \eqref{hiso} and is asked for an isomorphism -between $H$ and $G_1$, she just sends $\sigma'$. If she had -been asked for an isomorphism between $H$ and $G_2$, she just -has to compose her secret isomorphism $\sigma$ and $\sigma'$. -The main point for the protocol is that even knowing the -isomorphism between $H$ and $G_1$ or $H$ and $G_2$, will not -make the task easier to find the isomorphism between $G_1$ and -$G_2$, which is the secret Alice tries to protect. - -In order to make it crystal clear how this protocol proceeds, -let us give a version using our more formal notation for -protocols: +\noindent +A readable article about how supermarkets mine your shopping +habits (especially how they prey on new exhausted parents +;o) appeared in 2012 in the New York Times: \begin{center} -\begin{tabular}{lrl} -0) & $A \to B:$ & $G_1$ and $G_2$\\ -1a) & $A \to B:$ & $H_1$ \\ -1b) & $B \to A:$ & produce isomorphism $G_1 \leftrightarrow H_1$? - \quad(or $G_2 \leftrightarrow H_1$)\\ -1c) & $A \to B:$ & requested isomorphism\\ -2a) & $A \to B:$ & $H_2$\\ -2b) & $B \to A:$ & produce isomorphism $G_1 \leftrightarrow H_2$? - \quad(or $G_2 \leftrightarrow H_2$)\\ -2c) & $A \to B:$ & requested isomorphism\\ - & \ldots\\ -\end{tabular} +\url{http://www.nytimes.com/2012/02/19/magazine/shopping-habits.html} \end{center} -\noindent As can be seen the protocol runs for some -agreed number of iterations. The $H_i$ Alice needs to -produce, need to be all distinct. I hope you now know -why? - -It is also crucial that in each iteration, Alice first sends -$H_i$ and then Bob can decide which isomorphism he wants: -either $G_1 \leftrightarrow H_i$ or $G_2 \leftrightarrow H_i$. -If somehow Alice can find out before she committed to $H_i$, -she can cheat. For this assume Alice does \emph{not} know an -isomorphism between $G_1$ and $G_2$. If she knows which -isomorphism Bob will ask for she can craft $H$ in such a way -that it is isomorphism with either $G_1$ or $G_2$ (but it -cannot with both). Then in each case she would send Bob -a correct answer and he would come to the conclusion that -all is well. I let you also answer the question whether -such a protocol run between Alice and Bob would convince -a third person, say Pete. - -Since the interactive nature of the above PKZ protocol and the -correct ordering of the messages is so important for the -``correctness'' of the protocol, it might look surprising that -the same goal can also me achieved in a completely offline -manner. By this I mean Alice can publish all data at once, and -then at a later time, Bob can inspect the data and come to the -conclusion whether or not Alice knows the secret (again -without actually learning about the secret). For this -Alice has to do the following: +\noindent An article that analyses privacy and shopping habits +from a more economic point of view is available from: -\begin{enumerate} -\item Alice generates $n$ isomorphic graphs - $H_{1..n}$ (they need to be all distinct) -\item she feeds the $H_{1..n}$ into a hashing function - (for example encoded as as matrix) -\item she takes the first $n$ bits of the output of the hashing - function: - whenever the output is $0$, she shows an isomorphism - with $G_1$; for $1$ she shows an isomorphism - with $G_2$ -\end{enumerate} +\begin{center} +\url{http://www.dtc.umn.edu/~odlyzko/doc/privacy.economics.pdf} +\end{center} -\noindent The reason why this works and achieves the same -goal as the interactive variant is that Alice has no -control over the hashing function. It would be -computationally just too hard to assemble a set of -$H_{1..n}$ such that she can force where $0$s and $1$s -in the hash values are such that it would pass an external -test. The point is that Alice can publish all this data -on the comfort of her own web-page, for example, and -in this way convince everybody who bothers to check. - -The virtue of the use of graphs and isomorphism for a -zero-knowledge protocol is that the idea why it works -are relatively straightforward. The disadvantage is -that encoding any secret into a graph-isomorphism, while -possible, is awkward. The good news is that in fact -any NP problem can be used as part of a ZKP protocol. - - -\subsubsection*{Using Modular Logarithms for ZKP Protocols} - -While information can be encoded into graph isomorphisms, it -is not the most convenient carrier of information. Clearly it -is much easier to encode information into numbers. Let us look -at zero-knowledge proofs that use numbers as secrets. For this -the underlying NP-problem is to calculate discrete logarithms. -It can be used by choosing public numbers $A$, $B$, $p$, and -private $x$ such that +\noindent An attempt to untangle the web of current technology +for spying on consumers is published in: \begin{center} -$A^x \equiv B\; mod\; p$ -\end{center} - -\noindent holds. The secret Alice tries to keep secret is $x$. -The point of the modular logarithm is that it is very hard -from the public data to calculate $x$ (for large primes). -Now the protocol proceeds in three stages: - -\begin{itemize} -\item {\bf Commitment Stage} - \begin{enumerate} - \item Alice generates $z$ random numbers $r_1, \ldots, r_z$, - all less than $p - 1$. Alice then sends Bob for all - $i = 1,\ldots, z$: - \[ h_i = A^{r_i}\; mod\; p\] - \item Bob generates $z$ random bits, say $b_1,\ldots, b_z$. He can do this - by flipping $z$ times a coin, for example. - \item For each bit $b_i$, Alice sends Bob an $s_i$ where - - \begin{center} - \begin{tabular}{ll} - if $b_i = 0$: & $s_i = r_i$\\ - if $b_i = 1$: & $s_i = (r_i - r_j) \;mod\; (p -1)$\\ - \end{tabular} - \end{center} +\url{http://cyberlaw.stanford.edu/files/publication/files/trackingsurvey12.pdf} +\end{center} - where $r_j$ is the lowest $j$ where $b_j = 1$. - - \end{enumerate} - \end{itemize} - -\noindent For understanding the last step, let $z$ be just 4. -We have four random values $r_i$ chosen by Alice and four -random bits $b_i$ chosen subsequently by Bob, for example - - \begin{center} - \begin{tabular}{lcccc} - $r_i$:\; & 4 & 9 & 1 & 3\\ - $b_i$:\; & 0 & 1 & 0 & 1\\ - & & $\uparrow$ \\ - & & $j$ - \end{tabular} - \end{center} - - \noindent The highlighted column is the lowest where $b_i = - 1$ (counted from the left). That means $r_j = 9$. The reason - for letting Alice choose the random numbers $r_1, \ldots, r_z$ - will become clear shortly. Next is the confirmation - phase where Bob essentially checks whether Alice has sent - him ``correct'' $s_i$ and $h_i$. - -\begin{itemize} -\item {\bf Confirmation Stage} - \begin{enumerate} - \item For each $b_i$ Bob checks whether $s_i$ - conform to the protocol - - \begin{center} - \begin{tabular}{ll} - if $b_i = 0$: & $A^{s_i} \stackrel{?}{\equiv} h_i\;mod\;p$\\ - if $b_i = 1$: & $A^{s_i} \stackrel{?}{\equiv} h_i * h_j^{-1} \;mod\; p$\\ - \end{tabular} - \end{center} - \end{enumerate} -\end{itemize} - -\noindent To understand the case for $b_i = 1$, you have -to do the following calculation: +\noindent An article that sheds light on the paradox that +people usually worry about privacy invasions of little +significance, and overlook the privacy invasion that might +cause significant damage: \begin{center} -\begin{tabular}{r@{\hspace{1mm}}c@{\hspace{1mm}}l} -$A^{s_i}$ & $=$ & $A^{r_i - r_j}$\\ - & $=$ & $A^{r_i} * A^{-r_j}$\\ - & $=$ & $h_{r_i} * h_{r_j}^{-1}\;mod\;p$ -\end{tabular} -\end{center} - -\noindent What is interesting that so far nothing has been -sent about $x$, which is the secret Alice has. Also notice -that Bob does not know $r_j$. He received - -\begin{center} -$r_j - r_j$, $r_m - r_j$, \ldots, $r_p - r_j \;mod \;p - 1$ -\end{center} - -\noindent whenever his corresponding bits were $1$. So Bob -does not know $r_j$ and also does not know any $r_i$ where the -bit was $1$. Information about the $x$ is sent in the next -stage (obviously not revealing $x$). - -\begin{itemize} -\item {\bf Proving Stage} - -\begin{enumerate} -\item Alice proves she knows $x$, the discrete log of $B$, -by sending - -\begin{center} -$s_{z+1} = x - r_j\;mod\;p-1$ +\url{http://www.heinz.cmu.edu/~acquisti/papers/Acquisti-Grossklags-Chapter-Etrics.pdf} \end{center} -\item Bob confirms + +Interesting ideas \begin{center} -$A^{s_{z+1}} \stackrel{?}{\equiv} B * h_j^{-1} \;mod \; p$ +\url{https://adnauseam.io} \end{center} -\end{enumerate} -\end{itemize} - -\noindent To understand the last step, you have to do the trick -again that - -\[A^{s_{z+1}} = A^{x-r_j} = \ldots -\] - -\noindent -which I leave to you. - -Now the question is how can Alice cheat? In order to cheat she -has to coordinate what she sends as $h_i$ in step 1 and $s_i$ -in step 3 of the commitment stage, and also what to send as -$s_{z+1}$ in the proving stage. For the latter of course -Alice does not know $x$, so she just chooses some random -number for $s_{z+1}$ and calculates - -\[A^{s_{z+1}}\] \noindent -and then solves the equation - -\[A^{s_{z+1}} \equiv B * y \;mod\;p\] - -\noindent for $y$. This is easy since no logarithm needs to be -computed. If Alice can guess the $j$ where the first 1 will -appear in Bob's bit vector, then she sends the inverse of $y$ -as $h_j$ and 0 as $s_j$. However, notice that when she -calculates a solution for $y$ she does not know $r_j$. For this she -would need to calculate the modular logarithm - -\[y \equiv A^{r_j}\;mod\;p\] - -\noindent which is hard (see step 1 in the commitment stage). - -Having settled on what $h_j$ should be, now what should Alice -send as the other $h_i$ and other $s_i$? If the $b_i$ happens -to be a 1, then the $h_i$ and other $s_i$ need to satisfy the -test - -\[A^{s_i} \stackrel{?}{\equiv} h_i * h_j^{-1} \;mod\; p\] - -\noindent where she has already settled on the value of -$h_j^{-1}$. Lets say she choses $s_i$ at random, then she just -needs to solve - -\[A^{s_i} \equiv z * h_j^{-1} \;mod\; p\] - -\noindent for $z$. Again that is easy, but it does not allow -us to know $r_i$, because then we would again need to solve -a modular logarithm problem. Let us call an $h_i$ which was -solved the easy way as \emph{bogus}. Alice has to produce -bogus $h_i$ for all bits that are going to be $1$ in advance! -This means she has to guess all the bits correctly. (Yes? -I let you think about this.) - -Let us see what happens if she guesses wrongly: Suppose the -bit $b_i = 1$ where she thought she will get a 0. Then she has -already sent an $h_i$ and $h_j$ and now must find an $s_i$ -such that - -\[A^{s_i} \equiv h_i * h_j^{-1} \;mod\; p\] - -\noindent holds. For this remember in calculating $h_i$, she -just chose a random $s_i$. Now she has to send a genuine one. -But this is of course too hard. If she knew the genuine $r_i$ -and $r_j$ for $h_i$ and $h_j$, it would be easy (in this case -$s_i = r_i - r_j$). But she does not. So she will be found -out. If $b_i = 0$, but she thought she will get a 1, then -she has to send a $s_i$ which satisfies - -\[A^{s_i} \equiv h_i\;mod\;p\] - -\noindent Again she does not know $r_i$. So it is a too hard -task and she will be found out again. - -To sum up, in order for Alice to successfully cheat Bob, she -needs to guess \emph{all} bits correctly. She has only a -$\frac{1}{2^z}$ chance of doing this. - -\subsubsection*{Further Reading} - -Make sure you understand what NP problems -are.\footnote{\url{http://en.wikipedia.org/wiki/NP_(complexity)}} -They are the building blocks for zero-knowledge proofs. -Zero-Knowldege proofs are not yet widely used in production -systems, but it is slowly gaining ground. One area of application -where they pop up is crypto currencies (for example Zerocoins -or how to make sure a Bitcoin exchange is solvent without -revealing its assets). - -If you want to brush up on the modular logarithm problem, -the Khan Academy has a nice video: +And a paper that predicts ad-blockers will in the end win over anti-ad-blocking: \begin{center} -\url{https://www.khanacademy.org/video/discrete-logarithm-problem} +\url{http://randomwalker.info/publications/ad-blocking-framework-techniques.pdf} \end{center} + \end{document} -http://blog.cryptographyengineering.com/2014/11/zero-knowledge-proofs-illustrated-primer.html - -http://btravers.weebly.com/uploads/6/7/2/9/6729909/zero_knowledge_technique.pdf -http://zk-ssh.cms.ac/docs/Zero_Knowledge_Prinzipien.pdf -http://www.wisdom.weizmann.ac.il/~oded/PS/zk-tut02v4.ps - -socialist millionares problem -http://en.wikipedia.org/wiki/Socialist_millionaire -http://twistedoakstudios.com/blog/Post3724_explain-it-like-im-five-the-socialist-millionaire-problem-and-secure-multi-party-computation - +http://randomwalker.info/teaching/fall-2012-privacy-technologies/? +http://chronicle.com/article/Why-Privacy-Matters-Even-if/127461/ +http://repository.cmu.edu/cgi/viewcontent.cgi?article=1077&context=hcii +https://josephhall.org/papers/NYU-MCC-1303-S2012_privacy_syllabus.pdf +http://www.jetlaw.org/wp-content/uploads/2014/06/Bambauer_Final.pdf +http://www.cs.cmu.edu/~yuxiangw/docs/Differential%20Privacy.pdf +https://www.youtube.com/watch?v=Gx13lgEudtU +https://www.cs.purdue.edu/homes/ctask/pdfs/CERIAS_Presentation.pdf +http://www.futureofprivacy.org/wp-content/uploads/Differential-Privacy-as-a-Response-to-the-Reidentification-Threat-Klinefelter-and-Chin.pdf +http://www.cis.upenn.edu/~aaroth/courses/slides/Overview.pdf +http://www.cl.cam.ac.uk/~sjm217/papers/tor14design.pdf %%% Local Variables: %%% mode: latex diff -r 280e057558b8 -r 7a6e8f603e08 handouts/ho07.pdf Binary file handouts/ho07.pdf has changed diff -r 280e057558b8 -r 7a6e8f603e08 handouts/ho07.tex --- a/handouts/ho07.tex Sat Sep 23 19:39:53 2017 +0100 +++ b/handouts/ho07.tex Sat Sep 23 19:52:27 2017 +0100 @@ -1,561 +1,1011 @@ \documentclass{article} \usepackage{../style} \usepackage{../graphics} - -\begin{document} -\fnote{\copyright{} Christian Urban, King's College London, 2014, 2015} +\usepackage{../langs} +\usepackage{../data} -%https://www.theguardian.com/technology/2016/oct/04/yahoo-secret-email-program-nsa-fbi -%https://nakedsecurity.sophos.com/2015/11/12/california-collects-owns-and-sells-infants-dna-samples/ -%http://randomwalker.info/teaching/fall-2012-privacy-technologies/? -%https://josephhall.org/papers/NYU-MCC-1303-S2012_privacy_syllabus.pdf -%http://www.jetlaw.org/wp-content/uploads/2014/06/Bambauer_Final.pdf -%http://www.cs.cmu.edu/~yuxiangw/docs/Differential%20Privacy.pdf -%https://www.youtube.com/watch?v=Gx13lgEudtU -%https://fpf.org/wp-content/uploads/Differential-Privacy-as-a-Response-to-the-Reidentification-Threat-Klinefelter-and-Chin.pdf -%http://research.neustar.biz/2014/09/08/differential-privacy-the-basics/ +%https://crypto.stanford.edu/cs251/ +%https://programmingblockchain.gitbooks.io/programmingblockchain/content/ -%===== -%Tim Greene, Network World, 17 Dec 2015 (via ACM TechNews, 18 Dec 2015) -% -%Massachusetts Institute of Technology (MIT) researchers' experimental -%Vuvuzela messaging system offers more privacy than The Onion Router (Tor) by -%rendering text messages sent through it untraceable. MIT Ph.D. student -%David Lazar says Vuvuzela resists traffic analysis attacks, while Tor -%cannot. The researchers say the system functions no matter how many parties -%are using it to communicate, and it employs encryption and a set of servers -%to conceal whether or not parties are participating in text-based dialogues. -%"Vuvuzela prevents an adversary from learning which pairs of users are -%communicating, as long as just one out of [the] servers is not compromised, -%even for users who continue to use Vuvuzela for years," they note. Vuvuzela -%can support millions of users hosted on commodity servers deployed by a -%single group of users. Instead of anonymizing users, Vuvuzela prevents -%outside observers from differentiating between people sending messages, -%receiving messages, or neither, according to Lazar. The system imposes -%noise on the client-server traffic which cannot be distinguished from actual -%messages, and all communications are triple-wrapped in encryption by three -%servers. "Vuvuzela guarantees privacy as long as one of the servers is -%uncompromised, so using more servers increases security at the cost of -%increased message latency," Lazar notes. -%http://orange.hosting.lsoft.com/trk/click?ref=znwrbbrs9_5-e70bx2d991x066779& +% bug in smart contracts +% https://www.benthamsgaze.org/2016/06/17/smart-contracts-beyond-the-age-of-innocence/ +% http://hackingdistributed.com/2016/06/18/analysis-of-the-dao-exploit/ -%%%% -%% canvas tracking -%%https://freedom-to-tinker.com/blog/englehardt/the-princeton-web-census-a-1-million-site-measurement-and-analysis-of-web-privacy/ +% hard forks +% https://github.com/bitcoin/bips/blob/master/bip-0050.mediawiki -%%% -%% cupit re-identification attack -%% https://nakedsecurity.sophos.com/2016/05/20/published-personal-data-on-70000-okcupid-users-taken-down-after-dmca-order/?utm_source=feedburner&utm_medium=feed&utm_campaign=Feed%3A+nakedsecurity+%28Naked+Security+-+Sophos%29 - -%Differential privacy -%===================== -%https://www.wired.com/2016/06/apples-differential-privacy-collecting-data/ +% only 25% needed to obtain larger shares of mining +% http://www.cs.cornell.edu/~ie53/publications/btcProcFC.pdf -%Differential privacy, translated from Apple-speak, is the -%statistical science of trying to learn as much as possible -%about a group while learning as little as possible about any -%individual in it. +% re-identification attacks +% https://cseweb.ucsd.edu/~smeiklejohn/files/imc13.pdf -%As Roth notes when he refers to a “mathematical proof,” -%differential privacy doesn’t merely try to obfuscate or -%“anonymize” users’ data. That anonymization approach, he -%argues, tends to fail. In 2007, for instance, Netflix released -%a large collection of its viewers’ film ratings as part of a -%competition to optimize its recommendations, removing people’s -%names and other identifying details and publishing only their -%Netflix ratings. But researchers soon cross-referenced the -%Netflix data with public review data on IMDB to match up -%similar patterns of recommendations between the sites and add -%names back into Netflix’s supposedly anonymous database. +% bit-coin papers +% https://crypto.stanford.edu/cs251/syllabus.html + +% bit coin talk --- at 20:00 mins +%https://www.usenix.org/conference/lisa16/conference-program/presentation/perlman -%As an example of that last method, Microsoft’s Dwork points to -%the technique in which a survey asks if the respondent has -%ever, say, broken a law. But first, the survey asks them to -%flip a coin. If the result is tails, they should answer -%honestly. If the result is heads, they’re instructed to flip -%the coin again and then answer “yes” for heads or “no” for -%tails. The resulting random noise can be subtracted from the -%results with a bit of algebra, and every respondent is -%protected from punishment if they admitted to lawbreaking. - -%https://www.cis.upenn.edu/~aaroth/Papers/privacybook.pdf - -% Windows 10 data send back to Microsoft (Cortana) -%Here’s a non-exhaustive list of data sent back: location data, text -%input, voice input, touch input, webpages you visit, and telemetry -%data regarding your general usage of your computer, including which -%programs you run and for how long. - -% Businesses are already using customised pricing online based on -% information they can glean about you. It is hard to know how -% widespread the practice is; companies keep their pricing strategies -% closely guarded and are wary of the bad PR price discrimination -% could pose. However, it is clear that a number of large retailers -% are experimenting with it. Staples, for example, has offered -% discounted prices based on whether rival stores are within 20 miles -% of its customers’ location. Office Depot has admitted to using its -% customers’ browsing history and location to vary its range of offers -% and products. A 2014 study from Northeastern University found -% evidence of “steering” or differential pricing at four out of 10 -% general merchandise websites and five out of five travel -% websites. (Steering is when a company doesn’t give you a customised -% price, but points you towards more expensive options if it thinks -% you will pay more.) The online travel company Orbitz raised -% headlines in 2012 when it emerged that the firm was pointing Mac -% users towards higher-priced hotel rooms than PC users. +% In fact, far from freeing people from the oppression of the state, +% blockchains perversely promise the perfect tool for a fully +% auditable, tax compliant, cashless society. Similarly, the belief it +% is an anonymous digital cash has quickly vanished and we are now +% seeing a large number of analytics companies, set-up specifically to +% work with law enforcement agencies, to police this new parallel +% financial system. +% +% But today blockchain is riddled with +% contradictions and misunderstandings. Most of its problems are very +% fixable, if you want to fix them -%%% government will overwrite your wishes if it is annoymous -%% https://www.lightbluetouchpaper.org/2016/12/05/government-u-turn-on-health-privacy/ - -%% corporate surveilance / privacy - report and CC3C talk -%% http://crackedlabs.org/en/networksofcontrol -%% https://media.ccc.de/v/33c3-8414-corporate_surveillance_digital_tracking_big_data_privacy#video&t=2933 - -\section*{Handout 6 (Privacy)} +% history of bitcoins +% https://futurism.com/images/this-week-in-tech-jan-15-22-2016/ -The first motor car was invented around 1886. For ten years, -until 1896, the law in the UK (and elsewhere) required a -person to walk in front of any moving car waving a red flag. -Cars were such a novelty that most people did not know what to -make of them. The person with the red flag was intended to -warn the public, for example horse owners, about the impending -novelty---a car. In my humble opinion, we are at the same -stage of development with privacy. Nobody really knows what it -is about or what it is good for. All seems very hazy. There -are a few laws (e.g.~cookie law, right-to-be-forgotten law) -which address problems with privacy, but even if they are well -intentioned, they either back-fire or are already obsolete -because of newer technologies. The result is that the world of -``privacy'' looks a little bit like the old Wild -West---lawless and mythical. +\begin{document} +\fnote{\copyright{} Christian Urban, 2014, 2015} -We would have hoped that after Snowden, Western governments -would be a bit more sensitive and enlightned about the topic -of privacy, but this is far from the truth. Ross Anderson -wrote the following in his blog\footnote{\url{https://www.lightbluetouchpaper.org/2016/02/11/report-on-the-ip-bill/}} about the approach taken in -the US to lessons learned from the Snowden leaks and contrasts -this with the new snooping bill that is considered in the UK -parliament: +\section*{Handout 7 (Bitcoins)} -\begin{quote}\it -``The comparison with the USA is stark. There, all three -branches of government realised they'd gone too far after -Snowden. President Obama set up the NSA review group, and -implemented most of its recommendations by executive order; -the judiciary made changes to the procedures of the FISA -Court; and Congress failed to renew the data retention -provisions in the Patriot Act (aided by the judiciary). Yet -here in Britain the response is just to take Henry VIII powers -to legalise all the illegal things that GCHQ had been up to, -and hope that the European courts won't strike the law down -yet again.'' -\end{quote} +In my opinion Bitcoins are an elaborate Ponzi +scheme\footnote{\url{http://en.wikipedia.org/wiki/Ponzi_scheme}}---still +the ideas behind them are really beautiful and not too +difficult to understand. Since many colourful claims about +Bitcoins float around in the mainstream and not-so-mainstream +media, it will be instructive to re-examine such claims from a +more technically informed vantage point. For example, it is +often claimed that Bitcoins are anonymous and free from any +potential government meddling. It turns out that the first +claim ignores a lot of research in de-anonymising social +networks, and the second underestimates the persuasive means a +government has at its disposal. -\noindent Unfortunately, also big organisations besides -governments seem to take an unenlightened approach to privacy. -For example, UCAS, a charity set up to help students with -applying to universities in the UK, has a commercial unit that -happily sells your email addresses to anybody who forks out -enough money for bombarding you with spam. Yes, you can opt -out very often from such ``schemes'', but in case of UCAS any -opt-out will limit also legit emails you might actually be -interested in.\footnote{The main objectionable point, in my -opinion, is that the \emph{charity} everybody has to use for -HE applications has actually very honourable goals -(e.g.~assist applicants in gaining access to universities), -but the small print (or better the link ``About us'') reveals -they set up their organisation so that they can also -shamelessly sell the email addresses they ``harvest''. -Everything is of course very legal\ldots{}ethical?\ldots{}well -that is in the eye of the beholder. See: +There are a lot of articles, blogposts, research papers +etc.~available about Bitcoins. Below I will follow closely the +very readable explanations from + +\begin{center} +\url{http://www.michaelnielsen.org/ddi/how-the-bitcoin-protocol-actually-works/} \;\;and\smallskip\\ +\url{http://www.imponderablethings.com/2013/07/how-bitcoin-works-under-hood.html} +\end{center} -\url{http://www.ucas.com/about-us/inside-ucas/advertising-opportunities} -or -\url{http://www.theguardian.com/uk-news/2014/mar/12/ucas-sells-marketing-access-student-data-advertisers}} +\noindent The latter also contains a link to a nice youtube +video about the technical details behind Bitcoins. I will +also use some of their pictures. -Another example: Verizon, an ISP who is supposed to provide -you just with connectivity, has found a ``nice'' side-business -too: When you have enabled all privacy guards in your browser -(the few you have at your disposal), Verizon happily adds a -kind of cookie to your -HTTP-requests.\footnote{\url{http://webpolicy.org/2014/10/24/how-verizons-advertising-header-works/}} -As shown in the picture below, this cookie will be sent to -every web-site you visit. The web-sites then can forward the -cookie to advertisers who in turn pay Verizon to tell them -everything they want to know about the person who just made -this request, that is you. - +Let us start with the question who invented Bitcoins? You +could not make up the answer, but we actually do not know who +the inventor is. All we know is that the first paper + \begin{center} -\includegraphics[scale=0.16]{../pics/verizon.png} +\url{https://bitcoin.org/bitcoin.pdf} \end{center} -\noindent How disgusting! Even worse, Verizon is not known for -being the cheapest ISP on the planet (completely the -contrary), and also not known for providing the fastest -possible speeds, but rather for being among the few ISPs in -the US with a quasi-monopolistic ``market distribution''. +\noindent is signed by Satoshi Nakamoto, which however is +likely only a pen name. There is a lot of speculation who +could be the inventor, or inventors, but we simply do not +know. This part of Bitcoins is definitely anonymous so far. +The paper above is from the end of 2008; the first Bitcoin +transaction was made in January 2009. The rules in Bitcoin are +set up so that there will only ever be 21 Million Bitcoins +with the maximum reached around the year 2140. Currently there +are already 11 Million Bitcoins in `existence'. Contrast this +with traditional fiat currencies where money can be printed +almost at will. The smallest unit of a Bitcoin is called a +Satoshi, which is the $10^{-8}$th part of a Bitcoin. Remember +a Penny is the $10^{-2}$th part of a Pound. +The two main cryptographic building blocks of Bitcoins are +cryptographic hashing functions (SHA-256) and public-private +keys using the elliptic-curve encryption scheme for digital +signatures. Hashes are used to generate `fingerprints' of data +that ensure integrity (absence of tampering). Public-private +keys are used for signatures. For example sending a message, +say $msg$, together with the encrypted version + +\[ +msg, \{msg\}_{K^{priv}} +\] + +\noindent allows everybody with access to the corresponding +public key $K^{pub}$ to verify that the message came from the +person who knew the private key. Signatures are used in +Bitcoins for verifying the addresses where the Bitcoins are +sent from. Addresses in Bitcoins are essentially the public +keys. There are $2^{160}$ possible addresses, which is such a +vast amount that there is not even a check for duplicates, or +already used addresses. If you start with a random number to +generate a public-private key pair it is very unlikely that +you step on somebody else's shoes. Compare this with the +email-addresses you wanted to register with, say +Gmail, but which are always already taken. -Well, we could go on and on\ldots{}and that has not even -started us yet with all the naughty things NSA \& Friends are -up to. Why does privacy actually matter? Nobody, I think, has -a conclusive answer to this question yet. Maybe the following -four notions help with clarifying the overall picture -somewhat: +One major difference between Bitcoins and traditional banking +is that you do not have a place, or few places, that record the +balance on your account. Traditional banking involves a +central ledger which specifies the current balance in each +account, for example + +\begin{center} +\begin{tabular}{l|r} +account owner & balance\\\hline +Alice & \pounds{10.01}\\ +Bob & \pounds{4.99}\\ +Charlie & -\pounds{1.23}\\ +Eve & \pounds{0.00} +\end{tabular} +\end{center} + +\noindent Bitcoins work differently in that there is no such +central ledger, but instead a public record of all +transactions ever made. This means spending money corresponds +to sending messages of the (oversimplified) form + +\begin{equation} +\{\text{I, Alice, am giving Bob one Bitcoin.}\}_{K^{priv}_{Alice}} +\end{equation} + +\noindent These messages, called transactions, are the only +data that is ever stored in the Bitcoin system (we will come +to the precise details later on). The transactions are +encrypted with Alice's private key so that everybody, +including Bob, can use Alice's public key $K^{pub}_{Alice}$ to +verify that this message came really from Alice, or more +precisely from the person who knows $K^{priv}_{Alice}$. + +The problem with such messages in a distributed system is that +what happens if Bob receives 10, say, of these transactions? +Did Alice intend to send him 10 Bitcoins, or did the message +get duplicated by for example an attacker re-playing a sniffed +message? What is needed is a kind of serial number for such +transactions. This means transaction messages shoul look more like + +\begin{center} +$\{\text{I, Alice, am giving Bob Bitcoin \#1234567.}\}_{K^{priv}_{Alice}}$ +\end{center} + +\noindent There are two difficulties, however, that need to be +solved with serial numbers. One is who is assigning serial +numbers to Bitcoins and also how can Bob verify that Alice +actually owns this Bitcoin to pay him? In a system with a bank +as trusted third-party, Bob could do the following: \begin{itemize} -\item \textbf{Secrecy} is the mechanism used to limit the - number of principals with access to information (e.g., - cryptography or access controls). For example I better - keep my password secret, otherwise people from the wrong - side of the law might impersonate me. - -\item \textbf{Confidentiality} is the obligation to protect - the secrets of other people or organisations (secrecy - for the benefit of an organisation). For example as a - staff member at King's I have access to data, even - private data, I am allowed to use in my work but not - allowed to disclose to anyone else. - -\item \textbf{Anonymity} is the ability to leave no evidence of - an activity (e.g., sharing a secret). This is not equal - with privacy---anonymity is required in many - circumstances, for example for whistle-blowers, - voting, exam marking and so on. - -\item \textbf{Privacy} is the ability or right to protect your - personal secrets (secrecy for the benefit of an - individual). For example, in a job interview, I might - not like to disclose that I am pregnant, if I were a - woman, or that I am a father. Lest they might not hire - me. Similarly, I might not like to disclose my location - data, because thieves might break into my house if they - know I am away at work. Privacy is essentially - everything which ``shouldn't be anybody's business''. - +\item Bob asks the bank whether the Bitcoin with that serial + number belongs to Alice and Alice hasn't already spent + this Bitcoin. +\item If yes, then Bob tells the bank he accepts this Bitcoin. + The bank updates the records to show that the Bitcoin + with that serial number is now in Bob’s possession and + no longer belongs to Alice. \end{itemize} -\noindent While this might provide us with some rough -definitions, the problem with privacy is that it is an -extremely fine line what should stay private and what should -not. For example, since I am working in academia, I am every -so often very happy to be a digital exhibitionist: I am very -happy to disclose all `trivia' related to my work on my -personal web-page. This is a kind of bragging that is normal -in academia (at least in the field of CS), even expected if -you look for a job. I am even happy that Google maintains a -profile about all my academic papers and their citations. +\noindent But for this banks would need to be trusted and +would also be an easy target for any government interference, +for example. Think of the early days of music sharing where +the company Napster was the trusted third-party but also the single point of ``failure'' which +was taken offline by law enforcement. Bitcoins is more like a +system such as BitTorrent without a single central entity that +can be taken offline.\footnote{There is some Bitcoin +infrastructure that is not so immune from being taken offline: +for example Bitcoin exchanges, HQs of Bitcoin mining pools, +Bitcoin developers and so on.} + +Bitcoins solve the problem of not being able to rely on a bank +by making everybody the ``bank''. Everybody who cares can have +the entire transaction history starting with the first +transaction made in January 2009. This history of transactions +is called the \emph{blockchain}. Bob, for example, can use his +copy of the blockchain for determining whether Alice owned the +Bitcoin he received, and if she did, he transmits the message +that he owns it now to every other participant on the Bitcoin +network. An illustration of a three-block segment of the +blockchain is (simplified) as follows + +\begin{equation} +\includegraphics[scale=0.4]{../pics/bitcoinblockchain0.png} +\label{segment} +\end{equation} + +\noindent The chain grows with time. Each block contains a +list of individual transactions, written txn in the picture +above, and also a reference to the previous block, written +prev. The data in a block (txn's and prev) is hashed so that +the reference and transactions in them cannot be tampered +with. This hash is also the unique serial number of each +block. Since this previous-block-reference is also part of the +hash, the whole chain is robust against tampering. I let you +think why this is the case?\ldots{}But does it actually +eliminate all possibilities of fraud? + +We can check the consistency of the blockchain by checking +whether all the references and hashes are correctly recorded. +I have not tried it myself, but it is said that with the +current amount of data (appr.~12GB) it takes roughly a day to +check the consistency of the blockchain on a normal computer. +Fortunately this ``extended'' consistency check usually only +needs to be done once. Afterwards the blockchain only needs to +be updated consistently. + +Recall I wrote earlier that Bitcoins do not maintain a ledger, +which lists all the current balances in each account. Instead +only transactions are recorded. While a current balance of an +account is not immediately available, it is possible to +extract from the blockchain a transaction graph that looks +like the picture shown in Figure~\ref{txngraph}. Each +rectangle represents a single transaction. Take for example +the rightmost lower transaction from Charles to Emily. This +transaction has as receiver the address of Emily and as the +sender the address of Charles. In this way no Bitcoins can +appear out of thin air (we will discuss later how Bitcoins are +actually generated). If Charles did not have a transaction of +at least the amount he wants to give Emily to his name +(i.e.~send to an address with his public-private key) then +there is no way he can make a payment to Emily. Equally, if +now Emily wants to pay for a coffee, say, with the Bitcoin she +received from Charles she can essentially only forward the +message she received. The only slight complication with this +setup in Bitcoins is that ``incoming'' Bitcoins can be +combined in a transaction and ``outgoing'' Bitcoins can be +split. For example in the leftmost upper transactions in +Figure~\ref{txngraph}, Fred makes a payment to Alice. But this +payment (or transaction) combines the Bitcoins that were send +by Jane to Fred and also by Juan to Fred. This allows you to +``consolidate'' your funds: if it were only possible to split +transactions, then the amounts would get smaller and smaller. + +In Bitcoins you have the ability to both combine incoming +transactions, but also to split outgoing transactions to +potentially more than one receiver. The latter is also needed. +Consider again the rightmost transactions in +Figure~\ref{txngraph} and suppose Alice is a coffeeshop owner +selling coffees for 1 Bitcoin. Charles received a transaction +from Zack over 5 Bitcoins, say. How does Charles pay for the +coffee? There is no explicit notion of \emph{change} in the +Bitcoin system. What Charles has to do instead is to make one +single transaction with 1 Bitcoin to Alice and with 4 Bitcoins +going back to himself, which then Charles can use to give to +Emily, for example. + +\begin{figure}[t] +\begin{center} +\includegraphics[scale=0.4]{../pics/blockchain.png} +\end{center} +\caption{Transaction graph that is implicitly recorded in the +public blockchain.\label{txngraph}} +\end{figure} + +Let us consider another example. Suppose Emily received 4 +Bitcoins from Charles and independently received another +transaction (not shown in the picture) that sends 6 Bitcoins +to her. If she now wants to buy a coffee from Alice for 1 +Bitcoin, she has two possibilities: She could just forward the +transaction from Charles over 4 Bitcoins to Alice split in +such a way that Alice receives 1 Bitcoin and Emily sends the +remaining 3 Bitcoins back to herself. In this case she would +now be in the possession of two unspend Bitcoin transactions, +one over 3 Bitcoins and the independent one over 6 Bitcoins. +Or, Emily could combine both transactions (one over 4 Bitcoins +from Charles and the independent one over 6 Bitcoins) and then +split this amount with 1 Bitcoin going to Alice and 9 Bitcoins +going back to herself. + +I think this is a good time for you to pause to let this +concept of transactions to really sink in\ldots{}You should +come to the conclusion that there is really no need for a central ledger and no +need for an account balance as familiar from traditional +banking. The closest what Bitcoin has to offer for the notion +of a balance in a bank account are the unspend transactions +that a person (more precisely a public-private key address) +received. That means transactions that can still be forwarded. + +After the pause also consider the fact that whatever +transaction is recorded in the blockchain will be in the +``historical record'' for the Bitcoin system. If a transaction +says 1 Bitcoin goes from address $A$ to address $B$, then this +is what will be---$B$ has then the possibility to spend the +corresponding Bitcoins, whether the transaction was done +fraudulently or not. There is no exception to this rule. +Interestingly this is also how Bitcoins can get lost: One +possibility is that you send Bitcoins to an address for which +nobody has generated a private key, for example because of a +typo in the address field---bad luck for fat +fingers\footnote{\url{http://en.wikipedia.org/wiki/Typographical_error}} +in the Bitcoin system. The reason is that nobody has a private +key for this erroneous address and consequently cannot forward +the transaction anymore. Another possibility is that you +forget your private key and you had messages forwarded to the +corresponding public key. Also in this case bad luck: you will +never be able to forward this message again, because you will +not be able to form a valid message that sends this to +somebody else (we will see the details of this later). But +this is also a way how you can get robbed of your Bitcoins. By +old-fashioned hacking-into-a-computer crime, for example, an +attacker might get hold of your private key and then quickly +forwards the Bitcoins that are in your name to an address the +attacker controls. You will never again have access to these +Bitcoins, because for the Bitcoin system they are assumed to +be spent. And remember with Bitcoins you cannot appeal to any +higher authority. Once the Bitcoins are gone, they are gone. +This is much different in traditional banking where at least +you can try to harass the bank to roll back the transaction. + +This brings us to back to problem of double spend. Suppose Bob +is a merchant. How can he make sure that Alice does not cheat +him? She could for example send a transaction to Bob. But also +forward the ``same'' transaction to Charlie, or even herself. +If Alice manages to get the second transaction into the +blockchain, Bob will be cheated out of his money. The problem +in such conflicting situations is how should the network +update their blockchain? You might end up with a picture like +this + +\begin{center} +\includegraphics[scale=0.4]{../pics/bitcoindisagreement.png} +\end{center} -On the other hand I would be very irritated if anybody I do -not know had a too close look on my private live---it -shouldn't be anybody's business. The reason is that knowledge -about my private life can often be used against me. As mentioned -above, public location data might mean I get robbed. If -supermarkets build a profile of my shopping habits, they will -use it to \emph{their} advantage---surely not to \emph{my} -advantage. Also whatever might be collected about my life will -always be an incomplete, or even misleading, picture. For -example I am pretty sure my creditworthiness score was -temporarily(?) destroyed by not having a regular income in -this country (before coming to King's I worked in Munich for -five years). To correct such incomplete or flawed credit -history data there is, since recently, a law that allows you -to check what information is held about you for determining -your creditworthiness. But this concerns only a very small -part of the data that is held about me/you. Also -what about cases where data is wrong or outdated (but do we -need a right-to be forgotten). +\noindent where Alice convinced some part of the ``world'' +that she is still the owner of the Bitcoin and some other part +of the ``world'' thinks it's Bob's. How should such a +disagreement be resolved? This is actually the main hurdle +where Bitcoin really innovated. The answer is that Bob needs +to convince ``enough'' people on the network that the +transaction from Alice to him is legit. + +What does, however, ``enough'' mean in a distributed system? +If Alice sets up a network of a billion, say, puppy identities +and whenever Bob tries to convince, or validate, that he is +the rightful owner of the Bitcoin, then the puppy identities +agree. Bob would then have no reason to not give Alice her +coffee. But behind his back she has convinced everybody else +on the network that she is still the rightful owner of the +Bitcoin. After being outvoted, Bob would be a tad peeved. + +The reflex reaction to such a situation would be to make the +process of validating a transaction as cheap as possible. The +intention is that Bob will easily get enough peers to agree with him +that he is the rightful owner. But such a solution has always +the limitation of Alice setting up an even bigger network of +puppy identities. The really cool idea of Bitcoin is to go +into the other direction of making the process of transaction +validation (artificially) as expensive as possible, but reward +people for helping with the validation. This is really a novel +and counterintuitive idea that makes the whole system of +Bitcoins work so beautifully. + +\subsubsection*{Proof-of-Work Puzzles} + +In order to make the process of transaction validation +difficult, Bitcoin uses a kind of puzzle. Solving the puzzles +is called \emph{Bitcoin mining}, where whoever solves a puzzle +will be awarded some Bitcoins. At the beginning this was 50 +Bitcoins, but the rules of Bitcoin are set up such that this +amount halves every 210,000 transactions or so. Currently you +will be awarded 25 Bitcoins for solving a puzzle. Because the +amount will halve again and then later again and again, around +the year 2140 it will go below the level of 1 Satoshi. In that +event no new Bitcoins will ever be created again and the +amount of Bitcoins stays fixed. There will be still an +incentive to help with validating transactions, because there +is the possibility in Bitcoins to offer a transaction fee to +whoever solves a puzzle. At the moment this fee is usually set +to 0, since the incentive for miners is the 25 Bitcoins that +are currently awarded for solving puzzles. + +What do the puzzles that miners have to solve look like? The +puzzles can be illustrated roughly as follows: Given a string, +say \code{"Hello, world!"}, what is the salt so that the hash +starts with a long run of zeros? Let us look at a concrete +example. Recall that Bitcoins use the hash-function SHA-256. +Suppose we call this hash function \code{h}, then we could try +the salt \code{0} as follows: + +\begin{quote} +\code{h("Hello, world!0") =}\\ +\mbox{}\quad\footnotesize\pcode{1312af178c253f84028d480a6adc1e25e81caa44c749ec81976192e2ec934c64} +\end{quote} + +\noindent OK this does not have any zeros at all. We could +next try the salt \code{1}: + +\begin{quote} +\code{h("Hello, world!1") =}\\ +\mbox{}\quad\footnotesize\pcode{e9afc424b79e4f6ab42d99c81156d3a17228d6e1eef4139be78e948a9332a7d8} +\end{quote} + +\noindent Again this hash value does not contain any leading +zeros. We could now try out every salt until we reach + +\begin{quote} +\code{h("Hello, world!4250") =}\\ +\mbox{}\quad\footnotesize\pcode{0000c3af42fc31103f1fdc0151fa747ff87349a4714df7cc52ea464e12dcd4e9} +\end{quote} + +\noindent where we have four leading zeros. If four zeros are +enough, then the puzzle would be solved with this salt. The +point is that we can very quickly check whether a salt solves +a puzzle, but it is hard to find one. Latest research suggest +it is an NP-problem. If we want the output hash value to begin +with 10 zeroes, say, then we will, on average, need to try +$16^{10} \approx 10^{12}$ different salts before we find a +suitable one. + +In Bitcoins the puzzles are not solved according to how many +leading zeros a hash-value has, but rather whether it is below +a \emph{target}. The hardness of the puzzle can actually be +controlled by changing the target according to the available +computational power available. I think the adjustment of the +hardness of the problems is done every 2060 blocks +(appr.~every two weeks). The aim of the adjustment is that on +average the Bitcoin network will most likely solve a puzzle +within 10 Minutes. + +\begin{center} +\includegraphics[scale=0.37]{../pics/blockchainsolving.png} +\end{center} + +\noindent It could be solved quicker, but equally it could +take longer, but on average after 10 Minutes somebody on the +network will have found a solution. + +Remember that the puzzles are a kind of proof-of-work that +make the validation of transactions artificially expensive. +Consider the following picture with a blockchain and some +unconfirmed transactions. + +\begin{equation} +\includegraphics[scale=0.38]{../pics/bitcoin_unconfirmed.png} +\label{unconfirmed} +\end{equation} + +\noindent The puzzle is stated as follows: There are some +unconfirmed transactions. Choosing some of them, the miner +(i.e.~the person/computer that tries to solve a puzzle) will +form a putative block to be added to the blockchain. This +putative block will contain the transactions and the reference +to the previous block. The serial number of such a block is +simply the hash of all the data. The puzzle can then be stated +as the ``string'' corresponding to the block and which salt is +needed in order to have the hashed value being below the +target. Other miners will choose different transactions and +therefore work on a slightly different putative block and +puzzle. + +The intention of the proof-of-work puzzle is that the +blockchain is at every given moment linearly ordered, see the +picture shown in \eqref{unconfirmed}. If we don’t have such a +linear ordering at any given moment then it may not be clear +who owns which Bitcoins. Assume a miner David is lucky and +finds a suitable salt to confirm some transactions. Should he +celebrate? Not yet. Typically the blockchain will look as +follows + +\begin{center} +\includegraphics[scale=0.65]{../pics/block_chain1.png} +\end{center} + +\noindent But every so often there will be a fork + +\begin{center} +\includegraphics[scale=0.65]{../pics/block_chain_fork.png} +\end{center} + +\noindent What should be done in this case? Well, the tie is +broken if another block is solved, like so: + +\begin{center} +\includegraphics[scale=0.4]{../pics/bitcoin_blockchain_branches.png} +\end{center} + +\noindent The rule in Bitcoins is: If a fork occurs, people on +the network keep track of all forks (they can see). But at any +given time, miners only work to extend whichever fork is +longest in their copy of the block chain. Why should miners +work on the longest fork? Well their incentive is to mine +Bitcoins. If somebody else already solved a puzzle, then it +makes more sense to work on a new puzzle and obtain the +Bitcoins for solving that puzzle, rather than waste efforts on +a fork that is shorter and therefore less likely to be +``accepted''. Note that whoever solved a puzzle on the +``loosing'' fork will actually not get any Bitcoins as reward. +Tough luck. -To see how private matter can lead really to the wrong -conclusions, take the example of Stephen Hawking: When he was -diagnosed with his disease, he was given a life expectancy of -two years. If employers would know about such problems, would -they have employed Hawking? Now, he is enjoying his 70+ -birthday. Clearly personal medical data needs to stay private. - -To cut a long story short, I let you ponder about the two -statements which are often voiced in discussions about privacy: +\subsubsection*{Alice against the Rest of the World} -\begin{itemize} -\item \textit{``You have zero privacy anyway. Get over -it.''}\\ -\mbox{}\hfill{}{\small{}(by Scott Mcnealy, former CEO of Sun)} +Let us see how the blockchain and the proof-of-work puzzles +avoid the problem of double spend. If Alice wants to cheat +Bob, she would need to pull off the following ploy: -\item \textit{``If you have nothing to hide, you have nothing -to fear.''} -\end{itemize} - -\noindent If you like to watch a movie which has this topic as -its main focus I recommend \emph{Gattaca} from -1997.\footnote{\url{http://www.imdb.com/title/tt0119177/}} If -you want to read up on this topic, I can recommend the -following article that appeared in 2011 in the Chronicle of -Higher Education: +\begin{center} +\includegraphics[scale=0.4]{../pics/bitcoin_blockchain_double_spend.png} +\end{center} -\begin{center} -\url{http://chronicle.com/article/Why-Privacy-Matters-Even-if/127461/} -\end{center} +\noindent Alice makes a transaction to Bob for paying, for +example, for an online order. This transaction is confirmed, +or validated, in block 2. Bob ships the goods around block 4. +In this moment, Alice needs to get into action and try to +validate the fraudulent transaction to herself instead. At +this moment she is in a race against all the computing power +of the ``rest of the world''. Because the incentive of the +rest of the world is to work on the longest chain, that is the +one with the transaction from Alice to Bob: -\noindent Funnily, or maybe not so funnily, the author of this -article carefully tries to construct an argument that does not -only attack the nothing-to-hide statement in cases where -governments \& co collect people's deepest secrets, or -pictures of people's naked bodies, but an argument that -applies also in cases where governments ``only'' collect data -relevant to, say, preventing terrorism. The fun is of course -that in 2011 we could just not imagine that respected -governments would do such infantile things as intercepting -people's nude photos. Well, since Snowden we know some people -at the NSA did exactly that and then shared such photos among -colleagues as ``fringe benefit''. +\begin{center} +\includegraphics[scale=0.4]{../pics/bitcoin_doublespend_blockchain_race.png} +\end{center} - -\subsubsection*{Re-Identification Attacks} +\noindent As shown in the picture she has to solve the puzzles +2a to 5a one after the other, because the hash of a block is +determined via the reference by all the data in the previous +block. She might be very lucky to solve one puzzle for a block +before the rest of the world, but to be lucky many times is +very unlikely. This principle of having to race against the +rest of the world avoids the ploy of double spend. -Apart from philosophical musings, there are fortunately also -some real technical problems with privacy. The problem I want -to focus on in this handout is how to safely disclose datasets -containing potentially very private data, say health records. -What can go wrong with such disclosures can be illustrated -with four well-known examples: +In order to raise the bar for Alice even further, merchants +accepting Bitcoins use the following rule of thumb: A +transaction is ``confirmed'' if \begin{itemize} -\item In 2006, a then young company called Netflix offered a 1 - Mio \$ prize to anybody who could improve their movie - rating algorithm. For this they disclosed a dataset - containing 10\% of all Netflix users at the time - (appr.~500K). They removed names, but included numerical - ratings of movies as well as times when ratings were - uploaded. Though some information was perturbed (i.e., - slightly modified). - - Two researchers had a closer look at this anonymised - data and compared it with public data available from the - International Movie Database (IMDb). They found that - 98\% of the entries could be re-identified in the - Netflix dataset: either by their ratings or by the dates - the ratings were uploaded. The result was a class-action - suit against Netflix, which was only recently resolved - involving a lot of money. +\item[(1)] it is part of a block in the longest fork, and +\item[(2)] at least 5 blocks follow it in the longest fork. In + this case we say that the transaction has 6 + confirmations. +\end{itemize} + +\noindent A simple calculation shows that this amount of +confirmations can take up to 1 hour and more. While this seems +excessively long, from the merchant's point of view it is not +that long at all. For this recall that ordinary creditcards +can have their transactions been rolled-back for 6 months or +so. The point however is that the odds for Alice being able to +cheat are very low, unless she can muster more than 50\% of +the world Bitcoin mining capacity. In this case she could +out-race the rest of the world. The point is however that +amassing such an amount of computing power is practically +impossible for a single person or even a moderately large +group. -\item In the 1990ies, medical datasets were often made public - for research purposes. This was done in anonymised form - with names removed, but birth dates, gender and ZIP-code - were retained. In one case where such data about - hospital visits of state employees in Massachusetts was - made public, the then governor assured the public that - the released dataset protected patient privacy by - deleting identifiers. - - A graduate student could not resist cross-referencing - public voter data with the released data that still - included birth dates, gender and ZIP-code. The result - was that she could send the governor his own hospital - record. It turns out that birth dates, gender and - ZIP-code uniquely identify 87\% of people in the US. - This work resulted in a number of laws prescribing which - private data cannot be released in such datasets. - -\item In 2006, AOL published 20 million Web search queries - collected from 650,000 users (names had been deleted). - This was again done for research purposes. However, - within days an old lady, Thelma Arnold, from Lilburn, - Georgia, (11,596 inhabitants) was identified as user - No.~4417749 in this dataset. It turned out that search - engine queries are deep windows into people's private - lives. - -\item Genome-Wide Association Studies (GWAS) was a public - database of gene-frequency studies linked to diseases. - It would essentially record that people who have a - disease, say diabetes, have also certain genes. In order - to maintain privacy, the dataset would only include - aggregate information. In case of DNA data this - aggregation was achieved by mixing the DNA of many - individuals (having a disease) into a single solution. - Then this mixture was sequenced and included in the - dataset. The idea was that the aggregate information - would still be helpful to researchers, but would protect - the DNA data of individuals. - - In 2007 a forensic computer scientist showed that - individuals can still be identified. For this he used - the DNA data from a comparison group (people from the - general public) and ``subtracted'' this data from the - published data. He was left with data that included all - ``special'' DNA-markers of the individuals present in - the original mixture. He essentially deleted the - ``background noise'' in the published data. The problem - with DNA data is that it is of such a high resolution - that even if the mixture contained maybe 100 - individuals, you can with current technology detect - whether an individual was included in the mixture or - not. - - This result changed completely how DNA data is nowadays - published for research purposes. After the success of - the human-genome project with a very open culture of - exchanging data, it became much more difficult to - anonymise data so that patient's privacy is preserved. - The public GWAS database was taken offline in 2008. - +Connected with the 6-confirmation rule is an interesting +phenomenon. On average, it would take several years for a +typical computer to solve a proof-of-work puzzle, so an +individual’s chance of ever solving one before the rest of the +world, which typically takes only 10 minutes, is negligibly +low. Therefore many people join groups called \emph{mining +pools} that collectively work to solve blocks, and distribute +rewards based on work contributed. These mining pools act +somewhat like lottery pools among co-workers, except that some +of these pools are quite large, and comprise more than 20\% of +all the computers in the network. It is said that BTCC, a +large mining pool, has limited its number of members in order +to not solve more than 6 blocks in a row. Otherwise this would +undermine the trust in Bitcoins, which is also not in the +interest of BTCC, I guess. Some statistics on mining pools can +be seen at + +\begin{center} +\url{https://blockchain.info/pools} +\end{center} + +\noindent Here is an interesting problem: You are part of a lottery +pool, if you chip in some of the money to buy a lottery ticket. In +this setting it is clear when you are in or outside of the pool. But +how do you make sure people work hard in a mining pool in order to +justify a fraction of any reward? If evil me had its way, I would just +claim I do work and then sit back and relax. Or even if I do some work +for a mining pool and I happen to find a correct salt, I would keep it +secret and submit it to the bitcoin network on the ``side''. Actually, +the idea of mining pools has opened up a full can of interesting +problems. + + + +\subsubsection*{Bitcoins for Real} + +Let us now turn to the nitty gritty details. As a participant +in the Bitcoin network you need to generate and store a +public-private key pair. The public key you need to advertise +in order to receive payments (transactions). The private key +needs to be securely stored. For this there seem to be three +possibilities + +\begin{itemize} +\item an electronic wallet on your computer +\item a cloud-based storage (offered by some Bitcoin services) +\item paper-based \end{itemize} -\noindent There are many lessons that can be learned from -these examples. One is that when making datasets public in -anonymised form, you want to achieve \emph{forward privacy}. -This means, no matter what other data that is also available -or will be released later, the data in the original dataset -does not compromise an individual's privacy. This principle -was violated by the availability of ``outside data'' in the -Netflix and governor of Massachusetts cases. The additional -data permitted a re-identification of individuals in the -dataset. In case of GWAS a new technique of re-identification -compromised the privacy of people in the dataset. The case of -the AOL dataset shows clearly how incomplete such data can be: -Although the queries uniquely identified the older lady, she -also looked up diseases that her friends had, which had -nothing to do with her. Any rational analysis of her query -data must therefore have concluded, the lady is on her -death bed, while she was actually very much alive and kicking. +\noindent The first two options of course offer convenience +for making and receiving transactions. But given the nature of +the private keys and how much security relies on them (recall +if somebody gets hold of it, your Bitcoins are quickly lost +forever) I would opt for the third option for anything except +for trivial amounts of Bitcoins. As we have seen earlier in +the course, securing a computer system that it can withstand a +targeted breakin is still very much an unsolved problem. + +An interesting fact with Bitcoin keys is that there is no +check for duplicate addresses. This means when generating a +public-private key, you should really start with a carefully +chosen random number such that there is really no chance to +step on somebody's feet in the $2^{160}$ space of +possibilities. Again if you share an address with somebody +else, he or she has access to all your unspend transactions. +The absence of such a check is easily explained: How would one +do this in a distributed system? The answer you can't. It is +possible to do some sanity check of addresses that are already +used in the blockchain, but this is not a fail-proof method. +One really has to trust on the enormity of the $2^{160}$ +space for addresses. + +Let us now look at the concrete data that is stored in an transaction +message: + +\lstinputlisting[language=Scala]{../slides/msg} + +\noindent The hash in Line 1 is the hash of all the data that +follows. It is a kind of serial number for the transaction. +Line 2 contains a version number in case there are some +incompatible changes to be made. Lines 3 and 4 specify how +many incoming transactions are combined and how many outgoing +transactions there are. In our example there are one for each. +Line 5 specifies a lock time for when the transaction is +supposed to become active---this is usually set to 0 to become +active immediately. Line 6 specifies the size of the message; +it has nothing to do with the Bitcoins that are transferred. +Lines 7 to 11 specify where the Bitcoins in the transaction +are coming from. The has in line 9 specifies the incoming +transaction and the \pcode{n} in Line 10 specifies which +output of the transaction is referred to. The signature in +line 11 specifies the address (public key $K^{pub}$) from +where the Bitcoins are taken and the digital signature of the +address, that is $\{K^{pub}\}_{K^{priv}}$. Lines 12 to 15 +specify the value of the first outgoing transaction. In this +case 0.319 Bitcoins. The hash in Line 14 specifies the address +to where the Bitcoins are transferred. + +As can be seen there is no need to issue serial numbers for +transactions, the hash of the transaction data can do this +job. The hash will contain the sender addresses and +hash-references to the incoming transactions, as well as the +public key of the incoming transaction. This uniquely +identifies a transaction and the hash is the unique +fingerprint of it. The in-field also contains the address to +which a earlier transaction is made. The digital signature +ensures everybody can check that the person who makes this +transaction is in the possession of the private key. Otherwise +the signature would not match up with the public-key address. + +When mining the blockchain it only needs to be ensured that +the transactions are consistent (all hashes and signatures +match up). Then we need to generate the correct previous-block +link and solve the resulting puzzle. Once the block is +accepted, everybody can check the integrity of the whole +blockchain. + +A word of warning: The point of a lottery is that some people +win. But equally, that most people lose. Mining Bitcoins has +pretty much the same point. According to the article below, a +very large machine (very, very large in terms of June 2014) +could potentially mine \$40 worth of Bitcoins a day, but would +require magnitudes more of electricity costs to do so. -In 2016, Yahoo released the so far largest machine learning -dataset to the research community. It includes approximately -13.5 TByte of data representing around 100 Billion events from -anonymized user-news items, collected by recording -interactions of about 20M users from February 2015 to May -2015. Yahoo's gracious goal is to promote independent research -in the fields of large-scale machine learning and recommender -systems. It remains to be seen whether this data will really -only be used for that purpose. +\begin{center} +\url{http://bitcoinmagazine.com/13774/government-bans-professor-mining-bitcoin-supercomputer/} +\end{center} + +\noindent Bitcoin mining nowadays is only competitive, or +profitable, if you get the energy for free, or use special +purpose computing devices. + +This about ``free'' energy can actually hurt you very badly in +unexpected ways. You probably have heard about, or even used, +Amazon's Elastic Compute Cloud (EC2). Essentially, Amazon is +selling computing power that you can use to run your web site, +for example. It is \emph{elastic} in the sense that if you +have a lot of visitors, you pay a lot, if you have only a few, +then it is cheap. In order to bill you, you need to set +up an account with Amazon and receive some secret keys in +order to authenticate you. The clever (but also dangerous) bit +is that you upload the code of your web site to GitHub and +Amazon will pull it from there. You can probably already guess +where this is going: in order to learn about Amazon's API, it +gives out some limited computing power for free. Somebody used +this offer in order to teach himself Ruby on Rails with a +mildly practical website. Unfortunately, he uploaded also his +secret keys to GitHub (this is really an easy mistake). Now, +nasty people crawl GitHub for the purpose of stealing such +secret keys. What can they do with this? Well, they quickly +max out the limit of computing power with Amazon and mine +Bitcoins (under somebody else's account). Fortunately for this +guy, Amazon was aware of this scam and in a goodwill gesture +refunded him the money the nasty guys incurred over +night with their Bitcoin mining. If you want to read the +complete story, google for ``My \$2375 Amazon EC2 Mistake''. + +\subsubsection*{Multi-Signature Transactions} + +To be explained. -\subsubsection*{Differential Privacy} +\subsubsection*{Anonymity with Bitcoins} + +One question one often hears is how anonymous is it actually +to pay with Bitcoins? Paying with paper money used to be a +quite anonymous act (unlike paying with credit cards, for +example). But this has changed nowadays: You cannot come to a +bank anymore with a suitcase full of money and try to open a +bank account. Strict money laundering and taxation laws mean +that not even Swiss banks are prepared to take such money and +open a bank account. That is why Bitcoins are touted as +filling this niche again of anonymous payments. + +While Bitcoins are intended to be anonymous, the reality is +slightly different. I fully agree with the statement by +Nielsen from the blog article I referenced at the beginning: + +\begin{quote}\it{}``Many people claim that Bitcoin can be used +anonymously. This claim has led to the formation of +marketplaces such as Silk Road (and various successors), which +specialize in illegal goods. However, the claim that Bitcoin +is anonymous is a myth. The block chain is public, meaning +that it’s possible for anyone to see every Bitcoin transaction +ever. Although Bitcoin addresses aren't immediately associated +to real-world identities, computer scientists have done a +great deal of work figuring out how to de-anonymise +`anonymous' social networks. The block chain is a marvellous +target for these techniques. I will be extremely surprised if +the great majority of Bitcoin users are not identified with +relatively high confidence and ease in the near future.'' +\end{quote} + +\noindent The only thing I can add to this is that with the Bitcoin +blockchain we will in the future have even more pleasure hearing +confessions from reputable or not-so-reputable people, like the +infamous ``I did not inhale'' from an US +president.\footnote{\url{www.youtube.com/watch?v=Bktd_Pi4YJw}} The +whole point of the blockchain is that it public and will always be. -Differential privacy is one of the few methods that tries to -achieve forward privacy. The basic idea is to add appropriate -noise, or errors, to any query of the dataset. The intention -is to make the result of a query insensitive to individual -entries in the database. That means the results are -approximately the same no matter if a particular individual is -in the dataset or not. The hope is that the added error does -not eliminate the ``signal'' one is looking for in the -dataset. +There are some precautions one can take for boosting anonymity, for +example to use a new public-private key pair for every new +transaction, and to access Bitcoin only through the Tor network. But +the transactions in Bitcoins are designed such that they allow one to +combine incoming transactions. In such cases we know they must have +been made by the single person who knew the corresponding private +keys. So using different public-private keys for each transaction +might not actually make the de-anonymisation task much harder. And the +point about de-ano\-nymising `anonymous' social networks is that the +information is embedded into the structure of the transition +graph. And this cannot be erased with Bitcoins. + +One paper that has fun with spotting transactions made to Silk Road (2.0) +and also to Wikileaks is + +\begin{center} +\url{http://people.csail.mit.edu/spillai/data/papers/bitcoin-transaction-graph-analysis.pdf} +\end{center} + +\noindent +A paper that gathers some statistical data about the blockchain is + +\begin{center} +\url{https://eprint.iacr.org/2012/584.pdf} +\end{center} + +\subsubsection*{Government Meddling} + +Finally, what are the options for a typical Western government to +meddle with Bitcoins? This is of course one feature the proponents of +Bitcoins also tout: namely that there aren't any options. In my +opinion this is far too naive and far from the truth. Let us assume +some law enforcement agencies would not have been able to uncover the +baddies from Silk Road 1.0 and 2.0 (they have done so by uncovering +the Tor network, which is an incredible feat on its own). Would the +government in question have stopped? I do not think so. The next +target would have been Bitcoin. If I were the government, this is +what I would consider: -%\begin{center} -%User\;\;\;\; -%\begin{tabular}{c} -%tell me $f(x)$ $\Rightarrow$\\ -%$\Leftarrow$ $f(x) + \text{noise}$ -%\end{tabular} -%\;\;\;\;\begin{tabular}{@{}c} -%Database\\ -%$x_1, \ldots, x_n$ -%\end{tabular} -%\end{center} -% -%\begin{center} -%\begin{tabular}{l|l} -%Staff & Salary\\\hline -%$PM$ & \pounds{107}\\ -%$PF$ & \pounds{102}\\ -%$LM_1$ & \pounds{101}\\ -%$LF_2$ & \pounds{97}\\ -%$LM_3$ & \pounds{100}\\ -%$LM_4$ & \pounds{99}\\ -%$LF_5$ & \pounds{98} -%\end{tabular} -%\end{center} -% -% -%\begin{center} -%\begin{tikzpicture} -%\begin{axis}[symbolic y coords={salary}, -% ytick=data, -% height=3cm] %\addplot+[jump mark mid] coordinates %{(0,salary) (0.1,salary) % (0.4,salary) (0.5,salary) % (0.8,salary) (0.9,salary)}; %\end{axis} %\end{tikzpicture} -%\end{center} -% -%\begin{tikzpicture}[outline/.style={draw=#1,fill=#1!20}] % \node [outline=red] {red box}; % \node [outline=blue] at (0,-1) {blue box}; %\end{tikzpicture} +\begin{itemize} +\item The government could compel ``mayor players'' to blacklist + Bitcoins (for example at Bitcoin exchanges, which are usually + located somewhere in the vicinity of the government's reach). This + would impinge on what is called \emph{fungibility} of Bitcoins and + make them much less attractive to baddies. Suddenly their + ``hard-earned'' Bitcoin money cannot be spent anymore. The attraction + of this option is that this blacklisting can be easily done + ``whole-sale'' and therefore be really be an attractive target for + governments \& Co. +\item The government could attempt to coerce the developer + community of the Bitcoin tools. While this might be a + bit harder, we know certain governments are ready to + take such actions (we have seen this with Lavabit, just + that the developers there refused to play ball and shut + down their complete operation). +\item The government could also put pressure on mining pools + in order to blacklist transactions from baddies. Or be a + big miner itself. Given the gigantic facilities that + are built for institutions like the NSA (pictures from + the Utah dessert) + + \begin{center} + \includegraphics[scale=0.04]{../pics/nsautah1.jpg} + \hspace{3mm} + \includegraphics[scale=0.031]{../pics/nsautah2.jpg} + \end{center} + + this would not be such a high bar to jump over. Remember it + ``only'' takes to be temporarily in control of 50\%-plus of the + mining capacity in order to undermine the trust in the + system. Given sophisticated stories like Stuxnet (where we still + do not know the precise details) maybe even such large + facilities are not really needed. What happens, for example, if + a government starts DoS attacks on existing miners? They have + complete control (unfortunately) of all mayor connectivity + providers, i.e.~ISPs. + + There are estimates that the Bitcoin mining capacity + outperforms the top 500 supercomputers in the world, + combined(!): + + \begin{center}\small + \url{http://www.forbes.com/sites/reuvencohen/2013/11/28/global-bitcoin-computing-power-now-256-times-faster-than-top-500-supercomputers-combined/} + \end{center} + + But my gut feeling is that these are too simplistic + calculations. In security (and things like Bitcoins) the + world is never just black and white. The point is once + the trust is undermined, the Bitcoin system would need + to be evolved to Bitcoins 2.0. But who says that Bitcoin + 2.0 will honour the Bitcoins from Version 1.0? + \end{itemize} -\ldots +\noindent A government would potentially not really +need to follow up with such threads. Just the rumour that it +would, could be enough to get the Bitcoin-house-of-cards to +tumble. Some governments have already such an ``impressive'' +trackrecord in this area, such a thread would be entirely +credible. Because of all this, I would not have too much hope +that Bitcoins are free from interference by governments \& Co when +it will stand in their way, despite what everybody else is +saying. To sum up, the technical details behind Bitcoins are +simply cool. But still the entire Bitcoin ecosystem is in my +humble opinion rather fragile. + + +\subsubsection*{Isn't there anything good with Bitcoins?} + +As you can see, so far my argument was that yes the Bitcoin system is +based on a lot of very cool technical ideas, but otherwise it is a big +scam. You might wonder if there is not something good (in terms of +valuable for civilisation) in the bitcoin system? I think there is +actually: diamonds are quite valuable and because of this can be +used as a form of `money'---just remember the song with the line +`diamonds are forever'. + +The problem with diamonds is that in some places where they are found, +they also fund some stupid wars. You like to set up a usable system +whereby you can check whether a diamond comes from a reputable source +(not funding any wars) or from a dodgy source. For this you have to +know that `clearing houses' for diamonds can engrave with lasers +unique numbers inside the diamonds. These engravings are invisible to +the naked eye and as far as I remember these numbers cannot be removed, +except by destroying the diamond. Even if it can be removes, diamonds +without the number cannot (hopefully) be sold. +How do bitcoins come into the picture? The idea is called +\emph{coloured coins}, where you attach some additional information to +some `coins'. In the diamond example the bitcoin transactions are +supposed to act as a certificate where diamonds are from (reputable +sources or not). For this you have to know that you can attach a very +short custom-made message with each bitcoin transaction. So you would +record the diamond number inside the message. + +Now, you would set the system up so that a trusted entity (which +exists in the diamond world) buys with their public key bitcoins (or +smaller amounts). These trusted entities are essentially the places +that also cut the raw diamonds. The idea is whenever you buy a +diamond, you like to have also the corresponding bitcoin +transaction. If you want to sell the diamond, you make a transaction +to the new owner. The new owner will ask for this message, because +otherwise he/she cannot sell it later on. + +The advantage is that for each diamond you can trace back that the +transaction must have originated from the trusted entity. If yes, your +diamond will be sellable. If you do not have the message, the diamond +comes from a dodgy source and will (hopefully) not be sellable later +on. In this way you skew the incentives such that only legitimate +diamond are of value. The bitcoin system just helps with being able to +check whether the message originates from the trusted entity or +not....you do not have to consult anybody else and pay money for this +consultation. Or in any way reveal your identity by such a consultation +(the police might just keep a particularly close an eye on who contacts +such a clearing house). + +Since we hopefully all agree that funding stupid wars is bad, any +system that can starve funds for such wars must be good. Piggy-bagging +on the trust established by the bitcoin system on the public block chain +makes such a system realisable. \subsubsection*{Further Reading} -Two cool articles about how somebody obtained via the Freedom -of Information Law the taxicab dataset of New York and someone -else showed how easy it is to mine for private information: +Finally, finally, the article \begin{center}\small -\begin{tabular}{p{0.78\textwidth}} -\url{http://chriswhong.com/open-data/foil_nyc_taxi/}\smallskip\\ -\url{http://research.neustar.biz/2014/09/15/riding-with-the-stars-passenger-privacy-in-the-nyc-taxicab-dataset} -\end{tabular} +\url{http://www.extremetech.com/extreme/155636-the-bitcoin-network-outperforms-the-top-500-supercomputers-combined} \end{center} -\noindent -A readable article about how supermarkets mine your shopping -habits (especially how they prey on new exhausted parents -;o) appeared in 2012 in the New York Times: +\noindent makes an interesting point: If people are willing to +solve meaningless puzzles for hard, cold cash and with this +achieve rather impressive results, what could we achieve if +the UN, say, would find the money and incentivise people to, +for example, solve protein folding +puzzles?\footnote{\url{http://en.wikipedia.org/wiki/Protein_folding}} +For this there are projects like +Folding@home.\footnote{\url{http://folding.stanford.edu}} +This might help with curing diseases such as Alzheimer or +diabetes. The same point is made in the article + +\begin{center}\small +\url{http://gizmodo.com/the-worlds-most-powerful-computer-network-is-being-was-504503726} +\end{center} + +A definitely interesting and worthy use of Bitcoins has been explored +in the thesis \begin{center} -\url{http://www.nytimes.com/2012/02/19/magazine/shopping-habits.html} -\end{center} - -\noindent An article that analyses privacy and shopping habits -from a more economic point of view is available from: - -\begin{center} -\url{http://www.dtc.umn.edu/~odlyzko/doc/privacy.economics.pdf} +\url{http://enetium.com/resources/Thesis.pdf} \end{center} -\noindent An attempt to untangle the web of current technology -for spying on consumers is published in: +\noindent where the author proposes ways of publishing information +that is censor-resistant as part of the blockchain. The idea is that +if a government wants to use Bitcoins, it would also have to put up +with plain-text data that can be included in a transaction. -\begin{center} -\url{http://cyberlaw.stanford.edu/files/publication/files/trackingsurvey12.pdf} -\end{center} +Ken Shirrif in his blog at -\noindent An article that sheds light on the paradox that -people usually worry about privacy invasions of little -significance, and overlook the privacy invasion that might -cause significant damage: +\begin{center}\small +\url{http://www.righto.com/2014/02/bitcoin-mining-hard-way-algorithms.html} +\end{center} -\begin{center} -\url{http://www.heinz.cmu.edu/~acquisti/papers/Acquisti-Grossklags-Chapter-Etrics.pdf} -\end{center} - - -Interesting ideas +\noindent writes that every day the electricity consumption of mining +for bitcoins is roughly 15 Mega Watts---the energy consumption of a country +like Cambodia. He writes: -\begin{center} -\url{https://adnauseam.io} -\end{center} - -\noindent -And a paper that predicts ad-blockers will in the end win over anti-ad-blocking: - -\begin{center} -\url{http://randomwalker.info/publications/ad-blocking-framework-techniques.pdf} -\end{center} - +\begin{quote} + \it{}``The difficulty of mining a block is astounding. At the + current difficulty, the chance of a hash succeeding is a bit less + than one in $10^{19}$. Finding a successful hash is harder than + finding a particular grain of sand from all the grains of sand on + Earth. To find a hash every ten minutes, the Bitcoin hash rate needs + to be insanely large. Currently, the miners on the Bitcoin network + are doing about 25 million gigahashes per second. That is, every + second about 25,000,000,000,000,000 blocks gets hashed. I estimate + (very roughly) that the total hardware used for Bitcoin mining cost + tens of millions of dollars and uses as much power as the country of + Cambodia.'' +\end{quote} \end{document} -http://randomwalker.info/teaching/fall-2012-privacy-technologies/? -http://chronicle.com/article/Why-Privacy-Matters-Even-if/127461/ -http://repository.cmu.edu/cgi/viewcontent.cgi?article=1077&context=hcii -https://josephhall.org/papers/NYU-MCC-1303-S2012_privacy_syllabus.pdf -http://www.jetlaw.org/wp-content/uploads/2014/06/Bambauer_Final.pdf -http://www.cs.cmu.edu/~yuxiangw/docs/Differential%20Privacy.pdf -https://www.youtube.com/watch?v=Gx13lgEudtU -https://www.cs.purdue.edu/homes/ctask/pdfs/CERIAS_Presentation.pdf -http://www.futureofprivacy.org/wp-content/uploads/Differential-Privacy-as-a-Response-to-the-Reidentification-Threat-Klinefelter-and-Chin.pdf -http://www.cis.upenn.edu/~aaroth/courses/slides/Overview.pdf -http://www.cl.cam.ac.uk/~sjm217/papers/tor14design.pdf +bit coin + +A fistful of bitcoins +http://cseweb.ucsd.edu/~smeiklejohn/files/imc13.pdf +http://cseweb.ucsd.edu/~smeiklejohn/files/imc13.pdf + +Ross Anderson & Co (no dispute resolution; co-ercion) +http://www.cl.cam.ac.uk/~sjm217/papers/fc14evidence.pdf -%%% Local Variables: -%%% mode: latex -%%% TeX-master: t -%%% End: +http://www.michaelnielsen.org/ddi/how-the-bitcoin-protocol-actually-works/ +http://www.imponderablethings.com/2013/07/how-bitcoin-works-under-hood.html + +http://randomwalker.info/bitcoin/ + +Jeffrey Robinson +Bitcon: The Naked Truth about Bitcoin + +The Bitcoin Backbone Protocol: Analysis and Applications +https://eprint.iacr.org/2014/765.pdf + +Bitcoin book +http://chimera.labs.oreilly.com/books/1234000001802/ch04.html#public_key_derivation diff -r 280e057558b8 -r 7a6e8f603e08 handouts/ho08.tex --- a/handouts/ho08.tex Sat Sep 23 19:39:53 2017 +0100 +++ /dev/null Thu Jan 01 00:00:00 1970 +0000 @@ -1,1011 +0,0 @@ -\documentclass{article} -\usepackage{../style} -\usepackage{../graphics} -\usepackage{../langs} -\usepackage{../data} - -%https://crypto.stanford.edu/cs251/ -%https://programmingblockchain.gitbooks.io/programmingblockchain/content/ - -% bug in smart contracts -% https://www.benthamsgaze.org/2016/06/17/smart-contracts-beyond-the-age-of-innocence/ -% http://hackingdistributed.com/2016/06/18/analysis-of-the-dao-exploit/ - -% hard forks -% https://github.com/bitcoin/bips/blob/master/bip-0050.mediawiki - -% only 25% needed to obtain larger shares of mining -% http://www.cs.cornell.edu/~ie53/publications/btcProcFC.pdf - -% re-identification attacks -% https://cseweb.ucsd.edu/~smeiklejohn/files/imc13.pdf - -% bit-coin papers -% https://crypto.stanford.edu/cs251/syllabus.html - -% bit coin talk --- at 20:00 mins -%https://www.usenix.org/conference/lisa16/conference-program/presentation/perlman - -% In fact, far from freeing people from the oppression of the state, -% blockchains perversely promise the perfect tool for a fully -% auditable, tax compliant, cashless society. Similarly, the belief it -% is an anonymous digital cash has quickly vanished and we are now -% seeing a large number of analytics companies, set-up specifically to -% work with law enforcement agencies, to police this new parallel -% financial system. -% -% But today blockchain is riddled with -% contradictions and misunderstandings. Most of its problems are very -% fixable, if you want to fix them - - -% history of bitcoins -% https://futurism.com/images/this-week-in-tech-jan-15-22-2016/ - -\begin{document} -\fnote{\copyright{} Christian Urban, 2014, 2015} - -\section*{Handout 7 (Bitcoins)} - -In my opinion Bitcoins are an elaborate Ponzi -scheme\footnote{\url{http://en.wikipedia.org/wiki/Ponzi_scheme}}---still -the ideas behind them are really beautiful and not too -difficult to understand. Since many colourful claims about -Bitcoins float around in the mainstream and not-so-mainstream -media, it will be instructive to re-examine such claims from a -more technically informed vantage point. For example, it is -often claimed that Bitcoins are anonymous and free from any -potential government meddling. It turns out that the first -claim ignores a lot of research in de-anonymising social -networks, and the second underestimates the persuasive means a -government has at its disposal. - -There are a lot of articles, blogposts, research papers -etc.~available about Bitcoins. Below I will follow closely the -very readable explanations from - -\begin{center} -\url{http://www.michaelnielsen.org/ddi/how-the-bitcoin-protocol-actually-works/} \;\;and\smallskip\\ -\url{http://www.imponderablethings.com/2013/07/how-bitcoin-works-under-hood.html} -\end{center} - -\noindent The latter also contains a link to a nice youtube -video about the technical details behind Bitcoins. I will -also use some of their pictures. - -Let us start with the question who invented Bitcoins? You -could not make up the answer, but we actually do not know who -the inventor is. All we know is that the first paper - -\begin{center} -\url{https://bitcoin.org/bitcoin.pdf} -\end{center} - -\noindent is signed by Satoshi Nakamoto, which however is -likely only a pen name. There is a lot of speculation who -could be the inventor, or inventors, but we simply do not -know. This part of Bitcoins is definitely anonymous so far. -The paper above is from the end of 2008; the first Bitcoin -transaction was made in January 2009. The rules in Bitcoin are -set up so that there will only ever be 21 Million Bitcoins -with the maximum reached around the year 2140. Currently there -are already 11 Million Bitcoins in `existence'. Contrast this -with traditional fiat currencies where money can be printed -almost at will. The smallest unit of a Bitcoin is called a -Satoshi, which is the $10^{-8}$th part of a Bitcoin. Remember -a Penny is the $10^{-2}$th part of a Pound. - -The two main cryptographic building blocks of Bitcoins are -cryptographic hashing functions (SHA-256) and public-private -keys using the elliptic-curve encryption scheme for digital -signatures. Hashes are used to generate `fingerprints' of data -that ensure integrity (absence of tampering). Public-private -keys are used for signatures. For example sending a message, -say $msg$, together with the encrypted version - -\[ -msg, \{msg\}_{K^{priv}} -\] - -\noindent allows everybody with access to the corresponding -public key $K^{pub}$ to verify that the message came from the -person who knew the private key. Signatures are used in -Bitcoins for verifying the addresses where the Bitcoins are -sent from. Addresses in Bitcoins are essentially the public -keys. There are $2^{160}$ possible addresses, which is such a -vast amount that there is not even a check for duplicates, or -already used addresses. If you start with a random number to -generate a public-private key pair it is very unlikely that -you step on somebody else's shoes. Compare this with the -email-addresses you wanted to register with, say -Gmail, but which are always already taken. - -One major difference between Bitcoins and traditional banking -is that you do not have a place, or few places, that record the -balance on your account. Traditional banking involves a -central ledger which specifies the current balance in each -account, for example - -\begin{center} -\begin{tabular}{l|r} -account owner & balance\\\hline -Alice & \pounds{10.01}\\ -Bob & \pounds{4.99}\\ -Charlie & -\pounds{1.23}\\ -Eve & \pounds{0.00} -\end{tabular} -\end{center} - -\noindent Bitcoins work differently in that there is no such -central ledger, but instead a public record of all -transactions ever made. This means spending money corresponds -to sending messages of the (oversimplified) form - -\begin{equation} -\{\text{I, Alice, am giving Bob one Bitcoin.}\}_{K^{priv}_{Alice}} -\end{equation} - -\noindent These messages, called transactions, are the only -data that is ever stored in the Bitcoin system (we will come -to the precise details later on). The transactions are -encrypted with Alice's private key so that everybody, -including Bob, can use Alice's public key $K^{pub}_{Alice}$ to -verify that this message came really from Alice, or more -precisely from the person who knows $K^{priv}_{Alice}$. - -The problem with such messages in a distributed system is that -what happens if Bob receives 10, say, of these transactions? -Did Alice intend to send him 10 Bitcoins, or did the message -get duplicated by for example an attacker re-playing a sniffed -message? What is needed is a kind of serial number for such -transactions. This means transaction messages shoul look more like - -\begin{center} -$\{\text{I, Alice, am giving Bob Bitcoin \#1234567.}\}_{K^{priv}_{Alice}}$ -\end{center} - -\noindent There are two difficulties, however, that need to be -solved with serial numbers. One is who is assigning serial -numbers to Bitcoins and also how can Bob verify that Alice -actually owns this Bitcoin to pay him? In a system with a bank -as trusted third-party, Bob could do the following: - -\begin{itemize} -\item Bob asks the bank whether the Bitcoin with that serial - number belongs to Alice and Alice hasn't already spent - this Bitcoin. -\item If yes, then Bob tells the bank he accepts this Bitcoin. - The bank updates the records to show that the Bitcoin - with that serial number is now in Bob’s possession and - no longer belongs to Alice. -\end{itemize} - -\noindent But for this banks would need to be trusted and -would also be an easy target for any government interference, -for example. Think of the early days of music sharing where -the company Napster was the trusted third-party but also the single point of ``failure'' which -was taken offline by law enforcement. Bitcoins is more like a -system such as BitTorrent without a single central entity that -can be taken offline.\footnote{There is some Bitcoin -infrastructure that is not so immune from being taken offline: -for example Bitcoin exchanges, HQs of Bitcoin mining pools, -Bitcoin developers and so on.} - -Bitcoins solve the problem of not being able to rely on a bank -by making everybody the ``bank''. Everybody who cares can have -the entire transaction history starting with the first -transaction made in January 2009. This history of transactions -is called the \emph{blockchain}. Bob, for example, can use his -copy of the blockchain for determining whether Alice owned the -Bitcoin he received, and if she did, he transmits the message -that he owns it now to every other participant on the Bitcoin -network. An illustration of a three-block segment of the -blockchain is (simplified) as follows - -\begin{equation} -\includegraphics[scale=0.4]{../pics/bitcoinblockchain0.png} -\label{segment} -\end{equation} - -\noindent The chain grows with time. Each block contains a -list of individual transactions, written txn in the picture -above, and also a reference to the previous block, written -prev. The data in a block (txn's and prev) is hashed so that -the reference and transactions in them cannot be tampered -with. This hash is also the unique serial number of each -block. Since this previous-block-reference is also part of the -hash, the whole chain is robust against tampering. I let you -think why this is the case?\ldots{}But does it actually -eliminate all possibilities of fraud? - -We can check the consistency of the blockchain by checking -whether all the references and hashes are correctly recorded. -I have not tried it myself, but it is said that with the -current amount of data (appr.~12GB) it takes roughly a day to -check the consistency of the blockchain on a normal computer. -Fortunately this ``extended'' consistency check usually only -needs to be done once. Afterwards the blockchain only needs to -be updated consistently. - -Recall I wrote earlier that Bitcoins do not maintain a ledger, -which lists all the current balances in each account. Instead -only transactions are recorded. While a current balance of an -account is not immediately available, it is possible to -extract from the blockchain a transaction graph that looks -like the picture shown in Figure~\ref{txngraph}. Each -rectangle represents a single transaction. Take for example -the rightmost lower transaction from Charles to Emily. This -transaction has as receiver the address of Emily and as the -sender the address of Charles. In this way no Bitcoins can -appear out of thin air (we will discuss later how Bitcoins are -actually generated). If Charles did not have a transaction of -at least the amount he wants to give Emily to his name -(i.e.~send to an address with his public-private key) then -there is no way he can make a payment to Emily. Equally, if -now Emily wants to pay for a coffee, say, with the Bitcoin she -received from Charles she can essentially only forward the -message she received. The only slight complication with this -setup in Bitcoins is that ``incoming'' Bitcoins can be -combined in a transaction and ``outgoing'' Bitcoins can be -split. For example in the leftmost upper transactions in -Figure~\ref{txngraph}, Fred makes a payment to Alice. But this -payment (or transaction) combines the Bitcoins that were send -by Jane to Fred and also by Juan to Fred. This allows you to -``consolidate'' your funds: if it were only possible to split -transactions, then the amounts would get smaller and smaller. - -In Bitcoins you have the ability to both combine incoming -transactions, but also to split outgoing transactions to -potentially more than one receiver. The latter is also needed. -Consider again the rightmost transactions in -Figure~\ref{txngraph} and suppose Alice is a coffeeshop owner -selling coffees for 1 Bitcoin. Charles received a transaction -from Zack over 5 Bitcoins, say. How does Charles pay for the -coffee? There is no explicit notion of \emph{change} in the -Bitcoin system. What Charles has to do instead is to make one -single transaction with 1 Bitcoin to Alice and with 4 Bitcoins -going back to himself, which then Charles can use to give to -Emily, for example. - -\begin{figure}[t] -\begin{center} -\includegraphics[scale=0.4]{../pics/blockchain.png} -\end{center} -\caption{Transaction graph that is implicitly recorded in the -public blockchain.\label{txngraph}} -\end{figure} - -Let us consider another example. Suppose Emily received 4 -Bitcoins from Charles and independently received another -transaction (not shown in the picture) that sends 6 Bitcoins -to her. If she now wants to buy a coffee from Alice for 1 -Bitcoin, she has two possibilities: She could just forward the -transaction from Charles over 4 Bitcoins to Alice split in -such a way that Alice receives 1 Bitcoin and Emily sends the -remaining 3 Bitcoins back to herself. In this case she would -now be in the possession of two unspend Bitcoin transactions, -one over 3 Bitcoins and the independent one over 6 Bitcoins. -Or, Emily could combine both transactions (one over 4 Bitcoins -from Charles and the independent one over 6 Bitcoins) and then -split this amount with 1 Bitcoin going to Alice and 9 Bitcoins -going back to herself. - -I think this is a good time for you to pause to let this -concept of transactions to really sink in\ldots{}You should -come to the conclusion that there is really no need for a central ledger and no -need for an account balance as familiar from traditional -banking. The closest what Bitcoin has to offer for the notion -of a balance in a bank account are the unspend transactions -that a person (more precisely a public-private key address) -received. That means transactions that can still be forwarded. - -After the pause also consider the fact that whatever -transaction is recorded in the blockchain will be in the -``historical record'' for the Bitcoin system. If a transaction -says 1 Bitcoin goes from address $A$ to address $B$, then this -is what will be---$B$ has then the possibility to spend the -corresponding Bitcoins, whether the transaction was done -fraudulently or not. There is no exception to this rule. -Interestingly this is also how Bitcoins can get lost: One -possibility is that you send Bitcoins to an address for which -nobody has generated a private key, for example because of a -typo in the address field---bad luck for fat -fingers\footnote{\url{http://en.wikipedia.org/wiki/Typographical_error}} -in the Bitcoin system. The reason is that nobody has a private -key for this erroneous address and consequently cannot forward -the transaction anymore. Another possibility is that you -forget your private key and you had messages forwarded to the -corresponding public key. Also in this case bad luck: you will -never be able to forward this message again, because you will -not be able to form a valid message that sends this to -somebody else (we will see the details of this later). But -this is also a way how you can get robbed of your Bitcoins. By -old-fashioned hacking-into-a-computer crime, for example, an -attacker might get hold of your private key and then quickly -forwards the Bitcoins that are in your name to an address the -attacker controls. You will never again have access to these -Bitcoins, because for the Bitcoin system they are assumed to -be spent. And remember with Bitcoins you cannot appeal to any -higher authority. Once the Bitcoins are gone, they are gone. -This is much different in traditional banking where at least -you can try to harass the bank to roll back the transaction. - -This brings us to back to problem of double spend. Suppose Bob -is a merchant. How can he make sure that Alice does not cheat -him? She could for example send a transaction to Bob. But also -forward the ``same'' transaction to Charlie, or even herself. -If Alice manages to get the second transaction into the -blockchain, Bob will be cheated out of his money. The problem -in such conflicting situations is how should the network -update their blockchain? You might end up with a picture like -this - -\begin{center} -\includegraphics[scale=0.4]{../pics/bitcoindisagreement.png} -\end{center} - -\noindent where Alice convinced some part of the ``world'' -that she is still the owner of the Bitcoin and some other part -of the ``world'' thinks it's Bob's. How should such a -disagreement be resolved? This is actually the main hurdle -where Bitcoin really innovated. The answer is that Bob needs -to convince ``enough'' people on the network that the -transaction from Alice to him is legit. - -What does, however, ``enough'' mean in a distributed system? -If Alice sets up a network of a billion, say, puppy identities -and whenever Bob tries to convince, or validate, that he is -the rightful owner of the Bitcoin, then the puppy identities -agree. Bob would then have no reason to not give Alice her -coffee. But behind his back she has convinced everybody else -on the network that she is still the rightful owner of the -Bitcoin. After being outvoted, Bob would be a tad peeved. - -The reflex reaction to such a situation would be to make the -process of validating a transaction as cheap as possible. The -intention is that Bob will easily get enough peers to agree with him -that he is the rightful owner. But such a solution has always -the limitation of Alice setting up an even bigger network of -puppy identities. The really cool idea of Bitcoin is to go -into the other direction of making the process of transaction -validation (artificially) as expensive as possible, but reward -people for helping with the validation. This is really a novel -and counterintuitive idea that makes the whole system of -Bitcoins work so beautifully. - -\subsubsection*{Proof-of-Work Puzzles} - -In order to make the process of transaction validation -difficult, Bitcoin uses a kind of puzzle. Solving the puzzles -is called \emph{Bitcoin mining}, where whoever solves a puzzle -will be awarded some Bitcoins. At the beginning this was 50 -Bitcoins, but the rules of Bitcoin are set up such that this -amount halves every 210,000 transactions or so. Currently you -will be awarded 25 Bitcoins for solving a puzzle. Because the -amount will halve again and then later again and again, around -the year 2140 it will go below the level of 1 Satoshi. In that -event no new Bitcoins will ever be created again and the -amount of Bitcoins stays fixed. There will be still an -incentive to help with validating transactions, because there -is the possibility in Bitcoins to offer a transaction fee to -whoever solves a puzzle. At the moment this fee is usually set -to 0, since the incentive for miners is the 25 Bitcoins that -are currently awarded for solving puzzles. - -What do the puzzles that miners have to solve look like? The -puzzles can be illustrated roughly as follows: Given a string, -say \code{"Hello, world!"}, what is the salt so that the hash -starts with a long run of zeros? Let us look at a concrete -example. Recall that Bitcoins use the hash-function SHA-256. -Suppose we call this hash function \code{h}, then we could try -the salt \code{0} as follows: - -\begin{quote} -\code{h("Hello, world!0") =}\\ -\mbox{}\quad\footnotesize\pcode{1312af178c253f84028d480a6adc1e25e81caa44c749ec81976192e2ec934c64} -\end{quote} - -\noindent OK this does not have any zeros at all. We could -next try the salt \code{1}: - -\begin{quote} -\code{h("Hello, world!1") =}\\ -\mbox{}\quad\footnotesize\pcode{e9afc424b79e4f6ab42d99c81156d3a17228d6e1eef4139be78e948a9332a7d8} -\end{quote} - -\noindent Again this hash value does not contain any leading -zeros. We could now try out every salt until we reach - -\begin{quote} -\code{h("Hello, world!4250") =}\\ -\mbox{}\quad\footnotesize\pcode{0000c3af42fc31103f1fdc0151fa747ff87349a4714df7cc52ea464e12dcd4e9} -\end{quote} - -\noindent where we have four leading zeros. If four zeros are -enough, then the puzzle would be solved with this salt. The -point is that we can very quickly check whether a salt solves -a puzzle, but it is hard to find one. Latest research suggest -it is an NP-problem. If we want the output hash value to begin -with 10 zeroes, say, then we will, on average, need to try -$16^{10} \approx 10^{12}$ different salts before we find a -suitable one. - -In Bitcoins the puzzles are not solved according to how many -leading zeros a hash-value has, but rather whether it is below -a \emph{target}. The hardness of the puzzle can actually be -controlled by changing the target according to the available -computational power available. I think the adjustment of the -hardness of the problems is done every 2060 blocks -(appr.~every two weeks). The aim of the adjustment is that on -average the Bitcoin network will most likely solve a puzzle -within 10 Minutes. - -\begin{center} -\includegraphics[scale=0.37]{../pics/blockchainsolving.png} -\end{center} - -\noindent It could be solved quicker, but equally it could -take longer, but on average after 10 Minutes somebody on the -network will have found a solution. - -Remember that the puzzles are a kind of proof-of-work that -make the validation of transactions artificially expensive. -Consider the following picture with a blockchain and some -unconfirmed transactions. - -\begin{equation} -\includegraphics[scale=0.38]{../pics/bitcoin_unconfirmed.png} -\label{unconfirmed} -\end{equation} - -\noindent The puzzle is stated as follows: There are some -unconfirmed transactions. Choosing some of them, the miner -(i.e.~the person/computer that tries to solve a puzzle) will -form a putative block to be added to the blockchain. This -putative block will contain the transactions and the reference -to the previous block. The serial number of such a block is -simply the hash of all the data. The puzzle can then be stated -as the ``string'' corresponding to the block and which salt is -needed in order to have the hashed value being below the -target. Other miners will choose different transactions and -therefore work on a slightly different putative block and -puzzle. - -The intention of the proof-of-work puzzle is that the -blockchain is at every given moment linearly ordered, see the -picture shown in \eqref{unconfirmed}. If we don’t have such a -linear ordering at any given moment then it may not be clear -who owns which Bitcoins. Assume a miner David is lucky and -finds a suitable salt to confirm some transactions. Should he -celebrate? Not yet. Typically the blockchain will look as -follows - -\begin{center} -\includegraphics[scale=0.65]{../pics/block_chain1.png} -\end{center} - -\noindent But every so often there will be a fork - -\begin{center} -\includegraphics[scale=0.65]{../pics/block_chain_fork.png} -\end{center} - -\noindent What should be done in this case? Well, the tie is -broken if another block is solved, like so: - -\begin{center} -\includegraphics[scale=0.4]{../pics/bitcoin_blockchain_branches.png} -\end{center} - -\noindent The rule in Bitcoins is: If a fork occurs, people on -the network keep track of all forks (they can see). But at any -given time, miners only work to extend whichever fork is -longest in their copy of the block chain. Why should miners -work on the longest fork? Well their incentive is to mine -Bitcoins. If somebody else already solved a puzzle, then it -makes more sense to work on a new puzzle and obtain the -Bitcoins for solving that puzzle, rather than waste efforts on -a fork that is shorter and therefore less likely to be -``accepted''. Note that whoever solved a puzzle on the -``loosing'' fork will actually not get any Bitcoins as reward. -Tough luck. - - -\subsubsection*{Alice against the Rest of the World} - -Let us see how the blockchain and the proof-of-work puzzles -avoid the problem of double spend. If Alice wants to cheat -Bob, she would need to pull off the following ploy: - -\begin{center} -\includegraphics[scale=0.4]{../pics/bitcoin_blockchain_double_spend.png} -\end{center} - -\noindent Alice makes a transaction to Bob for paying, for -example, for an online order. This transaction is confirmed, -or validated, in block 2. Bob ships the goods around block 4. -In this moment, Alice needs to get into action and try to -validate the fraudulent transaction to herself instead. At -this moment she is in a race against all the computing power -of the ``rest of the world''. Because the incentive of the -rest of the world is to work on the longest chain, that is the -one with the transaction from Alice to Bob: - -\begin{center} -\includegraphics[scale=0.4]{../pics/bitcoin_doublespend_blockchain_race.png} -\end{center} - -\noindent As shown in the picture she has to solve the puzzles -2a to 5a one after the other, because the hash of a block is -determined via the reference by all the data in the previous -block. She might be very lucky to solve one puzzle for a block -before the rest of the world, but to be lucky many times is -very unlikely. This principle of having to race against the -rest of the world avoids the ploy of double spend. - -In order to raise the bar for Alice even further, merchants -accepting Bitcoins use the following rule of thumb: A -transaction is ``confirmed'' if - -\begin{itemize} -\item[(1)] it is part of a block in the longest fork, and -\item[(2)] at least 5 blocks follow it in the longest fork. In - this case we say that the transaction has 6 - confirmations. -\end{itemize} - -\noindent A simple calculation shows that this amount of -confirmations can take up to 1 hour and more. While this seems -excessively long, from the merchant's point of view it is not -that long at all. For this recall that ordinary creditcards -can have their transactions been rolled-back for 6 months or -so. The point however is that the odds for Alice being able to -cheat are very low, unless she can muster more than 50\% of -the world Bitcoin mining capacity. In this case she could -out-race the rest of the world. The point is however that -amassing such an amount of computing power is practically -impossible for a single person or even a moderately large -group. - -Connected with the 6-confirmation rule is an interesting -phenomenon. On average, it would take several years for a -typical computer to solve a proof-of-work puzzle, so an -individual’s chance of ever solving one before the rest of the -world, which typically takes only 10 minutes, is negligibly -low. Therefore many people join groups called \emph{mining -pools} that collectively work to solve blocks, and distribute -rewards based on work contributed. These mining pools act -somewhat like lottery pools among co-workers, except that some -of these pools are quite large, and comprise more than 20\% of -all the computers in the network. It is said that BTCC, a -large mining pool, has limited its number of members in order -to not solve more than 6 blocks in a row. Otherwise this would -undermine the trust in Bitcoins, which is also not in the -interest of BTCC, I guess. Some statistics on mining pools can -be seen at - -\begin{center} -\url{https://blockchain.info/pools} -\end{center} - -\noindent Here is an interesting problem: You are part of a lottery -pool, if you chip in some of the money to buy a lottery ticket. In -this setting it is clear when you are in or outside of the pool. But -how do you make sure people work hard in a mining pool in order to -justify a fraction of any reward? If evil me had its way, I would just -claim I do work and then sit back and relax. Or even if I do some work -for a mining pool and I happen to find a correct salt, I would keep it -secret and submit it to the bitcoin network on the ``side''. Actually, -the idea of mining pools has opened up a full can of interesting -problems. - - - -\subsubsection*{Bitcoins for Real} - -Let us now turn to the nitty gritty details. As a participant -in the Bitcoin network you need to generate and store a -public-private key pair. The public key you need to advertise -in order to receive payments (transactions). The private key -needs to be securely stored. For this there seem to be three -possibilities - -\begin{itemize} -\item an electronic wallet on your computer -\item a cloud-based storage (offered by some Bitcoin services) -\item paper-based -\end{itemize} - -\noindent The first two options of course offer convenience -for making and receiving transactions. But given the nature of -the private keys and how much security relies on them (recall -if somebody gets hold of it, your Bitcoins are quickly lost -forever) I would opt for the third option for anything except -for trivial amounts of Bitcoins. As we have seen earlier in -the course, securing a computer system that it can withstand a -targeted breakin is still very much an unsolved problem. - -An interesting fact with Bitcoin keys is that there is no -check for duplicate addresses. This means when generating a -public-private key, you should really start with a carefully -chosen random number such that there is really no chance to -step on somebody's feet in the $2^{160}$ space of -possibilities. Again if you share an address with somebody -else, he or she has access to all your unspend transactions. -The absence of such a check is easily explained: How would one -do this in a distributed system? The answer you can't. It is -possible to do some sanity check of addresses that are already -used in the blockchain, but this is not a fail-proof method. -One really has to trust on the enormity of the $2^{160}$ -space for addresses. - -Let us now look at the concrete data that is stored in an transaction -message: - -\lstinputlisting[language=Scala]{../slides/msg} - -\noindent The hash in Line 1 is the hash of all the data that -follows. It is a kind of serial number for the transaction. -Line 2 contains a version number in case there are some -incompatible changes to be made. Lines 3 and 4 specify how -many incoming transactions are combined and how many outgoing -transactions there are. In our example there are one for each. -Line 5 specifies a lock time for when the transaction is -supposed to become active---this is usually set to 0 to become -active immediately. Line 6 specifies the size of the message; -it has nothing to do with the Bitcoins that are transferred. -Lines 7 to 11 specify where the Bitcoins in the transaction -are coming from. The has in line 9 specifies the incoming -transaction and the \pcode{n} in Line 10 specifies which -output of the transaction is referred to. The signature in -line 11 specifies the address (public key $K^{pub}$) from -where the Bitcoins are taken and the digital signature of the -address, that is $\{K^{pub}\}_{K^{priv}}$. Lines 12 to 15 -specify the value of the first outgoing transaction. In this -case 0.319 Bitcoins. The hash in Line 14 specifies the address -to where the Bitcoins are transferred. - -As can be seen there is no need to issue serial numbers for -transactions, the hash of the transaction data can do this -job. The hash will contain the sender addresses and -hash-references to the incoming transactions, as well as the -public key of the incoming transaction. This uniquely -identifies a transaction and the hash is the unique -fingerprint of it. The in-field also contains the address to -which a earlier transaction is made. The digital signature -ensures everybody can check that the person who makes this -transaction is in the possession of the private key. Otherwise -the signature would not match up with the public-key address. - -When mining the blockchain it only needs to be ensured that -the transactions are consistent (all hashes and signatures -match up). Then we need to generate the correct previous-block -link and solve the resulting puzzle. Once the block is -accepted, everybody can check the integrity of the whole -blockchain. - -A word of warning: The point of a lottery is that some people -win. But equally, that most people lose. Mining Bitcoins has -pretty much the same point. According to the article below, a -very large machine (very, very large in terms of June 2014) -could potentially mine \$40 worth of Bitcoins a day, but would -require magnitudes more of electricity costs to do so. - -\begin{center} -\url{http://bitcoinmagazine.com/13774/government-bans-professor-mining-bitcoin-supercomputer/} -\end{center} - -\noindent Bitcoin mining nowadays is only competitive, or -profitable, if you get the energy for free, or use special -purpose computing devices. - -This about ``free'' energy can actually hurt you very badly in -unexpected ways. You probably have heard about, or even used, -Amazon's Elastic Compute Cloud (EC2). Essentially, Amazon is -selling computing power that you can use to run your web site, -for example. It is \emph{elastic} in the sense that if you -have a lot of visitors, you pay a lot, if you have only a few, -then it is cheap. In order to bill you, you need to set -up an account with Amazon and receive some secret keys in -order to authenticate you. The clever (but also dangerous) bit -is that you upload the code of your web site to GitHub and -Amazon will pull it from there. You can probably already guess -where this is going: in order to learn about Amazon's API, it -gives out some limited computing power for free. Somebody used -this offer in order to teach himself Ruby on Rails with a -mildly practical website. Unfortunately, he uploaded also his -secret keys to GitHub (this is really an easy mistake). Now, -nasty people crawl GitHub for the purpose of stealing such -secret keys. What can they do with this? Well, they quickly -max out the limit of computing power with Amazon and mine -Bitcoins (under somebody else's account). Fortunately for this -guy, Amazon was aware of this scam and in a goodwill gesture -refunded him the money the nasty guys incurred over -night with their Bitcoin mining. If you want to read the -complete story, google for ``My \$2375 Amazon EC2 Mistake''. - -\subsubsection*{Multi-Signature Transactions} - -To be explained. - -\subsubsection*{Anonymity with Bitcoins} - -One question one often hears is how anonymous is it actually -to pay with Bitcoins? Paying with paper money used to be a -quite anonymous act (unlike paying with credit cards, for -example). But this has changed nowadays: You cannot come to a -bank anymore with a suitcase full of money and try to open a -bank account. Strict money laundering and taxation laws mean -that not even Swiss banks are prepared to take such money and -open a bank account. That is why Bitcoins are touted as -filling this niche again of anonymous payments. - -While Bitcoins are intended to be anonymous, the reality is -slightly different. I fully agree with the statement by -Nielsen from the blog article I referenced at the beginning: - -\begin{quote}\it{}``Many people claim that Bitcoin can be used -anonymously. This claim has led to the formation of -marketplaces such as Silk Road (and various successors), which -specialize in illegal goods. However, the claim that Bitcoin -is anonymous is a myth. The block chain is public, meaning -that it’s possible for anyone to see every Bitcoin transaction -ever. Although Bitcoin addresses aren't immediately associated -to real-world identities, computer scientists have done a -great deal of work figuring out how to de-anonymise -`anonymous' social networks. The block chain is a marvellous -target for these techniques. I will be extremely surprised if -the great majority of Bitcoin users are not identified with -relatively high confidence and ease in the near future.'' -\end{quote} - -\noindent The only thing I can add to this is that with the Bitcoin -blockchain we will in the future have even more pleasure hearing -confessions from reputable or not-so-reputable people, like the -infamous ``I did not inhale'' from an US -president.\footnote{\url{www.youtube.com/watch?v=Bktd_Pi4YJw}} The -whole point of the blockchain is that it public and will always be. - -There are some precautions one can take for boosting anonymity, for -example to use a new public-private key pair for every new -transaction, and to access Bitcoin only through the Tor network. But -the transactions in Bitcoins are designed such that they allow one to -combine incoming transactions. In such cases we know they must have -been made by the single person who knew the corresponding private -keys. So using different public-private keys for each transaction -might not actually make the de-anonymisation task much harder. And the -point about de-ano\-nymising `anonymous' social networks is that the -information is embedded into the structure of the transition -graph. And this cannot be erased with Bitcoins. - -One paper that has fun with spotting transactions made to Silk Road (2.0) -and also to Wikileaks is - -\begin{center} -\url{http://people.csail.mit.edu/spillai/data/papers/bitcoin-transaction-graph-analysis.pdf} -\end{center} - -\noindent -A paper that gathers some statistical data about the blockchain is - -\begin{center} -\url{https://eprint.iacr.org/2012/584.pdf} -\end{center} - -\subsubsection*{Government Meddling} - -Finally, what are the options for a typical Western government to -meddle with Bitcoins? This is of course one feature the proponents of -Bitcoins also tout: namely that there aren't any options. In my -opinion this is far too naive and far from the truth. Let us assume -some law enforcement agencies would not have been able to uncover the -baddies from Silk Road 1.0 and 2.0 (they have done so by uncovering -the Tor network, which is an incredible feat on its own). Would the -government in question have stopped? I do not think so. The next -target would have been Bitcoin. If I were the government, this is -what I would consider: - -\begin{itemize} -\item The government could compel ``mayor players'' to blacklist - Bitcoins (for example at Bitcoin exchanges, which are usually - located somewhere in the vicinity of the government's reach). This - would impinge on what is called \emph{fungibility} of Bitcoins and - make them much less attractive to baddies. Suddenly their - ``hard-earned'' Bitcoin money cannot be spent anymore. The attraction - of this option is that this blacklisting can be easily done - ``whole-sale'' and therefore be really be an attractive target for - governments \& Co. -\item The government could attempt to coerce the developer - community of the Bitcoin tools. While this might be a - bit harder, we know certain governments are ready to - take such actions (we have seen this with Lavabit, just - that the developers there refused to play ball and shut - down their complete operation). -\item The government could also put pressure on mining pools - in order to blacklist transactions from baddies. Or be a - big miner itself. Given the gigantic facilities that - are built for institutions like the NSA (pictures from - the Utah dessert) - - \begin{center} - \includegraphics[scale=0.04]{../pics/nsautah1.jpg} - \hspace{3mm} - \includegraphics[scale=0.031]{../pics/nsautah2.jpg} - \end{center} - - this would not be such a high bar to jump over. Remember it - ``only'' takes to be temporarily in control of 50\%-plus of the - mining capacity in order to undermine the trust in the - system. Given sophisticated stories like Stuxnet (where we still - do not know the precise details) maybe even such large - facilities are not really needed. What happens, for example, if - a government starts DoS attacks on existing miners? They have - complete control (unfortunately) of all mayor connectivity - providers, i.e.~ISPs. - - There are estimates that the Bitcoin mining capacity - outperforms the top 500 supercomputers in the world, - combined(!): - - \begin{center}\small - \url{http://www.forbes.com/sites/reuvencohen/2013/11/28/global-bitcoin-computing-power-now-256-times-faster-than-top-500-supercomputers-combined/} - \end{center} - - But my gut feeling is that these are too simplistic - calculations. In security (and things like Bitcoins) the - world is never just black and white. The point is once - the trust is undermined, the Bitcoin system would need - to be evolved to Bitcoins 2.0. But who says that Bitcoin - 2.0 will honour the Bitcoins from Version 1.0? - \end{itemize} - -\noindent A government would potentially not really -need to follow up with such threads. Just the rumour that it -would, could be enough to get the Bitcoin-house-of-cards to -tumble. Some governments have already such an ``impressive'' -trackrecord in this area, such a thread would be entirely -credible. Because of all this, I would not have too much hope -that Bitcoins are free from interference by governments \& Co when -it will stand in their way, despite what everybody else is -saying. To sum up, the technical details behind Bitcoins are -simply cool. But still the entire Bitcoin ecosystem is in my -humble opinion rather fragile. - - -\subsubsection*{Isn't there anything good with Bitcoins?} - -As you can see, so far my argument was that yes the Bitcoin system is -based on a lot of very cool technical ideas, but otherwise it is a big -scam. You might wonder if there is not something good (in terms of -valuable for civilisation) in the bitcoin system? I think there is -actually: diamonds are quite valuable and because of this can be -used as a form of `money'---just remember the song with the line -`diamonds are forever'. - -The problem with diamonds is that in some places where they are found, -they also fund some stupid wars. You like to set up a usable system -whereby you can check whether a diamond comes from a reputable source -(not funding any wars) or from a dodgy source. For this you have to -know that `clearing houses' for diamonds can engrave with lasers -unique numbers inside the diamonds. These engravings are invisible to -the naked eye and as far as I remember these numbers cannot be removed, -except by destroying the diamond. Even if it can be removes, diamonds -without the number cannot (hopefully) be sold. - -How do bitcoins come into the picture? The idea is called -\emph{coloured coins}, where you attach some additional information to -some `coins'. In the diamond example the bitcoin transactions are -supposed to act as a certificate where diamonds are from (reputable -sources or not). For this you have to know that you can attach a very -short custom-made message with each bitcoin transaction. So you would -record the diamond number inside the message. - -Now, you would set the system up so that a trusted entity (which -exists in the diamond world) buys with their public key bitcoins (or -smaller amounts). These trusted entities are essentially the places -that also cut the raw diamonds. The idea is whenever you buy a -diamond, you like to have also the corresponding bitcoin -transaction. If you want to sell the diamond, you make a transaction -to the new owner. The new owner will ask for this message, because -otherwise he/she cannot sell it later on. - -The advantage is that for each diamond you can trace back that the -transaction must have originated from the trusted entity. If yes, your -diamond will be sellable. If you do not have the message, the diamond -comes from a dodgy source and will (hopefully) not be sellable later -on. In this way you skew the incentives such that only legitimate -diamond are of value. The bitcoin system just helps with being able to -check whether the message originates from the trusted entity or -not....you do not have to consult anybody else and pay money for this -consultation. Or in any way reveal your identity by such a consultation -(the police might just keep a particularly close an eye on who contacts -such a clearing house). - -Since we hopefully all agree that funding stupid wars is bad, any -system that can starve funds for such wars must be good. Piggy-bagging -on the trust established by the bitcoin system on the public block chain -makes such a system realisable. - -\subsubsection*{Further Reading} - -Finally, finally, the article - -\begin{center}\small -\url{http://www.extremetech.com/extreme/155636-the-bitcoin-network-outperforms-the-top-500-supercomputers-combined} -\end{center} - -\noindent makes an interesting point: If people are willing to -solve meaningless puzzles for hard, cold cash and with this -achieve rather impressive results, what could we achieve if -the UN, say, would find the money and incentivise people to, -for example, solve protein folding -puzzles?\footnote{\url{http://en.wikipedia.org/wiki/Protein_folding}} -For this there are projects like -Folding@home.\footnote{\url{http://folding.stanford.edu}} -This might help with curing diseases such as Alzheimer or -diabetes. The same point is made in the article - -\begin{center}\small -\url{http://gizmodo.com/the-worlds-most-powerful-computer-network-is-being-was-504503726} -\end{center} - -A definitely interesting and worthy use of Bitcoins has been explored -in the thesis - -\begin{center} -\url{http://enetium.com/resources/Thesis.pdf} -\end{center} - -\noindent where the author proposes ways of publishing information -that is censor-resistant as part of the blockchain. The idea is that -if a government wants to use Bitcoins, it would also have to put up -with plain-text data that can be included in a transaction. - -Ken Shirrif in his blog at - -\begin{center}\small -\url{http://www.righto.com/2014/02/bitcoin-mining-hard-way-algorithms.html} -\end{center} - -\noindent writes that every day the electricity consumption of mining -for bitcoins is roughly 15 Mega Watts---the energy consumption of a country -like Cambodia. He writes: - -\begin{quote} - \it{}``The difficulty of mining a block is astounding. At the - current difficulty, the chance of a hash succeeding is a bit less - than one in $10^{19}$. Finding a successful hash is harder than - finding a particular grain of sand from all the grains of sand on - Earth. To find a hash every ten minutes, the Bitcoin hash rate needs - to be insanely large. Currently, the miners on the Bitcoin network - are doing about 25 million gigahashes per second. That is, every - second about 25,000,000,000,000,000 blocks gets hashed. I estimate - (very roughly) that the total hardware used for Bitcoin mining cost - tens of millions of dollars and uses as much power as the country of - Cambodia.'' -\end{quote} - -\end{document} - -bit coin - -A fistful of bitcoins -http://cseweb.ucsd.edu/~smeiklejohn/files/imc13.pdf -http://cseweb.ucsd.edu/~smeiklejohn/files/imc13.pdf - -Ross Anderson & Co (no dispute resolution; co-ercion) -http://www.cl.cam.ac.uk/~sjm217/papers/fc14evidence.pdf - -http://www.michaelnielsen.org/ddi/how-the-bitcoin-protocol-actually-works/ -http://www.imponderablethings.com/2013/07/how-bitcoin-works-under-hood.html - -http://randomwalker.info/bitcoin/ - -Jeffrey Robinson -Bitcon: The Naked Truth about Bitcoin - -The Bitcoin Backbone Protocol: Analysis and Applications -https://eprint.iacr.org/2014/765.pdf - -Bitcoin book -http://chimera.labs.oreilly.com/books/1234000001802/ch04.html#public_key_derivation diff -r 280e057558b8 -r 7a6e8f603e08 handouts/ho09.pdf Binary file handouts/ho09.pdf has changed diff -r 280e057558b8 -r 7a6e8f603e08 handouts/ho09.tex --- a/handouts/ho09.tex Sat Sep 23 19:39:53 2017 +0100 +++ /dev/null Thu Jan 01 00:00:00 1970 +0000 @@ -1,615 +0,0 @@ -\documentclass{article} -\usepackage{../style} -\usepackage{../langs} -\usepackage{../graphics} -\usepackage{../grammar} -\usepackage{multicol} - -\newcommand{\dn}{\stackrel{\mbox{\scriptsize def}}{=}} - -\begin{document} -\fnote{\copyright{} Christian Urban, King's College London, 2014} - -%% why are shuttle flights so good with software -%%http://www.fastcompany.com/28121/they-write-right-stuff - -\section*{Handout 9 (Static Analysis)} - -If we want to improve the safety and security of our programs, -we need a more principled approach to programming. Testing is -good, but as Edsger Dijkstra famously wrote: - -\begin{quote}\it -``Program testing can be a very effective way to show the -\underline{\smash{presence}} of bugs, but it is hopelessly -inadequate for showing their \underline{\smash{absence}}.'' -\end{quote} - -\noindent While such a more principled approach has been the -subject of intense study for a long, long time, only in the -past few years some impressive results have been achieved. One -is the complete formalisation and (mathematical) verification -of a microkernel operating system called seL4. - -\begin{center} -\url{http://sel4.systems} -\end{center} - -\noindent In 2011 this work was included in the MIT Technology -Review in the annual list of the world’s ten most important -emerging -technologies.\footnote{\url{http://www2.technologyreview.com/tr10/?year=2011}} -While this work is impressive, its technical details are too -enormous for an explanation here. Therefore let us look at -something much simpler, namely finding out properties about -programs using \emph{static analysis}. - -Static analysis is a technique that checks properties of a -program without actually running the program. This should -raise alarm bells with you---because almost all interesting -properties about programs are equivalent to the halting -problem, which we know is undecidable. For example estimating -the memory consumption of programs is in general undecidable, -just like the halting problem. Static analysis circumvents -this undecidability-problem by essentially allowing answers -\emph{yes} and \emph{no}, but also \emph{don't know}. With -this ``trick'' even the halting problem becomes -decidable\ldots{}for example we could always say \emph{don't -know}. Of course this would be silly. The point is that we -should be striving for a method that answers as often as -possible either \emph{yes} or \emph{no}---just in cases when -it is too difficult we fall back on the -\emph{don't-know}-answer. This might sound all like abstract -nonsense. Therefore let us look at a concrete example. - - -\subsubsection*{A Simple, Idealised Programming Language} - -Our starting point is a small, idealised programming language. -It is idealised because we cut several corners in comparison -with real programming languages. The language we will study -contains, amongst other things, variables holding integers. -Using static analysis, we want to find out what the sign of -these integers (positive or negative) will be when the program -runs. This sign-analysis seems like a very simple problem. But -even such simple problems, if approached naively, are in -general undecidable, just like Turing's halting problem. I let -you think why? - - -Is sign-analysis of variables an interesting problem? Well, -yes---if a compiler can find out that for example a variable -will never be negative and this variable is used as an index -for an array, then the compiler does not need to generate code -for an underflow-check. Remember some languages are immune to -buffer-overflow attacks, but they need to add underflow and -overflow checks everywhere. According to John Regher, an -expert in the field of compilers, overflow checks can cause -5-10\% slowdown, in some languages even 100\% for tight -loops.\footnote{\url{http://blog.regehr.org/archives/1154}} If -the compiler can omit the underflow check, for example, then -this can potentially drastically speed up the generated code. - -What do programs in our simple programming language look like? -The following grammar gives a first specification: - -\begin{multicols}{2} -\begin{plstx}[rhs style=,one per line,left margin=9mm] -: \meta{Stmt} ::= \meta{label} \texttt{:} - | \meta{var} \texttt{:=} \meta{Exp} - | \texttt{jmp?} \meta{Exp} \meta{label} - | \texttt{goto} \meta{label}\\ -: \meta{Prog} ::= \meta{Stmt} \ldots{} \meta{Stmt}\\ -\end{plstx} -\columnbreak -\begin{plstx}[rhs style=,one per line] : \meta{Exp} ::= \meta{Exp} \texttt{+} \meta{Exp} - | \meta{Exp} \texttt{*} \meta{Exp} - | \meta{Exp} \texttt{=} \meta{Exp} - | \meta{num} - | \meta{var}\\ -\end{plstx} -\end{multicols} - -\noindent I assume you are familiar with such -grammars.\footnote{\url{http://en.wikipedia.org/wiki/Backus–Naur_Form}} -There are three main syntactic categories: \emph{statments} -and \emph{expressions} as well as \emph{programs}, which are -sequences of statements. Statements are either labels, -variable assignments, conditional jumps (\pcode{jmp?}) and -unconditional jumps (\pcode{goto}). Labels are just strings, -which can be used as the target of a jump. We assume that in -every program the labels are unique---if there is a clash, -then we do not know where to jump to. The conditional jumps -and variable assignments involve (arithmetic) expressions. -Expressions are either numbers, variables or compound -expressions built up from \pcode{+}, \pcode{*} and \emph{=} -(for simplicity reasons we do not consider any other -operations). We assume we have negative and positive numbers, -\ldots \pcode{-2}, \pcode{-1}, \pcode{0}, \pcode{1}, -\pcode{2}\ldots{} An example program that calculates the -factorial of 5 is in our programming language as follows: - -\begin{lstlisting}[language={},xleftmargin=10mm] - a := 1 - n := 5 -top: - jmp? n = 0 done - a := a * n - n := n + -1 - goto top -done: -\end{lstlisting} - -\noindent As can be seen each line of the program contains a -statement. In the first two lines we assign values to the -variables \pcode{a} and \pcode{n}. In line 4 we test whether -\pcode{n} is zero, in which case we jump to the end of the -program marked with the label \pcode{done}. If \pcode{n} is -not zero, we multiply the content of \pcode{a} by \pcode{n}, -decrease \pcode{n} by one and jump back to the beginning of -the loop, marked with the label \pcode{top}. Another program -in our language is shown in Figure~\ref{fib}. I let you think -what it calculates. - -\begin{figure}[t] -\begin{lstlisting}[numbers=none, - language={},xleftmargin=10mm] - n := 6 - m1 := 0 - m2 := 1 -loop: - jmp? n = 0 done - tmp := m2 - m2 := m1 + m2 - m1 := tmp - n := n + -1 - goto top -done: -\end{lstlisting} -\caption{A mystery program in our idealised programming language. -Try to find out what it calculates! \label{fib}} -\end{figure} - -Even if our language is rather small, it is still Turing -complete---meaning quite powerful. However, discussing this -fact in more detail would lead us too far astray. Clearly, our -programming is rather low-level and not very comfortable for -writing programs. It is inspired by real machine code, which -is the code that is executed by a CPU. So a more interesting -question is what is missing in comparison with real machine -code? Well, not much\ldots{}in principle. Real machine code, -of course, contains many more arithmetic instructions (not -just addition and multiplication) and many more conditional -jumps. We could add these to our language if we wanted, but -complexity is really beside the point here. Furthermore, real -machine code has many instructions for manipulating memory. We -do not have this at all. This is actually a more serious -simplification because we assume numbers to be arbitrary small -or large, which is not the case with real machine code. In -real machine code, basic number formats have a range and might -over-flow or under-flow this range. Also the number of -variables in our programs is potentially unlimited, while -memory in an actual computer, of course, is always limited. To -sum up, our language might look ridiculously simple, but it is -not too far removed from practically relevant issues. - - -\subsubsection*{An Interpreter} - -Designing a language is like playing god: you can say what -names for variables you allow; what programs should look like; -most importantly you can decide what each part of the program -should mean and do. While our language is quite simple and the -meaning of statements, for example, is rather straightforward, -there are still places where we need to make real choices. For -example consider the conditional jumps, say the one in the -factorial program: - -\begin{center} -\code{jmp? n = 0 done} -\end{center} - -\noindent How should they work? We could introduce Booleans -(\pcode{true} and \pcode{false}) and then jump only when the -condition is \pcode{true}. However, since we have numbers in -our language anyway, why not just encoding \pcode{true} as -one, and \pcode{false} as zero? In this way we can dispense -with the additional concept of Booleans. - -I hope the above discussion makes it already clear we need to -be a bit more careful with our programs. Below we shall -describe an interpreter for our programming language, which -specifies exactly how programs are supposed to be -run\ldots{}at least we will specify this for all \emph{good} -programs. By good programs I mean where all variables are -initialised, for example. Our interpreter will just crash if -it cannot find out the value for a variable when it is not -initialised. Also, we will assume that labels in good programs -are unique, otherwise our programs will calculate ``garbage''. - -First we will pre-process our programs. This will simplify the -definition of our interpreter later on. By pre-processing our -programs we will transform programs into \emph{snippets}. A -snippet is a label and all the code that comes after the -label. This essentially means a snippet is a \emph{map} from -labels to code.\footnote{Be sure you know what maps are. In a -programming context they are often represented as association -list where some data is associated with a key.} - -Given that programs are sequences (or lists) of statements, we -can easily calculate the snippets by just traversing this -sequence and recursively generating the map. Suppose a program -is of the general form - -\begin{center} -$stmt_1\;stmt_2\; \ldots\; stmt_n$ -\end{center} - -\noindent The idea is to go through this sequence of -statements one by one and check whether they are a label. If -yes, we add the label and the remaining statements to our map. -If no, we just continue with the next statement. To come up -with a recursive definition for generating snippets, let us -write $[]$ for the program that does not contain any -statement. Consider the following definition: - -\begin{center} -\begin{tabular}{l@{\hspace{1mm}}c@{\hspace{1mm}}l} -$\textit{snippets}([])$ & $\dn$ & $\varnothing$\\ -$\textit{snippets}(stmt\;\; rest)$ & $\dn$ & -$\begin{cases} - \textit{snippets}(rest)[label := rest] & \text{if}\;stmt = \textit{label:}\\ - \textit{snippets}(rest) & \text{otherwise} -\end{cases}$ -\end{tabular} -\end{center} - -\noindent In the first clause we just return the empty map for -the program that does not contain any statement. In the second -clause, we have to distinguish the case where the first -statement is a label or not. As said before, if not, then we -just ``throw away'' the label and recursively calculate the -snippets for the rest of the program (the otherwise clause). -If yes, then we do the same, but also update the map so that -$label$ now points to the rest of the statements (the if -clause). This looks all realtively straightforward, but there -is one small problem we need to overcome: our two programs -shown so far have no label as \emph{entry point}---that is -where the execution is supposed to start. We usually assume -that the first statement will be run first. To make this the -default, it is convenient if we add to all our programs a -default label, say \pcode{""} (the empty string). With this we -can define our pre-processing of programs as follows - -\begin{center} -$\textit{preproc}(prog) \dn \textit{snippets}(\pcode{"":}\;\; prog)$ -\end{center} - -\noindent Let us see how this pans out in practice. If we -pre-process the factorial program shown earlier, we obtain the -following map: - -\begin{center} -\small -\lstset{numbers=none, - language={}, - xleftmargin=0mm, - aboveskip=0.5mm, - belowskip=0.5mm, - frame=single, - framerule=0mm, - framesep=0mm} -\begin{tikzpicture}[node distance=0mm] -\node (A1) [draw]{\pcode{""}}; -\node (B1) [right=of A1] {$\mapsto$}; -\node (C1) [right=of B1,anchor=north west] { -\begin{minipage}{3.5cm} -\begin{lstlisting}[language={},xleftmargin=0mm] - a := 1 - n := 5 -top: - jmp? n = 0 done - a := a * n - n := n + -1 - goto top -done: -\end{lstlisting} -\end{minipage}}; -\node (A2) [right=of C1.north east,draw] {\pcode{top}}; -\node (B2) [right=of A2] {$\mapsto$}; -\node (C2) [right=of B2, anchor=north west] { -\begin{minipage}{3.5cm} -\begin{lstlisting}[language={},xleftmargin=0mm] - jmp? n = 0 done - a := a * n - n := n + -1 - goto top -done: -\end{lstlisting} -\end{minipage}}; -\node (A3) [right=of C2.north east,draw] {\pcode{done}}; -\node (B3) [right=of A3] {$\mapsto$}; -\node (C3) [right=of B3] {$[]$}; -\end{tikzpicture} -\end{center} - -\noindent I highlighted the \emph{keys} in this map. Since -there are three labels in the factorial program (remember we -added \pcode{""}), there are three keys. When running the -factorial program and encountering a jump, then we only have -to consult this snippets-map in order to find out what the -next statements should be. - -We should now be in the position to define how a program -should be run. In the context of interpreters, this -``running'' of programs is often called \emph{evaluation}. Let -us start with the definition of how expressions are evaluated. -A first attempt might be the following recursive function: - -\begin{center} -\begin{tabular}{l@{\hspace{1mm}}c@{\hspace{1mm}}l} -$\textit{eval\_exp}(\texttt{n})$ & $\dn$ & $n$ - \qquad\text{if}\; \texttt{n}\; \text{is a number like} - \ldots \pcode{-2}, \pcode{-1}, \pcode{0}, \pcode{1}, -\pcode{2}\ldots{}\\ -$\textit{eval\_exp}(\texttt{e}_\texttt{1} \,\texttt{+}\, - \texttt{e}_\texttt{2})$ & - $\dn$ & $\textit{eval\_exp}(\texttt{e}_\texttt{1}) + - \textit{eval\_exp}(\texttt{e}_\texttt{2})$\\ -$\textit{eval\_exp}(\texttt{e}_\texttt{1} \,\texttt{*}\, - \texttt{e}_\texttt{2})$ & - $\dn$ & $\textit{eval\_exp}(\texttt{e}_\texttt{1}) * - \textit{eval\_exp}(\texttt{e}_\texttt{2})$\\ -$\textit{eval\_exp}(\texttt{e}_\texttt{1} \,\texttt{=}\, - \texttt{e}_\texttt{2})$ & - $\dn$ & $\begin{cases} - 1 & \text{if}\;\textit{eval\_exp}(\texttt{e}_\texttt{1}) = - \textit{eval\_exp}(\texttt{e}_\texttt{2})\\ - 0 & \text{otherwise} - \end{cases}$ -\end{tabular} -\end{center} - -\noindent While this should look all rather intuitive`, still -be very careful. There is a subtlety which can be easily -overlooked: The function \textit{eval\_exp} takes an -expression of our programming language as input and returns a -number as output. Therefore whenever we encounter a number in -our program, we just return this number---this is defined in -the first clause above. Whenever we encounter an addition, -well then we first evaluate the left-hand side -$\texttt{e}_\texttt{1}$ of the addition (this will give a -number), then evaluate the right-hand side -$\texttt{e}_\texttt{2}$ (this gives another number), and -finally add both numbers together. Here is the subtlety: on -the left-hand side of the $\dn$ we have a \texttt{+} (in the -teletype font) which is the symbol for addition in our -programming language. On the right-hand side we have $+$ which -stands for the arithmetic operation from ``mathematics'' of -adding two numbers. These are rather different concepts---one -is a symbol (which we made up), and the other a mathematical -operation. When we will have a look at an actual -implementation of our interpreter, the mathematical operation -will be the function for addition from the programming -language in which we \underline{\smash{implement}} our -interpreter. While the \texttt{+} is just a symbol that is -used in our programming language. Clearly we have to use a -symbol that is a good mnemonic for addition, otherwise we will -confuse the programmers working with our language. Therefore -we use $\texttt{+}$. A similar choice is made for times in the -third clause and equality in the fourth clause. - -Remember I wrote at the beginning of this section about being -god when designing a programming language. You can see this -here: we need to give meaning to symbols. At the moment -however, we are a poor fallible god. Look again at the grammar -of our programming language and our definition. Clearly, an -expression can contain variables. So far we have ignored them. -What should our interpreter do with variables? They might -change during the evaluation of a program. For example the -variable \pcode{n} in the factorial program counts down from 5 -down to 0. How can we improve our definition above to give also -an answer whenever our interpreter encounters a variable in an -expression? The solution is to add an \emph{environment}, -written $env$, as an additional input argument to our -\textit{eval\_exp} function. - -\begin{center} -\begin{tabular}{l@{\hspace{1mm}}c@{\hspace{1mm}}l} -$\textit{eval\_exp}(\texttt{n}, env)$ & $\dn$ & $n$ - \qquad\text{if}\; \texttt{n}\; \text{is a number like} - \ldots \pcode{-2}, \pcode{-1}, \pcode{0}, \pcode{1}, -\pcode{2}\ldots{}\\ -$\textit{eval\_exp}(\texttt{e}_\texttt{1} \,\texttt{+}\, - \texttt{e}_\texttt{2}, env)$ & - $\dn$ & $\textit{eval\_exp}(\texttt{e}_\texttt{1}, env) + - \textit{eval\_exp}(\texttt{e}_\texttt{2}, env)$\\ -$\textit{eval\_exp}(\texttt{e}_\texttt{1} \,\texttt{*}\, - \texttt{e}_\texttt{2}, env)$ & - $\dn$ & $\textit{eval\_exp}(\texttt{e}_\texttt{1}, env) * - \textit{eval\_exp}(\texttt{e}_\texttt{2}, env)$\\ -$\textit{eval\_exp}(\texttt{e}_\texttt{1} \,\texttt{=}\, - \texttt{e}_\texttt{2}, env)$ & - $\dn$ & $\begin{cases} - 1 & \text{if}\;\textit{eval\_exp}(\texttt{e}_\texttt{1}, env) = - \textit{eval\_exp}(\texttt{e}_\texttt{2}, env)\\ - 0 & \text{otherwise} - \end{cases}$\\ -$\textit{eval\_exp}(\texttt{x}, env)$ & $\dn$ & $env(x)$ -\end{tabular} -\end{center} - -\noindent This environment $env$ also acts like a map: it -associates variables with their current values. For example -after evaluating the first two lines in our factorial -program, such an environment might look as follows - -\begin{center} -\begin{tabular}{ll} -$\fbox{\texttt{a}} \mapsto \texttt{1}$ & -$\fbox{\texttt{n}} \mapsto \texttt{5}$ -\end{tabular} -\end{center} - -\noindent Again I highlighted the keys. In the clause for -variables, we can therefore consult this environment and -return whatever value is currently stored for this variable. -This is written $env(x)$---meaning we query this map with $x$ -we obtain the corresponding number. You might ask what happens -if an environment does not contain any value for, say, the -variable $x$? Well, then our interpreter just ``crashes'', or -more precisely will raise an exception. In this case we have a -``bad'' program that tried to use a variable before it was -initialised. The programmer should not have done this. In a -real programming language, we would of course try a bit harder -and for example give an error at compile time, or design our -language in such a way that this can never happen. With the -second version of \textit{eval\_exp} we completed our -definition for evaluating expressions. - -Next comes the evaluation function for statements. We define -this function in such a way that we recursively evaluate a -whole sequence of statements. Assume a program $p$ (you want -to evaluate) and its pre-processed snippets $sn$. Then we can -define: - -\begin{center} -\begin{tabular}{@{}l@{\hspace{1mm}}c@{\hspace{1mm}}l@{}} -$\textit{eval\_stmts}([], env)$ & $\dn$ & $env$\\ -$\textit{eval\_stmts}(\texttt{label:}\;rest, env)$ & - $\dn$ & $\textit{eval\_stmts}(rest, env)$ \\ -$\textit{eval\_stmts}(\texttt{x\,:=\,e}\;rest, env)$ & - $\dn$ & $\textit{eval\_stmts}(rest, - env[x := \textit{eval\_exp}(\texttt{e}, env)])$\\ -$\textit{eval\_stmts}(\texttt{goto\,lbl}\;rest, env)$ - & $\dn$ & $\textit{eval\_stmts}(sn(\texttt{lbl}), env)$\\ -$\textit{eval\_stmts}(\texttt{jmp?\,e\,lbl}\;rest, env)$ - & $\dn$ & $\begin{cases}\begin{array}{@{}l@{\hspace{-12mm}}r@{}} - \textit{eval\_stmts}(sn(\texttt{lbl}), env)\\ - & \text{if}\;\textit{eval\_exp}(\texttt{e}, env) = 1\\ - \textit{eval\_stmts}(rest, env) & \text{otherwise}\\ - \end{array}\end{cases}$ -\end{tabular} -\end{center} - -\noindent The first clause is for the empty program, or when -we arrived at the end of the program. In this case we just -return the environment. The second clause is for when the next -statement is a label. That means the program is of the form -$\texttt{label:}\;rest$ where the label is some string and -$rest$ stands for all following statements. This case is easy, -because our evaluation function just discards the label and -evaluates the rest of the statements (we already extracted all -important information about labels when we pre-processed our -programs and generated the snippets). The third clause is for -variable assignments. Again we just evaluate the rest for the -statements, but with a modified environment---since the -variable assignment is supposed to introduce a new variable or -change the current value of a variable. For this modification -of the environment we first evaluate the expression -$\texttt{e}$ using our evaluation function for expressions. -This gives us a number. Then we assign this number to the -variable $x$ in the environment. This modified environment -will be used to evaluate the rest of the program. The fourth -clause is for the unconditional jump to a label, called -\texttt{lbl}. That means we have to look up in our snippets -map $sn$ what are the next statements for this label. -Therefore we will continue with evaluating, not with the rest -of the program, but with the statements stored in the -snippets-map under the label $\texttt{lbl}$. The fifth clause -for conditional jumps is similar, but to decide whether to -make the jump we first need to evaluate the expression -$\texttt{e}$ in order to find out whether it is $1$. If yes, -we jump, otherwise we just continue with evaluating the rest -of the program. - -Our interpreter works in two stages: First we pre-process our -program generating the snippets map $sn$, say. Second we call -the evaluation function with the default entry point and the -empty environment: - -\begin{center} -$\textit{eval\_stmts}(sn(\pcode{""}), \varnothing)$ -\end{center} - -\noindent It is interesting to note that our interpreter when -it comes to the end of the program returns an environment. Our -programming language does not contain any constructs for input -and output. Therefore this environment is the only effect we -can observe when running the program (apart from that our -interpreter might need some time before finishing the -evaluation of the program and the CPU getting hot). Evaluating -the factorial program with our interpreter we receive as -``answer''-environment - -\begin{center} -\begin{tabular}{ll} -$\fbox{\texttt{a}} \mapsto \texttt{120}$ & -$\fbox{\texttt{n}} \mapsto \texttt{0}$ -\end{tabular} -\end{center} - -\noindent While the discussion above should have illustrated -the ideas, in order to do some serious calculations, we clearly -need to implement the interpreter. - - -\subsubsection*{Scala Code for the Interpreter} - -Functional programming languages are very convenient for -implementations of interpreters. A good choice for a -functional programming language is Scala, a programming -language that combines functional and object-oriented -pro\-gramming-styles. It has received in the last five years or -so quite a bit of attention. One reason for this attention is -that, like the Java programming language, Scala compiles to -the Java Virtual Machine (JVM) and therefore Scala programs -can run under MacOSX, Linux and Windows.\footnote{There are -also experimental backends for Android and JavaScript.} Unlike -Java, however, Scala often allows programmers to write very -concise and elegant code. Some therefore say Scala is the much -better Java. A number of companies, The Guardian, Twitter, -Coursera, FourSquare, LinkedIn to name a few, either use Scala -exclusively in production code, or at least to some -substantial degree. If you want to try out Scala yourself, the -Scala compiler can be downloaded from - -\begin{quote} -\url{http://www.scala-lang.org} -\end{quote} - -Let us have a look at the Scala code shown in -Figure~\ref{code}. It shows the entire code for the -interpreter, though the implementation is admittedly no -frills. - -\begin{figure}[t] -\small -\lstinputlisting[language=Scala]{../progs/inter.scala} -\caption{The entire code of the interpreter for our -idealised programming language.\label{code}} -\end{figure} - - -\subsubsection*{Static Analysis} - -Finally we can come back to our original problem, namely -finding out what the signs of variables are - -\begin{center} - - -\end{center} - -\end{document} - -%% list of static analysers for C -http://spinroot.com/static/index.html - -%% NASA coding rules for C -http://pixelscommander.com/wp-content/uploads/2014/12/P10.pdf - -%%% Local Variables: -%%% mode: latex -%%% TeX-master: t -%%% End: diff -r 280e057558b8 -r 7a6e8f603e08 handouts/ho98-sa.pdf Binary file handouts/ho98-sa.pdf has changed diff -r 280e057558b8 -r 7a6e8f603e08 handouts/ho98-sa.tex --- /dev/null Thu Jan 01 00:00:00 1970 +0000 +++ b/handouts/ho98-sa.tex Sat Sep 23 19:52:27 2017 +0100 @@ -0,0 +1,615 @@ +\documentclass{article} +\usepackage{../style} +\usepackage{../langs} +\usepackage{../graphics} +\usepackage{../grammar} +\usepackage{multicol} + +\newcommand{\dn}{\stackrel{\mbox{\scriptsize def}}{=}} + +\begin{document} +\fnote{\copyright{} Christian Urban, King's College London, 2014} + +%% why are shuttle flights so good with software +%%http://www.fastcompany.com/28121/they-write-right-stuff + +\section*{Handout 9 (Static Analysis)} + +If we want to improve the safety and security of our programs, +we need a more principled approach to programming. Testing is +good, but as Edsger Dijkstra famously wrote: + +\begin{quote}\it +``Program testing can be a very effective way to show the +\underline{\smash{presence}} of bugs, but it is hopelessly +inadequate for showing their \underline{\smash{absence}}.'' +\end{quote} + +\noindent While such a more principled approach has been the +subject of intense study for a long, long time, only in the +past few years some impressive results have been achieved. One +is the complete formalisation and (mathematical) verification +of a microkernel operating system called seL4. + +\begin{center} +\url{http://sel4.systems} +\end{center} + +\noindent In 2011 this work was included in the MIT Technology +Review in the annual list of the world’s ten most important +emerging +technologies.\footnote{\url{http://www2.technologyreview.com/tr10/?year=2011}} +While this work is impressive, its technical details are too +enormous for an explanation here. Therefore let us look at +something much simpler, namely finding out properties about +programs using \emph{static analysis}. + +Static analysis is a technique that checks properties of a +program without actually running the program. This should +raise alarm bells with you---because almost all interesting +properties about programs are equivalent to the halting +problem, which we know is undecidable. For example estimating +the memory consumption of programs is in general undecidable, +just like the halting problem. Static analysis circumvents +this undecidability-problem by essentially allowing answers +\emph{yes} and \emph{no}, but also \emph{don't know}. With +this ``trick'' even the halting problem becomes +decidable\ldots{}for example we could always say \emph{don't +know}. Of course this would be silly. The point is that we +should be striving for a method that answers as often as +possible either \emph{yes} or \emph{no}---just in cases when +it is too difficult we fall back on the +\emph{don't-know}-answer. This might sound all like abstract +nonsense. Therefore let us look at a concrete example. + + +\subsubsection*{A Simple, Idealised Programming Language} + +Our starting point is a small, idealised programming language. +It is idealised because we cut several corners in comparison +with real programming languages. The language we will study +contains, amongst other things, variables holding integers. +Using static analysis, we want to find out what the sign of +these integers (positive or negative) will be when the program +runs. This sign-analysis seems like a very simple problem. But +even such simple problems, if approached naively, are in +general undecidable, just like Turing's halting problem. I let +you think why? + + +Is sign-analysis of variables an interesting problem? Well, +yes---if a compiler can find out that for example a variable +will never be negative and this variable is used as an index +for an array, then the compiler does not need to generate code +for an underflow-check. Remember some languages are immune to +buffer-overflow attacks, but they need to add underflow and +overflow checks everywhere. According to John Regher, an +expert in the field of compilers, overflow checks can cause +5-10\% slowdown, in some languages even 100\% for tight +loops.\footnote{\url{http://blog.regehr.org/archives/1154}} If +the compiler can omit the underflow check, for example, then +this can potentially drastically speed up the generated code. + +What do programs in our simple programming language look like? +The following grammar gives a first specification: + +\begin{multicols}{2} +\begin{plstx}[rhs style=,one per line,left margin=9mm] +: \meta{Stmt} ::= \meta{label} \texttt{:} + | \meta{var} \texttt{:=} \meta{Exp} + | \texttt{jmp?} \meta{Exp} \meta{label} + | \texttt{goto} \meta{label}\\ +: \meta{Prog} ::= \meta{Stmt} \ldots{} \meta{Stmt}\\ +\end{plstx} +\columnbreak +\begin{plstx}[rhs style=,one per line] : \meta{Exp} ::= \meta{Exp} \texttt{+} \meta{Exp} + | \meta{Exp} \texttt{*} \meta{Exp} + | \meta{Exp} \texttt{=} \meta{Exp} + | \meta{num} + | \meta{var}\\ +\end{plstx} +\end{multicols} + +\noindent I assume you are familiar with such +grammars.\footnote{\url{http://en.wikipedia.org/wiki/Backus–Naur_Form}} +There are three main syntactic categories: \emph{statments} +and \emph{expressions} as well as \emph{programs}, which are +sequences of statements. Statements are either labels, +variable assignments, conditional jumps (\pcode{jmp?}) and +unconditional jumps (\pcode{goto}). Labels are just strings, +which can be used as the target of a jump. We assume that in +every program the labels are unique---if there is a clash, +then we do not know where to jump to. The conditional jumps +and variable assignments involve (arithmetic) expressions. +Expressions are either numbers, variables or compound +expressions built up from \pcode{+}, \pcode{*} and \emph{=} +(for simplicity reasons we do not consider any other +operations). We assume we have negative and positive numbers, +\ldots \pcode{-2}, \pcode{-1}, \pcode{0}, \pcode{1}, +\pcode{2}\ldots{} An example program that calculates the +factorial of 5 is in our programming language as follows: + +\begin{lstlisting}[language={},xleftmargin=10mm] + a := 1 + n := 5 +top: + jmp? n = 0 done + a := a * n + n := n + -1 + goto top +done: +\end{lstlisting} + +\noindent As can be seen each line of the program contains a +statement. In the first two lines we assign values to the +variables \pcode{a} and \pcode{n}. In line 4 we test whether +\pcode{n} is zero, in which case we jump to the end of the +program marked with the label \pcode{done}. If \pcode{n} is +not zero, we multiply the content of \pcode{a} by \pcode{n}, +decrease \pcode{n} by one and jump back to the beginning of +the loop, marked with the label \pcode{top}. Another program +in our language is shown in Figure~\ref{fib}. I let you think +what it calculates. + +\begin{figure}[t] +\begin{lstlisting}[numbers=none, + language={},xleftmargin=10mm] + n := 6 + m1 := 0 + m2 := 1 +loop: + jmp? n = 0 done + tmp := m2 + m2 := m1 + m2 + m1 := tmp + n := n + -1 + goto top +done: +\end{lstlisting} +\caption{A mystery program in our idealised programming language. +Try to find out what it calculates! \label{fib}} +\end{figure} + +Even if our language is rather small, it is still Turing +complete---meaning quite powerful. However, discussing this +fact in more detail would lead us too far astray. Clearly, our +programming is rather low-level and not very comfortable for +writing programs. It is inspired by real machine code, which +is the code that is executed by a CPU. So a more interesting +question is what is missing in comparison with real machine +code? Well, not much\ldots{}in principle. Real machine code, +of course, contains many more arithmetic instructions (not +just addition and multiplication) and many more conditional +jumps. We could add these to our language if we wanted, but +complexity is really beside the point here. Furthermore, real +machine code has many instructions for manipulating memory. We +do not have this at all. This is actually a more serious +simplification because we assume numbers to be arbitrary small +or large, which is not the case with real machine code. In +real machine code, basic number formats have a range and might +over-flow or under-flow this range. Also the number of +variables in our programs is potentially unlimited, while +memory in an actual computer, of course, is always limited. To +sum up, our language might look ridiculously simple, but it is +not too far removed from practically relevant issues. + + +\subsubsection*{An Interpreter} + +Designing a language is like playing god: you can say what +names for variables you allow; what programs should look like; +most importantly you can decide what each part of the program +should mean and do. While our language is quite simple and the +meaning of statements, for example, is rather straightforward, +there are still places where we need to make real choices. For +example consider the conditional jumps, say the one in the +factorial program: + +\begin{center} +\code{jmp? n = 0 done} +\end{center} + +\noindent How should they work? We could introduce Booleans +(\pcode{true} and \pcode{false}) and then jump only when the +condition is \pcode{true}. However, since we have numbers in +our language anyway, why not just encoding \pcode{true} as +one, and \pcode{false} as zero? In this way we can dispense +with the additional concept of Booleans. + +I hope the above discussion makes it already clear we need to +be a bit more careful with our programs. Below we shall +describe an interpreter for our programming language, which +specifies exactly how programs are supposed to be +run\ldots{}at least we will specify this for all \emph{good} +programs. By good programs I mean where all variables are +initialised, for example. Our interpreter will just crash if +it cannot find out the value for a variable when it is not +initialised. Also, we will assume that labels in good programs +are unique, otherwise our programs will calculate ``garbage''. + +First we will pre-process our programs. This will simplify the +definition of our interpreter later on. By pre-processing our +programs we will transform programs into \emph{snippets}. A +snippet is a label and all the code that comes after the +label. This essentially means a snippet is a \emph{map} from +labels to code.\footnote{Be sure you know what maps are. In a +programming context they are often represented as association +list where some data is associated with a key.} + +Given that programs are sequences (or lists) of statements, we +can easily calculate the snippets by just traversing this +sequence and recursively generating the map. Suppose a program +is of the general form + +\begin{center} +$stmt_1\;stmt_2\; \ldots\; stmt_n$ +\end{center} + +\noindent The idea is to go through this sequence of +statements one by one and check whether they are a label. If +yes, we add the label and the remaining statements to our map. +If no, we just continue with the next statement. To come up +with a recursive definition for generating snippets, let us +write $[]$ for the program that does not contain any +statement. Consider the following definition: + +\begin{center} +\begin{tabular}{l@{\hspace{1mm}}c@{\hspace{1mm}}l} +$\textit{snippets}([])$ & $\dn$ & $\varnothing$\\ +$\textit{snippets}(stmt\;\; rest)$ & $\dn$ & +$\begin{cases} + \textit{snippets}(rest)[label := rest] & \text{if}\;stmt = \textit{label:}\\ + \textit{snippets}(rest) & \text{otherwise} +\end{cases}$ +\end{tabular} +\end{center} + +\noindent In the first clause we just return the empty map for +the program that does not contain any statement. In the second +clause, we have to distinguish the case where the first +statement is a label or not. As said before, if not, then we +just ``throw away'' the label and recursively calculate the +snippets for the rest of the program (the otherwise clause). +If yes, then we do the same, but also update the map so that +$label$ now points to the rest of the statements (the if +clause). This looks all realtively straightforward, but there +is one small problem we need to overcome: our two programs +shown so far have no label as \emph{entry point}---that is +where the execution is supposed to start. We usually assume +that the first statement will be run first. To make this the +default, it is convenient if we add to all our programs a +default label, say \pcode{""} (the empty string). With this we +can define our pre-processing of programs as follows + +\begin{center} +$\textit{preproc}(prog) \dn \textit{snippets}(\pcode{"":}\;\; prog)$ +\end{center} + +\noindent Let us see how this pans out in practice. If we +pre-process the factorial program shown earlier, we obtain the +following map: + +\begin{center} +\small +\lstset{numbers=none, + language={}, + xleftmargin=0mm, + aboveskip=0.5mm, + belowskip=0.5mm, + frame=single, + framerule=0mm, + framesep=0mm} +\begin{tikzpicture}[node distance=0mm] +\node (A1) [draw]{\pcode{""}}; +\node (B1) [right=of A1] {$\mapsto$}; +\node (C1) [right=of B1,anchor=north west] { +\begin{minipage}{3.5cm} +\begin{lstlisting}[language={},xleftmargin=0mm] + a := 1 + n := 5 +top: + jmp? n = 0 done + a := a * n + n := n + -1 + goto top +done: +\end{lstlisting} +\end{minipage}}; +\node (A2) [right=of C1.north east,draw] {\pcode{top}}; +\node (B2) [right=of A2] {$\mapsto$}; +\node (C2) [right=of B2, anchor=north west] { +\begin{minipage}{3.5cm} +\begin{lstlisting}[language={},xleftmargin=0mm] + jmp? n = 0 done + a := a * n + n := n + -1 + goto top +done: +\end{lstlisting} +\end{minipage}}; +\node (A3) [right=of C2.north east,draw] {\pcode{done}}; +\node (B3) [right=of A3] {$\mapsto$}; +\node (C3) [right=of B3] {$[]$}; +\end{tikzpicture} +\end{center} + +\noindent I highlighted the \emph{keys} in this map. Since +there are three labels in the factorial program (remember we +added \pcode{""}), there are three keys. When running the +factorial program and encountering a jump, then we only have +to consult this snippets-map in order to find out what the +next statements should be. + +We should now be in the position to define how a program +should be run. In the context of interpreters, this +``running'' of programs is often called \emph{evaluation}. Let +us start with the definition of how expressions are evaluated. +A first attempt might be the following recursive function: + +\begin{center} +\begin{tabular}{l@{\hspace{1mm}}c@{\hspace{1mm}}l} +$\textit{eval\_exp}(\texttt{n})$ & $\dn$ & $n$ + \qquad\text{if}\; \texttt{n}\; \text{is a number like} + \ldots \pcode{-2}, \pcode{-1}, \pcode{0}, \pcode{1}, +\pcode{2}\ldots{}\\ +$\textit{eval\_exp}(\texttt{e}_\texttt{1} \,\texttt{+}\, + \texttt{e}_\texttt{2})$ & + $\dn$ & $\textit{eval\_exp}(\texttt{e}_\texttt{1}) + + \textit{eval\_exp}(\texttt{e}_\texttt{2})$\\ +$\textit{eval\_exp}(\texttt{e}_\texttt{1} \,\texttt{*}\, + \texttt{e}_\texttt{2})$ & + $\dn$ & $\textit{eval\_exp}(\texttt{e}_\texttt{1}) * + \textit{eval\_exp}(\texttt{e}_\texttt{2})$\\ +$\textit{eval\_exp}(\texttt{e}_\texttt{1} \,\texttt{=}\, + \texttt{e}_\texttt{2})$ & + $\dn$ & $\begin{cases} + 1 & \text{if}\;\textit{eval\_exp}(\texttt{e}_\texttt{1}) = + \textit{eval\_exp}(\texttt{e}_\texttt{2})\\ + 0 & \text{otherwise} + \end{cases}$ +\end{tabular} +\end{center} + +\noindent While this should look all rather intuitive`, still +be very careful. There is a subtlety which can be easily +overlooked: The function \textit{eval\_exp} takes an +expression of our programming language as input and returns a +number as output. Therefore whenever we encounter a number in +our program, we just return this number---this is defined in +the first clause above. Whenever we encounter an addition, +well then we first evaluate the left-hand side +$\texttt{e}_\texttt{1}$ of the addition (this will give a +number), then evaluate the right-hand side +$\texttt{e}_\texttt{2}$ (this gives another number), and +finally add both numbers together. Here is the subtlety: on +the left-hand side of the $\dn$ we have a \texttt{+} (in the +teletype font) which is the symbol for addition in our +programming language. On the right-hand side we have $+$ which +stands for the arithmetic operation from ``mathematics'' of +adding two numbers. These are rather different concepts---one +is a symbol (which we made up), and the other a mathematical +operation. When we will have a look at an actual +implementation of our interpreter, the mathematical operation +will be the function for addition from the programming +language in which we \underline{\smash{implement}} our +interpreter. While the \texttt{+} is just a symbol that is +used in our programming language. Clearly we have to use a +symbol that is a good mnemonic for addition, otherwise we will +confuse the programmers working with our language. Therefore +we use $\texttt{+}$. A similar choice is made for times in the +third clause and equality in the fourth clause. + +Remember I wrote at the beginning of this section about being +god when designing a programming language. You can see this +here: we need to give meaning to symbols. At the moment +however, we are a poor fallible god. Look again at the grammar +of our programming language and our definition. Clearly, an +expression can contain variables. So far we have ignored them. +What should our interpreter do with variables? They might +change during the evaluation of a program. For example the +variable \pcode{n} in the factorial program counts down from 5 +down to 0. How can we improve our definition above to give also +an answer whenever our interpreter encounters a variable in an +expression? The solution is to add an \emph{environment}, +written $env$, as an additional input argument to our +\textit{eval\_exp} function. + +\begin{center} +\begin{tabular}{l@{\hspace{1mm}}c@{\hspace{1mm}}l} +$\textit{eval\_exp}(\texttt{n}, env)$ & $\dn$ & $n$ + \qquad\text{if}\; \texttt{n}\; \text{is a number like} + \ldots \pcode{-2}, \pcode{-1}, \pcode{0}, \pcode{1}, +\pcode{2}\ldots{}\\ +$\textit{eval\_exp}(\texttt{e}_\texttt{1} \,\texttt{+}\, + \texttt{e}_\texttt{2}, env)$ & + $\dn$ & $\textit{eval\_exp}(\texttt{e}_\texttt{1}, env) + + \textit{eval\_exp}(\texttt{e}_\texttt{2}, env)$\\ +$\textit{eval\_exp}(\texttt{e}_\texttt{1} \,\texttt{*}\, + \texttt{e}_\texttt{2}, env)$ & + $\dn$ & $\textit{eval\_exp}(\texttt{e}_\texttt{1}, env) * + \textit{eval\_exp}(\texttt{e}_\texttt{2}, env)$\\ +$\textit{eval\_exp}(\texttt{e}_\texttt{1} \,\texttt{=}\, + \texttt{e}_\texttt{2}, env)$ & + $\dn$ & $\begin{cases} + 1 & \text{if}\;\textit{eval\_exp}(\texttt{e}_\texttt{1}, env) = + \textit{eval\_exp}(\texttt{e}_\texttt{2}, env)\\ + 0 & \text{otherwise} + \end{cases}$\\ +$\textit{eval\_exp}(\texttt{x}, env)$ & $\dn$ & $env(x)$ +\end{tabular} +\end{center} + +\noindent This environment $env$ also acts like a map: it +associates variables with their current values. For example +after evaluating the first two lines in our factorial +program, such an environment might look as follows + +\begin{center} +\begin{tabular}{ll} +$\fbox{\texttt{a}} \mapsto \texttt{1}$ & +$\fbox{\texttt{n}} \mapsto \texttt{5}$ +\end{tabular} +\end{center} + +\noindent Again I highlighted the keys. In the clause for +variables, we can therefore consult this environment and +return whatever value is currently stored for this variable. +This is written $env(x)$---meaning we query this map with $x$ +we obtain the corresponding number. You might ask what happens +if an environment does not contain any value for, say, the +variable $x$? Well, then our interpreter just ``crashes'', or +more precisely will raise an exception. In this case we have a +``bad'' program that tried to use a variable before it was +initialised. The programmer should not have done this. In a +real programming language, we would of course try a bit harder +and for example give an error at compile time, or design our +language in such a way that this can never happen. With the +second version of \textit{eval\_exp} we completed our +definition for evaluating expressions. + +Next comes the evaluation function for statements. We define +this function in such a way that we recursively evaluate a +whole sequence of statements. Assume a program $p$ (you want +to evaluate) and its pre-processed snippets $sn$. Then we can +define: + +\begin{center} +\begin{tabular}{@{}l@{\hspace{1mm}}c@{\hspace{1mm}}l@{}} +$\textit{eval\_stmts}([], env)$ & $\dn$ & $env$\\ +$\textit{eval\_stmts}(\texttt{label:}\;rest, env)$ & + $\dn$ & $\textit{eval\_stmts}(rest, env)$ \\ +$\textit{eval\_stmts}(\texttt{x\,:=\,e}\;rest, env)$ & + $\dn$ & $\textit{eval\_stmts}(rest, + env[x := \textit{eval\_exp}(\texttt{e}, env)])$\\ +$\textit{eval\_stmts}(\texttt{goto\,lbl}\;rest, env)$ + & $\dn$ & $\textit{eval\_stmts}(sn(\texttt{lbl}), env)$\\ +$\textit{eval\_stmts}(\texttt{jmp?\,e\,lbl}\;rest, env)$ + & $\dn$ & $\begin{cases}\begin{array}{@{}l@{\hspace{-12mm}}r@{}} + \textit{eval\_stmts}(sn(\texttt{lbl}), env)\\ + & \text{if}\;\textit{eval\_exp}(\texttt{e}, env) = 1\\ + \textit{eval\_stmts}(rest, env) & \text{otherwise}\\ + \end{array}\end{cases}$ +\end{tabular} +\end{center} + +\noindent The first clause is for the empty program, or when +we arrived at the end of the program. In this case we just +return the environment. The second clause is for when the next +statement is a label. That means the program is of the form +$\texttt{label:}\;rest$ where the label is some string and +$rest$ stands for all following statements. This case is easy, +because our evaluation function just discards the label and +evaluates the rest of the statements (we already extracted all +important information about labels when we pre-processed our +programs and generated the snippets). The third clause is for +variable assignments. Again we just evaluate the rest for the +statements, but with a modified environment---since the +variable assignment is supposed to introduce a new variable or +change the current value of a variable. For this modification +of the environment we first evaluate the expression +$\texttt{e}$ using our evaluation function for expressions. +This gives us a number. Then we assign this number to the +variable $x$ in the environment. This modified environment +will be used to evaluate the rest of the program. The fourth +clause is for the unconditional jump to a label, called +\texttt{lbl}. That means we have to look up in our snippets +map $sn$ what are the next statements for this label. +Therefore we will continue with evaluating, not with the rest +of the program, but with the statements stored in the +snippets-map under the label $\texttt{lbl}$. The fifth clause +for conditional jumps is similar, but to decide whether to +make the jump we first need to evaluate the expression +$\texttt{e}$ in order to find out whether it is $1$. If yes, +we jump, otherwise we just continue with evaluating the rest +of the program. + +Our interpreter works in two stages: First we pre-process our +program generating the snippets map $sn$, say. Second we call +the evaluation function with the default entry point and the +empty environment: + +\begin{center} +$\textit{eval\_stmts}(sn(\pcode{""}), \varnothing)$ +\end{center} + +\noindent It is interesting to note that our interpreter when +it comes to the end of the program returns an environment. Our +programming language does not contain any constructs for input +and output. Therefore this environment is the only effect we +can observe when running the program (apart from that our +interpreter might need some time before finishing the +evaluation of the program and the CPU getting hot). Evaluating +the factorial program with our interpreter we receive as +``answer''-environment + +\begin{center} +\begin{tabular}{ll} +$\fbox{\texttt{a}} \mapsto \texttt{120}$ & +$\fbox{\texttt{n}} \mapsto \texttt{0}$ +\end{tabular} +\end{center} + +\noindent While the discussion above should have illustrated +the ideas, in order to do some serious calculations, we clearly +need to implement the interpreter. + + +\subsubsection*{Scala Code for the Interpreter} + +Functional programming languages are very convenient for +implementations of interpreters. A good choice for a +functional programming language is Scala, a programming +language that combines functional and object-oriented +pro\-gramming-styles. It has received in the last five years or +so quite a bit of attention. One reason for this attention is +that, like the Java programming language, Scala compiles to +the Java Virtual Machine (JVM) and therefore Scala programs +can run under MacOSX, Linux and Windows.\footnote{There are +also experimental backends for Android and JavaScript.} Unlike +Java, however, Scala often allows programmers to write very +concise and elegant code. Some therefore say Scala is the much +better Java. A number of companies, The Guardian, Twitter, +Coursera, FourSquare, LinkedIn to name a few, either use Scala +exclusively in production code, or at least to some +substantial degree. If you want to try out Scala yourself, the +Scala compiler can be downloaded from + +\begin{quote} +\url{http://www.scala-lang.org} +\end{quote} + +Let us have a look at the Scala code shown in +Figure~\ref{code}. It shows the entire code for the +interpreter, though the implementation is admittedly no +frills. + +\begin{figure}[t] +\small +\lstinputlisting[language=Scala]{../progs/inter.scala} +\caption{The entire code of the interpreter for our +idealised programming language.\label{code}} +\end{figure} + + +\subsubsection*{Static Analysis} + +Finally we can come back to our original problem, namely +finding out what the signs of variables are + +\begin{center} + + +\end{center} + +\end{document} + +%% list of static analysers for C +http://spinroot.com/static/index.html + +%% NASA coding rules for C +http://pixelscommander.com/wp-content/uploads/2014/12/P10.pdf + +%%% Local Variables: +%%% mode: latex +%%% TeX-master: t +%%% End: diff -r 280e057558b8 -r 7a6e8f603e08 handouts/ho99-zkp.pdf Binary file handouts/ho99-zkp.pdf has changed diff -r 280e057558b8 -r 7a6e8f603e08 handouts/ho99-zkp.tex --- /dev/null Thu Jan 01 00:00:00 1970 +0000 +++ b/handouts/ho99-zkp.tex Sat Sep 23 19:52:27 2017 +0100 @@ -0,0 +1,622 @@ +\documentclass{article} +\usepackage{../style} + +\begin{document} +\fnote{\copyright{} Christian Urban, King's College London, 2014, 2015} + +\section*{Handout 6 (Zero-Knowledge Proofs)} + +Zero-knowledge proofs (short ZKP) solve a paradoxical puzzle: +How to convince somebody else that one knows a secret, without +revealing what the secret actually is? This sounds like a +problem the Mad Hatter from Alice in Wonderland would occupy +himself with, but actually there some serious and not so +serious applications of it. For example, if you solve +crosswords with your friend, say Bob, you might want to +convince him that you found a solution for one question, but +of course you do not want to reveal the solution, as this +might give Bob an advantage somewhere else in the crossword. + +So how to convince Bob that you know the answer (or a secret)? +One way would be to come up with the following protocol: +Suppose the answer is \textit{folio}. Then look up the +definition of \textit{folio} in a dictionary. Say you find: + +\begin{quote} +``an \textit{individual} leaf of paper or parchment, either +loose as one of a series or forming part of a bound volume, +which is numbered on the recto or front side only.'' +\end{quote} + +\noindent +Take the first non-small word\footnote{Let's say the, a, an, +or, and \ldots fall into the category of small words.} in this definition, +in this case \textit{individual}, and look up the definition +of this word, say + +\begin{quote} +``a single \textit{human} being as distinct from a group'' +\end{quote} + +\noindent +In this definition take the second non-small word, that +is \textit{human}, and again look up the definition of this +word. This will yield + +\begin{quote} +``relating to or \textit{characteristic} of humankind'' +\end{quote} + +\noindent +You could go on looking up the definition of the third +non-small word in this definition and so on. But let us assume +you agreed with Bob to stop after three iterations with the +third non-article word in the last definition, that is +\textit{or}. Now, instead of sending to Bob the solution +\textit{folio}, you send to him \textit{characteristic}. + +How can Bob verify that you know the solution? Well, once he +solved it himself, he can use the dictionary and follow the +same ``trail'' as you did. If the final word agrees with what +you had sent him, he must infer you knew the solution earlier +than him. This protocol works like a one-way hash function +because \textit{characteristic} does not give any hint as to +what was the first word was. I leave you to think why this +protocol avoids small words? + +After Bob found his solution and verified that according to +the protocol it ``maps'' also to \textit{characteristic}, can +he be entirely sure no cheating is going on? Not with 100\% +certainty. It could have been possible that he was given +\textit{characteristic} as the word, but it derived from a +different word. This might seem very unlikely, but at least +theoretical it is a possibility. Protocols based on +zero-knowledge proofs will produce a similar result---they +give an answer that might be erroneous in a very small number +of cases. The point is to iterate them long enough so that the +theoretical possibility of cheating is negligibly small. + +By the way, the authors of the paper ``Dismantling Megamos +Crypto: Wirelessly Lockpicking a Vehicle Immobilizer'' who +were barred from publishing their results used also a hash to +prove they did the work and (presumably) managed to get into +cars without a key; see Figure~\ref{paper}. This is very +similar to the method above about crosswords: They like to +prove that they did the work, but not giving out the +``solution''. But this also shows what the problem with such a +method is: yes, we can hide the secret temporarily, but if +somebody else wants to verify it, then the secret has to be +made public. Bob needs to know that \textit{folio} is the +solution before he can verify the claim of Alice that she had +the solution first. Similarly with the car-crypto paper: we +needed to wait until September 2015 when the authors were +finally able to publish their findings in order to verify the +hash. Zero-knowledge proofs, in contrast, can be immediately +checked, even if the secret is not public yet and perhaps +never will be. + +\begin{figure} +\begin{center} +\addtolength{\fboxsep}{4mm} +\fbox{\includegraphics[scale=0.4]{../pics/Dismantling_Megamos_Crypto.png}} +\end{center} +\caption{The authors of this paper used a hash in order to prove +later that they have managed to break into cars.\label{paper}} +\end{figure} + + +\subsubsection*{ZKP: An Illustrative Example} + +The idea behind zero-knowledge proofs is not very obvious and +will surely take some time for you to digest. Therefore let us +start with a simple illustrative example. The example will not +be perfect, but hopefully explain the gist of the idea. The +example is called Alibaba's cave, which graphically looks as +follows:\footnote{The example is taken from an +article titled ``How to Explain Zero-Knowledge Protocols +to Your Children'' available from +\url{http://pages.cs.wisc.edu/~mkowalcz/628.pdf}.} + +\begin{center} +\begin{tabular}{c@{\hspace{8mm}}c@{\hspace{8mm}}c} +\includegraphics[scale=0.1]{../pics/alibaba1.png} & +\includegraphics[scale=0.1]{../pics/alibaba2.png} & +\includegraphics[scale=0.1]{../pics/alibaba3.png} \\ +Step 1 & Step 2 & Step 3 +\end{tabular} +\end{center} + +\noindent Let us take a closer look at the picture in Step 1: +The cave has a tunnel which forks at some point. Both forks +are connected in a loop. At the deep end of the loop is a +magic wand. The point of the magic wand is that Alice knows +the secret word for how to open it. She wants to keep the word +secret, but wants to convince Bob that she knows it. + +One way of course would be to let Bob follow her, but then he +would also find out the secret. So this does not work. To find +a solution, let us first fix the rules: At the beginning Alice +and Bob are outside the cave. Alice goes in alone and takes +either tunnel labelled $A$ in the picture, or the other tunnel +labelled $B$. She waits at the magic wand for the instructions +from Bob, who also goes into the gave and observes what +happens at the fork. He has no knowledge which tunnel Alice +took and calls out (in Step 2) that she should emerge from tunnel +$A$, for example. If she knows the secret for opening the +wand, this will not be a problem for Alice. If she was already +in the $A$-segment of the tunnel, then she just comes back. If +she was in the $B$-segment of the tunnel she will say the magic +word and goes through the wand to emerge from $A$ as requested +by Bob. + +Let us have a look at the case where Alice cheats, that is not +knows the secret. She would still go into the cave and use, +for example the $B$-segment of the tunnel. If now Bob says she +should emerge from $B$, she is lucky. But if he says she +should emerge from $A$ then Alice is in trouble: Bob will find +out she does not actually know the secret. So in order to fool +Bob she needs to anticipate his call, and already go into the +corresponding tunnel. This of course also does not work, since +Bob can make any call he wants. Consequently in order to find +out whether Alice cheats, Bob just needs to repeat this +protocol many times. Each time Alice has a chance of +$\frac{1}{2}$ to be lucky or being found out. Iterating this +$n$ times means she must be right every time and when +cheating: the probability for this is $\frac{1}{2}^n$, number +that for already relatively small $n$, say 10, is incredibly +small. + + +There are some interesting observations we can make about +Alibaba's cave and the ZKP protocol between Alice and Bob: + +\begin{itemize} +\item {\bf Completeness} If Alice knows the secret, Bob + accepts Alice ``proof'' for sure. There is no error + possible in that Bob thinks Alice cheats when she + actually knows the secret. +\item {\bf Soundness} If Alice does not know the secret, + Bob accepts her ``proof'' with a very small probability. + If, as in the example above, the probability of being + able to hide cheating is $\frac{1}{2}$ in each round + it will be $\frac{1}{2}^n$ after $n$-rounds, which even + for small $n$ say $> 10$ is very small indeed. +\item {\bf Zero-Knowledge} Even if Bob accepts + the proof by Alice, he cannot convince anybody else. +\end{itemize} + +\noindent The last property is the most interesting one. +Assume Alice has convinced Bob that she knows the secret and +Bob filmed the whole protocol with a camera. Can he use the +video to convince anybody else? After a moment of thought, you +will agree that this is not the case. Alice and Bob might have +just made it all up and colluded by Bob telling Alice +beforehand which tunnel he will call. In this way it appears +as if all iterations of the protocol were successful, but they +prove nothing. If another person wants to find out whether +Alice knows the secret, he or she would have to conduct the +protocol again. This is actually the formal definition of a +zero-knowledge proof: an independent observer cannot +distinguish between a real protocol (where Alice knows the +secret) and a fake one (where Bob and Alice colluded). + + +\subsubsection*{Using an Graph-Isomorphism Problem for ZKPs} + +Now the question is how can we translate Alibaba's cave into a +computer science solution? It turns out we need an NP problem +for that. The main feature of an NP problem is that it is +computational very hard to generate a solution, but it is very +easy to check whether a given solution indeed solves the +problem at hand.\footnote{The question whether $P = NP$ or not, +we leave to others to speculate about.} + +One NP problem is the \emph{graph isomorphism problem}. It +essentially determines whether the following two graphs, say +$G_1$ and $G_2$, can be moved and stretched so that they look +exactly the same. + +\begin{center} +\begin{tabular}{c@{\hspace{8mm}}c@{\hspace{8mm}}c} +$G_1$ & $G_2$\\ +\raisebox{-13mm}{\includegraphics[scale=0.4]{../pics/simple.png}} & +\raisebox{-13mm}{\includegraphics[scale=0.4]{../pics/simple-b.png}}& + +\footnotesize +\begin{tabular}{rl} +Graph $G_1$ & Graph $G_2$\\ +a & 1\\ +b & 6\\ +c & 8\\ +d & 3\\ +g & 5\\ +h & 2\\ +i & 4\\ +j & 7\\ +\end{tabular} +\end{tabular} +\end{center} + +\noindent The table on the right gives a mapping of the nodes +of the first graph to the nodes of the second. With this +mapping we can check: node $a$ is connected in $G_1$ with $g$, +$h$ and $i$. Node $a$ maps to node $1$ in $G_2$, which is +connected to $2$, $4$ and $5$, which again correspond via the +mapping to $h$, $i$ and $g$ respectively. Let us write +$\sigma$ for such a table and let us write + +\[G_1 = \sigma(G_2)\] + +\noindent for two isomorphic graphs ($\sigma$ being the +isomorphism). It is actually very easy to construct two +isomorphic graphs: Start with an arbitrary graph, re-label the +nodes consistently. Alice will need to do this frequently +for the protocol below. In order to be useful, we therefore +would need to consider substantially larger graphs, like +with thousand nodes. + +Now the secret which Alice tries to hide is the knowledge of +such an isomorphism $\sigma$ between two such graphs. But she +can convince Bob whether she knows such an isomorphism for two +graphs, say $G_1$ and $G_2$, using the same idea as in the +example with Alibaba's cave. For this Alice and Bob must +follow the following protocol: + +\begin{enumerate} +\item Alice generates an isomorphic graph $H$ which she sends + to Bob (in each iteration she needs to generate a + different $H$). +\item + After receiving $H$, Bob asks Alice either for an + isomorphism between $G_1$ and $H$, or $G_2$ and $H$. +\item Alice and Bob repeat this procedure $n$ times. +\end{enumerate} + +\noindent In Step 1 it is important that Alice always +generates a fresh isomorphic graph. I let you think what +would happen if Alice sends out twice the same graph $H$. + +As said before, this is relatively easy to generate by +consistently relabelling nodes. If she started from $G_1$, +Alice will have generated + +\begin{equation} +H = \sigma'(G_1)\label{hiso} +\end{equation} + +\noindent where $\sigma'$ is the isomorphism between $H$ and +$G_1$. But she could equally have started from $G_2$. In the +case where $G_1$ and $G_2$ are isomorphic, if $H$ is +isomorphic with $G_1$, it will also be isomorphic with +$G_2$, and vice versa. + +After generating $H$, Alice sends it to Bob. The important +point is that she needs to ``commit'' to this $H$, therefore +this kind of zero-knowledge protocols are called +\emph{commitment protocols}. Only after receiving $H$, Bob +will make up his mind about which isomorphism he asks +for---whether between $H$ and $G_1$ or $H$ and $G_2$. For this +he could flip a coin, since the choice should be as +unpredictable for Alice as possible. Once Alice receives the +request, she has to produce an isomorphism. If she generated +$H$ as shown in \eqref{hiso} and is asked for an isomorphism +between $H$ and $G_1$, she just sends $\sigma'$. If she had +been asked for an isomorphism between $H$ and $G_2$, she just +has to compose her secret isomorphism $\sigma$ and $\sigma'$. +The main point for the protocol is that even knowing the +isomorphism between $H$ and $G_1$ or $H$ and $G_2$, will not +make the task easier to find the isomorphism between $G_1$ and +$G_2$, which is the secret Alice tries to protect. + +In order to make it crystal clear how this protocol proceeds, +let us give a version using our more formal notation for +protocols: + +\begin{center} +\begin{tabular}{lrl} +0) & $A \to B:$ & $G_1$ and $G_2$\\ +1a) & $A \to B:$ & $H_1$ \\ +1b) & $B \to A:$ & produce isomorphism $G_1 \leftrightarrow H_1$? + \quad(or $G_2 \leftrightarrow H_1$)\\ +1c) & $A \to B:$ & requested isomorphism\\ +2a) & $A \to B:$ & $H_2$\\ +2b) & $B \to A:$ & produce isomorphism $G_1 \leftrightarrow H_2$? + \quad(or $G_2 \leftrightarrow H_2$)\\ +2c) & $A \to B:$ & requested isomorphism\\ + & \ldots\\ +\end{tabular} +\end{center} + +\noindent As can be seen the protocol runs for some +agreed number of iterations. The $H_i$ Alice needs to +produce, need to be all distinct. I hope you now know +why? + +It is also crucial that in each iteration, Alice first sends +$H_i$ and then Bob can decide which isomorphism he wants: +either $G_1 \leftrightarrow H_i$ or $G_2 \leftrightarrow H_i$. +If somehow Alice can find out before she committed to $H_i$, +she can cheat. For this assume Alice does \emph{not} know an +isomorphism between $G_1$ and $G_2$. If she knows which +isomorphism Bob will ask for she can craft $H$ in such a way +that it is isomorphism with either $G_1$ or $G_2$ (but it +cannot with both). Then in each case she would send Bob +a correct answer and he would come to the conclusion that +all is well. I let you also answer the question whether +such a protocol run between Alice and Bob would convince +a third person, say Pete. + +Since the interactive nature of the above PKZ protocol and the +correct ordering of the messages is so important for the +``correctness'' of the protocol, it might look surprising that +the same goal can also me achieved in a completely offline +manner. By this I mean Alice can publish all data at once, and +then at a later time, Bob can inspect the data and come to the +conclusion whether or not Alice knows the secret (again +without actually learning about the secret). For this +Alice has to do the following: + +\begin{enumerate} +\item Alice generates $n$ isomorphic graphs + $H_{1..n}$ (they need to be all distinct) +\item she feeds the $H_{1..n}$ into a hashing function + (for example encoded as as matrix) +\item she takes the first $n$ bits of the output of the hashing + function: + whenever the output is $0$, she shows an isomorphism + with $G_1$; for $1$ she shows an isomorphism + with $G_2$ +\end{enumerate} + +\noindent The reason why this works and achieves the same +goal as the interactive variant is that Alice has no +control over the hashing function. It would be +computationally just too hard to assemble a set of +$H_{1..n}$ such that she can force where $0$s and $1$s +in the hash values are such that it would pass an external +test. The point is that Alice can publish all this data +on the comfort of her own web-page, for example, and +in this way convince everybody who bothers to check. + +The virtue of the use of graphs and isomorphism for a +zero-knowledge protocol is that the idea why it works +are relatively straightforward. The disadvantage is +that encoding any secret into a graph-isomorphism, while +possible, is awkward. The good news is that in fact +any NP problem can be used as part of a ZKP protocol. + + +\subsubsection*{Using Modular Logarithms for ZKP Protocols} + +While information can be encoded into graph isomorphisms, it +is not the most convenient carrier of information. Clearly it +is much easier to encode information into numbers. Let us look +at zero-knowledge proofs that use numbers as secrets. For this +the underlying NP-problem is to calculate discrete logarithms. +It can be used by choosing public numbers $A$, $B$, $p$, and +private $x$ such that + +\begin{center} +$A^x \equiv B\; mod\; p$ +\end{center} + +\noindent holds. The secret Alice tries to keep secret is $x$. +The point of the modular logarithm is that it is very hard +from the public data to calculate $x$ (for large primes). +Now the protocol proceeds in three stages: + +\begin{itemize} +\item {\bf Commitment Stage} + \begin{enumerate} + \item Alice generates $z$ random numbers $r_1, \ldots, r_z$, + all less than $p - 1$. Alice then sends Bob for all + $i = 1,\ldots, z$: + \[ h_i = A^{r_i}\; mod\; p\] + \item Bob generates $z$ random bits, say $b_1,\ldots, b_z$. He can do this + by flipping $z$ times a coin, for example. + \item For each bit $b_i$, Alice sends Bob an $s_i$ where + + \begin{center} + \begin{tabular}{ll} + if $b_i = 0$: & $s_i = r_i$\\ + if $b_i = 1$: & $s_i = (r_i - r_j) \;mod\; (p -1)$\\ + \end{tabular} + \end{center} + + where $r_j$ is the lowest $j$ where $b_j = 1$. + + \end{enumerate} + \end{itemize} + +\noindent For understanding the last step, let $z$ be just 4. +We have four random values $r_i$ chosen by Alice and four +random bits $b_i$ chosen subsequently by Bob, for example + + \begin{center} + \begin{tabular}{lcccc} + $r_i$:\; & 4 & 9 & 1 & 3\\ + $b_i$:\; & 0 & 1 & 0 & 1\\ + & & $\uparrow$ \\ + & & $j$ + \end{tabular} + \end{center} + + \noindent The highlighted column is the lowest where $b_i = + 1$ (counted from the left). That means $r_j = 9$. The reason + for letting Alice choose the random numbers $r_1, \ldots, r_z$ + will become clear shortly. Next is the confirmation + phase where Bob essentially checks whether Alice has sent + him ``correct'' $s_i$ and $h_i$. + +\begin{itemize} +\item {\bf Confirmation Stage} + \begin{enumerate} + \item For each $b_i$ Bob checks whether $s_i$ + conform to the protocol + + \begin{center} + \begin{tabular}{ll} + if $b_i = 0$: & $A^{s_i} \stackrel{?}{\equiv} h_i\;mod\;p$\\ + if $b_i = 1$: & $A^{s_i} \stackrel{?}{\equiv} h_i * h_j^{-1} \;mod\; p$\\ + \end{tabular} + \end{center} + \end{enumerate} +\end{itemize} + +\noindent To understand the case for $b_i = 1$, you have +to do the following calculation: + +\begin{center} +\begin{tabular}{r@{\hspace{1mm}}c@{\hspace{1mm}}l} +$A^{s_i}$ & $=$ & $A^{r_i - r_j}$\\ + & $=$ & $A^{r_i} * A^{-r_j}$\\ + & $=$ & $h_{r_i} * h_{r_j}^{-1}\;mod\;p$ +\end{tabular} +\end{center} + +\noindent What is interesting that so far nothing has been +sent about $x$, which is the secret Alice has. Also notice +that Bob does not know $r_j$. He received + +\begin{center} +$r_j - r_j$, $r_m - r_j$, \ldots, $r_p - r_j \;mod \;p - 1$ +\end{center} + +\noindent whenever his corresponding bits were $1$. So Bob +does not know $r_j$ and also does not know any $r_i$ where the +bit was $1$. Information about the $x$ is sent in the next +stage (obviously not revealing $x$). + +\begin{itemize} +\item {\bf Proving Stage} + +\begin{enumerate} +\item Alice proves she knows $x$, the discrete log of $B$, +by sending + +\begin{center} +$s_{z+1} = x - r_j\;mod\;p-1$ +\end{center} + +\item Bob confirms + +\begin{center} +$A^{s_{z+1}} \stackrel{?}{\equiv} B * h_j^{-1} \;mod \; p$ +\end{center} +\end{enumerate} +\end{itemize} + +\noindent To understand the last step, you have to do the trick +again that + +\[A^{s_{z+1}} = A^{x-r_j} = \ldots +\] + +\noindent +which I leave to you. + +Now the question is how can Alice cheat? In order to cheat she +has to coordinate what she sends as $h_i$ in step 1 and $s_i$ +in step 3 of the commitment stage, and also what to send as +$s_{z+1}$ in the proving stage. For the latter of course +Alice does not know $x$, so she just chooses some random +number for $s_{z+1}$ and calculates + +\[A^{s_{z+1}}\] + +\noindent +and then solves the equation + +\[A^{s_{z+1}} \equiv B * y \;mod\;p\] + +\noindent for $y$. This is easy since no logarithm needs to be +computed. If Alice can guess the $j$ where the first 1 will +appear in Bob's bit vector, then she sends the inverse of $y$ +as $h_j$ and 0 as $s_j$. However, notice that when she +calculates a solution for $y$ she does not know $r_j$. For this she +would need to calculate the modular logarithm + +\[y \equiv A^{r_j}\;mod\;p\] + +\noindent which is hard (see step 1 in the commitment stage). + +Having settled on what $h_j$ should be, now what should Alice +send as the other $h_i$ and other $s_i$? If the $b_i$ happens +to be a 1, then the $h_i$ and other $s_i$ need to satisfy the +test + +\[A^{s_i} \stackrel{?}{\equiv} h_i * h_j^{-1} \;mod\; p\] + +\noindent where she has already settled on the value of +$h_j^{-1}$. Lets say she choses $s_i$ at random, then she just +needs to solve + +\[A^{s_i} \equiv z * h_j^{-1} \;mod\; p\] + +\noindent for $z$. Again that is easy, but it does not allow +us to know $r_i$, because then we would again need to solve +a modular logarithm problem. Let us call an $h_i$ which was +solved the easy way as \emph{bogus}. Alice has to produce +bogus $h_i$ for all bits that are going to be $1$ in advance! +This means she has to guess all the bits correctly. (Yes? +I let you think about this.) + +Let us see what happens if she guesses wrongly: Suppose the +bit $b_i = 1$ where she thought she will get a 0. Then she has +already sent an $h_i$ and $h_j$ and now must find an $s_i$ +such that + +\[A^{s_i} \equiv h_i * h_j^{-1} \;mod\; p\] + +\noindent holds. For this remember in calculating $h_i$, she +just chose a random $s_i$. Now she has to send a genuine one. +But this is of course too hard. If she knew the genuine $r_i$ +and $r_j$ for $h_i$ and $h_j$, it would be easy (in this case +$s_i = r_i - r_j$). But she does not. So she will be found +out. If $b_i = 0$, but she thought she will get a 1, then +she has to send a $s_i$ which satisfies + +\[A^{s_i} \equiv h_i\;mod\;p\] + +\noindent Again she does not know $r_i$. So it is a too hard +task and she will be found out again. + +To sum up, in order for Alice to successfully cheat Bob, she +needs to guess \emph{all} bits correctly. She has only a +$\frac{1}{2^z}$ chance of doing this. + +\subsubsection*{Further Reading} + +Make sure you understand what NP problems +are.\footnote{\url{http://en.wikipedia.org/wiki/NP_(complexity)}} +They are the building blocks for zero-knowledge proofs. +Zero-Knowldege proofs are not yet widely used in production +systems, but it is slowly gaining ground. One area of application +where they pop up is crypto currencies (for example Zerocoins +or how to make sure a Bitcoin exchange is solvent without +revealing its assets). + +If you want to brush up on the modular logarithm problem, +the Khan Academy has a nice video: + +\begin{center} +\url{https://www.khanacademy.org/video/discrete-logarithm-problem} +\end{center} + +\end{document} + +http://blog.cryptographyengineering.com/2014/11/zero-knowledge-proofs-illustrated-primer.html + +http://btravers.weebly.com/uploads/6/7/2/9/6729909/zero_knowledge_technique.pdf +http://zk-ssh.cms.ac/docs/Zero_Knowledge_Prinzipien.pdf +http://www.wisdom.weizmann.ac.il/~oded/PS/zk-tut02v4.ps + +socialist millionares problem +http://en.wikipedia.org/wiki/Socialist_millionaire +http://twistedoakstudios.com/blog/Post3724_explain-it-like-im-five-the-socialist-millionaire-problem-and-secure-multi-party-computation + + +%%% Local Variables: +%%% mode: latex +%%% TeX-master: t +%%% End: diff -r 280e057558b8 -r 7a6e8f603e08 hws/hw01.pdf Binary file hws/hw01.pdf has changed diff -r 280e057558b8 -r 7a6e8f603e08 hws/hw02.pdf Binary file hws/hw02.pdf has changed diff -r 280e057558b8 -r 7a6e8f603e08 hws/hw03.pdf Binary file hws/hw03.pdf has changed diff -r 280e057558b8 -r 7a6e8f603e08 hws/hw04.pdf Binary file hws/hw04.pdf has changed diff -r 280e057558b8 -r 7a6e8f603e08 hws/hw05.pdf Binary file hws/hw05.pdf has changed diff -r 280e057558b8 -r 7a6e8f603e08 hws/hw06.pdf Binary file hws/hw06.pdf has changed diff -r 280e057558b8 -r 7a6e8f603e08 hws/hw06.tex --- a/hws/hw06.tex Sat Sep 23 19:39:53 2017 +0100 +++ b/hws/hw06.tex Sat Sep 23 19:52:27 2017 +0100 @@ -1,21 +1,65 @@ \documentclass{article} \usepackage{../style} - \begin{document} \section*{Homework 6} -\HEADER +\begin{enumerate} +\item What are good uses of anonymity services like Tor? + +\item What is meant by the notion \emph{forward privacy}? + +\item What is a \emph{re-identification attack}? -\begin{enumerate} +\item Imagine you have a completely `innocent' email message, + like birthday wishes to your grandmother. Why should you + still encrypt this message and your grandmother take the + effort to decrypt it? + (Hint: The answer has nothing to do with preserving the + privacy of your grandmother and nothing to do with + keeping her birthday wishes supersecret. Also nothing to + do with you and grandmother testing the latest + encryption technology, nor just for the sake of it.) - +\item One part of achieving privacy (but not the only one) is to + properly encrypt your conversations on the Internet. But this is + fiercely resisted by some spy agencies. These agencies (and some + politicians for that matter) argue that, for example, ISIL's + recruiters broadcast messages on, say, Twitter, and get people to + follow them. Then they move potential recruits to Twitter Direct + Messaging to evaluate if they are a legitimate recruit. If yes, they + move them to an encrypted mobile-messaging app. The spy agencies + argue that although they can follow the conversations on Twitter, + they ``go dark'' on the encrypted message app. To counter this + ``going-dark problem'', the spy agencies push for the implementation + of back-doors in iMessage and Facebook and Skype and everything else + UK or US-made, which they can use eavesdrop on conversations without + the conversants' knowledge or consent.\medskip + + What is the fallacy in the spy agencies going-dark argument? + (Hint: Think what would happen if the spy agencies and certain + politicians get their wish.) + +\item DNA data is very sensitive and can easily violate the privacy of + (living) people. To get around this, two scientists from Denmark + proposed to create a \emph{necrogenomic database} which would record + the DNA data of all Danish citizens and residents at the time of + their \emph{death}. By matching these to information about illnesses + and ailments in life, helpful evidence could be gathered about the + genetic origins of diseases. The idea is that the privacy of dead + people cannot be violated. -\item \POSTSCRIPT -\end{enumerate} + What is the fallacy behind this reasoning? +\item A few years ago a Google executive tried to allay worries about + Google pooring over all your emails on Gmail. He said something + along the lines: you are watched by an algorithm; this is like being + naked in front of your dog. What is wrong with this argument? + +\item \POSTSCRIPT +\end{enumerate} \end{document} %%% Local Variables: diff -r 280e057558b8 -r 7a6e8f603e08 hws/hw07.pdf Binary file hws/hw07.pdf has changed diff -r 280e057558b8 -r 7a6e8f603e08 hws/hw07.tex --- a/hws/hw07.tex Sat Sep 23 19:39:53 2017 +0100 +++ b/hws/hw07.tex Sat Sep 23 19:52:27 2017 +0100 @@ -3,63 +3,37 @@ \begin{document} -\section*{Homework 6} + +% For Alice to cheat, she has to get her transaction into the blockchain. +% For this she has to solve proof-of-work puzzles faster than anybody +% else. Is it possible for her to precompute several blocks that would +% validate a fraudulent transaction by her? Give a short explanation +% for your reasoning. + +\section*{Homework 7} + +\HEADER \begin{enumerate} -\item What are good uses of anonymity services like Tor? - -\item What is meant by the notion \emph{forward privacy}? - -\item What is a \emph{re-identification attack}? +\item How can the hardness of the proof-of-work puzzles in + Bitcoins be adjusted? What is parameter that determines + how the hardness is adjusted? -\item Imagine you have a completely `innocent' email message, - like birthday wishes to your grandmother. Why should you - still encrypt this message and your grandmother take the - effort to decrypt it? - - (Hint: The answer has nothing to do with preserving the - privacy of your grandmother and nothing to do with - keeping her birthday wishes supersecret. Also nothing to - do with you and grandmother testing the latest - encryption technology, nor just for the sake of it.) +\item What is the main data that is stored in Bitcoin's + blockchain? + +\item What is is the purpose of the proof-of-work puzzle in + Bitcoins? -\item One part of achieving privacy (but not the only one) is to - properly encrypt your conversations on the Internet. But this is - fiercely resisted by some spy agencies. These agencies (and some - politicians for that matter) argue that, for example, ISIL's - recruiters broadcast messages on, say, Twitter, and get people to - follow them. Then they move potential recruits to Twitter Direct - Messaging to evaluate if they are a legitimate recruit. If yes, they - move them to an encrypted mobile-messaging app. The spy agencies - argue that although they can follow the conversations on Twitter, - they ``go dark'' on the encrypted message app. To counter this - ``going-dark problem'', the spy agencies push for the implementation - of back-doors in iMessage and Facebook and Skype and everything else - UK or US-made, which they can use eavesdrop on conversations without - the conversants' knowledge or consent.\medskip - - What is the fallacy in the spy agencies going-dark argument? - (Hint: Think what would happen if the spy agencies and certain - politicians get their wish.) - -\item DNA data is very sensitive and can easily violate the privacy of - (living) people. To get around this, two scientists from Denmark - proposed to create a \emph{necrogenomic database} which would record - the DNA data of all Danish citizens and residents at the time of - their \emph{death}. By matching these to information about illnesses - and ailments in life, helpful evidence could be gathered about the - genetic origins of diseases. The idea is that the privacy of dead - people cannot be violated. +\item The department has large labs full of computers that are + pretty much idle over night. Why is it a bad idea to let + them mine for Bitcoins? - What is the fallacy behind this reasoning? +\item Is it possible that Bitcoins can get lost (be + irretrievable)? -\item A few years ago a Google executive tried to allay worries about - Google pooring over all your emails on Gmail. He said something - along the lines: you are watched by an algorithm; this is like being - naked in front of your dog. What is wrong with this argument? - -\item \POSTSCRIPT -\end{enumerate} +\item \POSTSCRIPT +\end{enumerate} \end{document} %%% Local Variables: diff -r 280e057558b8 -r 7a6e8f603e08 hws/hw08.pdf Binary file hws/hw08.pdf has changed diff -r 280e057558b8 -r 7a6e8f603e08 hws/hw08.tex --- a/hws/hw08.tex Sat Sep 23 19:39:53 2017 +0100 +++ /dev/null Thu Jan 01 00:00:00 1970 +0000 @@ -1,42 +0,0 @@ -\documentclass{article} -\usepackage{../style} - -\begin{document} - - -% For Alice to cheat, she has to get her transaction into the blockchain. -% For this she has to solve proof-of-work puzzles faster than anybody -% else. Is it possible for her to precompute several blocks that would -% validate a fraudulent transaction by her? Give a short explanation -% for your reasoning. - -\section*{Homework 7} - -\HEADER - -\begin{enumerate} -\item How can the hardness of the proof-of-work puzzles in - Bitcoins be adjusted? What is parameter that determines - how the hardness is adjusted? - -\item What is the main data that is stored in Bitcoin's - blockchain? - -\item What is is the purpose of the proof-of-work puzzle in - Bitcoins? - -\item The department has large labs full of computers that are - pretty much idle over night. Why is it a bad idea to let - them mine for Bitcoins? - -\item Is it possible that Bitcoins can get lost (be - irretrievable)? - -\item \POSTSCRIPT -\end{enumerate} -\end{document} - -%%% Local Variables: -%%% mode: latex -%%% TeX-master: t -%%% End: