1 \documentclass{article} |
1 \documentclass{article} |
2 \usepackage{../style} |
2 \usepackage{../style} |
|
3 \usepackage{../graphics} |
3 |
4 |
4 \begin{document} |
5 \begin{document} |
5 \fnote{\copyright{} Christian Urban, King's College London, 2014, 2015} |
6 \fnote{\copyright{} Christian Urban, King's College London, 2014, 2015} |
6 |
7 |
7 \section*{Handout 6 (Zero-Knowledge Proofs)} |
8 %https://www.theguardian.com/technology/2016/oct/04/yahoo-secret-email-program-nsa-fbi |
8 |
9 %https://nakedsecurity.sophos.com/2015/11/12/california-collects-owns-and-sells-infants-dna-samples/ |
9 Zero-knowledge proofs (short ZKP) solve a paradoxical puzzle: |
10 %http://randomwalker.info/teaching/fall-2012-privacy-technologies/? |
10 How to convince somebody else that one knows a secret, without |
11 %https://josephhall.org/papers/NYU-MCC-1303-S2012_privacy_syllabus.pdf |
11 revealing what the secret actually is? This sounds like a |
12 %http://www.jetlaw.org/wp-content/uploads/2014/06/Bambauer_Final.pdf |
12 problem the Mad Hatter from Alice in Wonderland would occupy |
13 %http://www.cs.cmu.edu/~yuxiangw/docs/Differential%20Privacy.pdf |
13 himself with, but actually there some serious and not so |
14 %https://www.youtube.com/watch?v=Gx13lgEudtU |
14 serious applications of it. For example, if you solve |
15 %https://fpf.org/wp-content/uploads/Differential-Privacy-as-a-Response-to-the-Reidentification-Threat-Klinefelter-and-Chin.pdf |
15 crosswords with your friend, say Bob, you might want to |
16 %http://research.neustar.biz/2014/09/08/differential-privacy-the-basics/ |
16 convince him that you found a solution for one question, but |
17 |
17 of course you do not want to reveal the solution, as this |
18 %===== |
18 might give Bob an advantage somewhere else in the crossword. |
19 %Tim Greene, Network World, 17 Dec 2015 (via ACM TechNews, 18 Dec 2015) |
19 |
20 % |
20 So how to convince Bob that you know the answer (or a secret)? |
21 %Massachusetts Institute of Technology (MIT) researchers' experimental |
21 One way would be to come up with the following protocol: |
22 %Vuvuzela messaging system offers more privacy than The Onion Router (Tor) by |
22 Suppose the answer is \textit{folio}. Then look up the |
23 %rendering text messages sent through it untraceable. MIT Ph.D. student |
23 definition of \textit{folio} in a dictionary. Say you find: |
24 %David Lazar says Vuvuzela resists traffic analysis attacks, while Tor |
24 |
25 %cannot. The researchers say the system functions no matter how many parties |
25 \begin{quote} |
26 %are using it to communicate, and it employs encryption and a set of servers |
26 ``an \textit{individual} leaf of paper or parchment, either |
27 %to conceal whether or not parties are participating in text-based dialogues. |
27 loose as one of a series or forming part of a bound volume, |
28 %"Vuvuzela prevents an adversary from learning which pairs of users are |
28 which is numbered on the recto or front side only.'' |
29 %communicating, as long as just one out of [the] servers is not compromised, |
|
30 %even for users who continue to use Vuvuzela for years," they note. Vuvuzela |
|
31 %can support millions of users hosted on commodity servers deployed by a |
|
32 %single group of users. Instead of anonymizing users, Vuvuzela prevents |
|
33 %outside observers from differentiating between people sending messages, |
|
34 %receiving messages, or neither, according to Lazar. The system imposes |
|
35 %noise on the client-server traffic which cannot be distinguished from actual |
|
36 %messages, and all communications are triple-wrapped in encryption by three |
|
37 %servers. "Vuvuzela guarantees privacy as long as one of the servers is |
|
38 %uncompromised, so using more servers increases security at the cost of |
|
39 %increased message latency," Lazar notes. |
|
40 %http://orange.hosting.lsoft.com/trk/click?ref=znwrbbrs9_5-e70bx2d991x066779& |
|
41 |
|
42 %%%% |
|
43 %% canvas tracking |
|
44 %%https://freedom-to-tinker.com/blog/englehardt/the-princeton-web-census-a-1-million-site-measurement-and-analysis-of-web-privacy/ |
|
45 |
|
46 %%% |
|
47 %% cupit re-identification attack |
|
48 %% https://nakedsecurity.sophos.com/2016/05/20/published-personal-data-on-70000-okcupid-users-taken-down-after-dmca-order/?utm_source=feedburner&utm_medium=feed&utm_campaign=Feed%3A+nakedsecurity+%28Naked+Security+-+Sophos%29 |
|
49 |
|
50 %Differential privacy |
|
51 %===================== |
|
52 %https://www.wired.com/2016/06/apples-differential-privacy-collecting-data/ |
|
53 |
|
54 %Differential privacy, translated from Apple-speak, is the |
|
55 %statistical science of trying to learn as much as possible |
|
56 %about a group while learning as little as possible about any |
|
57 %individual in it. |
|
58 |
|
59 %As Roth notes when he refers to a “mathematical proof,” |
|
60 %differential privacy doesn’t merely try to obfuscate or |
|
61 %“anonymize” users’ data. That anonymization approach, he |
|
62 %argues, tends to fail. In 2007, for instance, Netflix released |
|
63 %a large collection of its viewers’ film ratings as part of a |
|
64 %competition to optimize its recommendations, removing people’s |
|
65 %names and other identifying details and publishing only their |
|
66 %Netflix ratings. But researchers soon cross-referenced the |
|
67 %Netflix data with public review data on IMDB to match up |
|
68 %similar patterns of recommendations between the sites and add |
|
69 %names back into Netflix’s supposedly anonymous database. |
|
70 |
|
71 %As an example of that last method, Microsoft’s Dwork points to |
|
72 %the technique in which a survey asks if the respondent has |
|
73 %ever, say, broken a law. But first, the survey asks them to |
|
74 %flip a coin. If the result is tails, they should answer |
|
75 %honestly. If the result is heads, they’re instructed to flip |
|
76 %the coin again and then answer “yes” for heads or “no” for |
|
77 %tails. The resulting random noise can be subtracted from the |
|
78 %results with a bit of algebra, and every respondent is |
|
79 %protected from punishment if they admitted to lawbreaking. |
|
80 |
|
81 %https://www.cis.upenn.edu/~aaroth/Papers/privacybook.pdf |
|
82 |
|
83 % Windows 10 data send back to Microsoft (Cortana) |
|
84 %Here’s a non-exhaustive list of data sent back: location data, text |
|
85 %input, voice input, touch input, webpages you visit, and telemetry |
|
86 %data regarding your general usage of your computer, including which |
|
87 %programs you run and for how long. |
|
88 |
|
89 % Businesses are already using customised pricing online based on |
|
90 % information they can glean about you. It is hard to know how |
|
91 % widespread the practice is; companies keep their pricing strategies |
|
92 % closely guarded and are wary of the bad PR price discrimination |
|
93 % could pose. However, it is clear that a number of large retailers |
|
94 % are experimenting with it. Staples, for example, has offered |
|
95 % discounted prices based on whether rival stores are within 20 miles |
|
96 % of its customers’ location. Office Depot has admitted to using its |
|
97 % customers’ browsing history and location to vary its range of offers |
|
98 % and products. A 2014 study from Northeastern University found |
|
99 % evidence of “steering” or differential pricing at four out of 10 |
|
100 % general merchandise websites and five out of five travel |
|
101 % websites. (Steering is when a company doesn’t give you a customised |
|
102 % price, but points you towards more expensive options if it thinks |
|
103 % you will pay more.) The online travel company Orbitz raised |
|
104 % headlines in 2012 when it emerged that the firm was pointing Mac |
|
105 % users towards higher-priced hotel rooms than PC users. |
|
106 |
|
107 |
|
108 %%% government will overwrite your wishes if it is annoymous |
|
109 %% https://www.lightbluetouchpaper.org/2016/12/05/government-u-turn-on-health-privacy/ |
|
110 |
|
111 %% corporate surveilance / privacy - report and CC3C talk |
|
112 %% http://crackedlabs.org/en/networksofcontrol |
|
113 %% https://media.ccc.de/v/33c3-8414-corporate_surveillance_digital_tracking_big_data_privacy#video&t=2933 |
|
114 |
|
115 \section*{Handout 6 (Privacy)} |
|
116 |
|
117 The first motor car was invented around 1886. For ten years, |
|
118 until 1896, the law in the UK (and elsewhere) required a |
|
119 person to walk in front of any moving car waving a red flag. |
|
120 Cars were such a novelty that most people did not know what to |
|
121 make of them. The person with the red flag was intended to |
|
122 warn the public, for example horse owners, about the impending |
|
123 novelty---a car. In my humble opinion, we are at the same |
|
124 stage of development with privacy. Nobody really knows what it |
|
125 is about or what it is good for. All seems very hazy. There |
|
126 are a few laws (e.g.~cookie law, right-to-be-forgotten law) |
|
127 which address problems with privacy, but even if they are well |
|
128 intentioned, they either back-fire or are already obsolete |
|
129 because of newer technologies. The result is that the world of |
|
130 ``privacy'' looks a little bit like the old Wild |
|
131 West---lawless and mythical. |
|
132 |
|
133 We would have hoped that after Snowden, Western governments |
|
134 would be a bit more sensitive and enlightned about the topic |
|
135 of privacy, but this is far from the truth. Ross Anderson |
|
136 wrote the following in his blog\footnote{\url{https://www.lightbluetouchpaper.org/2016/02/11/report-on-the-ip-bill/}} about the approach taken in |
|
137 the US to lessons learned from the Snowden leaks and contrasts |
|
138 this with the new snooping bill that is considered in the UK |
|
139 parliament: |
|
140 |
|
141 \begin{quote}\it |
|
142 ``The comparison with the USA is stark. There, all three |
|
143 branches of government realised they'd gone too far after |
|
144 Snowden. President Obama set up the NSA review group, and |
|
145 implemented most of its recommendations by executive order; |
|
146 the judiciary made changes to the procedures of the FISA |
|
147 Court; and Congress failed to renew the data retention |
|
148 provisions in the Patriot Act (aided by the judiciary). Yet |
|
149 here in Britain the response is just to take Henry VIII powers |
|
150 to legalise all the illegal things that GCHQ had been up to, |
|
151 and hope that the European courts won't strike the law down |
|
152 yet again.'' |
29 \end{quote} |
153 \end{quote} |
30 |
154 |
|
155 \noindent Unfortunately, also big organisations besides |
|
156 governments seem to take an unenlightened approach to privacy. |
|
157 For example, UCAS, a charity set up to help students with |
|
158 applying to universities in the UK, has a commercial unit that |
|
159 happily sells your email addresses to anybody who forks out |
|
160 enough money for bombarding you with spam. Yes, you can opt |
|
161 out very often from such ``schemes'', but in case of UCAS any |
|
162 opt-out will limit also legit emails you might actually be |
|
163 interested in.\footnote{The main objectionable point, in my |
|
164 opinion, is that the \emph{charity} everybody has to use for |
|
165 HE applications has actually very honourable goals |
|
166 (e.g.~assist applicants in gaining access to universities), |
|
167 but the small print (or better the link ``About us'') reveals |
|
168 they set up their organisation so that they can also |
|
169 shamelessly sell the email addresses they ``harvest''. |
|
170 Everything is of course very legal\ldots{}ethical?\ldots{}well |
|
171 that is in the eye of the beholder. See: |
|
172 |
|
173 \url{http://www.ucas.com/about-us/inside-ucas/advertising-opportunities} |
|
174 or |
|
175 \url{http://www.theguardian.com/uk-news/2014/mar/12/ucas-sells-marketing-access-student-data-advertisers}} |
|
176 |
|
177 Another example: Verizon, an ISP who is supposed to provide |
|
178 you just with connectivity, has found a ``nice'' side-business |
|
179 too: When you have enabled all privacy guards in your browser |
|
180 (the few you have at your disposal), Verizon happily adds a |
|
181 kind of cookie to your |
|
182 HTTP-requests.\footnote{\url{http://webpolicy.org/2014/10/24/how-verizons-advertising-header-works/}} |
|
183 As shown in the picture below, this cookie will be sent to |
|
184 every web-site you visit. The web-sites then can forward the |
|
185 cookie to advertisers who in turn pay Verizon to tell them |
|
186 everything they want to know about the person who just made |
|
187 this request, that is you. |
|
188 |
|
189 \begin{center} |
|
190 \includegraphics[scale=0.16]{../pics/verizon.png} |
|
191 \end{center} |
|
192 |
|
193 \noindent How disgusting! Even worse, Verizon is not known for |
|
194 being the cheapest ISP on the planet (completely the |
|
195 contrary), and also not known for providing the fastest |
|
196 possible speeds, but rather for being among the few ISPs in |
|
197 the US with a quasi-monopolistic ``market distribution''. |
|
198 |
|
199 |
|
200 Well, we could go on and on\ldots{}and that has not even |
|
201 started us yet with all the naughty things NSA \& Friends are |
|
202 up to. Why does privacy actually matter? Nobody, I think, has |
|
203 a conclusive answer to this question yet. Maybe the following |
|
204 four notions help with clarifying the overall picture |
|
205 somewhat: |
|
206 |
|
207 \begin{itemize} |
|
208 \item \textbf{Secrecy} is the mechanism used to limit the |
|
209 number of principals with access to information (e.g., |
|
210 cryptography or access controls). For example I better |
|
211 keep my password secret, otherwise people from the wrong |
|
212 side of the law might impersonate me. |
|
213 |
|
214 \item \textbf{Confidentiality} is the obligation to protect |
|
215 the secrets of other people or organisations (secrecy |
|
216 for the benefit of an organisation). For example as a |
|
217 staff member at King's I have access to data, even |
|
218 private data, I am allowed to use in my work but not |
|
219 allowed to disclose to anyone else. |
|
220 |
|
221 \item \textbf{Anonymity} is the ability to leave no evidence of |
|
222 an activity (e.g., sharing a secret). This is not equal |
|
223 with privacy---anonymity is required in many |
|
224 circumstances, for example for whistle-blowers, |
|
225 voting, exam marking and so on. |
|
226 |
|
227 \item \textbf{Privacy} is the ability or right to protect your |
|
228 personal secrets (secrecy for the benefit of an |
|
229 individual). For example, in a job interview, I might |
|
230 not like to disclose that I am pregnant, if I were a |
|
231 woman, or that I am a father. Lest they might not hire |
|
232 me. Similarly, I might not like to disclose my location |
|
233 data, because thieves might break into my house if they |
|
234 know I am away at work. Privacy is essentially |
|
235 everything which ``shouldn't be anybody's business''. |
|
236 |
|
237 \end{itemize} |
|
238 |
|
239 \noindent While this might provide us with some rough |
|
240 definitions, the problem with privacy is that it is an |
|
241 extremely fine line what should stay private and what should |
|
242 not. For example, since I am working in academia, I am every |
|
243 so often very happy to be a digital exhibitionist: I am very |
|
244 happy to disclose all `trivia' related to my work on my |
|
245 personal web-page. This is a kind of bragging that is normal |
|
246 in academia (at least in the field of CS), even expected if |
|
247 you look for a job. I am even happy that Google maintains a |
|
248 profile about all my academic papers and their citations. |
|
249 |
|
250 On the other hand I would be very irritated if anybody I do |
|
251 not know had a too close look on my private live---it |
|
252 shouldn't be anybody's business. The reason is that knowledge |
|
253 about my private life can often be used against me. As mentioned |
|
254 above, public location data might mean I get robbed. If |
|
255 supermarkets build a profile of my shopping habits, they will |
|
256 use it to \emph{their} advantage---surely not to \emph{my} |
|
257 advantage. Also whatever might be collected about my life will |
|
258 always be an incomplete, or even misleading, picture. For |
|
259 example I am pretty sure my creditworthiness score was |
|
260 temporarily(?) destroyed by not having a regular income in |
|
261 this country (before coming to King's I worked in Munich for |
|
262 five years). To correct such incomplete or flawed credit |
|
263 history data there is, since recently, a law that allows you |
|
264 to check what information is held about you for determining |
|
265 your creditworthiness. But this concerns only a very small |
|
266 part of the data that is held about me/you. Also |
|
267 what about cases where data is wrong or outdated (but do we |
|
268 need a right-to be forgotten). |
|
269 |
|
270 |
|
271 To see how private matter can lead really to the wrong |
|
272 conclusions, take the example of Stephen Hawking: When he was |
|
273 diagnosed with his disease, he was given a life expectancy of |
|
274 two years. If employers would know about such problems, would |
|
275 they have employed Hawking? Now, he is enjoying his 70+ |
|
276 birthday. Clearly personal medical data needs to stay private. |
|
277 |
|
278 To cut a long story short, I let you ponder about the two |
|
279 statements which are often voiced in discussions about privacy: |
|
280 |
|
281 \begin{itemize} |
|
282 \item \textit{``You have zero privacy anyway. Get over |
|
283 it.''}\\ |
|
284 \mbox{}\hfill{}{\small{}(by Scott Mcnealy, former CEO of Sun)} |
|
285 |
|
286 \item \textit{``If you have nothing to hide, you have nothing |
|
287 to fear.''} |
|
288 \end{itemize} |
|
289 |
|
290 \noindent If you like to watch a movie which has this topic as |
|
291 its main focus I recommend \emph{Gattaca} from |
|
292 1997.\footnote{\url{http://www.imdb.com/title/tt0119177/}} If |
|
293 you want to read up on this topic, I can recommend the |
|
294 following article that appeared in 2011 in the Chronicle of |
|
295 Higher Education: |
|
296 |
|
297 \begin{center} |
|
298 \url{http://chronicle.com/article/Why-Privacy-Matters-Even-if/127461/} |
|
299 \end{center} |
|
300 |
|
301 \noindent Funnily, or maybe not so funnily, the author of this |
|
302 article carefully tries to construct an argument that does not |
|
303 only attack the nothing-to-hide statement in cases where |
|
304 governments \& co collect people's deepest secrets, or |
|
305 pictures of people's naked bodies, but an argument that |
|
306 applies also in cases where governments ``only'' collect data |
|
307 relevant to, say, preventing terrorism. The fun is of course |
|
308 that in 2011 we could just not imagine that respected |
|
309 governments would do such infantile things as intercepting |
|
310 people's nude photos. Well, since Snowden we know some people |
|
311 at the NSA did exactly that and then shared such photos among |
|
312 colleagues as ``fringe benefit''. |
|
313 |
|
314 |
|
315 \subsubsection*{Re-Identification Attacks} |
|
316 |
|
317 Apart from philosophical musings, there are fortunately also |
|
318 some real technical problems with privacy. The problem I want |
|
319 to focus on in this handout is how to safely disclose datasets |
|
320 containing potentially very private data, say health records. |
|
321 What can go wrong with such disclosures can be illustrated |
|
322 with four well-known examples: |
|
323 |
|
324 \begin{itemize} |
|
325 \item In 2006, a then young company called Netflix offered a 1 |
|
326 Mio \$ prize to anybody who could improve their movie |
|
327 rating algorithm. For this they disclosed a dataset |
|
328 containing 10\% of all Netflix users at the time |
|
329 (appr.~500K). They removed names, but included numerical |
|
330 ratings of movies as well as times when ratings were |
|
331 uploaded. Though some information was perturbed (i.e., |
|
332 slightly modified). |
|
333 |
|
334 Two researchers had a closer look at this anonymised |
|
335 data and compared it with public data available from the |
|
336 International Movie Database (IMDb). They found that |
|
337 98\% of the entries could be re-identified in the |
|
338 Netflix dataset: either by their ratings or by the dates |
|
339 the ratings were uploaded. The result was a class-action |
|
340 suit against Netflix, which was only recently resolved |
|
341 involving a lot of money. |
|
342 |
|
343 \item In the 1990ies, medical datasets were often made public |
|
344 for research purposes. This was done in anonymised form |
|
345 with names removed, but birth dates, gender and ZIP-code |
|
346 were retained. In one case where such data about |
|
347 hospital visits of state employees in Massachusetts was |
|
348 made public, the then governor assured the public that |
|
349 the released dataset protected patient privacy by |
|
350 deleting identifiers. |
|
351 |
|
352 A graduate student could not resist cross-referencing |
|
353 public voter data with the released data that still |
|
354 included birth dates, gender and ZIP-code. The result |
|
355 was that she could send the governor his own hospital |
|
356 record. It turns out that birth dates, gender and |
|
357 ZIP-code uniquely identify 87\% of people in the US. |
|
358 This work resulted in a number of laws prescribing which |
|
359 private data cannot be released in such datasets. |
|
360 |
|
361 \item In 2006, AOL published 20 million Web search queries |
|
362 collected from 650,000 users (names had been deleted). |
|
363 This was again done for research purposes. However, |
|
364 within days an old lady, Thelma Arnold, from Lilburn, |
|
365 Georgia, (11,596 inhabitants) was identified as user |
|
366 No.~4417749 in this dataset. It turned out that search |
|
367 engine queries are deep windows into people's private |
|
368 lives. |
|
369 |
|
370 \item Genome-Wide Association Studies (GWAS) was a public |
|
371 database of gene-frequency studies linked to diseases. |
|
372 It would essentially record that people who have a |
|
373 disease, say diabetes, have also certain genes. In order |
|
374 to maintain privacy, the dataset would only include |
|
375 aggregate information. In case of DNA data this |
|
376 aggregation was achieved by mixing the DNA of many |
|
377 individuals (having a disease) into a single solution. |
|
378 Then this mixture was sequenced and included in the |
|
379 dataset. The idea was that the aggregate information |
|
380 would still be helpful to researchers, but would protect |
|
381 the DNA data of individuals. |
|
382 |
|
383 In 2007 a forensic computer scientist showed that |
|
384 individuals can still be identified. For this he used |
|
385 the DNA data from a comparison group (people from the |
|
386 general public) and ``subtracted'' this data from the |
|
387 published data. He was left with data that included all |
|
388 ``special'' DNA-markers of the individuals present in |
|
389 the original mixture. He essentially deleted the |
|
390 ``background noise'' in the published data. The problem |
|
391 with DNA data is that it is of such a high resolution |
|
392 that even if the mixture contained maybe 100 |
|
393 individuals, you can with current technology detect |
|
394 whether an individual was included in the mixture or |
|
395 not. |
|
396 |
|
397 This result changed completely how DNA data is nowadays |
|
398 published for research purposes. After the success of |
|
399 the human-genome project with a very open culture of |
|
400 exchanging data, it became much more difficult to |
|
401 anonymise data so that patient's privacy is preserved. |
|
402 The public GWAS database was taken offline in 2008. |
|
403 |
|
404 \end{itemize} |
|
405 |
|
406 \noindent There are many lessons that can be learned from |
|
407 these examples. One is that when making datasets public in |
|
408 anonymised form, you want to achieve \emph{forward privacy}. |
|
409 This means, no matter what other data that is also available |
|
410 or will be released later, the data in the original dataset |
|
411 does not compromise an individual's privacy. This principle |
|
412 was violated by the availability of ``outside data'' in the |
|
413 Netflix and governor of Massachusetts cases. The additional |
|
414 data permitted a re-identification of individuals in the |
|
415 dataset. In case of GWAS a new technique of re-identification |
|
416 compromised the privacy of people in the dataset. The case of |
|
417 the AOL dataset shows clearly how incomplete such data can be: |
|
418 Although the queries uniquely identified the older lady, she |
|
419 also looked up diseases that her friends had, which had |
|
420 nothing to do with her. Any rational analysis of her query |
|
421 data must therefore have concluded, the lady is on her |
|
422 death bed, while she was actually very much alive and kicking. |
|
423 |
|
424 In 2016, Yahoo released the so far largest machine learning |
|
425 dataset to the research community. It includes approximately |
|
426 13.5 TByte of data representing around 100 Billion events from |
|
427 anonymized user-news items, collected by recording |
|
428 interactions of about 20M users from February 2015 to May |
|
429 2015. Yahoo's gracious goal is to promote independent research |
|
430 in the fields of large-scale machine learning and recommender |
|
431 systems. It remains to be seen whether this data will really |
|
432 only be used for that purpose. |
|
433 |
|
434 \subsubsection*{Differential Privacy} |
|
435 |
|
436 Differential privacy is one of the few methods that tries to |
|
437 achieve forward privacy. The basic idea is to add appropriate |
|
438 noise, or errors, to any query of the dataset. The intention |
|
439 is to make the result of a query insensitive to individual |
|
440 entries in the database. That means the results are |
|
441 approximately the same no matter if a particular individual is |
|
442 in the dataset or not. The hope is that the added error does |
|
443 not eliminate the ``signal'' one is looking for in the |
|
444 dataset. |
|
445 |
|
446 %\begin{center} |
|
447 %User\;\;\;\; |
|
448 %\begin{tabular}{c} |
|
449 %tell me $f(x)$ $\Rightarrow$\\ |
|
450 %$\Leftarrow$ $f(x) + \text{noise}$ |
|
451 %\end{tabular} |
|
452 %\;\;\;\;\begin{tabular}{@{}c} |
|
453 %Database\\ |
|
454 %$x_1, \ldots, x_n$ |
|
455 %\end{tabular} |
|
456 %\end{center} |
|
457 % |
|
458 %\begin{center} |
|
459 %\begin{tabular}{l|l} |
|
460 %Staff & Salary\\\hline |
|
461 %$PM$ & \pounds{107}\\ |
|
462 %$PF$ & \pounds{102}\\ |
|
463 %$LM_1$ & \pounds{101}\\ |
|
464 %$LF_2$ & \pounds{97}\\ |
|
465 %$LM_3$ & \pounds{100}\\ |
|
466 %$LM_4$ & \pounds{99}\\ |
|
467 %$LF_5$ & \pounds{98} |
|
468 %\end{tabular} |
|
469 %\end{center} |
|
470 % |
|
471 % |
|
472 %\begin{center} |
|
473 %\begin{tikzpicture} |
|
474 %\begin{axis}[symbolic y coords={salary}, |
|
475 % ytick=data, |
|
476 % height=3cm] |
|
477 %\addplot+[jump mark mid] coordinates |
|
478 %{(0,salary) (0.1,salary) |
|
479 % (0.4,salary) (0.5,salary) |
|
480 % (0.8,salary) (0.9,salary)}; |
|
481 %\end{axis} |
|
482 %\end{tikzpicture} |
|
483 %\end{center} |
|
484 % |
|
485 %\begin{tikzpicture}[outline/.style={draw=#1,fill=#1!20}] |
|
486 % \node [outline=red] {red box}; |
|
487 % \node [outline=blue] at (0,-1) {blue box}; |
|
488 %\end{tikzpicture} |
|
489 |
|
490 \ldots |
|
491 |
|
492 |
|
493 \subsubsection*{Further Reading} |
|
494 |
|
495 Two cool articles about how somebody obtained via the Freedom |
|
496 of Information Law the taxicab dataset of New York and someone |
|
497 else showed how easy it is to mine for private information: |
|
498 |
|
499 \begin{center}\small |
|
500 \begin{tabular}{p{0.78\textwidth}} |
|
501 \url{http://chriswhong.com/open-data/foil_nyc_taxi/}\smallskip\\ |
|
502 \url{http://research.neustar.biz/2014/09/15/riding-with-the-stars-passenger-privacy-in-the-nyc-taxicab-dataset} |
|
503 \end{tabular} |
|
504 \end{center} |
|
505 |
|
506 \noindent |
|
507 A readable article about how supermarkets mine your shopping |
|
508 habits (especially how they prey on new exhausted parents |
|
509 ;o) appeared in 2012 in the New York Times: |
|
510 |
|
511 \begin{center} |
|
512 \url{http://www.nytimes.com/2012/02/19/magazine/shopping-habits.html} |
|
513 \end{center} |
|
514 |
|
515 \noindent An article that analyses privacy and shopping habits |
|
516 from a more economic point of view is available from: |
|
517 |
|
518 \begin{center} |
|
519 \url{http://www.dtc.umn.edu/~odlyzko/doc/privacy.economics.pdf} |
|
520 \end{center} |
|
521 |
|
522 \noindent An attempt to untangle the web of current technology |
|
523 for spying on consumers is published in: |
|
524 |
|
525 \begin{center} |
|
526 \url{http://cyberlaw.stanford.edu/files/publication/files/trackingsurvey12.pdf} |
|
527 \end{center} |
|
528 |
|
529 \noindent An article that sheds light on the paradox that |
|
530 people usually worry about privacy invasions of little |
|
531 significance, and overlook the privacy invasion that might |
|
532 cause significant damage: |
|
533 |
|
534 \begin{center} |
|
535 \url{http://www.heinz.cmu.edu/~acquisti/papers/Acquisti-Grossklags-Chapter-Etrics.pdf} |
|
536 \end{center} |
|
537 |
|
538 |
|
539 Interesting ideas |
|
540 |
|
541 \begin{center} |
|
542 \url{https://adnauseam.io} |
|
543 \end{center} |
|
544 |
31 \noindent |
545 \noindent |
32 Take the first non-small word\footnote{Let's say the, a, an, |
546 And a paper that predicts ad-blockers will in the end win over anti-ad-blocking: |
33 or, and \ldots fall into the category of small words.} in this definition, |
547 |
34 in this case \textit{individual}, and look up the definition |
548 \begin{center} |
35 of this word, say |
549 \url{http://randomwalker.info/publications/ad-blocking-framework-techniques.pdf} |
36 |
550 \end{center} |
37 \begin{quote} |
551 |
38 ``a single \textit{human} being as distinct from a group'' |
|
39 \end{quote} |
|
40 |
|
41 \noindent |
|
42 In this definition take the second non-small word, that |
|
43 is \textit{human}, and again look up the definition of this |
|
44 word. This will yield |
|
45 |
|
46 \begin{quote} |
|
47 ``relating to or \textit{characteristic} of humankind'' |
|
48 \end{quote} |
|
49 |
|
50 \noindent |
|
51 You could go on looking up the definition of the third |
|
52 non-small word in this definition and so on. But let us assume |
|
53 you agreed with Bob to stop after three iterations with the |
|
54 third non-article word in the last definition, that is |
|
55 \textit{or}. Now, instead of sending to Bob the solution |
|
56 \textit{folio}, you send to him \textit{characteristic}. |
|
57 |
|
58 How can Bob verify that you know the solution? Well, once he |
|
59 solved it himself, he can use the dictionary and follow the |
|
60 same ``trail'' as you did. If the final word agrees with what |
|
61 you had sent him, he must infer you knew the solution earlier |
|
62 than him. This protocol works like a one-way hash function |
|
63 because \textit{characteristic} does not give any hint as to |
|
64 what was the first word was. I leave you to think why this |
|
65 protocol avoids small words? |
|
66 |
|
67 After Bob found his solution and verified that according to |
|
68 the protocol it ``maps'' also to \textit{characteristic}, can |
|
69 he be entirely sure no cheating is going on? Not with 100\% |
|
70 certainty. It could have been possible that he was given |
|
71 \textit{characteristic} as the word, but it derived from a |
|
72 different word. This might seem very unlikely, but at least |
|
73 theoretical it is a possibility. Protocols based on |
|
74 zero-knowledge proofs will produce a similar result---they |
|
75 give an answer that might be erroneous in a very small number |
|
76 of cases. The point is to iterate them long enough so that the |
|
77 theoretical possibility of cheating is negligibly small. |
|
78 |
|
79 By the way, the authors of the paper ``Dismantling Megamos |
|
80 Crypto: Wirelessly Lockpicking a Vehicle Immobilizer'' who |
|
81 were barred from publishing their results used also a hash to |
|
82 prove they did the work and (presumably) managed to get into |
|
83 cars without a key; see Figure~\ref{paper}. This is very |
|
84 similar to the method above about crosswords: They like to |
|
85 prove that they did the work, but not giving out the |
|
86 ``solution''. But this also shows what the problem with such a |
|
87 method is: yes, we can hide the secret temporarily, but if |
|
88 somebody else wants to verify it, then the secret has to be |
|
89 made public. Bob needs to know that \textit{folio} is the |
|
90 solution before he can verify the claim of Alice that she had |
|
91 the solution first. Similarly with the car-crypto paper: we |
|
92 needed to wait until September 2015 when the authors were |
|
93 finally able to publish their findings in order to verify the |
|
94 hash. Zero-knowledge proofs, in contrast, can be immediately |
|
95 checked, even if the secret is not public yet and perhaps |
|
96 never will be. |
|
97 |
|
98 \begin{figure} |
|
99 \begin{center} |
|
100 \addtolength{\fboxsep}{4mm} |
|
101 \fbox{\includegraphics[scale=0.4]{../pics/Dismantling_Megamos_Crypto.png}} |
|
102 \end{center} |
|
103 \caption{The authors of this paper used a hash in order to prove |
|
104 later that they have managed to break into cars.\label{paper}} |
|
105 \end{figure} |
|
106 |
|
107 |
|
108 \subsubsection*{ZKP: An Illustrative Example} |
|
109 |
|
110 The idea behind zero-knowledge proofs is not very obvious and |
|
111 will surely take some time for you to digest. Therefore let us |
|
112 start with a simple illustrative example. The example will not |
|
113 be perfect, but hopefully explain the gist of the idea. The |
|
114 example is called Alibaba's cave, which graphically looks as |
|
115 follows:\footnote{The example is taken from an |
|
116 article titled ``How to Explain Zero-Knowledge Protocols |
|
117 to Your Children'' available from |
|
118 \url{http://pages.cs.wisc.edu/~mkowalcz/628.pdf}.} |
|
119 |
|
120 \begin{center} |
|
121 \begin{tabular}{c@{\hspace{8mm}}c@{\hspace{8mm}}c} |
|
122 \includegraphics[scale=0.1]{../pics/alibaba1.png} & |
|
123 \includegraphics[scale=0.1]{../pics/alibaba2.png} & |
|
124 \includegraphics[scale=0.1]{../pics/alibaba3.png} \\ |
|
125 Step 1 & Step 2 & Step 3 |
|
126 \end{tabular} |
|
127 \end{center} |
|
128 |
|
129 \noindent Let us take a closer look at the picture in Step 1: |
|
130 The cave has a tunnel which forks at some point. Both forks |
|
131 are connected in a loop. At the deep end of the loop is a |
|
132 magic wand. The point of the magic wand is that Alice knows |
|
133 the secret word for how to open it. She wants to keep the word |
|
134 secret, but wants to convince Bob that she knows it. |
|
135 |
|
136 One way of course would be to let Bob follow her, but then he |
|
137 would also find out the secret. So this does not work. To find |
|
138 a solution, let us first fix the rules: At the beginning Alice |
|
139 and Bob are outside the cave. Alice goes in alone and takes |
|
140 either tunnel labelled $A$ in the picture, or the other tunnel |
|
141 labelled $B$. She waits at the magic wand for the instructions |
|
142 from Bob, who also goes into the gave and observes what |
|
143 happens at the fork. He has no knowledge which tunnel Alice |
|
144 took and calls out (in Step 2) that she should emerge from tunnel |
|
145 $A$, for example. If she knows the secret for opening the |
|
146 wand, this will not be a problem for Alice. If she was already |
|
147 in the $A$-segment of the tunnel, then she just comes back. If |
|
148 she was in the $B$-segment of the tunnel she will say the magic |
|
149 word and goes through the wand to emerge from $A$ as requested |
|
150 by Bob. |
|
151 |
|
152 Let us have a look at the case where Alice cheats, that is not |
|
153 knows the secret. She would still go into the cave and use, |
|
154 for example the $B$-segment of the tunnel. If now Bob says she |
|
155 should emerge from $B$, she is lucky. But if he says she |
|
156 should emerge from $A$ then Alice is in trouble: Bob will find |
|
157 out she does not actually know the secret. So in order to fool |
|
158 Bob she needs to anticipate his call, and already go into the |
|
159 corresponding tunnel. This of course also does not work, since |
|
160 Bob can make any call he wants. Consequently in order to find |
|
161 out whether Alice cheats, Bob just needs to repeat this |
|
162 protocol many times. Each time Alice has a chance of |
|
163 $\frac{1}{2}$ to be lucky or being found out. Iterating this |
|
164 $n$ times means she must be right every time and when |
|
165 cheating: the probability for this is $\frac{1}{2}^n$, number |
|
166 that for already relatively small $n$, say 10, is incredibly |
|
167 small. |
|
168 |
|
169 |
|
170 There are some interesting observations we can make about |
|
171 Alibaba's cave and the ZKP protocol between Alice and Bob: |
|
172 |
|
173 \begin{itemize} |
|
174 \item {\bf Completeness} If Alice knows the secret, Bob |
|
175 accepts Alice ``proof'' for sure. There is no error |
|
176 possible in that Bob thinks Alice cheats when she |
|
177 actually knows the secret. |
|
178 \item {\bf Soundness} If Alice does not know the secret, |
|
179 Bob accepts her ``proof'' with a very small probability. |
|
180 If, as in the example above, the probability of being |
|
181 able to hide cheating is $\frac{1}{2}$ in each round |
|
182 it will be $\frac{1}{2}^n$ after $n$-rounds, which even |
|
183 for small $n$ say $> 10$ is very small indeed. |
|
184 \item {\bf Zero-Knowledge} Even if Bob accepts |
|
185 the proof by Alice, he cannot convince anybody else. |
|
186 \end{itemize} |
|
187 |
|
188 \noindent The last property is the most interesting one. |
|
189 Assume Alice has convinced Bob that she knows the secret and |
|
190 Bob filmed the whole protocol with a camera. Can he use the |
|
191 video to convince anybody else? After a moment of thought, you |
|
192 will agree that this is not the case. Alice and Bob might have |
|
193 just made it all up and colluded by Bob telling Alice |
|
194 beforehand which tunnel he will call. In this way it appears |
|
195 as if all iterations of the protocol were successful, but they |
|
196 prove nothing. If another person wants to find out whether |
|
197 Alice knows the secret, he or she would have to conduct the |
|
198 protocol again. This is actually the formal definition of a |
|
199 zero-knowledge proof: an independent observer cannot |
|
200 distinguish between a real protocol (where Alice knows the |
|
201 secret) and a fake one (where Bob and Alice colluded). |
|
202 |
|
203 |
|
204 \subsubsection*{Using an Graph-Isomorphism Problem for ZKPs} |
|
205 |
|
206 Now the question is how can we translate Alibaba's cave into a |
|
207 computer science solution? It turns out we need an NP problem |
|
208 for that. The main feature of an NP problem is that it is |
|
209 computational very hard to generate a solution, but it is very |
|
210 easy to check whether a given solution indeed solves the |
|
211 problem at hand.\footnote{The question whether $P = NP$ or not, |
|
212 we leave to others to speculate about.} |
|
213 |
|
214 One NP problem is the \emph{graph isomorphism problem}. It |
|
215 essentially determines whether the following two graphs, say |
|
216 $G_1$ and $G_2$, can be moved and stretched so that they look |
|
217 exactly the same. |
|
218 |
|
219 \begin{center} |
|
220 \begin{tabular}{c@{\hspace{8mm}}c@{\hspace{8mm}}c} |
|
221 $G_1$ & $G_2$\\ |
|
222 \raisebox{-13mm}{\includegraphics[scale=0.4]{../pics/simple.png}} & |
|
223 \raisebox{-13mm}{\includegraphics[scale=0.4]{../pics/simple-b.png}}& |
|
224 |
|
225 \footnotesize |
|
226 \begin{tabular}{rl} |
|
227 Graph $G_1$ & Graph $G_2$\\ |
|
228 a & 1\\ |
|
229 b & 6\\ |
|
230 c & 8\\ |
|
231 d & 3\\ |
|
232 g & 5\\ |
|
233 h & 2\\ |
|
234 i & 4\\ |
|
235 j & 7\\ |
|
236 \end{tabular} |
|
237 \end{tabular} |
|
238 \end{center} |
|
239 |
|
240 \noindent The table on the right gives a mapping of the nodes |
|
241 of the first graph to the nodes of the second. With this |
|
242 mapping we can check: node $a$ is connected in $G_1$ with $g$, |
|
243 $h$ and $i$. Node $a$ maps to node $1$ in $G_2$, which is |
|
244 connected to $2$, $4$ and $5$, which again correspond via the |
|
245 mapping to $h$, $i$ and $g$ respectively. Let us write |
|
246 $\sigma$ for such a table and let us write |
|
247 |
|
248 \[G_1 = \sigma(G_2)\] |
|
249 |
|
250 \noindent for two isomorphic graphs ($\sigma$ being the |
|
251 isomorphism). It is actually very easy to construct two |
|
252 isomorphic graphs: Start with an arbitrary graph, re-label the |
|
253 nodes consistently. Alice will need to do this frequently |
|
254 for the protocol below. In order to be useful, we therefore |
|
255 would need to consider substantially larger graphs, like |
|
256 with thousand nodes. |
|
257 |
|
258 Now the secret which Alice tries to hide is the knowledge of |
|
259 such an isomorphism $\sigma$ between two such graphs. But she |
|
260 can convince Bob whether she knows such an isomorphism for two |
|
261 graphs, say $G_1$ and $G_2$, using the same idea as in the |
|
262 example with Alibaba's cave. For this Alice and Bob must |
|
263 follow the following protocol: |
|
264 |
|
265 \begin{enumerate} |
|
266 \item Alice generates an isomorphic graph $H$ which she sends |
|
267 to Bob (in each iteration she needs to generate a |
|
268 different $H$). |
|
269 \item |
|
270 After receiving $H$, Bob asks Alice either for an |
|
271 isomorphism between $G_1$ and $H$, or $G_2$ and $H$. |
|
272 \item Alice and Bob repeat this procedure $n$ times. |
|
273 \end{enumerate} |
|
274 |
|
275 \noindent In Step 1 it is important that Alice always |
|
276 generates a fresh isomorphic graph. I let you think what |
|
277 would happen if Alice sends out twice the same graph $H$. |
|
278 |
|
279 As said before, this is relatively easy to generate by |
|
280 consistently relabelling nodes. If she started from $G_1$, |
|
281 Alice will have generated |
|
282 |
|
283 \begin{equation} |
|
284 H = \sigma'(G_1)\label{hiso} |
|
285 \end{equation} |
|
286 |
|
287 \noindent where $\sigma'$ is the isomorphism between $H$ and |
|
288 $G_1$. But she could equally have started from $G_2$. In the |
|
289 case where $G_1$ and $G_2$ are isomorphic, if $H$ is |
|
290 isomorphic with $G_1$, it will also be isomorphic with |
|
291 $G_2$, and vice versa. |
|
292 |
|
293 After generating $H$, Alice sends it to Bob. The important |
|
294 point is that she needs to ``commit'' to this $H$, therefore |
|
295 this kind of zero-knowledge protocols are called |
|
296 \emph{commitment protocols}. Only after receiving $H$, Bob |
|
297 will make up his mind about which isomorphism he asks |
|
298 for---whether between $H$ and $G_1$ or $H$ and $G_2$. For this |
|
299 he could flip a coin, since the choice should be as |
|
300 unpredictable for Alice as possible. Once Alice receives the |
|
301 request, she has to produce an isomorphism. If she generated |
|
302 $H$ as shown in \eqref{hiso} and is asked for an isomorphism |
|
303 between $H$ and $G_1$, she just sends $\sigma'$. If she had |
|
304 been asked for an isomorphism between $H$ and $G_2$, she just |
|
305 has to compose her secret isomorphism $\sigma$ and $\sigma'$. |
|
306 The main point for the protocol is that even knowing the |
|
307 isomorphism between $H$ and $G_1$ or $H$ and $G_2$, will not |
|
308 make the task easier to find the isomorphism between $G_1$ and |
|
309 $G_2$, which is the secret Alice tries to protect. |
|
310 |
|
311 In order to make it crystal clear how this protocol proceeds, |
|
312 let us give a version using our more formal notation for |
|
313 protocols: |
|
314 |
|
315 \begin{center} |
|
316 \begin{tabular}{lrl} |
|
317 0) & $A \to B:$ & $G_1$ and $G_2$\\ |
|
318 1a) & $A \to B:$ & $H_1$ \\ |
|
319 1b) & $B \to A:$ & produce isomorphism $G_1 \leftrightarrow H_1$? |
|
320 \quad(or $G_2 \leftrightarrow H_1$)\\ |
|
321 1c) & $A \to B:$ & requested isomorphism\\ |
|
322 2a) & $A \to B:$ & $H_2$\\ |
|
323 2b) & $B \to A:$ & produce isomorphism $G_1 \leftrightarrow H_2$? |
|
324 \quad(or $G_2 \leftrightarrow H_2$)\\ |
|
325 2c) & $A \to B:$ & requested isomorphism\\ |
|
326 & \ldots\\ |
|
327 \end{tabular} |
|
328 \end{center} |
|
329 |
|
330 \noindent As can be seen the protocol runs for some |
|
331 agreed number of iterations. The $H_i$ Alice needs to |
|
332 produce, need to be all distinct. I hope you now know |
|
333 why? |
|
334 |
|
335 It is also crucial that in each iteration, Alice first sends |
|
336 $H_i$ and then Bob can decide which isomorphism he wants: |
|
337 either $G_1 \leftrightarrow H_i$ or $G_2 \leftrightarrow H_i$. |
|
338 If somehow Alice can find out before she committed to $H_i$, |
|
339 she can cheat. For this assume Alice does \emph{not} know an |
|
340 isomorphism between $G_1$ and $G_2$. If she knows which |
|
341 isomorphism Bob will ask for she can craft $H$ in such a way |
|
342 that it is isomorphism with either $G_1$ or $G_2$ (but it |
|
343 cannot with both). Then in each case she would send Bob |
|
344 a correct answer and he would come to the conclusion that |
|
345 all is well. I let you also answer the question whether |
|
346 such a protocol run between Alice and Bob would convince |
|
347 a third person, say Pete. |
|
348 |
|
349 Since the interactive nature of the above PKZ protocol and the |
|
350 correct ordering of the messages is so important for the |
|
351 ``correctness'' of the protocol, it might look surprising that |
|
352 the same goal can also me achieved in a completely offline |
|
353 manner. By this I mean Alice can publish all data at once, and |
|
354 then at a later time, Bob can inspect the data and come to the |
|
355 conclusion whether or not Alice knows the secret (again |
|
356 without actually learning about the secret). For this |
|
357 Alice has to do the following: |
|
358 |
|
359 \begin{enumerate} |
|
360 \item Alice generates $n$ isomorphic graphs |
|
361 $H_{1..n}$ (they need to be all distinct) |
|
362 \item she feeds the $H_{1..n}$ into a hashing function |
|
363 (for example encoded as as matrix) |
|
364 \item she takes the first $n$ bits of the output of the hashing |
|
365 function: |
|
366 whenever the output is $0$, she shows an isomorphism |
|
367 with $G_1$; for $1$ she shows an isomorphism |
|
368 with $G_2$ |
|
369 \end{enumerate} |
|
370 |
|
371 \noindent The reason why this works and achieves the same |
|
372 goal as the interactive variant is that Alice has no |
|
373 control over the hashing function. It would be |
|
374 computationally just too hard to assemble a set of |
|
375 $H_{1..n}$ such that she can force where $0$s and $1$s |
|
376 in the hash values are such that it would pass an external |
|
377 test. The point is that Alice can publish all this data |
|
378 on the comfort of her own web-page, for example, and |
|
379 in this way convince everybody who bothers to check. |
|
380 |
|
381 The virtue of the use of graphs and isomorphism for a |
|
382 zero-knowledge protocol is that the idea why it works |
|
383 are relatively straightforward. The disadvantage is |
|
384 that encoding any secret into a graph-isomorphism, while |
|
385 possible, is awkward. The good news is that in fact |
|
386 any NP problem can be used as part of a ZKP protocol. |
|
387 |
|
388 |
|
389 \subsubsection*{Using Modular Logarithms for ZKP Protocols} |
|
390 |
|
391 While information can be encoded into graph isomorphisms, it |
|
392 is not the most convenient carrier of information. Clearly it |
|
393 is much easier to encode information into numbers. Let us look |
|
394 at zero-knowledge proofs that use numbers as secrets. For this |
|
395 the underlying NP-problem is to calculate discrete logarithms. |
|
396 It can be used by choosing public numbers $A$, $B$, $p$, and |
|
397 private $x$ such that |
|
398 |
|
399 \begin{center} |
|
400 $A^x \equiv B\; mod\; p$ |
|
401 \end{center} |
|
402 |
|
403 \noindent holds. The secret Alice tries to keep secret is $x$. |
|
404 The point of the modular logarithm is that it is very hard |
|
405 from the public data to calculate $x$ (for large primes). |
|
406 Now the protocol proceeds in three stages: |
|
407 |
|
408 \begin{itemize} |
|
409 \item {\bf Commitment Stage} |
|
410 \begin{enumerate} |
|
411 \item Alice generates $z$ random numbers $r_1, \ldots, r_z$, |
|
412 all less than $p - 1$. Alice then sends Bob for all |
|
413 $i = 1,\ldots, z$: |
|
414 \[ h_i = A^{r_i}\; mod\; p\] |
|
415 \item Bob generates $z$ random bits, say $b_1,\ldots, b_z$. He can do this |
|
416 by flipping $z$ times a coin, for example. |
|
417 \item For each bit $b_i$, Alice sends Bob an $s_i$ where |
|
418 |
|
419 \begin{center} |
|
420 \begin{tabular}{ll} |
|
421 if $b_i = 0$: & $s_i = r_i$\\ |
|
422 if $b_i = 1$: & $s_i = (r_i - r_j) \;mod\; (p -1)$\\ |
|
423 \end{tabular} |
|
424 \end{center} |
|
425 |
|
426 where $r_j$ is the lowest $j$ where $b_j = 1$. |
|
427 |
|
428 \end{enumerate} |
|
429 \end{itemize} |
|
430 |
|
431 \noindent For understanding the last step, let $z$ be just 4. |
|
432 We have four random values $r_i$ chosen by Alice and four |
|
433 random bits $b_i$ chosen subsequently by Bob, for example |
|
434 |
|
435 \begin{center} |
|
436 \begin{tabular}{lcccc} |
|
437 $r_i$:\; & 4 & 9 & 1 & 3\\ |
|
438 $b_i$:\; & 0 & 1 & 0 & 1\\ |
|
439 & & $\uparrow$ \\ |
|
440 & & $j$ |
|
441 \end{tabular} |
|
442 \end{center} |
|
443 |
|
444 \noindent The highlighted column is the lowest where $b_i = |
|
445 1$ (counted from the left). That means $r_j = 9$. The reason |
|
446 for letting Alice choose the random numbers $r_1, \ldots, r_z$ |
|
447 will become clear shortly. Next is the confirmation |
|
448 phase where Bob essentially checks whether Alice has sent |
|
449 him ``correct'' $s_i$ and $h_i$. |
|
450 |
|
451 \begin{itemize} |
|
452 \item {\bf Confirmation Stage} |
|
453 \begin{enumerate} |
|
454 \item For each $b_i$ Bob checks whether $s_i$ |
|
455 conform to the protocol |
|
456 |
|
457 \begin{center} |
|
458 \begin{tabular}{ll} |
|
459 if $b_i = 0$: & $A^{s_i} \stackrel{?}{\equiv} h_i\;mod\;p$\\ |
|
460 if $b_i = 1$: & $A^{s_i} \stackrel{?}{\equiv} h_i * h_j^{-1} \;mod\; p$\\ |
|
461 \end{tabular} |
|
462 \end{center} |
|
463 \end{enumerate} |
|
464 \end{itemize} |
|
465 |
|
466 \noindent To understand the case for $b_i = 1$, you have |
|
467 to do the following calculation: |
|
468 |
|
469 \begin{center} |
|
470 \begin{tabular}{r@{\hspace{1mm}}c@{\hspace{1mm}}l} |
|
471 $A^{s_i}$ & $=$ & $A^{r_i - r_j}$\\ |
|
472 & $=$ & $A^{r_i} * A^{-r_j}$\\ |
|
473 & $=$ & $h_{r_i} * h_{r_j}^{-1}\;mod\;p$ |
|
474 \end{tabular} |
|
475 \end{center} |
|
476 |
|
477 \noindent What is interesting that so far nothing has been |
|
478 sent about $x$, which is the secret Alice has. Also notice |
|
479 that Bob does not know $r_j$. He received |
|
480 |
|
481 \begin{center} |
|
482 $r_j - r_j$, $r_m - r_j$, \ldots, $r_p - r_j \;mod \;p - 1$ |
|
483 \end{center} |
|
484 |
|
485 \noindent whenever his corresponding bits were $1$. So Bob |
|
486 does not know $r_j$ and also does not know any $r_i$ where the |
|
487 bit was $1$. Information about the $x$ is sent in the next |
|
488 stage (obviously not revealing $x$). |
|
489 |
|
490 \begin{itemize} |
|
491 \item {\bf Proving Stage} |
|
492 |
|
493 \begin{enumerate} |
|
494 \item Alice proves she knows $x$, the discrete log of $B$, |
|
495 by sending |
|
496 |
|
497 \begin{center} |
|
498 $s_{z+1} = x - r_j\;mod\;p-1$ |
|
499 \end{center} |
|
500 |
|
501 \item Bob confirms |
|
502 |
|
503 \begin{center} |
|
504 $A^{s_{z+1}} \stackrel{?}{\equiv} B * h_j^{-1} \;mod \; p$ |
|
505 \end{center} |
|
506 \end{enumerate} |
|
507 \end{itemize} |
|
508 |
|
509 \noindent To understand the last step, you have to do the trick |
|
510 again that |
|
511 |
|
512 \[A^{s_{z+1}} = A^{x-r_j} = \ldots |
|
513 \] |
|
514 |
|
515 \noindent |
|
516 which I leave to you. |
|
517 |
|
518 Now the question is how can Alice cheat? In order to cheat she |
|
519 has to coordinate what she sends as $h_i$ in step 1 and $s_i$ |
|
520 in step 3 of the commitment stage, and also what to send as |
|
521 $s_{z+1}$ in the proving stage. For the latter of course |
|
522 Alice does not know $x$, so she just chooses some random |
|
523 number for $s_{z+1}$ and calculates |
|
524 |
|
525 \[A^{s_{z+1}}\] |
|
526 |
|
527 \noindent |
|
528 and then solves the equation |
|
529 |
|
530 \[A^{s_{z+1}} \equiv B * y \;mod\;p\] |
|
531 |
|
532 \noindent for $y$. This is easy since no logarithm needs to be |
|
533 computed. If Alice can guess the $j$ where the first 1 will |
|
534 appear in Bob's bit vector, then she sends the inverse of $y$ |
|
535 as $h_j$ and 0 as $s_j$. However, notice that when she |
|
536 calculates a solution for $y$ she does not know $r_j$. For this she |
|
537 would need to calculate the modular logarithm |
|
538 |
|
539 \[y \equiv A^{r_j}\;mod\;p\] |
|
540 |
|
541 \noindent which is hard (see step 1 in the commitment stage). |
|
542 |
|
543 Having settled on what $h_j$ should be, now what should Alice |
|
544 send as the other $h_i$ and other $s_i$? If the $b_i$ happens |
|
545 to be a 1, then the $h_i$ and other $s_i$ need to satisfy the |
|
546 test |
|
547 |
|
548 \[A^{s_i} \stackrel{?}{\equiv} h_i * h_j^{-1} \;mod\; p\] |
|
549 |
|
550 \noindent where she has already settled on the value of |
|
551 $h_j^{-1}$. Lets say she choses $s_i$ at random, then she just |
|
552 needs to solve |
|
553 |
|
554 \[A^{s_i} \equiv z * h_j^{-1} \;mod\; p\] |
|
555 |
|
556 \noindent for $z$. Again that is easy, but it does not allow |
|
557 us to know $r_i$, because then we would again need to solve |
|
558 a modular logarithm problem. Let us call an $h_i$ which was |
|
559 solved the easy way as \emph{bogus}. Alice has to produce |
|
560 bogus $h_i$ for all bits that are going to be $1$ in advance! |
|
561 This means she has to guess all the bits correctly. (Yes? |
|
562 I let you think about this.) |
|
563 |
|
564 Let us see what happens if she guesses wrongly: Suppose the |
|
565 bit $b_i = 1$ where she thought she will get a 0. Then she has |
|
566 already sent an $h_i$ and $h_j$ and now must find an $s_i$ |
|
567 such that |
|
568 |
|
569 \[A^{s_i} \equiv h_i * h_j^{-1} \;mod\; p\] |
|
570 |
|
571 \noindent holds. For this remember in calculating $h_i$, she |
|
572 just chose a random $s_i$. Now she has to send a genuine one. |
|
573 But this is of course too hard. If she knew the genuine $r_i$ |
|
574 and $r_j$ for $h_i$ and $h_j$, it would be easy (in this case |
|
575 $s_i = r_i - r_j$). But she does not. So she will be found |
|
576 out. If $b_i = 0$, but she thought she will get a 1, then |
|
577 she has to send a $s_i$ which satisfies |
|
578 |
|
579 \[A^{s_i} \equiv h_i\;mod\;p\] |
|
580 |
|
581 \noindent Again she does not know $r_i$. So it is a too hard |
|
582 task and she will be found out again. |
|
583 |
|
584 To sum up, in order for Alice to successfully cheat Bob, she |
|
585 needs to guess \emph{all} bits correctly. She has only a |
|
586 $\frac{1}{2^z}$ chance of doing this. |
|
587 |
|
588 \subsubsection*{Further Reading} |
|
589 |
|
590 Make sure you understand what NP problems |
|
591 are.\footnote{\url{http://en.wikipedia.org/wiki/NP_(complexity)}} |
|
592 They are the building blocks for zero-knowledge proofs. |
|
593 Zero-Knowldege proofs are not yet widely used in production |
|
594 systems, but it is slowly gaining ground. One area of application |
|
595 where they pop up is crypto currencies (for example Zerocoins |
|
596 or how to make sure a Bitcoin exchange is solvent without |
|
597 revealing its assets). |
|
598 |
|
599 If you want to brush up on the modular logarithm problem, |
|
600 the Khan Academy has a nice video: |
|
601 |
|
602 \begin{center} |
|
603 \url{https://www.khanacademy.org/video/discrete-logarithm-problem} |
|
604 \end{center} |
|
605 |
552 |
606 \end{document} |
553 \end{document} |
607 |
554 |
608 http://blog.cryptographyengineering.com/2014/11/zero-knowledge-proofs-illustrated-primer.html |
555 http://randomwalker.info/teaching/fall-2012-privacy-technologies/? |
609 |
556 http://chronicle.com/article/Why-Privacy-Matters-Even-if/127461/ |
610 http://btravers.weebly.com/uploads/6/7/2/9/6729909/zero_knowledge_technique.pdf |
557 http://repository.cmu.edu/cgi/viewcontent.cgi?article=1077&context=hcii |
611 http://zk-ssh.cms.ac/docs/Zero_Knowledge_Prinzipien.pdf |
558 https://josephhall.org/papers/NYU-MCC-1303-S2012_privacy_syllabus.pdf |
612 http://www.wisdom.weizmann.ac.il/~oded/PS/zk-tut02v4.ps |
559 http://www.jetlaw.org/wp-content/uploads/2014/06/Bambauer_Final.pdf |
613 |
560 http://www.cs.cmu.edu/~yuxiangw/docs/Differential%20Privacy.pdf |
614 socialist millionares problem |
561 https://www.youtube.com/watch?v=Gx13lgEudtU |
615 http://en.wikipedia.org/wiki/Socialist_millionaire |
562 https://www.cs.purdue.edu/homes/ctask/pdfs/CERIAS_Presentation.pdf |
616 http://twistedoakstudios.com/blog/Post3724_explain-it-like-im-five-the-socialist-millionaire-problem-and-secure-multi-party-computation |
563 http://www.futureofprivacy.org/wp-content/uploads/Differential-Privacy-as-a-Response-to-the-Reidentification-Threat-Klinefelter-and-Chin.pdf |
617 |
564 http://www.cis.upenn.edu/~aaroth/courses/slides/Overview.pdf |
|
565 http://www.cl.cam.ac.uk/~sjm217/papers/tor14design.pdf |
618 |
566 |
619 %%% Local Variables: |
567 %%% Local Variables: |
620 %%% mode: latex |
568 %%% mode: latex |
621 %%% TeX-master: t |
569 %%% TeX-master: t |
622 %%% End: |
570 %%% End: |