were such a novelty that most people did not know what to make
of them. The person with the red flag was intended to warn the
public, for example horse owners, about the impending
novelty---a car. In my humble opinion, we are at the same
stage of development with privacy. Nobody really knows what it
is about or what it is good for. All seems very hazy. There
are a few laws (cookie law, right-to-be-forgotten) which
address problems with privacy, but even if they are
well-intentioned, they either backfire or are already obsolete
because of newer technologies. The result is that the world of
``privacy'' looks a little bit like the old Wild West.

For example, UCAS, a charity set up to help students to apply
to universities, has a commercial unit that happily sells your
email address to anybody who forks out enough money in order
to be able to bombard you with spam. Yes, you can opt out very

\url{http://www.ucas.com/about-us/inside-ucas/advertising-opportunities}
or
\url{http://www.theguardian.com/uk-news/2014/mar/12/ucas-sells-marketing-access-student-data-advertisers}}

Another example: Verizon, an ISP that is supposed to provide
you just with connectivity, has found a ``nice'' side-business
too: When you have enabled all privacy guards in your browser
(the few you have at your disposal), Verizon happily adds a
kind of cookie to your
HTTP-requests.\footnote{\url{http://webpolicy.org/2014/10/24/how-verizons-advertising-header-works/}}
As shown in the picture below, this cookie will be sent to
every web-site you visit. The web-sites can then forward the
cookie to advertisers who in turn pay Verizon to tell them
everything they want to know about the person who just made
this request, that is, you.

\begin{center}
\includegraphics[scale=0.19]{../pics/verizon.png}
\end{center}
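
\noindent To make the mechanism concrete, here is a minimal
sketch in Python of what any web-site operator could do to
pick up the injected identifier. The header name
\texttt{X-UIDH} is the one reported in the analysis cited in
the footnote above; everything else in the sketch is
hypothetical.

{\small\begin{verbatim}
# minimal web server reading Verizon's injected tracking header
# (sketch; the header name X-UIDH is from the cited analysis)
from http.server import BaseHTTPRequestHandler, HTTPServer

class TrackingAwareHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        # Verizon inserts this header into the plain-text HTTP
        # request in transit; the browser never even sees it
        uid = self.headers.get('X-UIDH')
        if uid is not None:
            print('tracking id of this visitor:', uid)
        self.send_response(200)
        self.end_headers()
        self.wfile.write(b'hello')

HTTPServer(('', 8000), TrackingAwareHandler).serve_forever()
\end{verbatim}}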

\noindent How disgusting! Even worse, Verizon is not known for
being the cheapest ISP on the planet (completely the
contrary), and also not known for providing the fastest

is, as of recently, a law that allows you to check what
information is held about you for determining your
creditworthiness. But this concerns only a very small part of
the data that is held about me/you.

To see how private matters can really lead to the wrong
conclusions, take the example of Stephen Hawking: When he was
diagnosed with his disease, he was given a life expectancy of
two years. If employers had known about such problems, would
they have employed Hawking? Now he is enjoying his 70+
birthday. Clearly personal medical data needs to stay private.
A movie which has this topic as its main focus is Gattaca from
1997.\footnote{\url{http://www.imdb.com/title/tt0119177/}}

To cut a long story short, I let you ponder about the two
statements that are often voiced in discussions about privacy:

\begin{itemize}
some information was perturbed (i.e., slightly
modified).

Two researchers had a closer look at this anonymised
data and compared it with public data available from the
Internet Movie Database (IMDb). They found that
98\% of the entries could be re-identified in the
Netflix dataset: either by their ratings or by the dates
the ratings were uploaded. The result was a class-action
suit against Netflix, which was only recently resolved
involving a lot of money.

\item In the 1990s, medical datasets were often made public
for research purposes. This was done in anonymised form
with names removed, but birth dates, gender and ZIP-code
were retained. In one case where such data about
hospital visits of state employees in Massachusetts was
made public, the then governor assured the public that
the released dataset protected patient privacy by
deleting identifiers.

A graduate student could not resist cross-referencing
public voter data with the released data including birth
dates, gender and ZIP-code (a sketch of such a linkage
attack is shown after this list). The result was that she
could send the governor his own hospital record. It
turns out that birth dates, gender and ZIP-code uniquely
identify 87\% of people in the US. This work resulted
in a number of laws prescribing which private data
cannot be released in such datasets.

\item In 2006, AOL published 20 million Web search queries
collected from 650,000 users (names had been deleted).
This was again done for research purposes. However,
within days an old lady, Thelma Arnold, from Lilburn,
Georgia (11,596 inhabitants), was identified as user
No.~4417749 in this dataset. It turned out that search
engine queries are deep windows into people's private
lives.

\item Genome-Wide Association Studies (GWAS) was a public
database of gene-frequency studies linked to diseases.
It would essentially record that people who have a
disease, say diabetes, also have these genes. In order
to maintain privacy, the dataset would only include
aggregate information. In the case of DNA data this was
achieved by mixing the DNA of many individuals (having
a disease) into a single solution. Then this mixture
was sequenced and included in the dataset. The idea
was that the aggregate information would still be helpful
to researchers, but would protect the DNA data of
individuals.

In 2007 a forensic computer scientist showed that
individuals could still be identified. For this he used
the DNA data from a comparison group (people from the
general public) and ``subtracted'' this data from the
published data. He was left with data that included
all ``special'' DNA-markers of the individuals
present in the original mixture. He essentially deleted
the ``background noise''. Now the problem with
DNA data is that it is of such a high resolution that
even if the mixture contained maybe 100 individuals,
you can detect whether an individual was included
in the mixture or not (a small simulation of this
``subtraction'' idea is sketched after this list).

This result completely changed how DNA data is nowadays
published for research purposes. After the success of
the human-genome project, with its very open culture of
exchanging data, it became much more difficult to
anonymise data such that patients' privacy is preserved.
The public GWAS database was taken offline in 2008.

\end{itemize}
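
\noindent To see how little technology the Massachusetts-style
linkage attack needs, here is a small sketch in Python using
the pandas library. All records below are invented; the point
is only that a plain database join on the quasi-identifiers
birth date, gender and ZIP-code is all it takes (the same
idea, with ratings and dates instead, works for the Netflix
data).

{\small\begin{verbatim}
# hypothetical sketch of a linkage attack: join an "anonymised"
# medical dataset with public voter rolls on the quasi-identifiers
# birth date, gender and ZIP-code (all records are invented)
import pandas as pd

medical = pd.DataFrame([
    {'birth': '1945-07-31', 'gender': 'M', 'zip': '02138',
     'diagnosis': 'hypertension'},
    {'birth': '1962-01-15', 'gender': 'F', 'zip': '02139',
     'diagnosis': 'diabetes'},
])
voters = pd.DataFrame([
    {'name': 'W. Weld', 'birth': '1945-07-31',
     'gender': 'M', 'zip': '02138'},
])

# if (birth, gender, zip) is unique, as it is for 87% of the US
# population, the join re-identifies the medical record
print(medical.merge(voters, on=['birth', 'gender', 'zip']))
\end{verbatim}}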
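
\noindent The ``subtraction'' idea behind the GWAS attack can
also be illustrated with a small simulation, sketched below in
Python with numpy. The data is simulated and the scoring
function is a much simplified variant of the published test
statistic, but it shows the effect: the markers of somebody who
is part of the mixture sit measurably closer to the mixture's
aggregate frequencies than to the background frequencies of
the general population.

{\small\begin{verbatim}
# simplified simulation of the GWAS re-identification idea:
# compare a person's markers against the published mixture
# frequencies after accounting for the population background
import numpy as np

rng = np.random.default_rng(0)
n_snps, n_mix = 10_000, 100

pop_freq = rng.uniform(0.05, 0.95, n_snps)      # general population
mixture  = rng.binomial(2, pop_freq, (n_mix, n_snps)) / 2.0
target   = mixture[0]                           # is in the mixture
outsider = rng.binomial(2, pop_freq, n_snps) / 2.0

mix_freq = mixture.mean(axis=0)                 # published aggregate

def presence_score(person):
    # positive when the person's markers are closer to the
    # mixture than to the population background
    return np.sum(np.abs(person - pop_freq)
                  - np.abs(person - mix_freq))

print(presence_score(target))    # clearly positive
print(presence_score(outsider))  # near zero
\end{verbatim}}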

\noindent There are many lessons that can be learned from
these examples. One is that when making data public in
anonymised form you want to achieve \emph{forward privacy}.
This means, no matter what other data is available
or will be released later, the data does not compromise
an individual's privacy. This principle was violated by the
data in the Netflix and governor of Massachusetts cases: there,
additional data allowed one to re-identify individuals in the
dataset. In the case of GWAS a new technique of re-identification
compromised the privacy of people on the list.
The case of the AOL dataset shows clearly how incomplete such
data can be: Although the queries uniquely identified the
old lady, she also looked up diseases that her friends had,
which had nothing to do with her. Any rational analysis of her
query data must have concluded that the lady was on her deathbed,
while she was actually very much alive and kicking.

\subsubsection*{Differential Privacy}

Differential privacy is one of the few methods that try to
achieve forward privacy with large datasets. The basic idea
is to add appropriate noise, or errors, to any query of the
dataset. The intention is to make the result of a query
insensitive to individual entries in the database. The hope is
that the added error does not eliminate the ``signal'' one is
looking for by querying the dataset.

\begin{center}
User\;\;\;\;
\begin{tabular}{c}
tell me $f(x)$ $\Rightarrow$\\
$\Leftarrow$ $f(x) + \text{noise}$
\end{tabular}
\;\;\;\;\begin{tabular}{@{}c}
Database\\
$x_1, \ldots, x_n$
\end{tabular}
\end{center}

\end{document}

http://randomwalker.info/teaching/fall-2012-privacy-technologies/?