public, for example horse owners, about the impending
novelty---a car. In my humble opinion, we are at the same
stage of development with privacy. Nobody really knows what it
is about or what it is good for. All seems very hazy. The
result is that the world of ``privacy'' looks a little bit
like the old Wild West. Anything seems to go.

For example, UCAS, a charity set up to help students apply to
universities, has a commercial unit that happily sells your
email addresses to anybody who forks out enough money in order
to bombard you with spam. Yes, very often you can opt out of
such ``schemes'', but in the case of UCAS any opt-out will
also block legitimate emails you might actually be interested
in.\footnote{The main objectionable point, in my opinion, is
that the \emph{charity} everybody has to use for HE
applications has actually very honourable goals (e.g.~assist
applicants in gaining access to universities), but their
small print (or rather the link ``About us'') reveals that
they have set up their organisation so that they can also
shamelessly sell the email addresses they ``harvest''.
Everything is of course very legal\ldots{}moral?\ldots{}well,
that is in the eye of the beholder. See:

\url{http://www.ucas.com/about-us/inside-ucas/advertising-opportunities}
or
\url{http://www.theguardian.com/uk-news/2014/mar/12/ucas-sells-marketing-access-student-data-advertisers}}

Another example: Verizon, an ISP that provides you with
connectivity, has found a ``nice'' side-business too: even if
you have enabled all the privacy guards in your browser (the
few you have at your disposal), Verizon happily adds a kind of
cookie to your
HTTP-requests.\footnote{\url{http://webpolicy.org/2014/10/24/how-verizons-advertising-header-works/}}
As shown in the picture below, this cookie will be sent to
every web-site you visit. The web-sites can then forward the
cookie to advertisers who in turn pay Verizon to tell them
everything they want to know about the person who just made
the request.
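
To make the mechanism a bit more concrete, here is a minimal
sketch (in Python) of how a web server can read such an
ISP-injected identifier. This is of course not Verizon's or
any advertiser's actual code; apart from the header name
\texttt{X-UIDH}, which is the one reported for Verizon,
everything below is made up for illustration.

\begin{verbatim}
# Toy web server that looks for an ISP-injected tracking header.
# Only the header name X-UIDH is taken from the reports about
# Verizon; the rest is invented for illustration.
from http.server import BaseHTTPRequestHandler, HTTPServer

class TrackingAwareHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        # The browser never set this header; the ISP spliced it
        # into the request in transit, so it arrives like any
        # other header and no browser privacy guard removes it.
        uid = self.headers.get("X-UIDH")
        if uid is not None:
            # A web-site could now forward this identifier to an
            # advertiser, who in turn pays Verizon to look it up.
            print("tracking id injected by the ISP:", uid)
        self.send_response(200)
        self.end_headers()
        self.wfile.write(b"Hello\n")

HTTPServer(("", 8000), TrackingAwareHandler).serve_forever()
\end{verbatim}
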
not like to disclose that I am pregnant, if I were
a woman, or that I am a father. Similarly, I might not
like to disclose my location data, because thieves might
break into my house if they know I am away at work.
Privacy is essentially everything which `shouldn't be
anybody's business'.

\end{itemize}

\noindent While this might provide us with some rough
definitions, the problem with privacy is that there is an
extremely fine line between what should stay private and what
should not. For example, since I am working in academia, I am
quite happy to be a digital exhibitionist: I readily disclose
all `trivia' related to my work on my personal web-page. This
is a kind of bragging that is normal in academia (at least in
the field of CS), even expected if you look for a job. I am
even happy that Google maintains a profile about all my
academic papers and their citations.

On the other hand, I would be very irritated if anybody I do
not know took too close a look at my private life---it
shouldn't be anybody's business. The reason is that knowledge
about my private life is usually used against me. As mentioned
above, public location data might mean I get robbed. If
supermarkets build a profile of my shopping habits, they will
use it to \emph{their} advantage---surely not to \emph{my}
advantage. Also, whatever might be collected about my life
will always be an incomplete, or even misleading, picture---for
example, I am sure my creditworthiness score was temporarily(?)
destroyed by not having a regular income in this country
(before coming to King's I worked in Munich for five years).
To correct such incomplete or flawed credit history data there
is now a law that allows you to check what information is held
about you for determining your creditworthiness. But this
concerns only a very small part of the data that is held about
me/you.

To cut a long story short, I let you ponder the two statements
that are often voiced in discussions about privacy:

\begin{itemize}
\item \textit{``You have zero privacy anyway. Get over it.''}\\
\mbox{}\hfill{}{\small{}by Scott McNealy (CEO of Sun)}

\item \textit{``If you have nothing to hide, you have nothing
to fear.''}
\end{itemize}

\noindent An article that attempts a deeper analysis of the
nothing-to-hide statement appeared in 2011 in the Chronicle of
Higher Education:

\begin{center}
\url{http://chronicle.com/article/Why-Privacy-Matters-Even-if/127461/}
\end{center}

\noindent Funnily, or maybe not so funnily, the author of this
article carefully tries to construct an argument that not only
attacks the nothing-to-hide statement in cases where
governments \& Co collect people's deepest secrets, or
pictures of people's naked bodies, but that also applies in
cases where governments ``only'' collect data relevant to,
say, preventing terrorism. The fun is, of course, that in 2011
we just could not imagine that respected governments would do
such infantile things as intercepting people's nude photos.
Well, since Snowden we know that some people at the NSA did,
and then shared such photos among colleagues as a ``fringe
benefit''.


\subsubsection*{Re-Identification Attacks}

Apart from philosophical arguments, there are fortunately also
some real technical problems with privacy implications. The
problem I want to focus on in this handout is how to safely
disclose datasets containing potentially private data, say
health data. What can go wrong with such disclosures can be
illustrated with four examples:

\begin{itemize}
\item In 2006, a then young company called Netflix offered a
one million dollar prize to anybody who could improve their
movie rating algorithm. For this they disclosed a dataset
containing 10\% of all Netflix users at the time
(approx.~500K). They removed names, but included numerical
ratings of movies as well as times of ratings, though some
information was perturbed (i.e., slightly modified).

Two researchers had a closer look at this anonymised data and
compared it with public data available from the Internet
Movie Database (IMDb). They found that 98\% of the entries
could be re-identified in the Netflix dataset: either by
their ratings or by the dates the ratings were uploaded. The
result was a class-action suit against Netflix, which was
only recently resolved and involved a lot of money.

\item In the 1990s, medical datasets were often made public for
research purposes. This was done in anonymised form with names
removed, but birth dates, gender and ZIP-code were retained.
In one case where such data was made public about state
employees in Massachusetts, the then governor assured the
public that the released dataset protected patient privacy by
deleting identifiers. A graduate student could not resist and
cross-referenced public voter data with the data about birth
dates, gender and ZIP-code (a small sketch of this kind of
linkage attack is given after this list). The result was that
she could send the governor his own hospital record.

\item In 2006, AOL published 20 million Web search queries
collected from 650,000 users (names had been deleted). This
was again for research purposes. However, within days an old
lady, Thelma Arnold, from Lilburn, Georgia (11,596
inhabitants), was identified as user No.~4417749 in this
dataset. It turned out that search engine queries are windows
into people's private lives.

\item Genome-Wide Association Studies (GWAS) was a public
database of gene-frequency studies linked to diseases. It
turned out that only partial DNA information was needed in
order to identify whether an individual was part of the study.
The database was closed in 2008.

\end{itemize}
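
\noindent To illustrate how simple such linkage attacks are in
principle, here is a small sketch in Python. All records and
field names are invented for illustration, but the idea is the
one from the Massachusetts example: join an ``anonymised''
dataset with a public one on the quasi-identifiers birth date,
gender and ZIP-code.

\begin{verbatim}
# Sketch of a linkage (re-identification) attack on invented data.
# The "anonymised" medical records carry no names, but keep the
# quasi-identifiers birth date, gender and ZIP-code.
medical = [
    {"birth": "1945-07-31", "gender": "M", "zip": "02138",
     "diagnosis": "hypertension"},
    {"birth": "1962-01-15", "gender": "F", "zip": "02139",
     "diagnosis": "asthma"},
]

# Public voter roll: names together with the same three fields.
voters = [
    {"name": "A. Smith", "birth": "1945-07-31", "gender": "M", "zip": "02138"},
    {"name": "B. Jones", "birth": "1980-03-02", "gender": "F", "zip": "02139"},
]

# Join the two datasets on the quasi-identifiers: whenever the
# combination (birth, gender, zip) is unique, the "anonymous"
# medical record is re-identified.
def link(medical, voters):
    for m in medical:
        matches = [v for v in voters
                   if (v["birth"], v["gender"], v["zip"]) ==
                      (m["birth"], m["gender"], m["zip"])]
        if len(matches) == 1:
            print(matches[0]["name"], "->", m["diagnosis"])

link(medical, voters)
# prints: A. Smith -> hypertension
\end{verbatim}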


\end{document}