\noindent How disgusting? Even worse, Verizon is not known for
being the cheapest ISP on the planet (completely the
contrary), and also not known for providing the fastest
possible speeds, but rather for being among the few ISPs in
the US with a quasi-monopolistic ``market distribution''.

Well, we could go on and on\ldots{}and we have not even
started yet with all the naughty things the NSA \& Friends
are up to. Why does privacy matter? Nobody, I think, has a
conclusive answer to this question yet. Maybe the following
four notions help with clarifying the overall picture
somewhat:

\begin{itemize}
\item \textbf{Secrecy} is the mechanism used to limit the
number of principals with access to information (e.g.,
cryptography or access controls). For example I better
\end{itemize}

\noindent While this might provide us with some rough
definitions, the problem with privacy is that it is an
extremely fine line between what should stay private and what
should not. For example, since I am working in academia, I am
every so often very happy to be a digital exhibitionist: I
happily disclose all `trivia' related to my work on my
personal web-page. This is a kind of bragging that is normal
in academia (at least in the field of CS), even expected if
you look for a job. I am even happy that Google maintains a
profile about all my academic papers and their citations.

On the other hand I would be very irritated if anybody I do
not know had too close a look at my private life---it
shouldn't be anybody's business. The reason is that knowledge
is, since recently, a law that allows you to check what
information is held about you for determining your
creditworthiness. But this concerns only a very small part of
the data that is held about me/you.

Take the example of Stephen Hawking: when he was diagnosed
with his disease, he was given a life expectancy of two
years. If an employer had known about such problems, would
they have employed Hawking? He is now well past his 70th
birthday. Clearly, personal medical data needs to stay
private.

To cut a long story short, I leave you to ponder over the two
statements that are often voiced in discussions about privacy:

\begin{itemize}
\item \textit{``You have zero privacy anyway. Get over it.''}\\
article carefully tries to construct an argument that not
only attacks the nothing-to-hide statement in cases where
governments \& Co collect people's deepest secrets, or
pictures of people's naked bodies, but an argument that also
applies in cases where governments ``only'' collect data
relevant to, say, preventing terrorism. The funny thing is,
of course, that in 2011 we could simply not imagine that
respected governments would do such infantile things as
intercepting people's nude photos. Well, since Snowden we
know some people at the NSA did exactly that and then shared
such photos among colleagues as a ``fringe benefit''.

\subsubsection*{Re-Identification Attacks}

Apart from philosophical musings, there are fortunately also
some real technical problems with privacy. The problem I want
to focus on in this handout is how to safely disclose datasets
containing potentially private data, say health data. What can
go wrong with such disclosures can be illustrated with four
well-known examples:

\begin{itemize}
\item In 2006, a then young company called Netflix offered a
\$1 million prize to anybody who could improve their movie
rating algorithm. For this they disclosed a dataset of movie
ratings with names removed. It turned out, however, that
users could be re-identified in this dataset: either by their
ratings or by the dates the ratings were uploaded. The result
was a class-action suit against Netflix, which was only
recently resolved, involving a lot of money.

\item In the 1990s, medical datasets were often made public
for research purposes. This was done in anonymised form
with names removed, but birth dates, gender and ZIP-code
retained. In one case where such data about hospital
visits of state employees in Massachusetts was made
public, the then governor assured the public that the
released dataset protected patient privacy by deleting
identifiers. A graduate student could not resist
cross-referencing public voter data with the released
data including birth dates, gender and ZIP-code. The
result was that she could send the governor his own
hospital record. It turns out that birth dates, gender
and ZIP-code uniquely identify 87\% of people in the US
(a small sketch of such a linkage attack is given after
this list).

\item In 2006, AOL published 20 million Web search queries
collected from 650,000 users (names had been deleted).
This was again done for research purposes. However,
within days an old lady, Thelma Arnold, from Lilburn,
Georgia (11,596 inhabitants), was identified as user
No.~4417749 in this dataset. It turned out that search
engine queries are deep windows into people's private
lives.

\item Genome-Wide Association Studies (GWAS) was a public
database of gene-frequency studies linked to diseases. It
turned out that you only needed partial DNA information
in order to identify whether an individual was part of
the study. The database was closed in 2008.
\end{itemize}
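
\noindent To make the cross-referencing in the Massachusetts
example concrete, here is a minimal sketch of such a linkage
attack in Python. All records below are fabricated for
illustration; a real attack works in exactly the same way,
just with an entire voter roll and an entire released medical
dataset.

\begin{verbatim}
# Minimal sketch of a linkage attack (all data fabricated):
# join an "anonymised" medical release with a public voter
# roll on the quasi-identifiers birth date, gender, ZIP-code.

medical = [   # released without names
  {"birth": "1945-07-31", "gender": "M", "zip": "02138",
   "diagnosis": "heart condition"},
  {"birth": "1962-01-12", "gender": "F", "zip": "02139",
   "diagnosis": "flu"},
]

voters = [    # publicly available, with names
  {"name": "John Smith", "birth": "1945-07-31",
   "gender": "M", "zip": "02138"},
  {"name": "Jane Doe", "birth": "1962-01-12",
   "gender": "F", "zip": "02139"},
]

# the three attributes that identify 87% of people in the US
def quasi_id(r):
    return (r["birth"], r["gender"], r["zip"])

# index the voter roll, then re-identify the medical records
names = {quasi_id(v): v["name"] for v in voters}
for rec in medical:
    name = names.get(quasi_id(rec))
    if name:
        print(name, "->", rec["diagnosis"])
\end{verbatim}

\noindent Running this prints each name next to the
supposedly private diagnosis whenever a medical record
matches a voter on these three attributes. Note that no
explicit identifier had to be leaked; the quasi-identifiers
alone do the damage.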