# HG changeset patch
# User Christian Urban
# Date 1415929066 0
# Node ID 591b62e1f86a669de010ceb80a7fd0226903bbb4
# Parent b1ba3d88696ed5e0092543ffd96b8507574b6d05
updated

diff -r b1ba3d88696e -r 591b62e1f86a handouts/ho07.pdf
Binary file handouts/ho07.pdf has changed
diff -r b1ba3d88696e -r 591b62e1f86a handouts/ho07.tex
--- a/handouts/ho07.tex Fri Nov 14 01:17:38 2014 +0000
+++ b/handouts/ho07.tex Fri Nov 14 01:37:46 2014 +0000
@@ -58,13 +58,13 @@
 contrary), and also not known for providing the fastest
 possible speeds, but rather for being among the few ISPs in
 the US with a quasi-monopolistic ``market distribution''.
+
+
 Well, we could go on and on\ldots{}and that has not even
 started us yet with all the naughty things NSA \& Friends are
-up to.
-
-Why does privacy matter? Nobody, I think, has a conclusive
-answer to this question. Maybe the following four notions
-help with clarifying the overall picture somewhat:
+up to. Why does privacy matter? Nobody, I think, has a
+conclusive answer to this question yet. Maybe the following four
+notions help with clarifying the overall picture somewhat:
 
 \begin{itemize}
 \item \textbf{Secrecy} is the mechanism used to limit the
@@ -101,12 +101,12 @@
 \noindent While this might provide us with some rough
 definitions, the problem with privacy is that it is an
 extremely fine line what should stay private and what should
-not. For example, since I am working in academia, I am very
-happy to be a digital exhibitionist: I am very happy to
-disclose all `trivia' related to my work on my personal
-web-page. This is a kind of bragging that is normal in
-academia (at least in the field of CS), even expected if you
-look for a job. I am even happy that Google maintains a
+not. For example, since I am working in academia, I am every
+so often very happy to be a digital exhibitionist: I am very
+happy to disclose all `trivia' related to my work on my
+personal web-page. This is a kind of bragging that is normal
+in academia (at least in the field of CS), even expected if
+you look for a job. I am even happy that Google maintains a
 profile about all my academic papers and their citations.
 
 On the other hand I would be very irritated if anybody I do
@@ -127,6 +127,12 @@
 creditworthiness. But this concerns only a very small part of
 the data that is held about me/you.
 
+Take the example of Stephen Hawking: when he was diagnosed
+with his disease, he was given a life expectancy of two years.
+If an employer would know about such problems, would they have
+employed Hawking? Now he is enjoying his 70+ birthday.
+Clearly personal medical data needs to stay private.
+
 To cut a long story short, I let you ponder about the two
 statements that often voiced in discussions about privacy:
 
@@ -151,22 +157,22 @@
 governments \& Co collect people's deepest secrets, or
 pictures of people's naked bodies, but an argument that
 applies also in cases where governments ``only'' collect data
-relevant to, say, preventing terrorism. The fun is of course,
-in 2011 we could just not imagine that respected governments
-would do such infantile things as intercepting people's nude
-photos. Well, since Snowden we know some people at the NSA did
-and then shared such photos among colleagues as ``fringe
-benefit''.
+relevant to, say, preventing terrorism. The fun is of course
+that in 2011 we could just not imagine that respected
+governments would do such infantile things as intercepting
+people's nude photos. Well, since Snowden we know some people
+at the NSA did exactly that and then shared such photos among
+colleagues as ``fringe benefit''.
 
 
 \subsubsection*{Re-Identification Attacks}
 
-Apart from philosophical arguments, there are fortunately also
-some real technical problems with privacy implications. The
-problem I want to focus on in this handout is how to safely
-disclose datasets containing potentially private data, say
-health data. What can go wrong with such disclosures can be
-illustrated with four examples:
+Apart from philosophical musings, there are fortunately also
+some real technical problems with privacy. The problem I want
+to focus on in this handout is how to safely disclose datasets
+containing potentially private data, say health data. What can
+go wrong with such disclosures can be illustrated with four
+well-known examples:
 
 \begin{itemize}
 \item In 2006, a then young company called Netflix offered a 1
@@ -187,28 +193,34 @@
       suit against Netflix, which was only recently resolved
       involving a lot of money.
 
-\item In the 1990ies, medical datasets were often made public for
-      research purposes. This was done in anonymised form with
-      names removed, but birth dates, gender, ZIP-code were
-      retained. In one case where such data was made public
-      about state employees in Massachusetts, the then
-      governor assured the public that the released dataset
-      protected patient privacy by deleting identifiers. A
-      graduate student could not resist and cross-referenced
-      public voter data with the data about birth dates,
-      gender, ZIP-code. The result was that she could send
-      the governor his own hospital record.
+\item In the 1990ies, medical datasets were often made public
+      for research purposes. This was done in anonymised form
+      with names removed, but birth dates, gender, ZIP-code
+      were retained. In one case where such data about
+      hospital visits of state employees in Massachusetts was
+      made public, the then governor assured the public that
+      the released dataset protected patient privacy by
+      deleting identifiers. A graduate student could not
+      resist cross-referencing public voter data with the
+      released data including birth dates, gender and
+      ZIP-code. The result was that she could send the
+      governor his own hospital record. It turns out that
+      birth dates, gender and ZIP-code uniquely identify 87\%
+      people in the US.
 
 \item In 2006, AOL published 20 million Web search queries
-      collected of 650,000 users (names had been deleted).
-      This was again for research purposes. However, within
-      days an old lady, Thelma Arnold, from Lilburn, Georgia,
-      (11,596 inhabitants) was identified as user No.~4417749
-      in this dataset. It turned out that search engine
-      queries are windows into people's private lives.
+      collected from 650,000 users (names had been deleted).
+      This was again done for research purposes. However,
+      within days an old lady, Thelma Arnold, from Lilburn,
+      Georgia, (11,596 inhabitants) was identified as user
+      No.~4417749 in this dataset. It turned out that search
+      engine queries are deep windows into people's private
+      lives.
 
 \item Genomic-Wide Association Studies (GWAS) was a public
       database of gene-frequency studies linked to diseases.
+
+
       you only needed partial DNA information in order to
       identify whether an individual was part of the study —
       DB closed in 2008