updated
author Christian Urban <christian dot urban at kcl dot ac dot uk>
Fri, 14 Nov 2014 01:37:46 +0000
changeset 310 591b62e1f86a
parent 309 b1ba3d88696e
child 311 8befc029ca1e
handouts/ho07.pdf
handouts/ho07.tex
Binary file handouts/ho07.pdf has changed
--- a/handouts/ho07.tex	Fri Nov 14 01:17:38 2014 +0000
+++ b/handouts/ho07.tex	Fri Nov 14 01:37:46 2014 +0000
@@ -58,13 +58,13 @@
 contrary), and also not known for providing the fastest
 possible speeds, but rather for being among the few ISPs in
 the US with a quasi-monopolistic ``market distribution''.
+
+
 Well, we could go on and on\ldots{}and we have not even
 started yet on all the naughty things NSA \& Friends are
-up to. 
-
-Why does privacy matter? Nobody, I think, has a conclusive
-answer to this question. Maybe the following four notions
-help with clarifying the overall picture somewhat: 
+up to. Why does privacy matter? Nobody, I think, has a
+conclusive answer to this question yet. Maybe the following four
+notions help with clarifying the overall picture somewhat: 
 
 \begin{itemize}
 \item \textbf{Secrecy} is the mechanism used to limit the
@@ -101,12 +101,12 @@
 \noindent While this might provide us with some rough
 definitions, the problem with privacy is that there is an extremely
 fine line between what should stay private and what should
-not. For example, since I am working in academia, I am very
-happy to be a digital exhibitionist: I am very happy to
-disclose all `trivia' related to my work on my personal
-web-page. This is a kind of bragging that is normal in
-academia (at least in the field of CS), even expected if you
-look for a job. I am even happy that Google maintains a
+not. For example, since I am working in academia, I am at
+times very happy to be a digital exhibitionist: I am very
+happy to disclose all `trivia' related to my work on my
+personal web-page. This is a kind of bragging that is normal
+in academia (at least in the field of CS), even expected if
+you look for a job. I am even happy that Google maintains a
 profile about all my academic papers and their citations. 
 
 On the other hand I would be very irritated if anybody I do
@@ -127,6 +127,12 @@
 creditworthiness. But this concerns only a very small part of
 the data that is held about me/you.
 
+Take the example of Stephen Hawking: when he was diagnosed
+with his disease, he was given a life expectancy of two years.
+If an employer had known about such a prognosis, would they
+have employed Hawking? Now he is well past his 70th birthday.
+Clearly personal medical data needs to stay private.
+
 To cut a long story short, I let you ponder the two
 statements that are often voiced in discussions about privacy:
 
@@ -151,22 +157,22 @@
 governments \& Co collect people's deepest secrets, or
 pictures of people's naked bodies, but an argument that
 applies also in cases where governments ``only'' collect data
-relevant to, say, preventing terrorism. The fun is of course,
-in 2011 we could just not imagine that respected governments
-would do such infantile things as intercepting people's nude
-photos. Well, since Snowden we know some people at the NSA did
-and then shared such photos among colleagues as ``fringe
-benefit''.  
+relevant to, say, preventing terrorism. The fun is of course
+that in 2011 we could just not imagine that respected
+governments would do such infantile things as intercepting
+people's nude photos. Well, since Snowden we know some people
+at the NSA did exactly that and then shared such photos among
+colleagues as a ``fringe benefit''.  
 
 
 \subsubsection*{Re-Identification Attacks} 
 
-Apart from philosophical arguments, there are fortunately also
-some real technical problems with privacy implications. The
-problem I want to focus on in this handout is how to safely
-disclose datasets containing potentially private data, say
-health data. What can go wrong with such disclosures can be
-illustrated with four examples:
+Apart from philosophical musings, there are fortunately also
+some real technical problems with privacy. The problem I want
+to focus on in this handout is how to safely disclose datasets
+containing potentially private data, say health data. What can
+go wrong with such disclosures can be illustrated with four
+well-known examples:
 
 \begin{itemize}
 \item In 2006, a then young company called Netflix offered a 1
@@ -187,28 +193,34 @@
       suit against Netflix, which was only recently resolved
       involving a lot of money.
 
-\item In the 1990ies, medical datasets were often made public for
-      research purposes. This was done in anonymised form with
-      names removed, but birth dates, gender, ZIP-code were
-      retained. In one case where such data was made public
-      about state employees in Massachusetts, the then
-      governor assured the public that the released dataset
-      protected patient privacy by deleting identifiers. A
-      graduate student could not resist and cross-referenced
-      public voter data with the data about birth dates,
-      gender, ZIP-code. The result was that she could send
-      the governor his own hospital record.
+\item In the 1990s, medical datasets were often made public
+      for research purposes. This was done in anonymised form
+      with names removed, but birth dates, gender, ZIP-code
+      were retained. In one case where such data about
+      hospital visits of state employees in Massachusetts was
+      made public, the then governor assured the public that
+      the released dataset protected patient privacy by
+      deleting identifiers. A graduate student could not
+      resist cross-referencing public voter data with the
+      released data including birth dates, gender and
+      ZIP-code. The result was that she could send the
+      governor his own hospital record. It turns out that
+      birth dates, gender and ZIP-code uniquely identify 87\%
+      of people in the US.
  
 \item In 2006, AOL published 20 million Web search queries
-      collected of 650,000 users (names had been deleted).
-      This was again for research purposes. However, within
-      days an old lady, Thelma Arnold, from Lilburn, Georgia,
-      (11,596 inhabitants) was identified as user No.~4417749
-      in this dataset. It turned out that search engine
-      queries are windows into people's private lives. 
+      collected from 650,000 users (names had been deleted).
+      This was again done for research purposes. However,
+      within days an old lady, Thelma Arnold, from Lilburn,
+      Georgia (11,596 inhabitants), was identified as user
+      No.~4417749 in this dataset. It turned out that search
+      engine queries are deep windows into people's private
+      lives. 
   
 \item Genome-Wide Association Studies (GWAS) was a public
       database of gene-frequency studies linked to diseases.
+      
+      
       You only needed partial DNA information in order to
       identify whether an individual was part of the study.
       The database closed in 2008.