--- a/handouts/ho07.tex Fri Nov 14 01:17:38 2014 +0000
+++ b/handouts/ho07.tex Fri Nov 14 01:37:46 2014 +0000
@@ -58,13 +58,13 @@
contrary), and also not known for providing the fastest
possible speeds, but rather for being among the few ISPs in
the US with a quasi-monopolistic ``market distribution''.
+
+
Well, we could go on and on\ldots{}and that has not even
started us yet with all the naughty things NSA \& Friends are
-up to.
-
-Why does privacy matter? Nobody, I think, has a conclusive
-answer to this question. Maybe the following four notions
-help with clarifying the overall picture somewhat:
+up to. Why does privacy matter? Nobody, I think, has a
+conclusive answer to this question yet. Maybe the following four
+notions help with clarifying the overall picture somewhat:
\begin{itemize}
\item \textbf{Secrecy} is the mechanism used to limit the
@@ -101,12 +101,12 @@
\noindent While this might provide us with some rough
definitions, the problem with privacy is that it is an
extremely fine line what should stay private and what should
-not. For example, since I am working in academia, I am very
-happy to be a digital exhibitionist: I am very happy to
-disclose all `trivia' related to my work on my personal
-web-page. This is a kind of bragging that is normal in
-academia (at least in the field of CS), even expected if you
-look for a job. I am even happy that Google maintains a
+not. For example, since I am working in academia, I am every
+so often very happy to be a digital exhibitionist: I am very
+happy to disclose all `trivia' related to my work on my
+personal web-page. This is a kind of bragging that is normal
+in academia (at least in the field of CS), even expected if
+you look for a job. I am even happy that Google maintains a
profile about all my academic papers and their citations.
On the other hand I would be very irritated if anybody I do
@@ -127,6 +127,12 @@
creditworthiness. But this concerns only a very small part of
the data that is held about me/you.
+Take the example of Stephen Hawking: when he was diagnosed
+with his disease, he was given a life expectancy of two years.
+If an employer had known about this prognosis, would they have
+employed Hawking? Today he is well past his 70th birthday.
+Clearly, personal medical data needs to stay private.
+
To cut a long story short, I let you ponder about the two
statements that are often voiced in discussions about privacy:
@@ -151,22 +157,22 @@
governments \& Co collect people's deepest secrets, or
pictures of people's naked bodies, but an argument that
applies also in cases where governments ``only'' collect data
-relevant to, say, preventing terrorism. The fun is of course,
-in 2011 we could just not imagine that respected governments
-would do such infantile things as intercepting people's nude
-photos. Well, since Snowden we know some people at the NSA did
-and then shared such photos among colleagues as ``fringe
-benefit''.
+relevant to, say, preventing terrorism. The fun is of course
+that in 2011 we could just not imagine that respected
+governments would do such infantile things as intercepting
+people's nude photos. Well, since Snowden we know some people
+at the NSA did exactly that and then shared such photos among
+colleagues as a ``fringe benefit''.
\subsubsection*{Re-Identification Attacks}
-Apart from philosophical arguments, there are fortunately also
-some real technical problems with privacy implications. The
-problem I want to focus on in this handout is how to safely
-disclose datasets containing potentially private data, say
-health data. What can go wrong with such disclosures can be
-illustrated with four examples:
+Apart from philosophical musings, there are fortunately also
+some real technical problems with privacy. The problem I want
+to focus on in this handout is how to safely disclose datasets
+containing potentially private data, say health data. What can
+go wrong with such disclosures can be illustrated with four
+well-known examples:
\begin{itemize}
\item In 2006, a then young company called Netflix offered a 1
@@ -187,28 +193,34 @@
suit against Netflix, which was only recently resolved
involving a lot of money.
-\item In the 1990ies, medical datasets were often made public for
- research purposes. This was done in anonymised form with
- names removed, but birth dates, gender, ZIP-code were
- retained. In one case where such data was made public
- about state employees in Massachusetts, the then
- governor assured the public that the released dataset
- protected patient privacy by deleting identifiers. A
- graduate student could not resist and cross-referenced
- public voter data with the data about birth dates,
- gender, ZIP-code. The result was that she could send
- the governor his own hospital record.
+\item In the 1990s, medical datasets were often made public
+ for research purposes. This was done in anonymised form
+ with names removed, but birth dates, gender and ZIP-code
+ were retained. In one case where such data about
+ hospital visits of state employees in Massachusetts was
+ made public, the then governor assured the public that
+ the released dataset protected patient privacy by
+ deleting identifiers. A graduate student could not
+ resist cross-referencing public voter data with the
+ released data including birth dates, gender and
+ ZIP-code. The result was that she could send the
+ governor his own hospital record. It turns out that
+ birth dates, gender and ZIP-code uniquely identify 87\%
+ of people in the US.
\item In 2006, AOL published 20 million Web search queries
- collected of 650,000 users (names had been deleted).
- This was again for research purposes. However, within
- days an old lady, Thelma Arnold, from Lilburn, Georgia,
- (11,596 inhabitants) was identified as user No.~4417749
- in this dataset. It turned out that search engine
- queries are windows into people's private lives.
+ collected from 650,000 users (names had been deleted).
+ This was again done for research purposes. However,
+ within days an old lady, Thelma Arnold, from Lilburn,
+ Georgia (11,596 inhabitants), was identified as user
+ No.~4417749 in this dataset. It turned out that search
+ engine queries are deep windows into people's private
+ lives.
\item Genome-Wide Association Studies (GWAS) was a public
database of gene-frequency studies linked to diseases.
+
+
You only needed partial DNA information in order to
identify whether an individual was part of the study.
The DB was closed in 2008.