Several years ago we came across research suggesting that basic regression analysis and data science techniques, applied to an anonymous data set (one that has been stripped of personally identifiable information), could infer the identity of the people behind individual records with some degree of accuracy. A new study published in the journal Nature Communications goes further, finding that with as few as four associated data elements, such as birthday, ZIP code, occupation, and gender, the rate of re-identification is surprisingly high. The research estimates how likely it is that a given person can be correctly re-identified in a supposedly anonymous data set. Read the paper here...

If you want to try it for yourself, you can test drive the simulator here and see just how easy it is for an algorithm to identify you from an aggregation of de-identified but related data.

This concept is nothing new. If you have ever worked with classified data, you know that even sensitive but unclassified information can become classified if enough of it is aggregated to create a compelling 'circumstantial' view of the environment. In our book, this is a pretty powerful endorsement for non-persistence techniques...
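To see why stacking a few innocuous attributes is so revealing, here is a minimal Python sketch. It is not the method from the Nature Communications paper, which relies on a more sophisticated statistical model. It simply counts how many records become unique as each attribute is added. The records, the `FIELDS` names, and the `fraction_unique` helper are all hypothetical, made up for illustration. A record that is unique on these fields can be singled out by anyone who already knows those few facts about the person.

```python
from collections import Counter

# Hypothetical, hand-made records with names already stripped ("de-identified").
# Columns: birth date, ZIP code, occupation, gender.
FIELDS = ("birth_date", "zip_code", "occupation", "gender")
RECORDS = [
    ("1980-03-14", "02139", "nurse", "F"),
    ("1980-03-14", "02139", "teacher", "F"),
    ("1975-11-02", "02139", "nurse", "M"),
    ("1975-11-02", "02140", "nurse", "M"),
    ("1990-07-21", "02140", "engineer", "F"),
    ("1990-07-21", "02140", "engineer", "M"),
]


def fraction_unique(records, columns):
    """Return the share of records whose values in `columns` match no other record."""
    keys = [tuple(record[i] for i in columns) for record in records]
    counts = Counter(keys)
    return sum(1 for key in keys if counts[key] == 1) / len(records)


# Add one attribute at a time and watch the share of unique records climb.
for n in range(1, len(FIELDS) + 1):
    share = fraction_unique(RECORDS, tuple(range(n)))
    print(f"{' + '.join(FIELDS[:n])}: {share:.0%} of records unique")
```

On this toy data, birth date alone singles out no one (0%), adding ZIP code makes 33% of records unique, adding occupation raises that to 67%, and adding gender makes every record (100%) unique. No single field identifies anyone, but the combination does.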