De-identified Data and Third-Party Data Mining: The Risk of Re-identification of Personal Information

TLDR

Anonymized data can be re-identified, as the Netflix and AOL data releases demonstrated, exposing companies to liability and to violations of their own privacy policies. The paper underscores the need for stronger privacy safeguards and database security to keep third parties from re-identifying anonymized data.

Abstract

Recent computer science research demonstrates that anonymized data can sometimes be easily re-identified with particular individuals, despite companies' attempts to isolate personal information. Netflix and AOL are two examples of companies that released user data intended to be anonymous, only for it to be re-identified with individual users using very small amounts of auxiliary data. Re-identification of anonymized data may expose companies to increased liability, because the information may no longer be treated as anonymous. In addition, companies may violate their own privacy policies by releasing to third parties anonymized information that can easily be re-identified with individual users. The potential for third parties to link anonymized information back to the individuals it describes points to the need for both stronger privacy protection for anonymized information and stronger security for the databases that contain it.