Concepedia

TLDR

Publishing personal data while protecting sensitive attributes is a key challenge, and k-anonymity—requiring each record to be indistinguishable from at least k − 1 others—has become a popular privacy definition. However, k-anonymity suffers from subtle but severe privacy problems: this article demonstrates two simple attacks showing that low diversity among sensitive attribute values and an attacker's background knowledge can each compromise a k-anonymized dataset. The authors analyze both attacks in detail, propose ℓ-diversity as a stronger privacy criterion that defends against them, and show experimentally that ℓ-diversity is practical and can be implemented efficiently.

Abstract

Publishing data about individuals without revealing sensitive information about them is an important problem. In recent years, a new definition of privacy called k-anonymity has gained popularity. In a k-anonymized dataset, each record is indistinguishable from at least k − 1 other records with respect to certain identifying attributes. In this article, we show using two simple attacks that a k-anonymized dataset has some subtle but severe privacy problems. First, an attacker can discover the values of sensitive attributes when there is little diversity in those sensitive attributes. This is a known problem. Second, attackers often have background knowledge, and we show that k-anonymity does not guarantee privacy against attackers using background knowledge. We give a detailed analysis of these two attacks, and we propose a novel and powerful privacy criterion called ℓ-diversity that can defend against such attacks. In addition to building a formal foundation for ℓ-diversity, we show in an experimental evaluation that ℓ-diversity is practical and can be implemented efficiently.
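To make the two definitions concrete, the sketch below checks a toy table for k-anonymity and for the simplest ("distinct") instantiation of ℓ-diversity, where each group of indistinguishable records must contain at least ℓ distinct sensitive values. The data, attribute names, and the choice of the distinct variant are illustrative assumptions, not the paper's full formalism (the article also develops stronger instantiations such as entropy ℓ-diversity).

```python
from collections import defaultdict

# Toy published table: two quasi-identifiers (generalized zip and age)
# plus a sensitive attribute. Values are illustrative only.
records = [
    {"zip": "130**", "age": "<30",  "disease": "heart disease"},
    {"zip": "130**", "age": "<30",  "disease": "viral infection"},
    {"zip": "130**", "age": "<30",  "disease": "cancer"},
    {"zip": "148**", "age": ">=40", "disease": "cancer"},
    {"zip": "148**", "age": ">=40", "disease": "cancer"},
    {"zip": "148**", "age": ">=40", "disease": "cancer"},
]

def equivalence_classes(rows, quasi_ids):
    """Group records that share the same quasi-identifier values."""
    groups = defaultdict(list)
    for row in rows:
        groups[tuple(row[q] for q in quasi_ids)].append(row)
    return groups

def is_k_anonymous(rows, quasi_ids, k):
    """k-anonymity: every equivalence class has at least k records."""
    return all(len(g) >= k
               for g in equivalence_classes(rows, quasi_ids).values())

def is_distinct_l_diverse(rows, quasi_ids, sensitive, l):
    """Distinct l-diversity: every equivalence class contains at
    least l distinct values of the sensitive attribute."""
    return all(len({row[sensitive] for row in g}) >= l
               for g in equivalence_classes(rows, quasi_ids).values())

qids = ["zip", "age"]
print(is_k_anonymous(records, qids, 3))                    # True
print(is_distinct_l_diverse(records, qids, "disease", 2))  # False
```

The table is 3-anonymous, yet the second class holds only "cancer": anyone known to be in that class has their diagnosis revealed. This is exactly the low-diversity (homogeneity) attack the article describes, and it is what the ℓ-diversity requirement rules out.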
