Closeness: A New Privacy Measure for Data Publishing

TLDR

k‑anonymity requires each equivalence class to contain at least k records, but it cannot prevent attribute disclosure; ℓ‑diversity was introduced to address this by requiring at least ℓ well-represented sensitive values per class. The authors show that ℓ‑diversity is neither necessary nor sufficient for preventing attribute disclosure and introduce a new privacy notion, closeness. They define t‑closeness, which requires the distribution of the sensitive attribute in each equivalence class to differ from its distribution in the overall table by at most a threshold t, extend it to (n,t)-closeness for higher utility, and present two distance measures for comparing distributions. The advantages of closeness as a privacy measure are illustrated through examples and experiments.

Abstract

The k-anonymity privacy requirement for publishing microdata requires that each equivalence class (i.e., a set of records that are indistinguishable from each other with respect to certain "identifying" attributes) contains at least k records. Recently, several authors have recognized that k-anonymity cannot prevent attribute disclosure. The notion of ℓ-diversity has been proposed to address this; ℓ-diversity requires that each equivalence class has at least ℓ "well-represented" values for each sensitive attribute (the term is made precise in Section 2). In this paper, we show that ℓ-diversity has a number of limitations. In particular, it is neither necessary nor sufficient to prevent attribute disclosure. Motivated by these limitations, we propose a new notion of privacy called "closeness." We first present the base model t-closeness, which requires that the distribution of a sensitive attribute in any equivalence class is close to the distribution of the attribute in the overall table (i.e., the distance between the two distributions should be no more than a threshold t). We then propose a more flexible privacy model called (n,t)-closeness that offers higher utility. We describe our desiderata for designing a distance measure between two probability distributions and present two distance measures. We discuss the rationale for using closeness as a privacy measure and illustrate its advantages through examples and experiments.
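
The t-closeness condition stated above can be illustrated with a minimal sketch. The abstract does not name the two distance measures the paper presents, so this example assumes the variational (total variation) distance purely for illustration; the function names and the toy data are hypothetical and not taken from the paper.

```python
from collections import Counter

def distribution(values, domain):
    """Empirical distribution of `values` over a fixed, ordered `domain`."""
    counts = Counter(values)
    n = len(values)
    return [counts[v] / n for v in domain]

def variational_distance(p, q):
    """Assumed distance measure: half the L1 distance between p and q."""
    return 0.5 * sum(abs(a - b) for a, b in zip(p, q))

def satisfies_t_closeness(classes, t):
    """True if every equivalence class is within distance t of the whole table.

    `classes` is a list of equivalence classes, each given as the list of
    sensitive-attribute values of its records.
    """
    all_values = [v for cls in classes for v in cls]
    domain = sorted(set(all_values))
    overall = distribution(all_values, domain)
    return all(
        variational_distance(distribution(cls, domain), overall) <= t
        for cls in classes
    )

# Hypothetical table with two equivalence classes of three records each.
classes = [
    ["flu", "flu", "cancer"],
    ["flu", "cancer", "cancer"],
]
print(satisfies_t_closeness(classes, t=0.2))
print(satisfies_t_closeness(classes, t=0.1))
```

In this toy table the overall distribution is half "flu" and half "cancer", and each class deviates from it by a variational distance of 1/6, so the table meets the requirement for t = 0.2 but not for t = 0.1. A different distance measure would generally yield different thresholds.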
