On the complexity of optimal K-anonymity

TLDR

k‑anonymization is a data‑release technique that preserves privacy and integrity. The study proposes a polynomial‑time algorithm for optimal k‑anonymity with a size‑independent approximation when k is constant. The algorithm achieves an O(k log k) approximation with a constant ≤4, though its runtime is exponential in k; a refined version removes this dependence and yields an O(k log m) approximation where m is the relation degree. The authors prove that optimal k‑anonymization is NP‑hard in two general settings, yet the presented algorithm may run efficiently in practice.

Abstract

The technique of k-anonymization has been proposed in the literature as an alternative way to release public information, while ensuring both data privacy and data integrity. We prove that two general versions of optimal k-anonymization of relations are NP-hard, including the suppression version which amounts to choosing a minimum number of entries to delete from the relation. We also present a polynomial time algorithm for optimal k-anonymity that achieves an approximation ratio independent of the size of the database, when k is constant. In particular, it is a O(k log k)-approximation where the constant in the big-O is no more than 4, However, the runtime of the algorithm is exponential in k. A slightly more clever algorithm removes this condition, but is a O(k log m)-approximation, where m is the degree of the relation. We believe this algorithm could potentially be quite fast in practice.

References

Page 1

	Year	Citations

Page 1