Concepedia

TLDR

Differential privacy (DP) has been widely studied in academia but much less in industry, possibly because of its strong privacy guarantee. The study implements three basic DP architectures in a deployed telecommunication (telco) big data platform for data mining applications and outlines future research directions for practical industrial deployment. Experiments show that with a weak privacy guarantee (privacy budget ε ≥ 3) all three architectures lose less than 5% prediction accuracy, while a strong guarantee (ε ≤ 0.1) causes 15% to 30% accuracy loss. The Hybridized DM (Data Mining) and DB (Database) architecture performs best. Accuracy loss rises with the variety of features and falls as the volume of training data grows.

Abstract

Differential privacy (DP) has been widely explored in academia recently but less so in industry, possibly due to its strong privacy guarantee. This paper makes the first attempt to implement three basic DP architectures in a deployed telecommunication (telco) big data platform for data mining applications. We find that all DP architectures lose less than 5% of prediction accuracy when a weak privacy guarantee is adopted (e.g., privacy budget parameter ε ≥ 3). However, when a strong privacy guarantee is assumed (e.g., privacy budget parameter ε ≤ 0.1), all DP architectures lead to 15% to 30% accuracy loss, which implies that real-world industrial data mining systems cannot work well under the strong privacy guarantee recommended by previous research. Among the three basic DP architectures, the Hybridized DM (Data Mining) and DB (Database) architecture performs the best because of its more complicated privacy protection design, which is tailored to the specific data mining algorithm. Through extensive experiments on big data, we also observe that the accuracy loss increases with the variety of features but decreases with the volume of training data. Therefore, to make DP practically usable in large-scale industrial systems, our observations suggest three possible research directions for the future: (1) relaxing the privacy guarantee (e.g., increasing the privacy budget ε) and studying its effectiveness on specific industrial applications; (2) designing specific privacy schemes for specific data mining algorithms; and (3) using a large volume of data with low feature variety to train the classification models.
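To make the role of the privacy budget ε concrete, the sketch below uses the standard Laplace mechanism on a simple counting query. This is an illustrative assumption: the abstract does not say which noise mechanism the three architectures use, and the function names (`laplace_scale`, `noisy_count`, `mean_relative_error`) are hypothetical. The sketch shows only the general trend that noise grows as ε shrinks and that relative error shrinks as the underlying count grows. It does not reproduce the accuracy figures reported above.

```python
import random


def laplace_scale(sensitivity: float, epsilon: float) -> float:
    """Noise scale b = sensitivity / epsilon for the Laplace mechanism."""
    if epsilon <= 0:
        raise ValueError("epsilon must be positive")
    return sensitivity / epsilon


def laplace_noise(scale: float, rng: random.Random) -> float:
    """Draw one Laplace(0, scale) sample.

    The difference of two independent exponential variables with
    mean `scale` follows a Laplace distribution with that scale.
    """
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def noisy_count(true_count: float, epsilon: float, rng: random.Random) -> float:
    """Release a count with Laplace noise (a counting query has sensitivity 1)."""
    return true_count + laplace_noise(laplace_scale(1.0, epsilon), rng)


def mean_relative_error(true_count: float, epsilon: float,
                        trials: int = 2000, seed: int = 0) -> float:
    """Average |noisy - true| / true over repeated releases (illustrative only)."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(trials):
        total += abs(noisy_count(true_count, epsilon, rng) - true_count)
    return total / trials / true_count


if __name__ == "__main__":
    # Hypothetical counts and budgets chosen to mirror the abstract's regimes.
    for eps in (0.1, 3.0):
        for count in (100, 10_000):
            err = mean_relative_error(count, eps)
            print(f"epsilon={eps:<4} count={count:<6} mean relative error={err:.4%}")
```

In this toy setting a smaller ε (stronger guarantee) yields a larger noise scale, and the same noise matters less when the true count is large. This is consistent with, but far simpler than, the observation above that accuracy loss decreases with the volume of training data.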
