Using Randomized Response for Differential Privacy Preserving Data Collection

TLDR

The use of randomized response in surveys enables accurate estimation of population statistics while preserving individual privacy. The paper investigates using randomized response to enforce differential privacy in data collection and compares it with the standard Laplace mechanism. The authors implement a client-side randomized algorithm that perturbs each value, analyze its utility in terms of mean squared error for attributes ranging from binary to multichotomous, derive explicit formulas proving that it outperforms the Laplace mechanism, and evaluate the approach on the YesiWell database, which contains biomarker and social network data. Empirical results show that randomized response incurs less utility loss than Laplace output perturbation, especially for high-sensitivity functions.
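The client-side perturbation and the server-side proportion estimate can be sketched as follows. This is a minimal sketch, not the authors' implementation: it assumes the symmetric design in which each client keeps its true category with probability p = e^ε/(e^ε + k - 1) and otherwise reports one of the other k - 1 categories uniformly, so any particular other category is reported with probability q = 1/(e^ε + k - 1) and p/q = e^ε, which makes each report ε-differentially private. With k = 2 it reduces to the binary case, p = e^ε/(1 + e^ε). The function names `rr_probabilities`, `perturb`, and `estimate_proportions` and the example data are hypothetical.

```python
import math
import random
from collections import Counter


def rr_probabilities(epsilon: float, k: int) -> tuple[float, float]:
    """Return (p, q) for symmetric k-ary randomized response (an assumed design).

    p: probability of reporting the true category.
    q: probability of reporting one specific other category.
    p / q = e^epsilon, which gives epsilon-differential privacy per report.
    """
    denom = math.exp(epsilon) + k - 1
    return math.exp(epsilon) / denom, 1.0 / denom


def perturb(value: int, epsilon: float, k: int, rng: random.Random) -> int:
    """Client side: keep `value` (in 0..k-1) with probability p; otherwise
    report one of the other k - 1 categories uniformly at random."""
    p, _ = rr_probabilities(epsilon, k)
    if rng.random() < p:
        return value
    other = rng.randrange(k - 1)  # index into the k - 1 categories != value
    return other if other < value else other + 1


def estimate_proportions(reports: list[int], epsilon: float, k: int) -> list[float]:
    """Server side: unbiased estimate of each category's true proportion.

    E[fraction of reports equal to j] = pi_j * p + (1 - pi_j) * q,
    so pi_j_hat = (observed_j - q) / (p - q).
    """
    p, q = rr_probabilities(epsilon, k)
    n = len(reports)
    counts = Counter(reports)  # missing categories count as 0
    return [(counts[j] / n - q) / (p - q) for j in range(k)]


if __name__ == "__main__":
    rng = random.Random(0)
    # Hypothetical data: 10,000 clients, one attribute with k = 3 categories.
    true_values = [0] * 6000 + [1] * 3000 + [2] * 1000
    reports = [perturb(v, 1.0, 3, rng) for v in true_values]
    print(estimate_proportions(reports, 1.0, 3))
```

The server only sees `reports`, so it recovers population proportions by inverting the known perturbation probabilities rather than any individual's value. The estimates should land near the true proportions (0.6, 0.3, 0.1 here) and always sum to 1; for rare categories an estimate can fall slightly outside [0, 1], because the estimator is unbiased but unconstrained.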

Abstract

This paper studies how to enforce differential privacy by using randomized response in the data collection scenario. Given a client’s value, the randomized algorithm executed by the client reports a perturbed value to the untrusted server. The use of randomized response in surveys enables easy estimation of accurate population statistics while preserving the privacy of individual respondents. We compare randomized response with the standard Laplace mechanism, which perturbs query outputs by adding independent Laplace noise. Our research starts from the simple case of a single binary attribute and extends to the general case of multiple polychotomous attributes. We measure utility preservation in terms of the mean squared error of the estimate for various calculations, including individual value estimates, proportion estimates, and various derived statistics. Based on randomized response theory, we derive explicit formulas for the mean squared error of various derived statistics and prove that randomized response outperforms the Laplace mechanism. We evaluate our algorithms on the YesiWell database, which includes sensitive biomarker data and social network relationships of patients. Empirical evaluation results show the effectiveness of our proposed techniques. In particular, using randomized response to collect data incurs less utility loss than output perturbation when the sensitivity of the functions is high.
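For the single-binary-attribute case, the comparison can be checked numerically. This sketch assumes the Laplace baseline is applied locally, with each client reporting x + Lap(1/ε) for a bit x (sensitivity 1), and that both estimators are unbiased, so MSE equals variance. It illustrates only the binary case, not the paper's general derivation or its output-perturbation setting, and the function names are hypothetical. Binary randomized response with p = e^ε/(1 + e^ε) gives a per-record MSE of p(1 - p)/(2p - 1)^2 = e^ε/(e^ε - 1)^2, versus 2/ε^2 for Laplace.

```python
import math
import random


def rr_mse_individual(epsilon: float) -> float:
    """MSE of the unbiased estimate x_hat = (y - (1 - p)) / (2p - 1) of one bit
    under binary randomized response with p = e^eps / (1 + e^eps).
    Equals p(1 - p) / (2p - 1)^2 = e^eps / (e^eps - 1)^2."""
    return math.exp(epsilon) / math.expm1(epsilon) ** 2


def laplace_mse_individual(epsilon: float) -> float:
    """MSE of reporting x + Lap(1/eps) for one bit (assumed sensitivity 1):
    the Laplace variance 2 * (1/eps)^2."""
    return 2.0 / epsilon ** 2


def proportion_mse(per_record_mse: float, n: int) -> float:
    """Conditional on the true values, the proportion estimate averages n
    independent unbiased per-record estimates, so its MSE is per-record MSE / n."""
    return per_record_mse / n


def simulate_mse(x: int, epsilon: float, trials: int,
                 rng: random.Random) -> tuple[float, float]:
    """Monte Carlo check of both per-record MSEs for a fixed true bit x."""
    p = math.exp(epsilon) / (1.0 + math.exp(epsilon))
    rr_total = lap_total = 0.0
    for _ in range(trials):
        y = x if rng.random() < p else 1 - x        # randomized response report
        x_hat = (y - (1.0 - p)) / (2.0 * p - 1.0)   # de-biased estimate of x
        rr_total += (x_hat - x) ** 2
        # The difference of two Exp(rate=eps) draws is Laplace with scale 1/eps.
        noise = rng.expovariate(epsilon) - rng.expovariate(epsilon)
        lap_total += noise ** 2
    return rr_total / trials, lap_total / trials


if __name__ == "__main__":
    for eps in (0.1, 0.5, 1.0, 2.0):
        rr, lap = rr_mse_individual(eps), laplace_mse_individual(eps)
        print(f"eps={eps}: RR {rr:.4f}  Laplace {lap:.4f}  "
              f"n=1000 proportion: {proportion_mse(rr, 1000):.2e} "
              f"vs {proportion_mse(lap, 1000):.2e}")
    print(simulate_mse(1, 1.0, 100_000, random.Random(0)))
```

Since e^ε/(e^ε - 1)^2 = 1/(4 sinh^2(ε/2)) and sinh(t) > t for t > 0, the randomized-response MSE stays below 1/ε^2, i.e. under half the Laplace MSE for every ε > 0 in this binary setting; at ε = 1 it is about 0.92 versus 2.0. Averaging over n records divides both by n, so the ordering carries over to the proportion estimate.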
