Publication | Closed Access
Using Randomized Response for Differential Privacy Preserving Data Collection
Citations: 99 | References: 24 | Year: 2016
Unknown Venue
Randomized response in surveys enables accurate estimation of population statistics while preserving individual privacy. The paper investigates using randomized response to enforce differential privacy in data collection and compares it with the standard Laplace mechanism. The authors implement a client‑side randomized algorithm that perturbs each value before it is reported, analyze its utility in terms of mean squared error for attributes ranging from a single binary attribute to multiple polychotomous attributes, derive explicit formulas proving that it outperforms the Laplace mechanism, and evaluate the approach on the YesiWell database of biomarker and social network data. Empirical results show that randomized response incurs less utility loss than Laplace output perturbation, especially for high‑sensitivity functions.
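The client/server split described above can be sketched for a single binary attribute. This is a minimal sketch under stated assumptions, not the authors' implementation: it assumes Warner's design, where the client reports its true bit with probability p = e^ε/(1+e^ε) and the flipped bit otherwise, which is ε-differentially private because p/(1−p) = e^ε. The Laplace baseline is also an assumption: each client adds Lap(1/ε) noise to its own 0/1 value and the server averages the reports. Both MSE formulas assume the clients' values are drawn i.i.d. with true proportion π. All function names are hypothetical.

```python
import math
import random


def rr_keep_probability(epsilon: float) -> float:
    """Probability of reporting the true bit (assumed Warner design, epsilon-DP)."""
    return math.exp(epsilon) / (1.0 + math.exp(epsilon))


def randomize_bit(x: int, epsilon: float, rng: random.Random) -> int:
    """Client side: report the true bit with probability p, the flipped bit otherwise."""
    p = rr_keep_probability(epsilon)
    return x if rng.random() < p else 1 - x


def estimate_proportion(reports: list[int], epsilon: float) -> float:
    """Server side: unbiased estimate of the true proportion of 1s."""
    p = rr_keep_probability(epsilon)
    lam_hat = sum(reports) / len(reports)  # observed proportion of reported 1s
    # E[lam_hat] = (1 - p) + (2p - 1) * pi, so invert that affine map.
    return (lam_hat - (1.0 - p)) / (2.0 * p - 1.0)


def mse_rr(pi: float, n: int, epsilon: float) -> float:
    """MSE (= variance, the estimator is unbiased) of the RR proportion estimate,
    assuming values are drawn i.i.d. with proportion pi."""
    p = rr_keep_probability(epsilon)
    lam = p * pi + (1.0 - p) * (1.0 - pi)  # probability that a client reports 1
    return lam * (1.0 - lam) / (n * (2.0 * p - 1.0) ** 2)


def mse_local_laplace(pi: float, n: int, epsilon: float) -> float:
    """MSE of averaging x_i + Lap(1/epsilon) over n clients (assumed baseline).
    A Laplace variable with scale b has variance 2 * b**2."""
    b = 1.0 / epsilon  # Laplace scale for a 0/1 value (sensitivity 1)
    return (pi * (1.0 - pi) + 2.0 * b * b) / n


if __name__ == "__main__":
    # Compare the analytic MSEs for a few privacy budgets (illustrative pi and n).
    for eps in (0.5, 1.0, 2.0):
        print(eps, mse_rr(0.3, 1000, eps), mse_local_laplace(0.3, 1000, eps))
```

Under these assumptions the RR error splits into a sampling term π(1−π)/n plus a randomization term e^ε/((e^ε−1)²n) = 1/(4n·sinh²(ε/2)). Because sinh x ≥ x, that term is at most 1/(nε²), while the assumed local Laplace baseline contributes 2/(nε²). This is consistent with the claimed advantage for this simple case, though it is not the paper's own derivation.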
This paper studies how to enforce differential privacy by using randomized response in the data collection scenario. Given a client's value, the randomized algorithm executed by the client reports a perturbed value to the untrusted server. The use of randomized response in surveys enables easy estimation of accurate population statistics while preserving the privacy of individual respondents. We compare randomized response with the standard Laplace mechanism, which adds Laplace noise independently of the query output. Our research starts from the simple case of a single binary attribute and extends to the general case of multiple polychotomous attributes. We measure utility preservation in terms of the mean squared error of the estimate for various calculations, including individual value estimates, proportion estimates, and various derived statistics. Based on randomized response theory, we derive explicit formulas for the mean squared error of various derived statistics and prove that randomized response outperforms the Laplace mechanism. We evaluate our algorithms on the YesiWell database, which includes sensitive biomarker data and social network relationships of patients. Empirical evaluation results show the effectiveness of our proposed techniques. In particular, using randomized response to collect data incurs less utility loss than output perturbation when the sensitivity of the functions is high.
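The extension from a binary attribute to a polychotomous one can be sketched with a common k-ary ("generalized") randomized response design. This is an assumption, not necessarily the design matrix used in the paper: the client keeps its true category with probability p = e^ε/(e^ε+k−1) and otherwise reports one of the other k−1 categories uniformly, so every other category has probability q = 1/(e^ε+k−1) and p/q = e^ε. In general the server would estimate the true distribution by inverting the design matrix P (π̂ = P⁻¹λ̂); for this uniform design the inversion reduces to a per-category affine correction. Function names are hypothetical.

```python
import math
import random
from collections import Counter


def grr_probabilities(epsilon: float, k: int) -> tuple[float, float]:
    """Return (p, q): probability of reporting the true category, and each other one."""
    denom = math.exp(epsilon) + k - 1
    return math.exp(epsilon) / denom, 1.0 / denom


def randomize_category(x: int, k: int, epsilon: float, rng: random.Random) -> int:
    """Client side: keep x with probability p, else report another category uniformly."""
    p, _ = grr_probabilities(epsilon, k)
    if rng.random() < p:
        return x
    other = rng.randrange(k - 1)  # uniform over the k-1 categories different from x
    return other if other < x else other + 1


def estimate_distribution(reports: list[int], k: int, epsilon: float) -> list[float]:
    """Server side: unbiased estimates of the true category proportions.

    Under this design lambda_j = q + (p - q) * pi_j, so inverting the k x k
    design matrix amounts to (lambda_hat_j - q) / (p - q) for each category.
    Individual estimates may fall outside [0, 1] for small samples.
    """
    p, q = grr_probabilities(epsilon, k)
    n = len(reports)
    counts = Counter(reports)  # missing categories count as 0
    return [(counts[j] / n - q) / (p - q) for j in range(k)]


def estimate_variances(pi: list[float], n: int, epsilon: float) -> list[float]:
    """Variance (= MSE) of each estimate when true values are drawn i.i.d. from pi."""
    k = len(pi)
    p, q = grr_probabilities(epsilon, k)
    lam = [q + (p - q) * pj for pj in pi]  # probability of observing category j
    return [lj * (1.0 - lj) / (n * (p - q) ** 2) for lj in lam]
```

With k = 2 the design reduces to Warner's binary scheme (q = 1 − p), and because p + (k−1)q = 1, the estimated proportions always sum to 1.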