Privacy-Preserving Distributed Linear Regression on High-Dimensional Data

TLDR

The paper proposes a hybrid multi-party computation protocol that combines Yao’s garbled circuits with tailored inner-product protocols to compute linear regression models on vertically distributed data while preserving privacy. Because building a linear regression model involves solving a system of linear equations, the authors evaluate and compare several techniques for solving that system securely, including a new Conjugate Gradient Descent (CGD) algorithm that uses an efficient fixed-point representation while keeping accuracy and convergence rates comparable to a classical floating-point solution. The method improves on Nikolaenko et al.’s privacy-preserving ridge regression (S&P 2013) and solves problems with one million records and one hundred features in less than one hour of total running time.
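To illustrate why inner products matter in this setting, here is a minimal plaintext sketch; it is not the paper's protocol. When the feature columns are split between parties, each entry of the matrix X^T X in the linear system is an inner product of two columns. Entries whose columns belong to the same party can be computed locally, while the remaining entries need a cross-party inner product. The two-party split and the names `dot` and `gram_entries` are assumptions made for this example.

```python
def dot(u, v):
    """Plain inner product of two equal-length columns."""
    return sum(a * b for a, b in zip(u, v))

def gram_entries(cols_a, cols_b):
    """Build X^T X from feature columns held by hypothetical parties A and B.

    Returns the full Gram matrix and the index pairs whose two columns are
    held by different parties. Everything here is computed in the clear; in a
    private protocol those cross-party entries are the ones that would need a
    secure inner-product computation.
    """
    owner = ["A"] * len(cols_a) + ["B"] * len(cols_b)
    cols = list(cols_a) + list(cols_b)
    d = len(cols)
    gram = [[dot(cols[i], cols[j]) for j in range(d)] for i in range(d)]
    cross = [(i, j) for i in range(d) for j in range(d) if owner[i] != owner[j]]
    return gram, cross

# Three records, one feature held by A and two held by B (toy data).
gram, cross = gram_entries([[1, 2, 3]], [[4, 5, 6], [1, 0, 1]])
```

In this toy split, only the entries that pair A's column with one of B's columns appear in `cross`; the diagonal block for each party stays local.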

Abstract

We propose privacy-preserving protocols for computing linear regression models, in the setting where the training dataset is vertically distributed among several parties. Our main contribution is a hybrid multi-party computation protocol that combines Yao’s garbled circuits with tailored protocols for computing inner products. Like many machine learning tasks, building a linear regression model involves solving a system of linear equations. We conduct a comprehensive evaluation and comparison of different techniques for securely performing this task, including a new Conjugate Gradient Descent (CGD) algorithm. This algorithm is suitable for secure computation because it uses an efficient fixed-point representation of real numbers while maintaining accuracy and convergence rates comparable to what can be obtained with a classical solution using floating point numbers. Our technique improves on Nikolaenko et al.’s method for privacy-preserving ridge regression (S&P 2013), and can be used as a building block in other analyses. We implement a complete system and demonstrate that our approach is highly scalable, solving data analysis problems with one million records and one hundred features in less than one hour of total running time.
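The idea of running conjugate gradient on fixed-point numbers can be sketched as follows. This is the textbook conjugate gradient iteration with every real number stored as a scaled integer; it is not the paper's new CGD variant and involves no secure computation. The number of fractional bits `F`, the helper names, and the tiny 2x2 system are all assumptions chosen for illustration.

```python
F = 32  # assumed number of fractional bits in the fixed-point encoding

def to_fx(v):
    """Encode a real number as an integer scaled by 2**F."""
    return int(round(v * (1 << F)))

def from_fx(a):
    """Decode a fixed-point integer back to a float."""
    return a / (1 << F)

def fx_mul(a, b):
    """Fixed-point product: rescale after the integer multiplication."""
    return (a * b) >> F

def fx_div(a, b):
    """Fixed-point quotient: pre-scale the numerator before dividing."""
    return (a << F) // b

def fx_dot(u, v):
    return sum(fx_mul(a, b) for a, b in zip(u, v))

def fx_matvec(A, v):
    return [fx_dot(row, v) for row in A]

def cg_fixed_point(A, b, iters):
    """Textbook CG for a symmetric positive definite system A x = b,
    using only integer arithmetic on fixed-point encodings."""
    A_fx = [[to_fx(a) for a in row] for row in A]
    r = [to_fx(v) for v in b]   # residual, since the initial guess is x = 0
    x = [0] * len(b)
    p = list(r)                 # first search direction
    rs = fx_dot(r, r)
    for _ in range(iters):
        Ap = fx_matvec(A_fx, p)
        pAp = fx_dot(p, Ap)
        if rs == 0 or pAp == 0:
            break               # residual vanished at this precision
        alpha = fx_div(rs, pAp)
        x = [xi + fx_mul(alpha, pi) for xi, pi in zip(x, p)]
        r = [ri - fx_mul(alpha, api) for ri, api in zip(r, Ap)]
        rs_new = fx_dot(r, r)
        beta = fx_div(rs_new, rs)
        p = [ri + fx_mul(beta, pi) for ri, pi in zip(r, p)]
        rs = rs_new
    return [from_fx(xi) for xi in x]

# Toy system whose exact solution is [1/11, 7/11].
x = cg_fixed_point([[4.0, 1.0], [1.0, 3.0]], [1.0, 2.0], iters=10)
```

On this small, well-conditioned system the fixed-point result agrees closely with the exact solution. Larger or ill-conditioned systems are more sensitive to the choice of `F`, which is the kind of accuracy and convergence question the abstract refers to.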
