Privacy-preserving <i>k</i>-means clustering over vertically partitioned data

TLDR

Privacy concerns can prevent data sharing, yet distributed knowledge discovery can yield valid results while guarding against data disclosure. The study proposes a k-means clustering method for vertically partitioned data, where each site holds different attributes of the same entities. Each site learns the cluster assignment of each entity but learns nothing about the attributes held by other sites.

Abstract

Privacy and security concerns can prevent sharing of data, derailing data mining projects. Distributed knowledge discovery, if done correctly, can alleviate this problem. The key is to obtain valid results, while providing guarantees on the (non)disclosure of data. We present a method for k-means clustering when different sites contain different attributes for a common set of entities. Each site learns the cluster of each entity, but learns nothing about the attributes at other sites.
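
To make the vertically partitioned setting concrete, the following is a minimal sketch of how k-means can be split across sites that hold different attributes. It relies only on the fact that squared Euclidean distance is a sum over attributes, so each site can compute its own share of every entity-to-centroid distance. It is not the secure protocol of the paper: the partial distances are summed in the clear here, whereas a privacy-preserving method would have to combine them and find the minimum without revealing them. All function names, the two example sites, and the data values are hypothetical.

```python
import numpy as np


def local_sq_distances(local_attrs, local_centroids):
    """Partial squared distances from one site's attributes only.

    local_attrs:     (n_entities, d_site) attributes held by this site
    local_centroids: (k, d_site) this site's share of each centroid
    returns:         (n_entities, k) partial squared distances
    """
    diff = local_attrs[:, None, :] - local_centroids[None, :, :]
    return (diff ** 2).sum(axis=2)


def assign_clusters(site_attrs, site_centroids):
    """Assign each entity to its nearest centroid.

    ASSUMPTION: the per-site partial distances are added in the clear.
    A privacy-preserving protocol would replace this sum and the argmin
    with a secure computation.
    """
    total = sum(
        local_sq_distances(attrs, cents)
        for attrs, cents in zip(site_attrs, site_centroids)
    )
    return total.argmin(axis=1)


def update_local_centroids(local_attrs, labels, k):
    """Each site recomputes its own centroid coordinates from the shared labels.

    Assumes every cluster has at least one member.
    """
    return np.stack([local_attrs[labels == j].mean(axis=0) for j in range(k)])


# Hypothetical data: site A holds two attributes, site B holds one,
# both describing the same four entities in the same order.
site_a = np.array([[0.0, 0.1], [0.2, 0.0], [5.0, 5.1], [5.2, 4.9]])
site_b = np.array([[1.0], [1.1], [9.0], [9.2]])
cent_a = np.array([[0.0, 0.0], [5.0, 5.0]])
cent_b = np.array([[1.0], [9.0]])

labels = assign_clusters([site_a, site_b], [cent_a, cent_b])
new_cent_a = update_local_centroids(site_a, labels, k=2)
new_cent_b = update_local_centroids(site_b, labels, k=2)
```

In this sketch the only values that cross site boundaries are the partial distances and the resulting labels; the raw attribute columns and each site's centroid coordinates stay local. Hiding the partial distances as well is the part a real privacy-preserving protocol must add.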
