
TLDR

Data mining extracts knowledge from databases, but data is now often distributed across several parties, each of which wants to keep its own records private. This makes privacy-preserving techniques essential for tasks such as cluster analysis, and in particular for k-means clustering, which partitions data into meaningful groups. The paper proposes privacy-preserving protocols for k-means clustering on data that is horizontally or vertically partitioned among two or more parties. The protocols rely on secure multi-party computation and use a secure comparison protocol, a solution to the Millionaires’ Problem, as a sub-protocol.

Abstract

Extracting meaningful and valuable knowledge from databases is often done with data mining algorithms. Nowadays, databases are distributed among two or more parties for reasons such as physical and geographical restrictions and, most importantly, privacy. Related data is normally maintained by more than one organization, each of which wants to keep its individual information private. Thus, privacy-preserving techniques and protocols are designed to perform data mining in distributed environments where privacy is a major concern. Cluster analysis is a data mining technique that divides data into meaningful clusters, and it plays an important role in fields such as bioinformatics, marketing, machine learning, climate and medicine. k-means clustering is a prominent algorithm in this category that creates a one-level clustering of the data. In this paper we introduce privacy-preserving protocols for this algorithm, along with a protocol for secure comparison, known as the Millionaires’ Problem, as a sub-protocol, to handle the clustering of horizontally or vertically partitioned data among two or more parties.
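
To make the horizontally partitioned setting concrete, the following is a minimal Python sketch of one way the k-means computation can be split across parties. It is not the paper's protocol and provides no privacy. It only shows which values each party could compute locally (per-cluster sums and counts over its own rows) and which step combines them. All function names (`nearest`, `local_stats`, `update_centroids`, `kmeans_horizontal`) are hypothetical. In a privacy-preserving version, the aggregation step, and in the vertically partitioned case the comparison of distances whose parts are held by different parties, would presumably be replaced by secure sub-protocols such as secure comparison.

```python
def nearest(point, centroids):
    """Return the index of the centroid closest to point (squared Euclidean distance)."""
    dists = [sum((p - c) ** 2 for p, c in zip(point, cen)) for cen in centroids]
    return dists.index(min(dists))


def local_stats(points, centroids):
    """Per-cluster coordinate sums and counts, computed by one party on its own rows."""
    k, dim = len(centroids), len(centroids[0])
    sums = [[0.0] * dim for _ in range(k)]
    counts = [0] * k
    for pt in points:
        j = nearest(pt, centroids)
        counts[j] += 1
        for d in range(dim):
            sums[j][d] += pt[d]
    return sums, counts


def update_centroids(all_stats, centroids):
    """Combine every party's (sums, counts) into new centroids.

    Assumption: done in the clear here. A private protocol would run this
    aggregation under secure computation instead of sharing the raw values.
    """
    k, dim = len(centroids), len(centroids[0])
    new_centroids = []
    for j in range(k):
        total = sum(counts[j] for _, counts in all_stats)
        if total == 0:
            # Empty cluster: keep the previous centroid unchanged.
            new_centroids.append(list(centroids[j]))
            continue
        new_centroids.append(
            [sum(sums[j][d] for sums, _ in all_stats) / total for d in range(dim)]
        )
    return new_centroids


def kmeans_horizontal(parties, centroids, iterations=10):
    """Run a fixed number of k-means iterations over row-partitioned data."""
    for _ in range(iterations):
        stats = [local_stats(rows, centroids) for rows in parties]
        centroids = update_centroids(stats, centroids)
    return centroids
```

For example, with two parties holding `[(0, 0), (0, 2)]` and `[(10, 10), (10, 12)]` and initial centroids `[(0, 0), (10, 10)]`, each party contributes only its per-cluster sums and counts, and the combined centroids settle at the means of the two groups.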
