Publication | Closed Access
Privacy-Preserving Data Classification and Similarity Evaluation for Distributed Systems
Citations: 22
References: 29
Year: 2016
Unknown Venue
Privacy Protection, Engineering, Machine Learning, Information Security, Data Science, Data Mining, Privacy System, Data Integration, Data Management, Data Mining Technique, Similarity Evaluation, Privacy Concerns, Privacy Service, Knowledge Discovery, Data Privacy, Computer Science, Differential Privacy, Privacy, Data Security, Cryptography, Privacy Preservation, Data Classification, Big Data
Data classification is a widely used data mining technique for big data analysis. By training on massive data collected from the real world, data classification helps learners discover hidden data patterns. Beyond training, given a model trained on collected data, a user can determine whether a new incoming data item belongs to an existing class, or multiple distributed entities may collaborate to test the similarity of their trained results. However, due to data locality and privacy concerns, it is infeasible for entities in large-scale distributed systems to share their individual datasets with each other for data similarity checks. On the one hand, the trained model is an entity's private asset and may leak private information, so it should be well protected from all other non-collaborative entities. On the other hand, the new incoming data may contain sensitive information that cannot be disclosed directly for classification. To address these privacy issues, we propose a privacy-preserving data classification and similarity evaluation scheme for distributed systems. With our scheme, neither newly arriving data nor trained models are directly revealed during the classification and similarity evaluation procedures. The proposed scheme can be applied to many fields that use data classification and evaluation. Based on extensive real-world experiments, we have also evaluated the privacy preservation, feasibility, and efficiency of the proposed scheme.