SocInf: Membership Inference Attacks on Social Media Health Data With Machine Learning

Abstract

Social media networks have grown rapidly, generating massive amounts of social data that can reveal users' behavioral or emotional tendencies. Many social researchers use machine learning to build social media analytic models that effectively detect abnormal behaviors or mental illnesses from these data. Although researchers generally publish only the prediction interfaces of their machine learning models, these interfaces may leak information about the individual data records on which the models were trained. Knowing that a certain user's social media record was used to train a model can breach that user's privacy. In this paper, we present SocInf and focus on the fundamental problem known as membership inference. The key idea of SocInf is to construct a mimic model whose prediction behavior is similar to that of the public model, and then to expose the differences between its predictions on the training and testing data sets. By analyzing the predictions of the mimic model, SocInf can infer whether a given record is in the victim model's training set. We empirically evaluate the attack performance of SocInf on machine learning models trained with XGBoost, logistic regression, and an online cloud platform. Using realistic data, the experimental results show that SocInf achieves an inference accuracy and precision of 73% and 84%, respectively, on average, and of 83% and 91% at best.
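To make the mimic-model idea concrete, the following is a minimal sketch of one common way such an attack can be set up. It is not the SocInf implementation: the abstract does not specify the analysis applied to the mimic model's predictions, so the sketch assumes a simple cut-off on the top predicted class probability. All function names (`fit_threshold`, `calibrate_on_mimic`, `infer_membership`) and the callable-based model interface are hypothetical.

```python
from typing import Callable, List, Sequence

Record = Sequence[float]
# Assumed interface: a model is any callable returning class probabilities.
PredictFn = Callable[[Record], List[float]]


def fit_threshold(member_conf: List[float], nonmember_conf: List[float]) -> float:
    """Pick the confidence cut-off that best separates members from non-members.

    Ties are resolved in favor of the lowest candidate threshold.
    """
    candidates = sorted(set(member_conf + nonmember_conf))
    total = len(member_conf) + len(nonmember_conf)
    best_t, best_acc = candidates[0], -1.0
    for t in candidates:
        # Members should score at or above t, non-members below it.
        correct = sum(c >= t for c in member_conf) + sum(c < t for c in nonmember_conf)
        acc = correct / total
        if acc > best_acc:
            best_t, best_acc = t, acc
    return best_t


def calibrate_on_mimic(
    mimic_predict: PredictFn,
    mimic_train: List[Record],
    mimic_test: List[Record],
) -> float:
    """Learn the cut-off from a mimic model whose training set is known to the attacker."""
    member_conf = [max(mimic_predict(r)) for r in mimic_train]
    nonmember_conf = [max(mimic_predict(r)) for r in mimic_test]
    return fit_threshold(member_conf, nonmember_conf)


def infer_membership(victim_predict: PredictFn, record: Record, threshold: float) -> bool:
    """Guess that a record was in the victim's training set if its top confidence is high."""
    return max(victim_predict(record)) >= threshold
```

The sketch relies on the assumption that a model tends to be more confident on records it was trained on; because the attacker controls the mimic model's training set, the gap between its training and testing confidences can be measured directly and then transferred to the public model's prediction interface.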
