Publication | Open Access
Disease Prediction by Machine Learning Over Big Data From Healthcare Communities
2017 | 1.2K Citations | 28 References
Engineering, Machine Learning, Machine Learning Algorithms, Big Data Analytics, Computational Epidemiology, Disease Classification, Data Science, Data Mining, AI Healthcare, Public Health, Healthcare Big Data, Health Care Analytics, Healthcare Communities, Predictive Analytics, Knowledge Discovery, Big Data Growth, Deep Learning, Epidemiology, Disease Prediction, Health Informatics, Big Data
Big-data growth in biomedical and healthcare communities promises early disease detection and improved patient care, yet incomplete medical data and region-specific disease characteristics limit prediction accuracy, and, to the authors' knowledge, no prior work had used both structured and unstructured data for this purpose. The study develops a multimodal CNN-based disease risk prediction algorithm that combines structured and unstructured hospital data to predict chronic disease outbreaks in disease-frequent communities. The authors evaluated the model on 2013–2015 hospital data from central China, using a latent factor model to reconstruct missing entries, and applied it to cerebral infarction risk prediction. The proposed multimodal CNN reached 94.8% prediction accuracy, outperforming several typical prediction algorithms, and converged faster than the CNN-based unimodal algorithm.
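The abstract says only that a latent factor model reconstructs the missing entries. As an illustration, the sketch below assumes a plain matrix-factorization formulation (a generic technique, not necessarily the authors' exact model): a patients × features matrix with gaps is approximated by the product of two low-rank factor matrices fitted by stochastic gradient descent on the observed cells only, and the gaps are filled from that product. The function name `latent_factor_impute` and the defaults for `k`, `lr`, `reg`, and `epochs` are hypothetical.

```python
import random


def latent_factor_impute(matrix, k=2, lr=0.01, reg=0.02, epochs=2000, seed=0):
    """Fill None cells of a rows x cols matrix using a low-rank factorization.

    Hypothetical sketch: matrix ~= P @ Q^T, with P (rows x k) and Q (cols x k)
    fitted by SGD with L2 regularization on observed cells only.
    """
    rng = random.Random(seed)
    n_rows, n_cols = len(matrix), len(matrix[0])
    P = [[rng.uniform(-0.1, 0.1) for _ in range(k)] for _ in range(n_rows)]
    Q = [[rng.uniform(-0.1, 0.1) for _ in range(k)] for _ in range(n_cols)]
    observed = [(i, j, v) for i, row in enumerate(matrix)
                for j, v in enumerate(row) if v is not None]

    def predict(i, j):
        return sum(P[i][f] * Q[j][f] for f in range(k))

    for _ in range(epochs):
        rng.shuffle(observed)
        for i, j, v in observed:
            err = v - predict(i, j)
            for f in range(k):
                p, q = P[i][f], Q[j][f]
                # SGD step on 0.5*(v - p.q)^2 + 0.5*reg*(|p|^2 + |q|^2)
                P[i][f] += lr * (err * q - reg * p)
                Q[j][f] += lr * (err * p - reg * q)

    # keep observed values as-is; only missing cells take the reconstruction
    return [[v if v is not None else predict(i, j) for j, v in enumerate(row)]
            for i, row in enumerate(matrix)]


if __name__ == "__main__":
    # toy patients x lab-features matrix; None marks a missing measurement
    data = [[1.0, 2.0, 3.0], [2.0, 4.0, 6.0], [3.0, 6.0, None]]
    print(latent_factor_impute(data, k=1)[2][2])
```

The rank `k` controls how much structure is shared across patients; the L2 term keeps the factors small so that sparse rows are not overfitted.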
With big data growth in biomedical and healthcare communities, accurate analysis of medical data benefits early disease detection, patient care, and community services. However, analysis accuracy is reduced when medical data are incomplete. Moreover, different regions exhibit unique characteristics of certain regional diseases, which may weaken the prediction of disease outbreaks. In this paper, we streamline machine learning algorithms for effective prediction of chronic disease outbreaks in disease-frequent communities. We evaluate the modified prediction models on real-life hospital data collected from central China in 2013–2015. To overcome the difficulty of incomplete data, we use a latent factor model to reconstruct the missing data. We experiment on cerebral infarction, a regional chronic disease. We propose a new convolutional neural network (CNN)-based multimodal disease risk prediction algorithm using structured and unstructured data from the hospital. To the best of our knowledge, no existing work in medical big data analytics has focused on both data types. Compared with several typical prediction algorithms, our proposed algorithm reaches a prediction accuracy of 94.8% and converges faster than the CNN-based unimodal disease risk prediction algorithm.
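The abstract does not describe the network's layers. One common way to combine the two data types, assumed here purely for illustration, is late fusion: a text-CNN branch slides filters over word embeddings of the unstructured notes, applies ReLU and max-over-time pooling, and the pooled features are concatenated with the structured features before a logistic output that yields a risk probability. The helpers `make_filters`, `conv1d_max_pool`, and `multimodal_risk` are hypothetical, the weights are random, and training (backpropagation) is omitted; this is a forward-pass sketch, not the authors' implementation.

```python
import math
import random


def sigmoid(z):
    # numerically stable logistic function
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def make_filters(n_filters, width, dim, rng):
    """Hypothetical random filters: each is (width x dim weights, bias)."""
    return [([[rng.gauss(0.0, 0.1) for _ in range(dim)] for _ in range(width)], 0.0)
            for _ in range(n_filters)]


def conv1d_max_pool(embeddings, filters, width):
    """Text branch: 1-D convolution over token embeddings, ReLU, then
    max-over-time pooling, giving one feature per filter."""
    features = []
    for weights, bias in filters:
        best = 0.0  # ReLU outputs are >= 0, so 0.0 is a safe initial max
        for start in range(len(embeddings) - width + 1):
            window = embeddings[start:start + width]
            z = bias + sum(w * x for w_row, x_row in zip(weights, window)
                           for w, x in zip(w_row, x_row))
            best = max(best, max(z, 0.0))  # ReLU, then running max over windows
        features.append(best)
    return features


def multimodal_risk(structured, embeddings, filters, width, out_weights, out_bias):
    """Late fusion: concatenate structured features with pooled text features,
    then a single logistic output unit gives a risk probability."""
    fused = list(structured) + conv1d_max_pool(embeddings, filters, width)
    if len(out_weights) != len(fused):
        raise ValueError("out_weights must match the fused feature length")
    z = out_bias + sum(w * x for w, x in zip(out_weights, fused))
    return sigmoid(z)


if __name__ == "__main__":
    rng = random.Random(42)
    dim, width, n_filters = 4, 2, 3
    structured = [0.7, 0.2, 1.0]  # hypothetical normalized lab values
    embeddings = [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(5)]  # 5 tokens
    filters = make_filters(n_filters, width, dim, rng)
    out_w = [rng.gauss(0.0, 0.1) for _ in range(len(structured) + n_filters)]
    print(multimodal_risk(structured, embeddings, filters, width, out_w, 0.0))
```

Dropping the structured branch (passing an empty `structured` list and an `out_weights` of length `n_filters`) turns the same code into a text-only model, loosely analogous to the unimodal baseline the abstract mentions.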