Publication | Closed Access
Missing value imputation methods for TCM medical data and its effect in the classifier accuracy
Citations: 20
References: 15
Year: 2017
Unknown Venue
Engineering, Diagnosis, Disease Classification, Value Imputation Methods, TCM Medical Data, Mean Imputation, Data Science, Data Mining, Decision Tree Learning, Biostatistics, Public Health, Data Pre-processing, Statistics, Predictive Analytics, Knowledge Discovery, Classifier Accuracy, Medical Data Mining, Data Treatment, Medical Data, Health Informatics
Objective: Medical data mining is an active research area, but medical data often contain missing values, which complicates analysis. This work evaluates the performance of several imputation methods. Methods: We first simulate an incomplete data set by deleting some values from a complete data set. We then estimate the deleted values with three filling methods (KNN based on Euclidean distance, KNN based on the correlation coefficient, and mean imputation) and compare the accuracy of their estimates. Next, we apply the same filling methods to clinical data that contain missing values to obtain complete data, and build models to predict patient disease using the random forest algorithm and the classification and regression trees (CART) algorithm. By comparing observed values with predicted values, we examine the effect of each filling method on prediction accuracy. Results: The accuracy of the three methods is compared under different missing rates. In the filling experiment, KNN based on the Pearson correlation coefficient performs clearly better than KNN based on the Euclidean metric and mean imputation. In the prediction models, the three filling methods rank in the same order as in the filling experiment, although the differences between them are small.
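The filling experiment described in the Methods can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the synthetic data, the completely-at-random deletion, the unweighted donor average, the minimum of three shared features, and all function names (`mask_completely_at_random`, `mean_impute`, `knn_impute`, `rmse`) are assumptions made for the example. The abstract does not specify how neighbours are combined or which error measure is used; RMSE on the deleted entries is assumed here.

```python
import numpy as np

def mask_completely_at_random(X, rate, rng):
    """Return a copy of X with roughly `rate` of its entries set to NaN, plus the mask."""
    X_missing = X.astype(float).copy()
    mask = rng.random(X.shape) < rate
    X_missing[mask] = np.nan
    return X_missing, mask

def mean_impute(X_missing):
    """Fill each missing entry with the mean of the observed values in its column."""
    filled = X_missing.copy()
    col_means = np.nanmean(X_missing, axis=0)
    rows, cols = np.where(np.isnan(filled))
    filled[rows, cols] = col_means[cols]
    return filled

def knn_impute(X_missing, k=5, metric="euclidean"):
    """Fill missing entries with the average of the k most similar rows.

    Similarity is computed on the features both rows have observed:
    negative root-mean-square distance ("euclidean") or the Pearson
    correlation coefficient ("correlation"). Assumed details, not from the paper.
    """
    filled = X_missing.copy()
    observed = ~np.isnan(X_missing)
    col_means = np.nanmean(X_missing, axis=0)
    n = X_missing.shape[0]
    for i in range(n):
        miss_cols = np.where(~observed[i])[0]
        if miss_cols.size == 0:
            continue
        scores = np.full(n, -np.inf)  # -inf marks rows that cannot be compared
        for j in range(n):
            common = observed[i] & observed[j]
            if j == i or common.sum() < 3:
                continue
            a, b = X_missing[i, common], X_missing[j, common]
            if metric == "euclidean":
                scores[j] = -np.sqrt(np.mean((a - b) ** 2))
            elif a.std() > 0 and b.std() > 0:
                scores[j] = np.corrcoef(a, b)[0, 1]
        order = np.argsort(scores)[::-1]  # most similar rows first
        for c in miss_cols:
            donors = [j for j in order if np.isfinite(scores[j]) and observed[j, c]][:k]
            # Fall back to the column mean when no comparable donor has this feature.
            filled[i, c] = X_missing[donors, c].mean() if donors else col_means[c]
    return filled

def rmse(filled, X_true, mask):
    """Root-mean-square error measured only on the entries that were deleted."""
    return float(np.sqrt(np.mean((filled[mask] - X_true[mask]) ** 2)))

# Synthetic stand-in for a complete data set (the clinical data are not available here).
rng = np.random.default_rng(0)
X = rng.normal(size=(60, 1)) + 0.3 * rng.normal(size=(60, 8))

for rate in (0.05, 0.10, 0.20):
    X_missing, mask = mask_completely_at_random(X, rate, rng)
    results = {
        "mean": rmse(mean_impute(X_missing), X, mask),
        "knn_euclidean": rmse(knn_impute(X_missing, metric="euclidean"), X, mask),
        "knn_correlation": rmse(knn_impute(X_missing, metric="correlation"), X, mask),
    }
    print(rate, results)
```

The ranking of the three methods on synthetic data like this need not match the ranking reported in the abstract, since it depends on the structure of the data and on how neighbours are combined.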