Publication | Closed Access
A Novel Algorithm for Missing Data Imputation on Machine Learning
Citations: 26
References: 19
Year: 2019
Artificial Intelligence, KNN Imputation, Engineering, Machine Learning, Machine Learning Tool, Imputation Methods, Ensemble Algorithm, Data Science, Data Mining, Management, Biostatistics, Big Data, Data Pre-processing, Statistics, Missing Data Imputation, Iterative Imputation, Computational Learning Theory, Predictive Analytics, Knowledge Discovery, Data Classification, Data Treatment, Health Informatics, Data Modeling
Missing data values play a significant role in medical research: their presence adversely affects machine learning and AI models, which leads to wrong insights for decision-making. Over the past few decades, researchers have developed various imputation approaches and applied them to real-world applications. Imputation methods also help build effective models that discover hidden patterns in medical applications and can provide insightful outcomes for better decision-making. In this paper, a new approach is proposed that imputes missing data values for continuous attributes in medical datasets using XGBoost (eXtreme Gradient Boosting), an ensemble learning method. The proposed methods are continuous-type attribute imputations for continuous and discrete data attributes. In this approach, each missing attribute value is imputed by predicting it from the non-missing data attributes. The experiments are conducted on benchmark medical datasets with missing-value rates ranging from 1.98% to 50.65%, and the results are compared with iterative imputation, KNN imputation, and missForest imputation. In our study, we observe that the proposed missXGBoost successfully handles missing values in continuous attributes and outperforms the other imputation methods.
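The core idea in the abstract, predicting each missing attribute value from the non-missing attributes with a boosted model, can be sketched as follows. This is an illustrative sketch and not the paper's implementation: the abstract does not give the procedure's details, so the mean pre-fill of predictor columns, the single pass over columns, and the names `fit_stump`, `fit_boosted`, `impute`, `n_rounds`, and `lr` are all assumptions. To keep the example dependency-free, a small gradient-boosted regression-stump model written with the standard library stands in for XGBoost; in practice a regressor such as `xgboost.XGBRegressor` would take its place. The toy data is invented for the demonstration.

```python
def fit_stump(X, residuals):
    """Return (feature, threshold, left_value, right_value) minimising squared error."""
    mean = sum(residuals) / len(residuals)
    best_sse, best = float("inf"), (0, float("inf"), mean, mean)
    for j in range(len(X[0])):
        values = sorted({row[j] for row in X})
        for lo, hi in zip(values, values[1:]):
            thr = (lo + hi) / 2.0
            left = [r for row, r in zip(X, residuals) if row[j] <= thr]
            right = [r for row, r in zip(X, residuals) if row[j] > thr]
            if not left or not right:
                continue
            lm, rm = sum(left) / len(left), sum(right) / len(right)
            sse = sum((r - lm) ** 2 for r in left) + sum((r - rm) ** 2 for r in right)
            if sse < best_sse:
                best_sse, best = sse, (j, thr, lm, rm)
    return best


def fit_boosted(X, y, n_rounds=200, lr=0.5):
    """Gradient boosting with stumps for squared loss (stand-in for XGBoost)."""
    base = sum(y) / len(y)
    pred = [base] * len(y)
    stumps = []
    for _ in range(n_rounds):
        residuals = [t - p for t, p in zip(y, pred)]
        j, thr, lm, rm = fit_stump(X, residuals)
        stumps.append((j, thr, lm, rm))
        pred = [p + lr * (lm if row[j] <= thr else rm) for row, p in zip(X, pred)]

    def predict(row):
        return base + lr * sum(lm if row[j] <= thr else rm for j, thr, lm, rm in stumps)

    return predict


def impute(rows, n_rounds=200, lr=0.5):
    """Fill None entries column by column; expects at least two numeric columns."""
    n_cols = len(rows[0])
    # Assumption: predictor columns are pre-filled with column means so that
    # every training row has a complete feature vector.
    means = []
    for j in range(n_cols):
        observed = [r[j] for r in rows if r[j] is not None]
        means.append(sum(observed) / len(observed))
    filled = [[means[j] if v is None else v for j, v in enumerate(r)] for r in rows]
    result = [list(r) for r in filled]

    def others(i, j):
        # Feature vector for row i: every column except the target column j.
        return filled[i][:j] + filled[i][j + 1:]

    for j in range(n_cols):
        missing = [i for i, r in enumerate(rows) if r[j] is None]
        if not missing:
            continue
        known = [i for i, r in enumerate(rows) if r[j] is not None]
        predict = fit_boosted([others(i, j) for i in known],
                              [rows[i][j] for i in known], n_rounds, lr)
        for i in missing:
            result[i][j] = predict(others(i, j))
    return result


# Toy data (invented): second column is 3 * x + 2, with three values removed.
rows = [[float(x), 3.0 * x + 2.0] for x in range(20)]
rows[5][1] = None
rows[12][1] = None
rows[16][0] = None
completed = impute(rows)
for i, j in [(5, 1), (12, 1), (16, 0)]:
    print(i, j, round(completed[i][j], 2))
```

Observed entries are copied through unchanged, and only the `None` entries are replaced by model predictions. Because stumps produce piecewise-constant predictions, the imputed values on this toy data land near a neighbouring observed value rather than on the exact linear trend; a stronger tree learner would be expected to narrow that gap.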