Publication | Closed Access
Missing Value Estimation for Mixed-Attribute Data Sets
Citations: 285
References: 31
Year: 2010
Engineering, Machine Learning, Data Science, Data Mining, Data Imputation, Predictive Analytics, Consistent Estimators, Knowledge Discovery, Management, Data Treatment, Imputation Methods, Data Analytics, Statistics, Multiset Data Analysis, Value Estimation, Data Modeling
Missing data imputation is crucial for learning from incomplete data. Many successful techniques exist for data sets with homogeneous attributes, but no estimator has been designed for mixed-attribute data sets. The authors propose two consistent estimators, one for discrete and one for continuous missing values, together with a mixture-kernel iterative estimator for mixed-attribute data. In extensive experiments against typical algorithms, the proposed approach outperforms existing imputation methods in classification accuracy and RMSE across various missing ratios.
Missing data imputation is a key issue in learning from incomplete data. Various techniques have been developed with great success for dealing with missing values in data sets with homogeneous attributes (their independent attributes are all either continuous or discrete). This paper studies a new setting of missing data imputation: imputing missing data in data sets with heterogeneous attributes (their independent attributes are of different types), referred to as imputing mixed-attribute data sets. Although many real applications are in this setting, there is no estimator designed for imputing mixed-attribute data sets. This paper first proposes two consistent estimators for discrete and continuous missing target values, respectively. A mixture-kernel-based iterative estimator is then advocated to impute mixed-attribute data sets. The proposed method is evaluated in extensive experiments against some typical algorithms, and the results demonstrate that the proposed approach is better than these existing imputation methods in terms of classification accuracy and root mean square error (RMSE) at different missing ratios.
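The abstract does not give the kernel forms or the update rule, so the following is only a minimal sketch of what a mixture-kernel iterative imputer could look like, not the authors' estimator. It assumes a product kernel (Gaussian on continuous attributes, a match/mismatch factor on discrete ones), a kernel-weighted average for continuous targets, and a kernel-weighted vote for discrete targets. All function names and the parameters `h` (bandwidth) and `lam` (mismatch penalty) are hypothetical.

```python
import numpy as np

def mixed_kernel_weights(x_cont, x_disc, Xc, Xd, h=1.0, lam=0.3):
    """Assumed product kernel over mixed attributes (not taken from the paper)."""
    # Gaussian kernel on the continuous attributes, bandwidth h
    kc = np.exp(-0.5 * np.sum(((Xc - x_cont) / h) ** 2, axis=1))
    # Each mismatching discrete attribute scales the weight by lam (0 <= lam <= 1)
    kd = lam ** np.sum(Xd != x_disc, axis=1)
    return kc * kd

def impute_continuous(x_cont, x_disc, Xc, Xd, y, **kw):
    """Kernel-weighted average of the observed continuous targets."""
    w = mixed_kernel_weights(x_cont, x_disc, Xc, Xd, **kw)
    total = w.sum()
    if total == 0.0:  # every weight underflowed: fall back to the plain mean
        return float(np.mean(y))
    return float(np.dot(w, y) / total)

def impute_discrete(x_cont, x_disc, Xc, Xd, y, **kw):
    """Kernel-weighted vote over the observed discrete targets."""
    w = mixed_kernel_weights(x_cont, x_disc, Xc, Xd, **kw)
    labels = np.unique(y)
    scores = [w[y == c].sum() for c in labels]
    return labels[int(np.argmax(scores))]

def iterative_impute(Xc, Xd, y, max_iter=20, tol=1e-6, **kw):
    """Iteratively refine missing continuous targets (NaN entries in y).

    Assumes at least one target is observed. Missing entries start at the
    observed mean and are re-estimated from all other rows until the
    largest change falls below tol.
    """
    y = np.asarray(y, dtype=float).copy()
    missing = np.isnan(y)
    y[missing] = np.mean(y[~missing])
    idx = np.arange(len(y))
    for _ in range(max_iter):
        prev = y.copy()
        for i in np.flatnonzero(missing):
            others = idx != i
            y[i] = impute_continuous(Xc[i], Xd[i], Xc[others], Xd[others],
                                     y[others], **kw)
        if np.max(np.abs(y - prev)) < tol:
            break
    return y

def rmse(y_true, y_pred):
    """Root mean square error between true and imputed values."""
    diff = np.asarray(y_true, dtype=float) - np.asarray(y_pred, dtype=float)
    return float(np.sqrt(np.mean(diff ** 2)))

# Toy data: one continuous attribute, one discrete attribute, one missing target
Xc = np.array([[0.0], [0.1], [5.0], [5.1]])
Xd = np.array([[0], [0], [1], [1]])
y = np.array([1.0, np.nan, 10.0, 10.2])
y_filled = iterative_impute(Xc, Xd, y)
```

In this toy example the missing target sits next to the first row in both attribute types, so its imputed value is dominated by that row's target. In practice `h` and `lam` would need to be tuned, for example by cross-validation on the observed entries.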