Publication | Closed Access
"Missing is useful": missing values in cost-sensitive decision trees
Citations: 193 | References: 27 | Year: 2005
Topics: Total Cost, Engineering, Machine Learning, Diagnosis, Data Science, Data Mining, Class Imbalance, Decision Tree, Management, Decision Tree Learning, Biostatistics, Decision Theory, Statistics, Supervised Learning, Predictive Analytics, Cost-sensitive Decision Trees, Knowledge Discovery, Computer Science, Data Treatment, Cost-sensitive Learning, Cost-sensitive Machine Learning, Decision Science, Health Informatics
Many real-world datasets contain missing values, and prior work usually treats them as a problem to be fixed by imputation before training and testing. In cost-sensitive learning, where both test costs and misclassification costs matter, leaving an expensive attribute unmeasured can be more cost-effective, much as a physician may skip an expensive or risky test during diagnosis. The authors compare several cost-sensitive decision tree strategies that use only known values and show that, in this setting, missing values can reduce the total cost of tests and misclassifications, so imputing them is unnecessary.
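The trade-off behind "skipping an expensive test" can be made concrete with an expected-cost comparison. The sketch below is illustrative only: it assumes a two-class problem, a hypothetical cost matrix `COST`, a hypothetical prior and test-outcome probabilities, and hypothetical helper names (`min_expected_cost`, `compare_test`). It compares the expected misclassification cost of predicting without the test against the test cost plus the expected misclassification cost after seeing the result; it is a standard expected-cost rule, not necessarily the exact criterion used in the paper.

```python
from typing import Dict, List, Tuple

Dist = Dict[str, float]

# Hypothetical misclassification costs, COST[actual][predicted].
# Correct predictions cost nothing; missing a positive case (false negative)
# is assumed to be three times as costly as a false alarm.
COST: Dict[str, Dict[str, float]] = {
    "pos": {"pos": 0.0, "neg": 600.0},
    "neg": {"pos": 200.0, "neg": 0.0},
}
CLASSES = ("pos", "neg")


def min_expected_cost(dist: Dist) -> float:
    """Expected misclassification cost of the cheapest prediction for a class distribution."""
    return min(
        sum(p * COST[actual][pred] for actual, p in dist.items())
        for pred in CLASSES
    )


def compare_test(prior: Dist, outcomes: List[Tuple[float, Dist]],
                 test_cost: float) -> Tuple[float, float]:
    """Return (expected total cost if the test is skipped,
               expected total cost if the test is performed).

    `outcomes` lists (P(outcome), class distribution given that outcome).
    """
    skip = min_expected_cost(prior)
    perform = test_cost + sum(p * min_expected_cost(d) for p, d in outcomes)
    return skip, perform


if __name__ == "__main__":
    prior = {"pos": 0.3, "neg": 0.7}
    # Hypothetical test outcomes, consistent with the prior:
    # 0.25 * 0.9 + 0.75 * 0.1 = 0.3 positives overall.
    outcomes = [(0.25, {"pos": 0.9, "neg": 0.1}),
                (0.75, {"pos": 0.1, "neg": 0.9})]
    for cost in (20.0, 120.0):
        skip, perform = compare_test(prior, outcomes, cost)
        decision = "perform test" if perform < skip else "leave value missing"
        print(f"test cost {cost:>5}: skip={skip:.1f}, perform={perform:.1f} -> {decision}")
```

With these made-up numbers, skipping the test costs 140 in expectation, while performing it costs the test cost plus 50, so the test pays off only when it costs less than 90. For the more expensive test, leaving the value missing is the cheaper choice, which is the intuition behind "missing is useful".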
Many real-world data sets for machine learning and data mining contain missing values, and much previous research regards them as a problem and attempts to impute them before training and testing. In this paper, we study this issue in cost-sensitive learning that considers both test costs and misclassification costs. If some attributes (tests) are too expensive to obtain, it would be more cost-effective to leave their values missing, similar to skipping expensive and risky tests (missing values) in patient diagnosis (classification). That is, "missing is useful": missing values actually reduce the total cost of tests and misclassifications, so it is not meaningful to impute them. We discuss and compare several strategies that utilize only known values and show that "missing is useful" for cost reduction in cost-sensitive decision tree learning.
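To illustrate what "utilizing only known values" can look like at classification time, here is a minimal sketch of one plausible strategy, assumed rather than taken from the paper: when an example reaches an internal node whose attribute value is unknown, classification stops there and predicts the label with the lowest expected misclassification cost given that node's training class counts, and test costs are charged only for attributes that were actually observed. The tree, its class counts, the attribute names `blood_test` and `scan`, the test costs, the cost matrix, and the names `Node`, `cheapest_label`, and `classify` are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional, Tuple

# Hypothetical misclassification costs, COST[actual][predicted].
COST: Dict[str, Dict[str, float]] = {
    "pos": {"pos": 0.0, "neg": 600.0},
    "neg": {"pos": 200.0, "neg": 0.0},
}


@dataclass
class Node:
    counts: Dict[str, int]             # training examples of each class reaching this node
    attribute: Optional[str] = None    # attribute tested here; None marks a leaf
    children: Dict[str, "Node"] = field(default_factory=dict)


def cheapest_label(counts: Dict[str, int]) -> str:
    """Label with the minimum expected misclassification cost for these class counts."""
    total = sum(counts.values())

    def expected_cost(pred: str) -> float:
        return sum(n / total * COST[actual][pred] for actual, n in counts.items())

    return min(COST, key=expected_cost)  # iterating COST yields the class names


def classify(node: Node, example: Dict[str, str],
             test_costs: Dict[str, float]) -> Tuple[str, float]:
    """Return (predicted label, total test cost paid) using only known values."""
    spent = 0.0
    while node.attribute is not None:
        value = example.get(node.attribute)
        if value is None:
            break  # value unknown: stop and predict from this internal node
        spent += test_costs[node.attribute]  # the test was performed, so pay for it
        if value not in node.children:
            break  # value never seen in training: also fall back to this node
        node = node.children[value]
    return cheapest_label(node.counts), spent


# Hypothetical tree: a cheap blood test, then an expensive scan on one branch.
TEST_COSTS = {"blood_test": 50.0, "scan": 300.0}
ROOT = Node({"pos": 30, "neg": 70}, "blood_test", {
    "high": Node({"pos": 27, "neg": 3}),
    "low": Node({"pos": 3, "neg": 67}, "scan", {
        "yes": Node({"pos": 3, "neg": 2}),
        "no": Node({"pos": 0, "neg": 65}),
    }),
})

if __name__ == "__main__":
    for ex in ({}, {"blood_test": "low"}, {"blood_test": "low", "scan": "yes"}):
        print(ex, classify(ROOT, ex, TEST_COSTS))
```

Under this strategy, an example whose scan result is missing pays only for the blood test and is labeled from the internal node's class counts; no imputed value is ever needed. Other strategies (for example, distributing the example over all branches) are equally possible and would give different costs.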