Publication | Closed Access
Training cost-sensitive neural networks with methods addressing the class imbalance problem
Citations: 1.3K
References: 30
Year: 2006
Artificial Intelligence, Classification Method, Engineering, Machine Learning, Data Science, Class Imbalance, Pattern Recognition, Multiple Classifier System, Predictive Analytics, Management, Computer Science, Cost-sensitive Neural Networks, Cost-sensitive Learning, Classifier System, Class Imbalance Problem, Cost-sensitive Machine Learning, Statistics, Supervised Learning
The study empirically evaluates how sampling and threshold-moving affect the training of cost-sensitive neural networks. The authors compare oversampling and undersampling, which alter the training data distribution, with threshold-moving, which shifts the decision threshold, as ways to encode example costs. They also combine these techniques in hard-voting and soft-voting ensembles. Experiments use 21 UCI data sets with three types of cost matrices, plus one real-world cost-sensitive data set. The results suggest that cost-sensitive learning is more difficult for multiclass than for two-class tasks, and that a higher degree of class imbalance may increase the difficulty. Almost all techniques are effective on two-class tasks, but most are ineffective or even harmful on multiclass tasks; threshold-moving and soft-ensemble remain relatively good choices. This suggests that some methods believed to address class imbalance may only be effective on imbalanced two-class data sets.
This paper empirically studies the effect of sampling and threshold-moving in training cost-sensitive neural networks. Both oversampling and undersampling are considered. These techniques modify the distribution of the training data so that the costs of the examples are conveyed explicitly by how often the examples appear. Threshold-moving moves the output threshold toward inexpensive classes so that examples with higher costs become harder to misclassify. Moreover, hard-ensemble and soft-ensemble, i.e., the combination of the above techniques via hard or soft voting schemes, are also tested. Twenty-one UCI data sets with three types of cost matrices and a real-world cost-sensitive data set are used in the empirical study. The results suggest that cost-sensitive learning with multiclass tasks is more difficult than with two-class tasks, and a higher degree of class imbalance may increase the difficulty. They also reveal that almost all the techniques are effective on two-class tasks, while most are ineffective and may even have a negative effect on multiclass tasks. Overall, threshold-moving and soft-ensemble are relatively good choices in training cost-sensitive neural networks. The empirical study also suggests that some methods that have been believed to be effective in addressing the class imbalance problem may, in fact, only be effective on learning with imbalanced two-class data sets.
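The idea of threshold-moving can be illustrated with a minimal Python sketch. It assumes one common formulation, in which each class output of an already trained network is scaled by the total cost of misclassifying that class and the results are renormalized; the abstract does not give the exact rule, so this should not be read as the authors' implementation. The function names, the cost matrix, and the example outputs are all hypothetical.

```python
def threshold_moving(outputs, cost):
    """Rescale class outputs by each class's total misclassification cost.

    outputs: non-negative scores, one per class (e.g. softmax outputs).
    cost:    cost[i][j] is the assumed cost of predicting class j when the
             true class is i; diagonal entries are 0.
    """
    n = len(outputs)
    # Weight each class by the summed cost of misclassifying its examples.
    weights = [sum(cost[i][j] for j in range(n) if j != i) for i in range(n)]
    scaled = [o * w for o, w in zip(outputs, weights)]
    total = sum(scaled)
    if total == 0:
        raise ValueError("all scaled outputs are zero; check outputs and costs")
    # Renormalize so the adjusted outputs sum to 1 again.
    return [s / total for s in scaled]


def predict(outputs, cost):
    """Return the index of the class with the largest cost-adjusted output."""
    adjusted = threshold_moving(outputs, cost)
    return max(range(len(adjusted)), key=adjusted.__getitem__)


# Hypothetical two-class case: misclassifying class 1 costs 5x more
# than misclassifying class 0.
cost = [[0, 1],
        [5, 0]]
outputs = [0.7, 0.3]

plain = max(range(len(outputs)), key=outputs.__getitem__)  # no cost adjustment
moved = predict(outputs, cost)                             # with threshold-moving
```

In this example the unadjusted network favors class 0, while the cost-adjusted decision switches to the expensive class 1. The training data and the network itself are left unchanged, which is what distinguishes threshold-moving from oversampling and undersampling.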