Boosting for Learning Multiple Classes with Imbalanced Class Distribution

Citations: 303

References: 18

Year: 2006

TLDR

Imbalanced class distributions degrade classifier performance. The problem has mainly been studied in binary settings, but it also arises in multi-class tasks, where solutions reported for binary problems do not apply. The authors develop a cost-sensitive boosting algorithm to improve classification of multi-class imbalanced data. Because a cost matrix is often unavailable for a problem domain, they use a genetic algorithm to search for the optimum cost setup of each class. Experiments show that the algorithm significantly improves classification performance on imbalanced data sets.

Abstract

An imbalanced class distribution significantly limits the classification performance attainable by most standard classifier learning algorithms, which assume a relatively balanced class distribution and equal misclassification costs. This learning difficulty has attracted considerable research interest. Most efforts concentrate on bi-class problems. However, the bi-class case is not the only scenario in which the class imbalance problem prevails, and solutions reported for bi-class applications are not applicable to multi-class problems. In this paper, we develop a cost-sensitive boosting algorithm to improve classification performance on imbalanced data involving multiple classes. One barrier to applying the cost-sensitive boosting algorithm to imbalanced data is that the cost matrix is often unavailable for a problem domain. To solve this problem, we apply a Genetic Algorithm to search for the optimum cost setup of each class. Empirical tests show that the proposed cost-sensitive boosting algorithm significantly improves classification performance on imbalanced data sets.
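
To make the idea of cost-sensitive boosting concrete, the following is a minimal sketch of one boosting round in which each class carries its own misclassification cost. It is an illustration under stated assumptions, not necessarily the formulation used in the paper: the function name `cost_sensitive_update`, the `class_costs` dictionary, and the toy data are hypothetical, and the update shown is one common AdaBoost-style variant in which the cost multiplies the sample weight.

```python
import math


def cost_sensitive_update(weights, labels, predictions, class_costs):
    """Run one cost-weighted boosting round and return (alpha, new_weights).

    weights     -- current sample weights (assumed to sum to 1)
    labels      -- true class label of each sample
    predictions -- label predicted by the weak learner for each sample
    class_costs -- hypothetical dict: class label -> misclassification cost
    """
    triples = list(zip(weights, labels, predictions))

    # Cost-weighted mass of correctly and incorrectly classified samples.
    correct = sum(w * class_costs[y] for w, y, p in triples if y == p)
    wrong = sum(w * class_costs[y] for w, y, p in triples if y != p)
    if wrong == 0 or correct <= wrong:
        raise ValueError("weak learner is perfect or too weak for this update")

    # Vote weight of the weak learner in the final ensemble.
    alpha = 0.5 * math.log(correct / wrong)

    # Misclassified samples gain weight, correct ones lose weight,
    # and every sample is additionally scaled by the cost of its class.
    raw = [
        class_costs[y] * w * math.exp(-alpha if y == p else alpha)
        for w, y, p in triples
    ]
    total = sum(raw)
    return alpha, [r / total for r in raw]


# Toy example (assumed data): three majority samples "a", one minority "b".
labels = ["a", "a", "a", "b"]
predictions = ["a", "a", "a", "a"]   # the weak learner misses the minority sample
weights = [0.25, 0.25, 0.25, 0.25]
class_costs = {"a": 1.0, "b": 2.0}   # minority errors are assumed to cost more

alpha, new_weights = cost_sensitive_update(weights, labels, predictions, class_costs)
```

In this sketch the misclassified minority sample ends the round with a larger weight than any majority sample, so the next weak learner is pushed toward it. The cost values are fixed by hand here; the abstract describes searching for them with a Genetic Algorithm, which this sketch does not attempt.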
