Publication | Open Access
Resampling-Based Ensemble Methods for Online Class Imbalance Learning
Citations: 385
References: 29
Year: 2014
Engineering, Machine Learning, Class Imbalance Learning, Streaming Algorithm, Ensemble Methods, Concept Drift, Data Science, Data Mining, Pattern Recognition, Class Imbalance, Resampling-based Ensemble Methods, Big Data, Multiple Classifier System, Statistics, Predictive Analytics, Knowledge Discovery, Computer Science, Data Stream Mining, Class Imbalance Online, Ensemble Algorithm
Online class‑imbalance learning combines the challenges of online learning and class‑imbalance learning: it deals with data streams whose class distributions are highly skewed, a situation common in real‑world applications such as fault diagnosis and intrusion detection. This work refines the resampling strategy of the OOB and UOB ensembles, presents the first comprehensive analysis of class imbalance in data streams (data distributions, imbalance rates, and changes in class‑imbalance status), and introduces two adaptive‑weight ensembles, WEOB1 and WEOB2. OOB and UOB, proposed in the authors' earlier work, build ensemble models that handle class imbalance in real time through resampling and time‑decayed metrics; WEOB1 and WEOB2 maintain both and combine their predictions with adaptive weights. The results show that UOB is better at recognizing minority‑class examples in static data streams, OOB is more robust to dynamic changes in class‑imbalance status, and the data distribution is a major factor in their performance. WEOB1 and WEOB2 are shown to combine the strengths of both with good accuracy and robustness.
Online class imbalance learning is a new learning problem that combines the challenges of both online learning and class imbalance learning. It deals with data streams that have very skewed class distributions. This type of problem commonly arises in real-world applications, such as fault diagnosis of real-time control monitoring systems and intrusion detection in computer networks. In our earlier work, we defined class imbalance online and proposed two learning algorithms, OOB and UOB, that build an ensemble model to overcome class imbalance in real time through resampling and time-decayed metrics. In this paper, we further improve the resampling strategy inside OOB and UOB, and examine their performance in both static and dynamic data streams. We give the first comprehensive analysis of class imbalance in data streams, in terms of data distributions, imbalance rates, and changes in class imbalance status. We find that UOB is better at recognizing minority-class examples in static data streams, and OOB is more robust against dynamic changes in class imbalance status. The data distribution is a major factor affecting their performance. Based on the insight gained, we then propose two new ensemble methods, called WEOB1 and WEOB2, that maintain both OOB and UOB and combine them with adaptive weights for final predictions. They are shown to possess the strengths of both OOB and UOB, with good accuracy and robustness.
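The abstract names resampling and time-decayed metrics but gives no formulas, so the following is only a minimal sketch of how such a scheme could look, not the authors' implementation. It assumes an exponentially decayed class-size estimate and a Poisson resampling rate derived from it, in the style of online bagging. The decay factor `0.9`, the update rule, and all names (`DecayedClassSizes`, `oversampling_rate`, `undersampling_rate`, `poisson`) are illustrative assumptions.

```python
import math
import random


class DecayedClassSizes:
    """Time-decayed estimate of each class's share of the stream (assumed form)."""

    def __init__(self, classes, decay=0.9):
        self.decay = decay  # hypothetical decay factor, not taken from the source
        self.sizes = {c: 0.0 for c in classes}

    def update(self, label):
        # Old evidence fades by `decay`; the class of the new example gains weight.
        for c in self.sizes:
            hit = 1.0 if c == label else 0.0
            self.sizes[c] = self.decay * self.sizes[c] + (1.0 - self.decay) * hit


def oversampling_rate(sizes, label):
    """Rate >= 1: examples of a currently smaller class are replicated more often."""
    own = sizes[label]
    if own <= 0.0:
        return 1.0
    return max(sizes.values()) / own


def undersampling_rate(sizes, label):
    """Rate <= 1: examples of a currently larger class are used less often."""
    own = sizes[label]
    smallest = min(sizes.values())
    if own <= 0.0 or smallest <= 0.0:
        return 1.0
    return smallest / own


def poisson(lam, rng):
    """Draw K ~ Poisson(lam) with Knuth's method (adequate for small lam)."""
    threshold = math.exp(-lam)
    k, p = 0, 1.0
    while True:
        p *= rng.random()
        if p <= threshold:
            return k
        k += 1


# Usage: a synthetic 9:1 stream; each base learner would train K times on an example.
rng = random.Random(0)
tracker = DecayedClassSizes(classes=[0, 1])
for label in ([0] * 9 + [1]) * 50:
    tracker.update(label)
    k_over = poisson(oversampling_rate(tracker.sizes, label), rng)
    k_under = poisson(undersampling_rate(tracker.sizes, label), rng)
```

Because the class sizes are decayed rather than counted over the whole stream, the resampling rates follow the current imbalance status, which is what lets a scheme of this kind react when the minority and majority roles change.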