A streaming ensemble algorithm (SEA) for large-scale classification

TLDR

Ensemble methods such as Boosting and Bagging are highly effective but require repeated resampling of the training data, which limits their use in large-scale data mining. The proposed streaming ensemble algorithm (SEA) trains separate classifiers on sequential data chunks and combines them into a fixed-size ensemble using a heuristic replacement strategy. The result is fast classification of large-scale or streaming data in approximately constant memory, with accuracy matching that of a single decision tree built on all the data and quick adaptation to concept drift.

Abstract

Ensemble methods have recently garnered a great deal of attention in the machine learning community. Techniques such as Boosting and Bagging have proven to be highly effective but require repeated resampling of the training data, making them inappropriate in a data mining context. The methods presented in this paper take advantage of plentiful data, building separate classifiers on sequential chunks of training points. These classifiers are combined into a fixed-size ensemble using a heuristic replacement strategy. The result is a fast algorithm for large-scale or streaming data that classifies as well as a single decision tree built on all the data, requires approximately constant memory, and adjusts quickly to concept drift.
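
The chunk-and-replace idea described above can be illustrated with a minimal Python sketch. It is not the paper's implementation. Several pieces are assumptions made only for illustration: the base learner is a one-feature threshold stump (`Stump`) standing in for a decision tree, the class name `StreamingEnsemble` and the parameter `max_size` are hypothetical, and the replacement heuristic is a deliberately simple one (swap out the member with the lowest accuracy on the newest chunk if the candidate scores higher), which may differ from the heuristic the paper actually uses.

```python
from collections import Counter


class Stump:
    """One-feature threshold classifier; an assumed stand-in for a decision tree."""

    def fit(self, X, y):
        best = None
        for f in range(len(X[0])):
            for t in sorted({row[f] for row in X}):
                left = [lab for row, lab in zip(X, y) if row[f] <= t]
                right = [lab for row, lab in zip(X, y) if row[f] > t]
                l_lab = Counter(left).most_common(1)[0][0]
                r_lab = Counter(right).most_common(1)[0][0] if right else l_lab
                errors = sum(v != l_lab for v in left) + sum(v != r_lab for v in right)
                if best is None or errors < best[0]:
                    best = (errors, f, t, l_lab, r_lab)
        _, self.f, self.t, self.left, self.right = best
        return self

    def predict(self, row):
        return self.left if row[self.f] <= self.t else self.right


def accuracy(model, X, y):
    return sum(model.predict(r) == lab for r, lab in zip(X, y)) / len(y)


class StreamingEnsemble:
    """Fixed-size ensemble updated one chunk at a time (illustrative only)."""

    def __init__(self, max_size=5):
        self.max_size = max_size
        self.members = []      # at most max_size classifiers are ever kept
        self.candidate = None  # classifier trained on the previous chunk

    def update(self, X, y):
        # Judge the previous candidate on the new chunk, which it has not seen.
        if self.candidate is not None:
            if len(self.members) < self.max_size:
                self.members.append(self.candidate)
            else:
                scores = [accuracy(m, X, y) for m in self.members]
                worst = min(range(len(scores)), key=scores.__getitem__)
                # Assumed heuristic: replace the weakest member if the candidate beats it.
                if accuracy(self.candidate, X, y) > scores[worst]:
                    self.members[worst] = self.candidate
        # Train a fresh classifier on this chunk; the chunk can then be discarded.
        self.candidate = Stump().fit(X, y)

    def predict(self, row):
        # Unweighted majority vote; call update() at least once before predicting.
        voters = self.members or [self.candidate]
        return Counter(m.predict(row) for m in voters).most_common(1)[0][0]


# Toy stream: the labelling rule flips after four chunks to mimic concept drift.
chunk = [[x] for x in range(10)]
before = [int(x > 4) for x in range(10)]
after = [int(x <= 4) for x in range(10)]

demo = StreamingEnsemble(max_size=3)
for labels in [before] * 4 + [after] * 3:
    demo.update(chunk, labels)
print(len(demo.members), demo.predict([7]))
```

Because each chunk is used once for training and once for evaluation and is then dropped, memory stays bounded by `max_size` classifiers plus one chunk, which reflects the approximately constant memory claim. In the toy stream, members trained before the flip are displaced as soon as newer candidates outperform them on fresh chunks, which is the sense in which such an ensemble can adjust to concept drift.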
