Publication | Closed Access
A streaming ensemble algorithm (SEA) for large-scale classification
Citations: 1.2K
References: 20
Year: 2001
Unknown Venue
Artificial Intelligence, Engineering, Machine Learning, Data Science, Data Mining, Pattern Recognition, Concept Drift, Predictive Analytics, Heuristic Replacement Strategy, Knowledge Discovery, Multiple Classifier System, Data Stream Mining, Computer Science, Classifier System, Sequential Chunks, Ensemble Algorithm, Ensemble Methods, Big Data
Ensemble methods such as Boosting and Bagging are highly effective yet require repeated resampling, which limits their use in large‑scale data mining. The proposed SEA algorithm trains separate classifiers on sequential data chunks and merges them into a fixed‑size ensemble using a heuristic replacement strategy. SEA achieves fast, memory‑efficient classification on streaming data, matching the accuracy of a single decision tree built on all data while quickly adapting to concept drift.
Ensemble methods have recently garnered a great deal of attention in the machine learning community. Techniques such as Boosting and Bagging have proven to be highly effective but require repeated resampling of the training data, making them inappropriate in a data mining context. The methods presented in this paper take advantage of plentiful data, building separate classifiers on sequential chunks of training points. These classifiers are combined into a fixed-size ensemble using a heuristic replacement strategy. The result is a fast algorithm for large-scale or streaming data that classifies as well as a single decision tree built on all the data, requires approximately constant memory, and adjusts quickly to concept drift.
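The chunk-wise procedure described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation, and several parts are assumptions made only for illustration: the base learner is a one-feature threshold stump (`train_stump`) standing in for a decision tree, labels are binary, the ensemble votes by unweighted majority, and the replacement heuristic is simplified to "replace the weakest member if the candidate scores higher on the newest chunk". The names `StreamingEnsemble`, `make_chunk`, and `accuracy` are hypothetical. Each new classifier is scored on the chunk that arrives after its training chunk, so that it is not judged on its own training data; this too is a design choice of the sketch.

```python
import random
from collections import Counter
from typing import Callable, List, Optional, Sequence, Tuple

Point = Tuple[Sequence[float], int]          # (features, binary label)
Classifier = Callable[[Sequence[float]], int]


def accuracy(clf: Classifier, chunk: List[Point]) -> float:
    """Fraction of points in `chunk` that `clf` labels correctly."""
    return sum(clf(x) == y for x, y in chunk) / len(chunk)


def train_stump(chunk: List[Point]) -> Classifier:
    """Stand-in base learner (assumption): best one-feature threshold rule."""
    majority = Counter(y for _, y in chunk).most_common(1)[0][0]
    best: Classifier = lambda x: majority    # fallback: always predict majority
    best_acc = accuracy(best, chunk)
    for f in range(len(chunk[0][0])):
        for x, _ in chunk:
            for lo, hi in ((0, 1), (1, 0)):
                rule = lambda v, f=f, t=x[f], lo=lo, hi=hi: hi if v[f] > t else lo
                acc = accuracy(rule, chunk)
                if acc > best_acc:
                    best, best_acc = rule, acc
    return best


class StreamingEnsemble:
    """Fixed-size ensemble updated one chunk at a time."""

    def __init__(self, max_size: int, train=train_stump):
        self.max_size = max_size
        self.train = train
        self.members: List[Classifier] = []
        self.pending: Optional[Classifier] = None  # trained, not yet scored

    def update(self, chunk: List[Point]) -> None:
        # Score the previous chunk's classifier on this unseen chunk,
        # then train a new candidate on the current chunk.
        if self.pending is not None:
            self._admit(self.pending, chunk)
        self.pending = self.train(chunk)

    def _admit(self, candidate: Classifier, chunk: List[Point]) -> None:
        if len(self.members) < self.max_size:
            self.members.append(candidate)
            return
        scores = [accuracy(m, chunk) for m in self.members]
        worst = min(range(len(scores)), key=scores.__getitem__)
        # Simplified replacement heuristic (assumption): swap out the
        # weakest member only if the candidate beats it on this chunk.
        if accuracy(candidate, chunk) > scores[worst]:
            self.members[worst] = candidate

    def predict(self, x: Sequence[float]) -> int:
        """Unweighted majority vote; requires at least one admitted member."""
        return Counter(m(x) for m in self.members).most_common(1)[0][0]


def make_chunk(rng: random.Random, n: int, flipped: bool) -> List[Point]:
    """Synthetic chunk: label is 1 when x[0] > 0.5, inverted if `flipped`."""
    chunk = []
    for _ in range(n):
        x = (rng.random(), rng.random())
        y = int(x[0] > 0.5)
        chunk.append((x, 1 - y if flipped else y))
    return chunk


rng = random.Random(0)
ens = StreamingEnsemble(max_size=3)
for _ in range(5):
    ens.update(make_chunk(rng, 40, flipped=False))
for _ in range(5):
    ens.update(make_chunk(rng, 40, flipped=True))   # simulated concept drift
print(accuracy(ens.predict, make_chunk(rng, 200, flipped=True)))
```

Because only `max_size` classifiers and one pending candidate are retained, memory use does not grow with the length of the stream. After the simulated drift, each new chunk evicts one classifier trained on the old concept, which is how a fixed-size ensemble of this kind can track a changing target.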