Publication | Closed Access

The random subspace method for constructing decision forests

Citations: 6.7K · References: 31 · Year: 1998

TLDR

Decision‑tree research has focused on splitting criteria and tree‑size optimization, yet the trade‑off between overfitting and maximum accuracy remains unresolved. The authors propose a decision‑tree‑based classifier that preserves training accuracy while enhancing generalization as model complexity increases. The method constructs a forest by pseudorandomly selecting feature subsets to build trees in random subspaces and analyzes tree independence to improve combined accuracy. Experiments on public datasets show the subspace method outperforms single‑tree classifiers and other forest construction techniques.

Abstract

Much of previous attention on decision trees focuses on the splitting criteria and optimization of tree sizes. The dilemma between overfitting and achieving maximum accuracy is seldom resolved. A method to construct a decision-tree-based classifier is proposed that maintains highest accuracy on training data and improves on generalization accuracy as it grows in complexity. The classifier consists of multiple trees constructed systematically by pseudorandomly selecting subsets of components of the feature vector, that is, trees constructed in randomly chosen subspaces. The subspace method is compared to single-tree classifiers and other forest construction methods by experiments on publicly available datasets, where the method's superiority is demonstrated. We also discuss independence between trees in a forest and relate that to the combined classification accuracy.
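The core idea in the abstract can be sketched in a few lines. The following is a minimal illustrative implementation, not the authors' original code: each tree is fit on a pseudorandomly chosen subset of feature dimensions (here half of them, one common choice), and the forest classifies by majority vote. The helper names and parameter values are assumptions for this sketch.

```python
# Sketch of the random subspace method: train each decision tree on a
# pseudorandom subset of feature dimensions, combine by majority vote.
# (Illustrative only; parameter choices are assumptions, not the paper's.)
import numpy as np
from sklearn.datasets import make_classification
from sklearn.tree import DecisionTreeClassifier

rng = np.random.default_rng(0)

def fit_subspace_forest(X, y, n_trees=25, subspace_dim=None):
    """Train one tree per pseudorandomly chosen feature subspace."""
    n_features = X.shape[1]
    if subspace_dim is None:
        subspace_dim = n_features // 2  # half the features per tree
    forest = []
    for _ in range(n_trees):
        dims = rng.choice(n_features, size=subspace_dim, replace=False)
        tree = DecisionTreeClassifier().fit(X[:, dims], y)
        forest.append((dims, tree))
    return forest

def predict_subspace_forest(forest, X):
    """Majority vote over the per-subspace trees."""
    votes = np.stack([tree.predict(X[:, dims]) for dims, tree in forest])
    # count class votes for each sample along the tree axis
    return np.apply_along_axis(lambda v: np.bincount(v).argmax(), 0, votes)

# Small synthetic demo dataset
X, y = make_classification(n_samples=300, n_features=20, random_state=0)
forest = fit_subspace_forest(X, y)
pred = predict_subspace_forest(forest, X)
print((pred == y).mean())  # training accuracy of the combined forest
```

Because each tree sees a different projection of the data, the trees make partially independent errors, which is what the majority vote exploits; this is the independence-versus-accuracy trade-off the paper analyzes.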