Publication | Closed Access

The random subspace method for constructing decision forests

Citations: 6.7K · References: 31 · Year: 1998

TLDR

Decision‑tree research has focused on splitting criteria and tree‑size optimization, yet the trade‑off between overfitting and maximum accuracy remains unresolved. The authors propose a decision‑tree‑based classifier that preserves training accuracy while enhancing generalization as model complexity increases. The method constructs a forest by pseudorandomly selecting feature subsets to build trees in random subspaces and analyzes tree independence to improve combined accuracy. Experiments on public datasets show the subspace method outperforms single‑tree classifiers and other forest construction techniques.

Abstract

Much of previous attention on decision trees focuses on the splitting criteria and optimization of tree sizes. The dilemma between overfitting and achieving maximum accuracy is seldom resolved. A method to construct a decision-tree-based classifier is proposed that maintains highest accuracy on training data and improves on generalization accuracy as it grows in complexity. The classifier consists of multiple trees constructed systematically by pseudorandomly selecting subsets of components of the feature vector, that is, trees constructed in randomly chosen subspaces. The subspace method is compared to single-tree classifiers and other forest construction methods by experiments on publicly available datasets, where the method's superiority is demonstrated. We also discuss independence between trees in a forest and relate that to the combined classification accuracy.
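The core idea in the abstract can be sketched in a few lines. The following is a minimal illustrative implementation, not the authors' original code: each tree is fit on a pseudorandomly chosen subset of feature dimensions (here half of them, one common choice), and the forest classifies by majority vote. The helper names and parameter values are assumptions for this sketch.

```python
# Sketch of the random subspace method: train each decision tree on a
# pseudorandom subset of feature dimensions, combine by majority vote.
# (Illustrative only; parameter choices are assumptions, not the paper's.)
import numpy as np
from sklearn.datasets import make_classification
from sklearn.tree import DecisionTreeClassifier

rng = np.random.default_rng(0)

def fit_subspace_forest(X, y, n_trees=25, subspace_dim=None):
    """Train one tree per pseudorandomly chosen feature subspace."""
    n_features = X.shape[1]
    if subspace_dim is None:
        subspace_dim = n_features // 2  # half the features per tree
    forest = []
    for _ in range(n_trees):
        dims = rng.choice(n_features, size=subspace_dim, replace=False)
        tree = DecisionTreeClassifier().fit(X[:, dims], y)
        forest.append((dims, tree))
    return forest

def predict_subspace_forest(forest, X):
    """Majority vote over the per-subspace trees."""
    votes = np.stack([tree.predict(X[:, dims]) for dims, tree in forest])
    # count class votes for each sample along the tree axis
    return np.apply_along_axis(lambda v: np.bincount(v).argmax(), 0, votes)

# Small synthetic demo dataset
X, y = make_classification(n_samples=300, n_features=20, random_state=0)
forest = fit_subspace_forest(X, y)
pred = predict_subspace_forest(forest, X)
print((pred == y).mean())  # training accuracy of the combined forest
```

Because each tree sees a different projection of the data, the trees make partially independent errors, which is what the majority vote exploits; this is the independence-versus-accuracy trade-off the paper analyzes.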