Abstract

We propose diagnostic statistics that may assist in choosing the size (number of trees) of a random forest for classification. The statistics are applied sequentially as the forest is constructed. They are computed from out-of-bag or test-set votes and estimate the expected disagreement between the predictions of the current forest and those of an infinite forest. Simulation studies illustrate the performance of these statistics and compare them with other methods for choosing the size of a random forest.
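
To make the idea of "expected disagreement between the current and infinite forests" concrete, the following is a minimal sketch for the two-class case. It is not the statistic proposed in the paper, whose definition the abstract does not give. It is a naive plug-in estimate under stated assumptions: each tree votes independently, the observed vote proportion for an observation stands in for its true (infinite-forest) proportion, and ties are broken by a fair coin. The function names `flip_probability` and `estimated_disagreement` are hypothetical.

```python
from math import exp, lgamma, log


def flip_probability(p, n_trees):
    """Probability that a majority vote over n_trees independent votes
    (each for class 1 with probability p) differs from the limiting
    decision of an infinite forest. Ties count as half a disagreement."""
    if p == 0.5:
        return 0.5  # assumption: the limiting decision is itself a coin flip
    q = min(p, 1.0 - p)  # chance that a single tree votes for the minority class
    if q == 0.0:
        return 0.0  # unanimous votes can never flip
    total = 0.0
    for k in range(n_trees + 1):  # k = number of minority-class votes
        if 2 * k < n_trees:
            continue  # majority class still wins
        # Binomial probability computed in log space to avoid overflow.
        log_pk = (lgamma(n_trees + 1) - lgamma(k + 1) - lgamma(n_trees - k + 1)
                  + k * log(q) + (n_trees - k) * log(1.0 - q))
        total += exp(log_pk) * (0.5 if 2 * k == n_trees else 1.0)
    return total


def estimated_disagreement(votes_for_class1, votes_cast, n_trees):
    """Average plug-in flip probability over observations.
    votes_for_class1[i] and votes_cast[i] are the out-of-bag (or test-set)
    vote counts for observation i; n_trees is the forest size of interest."""
    probs = [flip_probability(v / n, n_trees)
             for v, n in zip(votes_for_class1, votes_cast) if n > 0]
    if not probs:
        raise ValueError("no observation has any votes")
    return sum(probs) / len(probs)
```

Used sequentially, one could recompute `estimated_disagreement` as trees are added and stop growing the forest once the value falls below a chosen tolerance. Because the plug-in proportions are themselves noisy when few votes are available, this sketch should be read as an illustration of the quantity being estimated rather than as a calibrated stopping rule.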