Publication | Open Access

Hyperparameters and tuning strategies for random forest

Citations: 1.3K

References: 39

Year: 2019

TLDR

The random forest algorithm has several hyperparameters, including the sample size, the number of variables drawn at each split, the splitting rule, the minimum node size, and the number of trees. Default settings generally perform well, but tuning can improve predictive performance. This paper reviews how these hyperparameters affect prediction performance and variable importance measures, then demonstrates tuning with model-based optimization (MBO) and introduces the tuneRanger R package, which automates it. In benchmark experiments, tuneRanger achieves comparable or superior prediction accuracy relative to other R tuning implementations and to random forest with default hyperparameters, with shorter runtime than the other tuning implementations.

Abstract

The random forest algorithm (RF) has several hyperparameters that have to be set by the user, e.g., the number of observations drawn randomly for each tree and whether they are drawn with or without replacement, the number of variables drawn randomly for each split, the splitting rule, the minimum number of samples that a node must contain and the number of trees. In this paper, we first provide a literature review on the parameters' influence on the prediction performance and on variable importance measures. It is well known that in most cases RF works reasonably well with the default values of the hyperparameters specified in software packages. Nevertheless, tuning the hyperparameters can improve the performance of RF. In the second part of this paper, after a brief overview of tuning strategies we demonstrate the application of one of the most established tuning strategies, model-based optimization (MBO). To make it easier to use, we provide the tuneRanger R package that tunes RF with MBO automatically. In a benchmark study on several datasets, we compare the prediction performance and runtime of tuneRanger with other tuning implementations in R and RF with default hyperparameters.
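
The model-based optimization loop mentioned in the abstract can be illustrated with a minimal sketch. This is not the tuneRanger package or its API: it is a toy Python example written under several assumptions. The objective `toy_oob_error` is a hypothetical stand-in for an expensive forest evaluation, the surrogate is a simple inverse-distance-weighted interpolator, and the infill criterion is a crude exploration bonus. Real MBO implementations typically use stronger surrogate models and infill criteria. The sketch only shows the general pattern: evaluate an initial design, fit a cheap surrogate to the results, pick the next candidate with the surrogate, evaluate it, and repeat.

```python
import random


def toy_oob_error(mtry):
    """Hypothetical stand-in for an expensive evaluation, such as the
    out-of-bag error of a forest. Assumed shape: minimum at mtry = 7."""
    return 0.10 + 0.002 * (mtry - 7) ** 2


def surrogate(x, evaluated):
    """Cheap prediction of the error at x, interpolated from the points
    evaluated so far with inverse-squared-distance weights."""
    if x in evaluated:
        return evaluated[x]
    weights = {xi: 1.0 / (x - xi) ** 2 for xi in evaluated}
    total = sum(weights.values())
    return sum(w * evaluated[xi] for xi, w in weights.items()) / total


def infill(x, evaluated, explore):
    """Score a candidate: low predicted error is good, and distance from
    already evaluated points earns an exploration bonus."""
    distance = min(abs(x - xi) for xi in evaluated)
    return surrogate(x, evaluated) - explore * distance


def tune(candidates, n_init=3, n_iter=10, explore=0.005, seed=1):
    """Sequential model-based search over a list of candidate values."""
    rng = random.Random(seed)
    # Initial design: evaluate a few randomly chosen candidates.
    evaluated = {x: toy_oob_error(x) for x in rng.sample(candidates, n_init)}
    for _ in range(n_iter):
        remaining = [x for x in candidates if x not in evaluated]
        if not remaining:
            break
        # Propose the candidate with the best infill score, then evaluate it.
        proposal = min(remaining, key=lambda x: infill(x, evaluated, explore))
        evaluated[proposal] = toy_oob_error(proposal)
    best = min(evaluated, key=evaluated.get)
    return best, evaluated[best], evaluated


best, error, history = tune(list(range(1, 21)))
print(f"best mtry: {best}, error: {error:.4f}, evaluations: {len(history)}")
```

The evaluation budget (`n_init` plus `n_iter`) is the main design choice here: each real evaluation would mean fitting a forest, so the surrogate is used to decide where that budget is spent.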
