Publication | Open Access

Conditional variable importance for random forests

Citations: 3.2K

References: 33

Year: 2008

TLDR

Random forests are popular for small-n, large-p problems, complex interactions, and correlated predictors, and their variable importance measures are used as screening tools. These measures, however, are biased in favor of correlated predictors. The authors identify two mechanisms behind the bias: a preference for correlated predictors during tree construction, and an additional advantage conferred by the unconditional permutation scheme used to compute the importance. To address this they propose a conditional permutation scheme; the resulting conditional variable importance reflects the true impact of each predictor more reliably than the original marginal approach.

Abstract

Random forests are becoming increasingly popular in many scientific fields because they can cope with "small n large p" problems, complex interactions and even highly correlated predictor variables. Their variable importance measures have recently been suggested as screening tools for, e.g., gene expression studies. However, these variable importance measures show a bias towards correlated predictor variables. We identify two mechanisms responsible for this finding: (i) A preference for the selection of correlated predictors in the tree building process and (ii) an additional advantage for correlated predictor variables induced by the unconditional permutation scheme that is employed in the computation of the variable importance measure. Based on these considerations we develop a new, conditional permutation scheme for the computation of the variable importance measure. The resulting conditional variable importance reflects the true impact of each predictor variable more reliably than the original marginal approach.
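The difference between the unconditional (marginal) and the conditional permutation scheme can be illustrated with a small sketch. This is not the authors' implementation: the paper defines the conditioning groups through the forest's own partition of the correlated predictors, whereas the sketch below assumes simple decile bins of the correlated predictor as a stand-in. The function `permutation_importance`, the toy data, and the fixed `predict` function (a hypothetical fitted model that leans on a correlated predictor with no direct effect, as a forest might after selecting it for splits) are all illustrative assumptions.

```python
import numpy as np

def permutation_importance(predict, X, y, col, rng, groups=None):
    """Increase in mean squared error after permuting column `col`.

    groups=None permutes over all rows (marginal scheme).
    Otherwise values are permuted only within each group, which
    roughly preserves the relation between `col` and the variable
    that defines the groups (a simplified conditional scheme).
    """
    base_mse = np.mean((y - predict(X)) ** 2)
    X_perm = X.copy()
    if groups is None:
        X_perm[:, col] = rng.permutation(X[:, col])
    else:
        for g in np.unique(groups):
            idx = np.flatnonzero(groups == g)
            X_perm[idx, col] = X[rng.permutation(idx), col]
    return np.mean((y - predict(X_perm)) ** 2) - base_mse

rng = np.random.default_rng(0)
n = 2000
x1 = rng.normal(size=n)                                      # truly influential
x2 = 0.9 * x1 + np.sqrt(1 - 0.9 ** 2) * rng.normal(size=n)   # correlated, no own effect
y = x1 + 0.1 * rng.normal(size=n)
X = np.column_stack([x1, x2])

def predict(X):
    # Hypothetical stand-in for a fitted model that uses x2 as a proxy for x1.
    return 0.5 * X[:, 0] + 0.5 * X[:, 1]

# Assumption: decile bins of x1 serve as the conditioning grid for x2.
edges = np.quantile(x1, np.linspace(0, 1, 11)[1:-1])
groups = np.digitize(x1, edges)

marginal = permutation_importance(predict, X, y, col=1, rng=rng)
conditional = permutation_importance(predict, X, y, col=1, rng=rng, groups=groups)
print(f"marginal importance of x2:    {marginal:.3f}")
print(f"conditional importance of x2: {conditional:.3f}")
```

In this toy setting the marginal scheme breaks the association of `x2` with both the response and `x1`, so `x2` appears important even though it has no effect of its own; permuting within bins of `x1` removes most of that apparent importance. The bin count is an arbitrary choice, and finer or coarser grids trade off how well the correlation structure is preserved against how much permutation is possible within each group.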
