Publication | Open Access
<b>ranger</b>: A Fast Implementation of Random Forests for High Dimensional Data in <i>C++</i> and <i>R</i>
Citations: 3K | References: 15 | Year: 2017
High Dimensional Data, Machine Learning, Data Science, Data Mining, Engineering, Computational Genomics, Biomedical Data Science, Knowledge Discovery, Computational Biology, R Package Ranger, Decision Tree Learning, Random Forests, Computer Science, Fast Implementation, Bioinformatics, C++ Application, High-performance Data Analytics
The authors introduce ranger, a C++ application and R package that provides a fast implementation of random forests for high dimensional data. ranger supports ensembles of classification, regression, and survival trees. The authors describe its implementation, provide usage examples, validate it against a reference implementation, and benchmark its runtime and memory usage against other implementations. The benchmarks show that ranger scales best with the number of features, samples, trees, and features tried for splitting, and that it is the fastest and most memory-efficient implementation for data on the scale of a genome-wide association study.
We introduce the C++ application and R package ranger. The software is a fast implementation of random forests for high dimensional data. Ensembles of classification, regression and survival trees are supported. We describe the implementation, provide examples, validate the package with a reference implementation, and compare runtime and memory usage with other implementations. The new software proves to scale best with the number of features, samples, trees, and features tried for splitting. Finally, we show that ranger is the fastest and most memory efficient implementation of random forests to analyze data on the scale of a genome-wide association study.
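The "features tried for splitting" mentioned above is the number of candidate features considered at each tree node, usually called `mtry` in the random forest literature. The following is a minimal C++ sketch of how such a candidate set could be drawn. It is an illustration under stated assumptions, not ranger's actual code: the function name `sample_split_candidates` and its signature are hypothetical.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <numeric>
#include <random>
#include <utility>
#include <vector>

// Hypothetical helper (not part of ranger's API): draw `mtry` distinct
// feature indices out of `num_features` as split candidates for one node.
std::vector<std::size_t> sample_split_candidates(std::size_t num_features,
                                                 std::size_t mtry,
                                                 std::mt19937_64& rng) {
  assert(mtry <= num_features);  // cannot try more features than exist

  // Start from the identity permutation 0, 1, ..., num_features - 1.
  std::vector<std::size_t> indices(num_features);
  std::iota(indices.begin(), indices.end(), std::size_t{0});

  // Partial Fisher-Yates shuffle: only the first `mtry` positions are drawn,
  // so the cost depends on mtry swaps rather than a full shuffle.
  for (std::size_t i = 0; i < mtry; ++i) {
    std::uniform_int_distribution<std::size_t> pick(i, num_features - 1);
    std::swap(indices[i], indices[pick(rng)]);
  }

  indices.resize(mtry);  // keep only the sampled candidates
  return indices;
}
```

A caller would seed one generator per tree (for example `std::mt19937_64 rng(42);`) and call `sample_split_candidates(num_features, mtry, rng)` at every node. Larger `mtry` values mean more candidate splits to evaluate per node, which is why runtime is benchmarked against this parameter.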