Publication | Closed Access
iForest: Interpreting Random Forests via Visual Analytics
Citations: 197
References: 40
Year: 2018
Topics: Engineering, Machine Learning, Data Exploration, Data Science, Data Mining, Decision Tree, Management, Decision Tree Learning, Visual Analytics, Data Modeling, Interpreting Random Forests, Predictive Analytics, Knowledge Discovery, Visual Data Mining, Computer Science, Computer Vision, Model Interpretability, Ensemble Model, Classifier System, Decision Trees, Ensemble Algorithm
Random forests combine many decision trees to achieve superior predictive performance, yet their complex, heterogeneous tree structures hinder interpretability, especially in domains demanding transparent decisions. This work introduces a visual analytics system that aggregates and summarizes all decision paths in a random forest to elucidate how individual predictions are formed. The system displays complete tree information and distilled decision paths, and its utility was validated through two usage scenarios and a qualitative user study.
As an ensemble model consisting of many independent decision trees, a random forest generates predictions by feeding the input to its internal trees and summarizing their outputs. The ensemble nature of the model helps random forests outperform any individual decision tree. However, it also leads to poor model interpretability, which significantly hinders the use of the model in fields that require transparent and explainable predictions, such as medical diagnosis and financial fraud detection. The interpretation challenges stem from the variety and complexity of the contained decision trees. Each decision tree has its own structure and properties, such as the features used in the tree and the feature threshold at each tree node. Thus, a single data input may follow a variety of decision paths. To understand how a final prediction is reached, one needs to understand and compare all decision paths in the context of all tree structures, which is a huge challenge for any user. In this paper, we propose a visual analytics system for interpreting random forest models and predictions. In addition to providing users with all the tree information, we summarize the decision paths in random forests, which reflects the working mechanism of the model and reduces users' mental burden of interpretation. To demonstrate the effectiveness of our system, we present two usage scenarios and conduct a qualitative user study.
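The mechanism described in the abstract (each tree routes an input along its own decision path, and the forest summarizes the per-tree outputs) can be illustrated with a minimal Python sketch. This is not the paper's system. The tree layout, the feature names (`glucose`, `age`, `bmi`), the thresholds, the majority-vote rule, and the helper names `decision_path`, `forest_predict`, and `summarize_paths` are all assumptions made for illustration, and the "summary" here is only a naive grouping of thresholds by feature.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Union


@dataclass
class Leaf:
    label: str


@dataclass
class Node:
    feature: str
    threshold: float
    left: "Tree"   # followed when x[feature] <= threshold
    right: "Tree"  # followed when x[feature] > threshold


Tree = Union[Leaf, Node]


def decision_path(tree, x):
    """Walk one tree and return (conditions tested along the way, leaf label)."""
    path = []
    while isinstance(tree, Node):
        go_left = x[tree.feature] <= tree.threshold
        path.append((tree.feature, "<=" if go_left else ">", tree.threshold))
        tree = tree.left if go_left else tree.right
    return path, tree.label


def forest_predict(forest, x):
    """Majority vote over all trees; also return every tree's path and vote."""
    results = [decision_path(tree, x) for tree in forest]
    votes = Counter(label for _, label in results)
    return votes.most_common(1)[0][0], results


def summarize_paths(results, label):
    """Group the thresholds used by trees that voted for `label`."""
    summary = {}
    for path, voted in results:
        if voted != label:
            continue
        for feature, op, threshold in path:
            summary.setdefault((feature, op), []).append(threshold)
    return summary


# Hypothetical three-tree forest; every value below is made up.
forest = [
    Node("glucose", 120, Leaf("healthy"), Leaf("diabetic")),
    Node("age", 50, Leaf("healthy"),
         Node("glucose", 110, Leaf("healthy"), Leaf("diabetic"))),
    Node("bmi", 30, Leaf("healthy"), Leaf("diabetic")),
]
x = {"glucose": 130, "age": 60, "bmi": 25}

prediction, results = forest_predict(forest, x)
summary = summarize_paths(results, prediction)
print(prediction)
print(summary)
```

Even in this toy case, the three trees test different features at different thresholds for the same input, which hints at why comparing all decision paths in a forest of realistic size is burdensome without some form of aggregation.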