Concepedia

Publication | Closed Access

Isolation Forest

Citations: 5.2K | References: 11 | Year: 2008

TLDR

Most existing anomaly detection methods build a profile of normal data; the concept of isolation had not been explored in the literature. The paper proposes a model-based method, iForest, that isolates anomalies instead of profiling normal points. Because isolation works well on small sub-samples, iForest achieves linear time complexity with a low constant and a low memory requirement. Empirical results show that iForest compares favourably with ORCA, LOF, and random forests in AUC and processing time, especially on large data sets. It also works well on high-dimensional data with many irrelevant attributes and when the training set contains no anomalies.

Abstract

Most existing model-based approaches to anomaly detection construct a profile of normal instances, then identify instances that do not conform to the normal profile as anomalies. This paper proposes a fundamentally different model-based method that explicitly isolates anomalies instead of profiling normal points. To the best of our knowledge, the concept of isolation has not been explored in the current literature. The use of isolation enables the proposed method, iForest, to exploit sub-sampling to an extent that is not feasible in existing methods, creating an algorithm that has linear time complexity with a low constant and a low memory requirement. Our empirical evaluation shows that iForest compares favourably with ORCA (a distance-based method with near-linear time complexity), LOF, and random forests in terms of AUC and processing time, especially on large data sets. iForest also works well in high-dimensional problems that have a large number of irrelevant attributes, and in situations where the training set does not contain any anomalies.
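The isolation idea in the abstract can be illustrated with a short Python sketch. It is not the authors' implementation. It assumes the usual formulation of an isolation tree (pick a random attribute, pick a random split value between that attribute's minimum and maximum, recurse) and scores a point by how short its average path is across trees built on small sub-samples. All function names, the dict-based tree layout, and the default parameters are illustrative assumptions.

```python
import math
import random

EULER_GAMMA = 0.5772156649

def c(n):
    """Normalising term: average path length of an unsuccessful
    binary-search-tree lookup over n points (harmonic-number approximation)."""
    if n <= 1:
        return 0.0
    if n == 2:
        return 1.0
    return 2.0 * (math.log(n - 1) + EULER_GAMMA) - 2.0 * (n - 1) / n

def build_tree(points, depth, max_depth, rng):
    """Recursively partition points with random axis-parallel splits."""
    if depth >= max_depth or len(points) <= 1:
        return {"size": len(points)}              # leaf
    dim = rng.randrange(len(points[0]))           # random attribute
    lo = min(p[dim] for p in points)
    hi = max(p[dim] for p in points)
    if lo == hi:                                  # cannot split on this attribute
        return {"size": len(points)}
    split = rng.uniform(lo, hi)                   # random split value
    left = [p for p in points if p[dim] < split]
    right = [p for p in points if p[dim] >= split]
    return {"dim": dim, "split": split,
            "left": build_tree(left, depth + 1, max_depth, rng),
            "right": build_tree(right, depth + 1, max_depth, rng)}

def path_length(x, node, depth=0):
    """Edges traversed by x, plus an adjustment for unsplit leaves."""
    if "split" not in node:
        return depth + c(node["size"])
    child = node["left"] if x[node["dim"]] < node["split"] else node["right"]
    return path_length(x, child, depth + 1)

def fit_forest(data, n_trees=100, sample_size=256, seed=0):
    """Build each tree on a small random sub-sample (assumed defaults).
    Requires at least two data points."""
    rng = random.Random(seed)
    psi = min(sample_size, len(data))
    max_depth = math.ceil(math.log2(psi))         # height limit per tree
    trees = [build_tree(rng.sample(data, psi), 0, max_depth, rng)
             for _ in range(n_trees)]
    return trees, psi

def anomaly_score(x, trees, psi):
    """Score in (0, 1]; higher means easier to isolate, so more anomalous."""
    mean_depth = sum(path_length(x, t) for t in trees) / len(trees)
    return 2.0 ** (-mean_depth / c(psi))

# Toy data: a Gaussian cluster plus one far-away point (made up for this demo).
data_rng = random.Random(42)
data = [(data_rng.gauss(0, 1), data_rng.gauss(0, 1)) for _ in range(200)]
outlier = (8.0, 8.0)
data.append(outlier)

trees, psi = fit_forest(data)
outlier_score = anomaly_score(outlier, trees, psi)
inlier_score = anomaly_score((0.0, 0.0), trees, psi)
print(outlier_score, inlier_score)
```

Each tree only ever sees `sample_size` points and is capped at a height of about log2 of that size, so the cost of building and querying the forest does not grow with the depth of a full partition of the data. That is the property behind the abstract's claims of linear time complexity and a low memory requirement.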

