
Publication | Closed Access

Incremental Local Outlier Detection for Data Streams

Citations: 377 | References: 38 | Year: 2007

TLDR

Outlier detection is increasingly critical in industrial and financial settings, yet the challenge is amplified when data arrive as high‑speed streams. The study proposes an incremental Local Outlier Factor (LOF) algorithm tailored for real‑time data stream outlier detection. The algorithm dynamically updates point profiles and, based on theoretical analysis, limits updates to a small neighborhood, making the update cost independent of the total dataset size. Experiments on simulated and real datasets demonstrate that the incremental LOF achieves detection performance comparable to the static LOF while requiring significantly less computation, and it effectively identifies outliers and distributional changes.
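The summary does not define LOF itself. For reference, these are the standard LOF quantities that any incremental version has to keep current. N_k(p) is the set of the k nearest neighbours of p, d is the distance function, and k-dist(o) is the distance from o to its k-th nearest neighbour. This notation is chosen here and is not taken from the paper:

```latex
\begin{aligned}
\operatorname{reach\text{-}dist}_k(p, o) &= \max\{\, k\text{-dist}(o),\ d(p, o) \,\} \\
\operatorname{lrd}_k(p) &= \left( \frac{1}{|N_k(p)|} \sum_{o \in N_k(p)} \operatorname{reach\text{-}dist}_k(p, o) \right)^{-1} \\
\operatorname{LOF}_k(p) &= \frac{1}{|N_k(p)|} \sum_{o \in N_k(p)} \frac{\operatorname{lrd}_k(o)}{\operatorname{lrd}_k(p)}
\end{aligned}
```

The reachability distance depends on the k-distance of the neighbour o, and LOF depends on the lrd values of p's neighbours. A newly inserted point can therefore change the k-distance only of points that gain it as a neighbour. From there, the change propagates at most two reverse-neighbour hops further: first to lrd, then to LOF. This is why the number of updated profiles can stay small.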

Abstract

Outlier detection has recently become an important problem in many industrial and financial applications. The problem is further complicated by the fact that, in many cases, outliers have to be detected in data streams that arrive at an enormous pace. In this paper, we propose an incremental LOF (local outlier factor) algorithm suited to detecting outliers in data streams. The proposed incremental LOF algorithm provides detection performance equivalent to that of the iterated static LOF algorithm (applied after the insertion of each data record), while requiring significantly less computational time. In addition, the incremental LOF algorithm dynamically updates the profiles of data points. This is an important property, since data profiles may change over time. The paper provides theoretical evidence that inserting a new data point or deleting an old one affects only a limited number of their closest neighbors. As a result, the number of updates per insertion or deletion does not depend on the total number of points N in the data set. Our experiments on several simulated and real-life data sets show that the proposed incremental LOF algorithm is computationally efficient and, at the same time, very successful in detecting outliers and changes in distributional behavior in various data stream applications.
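The following is a minimal NumPy sketch of the insertion half of an incremental LOF. It is not the authors' code. It assumes Euclidean distance, distinct pairwise distances (no ties), brute-force neighbour search, and no deletion. The names `IncrementalLOF` and `static_lof` are hypothetical.

After each insert, the sketch recomputes three things in sequence:

- the k-distance, only for points that gain the new point as a neighbour;
- the lrd, for those points and their reverse neighbours;
- the LOF, one reverse-neighbour hop further.

`static_lof` recomputes everything from scratch, so the two results can be compared. Because neighbour search here is brute force, each insert still scans all points. What the sketch illustrates is that the set of recomputed profiles (`touched`) stays small.

```python
import numpy as np


def static_lof(X, k):
    """Static LOF recomputed from scratch (brute force); the comparison baseline."""
    X = np.asarray(X, dtype=float)
    n = len(X)
    D = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
    np.fill_diagonal(D, np.inf)                  # a point is not its own neighbour
    knn = np.argsort(D, axis=1)[:, :k]           # k nearest neighbours of each point
    kdist = D[np.arange(n), knn[:, -1]]          # k-distance: distance to k-th neighbour
    reach = np.maximum(kdist[knn], D[np.arange(n)[:, None], knn])
    lrd = 1.0 / np.maximum(reach.mean(axis=1), 1e-12)
    return lrd[knn].mean(axis=1) / lrd


class IncrementalLOF:
    """Hypothetical insertion-only incremental LOF (Euclidean, no ties, brute-force kNN)."""

    def __init__(self, k):
        self.k = k
        self.X, self.knn, self.kdist = [], [], []
        self.rnn = []                  # rnn[i]: points that have i among their k NNs
        self.lrd, self.lof = [], []

    def _d(self, i, j):
        return float(np.linalg.norm(self.X[i] - self.X[j]))

    def _set_knn(self, i):
        # Recompute the kNN list and k-distance of point i, keeping rnn consistent.
        others = [j for j in range(len(self.X)) if j != i]
        d = np.array([self._d(i, j) for j in others])
        order = np.argsort(d)[: self.k]
        for j in self.knn[i]:          # unregister i from its old neighbours
            self.rnn[j].discard(i)
        self.knn[i] = [others[o] for o in order]
        for j in self.knn[i]:          # register i with its new neighbours
            self.rnn[j].add(i)
        self.kdist[i] = float(d[order[-1]])

    def _set_lrd(self, i):
        reach = [max(self.kdist[o], self._d(i, o)) for o in self.knn[i]]
        self.lrd[i] = 1.0 / max(np.mean(reach), 1e-12)

    def _set_lof(self, i):
        self.lof[i] = np.mean([self.lrd[o] for o in self.knn[i]]) / self.lrd[i]

    def insert(self, x):
        """Insert x; return the set of indices whose LOF was recomputed."""
        c = len(self.X)
        self.X.append(np.asarray(x, dtype=float))
        self.knn.append([]); self.kdist.append(0.0); self.rnn.append(set())
        self.lrd.append(0.0); self.lof.append(0.0)
        n = len(self.X)
        if n <= self.k:                # LOF is undefined until k + 1 points exist
            return set()
        if n == self.k + 1:            # first well-defined state: compute statically
            for f in (self._set_knn, self._set_lrd, self._set_lof):
                for i in range(n):
                    f(i)
            return set(range(n))
        # 1. kNN of the new point; points whose kNN now includes c get a new k-distance
        self._set_knn(c)
        upd_kdist = [j for j in range(c) if self._d(j, c) < self.kdist[j]]
        for j in upd_kdist:
            self._set_knn(j)
        # 2. lrd changes for c, for upd_kdist, and for points using them as neighbours
        upd_lrd = {c, *upd_kdist}
        for j in upd_kdist:
            upd_lrd |= self.rnn[j]
        for i in upd_lrd:
            self._set_lrd(i)
        # 3. LOF changes for upd_lrd and for points using them as neighbours
        upd_lof = set(upd_lrd)
        for i in upd_lrd:
            upd_lof |= self.rnn[i]
        for i in upd_lof:
            self._set_lof(i)
        return upd_lof


rng = np.random.default_rng(0)
stream = np.vstack([rng.normal(size=(200, 2)), [[6.0, 6.0]]])  # last point lies far away
model = IncrementalLOF(k=5)
for x in stream:
    touched = model.insert(x)
print("profiles recomputed by the last insert:", len(touched))
print("matches full recomputation:", np.allclose(model.lof, static_lof(stream, 5)))
```

Keeping the reverse-neighbour sets (`rnn`) is the design choice that lets each update step find its affected points without rescanning every kNN list. Deletion would follow the same propagation pattern in reverse and is omitted here.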
