Publication | Closed Access
Incremental Local Outlier Detection for Data Streams
Citations: 377 | References: 38 | Year: 2007
Unknown Venue
Cluster Computing, Anomaly Detection, Engineering, Data Science, Data Mining, Data Stream Mining, Outlier Detection, Knowledge Discovery, Local Outlier Factor, Management, Streaming Algorithm, Computer Science, Streaming Data, Data Management, Signal Processing, Data Streams, Big Data
Outlier detection is increasingly critical in industrial and financial settings, yet the challenge is amplified when data arrive as high-speed streams. The study proposes an incremental Local Outlier Factor (LOF) algorithm tailored for real-time outlier detection in data streams. The algorithm dynamically updates point profiles, and its theoretical analysis shows that each insertion or deletion affects only a limited number of nearby points, so the number of updates does not depend on the total dataset size. Experiments on simulated and real datasets show that the incremental LOF matches the detection performance of the iterated static LOF while requiring significantly less computation, and that it effectively identifies outliers and changes in distributional behavior.
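For reference, a point's profile in LOF consists of the standard quantities introduced by Breunig et al. (2000): with d the distance, N_k(p) the k nearest neighbors of p, and k-dist(p) the distance to the k-th of them. The update sets below are our own reading of the "limited number of neighbors" argument for the insertion of a new point c, written in our own notation (RkNN(p) = {q : p ∈ N_k(q)}, assuming no distance ties); they are a sketch, not the paper's formulation.

```latex
\begin{aligned}
\text{reach-dist}_k(p, o) &= \max\{\, k\text{-dist}(o),\; d(p, o) \,\} \\
\text{lrd}_k(p) &= \left( \frac{1}{|N_k(p)|} \sum_{o \in N_k(p)} \text{reach-dist}_k(p, o) \right)^{-1} \\
\text{LOF}_k(p) &= \frac{1}{|N_k(p)|} \sum_{o \in N_k(p)} \frac{\text{lrd}_k(o)}{\text{lrd}_k(p)} \\[4pt]
S_{k\text{-dist}} &= \text{RkNN}(c) = \{\, j : d(j, c) < k\text{-dist}(j) \,\} \\
S_{\text{lrd}} &= S_{k\text{-dist}} \cup \{c\} \cup \bigcup_{j \in S_{k\text{-dist}}} \text{RkNN}(j) \\
S_{\text{LOF}} &= S_{\text{lrd}} \cup \bigcup_{q \in S_{\text{lrd}}} \text{RkNN}(q)
\end{aligned}
```

Here k-dist(j) in the first set is taken before the insertion, while the RkNN sets in the last two lines use the neighborhoods after they have been updated. The reasoning: lrd_k(q) changes only if N_k(q) changes or the k-distance of one of its neighbors changes, and LOF_k(q) changes only if lrd_k(q) or a neighbor's lrd changes. Deleting a point p follows the same pattern, with the first set being the points that had p among their k nearest neighbors.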
Outlier detection has recently become an important problem in many industrial and financial applications. This problem is further complicated by the fact that in many cases, outliers have to be detected from data streams that arrive at an enormous pace. In this paper, an incremental LOF (local outlier factor) algorithm, appropriate for detecting outliers in data streams, is proposed. The proposed incremental LOF algorithm provides detection performance equivalent to that of the iterated static LOF algorithm (applied after insertion of each data record), while requiring significantly less computational time. In addition, the incremental LOF algorithm dynamically updates the profiles of data points. This is a very important property, since data profiles may change over time. The paper provides theoretical evidence that insertion of a new data point, as well as deletion of an old data point, influences only a limited number of their closest neighbors, and thus the number of updates per such insertion/deletion does not depend on the total number of points N in the data set. Our experiments on several simulated and real-life data sets have demonstrated that the proposed incremental LOF algorithm is computationally efficient, while at the same time very successful in detecting outliers and changes of distributional behavior in various data stream applications.
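A minimal Python (3.8+) sketch of incremental insertion and deletion, assuming Euclidean distance, no distance ties, and brute-force neighbor search. The names `knn`, `lrd_of`, `lof_of`, `static_lof` and `IncrementalLOF` are hypothetical illustrations, not the authors' code. Brute-force scans stand in for the kNN and reverse-kNN queries an efficient implementation would answer with an index, so this sketch does not achieve the N-independent update cost; it only shows which profiles must be refreshed after each insertion or deletion. `static_lof` plays the role of the iterated static baseline.

```python
import math
import random


def knn(pts, i, k):
    """Sorted k nearest neighbors of point i and its k-distance.

    Brute-force scan over pts ({id: coordinates}); assumes no distance ties.
    """
    cand = sorted((math.dist(pts[i], pts[j]), j) for j in pts if j != i)
    return [j for _, j in cand[:k]], cand[k - 1][0]


def lrd_of(pts, i, nbrs, kdist):
    # reach-dist_k(i, o) = max(k-distance(o), d(i, o)); lrd = 1 / mean reach-dist
    reach = [max(kdist[o], math.dist(pts[i], pts[o])) for o in nbrs[i]]
    return len(reach) / sum(reach)


def lof_of(i, nbrs, lrd):
    # LOF = mean lrd of i's neighbors divided by lrd of i
    return sum(lrd[o] for o in nbrs[i]) / (len(nbrs[i]) * lrd[i])


def static_lof(pts, k):
    """Baseline: recompute every profile from scratch."""
    nbrs, kdist = {}, {}
    for i in pts:
        nbrs[i], kdist[i] = knn(pts, i, k)
    lrd = {i: lrd_of(pts, i, nbrs, kdist) for i in pts}
    return {i: lof_of(i, nbrs, lrd) for i in pts}


class IncrementalLOF:
    """Hypothetical helper keeping kNN lists, k-distances, lrd and LOF current."""

    def __init__(self, k):
        self.k, self.next_id = k, 0
        self.pts, self.nbrs, self.kdist, self.lrd, self.lof = {}, {}, {}, {}, {}

    def _rknn(self, targets):
        # reverse kNN: points whose k-neighborhood contains any id in targets
        return {q for q, nb in self.nbrs.items() if targets.intersection(nb)}

    def _refresh(self, changed_kdist, extra=()):
        # lrd changes where the neighborhood or a neighbor's k-distance changed
        changed_lrd = changed_kdist | set(extra) | self._rknn(changed_kdist)
        for q in changed_lrd:
            self.lrd[q] = lrd_of(self.pts, q, self.nbrs, self.kdist)
        # LOF changes where the point's own lrd or a neighbor's lrd changed
        changed_lof = changed_lrd | self._rknn(changed_lrd)
        for q in changed_lof:
            self.lof[q] = lof_of(q, self.nbrs, self.lrd)
        return changed_lof  # ids whose LOF was recomputed

    def insert(self, x):
        c, self.next_id = self.next_id, self.next_id + 1
        self.pts[c] = tuple(x)
        if len(self.pts) <= self.k:  # too few points for a k-neighborhood
            return set()
        if len(self.pts) == self.k + 1:  # bootstrap once with a static pass
            for i in self.pts:
                self.nbrs[i], self.kdist[i] = knn(self.pts, i, self.k)
            return self._refresh(set(self.pts))
        self.nbrs[c], self.kdist[c] = knn(self.pts, c, self.k)
        # points that gain c as a k-nearest neighbor (reverse kNN of c)
        gained = {j for j in self.nbrs if j != c
                  and math.dist(self.pts[j], self.pts[c]) < self.kdist[j]}
        for j in gained:  # c replaces j's former k-th neighbor
            nb = sorted(self.nbrs[j][:-1] + [c],
                        key=lambda o: math.dist(self.pts[j], self.pts[o]))
            self.nbrs[j], self.kdist[j] = nb, math.dist(self.pts[j], self.pts[nb[-1]])
        return self._refresh(gained, extra=[c])

    def delete(self, p):
        assert len(self.pts) > self.k + 1, "need more than k+1 points to delete"
        del self.pts[p], self.nbrs[p], self.kdist[p], self.lrd[p], self.lof[p]
        # points that had p as a k-nearest neighbor need a replacement
        lost = {j for j, nb in self.nbrs.items() if p in nb}
        for j in lost:
            self.nbrs[j], self.kdist[j] = knn(self.pts, j, self.k)
        return self._refresh(lost)


# Usage: a sliding window of 100 points over a synthetic 2-D stream
random.seed(0)
window = IncrementalLOF(k=5)
for t in range(300):
    window.insert((random.gauss(0, 1), random.gauss(0, 1)))
    if t >= 100:
        window.delete(t - 100)  # ids follow insertion order, so this is the oldest
```

Inserting a point far from the window, such as `window.insert((10.0, 10.0))`, should recompute only the new point's own profile, because no existing point gains it as a neighbor, while its LOF comes out well above the rest. Comparing `window.lof` with `static_lof(window.pts, 5)` is a quick way to check that the incremental updates agree with a full recomputation.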