Publication | Closed Access
Angle-based outlier detection in high-dimensional data
802 Citations | 32 References | Year: 2008
Unknown Venue
Data Objects, Anomaly Detection, Machine Vision, Data Science, Data Mining, Pattern Recognition, Image Analysis, Machine Learning, Outlier Detection, Knowledge Discovery, Engineering, Novelty Detection, Angle-based Outlier Detection, Computer Science, Dimensionality Reduction, Large Set, Statistics, Similarity Search
Outlier detection in large data sets aims to uncover the different mechanisms that generate different groups of objects. Existing distance-based methods, however, degrade on high-dimensional data because of the curse of dimensionality. The paper proposes ABOD (Angle-Based Outlier Detection), together with several variants, which scores each point by the variance of the angles between its difference vectors to all other points. Evaluated against LOF on synthetic and real data sets, ABOD alleviates the curse of dimensionality, needs no parameter tuning that affects the quality of the ranking, and outperforms LOF on high-dimensional data.
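For reference, the angle-based outlier factor (ABOF) of a point $A$ in a data set $\mathcal{D}$ is commonly written as a variance over pairs of other points $B, C$, where each pair's scaled angle term is weighted by the inverse of the two distances. This page only speaks of "variance in the angles", so treat the exact scaling and weighting below as an assumption taken from the usual presentation of ABOD. A small ABOF means the other points lie in a narrow cone as seen from $A$, which marks $A$ as an outlier.

```latex
x_{B,C} = \frac{\langle \overrightarrow{AB}, \overrightarrow{AC} \rangle}
               {\|\overrightarrow{AB}\|^{2}\,\|\overrightarrow{AC}\|^{2}},
\qquad
w_{B,C} = \frac{1}{\|\overrightarrow{AB}\|\,\|\overrightarrow{AC}\|}

\mathrm{ABOF}(A)
  = \frac{\sum_{B}\sum_{C} w_{B,C}\, x_{B,C}^{2}}{\sum_{B}\sum_{C} w_{B,C}}
  - \left( \frac{\sum_{B}\sum_{C} w_{B,C}\, x_{B,C}}{\sum_{B}\sum_{C} w_{B,C}} \right)^{2},
\qquad B, C \in \mathcal{D} \setminus \{A\},\; B \neq C
```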
Detecting outliers in a large set of data objects is a major data mining task aiming at finding different mechanisms responsible for different groups of objects in a data set. All existing approaches, however, are based on an assessment of distances (sometimes indirectly by assuming certain distributions) in the full-dimensional Euclidean data space. In high-dimensional data, these approaches are bound to deteriorate due to the notorious "curse of dimensionality". In this paper, we propose a novel approach named ABOD (Angle-Based Outlier Detection) and some variants assessing the variance in the angles between the difference vectors of a point to the other points. This way, the effects of the "curse of dimensionality" are alleviated compared to purely distance-based approaches. A main advantage of our new approach is that it does not rely on any parameter selection influencing the quality of the achieved ranking. In a thorough experimental evaluation, we compare ABOD to the well-established distance-based method LOF on various artificial data sets and a real-world data set and show ABOD to perform especially well on high-dimensional data.
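A minimal NumPy sketch of the angle-variance idea described in the abstract follows. It assumes the distance-weighted formulation commonly given for ABOD, where the term is $\langle AB, AC\rangle / (\|AB\|^2\|AC\|^2)$ with weight $1/(\|AB\|\,\|AC\|)$. The abstract itself only says "variance in the angles". The names `abof` and `abod_scores` are hypothetical. This is the naive variant that examines every pair of other points, not the authors' implementation or any of the variants the abstract mentions.

```python
import numpy as np


def abof(points: np.ndarray, i: int) -> float:
    """Angle-based outlier factor of points[i] (naive sketch, assumed weighting)."""
    a = points[i]
    # Difference vectors from A to every other point B.
    diffs = np.delete(points, i, axis=0) - a
    norms = np.linalg.norm(diffs, axis=1)
    # Exact duplicates of A have no defined angle, so drop them.
    keep = norms > 0
    diffs, norms = diffs[keep], norms[keep]
    if len(diffs) < 2:
        raise ValueError("need at least two points distinct from A")

    # All unordered pairs (B, C) with B != C.
    iu = np.triu_indices(len(diffs), k=1)
    gram = diffs @ diffs.T               # <AB, AC>
    nn = np.outer(norms, norms)          # ||AB|| * ||AC||
    values = (gram / nn**2)[iu]          # <AB, AC> / (||AB||^2 * ||AC||^2)
    weights = (1.0 / nn)[iu]             # assumed weight 1 / (||AB|| * ||AC||)

    # Weighted variance of the scaled angle terms.
    mean = np.average(values, weights=weights)
    return float(np.average((values - mean) ** 2, weights=weights))


def abod_scores(points: np.ndarray) -> np.ndarray:
    """ABOF for every point; smaller values indicate stronger outliers."""
    points = np.asarray(points, dtype=float)
    return np.array([abof(points, i) for i in range(len(points))])


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # 50 Gaussian points in 20 dimensions plus one far-away point (index 50).
    data = np.vstack([rng.normal(size=(50, 20)), np.full((1, 20), 10.0)])
    scores = abod_scores(data)
    ranking = np.argsort(scores)  # most outlying first
    print(ranking[:5])            # the appended point (index 50) comes first
```

Note that no neighborhood size or distance threshold appears anywhere. Each point's score is simply ranked in ascending order. The price is that every point is compared against all pairs of other points, so this naive version scales cubically with the number of points.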