Publication | Closed Access
Parallel Algorithms for Distance-Based and Density-Based Outliers
Citations: 59
References: 19
Year: 2006
Unknown Venue
Cluster Computing, Anomaly Detection, Machine Learning, Engineering, Information Forensics, Unsupervised Machine Learning, Parallel Algorithms, Optimization-based Data Mining, Data Science, Data Mining, Pattern Recognition, Network Intrusion, Outlier Detection, Knowledge Discovery, Computer Science, Data Stream Mining, Pruning Rule, Parallel Programming, Big Data
An outlier is an observation that deviates so much from other observations as to arouse suspicion that it was generated by a different mechanism. Outlier detection has many applications, such as data cleaning, fraud detection, and network intrusion detection. The existence of outliers can indicate individuals or groups whose behavior differs markedly from that of most individuals in the dataset. In this paper we design two parallel algorithms. The first finds distance-based outliers using nested loops together with randomization and a pruning rule. The second detects density-based local outliers. Both algorithms use data parallelism, and we show that both reach near-linear speedup. The algorithms are tested on four real-world datasets from the UCI Machine Learning Database Repository.
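The abstract does not spell out the distance-based algorithm, so the following is only a minimal sequential sketch of the general idea it names: a nested loop over the data, a randomized scan order, and a pruning rule. It assumes, hypothetically, that a point's outlier score is the distance to its k-th nearest neighbor and that the top n scores are wanted. The function name `top_n_outliers` and the parameters `k`, `n`, and `seed` are illustrative and not taken from the paper.

```python
import heapq
import math
import random


def top_n_outliers(points, k=2, n=1, seed=0):
    """Return up to n (score, index) pairs, largest k-th-nearest-neighbor distance first."""
    rng = random.Random(seed)
    order = list(range(len(points)))
    rng.shuffle(order)  # randomization: a random scan order lets pruning trigger early

    cutoff = 0.0  # score of the weakest outlier currently held in the top n
    top = []      # min-heap of (score, index)

    for i in order:
        nearest = []  # negated distances: a max-heap of the k smallest distances so far
        pruned = False
        for j in order:
            if i == j:
                continue
            d = math.dist(points[i], points[j])
            if len(nearest) < k:
                heapq.heappush(nearest, -d)
            elif d < -nearest[0]:
                heapq.heapreplace(nearest, -d)
            # Pruning rule (assumed form): the running k-NN distance can only shrink,
            # so once it falls below the cutoff, point i cannot enter the top n.
            if len(nearest) == k and -nearest[0] < cutoff:
                pruned = True
                break
        if pruned or len(nearest) < k:
            continue
        score = -nearest[0]
        if len(top) < n:
            heapq.heappush(top, (score, i))
        elif score > top[0][0]:
            heapq.heapreplace(top, (score, i))
        if len(top) == n:
            cutoff = top[0][0]
    return sorted(top, reverse=True)


data = [(0, 0), (0, 1), (1, 0), (1, 1), (10, 10)]
print(top_n_outliers(data, k=2, n=1))
```

In a data-parallel setting, one plausible arrangement (again an assumption, not the paper's stated design) is to give each worker a block of candidate points for the outer loop and to share the current cutoff among workers so that all of them prune against the best bound found so far.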