Publication | Closed Access
Scalable parallel OPTICS data clustering using graph algorithmic techniques
Citations: 52
References: 50
Year: 2013
Unknown Venue
Cluster Computing, Engineering, Computer Architecture, High Performance Computing, Parallel Metaheuristics, Cluster Technology, Classical Optics Algorithm, Data Science, Parallel Computing, Computational Geometry, Massively-parallel Computing, Computer Engineering, Computer Science, Optics Algorithm, Gpu Cluster, Graph Theory, Parallel Processing, High Parallelism, Parallel Programming, Graph Algorithmic Techniques, Data-level Parallelism, Big Data
OPTICS is a hierarchical density-based data clustering algorithm that discovers arbitrary-shaped clusters and eliminates noise using adjustable reachability distance thresholds. Parallelizing OPTICS is considered challenging because the algorithm exhibits a strongly sequential data access order. We present a scalable parallel OPTICS algorithm (POPTICS) designed using graph algorithmic concepts. To break the data access sequentiality, POPTICS exploits the similarities between the OPTICS algorithm and Prim's Minimum Spanning Tree algorithm. Additionally, we use the disjoint-set data structure to achieve high parallelism for distributed cluster extraction. Using high-dimensional datasets containing up to a billion floating-point numbers, we show scalable speedups of up to 27.5 for our OpenMP implementation on a 40-core shared-memory machine, and up to 3,008 for our MPI implementation on a 4,096-core distributed-memory machine. We also show that the quality of the results given by POPTICS is comparable to that of the classical OPTICS algorithm.
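
The role of the disjoint-set structure in cluster extraction can be illustrated with a small sequential sketch. It is not the authors' implementation: it assumes a spanning tree has already been computed and is given as an edge list whose weights stand in for reachability distances, and it merges the endpoints of every edge whose weight is within a chosen threshold. The names `DisjointSet`, `extract_clusters`, `eps`, and `min_size` are hypothetical, and treating small components as noise is a simplification made for illustration.

```python
from typing import Dict, List, Tuple


class DisjointSet:
    """Union-find with path halving and union by size (illustrative)."""

    def __init__(self, n: int) -> None:
        self.parent = list(range(n))
        self.size = [1] * n

    def find(self, x: int) -> int:
        # Path halving: point each visited node at its grandparent.
        while self.parent[x] != x:
            self.parent[x] = self.parent[self.parent[x]]
            x = self.parent[x]
        return x

    def union(self, a: int, b: int) -> None:
        ra, rb = self.find(a), self.find(b)
        if ra == rb:
            return
        # Attach the smaller tree under the larger one.
        if self.size[ra] < self.size[rb]:
            ra, rb = rb, ra
        self.parent[rb] = ra
        self.size[ra] += self.size[rb]


def extract_clusters(
    n_points: int,
    edges: List[Tuple[int, int, float]],
    eps: float,
    min_size: int = 2,
) -> List[List[int]]:
    """Group points connected by edges of weight <= eps.

    Assumption: `edges` is a spanning tree (u, v, weight) whose weights
    play the role of reachability distances. Components smaller than
    `min_size` are treated as noise and dropped.
    """
    ds = DisjointSet(n_points)
    for u, v, w in edges:
        if w <= eps:
            ds.union(u, v)

    groups: Dict[int, List[int]] = {}
    for p in range(n_points):
        groups.setdefault(ds.find(p), []).append(p)

    clusters = [m for m in groups.values() if len(m) >= min_size]
    return sorted(clusters, key=lambda m: m[0])


# Hypothetical toy input: six points, one isolated (index 5).
toy_edges = [(0, 1, 0.5), (1, 2, 0.4), (2, 3, 5.0), (3, 4, 0.3)]
clusters_tight = extract_clusters(6, toy_edges, eps=1.0)
clusters_loose = extract_clusters(6, toy_edges, eps=10.0)
```

Because each union touches only the two sets involved, edges can be processed in any order, which is one reason a disjoint-set formulation lends itself to parallel and distributed extraction. Changing `eps` regroups the points without recomputing the tree.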