Publication | Closed Access
μDBSCAN: An Exact Scalable DBSCAN Algorithm for Big Data Exploiting Spatial Locality
22
Citations
28
References
2019
Year
Unknown Venue
Cluster ComputingEngineeringBig Data AnalyticsBig Data IndexingExact Dbscan ClustersCluster TechnologyData ScienceData MiningDbscan AlgorithmParallel ComputingData ManagementHigh-performance Data AnalyticsClustering (Nuclear Physics)Knowledge DiscoveryComputer ScienceBig Data SearchData-intensive ComputingClustering (Data Mining)Traditional DbscanMassive Data ProcessingBig Data
DBSCAN is one of the most popular and effective clustering algorithms that is capable of identifying arbitrary-shaped clusters and noise efficiently. However, its super-linear complexity makes it infeasible for applications involving clustering of Big Data. A major portion of the computation time of DBSCAN is taken up by the neighborhood queries, which becomes a bottleneck to its performance. We address this issue in our proposed micro-cluster based DBSCAN algorithm, μDBSCAN, which identifies core-points even without performing neighbourhood queries and becomes instrumental in reducing the run-time of the algorithm. It also significantly reduces the computation time per neighbourhood query while producing exact DBSCAN clusters. Moreover, the micro-cluster based solution makes it scalable for high dimensional data. We also propose a highly scalable distributed implementation of μDBSCAN, μDBSCAN-D, to exploit a commodity cluster infrastructure. Experimental results demonstrate tremendous improvements in performance of our proposed algorithms as compared to their respective state-of-the-art solutions for various standard datasets. μDBSCAN-D is an exact parallel solution for DBSCAN which is capable of processing massive amounts of data efficiently (1 billion data points in 41 minutes on a 32 node cluster), while producing a clustering that is same as that of traditional DBSCAN.
| Year | Citations | |
|---|---|---|
Page 1
Page 1