Block-Diagonal Guided DBSCAN Clustering

Abstract

Cluster analysis constitutes a pivotal component of database mining, with DBSCAN being one of the most extensively employed algorithms in this domain. Nevertheless, DBSCAN is encumbered by several limitations, including challenges in processing high-dimensional datasets, a pronounced sensitivity to input parameters, and inconsistencies in generating reliable clustering outcomes. This paper presents a refined version of DBSCAN that utilizes the block-diagonal property of similarity graphs to enhance the clustering process. The core concept involves the construction of a graph that assesses the similarity among high-dimensional data points, capable of transformation into a block-diagonal form via an unknown permutation. This is followed by a cluster-ordering procedure that establishes the requisite permutation, thereby facilitating the straightforward identification of clustering structures through the recognition of diagonal blocks in the permuted graph. The principal obstacle addressed in this study is the construction of a graph that inherently possesses a block-diagonal structure, the permutation of this graph to actualize such a structure, and the autonomous identification of diagonal blocks within the permuted graph. To surmount these challenges, we initially devise a block-diagonal constrained self-representation model to create a similarity graph that exhibits a block-diagonal form post-permutation. A gradient descent-based methodology is proposed to resolve this problem effectively. Concurrently, we engineer a traversal algorithm, inspired by DBSCAN, that discerns clusters of high density within the graph and generates an enhanced cluster ordering. The attainment of a block-diagonal structure is then realized through permutation aligned with the traversal sequence, laying a robust foundation for both automated and interactive cluster analysis. Moreover, a novel split-and-refine algorithm is introduced to autonomously identify all diagonal blocks within the permuted graph, offering theoretical optimality under specific conditions. Extensive evaluations of our method across twelve rigorous real-world benchmark datasets affirm its superiority over contemporary state-of-the-art clustering techniques.

References

Page 1

	Year	Citations

Page 1