Concepedia

Publication | Closed Access

Clustering and projected clustering with adaptive neighbors

Citations: 946

References: 21

Year: 2014

TLDR

Clustering performance depends on the similarity matrix, yet similarity learning and clustering are usually performed separately, often yielding suboptimal partitions. This work introduces a clustering framework that jointly learns the similarity matrix and the cluster structure. The method assigns adaptive neighbors to each point based on local distances, imposes a rank constraint on the similarity Laplacian to match the desired number of clusters, and optimizes the resulting problem with an efficient algorithm, also extending to projected clustering for high‑dimensional data. Experiments on synthetic and benchmark datasets demonstrate that the proposed methods consistently outperform existing clustering techniques.
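
The rank constraint rests on a standard fact from spectral graph theory: for a nonnegative similarity matrix, the multiplicity of the zero eigenvalue of its graph Laplacian equals the number of connected components, so a graph on n points with c components has a Laplacian of rank n - c. The sketch below only illustrates that fact with NumPy. The function name `count_components`, the tolerance `tol`, and the toy matrix are illustrative assumptions, and this is not the paper's optimization algorithm.

```python
import numpy as np

def count_components(S, tol=1e-8):
    """Count connected components of a similarity graph via its Laplacian.

    Hypothetical helper for illustration; `tol` is an assumed threshold
    for treating an eigenvalue as zero.
    """
    W = (S + S.T) / 2.0              # symmetrize the similarity matrix
    L = np.diag(W.sum(axis=1)) - W   # unnormalized graph Laplacian L = D - W
    eigvals = np.linalg.eigvalsh(L)  # real eigenvalues of a symmetric matrix
    return int(np.sum(eigvals < tol))

# Toy similarity matrix with two disconnected groups: {0, 1} and {2, 3}.
S = np.array([
    [0.0, 1.0, 0.0, 0.0],
    [1.0, 0.0, 0.0, 0.0],
    [0.0, 0.0, 0.0, 0.5],
    [0.0, 0.0, 0.5, 0.0],
])
n_components = count_components(S)
```

On this toy matrix the Laplacian has two zero eigenvalues, one per group, which is the structure a rank constraint of this kind is meant to enforce for the chosen number of clusters.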

Abstract

Many clustering methods partition data into groups based on an input similarity matrix, so the clustering results depend heavily on how that similarity is learned. Because similarity measurement and clustering are often conducted in two separate steps, the learned similarity may not be optimal for clustering and can lead to suboptimal results. In this paper, we propose a novel clustering model that learns the data similarity matrix and the clustering structure simultaneously. The model learns the similarity matrix by assigning adaptive, optimal neighbors to each data point based on local distances. Meanwhile, a new rank constraint is imposed on the Laplacian matrix of the similarity matrix, so that the number of connected components in the resulting similarity matrix exactly equals the number of clusters. We derive an efficient algorithm to optimize the proposed challenging problem and present a theoretical analysis of the connections between our method and both K-means clustering and spectral clustering. We also extend the new clustering model to projected clustering in order to handle high-dimensional data. Extensive empirical results on both synthetic data and real-world benchmark data sets show that our new clustering methods consistently outperform related clustering approaches.
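
As a rough illustration of neighbor assignment from local distances, the sketch below gives each point nonnegative weights over its k nearest neighbors that sum to one and shrink linearly with squared Euclidean distance, reaching zero at the (k+1)-th neighbor. The weighting rule, the function name `adaptive_neighbor_similarity`, the choice of k, and the toy data are assumptions made for illustration. The abstract does not specify them, and the sketch omits the rank constraint and the joint optimization entirely.

```python
import numpy as np

def adaptive_neighbor_similarity(X, k):
    """Row-stochastic similarity matrix built from local distances.

    Hypothetical helper: assumes squared Euclidean distances and a fixed
    neighbor count k, with at least k + 2 points in X.
    """
    n = X.shape[0]
    sq = np.sum(X ** 2, axis=1)
    D = sq[:, None] + sq[None, :] - 2.0 * (X @ X.T)  # pairwise squared distances
    np.maximum(D, 0.0, out=D)                        # clip tiny negative round-off
    S = np.zeros((n, n))
    for i in range(n):
        order = np.argsort(D[i])
        order = order[order != i]       # a point is not its own neighbor
        nbrs = order[:k]                # indices of the k nearest neighbors
        d_k = D[i, nbrs]
        d_next = D[i, order[k]]         # distance to the (k+1)-th neighbor
        denom = k * d_next - d_k.sum()
        if denom > 1e-12:
            S[i, nbrs] = (d_next - d_k) / denom  # closer neighbors weigh more
        else:
            S[i, nbrs] = 1.0 / k        # all k+1 distances tie: uniform weights
    return S

# Two well-separated toy groups of three points each.
X = np.array([
    [0.0, 0.0], [0.0, 1.0], [1.0, 0.0],
    [10.0, 10.0], [10.0, 11.0], [11.0, 10.0],
])
S = adaptive_neighbor_similarity(X, k=2)
```

With k = 2 on this toy data, every point places all of its weight on the two other points in its own group, so the similarity matrix is block diagonal. On less cleanly separated data that structure is not automatic, which is the gap the rank constraint described in the abstract is meant to close.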
