A Unified Framework for Representation-Based Subspace Clustering of Out-of-Sample and Large-Scale Data

TLDR

Subspace clustering relies on constructing a similarity graph, and recent sparse, low‑rank, and l2‑norm methods achieve state‑of‑the‑art performance but suffer from cubic time complexity and inability to handle out‑of‑sample data. The authors propose a unified framework enabling representation‑based subspace clustering to handle both large‑scale and out‑of‑sample data. The framework tackles large‑scale data by sampling, clustering, coding, and classifying them as out‑of‑sample points, and provides error‑bound estimates by modeling each subspace as a point in a hyperspace. Experiments on benchmark datasets demonstrate that the proposed methods outperform recent scalable subspace clustering approaches on large‑scale data.

Abstract

Under the framework of spectral clustering, the key of subspace clustering is building a similarity graph, which describes the neighborhood relations among data points. Some recent works build the graph using sparse, low-rank, and l2 -norm-based representation, and have achieved the state-of-the-art performance. However, these methods have suffered from the following two limitations. First, the time complexities of these methods are at least proportional to the cube of the data size, which make those methods inefficient for solving the large-scale problems. Second, they cannot cope with the out-of-sample data that are not used to construct the similarity graph. To cluster each out-of-sample datum, the methods have to recalculate the similarity graph and the cluster membership of the whole data set. In this paper, we propose a unified framework that makes the representation-based subspace clustering algorithms feasible to cluster both the out-of-sample and the large-scale data. Under our framework, the large-scale problem is tackled by converting it as the out-of-sample problem in the manner of sampling, clustering, coding, and classifying. Furthermore, we give an estimation for the error bounds by treating each subspace as a point in a hyperspace. Extensive experimental results on various benchmark data sets show that our methods outperform several recently proposed scalable methods in clustering a large-scale data set.

References

Page 1

	Year	Citations

Page 1