Subspace clustering for high dimensional data

TLDR

Subspace clustering extends traditional clustering by identifying clusters within specific subspaces, addressing the problem of irrelevant dimensions that obscure patterns in high-dimensional data. The paper surveys subspace clustering algorithms and proposes a hierarchy that organizes them by their defining characteristics. The authors then compare top-down and bottom-up methods in empirical scalability and accuracy experiments, and outline application domains where subspace clustering could be particularly useful.

Abstract

Subspace clustering is an extension of traditional clustering that seeks to find clusters in different subspaces within a dataset. In high-dimensional data, many dimensions are often irrelevant and can mask existing clusters in noisy data. Feature selection removes irrelevant and redundant dimensions by analyzing the entire dataset. Subspace clustering algorithms instead localize the search for relevant dimensions, allowing them to find clusters that exist in multiple, possibly overlapping subspaces. There are two major branches of subspace clustering, distinguished by their search strategy. Top-down algorithms find an initial clustering in the full set of dimensions and evaluate the subspaces of each cluster, iteratively improving the results. Bottom-up approaches find dense regions in low-dimensional spaces and combine them to form clusters. This paper presents a survey of the various subspace clustering algorithms along with a hierarchy organizing the algorithms by their defining characteristics. We then compare the two main approaches to subspace clustering using empirical scalability and accuracy tests and discuss some potential applications where subspace clustering could be particularly useful.
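The bottom-up strategy described above can be illustrated with a minimal grid-based sketch. It is not an algorithm taken from the paper: the function names, the equal-width grid, the assumption that all coordinates lie in [0, 1), and the `bins` and `min_count` parameters are all hypothetical choices made for illustration. The sketch finds dense one-dimensional grid cells first and then examines only those two-dimensional subspaces whose dimensions each contain a dense cell.

```python
from collections import Counter
from itertools import combinations


def dense_units(points, dims, bins=4, min_count=3):
    """Return grid cells in the subspace `dims` that hold at least `min_count` points.

    Assumes every coordinate lies in [0, 1); `bins` and `min_count` are
    illustrative parameters, not values from the paper.
    """
    counts = Counter(
        tuple(min(int(p[d] * bins), bins - 1) for d in dims) for p in points
    )
    return {cell for cell, n in counts.items() if n >= min_count}


def bottom_up_subspaces(points, bins=4, min_count=3):
    """Map each subspace (a tuple of dimension indices) to its dense grid cells."""
    n_dims = len(points[0])

    # Step 1: dense regions in every one-dimensional subspace.
    dense_1d = {}
    for d in range(n_dims):
        units = dense_units(points, (d,), bins, min_count)
        if units:
            dense_1d[(d,)] = units

    # Step 2: combine only dimensions that were dense on their own.
    # A dense 2-D cell always projects onto dense 1-D cells, so no
    # dense 2-D cell is missed by skipping the other dimension pairs.
    result = dict(dense_1d)
    for (a,), (b,) in combinations(sorted(dense_1d), 2):
        units = dense_units(points, (a, b), bins, min_count)
        if units:
            result[(a, b)] = units
    return result


# Toy data: four points cluster in dimensions 0 and 1; dimension 2 is spread out.
points = [
    (0.10, 0.80, 0.05), (0.12, 0.82, 0.30), (0.15, 0.85, 0.55), (0.20, 0.90, 0.80),
    (0.60, 0.10, 0.10), (0.90, 0.40, 0.35), (0.55, 0.30, 0.60), (0.80, 0.60, 0.95),
]
found = bottom_up_subspaces(points)
```

On this toy data the irrelevant third dimension contains no dense cell, so it never enters a candidate subspace, and the cluster is reported in the subspace formed by dimensions 0 and 1. A fuller method would continue to higher-dimensional subspaces and merge adjacent dense cells into clusters; both steps are omitted here for brevity.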
