Publication | Closed Access
Parallel Spectral Clustering in Distributed Systems
619
Citations
47
References
2010
Year
Spectral TheoryCluster ComputingEngineeringNetwork AnalysisDense Similarity MatrixParallel AlgorithmsCluster TechnologyData ScienceData MiningPattern RecognitionParallel Spectral ClusteringParallel ComputingDistributed ModelDocument ClusteringNyström MethodKnowledge DiscoveryDistributed SystemsComputer ScienceDimensionality ReductionDistributed ProcessingMemory UseParallel ProgrammingSimilarity SearchMassive Data ProcessingBig Data
Spectral clustering outperforms traditional methods like k‑means but struggles with memory and computational scalability on large datasets. The study aims to enable clustering of large datasets by exploring two common approximations of the dense similarity matrix. After comparing sparsification and the Nyström method, the authors adopt a nearest‑neighbor sparsification strategy and parallelize both memory usage and computation across distributed machines. Experiments on a 193,844‑instance document set and a 2,121,863‑instance photo set demonstrate that the parallel algorithm can effectively handle large‑scale clustering problems.
Spectral clustering algorithms have been shown to be more effective in finding clusters than some traditional algorithms, such as k-means. However, spectral clustering suffers from a scalability problem in both memory use and computational time when the size of a data set is large. To perform clustering on large data sets, we investigate two representative ways of approximating the dense similarity matrix. We compare one approach by sparsifying the matrix with another by the Nyström method. We then pick the strategy of sparsifying the matrix via retaining nearest neighbors and investigate its parallelization. We parallelize both memory use and computation on distributed computers. Through an empirical study on a document data set of 193,844 instances and a photo data set of 2,121,863, we show that our parallel algorithm can effectively handle large problems.
| Year | Citations | |
|---|---|---|
Page 1
Page 1