Publication | Closed Access
On the Performance of Clustering in Hilbert Spaces
131
Citations
34
References
2008
Year
Cluster ComputingEngineeringMachine LearningHilbert SpacesComputational ComplexityFunctional AnalysisData ScienceData MiningPattern RecognitionClustering SchemeRandom MappingApproximation TheoryStatisticsDocument ClusteringKnowledge DiscoveryComputer ScienceDimensionality ReductionNonlinear Dimensionality ReductionHigh-dimensional MethodStatistical InferenceK-means Clustering Scheme
Based on randomly drawn vectors in a separable Hilbert space, one may construct a k-means clustering scheme by minimizing an empirical squared error. We investigate the risk of such a clustering scheme, defined as the expected squared distance of a random vector X from the set of cluster centers. Our main result states that, for an almost surely bounded , the expected excess clustering risk is O(¿1/n) . Since clustering in high (or even infinite)-dimensional spaces may lead to severe computational problems, we examine the properties of a dimension reduction strategy for clustering based on Johnson-Lindenstrauss-type random projections. Our results reflect a tradeoff between accuracy and computational complexity when one uses k-means clustering after random projection of the data to a low-dimensional space. We argue that random projections work better than other simplistic dimension reduction schemes.
| Year | Citations | |
|---|---|---|
Page 1
Page 1