Publication | Closed Access
A local search approximation algorithm for k-means clustering
497
Citations
39
References
2002
Year
Unknown Venue
EngineeringComputational ComplexityRange SearchingEfficient Approximation AlgorithmsUnsupervised Machine LearningData ScienceData MiningPattern RecognitionN Data PointsCombinatorial OptimizationComputational GeometryApproximation TheoryDocument ClusteringClustering (Nuclear Physics)Knowledge DiscoveryComputer ScienceApproximation AlgorithmsLocal Improvement HeuristicNatural SciencesClustering (Data Mining)Fuzzy Clustering
K‑means clustering seeks k centers minimizing mean squared distance, but no exact polynomial‑time algorithm exists and existing asymptotically efficient approximations are impractical due to large constants, leaving many heuristics without performance guarantees and prompting the search for a simple, practical approximation. The study aims to determine whether a simple, practical approximation algorithm for k‑means clustering can be devised. The authors propose a local‑improvement heuristic that swaps centers in and out, and empirically demonstrate its effectiveness when combined with Lloyd’s algorithm. The heuristic achieves a (9+ε)-approximation, with an almost tight lower bound of (9-ε), and empirical results show it performs well when paired with Lloyd’s algorithm.
In k-means clustering we are given a set of n data points in d-dimensional space ℜd and an integer k, and the problem is to determine a set of k points in ℜd, called centers, to minimize the mean squared distance from each data point to its nearest center. No exact polynomial-time algorithms are known for this problem. Although asymptotically efficient approximation algorithms exist, these algorithms are not practical due to the extremely high constant factors involved. There are many heuristics that are used in practice, but we know of no bounds on their performance.We consider the question of whether there exists a simple and practical approximation algorithm for k-means clustering. We present a local improvement heuristic based on swapping centers in and out. We prove that this yields a (9+ε)-approximation algorithm. We show that the approximation factor is almost tight, by giving an example for which the algorithm achieves an approximation factor of (9-ε). To establish the practical value of the heuristic, we present an empirical study that shows that, when combined with Lloyd's algorithm, this heuristic performs quite well in practice.
| Year | Citations | |
|---|---|---|
Page 1
Page 1