Publication | Closed Access
An efficient k-means clustering algorithm: analysis and implementation
5.6K
Citations
47
References
2002
Year
Cluster ComputingEngineeringUnsupervised Machine LearningCluster TechnologyOptimization-based Data MiningImage AnalysisData ScienceData MiningPattern RecognitionEfficient K-meansComputational GeometryDocument ClusteringMachine VisionKnowledge DiscoveryComputer ScienceData CompressionColor QuantizationComputer VisionFiltering AlgorithmFuzzy Clustering
K‑means clustering seeks k centers that minimize mean‑squared distance to data points, and Lloyd’s algorithm is a widely used heuristic for this problem. The authors introduce the filtering algorithm, a streamlined implementation of Lloyd’s method, and demonstrate its practical efficiency. The filtering algorithm uses a kd‑tree to efficiently implement Lloyd’s k‑means clustering. Analysis shows the algorithm runs faster with greater cluster separation, and empirical tests on synthetic and real datasets confirm its effectiveness in color quantization, data compression, and image segmentation.
In k-means clustering, we are given a set of n data points in d-dimensional space R/sup d/ and an integer k and the problem is to determine a set of k points in Rd, called centers, so as to minimize the mean squared distance from each data point to its nearest center. A popular heuristic for k-means clustering is Lloyd's (1982) algorithm. We present a simple and efficient implementation of Lloyd's k-means clustering algorithm, which we call the filtering algorithm. This algorithm is easy to implement, requiring a kd-tree as the only major data structure. We establish the practical efficiency of the filtering algorithm in two ways. First, we present a data-sensitive analysis of the algorithm's running time, which shows that the algorithm runs faster as the separation between clusters increases. Second, we present a number of empirical studies both on synthetically generated data and on real data sets from applications in color quantization, data compression, and image segmentation.
| Year | Citations | |
|---|---|---|
Page 1
Page 1