Publication | Open Access
Making <i>k</i>-means even faster
188
Citations
18
References
2010
Year
Unknown Venue
Document ClusteringClustering (Nuclear Physics)Exact K-meansData ScienceData MiningNew AccelerationEngineeringComputational LinguisticsText ProcessingSimilarity SearchInnermost K-meansComputer ScienceLanguage StudiesDimensionality ReductionClustering (Data Mining)LinguisticsUnsupervised Machine LearningMachine Translation
The k-means algorithm is widely used for clustering, compressing, and summarizing vector data. In this paper, we propose a new acceleration for exact k-means that gives the same answer, but is much faster in practice. Like Elkan’s accelerated algorithm [8], our algorithm avoids distance computations using distance bounds and the triangle inequality. Our algorithm uses one novel lower bound for point-center distances, which allows it to eliminate the innermost k-means loop 80% of the time or more in our experiments. On datasets of low and medium dimension (e.g. up to 50 dimensions), our algorithm is much faster than other methods, including methods based on low-dimensional indexes, such as k-d trees. Other advantages are that it is very simple to implement and it has a very small memory overhead, much smaller than other accelerated algorithms.
| Year | Citations | |
|---|---|---|
Page 1
Page 1