Publication | Closed Access
Clustering for metric and nonmetric distance measures
2010 · 49 Citations · 33 References
Keywords: k-Median Problem, Document Clustering, Engineering, Data Science, Data Mining, Approximation Theory, Size k, Similarity Measure, Algorithmic Information Theory, Knowledge Discovery, Computational Complexity, Arbitrary Metric Space, Approximation Method, Probability Theory, Combinatorial Optimization, Functional Data Analysis, Statistics, Nonmetric Distance Measures
We study a generalization of the k-median problem with respect to an arbitrary dissimilarity measure D. Given a finite set P of size n, our goal is to find a set C of size k such that the sum of errors D(P, C) = ∑_{p∈P} min_{c∈C} D(p, c) is minimized. The main result of this article can be stated as follows: there exists a (1+ϵ)-approximation algorithm for the k-median problem with respect to D if the 1-median problem can be approximated within a factor of (1+ϵ) by taking a random sample of constant size and solving the 1-median problem on the sample exactly. This algorithm runs in time n · 2^{O(mk log(mk/ϵ))}, where m is a constant that depends only on ϵ and D. Using this characterization, we obtain the first linear-time (1+ϵ)-approximation algorithms for the k-median problem in an arbitrary metric space with bounded doubling dimension, for the Kullback-Leibler divergence (relative entropy), for the Itakura-Saito divergence, for Mahalanobis distances, and for some special cases of Bregman divergences. Moreover, we obtain previously known results for the Euclidean k-median problem and the Euclidean k-means problem in a simplified manner. Our results are based on a new analysis of an algorithm of Kumar et al. [2004].
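The two ingredients of the abstract can be illustrated concretely: the sum-of-errors objective D(P, C), and the idea of approximating the 1-median by solving it exactly on a small random sample. The sketch below is illustrative only; the function names and the candidate-restricted "exact" solver are assumptions for the example, not the algorithm of the paper, which handles general dissimilarity measures with stronger guarantees.

```python
import random

def kmedian_cost(P, C, D):
    """Sum of errors D(P, C) = sum over p in P of min over c in C of D(p, c)."""
    return sum(min(D(p, c) for c in C) for p in P)

def sample_1median(P, D, sample_size=10, rng=random):
    """Illustrative 1-median via sampling: draw a constant-size random sample S
    and solve the 1-median problem exactly on S (here: best candidate in S)."""
    S = rng.sample(P, min(sample_size, len(P)))
    return min(S, key=lambda c: sum(D(p, c) for p in S))

# Example with squared distance on 1-D points (a simple Bregman-type measure):
D = lambda a, b: (a - b) ** 2
P = [0, 1, 2, 10]
center = sample_1median(P, D, sample_size=4)   # sample covers all of P here
cost = kmedian_cost(P, [center], D)
```

With squared distance, the exact 1-median of a sample is close to the sample mean; the paper's condition asks precisely that such a constant-size sample already yields a (1+ϵ)-approximate 1-median for the measure D at hand.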