Concepedia

Publication | Closed Access

Coresets and Approximate Clustering for Bregman Divergences

15

Citations

0

References

2009

Year

Abstract

We study the generalized k-median problem with respect to a Bregman divergence Dø. Given a finite set P ⊆ ℝd of size n, our goal is to find a set C of size k such that the sum of errors cost(P, C) = Σp∊P minc∊C {Dø(p, c)} is minimized. The Bregman k-median problem plays an important role in many applications, e.g. information theory, statistics, text classification, and speech processing. We give the first coreset construction for this problem for a large subclass of Bregman divergences, including important dissimilarity measures such as the Kullback-Leibler divergence and the Itakura-Saito divergence. Using these coresets, we give a (1 + ∊)-approximation algorithm for the Bregman k-median problem with running time . This result improves over the previousely fastest known (1 + ∊)-approximation algorithm from [1]. Unlike the analysis of most coreset constructions our analysis does not rely on the construction of ∊-nets. Instead, we prove our results by purely combinatorial means.