Supervised k-Means Clustering

Abstract

The k-means clustering algorithm is one of the most widely used, effective, and best understood clustering methods. However, successful use of k-means requires a carefully chosen distance measure that reflects the properties of the clustering task. Since designing this distance measure by hand is often difficult, we provide methods for training k-means using supervised data. Given training data in the form of sets of items with their desired partitioning, we provide a structural SVM method that learns a distance measure so that k-means produces the desired clusterings. We propose two variants of the method – one based on a spectral relaxation and one based on the traditional k-means algorithm – that are both computationally efficient. For each variant, we provide a theoretical characterization of its accuracy in solving the training problem. We also provide an empirical analysis of the clustering quality and runtime of these learning methods on varied high-dimensional datasets.

Categories and Subject Descriptors

I.2.6 [Artificial Intelligence]: Learning—induction, parameter learning; I.5.3 [Pattern Recognition]: Clustering—algorithms,
