Distance Metric Learning with Application to Clustering with Side-Information

TLDR

Clustering algorithms such as K‑means depend on a suitable metric, and without one users may need to manually adjust the metric to obtain meaningful clusters. The paper proposes a systematic method for users to supply similarity examples so that a distance metric can be learned that respects those relationships. The algorithm learns a metric by solving a convex optimization problem that incorporates user‑supplied similar and dissimilar pairs, yielding efficient, globally optimal solutions. Empirical results show that the learned metrics substantially improve clustering performance.

Abstract

Many algorithms rely critically on being given a good metric over their inputs. For instance, data can often be clustered in many plausible ways, and if a clustering algorithm such as K-means initially fails to find one that is meaningful to a user, the only recourse may be for the user to manually tweak the metric until sufficiently good clusters are found. For these and other applications requiring good metrics, it is desirable that we provide a more systematic way for users to indicate what they consider similar. For instance, we may ask them to provide examples. In this paper, we present an algorithm that, given examples of similar (and, if desired, dissimilar) pairs of points in ℝn, learns a distance metric over ℝn that respects these relationships. Our method is based on posing metric learning as a convex optimization problem, which allows us to give efficient, local-optima-free algorithms. We also demonstrate empirically that the learned metrics can be used to significantly improve clustering performance.

References

Page 1

	Year	Citations

Page 1