Publication | Open Access
Learning with a Wasserstein Loss
Citations: 270
References: 18
Year: 2015
Structured Prediction, Engineering, Machine Learning, Text Mining, Natural Language Processing, Data Science, Data Mining, Probability Measures, Semi-supervised Learning, Supervised Learning, Automatic Classification, Computational Learning Theory, Predictive Analytics, Knowledge Discovery, Loss Function, Statistical Learning Theory, Deep Learning, Wasserstein Distance, Wasserstein Loss
Learning multi‑label outputs is difficult, yet many problems have a natural output metric; the Wasserstein distance offers a natural dissimilarity measure, and recent regularized approximations enable efficient optimization. The paper develops a Wasserstein‑based loss function for multi‑label learning. The authors present an efficient learning algorithm using a regularized Wasserstein distance, extend it to unnormalized measures, and provide a statistical learning bound. The Wasserstein loss promotes smooth predictions and outperforms a baseline on Yahoo Flickr tag prediction.
Learning to predict multi-label outputs is challenging, but in many problems there is a natural metric on the outputs that can be used to improve predictions. In this paper we develop a loss function for multi-label learning, based on the Wasserstein distance. The Wasserstein distance provides a natural notion of dissimilarity for probability measures. Although optimizing with respect to the exact Wasserstein distance is costly, recent work has described a regularized approximation that is efficiently computed. We describe an efficient learning algorithm based on this regularization, as well as a novel extension of the Wasserstein distance from probability measures to unnormalized measures. We also describe a statistical learning bound for the loss. The Wasserstein loss can encourage smoothness of the predictions with respect to a chosen metric on the output space. We demonstrate this property on a real-data tag prediction problem, using the Yahoo Flickr Creative Commons dataset, outperforming a baseline that doesn't use the metric.
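To make the idea of a regularized Wasserstein distance concrete, the following is a minimal NumPy sketch. It assumes the regularized approximation referred to in the abstract is the entropic one, computed by Sinkhorn matrix scaling. The function name `sinkhorn_distance`, the parameters `reg` and `n_iters`, and the three-label toy example are illustrative assumptions, not taken from the paper. The sketch only evaluates the loss value; it does not reproduce the paper's learning algorithm or its extension to unnormalized measures.

```python
import numpy as np

def sinkhorn_distance(a, b, M, reg=0.5, n_iters=500):
    """Entropy-regularized transport cost between histograms a and b.

    a, b : 1-D arrays of nonnegative weights, each summing to 1.
    M    : ground-metric matrix, M[i, j] = distance between labels i and j.
    reg  : regularization strength (hypothetical default for this sketch).
    Returns the transport cost <T, M> and the transport plan T.
    """
    K = np.exp(-M / reg)              # Gibbs kernel derived from the ground metric
    u = np.ones_like(a)
    for _ in range(n_iters):
        v = b / (K.T @ u)             # rescale so column sums match b
        u = a / (K @ v)               # rescale so row sums match a
    T = u[:, None] * K * v[None, :]   # regularized transport plan
    return float(np.sum(T * M)), T

# Toy output space: three labels placed on a line, metric = |position difference|.
positions = np.array([0.0, 1.0, 2.0])
M = np.abs(positions[:, None] - positions[None, :])

target = np.array([0.8, 0.1, 0.1])     # ground truth concentrated on label 0
pred_near = np.array([0.1, 0.8, 0.1])  # prediction concentrated on neighbouring label 1
pred_far = np.array([0.1, 0.1, 0.8])   # prediction concentrated on distant label 2

d_near, T_near = sinkhorn_distance(target, pred_near, M)
d_far, T_far = sinkhorn_distance(target, pred_far, M)
print(d_near, d_far)
```

In this toy setting the prediction that puts its mass on the neighbouring label incurs a smaller loss than the one that puts it on the distant label. A loss that ignores the output metric, such as a divergence computed label by label, would score the two predictions identically, since they differ only by a permutation of labels 1 and 2.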