Boosting for transfer learning

TLDR

Traditional machine learning assumes that training and test data share the same distribution. That assumption often fails when a task arrives from a new domain and the only labeled data come from a similar old domain: labeling the new data is costly, and discarding the old data is wasteful. The paper introduces TrAdaBoost, a transfer learning framework that extends boosting-based learning algorithms to this setting. TrAdaBoost iteratively reweights examples, using a small amount of newly labeled data to exploit abundant old data, and its effectiveness and convergence are analyzed both theoretically and empirically. The authors report that TrAdaBoost can learn an accurate model from a tiny amount of new data plus a large amount of old data, even when the new data alone are insufficient to train a model.

Abstract

Traditional machine learning makes a basic assumption: the training and test data are drawn from the same distribution. However, in many cases, this identical-distribution assumption does not hold. It may be violated when a task arrives from a new domain while the only labeled data come from a similar old domain. Labeling the new data can be costly, and it would also be wasteful to throw away all the old data. In this paper, we present a novel transfer learning framework called TrAdaBoost, which extends boosting-based learning algorithms (Freund & Schapire, 1997). TrAdaBoost allows users to utilize a small amount of newly labeled data to leverage the old data and construct a high-quality classification model for the new data. We show that this method can learn an accurate model using only a tiny amount of new data and a large amount of old data, even when the new data are not sufficient to train a model alone. We also show that TrAdaBoost allows knowledge to be effectively transferred from the old data to the new. The effectiveness of our algorithm is analyzed theoretically and empirically to show that our iterative algorithm converges well to an accurate model.
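
The abstract does not spell out how the old and new data are combined. As a rough illustration only, the Python sketch below performs a single reweighting round in the style commonly attributed to TrAdaBoost: old-domain examples that the current weak learner misclassifies are down-weighted by a fixed factor, while misclassified new-domain examples are up-weighted as in AdaBoost. The function name `tradaboost_reweight`, its arguments, and the clipping of the error are assumptions made for this example and are not taken from the text above.

```python
import numpy as np

def tradaboost_reweight(w_src, w_tgt, miss_src, miss_tgt, n_rounds):
    """One reweighting round (illustrative sketch, not the authors' code).

    w_src, w_tgt       : current weights of old-domain / new-domain examples
    miss_src, miss_tgt : 1.0 where the weak learner was wrong, else 0.0
    n_rounds           : total number of boosting rounds planned
    """
    w_src = np.asarray(w_src, dtype=float)
    w_tgt = np.asarray(w_tgt, dtype=float)
    miss_src = np.asarray(miss_src, dtype=float)
    miss_tgt = np.asarray(miss_tgt, dtype=float)

    # Weighted error of the weak learner, measured on new-domain data only.
    eps = np.sum(w_tgt * miss_tgt) / np.sum(w_tgt)
    # Assumption: clip the error so the factor below stays well defined.
    eps = min(max(eps, 1e-10), 0.499)

    # AdaBoost-style factor for new-domain examples (less than 1 when eps < 0.5).
    beta_t = eps / (1.0 - eps)
    # Fixed factor for old-domain examples (at most 1).
    beta = 1.0 / (1.0 + np.sqrt(2.0 * np.log(len(w_src)) / n_rounds))

    # Misclassified old-domain examples lose weight: they look unlike the new domain.
    new_src = w_src * beta ** miss_src
    # Misclassified new-domain examples gain weight: they need more attention.
    new_tgt = w_tgt * beta_t ** (-miss_tgt)

    # Renormalise so that all weights together sum to one.
    total = new_src.sum() + new_tgt.sum()
    return new_src / total, new_tgt / total, eps
```

In a full training loop, a weak learner would be fit on the combined, weighted data before each call, and the two `miss_*` arrays would come from its predictions. How the per-round learners are combined into a final classifier is left out of this sketch.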
