Publication | Closed Access
Do Better ImageNet Models Transfer Better?
Citations: 1.2K · References: 75 · Year: 2019 · Venue: CVPR 2019
Keywords: Few-shot Learning, Convolutional Neural Network, Image Classification, Machine Vision, Image Analysis, Machine Learning, Data Science, Pattern Recognition, Engineering, Feature Learning, ImageNet Accuracy, Computer Science, Transfer Learning, Style Transfer, Deep Learning, Computer Vision
Transfer learning is a cornerstone of computer vision, yet little work has been done to evaluate the relationship between architecture and transfer. An implicit hypothesis in modern computer vision research is that models that perform better on ImageNet necessarily perform better on other vision tasks. However, this hypothesis has never been systematically tested. Here, we compare the performance of 16 classification networks on 12 image classification datasets. We find that, when networks are used as fixed feature extractors or fine-tuned, there is a strong correlation between ImageNet accuracy and transfer accuracy (r = 0.99 and 0.96, respectively). In the former setting, we find that this relationship is very sensitive to the way in which networks are trained on ImageNet; many common forms of regularization slightly improve ImageNet accuracy but yield features that are much worse for transfer learning. Additionally, we find that, on two small fine-grained image classification datasets, pretraining on ImageNet provides minimal benefits, indicating the learned features from ImageNet do not transfer well to fine-grained tasks. Together, our results show that ImageNet architectures generalize well across datasets, but ImageNet features are less general than previously suggested.
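The headline statistic above (r = 0.99 for fixed features, 0.96 for fine-tuning) is a Pearson correlation between ImageNet top-1 accuracy and transfer accuracy across the 16 networks. A minimal sketch of that computation is below; the accuracy values are hypothetical placeholders for illustration, not numbers from the paper.

```python
# Sketch of the correlation analysis: Pearson r between ImageNet top-1
# accuracy and transfer accuracy across a set of models.
import numpy as np

def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    xm, ym = x - x.mean(), y - y.mean()
    return float((xm * ym).sum() / np.sqrt((xm ** 2).sum() * (ym ** 2).sum()))

# Hypothetical per-model accuracies (percent); one entry per network.
imagenet_top1 = [71.6, 76.0, 77.2, 80.2, 82.7]
transfer_acc  = [83.1, 86.4, 87.0, 89.5, 90.8]

r = pearson_r(imagenet_top1, transfer_acc)
print(f"Pearson r = {r:.3f}")
```

Note that the paper's actual analysis is over 16 networks and 12 transfer datasets; this only illustrates the shape of the statistic.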