Self-supervised Pretraining of Visual Features in the Wild

TLDR

Self-supervised methods such as MoCo, SimCLR, BYOL, and SwAV have narrowed the performance gap with supervised learning on curated datasets like ImageNet, but their effectiveness on random, uncurated images had remained untested. This study investigates whether self-supervision can reach comparable performance when trained at scale on random, uncurated images. The authors trained SEER, a 1.3B-parameter RegNetY, on 1B random images using 512 GPUs and a self-supervised objective. SEER attains 84.2% top-1 accuracy on ImageNet, outperforming the previous best self-supervised pretrained model by 1%, and is a strong few-shot learner, reaching 77.9% top-1 accuracy with access to only 10% of ImageNet. Code is available at https://github.com/facebookresearch/vissl.
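The low-shot result is a fine-tuning evaluation: the pretrained backbone is adapted on a small labeled fraction of ImageNet. As a rough illustration only, not the paper's recipe, the sketch below fine-tunes a torchvision RegNetY as a stand-in architecture (the released SEER weights are distributed via the VISSL repo); the dataset path, the crude every-10th-sample subset, and all hyperparameters are assumptions.

```python
# Hedged sketch of low-shot fine-tuning on ~10% of ImageNet.
# Path, subsampling scheme, architecture stand-in, and hyperparameters
# are illustrative assumptions, not the paper's training recipe.
import torch
import torch.nn.functional as F
import torchvision as tv
from torch.utils.data import DataLoader, Subset

transform = tv.transforms.Compose([
    tv.transforms.RandomResizedCrop(224),
    tv.transforms.ToTensor(),
])
full = tv.datasets.ImageFolder("/path/to/imagenet/train", transform)  # assumed path
subset = Subset(full, list(range(0, len(full), 10)))  # crude 10% split for illustration
loader = DataLoader(subset, batch_size=64, shuffle=True, num_workers=8)

model = tv.models.regnet_y_16gf()  # stand-in backbone, not the 1.3B SEER model
model.fc = torch.nn.Linear(model.fc.in_features, 1000)  # fresh classification head
opt = torch.optim.SGD(model.parameters(), lr=0.01, momentum=0.9, weight_decay=1e-4)

model.train()
for images, labels in loader:
    loss = F.cross_entropy(model(images), labels)
    opt.zero_grad()
    loss.backward()
    opt.step()
```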

Abstract

Recently, self-supervised learning methods like MoCo, SimCLR, BYOL, and SwAV have reduced the gap with supervised methods. These results have been achieved in a controlled environment, namely the highly curated ImageNet dataset. However, the premise of self-supervised learning is that it can learn from any random image and from any unbounded dataset. In this work, we explore whether self-supervision lives up to its expectations by training large models on random, uncurated images with no supervision. Our final SElf-supERvised (SEER) model, a RegNetY with 1.3B parameters trained on 1B random images with 512 GPUs, achieves 84.2% top-1 accuracy, surpassing the best self-supervised pretrained model by 1% and confirming that self-supervised learning works in a real-world setting. Interestingly, we also observe that self-supervised models are good few-shot learners, achieving 77.9% top-1 with access to only 10% of ImageNet. Code: https://github.com/facebookresearch/vissl
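For intuition about the pretraining objective: SEER is trained with a SwAV-style swapped-prediction loss, in which each augmented view of an image must predict the soft cluster assignment ("code") computed for the other view. Below is a minimal conceptual sketch in PyTorch, not the authors' implementation (see VISSL for that); the function names, Sinkhorn iteration count, and temperature are assumptions.

```python
# Conceptual sketch of a SwAV-style swapped-prediction loss (not VISSL's code).
import torch
import torch.nn.functional as F

@torch.no_grad()
def sinkhorn(scores, eps=0.05, iters=3):
    # Sinkhorn-Knopp normalization: turn prototype scores into soft assignments
    # whose rows (samples) sum to 1 and whose columns (prototypes) stay balanced.
    q = torch.exp(scores / eps).t()       # (K prototypes, B samples)
    q /= q.sum()
    K, B = q.shape
    for _ in range(iters):
        q /= q.sum(dim=1, keepdim=True)   # normalize each prototype row
        q /= K
        q /= q.sum(dim=0, keepdim=True)   # normalize each sample column
        q /= B
    return (q * B).t()                    # (B, K); each row sums to 1

def swav_loss(z1, z2, prototypes, temp=0.1):
    # Swapped prediction: view 1 predicts view 2's code and vice versa.
    z1 = F.normalize(z1, dim=1)
    z2 = F.normalize(z2, dim=1)
    c = F.normalize(prototypes, dim=1)    # (K, D) learnable prototype vectors
    s1, s2 = z1 @ c.t(), z2 @ c.t()       # similarity scores to prototypes
    q1, q2 = sinkhorn(s1), sinkhorn(s2)   # target codes, computed without grad
    p1 = F.log_softmax(s1 / temp, dim=1)
    p2 = F.log_softmax(s2 / temp, dim=1)
    return -0.5 * ((q2 * p1).sum(dim=1).mean() + (q1 * p2).sum(dim=1).mean())

# Toy usage: projected features of two views of a batch of 32 images.
z1, z2 = torch.randn(32, 128), torch.randn(32, 128)
prototypes = torch.randn(300, 128, requires_grad=True)
loss = swav_loss(z1, z2, prototypes)
```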
