Unsupervised Feature Learning via Non-parametric Instance Discrimination

TLDR

Neural network classifiers trained with labels can implicitly learn visual similarity between categories. The study asks whether discriminating between individual instances, rather than classes, can learn useful feature representations without class labels. The authors formulate instance discrimination as a non-parametric classification problem and use noise-contrastive estimation to handle the very large number of instance classes. Under unsupervised settings, the method outperforms the previous state of the art on ImageNet classification, keeps improving with more training data and better network architectures, and yields competitive semi-supervised learning and object detection results after fine-tuning. It is also highly compact: with 128 features per image it needs only 600 MB of storage for a million images, enabling fast nearest-neighbor retrieval.
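The non-parametric formulation treats every training image as its own class. A minimal sketch, assuming L2-normalized features stored in a "memory bank" (one vector per instance), a temperature `tau`, and uniform noise over instances for NCE: the full softmax scores a query against every stored feature, and NCE replaces that n-way normalization with a binary data-vs-noise task using `m` sampled instances and a Monte Carlo estimate of the partition function. The names `instance_softmax`, `nce_loss`, `bank`, `tau`, and `m` are hypothetical, and the specific values are illustrative; this is not the authors' code.

```python
import math
import random

def normalize(v):
    # Project a feature vector onto the unit sphere so dot products are cosine similarities.
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def instance_softmax(v, bank, tau):
    # Full non-parametric softmax: P(i | v) = exp(f_i . v / tau) / sum_j exp(f_j . v / tau).
    # The stored instance features f_j take the place of learned class weights.
    logits = [dot(f, v) / tau for f in bank]
    peak = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(z - peak) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def nce_loss(v, i, bank, tau, m, rng):
    # NCE: tell the data instance i apart from m instances drawn from uniform noise P_n = 1/n.
    n = len(bank)
    noise = [rng.randrange(n) for _ in range(m)]
    # Monte Carlo estimate of the partition function: Z ~ n * mean_j exp(f_j . v / tau).
    z = n * sum(math.exp(dot(bank[j], v) / tau) for j in noise) / m

    def h(j):
        # Posterior probability that instance j came from the data, not the noise.
        p = math.exp(dot(bank[j], v) / tau) / z
        return p / (p + m / n)

    return -math.log(h(i)) - sum(math.log(1.0 - h(j)) for j in noise)

# Usage: a toy bank of 50 random 8-d unit vectors; query with instance 3's own feature.
rng = random.Random(0)
bank = [normalize([rng.gauss(0, 1) for _ in range(8)]) for _ in range(50)]
probs = instance_softmax(bank[3], bank, tau=0.07)
loss = nce_loss(bank[3], 3, bank, tau=0.07, m=16, rng=rng)
```

The NCE cost per query grows with `m` rather than with the number of instances `n`, which is what makes instance-level classification tractable at the scale of a million images.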

Abstract

Neural net classifiers trained on data with annotated class labels can also capture apparent visual similarity among categories without being directed to do so. We study whether this observation can be extended beyond the conventional domain of supervised learning: Can we learn a good feature representation that captures apparent similarity among instances, instead of classes, by merely asking the feature to be discriminative of individual instances? We formulate this intuition as a non-parametric classification problem at the instance-level, and use noise-contrastive estimation to tackle the computational challenges imposed by the large number of instance classes. Our experimental results demonstrate that, under unsupervised learning settings, our method surpasses the state-of-the-art on ImageNet classification by a large margin. Our method is also remarkable for consistently improving test performance with more training data and better network architectures. By fine-tuning the learned feature, we further obtain competitive results for semi-supervised learning and object detection tasks. Our non-parametric model is highly compact: With 128 features per image, our method requires only 600MB storage for a million images, enabling fast nearest neighbour retrieval at run time.
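The compactness claim and the retrieval step can be illustrated with a small sketch. Assuming the 128 features per image are stored as 32-bit floats (an assumption; the abstract does not give the data type), the raw feature matrix for a million images is 1,000,000 × 128 × 4 = 512,000,000 bytes, the same order as the quoted 600MB; the abstract does not say what accounts for the difference. With unit-norm features, nearest-neighbour retrieval reduces to ranking stored vectors by dot product. The helpers `raw_storage_bytes` and `knn` are hypothetical, and this brute-force scan is only one way to do retrieval, not necessarily the method used in the paper.

```python
import heapq
import math
import random

FEATURE_DIM = 128        # features per image, as stated in the abstract
BYTES_PER_FLOAT32 = 4    # assumption: features stored as 32-bit floats

def raw_storage_bytes(num_images, dim=FEATURE_DIM, bytes_per_value=BYTES_PER_FLOAT32):
    # Size of the raw feature matrix only, excluding any index or file-format overhead.
    return num_images * dim * bytes_per_value

def normalize(v):
    # Unit-normalize so that the dot product equals cosine similarity.
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]

def knn(query, bank, k):
    # Brute-force retrieval: indices of the k stored features most similar to the query.
    return heapq.nlargest(
        k, range(len(bank)), key=lambda i: sum(q * f for q, f in zip(query, bank[i]))
    )

# Usage: 200 random 128-d unit vectors; the nearest neighbour of item 7 is item 7 itself.
rng = random.Random(1)
bank = [normalize([rng.gauss(0, 1) for _ in range(FEATURE_DIM)]) for _ in range(200)]
neighbours = knn(bank[7], bank, k=5)
```

In practice the query would come from a held-out image rather than the bank itself, and an approximate nearest-neighbour index could replace the linear scan for larger collections.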
