The Many Faces of Robustness: A Critical Analysis of Out-of-Distribution Generalization

TLDR

The study introduces four real-world distribution-shift datasets and uses them to benchmark existing out-of-distribution robustness methods. It also develops a new data-augmentation technique that outperforms models pre-trained with 1000× more labeled data. The authors find that larger models and artificial data augmentations improve robustness on real-world distribution shifts and that gains on artificial benchmarks can transfer to them, yet no single method consistently improves robustness across all shift types, underscoring the need to evaluate multiple shifts together.

Abstract

We introduce four new real-world distribution shift datasets consisting of changes in image style, image blurriness, geographic location, camera operation, and more. With our new datasets, we take stock of previously proposed methods for improving out-of-distribution robustness and put them to the test. We find that using larger models and artificial data augmentations can improve robustness on real-world distribution shifts, contrary to claims in prior work. We find improvements in artificial robustness benchmarks can transfer to real-world distribution shifts, contrary to claims in prior work. Motivated by our observation that data augmentations can help with real-world distribution shifts, we also introduce a new data augmentation method which advances the state-of-the-art and outperforms models pre-trained with 1000× more labeled data. Overall we find that some methods consistently help with distribution shifts in texture and local image statistics, but these methods do not help with some other distribution shifts like geographic changes. Our results show that future research must study multiple distribution shifts simultaneously, as we demonstrate that no evaluated method consistently improves robustness.
