TLDR

Near out-of-distribution (OOD) detection is a major challenge for deep neural networks. The study shows that large-scale pre-trained transformers improve near-OOD detection across data modalities: AUROC rises from 85% to over 96% on CIFAR-100 vs CIFAR-10 with Vision Transformers pre-trained on ImageNet-21k, and from 66% to 77% on a genomics benchmark with transformers and unsupervised pre-training. With few-shot outlier exposure, AUROC on CIFAR-100 vs CIFAR-10 reaches 98.7% with 1 image per OOD class and 99.46% with 10 images per OOD class. For multi-modal image-text transformers such as CLIP, supplying only the names of outlier classes, without any accompanying images, outperforms prior SOTA on standard vision OOD benchmarks.

Abstract

Near out-of-distribution (OOD) detection is a major challenge for deep neural networks. We demonstrate that large-scale pre-trained transformers can significantly improve the state-of-the-art (SOTA) on a range of near OOD tasks across different data modalities. For instance, on CIFAR-100 vs CIFAR-10 OOD detection, we improve the AUROC from 85% (current SOTA) to more than 96% using Vision Transformers pre-trained on ImageNet-21k. On a challenging genomics OOD detection benchmark, we improve the AUROC from 66% to 77% using transformers and unsupervised pre-training. To further improve performance, we explore the few-shot outlier exposure setting where a few examples from outlier classes may be available; we show that pre-trained transformers are particularly well-suited for outlier exposure, and that the AUROC of OOD detection on CIFAR-100 vs CIFAR-10 can be improved to 98.7% with just 1 image per OOD class, and 99.46% with 10 images per OOD class. For multi-modal image-text pre-trained transformers such as CLIP, we explore a new way of using just the names of outlier classes as the sole source of information, without any accompanying images, and show that this outperforms previous SOTA on standard vision OOD benchmark tasks.
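The class-name idea and the AUROC metric can be illustrated with a minimal sketch. This is not the authors' code: the abstract does not specify the scoring rule, so the one below (softmax over image-to-text similarities, summing the probability assigned to outlier names) is one plausible reading. The function names, the `temperature` value, and the toy 2-D vectors standing in for CLIP image and text embeddings are all assumptions made for illustration.

```python
import math

def softmax(logits):
    # Subtract the max for numerical stability before exponentiating.
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

def outlier_score(image_vec, in_text_vecs, out_text_vecs, temperature=0.01):
    """Probability mass placed on outlier class names (hypothetical scoring rule).

    image_vec stands in for an image embedding; in_text_vecs and out_text_vecs
    stand in for text embeddings of in-distribution and outlier class names.
    """
    all_text = in_text_vecs + out_text_vecs
    logits = [cosine(image_vec, t) / temperature for t in all_text]
    probs = softmax(logits)
    return sum(probs[len(in_text_vecs):])

def auroc(in_scores, out_scores):
    """Chance that a random OOD sample scores above a random in-distribution one.

    Ties count as half a win (pairwise Mann-Whitney form of the AUROC).
    """
    wins = 0.0
    for o in out_scores:
        for i in in_scores:
            if o > i:
                wins += 1.0
            elif o == i:
                wins += 0.5
    return wins / (len(in_scores) * len(out_scores))

# Toy stand-ins for embeddings; real ones would come from an image-text model.
in_text = [[1.0, 0.0]]    # one in-distribution class name
out_text = [[0.0, 1.0]]   # one outlier class name, no outlier images needed
in_images = [[0.9, 0.1], [0.8, 0.3]]
out_images = [[0.2, 0.9], [0.1, 1.0]]

in_scores = [outlier_score(v, in_text, out_text) for v in in_images]
out_scores = [outlier_score(v, in_text, out_text) for v in out_images]
print(auroc(in_scores, out_scores))
```

An AUROC of 0.5 corresponds to chance-level separation and 1.0 to perfect separation, which is the scale on which the figures quoted above (for example 85% versus 96%) are reported.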
