On Calibration of Modern Neural Networks

TLDR

Confidence calibration, the problem of predicting probability estimates representative of the true correctness likelihood, is important for classification models in many applications. The study evaluates several post-processing calibration methods on state-of-the-art architectures with image and document classification datasets. It finds that modern neural networks are poorly calibrated, that depth, width, weight decay, and Batch Normalization are important factors influencing calibration, and that temperature scaling is a simple, effective way to calibrate predictions on most datasets.

Abstract

Confidence calibration -- the problem of predicting probability estimates representative of the true correctness likelihood -- is important for classification models in many applications. We discover that modern neural networks, unlike those from a decade ago, are poorly calibrated. Through extensive experiments, we observe that depth, width, weight decay, and Batch Normalization are important factors influencing calibration. We evaluate the performance of various post-processing calibration methods on state-of-the-art architectures with image and document classification datasets. Our analysis and experiments not only offer insights into neural network learning, but also provide a simple and straightforward recipe for practical settings: on most datasets, temperature scaling -- a single-parameter variant of Platt Scaling -- is surprisingly effective at calibrating predictions.
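
To make the temperature scaling idea concrete, here is a minimal sketch in Python with NumPy. It assumes the single parameter T divides the logits before the softmax and is chosen by minimizing negative log-likelihood on held-out data. The function names (`softmax`, `nll`, `fit_temperature`), the grid search used in place of a gradient-based optimizer, and the synthetic data are all illustrative assumptions, not the authors' implementation.

```python
import numpy as np

def softmax(logits, temperature=1.0):
    """Row-wise softmax of logits / temperature (numerically stabilized)."""
    z = logits / temperature
    z = z - z.max(axis=1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

def nll(logits, labels, temperature=1.0):
    """Mean negative log-likelihood of the true labels at a given temperature."""
    probs = softmax(logits, temperature)
    picked = probs[np.arange(len(labels)), labels]
    return float(-np.mean(np.log(picked + 1e-12)))

def fit_temperature(val_logits, val_labels, grid=None):
    """Pick T > 0 minimizing validation NLL.

    Assumption: a simple grid search stands in for whatever optimizer
    one would use in practice; the grid bounds are arbitrary.
    """
    if grid is None:
        grid = np.linspace(0.05, 10.0, 400)
    losses = [nll(val_logits, val_labels, t) for t in grid]
    return float(grid[int(np.argmin(losses))])

# Synthetic stand-in for a validation set (hypothetical data):
# labels are drawn from softmax(true_logits), but the "model" reports
# logits that are 3x too large, i.e. it is overconfident.
rng = np.random.default_rng(0)
n_samples, n_classes = 2000, 5
true_logits = rng.normal(size=(n_samples, n_classes))
true_probs = softmax(true_logits)
u = rng.random(n_samples)
labels = np.minimum((true_probs.cumsum(axis=1) < u[:, None]).sum(axis=1),
                    n_classes - 1)
model_logits = 3.0 * true_logits

T = fit_temperature(model_logits, labels)
calibrated = softmax(model_logits, T)
print(f"fitted temperature: {T:.2f}")
print(f"NLL before: {nll(model_logits, labels):.3f}, "
      f"after: {nll(model_logits, labels, T):.3f}")
```

Because dividing every logit by the same positive T does not change their ordering, the predicted class is unchanged; only the confidence attached to it is softened (T > 1) or sharpened (T < 1).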
