On calibration of modern neural networks

TLDR

"Confidence calibration, the task of predicting probability estimates that reflect true correctness likelihood, is crucial for classification models across many applications." That is fine. Ensure each ends with period. No extra commentary. Let's craft sentences: Background: "Confidence calibration, the task of predicting probability estimates that reflect true correctness likelihood, is crucial for classification models across many applications." Purpose: "The study evaluates the performance of various post‑processing calibration methods on state‑of‑the‑art architectures with image and document classification datasets." Mechanism: "The authors assess these methods on state‑of‑the‑art architectures with image and document classification datasets." But that is redundant.

Abstract

Confidence calibration -- the problem of predicting probability estimates representative of the true correctness likelihood -- is important for classification models in many applications. We discover that modern neural networks, unlike those from a decade ago, are poorly calibrated. Through extensive experiments, we observe that depth, width, weight decay, and Batch Normalization are important factors influencing calibration. We evaluate the performance of various post-processing calibration methods on state-of-the-art architectures with image and document classification datasets. Our analysis and experiments not only offer insights into neural network learning, but also provide a simple and straightforward recipe for practical settings: on most datasets, temperature scaling -- a single-parameter variant of Platt Scaling -- is surprisingly effective at calibrating predictions.