Bayesian Learning via Stochastic Gradient Langevin Dynamics

TLDR

The authors introduce a framework for learning from large datasets by iteratively processing small mini-batches. By adding the right amount of Gaussian noise to stochastic gradient updates and annealing the step size, the iterates converge to samples from the true posterior. The same procedure therefore moves smoothly from optimization to Bayesian posterior sampling, and this transition provides built-in protection against overfitting. The authors also propose a practical Monte Carlo method for estimating posterior statistics that starts collecting samples once a sampling threshold is exceeded. They apply the approach to a mixture of Gaussians, logistic regression, and ICA with natural gradients.
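For reference, the noisy update described above can be written as follows. This is the standard stochastic gradient Langevin dynamics update as we understand it from the full paper; it is not stated on this page. Here θ_t is the parameter at step t, N the dataset size, n the mini-batch size, x_{ti} the i-th item of mini-batch t, and ε_t the step size. The step sizes must decrease slowly enough to keep exploring but fast enough for the injected noise to dominate the mini-batch gradient noise; a polynomial decay is one common choice.

```latex
\Delta\theta_t = \frac{\epsilon_t}{2}\left(\nabla \log p(\theta_t)
  + \frac{N}{n}\sum_{i=1}^{n}\nabla \log p(x_{ti}\mid\theta_t)\right) + \eta_t,
\qquad \eta_t \sim \mathcal{N}(0, \epsilon_t)

\sum_{t=1}^{\infty}\epsilon_t = \infty, \qquad
\sum_{t=1}^{\infty}\epsilon_t^2 < \infty, \qquad
\text{e.g. } \epsilon_t = a\,(b+t)^{-\gamma},\ \gamma \in (0.5, 1]
```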

Abstract

In this paper we propose a new framework for learning from large-scale datasets based on iterative learning from small mini-batches. By adding the right amount of noise to a standard stochastic gradient optimization algorithm, we show that the iterates will converge to samples from the true posterior distribution as we anneal the step size. This seamless transition between optimization and Bayesian posterior sampling provides inbuilt protection against overfitting. We also propose a practical method for Monte Carlo estimates of posterior statistics which monitors a sampling threshold and collects samples after it has been surpassed. We apply the method to three models: a mixture of Gaussians, logistic regression and ICA with natural gradients.
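The procedure described in the abstract can be sketched in a few lines of Python. This is a minimal illustration under assumptions of our own, not the authors' code: a hypothetical toy model with unit-variance Gaussian observations x_i ~ N(θ, 1) and a N(0, 10) prior on θ, a polynomially decaying step size ε_t = a(b + t)^(-γ) with illustrative values for a, b and γ, and a fixed burn_in iteration count standing in for the paper's sampling threshold, whose exact form is not reproduced here. The function name sgld_gaussian_mean and all of its parameters are hypothetical.

```python
import math
import random


def sgld_gaussian_mean(data, n_iters=20000, batch_size=10, a=1e-3, b=10.0,
                       gamma=0.55, burn_in=5000, prior_var=10.0, seed=0):
    """Hypothetical SGLD sketch for the mean theta of x_i ~ N(theta, 1)
    with prior theta ~ N(0, prior_var). Returns post-burn-in iterates."""
    rng = random.Random(seed)
    N = len(data)
    theta = 0.0
    samples = []
    for t in range(n_iters):
        # Annealed step size: polynomial decay eps_t = a * (b + t)^(-gamma)
        eps = a * (b + t) ** (-gamma)
        batch = rng.sample(data, batch_size)
        # Gradient of the log prior N(0, prior_var)
        grad_prior = -theta / prior_var
        # Mini-batch log-likelihood gradient, rescaled by N / n so it
        # estimates the full-data gradient
        grad_lik = (N / batch_size) * sum(x - theta for x in batch)
        # Injected Gaussian noise with variance eps (std sqrt(eps))
        noise = rng.gauss(0.0, math.sqrt(eps))
        theta += 0.5 * eps * (grad_prior + grad_lik) + noise
        # Assumption: a fixed burn-in replaces the paper's sampling threshold
        if t >= burn_in:
            samples.append(theta)
    return samples
```

Because this toy posterior is Gaussian with mean Σx_i / (N + 1/10), the average of the returned samples can be checked against that closed form. With a finite step size the mini-batch gradient noise slightly inflates the spread of the samples; its variance shrinks like ε_t² while the injected noise shrinks only like ε_t, which is why annealing the step size lets the injected noise dominate.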
