Probabilistic Backpropagation for Scalable Learning of Bayesian Neural Networks

TLDR

Large multilayer neural networks trained with backpropagation achieve state-of-the-art performance, yet they require extensive hyperparameter tuning, lack calibrated probabilistic predictions, and tend to overfit. Bayesian learning can mitigate these issues, but existing Bayesian techniques do not scale to large datasets or networks. This paper introduces probabilistic backpropagation (PBP), a scalable method for learning Bayesian neural networks. Like classical backpropagation, PBP makes a forward pass that propagates probability distributions through the network, followed by a backward pass that computes gradients used to update the posterior distributions over the weights. Experiments on ten real-world datasets show that PBP is significantly faster than existing Bayesian techniques, delivers competitive predictive performance, and accurately estimates the posterior variance of the network weights.

Abstract

Large multilayer neural networks trained with backpropagation have recently achieved state-of-the-art results in a wide range of problems. However, using backprop for neural net learning still has some disadvantages, e.g., having to tune a large number of hyperparameters to the data, lack of calibrated probabilistic predictions, and a tendency to overfit the training data. In principle, the Bayesian approach to learning neural networks does not have these problems. However, existing Bayesian techniques do not scale to large datasets and networks. In this work we present a novel scalable method for learning Bayesian neural networks, called probabilistic backpropagation (PBP). Similar to classical backpropagation, PBP works by computing a forward propagation of probabilities through the network and then doing a backward computation of gradients. A series of experiments on ten real-world datasets shows that PBP is significantly faster than other techniques, while offering competitive predictive abilities. Our experiments also show that PBP provides accurate estimates of the posterior variance on the network weights.
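The abstract describes PBP only at a high level. The following is a minimal Python sketch of what one forward/backward update for a single data point (x, y) could look like, under several simplifying assumptions that are ours, not the authors'. Each weight gets an independent Gaussian posterior with a mean and a variance. The forward pass propagates these means and variances through one ReLU hidden layer (using the exact moments of a rectified Gaussian) to a linear output. The target's log evidence, log Z = log N(y | m_out, v_out + noise_var), is then differentiated with respect to every weight mean and variance. Finally, the standard moment-matching (assumed density filtering) update is applied: m <- m + v * dlogZ/dm and v <- v - v^2 * ((dlogZ/dm)^2 - 2 * dlogZ/dv). Assumed here: a fixed, known noise variance (any treatment of noise or prior hyperparameters in the full method is not modeled), no biases or input scaling, a hypothetical flat parameter layout, central finite differences swapped in for true backpropagated gradients, and a rule that skips any update producing a non-positive variance. All function names are hypothetical.

```python
import math


def norm_pdf(a):
    return math.exp(-0.5 * a * a) / math.sqrt(2.0 * math.pi)


def norm_cdf(a):
    return 0.5 * (1.0 + math.erf(a / math.sqrt(2.0)))


def relu_moments(mu, var):
    """Exact mean and variance of max(0, a) when a ~ N(mu, var), with var > 0."""
    s = math.sqrt(var)
    alpha = mu / s
    cdf, pdf = norm_cdf(alpha), norm_pdf(alpha)
    m1 = mu * cdf + s * pdf                     # E[max(0, a)]
    m2 = (mu * mu + var) * cdf + mu * s * pdf   # E[max(0, a)^2]
    return m1, max(m2 - m1 * m1, 1e-12)         # clamp tiny round-off errors


def forward(x, means, variances, n_hidden):
    """Propagate weight means/variances: inputs -> ReLU hidden layer -> linear output.

    Hypothetical layout: the first n_hidden * len(x) entries are hidden-layer
    weights (row-major); the last n_hidden entries are output weights. No biases.
    """
    d = len(x)
    hidden = []
    for h in range(n_hidden):
        w = range(h * d, (h + 1) * d)
        mu = sum(means[i] * xj for i, xj in zip(w, x))
        var = sum(variances[i] * xj * xj for i, xj in zip(w, x))
        hidden.append(relu_moments(mu, var))
    off = n_hidden * d
    m_out = sum(means[off + h] * mb for h, (mb, _) in enumerate(hidden))
    # Var(w * b) for independent w, b: v_w * (v_b + m_b^2) + m_w^2 * v_b
    v_out = sum(variances[off + h] * (vb + mb * mb) + means[off + h] ** 2 * vb
                for h, (mb, vb) in enumerate(hidden))
    return m_out, v_out


def log_z(y, x, means, variances, n_hidden, noise_var):
    """log N(y | m_out, v_out + noise_var): Gaussian approximation to the evidence."""
    m, v = forward(x, means, variances, n_hidden)
    s = v + noise_var
    return -0.5 * math.log(2.0 * math.pi * s) - 0.5 * (y - m) ** 2 / s


def numerical_grad(f, values, eps=1e-6):
    """Central differences; a stand-in for reverse-mode (backprop) gradients."""
    grad = []
    for i in range(len(values)):
        up, down = list(values), list(values)
        up[i] += eps
        down[i] -= eps
        grad.append((f(up) - f(down)) / (2.0 * eps))
    return grad


def adf_update(means, variances, dm, dv):
    """Moment-matching update of each factorized Gaussian weight posterior."""
    new_m, new_v = [], []
    for m, v, gm, gv in zip(means, variances, dm, dv):
        v_new = v - v * v * (gm * gm - 2.0 * gv)
        if v_new > 0.0:
            new_m.append(m + v * gm)
            new_v.append(v_new)
        else:  # assumed safeguard: skip updates that give a non-positive variance
            new_m.append(m)
            new_v.append(v)
    return new_m, new_v


def pbp_step(y, x, means, variances, n_hidden, noise_var):
    """Absorb one data point: forward moments, gradients of log Z, moment-matching update."""
    dm = numerical_grad(lambda m: log_z(y, x, m, variances, n_hidden, noise_var), means)
    dv = numerical_grad(lambda v: log_z(y, x, means, v, n_hidden, noise_var), variances)
    return adf_update(means, variances, dm, dv)
```

For example, with x = [0.5, -1.0], three hidden units, nine weights with unit prior variances, and noise_var = 0.1, calling `pbp_step(0.8, x, means, variances, 3, 0.1)` returns updated mean and variance lists of the same length; looping over a dataset gives one sweep of this simplified scheme. Finite differences cost two forward passes per parameter, which is exactly the cost backpropagation avoids, so a practical implementation would obtain dlogZ/dm and dlogZ/dv with reverse-mode automatic differentiation instead.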
