Deep Boltzmann machines

Publication | Closed Access | Year: 2009
Citations: 1.8K | References: 17

TLDR

The paper introduces a new learning algorithm for multi‑layer Boltzmann machines. The algorithm estimates data‑dependent expectations via a variational approximation that concentrates on a single mode, approximates data‑independent expectations with persistent Markov chains, and employs a layer‑by‑layer pre‑training phase to efficiently learn deep Boltzmann machines with many layers and millions of parameters. Experiments on MNIST and NORB demonstrate that deep Boltzmann machines learn effective generative models and achieve strong performance on handwritten digit and visual object recognition.

Abstract

We present a new learning algorithm for Boltzmann machines that contain many layers of hidden variables. Data-dependent expectations are estimated using a variational approximation that tends to focus on a single mode, and data-independent expectations are approximated using persistent Markov chains. The use of two quite different techniques for estimating the two types of expectation that enter into the gradient of the log-likelihood makes it practical to learn Boltzmann machines with multiple hidden layers and millions of parameters. The learning can be made more efficient by using a layer-by-layer “pre-training” phase that allows variational inference to be initialized with a single bottom-up pass. We present results on the MNIST and NORB datasets showing that deep Boltzmann machines learn good generative models and perform well on handwritten digit and visual object recognition tasks.
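The two estimators described in the abstract can be sketched in a few lines of NumPy. The following is a minimal toy illustration, not the paper's implementation: a two-layer binary DBM with hypothetical sizes and biases omitted, where fixed-point mean-field updates supply the data-dependent expectations and persistent Gibbs chains (fantasy particles) supply the data-independent ones.

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

# Toy two-layer DBM: visible v, hidden h1, h2 (biases omitted for brevity).
# Layer sizes are arbitrary illustrative choices, not from the paper.
n_v, n_h1, n_h2 = 6, 4, 3
W1 = 0.01 * rng.standard_normal((n_v, n_h1))   # v -- h1 weights
W2 = 0.01 * rng.standard_normal((n_h1, n_h2))  # h1 -- h2 weights

def mean_field(v, n_iters=10):
    """Data-dependent expectations via a fully factorized variational
    posterior; the factorized form is what tends to focus the
    approximation on a single mode."""
    mu1 = sigmoid(v @ W1)                      # single bottom-up pass
    mu2 = sigmoid(mu1 @ W2)                    # to initialize inference
    for _ in range(n_iters):
        mu1 = sigmoid(v @ W1 + mu2 @ W2.T)     # h1 receives v and h2
        mu2 = sigmoid(mu1 @ W2)
    return mu1, mu2

def gibbs_step(v, h1, h2):
    """One alternating Gibbs sweep on the persistent chains, which
    approximate the data-independent (model) expectations."""
    h1 = (sigmoid(v @ W1 + h2 @ W2.T) > rng.random(h1.shape)).astype(float)
    v = (sigmoid(h1 @ W1.T) > rng.random(v.shape)).astype(float)
    h2 = (sigmoid(h1 @ W2) > rng.random(h2.shape)).astype(float)
    return v, h1, h2

# Persistent fantasy particles, kept alive across parameter updates.
n_chains = 8
pv = rng.integers(0, 2, (n_chains, n_v)).astype(float)
ph1 = rng.integers(0, 2, (n_chains, n_h1)).astype(float)
ph2 = rng.integers(0, 2, (n_chains, n_h2)).astype(float)

# Random binary batch standing in for real data (e.g. binarized images).
data = rng.integers(0, 2, (16, n_v)).astype(float)

lr = 0.05
for step in range(50):
    mu1, mu2 = mean_field(data)              # positive (data-dependent) phase
    pv, ph1, ph2 = gibbs_step(pv, ph1, ph2)  # negative (model) phase
    # Gradient of the log-likelihood: data expectations minus model expectations.
    W1 += lr * (data.T @ mu1 / len(data) - pv.T @ ph1 / n_chains)
    W2 += lr * (mu1.T @ mu2 / len(data) - ph1.T @ ph2 / n_chains)
```

The layer-by-layer pre-training phase mentioned in the abstract is not shown here; its role in this sketch is only hinted at by the bottom-up initialization inside `mean_field`.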
