
TLDR

Unsupervised learning seeks to uncover hidden structure and produce representations that improve supervised learning, often by reconstructing inputs while enforcing properties such as low dimensionality or sparsity, or by approximating data density through stochastic reconstruction. The authors introduce an efficient algorithm for learning sparse representations and propose a criterion that balances reconstruction error against information content to compare unsupervised models, benchmarking against a Restricted Boltzmann Machine. They validate the approach by extracting features from handwritten digit and natural image patch datasets, demonstrating its practical applicability. Stacking multiple levels of these machines and training them sequentially enables the capture of high‑order dependencies among input variables.
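The core idea above, learning a sparse representation by trading off reconstruction error against a sparsity constraint, can be sketched as follows. This is an illustrative example, not the authors' algorithm: the dictionary `D`, the penalty weight `alpha`, and the plain subgradient steps are all assumptions chosen for clarity.

```python
import numpy as np

# Hypothetical sketch: infer a sparse code z for an input x by minimizing
#   0.5 * ||x - D z||^2 + alpha * ||z||_1
# i.e. reconstruction error plus an L1 sparsity penalty on the code.
rng = np.random.default_rng(0)

def loss(x, D, z, alpha=0.1):
    return 0.5 * np.sum((x - D @ z) ** 2) + alpha * np.sum(np.abs(z))

def sparse_code_step(x, D, z, alpha=0.1, lr=0.05):
    """One (sub)gradient step on the code z, with the dictionary D fixed."""
    residual = x - D @ z                          # reconstruction error
    grad = -D.T @ residual + alpha * np.sign(z)   # gradient of the loss w.r.t. z
    return z - lr * grad

x = rng.normal(size=16)            # one flattened input patch
D = rng.normal(size=(16, 32))      # overcomplete dictionary: 32 basis vectors
D /= np.linalg.norm(D, axis=0)     # unit-norm columns keep the steps stable
z = np.zeros(32)                   # start from the maximally sparse code

losses = [loss(x, D, z)]
for _ in range(100):
    z = sparse_code_step(x, D, z)
    losses.append(loss(x, D, z))
```

In a full learner the dictionary `D` would also be updated to lower the same objective; here it is held fixed to keep the trade-off between reconstruction and sparsity visible in isolation.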

Abstract

Unsupervised learning algorithms aim to discover the structure hidden in the data, and to learn representations that are more suitable as input to a supervised machine than the raw input. Many unsupervised methods are based on reconstructing the input from the representation, while constraining the representation to have certain desirable properties (e.g. low dimensionality, sparsity, etc.). Others are based on approximating the data density by stochastically reconstructing the input from the representation. We describe a novel and efficient algorithm to learn sparse representations, and compare it theoretically and experimentally with a similar machine trained probabilistically, namely a Restricted Boltzmann Machine. We propose a simple criterion to compare and select different unsupervised machines based on the trade-off between the reconstruction error and the information content of the representation. We demonstrate this method by extracting features from a dataset of handwritten numerals, and from a dataset of natural image patches. We show that by stacking multiple levels of such machines and by training them sequentially, high-order dependencies between the observed input variables can be captured.
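The stacking procedure described in the last sentence can be sketched with simple reconstruction-based modules: train one module on the raw inputs, freeze it, then train the next module on the first module's codes. The linear-tanh autoencoder, the layer sizes, and the function `train_autoencoder` below are illustrative assumptions, not the machines used in the paper.

```python
import numpy as np

# Hedged sketch of greedy layer-wise stacking with toy autoencoders.
rng = np.random.default_rng(1)

def train_autoencoder(X, n_hidden, lr=0.01, epochs=200):
    """Fit x ~ W_d @ tanh(W_e @ x) by batch gradient descent; return weights."""
    n_in = X.shape[1]
    W_e = rng.normal(scale=0.1, size=(n_hidden, n_in))   # encoder
    W_d = rng.normal(scale=0.1, size=(n_in, n_hidden))   # decoder
    for _ in range(epochs):
        Z = np.tanh(X @ W_e.T)                 # codes
        E = Z @ W_d.T - X                      # reconstruction error
        grad_Wd = E.T @ Z / len(X)
        grad_Z = (E @ W_d) * (1 - Z ** 2)      # backprop through tanh
        grad_We = grad_Z.T @ X / len(X)
        W_d -= lr * grad_Wd
        W_e -= lr * grad_We
    return W_e, W_d

X = rng.normal(size=(200, 20))                 # toy stand-in for image patches
W_e1, _ = train_autoencoder(X, n_hidden=10)    # level 1: trained on raw inputs
Z1 = np.tanh(X @ W_e1.T)                       # level-1 codes (inputs frozen)
W_e2, _ = train_autoencoder(Z1, n_hidden=5)    # level 2: trained on the codes
Z2 = np.tanh(Z1 @ W_e2.T)                      # top-level representation
```

Each level only ever sees the output of the level below, which is why the sequential training order matters: higher levels can model dependencies among code units that the first level leaves in its representation.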
