Multi-column deep neural networks for image classification

TLDR

Traditional computer vision methods fall short of human performance on tasks such as handwritten digit and traffic sign recognition, whereas biologically plausible wide‑and‑deep neural networks can match or exceed it. By using small receptive‑field convolutional winner‑take‑all neurons to build deep, sparsely connected layers analogous to the mammalian visual pathway, training only the winning neurons, and aggregating predictions from multiple columns that specialize on differently preprocessed inputs, the authors achieve fast, GPU‑accelerated learning. The approach attains near‑human accuracy on MNIST, outperforms humans by a factor of two on a traffic‑sign benchmark, and surpasses state‑of‑the‑art performance on numerous other image‑classification datasets.

Abstract

Traditional methods of computer vision and machine learning cannot match human performance on tasks such as the recognition of handwritten digits or traffic signs. Our biologically plausible, wide and deep artificial neural network architectures can. Small (often minimal) receptive fields of convolutional winner-take-all neurons yield large network depth, resulting in roughly as many sparsely connected neural layers as found in mammals between retina and visual cortex. Only winner neurons are trained. Several deep neural columns become experts on inputs preprocessed in different ways; their predictions are averaged. Graphics cards allow for fast training. On the very competitive MNIST handwriting benchmark, our method is the first to achieve near-human performance. On a traffic sign recognition benchmark it outperforms humans by a factor of two. We also improve the state-of-the-art on a plethora of common image classification benchmarks.
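The column-averaging step described above can be illustrated with a minimal sketch. It assumes each column has already produced a vector of per-class probabilities for the same image. The function names `average_columns` and `predict` and the numbers are hypothetical and chosen for illustration only; the training of the columns and the preprocessing variants are not shown.

```python
from typing import List, Sequence


def average_columns(column_probs: Sequence[Sequence[float]]) -> List[float]:
    """Average per-class probabilities across columns (hypothetical helper)."""
    if not column_probs:
        raise ValueError("need at least one column")
    n_classes = len(column_probs[0])
    if any(len(p) != n_classes for p in column_probs):
        raise ValueError("all columns must predict the same number of classes")
    n_columns = len(column_probs)
    # Unweighted mean over columns, computed independently for each class.
    return [sum(p[k] for p in column_probs) / n_columns for k in range(n_classes)]


def predict(column_probs: Sequence[Sequence[float]]) -> int:
    """Return the index of the class with the highest averaged probability."""
    avg = average_columns(column_probs)
    return max(range(len(avg)), key=avg.__getitem__)


# Made-up outputs of three columns for one image over three classes.
columns = [
    [0.6, 0.3, 0.1],
    [0.2, 0.5, 0.3],
    [0.5, 0.4, 0.1],
]

averaged = average_columns(columns)
label = predict(columns)
```

An unweighted mean is used here because the abstract states only that the predictions are averaged; any weighting scheme would be an additional assumption.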
