Recurrent Models of Visual Attention

Abstract

Applying convolutional neural networks to large images is computationally ex-pensive because the amount of computation scales linearly with the number of image pixels. We present a novel recurrent neural network model that is ca-pable of extracting information from an image or video by adaptively selecting a sequence of regions or locations and only processing the selected regions at high resolution. Like convolutional neural networks, the proposed model has a degree of translation invariance built-in, but the amount of computation it per-forms can be controlled independently of the input image size. While the model is non-differentiable, it can be trained using reinforcement learning methods to learn task-specific policies. We evaluate our model on several image classification tasks, where it significantly outperforms a convolutional neural network baseline on cluttered images, and on a dynamic visual control problem, where it learns to track a simple object without an explicit training signal for doing so. 1

References

Page 1

	Year	Citations
Long Short-Term Memory Sepp Hochreiter, Jürgen Schmidhuber Neural Computation	1997	93.8K
Rapid object detection using a boosted cascade of simple features Paul Viola, Michael Jones EngineeringMachine LearningFeature DetectionMachine Learning ApproachBiometrics	2005	18.1K
A model of saliency-based visual attention for rapid scene analysis Laurent Itti, Christof Koch, Ernst Niebur IEEE Transactions on Pattern Analysis and Machine Intelligence Dynamical Neural NetworkScene AnalysisEngineeringSaliency-based Visual AttentionMultiscale Image Features	1998	11.2K
Simple statistical gradient-following algorithms for connectionist reinforcement learning Ronald J. Williams Machine Learning Artificial IntelligenceEngineeringMachine LearningSequential LearningConnectionist Reinforcement Learning	1992	7.4K
Policy Gradient Methods for Reinforcement Learning with Function Approximation Richard S. Sutton, David McAllester, Satinder Singh,	1999	5K
Contextual guidance of eye movements and attention in real-world scenes: The role of global features in object search. Antonio Torralba, Aude Oliva, Monica S. Castelhano, Psychological Review Scene AnalysisEngineeringContextual Guidance ModelContextual GuidanceAttention	2006	1.6K
OverFeat: Integrated Recognition, Localization and Detection using Convolutional Networks Pierre Sermanet, David Eigen, Xiang Zhang, arXiv (Cornell University) Image ClassificationConvolutional Neural NetworkMachine VisionImage AnalysisMachine Learning	2013	1.3K
The Dynamic Representation of Scenes Ronald A. Rensink Visual Cognition	2000	1.1K
Cascade object detection with deformable part models Pedro F. Felzenszwalb, Ross Girshick, David McAllester Structured PredictionEngineeringMachine LearningObject CategorizationNatural Language Processing	2010	930
What is an object? Bogdan Alexe, Thomas Deselaers, Vittorio Ferrari Scene AnalysisEngineeringIntelligent SystemsSemanticsAbstract Object Theory	2010	803

Page 1