Rich Feature Hierarchies for Accurate Object Detection and Semantic Segmentation

TLDR

Object detection performance on PASCAL VOC has plateaued, and the top methods are complex ensembles that combine multiple low-level image features with high-level context. We introduce R-CNN (Regions with CNN features), a simple, scalable detection algorithm that improves mean average precision (mAP) on VOC 2012 by more than 30% relative to the previous best result, reaching 53.3%. R-CNN applies high-capacity CNNs to bottom-up region proposals to localize and segment objects and, when labeled training data is scarce, uses supervised pre-training on an auxiliary task followed by domain-specific fine-tuning. Experiments on what the network learns reveal a rich hierarchy of image features. Source code is available at http://www.cs.berkeley.edu/~rbg/rcnn.

Abstract

Object detection performance, as measured on the canonical PASCAL VOC dataset, has plateaued in the last few years. The best-performing methods are complex ensemble systems that typically combine multiple low-level image features with high-level context. In this paper, we propose a simple and scalable detection algorithm that improves mean average precision (mAP) by more than 30% relative to the previous best result on VOC 2012 -- achieving a mAP of 53.3%. Our approach combines two key insights: (1) one can apply high-capacity convolutional neural networks (CNNs) to bottom-up region proposals in order to localize and segment objects and (2) when labeled training data is scarce, supervised pre-training for an auxiliary task, followed by domain-specific fine-tuning, yields a significant performance boost. Since we combine region proposals with CNNs, we call our method R-CNN: Regions with CNN features. We also present experiments that provide insight into what the network learns, revealing a rich hierarchy of image features. Source code for the complete system is available at http://www.cs.berkeley.edu/~rbg/rcnn.
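
The first insight, scoring bottom-up region proposals with CNN features, can be illustrated with a minimal sketch. It is not the released R-CNN code: the proposal step is replaced by random boxes, the CNN by a fixed random projection with a ReLU, and the per-class scoring by plain linear functions. All function names, the 8x8 warp size, the 16-dimensional feature and the 20 classes are assumptions chosen for illustration, and the pre-training and fine-tuning steps are not shown.

```python
import random

def propose_regions(height, width, n=5, seed=0):
    """Placeholder for a bottom-up proposal method: random boxes (x0, y0, x1, y1)."""
    rng = random.Random(seed)
    boxes = []
    for _ in range(n):
        x0 = rng.randrange(0, width - 8)
        y0 = rng.randrange(0, height - 8)
        x1 = rng.randrange(x0 + 8, width + 1)   # exclusive right edge
        y1 = rng.randrange(y0 + 8, height + 1)  # exclusive bottom edge
        boxes.append((x0, y0, x1, y1))
    return boxes

def warp(image, box, size=8):
    """Resample the box to a fixed size x size patch (nearest neighbour), flattened."""
    x0, y0, x1, y1 = box
    rows = [y0 + (y1 - y0) * i // size for i in range(size)]
    cols = [x0 + (x1 - x0) * j // size for j in range(size)]
    return [image[r][c] for r in rows for c in cols]

def extract_features(patch, weights):
    """Placeholder for the CNN: one linear layer followed by a ReLU."""
    return [max(0.0, sum(w * p for w, p in zip(row, patch))) for row in weights]

def detect(image, weights, class_weights):
    """Score every proposed region against every class."""
    height, width = len(image), len(image[0])
    results = []
    for box in propose_regions(height, width):
        feat = extract_features(warp(image, box), weights)
        scores = [sum(w * f for w, f in zip(row, feat)) for row in class_weights]
        results.append((box, scores))
    return results

# Hypothetical inputs: a 32x48 grayscale image and random model weights.
rng = random.Random(1)
image = [[rng.random() for _ in range(48)] for _ in range(32)]
weights = [[rng.gauss(0, 1) for _ in range(64)] for _ in range(16)]
class_weights = [[rng.gauss(0, 1) for _ in range(16)] for _ in range(20)]
detections = detect(image, weights, class_weights)
print(len(detections), len(detections[0][1]))
```

Warping every proposal to the same fixed size lets one fixed-input network handle regions of any shape, which is why the sketch resamples each box before feature extraction.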
