TLDR

Very deep convolutional networks, such as the Inception architecture, have driven major advances in image recognition, and residual connections added to more traditional architectures have recently achieved state-of-the-art performance comparable to Inception-v3. The study investigates whether adding residual connections to Inception networks improves training speed and accuracy, and introduces streamlined residual and non-residual Inception architectures. Empirical results show that residual connections accelerate training significantly and yield modest accuracy gains; the streamlined residual Inception variants improve single-frame accuracy on the ILSVRC 2012 classification task, and an ensemble of three residual models and one Inception-v4 reaches 3.08% top-5 error on the ImageNet test set.
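To make the core idea concrete, here is a minimal sketch, assuming PyTorch, of an Inception-style block with a residual (shortcut) connection: parallel convolutional branches are concatenated, projected back to the input width with a 1x1 convolution, and added to the block input. The class names, branch widths, and filter sizes are illustrative assumptions, not the paper's exact Inception-ResNet specification.

```python
# Illustrative sketch of a residual Inception-style block (assumed PyTorch);
# widths and branch structure are made up, not the paper's exact design.
import torch
import torch.nn as nn

class ConvBnRelu(nn.Module):
    def __init__(self, in_ch, out_ch, kernel_size, padding=0):
        super().__init__()
        self.conv = nn.Conv2d(in_ch, out_ch, kernel_size, padding=padding, bias=False)
        self.bn = nn.BatchNorm2d(out_ch)

    def forward(self, x):
        return torch.relu(self.bn(self.conv(x)))

class InceptionResidualBlock(nn.Module):
    def __init__(self, channels=256):
        super().__init__()
        # Two parallel branches, as in the Inception family (widths are illustrative).
        self.branch1 = ConvBnRelu(channels, 32, kernel_size=1)
        self.branch2 = nn.Sequential(
            ConvBnRelu(channels, 32, kernel_size=1),
            ConvBnRelu(32, 32, kernel_size=3, padding=1),
        )
        # 1x1 projection restores the input channel count so the residual add works.
        self.project = nn.Conv2d(64, channels, kernel_size=1)

    def forward(self, x):
        mixed = torch.cat([self.branch1(x), self.branch2(x)], dim=1)
        return torch.relu(x + self.project(mixed))  # residual (shortcut) addition

# Usage: a dummy forward pass to check that shapes are preserved.
block = InceptionResidualBlock(channels=256)
out = block(torch.randn(1, 256, 35, 35))
print(out.shape)  # torch.Size([1, 256, 35, 35])
```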

Abstract

Very deep convolutional networks have been central to the largest advances in image recognition performance in recent years. One example is the Inception architecture that has been shown to achieve very good performance at relatively low computational cost. Recently, the introduction of residual connections in conjunction with a more traditional architecture has yielded state-of-the-art performance in the 2015 ILSVRC challenge; its performance was similar to the latest generation Inception-v3 network. This raises the question of whether there is any benefit in combining the Inception architecture with residual connections. Here we give clear empirical evidence that training with residual connections accelerates the training of Inception networks significantly. There is also some evidence of residual Inception networks outperforming similarly expensive Inception networks without residual connections by a thin margin. We also present several new streamlined architectures for both residual and non-residual Inception networks. These variations improve the single-frame recognition performance on the ILSVRC 2012 classification task significantly. We further demonstrate how proper activation scaling stabilizes the training of very wide residual Inception networks. With an ensemble of three residual and one Inception-v4 network, we achieve 3.08 percent top-5 error on the test set of the ImageNet classification (CLS) challenge.
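The abstract's remark about activation scaling refers to down-scaling the residual branch before the addition; the paper reports that scaling factors roughly between 0.1 and 0.3 stabilize training of very wide residual Inception variants. Below is a minimal sketch, again assuming PyTorch; the wrapper class, its name, and the toy branch are illustrative assumptions rather than the paper's implementation.

```python
# Illustrative sketch of residual scaling (assumed PyTorch): the residual branch
# output is multiplied by a small constant before being added to the shortcut.
import torch
import torch.nn as nn

class ScaledResidual(nn.Module):
    """Wraps an arbitrary residual branch and scales its output before the addition."""
    def __init__(self, branch: nn.Module, scale: float = 0.2):
        super().__init__()
        self.branch = branch
        self.scale = scale

    def forward(self, x):
        # y = x + scale * F(x): down-scaling F(x) keeps activations from growing
        # unstably early in training when the residual layer is very wide.
        return torch.relu(x + self.scale * self.branch(x))

# Usage with a toy branch (a single 3x3 conv that preserves the channel count).
branch = nn.Conv2d(256, 256, kernel_size=3, padding=1)
block = ScaledResidual(branch, scale=0.2)
out = block(torch.randn(1, 256, 35, 35))
print(out.shape)  # torch.Size([1, 256, 35, 35])
```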
