Publication | Closed Access

MobileNetV2: Inverted Residuals and Linear Bottlenecks

24.2K Citations · 42 References · 2018 · CVPR
Topics: Convolutional Neural Network, Scene Analysis, Engineering, Machine Learning, Mobile Models, Linear Bottlenecks, Image Analysis, Pattern Recognition, Sparse Neural Network, Video Transformer, Machine Vision, Object Detection, Mobile Computing, Computer Science, Deep Learning, Model Compression, Computer Vision, Scene Understanding, Mobile DeepLabv3, New Mobile Architecture
The paper introduces MobileNetV2, a mobile architecture that improves state-of-the-art performance across multiple tasks and model sizes, and presents efficient applications of it to object detection via SSDLite and to semantic segmentation via Mobile DeepLabv3. MobileNetV2 is built on an inverted residual structure with shortcut connections between thin bottleneck layers; an intermediate expansion layer uses lightweight depthwise convolutions as its source of non-linearity, and the design decouples the input/output domains from the expressiveness of the transformation. The authors evaluate the model on ImageNet classification, COCO object detection, and VOC image segmentation, measuring accuracy, multiply-add operations, latency, and parameter count, and show that removing non-linearities in the narrow layers preserves representational power and improves performance.
In this paper we describe a new mobile architecture, MobileNetV2, that improves the state-of-the-art performance of mobile models on multiple tasks and benchmarks as well as across a spectrum of different model sizes. We also describe efficient ways of applying these mobile models to object detection in a novel framework we call SSDLite. Additionally, we demonstrate how to build mobile semantic segmentation models through a reduced form of DeepLabv3, which we call Mobile DeepLabv3. The MobileNetV2 architecture is based on an inverted residual structure where the shortcut connections are between the thin bottleneck layers. The intermediate expansion layer uses lightweight depthwise convolutions to filter features as a source of non-linearity. Additionally, we find that it is important to remove non-linearities in the narrow layers in order to maintain representational power. We demonstrate that this improves performance and provide an intuition that led to this design. Finally, our approach allows decoupling of the input/output domains from the expressiveness of the transformation, which provides a convenient framework for further analysis. We measure our performance on ImageNet [1] classification, COCO object detection [2], and VOC image segmentation [3]. We evaluate the trade-offs between accuracy and number of operations measured by multiply-adds (MAdds), as well as actual latency and the number of parameters.
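To make the cost argument concrete, here is a back-of-the-envelope sketch (not from the paper; the layer sizes and the helper names are illustrative assumptions) of the multiply-add count of one inverted residual block — 1×1 expansion, k×k depthwise convolution, 1×1 linear bottleneck projection — compared with a standard k×k convolution applied at the same expanded width:

```python
# Back-of-the-envelope multiply-add (MAdds) counts for one inverted
# residual block versus a standard convolution at the expanded width.
# The shapes below are illustrative assumptions, not the paper's exact
# layer configuration.

def inverted_residual_madds(h, w, c_in, c_out, t=6, k=3, stride=1):
    """MAdds of one block: 1x1 expansion -> kxk depthwise -> 1x1 linear projection."""
    c_mid = c_in * t                           # expansion factor t widens the interior
    h_out, w_out = h // stride, w // stride
    expand = h * w * c_in * c_mid              # 1x1 pointwise expansion conv
    depthwise = h_out * w_out * c_mid * k * k  # kxk depthwise (one filter per channel)
    project = h_out * w_out * c_mid * c_out    # 1x1 linear bottleneck (no non-linearity)
    return expand + depthwise + project

def full_conv_madds(h, w, c_in, c_out, k=3, stride=1):
    """MAdds of a standard kxk convolution, for comparison."""
    return (h // stride) * (w // stride) * k * k * c_in * c_out

# Example: a 56x56 feature map with 24-channel bottlenecks (t = 6 gives
# 144 channels internally). A full 3x3 conv at that expanded width costs
# over 20x the MAdds of the whole inverted residual block.
block = inverted_residual_madds(56, 56, 24, 24)  # 25,740,288 MAdds
full = full_conv_madds(56, 56, 144, 144)         # 585,252,864 MAdds
print(f"inverted residual: {block:,} MAdds; full conv at width 144: {full:,} MAdds")
```

The depthwise step costs h·w·c_mid·k² rather than h·w·c_mid²·k² multiply-adds, which is what keeps the expanded interior cheap even at expansion factor t = 6, while the shortcut connections and stored activations stay at the thin bottleneck width.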
| Year | Citations |
|---|---|
| 2016 | 214.9K |
| 2014 | 75.4K |
| 2016 | 52.4K |
| 2015 | 46.2K |
| 2015 | 39.5K |
| 2017 | 21.4K |
| 2017 | 18.6K |
| 2017 | 18.2K |
| 2015 | 18.2K |
| 2017 | 11.6K |