MnasNet: Platform-Aware Neural Architecture Search for Mobile

TLDR

Designing convolutional neural networks for mobile devices is challenging because models must be small, fast, and accurate, and the number of architectural possibilities makes these trade-offs hard to balance by hand. This work proposes an automated mobile neural architecture search that explicitly incorporates real-world latency into the search objective to identify models that balance accuracy and speed. The method measures inference latency on actual mobile phones and uses a novel factorized hierarchical search space that encourages layer diversity throughout the network. Experimental results show that MnasNet consistently outperforms state-of-the-art mobile CNNs: on ImageNet it achieves 75.2% top-1 accuracy with 78 ms latency on a Pixel phone, which is 1.8× faster than MobileNetV2 with 0.5% higher accuracy and 2.3× faster than NASNet with 1.2% higher accuracy. It also achieves better mAP than MobileNets on COCO object detection. Code is available at https://github.com/tensorflow/tpu/tree/master/models/official/mnasnet.

Abstract

Designing convolutional neural networks (CNNs) for mobile devices is challenging because mobile models need to be small and fast, yet still accurate. Although significant effort has been dedicated to designing and improving mobile CNNs on all dimensions, it is very difficult to manually balance these trade-offs when there are so many architectural possibilities to consider. In this paper, we propose an automated mobile neural architecture search (MNAS) approach, which explicitly incorporates model latency into the main objective so that the search can identify a model that achieves a good trade-off between accuracy and latency. Unlike previous work, where latency is considered via another, often inaccurate proxy (e.g., FLOPS), our approach directly measures real-world inference latency by executing the model on mobile phones. To further strike the right balance between flexibility and search space size, we propose a novel factorized hierarchical search space that encourages layer diversity throughout the network. Experimental results show that our approach consistently outperforms state-of-the-art mobile CNN models across multiple vision tasks. On the ImageNet classification task, our MnasNet achieves 75.2% top-1 accuracy with 78 ms latency on a Pixel phone, which is 1.8× faster than MobileNetV2 with 0.5% higher accuracy and 2.3× faster than NASNet with 1.2% higher accuracy. Our MnasNet also achieves better mAP quality than MobileNets for COCO object detection. Code is at https://github.com/tensorflow/tpu/tree/master/models/official/mnasnet.
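
The abstract says that latency enters the main search objective but does not give its form. As a minimal sketch, one way to fold measured latency into a single reward is to scale accuracy by a power of the latency-to-target ratio. The function name `latency_aware_reward`, the 75 ms target, and the exponent of -0.07 are illustrative assumptions, not values stated in this abstract; in a real search, `latency_ms` would come from running the candidate model on a phone.

```python
def latency_aware_reward(accuracy, latency_ms, target_ms=75.0, exponent=-0.07):
    """Combine accuracy and measured latency into one scalar reward.

    Hypothetical sketch: reward = accuracy * (latency / target) ** exponent.
    A negative exponent penalizes models slower than the target and
    mildly rewards models faster than it. The default target and exponent
    are assumptions made for illustration only.
    """
    if latency_ms <= 0 or target_ms <= 0:
        raise ValueError("latencies must be positive")
    return accuracy * (latency_ms / target_ms) ** exponent


if __name__ == "__main__":
    # 75.2% top-1 at 78 ms is the MnasNet figure quoted in the abstract;
    # the other two candidates are made-up points for comparison.
    candidates = {
        "mnasnet (from abstract)": (0.752, 78.0),
        "hypothetical slower model": (0.760, 150.0),
        "hypothetical faster model": (0.730, 50.0),
    }
    for name, (acc, lat) in candidates.items():
        print(f"{name}: reward = {latency_aware_reward(acc, lat):.4f}")
```

A soft penalty of this kind lets the search trade a small amount of accuracy for a large latency gain (or the reverse) instead of rejecting every model that misses a hard latency cutoff.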
