Publication | Closed Access
Mixed Precision Neural Architecture Search for Energy Efficient Deep Learning
Citations: 61 · References: 34 · Year: 2019 · Venue: Unknown
Keywords: Artificial Intelligence, High Accuracy, Convolutional Neural Network, Engineering, Machine Learning, Data Science, Model Compression, Edge Computing, Sparse Neural Network, Computer Engineering, Computer Architecture, Embedded Machine Learning, Computer Science, Deep Learning, Neural Architecture Search, Neural Network Structures, Quantization Policies
Large-scale deep neural networks (DNNs) have achieved remarkable success in various artificial intelligence applications. However, the high computational complexity and energy cost of DNNs impede their deployment on edge devices with limited energy budgets. Two major approaches have been investigated for learning compact, energy-efficient DNNs. Neural architecture search (NAS) automates the design of neural network structures to achieve both high accuracy and energy efficiency. The other, model quantization, leverages low-precision representation and arithmetic to trade accuracy for efficiency. Although NAS and quantization are both critical components of DNN design closure, little research has considered them jointly. In this paper, we propose a new methodology for end-to-end joint optimization over the neural architecture and quantization spaces. Our approach searches for the optimal combination of architecture and precision (bit-width) to directly optimize both prediction accuracy and hardware energy consumption. Our framework improves and automates the flow from neural architecture design to hardware deployment. Experimental results demonstrate that our approach achieves better energy efficiency than advanced quantization approaches and efficiency-aware NAS methods on CIFAR-100 and ImageNet. We also study different search and quantization policies and offer insights for both neural architecture and hardware design.
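To make the joint search idea concrete, the sketch below enumerates a tiny toy space of per-layer architecture choices (kernel sizes) and quantization bit-widths, scoring each pair by accuracy minus a weighted energy penalty. This is a minimal illustration of the accuracy/energy trade-off described in the abstract, not the paper's method: the candidate values, proxy functions, and the exhaustive search are all stand-in assumptions (the actual framework uses a learned search over real architectures and measured energy).

```python
import itertools

# Hypothetical per-layer candidate spaces: kernel sizes (architecture
# choices) and bit-widths (quantization choices). Values are illustrative.
KERNELS = [3, 5, 7]
BITS = [2, 4, 8]

def proxy_accuracy(kernel, bits):
    # Toy proxy: larger kernels and higher precision help accuracy.
    return 0.6 + 0.02 * kernel + 0.03 * bits

def energy_cost(kernel, bits):
    # Toy proxy: energy grows with kernel area and arithmetic precision.
    return kernel * kernel * bits

def joint_score(kernel, bits, lam=1e-3):
    # Joint objective: accuracy minus a weighted energy penalty, mirroring
    # the accuracy/energy trade-off a joint NAS+quantization search targets.
    return proxy_accuracy(kernel, bits) - lam * energy_cost(kernel, bits)

# Exhaustive search over the tiny joint space; a real framework would use
# a learned controller or differentiable relaxation instead.
best = max(itertools.product(KERNELS, BITS), key=lambda kb: joint_score(*kb))
print(best)  # the (kernel, bits) pair with the best joint score
```

Note that the winner is not simply the smallest or largest configuration: the energy penalty pushes the search toward a small kernel while the accuracy term still favors higher precision, which is exactly the kind of non-obvious combination a joint search is meant to discover.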