Compression of Deep Convolutional Neural Networks for Fast and Low Power Mobile Applications

Abstract

Although the latest high-end smartphones have powerful CPUs and GPUs, running deeper convolutional neural networks (CNNs) for complex tasks such as ImageNet classification on mobile devices is challenging. To deploy deep CNNs on mobile devices, we present a simple and effective scheme to compress the entire CNN, which we call one-shot whole network compression. The proposed scheme consists of three steps: (1) rank selection with variational Bayesian matrix factorization, (2) Tucker decomposition on the kernel tensor, and (3) fine-tuning to recover the accumulated loss of accuracy. Each step can be easily implemented using publicly available tools. We demonstrate the effectiveness of the proposed scheme by testing the performance of various compressed CNNs (AlexNet, VGG-S, GoogLeNet, and VGG-16) on a smartphone. Significant reductions in model size, runtime, and energy consumption are obtained, at the cost of a small loss in accuracy. In addition, we address an important implementation-level issue with 1×1 convolution, which is a key operation of the inception module of GoogLeNet as well as of CNNs compressed by our proposed scheme.
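To make the link between step (2) and the reported savings concrete, the following is a minimal sketch of the bookkeeping involved. It rests on assumptions not stated in the abstract: that the Tucker decomposition is applied only along the input- and output-channel modes of the kernel (often called Tucker-2), so that one k×k convolution becomes a 1×1 convolution, a smaller k×k convolution, and another 1×1 convolution; that the stride is 1, so all three layers share one output size; and that biases are ignored. The function names, layer shape, and ranks are hypothetical illustrations, not values from the paper.

```python
def conv_cost(kernel, c_in, c_out, out_h, out_w):
    """Return (parameters, multiply-adds) of a kernel x kernel convolution.

    Biases are ignored; each weight is used once per output position.
    """
    params = kernel * kernel * c_in * c_out
    return params, params * out_h * out_w


def tucker2_cost(kernel, c_in, c_out, rank_in, rank_out, out_h, out_w):
    """Cost of the three layers that replace one convolution (assumed Tucker-2 form).

    Assumes stride 1, so every layer produces an out_h x out_w feature map.
    """
    layers = [
        (1, c_in, rank_in),            # 1x1 conv: reduce input channels
        (kernel, rank_in, rank_out),   # core k x k conv on the reduced channels
        (1, rank_out, c_out),          # 1x1 conv: restore output channels
    ]
    params = macs = 0
    for k, ci, co in layers:
        p, m = conv_cost(k, ci, co, out_h, out_w)
        params += p
        macs += m
    return params, macs


# Hypothetical layer: 3x3 kernel, 256 -> 256 channels, 14x14 output, ranks (64, 64).
orig_params, orig_macs = conv_cost(3, 256, 256, 14, 14)
comp_params, comp_macs = tucker2_cost(3, 256, 256, 64, 64, 14, 14)
print(f"parameters:    {orig_params} -> {comp_params}")
print(f"multiply-adds: {orig_macs} -> {comp_macs}")
print(f"reduction:     {orig_params / comp_params:.2f}x")
```

Under these assumptions two of the three resulting layers are 1×1 convolutions, which is consistent with the abstract's remark that 1×1 convolution is a key operation in the compressed networks. In practice the ranks would come from step (1) rather than being chosen by hand.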