Rethinking the Smaller-Norm-Less-Informative Assumption in Channel Pruning of Convolution Layers

Abstract

Model pruning has become a useful technique that improves the computational efficiency of deep learning, making it possible to deploy solutions in resource-limited scenarios. A widely used practice in related work assumes that a smaller-norm parameter or feature plays a less informative role at inference time. In this paper, we propose a channel pruning technique for accelerating the computations of deep convolutional neural networks (CNNs) that does not critically rely on this assumption. Instead, it focuses on direct simplification of the channel-to-channel computation graph of a CNN, without the computationally difficult and not-always-useful task of making the high-dimensional tensors of a CNN structurally sparse. Our approach has two stages: first, we adopt an end-to-end stochastic training method that eventually forces the outputs of some channels to be constant; then, we prune those constant channels from the original neural network by adjusting the biases of the layers they impact, so that the resulting compact model can be quickly fine-tuned. Our approach is mathematically appealing from an optimization perspective and easy to reproduce. We evaluate our approach on several image learning benchmarks and demonstrate its interesting aspects and competitive performance.
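The second stage, removing a constant channel and compensating in the bias of the layer it feeds, can be illustrated with a minimal sketch. This is not the authors' implementation. It assumes the downstream layer mixes channels independently at each spatial position (as a 1x1 convolution or fully connected layer does), so that folding the constant into the bias reproduces the original output exactly. The names `fold_constant_channel` and `channel_mix` are hypothetical helpers introduced only for this illustration.

```python
def fold_constant_channel(weight, bias, channel, constant):
    """Drop one input channel that always equals `constant`.

    weight: list of rows, one per output channel, each with one entry
            per input channel (a 1x1 convolution seen at a single pixel).
    bias:   one entry per output channel.
    Returns the pruned weight and the compensated bias.
    """
    # The constant channel contributed weight[o][channel] * constant to
    # every output o, so that amount is moved into the bias.
    new_bias = [b + row[channel] * constant for row, b in zip(weight, bias)]
    # Remove the column that used to read the pruned channel.
    new_weight = [row[:channel] + row[channel + 1:] for row in weight]
    return new_weight, new_bias


def channel_mix(x, weight, bias):
    # Weighted sum over input channels plus bias, for each output channel.
    return [sum(w * v for w, v in zip(row, x)) + b
            for row, b in zip(weight, bias)]


# Illustrative values only (assumed, not taken from the paper).
weight = [[0.5, -1.0, 2.0], [1.5, 0.25, -0.5]]
bias = [0.1, -0.2]
constant = 0.7
x = [0.3, constant, -1.2]  # channel 1 is the constant channel

full = channel_mix(x, weight, bias)
w_pruned, b_pruned = fold_constant_channel(weight, bias, channel=1,
                                           constant=constant)
pruned = channel_mix([0.3, -1.2], w_pruned, b_pruned)
```

Here `full` and `pruned` agree up to floating-point rounding, which is why the compact model starts fine-tuning from the same function as the original. For convolutions with larger kernels and zero padding, the same fold is only approximate near image borders, where the padded positions do not carry the constant value.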