Quantized Guided Pruning for Efficient Hardware Implementations of\n Convolutional Neural Networks

Abstract

Convolutional Neural Networks (CNNs) are state-of-the-art in numerous\ncomputer vision tasks such as object classification and detection. However, the\nlarge amount of parameters they contain leads to a high computational\ncomplexity and strongly limits their usability in budget-constrained devices\nsuch as embedded devices. In this paper, we propose a combination of a new\npruning technique and a quantization scheme that effectively reduce the\ncomplexity and memory usage of convolutional layers of CNNs, and replace the\ncomplex convolutional operation by a low-cost multiplexer. We perform\nexperiments on the CIFAR10, CIFAR100 and SVHN and show that the proposed method\nachieves almost state-of-the-art accuracy, while drastically reducing the\ncomputational and memory footprints. We also propose an efficient hardware\narchitecture to accelerate CNN operations. The proposed hardware architecture\nis a pipeline and accommodates multiple layers working at the same time to\nspeed up the inference process.\n

References

Page 1

	Year	Citations

Page 1