Memory-Reduced Network Stacking for Edge-Level CNN Architecture With Structured Weight Pruning

Abstract

This paper presents a novel stacking and multi-level indexing scheme for convolutional neural networks (CNNs) used in energy-limited edge-level systems. Basically, the proposed scheme offers multiple accuracy modes by adopting a structured weight pruning method that enables a CNN to be trained once with multiple pruning ratios and thereby allows for adaptive energy-accuracy trade-offs. The memory overhead required to store several different networks is kept to a minimum by adopting a novel method for including smaller lower-accuracy networks as subnetworks of larger higher-accuracy networks and by using a unique multi-level indexing scheme that can effectively store compressed weight data for the proposed stacked-CNN architecture. Experimental results show that the proposed method successfully reduces the memory footprint by up to 33% when compared to a baseline CNN architecture. An FPGA-based multimode CNN accelerator that implements the proposed scheme has been designed. Energy usage analysis with a case study shows that the inference energy required for on-device CNN processing can be reduced by up to 1.94 times over the baseline design.

References

Page 1

	Year	Citations

Page 1