Publication | Closed Access
Memory-Reduced Network Stacking for Edge-Level CNN Architecture With Structured Weight Pruning
Year: 2019
Citations: 24
References: 34
Convolutional Neural Network, Engineering, Machine Learning, Computer Architecture, Novel Stacking, Memory-reduced Network Stacking, Sparse Neural Network, Embedded Machine Learning, Parallel Computing, Structured Weight, Edge-level CNN Architecture, Computer Engineering, Computer Science, Deep Learning, Neural Architecture Search, Model Compression, Hardware Acceleration, Edge Computing, Multi-level Indexing Scheme, Convolutional Neural Networks
This paper presents a novel stacking and multi-level indexing scheme for convolutional neural networks (CNNs) used in energy-limited edge-level systems. The proposed scheme offers multiple accuracy modes by adopting a structured weight pruning method that allows a CNN to be trained once with multiple pruning ratios, enabling adaptive energy-accuracy trade-offs. The memory overhead of storing several different networks is kept to a minimum in two ways: smaller, lower-accuracy networks are included as subnetworks of larger, higher-accuracy networks, and a multi-level indexing scheme stores the compressed weight data for the resulting stacked-CNN architecture. Experimental results show that the proposed method reduces the memory footprint by up to 33% compared to a baseline CNN architecture. An FPGA-based multimode CNN accelerator that implements the proposed scheme has been designed. An energy usage analysis with a case study shows that the inference energy required for on-device CNN processing can be reduced by a factor of up to 1.94 relative to the baseline design.
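The stacking idea in the abstract, where each smaller network is a subnetwork of the next larger one, can be illustrated with a minimal sketch. The abstract does not specify the pruning criterion or the index layout, so everything here is an assumption made for illustration: filters are ranked by L1 norm, each filter is tagged with the smallest accuracy mode that uses it, and the names `rank_filters`, `assign_levels`, `select_mode`, and `stored_weights` are hypothetical.

```python
# Illustrative sketch only: the L1-norm ranking, the per-filter level tag,
# and all function names are assumptions, not the paper's actual scheme.

def rank_filters(filters):
    """Return filter indices sorted by descending L1 norm (assumed criterion)."""
    norms = [sum(abs(w) for w in f) for f in filters]
    return sorted(range(len(filters)), key=lambda i: -norms[i])

def assign_levels(filters, keep_counts):
    """Tag each filter with the smallest mode that uses it.

    keep_counts lists how many filters each mode keeps, in ascending order,
    so mode 0 is the smallest (lowest-accuracy) network. Filters kept by no
    mode get level None, meaning they are pruned in every mode.
    """
    order = rank_filters(filters)
    levels = [None] * len(filters)
    for mode, count in enumerate(keep_counts):
        for idx in order[:count]:
            if levels[idx] is None:
                levels[idx] = mode
    return levels

def select_mode(filters, levels, mode):
    """Return the filters active in a mode: every filter with level <= mode."""
    return [f for f, lvl in zip(filters, levels) if lvl is not None and lvl <= mode]

def stored_weights(filters, keep_counts):
    """Count stored weights: one copy per mode vs. a single stacked copy."""
    size = len(filters[0])
    separate = sum(keep_counts) * size   # each mode stored independently
    stacked = max(keep_counts) * size    # largest network stored once
    return separate, stacked

# Four toy filters with three weights each.
filters = [
    [0.9, -0.8, 0.7],
    [0.1, 0.0, -0.1],
    [0.5, 0.4, -0.6],
    [0.2, -0.3, 0.1],
]
keep_counts = [1, 2, 3]  # three accuracy modes, smallest first
levels = assign_levels(filters, keep_counts)
```

Because the keep-sets are nested, switching accuracy modes only changes which level threshold is applied in `select_mode`; no weights are duplicated. In this toy setup, `stored_weights` counts 18 weights for three independent copies against 9 for the stacked layout. These numbers describe the toy example only and are unrelated to the 33% figure reported in the abstract.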