ASBP: Automatic Structured Bit-Pruning for RRAM-based NN Accelerator

Abstract

Network sparsity, or pruning, is an extensively studied method for improving the computational efficiency of deep neural networks (DNNs) on CMOS-based accelerators such as FPGAs and GPUs. Although RRAM-based accelerators have demonstrated superior performance and energy efficiency for DNN tasks, deploying sparse neural networks on them requires dedicated consideration in order to reduce resource consumption without introducing expensive index overhead or sophisticated control. To exploit the potential of sparse neural network design on RRAM-based accelerators, we propose an automatic structured bit-pruning design, ASBP, which harmonizes the optimization objective of DNN sparsity with efficient RRAM deployment. Specifically, ASBP prunes the weight bits that are split across different crossbars and thus frees the zero-value crossbars when the neural network is mapped onto an RRAM-based accelerator, without extra hardware modification. Meanwhile, ASBP employs a reinforcement learning (RL) approach to automatically select the best crossbar-aware bit-sparsity strategy for any given neural network, without laborious human effort. In our experiments on a set of representative neural networks, ASBP saves up to 79.01% of energy consumption and 54.79% of area overhead compared to the baseline that deploys the original DNN on the RRAM-based accelerator. In addition, ASBP outperforms the state-of-the-art bit-sparsity design by 1.4x in terms of energy reduction on the RRAM-based accelerator.
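The crossbar-freeing idea can be illustrated with a minimal sketch. It is not the ASBP implementation: it assumes unsigned 8-bit quantized weights and one weight bit per RRAM cell, so that each bit position of a weight group maps to its own crossbar. The helper names (bit_planes, prune_bits, zero_planes) and the fixed choice of pruned bit positions are hypothetical; in ASBP the bit-sparsity strategy is selected by the RL agent.

```python
from typing import List


def bit_planes(weights: List[int], n_bits: int = 8) -> List[List[int]]:
    """Split unsigned integer weights into n_bits bit-planes (index 0 = LSB).

    Assumption: each bit-plane is stored in its own crossbar.
    """
    return [[(w >> b) & 1 for w in weights] for b in range(n_bits)]


def prune_bits(weights: List[int], pruned_positions: List[int]) -> List[int]:
    """Clear the given bit positions in every weight (structured bit-pruning)."""
    mask = 0
    for b in pruned_positions:
        mask |= 1 << b
    return [w & ~mask for w in weights]


def zero_planes(weights: List[int], n_bits: int = 8) -> List[int]:
    """Return the bit positions whose plane is all zero.

    Under the mapping assumed here, these crossbars need not be allocated.
    """
    planes = bit_planes(weights, n_bits)
    return [b for b, plane in enumerate(planes) if not any(plane)]


# Hypothetical group of four 8-bit weights.
weights = [0b10110101, 0b01100011, 0b11010110, 0b00101001]

# Every bit position is used by at least one weight before pruning.
before = zero_planes(weights)

# Pruning the two least significant bits empties two whole bit-planes.
pruned = prune_bits(weights, [0, 1])
after = zero_planes(pruned)
```

Because the pruned bits are removed at the same positions for the whole group, entire bit-planes become zero and no per-weight index is needed to skip them, which is the property the abstract refers to as avoiding index overhead.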
