FPGA-based CNN Processor with Filter-Wise-Optimized Bit Precision

Abstract

Many efforts have been made to improve the efficiency of inference for deep convolutional neural networks. To improve efficiency further without an accuracy penalty, we propose filter-wise optimized quantization with variable precision, together with a hardware architecture that fully supports it. Because the weight bit precision is optimized at filter granularity, the bit precision of operations is reduced, and the execution time decreases in proportion to the total number of computations multiplied by the number of weight bits. We implement the proposed architecture on an FPGA and demonstrate that ResNet-50 runs in 5.3× fewer execution cycles without an accuracy penalty.
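The cost relation stated in the abstract (execution time proportional to the number of computations multiplied by the number of weight bits) can be illustrated with a minimal sketch. Everything below is an assumption made for illustration: the function name `relative_cycles`, the 8-bit uniform baseline, and the per-filter MAC counts and bit widths are hypothetical and are not taken from the paper or its hardware.

```python
def relative_cycles(macs_per_filter, bits_per_filter, baseline_bits=8):
    """Estimate cycles relative to a uniform-precision baseline.

    Assumed cost model: each filter costs (MAC count) x (weight bits),
    following the proportionality stated in the abstract. The baseline
    bit width is a hypothetical choice, not a figure from the paper.
    """
    if len(macs_per_filter) != len(bits_per_filter):
        raise ValueError("need one bit width per filter")
    # Cost with a separate weight precision for every filter.
    variable = sum(m * b for m, b in zip(macs_per_filter, bits_per_filter))
    # Cost when every filter uses the same baseline precision.
    baseline = sum(m * baseline_bits for m in macs_per_filter)
    return variable / baseline


# Made-up layer: four filters with equal MAC counts and different precisions.
macs = [1000, 1000, 1000, 1000]
bits = [1, 2, 2, 3]

ratio = relative_cycles(macs, bits, baseline_bits=8)
speedup = 1 / ratio
print(f"relative cycles: {ratio:.2f}, speedup: {speedup:.1f}x")
```

Under this toy model the average weight precision alone determines the gain, so filters that tolerate very few bits contribute most of the saving. The sketch does not model accuracy, memory traffic, or any detail of the FPGA implementation.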
