Publication | Closed Access
Towards Fast and Energy-Efficient Binarized Neural Network Inference on FPGA
Citations: 27
References: 0
Year: 2019
Venue: Unknown
Convolutional Neural Network, Engineering, Machine Learning, Hardware Algorithm, Computer Architecture, Image Analysis, Data Science, Sparse Neural Network, Embedded Machine Learning, Binarized Neural Network, Computer Engineering, Computer Science, Deep Learning, Neural Architecture Search, BNN Inference, FPGA Design, Model Compression, Computer Vision, Classical CNN, Hardware Acceleration
Binarized Neural Networks (BNNs) remove the bitwidth redundancy of classical CNNs by using a single bit (-1/+1) for network parameters and intermediate representations, which greatly reduces off-chip data transfer and storage overhead. However, a large amount of computation redundancy still exists in BNN inference. By analyzing local properties of images and the learned BNN kernel weights, we observe an average of ~78% input similarity and ~59% similarity among weight kernels, measured by our proposed metric in common network architectures. This redundancy can be exploited to further reduce the amount of on-chip computation.
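To make the single-bit arithmetic concrete, the following is a minimal sketch of the standard XNOR-popcount formulation of a -1/+1 dot product, together with a simple bit-level similarity measure. All function names are hypothetical. The similarity here is just the fraction of matching bits, which is an assumption for illustration and not necessarily the metric proposed in the paper. The last function shows one way kernel similarity could in principle be used to reuse a previous result; it is not confirmed to be the authors' method.

```python
def pack_bits(values):
    """Pack a sequence of -1/+1 values into an int (bit i set means +1)."""
    bits = 0
    for i, v in enumerate(values):
        if v not in (-1, 1):
            raise ValueError("values must be -1 or +1")
        if v == 1:
            bits |= 1 << i
    return bits


def popcount(x):
    """Number of set bits in a non-negative int."""
    return bin(x).count("1")


def binary_dot(a_bits, b_bits, n):
    """Dot product of two length-n -1/+1 vectors stored as bitmasks.

    Matching bits contribute +1 and differing bits contribute -1,
    so the result is 2 * matches - n (the XNOR-popcount identity).
    """
    matches = n - popcount(a_bits ^ b_bits)
    return 2 * matches - n


def bit_similarity(a_bits, b_bits, n):
    """Fraction of positions where two bit vectors agree.

    Illustrative measure only (an assumption, not the paper's metric).
    """
    return (n - popcount(a_bits ^ b_bits)) / n


def dot_from_similar_kernel(x_bits, w1_bits, w2_bits, dot_w1):
    """Derive dot(x, w2) from a known dot(x, w1).

    Only positions where w1 and w2 differ change sign, so the work
    scales with the number of differing bits, not the kernel size.
    """
    diff = w1_bits ^ w2_bits                 # positions where kernels differ
    d = popcount(diff)
    # Among differing positions, count those where x agreed with w1.
    agreed = popcount(diff & ~(x_bits ^ w1_bits))
    contribution_w1 = 2 * agreed - d         # partial sum under w1
    return dot_w1 - 2 * contribution_w1      # that partial sum flips sign


# Usage with hypothetical 8-element vectors.
x = pack_bits([1, -1, 1, 1, -1, -1, 1, -1])
w1 = pack_bits([1, 1, 1, -1, -1, -1, 1, 1])
w2 = pack_bits([1, 1, 1, 1, -1, -1, 1, -1])  # differs from w1 in 2 positions

dot_w1 = binary_dot(x, w1, 8)
dot_w2 = dot_from_similar_kernel(x, w1, w2, dot_w1)
```

The more bits two kernels share, the fewer positions the reuse step has to touch, which is the intuition behind treating kernel and input similarity as exploitable redundancy.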