A fully connected layer elimination for a binarizec convolutional neural network on an FPGA

Abstract

A pre-trained convolutional deep neural network (CNN) is widely used for embedded systems, which requires highly power-and-area efficiency. In that case, the CPU is too slow, the embedded GPU dissipates much power, and the ASIC cannot keep up with the rapidly progress of the CNN variations. This paper uses a binarized CNN which treats only binary 2-values for the inputs and the weights. Since the multiplier is replaced into an XNOR circuit, we can realize a high-performance MAC circuit by using many XNOR circuits. In the paper, we eliminate internal FC layers excluding the last one, then, insert a binarized average pooling layer, which can be realized by a majority circuit for binarized (1/0) values. In that case, since the weight memory is replaced into the 1's counter, we can realize a compact and faster CNN than the conventional ones. We implemented the VGG-11 benchmark CNN for the CIFAR10 image classification task on the Xilinx Inc. Zedboard. Compared with the conventional binarized implementations on an FPGA, the classification accuracy was almost the same, the performance per power efficiency is 5.1 better, as for the performance per area efficiency, it is 8.0 times better, and as for the performance per memory, it is 8.2 times better.

References

Page 1

	Year	Citations

Page 1