An FPGA-based processor for training convolutional neural networks

Abstract

Convolutional neural networks (CNNs) have achieved great success in various computer vision applications. However, training a CNN model is computation-intensive and time-consuming, so training is mostly performed on large clusters of high-performance processors such as server CPUs and GPUs. In this paper, we propose an FPGA-based processor design to accelerate the training process of CNNs. We first analyze the operations in all types of CNN layers during training. Based on this analysis, we propose a uniform computation engine that efficiently carries out all of these operations. We then present a scalable accelerator framework that further exploits parallelism by unrolling the loops at two levels. The proposed accelerator design is demonstrated by implementing a processor on the Xilinx ZU19EG FPGA running at 200 MHz. Evaluation results on a group of CNN models show that our processor is 5.7 to 10.7 times faster than software implementations on an Intel Core i5-4440 CPU (3.10 GHz).
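To make the idea of unrolling loops at two levels concrete, the following is a minimal software sketch, not the design from the paper. The abstract does not state which loops are unrolled, so the sketch assumes the output-channel and input-channel loops of a convolution are the two unrolled levels. The function names `conv_naive` and `conv_unrolled` and the tile sizes `t_out` and `t_in` are hypothetical. Each `tensordot` call stands in for the multiply-accumulate work that a parallel hardware array might complete in one step.

```python
import numpy as np

def conv_naive(x, w):
    """Reference convolution, stride 1, no padding.
    x: (C_in, H, W), w: (C_out, C_in, K, K)."""
    c_out, c_in, k, _ = w.shape
    h_out, w_out = x.shape[1] - k + 1, x.shape[2] - k + 1
    y = np.zeros((c_out, h_out, w_out))
    for o in range(c_out):
        for i in range(c_in):
            for r in range(h_out):
                for c in range(w_out):
                    for u in range(k):
                        for v in range(k):
                            y[o, r, c] += w[o, i, u, v] * x[i, r + u, c + v]
    return y

def conv_unrolled(x, w, t_out=4, t_in=2):
    """Same result, with the channel loops tiled (assumed unrolling choice).
    t_out and t_in are hypothetical unroll factors."""
    c_out, c_in, k, _ = w.shape
    h_out, w_out = x.shape[1] - k + 1, x.shape[2] - k + 1
    y = np.zeros((c_out, h_out, w_out))
    for o0 in range(0, c_out, t_out):        # level 1: output-channel tiles
        for i0 in range(0, c_in, t_in):      # level 2: input-channel tiles
            w_tile = w[o0:o0 + t_out, i0:i0 + t_in]
            for r in range(h_out):
                for c in range(w_out):
                    patch = x[i0:i0 + t_in, r:r + k, c:c + k]
                    # One tile of multiply-accumulates, modeled as a single
                    # contraction over (input channel, kernel row, kernel col).
                    y[o0:o0 + t_out, r, c] += np.tensordot(w_tile, patch, axes=3)
    return y

rng = np.random.default_rng(0)
x = rng.standard_normal((3, 6, 6))
w = rng.standard_normal((5, 3, 3, 3))
same = np.allclose(conv_naive(x, w), conv_unrolled(x, w))
```

In this sketch the tile sizes do not need to divide the channel counts, because NumPy slicing truncates the final partial tile on both the weights and the output.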
