Concepedia

Abstract

Convolutional neural networks (CNNs) are deployed in a wide range of image recognition, scene segmentation and object detection applications. Achieving state of the art accuracy in CNNs often results in large models and complex topologies that require significant compute resources to complete in a timely manner. Binarised neural networks (BNNs) have been proposed as an optimised variant of CNNs, which constrain the weights and activations to +1 or -1 and thus offer compact models and lower computational complexity per operation. This paper presents a high performance BNN accelerator on the Intel®Xeon+FPGA™ platform. The proposed accelerator is designed to take advantage of the Xeon+FPGA system in a way that a specialised FPGA architecture can be targeted for the most compute intensive parts of the BNN whilst other parts of the topology can be handled by the Xeon™ CPU. The implementation is evaluated by comparing the raw compute performance and energy efficiency for key layers in standard CNN topologies against an Nvidia Titan X Pascal GPU and other published FPGA BNN accelerators. The results show that our single-package integrated Arria™ 10 FPGA accelerator coupled with a high-end Xeon CPU can offer comparable performance and better energy efficiency than a high-end discrete Titan X GPU card. In addition, our solution delivers the best performance compared to previous BNN FPGA implementations.

References

YearCitations

Page 1