Publication | Closed Access
Energy-Efficient CNN Implementation on a Deeply Pipelined FPGA Cluster
Citations: 217
References: 19
Year: 2016
Unknown Venue
Cluster Computing, Engineering, Hardware Acceleration, Hardware Algorithm, Different FPGA Boards, Computer Engineering, Computer Architecture, Energy-efficient CNN Implementation, FPGA-based CNN Accelerators, Parallel Programming, Computer Science, Reconfigurable Architecture, Parallel Computing, Deep Learning, FPGA Design, FPGA Boards
Recently, FPGA-based CNN accelerators have demonstrated superior energy efficiency compared to high-performance devices such as GPGPUs. However, due to constrained on-chip resources and many other factors, single-board FPGA designs may have difficulty achieving optimal energy efficiency. In this paper, we present a deeply pipelined multi-FPGA architecture that expands the design space for optimal performance and energy efficiency. A dynamic programming algorithm is proposed to map the CNN computing layers efficiently to different FPGA boards. To demonstrate the potential of the architecture, we built a prototype system with seven FPGA boards connected by high-speed serial links. The experimental results on AlexNet and VGG-16 show that the prototype can achieve up to 21x and 2x the energy efficiency of optimized multi-core CPU and GPU implementations, respectively.
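The abstract does not spell out the dynamic programming formulation, so the following is only an illustrative sketch of the general idea, not the paper's algorithm. It assumes each layer has a single scalar cost (for example, an estimated latency), that layers must be assigned to boards as contiguous groups in pipeline order, and that the goal is to minimize the cost of the slowest stage. The function name `partition_layers` and the sample costs are hypothetical; inter-board link bandwidth, on-chip resource limits, and energy are deliberately ignored here.

```python
def partition_layers(costs, num_boards):
    """Split a chain of layers into at most num_boards contiguous stages,
    minimizing the cost of the slowest stage (the pipeline bottleneck).

    Illustrative only: costs are hypothetical per-layer estimates.
    Returns (bottleneck_cost, stages), where stages is a list of
    lists of layer indices, one list per board in pipeline order.
    """
    if not costs or num_boards < 1:
        raise ValueError("need at least one layer and one board")

    n = len(costs)
    # prefix[i] is the total cost of the first i layers
    prefix = [0]
    for c in costs:
        prefix.append(prefix[-1] + c)

    inf = float("inf")
    # best[k][i]: minimal bottleneck when the first i layers use exactly k stages
    best = [[inf] * (n + 1) for _ in range(num_boards + 1)]
    # cut[k][i]: index where the last of those k stages begins
    cut = [[0] * (n + 1) for _ in range(num_boards + 1)]
    best[0][0] = 0

    for k in range(1, num_boards + 1):
        for i in range(1, n + 1):
            for j in range(k - 1, i):
                stage_cost = prefix[i] - prefix[j]  # layers j..i-1 on board k
                candidate = max(best[k - 1][j], stage_cost)
                if candidate < best[k][i]:
                    best[k][i] = candidate
                    cut[k][i] = j

    # Pick the stage count with the smallest bottleneck; ties favor fewer boards.
    k_best = min(range(1, num_boards + 1), key=lambda k: (best[k][n], k))

    # Walk the cut table backwards to recover the stage boundaries.
    stages = []
    i, k = n, k_best
    while k > 0:
        j = cut[k][i]
        stages.append(list(range(j, i)))
        i, k = j, k - 1
    stages.reverse()
    return best[k_best][n], stages


if __name__ == "__main__":
    # Hypothetical per-layer costs for a six-layer network on three boards.
    bottleneck, stages = partition_layers([4, 2, 3, 7, 1, 5], 3)
    print(bottleneck, stages)
```

A realistic mapper would replace the scalar cost with a per-board model that accounts for resource usage, link bandwidth between boards, and power, which is where the design space described in the abstract comes in.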