Publication | Closed Access
Specializing FGPU for Persistent Deep Learning
Citations: 15
References: 12
Year: 2019
Unknown Venue
Engineering, Machine Learning, Computer Architecture, GPU Computing, Hardware Security, Overlay Architectures, Parallel Computing, Computer Engineering, Computer Science, Deep Learning, Neural Architecture Search, FPGA Design, GPU Architecture, Hardware Acceleration, Program Analysis, PDL-FGPU Architecture, Persistent Deep Learning, Performant Overlay Architecture, Parallel Programming
Overlay architectures enable fast development and debugging on FPGAs, at the expense of potentially limited performance compared to fully customized FPGA designs. When used in concert with a hand-tuned FPGA solution, a performant overlay architecture can improve the time-to-solution and thus the overall productivity of FPGA solutions. In this work, we tune and specialize FGPU, an open-source OpenCL-programmable GPU overlay for FPGAs. We demonstrate that our PDL-FGPU architecture maintains the ease of programming and generality of a software-programmable soft GPU while achieving high performance through specialization in the persistent deep learning (PDL) domain. We also propose an easy method to specialize for different domains. PDL-FGPU includes new instructions, along with micro-architecture and compiler enhancements. We evaluate both the FGPU baseline and the proposed PDL-FGPU on a modern high-end Intel Stratix 10 2800 FPGA running a set of persistent DL applications (RNN, GRU, LSTM), as well as general non-DL applications to demonstrate generality. PDL-FGPU requires 1.5-3x more ALMs, 4.4-6.4x more M20Ks, and 4.6-10x more DSPs than the FGPU baseline, but improves performance by 55-727x for persistent DL applications, with an average 15% degradation on general non-PDL applications. We also demonstrate that PDL-FGPU is only 4-7x slower than the Nvidia Volta V100 GPU.