Publication | Closed Access
Accelerating recurrent neural networks in analytics servers: Comparison of FPGA, CPU, GPU, and ASIC
Citations: 172
References: 11
Year: 2016
Unknown Venue
Artificial Intelligence, Analytics Servers, Machine Learning, Data Science, Engineering, Hardware Acceleration, Advanced Computing, Computer Engineering, Computer Architecture, Recurrent Neural Networks, Parallel Programming, Computer Science, Gated Recurrent Unit, Parallel Computing, Deep Learning, Neural Architecture Search, FPGA Design
Recurrent neural networks (RNNs) provide state-of-the-art accuracy for analytics on sequential data (e.g., language modeling). This paper studies a state-of-the-art RNN variant, the Gated Recurrent Unit (GRU). We first propose a memoization optimization that avoids 3 of the 6 dense matrix-vector multiplications (SGEMVs) that make up the majority of the computation in a GRU. We then study the opportunities to accelerate the remaining SGEMVs using FPGAs, in comparison to a 14-nm ASIC, a GPU, and a multi-core CPU. Results show that the FPGA provides superior performance per watt over the CPU and GPU because its on-chip BRAMs, hard DSPs, and reconfigurable fabric allow fine-grained parallelism to be extracted efficiently from the small and medium-sized matrices used by the GRU. Moreover, newer FPGAs with more DSPs, more on-chip BRAM, and higher frequency have the potential to narrow the FPGA-ASIC efficiency gap.
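To make the count of six matrix-vector products concrete, the following is a minimal pure-Python sketch of one GRU step with and without memoization. It is an illustration under stated assumptions, not the paper's implementation: the abstract does not say which three products are avoided, so the sketch assumes they are the input-side ones and that inputs come from a finite vocabulary, which lets each product be computed once per token and looked up afterwards. The function names (`matvec`, `gru_core`, `gru_step`, `build_cache`, `gru_step_memo`), the weight layout, the omission of bias terms, and the update convention `h' = (1 - z) * h + z * c` are all hypothetical choices made for this example.

```python
import math

def matvec(M, v):
    """Dense matrix-vector product (one SGEMV in the abstract's terms)."""
    return [sum(m * x for m, x in zip(row, v)) for row in M]

def sigmoid(a):
    return 1.0 / (1.0 + math.exp(-a))

def gru_core(wz, wr, wc, h, U):
    """Finish a GRU step given the three input-side products.

    This part always costs three recurrent SGEMVs (U["z"], U["r"], U["c"]).
    """
    uz, ur = matvec(U["z"], h), matvec(U["r"], h)
    z = [sigmoid(a + b) for a, b in zip(wz, uz)]  # update gate
    r = [sigmoid(a + b) for a, b in zip(wr, ur)]  # reset gate
    # Third recurrent product acts on the reset-gated hidden state.
    uc = matvec(U["c"], [ri * hi for ri, hi in zip(r, h)])
    c = [math.tanh(a + b) for a, b in zip(wc, uc)]  # candidate state
    # Assumed convention: z interpolates between old state and candidate.
    return [(1 - zi) * hi + zi * ci for zi, hi, ci in zip(z, h, c)]

def gru_step(x, h, W, U):
    """Baseline step: three input-side SGEMVs plus three recurrent ones."""
    wz, wr, wc = (matvec(W[g], x) for g in "zrc")
    return gru_core(wz, wr, wc, h, U)

def build_cache(embeddings, W):
    """Assumption: inputs come from a finite vocabulary, so the input-side
    products can be computed once per token ahead of time."""
    return {tok: tuple(matvec(W[g], x) for g in "zrc")
            for tok, x in embeddings.items()}

def gru_step_memo(tok, h, cache, U):
    """Memoized step: input-side products are looked up, leaving three SGEMVs."""
    wz, wr, wc = cache[tok]
    return gru_core(wz, wr, wc, h, U)
```

Under these assumptions, `gru_step_memo` returns the same hidden state as `gru_step` for any token in the cache while performing half as many matrix-vector products per step. The trade-off is the memory needed to hold three cached vectors per vocabulary entry.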