Publication | Closed Access
A High-Speed and Low-Complexity Architecture for Softmax Function in Deep Learning
185 Citations | 14 References | Year: 2018 | Venue: Unknown
Artificial Intelligence, Engineering, Machine Learning, Computer Architecture, Hardware Systems, Recurrent Neural Network, Sparse Neural Network, Computing Systems, Softmax Function, Computer Engineering, Computer Science, Deep Learning, Neural Architecture Search, FPGA Design, Exponentiation Units, Model Compression, Low-complexity Architecture, Deep Neural Networks, Hardware Acceleration, VLSI Architecture
Recently, significant progress has been made in hardware architecture design for deep neural networks (DNNs). However, the hardware implementation of the softmax function has received little investigation. Softmax is widely used in DNNs and involves expensive division and exponentiation units. This paper presents an efficient hardware implementation of the softmax function. Mathematical transformations and linear fitting are used to simplify the function, and multiple algorithmic strength-reduction strategies and fast addition methods are employed to optimize the architecture. With these techniques, complex logic units such as multipliers are eliminated and memory consumption is greatly reduced, while the accuracy loss remains negligible. The proposed design is coded in a hardware description language (HDL) and synthesized in TSMC 28-nm CMOS technology. Synthesis results show that the architecture achieves a throughput of 6.976 G/s for 8-bit input data and a power efficiency of 463.04 Gb/(mm²·mW). It occupies only 0.015 mm² of area. To the best of our knowledge, this is the first work on efficient hardware implementation of softmax in the open literature.
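The abstract does not give the exact transformations or fitting coefficients. The following is only a minimal software sketch of one common way to remove division and exact exponentiation from softmax, and it is not presented as the authors' design. It makes three assumptions. First, the division e^{x_i}/S is replaced by subtraction in the log domain. Second, base e is converted to base 2 so that integer powers of two become shifts. Third, the linear fits 2^v ≈ 1 + v (for 0 ≤ v < 1) and log2(k) ≈ k − 1 (for 1 ≤ k < 2) are used. The function names `pow2_approx`, `log2_approx`, and `softmax_approx` are hypothetical.

```python
import math

LOG2_E = 1.4426950408889634  # log2(e): rewrites e^x as 2^(x * log2 e)


def pow2_approx(y: float) -> float:
    """Approximate 2**y by splitting y = u + v (u integer, 0 <= v < 1)
    and applying the assumed linear fit 2**v ~= 1 + v.
    Scaling by 2**u would be a bit shift in hardware; here math.ldexp does it."""
    u = math.floor(y)
    v = y - u
    return math.ldexp(1.0 + v, u)


def log2_approx(f: float) -> float:
    """Approximate log2(f) for f > 0 by writing f = 2**w * k with 1 <= k < 2
    (leading-one detection in hardware) and applying the assumed fit
    log2(k) ~= k - 1."""
    m, e = math.frexp(f)     # f = m * 2**e with 0.5 <= m < 1
    k, w = 2.0 * m, e - 1    # renormalize so that 1 <= k < 2
    return w + (k - 1.0)


def softmax_approx(x: list[float]) -> list[float]:
    """Division-free softmax sketch:
    e^(x_i - x_max) / S = 2^((x_i - x_max) * log2 e - log2 S)."""
    x_max = max(x)  # subtracting the max keeps every exponent <= 0
    # Sum of approximated exponentials, computed in base 2.
    s = sum(pow2_approx((xi - x_max) * LOG2_E) for xi in x)
    log2_s = log2_approx(s)
    # The division becomes a subtraction in the log2 domain.
    return [pow2_approx((xi - x_max) * LOG2_E - log2_s) for xi in x]
```

For example, `softmax_approx([1.0, 2.0, 3.0])` keeps the largest probability on the last element, and each output lies within about 0.07 of the exact softmax. Coarse fits like these trade accuracy for cheap logic. The multiplication by log2 e is a multiplication by a constant, which hardware typically implements with shifts and adds rather than a general multiplier.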