Publication | Closed Access
A Configurable Floating-Point Multiple-Precision Processing Element for HPC and AI Converged Computing
Citations: 33
References: 30
Year: 2021
Artificial Intelligence, Multiplication Array, Engineering, Hardware Algorithm, Computer Architecture, Supercomputer Architecture, Hardware Systems, Approximate Computing, AI Converged Computing, Parallel Computing, Electrical Engineering, Configurable Accelerators, Computer Engineering, Hardware Optimization, Computer Science, Reconfigurable Architecture, FPGA Design, Co-processors, Hardware Acceleration, Domain-specific Accelerator, Parallel Programming
There is an emerging need for configurable accelerators that serve high-performance computing (HPC) and artificial intelligence (AI) applications at different precisions. The floating-point (FP) processing element (PE), the basic unit of such accelerators, must therefore meet multiple-precision requirements while operating energy-efficiently. However, existing structures based on the high-precision-split (HPS) and low-precision-combination (LPC) methods suffer from a low utilization rate of the multiplication array and a long multiterm processing period, respectively. In this article, a configurable FP multiple-precision PE design is proposed with the LPC structure. Half precision, single precision, and double precision are supported. The multiplication array achieves a 100% multiplier utilization rate for all precisions, with improved speed in the comparison and summation process. The proposed design is realized in a 28-nm process with a 1.429-GHz clock frequency. Compared with existing multiple-precision FP methods, the proposed structure achieves area savings of 63% and 88% for FP16 and FP32 operations, respectively. Maximum throughput rates of 4× and 20× are obtained when compared with fixed FP32 and FP64 operations. Compared with previous multiple-precision PEs, the proposed one achieves the best energy efficiency, at 975.13 GFLOPS/W.
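To illustrate the general idea behind a low-precision-combination structure, the following Python sketch builds a wide significand product from narrow multiplier blocks. It is not the paper's design: the 12-bit block width, the function names, and the partitioning are assumptions chosen for illustration. Only the significand widths (11, 24, and 53 bits for FP16, FP32, and FP64, including the implicit leading bit) come from IEEE 754.

```python
# Hypothetical block width for illustration only; the abstract does not
# state how the actual multiplication array is partitioned.
BLOCK_BITS = 12


def split_limbs(value, limb_bits, n_limbs):
    """Split an unsigned integer into n_limbs chunks, least significant first."""
    mask = (1 << limb_bits) - 1
    return [(value >> (i * limb_bits)) & mask for i in range(n_limbs)]


def combined_multiply(a, b, limb_bits=BLOCK_BITS, n_limbs=1):
    """Multiply two unsigned significands using only narrow products.

    Each narrow product a_i * b_j stands in for one small hardware
    multiplier; the shifts and the running sum model the alignment and
    accumulation of partial products.
    """
    a_limbs = split_limbs(a, limb_bits, n_limbs)
    b_limbs = split_limbs(b, limb_bits, n_limbs)
    total = 0
    for i, a_i in enumerate(a_limbs):
        for j, b_j in enumerate(b_limbs):
            total += (a_i * b_j) << ((i + j) * limb_bits)
    return total


# Significand widths from IEEE 754, including the implicit leading bit.
SIGNIFICAND_BITS = {"FP16": 11, "FP32": 24, "FP64": 53}


def limbs_needed(fmt, limb_bits=BLOCK_BITS):
    """Number of narrow blocks needed to cover one significand (ceiling division)."""
    return -(-SIGNIFICAND_BITS[fmt] // limb_bits)
```

With this hypothetical 12-bit block, one FP16 significand fits in a single block, an FP32 product needs 2 × 2 = 4 narrow multiplications, and an FP64 product needs 5 × 5 = 25. How the actual design maps these onto its array to reach full utilization is not stated in the abstract.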