Publication | Closed Access
A 1ynm 1.25V 8Gb 16Gb/s/Pin GDDR6-Based Accelerator-in-Memory Supporting 1TFLOPS MAC Operation and Various Activation Functions for Deep Learning Application
Citations: 34 · References: 10 · Year: 2022
Keywords: Engineering, Calculation Delay Time, Emerging Memory Technology, Computer Architecture, Hardware Systems, Multi-channel Memory Architecture, Computer Memory, Deep Learning Application, Memory Devices, Electrical Engineering, Computer Engineering, Dedicated Command, Computer Science, Deep Learning, Microelectronics, Memory Reliability, Memory Architecture, Hardware Acceleration, High Bandwidth Memory, Various Activation Functions, Domain-specific Accelerator, MAC Operation, In-memory Computing
In this article, a 1.25-V, 8-Gb, 16-Gb/s/pin GDDR6-based accelerator-in-memory (AiM) is presented. A dedicated command (CMD) set for deep learning (DL) is introduced to minimize latency when switching operation modes, and a bank-wide mantissa shift (BWMS) scheme is adopted to reduce calculation delay time, current consumption, and circuit area during multiply-accumulate (MAC) operation. By storing a lookup table (LUT) in reserved word lines of the dynamic random access memory (DRAM) bank cell array, the chip supports various activation functions (AFs), such as Gaussian error linear unit (GELU), sigmoid, and tanh, as well as rectified linear unit (ReLU) and Leaky ReLU. Performance was evaluated by measuring the fabricated chip on automated test equipment (ATE) and in a self-manufactured field-programmable gate array (FPGA)-based system. In the ATE-level evaluation, the chip operates at 16 Gb/s/pin down to a supply voltage of 1.10 V. When evaluated with general matrix-vector multiplication (GEMV) and MNIST workloads in the FPGA-based system, performance gains of 7.5-10.5 times were confirmed compared to HBM2-based or GDDR6-based systems.
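The abstract describes storing activation-function lookup tables in reserved DRAM word lines so that nonlinear functions such as GELU can be read out rather than computed. The sketch below mimics that idea in software: precompute a fixed-depth table over a bounded input range, then evaluate by nearest-entry lookup. The table depth and input range are assumptions for illustration, not values from the paper, and this is not the chip's actual circuit-level scheme.

```python
import math

# Assumed table parameters (hypothetical, not from the paper).
LUT_SIZE = 256            # table depth, analogous to entries in a reserved word line
X_MIN, X_MAX = -8.0, 8.0  # input range covered by the table

def gelu(x):
    # Exact GELU via the Gaussian CDF: 0.5 * x * (1 + erf(x / sqrt(2))).
    return 0.5 * x * (1.0 + math.erf(x / math.sqrt(2.0)))

# Precompute the table once, as the chip would load it into reserved rows.
STEP = (X_MAX - X_MIN) / (LUT_SIZE - 1)
GELU_LUT = [gelu(X_MIN + i * STEP) for i in range(LUT_SIZE)]

def gelu_lut(x):
    # Clamp to the covered range, then read the nearest table entry.
    idx = round((min(max(x, X_MIN), X_MAX) - X_MIN) / STEP)
    return GELU_LUT[idx]
```

A deeper table (or linear interpolation between adjacent entries) trades storage for accuracy; hardware LUT schemes make the same trade-off in reserved DRAM capacity.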