Publication | Closed Access
A 1ynm 1.25V 8Gb 16Gb/s/Pin GDDR6-Based Accelerator-in-Memory Supporting 1TFLOPS MAC Operation and Various Activation Functions for Deep Learning Application
Citations: 34 · References: 10 · Year: 2022
Keywords: Engineering, Calculation Delay Time, Emerging Memory Technology, Computer Architecture, Hardware Systems, Multi-channel Memory Architecture, Computer Memory, Deep Learning Application, Memory Devices, Electrical Engineering, Computer Engineering, Dedicated Command, Computer Science, Deep Learning, Microelectronics, Memory Reliability, Memory Architecture, Hardware Acceleration, High Bandwidth Memory, Various Activation Functions, Domain-specific Accelerator, MAC Operation, In-memory Computing
In this article, a 1.25-V, 8-Gb, 16-Gb/s/pin GDDR6-based accelerator-in-memory (AiM) is presented. A dedicated command (CMD) set for deep learning (DL) is introduced to minimize latency when switching operation modes, and a bank-wide mantissa shift (BWMS) scheme is adopted to reduce calculation delay time, current consumption, and circuit area during multiply-accumulate (MAC) operation. By storing a lookup table (LUT) in reserved word lines of the dynamic random access memory (DRAM) bank cell array, the chip supports various activation functions (AFs), such as Gaussian error linear unit (GELU), sigmoid, and tanh, as well as rectified linear unit (ReLU) and Leaky ReLU. Performance was evaluated by measuring the fabricated chip on automated test equipment (ATE) and in a self-manufactured field-programmable gate array (FPGA)-based system. In the ATE-level evaluation, the chip operates at 16 Gb/s/pin down to a supply voltage of 1.10 V. When evaluated with general matrix-vector multiplication (GEMV) and MNIST workloads in the FPGA-based system, performance gains of 7.5-10.5 times were confirmed compared to HBM2-based or GDDR6-based systems.
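The abstract describes storing activation-function lookup tables in reserved DRAM word lines so that nonlinear functions such as GELU can be read out rather than computed. The sketch below mimics that idea in software: precompute a fixed-depth table over a bounded input range, then evaluate by nearest-entry lookup. The table depth and input range are assumptions for illustration, not values from the paper, and this is not the chip's actual circuit-level scheme.

```python
import math

# Assumed table parameters (hypothetical, not from the paper).
LUT_SIZE = 256            # table depth, analogous to entries in a reserved word line
X_MIN, X_MAX = -8.0, 8.0  # input range covered by the table

def gelu(x):
    # Exact GELU via the Gaussian CDF: 0.5 * x * (1 + erf(x / sqrt(2))).
    return 0.5 * x * (1.0 + math.erf(x / math.sqrt(2.0)))

# Precompute the table once, as the chip would load it into reserved rows.
STEP = (X_MAX - X_MIN) / (LUT_SIZE - 1)
GELU_LUT = [gelu(X_MIN + i * STEP) for i in range(LUT_SIZE)]

def gelu_lut(x):
    # Clamp to the covered range, then read the nearest table entry.
    idx = round((min(max(x, X_MIN), X_MAX) - X_MIN) / STEP)
    return GELU_LUT[idx]
```

A deeper table (or linear interpolation between adjacent entries) trades storage for accuracy; hardware LUT schemes make the same trade-off in reserved DRAM capacity.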