Publication | Closed Access
A 1ynm 1.25V 8Gb, 16Gb/s/pin GDDR6-based Accelerator-in-Memory supporting 1TFLOPS MAC Operation and Various Activation Functions for Deep-Learning Applications
Citations: 113
References: 6
Year: 2022
Engineering, Computer Architecture, Hardware Systems, Multi-channel Memory Architecture, Computer Memory, Computing Systems, Memory Devices, Parallel Computing, Memory Bound, Deep-learning Applications, Computer Engineering, Computer Science, AiM Architecture, Deep Learning, Memory Architecture, Data Movement, Hardware Acceleration, High Bandwidth Memory, Various Activation Functions, Domain-specific Accelerator, Parallel Programming, MAC Operation, In-memory Computing
Deep‑learning workloads such as RNNs and MLPs are memory‑bound, and while processing‑in‑memory DRAM can reduce data movement, high‑cost HBM limits its broader adoption. The authors propose a low‑cost GDDR6 accelerator‑in‑memory that accelerates deep‑learning applications by leveraging the GDDR6 interface. They detail the AiM architecture, its DL‑specific command set, and the processing‑unit operations supporting multiple activation functions. The AiM delivers 1 TFLOPS peak throughput with 1 GHz PUs, supports diverse activation functions, and shows strong DL performance at package and system levels.
With advances in deep-neural-network applications, increasingly large data movement through memory channels is becoming inevitable: specifically, RNN and MLP applications are memory bound, and the memory is the performance bottleneck [1]. DRAM featuring processing in memory (PIM) significantly reduces data movement [1]–[4], and system performance is enhanced by the large internal parallel bank bandwidth. Among DRAM-based PIM proposals, [3] is near commercialization, but the required HBM technology may prevent it from being applied to other applications due to its high cost [5]. In this situation, an accelerator-in-memory (AiM) based on GDDR6 may be applicable: it is relatively low-cost, is compatible with the GDDR6 interface, and is designed to accelerate deep-learning (DL) applications. AiM offers a peak throughput of 1 TFLOPS using processing units (PUs) running at 1 GHz, exploiting the 16 Gb/s per-pin speed of GDDR6. It can also support many applications because it provides various activation functions. This paper first looks at the AiM architecture and the supported command set for DL operations. Next, the DL operations in the PU and the supported activation functions are described. Finally, we present evaluation results of the DL behavior of AiM at the package and system levels.
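The headline figures in the abstract can be related to one another with some back-of-the-envelope arithmetic. The sketch below is only an illustration, not the authors' analysis: the peak throughput, PU clock, and per-pin rate come from the abstract, while the 32-pin data width (`DATA_PINS`) and the convention that one MAC counts as two floating-point operations are assumptions introduced here. All function names are hypothetical.

```python
# Back-of-the-envelope check of the abstract's headline numbers.
# Values marked "from the abstract" are stated in the text above;
# everything marked ASSUMPTION is introduced here for illustration.

PEAK_FLOPS = 10**12         # from the abstract: 1 TFLOPS peak throughput
PU_CLOCK_HZ = 10**9         # from the abstract: PUs run at 1 GHz
PIN_RATE_BPS = 16 * 10**9   # from the abstract/title: 16 Gb/s per pin

DATA_PINS = 32              # ASSUMPTION: a x32 GDDR6 device
FLOPS_PER_MAC = 2           # ASSUMPTION: 1 MAC = 1 multiply + 1 add


def flops_per_cycle(peak_flops, clock_hz):
    """Floating-point operations the whole device must retire per PU clock."""
    return peak_flops / clock_hz


def macs_per_cycle(peak_flops, clock_hz, flops_per_mac=FLOPS_PER_MAC):
    """MAC operations per PU clock implied by the peak figure."""
    return flops_per_cycle(peak_flops, clock_hz) / flops_per_mac


def external_bandwidth_bytes(pin_rate_bps, data_pins):
    """External interface bandwidth in bytes per second."""
    return pin_rate_bps * data_pins / 8


def flops_per_external_byte(peak_flops, pin_rate_bps, data_pins):
    """Peak compute available per byte that crosses the external interface."""
    return peak_flops / external_bandwidth_bytes(pin_rate_bps, data_pins)


if __name__ == "__main__":
    print("FLOPs per cycle:", flops_per_cycle(PEAK_FLOPS, PU_CLOCK_HZ))
    print("MACs per cycle:", macs_per_cycle(PEAK_FLOPS, PU_CLOCK_HZ))
    print("External GB/s:",
          external_bandwidth_bytes(PIN_RATE_BPS, DATA_PINS) / 1e9)
    print("FLOPs per external byte:",
          flops_per_external_byte(PEAK_FLOPS, PIN_RATE_BPS, DATA_PINS))
```

Under these assumptions, the peak figure implies roughly 500 MACs per PU clock across the device, and about 15.6 FLOPs of compute for every byte the external interface could deliver. This gap is one way to read the abstract's point that performance comes from the internal parallel bank bandwidth rather than from the memory channel.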