Concepedia

TLDR

Deep‑learning workloads such as RNNs and MLPs are memory‑bound, and while processing‑in‑memory DRAM can reduce data movement, the high cost of HBM limits its broader adoption. The authors propose a low‑cost accelerator‑in‑memory (AiM) that speeds up deep‑learning applications while remaining compatible with the GDDR6 interface. They detail the AiM architecture, its DL‑specific command set, and the processing‑unit operations, which support multiple activation functions. AiM delivers 1 TFLOPS peak throughput with 1 GHz PUs and shows strong DL performance at both the package and system levels.
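The quoted 1 TFLOPS figure can be sanity-checked with simple arithmetic. The PU count and MAC width below are illustrative assumptions for the sketch, not values taken from the paper:

```python
# Hypothetical back-of-the-envelope check of a ~1 TFLOPS peak figure.
# num_pus and macs_per_pu are assumed values, chosen only to illustrate
# how bank-level parallelism multiplies up to chip-level throughput.

def peak_flops(num_pus: int, macs_per_pu: int, freq_hz: float) -> float:
    """Peak FLOPS = PUs x MACs per PU x 2 ops per MAC x clock frequency."""
    return num_pus * macs_per_pu * 2 * freq_hz

# e.g. 32 bank-level PUs, each doing 16 multiply-accumulates per 1 GHz cycle
print(peak_flops(32, 16, 1e9) / 1e12)  # -> 1.024 (TFLOPS)
```

The point of the sketch is that modest per-bank compute, replicated across many banks, reaches teraFLOPS-class throughput without a wide external bus.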

Abstract

With advances in deep-neural-network applications, increasingly large data movement through memory channels is becoming inevitable: in particular, RNN and MLP applications are memory-bound, and memory is the performance bottleneck [1]. DRAM featuring processing in memory (PIM) significantly reduces data movement [1]–[4], and system performance is enhanced by the large internal parallel bank bandwidth. Among DRAM-based PIM proposals, [3] is near commercialization, but the required HBM technology may prevent it from being applied to other applications due to its high cost [5]. In this situation, an accelerator-in-memory (AiM) based on GDDR6 may be applicable: it has a relatively low cost, is compatible with the GDDR6 interface, and is designed to accelerate deep-learning (DL) applications. AiM offers a peak throughput of 1 TFLOPS with processing units (PUs) running at 1 GHz, utilizing the characteristics of 16 Gb/s GDDR6. It can also support many applications, as it provides various activation functions. This paper first describes the AiM architecture and the supported command set for DL operations. Next, the DL operations in the PU and the supported activation functions are described. Finally, we present evaluation results of the DL behavior of AiM at the package and system levels.
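The claim that RNN and MLP layers are memory-bound can be illustrated by the arithmetic intensity of a matrix-vector product, the core kernel in these layers. The sizes and data type below are assumptions for the sketch:

```python
# Sketch of why GEMV-dominated layers (MLP/RNN) are memory-bound: each
# weight element is read once and used in exactly one multiply-add, so
# FLOPs per byte of traffic stay near 1 regardless of layer size.

def gemv_arithmetic_intensity(m: int, n: int, bytes_per_elem: int = 2) -> float:
    """FLOPs per byte for y = W @ x with an m x n weight matrix W (FP16 by default)."""
    flops = 2 * m * n                               # one multiply + one add per weight
    bytes_moved = (m * n + n + m) * bytes_per_elem  # traffic for W, x, and y
    return flops / bytes_moved

# An assumed 4096 x 4096 FP16 layer: roughly 1 FLOP per byte, far below
# what compute units can sustain per byte of external DRAM bandwidth.
print(round(gemv_arithmetic_intensity(4096, 4096), 3))
```

Because the intensity is flat at about one FLOP per byte, external bandwidth, not compute, caps performance; moving the multiply-accumulate next to the DRAM banks attacks exactly this bottleneck.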

References
