Concepedia

Publication | Closed Access

NDA: Near-DRAM acceleration architecture leveraging commodity DRAM devices and standard memory modules

Citations: 271

References: 71

Year: 2015

TLDR

Data transfer across the processor-memory hierarchy consumes a large and growing share of total system energy, a trend that has intensified with technology scaling. This work introduces near-DRAM acceleration (NDA) architectures, which 3D-stack accelerators on the commodity DRAM devices that make up off-chip main memory modules. NDA routes most data through high-bandwidth, low-energy 3D interconnects between the accelerators and DRAM rather than over the low-bandwidth, high-energy off-chip interconnect between processor and DRAM. Beyond inserting TSVs, it requires only minimal changes to the commodity DRAM device and standard memory module architectures, which eases adoption in existing and emerging systems. On average, the NDA-based system consumes 46% less energy (68% less data-transfer energy) and delivers 1.67× higher performance than a system that integrates the same accelerator logic within the processor.

Abstract

Energy consumed for transferring data across the processor memory hierarchy constitutes a large fraction of total system energy consumption, and this fraction has steadily increased with technology scaling. In this paper, we propose near-DRAM acceleration (NDA) architectures, which process data using accelerators 3D-stacked on DRAM devices comprising off-chip main memory modules. NDA transfers most data through high-bandwidth and low-energy 3D interconnects between accelerators and DRAM devices instead of low-bandwidth and high-energy off-chip interconnects between a processor and DRAM devices, substantially reducing energy consumption and improving performance. Unlike previous near-memory processing architectures, NDA is built upon commodity DRAM devices; apart from inserting through-silicon vias (TSVs) to 3D-interconnect DRAM devices and accelerators, NDA requires minimal changes to the commodity DRAM device and standard memory module architectures. This allows NDA to be more easily adopted in both existing and emerging systems. Our experiments demonstrate that, on average, our NDA-based system consumes 46% (68%) lower (data transfer) energy at 1.67× higher performance than a system that integrates the same accelerator logic within the processor itself.
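To make the headline numbers easier to compare, the following is a minimal Python sketch that converts the averages reported in the abstract into quantities relative to the baseline (the system with the same accelerator logic inside the processor, normalized to 1.0). The function name `relative_metrics` and the energy-delay product are illustrative additions, not taken from the paper, and the sketch assumes that "performance" is the inverse of runtime.

```python
# Averages reported in the abstract, relative to the baseline system that
# integrates the same accelerator logic within the processor.
ENERGY_REDUCTION = 0.46           # 46% lower total energy
TRANSFER_ENERGY_REDUCTION = 0.68  # 68% lower data-transfer energy
SPEEDUP = 1.67                    # 1.67x higher performance


def relative_metrics(energy_reduction: float, speedup: float) -> dict:
    """Return energy, runtime, and energy-delay product with baseline = 1.0.

    Assumption (not from the paper): performance is the inverse of runtime,
    so a 1.67x speedup means runtime shrinks to 1/1.67 of the baseline.
    The energy-delay product is a derived metric the abstract does not report.
    """
    energy = 1.0 - energy_reduction
    runtime = 1.0 / speedup
    return {
        "energy": energy,
        "runtime": runtime,
        "energy_delay_product": energy * runtime,
    }


if __name__ == "__main__":
    metrics = relative_metrics(ENERGY_REDUCTION, SPEEDUP)
    metrics["data_transfer_energy"] = 1.0 - TRANSFER_ENERGY_REDUCTION
    for name, value in metrics.items():
        print(f"{name}: {value:.3f}")
```

Under these assumptions, the NDA-based system uses about 0.54 of the baseline energy, 0.32 of the baseline data-transfer energy, and about 0.60 of the baseline runtime, which gives an energy-delay product of roughly 0.32 of the baseline.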
