
TLDR

Processing-in-memory has been studied for decades but has not been adopted commercially. Recent 3D-stacking advances, notably Micron's Hybrid Memory Cube, have revived interest, yet the literature lacks a detailed analysis of a killer application for Near-Data Computing (NDC). The study targets in-memory MapReduce workloads and evaluates three elements needed for efficient NDC: low-EPI (energy-per-instruction) cores, long daisy chains of memory devices, and dynamic activation of cores and SerDes links. The proposed NDC architecture places simple processing cores on a separate, non-memory die within a 3D-stacked memory package, so Map operations can access memory efficiently without hitting the bandwidth wall. Compared to a baseline heavily optimized for MapReduce, the NDC design achieves up to a 15x reduction in execution time and an 18x reduction in system energy.

Abstract

While Processing-in-Memory has been investigated for decades, it has not been embraced commercially. A number of emerging technologies have renewed interest in this topic. In particular, the emergence of 3D stacking and the imminent release of Micron's Hybrid Memory Cube device have made it more practical to move computation near memory. However, the literature is missing a detailed analysis of a killer application that can leverage a Near Data Computing (NDC) architecture. This paper focuses on in-memory MapReduce workloads that are commercially important and are especially suitable for NDC because of their embarrassing parallelism and largely localized memory accesses. The NDC architecture incorporates several simple processing cores on a separate, non-memory die in a 3D-stacked memory package; these cores can perform Map operations with efficient memory access and without hitting the bandwidth wall. This paper describes and evaluates a number of key elements necessary in realizing efficient NDC operation: (i) low-EPI cores, (ii) long daisy chains of memory devices, (iii) the dynamic activation of cores and SerDes links. Compared to a baseline that is heavily optimized for MapReduce execution, the NDC design yields up to 15X reduction in execution time and 18X reduction in system energy.
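
To illustrate why Map work suits this kind of design, the following Python sketch shows a word count in which each Map task reads only its own partition and only the small partial results are merged afterwards. It is not taken from the paper: the partition layout, the function names (`map_partition`, `reduce_counts`, `word_count`), and the sample data are hypothetical, and the code models only the software structure of the workload, not the NDC hardware.

```python
from collections import Counter
from functools import reduce


def map_partition(records):
    """Map phase: count words using only this partition's records.

    Assumption: one partition stands in for the data held by one
    memory device, so the loop touches local data only.
    """
    counts = Counter()
    for line in records:
        counts.update(line.split())
    return counts


def reduce_counts(partials):
    """Reduce phase: merge the per-partition partial results."""
    return reduce(lambda a, b: a + b, partials, Counter())


def word_count(partitions):
    # Each Map task is independent of the others (embarrassingly
    # parallel); here they run sequentially for simplicity.
    partials = [map_partition(p) for p in partitions]
    return reduce_counts(partials)


# Hypothetical sample data: two partitions of text records.
partitions = [
    ["near data computing", "data locality"],
    ["map reduce", "near memory"],
]
result = word_count(partitions)
```

In this sketch the Map tasks share no state, which is the property the abstract refers to as embarrassing parallelism with largely localized memory accesses.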
