McDRAM: Low Latency and Energy-Efficient Matrix Computations in DRAM

Abstract

We propose a novel memory architecture for in-memory computation called McDRAM, where DRAM dies are equipped with a large number of multiply accumulate (MAC) units to perform matrix computation for neural networks. By exploiting high internal memory bandwidth and reducing offchip memory accesses, McDRAM realizes both low latency and energy efficient computation. In our experiments, we obtained the chip layout based on the state-of-the-art memory, LPDDR4 where McDRAM is equipped with 2048 MACs in a single chip package with a small area overhead (4.7%). Compared with the state-ofthe-art accelerator, TPU and the power-efficient GPU, Nvidia P4, McDRAM offers 9.5× and 14.4× speedup, respectively, in the case that the large-scale MLPs and RNNs adopt the batch size of 1. McDRAM also gives 2.1× and 3.7× better computational efficiency in TOPS/W than TPU and P4, respectively, for the large batches.

References

Page 1

	Year	Citations

Page 1