Publication | Closed Access
McDRAM: Low Latency and Energy-Efficient Matrix Computations in DRAM
Citations: 82
References: 15
Year: 2018
Matrix Computation, Machine Learning, Novel Memory Architecture, Engineering, Hardware Acceleration, Multi-channel Memory Architecture, Computer Engineering, Computer Architecture, Computing Systems, Adaptive Memory, Memory Devices, Computer Science, Low Latency, Chip Layout, Parallel Computing, Memory Architecture, In-memory Computing, GPU Computing
We propose a novel memory architecture for in-memory computation called McDRAM, in which DRAM dies are equipped with a large number of multiply-accumulate (MAC) units to perform matrix computation for neural networks. By exploiting high internal memory bandwidth and reducing off-chip memory accesses, McDRAM realizes both low-latency and energy-efficient computation. In our experiments, we obtained the chip layout based on the state-of-the-art memory, LPDDR4, where McDRAM is equipped with 2048 MACs in a single chip package with a small area overhead (4.7%). Compared with the state-of-the-art accelerator, TPU, and the power-efficient GPU, Nvidia P4, McDRAM offers 9.5× and 14.4× speedup, respectively, when large-scale MLPs and RNNs use a batch size of 1. McDRAM also gives 2.1× and 3.7× better computational efficiency in TOPS/W than TPU and P4, respectively, for large batches.
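To make the role of the MAC units concrete, the following minimal Python sketch computes a matrix-vector product (the batch-size-1 case) purely from multiply-accumulate steps, with output rows spread over a few lanes. It is an illustration only and does not reflect the actual McDRAM design: the function names, the round-robin row assignment, and the `num_macs` lane count are all hypothetical. The last helper states a general property of matrix multiplication rather than a result from the paper: each fetched weight is reused once per input in the batch, so at batch size 1 every MAC needs a fresh weight from memory, which is one plausible reading of why high internal memory bandwidth matters in that case.

```python
def mac(acc, a, b):
    """One multiply-accumulate step: returns acc + a * b."""
    return acc + a * b


def matvec_with_macs(weights, x, num_macs=4):
    """Compute y = W x using only MAC steps.

    Output rows are assigned to lanes round-robin. `num_macs` is a
    hypothetical lane count chosen for illustration, not a McDRAM parameter.
    Returns (y, macs_per_lane).
    """
    y = [0] * len(weights)
    macs_per_lane = [0] * num_macs
    for row_idx, row in enumerate(weights):
        lane = row_idx % num_macs
        acc = 0
        for w, xi in zip(row, x):
            acc = mac(acc, w, xi)
            macs_per_lane[lane] += 1
        y[row_idx] = acc
    return y, macs_per_lane


def macs_per_weight_fetched(batch_size):
    """Each fetched weight is used once per input in the batch."""
    return batch_size


# Small worked example: a 4x3 weight matrix and one input vector.
W = [[1, 2, 3], [4, 5, 6], [7, 8, 9], [1, 0, 1]]
x = [1, 0, 2]
y, lane_counts = matvec_with_macs(W, x, num_macs=2)
```

In this toy example the two lanes perform the same number of MAC steps because the rows divide evenly between them; with a batch of 1, `macs_per_weight_fetched(1)` shows that no weight is reused after it is read.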