FloatPIM

Publication | Open Access | 2019

187 Citations · 61 References

TLDR

Processing In-Memory (PIM) can accelerate CNN inference, but existing architectures lack floating-point precision and rely on analog circuits that do not scale well with multi-bit NVM. This paper proposes FloatPIM, a fully-digital, scalable PIM architecture that accelerates both CNN training and testing. FloatPIM natively supports floating-point representation, enables fast communication between neighboring memory blocks to reduce internal data movement, and was evaluated on ImageNet with popular large-scale neural networks. FloatPIM achieves up to 5.1% higher classification accuracy than fixed-point PIM architectures; in training it is 303.2× faster and 48.6× more energy efficient than a GTX 1080 GPU (4.3× and 15.8× versus the PipeLayer PIM accelerator), and in testing it provides 324.8× speedup and 297.9× energy savings versus the GPU (6.3× and 21.6× versus the ISAAC PIM accelerator).

Abstract

Processing In-Memory (PIM) has shown great potential to accelerate the inference tasks of Convolutional Neural Networks (CNNs). However, existing PIM architectures do not support high-precision computation, e.g., floating-point precision, which is essential for training accurate CNN models. In addition, most existing PIM approaches require analog/mixed-signal circuits, which do not scale, and exploit insufficiently reliable multi-bit Non-Volatile Memory (NVM). In this paper, we propose FloatPIM, a fully-digital scalable PIM architecture that accelerates CNNs in both training and testing phases. FloatPIM natively supports floating-point representation, thus enabling accurate CNN training. FloatPIM also enables fast communication between neighboring memory blocks to reduce internal data movement of the PIM architecture. We evaluate the efficiency of FloatPIM on the ImageNet dataset using popular large-scale neural networks. Our evaluation shows that FloatPIM, with floating-point precision, can achieve up to 5.1% higher classification accuracy as compared to existing PIM architectures with limited fixed-point precision. FloatPIM training is on average 303.2× and 48.6× (4.3× and 15.8×) faster and more energy efficient as compared to a GTX 1080 GPU (the PipeLayer [1] PIM accelerator). For testing, FloatPIM also provides 324.8× and 297.9× (6.3× and 21.6×) speedup and energy efficiency, respectively, as compared to the GPU (the ISAAC [2] PIM accelerator).
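The accuracy gap the abstract attributes to limited fixed-point precision can be illustrated with a minimal sketch (not from the paper): small gradient magnitudes that a floating-point representation preserves are rounded to zero on a coarse fixed-point grid, so the corresponding weight updates vanish during training. The helper `to_fixed_point` and the 8-fractional-bit format below are illustrative assumptions, not FloatPIM's actual number format.

```python
# Illustrative sketch (assumed, not the paper's format): quantizing a
# small gradient to fixed point with few fractional bits loses it.

def to_fixed_point(x, frac_bits):
    """Round x to the nearest point on a fixed-point grid
    with `frac_bits` fractional bits (step size 1 / 2**frac_bits)."""
    scale = 1 << frac_bits
    return round(x * scale) / scale

gradient = 3e-4            # a typical small gradient magnitude

as_float = gradient                      # floating point keeps the value
as_fixed = to_fixed_point(gradient, 8)   # 8 fractional bits -> step 1/256

print(as_float)  # 0.0003
print(as_fixed)  # 0.0 -- the weight update disappears entirely
```

With only 8 fractional bits the smallest nonzero step is 1/256 ≈ 0.0039, so any gradient below half that step rounds to zero; a floating-point format instead scales its precision with the magnitude of the value, which is why the abstract ties floating-point support to accurate training.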
