Concepedia

Publication | Closed Access

33.2 A Fully Integrated Analog ReRAM Based 78.4TOPS/W Compute-In-Memory Chip with Fully Parallel MAC Computing

273 Citations | 3 References | Year 2020

TLDR

Non‑volatile memory based computing‑in‑memory offers high‑speed, low‑power MAC for deep learning, yet existing demonstrations lack full integration and parallelism due to IR drop, transient errors, and interface power overhead. The study proposes a sign‑weighted 2T2R array and a low‑power, resolution‑adjustable LPAR‑ADC interface to mitigate these issues in CIM. The authors realize a fully integrated 784‑100‑10 MLP CIM chip with 158.8 kb of analog ReRAMs fabricated in a 130 nm CMOS process, employing the proposed SW‑2T2R array and LPAR‑ADC interface. The chip achieves 94.4 % MNIST accuracy, 77 µs per image inference, and 78.4 TOPS/W peak energy efficiency.
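The 158.8 kb figure can be cross-checked against the 784-100-10 topology. The sketch below is a hedged sanity check, not a statement of the authors' memory map. It assumes each weight occupies one 2T2R cell (two ReRAM devices), that biases are not stored in the array, and that "kb" counts devices in units of 1000. The names `LAYERS`, `DEVICES_PER_WEIGHT`, `weight_count`, and `device_count` are illustrative.

```python
# Layer widths of the 784-100-10 MLP named in the text.
LAYERS = [784, 100, 10]

# Assumption (not stated in the text): one 2T2R cell, i.e. two ReRAM
# devices, per weight, with no devices spent on biases.
DEVICES_PER_WEIGHT = 2


def weight_count(layers):
    """Number of weights in a fully connected network with these widths."""
    return sum(fan_in * fan_out for fan_in, fan_out in zip(layers, layers[1:]))


def device_count(layers, devices_per_weight=DEVICES_PER_WEIGHT):
    """Number of ReRAM devices needed under the assumption above."""
    return weight_count(layers) * devices_per_weight


weights = weight_count(LAYERS)   # 784*100 + 100*10
devices = device_count(LAYERS)   # two devices per weight
print(weights, devices, devices / 1000)
```

Under these assumptions the count comes to 158,800 devices, which matches the quoted 158.8 kb.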

Abstract

Non-volatile memory (NVM) based computing-in-memory (CIM) shows significant advantages in handling deep learning tasks for artificial intelligence (AI) applications. To overcome the decreasing cost effectiveness of transistor scaling and the intrinsic inefficiency of data shuttling in the von Neumann architecture, CIM has been proposed to realize high-speed, low-power systems with parallel multiply-accumulate (MAC) computing [1] [2]. However, current demonstrations are mainly based on a single macro and offer limited computing parallelism. A fully integrated CIM chip implementing a complete neural network model is still missing. The major challenges are: (1) IR drop and transient errors during MAC operations in non-volatile memory arrays decrease the computing accuracy and further limit the parallelism; (2) the interface blocks between arrays are inefficient because of the power overhead of the A/D and D/A converters (shown in Fig. 33.2.1). To address these challenges, this work proposes: (1) a sign-weighted 2T2R (SW-2T2R) array that reduces IR drop by decreasing the accumulated SL current (ISL) and thereby boosts the computing parallelism; (2) a low-power interface design with a resolution-adjustable LPAR-ADC that enables a flexible tradeoff between system accuracy and power consumption. With these techniques, this work implements a 784-100-10 MLP model on a fully integrated CIM chip with 158.8kb of analog ReRAMs. The chip achieves high recognition accuracy (94.4%) on the MNIST database, high inference speed (77 µs/image), and 78.4 TOPS/W peak energy efficiency. The CMOS circuits are fabricated in a 130nm process.
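To illustrate why letting a differential pair share one source line can lower the accumulated SL current, the following is a minimal behavioral sketch. It is not the authors' circuit. It assumes ideal ohmic devices, binary inputs, and a differential pair driven with opposite-polarity read voltages so that the two currents subtract on the line. The function `sl_currents`, the read voltage `v_read`, and the conductance range are all hypothetical.

```python
import random


def sl_currents(inputs, g_pos, g_neg, v_read=0.2):
    """Compare line currents for one output neuron (idealized model).

    Returns ((i_pos, i_neg), i_shared), where i_pos and i_neg are the
    currents on two separate lines that are subtracted after the array,
    and i_shared is the net current when each pair shares one line.
    """
    # Separate-line readout: each line accumulates its full current.
    i_pos = sum(x * v_read * gp for x, gp in zip(inputs, g_pos))
    i_neg = sum(x * v_read * gn for x, gn in zip(inputs, g_neg))
    # Shared-line readout: opposite-polarity contributions cancel per pair.
    i_shared = sum(
        x * v_read * (gp - gn) for x, gp, gn in zip(inputs, g_pos, g_neg)
    )
    return (i_pos, i_neg), i_shared


rng = random.Random(0)  # fixed seed so the run is repeatable
n = 784                 # fan-in of the first layer in the text
inputs = [rng.randint(0, 1) for _ in range(n)]
# Hypothetical conductances in siemens; real device ranges may differ.
g_pos = [rng.uniform(1e-6, 20e-6) for _ in range(n)]
g_neg = [rng.uniform(1e-6, 20e-6) for _ in range(n)]

(i_pos, i_neg), i_shared = sl_currents(inputs, g_pos, g_neg)
print(f"separate lines: {i_pos:.3e} A, {i_neg:.3e} A")
print(f"shared line:    {i_shared:.3e} A")
```

In this idealized model the shared-line current equals the difference of the two separate-line currents, so its magnitude can never exceed the larger of them. A smaller line current means a smaller voltage drop across the wire resistance, which is the effect the text attributes to the SW-2T2R array.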
