15.3 A 351TOPS/W and 372.4GOPS Compute-in-Memory SRAM Macro in 7nm FinFET CMOS for Machine-Learning Applications

TLDR

Prior CIM designs trade off area, noise margin, process variation, and weight precision, and the single-bit weight precision of earlier work limits scalability to high-precision neural networks. Compute-in-memory parallelizes multiply-and-average operations and cuts off-chip weight access to lower energy and latency on AI edge devices. Among prior approaches, 6T SRAM gives the smallest cell area but limited parallelism, because cell stability caps the number of activated cells, while 10T and twin-8T cells improve noise margin at the cost of over twice the area of foundry yield-optimized 6T SRAM.

Abstract

Compute-in-memory (CIM) parallelizes multiply-and-average (MAV) computations and reduces off-chip weight access, lowering energy consumption and latency, particularly for AI edge devices. Prior CIM approaches demonstrated tradeoffs among area, noise margin, process variation, and weight precision. 6T SRAM [1]–[3] provides the smallest cell area for CIM, but cell stability limits the number of activated cells, resulting in low parallelization. 10T and twin-8T cells [4]–[5] isolate the read and write paths to improve noise margin; however, both require a special bit-cell design using logic layout rules, resulting in over 2x area overhead compared to foundry yield-optimized 6T SRAMs. Furthermore, the single-bit weight precision of prior work [1]–[4] cannot meet the requirements of high-precision operations and scalability for large neural networks.
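As a rough illustration of the MAV operation named above, the following Python sketch models it purely in software. It assumes that multiply-and-average means the mean of element-wise input-weight products over the activated rows; the function name `multiply_and_average` and the sample values are hypothetical and are not taken from the paper, and the sketch says nothing about how the macro implements the operation in circuitry.

```python
def multiply_and_average(inputs, weights):
    """Software model of MAV (assumed definition): mean of element-wise products.

    Each (input, weight) pair stands for one activated row; this is an
    illustrative assumption, not the paper's circuit behavior.
    """
    if len(inputs) != len(weights):
        raise ValueError("inputs and weights must have the same length")
    if not inputs:
        raise ValueError("at least one input/weight pair is required")
    # Multiply each input by its weight, then average over all pairs.
    return sum(x * w for x, w in zip(inputs, weights)) / len(inputs)


# Illustrative multi-bit values only (not from the paper).
activations = [3, 0, 7, 15]
signed_weights = [1, -2, 4, -8]
mav_result = multiply_and_average(activations, signed_weights)
```

Using multi-bit signed weights in the example reflects the abstract's point that single-bit weights are insufficient for high-precision operations; the specific bit widths here are arbitrary.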
