15.2 A 28nm 64Kb Inference-Training Two-Way Transpose Multibit 6T SRAM Compute-in-Memory Macro for AI Edge Chips

Abstract

Many AI edge devices require local intelligence to achieve fast computing time (t_AC), high energy efficiency (EF), and privacy. The transfer-learning approach is a popular solution for AI edge chips, wherein data used to re-train the AI in the cloud is used to fine-tune (re-train) a few of the neural layers in edge devices. This enables the dynamic incorporation of data from in-situ environments or private information. Computing-in-memory (CIM) is a promising approach to improve EF for AI edge chips. Existing CIM schemes support inference [1]–[5] with forward (FWD) propagation; however, they do not support training, which requires both FWD and backward (BWD) propagation, because the weight-access flow differs between FWD and BWD propagation. As Fig. 15.2.1 shows, efforts to increase the precision of the input (IN), weight (W), and/or output (OUT) tend to degrade t_AC and EF for training operations irrespective of scheme: digital FWD and BWD (DF-DB) or CIM-FWD-digital-BWD (CiMF-DB). This work develops a two-way transpose (TWT) SRAM-CIM macro supporting multibit MAC operations for FWD and BWD propagation with fast t_AC and high EF within a compact area. The proposed scheme features (1) a TWT multiply cell (TWT-MC) with a high resistance to process variation; and (2) a small-offset gain-enhancement sense amplifier (SOGE-SA) to tolerate a small read margin. A 28nm 64Kb TWT SRAM-CIM macro was fabricated using a foundry-provided compact 6T-SRAM cell; it is the first SRAM-CIM device to support both inference and training operations. This macro also demonstrates the fastest t_AC (3.8–21ns) and highest EF (7–61.1TOPS/W) for MAC operations using 2–8b inputs, 4–8b weights, and 12–20b outputs.
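The difference in weight-access flow between FWD and BWD propagation can be illustrated with a small software model. This is only a sketch under stated assumptions, not the macro's circuit behavior: the function names `mac_forward` and `mac_backward`, the 2x3 array, and all values are hypothetical. It shows that the same stored weights are accumulated along one axis for FWD and along the other (transposed) axis for BWD.

```python
# Illustrative software model only; names, shapes, and values are hypothetical
# and are not taken from the macro described in the abstract.

def mac_forward(weights, inputs):
    """FWD: out[j] = sum_i inputs[i] * weights[i][j] (accumulate down each column)."""
    rows, cols = len(weights), len(weights[0])
    assert len(inputs) == rows
    return [sum(inputs[i] * weights[i][j] for i in range(rows))
            for j in range(cols)]

def mac_backward(weights, errors):
    """BWD: out[i] = sum_j errors[j] * weights[i][j] (accumulate along each row).

    This is a multiply by the transpose of the FWD weight matrix, computed
    without rewriting or copying the stored array.
    """
    rows, cols = len(weights), len(weights[0])
    assert len(errors) == cols
    return [sum(errors[j] * weights[i][j] for j in range(cols))
            for i in range(rows)]

# One stored weight array, accessed in two directions.
W = [[1, -2, 3],
     [4, 5, -6]]
x = [2, 1]       # FWD input vector (one entry per row)
e = [1, 0, -1]   # BWD error vector (one entry per column)

fwd = mac_forward(W, x)
bwd = mac_backward(W, e)
print(fwd, bwd)
```

In a one-way array only one of these two access directions maps onto in-memory accumulation, so the other would have to be handled digitally or with a second, transposed copy of the weights; the sketch makes that asymmetry visible without modeling bit precision, timing, or energy.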
