Abstract

SRAM-based computing-in-memory (SRAM-CIM) is an attractive approach to improving the energy efficiency (EF) of edge-AI devices performing multiply-and-accumulate (MAC) operations. SRAM-CIM with a large memory capacity enhances EF by reducing data movement between system memory and compute functions. High-precision inputs (IN), weights (W), and outputs (OUT) are essential to deliver sufficient inference accuracy using SRAM-CIM. These devices must also provide a short compute latency ($\mathrm{t}_{\text{AC}}$) and a high MAC throughput (TP) to achieve a fast system-level response time for an inference task. However, previous SRAM-CIMs using voltage-mode in-memory computing (VM-IMC) [1], [3]–[6] or time-domain near-memory computing (TD-NMC) [2] are unable to simultaneously achieve high EF, high readout accuracy, and a short $\mathrm{t}_{\text{AC}}$ for high-precision MAC operations. Increasing IN-W-OUT precision and/or the number of accumulations (ACCU) leads to (1) an exponential decrease in the signal margin, which degrades readout accuracy for VM-IMC schemes, and (2) an increased maximum MAC value (MACV) and memory capacity (increased parasitic load), which increase $\mathrm{t}_{\text{AC}}$ and the energy consumption of VM-IMC and TD-NMC schemes, as Fig. 11.8.1 shows. This work presents a time-domain in-memory-computing SRAM-CIM structure: (1) it uses a time-domain incremental-accumulation (TDIA) scheme to enable MAC operations with a high ACCU and a consistently large signal margin across MACVs; (2) it also uses a dynamic differential-reference time-to-digital converter (D2REF-TDC) that is based on a software-hardware co-design, which reduces read energy consumption. A 28nm 1Mb SRAM-CIM macro fabricated using foundry-provided compact 6T-SRAM cells achieves MAC operations with 64 accumulations of an 8b input and an 8b weight and a near-full-precision output (22b). The proposed macro also achieves the shortest reported $\mathrm{t}_{\text{AC}}$ and a $\mathrm{t}_{\text{AC}}$ per output precision ($\mathrm{t}_{\text{ACpOUT}}$) of $0.3\,\text{ns}/\text{b}$, with a TP of 1241.2GOPS and a high EF of 37.01TOPS/W for 8b-MAC operations.
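The 22b output width follows from the stated operand widths and accumulation count. The sketch below checks that arithmetic. The function names are illustrative and are not taken from the paper. It also assumes that the per-output-precision latency is simply the compute latency divided by the number of output bits, which the abstract implies but does not state explicitly.

```python
def full_precision_bits(in_bits: int, w_bits: int, accumulations: int,
                        signed: bool = False) -> int:
    """Smallest output width that can hold every possible MAC value."""
    if signed:
        # Two's complement: the largest-magnitude product is (-2^(n-1)) * (-2^(m-1)).
        max_mac = accumulations * (1 << (in_bits - 1)) * (1 << (w_bits - 1))
        return max_mac.bit_length() + 1  # one extra bit for the sign
    # Unsigned: every operand at its maximum value.
    max_mac = accumulations * ((1 << in_bits) - 1) * ((1 << w_bits) - 1)
    return max_mac.bit_length()


def implied_latency_ns(ns_per_output_bit: float, out_bits: int) -> float:
    """Latency implied by a per-bit figure (assumed: t_AC = t_ACpOUT * OUT bits)."""
    return ns_per_output_bit * out_bits


if __name__ == "__main__":
    bits = full_precision_bits(in_bits=8, w_bits=8, accumulations=64)
    print(f"full-precision output width: {bits} b")
    print(f"implied latency: {implied_latency_ns(0.3, bits):.1f} ns")
```

Under these assumptions, both the unsigned and the signed 8b cases need 22 output bits for 64 accumulations, which matches the output precision quoted in the abstract.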
