A 512Gb In-Memory-Computing 3D-NAND Flash Supporting Similar-Vector-Matching Operations on Edge-AI Devices

Abstract

Similar-vector-matching (SVM) applications operate on unstructured vectors generated by machine-learning methods. Examples such as face search and audio texturing against a dataset for access-control systems frequently run on edge devices, as depicted in Fig. 7.5.1. The SVM operation [1]–[3] typically comprises four steps: (1) in the offline phase, raw vectors ($\vec{V}_{RAW}$) are extracted by machine-learning approaches and stored in non-volatile NAND Flash; (2) in the online phase, a processor requests $\vec{V}_{RAW}$ data from edge storage; (3) the entire $\vec{V}_{RAW}$ dataset is moved from storage to the processor; and (4) the processor scores the similarity between an input query and each candidate $\vec{V}_{RAW}$ and provides the best match. However, moving this much data across the memory hierarchy consumes a large amount of energy ($E_{\mathrm{MEM}}$) and results in a long search latency ($t_{\mathrm{SR}}$) for SVM operations. Moreover, the entire $\vec{V}_{RAW}$ dataset includes a large amount of invalid data.
Reducing data movement lowers $E_{\mathrm{MEM}}$ and $t_{\mathrm{SR}}$; edge storage with nonvolatile computing-in-memory (nvCIM) support for similarity computation (vector-vector multiplication (VVM) for cosine similarity) is therefore required to reduce the $\vec{V}_{RAW}$ dataset to a small set of candidates. However, leveraging 3D NAND for VVM operations poses two challenges: (1) low readout accuracy when a large amount of current is summed across cells spanning a wide range of $V_t$ levels (e.g., the 1st to 4th $V_t$ levels of a TLC cell), and (2) the large readout power required to achieve a constant settling time across the wide range of summation currents produced by the possible data patterns.
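
The contrast between the conventional flow and the candidate-reduction flow can be illustrated in software. The following is a minimal sketch, not the chip's implementation: it assumes that the in-storage stage can be modeled as a cosine score on coarsely quantized vectors, and the names `quantize`, `full_scan_match`, `filtered_match`, and the parameters `levels` and `k` are hypothetical and introduced only for illustration.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity: dot(a, b) / (|a| * |b|); 0.0 if either vector is zero."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def full_scan_match(query, dataset):
    """Conventional flow: every stored vector is moved to the processor and scored."""
    scores = [cosine_similarity(query, v) for v in dataset]
    best = max(range(len(dataset)), key=scores.__getitem__)
    return best, len(dataset)  # (index of best match, vectors moved)


def quantize(v, levels=4):
    """Hypothetical coarse quantizer standing in for low-precision in-storage data."""
    peak = max(abs(x) for x in v) or 1.0
    return [round(x / peak * (levels - 1)) for x in v]


def filtered_match(query, dataset, k=2):
    """Two-stage flow: a coarse in-storage score keeps k candidates,
    and only those candidates are moved and re-scored exactly."""
    q_coarse = quantize(query)
    coarse = [cosine_similarity(q_coarse, quantize(v)) for v in dataset]
    order = sorted(range(len(dataset)), key=coarse.__getitem__, reverse=True)
    candidates = order[:k]
    best = max(candidates, key=lambda i: cosine_similarity(query, dataset[i]))
    return best, len(candidates)  # (index of best match, vectors moved)
```

Both functions return the index of the best match together with the number of vectors that had to cross the storage interface. In this model the second value drops from the dataset size to `k`, which is the data-movement reduction the text argues for. The coarse stage can discard the true best match if the quantization is too aggressive, so `k` and `levels` trade accuracy against data movement.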
