Design of Computing-in-Memory (CIM) with Vertical Split-Gate Flash Memory for Deep Neural Network (DNN) Inference Accelerator

Abstract

Computing-in-Memory (CIM) using Flash memory is a potential solution to support a heavyweight DNN inference accelerator for edge computing applications. Flash memory provides the best high-density and low-cost non-volatile memory solution for storing the weights, while the CIM functions of Flash memory carry out the neural network computations inside the memory chip. Our analysis indicates that Flash CIM can reduce data movement by ~85% compared with the conventional von Neumann architecture. In this work, we propose a detailed device and design co-optimization to realize Flash CIM, using a novel vertical split-gate Flash device. Our device supports low-voltage (< 1V) read at the WLs and BLs, tight and tunable cell current (Icell) ranging from 150nA to 1.5uA, an extremely large Icell ON/OFF ratio of ~7 orders of magnitude, low random telegraph noise (RTN), and negligible read disturb, providing a high-performance and highly reliable CIM solution.
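
As a rough illustration of how a Flash CIM array can evaluate a multiply-accumulate on a bit line, the following Python sketch sums the currents of the cells whose word lines are activated. Only the 150nA to 1.5uA Icell range is taken from the abstract; the four-level linear weight mapping, the binary inputs, and the function names are assumptions made for illustration and are not the scheme described in this work.

```python
# Illustrative sketch only. The weight mapping and input encoding below are
# assumptions, not the design proposed in the paper.

I_MIN_NA = 150.0    # lowest programmed cell current from the abstract (nA)
I_MAX_NA = 1500.0   # highest programmed cell current from the abstract (nA)
LEVELS = 4          # hypothetical number of weight levels per cell


def weight_to_icell_na(level: int) -> float:
    """Map a weight level (0..LEVELS-1) linearly onto the Icell range."""
    if not 0 <= level < LEVELS:
        raise ValueError("weight level out of range")
    step = (I_MAX_NA - I_MIN_NA) / (LEVELS - 1)
    return I_MIN_NA + level * step


def bitline_current_na(inputs, levels):
    """Sum the currents of cells whose word line is activated (input == 1).

    Unselected cells are treated as contributing zero current, an
    idealization motivated by the very large ON/OFF ratio.
    """
    if len(inputs) != len(levels):
        raise ValueError("inputs and levels must have the same length")
    return sum(weight_to_icell_na(w) for x, w in zip(inputs, levels) if x)


if __name__ == "__main__":
    # Three of the four word lines are active.
    print(bitline_current_na([1, 0, 1, 1], [0, 3, 2, 1]))
```

In this idealized model the bit-line current is the analog sum that a sense circuit would then digitize; a tight Icell distribution and low RTN matter because any spread in the individual cell currents adds directly to the error of that sum.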