Publication | Closed Access
15.1 A Programmable Neural-Network Inference Accelerator Based on Scalable In-Memory Computing
Citations: 156
References: 9
Year: 2021
Unknown Venue
Keywords: Scalable In-memory Computing, Engineering, Machine Learning, Inference Accelerator, Hardware Acceleration, Advanced Computing, High-performance Architecture, Many-core Architecture, Computer Engineering, Computer Architecture, IMC Efficiency, Domain-specific Accelerator, Parallel Programming, Computer Science, Parallel Computing, Deep Learning, Digital Architectures, In-memory Computing
This paper presents a scalable neural-network (NN) inference accelerator in 16 nm, based on an array of programmable cores employing mixed-signal In-Memory Computing (IMC), digital Near-Memory Computing (NMC), and localized buffering/control. IMC achieves high energy efficiency and throughput for matrix-vector multiplications (MVMs), which dominate NNs, but scalability poses numerous challenges, both technologically, in moving to advanced nodes to maintain gains over digital architectures, and architecturally, in supporting full execution of diverse NNs. Recent demonstrations have explored integrating IMC in programmable processors [1, 2], but have not achieved IMC efficiency and throughput for full executions. The central challenge is the drastically different physical design points and associated tradeoffs that IMC incurs compared to digital engines. Namely, IMC substantially increases compute energy efficiency and HW density/parallelism, but retains the overheads of HW virtualization (state and data swapping/buffering/communication across spatial/temporal computation mappings). The demonstrated architecture is co-designed with SW-mapping algorithms (encapsulated in a custom graph compiler) to provide efficiency across a broad range of mapping strategies and thereby overcome these overheads.
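The spatial/temporal mapping tradeoff mentioned in the abstract can be illustrated with a small sketch. It is not the paper's compiler or hardware model: the function names (`mvm`, `tiled_mvm`), the tile dimensions, and the core count are all hypothetical, and the per-core computation is modeled as exact integer arithmetic rather than mixed-signal IMC. It shows only how one MVM can be split into weight tiles, how partial sums are accumulated, and how having fewer cores than tiles forces extra weight-loading passes, which is one form of the virtualization overhead the abstract refers to.

```python
from itertools import product


def mvm(matrix, vector):
    """Reference matrix-vector multiply: y[i] = sum_j matrix[i][j] * vector[j]."""
    return [sum(w * x for w, x in zip(row, vector)) for row in matrix]


def tiled_mvm(matrix, vector, tile_rows, tile_cols, num_cores):
    """Map an MVM onto `num_cores` hypothetical cores (an assumption for
    illustration), each holding one tile_rows x tile_cols weight tile at a time.

    Returns (result, passes). `passes` counts the rounds of weight loading;
    1 means every tile fits on its own core (a purely spatial mapping).
    """
    n_out, n_in = len(matrix), len(vector)
    result = [0] * n_out
    # Enumerate the top-left corners of the tiles covering the weight matrix.
    tiles = list(product(range(0, n_out, tile_rows), range(0, n_in, tile_cols)))
    passes = 0
    # Each pass assigns up to num_cores tiles to cores (temporal mapping).
    for start in range(0, len(tiles), num_cores):
        passes += 1
        for r0, c0 in tiles[start:start + num_cores]:
            sub = [row[c0:c0 + tile_cols] for row in matrix[r0:r0 + tile_rows]]
            partial = mvm(sub, vector[c0:c0 + tile_cols])  # per-core compute
            # Accumulate partial sums from tiles that share output rows.
            for i, p in enumerate(partial):
                result[r0 + i] += p
    return result, passes


if __name__ == "__main__":
    W = [[1, 2, 3, 4, 5, 6],
         [0, 1, 0, 1, 0, 1],
         [2, 2, 2, 2, 2, 2],
         [-1, 0, 1, 0, -1, 0]]
    x = [1, 2, 3, 4, 5, 6]
    print(tiled_mvm(W, x, tile_rows=2, tile_cols=3, num_cores=4))
    print(tiled_mvm(W, x, tile_rows=2, tile_cols=3, num_cores=2))
```

Both calls return the same output vector as the untiled `mvm`. Only the number of passes differs, which is the quantity a mapping algorithm would weigh against buffering and communication costs.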