Publication | Closed Access
Eyeriss: An Energy-Efficient Reconfigurable Accelerator for Deep Convolutional Neural Networks
3K Citations · 34 References · 2016
Machine Vision · Row Stationary · Engineering · Hardware Acceleration · Accelerator Chip · High Throughput · Hardware Algorithm · Computer Engineering · Computer Architecture · Domain-specific Accelerator · Computer Science · Reconfigurable Architecture · Parallel Computing · Deep Learning · Neural Architecture Search · Energy-efficient Reconfigurable Accelerator · Model Compression · Computer Vision
Deep convolutional neural networks are widely used, but their large data movement demands high energy and limits throughput, motivating efficient accelerators such as Eyeriss. The goal is to minimize data-movement energy for any CNN shape in order to achieve high throughput and energy efficiency. Eyeriss implements a row-stationary dataflow on a 168-PE spatial array, reconfiguring the computation mapping to maximize local data reuse, and adds compression and data gating to further reduce energy. Eyeriss achieves 35 fps on AlexNet with 0.0029 DRAM accesses per MAC at 278 mW, and 0.7 fps on VGG-16 with 0.0035 DRAM accesses per MAC at 236 mW.
Eyeriss is an accelerator for state-of-the-art deep convolutional neural networks (CNNs). It optimizes for the energy efficiency of the entire system, including the accelerator chip and off-chip DRAM, for various CNN shapes by reconfiguring the architecture. CNNs are widely used in modern AI systems but also pose throughput and energy-efficiency challenges to the underlying hardware. This is because their computation requires a large amount of data, creating significant on-chip and off-chip data movement that consumes more energy than the computation itself. Minimizing the energy cost of data movement for any CNN shape is therefore the key to high throughput and energy efficiency. Eyeriss achieves these goals by using a proposed processing dataflow, called row stationary (RS), on a spatial architecture with 168 processing elements. The RS dataflow reconfigures the computation mapping of a given shape, optimizing energy efficiency by maximally reusing data locally to reduce expensive data movement, such as DRAM accesses. Compression and data gating are also applied to further improve energy efficiency. Eyeriss processes the convolutional layers at 35 frames/s and 0.0029 DRAM accesses per multiply-and-accumulate (MAC) for AlexNet at 278 mW (batch size N = 4), and at 0.7 frames/s and 0.0035 DRAM accesses per MAC for VGG-16 at 236 mW (N = 3).
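To illustrate the row-stationary idea described in the abstract, the following is a minimal NumPy sketch, not the chip's actual implementation: each processing element (PE) keeps one filter row resident ("stationary") and slides an input row past it, and a column of PEs accumulates these 1-D partial sums into one output row. Function names and the scalar, single-channel formulation are illustrative assumptions; the real array also exploits filter/ifmap reuse across PEs, SIMD MACs, and local scratchpads.

```python
import numpy as np

def pe_row_conv(filter_row, input_row):
    """One PE: 1-D convolution (CNN convention, i.e. correlation without
    filter flip) of a single filter row against a single input row.
    The filter row stays resident in the PE while the input slides."""
    r = len(filter_row)
    out_w = len(input_row) - r + 1
    return np.array([np.dot(filter_row, input_row[x:x + r])
                     for x in range(out_w)])

def row_stationary_conv2d(ifmap, filt):
    """2-D convolution composed from row primitives: PE (i, j) holds
    filter row i and processes input row i + j; the R partial-sum rows
    in each PE column are accumulated into output row j."""
    R = filt.shape[0]
    out_h = ifmap.shape[0] - R + 1
    rows = []
    for j in range(out_h):  # one PE column per output row
        psum = sum(pe_row_conv(filt[i], ifmap[i + j]) for i in range(R))
        rows.append(psum)
    return np.vstack(rows)
```

The point of the mapping is that each filter row is fetched once and reused across every sliding-window position in its PE, trading cheap local accumulation for expensive DRAM traffic.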
| Year | Citations |
|---|---|
| 2016 | 214.9K |
| 2017 | 75.5K |
| 2014 | 75.4K |
| 1998 | 56.5K |
| 2015 | 46.2K |
| 2015 | 39.5K |
| 2014 | 31.2K |
| 2016 | 15.5K |
| 2010 | 13.2K |
| 2014 | 11.1K |