Concepedia

Publication | Closed Access

Eyeriss: An Energy-Efficient Reconfigurable Accelerator for Deep Convolutional Neural Networks

Citations: 3K | References: 34 | Year: 2016

TLDR

Deep convolutional neural networks are widely used but their large data movement demands high energy and limits throughput, motivating efficient accelerators such as Eyeriss. The goal is to minimize data‑movement energy for any CNN shape to achieve high throughput and energy efficiency. Eyeriss implements a row‑stationary dataflow on a 168‑PE spatial array, reconfiguring the mapping to maximize local data reuse, and adds compression and data gating to further reduce energy. Eyeriss achieves 35 fps on AlexNet with 0.0029 DRAM accesses per MAC at 278 mW, and 0.7 fps on VGG‑16 with 0.0035 accesses/MAC at 236 mW.
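The reported figures can be sanity-checked with a back-of-envelope calculation. A minimal sketch, assuming roughly 666 million MACs per image for AlexNet's convolutional layers (a commonly cited estimate, not stated in this summary), combined with the reported 0.0029 DRAM accesses/MAC and 35 fps:

```python
# Back-of-envelope estimate of Eyeriss's DRAM traffic on AlexNet.
# ASSUMPTION: ~666M MACs/image for AlexNet conv layers (external estimate).
MACS_PER_IMAGE = 666e6
DRAM_ACCESS_PER_MAC = 0.0029   # reported for AlexNet
FPS = 35                       # reported throughput

accesses_per_frame = MACS_PER_IMAGE * DRAM_ACCESS_PER_MAC
accesses_per_sec = accesses_per_frame * FPS

print(f"{accesses_per_frame / 1e6:.2f}M DRAM accesses per frame")
print(f"{accesses_per_sec / 1e6:.1f}M DRAM accesses per second")
```

Under this assumption, the dataflow holds DRAM traffic to around two million accesses per frame, which is the headline benefit of maximizing on-chip reuse.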

Abstract

Eyeriss is an accelerator for state-of-the-art deep convolutional neural networks (CNNs). It optimizes for the energy efficiency of the entire system, including the accelerator chip and off-chip DRAM, for various CNN shapes by reconfiguring the architecture. CNNs are widely used in modern AI systems but also pose throughput and energy-efficiency challenges to the underlying hardware. This is because their computation requires a large amount of data, creating significant data movement between on-chip and off-chip memory that is more energy-consuming than the computation itself. Minimizing the energy cost of data movement for any CNN shape is therefore the key to high throughput and energy efficiency. Eyeriss achieves these goals by using a proposed processing dataflow, called row stationary (RS), on a spatial architecture with 168 processing elements. The RS dataflow reconfigures the computation mapping of a given shape, optimizing energy efficiency by maximally reusing data locally to reduce expensive data movement such as DRAM accesses. Compression and data gating are also applied to further improve energy efficiency. Eyeriss processes the convolutional layers at 35 frames/s and 0.0029 DRAM accesses/multiply-and-accumulate (MAC) for AlexNet at 278 mW (batch size N = 4), and 0.7 frames/s and 0.0035 DRAM accesses/MAC for VGG-16 at 236 mW (N = 3).
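The row-stationary primitive described above decomposes a 2-D convolution into 1-D row convolutions: each PE convolves one filter row with one input row, and the resulting partial-sum rows are accumulated across PEs to form an output row. A minimal NumPy sketch of this decomposition (a simplified single-channel model for illustration, not the chip's actual mapping; DNN-style correlation without filter flipping is assumed):

```python
import numpy as np

def conv2d_row_stationary(ifmap, filt):
    """Toy 2-D convolution via the row-stationary decomposition:
    each (y, r) iteration models one PE doing a 1-D convolution of
    filter row r with input row y + r; the partial-sum rows are then
    accumulated vertically into output row y."""
    H, W = ifmap.shape
    R, S = filt.shape
    out = np.zeros((H - R + 1, W - S + 1))
    for y in range(out.shape[0]):      # one output row at a time
        for r in range(R):             # one "PE" per filter row
            row = ifmap[y + r]
            # 1-D sliding-window convolution of one input row with one filter row
            psum = np.array([row[x:x + S] @ filt[r]
                             for x in range(W - S + 1)])
            out[y] += psum             # accumulate partial sums across PEs
    return out
```

Because each PE keeps its filter row stationary while input rows stream past, filter weights and partial sums are reused locally, which is the reuse pattern the dataflow exploits to cut DRAM traffic.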

