Publication | Closed Access
Eyeriss
Year: 2016 · Citations: 1.2K · References: 41
Keywords: Energy Consumption, Spatial Parallelism, Convolutional Neural Network, Engineering, Machine Learning, Data Science, Hardware Acceleration, Sparse Neural Network, Computer Architecture, Computer Engineering, Parallel Programming, Computer Science, RS Dataflow, Parallel Computing, Deep Learning, Neural Architecture Search, Model Compression, Computer Vision
Deep CNNs deliver high accuracy but incur high computational complexity and energy consumption due to extensive data movement across many filters and channels. The study aims to design a dataflow that enables parallel processing while minimizing data movement, achieving energy-efficient CNN execution without sacrificing accuracy. The authors introduce a row-stationary dataflow that maximizes local reuse of filter weights and activations, reduces partial-sum movement, and adapts to various CNN shapes by leveraging local PE storage, inter-PE communication, and spatial parallelism; it is evaluated with an energy-analysis framework under fixed area and parallelism constraints. Experiments on AlexNet show that the row-stationary dataflow achieves 1.4×–2.5× lower energy in convolutional layers and at least 1.3× lower energy in fully connected layers for batch sizes larger than 16, and its effectiveness is confirmed on a fabricated chip.
Deep convolutional neural networks (CNNs) are widely used in modern AI systems for their superior accuracy but at the cost of high computational complexity. The complexity comes from the need to simultaneously process hundreds of filters and channels in the high-dimensional convolutions, which involve a significant amount of data movement. Although highly-parallel compute paradigms, such as SIMD/SIMT, effectively address the computation requirement to achieve high throughput, energy consumption still remains high as data movement can be more expensive than computation. Accordingly, finding a dataflow that supports parallel processing with minimal data movement cost is crucial to achieving energy-efficient CNN processing without compromising accuracy. In this paper, we present a novel dataflow, called row-stationary (RS), that minimizes data movement energy consumption on a spatial architecture. This is realized by exploiting local data reuse of filter weights and feature map pixels, i.e., activations, in the high-dimensional convolutions, and minimizing data movement of partial sum accumulations. Unlike dataflows used in existing designs, which only reduce certain types of data movement, the proposed RS dataflow can adapt to different CNN shape configurations and reduces all types of data movement through maximally utilizing the processing engine (PE) local storage, direct inter-PE communication and spatial parallelism. To evaluate the energy efficiency of the different dataflows, we propose an analysis framework that compares energy cost under the same hardware area and processing parallelism constraints. Experiments using the CNN configurations of AlexNet show that the proposed RS dataflow is more energy efficient than existing dataflows in both convolutional (1.4× to 2.5×) and fully-connected layers (at least 1.3× for batch size larger than 16). The RS dataflow has also been demonstrated on a fabricated chip, which verifies our energy analysis.
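To make the row-stationary idea concrete, below is a toy Python sketch (not the Eyeriss hardware or its actual implementation; all names are illustrative). It models the logical PE array described in the abstract: each PE keeps one filter row stationary and runs a 1D convolution over one ifmap row, ifmap rows are reused diagonally across PEs, and partial sums are accumulated along each PE column to form one output row.

```python
import numpy as np

def conv1d_row(filter_row, ifmap_row):
    """One PE's job: 1D convolution of its stationary filter row
    with a streamed-in ifmap row, producing one row of partial sums."""
    S = len(filter_row)
    W = len(ifmap_row)
    psum = np.zeros(W - S + 1)
    for x in range(W - S + 1):
        psum[x] = np.dot(filter_row, ifmap_row[x:x + S])
    return psum

def row_stationary_conv2d(filt, ifmap):
    """Toy model of the RS dataflow on a logical PE grid of shape (R, E),
    where R is the filter height and E the number of output rows.
    PE (r, e) holds filter row r (reused across all E columns) and reads
    ifmap row e + r (reused diagonally); partial sums accumulate up each
    PE column into output row e, so psums never leave the column."""
    R, S = filt.shape
    H, W = ifmap.shape
    E, F = H - R + 1, W - S + 1
    out = np.zeros((E, F))
    for e in range(E):          # one PE column per output row
        for r in range(R):      # PEs within the column
            out[e] += conv1d_row(filt[r], ifmap[e + r])
    return out
```

The spatial loops here run sequentially, but the reuse pattern matches the paper's description: filter rows are reused horizontally, ifmap rows diagonally, and partial-sum accumulation stays local to a PE column.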