A 240 G-ops/s Mobile Coprocessor for Deep Neural Networks

TLDR

Deep neural networks are state-of-the-art models for images, videos, audio, and raw data, yet current systems cannot run them in real time with low power consumption. This work introduces nn-X, a scalable, low-power coprocessor designed to enable real-time execution of deep neural networks. nn-X is built on programmable logic as an array of configurable processing elements, called collections, that perform convolution, subsampling, and nonlinear operations. The system includes four high-speed DMA interfaces to DDR3 memory, each sustaining 950 MB/s in full duplex, and two ARM Cortex-A9 processors. The coprocessor achieves a peak of 227 G-ops/s and up to 200 G-ops/s on deep-learning workloads while consuming less than 4 W, a 10–100× improvement in performance per power over conventional mobile and desktop processors.

Abstract

Deep networks are state-of-the-art models used for understanding the content of images, videos, audio and raw input data. Current computing systems are not able to run deep network models in real time with low power consumption. In this paper we present nn-X: a scalable, low-power coprocessor for enabling real-time execution of deep neural networks. nn-X is implemented on programmable logic devices and comprises an array of configurable processing elements called collections. These collections perform the most common operations in deep networks: convolution, subsampling and non-linear functions. The nn-X system includes four high-speed direct memory access (DMA) interfaces to DDR3 memory and two ARM Cortex-A9 processors. Each DMA port is capable of a sustained throughput of 950 MB/s in full duplex. nn-X achieves a peak performance of 227 G-ops/s and a measured performance in deep learning applications of up to 200 G-ops/s, while consuming less than 4 watts of power. This translates to a performance-per-power improvement of 10 to 100 times over conventional mobile and desktop processors.
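
As a rough check on the figures quoted above, the short Python sketch below works out the implied efficiency and aggregate DMA bandwidth. It uses only the numbers stated in the abstract. The variable and function names are illustrative, and the aggregate bandwidth assumes that all four ports sustain their rated throughput simultaneously, which the abstract does not state.

```python
# Figures quoted in the abstract.
PEAK_GOPS = 227.0       # peak performance, G-ops/s
MEASURED_GOPS = 200.0   # best measured deep-learning performance, G-ops/s
POWER_BOUND_W = 4.0     # "less than 4 watts", so this is an upper bound on power
DMA_PORTS = 4           # number of DMA interfaces to DDR3
PORT_MBPS = 950.0       # sustained MB/s per port, in each direction (full duplex)


def gops_per_watt(gops: float, watts: float) -> float:
    """Performance per power, in G-ops/s per watt."""
    return gops / watts


# Power is bounded above by 4 W, so these values are lower bounds on efficiency.
measured_eff = gops_per_watt(MEASURED_GOPS, POWER_BOUND_W)
peak_eff = gops_per_watt(PEAK_GOPS, POWER_BOUND_W)

# Assumption (not stated in the abstract): all four ports run at full rate at once.
per_direction_mbps = DMA_PORTS * PORT_MBPS
both_directions_mbps = 2 * per_direction_mbps

print(f"measured efficiency: more than {measured_eff:.2f} G-ops/s per watt")
print(f"peak efficiency:     more than {peak_eff:.2f} G-ops/s per watt")
print(f"aggregate DMA:       {per_direction_mbps:.0f} MB/s per direction, "
      f"{both_directions_mbps:.0f} MB/s counting both directions")
```

Under these assumptions the measured workload figure corresponds to more than 50 G-ops/s per watt, and the four ports together could move about 3.8 GB/s in each direction.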
