Publication | Closed Access
A 240 G-ops/s Mobile Coprocessor for Deep Neural Networks
Citations: 305
References: 15
Year: 2014
Unknown Venue
Deep Neural Networks, Engineering, Hardware Acceleration, Edge Computing, High-performance Architecture, Computer Engineering, Computer Architecture, Embedded Machine Learning, Parallel Programming, Computer Science, Deep Networks, Deep Learning Applications, Parallel Computing, Deep Learning, Neural Architecture Search, In-memory Computing
Deep neural networks are state-of-the-art models for images, videos, audio, and raw data, yet current systems cannot run them in real time with low power consumption. This work introduces nn-X, a scalable, low-power coprocessor designed to enable real-time execution of deep neural networks. nn-X is built on programmable logic with an array of configurable processing elements called collections that perform convolution, subsampling, and nonlinear operations. The system provides four high-speed DMA interfaces to DDR3 memory, each port sustaining 950 MB/s in full duplex, alongside two ARM Cortex-A9 processors. The coprocessor achieves a peak of 227 G-ops/s and up to 200 G-ops/s measured in deep-learning workloads while consuming less than 4 W, yielding a 10–100× improvement in performance-per-power over conventional mobile and desktop processors.
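The three operations attributed to a collection can be illustrated in software. The following is a minimal sketch, not the nn-X hardware design: the function names, the choice of ReLU as the non-linear function, and the choice of non-overlapping max pooling as the subsampling step are all assumptions made for illustration, since the text above names only the operation classes.

```python
# Illustrative software model of one "collection": convolution,
# a non-linear function, then subsampling. All names are hypothetical.

def conv2d_valid(image, kernel):
    """2D 'valid' convolution as used in deep networks
    (cross-correlation: the kernel is not flipped)."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    return [
        [
            sum(image[r + i][c + j] * kernel[i][j]
                for i in range(kh) for j in range(kw))
            for c in range(out_w)
        ]
        for r in range(out_h)
    ]

def relu(fmap):
    """Element-wise non-linearity (assumed: ReLU)."""
    return [[max(0, v) for v in row] for row in fmap]

def max_pool(fmap, size):
    """Subsampling (assumed: non-overlapping max pooling).
    Rows and columns that do not fill a whole window are dropped."""
    out_h = len(fmap) // size
    out_w = len(fmap[0]) // size
    return [
        [
            max(fmap[r * size + i][c * size + j]
                for i in range(size) for j in range(size))
            for c in range(out_w)
        ]
        for r in range(out_h)
    ]

def collection_forward(image, kernel, pool_size):
    """Chain the three stages in the order convolution -> non-linearity
    -> subsampling (the ordering is also an assumption)."""
    return max_pool(relu(conv2d_valid(image, kernel)), pool_size)

# A 5x5 ramp image and a 2x2 diagonal-difference kernel.
image = [[r * 5 + c for c in range(5)] for r in range(5)]
kernel = [[-1, 0], [0, 1]]
features = collection_forward(image, kernel, pool_size=2)
```

A 5x5 input with a 2x2 kernel gives a 4x4 feature map, which 2x2 pooling reduces to 2x2. In hardware these stages would be pipelined and fed by the DMA ports rather than computed one after another as here.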
Deep networks are state-of-the-art models used for understanding the content of images, videos, audio and raw input data. Current computing systems are not able to run deep network models in real time with low power consumption. In this paper we present nn-X: a scalable, low-power coprocessor for enabling real-time execution of deep neural networks. nn-X is implemented on programmable logic devices and comprises an array of configurable processing elements called collections. These collections perform the most common operations in deep networks: convolution, subsampling and non-linear functions. The nn-X system includes 4 high-speed direct memory access interfaces to DDR3 memory and two ARM Cortex-A9 processors. Each port is capable of a sustained throughput of 950 MB/s in full duplex. nn-X achieves a peak performance of 227 G-ops/s and a measured performance in deep learning applications of up to 200 G-ops/s, while consuming less than 4 watts of power. This translates to a performance per power improvement of 10 to 100 times that of conventional mobile and desktop processors.
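The headline figures imply some simple derived quantities. The sketch below works them out under two stated assumptions: that "port" refers to one of the 4 DMA interfaces, and that 4 W is treated as an upper bound on power, so the efficiency values are lower bounds. The constant and function names are illustrative only.

```python
# Figures quoted in the abstract.
PEAK_GOPS = 227.0       # peak performance, G-ops/s
MEASURED_GOPS = 200.0   # best measured performance, G-ops/s
POWER_BOUND_W = 4.0     # "less than 4 watts": treated as an upper bound
DMA_PORTS = 4           # assumed: each DMA interface is one "port"
PORT_MB_PER_S = 950.0   # sustained throughput per port, per direction

def gops_per_watt(gops, watts):
    """Performance per power; a lower bound when watts is an upper bound."""
    return gops / watts

def aggregate_bandwidth(ports, mb_per_s_per_port):
    """Total sustained throughput in one direction across all ports."""
    return ports * mb_per_s_per_port

measured_efficiency = gops_per_watt(MEASURED_GOPS, POWER_BOUND_W)  # >= 50 G-ops/s/W
peak_efficiency = gops_per_watt(PEAK_GOPS, POWER_BOUND_W)          # >= 56.75 G-ops/s/W
one_way_mb_per_s = aggregate_bandwidth(DMA_PORTS, PORT_MB_PER_S)   # 3800 MB/s
full_duplex_mb_per_s = 2 * one_way_mb_per_s                        # 7600 MB/s both ways
```

The 10 to 100 times comparison cannot be reproduced from these numbers alone, because the abstract does not give the performance or power of the baseline processors.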