An efficient implementation of deep convolutional neural networks on a mobile coprocessor

Abstract

In this paper we present a hardware accelerated real-time implementation of deep convolutional neural networks (DCNNs). DCNNs are becoming popular because of advances in the processing capabilities of general purpose processors. However, DCNNs produce hundreds of intermediate results whose constant memory accesses result in inefficient use of general purpose processor hardware. By using an efficient routing strategy, we are able to maximize utilization of available hardware resources but also obtain high performance in real world applications. Our system, consisting of an ARM Cortex-A9 processor and a coprocessor, is capable of a peak performance of 40 G-ops/s while consuming less than 4W of power. The entire platform is in a small form factor which, combined with its high performance at low power consumption makes it feasible to use this hardware in applications like micro-UAVs, surveillance systems and autonomous robots.

References

Page 1

	Year	Citations

Page 1