Publication | Closed Access
An efficient implementation of deep convolutional neural networks on a mobile coprocessor
| Citations | References | Year | Venue |
|---|---|---|---|
| 43 | 15 | 2014 | Unknown Venue |
Convolutional Neural Network, Engineering, Machine Learning, General Purpose Processors, Hardware Algorithm, Computer Architecture, Real-time Implementation, Efficient Implementation, Embedded Machine Learning, Mobile Coprocessor, Robot Learning, Machine Vision, Computer Engineering, Computer Science, Deep Learning, Neural Architecture Search, Computer Vision, Deep Neural Networks, Hardware Acceleration, Edge Computing, Constant Memory Accesses
In this paper we present a hardware-accelerated real-time implementation of deep convolutional neural networks (DCNNs). DCNNs are becoming popular because of advances in the processing capabilities of general-purpose processors. However, DCNNs produce hundreds of intermediate results, and the constant memory accesses these require lead to inefficient use of general-purpose processor hardware. By using an efficient routing strategy, we are able not only to maximize utilization of the available hardware resources but also to obtain high performance in real-world applications. Our system, consisting of an ARM Cortex-A9 processor and a coprocessor, achieves a peak performance of 40 G-ops/s while consuming less than 4 W of power. The entire platform has a small form factor, which, combined with its high performance at low power consumption, makes it feasible to use this hardware in applications such as micro-UAVs, surveillance systems and autonomous robots.
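As a back-of-the-envelope sketch (not taken from the paper), the quoted peak figures can be related to the work a single convolutional layer requires. The sketch assumes that one multiply-accumulate counts as two operations (the paper's own op-counting convention is not stated here), and the layer dimensions and helper names are hypothetical, chosen only for illustration.

```python
# Back-of-the-envelope sketch relating the quoted peak throughput and power
# to the cost of one convolutional layer. Layer sizes are hypothetical.

PEAK_OPS_PER_S = 40e9  # peak performance quoted in the abstract (40 G-ops/s)
POWER_W = 4.0          # quoted upper bound on power draw (< 4 W)


def conv_layer_ops(in_maps: int, out_maps: int, out_h: int, out_w: int, k: int) -> int:
    """Ops for one k x k convolutional layer, given the output map size.

    Assumption: one multiply-accumulate (MAC) counts as two ops (a multiply
    and an add); the paper's own counting convention may differ.
    """
    macs = in_maps * out_maps * out_h * out_w * k * k
    return 2 * macs


def min_efficiency_ops_per_joule(peak: float = PEAK_OPS_PER_S, power: float = POWER_W) -> float:
    """Lower bound on efficiency at peak: power is *below* 4 W, so the
    real efficiency is at least peak / 4 W (ops/J == ops/s per watt)."""
    return peak / power


def best_case_latency_s(ops: int, peak: float = PEAK_OPS_PER_S) -> float:
    """Latency if the layer ran at full peak throughput (an optimistic bound)."""
    return ops / peak


if __name__ == "__main__":
    # Hypothetical layer: 16 input maps -> 32 output maps, 7x7 kernels, 100x100 outputs.
    ops = conv_layer_ops(16, 32, 100, 100, 7)
    print(f"ops per layer:     {ops:,}")
    print(f"best-case latency: {best_case_latency_s(ops) * 1e3:.3f} ms")
    print(f"efficiency >= {min_efficiency_ops_per_joule() / 1e9:.0f} G-ops/J")
```

For this hypothetical layer the sketch gives about 0.5 G-ops, roughly 12.5 ms per layer at peak, and an efficiency of at least 10 G-ops/s per watt. Sustained throughput on real networks would be expected to fall below peak, since, as the abstract notes, memory traffic for intermediate results limits hardware utilization.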