Publication | Closed Access
Computation offloading for fast CNN inference in edge computing
Citations: 17
References: 16
Year: 2019
Venue: Unknown
Keywords: Convolutional Neural Network, Engineering, Machine Learning, Data Science, Edge Device, Edge Computing, Mobile Devices, Cloud Computing, Computing Systems, Computer Engineering, Multi-access Edge Computing, CNN Inference, Computer Science, Mobile Computing, Deep Learning, Edge Architecture
The Convolutional Neural Network (CNN) is an important computation model for many popular mobile artificial intelligence applications. However, CNN inference, i.e., processing input data with well-trained CNN models, is computation-intensive and incurs heavy overhead on mobile devices with limited hardware resources. In this paper, we propose to offload a portion of the CNN inference computation of mobile devices to an edge computing site. We find that batching tasks on the edge GPU can significantly reduce the average inference time. Based on this observation, we design an algorithm that jointly considers the tasks on all mobile devices and the corresponding batching benefit at the edge site, unlike existing work on collaborative inference, which lets each mobile device make offloading decisions independently. Furthermore, we propose an online algorithm to handle the scenario in which CNN inference tasks arrive at different times; it significantly reduces the average inference time without knowledge of future task arrivals. Finally, extensive simulations are conducted to evaluate the performance of the proposed algorithms, and the results show that they outperform existing work under different settings.
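The batching benefit described in the abstract can be illustrated with a minimal sketch. Assuming (hypothetically, as the abstract gives no concrete cost model) that processing a batch of b inference tasks on the edge GPU takes an affine time t(b) = t_fixed + t_per * b, the fixed setup cost is amortized across the batch, so the average time per task falls as the batch grows:

```python
def batch_time(b, t_fixed=8.0, t_per=1.0):
    """Total GPU time (ms, illustrative constants) to process a batch
    of b CNN inference tasks, under an assumed affine cost model:
    a fixed launch/setup cost plus a per-task cost."""
    return t_fixed + t_per * b

def avg_time_per_task(b, t_fixed=8.0, t_per=1.0):
    """Average inference time per task when b tasks are batched together.
    The fixed cost t_fixed is shared by all b tasks."""
    return batch_time(b, t_fixed, t_per) / b

# Average per-task time falls monotonically as the batch grows:
# b=1 -> 9.0 ms, b=2 -> 5.0 ms, b=4 -> 3.0 ms, b=8 -> 2.0 ms.
per_task = [avg_time_per_task(b) for b in (1, 2, 4, 8)]
```

This is only a sketch of the intuition; the paper's joint offloading algorithm additionally weighs the transmission cost of each device's task against this batching gain at the edge.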