Publication | Closed Access
Model Parallelism Optimization for Distributed Inference via Decoupled CNN Structure
Citations: 36
References: 31
Year: 2020
Artificial Intelligence, Convolutional Neural Network, Group Convolution, Machine Learning, Engineering, Distributed Algorithms, Data Science, Model Parallelism Optimization, Parallel Computing, Computer Engineering, Large Scale Optimization, CNN Inference, Computer Science, Deep Learning, Neural Architecture Search, Model Compression, Model Parallelism, Parallel Learning, Parallel Programming
Deploying CNN inference on local end-user devices is promising for high-accuracy, time-sensitive applications. Model parallelism has the potential to provide high throughput and low latency in distributed CNN inference. However, applying model parallelism is non-trivial because the original CNN model has an inherently tightly coupled structure. In this article, we propose DeCNN, a more effective inference approach that uses a decoupled CNN structure to optimize model parallelism for distributed inference on end-user devices. DeCNN consists of three schemes. Scheme-1 is structure-level optimization: it exploits group convolution and channel shuffle to decouple the original CNN structure for model parallelism. Scheme-2 is partition-level optimization: it partitions the convolutional layers by channel group and then uses an input-based method to partition the fully connected layers, further exposing a high degree of parallelism. Scheme-3 is communication-level optimization: it uses inter-sample parallelism to hide communication for better performance and robustness, especially over weak network connections. We evaluate the effectiveness of DeCNN on the ImageNet classification task using a distributed multi-ARM platform. Notably, when the number of devices increases from 1 to 4, DeCNN accelerates inference of large-scale ResNet-50 by 3.21×, reduces the memory footprint by 65.3 percent, and improves accuracy by 1.29 percent.
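The two building blocks named in Scheme-1, group convolution and channel shuffle, can be illustrated with a minimal single-process NumPy sketch. This is not the authors' implementation: the function names `channel_shuffle` and `grouped_pointwise_conv` are hypothetical, the convolution is restricted to a 1×1 kernel for brevity, and the per-group computations run in a plain loop rather than on separate devices. The sketch only shows why a grouped layer decouples: each channel group touches its own weights and its own slice of the input, so the groups could in principle be assigned to different workers.

```python
import numpy as np


def channel_shuffle(x, groups):
    """Interleave channels across groups. x has shape (C, H, W)."""
    c, h, w = x.shape
    assert c % groups == 0, "channel count must be divisible by groups"
    # (G, C/G, H, W) -> swap the two leading axes -> flatten back to (C, H, W)
    return (
        x.reshape(groups, c // groups, h, w)
        .transpose(1, 0, 2, 3)
        .reshape(c, h, w)
    )


def grouped_pointwise_conv(x, group_weights):
    """Grouped 1x1 convolution (illustrative only).

    x: array of shape (C_in, H, W).
    group_weights: list of G arrays, each of shape (C_out/G, C_in/G).
    """
    groups = len(group_weights)
    chunks = np.split(x, groups, axis=0)
    # Each (weight, chunk) pair is independent of the others, which is the
    # property that would allow one group per device in a distributed setup.
    outs = [np.einsum("oc,chw->ohw", w, chunk)
            for w, chunk in zip(group_weights, chunks)]
    return np.concatenate(outs, axis=0)


# Small demonstration with assumed sizes: 4 input channels, 2 groups.
rng = np.random.default_rng(0)
x_demo = rng.standard_normal((4, 3, 3))
w_demo = [rng.standard_normal((2, 2)) for _ in range(2)]
y_demo = channel_shuffle(grouped_pointwise_conv(x_demo, w_demo), groups=2)
```

The shuffle after the grouped layer mixes channels from different groups, so the next grouped layer does not see only the outputs of a single group.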