Publication | Closed Access
APQ: Joint Search for Network Architecture, Pruning and Quantization Policy
Citations: 181
References: 36
Year: 2020
Unknown Venue
Artificial Intelligence, Convolutional Neural Network, Engineering, Machine Learning, Network Planning, Network Analysis, Quantization Policy, Network Convergence, Embedded Machine Learning, Network Management, Network Optimization, Advanced Networking, Joint Search, Computer Engineering, Computer Science, Present Apq, Deep Learning, Neural Architecture Search, Network Science, Accuracy Predictor
We present APQ, a novel design methodology for efficient deep learning deployment. Unlike previous methods that separately optimize the neural network architecture, pruning policy, and quantization policy, we optimize them jointly. To deal with the larger design space this brings, we train a quantization-aware accuracy predictor that is fed to the evolutionary search to select the best fit. Since directly training such a predictor requires time-consuming quantization data collection, we propose a predictor-transfer technique to obtain the quantization-aware predictor: we first generate a large dataset of 〈NN architecture, ImageNet accuracy〉 pairs by sampling a pretrained unified once-for-all network and evaluating the samples directly; we then use these data to train an accuracy predictor without quantization, and transfer its weights to train the quantization-aware predictor, which greatly reduces the quantization data collection time. Extensive experiments on ImageNet show the benefits of this joint design methodology: the model searched by our method maintains the same level of accuracy as the ResNet34 8-bit model while saving 8× BitOps; we achieve 2×/1.3× latency/energy savings compared to MobileNetV2+HAQ [30, 36] at the same level of accuracy; and, for a new deployment scenario, joint optimization outperforms separate optimizations using ProxylessNAS+AMC+HAQ [5, 12, 36] by 2.3% accuracy, while its marginal search cost is orders of magnitude lower in GPU hours and CO₂ emission with respect to the training cost.
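The predictor-transfer step and the predictor-guided evolutionary search described in the abstract can be illustrated with a toy sketch. Everything below is an assumption made for illustration and is not the authors' implementation: the linear predictor, the synthetic accuracy functions (`true_fp_acc`, `true_quant_acc`), the encoding sizes, the `cost` proxy standing in for BitOps, and all hyperparameters are hypothetical.

```python
import random

rng = random.Random(0)
N_ARCH, N_BITS = 3, 2  # hypothetical sizes of the architecture and bit-width encodings

def true_fp_acc(arch):
    # Synthetic stand-in for evaluating a sub-network sampled from a once-for-all network.
    return 0.60 + 0.10 * arch[0] + 0.05 * arch[1] + 0.03 * arch[2]

def true_quant_acc(arch, bits):
    # Synthetic stand-in for the expensive quantize-and-evaluate step.
    return true_fp_acc(arch) - 0.05 * (1 - bits[0]) - 0.03 * (1 - bits[1])

def sample(n):
    return [rng.random() for _ in range(n)]

def make_predictor(n_inputs):
    return {"w": [0.0] * n_inputs, "b": 0.0}

def predict(model, x):
    return model["b"] + sum(w * xi for w, xi in zip(model["w"], x))

def train(model, data, epochs, lr=0.05):
    # Plain SGD on squared error.
    for _ in range(epochs):
        for x, y in data:
            err = predict(model, x) - y
            model["b"] -= lr * err
            model["w"] = [w - lr * err * xi for w, xi in zip(model["w"], x)]
    return model

def mse(model, data):
    return sum((predict(model, x) - y) ** 2 for x, y in data) / len(data)

def quant_dataset(n):
    out = []
    for _ in range(n):
        a, b = sample(N_ARCH), sample(N_BITS)
        out.append((a + b, true_quant_acc(a, b)))
    return out

# Step 1: large, cheap dataset of (architecture, full-precision accuracy) pairs.
archs = [sample(N_ARCH) for _ in range(500)]
fp_model = train(make_predictor(N_ARCH), [(a, true_fp_acc(a)) for a in archs], epochs=50)

# Step 2: small, expensive dataset that also encodes the quantization policy.
q_train, q_test = quant_dataset(20), quant_dataset(200)

# Step 3: transfer the architecture weights and bias, then fine-tune briefly.
transferred = make_predictor(N_ARCH + N_BITS)
transferred["w"][:N_ARCH] = fp_model["w"]
transferred["b"] = fp_model["b"]
train(transferred, q_train, epochs=10)

# Baseline: the same predictor trained from scratch on the same small dataset.
scratch = train(make_predictor(N_ARCH + N_BITS), q_train, epochs=10)
transfer_err, scratch_err = mse(transferred, q_test), mse(scratch, q_test)
print(f"held-out MSE  transfer: {transfer_err:.6f}  scratch: {scratch_err:.6f}")

def cost(x):
    # Hypothetical resource proxy: larger networks and wider bit-widths cost more.
    return sum(x[:N_ARCH]) * (0.5 + sum(x[N_ARCH:]))

def mutate(x, scale=0.1):
    return [min(1.0, max(0.0, v + rng.uniform(-scale, scale))) for v in x]

def evolve(model, budget, generations=30, pop_size=40, keep=10):
    # Keep the best feasible candidates by predicted accuracy, refill by mutation.
    pop = [sample(N_ARCH + N_BITS) for _ in range(pop_size)]
    parents = []
    for _ in range(generations):
        feasible = [x for x in pop if cost(x) <= budget]
        feasible.sort(key=lambda x: predict(model, x), reverse=True)
        parents = feasible[:keep] or [[0.0] * (N_ARCH + N_BITS)]
        pop = parents + [mutate(rng.choice(parents)) for _ in range(pop_size - len(parents))]
    return parents[0]

best = evolve(transferred, budget=3.0)
print("best candidate:", [round(v, 2) for v in best], "cost:", round(cost(best), 2))
```

In this toy setting the transferred predictor starts from weights that already explain the architecture-dependent part of the accuracy, so the short fine-tuning run only has to learn the effect of the bit-width features. The search never evaluates a real network: candidates are ranked purely by the predictor, which is what keeps the cost per new deployment scenario low.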