Publication | Open Access
meProp: Sparsified Back Propagation for Accelerated Deep Learning with Reduced Overfitting
Citations: 84
References: 23
Year: 2017
Artificial Intelligence, Convolutional Neural Network, Engineering, Machine Learning, Accelerated Deep Learning, Recurrent Neural Network, Data Science, Pattern Recognition, Sparse Neural Network, Supervised Learning, Data Augmentation, Machine Learning Model, Computer Engineering, Computer Science, Deep Learning, Neural Architecture Search, Model Compression, Neural Network Learning, Back Propagation, Back Propagation Pass
We propose a simple yet effective technique for neural network learning. The forward propagation is computed as usual. In back propagation, only a small subset of the full gradient is computed to update the model parameters. The gradient vectors are sparsified in such a way that only the top-$k$ elements (in terms of magnitude) are kept. As a result, only $k$ rows or columns (depending on the layout) of the weight matrix are modified, leading to a linear reduction ($k$ divided by the vector dimension) in the computational cost. Surprisingly, experimental results demonstrate that we can update only 1-4% of the weights at each back propagation pass. This does not result in a larger number of training iterations. More interestingly, the accuracy of the resulting models is actually improved rather than degraded, and a detailed analysis is given. The code is available at https://github.com/lancopku/meProp
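The top-$k$ sparsification described in the abstract can be illustrated with a minimal sketch for a single linear layer $y = Wx$. This is not the code from the meProp repository: the function names `topk_indices` and `linear_backward_topk`, the row-major list layout of `W`, and the toy numbers are all assumptions made for illustration. The sketch keeps only the $k$ largest-magnitude entries of the output gradient, so only $k$ rows of the weight gradient are ever computed.

```python
import heapq

def topk_indices(grad, k):
    """Indices of the k largest-magnitude entries of grad (hypothetical helper)."""
    return sorted(heapq.nlargest(k, range(len(grad)), key=lambda i: abs(grad[i])))

def linear_backward_topk(W, x, grad_y, k):
    """Backward pass for y = W x that uses only the top-k entries of grad_y.

    W is a list of rows (n_out x n_in). Returns the selected indices, the
    non-zero rows of dL/dW keyed by row index, and dL/dx.
    """
    idx = topk_indices(grad_y, k)
    # dL/dW = outer(grad_y, x); rows outside idx are treated as zero and skipped.
    grad_W_rows = {i: [grad_y[i] * xj for xj in x] for i in idx}
    # dL/dx = W^T grad_y, summed over the k selected rows only.
    grad_x = [sum(W[i][j] * grad_y[i] for i in idx) for j in range(len(x))]
    return idx, grad_W_rows, grad_x

# Toy example: 4 outputs, 2 inputs, keep k = 2 gradient entries.
W = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0], [7.0, 8.0]]
x = [1.0, -1.0]
grad_y = [0.1, -2.0, 0.3, 1.5]
idx, grad_W_rows, grad_x = linear_backward_topk(W, x, grad_y, k=2)
print(idx)          # rows of W that would be updated
print(grad_W_rows)  # only k rows of the weight gradient are materialized
print(grad_x)
```

In this sketch the work in the backward pass scales with $k$ rather than with the full output dimension, which is the linear reduction the abstract refers to. A practical implementation would use a tensor library and batched inputs rather than Python lists.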