Publication | Open Access
Effective Quantization Methods for Recurrent Neural Networks
Year: 2016
Citations: 65
References: 16
Keywords: Effective Quantization Methods, Sequence Modelling, Engineering, Machine Learning, Data Science, Sparse Neural Network, Balanced Quantization Methods, Neural Network, Computer Architecture, Computer Engineering, Speech Processing, Computer Science, Quantized RNN, Deep Learning, Neural Architecture Search, Recurrent Neural Network, Model Compression
Reducing the bit-widths of the weights, activations, and gradients of a neural network can shrink its storage size and memory usage, and can also speed up training and inference by exploiting bitwise operations. However, previous attempts at quantizing RNNs show considerable performance degradation when low bit-width weights and activations are used. In this paper, we propose methods to quantize the structure of gates and interlinks in LSTM and GRU cells. In addition, we propose balanced quantization methods for weights to further reduce performance degradation. Experiments on the PTB and IMDB datasets confirm the effectiveness of our methods: the performance of our models matches or surpasses the previous state of the art for quantized RNNs.
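The abstract does not spell out how the weight quantizers work, so the following is only an illustrative sketch, not the paper's algorithm. It contrasts plain uniform k-bit quantization with one possible reading of "balanced" quantization, in which weights are first rank-equalized so that every quantization level receives roughly the same number of weights. The function names `quantize_uniform` and `quantize_balanced`, and the rank-based equalization itself, are assumptions made for this example.

```python
import numpy as np


def quantize_uniform(w, bits):
    """Round w onto 2**bits evenly spaced levels spanning [w.min(), w.max()]."""
    steps = 2 ** bits - 1                     # number of intervals between levels
    lo, hi = w.min(), w.max()
    if hi == lo:                              # constant tensor: nothing to quantize
        return w.copy()
    scaled = (w - lo) / (hi - lo)             # map into [0, 1]
    return np.round(scaled * steps) / steps * (hi - lo) + lo


def quantize_balanced(w, bits):
    """Hypothetical balanced variant: equal-population bins, evenly spaced levels.

    Assumption for illustration only: weights are ranked, split into
    2**bits bins of (nearly) equal size, and each bin is mapped to one of
    the evenly spaced levels between w.min() and w.max().
    """
    n_levels = 2 ** bits
    flat = w.ravel()
    ranks = np.argsort(np.argsort(flat))      # rank of each weight, 0..n-1
    bins = (ranks * n_levels) // flat.size    # bin index, 0..n_levels-1
    lo, hi = flat.min(), flat.max()
    values = lo + bins / (n_levels - 1) * (hi - lo)
    return values.reshape(w.shape)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    w = rng.normal(size=(32, 32))
    for name, fn in [("uniform", quantize_uniform), ("balanced", quantize_balanced)]:
        q = fn(w, bits=2)
        levels, counts = np.unique(q, return_counts=True)
        print(name, "levels used:", len(levels), "counts:", counts.tolist())
```

With bell-shaped weights, uniform rounding tends to crowd most values into the middle levels, whereas the rank-based variant assigns exactly 256 of the 1024 entries to each of the four 2-bit levels. Whether this matches the balancing procedure actually used in the paper cannot be determined from the abstract alone.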