Publication | Closed Access
Scalable training of deep learning machines by incremental block training with intra-block parallel optimization and blockwise model-update filtering
Citations: 162
References: 37
Year: 2016
Artificial Intelligence, Engineering, Machine Learning, Scalable Training, Recurrent Neural Network, Speech Recognition, Data Parallelism, Data Science, Sparse Neural Network, Parallel Computing, Large AI Model, Incremental Block Training, Large Scale Optimization, Computer Science, Deep Learning, Neural Architecture Search, Intra-block Parallel Optimization, Model Compression, Parallel Learning, Speech Processing, Parallel Programming
We present a new approach to scalable training of deep learning machines: incremental block training, with intra-block parallel optimization to leverage data parallelism and blockwise model-update filtering to stabilize the learning process. Using an implementation on a distributed GPU cluster, with an MPI-based HPC machine learning framework to coordinate parallel job scheduling and collective communication, we have successfully trained deep bidirectional long short-term memory (LSTM) recurrent neural networks (RNNs) and fully-connected feed-forward deep neural networks (DNNs) for large vocabulary continuous speech recognition on two benchmark tasks, namely the 309-hour Switchboard-I task and the 1,860-hour "Switchboard+Fisher" task. We achieve almost linear speedup up to 16 GPU cards on the LSTM task and 64 GPU cards on the DNN task, with either no degradation or improved recognition accuracy compared with traditional mini-batch stochastic gradient descent training on a single GPU.
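The abstract names the three ingredients but does not give the update equations, so the following is only a minimal sketch under stated assumptions, not the authors' implementation. It assumes the filtering step is a momentum-style smoothing of the averaged model update, applied once per data block. The toy 1-D regression model, the function names (`local_sgd`, `bmuf_train`, `make_blocks`) and the parameters (`block_momentum`, `block_lr`) are hypothetical. The workers are simulated sequentially in plain Python rather than on GPUs coordinated through MPI.

```python
import random


def local_sgd(w, samples, lr):
    """Plain per-sample SGD for the 1-D least-squares model y = w * x."""
    for x, y in samples:
        w -= lr * 2.0 * (w * x - y) * x  # gradient of (w*x - y)**2 w.r.t. w
    return w


def bmuf_train(blocks, num_workers, lr=0.1, block_momentum=0.5,
               block_lr=1.0, w0=0.0):
    """Simulate incremental block training with a filtered global update.

    ASSUMPTION: the filter is a momentum recursion on the averaged update;
    the abstract itself does not state the exact rule.
    """
    w_global = w0  # model broadcast to every worker at the start of a block
    delta = 0.0    # filtered (smoothed) global model update
    for block in blocks:
        # Intra-block parallel optimization: each worker receives a disjoint
        # split of the block and starts from the same global model. Workers
        # run one after another here; on a cluster they would run at once.
        splits = [block[k::num_workers] for k in range(num_workers)]
        local_models = [local_sgd(w_global, split, lr) for split in splits]

        # Model averaging (a collective all-reduce in an MPI setting).
        w_avg = sum(local_models) / num_workers

        # Blockwise model-update filtering (assumed form): smooth the raw
        # block-level update before applying it to the global model.
        update = w_avg - w_global
        delta = block_momentum * delta + block_lr * update
        w_global += delta
    return w_global


def make_blocks(num_blocks, block_size, true_w=3.0, seed=0):
    """Hypothetical noise-free toy data: y = true_w * x, grouped into blocks."""
    rng = random.Random(seed)
    blocks = []
    for _ in range(num_blocks):
        xs = [rng.uniform(-1.0, 1.0) for _ in range(block_size)]
        blocks.append([(x, true_w * x) for x in xs])
    return blocks


if __name__ == "__main__":
    w = bmuf_train(make_blocks(30, 64), num_workers=4)
    print(f"estimated w = {w:.4f} (data generated with w = 3.0)")
```

With `block_momentum=0.0` and `block_lr=1.0` the sketch reduces to plain model averaging after each block, which makes the effect of the filter easy to isolate in experiments.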