Publication | Closed Access
Limiting Numerical Precision of Neural Networks to Achieve Real-Time Voice Activity Detection
Citations: 13
References: 20
Year: 2018
Venue: Unknown
Engineering, Machine Learning, Speech Recognition, Data Science, Numerical Precision, Robust Speech Recognition, Voice Recognition, Representation Precision, Robust Voice-activity Detection, Health Sciences, Computer Engineering, Computer Science, Neural Networks, Deep Learning, Distant Speech Recognition, Signal Processing, Speech Communication, Speech Technology, Voice, Multi-speaker Speech Recognition, Speech Processing, Speech Input, Voice-activity Detection, Speech Perception
Fast and robust voice-activity detection is critical for processing speech efficiently. While deep-learning-based methods for detecting voice have shown competitive accuracy, the best models in the literature incur more than 100 ms of latency on commodity processors. Such delays are unacceptable for real-time speech processing. In this paper, we study the impact of lowering the representation precision of the neural-network weights and neurons on both the accuracy and the delay of voice-activity detection. Based on a design-space exploration, we not only determine the optimal scaling strategy but also adjust the network structure to accommodate the new quantization levels. Through experiments conducted with real user data, we demonstrate that optimized deep neural networks with lower bit precisions outperform the state-of-the-art WebRTC voice-activity detector, with 87x lower delay and a 6.8% lower error rate.
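
To make the idea of lowering representation precision concrete, the following is a minimal sketch of symmetric uniform quantization applied to a handful of weights. It is an illustration only: the abstract does not specify the paper's scaling strategy or quantizer, so the max-absolute-value scaling used here, the function names (`quantize_symmetric`, `dequantize`, `max_abs_error`), and the example weights are all assumptions.

```python
# Illustrative sketch only: symmetric uniform quantization with
# max-absolute-value scaling. This is an assumed scheme, not
# necessarily the scaling strategy chosen in the paper.

def quantize_symmetric(values, bits):
    """Map floats to signed integer codes that fit in `bits` bits.

    Returns (codes, scale) such that values[i] is approximated by
    codes[i] * scale.
    """
    if bits < 2:
        raise ValueError("bits must be at least 2 for a signed representation")
    qmax = 2 ** (bits - 1) - 1  # largest code, e.g. 127 for 8 bits
    max_abs = max((abs(v) for v in values), default=0.0)
    scale = max_abs / qmax if max_abs > 0 else 1.0
    # Round to the nearest code and clamp to the representable range.
    codes = [max(-qmax, min(qmax, round(v / scale))) for v in values]
    return codes, scale


def dequantize(codes, scale):
    """Recover approximate float values from integer codes."""
    return [c * scale for c in codes]


def max_abs_error(values, bits):
    """Largest absolute error introduced by quantizing to `bits` bits."""
    codes, scale = quantize_symmetric(values, bits)
    restored = dequantize(codes, scale)
    return max(abs(v - r) for v, r in zip(values, restored))


if __name__ == "__main__":
    weights = [0.42, -1.30, 0.07, 0.95, -0.61]  # hypothetical weights
    for bits in (8, 4, 2):
        print(bits, max_abs_error(weights, bits))
```

In this sketch the rounding error per weight is bounded by half the scale, so each bit removed roughly doubles the worst-case error. That is the kind of accuracy cost a design-space exploration would weigh against the reduction in delay.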