Publication | Closed Access
Deep Belief Networks Based Voice Activity Detection
Citations: 315
References: 66
Year: 2012
Engineering, Machine Learning, Deep Belief Networks, Feature Extraction, Speech Recognition, Data Science, Pattern Recognition, Robust Speech Recognition, Voice Recognition, Multiple Acoustic Features, Health Sciences, Computer Science, Deep Learning, Distant Speech Recognition, Speech Communication, Voice Activity Detection, Voice, Multi-speaker Speech Recognition, Speech Processing, Speech Input, Speech Perception
Fusing the advantages of multiple acoustic features is important for the robustness of voice activity detection (VAD). Recently, machine-learning-based VADs have shown superiority over traditional VADs on multiple-feature fusion tasks. However, existing machine-learning-based VADs only utilize shallow models, which cannot explore the underlying manifold of the features. In this paper, we propose to fuse multiple features via a deep model, the deep belief network (DBN). The DBN is a powerful hierarchical generative model for feature extraction. It can describe highly variant functions and discover the manifold of the features. We take the multiple serially concatenated features as the input layer of the DBN, and then extract a new feature by passing these features through multiple nonlinear hidden layers. Finally, we predict the class of the new feature with a linear classifier. We further show by analysis that even a belief network with a single hidden layer is as powerful as the state-of-the-art models in machine-learning-based VADs. In our empirical comparison, ten common features are used for performance analysis. Extensive experimental results on the AURORA2 corpus show that the DBN-based VAD not only outperforms eleven referenced VADs, but also meets the real-time detection demand of VAD. The results also show that the DBN-based VAD can fuse the advantages of multiple features effectively.
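The pipeline described in the abstract (serial concatenation of features, several nonlinear hidden layers, then a linear classifier) can be illustrated with a minimal forward-pass sketch. This is not the authors' implementation: the feature dimensions, layer sizes, sigmoid activation, and randomly initialized weights are all assumptions chosen for illustration, and no training is shown.

```python
import numpy as np


def sigmoid(x):
    """Logistic nonlinearity applied element-wise."""
    return 1.0 / (1.0 + np.exp(-x))


def init_layers(sizes, seed=0):
    """Create random (weights, bias) pairs; a stand-in for trained parameters."""
    rng = np.random.default_rng(seed)
    return [
        (rng.normal(0.0, 0.1, size=(n_in, n_out)), np.zeros(n_out))
        for n_in, n_out in zip(sizes[:-1], sizes[1:])
    ]


def extract_feature(x, hidden_layers):
    """Pass the concatenated input through each nonlinear hidden layer."""
    h = x
    for weights, bias in hidden_layers:
        h = sigmoid(h @ weights + bias)
    return h


def classify(h, w, c):
    """Linear classifier on the extracted feature: 1 = speech, 0 = non-speech."""
    return (h @ w + c > 0).astype(int)


# Hypothetical per-frame acoustic features with dimensions 3, 4, and 2.
rng = np.random.default_rng(1)
n_frames = 5
features = [rng.normal(size=(n_frames, d)) for d in (3, 4, 2)]

# Serial concatenation forms the input layer (one row per frame).
x = np.concatenate(features, axis=1)

# Two hidden layers of assumed sizes 8 and 4.
layers = init_layers([x.shape[1], 8, 4])
h = extract_feature(x, layers)

# Assumed (untrained) linear classifier parameters.
w = rng.normal(size=h.shape[1])
c = 0.0
labels = classify(h, w, c)
print(labels)
```

With untrained weights the predicted labels carry no meaning; the sketch only shows how the data flows from the concatenated input to a per-frame decision.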