Publication | Closed Access
Neural Network Based Pitch Tracking in Very Noisy Speech
Citations: 93
References: 37
Year: 2014
Pitch States, Engineering, Machine Learning, Neural Network, Speech Recognition, Noise, Robust Speech Recognition, Voice Recognition, Health Sciences, Speech Synthesis, Deep Learning, Distant Speech Recognition, Speech Communication, Speech Technology, Multi-speaker Speech Recognition, Pitch Determination, Speech Processing, Speech Input, Speech Perception, Pitch Contours
Pitch determination is a fundamental problem in speech processing that has been studied for decades. However, it is challenging to determine pitch in strong noise because the harmonic structure is corrupted. In this paper, we estimate pitch using supervised learning, where the probabilistic pitch states are learned directly from noisy speech data. We investigate two alternative neural networks that model the pitch state distribution given the observations. The first is a feedforward deep neural network (DNN), which is trained on static frame-level acoustic features. The second is a recurrent deep neural network (RNN), which is trained on sequential frame-level features and is capable of learning temporal dynamics. Both DNNs and RNNs produce accurate probabilistic outputs of pitch states, which are then connected into pitch contours by Viterbi decoding. Our systematic evaluation shows that the proposed pitch tracking algorithms are robust to different noise conditions and can even be applied to reverberant speech. The proposed approach also significantly outperforms other state-of-the-art pitch tracking algorithms.
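The decoding step mentioned in the abstract, in which per-frame pitch state probabilities are connected into a contour by Viterbi decoding, can be illustrated with a minimal sketch. This is not the authors' implementation. The function names, the linear jump penalty in `make_log_transitions`, and the idea of treating states as plain integer indices are all assumptions made for illustration; the abstract does not specify the transition model or the state layout.

```python
import math


def make_log_transitions(num_states, jump_penalty=1.0):
    """Hypothetical transition model (an assumption, not from the paper):
    the log-score of moving from state i to state j falls off linearly
    with the distance |i - j|, then each row is normalized."""
    log_trans = []
    for i in range(num_states):
        scores = [-jump_penalty * abs(i - j) for j in range(num_states)]
        # Normalize the row so it is a proper log-probability distribution.
        log_norm = math.log(sum(math.exp(s) for s in scores))
        log_trans.append([s - log_norm for s in scores])
    return log_trans


def viterbi(log_post, log_trans):
    """Return the highest-scoring state sequence.

    log_post[t][s]  : log probability of pitch state s at frame t
                      (for example, the log of a network's output).
    log_trans[i][j] : log probability of moving from state i to state j.
    """
    num_states = len(log_post[0])
    score = list(log_post[0])  # best path score ending in each state
    backptr = []               # best predecessor per state, per frame

    for t in range(1, len(log_post)):
        new_score = []
        ptr = []
        for j in range(num_states):
            # Pick the predecessor that maximizes the accumulated score.
            best_i = max(range(num_states),
                         key=lambda i: score[i] + log_trans[i][j])
            new_score.append(score[best_i] + log_trans[best_i][j]
                             + log_post[t][j])
            ptr.append(best_i)
        score = new_score
        backptr.append(ptr)

    # Trace back from the best final state to recover the full path.
    state = max(range(num_states), key=lambda s: score[s])
    path = [state]
    for ptr in reversed(backptr):
        state = ptr[state]
        path.append(state)
    path.reverse()
    return path
```

With confident per-frame probabilities the decoded path simply follows the per-frame maximum. When a single frame is ambiguous, the transition penalty keeps the path on the neighboring frames' state, which is the smoothing effect that makes sequence decoding useful on noisy frame-level outputs.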