Publication | Open Access
AeGAN: Time-Frequency Speech Denoising via Generative Adversarial Networks
Citations: 23
References: 23
Year: 2020
Venue: Unknown
Engineering, Machine Learning, Speech Enhancement, Speech Recognition, Noise, Robust Speech Recognition, Voice Recognition, Health Sciences, Speech Output, Deep Learning, Time-frequency Speech Denoising, Distant Speech Recognition, Signal Processing, Speech Communication, ASR Systems, Automatic Speech Recognition, Generative Adversarial Network, Speech Processing, Speech Separation, Speech Perception
Automatic speech recognition (ASR) systems are now vital to everyday tasks such as speech-to-text processing and language translation. This has created a need for ASR systems that can operate in realistic, crowded environments. Speech enhancement is therefore a valuable building block in ASR systems and in other applications such as hearing aids, smartphones, and teleconferencing systems. In this paper, a framework based on generative adversarial networks (GANs) is investigated for speech enhancement, specifically the denoising of audio tracks. A new architecture based on a CasNet generator, together with an additional feature-based loss, is incorporated to obtain realistically denoised speech phonetics. Finally, the proposed framework is shown to outperform other learning-based and traditional model-based speech enhancement approaches.
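The abstract does not spell out how the adversarial and feature-based terms are combined, so the following is only a minimal sketch of one common way to do it, not the paper's actual formulation. It assumes the feature-based loss is an L1 distance between intermediate discriminator activations for clean and denoised spectrograms, and that the generator uses a non-saturating adversarial loss. The function names and the weight `lam` are hypothetical.

```python
import numpy as np

def adversarial_loss(d_fake_prob):
    """Non-saturating generator loss: -mean(log D(G(x))).

    d_fake_prob holds the discriminator's probabilities that the
    denoised outputs are real (assumed to lie in [0, 1]).
    """
    eps = 1e-12  # avoid log(0)
    return float(-np.mean(np.log(np.clip(d_fake_prob, eps, 1.0))))

def feature_loss(feats_clean, feats_denoised):
    """Assumed feature-based loss: mean absolute difference between
    per-layer feature maps, averaged over layers."""
    per_layer = [np.mean(np.abs(fc - fd))
                 for fc, fd in zip(feats_clean, feats_denoised)]
    return float(np.mean(per_layer))

def generator_objective(d_fake_prob, feats_clean, feats_denoised, lam=10.0):
    """Adversarial term plus a weighted feature term (lam is hypothetical)."""
    return (adversarial_loss(d_fake_prob)
            + lam * feature_loss(feats_clean, feats_denoised))

# Toy usage with random stand-ins for discriminator feature maps.
rng = np.random.default_rng(0)
feats_clean = [rng.normal(size=(4, 8)), rng.normal(size=(4, 16))]
feats_denoised = [f + 0.1 for f in feats_clean]  # slightly-off features
d_fake_prob = np.full(4, 0.5)
loss = generator_objective(d_fake_prob, feats_clean, feats_denoised)
```

In this toy setup the feature term penalises the generator whenever the discriminator's internal representation of the denoised output drifts from that of the clean target, which is one plausible reading of a loss meant to yield realistically denoised speech.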