Publication | Closed Access
Cascaded Cross-Module Residual Learning Towards Lightweight End-to-End Speech Coding
36 Citations | 25 References | 2019 | Unknown Venue
Subjective Quality, Engineering, Machine Learning, Speech Coding, Health Sciences, Multi-speaker Speech Recognition, Speech Signals, Computer Engineering, Speech Output, Speech Processing, Computer Science, Deep Learning, Speech Codec, Real-time Language, Speech Communication, Speech Recognition
Speech codecs learn compact representations of speech signals to facilitate data transmission. Many recent end-to-end speech codecs based on deep neural networks (DNNs) achieve low bitrates and high perceptual quality at the cost of model complexity. We propose a cross-module residual learning (CMRL) pipeline that serves as a carrier for multiple modules, in which each module reconstructs the residual left by its preceding modules. CMRL differs from other DNN-based speech codecs in that, rather than modeling the speech compression problem with a single large neural network, it optimizes a series of less complicated modules in a two-phase training scheme. The proposed method shows better objective performance than AMR-WB and than the state-of-the-art DNN-based speech codec with a similar network architecture. As an end-to-end model, it takes raw PCM signals as input, but it is also compatible with linear predictive coding (LPC), and it shows better subjective quality at high bitrates than AMR-WB and Opus. The gain is achieved with only 0.9 million trainable parameters, a significantly less complex architecture than those of other DNN-based codecs in the literature.
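The cascading idea behind CMRL, in which each module codes only what its predecessors failed to reconstruct and the decoder sums all module outputs, can be illustrated with a toy sketch. This is not the authors' implementation: each CMRL module is a small neural network trained in two phases, whereas here every hypothetical `ToyModule` is swapped for a plain uniform scalar quantizer with a fixed step size, and no training is modeled at all. The names `ToyModule`, `quantize`, and `cascade` and the step sizes are illustrative assumptions.

```python
import math


def quantize(x: float, step: float) -> float:
    """Uniform scalar quantizer: round x to the nearest multiple of `step`."""
    return step * round(x / step)


class ToyModule:
    """Hypothetical stand-in for one CMRL module.

    Assumption: a real CMRL module is a learned neural codec; here the
    encode/decode pair is replaced by a fixed uniform quantizer.
    """

    def __init__(self, step: float):
        self.step = step

    def reconstruct(self, signal):
        # Encode and immediately decode: return this module's reconstruction.
        return [quantize(s, self.step) for s in signal]


def cascade(signal, modules):
    """Run modules in series; each one codes the residual its predecessors left.

    Returns the summed reconstruction and, after each module, the largest
    absolute residual between the input and the running reconstruction.
    """
    residual = list(signal)
    total = [0.0] * len(signal)
    errors = []
    for module in modules:
        recon = module.reconstruct(residual)                 # code the current residual
        total = [t + r for t, r in zip(total, recon)]        # decoder sums module outputs
        residual = [x - t for x, t in zip(signal, total)]    # what is still unexplained
        errors.append(max(abs(e) for e in residual))
    return total, errors


if __name__ == "__main__":
    # 10 ms of a 440 Hz tone sampled at 16 kHz (illustrative input only).
    signal = [math.sin(2 * math.pi * 440 * n / 16000) for n in range(160)]
    modules = [ToyModule(0.5), ToyModule(0.05), ToyModule(0.005)]
    _, errors = cascade(signal, modules)
    for k, err in enumerate(errors, start=1):
        print(f"after module {k}: max |residual| = {err:.4f}")
```

In this toy, the residual after a module with step size `s` is bounded by `s / 2`, so adding modules steadily refines the reconstruction; in a real codec each extra module would also add to the bitrate, which is the trade-off a cascaded design lets one tune.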