A Non-Causal FFTNet Architecture for Speech Enhancement

Abstract

In this paper, we propose a new parallel, non-causal and shallow waveform-domain architecture for speech enhancement based on FFTNet, a neural network for generating high-quality audio waveforms. In contrast to other waveform-based approaches like WaveNet, FFTNet uses an initial wide dilation pattern. Such an architecture better represents the long-term correlated structure of speech in the time domain, where noise is usually highly uncorrelated, and it is therefore suitable for waveform-domain speech enhancement. To further strengthen this feature of FFTNet, we propose a non-causal FFTNet architecture, where the present sample in each layer is estimated from the past and future samples of the previous layer. By using a shallow network and applying non-causality within certain limits, the proposed FFTNet for speech enhancement (SE-FFTNet) uses far fewer parameters than other neural-network-based approaches to speech enhancement: 32% fewer than WaveNet and 87% fewer than SEGAN. Finally, based on subjective and objective metrics, SE-FFTNet outperforms WaveNet in terms of enhanced signal quality, while performing on par with SEGAN. A TensorFlow implementation of the architecture is provided at 1.
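To make the non-causal idea concrete, the following is a minimal NumPy sketch, not the authors' TensorFlow implementation. It assumes each layer combines one past tap and one future tap of the previous layer through per-tap weight matrices followed by a ReLU, and that dilations start wide and shrink with depth. The function names `noncausal_layer` and `se_fftnet_sketch`, the zero padding at the signal edges, and the dilation schedule in the usage note are illustrative assumptions; the abstract does not specify them.

```python
import numpy as np


def noncausal_layer(x, w_past, w_future, bias, dilation):
    """One hypothetical non-causal layer.

    x has shape (time, channels_in). Output sample t is computed from
    x[t - dilation] (past) and x[t + dilation] (future) of the previous
    layer, so the output has shape (time, channels_out).
    """
    t = x.shape[0]
    # Zero-pad both ends so every output sample has a past and a future tap
    # (an assumption; the abstract does not describe edge handling).
    padded = np.pad(x, ((dilation, dilation), (0, 0)))
    past = padded[:t]                # row t holds x[t - dilation]
    future = padded[2 * dilation:]   # row t holds x[t + dilation]
    # Per-tap linear maps (equivalent to 1x1 convolutions), summed, then ReLU.
    return np.maximum(past @ w_past + future @ w_future + bias, 0.0)


def se_fftnet_sketch(x, params, dilations):
    """Stack the layers above, widest dilation first.

    params is a list of (w_past, w_future, bias) tuples, one per layer.
    """
    h = x
    for (w_past, w_future, bias), dilation in zip(params, dilations):
        h = noncausal_layer(h, w_past, w_future, bias, dilation)
    return h
```

With an assumed schedule such as `dilations = [8, 4, 2, 1]`, each output sample of this sketch depends on input samples up to 15 steps into both the past and the future, while the network stays only four layers deep. A real enhancement network would typically end in a linear output layer rather than a ReLU; that detail is omitted here.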
