Cepstral analysis synthesis on the mel frequency scale

TLDR

The paper introduces a cepstral analysis-synthesis method on the mel frequency scale, using the mel log spectrum to represent the spectral envelope of speech. The method employs a mel log spectrum approximation (MLSA) filter whose coefficients are obtained by a simple linear transform of the mel cepstrum, that is, the Fourier cosine coefficients of the mel log spectral envelope. The MLSA filter shows low coefficient sensitivity, good coefficient quantization characteristics, and small spectral distortion when filter parameters are interpolated between successive frames, so speech of the same quality is synthesized at 60-70 % of the data rate of a conventional cepstral or LPC vocoder.

Abstract

This paper presents a new technique of cepstral analysis synthesis on the mel frequency scale. The log spectrum on the mel frequency scale (the mel log spectrum) is considered to be an effective representation of the spectral envelope of speech. The analysis-synthesis system uses the mel log spectrum approximation (MLSA) filter, which was devised for cepstral synthesis on the mel frequency scale. The filter coefficients are easily obtained through a simple linear transform from the mel cepstrum, defined as the Fourier cosine coefficients of the mel log spectral envelope of speech. The MLSA filter has low coefficient sensitivity and good coefficient quantization characteristics. The spectral distortion caused by interpolating the filter parameters of two successive frames is small. Accordingly, the data rate of this system is very low: speech of the same quality is synthesized at 60-70 % of the data rate of the conventional cepstral vocoder or the LPC vocoder.
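
The definition of the mel cepstrum as Fourier cosine coefficients of the mel log spectral envelope can be illustrated with a minimal sketch. The abstract does not say how the mel scale is realized, so the sketch assumes a common approximation: the phase response of a first-order all-pass filter with warping parameter alpha. The function names, the value alpha = 0.35, and the midpoint-rule integration are illustrative assumptions, not details taken from the paper.

```python
import math


def allpass_warp(omega, alpha):
    """Map a frequency in [0, pi] through the phase response of a first-order
    all-pass filter. ASSUMPTION: this stands in for the mel scale; the
    abstract does not specify the warping function."""
    return omega + 2.0 * math.atan(
        alpha * math.sin(omega) / (1.0 - alpha * math.cos(omega))
    )


def mel_cepstrum(log_envelope, order, alpha=0.35, n_points=512):
    """Cosine coefficients c[0..order] of a log spectral envelope on the
    warped axis, using the convention
        L(w) = c[0] + 2 * sum_{m>=1} c[m] * cos(m * w).

    log_envelope: callable giving the log envelope at a linear frequency
                  in [0, pi] (hypothetical input for this sketch).
    alpha:        warping parameter (illustrative value, an assumption).
    """
    # Uniform midpoint samples on the warped (mel-like) axis.
    mel_w = [(k + 0.5) * math.pi / n_points for k in range(n_points)]
    # Warping with -alpha inverts warping with +alpha, which gives the
    # linear frequency corresponding to each warped-axis sample.
    samples = [log_envelope(allpass_warp(w, -alpha)) for w in mel_w]
    # Midpoint-rule evaluation of (1/pi) * integral of L(w) cos(m w) over [0, pi].
    return [
        sum(s * math.cos(m * w) for s, w in zip(samples, mel_w)) / n_points
        for m in range(order + 1)
    ]


# Usage: build an envelope from known coefficients and recover them.
known = [0.5, 0.3, -0.1, 0.05]


def example_envelope(omega, alpha=0.35):
    w = allpass_warp(omega, alpha)
    return known[0] + 2.0 * sum(
        known[m] * math.cos(m * w) for m in range(1, len(known))
    )


recovered = mel_cepstrum(example_envelope, order=3)
```

In this sketch the envelope is sampled uniformly on the warped axis, so low frequencies are sampled more densely on the linear axis, which is the resolution trade-off a mel-scale representation is meant to provide.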
