Abstract

We explore a new spectral representation of speech signals based on group delay functions. Raw group delay functions are noisy and difficult to interpret because zeros that lie close to the unit circle in the z-domain clutter the spectra. We therefore use a modified group delay function (Yegnanarayana, B. and Murthy, H.A., IEEE Trans. Sig. Processing, vol. 40, pp. 2281-9, 1992) that reduces the effects of these zeros. Assuming that this new function is minimum phase, the modified group delay spectrum is converted to a sequence of cepstral coefficients. A preliminary phoneme recogniser is built using features derived from these cepstra, and its results are compared with those obtained from features derived from traditional mel frequency cepstral coefficients (MFCC). The baseline MFCC performance is 34.7%, while that of the best modified group delay cepstrum is 39.2%. The performance of the composite MFCC feature, which includes the derivatives and double derivatives, is 60.7%, while that of the composite modified group delay feature is 57.3%. Combining these two composite features yields an improvement of about 2% (62.8%). When this combined system is further combined with linear frequency cepstra (LFC) (Gadde, V.R.R. et al., The SRI SPINE 2001 Evaluation System. http://elazar.itd.nrl.navy.mil/spine/sri2/presentation/sri2001.html, 2001), performance improves by a further 0.8% or so (63.6%).
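The processing chain described above (group delay, a modified group delay that suppresses zeros near the unit circle, then cepstral coefficients) can be sketched roughly as follows. This is a minimal illustration and not the authors' implementation. It assumes the standard identity tau(w) = (X_R Y_R + X_I Y_I) / |X|^2, where X is the DFT of x[n] and Y is the DFT of n·x[n]. It also assumes one commonly described form of the modification, in which |X|^2 is replaced by a cepstrally smoothed spectrum raised to the power 2·gamma and the result is compressed by an exponent alpha. The function names, the values of `alpha`, `gamma`, `lifter` and `n_coeffs`, and the use of a plain DCT-II for the cepstral step are all hypothetical placeholders that do not come from the abstract.

```python
import numpy as np

EPS = 1e-12  # guards against division by zero and log(0)


def _spectra(frame, n_fft):
    """Return X = DFT{x[n]} and Y = DFT{n * x[n]} for one frame."""
    frame = np.asarray(frame, dtype=float)
    n = np.arange(len(frame))
    return np.fft.rfft(frame, n_fft), np.fft.rfft(n * frame, n_fft)


def group_delay(frame, n_fft=512):
    """Plain group delay: (X_R*Y_R + X_I*Y_I) / |X|^2."""
    X, Y = _spectra(frame, n_fft)
    return (X.real * Y.real + X.imag * Y.imag) / (np.abs(X) ** 2 + EPS)


def modified_group_delay(frame, n_fft=512, alpha=0.4, gamma=0.9, lifter=8):
    """Assumed form of the modification: the denominator |X|^2 is replaced
    by a cepstrally smoothed spectrum S^(2*gamma), and the result is
    compressed with exponent alpha. Parameter values are illustrative."""
    X, Y = _spectra(frame, n_fft)
    full = np.fft.fft(np.asarray(frame, dtype=float), n_fft)
    # Real cepstrum of the log magnitude spectrum.
    cep = np.fft.ifft(np.log(np.abs(full) + EPS)).real
    # Keep only low quefrencies (and their mirror image) to smooth.
    cep[lifter:n_fft - lifter + 1] = 0.0
    smooth = np.exp(np.fft.fft(cep).real)[: n_fft // 2 + 1]
    tau = (X.real * Y.real + X.imag * Y.imag) / (smooth ** (2 * gamma) + EPS)
    return np.sign(tau) * np.abs(tau) ** alpha


def cepstra(spectrum, n_coeffs=13):
    """Unnormalised DCT-II of a spectrum, keeping the first n_coeffs terms."""
    spectrum = np.asarray(spectrum, dtype=float)
    N = len(spectrum)
    k = np.arange(n_coeffs)[:, None]
    n = np.arange(N)[None, :]
    return np.cos(np.pi * k * (n + 0.5) / N) @ spectrum


if __name__ == "__main__":
    # A unit impulse delayed by 5 samples has a constant group delay of 5.
    x = np.zeros(64)
    x[5] = 1.0
    print(group_delay(x)[:4])
    print(cepstra(modified_group_delay(x))[:4])
```

A delayed unit impulse is a convenient sanity check because its group delay is the same at every frequency. For real speech, each windowed frame would be passed through `modified_group_delay` and then `cepstra`, with the parameters tuned on held-out data.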
