M-vectors: Sub-band Based Energy Modulation Features for Multi-stream Automatic Speech Recognition

Abstract

In this paper, we propose a novel method for capturing the energy modulations of different frequency bands in speech as frame-level feature vectors, called Modulation-vectors or M-vectors, for use in Automatic Speech Recognition (ASR) systems. We show that, in different multi-stream setups with parallel streams for M-vectors and the popular Mel-frequency Cepstral Coefficient (MFCC) features, word recognition performance improves by ≈ 5% for end-to-end systems, and by ≈ 18% and ≈ 16% for monophone and triphone HMM-GMM ASR systems, respectively, over using traditional MFCC features.
