Robust Speech Recognition based on Multi-Stream Features

Abstract

In this paper, we discuss a new automatic speech recognition (ASR) approach based on the independent processing and recombination of several feature streams. In this framework, it is assumed that the speech signal is represented in terms of multiple input streams, each input stream representing a different characteristic of the signal. If the streams are entirely synchronous, they may be accommodated simply. However, as discussed in the paper, it may be required to permit some degree of asynchrony between streams, which are then forced to recombine at some temporal "anchor points" associated with some (pre-defined) speech unit levels. We start by introducing the basic framework of a statistical structure that can accommodate multiple observation streams. This approach was initially applied to the case of subband-based speech recognition and was shown to yield significantly better noise robustness. After having summarized these results, the multi-stream approach will be used to combine multiple time-scale features in ASR systems (in our case, to use syllable-level features in a phoneme-based HMM system).