Word-dependent acoustic-labial weights in HMM-based speech recognition.

Abstract

This paper describes a novel approach for weighting the contributions of the acoustic and visual sources of information in a bimodal connected-speech recognition system. A separate acoustic-labial weight is attached to each recognition unit, and the values of the weighting vector are optimised to minimise the error rate on a learning set. Experiments are performed on a two-speaker audio-visual database of connected letters, with two different acoustic-labial speech recognition systems. For both speakers and both systems, optimising the weights increases the recognition rate of the bimodal system.

1 INTRODUCTION

In normal conditions, the acoustic signal contains more information about the oral message or the speaker's identity than the visual information about the lips. Nevertheless, these two sources of information are not redundant: taking labial features into account may improve speech processing systems [8, 7, 3, 15, 9]....