Multi-modal person verification tools using speech and images

Abstract

We propose multi-modal person verification using voice and images as a solution to the secured access problem. The necessary I/O devices are now standard and cheaply available and, most importantly, voice and vision are the two most important human communication modalities. The visual part currently involves i) matching of a coarse grid containing Gabor phase information from face images, ii) facial feature localization and extraction, and iii) 3D biometric feature extraction by structured light. In the acoustic part, LPC coefficients are extracted from the speech signal, and three classifiers (DTW, SOSM and HMM) are used in parallel to compare the resulting voice references. The global decision is taken by applying a Furui threshold to each individual method and combining the individual results according to a majority law.

1 Introduction

In tele-services and tele-shopping applications, the usage scenarios are such that a large number of potential services ...