
Publication | Closed Access

Picture My Voice : Audio to Visual Speech Synthesis using Artificial Neural Networks

Citations: 79

References: 8

Year: 1999

Abstract

This paper presents an initial implementation and evaluation of a system that synthesizes visual speech directly from the acoustic waveform. An artificial neural network (ANN) was trained to map the cepstral coefficients of an individual's natural speech to the control parameters of an animated synthetic talking head. We trained on two data sets: one was a set of 400 words spoken in isolation by a single speaker, and the other was a subset of extemporaneous speech from 10 different speakers. The system showed learning in both cases. A perceptual evaluation test indicated that the system's generalization to new words by the same speaker provides significant visible information, though significantly less than that provided by a text-to-speech algorithm.

1. INTRODUCTION

People find it hard to communicate when the auditory conditions are poor, e.g. due to noise, limited bandwidth, or hearing impairment. Under such circumstances, face-to-face communication is preferable. The visual component of speech c...
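The mapping described in the abstract, from cepstral coefficients to the control parameters of a talking head, can be illustrated with a small feedforward network. The following is a minimal sketch under stated assumptions, not the paper's implementation: the layer sizes (13 inputs, 8 hidden units, 4 outputs), the tanh hidden layer, the squared-error loss, the learning rate, and the synthetic input and target values are all hypothetical and chosen only for illustration.

```python
import math
import random


def init_layer(n_in, n_out, rng):
    """Small random weights plus zero biases for one fully connected layer."""
    weights = [[rng.uniform(-0.1, 0.1) for _ in range(n_in)] for _ in range(n_out)]
    biases = [0.0] * n_out
    return weights, biases


def forward(x, hidden, output):
    """Map one input vector to (hidden activations, predicted control parameters)."""
    w1, b1 = hidden
    w2, b2 = output
    h = [math.tanh(sum(w * xi for w, xi in zip(row, x)) + b) for row, b in zip(w1, b1)]
    y = [sum(w * hi for w, hi in zip(row, h)) + b for row, b in zip(w2, b2)]
    return h, y


def train_step(x, target, hidden, output, lr=0.05):
    """One gradient step on squared error; returns the loss before the update."""
    w1, b1 = hidden
    w2, b2 = output
    h, y = forward(x, hidden, output)
    err = [yi - ti for yi, ti in zip(y, target)]
    loss = 0.5 * sum(e * e for e in err)
    # Backpropagate through the linear output layer into the tanh hidden layer
    # (computed before the output weights are modified).
    grad_h = [
        sum(err[k] * w2[k][j] for k in range(len(err))) * (1.0 - h[j] ** 2)
        for j in range(len(h))
    ]
    for k, e in enumerate(err):
        for j, hj in enumerate(h):
            w2[k][j] -= lr * e * hj
        b2[k] -= lr * e
    for j, g in enumerate(grad_h):
        for i, xi in enumerate(x):
            w1[j][i] -= lr * g * xi
        b1[j] -= lr * g
    return loss


rng = random.Random(0)
N_CEPSTRA, N_HIDDEN, N_CONTROLS = 13, 8, 4  # hypothetical sizes, not from the paper
hidden = init_layer(N_CEPSTRA, N_HIDDEN, rng)
output = init_layer(N_HIDDEN, N_CONTROLS, rng)

# Synthetic stand-ins: one frame of "cepstral coefficients" and one set of
# "control parameters". Real data would come from recorded speech and the
# talking head's parameter tracks.
frame = [rng.uniform(-1.0, 1.0) for _ in range(N_CEPSTRA)]
target = [0.2, 0.8, 0.5, 0.1]

losses = [train_step(frame, target, hidden, output) for _ in range(200)]
_, predicted = forward(frame, hidden, output)
```

Fitting a single synthetic frame only shows that the error decreases; it says nothing about generalization to new words or speakers, which is what the perceptual evaluation in the abstract addresses.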
