Expressive speech-driven facial animation

TLDR

Speech-driven facial motion synthesis is a well-explored research area, yet expressive visual behavior during speech remains largely unmodeled. This study proposes a machine-learning approach that leverages a database of high-fidelity speech-related facial motions to address this gap. A generative model is derived from the training set that incorporates emotion control while preserving accurate lip-synching, with emotional content either manually specified or automatically extracted from audio via an SVM classifier.

Abstract

Speech-driven facial motion synthesis is a well-explored research topic. However, little has been done to model expressive visual behavior during speech. We address this issue using a machine learning approach that relies on a database of speech-related high-fidelity facial motions. From this training set, we derive a generative model of expressive facial motion that incorporates emotion control while maintaining accurate lip-synching. The emotional content of the input speech can be manually specified by the user or automatically extracted from the audio signal using a Support Vector Machine classifier.
