Investigating the use of recurrent motion modelling for speech gesture generation

Abstract

The growing use of virtual humans demands generating increasingly realistic behavior for them while minimizing cost and time. Gestures are a key ingredient for realistic and engaging virtual agents and consequently automatized gesture generation has been a popular area of research. So far, good gesture generation has relied on explicit formulation of if-then rules and probabilistic modelling of annotated features. Machine learning approaches have yielded only marginal success, indicating a high complexity of the speech-to-motion learning task. In this work, we explore the use of transfer learning using previous motion modelling research to improve learning outcomes for gesture generation from speech. We use a recurrent network with an encoder-decoder structure that takes in prosodic speech features and generates a short sequence of gesture motion. We pre-train the network with a motion modelling task. We recorded a large multimodal database of conversational speech for the purpose of this work.

References

Page 1

	Year	Citations

Page 1