Publication | Closed Access
GAN Based Indian Sign Language Synthesis
Citations: 10
References: 17
Year: 2021
Venue: Unknown
Controllable visual reproduction of sign language, termed Sign Language Synthesis (SLS), is a major and challenging task in sign language processing. Traditional methods have used computer animation to perform this task, but these face several limitations: animation usually requires expensive motion-capture equipment and intensive manual oversight to ensure accuracy. Recently, Generative Adversarial Networks (GANs) have shown promising results in pose and motion transfer, and this has been explored as an SLS method in Stoll et al. [22] and related work. However, that work requires manually annotated datasets, both a corpus of hand-picked "good" hand poses and a large corpus of continuous signing videos annotated with the sequence of signs appearing in them. Most sign languages face a dearth of data, especially annotated data, and this is the case for Indian Sign Language (ISL). In this paper, we present a method for overcoming this issue in the first GAN-based SLS model created specifically for ISL. We use a combination of separate generators for the hand and body to avoid needing hand-picked "good" hand images from training videos, and we further refine the output with another network to remove the artefacts that arise from combining separate GAN outputs. We also experiment with producing continuous sign language output without an annotated corpus, by stitching together individual signs obtained from a publicly available video lexicon of ISL. We show that our model performs competitively in these tasks, both in quantitative measures and in human perception tests.
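The pipeline the abstract describes, combining separately generated hand and body outputs and stitching individual lexicon signs into a continuous sequence, can be caricatured in a few lines. Everything below (function names, raw pixel-space compositing, a linear crossfade between clips) is a simplified assumption for illustration only; the paper's actual model uses learned generator and refinement networks, not these hand-written operations.

```python
import numpy as np

def composite(body_frame, hand_patch, hand_mask, top_left):
    """Paste a separately generated hand patch onto a body frame.

    Hypothetical stand-in for combining the outputs of separate hand
    and body generators; the real model removes seam artefacts with a
    refinement network rather than a simple masked paste.
    """
    out = body_frame.copy()
    y, x = top_left
    h, w = hand_patch.shape[:2]
    region = out[y:y + h, x:x + w]
    # Alpha-blend the hand patch into the body frame using the mask.
    out[y:y + h, x:x + w] = hand_mask * hand_patch + (1 - hand_mask) * region
    return out

def stitch(clips, overlap=4):
    """Concatenate per-sign clips with a linear crossfade over `overlap`
    frames: a crude placeholder for smoothing transitions between
    individual signs taken from a video lexicon.
    """
    result = list(clips[0])
    for clip in clips[1:]:
        # Blend the tail of the running video with the head of the next sign.
        for i in range(overlap):
            a = (i + 1) / (overlap + 1)
            result[-overlap + i] = (1 - a) * result[-overlap + i] + a * clip[i]
        result.extend(clip[overlap:])
    return np.stack(result)
```

A real system would operate on generator feature maps and use a trained network for the transition smoothing; the sketch only shows where those two composition steps sit in the pipeline.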