
Publication | Closed Access

SubUNets: End-to-End Hand Shape and Continuous Sign Language Recognition

Citations: 360

References: 35

Year: 2017

TLDR

The study proposes a novel deep learning sequence‑to‑sequence method that jointly aligns and recognizes sign language. The method decomposes the task into SubUNets, specialised sub-networks whose spatio‑temporal relationships are modelled while the whole system remains trainable end‑to‑end. This decomposition mimics human learning and educational techniques: it lets domain knowledge be injected as intermediate representations and enables implicit transfer learning between related tasks. Experiments show state‑of‑the‑art hand‑shape recognition, outperforming prior methods by over 30%, and sign‑recognition rates comparable to previous work without an explicit alignment step.

Abstract

We propose a novel deep learning approach to solve simultaneous alignment and recognition problems (referred to as "Sequence-to-sequence" learning). We decompose the problem into a series of specialised expert systems referred to as SubUNets. The spatio-temporal relationships between these SubUNets are then modelled to solve the task, while remaining trainable end-to-end. The approach mimics human learning and educational techniques, and has a number of significant advantages. SubUNets allow us to inject domain-specific expert knowledge into the system regarding suitable intermediate representations. They also allow us to implicitly perform transfer learning between different interrelated tasks, which also allows us to exploit a wider range of more varied data sources. In our experiments we demonstrate that each of these properties serves to significantly improve the performance of the overarching recognition system, by better constraining the learning problem. The proposed techniques are demonstrated in the challenging domain of sign language recognition. We demonstrate state-of-the-art performance on hand-shape recognition (outperforming previous techniques by more than 30%). Furthermore, we are able to obtain comparable sign recognition rates to previous research, without the need for an alignment step to segment out the signs for recognition.
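
The abstract does not spell out how recognition can proceed without a separate alignment step. As a rough illustration only, the sketch below shows one common way such a system can read out labels: greedy decoding of per-frame scores that include a reserved "blank" class, in the style of connectionist temporal classification. The function name `greedy_collapse`, the blank index, and the toy scores are all assumptions made for this example and are not taken from the paper.

```python
from itertools import groupby

BLANK = 0  # hypothetical index reserved for the "no label" symbol


def greedy_collapse(frame_scores, blank=BLANK):
    """Turn per-frame class scores into an unsegmented label sequence."""
    # Pick the highest-scoring class for every frame.
    best_per_frame = [
        max(range(len(scores)), key=scores.__getitem__)
        for scores in frame_scores
    ]
    # Merge consecutive repeats, then drop the blank symbol.
    merged = [label for label, _ in groupby(best_per_frame)]
    return [label for label in merged if label != blank]


# Six frames, three classes (0 = blank, 1 and 2 = hypothetical sign labels).
frame_scores = [
    [0.1, 0.8, 0.1],
    [0.2, 0.7, 0.1],
    [0.9, 0.05, 0.05],
    [0.1, 0.1, 0.8],
    [0.1, 0.2, 0.7],
    [0.8, 0.1, 0.1],
]
decoded = greedy_collapse(frame_scores)
print(decoded)
```

No frame-level boundaries are supplied here: the label sequence falls out of the per-frame predictions, which is the sense in which an explicit segmentation step can be avoided. A trained system would produce the scores with a network rather than a hand-written table.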
