Instructing people for training gestural interactive systems

TLDR

Touchless, body-movement interfaces such as the Wii and Kinect can estimate body part movements from inertial or depth sensor data, yet developers must still map these movements to meaningful gestures, a task that depends on high-quality, varied gesture data sets. This study addresses how to collect such data sets. We examine which semiotic modality of instructions best conveys the desired movements to the human subjects who perform them. Our qualitative and quantitative analysis shows that the choice of instruction modality significantly affects the performance of the learnt gesture recognition system, particularly its correctness and coverage.

Abstract

Entertainment and gaming systems such as the Wii and Xbox Kinect have brought touchless, body-movement based interfaces to the masses. Systems like these enable the estimation of movements of various body parts from raw inertial motion or depth sensor data. However, the interface developer is still left with the challenging task of creating a system that recognizes these movements as embodying meaning. The machine learning approach for tackling this problem requires the collection of data sets that contain the relevant body movements and their associated semantic labels. These data sets directly impact the accuracy and performance of the gesture recognition system and should ideally contain all natural variations of the movements associated with a gesture. This paper addresses the problem of collecting such gesture data sets. In particular, we investigate which semiotic modality of instructions is most appropriate for conveying to human subjects the movements the system developer needs them to perform. The results of our qualitative and quantitative analysis indicate that the choice of modality has a significant impact on the performance of the learnt gesture recognition system, particularly in terms of correctness and coverage.
