UTD-MHAD: A multimodal dataset for human action recognition utilizing a depth camera and a wearable inertial sensor

TLDR

Human action recognition, used in biometrics, surveillance, and human-computer interaction, increasingly relies on multimodal sensors, yet few public datasets capture depth camera and inertial sensor data simultaneously. The paper introduces UTD-MHAD, a freely available dataset of four temporally synchronized modalities for human action recognition. The dataset contains RGB videos, depth videos, skeleton positions, and inertial signals from a Kinect camera and a wearable inertial sensor for 27 actions, and the authors provide experimental results showing how it can be used to study fusion approaches. The dataset is intended to support multimodal human action recognition research across research groups.

Abstract

Human action recognition has a wide range of applications including biometrics, surveillance, and human-computer interaction. The use of multimodal sensors for human action recognition is steadily increasing. However, there are limited publicly available datasets where depth camera and inertial sensor data are captured at the same time. This paper describes a freely available dataset, named UTD-MHAD, which consists of four temporally synchronized data modalities. These modalities include RGB videos, depth videos, skeleton positions, and inertial signals from a Kinect camera and a wearable inertial sensor for a comprehensive set of 27 human actions. Experimental results are provided to show how this database can be used to study fusion approaches that involve using both depth camera data and inertial sensor data. This public domain dataset is of benefit to multimodality research activities being conducted for human action recognition by various research groups.
