Publication | Closed Access
Learning Actionlet Ensemble for 3D Human Action Recognition
Citations: 523
References: 35
Year: 2013
Engineering, Machine Learning, Human Pose Estimation, 3D Pose Estimation, Human-object Interaction, Image Analysis, Kinesiology, Data Science, Motion Capture, Pattern Recognition, Robot Learning, Human Motion, Kinematics, Human Action Recognition, Human Actions, Health Sciences, Machine Vision, Computer Science, Video Understanding, Deep Learning, Computer Vision, Motion Capture System, Actionlet Ensemble, Human Movement, Activity Recognition
Human action recognition is challenging because actions involve complex human-object interactions, articulated motions, intra-class variation, and temporal structure. Recent depth sensors provide 3D depth data that facilitate motion capture and the modeling of these interactions. This work proposes an actionlet ensemble model that represents interactions among a subset of human joints to characterize actions. The model is robust to noise, invariant to translational and temporal misalignment, and captures both human motion and human-object interactions. It was evaluated on three Kinect-based datasets, a multiview Kinect dataset, and a motion-capture dataset. Experiments demonstrate that the proposed approach outperforms state-of-the-art algorithms.
Human action recognition is an important yet challenging task. Human actions usually involve human-object interactions, highly articulated motions, high intra-class variations, and complicated temporal structures. Recently developed commodity depth sensors open up new possibilities for dealing with this problem by providing 3D depth data of the scene. This information not only facilitates a rather powerful human motion capturing technique, but also makes it possible to efficiently model human-object interactions and intra-class variations. In this paper, we propose to characterize human actions with a novel actionlet ensemble model, which represents the interaction of a subset of human joints. The proposed model is robust to noise, invariant to translational and temporal misalignment, and capable of characterizing both the human motion and the human-object interactions. We evaluate the proposed approach on three challenging action recognition datasets captured by Kinect devices, a multiview action recognition dataset captured with a Kinect device, and a dataset captured by a motion capture system. The experimental evaluations show that the proposed approach outperforms state-of-the-art algorithms.
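The abstract does not spell out how joint interactions are encoded. As a purely illustrative sketch, one way a feature built from a subset of joints can be invariant to translation is to use differences between joint positions. The function names, joint names, and coordinates below are hypothetical and are not taken from the paper.

```python
from itertools import combinations


def pairwise_relative_positions(joints, subset):
    """Return p_a - p_b for every pair of joints in `subset`.

    joints: dict mapping a joint name to an (x, y, z) tuple for one frame.
    subset: iterable of joint names (a hypothetical "actionlet").
    """
    feats = {}
    for a, b in combinations(sorted(subset), 2):
        pa, pb = joints[a], joints[b]
        feats[(a, b)] = tuple(x - y for x, y in zip(pa, pb))
    return feats


def translate(joints, offset):
    """Shift every joint by the same (dx, dy, dz) offset."""
    return {
        name: tuple(c + o for c, o in zip(p, offset))
        for name, p in joints.items()
    }


# Made-up skeleton frame (assumed values, for illustration only).
frame = {
    "head": (0.0, 1.5, 2.0),
    "left_hand": (-0.5, 1.0, 2.0),
    "right_hand": (0.5, 1.0, 1.5),
}
actionlet = ["head", "right_hand"]  # hypothetical joint subset

original = pairwise_relative_positions(frame, actionlet)
shifted = pairwise_relative_positions(
    translate(frame, (1.0, 0.0, -0.5)), actionlet
)

# Moving the whole body leaves the relative features unchanged.
print(original == shifted)
```

Because every joint is shifted by the same offset, the offset cancels in each difference, so the features depend only on the body's configuration and not on where it stands in the scene.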