View invariant human action recognition using histograms of 3D joints

TLDR

The paper proposes a human action recognition method that uses histograms of 3D joint locations (HOJ3D) as a compact posture representation. The method extracts 3D joint positions from Kinect depth maps, builds HOJ3D descriptors, projects them with LDA, clusters them into k posture visual words, and models the temporal evolution of those words with discrete hidden Markov models (HMMs). It is evaluated on the authors' dataset of 200 3D sequences of 10 indoor activities performed by 10 individuals in varied views, where it runs in real time and shows significant view invariance. On the MSR Action 3D dataset it outperforms Li et al. [25] in most cases.

Abstract

In this paper, we present a novel approach for human action recognition with histograms of 3D joint locations (HOJ3D) as a compact representation of postures. We extract the 3D skeletal joint locations from Kinect depth maps using Shotton et al.'s method [6]. The HOJ3D descriptors computed from the action depth sequences are reprojected using LDA and then clustered into k posture visual words, which represent the prototypical poses of actions. The temporal evolution of those visual words is modeled by discrete hidden Markov models (HMMs). In addition, due to the design of our spherical coordinate system and the robust 3D skeleton estimation from Kinect, our method demonstrates significant view invariance on our 3D action dataset. Our dataset is composed of 200 3D sequences of 10 indoor activities performed by 10 individuals in varied views. Our method is real-time and achieves superior results on the challenging 3D action dataset. We also tested our algorithm on the MSR Action 3D dataset, where it outperforms Li et al. [25] in most cases.
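
To make the descriptor step concrete, the following is a minimal sketch of a histogram of 3D joint directions in a body-centered spherical coordinate system. It is not the authors' implementation. The function name `hoj3d`, the 12 x 7 bin layout, the choice of z as the vertical axis, the use of a left-to-right reference vector to fix zero azimuth, and the hard (one vote per joint) binning are all assumptions made for illustration, since the abstract does not specify them.

```python
import numpy as np

def hoj3d(joints, center, ref_left, ref_right, n_azimuth=12, n_inclination=7):
    """Hard-binned histogram of joint directions in a body-centered spherical frame.

    joints:     (J, 3) array of 3D joint positions; z is assumed to be vertical.
    center:     (3,) origin of the spherical frame (for example, a hip-center joint).
    ref_left,
    ref_right:  (3,) points whose ground-plane difference defines zero azimuth.
    Bin counts and hard voting are illustrative assumptions, not values from the paper.
    """
    rel = np.asarray(joints, dtype=float) - np.asarray(center, dtype=float)

    # Reference direction projected onto the ground (x-y) plane.
    ref = np.asarray(ref_right, dtype=float) - np.asarray(ref_left, dtype=float)
    ref_angle = np.arctan2(ref[1], ref[0])

    # Drop joints that coincide with the origin; they have no direction.
    radius = np.linalg.norm(rel, axis=1)
    keep = radius > 1e-9
    rel, radius = rel[keep], radius[keep]

    # Azimuth relative to the reference direction, wrapped to [0, 2*pi).
    azimuth = (np.arctan2(rel[:, 1], rel[:, 0]) - ref_angle) % (2.0 * np.pi)
    # Inclination measured from the vertical axis, in [0, pi].
    inclination = np.arccos(np.clip(rel[:, 2] / radius, -1.0, 1.0))

    # Map angles to bin indices, clamping the upper edge into the last bin.
    a_bin = np.minimum((azimuth / (2.0 * np.pi) * n_azimuth).astype(int), n_azimuth - 1)
    i_bin = np.minimum((inclination / np.pi * n_inclination).astype(int), n_inclination - 1)

    # Each joint casts one vote; np.add.at accumulates repeated indices correctly.
    hist = np.zeros((n_azimuth, n_inclination))
    np.add.at(hist, (a_bin, i_bin), 1.0)

    # Normalize so the descriptor does not depend on the number of joints kept.
    return hist.ravel() / max(hist.sum(), 1.0)
```

Because azimuth is measured relative to a direction taken from the body itself, rotating the whole skeleton about the vertical axis leaves the histogram unchanged in this sketch, which is the intuition behind the view invariance claimed in the abstract. In the pipeline described above, one such vector per frame would then be projected with LDA, quantized into k posture words, and the resulting word sequences scored by per-action discrete HMMs.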
