TLDR

Human activity recognition, made practical by inexpensive depth sensors such as Microsoft Kinect, has applications ranging from surveillance to human-computer interfaces and content-based video retrieval. The authors build a two-person interaction dataset with synchronized video, depth, and motion capture data, and use it to compare feature sets for real-time interaction detection with SVMs. For whole-sequence classification, they also apply Multiple Instance Learning (MIL), representing each sequence as a bag of body-pose features. Geometric relational features based on distances between all pairs of joints outperform the other features tested, and the MIL classifier outperforms SVMs when sequences extend in time around the interaction of interest.

Abstract

Human activity recognition has the potential to impact a wide range of applications, from surveillance to human-computer interfaces to content-based video retrieval. Recently, the rapid development of inexpensive depth sensors (e.g., Microsoft Kinect) has provided adequate accuracy for real-time full-body human tracking in activity recognition applications. In this paper, we create a complex human activity dataset depicting two-person interactions, including synchronized video, depth, and motion capture data. Moreover, we use our dataset to evaluate various features typically used for indexing and retrieval of motion capture data, in the context of real-time detection of interaction activities via Support Vector Machines (SVMs). Experimentally, we find that the geometric relational features based on the distance between all pairs of joints outperform other feature choices. For whole-sequence classification, we also explore techniques related to Multiple Instance Learning (MIL), in which the sequence is represented by a bag of body-pose features. We find that the MIL-based classifier outperforms SVMs when the sequences extend temporally around the interaction of interest.
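
The two ideas named in the abstract, a joint-distance feature and a bag-of-frames view of a sequence, can be illustrated with a minimal sketch. This is not the authors' implementation. The joint layout (a list of (x, y, z) tuples per person), the function names, and the max-over-frames bag score are assumptions made for illustration. The max rule reflects the standard MIL assumption that a bag is positive if at least one instance is positive; the abstract does not specify the paper's actual MIL classifier.

```python
import math
from itertools import combinations


def pairwise_joint_distances(person_a, person_b):
    """Euclidean distances between all pairs of joints in a single frame.

    person_a, person_b: lists of (x, y, z) joint positions (hypothetical layout).
    The joints of both people are pooled, so the result covers pairs within
    each person as well as pairs across the two people.
    """
    joints = list(person_a) + list(person_b)
    return [math.dist(p, q) for p, q in combinations(joints, 2)]


def bag_score(frame_scores):
    """Score a whole sequence (bag) by its highest-scoring frame (instance).

    Assumed rule: a sequence counts as positive if any frame in it does.
    """
    return max(frame_scores)


# Toy frame with two joints per person (real skeletons have many more).
a = [(0.0, 0.0, 0.0), (0.0, 1.0, 0.0)]
b = [(3.0, 0.0, 0.0), (3.0, 1.0, 0.0)]
features = pairwise_joint_distances(a, b)
print(len(features))  # 4 joints give 6 unordered pairs
```

With n joints in total, the feature vector has n(n-1)/2 entries per frame. In a full pipeline, a per-frame classifier such as an SVM would produce the frame scores that `bag_score` aggregates.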
