Publication | Closed Access

Modeling mutual context of object and human pose in human-object interaction activities

Bangpeng Yao, Li Fei-Fei

Year: 2010
Citations: 611
References: 26
Venue: Unknown

TLDR

Detecting objects in cluttered scenes and estimating articulated human body parts are challenging, especially in human‑object interaction activities where objects are small or partially visible and body parts are self‑occluded. The paper proposes a random field model that encodes mutual context between objects and human poses in human‑object interaction activities. The model is learned by casting it as a structure‑learning problem, estimating connectivity between object, overall pose, and body parts via structure search, and fitting parameters with a new max‑margin algorithm. The mutual context model improves recognition, as objects and poses aid each other, and it significantly outperforms state‑of‑the‑art on a sports dataset of six human‑object interaction classes.

Abstract

Detecting objects in cluttered scenes and estimating articulated human body parts are two challenging problems in computer vision. The difficulty is particularly pronounced in activities involving human-object interactions (e.g., playing tennis), where the relevant object tends to be small or only partially visible and the human body parts are often self-occluded. We observe, however, that objects and human poses can serve as mutual context to each other: recognizing one facilitates the recognition of the other. In this paper we propose a new random field model to encode the mutual context of objects and human poses in human-object interaction activities. We cast model learning as a structure learning problem, in which the structural connectivity between the object, the overall human pose, and the different body parts is estimated through a structure search approach, and the model parameters are estimated by a new max-margin algorithm. On a sports data set containing six classes of human-object interactions, we show that our mutual context model significantly outperforms state-of-the-art methods in detecting very difficult objects and human poses.
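The core idea of the abstract, that object and pose evidence reinforce each other through a pairwise compatibility term, can be sketched with a toy joint-scoring example. This is not the paper's actual random field model or max-margin learning algorithm; the class labels, scores, and compatibility values below are invented purely for illustration.

```python
import itertools

# Hypothetical, made-up scores: a weak object detector and an ambiguous
# pose estimator, plus a pairwise "mutual context" compatibility term.
object_scores = {"tennis_racket": 0.30, "croquet_mallet": 0.40}
pose_scores = {"tennis_serve": 0.45, "croquet_shot": 0.40}
compatibility = {
    ("tennis_racket", "tennis_serve"): 0.50,
    ("croquet_mallet", "croquet_shot"): 0.20,
}

def joint_score(obj, pose):
    # Unary object and pose scores plus the pairwise context term,
    # analogous in spirit to a potential in a random field model.
    return object_scores[obj] + pose_scores[pose] + compatibility.get((obj, pose), 0.0)

# Independently, the detector would pick "croquet_mallet" (0.40 > 0.30);
# joint inference with the context term flips the decision to tennis.
best = max(itertools.product(object_scores, pose_scores),
           key=lambda pair: joint_score(*pair))
```

In this toy setup, the object detector alone prefers the wrong class, but the pose evidence and the compatibility term pull the joint decision to the consistent (tennis_racket, tennis_serve) pair, which is the qualitative behavior the mutual context model exploits.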
