Publication | Closed Access
Modeling mutual context of object and human pose in human-object interaction activities
Citations: 611 · References: 26 · Year: 2010
Venue: CVPR 2010
Keywords: Human Pose, Human Body Parts, Engineering, Machine Learning, Human Pose Estimation, 3D Pose Estimation, Human Modelling, Human-object Interaction, Image Analysis, Kinesiology, Data Science, Motion Capture, Pattern Recognition, Robot Learning, Mutual Context, Human-object Interaction Activities, Health Sciences, Machine Vision, Cluttered Scenes, Computer Science, Deep Learning, Computer Vision, Object Recognition, Scene Understanding, Human-computer Interaction, Activity Recognition
TL;DR: Detecting objects in cluttered scenes and estimating articulated human body parts are challenging, especially in human-object interaction activities, where objects are small or partially visible and body parts are self-occluded. The paper proposes a random field model that encodes the mutual context between objects and human poses in such activities. The model is learned by casting it as a structure-learning problem: connectivity between the object, the overall pose, and individual body parts is estimated via structure search, and parameters are fit with a new max-margin algorithm. Because objects and poses aid each other's recognition, the mutual context model significantly outperforms the state of the art on a sports dataset of six human-object interaction classes.
Abstract: Detecting objects in cluttered scenes and estimating articulated human body parts are two challenging problems in computer vision. The difficulty is particularly pronounced in activities involving human-object interactions (e.g. playing tennis), where the relevant object tends to be small or only partially visible, and the human body parts are often self-occluded. We observe, however, that objects and human poses can serve as mutual context to each other: recognizing one facilitates the recognition of the other. In this paper we propose a new random field model to encode the mutual context of objects and human poses in human-object interaction activities. We then cast the model learning task as a structure learning problem, in which the structural connectivity between the object, the overall human pose, and different body parts is estimated through a structure search approach, and the parameters of the model are estimated by a new max-margin algorithm. On a sports dataset of six classes of human-object interactions, we show that our mutual context model significantly outperforms state-of-the-art methods in detecting very difficult objects and human poses.
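The core idea of mutual context can be illustrated with a toy sketch: score each (object, pose) hypothesis by its two standalone detector confidences plus a learned pairwise compatibility term, then run joint inference over all pairs. The class names, confidence values, and compatibility weights below are hypothetical placeholders, not the paper's actual model (whose potentials also cover individual body parts and whose weights come from max-margin structure learning).

```python
# Toy sketch of joint object-pose inference with a mutual-context term.
# All detections, scores, and weights here are hypothetical illustrations.
import itertools

# Hypothetical standalone detector confidences (unary potentials).
object_candidates = {"tennis_racket": 0.4, "croquet_mallet": 0.2}
pose_candidates = {"tennis_serve": 0.3, "croquet_swing": 0.4}

# Hypothetical pairwise compatibility weights between object and pose
# classes; in the paper such weights are learned by a max-margin algorithm.
compatibility = {
    ("tennis_racket", "tennis_serve"): 1.0,
    ("croquet_mallet", "croquet_swing"): 1.0,
    ("tennis_racket", "croquet_swing"): -0.5,
    ("croquet_mallet", "tennis_serve"): -0.5,
}

def joint_score(obj, pose):
    """Unary object score + unary pose score + pairwise mutual-context term."""
    return object_candidates[obj] + pose_candidates[pose] + compatibility[(obj, pose)]

# Joint MAP inference: enumerate all (object, pose) pairs, keep the best.
best = max(itertools.product(object_candidates, pose_candidates),
           key=lambda pair: joint_score(*pair))
print(best)  # -> ('tennis_racket', 'tennis_serve')
```

Note that the pose detector alone would prefer `croquet_swing` (0.4 > 0.3), but the compatibility term lets the confidently detected racket pull the pose estimate toward `tennis_serve`, which is the mutual-context effect the abstract describes.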