POINet - Concepedia

Abstract

Multi-person pose tracking aims to jointly estimate and track the keypoints of multiple people in unconstrained videos. The most popular solution to this task follows the tracking-by-detection strategy, which relies on human detection and data association. While human detection has been boosted by deep learning, existing works mainly realize data association through several separate stages with hand-crafted metrics, leading to great uncertainty and poor adaptation in complex scenes. To handle these problems, we propose an end-to-end pose-guided ovonic insight network (POINet) for data association in multi-person pose tracking, which jointly learns feature extraction, similarity estimation, and identity assignment. Specifically, we design a pose-guided representation network to integrate pose information into hierarchical convolutional features, generating a pose-aligned representation for each person, which helps handle partial occlusions. Moreover, we propose an ovonic insight network to adaptively encode the cross-frame identity transformation, which can cope with the difficult tracking cases of people leaving and entering the scene. Overall, the proposed POINet offers a new insight into realizing multi-person pose tracking in an end-to-end fashion. Extensive experiments conducted on the PoseTrack benchmark demonstrate that POINet outperforms state-of-the-art methods.
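For orientation, the kind of staged, hand-crafted data association that the abstract contrasts POINet against can be sketched roughly as follows. This is not the POINet method: the cosine metric, the greedy matching rule, the fixed `threshold`, and the function names `cosine` and `associate` are all illustrative assumptions, and the per-person feature vectors are presumed to come from some separate, unspecified extractor.

```python
import math


def cosine(a, b):
    """Cosine similarity between two feature vectors (hand-crafted metric, assumed)."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def associate(prev_tracks, detections, threshold=0.5):
    """Greedy cross-frame identity assignment (illustrative baseline only).

    prev_tracks: dict mapping integer track id -> feature vector (previous frame)
    detections:  list of feature vectors (current frame)
    Returns (ids, left): the identity given to each detection, and the ids of
    previous tracks that found no match (people who left the scene).
    """
    # Stage 1: similarity estimation for every (track, detection) pair.
    pairs = sorted(
        (
            (cosine(feat, det), tid, j)
            for tid, feat in prev_tracks.items()
            for j, det in enumerate(detections)
        ),
        reverse=True,
    )

    # Stage 2: identity assignment, most similar pairs first.
    assigned = {}  # detection index -> track id
    used = set()   # track ids already matched
    for sim, tid, j in pairs:
        if sim < threshold:
            break  # every remaining pair is even less similar
        if tid in used or j in assigned:
            continue
        assigned[j] = tid
        used.add(tid)

    # Unmatched detections are treated as people entering: give them new ids.
    next_id = max(prev_tracks, default=-1) + 1
    ids = []
    for j in range(len(detections)):
        if j in assigned:
            ids.append(assigned[j])
        else:
            ids.append(next_id)
            next_id += 1

    left = sorted(set(prev_tracks) - used)
    return ids, left
```

With `prev_tracks = {0: [1.0, 0.0], 1: [0.0, 1.0]}` and `detections = [[0.0, 1.0], [-1.0, 0.0]]`, the first detection inherits track 1, the second receives a fresh identity, and track 0 is reported as having left. The metric and the threshold here are fixed by hand and applied in separate stages, which is the limitation that motivates learning feature extraction, similarity estimation, and identity assignment jointly.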
