Pose-guided Feature Disentangling for Occluded Person Re-identification Based on Transformer

Abstract

Occluded person re-identification is a challenging task because human body parts can be occluded by obstacles (e.g. trees, cars, and pedestrians) in certain scenes. Some existing pose-guided methods address this problem by aligning body parts through graph matching, but these graph-based methods are unintuitive and complicated. We therefore propose a transformer-based Pose-guided Feature Disentangling (PFD) method that uses pose information to clearly disentangle semantic components (e.g. human body or joint parts) and selectively match the corresponding non-occluded parts. First, a Vision Transformer (ViT) is used for its strong capability to extract patch features. Second, to preliminarily disentangle the pose information from the patch information, a matching and distributing mechanism is used in the Pose-guided Feature Aggregation (PFA) module. Third, a set of learnable semantic views is introduced in the transformer decoder to implicitly enhance the disentangled body part features. However, these semantic views are not guaranteed to be related to the body without additional supervision. A Pose-View Matching (PVM) module is therefore proposed to explicitly match visible body parts and automatically separate occlusion features. Fourth, to better prevent interference from occlusions, we design a Pose-guided Push Loss that emphasizes the features of visible body parts. Extensive experiments on five challenging datasets for two tasks (occluded and holistic Re-ID) demonstrate that the proposed PFD performs favorably against state-of-the-art methods. Code is available at https://github.com/WangTaoAs/PFD_Net
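The idea behind matching learnable views to visible body parts can be illustrated with a small sketch. This is not the released PFD implementation: the abstract does not specify the similarity measure or how visibility is decided, so the sketch assumes cosine similarity and a keypoint-confidence threshold. The names `match_views_to_parts`, `part_conf`, and `conf_threshold` are hypothetical.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def match_views_to_parts(view_feats, part_feats, part_conf, conf_threshold=0.2):
    """Assign each view to its most similar pose-guided part feature.

    Assumption (not from the paper): a view whose best-matching part has a
    keypoint confidence below `conf_threshold` is treated as occlusion.
    Returns two lists of (view_index, part_index) pairs: visible and occluded.
    """
    visible, occluded = [], []
    for i, view in enumerate(view_feats):
        sims = [cosine(view, part) for part in part_feats]
        best = max(range(len(sims)), key=sims.__getitem__)
        if part_conf[best] >= conf_threshold:
            visible.append((i, best))
        else:
            occluded.append((i, best))
    return visible, occluded


# Toy example: three views, two body parts; part 1 has low pose confidence.
views = [[1.0, 0.0], [0.0, 1.0], [0.9, 0.1]]
parts = [[1.0, 0.0], [0.0, 1.0]]
conf = [0.9, 0.05]
visible, occluded = match_views_to_parts(views, parts, conf)
print(visible)   # [(0, 0), (2, 0)]
print(occluded)  # [(1, 1)]
```

In this toy setting only the views matched to the confidently detected part would contribute to retrieval, which mirrors the stated goal of selectively matching non-occluded parts.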