Publication | Closed Access
Swin-Pose: Swin Transformer Based Human Pose Estimation
Citations: 30
References: 15
Year: 2022
Unknown Venue
Convolutional Neural Network, Engineering, Machine Learning, Human Pose Estimation, 3D Pose Estimation, Biometrics, Kinesiology, Image Analysis, Motion Capture, Pattern Recognition, Robot Learning, Video Transformer, Machine Vision, Feature Learning, Object Detection, Transformer Architecture, Deep Learning, Computer Vision, Convolutional Neural Networks
Convolutional neural networks (CNNs) have been widely used in computer vision tasks. However, CNNs have a fixed receptive field and lack the capacity for long-range perception, which is crucial to human pose estimation. The transformer architecture has recently been adopted in computer vision applications and has proven highly effective. We explore its capability in human pose estimation and propose a novel transformer-based model enhanced with a feature pyramid fusion structure. More specifically, we use a pre-trained Swin Transformer to extract features and leverage a feature pyramid structure to extract and fuse feature maps from different stages. Our experimental results demonstrate that the proposed transformer-based model achieves better performance than state-of-the-art CNN-based models.
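The abstract does not give implementation details, so the following is only a minimal sketch of the two ideas it names: a hierarchical backbone that yields feature maps at several stages, and a top-down pyramid that fuses them. It assumes the standard Swin layout (patch size 4, resolution halved and channels doubled at each stage) with an embedding width of 96 and a 256×192 input. None of these values are stated in the source. The function names are hypothetical, the maps are single-channel lists for readability, and the learned lateral and output convolutions of a real feature pyramid are omitted.

```python
from typing import List, Tuple

Grid = List[List[float]]  # one single-channel feature map: rows x cols


def swin_stage_shapes(height: int, width: int, embed_dim: int = 96,
                      patch_size: int = 4,
                      num_stages: int = 4) -> List[Tuple[int, int, int]]:
    """Return (channels, height, width) per stage of a Swin-style hierarchy.

    Assumption: patch embedding downsamples by `patch_size`; every later
    stage halves the resolution and doubles the channel count.
    """
    shapes = []
    for stage in range(num_stages):
        stride = patch_size * (2 ** stage)
        channels = embed_dim * (2 ** stage)
        shapes.append((channels, height // stride, width // stride))
    return shapes


def upsample_nearest(grid: Grid, factor: int = 2) -> Grid:
    """Nearest-neighbour upsampling: repeat each value `factor` times per axis."""
    return [[value for value in row for _ in range(factor)]
            for row in grid for _ in range(factor)]


def add_maps(a: Grid, b: Grid) -> Grid:
    """Element-wise sum of two maps with identical spatial size."""
    if len(a) != len(b) or len(a[0]) != len(b[0]):
        raise ValueError("feature maps must have the same spatial size")
    return [[x + y for x, y in zip(row_a, row_b)] for row_a, row_b in zip(a, b)]


def top_down_fuse(stage_maps: List[Grid]) -> Grid:
    """Fuse maps ordered fine-to-coarse into one map at the finest resolution.

    Start from the coarsest map, then repeatedly upsample by 2 and add the
    next finer map (the top-down pathway of a feature pyramid, without the
    learned convolutions).
    """
    fused = stage_maps[-1]
    for finer in reversed(stage_maps[:-1]):
        fused = add_maps(finer, upsample_nearest(fused))
    return fused
```

Under these assumptions, `swin_stage_shapes(256, 192)` gives maps at strides 4, 8, 16 and 32, and `top_down_fuse` merges them back to the stride-4 resolution, which is the scale at which a pose heatmap would typically be predicted.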