Publication | Open Access
UniPose: Unified Human Pose Estimation in Single Images and Videos
Citations: 10
References: 42
Year: 2020
Unknown Venue
Engineering, Machine Learning, Human Pose Estimation, Action Recognition (Movement Science), 3D Pose Estimation, Biometrics, Action Recognition (Computer Vision), Video Interpretation, Image Analysis, Motion Capture, Pattern Recognition, Human Motion, Health Sciences, Machine Vision, Temporal Pose Estimation, Video Understanding, Deep Learning, Pose Estimation, Computer Vision, Scene Understanding
The authors propose UniPose, a unified framework for human pose estimation that achieves state-of-the-art results on several metrics. UniPose is built on a Waterfall Atrous Spatial Pooling architecture and combines contextual segmentation with joint localization in a single stage. Its Waterfall module applies progressive filtering in a cascade while keeping multi-scale fields of view, and the method extends to UniPose-LSTM for temporal pose estimation. Experiments on multiple datasets show that UniPose, with a ResNet backbone and the Waterfall module, achieves state-of-the-art performance for single-person pose detection in both images and videos.
We propose UniPose, a unified framework for human pose estimation, based on our “Waterfall” Atrous Spatial Pooling architecture, that achieves state-of-the-art results on several pose estimation metrics. UniPose incorporates contextual segmentation and joint localization to estimate the human pose in a single stage, with high accuracy, without relying on statistical postprocessing methods. The Waterfall module in UniPose leverages the efficiency of progressive filtering in the cascade architecture, while maintaining multi-scale fields-of-view comparable to spatial pyramid configurations. Additionally, our method is extended to UniPose-LSTM for multi-frame processing and achieves state-of-the-art results for temporal pose estimation in video. Our results on multiple datasets demonstrate that UniPose, with a ResNet backbone and Waterfall module, is a robust and efficient architecture for pose estimation, obtaining state-of-the-art results in single-person pose detection for both single images and videos.
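The contrast the abstract draws between a cascade and a spatial pyramid can be illustrated with simple receptive-field arithmetic. The sketch below is not taken from the paper: it assumes stride-1, 3x3 atrous convolutions, illustrative dilation rates of 6, 12, 18 and 24, and that the output of every cascade stage is kept as a branch. The function names are hypothetical. It uses the standard effective kernel size of a dilated convolution, k + (k - 1)(d - 1).

```python
def effective_kernel(kernel_size: int, dilation: int) -> int:
    """Span covered by one dilated (atrous) convolution kernel."""
    return kernel_size + (kernel_size - 1) * (dilation - 1)


def pyramid_fields(rates, kernel_size=3):
    """Parallel branches: each one sees the input through a single atrous conv."""
    return [effective_kernel(kernel_size, r) for r in rates]


def cascade_fields(rates, kernel_size=3):
    """Cascaded branches: each stage filters the previous stage's output,
    so the receptive field grows cumulatively (stride 1 assumed)."""
    fields = []
    receptive_field = 1
    for r in rates:
        receptive_field += effective_kernel(kernel_size, r) - 1
        fields.append(receptive_field)
    return fields


# Illustrative dilation rates (an assumption, not quoted from the source).
rates = [6, 12, 18, 24]
pyramid = pyramid_fields(rates)
cascade = cascade_fields(rates)

print("parallel pyramid:", pyramid)
print("cascade         :", cascade)
```

Under these assumptions, tapping every cascade stage still yields one branch per scale, as a pyramid does, while each stage reuses the filtering already done by the stage before it.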