DTSE-SpaceNet: Deformable-Transformer-Based Single-Stage End-to-End Network for 6-D Pose Estimation in Space

Abstract

Monocular vision-based pose estimation of non-cooperative target spacecraft is vital for tasks such as on-orbit servicing and debris removal. While deep learning has improved monocular spacecraft pose estimation, existing methods suffer from two limitations. First, the prevailing two-stage methods separate object detection from pose estimation, which prevents end-to-end training and leads to redundant feature extraction. Second, an over-reliance on convolutional neural networks (CNNs) can result in excessive dependence on texture and inadequate modeling of long-range dependencies. To address these drawbacks, we propose a Deformable Transformer-based Single-stage End-to-end SpaceNet (DTSE-SpaceNet). This network dynamically fuses features from multiple scales to predict keypoints, from which pose parameters are derived using the Perspective-n-Point (PnP) method. Furthermore, a novel shape loss function improves the geometric accuracy of the keypoints and reduces outliers, thereby enhancing performance. Extensive experiments on multiple public benchmark datasets demonstrate competitive performance and strong generalization capability, with fewer computations and parameters than two-stage methods.
