Revisiting Monocular Satellite Pose Estimation With Transformer

Abstract

Convolutional neural networks (CNNs) have been adopted in monocular satellite pose estimation and achieve superior performance over traditional methods. However, existing CNN-based methods suffer from a bias toward texture, an indirect description of absolute distance, and a lack of long-range dependency modeling. These factors limit the generalizability of CNN-based methods. Motivated by the striking achievements of transformer models, this article adopts transformer blocks for satellite pose estimation from a single RGB image and proposes an efficient monocular satellite pose estimation method. First, we design an effective satellite representation model based on a set of keypoints. Then, considering the characteristics of monocular satellite pose estimation, we construct an end-to-end keypoint-set prediction network and build a bipartite loss function. Further, we improve the backbone structure for high-quality feature extraction. Experimental results on a public benchmark dataset indicate that the proposed method ranks second and third on the synthetic and real test sets, respectively, using only synthetic training data. We also demonstrate that our keypoint predictor takes half as much time as the first-placed method in our comparison, and therefore achieves a better tradeoff between speed and accuracy than existing approaches.
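The bipartite loss mentioned above is not specified in the abstract. As a rough illustration of the general idea behind set prediction losses, the following minimal sketch matches an unordered set of predicted keypoints one-to-one with ground-truth keypoints and averages the matched distances. The function name `bipartite_keypoint_loss`, the Euclidean cost, and the brute-force search are assumptions made for illustration; they are not taken from the article, and a practical implementation would more likely use the Hungarian algorithm.

```python
from itertools import permutations
from math import hypot


def bipartite_keypoint_loss(pred, target):
    """Return (loss, assignment) for the cheapest one-to-one matching.

    pred, target: equal-length lists of (x, y) keypoints.
    assignment[i] is the index of the target matched to pred[i].
    Hypothetical helper for illustration only, not the article's loss.
    """
    if len(pred) != len(target):
        raise ValueError("this sketch assumes equally sized sets")
    n = len(pred)
    if n == 0:
        return 0.0, []
    # Pairwise Euclidean cost between every prediction and every target.
    cost = [[hypot(p[0] - t[0], p[1] - t[1]) for t in target] for p in pred]
    best_total, best_perm = float("inf"), None
    # Brute force over all one-to-one assignments (assumption: few keypoints).
    for perm in permutations(range(n)):
        total = sum(cost[i][perm[i]] for i in range(n))
        if total < best_total:
            best_total, best_perm = total, perm
    # Mean matched distance serves as the loss value.
    return best_total / n, list(best_perm)


if __name__ == "__main__":
    predicted = [(0.0, 0.0), (10.0, 0.0)]
    ground_truth = [(10.0, 1.0), (0.0, 1.0)]
    print(bipartite_keypoint_loss(predicted, ground_truth))
```

Because the matching is recomputed for every sample, the loss does not depend on the order in which the network emits its keypoints, which is the property that makes set prediction trainable end to end.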
