Rethinking Remote Sensing Pretrained Model: Instance-Aware Visual Prompting for Remote Sensing Scene Classification

Abstract

Large-scale pre-trained models, such as vision transformers, have made significant progress in remote sensing (RS) scene classification. For a new scene classification task, the common practice is to fully fine-tune the pre-trained model's parameters rather than train from scratch. Although this approach achieves satisfactory results, it incurs a heavy computation and storage burden, which limits the transferability of large pre-trained models to different RS scene classification tasks. To address this challenge, we propose a parameter-efficient tuning approach called Instance-Aware Visual Prompting (IVP), which is the first work to explore prompting in the field of RS scene classification. IVP adaptively generates prompts that account for the complex backgrounds and highly variable characteristics of RS images, and updates only a few parameters to transfer a pre-trained RS Transformer model to different scene classification tasks. Specifically, instead of adapting all of the model's parameters, we introduce instance-specific prompt vectors into the input space. Because RS images vary significantly, an instance-level prompt generation module produces specific prompts for each RS image by aggregating contextual information from the input. These prompt vectors then calibrate the pre-trained features to encode instance-specific information. Extensive experiments on three RS scene classification datasets demonstrate the superiority of IVP over other fine-tuning methods. For example, when updating just 1.1% of the parameters, the Swin Transformer model achieves improvements of about 1.83% and 1.42% over full fine-tuning on NWPU-19 and NWPU-28, respectively.
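The idea of generating prompts per image and adding them to the input space can be illustrated with a minimal sketch. The abstract does not specify the architecture of the prompt generation module, so everything below is an assumption made for illustration: the names `PromptGenerator` and `prepend_prompts` are hypothetical, mean pooling stands in for "aggregating contextual information", and a linear projection added to a shared base prompt stands in for the generator. It is not the authors' implementation, and the feature-calibration step is not modeled.

```python
import random

def make_matrix(rows, cols, rng, scale=0.02):
    """Return a rows x cols matrix of small Gaussian values (list of lists)."""
    return [[rng.gauss(0.0, scale) for _ in range(cols)] for _ in range(rows)]

def mean_pool(tokens):
    """Aggregate context: average the patch tokens into one vector."""
    n, dim = len(tokens), len(tokens[0])
    return [sum(tok[d] for tok in tokens) / n for d in range(dim)]

def matvec(matrix, vec):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * x for w, x in zip(row, vec)) for row in matrix]

class PromptGenerator:
    """Hypothetical instance-level prompt generator (assumed design).

    Only `base` and `proj` would be trained; the backbone stays frozen.
    """

    def __init__(self, dim, num_prompts, seed=0):
        rng = random.Random(seed)
        self.dim = dim
        self.num_prompts = num_prompts
        # Shared base prompts, identical for every image.
        self.base = make_matrix(num_prompts, dim, rng)
        # One projection per prompt, mapping image context to an offset.
        self.proj = [make_matrix(dim, dim, rng) for _ in range(num_prompts)]

    def __call__(self, tokens):
        context = mean_pool(tokens)
        prompts = []
        for k in range(self.num_prompts):
            offset = matvec(self.proj[k], context)
            prompts.append([b + o for b, o in zip(self.base[k], offset)])
        return prompts

    def num_parameters(self):
        """Count the trainable values in this module."""
        return self.num_prompts * self.dim * (self.dim + 1)

def prepend_prompts(tokens, prompts):
    """Insert the prompt vectors into the input space, before the patch tokens."""
    return prompts + tokens

# Two toy "images", each with 16 patch tokens of dimension 8.
rng = random.Random(1)
image_a = make_matrix(16, 8, rng, scale=1.0)
image_b = make_matrix(16, 8, rng, scale=1.0)

generator = PromptGenerator(dim=8, num_prompts=4)
sequence_a = prepend_prompts(image_a, generator(image_a))
sequence_b = prepend_prompts(image_b, generator(image_b))
```

In this sketch the two images receive different prompts because the offset depends on each image's pooled context, while the number of trainable values depends only on the prompt count and token dimension, not on the size of the frozen backbone.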
