
Publication | Open Access

Text2Seg: Zero-shot Remote Sensing Image Semantic Segmentation via Text-Guided Visual Foundation Models

Citations: 20 | References: 7 | Year: 2024

Abstract

Remote sensing imagery is vital for environmental and land use monitoring, necessitating efficient segmentation methods. While deep learning models have become standard for segmentation, they require extensive per-pixel annotations, which are often scarce. High intra-dataset variance also limits the transferability of these models. Although models like SAM offer zero-shot instance-level segmentation, adapting them for semantic segmentation remains challenging. We introduce Text2Seg, a method for remote sensing semantic segmentation that reduces dependency on extensive annotations by automatically generating prompts using visual foundation models (VFMs). This approach enhances generalization across datasets. Experiments on four remote sensing datasets show Text2Seg significantly outperforms the vanilla SAM model, with improvements ranging from 31% to 225%. Our code is available at https://github.com/Douglas2Code/Text2Seg.
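The abstract describes a pipeline in which text-guided visual foundation models generate prompts that steer SAM toward semantic (class-level) masks. A minimal sketch of that idea, assuming a text-conditioned box detector and a promptable segmenter; the function names `detect_boxes` and `segment_with_boxes` are illustrative stubs, not the authors' API:

```python
# Sketch of a Text2Seg-style flow: a text-conditioned detector proposes
# box prompts for the target class, a promptable segmenter (e.g. SAM)
# turns each box into an instance mask, and the instance masks are merged
# into one semantic mask for that class.
# NOTE: detect_boxes / segment_with_boxes are hypothetical stand-ins for
# real VFM calls (e.g. a grounding detector and SAM).
from typing import Callable, List, Tuple

Box = Tuple[int, int, int, int]   # (x0, y0, x1, y1)
Mask = List[List[int]]            # binary mask as nested lists

def text2seg_sketch(
    image: List[List[float]],
    text_prompt: str,
    detect_boxes: Callable[[List[List[float]], str], List[Box]],
    segment_with_boxes: Callable[[List[List[float]], List[Box]], List[Mask]],
) -> Mask:
    """Union per-box instance masks into a single class-level mask."""
    boxes = detect_boxes(image, text_prompt)      # text-guided prompt generation
    masks = segment_with_boxes(image, boxes)      # SAM-style promptable segmentation
    h, w = len(image), len(image[0])
    merged = [[0] * w for _ in range(h)]
    for m in masks:                               # logical OR across instances
        for y in range(h):
            for x in range(w):
                merged[y][x] = merged[y][x] or m[y][x]
    return merged
```

In the real method the detector and segmenter would be pretrained foundation models; the point of the sketch is only the composition: text prompt → spatial prompts → merged semantic mask, with no per-pixel training labels involved.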
