Publication | Closed Access
Learning Open-Vocabulary Semantic Segmentation Models From Natural Language Supervision
Citations: 73 | References: 43 | Year: 2023
Unknown Venue
Structured Prediction, Engineering, Machine Learning, Open-vocabulary Semantic Segmentation, Natural Language Processing, Multimodal LLM, Image Analysis, Text-to-image Retrieval, Data Science, Zero-shot Learning, Pattern Recognition, Visual Grounding, Computational Linguistics, Text Segmentation, Language Studies, Segment Objects, Machine Translation, Machine Vision, NLP Task, Vision Language Model, Computer Science, Deep Learning, Computer Vision, Masked Entities, Linguistics
This paper considers the problem of open-vocabulary semantic segmentation (OVS), which aims to segment objects of arbitrary classes beyond a predefined, closed set of categories. The main contributions are as follows. First, we propose a transformer-based model for OVS, termed OVSegmentor, which exploits only web-crawled image-text pairs for pre-training, without using any mask annotations. OVSegmentor assembles the image pixels into a set of learnable group tokens via a slot-attention-based binding module, then aligns the group tokens to the corresponding caption embeddings. Second, we propose two proxy tasks for training, namely masked entity completion and cross-image mask consistency. The former aims to infer all masked entities in the caption given the group tokens, which enables the model to learn fine-grained alignment between visual groups and text entities. The latter enforces consistent mask predictions between images that contain shared entities, encouraging the model to learn visual invariance. Third, we construct the CC4M dataset for pre-training by filtering CC12M for frequently appearing entities, which significantly improves training efficiency. Fourth, we perform zero-shot transfer on four benchmark datasets: PASCAL VOC, PASCAL Context, COCO Object, and ADE20K. OVSegmentor achieves superior results over state-of-the-art approaches on PASCAL VOC while using only 3% of the data (4M vs. 134M) for pre-training.
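To make the grouping step more concrete, here is a minimal NumPy sketch of a simplified slot-attention binding in the spirit of Slot Attention (Locatello et al., 2020): group tokens compete for pixels through a softmax over the group axis, each token is then updated as the attention-weighted mean of the pixels it claims, and an argmax over groups yields a hard segmentation mask. This is an illustrative assumption, not the OVSegmentor implementation: the learned query/key/value projections, GRU and MLP updates, and the training losses are omitted, the features are random, and the names `slot_attention_binding`, `pixels`, `init_slots`, and `entity_emb` are hypothetical.

```python
import numpy as np


def softmax(x: np.ndarray, axis: int) -> np.ndarray:
    """Numerically stable softmax along `axis`."""
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def slot_attention_binding(pixels: np.ndarray, slots: np.ndarray, iters: int = 3):
    """Hypothetical, simplified slot-attention binding.

    pixels: (N, D) pixel/patch features; slots: (K, D) group tokens.
    Returns the updated group tokens (K, D) and the soft assignment (N, K).
    No learned projections, GRU, or MLP (assumption for brevity).
    """
    d = pixels.shape[1]
    attn = None
    for _ in range(iters):
        logits = pixels @ slots.T / np.sqrt(d)  # (N, K) scaled dot products
        # Softmax over the group axis: groups compete for each pixel.
        attn = softmax(logits, axis=1)
        # Normalize each column so a group token becomes a weighted mean of pixels.
        weights = (attn + 1e-8) / (attn + 1e-8).sum(axis=0, keepdims=True)
        slots = weights.T @ pixels  # (K, D)
    return slots, attn


def l2_normalize(x: np.ndarray, axis: int = -1) -> np.ndarray:
    """Scale vectors to unit length (epsilon avoids division by zero)."""
    return x / (np.linalg.norm(x, axis=axis, keepdims=True) + 1e-8)


rng = np.random.default_rng(0)
H, W, D, K = 8, 8, 16, 4
pixels = rng.normal(size=(H * W, D))    # stand-in for image patch features
init_slots = rng.normal(size=(K, D))    # stand-in for learnable group tokens
group_tokens, attn = slot_attention_binding(pixels, init_slots)

# Hard mask: each pixel is assigned to its most-attended group token.
mask = attn.argmax(axis=1).reshape(H, W)

# Cosine similarity between group tokens and (hypothetical) text entity embeddings,
# the kind of score an alignment objective would operate on.
entity_emb = rng.normal(size=(3, D))
sim = l2_normalize(group_tokens) @ l2_normalize(entity_emb).T  # (K, 3)
print(mask.shape, sim.shape)  # (8, 8) (4, 3)
```

Normalizing the softmax over groups rather than over pixels is the design choice that makes the tokens partition the image, which is why an argmax over groups can be read directly as a segmentation mask.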