Publication | Closed Access
GIVL: Improving Geographical Inclusivity of Vision-Language Models with Pre-Training Methods
Citations: 14
References: 41
Year: 2023
Unknown Venue
Artificial Intelligence, Engineering, Machine Learning, Large Language Model, Natural Language Processing, Geo-diverse Knowledge, Multimodal LLM, Image Classification, Image Analysis, Visual Grounding, Data Science, Pattern Recognition, Performance Disparity, Improving Geographical Inclusivity, Vision Recognition, Machine Translation, Machine Vision, Vision Language Model, Computer Science, Deep Learning, Computer Vision, Geo-diverse Visual Concepts
A key goal for the advancement of AI is to develop technologies that serve the needs not just of one group but of all communities, regardless of their geographical region. In fact, a significant proportion of knowledge is locally shared by people from certain regions but may not apply equally in other regions because of cultural differences. If a model is unaware of regional characteristics, it may lead to performance disparity across regions and result in bias against underrepresented groups. We propose GIVL, a Geographically Inclusive Vision-and-Language Pre-trained model. There are two attributes of geo-diverse visual concepts that can help to learn geo-diverse knowledge: 1) concepts under similar categories have unique knowledge and visual characteristics, and 2) concepts with similar visual features may fall in completely different categories. Motivated by these attributes, we design two new pre-training objectives, Image-Knowledge Matching (IKM) and Image Edit Checking (IEC), to pre-train GIVL. Compared with similar-size models pre-trained with a similar scale of data, GIVL achieves state-of-the-art (SOTA) and more balanced performance on geo-diverse V&L tasks. Code and data are released at https://github.com/WadeYin9712/GIVL.