GIVL: Improving Geographical Inclusivity of Vision-Language Models with Pre-Training Methods

Abstract

A key goal for the advancement of AI is to develop technologies that serve the needs not just of one group but of all communities regardless of their geographical re-gion. In fact, a significant proportion of knowledge is locally shared by people from certain regions but may not apply equally in other regions because of cultural dif-ferences. If a model is unaware of regional character-istics, it may lead to performance disparity across re-gions and result in bias against underrepresented groups. We propose GIVL, a Geographically Inclusive Vision-and-Language Pre-trained model. There are two attributes of geo-diverse visual concepts which can help to learn geo-diverse knowledge: 1) concepts under similar categories have unique knowledge and visual characteristics, 2) concepts with similar visual features may fall in completely different categories. Motivated by the attributes, we de-sign new pre-training objectives Image-Knowledge Matching (IKM) and Image Edit Checking (IEC) to pre-train GIVL. Compared with similar-size models pre-trained with similar scale of data, GIVL achieves state-of-the-art (SOTA) and more balanced performance on geo-diverse V &L tasks. Code and data are released at https://github.com/WadeYin9712/GIVL.

References

Page 1

	Year	Citations

Page 1