Concepedia

Publication | Closed Access

Enhancing Dynamic Image Advertising with Vision-Language Pre-training

12

Citations

7

References

2023

Year

Abstract

In the multimedia era, image becomes an effective medium in search advertising. Dynamic Image Advertising (DIA), a system that matches queries with appropriate ad images and generates multimodal ads, is introduced to improve user experience and ad revenue. The core of DIA is a query-image matching module performing ad image retrieval and relevance modeling. Current query-image matching suffers from data scarcity and inconsistency, and insufficient cross-modal fusion. Also, the retrieval and relevance models are separately trained, affecting overall performance. In this paper, we propose a vision-language framework for query-image matching. It consists of two parts. First, we design a base model combining different encoders and tasks, and train it on large-scale image-text pairs to learn general multimodal representation. Then, we fine-tune the base model on advertising business data, unifying relevance modeling and retrieval through multi-objective learning. Our framework has been implemented in Baidu search advertising system "Phoneix Nest". Online evaluation shows that it improves cost per mille (CPM) and click-through rate (CTR) by 1.04% and 1.865% on the system main traffic.

References

YearCitations

Page 1