Multimodal Retrieval through Relations between Subjects and Objects in Lifelog Images

Abstract

With the development of wearable devices, people can now record their life experiences much more easily than before, and lifelog retrieval has become an emerging task. Because of the semantic gap between visual data and textual queries, retrieving lifelog images with text queries is challenging. This paper proposes an interactive lifelog retrieval system that aims to return more intuitive and accurate results. The system is divided into an offline part and an online part. In the offline part, we incorporate the original visual and textual concepts extracted from images into the system using pre-trained word embeddings. Moreover, we encode the relationships between subjects and objects in images using a pre-trained relation graph generation model. In the online part, we provide an intuitive frontend with various metadata filters, which not only gives users a convenient interface but also helps them recall detailed memories. As a result, users can clearly distinguish the concepts in different clusters and efficiently browse the clusters of retrieved images in a short time.
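The abstract does not specify how concept matching and relation matching are combined, so the following is only a minimal sketch under stated assumptions. It scores each image by (a) how well the query words match the image's concepts under word-embedding cosine similarity and (b) how well a query (subject, predicate, object) triple matches one of the image's relation triples, then mixes the two scores with a weight `alpha`. The toy embedding table, the names `LifelogImage`, `concept_score`, `relation_score`, `rank`, and the linear weighting are all hypothetical illustrations, not the authors' method. A real system would load vectors from a pre-trained model instead of the hand-written table.

```python
import math
from dataclasses import dataclass, field
from typing import List, Optional, Tuple

# HYPOTHETICAL toy vectors standing in for a pre-trained word-embedding model;
# real embeddings have hundreds of dimensions and are loaded from disk.
EMBEDDINGS = {
    "man": [0.9, 0.1, 0.0],
    "person": [0.85, 0.15, 0.05],
    "cup": [0.1, 0.9, 0.1],
    "coffee": [0.15, 0.85, 0.2],
    "table": [0.0, 0.2, 0.9],
    "desk": [0.05, 0.25, 0.85],
    "holding": [0.5, 0.5, 0.0],
}

Triple = Tuple[str, str, str]  # (subject, predicate, object)


def cosine(u: List[float], v: List[float]) -> float:
    """Cosine similarity between two vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def word_sim(a: str, b: str) -> float:
    """Exact match scores 1.0; out-of-vocabulary words score 0.0."""
    if a == b:
        return 1.0
    if a not in EMBEDDINGS or b not in EMBEDDINGS:
        return 0.0
    return cosine(EMBEDDINGS[a], EMBEDDINGS[b])


@dataclass
class LifelogImage:  # hypothetical container for one indexed image
    image_id: str
    concepts: List[str]  # visual/textual concepts detected offline
    relations: List[Triple] = field(default_factory=list)  # relation-graph triples


def concept_score(query_terms: List[str], image: LifelogImage) -> float:
    """Mean over query terms of the best-matching image concept."""
    if not query_terms or not image.concepts:
        return 0.0
    best = [max(word_sim(q, c) for c in image.concepts) for q in query_terms]
    return sum(best) / len(best)


def relation_score(query_triple: Optional[Triple], image: LifelogImage) -> float:
    """Best triple match: mean similarity of subject, predicate, and object."""
    if query_triple is None or not image.relations:
        return 0.0
    return max(
        sum(word_sim(q, r) for q, r in zip(query_triple, rel)) / 3
        for rel in image.relations
    )


def rank(query_terms: List[str], query_triple: Optional[Triple],
         images: List[LifelogImage], alpha: float = 0.5) -> List[str]:
    """Return image ids sorted by a weighted concept/relation score (assumed form)."""
    scored = [
        (alpha * concept_score(query_terms, img)
         + (1 - alpha) * relation_score(query_triple, img), img.image_id)
        for img in images
    ]
    return [img_id for _, img_id in sorted(scored, key=lambda s: s[0], reverse=True)]
```

For example, with one image tagged `person, cup, desk` and the triple `(person, holding, cup)`, and another tagged `cup, table` with `(cup, on, table)`, the query words `man, coffee` and triple `(man, holding, coffee)` rank the first image higher. The embedding similarity bridges "man"/"person" and "coffee"/"cup", and the relation triple adds evidence that a plain bag of concepts cannot express.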
