Object retrieval with large vocabularies and fast spatial matching

TLDR

The paper presents a large-scale object retrieval system. The user selects a region of a query image, and the system retrieves a ranked list of matching images from a corpus of over one million Flickr images using a bag-of-words model. Because building an image-feature vocabulary is a major time and performance bottleneck at this scale, the authors compare scalable vocabulary-building methods and introduce a randomized-tree quantization that outperforms the prior state of the art and has a major effect on retrieval quality. An efficient spatial verification stage then re-ranks the bag-of-words results; it consistently improves search quality, though by a smaller margin when the vocabulary is large. The authors view the results as a promising step towards web-scale image corpora.

Abstract

In this paper, we present a large-scale object retrieval system. The user supplies a query object by selecting a region of a query image, and the system returns a ranked list of images that contain the same object, retrieved from a large corpus. We demonstrate the scalability and performance of our system on a dataset of over 1 million images crawled from the photo-sharing site, Flickr [3], using Oxford landmarks as queries. Building an image-feature vocabulary is a major time and performance bottleneck, due to the size of our dataset. To address this problem we compare different scalable methods for building a vocabulary and introduce a novel quantization method based on randomized trees which we show outperforms the current state-of-the-art on an extensive ground-truth. Our experiments show that the quantization has a major effect on retrieval quality. To further improve query performance, we add an efficient spatial verification stage to re-rank the results returned from our bag-of-words model and show that this consistently improves search quality, though by less of a margin when the visual vocabulary is large. We view this work as a promising step towards much larger, "web-scale" image corpora.
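The bag-of-words ranking step mentioned in the abstract can be illustrated with a minimal sketch. It assumes the descriptors have already been quantized into integer visual-word ids, and it assumes tf-idf weighting with cosine similarity over an inverted index, which is a common choice for this model but is not specified in the abstract. The function names `build_index` and `rank` are hypothetical, and neither the randomized-tree quantization nor the spatial verification stage is shown.

```python
import math
from collections import Counter, defaultdict


def build_index(corpus):
    """Build a tf-idf inverted index.

    corpus: dict mapping image id -> list of visual-word ids
            (assumed to be quantized already; quantization is not shown).
    Returns (inverted, idf), where inverted maps a visual word to a list
    of (image id, L2-normalized tf-idf weight) pairs.
    """
    n_images = len(corpus)
    doc_freq = Counter()
    for words in corpus.values():
        doc_freq.update(set(words))  # count each word once per image
    idf = {w: math.log(n_images / df) for w, df in doc_freq.items()}

    inverted = defaultdict(list)
    for image_id, words in corpus.items():
        counts = Counter(words)
        weights = {w: (c / len(words)) * idf[w] for w, c in counts.items()}
        norm = math.sqrt(sum(v * v for v in weights.values())) or 1.0
        for w, v in weights.items():
            inverted[w].append((image_id, v / norm))
    return inverted, idf


def rank(query_words, inverted, idf):
    """Rank images by cosine similarity to the query's tf-idf vector.

    query_words: visual-word ids found inside the user-selected region.
    Words that never occur in the corpus are ignored.
    """
    counts = Counter(w for w in query_words if w in idf)
    total = sum(counts.values())
    if total == 0:
        return []
    q = {w: (c / total) * idf[w] for w, c in counts.items()}
    norm = math.sqrt(sum(v * v for v in q.values())) or 1.0

    scores = defaultdict(float)
    for w, v in q.items():
        # Only images that share at least one visual word are touched.
        for image_id, doc_weight in inverted[w]:
            scores[image_id] += (v / norm) * doc_weight
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)
```

Calling `build_index` on a small dictionary of image ids to word lists and then `rank` on a query's word list returns `(image id, score)` pairs in descending order of score. In the system described above, the top of such a list would then be passed to the spatial verification stage for re-ranking.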
