Lost in quantization: Improving particular object retrieval in large scale image databases

TLDR

Visual object retrieval from large databases is currently dominated by systems inspired by text retrieval. This paper investigates mapping each image region to a weighted set of visual words to recover features lost during quantization. The method selects visual words by proximity in descriptor space, integrates them into a tf-idf framework, adapts spatial verification for soft assignment, and is evaluated on the Oxford Buildings dataset and a newly introduced dataset. Soft assignment consistently outperforms the state of the art, especially on queries with low initial recall, though it requires more storage for the index.

Abstract

The state of the art in visual object retrieval from large databases is achieved by systems that are inspired by text retrieval. A key component of these approaches is that local regions of images are characterized using high-dimensional descriptors which are then mapped to "visual words" selected from a discrete vocabulary. This paper explores techniques to map each visual region to a weighted set of words, allowing the inclusion of features which were lost in the quantization stage of previous systems. The set of visual words is obtained by selecting words based on proximity in descriptor space. We describe how this representation may be incorporated into a standard tf-idf architecture, and how spatial verification is modified in the case of this soft-assignment. We evaluate our method on the standard Oxford Buildings dataset, and introduce a new dataset for evaluation. Our results exceed the current state-of-the-art retrieval performance on these datasets, particularly on queries with poor initial recall where techniques like query expansion suffer. Overall we show that soft-assignment is always beneficial for retrieval with large vocabularies, at a cost of increased storage requirements for the index.
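The abstract does not state the weighting function, so what follows is only a minimal sketch under stated assumptions. Each descriptor is softly assigned to its k nearest cluster centres with Gaussian weights exp(-d^2 / (2 sigma^2)), normalized to sum to one. The resulting fractional counts then replace integer term frequencies in a tf-idf vector that is scored by cosine similarity. The kernel, the values of k and sigma, the idf rule, and every function name here (soft_assign, soft_term_frequencies, rank_images, ...) are hypothetical choices made for illustration, not the paper's implementation. For clarity, nearest centres are found by brute force, and spatial verification is omitted.

```python
import math
from collections import defaultdict


def soft_assign(descriptor, vocabulary, k=3, sigma=1.0):
    """Map one descriptor to its k nearest visual words with Gaussian weights.

    Returns {word_id: weight} with weights summing to 1.
    Assumption: Gaussian kernel over the k nearest centres; k, sigma illustrative.
    """
    # Brute-force squared Euclidean distance to every cluster centre,
    # keeping the k smallest (ties broken by word id).
    dists = sorted(
        (sum((a - b) ** 2 for a, b in zip(descriptor, centre)), word_id)
        for word_id, centre in enumerate(vocabulary)
    )[:k]
    # Subtract the smallest distance before exponentiating: the nearest word
    # gets exp(0) = 1, which avoids underflow and cancels after normalizing.
    d_min = dists[0][0]
    raw = {w: math.exp(-(d2 - d_min) / (2 * sigma ** 2)) for d2, w in dists}
    total = sum(raw.values())
    return {w: v / total for w, v in raw.items()}


def soft_term_frequencies(descriptors, vocabulary, k=3, sigma=1.0):
    """Accumulate fractional word counts over all regions of one image."""
    tf = defaultdict(float)
    for desc in descriptors:
        for w, weight in soft_assign(desc, vocabulary, k, sigma).items():
            tf[w] += weight
    return dict(tf)


def inverse_document_frequencies(database_tfs):
    """idf = log(N / n_w); assumption: an image contains w if w got any weight."""
    df = defaultdict(int)
    for tf in database_tfs:
        for w in tf:
            df[w] += 1
    n_images = len(database_tfs)
    return {w: math.log(n_images / count) for w, count in df.items()}


def tfidf_vector(tf, idf):
    """Weight soft counts by idf, then L2-normalize."""
    vec = {w: count * idf.get(w, 0.0) for w, count in tf.items()}
    norm = math.sqrt(sum(x * x for x in vec.values()))
    return {w: x / norm for w, x in vec.items()} if norm > 0 else vec


def rank_images(query_descriptors, database, vocabulary, k=3, sigma=1.0):
    """Return (image_index, cosine_score) pairs, best match first."""
    db_tfs = [soft_term_frequencies(d, vocabulary, k, sigma) for d in database]
    idf = inverse_document_frequencies(db_tfs)
    q = tfidf_vector(
        soft_term_frequencies(query_descriptors, vocabulary, k, sigma), idf
    )
    scores = []
    for i, tf in enumerate(db_tfs):
        d = tfidf_vector(tf, idf)
        scores.append((i, sum(x * d.get(w, 0.0) for w, x in q.items())))
    return sorted(scores, key=lambda s: s[1], reverse=True)


# Toy usage: 2-D "descriptors" and a 4-word vocabulary (hypothetical data).
vocabulary = [(0, 0), (10, 0), (0, 10), (10, 10)]
database = [[(0, 0), (1, 0)], [(10, 10)], [(10, 0)]]
ranking = rank_images([(0, 1)], database, vocabulary, k=3, sigma=5.0)
```

In this toy run, image 0 ranks first. Image 2 (a single region at (10, 0)) still receives a nonzero score because its soft assignment spills onto the query's dominant word; with k=1 (hard assignment) it would score zero. Each region also produces k index entries instead of one, which mirrors the extra index storage the abstract mentions.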
