A Comprehensive Study Over VLAD and Product Quantization in Large-Scale Image Retrieval

Abstract

This paper addresses content-based large-scale image retrieval, taking the state-of-the-art framework of VLAD and Product Quantization proposed by Jegou as a starting point. Because it demonstrates an excellent accuracy-efficiency trade-off, this framework has attracted increasing attention from the community, and numerous extensions have been proposed. In this work, we present an in-depth analysis of the framework aimed at improving our understanding of its processing steps and boosting its overall performance. Our analysis evaluates numerous extensions (both existing and novel) and studies the effects of several previously unexplored parameters. We focus specifically on: a) employing more efficient and discriminative local features; b) improving the quality of the aggregated representation; and c) optimizing the indexing scheme. Our thorough experimental evaluation provides new insights into which extensions consistently improve performance and which do not, and sheds light on the effects of previously unexplored parameters of the framework. As a result, we develop an enhanced framework that is more efficient and significantly outperforms the previous best reported accuracy on standard benchmarks.
