Concepedia

TLDR

Cross‑modal hashing is popular for multimedia retrieval because it offers low storage cost and fast query speed, yet most existing methods rely on hand‑crafted features that may not align well with hash‑code learning, resulting in suboptimal performance. This paper proposes deep cross‑modal hashing (DCMH), a novel method that integrates feature learning and hash‑code learning within a single framework. DCMH is an end‑to‑end learning framework that employs a deep neural network for each modality to learn features from scratch. Experiments on three image‑text datasets demonstrate that DCMH outperforms baseline methods and achieves state‑of‑the‑art performance in cross‑modal retrieval.

Abstract

Due to its low storage cost and fast query speed, cross-modal hashing (CMH) has been widely used for similarity search in multimedia retrieval applications. However, most existing CMH methods are based on hand-crafted features which might not be optimally compatible with the hash-code learning procedure. As a result, existing CMH methods with hand-crafted features may not achieve satisfactory performance. In this paper, we propose a novel CMH method, called deep cross-modal hashing (DCMH), by integrating feature learning and hash-code learning into the same framework. DCMH is an end-to-end learning framework with deep neural networks, one for each modality, to perform feature learning from scratch. Experiments on three real datasets with image-text modalities show that DCMH can outperform other baselines and achieve state-of-the-art performance in cross-modal retrieval applications.
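The "low storage cost and fast query speed" of cross-modal hashing come from mapping each modality into short binary codes and ranking by Hamming distance. A minimal sketch of that retrieval step is below; the random linear projections stand in for DCMH's learned per-modality deep networks (the actual encoders and loss are in the paper, not reproduced here), and all dimensions and names are illustrative assumptions.

```python
import numpy as np

def hash_code(features, projection):
    """Map real-valued features to a binary code in {-1, +1}^c.

    In DCMH the projection would be a learned deep network per modality;
    a random linear map is used here purely for illustration."""
    return np.sign(features @ projection)

def hamming_distance(query_code, db_codes):
    """Hamming distance for {-1, +1} codes: (c - dot product) / 2."""
    c = query_code.shape[-1]
    return (c - db_codes @ query_code) / 2

rng = np.random.default_rng(0)
proj_img = rng.standard_normal((512, 16))  # hypothetical image encoder -> 16-bit codes
proj_txt = rng.standard_normal((300, 16))  # hypothetical text encoder  -> 16-bit codes

# Offline: hash the image database once; only the 16-bit codes are stored.
db_images = rng.standard_normal((1000, 512))
db_codes = hash_code(db_images, proj_img)

# Online: hash a text query and rank images by Hamming distance (nearest first).
text_query = rng.standard_normal(300)
query_code = hash_code(text_query, proj_txt)
ranking = np.argsort(hamming_distance(query_code, db_codes))
```

Because both modalities land in the same 16-bit Hamming space, a text query can retrieve images (and vice versa) with cheap bitwise comparisons instead of full-precision nearest-neighbor search.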
