Publication | Closed Access
Spectral Hashing
2008 · 2.1K citations · 15 references
Keywords: Graph Sparsity, Engineering, Information Retrieval, Data Science, Data Mining, Pattern Recognition, Graph Theory, Perceptual Hashing, Similarity Search, Knowledge Discovery, Semantic Hashing, Business, Hash Function, Computer Science, Graph Partitioning, Graph Processing, Semantic Similarity
Semantic hashing[1] seeks compact binary codes for data points so that the Hamming distance between codewords correlates with semantic similarity. In this paper, we show that the problem of finding the best code for a given dataset is closely related to graph partitioning and can be shown to be NP-hard. By relaxing the original problem, we obtain a spectral method whose solutions are simply a subset of thresholded eigenvectors of the graph Laplacian. By utilizing recent results on the convergence of graph Laplacian eigenvectors to the Laplace-Beltrami eigenfunctions of manifolds, we show how to efficiently calculate the code of a novel data point. Taken together, both learning the code and applying it to a novel point are extremely simple. Our experiments show that our codes outperform the state-of-the-art.
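The core construction described in the abstract — binary codes obtained by thresholding eigenvectors of the graph Laplacian — can be sketched in a few lines. The following is a minimal illustration on the training set only, assuming a Gaussian affinity and an unnormalized Laplacian; the paper's full method additionally uses Laplace-Beltrami eigenfunctions to encode out-of-sample points, which this toy sketch omits.

```python
import numpy as np

def spectral_codes(X, n_bits, sigma=1.0):
    """Toy sketch: binary codes from thresholded eigenvectors of the
    graph Laplacian built on the training set X (illustrative only)."""
    # Gaussian affinity matrix W over all pairs of points
    d2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)
    W = np.exp(-d2 / (2 * sigma ** 2))
    # Unnormalized graph Laplacian L = D - W
    D = np.diag(W.sum(axis=1))
    L = D - W
    # Eigenvectors with the smallest eigenvalues; skip the trivial
    # constant eigenvector (eigenvalue ~ 0)
    vals, vecs = np.linalg.eigh(L)
    Y = vecs[:, 1:n_bits + 1]
    # Threshold each eigenvector at zero -> one bit per eigenvector
    return (Y > 0).astype(np.uint8)

# Usage: two well-separated clusters; nearby points should tend to
# receive the same code bits.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0.0, 0.1, (20, 2)),
               rng.normal(3.0, 0.1, (20, 2))])
codes = spectral_codes(X, n_bits=4)
```

The first bit here comes from the Fiedler vector, whose sign pattern partitions the graph — the graph-partitioning connection the abstract refers to.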